May 10, 2006

Doctor! Doctor! It hurts when I do this!

I came across a bug while working on runkit today. On testing a batch of changes I'd made I was suddenly* running into a buffer overrun error on any request which instantiates the Runkit_Sandbox class. A few valgrind and gdb rounds later and I'd traced the corruption to apc.enabled's ini_entry->value which was actually encountering a form of double-free.


Now, I have no illusions regarding the fact that runkit is a black sheep. The things it does are contrary to PHP's design and probably shouldn't be done, so my first assumption is that APC is not the one at fault and that runkit is just coincidentally stepping on APC's toes. Time to break out the caffeine and get dirty...

After a little more digging I notice this block of code in APC:


PHP_MINIT_FUNCTION(apc)
{
ZEND_INIT_MODULE_GLOBALS(apc, php_apc_init_globals, php_apc_shutdown_globals);

REGISTER_INI_ENTRIES();

/* Disable APC in cli mode unless overridden by apc.enable_cli */
if(!APCG(enable_cli) && !strcmp(sapi_module.name, "cli")) {
zend_alter_ini_entry("apc.enabled", sizeof("apc.enabled"), "0", 1,
PHP_INI_SYSTEM, PHP_INI_STAGE_ACTIVATE);
}

if (APCG(enabled)) {
apc_module_init(module_number TSRMLS_CC);
}

return SUCCESS;
}

Ignore for the moment the fact that it's claiming we're in the ACTIVATE stage (we're really in the STARTUP phase -- but that's not where the problem lies). The problem with this statement is that zend_alter_ini_entry() uses estrndup(), a call which assumes that thread storage has been spun up and that the per-request pointer list is ready for the new entry to be indexed. During MINIT, that's just not the case.


The "right" solution for this is to move the CLI mode override to RINIT and sure enough, this fixes the overrun/double-free issues just fine. The trouble is, it's also a wasteful proposition. Compound this by the fact that the error only presents itself when all of the following conditions are met:

  • APC being used on command line
  • apc.enable_cli switch not set
  • Thread Safety turned on (requires --enable-maintainer-zts during bulid)

Should APC take a performance hit in order to satisfy such an edge case scenario? &#@% NO! This is one bug that can be casually swept under the rug. Move along now, nothing to see here.


P.S. - In case you were wondering, the only reason I didn't notice this before is that I'd only just recently added APC to my development build. The coincidence of this bug showing up with this last batch of changes was precisely that, a coincidence.

1 comment:

Note: Only a member of this blog may post a comment.