Jun 18, 2006

How long is a piece of string

Sunday morning I was asked by an IRC regular: "Where does the engine parse quoted strings?". Being a Sunday morning, I began to launch into a sermon on the distinction between CONSTANT_ENCAPSED_STRING and the problems which befall a single-pass compiler when you start to introduce interpolation. Not what he asked precisely, but an important component in answering his question. Unfortunately, at the time I was busy watching the Brasil-Australia game so I didn't go into the kind of detail I would have. Now, some 12 hours later, since Angela is off buying toe-socks in Santa Cruz, I'll bore anyone with little enough life to read my blog by explaining the pitfalls of using PHP's string interpolation without using an optimizer.

To start things off, let's take a page from my earlier discourse on Compiled Variables and look at the opcodes generated by a few simple PHP scripts:

<?php
echo "This is a constant string";
?>

Yields the nice, simple opcode:

ECHO            'This is a constant string'

No problem... Exactly what you'd expect... Now let's complicate the expressions a little:

<?php
echo "This is an interpolated $string";
?>

Yields the surprisingly messy instruction set:

INIT_STRING ~0
ADD_STRING ~0 ~0 'This'
ADD_STRING ~0 ~0 ' '
ADD_STRING ~0 ~0 'is'
ADD_STRING ~0 ~0 ' '
ADD_STRING ~0 ~0 'an'
ADD_STRING ~0 ~0 ' '
ADD_STRING ~0 ~0 'interpolated'
ADD_STRING ~0 ~0 ' '
ADD_VAR ~0 ~0 !0
ECHO ~0

Where !0 represents the compiled variable named $string. Looking at these opcodes: INIT_STRING allocates an IS_STRING variable of one byte (to hold the terminating NUL). Then it's realloc'd to five bytes by the first ADD_STRING ('This' plus the terminating NUL). Next it's realloc'd to six bytes in order to add a space, then again to eight bytes for 'is', then nine to add a space, and so on until the temporary string has the contents of the interpolated variable copied onto the end of it before being used by the echo statement and finally discarded. Now let's rewrite that line to avoid interpolation and use concatenation instead:

<?php
echo "This is a concatenated " . $string;
?>

Which yields the significantly shorter and simpler set of ops:

CONCAT       ~0 'This is a concatenated ' !0
ECHO ~0

A vast improvement already, but this version still creates a temporary IS_STRING variable to hold the combined string contents meaning that data is duplicated when it's being used in a const context anyway. Now let's try out this oft-overlooked use of the echo statement:

<?php
echo "This is a stacked echo " , $string;
?>

Look closely, there is a meaningful difference from the last one. This time we're using a comma rather than a dot between the operands. If you don't know what the comma is doing there, ask the manual then check back here. Here are the resulting opcodes:

ECHO            'This is a stacked echo '
ECHO !0

Same number of opcodes, but this time no temporary variables are being created so there's no duplication and no pointless copying (unless of course $string wasn't of type IS_STRING, in which case it does have to be converted for output, but don't get picky now). Think this is bad? Consider the average heredoc string which spans several lines of prepared output embedding perhaps a handful of variables along the way. Here's one of several such blocks found in run-tests.php within the PHP distribution source tree:


<?php
echo <<<NO_PCRE_ERROR

+-----------------------------------------------------------+
| ! ERROR ! |
| The test-suite requires that you have pcre extension |
| enabled. To enable this extension either compile your PHP |
| with --with-pcre-regex or if you've compiled pcre as a |
| shared module load it via php.ini. |
+-----------------------------------------------------------+

NO_PCRE_ERROR;
?>

Notice that we're not even embedding variables to be interpolated here, yet does this come out to a simple, single opcode? Nope, because the rules necessary to catch a heredoc's end token demand the same careful examination as double-quoted variable substitution and you wind up (in this case) with SEVENTY-EIGHT opcodes! One INIT_STRING, 76 ADD_STRINGs, and a final ECHO. That means a malloc, 76 reallocs, and a free which will be executed every time that code snippet comes along. Even the original contents take up more memory because they're stored in 76 distinct zval/IS_STRING structures.
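If you'd like to see that allocation pattern outside of the engine, here's a minimal C sketch of it. None of this is engine code: demo_string, demo_init_string(), demo_add_string(), and demo_build_piecewise() are names I made up for illustration, and the only assumption is the behavior described above (one byte up front, then one realloc per piece).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the engine's temporary string; not a real zval. */
typedef struct {
    char  *val;       /* NUL-terminated buffer */
    size_t len;       /* length, not counting the NUL */
    int    reallocs;  /* how many times the buffer has been resized */
} demo_string;

/* Mimics INIT_STRING: one byte, just enough for the terminating NUL.
 * Returns 0 on success, -1 if the allocation fails. */
int demo_init_string(demo_string *s)
{
    assert(s != NULL);
    s->val = malloc(1);
    if (s->val == NULL) {
        return -1;
    }
    s->val[0] = '\0';
    s->len = 0;
    s->reallocs = 0;
    return 0;
}

/* Mimics ADD_STRING: grow the buffer by exactly the size of the new piece.
 * Returns 0 on success, -1 if the reallocation fails. */
int demo_add_string(demo_string *s, const char *piece)
{
    size_t add;
    char *grown;

    assert(s != NULL && piece != NULL);
    add = strlen(piece);
    grown = realloc(s->val, s->len + add + 1);
    if (grown == NULL) {
        return -1;
    }
    memcpy(grown + s->len, piece, add + 1);  /* copies the NUL as well */
    s->val = grown;
    s->len += add;
    s->reallocs++;
    return 0;
}

/* Builds the literal part of the interpolated example one piece at a time.
 * Returns the number of reallocs performed, or -1 on allocation failure. */
int demo_build_piecewise(demo_string *s)
{
    static const char *pieces[] = {
        "This", " ", "is", " ", "an", " ", "interpolated", " "
    };
    size_t i;

    if (demo_init_string(s) != 0) {
        return -1;
    }
    for (i = 0; i < sizeof(pieces) / sizeof(pieces[0]); i++) {
        if (demo_add_string(s, pieces[i]) != 0) {
            free(s->val);
            s->val = NULL;
            return -1;
        }
    }
    return s->reallocs;
}
```

Calling demo_build_piecewise() on a fresh demo_string hands back 8: eight resizes to assemble a 24 byte literal that a single allocation could have held. Remember to free(s.val) when you're done with it.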

Why does this happen? Because there are about a dozen ways that a variable can be hidden inside an interpolated string. Similarly, when looking for a heredoc end-token, the token can be an arbitrary length, containing any of the label characters, and may or may not sit on a line by itself. Put simply, it's too difficult to encompass in one regular expression.

The engine could perform a second pass during compilation; however, the time spent reassembling these strings at compile time will typically be about the same as the time saved by not processing them piecemeal at runtime (if one assumes that each instance will execute exactly once). Rather than complicate the build process (potentially slowing down overall run-times in the process), the compiler leaves this optimization step to opcode caches, which can achieve a far greater advantage by cleaning up this mess once, then caching the results and reusing the faster, leaner versions on all subsequent runs.

If you're using APC, you'll find just such an optimizer built in, but not enabled by default. To turn it on, you'll need to set apc.optimization=on in your php.ini. In addition to stitching these run-on opcodes back together, it'll also add run-time speed-ups like pre-resolving persistent constants to their actual values, folding static scalar expressions (like 1 + 1) to their fixed results (e.g. 2), and simpler stuff like avoiding the use of JMP when the target is the next opcode, or boolean casts when the original expression is known to be a boolean value. (It should be noted that these speed-ups also break some of the runtime-manipulation features of runkit, but that was stuff you... probably shouldn't have been doing anyway.)
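To give a feel for what "stitching these run-on opcodes back together" might involve, here's a toy peephole pass in C. To be clear, this is not APC's code and I'm not claiming it works this way internally: demo_op, the DEMO_* opcode names, the fixed 64 byte literal buffer, and demo_merge_add_strings() are all invented for the sketch. It only shows the general idea of folding neighbouring constant pieces into one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcode set, loosely named after the listings above. */
typedef enum {
    DEMO_INIT_STRING,
    DEMO_ADD_STRING,
    DEMO_ADD_VAR,
    DEMO_ECHO
} demo_opcode;

typedef struct {
    demo_opcode code;
    char literal[64];  /* only meaningful for DEMO_ADD_STRING */
} demo_op;

/* Folds every run of adjacent DEMO_ADD_STRING ops into a single op,
 * compacting the array in place. Returns the new number of ops.
 * A run is left unmerged if the result would not fit in the buffer. */
size_t demo_merge_add_strings(demo_op *ops, size_t count)
{
    size_t in, out = 0;

    assert(ops != NULL || count == 0);
    for (in = 0; in < count; in++) {
        int can_merge = out > 0
            && ops[in].code == DEMO_ADD_STRING
            && ops[out - 1].code == DEMO_ADD_STRING
            && strlen(ops[out - 1].literal) + strlen(ops[in].literal)
                   < sizeof(ops[in].literal);

        if (can_merge) {
            /* Append this piece to the op we already kept. */
            strcat(ops[out - 1].literal, ops[in].literal);
        } else {
            /* Keep the op, sliding it down over any merged slots. */
            if (out != in) {
                ops[out] = ops[in];
            }
            out++;
        }
    }
    return out;
}
```

Fed the eleven ops from the interpolated example earlier, this leaves four: the INIT_STRING, one ADD_STRING carrying 'This is an interpolated ', the ADD_VAR, and the ECHO. The win comes from doing this once and caching the result rather than paying for it on every request.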

Can't use an optimizer because your webhost doesn't know how to set php.ini options? You can still avoid 90% of the INIT_STRING/ADD_STRING dilemma by simply using single quotes and concatenation (or commas when dealing with echo statements). It's a simple trick and one which shouldn't harm maintainability too much, but on a large, complicated script, you just might see an extra request or two per second.


Jun 7, 2006

Extending and Embedding PHP


It's official!!!! After a year in development Extending and Embedding PHP is now shipping from fine book stores everywhere.


I've gotten good reviews from the half dozen people I know who've gotten their hands on it, and I am really satisfied with most of it. Do I think it could be better? Of course I do, but I don't think I was ever going to be completely satisfied.


I've learned a lot through this process and while I don't see any more titles in my immediate future (there are things coming down the pipe which are likely to change my availability), I do expect that my next book, should it materialize, will be even better.


If you pre-ordered it, you should see it soon. If you've already got your copy, let me know what you think! Did I skim over some topic too quickly? Did I belabour something else? If this book eventually finds its way into a 2nd edition (no promises mind you), are there topics you want to see added? Tossed out? Expanded/Compressed?

Jun 1, 2006

What the heck is TSRMLS_CC, anyway?

If you've ever worked on the PHP internals or built an extension, you've seen this construct floating around here and there, but no one ever talks about it. Those who know what this is typically answer questions from those who don't with "Don't worry about what it is, just use it here here here and here. And if the compiler says you're missing a tsrm_ls, put it there too..." This isn't laziness on the part of the person answering the question (okay, maybe it is a little bit), it's just that the engine goes so far out of its way to simplify what this magic value does, that there's no profit in a new extension developer knowing the mechanics of it. The information is like a cow's opinion, it doesn't matter, it's Moo.


Since I love to listen to myself rattle on about pointless topics (and I haven't blogged much this month), I thought I'd cover this topic and see if anyone manages to stay awake through it. You can blame Lukas, he got me rolled onto planet-php.net...

Glossary

TSRM
Thread Safe Resource Manager - This is an oft-overlooked, and seldom if ever discussed, layer hiding in the /TSRM directory of your friendly neighborhood PHP source code bundle. By default, the TSRM layer is only enabled when compiling a SAPI which requires it (e.g. apache2-worker). All Win32 builds have this layer enabled regardless of SAPI choice.


ZTS
Zend Thread Safety - Often used synonymously with the term TSRM. Specifically, ZTS is the term used by ./configure ( --enable-experimental-zts for PHP4, --enable-maintainer-zts for PHP5), and the name of the #define'd preprocessor token used inside the engine to determine if the TSRM layer is being used.


tsrm_ls
TSRM local storage - This is the actual variable name being passed around inside the TSRMLS_* macros when ZTS is enabled. It acts as a pointer to the start of that thread's independent data storage block, which I'll cover in just a minute.


TSRMLS_??
A quartet of macros designed to make the differences between ZTS and non-ZTS mode as painless as possible. When ZTS is not enabled, all four of these macros evaluate to nothing. When ZTS is enabled however, they expand out to the following definitions:
  • TSRMLS_C tsrm_ls
  • TSRMLS_D void ***tsrm_ls
  • TSRMLS_CC , tsrm_ls
  • TSRMLS_DC , void ***tsrm_ls
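Here's a small standalone C sketch of how that quartet gets used in practice. It deliberately does not include any PHP headers: the DEMO_TSRMLS_* macros below are my own copies of the definitions listed above, DEMO_ZTS stands in for the real ZTS token, and the demo_* functions are hypothetical. The rule of thumb it shows: the D variants go in declarations, the C variants go in calls, and the second C (for "comma") is for when other parameters come first.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ZTS 1  /* flip to 0 to see the non-threaded expansion */

#if DEMO_ZTS
# define DEMO_TSRMLS_D   void ***tsrm_ls
# define DEMO_TSRMLS_DC  , void ***tsrm_ls
# define DEMO_TSRMLS_C   tsrm_ls
# define DEMO_TSRMLS_CC  , tsrm_ls
#else
/* Without thread safety all four evaporate. */
# define DEMO_TSRMLS_D
# define DEMO_TSRMLS_DC
# define DEMO_TSRMLS_C
# define DEMO_TSRMLS_CC
#endif

/* No other parameters: declare with _D, call with _C. */
int demo_context_flag(DEMO_TSRMLS_D)
{
#if DEMO_ZTS
    assert(tsrm_ls != NULL);  /* a threaded build always has a context */
    return 1;
#else
    return 0;
#endif
}

/* Other parameters come first: declare with _DC, call with _CC. */
int demo_add_flag(int value DEMO_TSRMLS_DC)
{
    return value + demo_context_flag(DEMO_TSRMLS_C);
}

/* Entry point that owns the context and hands it down the call chain. */
int demo_entry(void)
{
#if DEMO_ZTS
    void **storage = NULL;       /* placeholder for a thread's storage vector */
    void ***tsrm_ls = &storage;  /* the name the macros expect to find */
#endif
    return demo_add_flag(41 DEMO_TSRMLS_CC);
}
```

With DEMO_ZTS set to 1, demo_entry() returns 42 because the context rides along as a hidden last argument. Set it to 0 and the very same source compiles to plain one-argument calls and returns 41.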

Globals

In any normal C program (just like in PHP) you have two methods of giving two different functions access to the same block of data. One method is to pass the value on the parameter stack like so:

#include <stdio.h>

void output_func(char *message)
{
    printf("%s\n", message);
}

int main(int argc, char *argv[])
{
    output_func(argv[0]);

    return 0;
}

Alternately, you could store the value in a variable up in the global scope and let the function access it there:

#include <stdio.h>

char *message;

void output_func(void)
{
    printf("%s\n", message);
}

int main(int argc, char *argv[])
{
    message = argv[0];
    output_func();

    return 0;
}

Both approaches have their merits and drawbacks and typically you'll see some combination of the two used in a real application. Indeed, PHP is covered in global variables from resource type identifiers, to function callback pointers, to request specific information such as the symbol tables used to store userspace variables. Attempting to pass these values around in the parameter stack would be more than unruly, it'd be impossible for an application like PHP where it's often necessary to register callbacks with external libraries which don't support context data.


So common information, like the execution stack, the function and class tables, and extension registries all sit up in the global scope where they can be picked up and used at any point in the application. For single-threaded SAPIs like CLI, Apache1, or even Apache2-prefork, this is perfectly fine. Request specific structures are initialized during the RINIT/Activation phase, and reset back to their original values during the RSHUTDOWN/Deactivation phase in preparation for the next request. A given webserver like Apache1 can serve up multiple pages at once because it spawns multiple processes each in their own process space with their own independent copies of global data.


Now let's introduce threaded webservers like Apache2-worker, or IIS. Under these conditions, only one process space is active at a given time with multiple threads spun off. Each of these threads then acts in the same manner as a single-threaded process might, servicing requests one at a time as they are dispatched. The trouble starts to brew as two or more threads try to service a request at the same time. Each thread wants to use the global scope to store its request-specific information, and tries to do so by writing to the same storage space. At the least, this would result in userspace variables declared in one script showing up in another. In practice, it leads to quick and disastrous segfaults and completely unpredictable behavior as memory is double freed or written with conflicting information by separate threads.


Non-Global Globals

The solution is to require the engine, the core, and any extension using global storage to determine how much memory will be used by request-specific data. Then, at the spin-up of each new thread, allocate a chunk of memory for each of these players to store their data into, thus giving each thread its own local storage. In order to group all the individual chunks used by a given thread together, one last vector of pointers is allocated to store the individual sub-structure pointers into. It's the pointer to this vector which is passed around as the tsrm_ls variable by the TSRMLS_* family of macros. To see how this works, let's look at an example extension:


typedef struct _zend_myextension_globals {
    int foo;
    char *bar;
} zend_myextension_globals;

#ifdef ZTS
int myextension_globals_id;
#else
zend_myextension_globals myextension_globals;
#endif

/* Triggered at the beginning of a thread */
static void php_myextension_globals_ctor(zend_myextension_globals *myext_globals TSRMLS_DC)
{
    myext_globals->foo = 0;
    myext_globals->bar = NULL;
}

/* Triggered at the end of a thread */
static void php_myextension_globals_dtor(zend_myextension_globals *myext_globals TSRMLS_DC)
{
    if (myext_globals->bar) {
        efree(myext_globals->bar);
    }
}

PHP_MINIT_FUNCTION(myextension)
{
#ifdef ZTS
    ts_allocate_id(&myextension_globals_id, sizeof(zend_myextension_globals),
        php_myextension_globals_ctor, php_myextension_globals_dtor);
#else
    php_myextension_globals_ctor(&myextension_globals TSRMLS_CC);
#endif

    return SUCCESS;
}

PHP_MSHUTDOWN_FUNCTION(myextension)
{
#ifndef ZTS
    php_myextension_globals_dtor(&myextension_globals TSRMLS_CC);
#endif

    return SUCCESS;
}

Here you can see the extension declaring its global requirements to the TSRM layer by stating that it needs sizeof(zend_myextension_globals) bytes of storage, and providing callbacks to use when initializing (or destroying) a given thread's local storage. The value populated into myextension_globals_id represents the offset (common to all threads) into the tsrm_ls vector where the pointer to that thread's local storage can be found. In the event that ZTS is not enabled, the data storage is simply placed into the true global scope and the thread initialization and shutdown routines are called manually during the Module's Startup and Shutdown phases. If you're wondering why TSRMLS_CC was included in the non-ZTS blocks, then I clearly haven't made you fall asleep yet. Those aren't needed there since we know they evaluate to nothing, but it helps encourage good habits to include them anywhere the function's prototype calls for them.
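If the bookkeeping is hard to picture, here's a stripped-down simulation of it in plain C. This is a sketch of the idea only, not the TSRM source: demo_allocate_id(), demo_thread_start(), demo_thread_end(), the DEMO_MAX_IDS cap, and demo_globals are all hypothetical, there are no real threads or locks, and unlike the real layer it assumes every id is registered before the first "thread" starts. The 1-based id is chosen to line up with the "-1" you'll see in accessor macros.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_IDS 8  /* arbitrary cap for the sketch */

typedef void (*demo_ctor)(void *block);

/* What each registered player asked for. */
typedef struct {
    size_t    size;
    demo_ctor ctor;
} demo_resource_type;

static demo_resource_type demo_types[DEMO_MAX_IDS];
static int demo_type_count = 0;

/* Plays the role of ts_allocate_id(): remember the size and constructor,
 * hand back a 1-based id shared by all threads. Returns 0 on failure. */
int demo_allocate_id(int *id, size_t size, demo_ctor ctor)
{
    assert(id != NULL && size > 0);
    if (demo_type_count >= DEMO_MAX_IDS) {
        return 0;
    }
    demo_types[demo_type_count].size = size;
    demo_types[demo_type_count].ctor = ctor;
    *id = ++demo_type_count;
    return *id;
}

/* Thread shutdown: release every block, then the vector itself. */
void demo_thread_end(void **storage)
{
    int i;

    if (storage == NULL) {
        return;
    }
    for (i = 0; i < demo_type_count; i++) {
        free(storage[i]);
    }
    free(storage);
}

/* Thread spin-up: build this thread's vector of pointers, one private
 * block per registered id, and run each constructor on its block. */
void **demo_thread_start(void)
{
    int i;
    void **storage = calloc(DEMO_MAX_IDS, sizeof(void *));

    if (storage == NULL) {
        return NULL;
    }
    for (i = 0; i < demo_type_count; i++) {
        storage[i] = calloc(1, demo_types[i].size);
        if (storage[i] == NULL) {
            demo_thread_end(storage);
            return NULL;
        }
        if (demo_types[i].ctor != NULL) {
            demo_types[i].ctor(storage[i]);
        }
    }
    return storage;
}

/* A made-up extension's globals and their constructor. */
typedef struct {
    int   foo;
    char *bar;
} demo_globals;

void demo_globals_ctor(void *block)
{
    demo_globals *g = block;

    g->foo = 42;  /* non-zero so you can tell the constructor ran */
    g->bar = NULL;
}
```

Register demo_globals once, start two "threads", and storage[id - 1] in each vector points at a different demo_globals. Change foo through one and the other still reads 42, which is the whole point of the exercise.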


Putting it all together

The final piece of this thread-safe puzzle comes from the question: "How do I access data in these structures?" And the answer to that question comes in the form of another familiar looking macro. Each extension or core component defines, in one of its header files, a macro which looks something like the following:


#ifdef ZTS
# define MYEXTENSION_G(v) \
(((zend_myextension_globals*)(*((void ***)tsrm_ls))[(myextension_globals_id)-1])->v)
#else
# define MYEXTENSION_G(v) (myextension_globals.v)
#endif

Thus, when ZTS is not enabled, this macro simply plucks the right value out of the immediate structure in the global scope; otherwise it uses the ID to locate the thread's local storage copy of the structure and dereference the value from there.
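That threaded expansion is a mouthful, so here it is exercised on its own in a few lines of standalone C. Again, nothing here comes from the PHP headers: demo_globals, demo_globals_id, DEMO_G(), and demo_bump_foo() are stand-ins I've invented, and the id is simply hard-coded to 1 as if a registry had already handed it out. The macro body has the same shape as the ZTS branch above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int   foo;
    char *bar;
} demo_globals;

/* Pretend the registry handed this extension the first (1-based) id. */
int demo_globals_id = 1;

/* Same shape as the ZTS branch: follow tsrm_ls to the thread's vector,
 * index it with id - 1, cast the slot to our struct, pick the member. */
#define DEMO_G(v) \
    (((demo_globals *)(*((void ***)tsrm_ls))[(demo_globals_id) - 1])->v)

/* Any function using DEMO_G() needs a variable named tsrm_ls in scope,
 * which is why it gets threaded through so many parameter lists. */
int demo_bump_foo(void ***tsrm_ls)
{
    assert(tsrm_ls != NULL && *tsrm_ls != NULL);
    DEMO_G(foo) += 1;
    return DEMO_G(foo);
}
```

Point tsrm_ls at one thread's vector and demo_bump_foo() touches that thread's foo; point it at another vector and the identical code touches a different foo. The macro never changes, only the context it's handed.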


Wanna know more, like how to deal with foreign callbacks where tsrm_ls isn't available? Buy my book!