May 24, 2006

Compiled Variables

Last month at php|tek I gave a presentation on "How PHP Ticks" where I covered, among other things, the process of compiling source code into opcodes (an intermediate pseudo-language similar to what Java calls "bytecode" or what .NET calls "MSIL"). As part of this section of the presentation, I showed one of the more interesting changes between ZE 2.0 (PHP 5.0) and ZE 2.1 (PHP 5.1), namely how variables are retrieved and used in an operation. More specifically, how they provide a small, yet cumulative, speedup to applications in a way that's transparent to the end-user -- one more reason to like PHP 5.1, right?


After listening to Marcus Whitney's interview with Brion Vibber of WikiMedia in which he mentions my presentation and makes reference to this engine change, I realized that I should clarify what this feature is (and more importantly, what it isn't) before any FUD spreads.

What Compiled Variables (CVs) are

First, let's look at the anatomy of an OpArray. Say you have the following simple script:


<?php
$a = 123;
$b = 456;
$c = $a + $b;
echo $c;

Now let's see how ZE 2.0 (PHP 5.0) compiles this (ZE 1.x/PHP 4.x comes out to nearly identical opcodes). The $0 and ~0 references you see are (for lack of a better one-sentence explanation) types of temporary variables (the latter more so than the former, but don't worry about the distinction right now). What's important to know about this block and its statements is in the accompanying comments:

FETCH_W    $0, 'a'       /* Retrieve the $a variable for writing */
ASSIGN     $1, $0, 123   /* Assign the numeric value 123 to retrieved variable 0 */
FETCH_W    $2, 'b'       /* Retrieve the $b variable for writing */
ASSIGN     $3, $2, 456   /* Assign the numeric value 456 to retrieved variable 2 */
FETCH_R    $5, 'a'       /* Retrieve the $a variable for reading */
FETCH_R    $6, 'b'       /* Retrieve the $b variable for reading */
ADD        ~7, $5, $6    /* Add the retrieved variables (5 & 6) together and store the result in 7 */
FETCH_W    $4, 'c'       /* Retrieve the $c variable for writing */
ASSIGN     $8, $4, ~7    /* Assign the value in temporary variable 7 into retrieved variable 4 */
FETCH_R    $9, 'c'       /* Retrieve the $c variable for reading */
ECHO       $9            /* Echo the retrieved variable 9 */

Seem like a lot of work for one plus one? It is. Here's the same code snippet compiled by ZE 2.1/PHP 5.1 (or later):

ASSIGN     $0, !0, 123   /* Assign the numeric value 123 to compiled variable 0 */
ASSIGN     $1, !1, 456   /* Assign the numeric value 456 to compiled variable 1 */
ADD        ~2, !0, !1    /* Add compiled variable 0 to compiled variable 1 */
ASSIGN     $3, !2, ~2    /* Assign the value of temporary variable 2 to compiled variable 2 */
ECHO       !2            /* Echo the value of compiled variable 2 */

These !0 variables refer to a new structure in the execution stack which stores references to the "real" variables out in userspace. The hash value for each variable is computed at compile time (meaning that it's only done once per variable no matter how often it's used, and that opcode caches save this work from being done at all during subsequent page views). The first time one of these CVs is used, the engine looks it up in the active symbol table and updates the CV cache to know where it is. All subsequent uses of that compiled variable use that pre-fetched address and don't have to look it up again. On an individual lookup, this isn't a major leap forward in speed. However, consider a for loop where the test value is checked on every iteration. To put it in PHP terms, which would you rather do?

for($i = 0; $i < $n; $i++) {
    $foo = lookup_variable('foo');
    $foo->increment();
    $foo = lookup_variable('foo');
    $foo->check_value();
}

or

$foo = lookup_variable('foo');
for($i = 0; $i < $n; $i++) {
    $foo->increment();
    $foo->check_value();
}
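To make the difference concrete, here's a rough userland model of the two strategies. It's only a sketch: CvModel, lookupByName() and fetchCv() are names I'm making up for illustration (nothing like them exists in the engine), and the loop count of 1000 is arbitrary. The point is simply to count how many by-name lookups each approach performs.

```php
<?php
// Hypothetical model only: CvModel and its methods are invented for
// illustration and are not part of the Zend Engine.
class CvModel
{
    public $symbols = ['foo' => 0]; // stands in for the active symbol table
    public $lookups = 0;            // counts by-name (hash) lookups
    private $slots = [];            // slot number => reference to the variable

    // ZE 2.0 style: every access is a by-name lookup.
    public function &lookupByName($name)
    {
        $this->lookups++;
        return $this->symbols[$name];
    }

    // ZE 2.1 style: the first access fills the slot, later ones reuse it.
    public function &fetchCv($slot, $name)
    {
        if (!array_key_exists($slot, $this->slots)) {
            $this->slots[$slot] = &$this->lookupByName($name);
        }
        return $this->slots[$slot];
    }
}

// Look the variable up by name before every use.
$uncached = new CvModel();
for ($i = 0; $i < 1000; $i++) {
    $foo = &$uncached->lookupByName('foo');
    $foo++;                                  // the "increment" step
    $foo = &$uncached->lookupByName('foo');
    $seen = $foo;                            // the "check value" step
}

// Same work, but through a cached slot.
$cached = new CvModel();
for ($i = 0; $i < 1000; $i++) {
    $bar = &$cached->fetchCv(0, 'foo');
    $bar++;
    $bar = &$cached->fetchCv(0, 'foo');
    $seen = $bar;
}

printf("uncached: %d lookups, cached: %d lookup(s)\n",
    $uncached->lookups, $cached->lookups);
```

Both loops leave the stored value at 1000; only the number of by-name lookups differs (two per iteration versus one in total).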


What compiled variables are not

Don't assume you're going to get a speedup on all of your code, especially if you use arrays or objects (which most code taking advantage of PHP 5's new features does). The CV speedup has one minor Achilles' heel: it only works on simple variables. Putting it in terms of opcodes, let's consider this PHP source:

<?php
$f->a = 123;
$f->b = 456;
$f->c = $f->a + $f->b;
echo $f->c;
?>

Basically the same code, right? Just a little oopified... Let's look at the ZE 2.1/PHP 5.1 compilation of that:

ASSIGN_OBJ    $0, !0, 'a'   /* Assign the numeric value 123 to property 'a' of compiled variable 0 object */
OP_DATA       123           /* Additional data for ASSIGN_OBJ opcode */
ASSIGN_OBJ    $1, !0, 'b'   /* Assign the numeric value 456 to property 'b' of compiled variable 0 object */
OP_DATA       456           /* Additional data for ASSIGN_OBJ opcode */
FETCH_OBJ_R   $3, !0, 'a'   /* Retrieve property 'a' from compiled variable 0 object */
FETCH_OBJ_R   $4, !0, 'b'   /* Retrieve property 'b' from compiled variable 0 object */
ADD           ~5, $3, $4    /* Add those values and store the result in temp var 5 */
ASSIGN_OBJ    $2, !0, 'c'   /* Assign the ADD result to property 'c' of compiled variable 0 object */
OP_DATA       ~5            /* Additional data for ASSIGN_OBJ opcode */
FETCH_OBJ_R   $6, !0, 'c'   /* Retrieve property 'c' from compiled variable 0 object */
ECHO          $6            /* Echo the value */

What's important to see here is that the properties are refetched each time a read or write is performed on them, which at first glance looks as bad as the pre-PHP 5.1 way of dealing with variables. Don't let your enthusiasm for compiled variables blind you though. Remember the magic __get() and __set() methods, and the offsetGet() and offsetSet() methods of ArrayAccess, which objects allow for. These overloading tricks are great, but they mean that the variable returned by one fetch may not be the variable returned by a subsequent fetch. It's unfortunate that this can't be guaranteed, but it's the reality of a dynamic language like PHP. Know your particular class isn't overloaded? You can get that speedup (at least some of it) back by using good ol' references to turn your object variables into simple variables:

<?php
$a = &$f->a;
$b = &$f->b;
$c = &$f->c;
$a = 123;
$b = 456;
$c = $a + $b;
echo $c;
?>

Becomes:

FETCH_OBJ_W   $0, !1, 'a'   /* Retrieve property 'a' from compiled variable 1 object */
ASSIGN_REF    $1, !0, $0    /* Make compiled variable 0 a reference to the retrieved variable */
FETCH_OBJ_W   $2, !1, 'b'   /* Retrieve property 'b' from compiled variable 1 object */
ASSIGN_REF    $3, !2, $2    /* Make compiled variable 2 a reference to the retrieved variable */
FETCH_OBJ_W   $4, !1, 'c'   /* Retrieve property 'c' from compiled variable 1 object */
ASSIGN_REF    $5, !3, $4    /* Make compiled variable 3 a reference to the retrieved variable */
ASSIGN        $6, !0, 123   /* Assign the numeric value 123 to compiled variable 0 */
ASSIGN        $7, !2, 456   /* Assign the numeric value 456 to compiled variable 2 */
ADD           ~8, !0, !2    /* Add compiled variable 0 to compiled variable 2 */
ASSIGN        $9, !3, ~8    /* Assign the value of temporary variable 8 to compiled variable 3 */
ECHO          !3            /* Echo the value of compiled variable 3 */

Now, this particular example is actually a few more opcodes, and a little more work for the engine too. But the more the code that follows uses the simple-variable reference copies rather than the object properties, the more quickly the balance tips in your favor, because the variables are already fetched and don't need to go through the expensive re-fetch process (which is worse for objects than it ever was for regular variables).
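If the overloading caveat feels abstract, here's a small sketch of both sides of it. The class names (Overloaded and PlainPoint) are hypothetical and exist only for this illustration. The first class shows why a property fetch can't safely be cached; the second shows the reference trick on a class that is known not to be overloaded.

```php
<?php
// Hypothetical classes for illustration only.

class Overloaded
{
    private $reads = 0;

    // Every read of an undefined property yields a different value,
    // so the result of one fetch says nothing about the next one.
    public function __get($name)
    {
        return ++$this->reads;
    }
}

class PlainPoint
{
    public $a = 0;
    public $b = 0;
}

$o = new Overloaded();
$first  = $o->a;   // goes through __get()
$second = $o->a;   // goes through __get() again, with a new result

// With a plain class, references turn properties into simple variables
// that stay in sync with the object.
$p = new PlainPoint();
$a = &$p->a;
$b = &$p->b;
$a = 123;
$b = 456;
$sum = $a + $b;

printf("overloaded reads: %d then %d; plain object: %d + %d = %d\n",
    $first, $second, $p->a, $p->b, $sum);
```

The writes through $a and $b land on the object's properties, which is what makes the trick safe for a class like PlainPoint and unsafe to assume in general.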


Another caveat to CVs is that they are entirely scope-local. This should make sense, as $a in the global scope is not the same as $a in a given function. What this means for your script is that when execution enters a new function (execution scope), the CVs for that function are a blank slate and everything has to be fetched anew, even if that function was called before.
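At the language level, the scoping rule looks like the sketch below. It only demonstrates that each call starts with its own empty set of locals; it doesn't (and can't, from userland) show the engine's CV slots themselves. The function name inner_scope() is made up for the example.

```php
<?php
$a = 'outer';

function inner_scope()
{
    $before = isset($a);   // false: this call's $a does not exist yet
    $a = 'inner';          // creates a local $a, unrelated to the outer one
    return [$before, $a];
}

$firstCall  = inner_scope();
$secondCall = inner_scope();   // starts from a blank slate again

var_dump($firstCall[0], $secondCall[0], $a);
```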

Globals and Statics

Statics and Globals are treated to CV status, but only by way of the reference trick I just mentioned:


<?php
static $bar;
echo $bar;

Turns into:

FETCH_W      static   $0, 'bar'
ASSIGN_REF            !0, $0
ECHO                  !0

What does that mean for your use of the $GLOBALS array? That's right, the global keyword is technically faster: global $foo; binds the global to a CV once, while $GLOBALS['foo'] is an array fetch every time it's used. Now, I want to be really clear about one thing here. The minor speed affordance given by using your globals as localized CVs needs to be seriously weighed against the maintainability of looking at your code in five years and knowing that $foo came from the global scope. something_using($GLOBALS['foo']); will ALWAYS be clearer to you down the line than global $foo; /* buncha code */ something_using($foo); Don't be penny-wise and pound-foolish.
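For reference, here are the two spellings side by side. This is just a sketch with made-up function names (via_keyword() and via_array()); it shows that both reach the same variable, not how fast either one is.

```php
<?php
// Define the global explicitly so the sketch behaves the same however it is run.
$GLOBALS['foo'] = 'from global scope';

function via_keyword()
{
    global $foo;   // local $foo becomes a reference to the global $foo
    /* buncha code */
    return strtoupper($foo);
}

function via_array()
{
    // The origin of the value is visible right at the point of use.
    return strtoupper($GLOBALS['foo']);
}

$viaKeyword = via_keyword();
$viaArray   = via_array();

echo ($viaKeyword === $viaArray ? "same value\n" : "different values\n");
```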


No, seriously. This post shouldn't be taken as a guidebook to speeding up your apps; they're slow for other reasons.
