May 8, 2006

re2c is no lemon

For the longest time I've been wanting to learn how to actually write a compiler. Sure, I've been twiddling the Zend Engine here and there for a few years, but there's this gulf between tweaking and actually building up from scratch which, until today, I had just never found my way across. Enter PDO_User, my most recent extension for implementing PDO drivers in userspace. Although some uses of PDO_User might be as simple as wrapping real database drivers that don't have a PDO version (e.g. Informix), there's also a class of userspace implementation that might want to perform decidedly non-SQL actions via a SQL front end.


Since writing a compiler in userspace is... difficult to say the least, I decided this was the perfect opportunity to cut my teeth on the task and provide a generic SQL compiler via the PDO_User class. For the lexer I chose re2c, as Marcus has been pushing that and what I'd seen of the syntax made it look pretty straightforward. For the parser I went with lemon, which Wez is generally ga-ga over. I considered flex and yacc, but tossed them out since I've already got a basic understanding of their syntaxes. I wanted to start fresh.

After a couple-three days and some aborted attempts, I've managed to implement a relatively feature-rich SQL tokenizer and compiler accessible to userspace via array PDO_User::tokenizeSQL($sql[, $ignore_whitespace=true]) and array PDO_User::parseSQL($sql). Those of you intimately familiar with SQL specs will certainly find faults in my grammar, and just about everyone should notice that there's no support for floating point numbers yet (a minor oversight). What's important is: (A) I've got a working implementation to build from, and (B) it was damned easy to learn both these tools.
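For a rough feel of what the tokenizing step involves, here is a toy userspace sketch. It is not the re2c-based implementation and it does not reproduce the real return format of PDO_User::tokenizeSQL(), which isn't shown in this post. The function name sketch_tokenize_sql(), the token array shape, and the short keyword list are all assumptions made up for illustration. Like the real thing at this point, it has no floating point support.

```php
<?php
// Hypothetical helper, NOT part of PDO_User: a toy regex-based SQL tokenizer.
// The ['type' => ..., 'text' => ...] token shape is an assumption.
function sketch_tokenize_sql(string $sql, bool $ignore_whitespace = true): array
{
    // Illustrative keyword list only; a real grammar would have many more.
    $keywords = ['SELECT', 'FROM', 'AS', 'LEFT', 'JOIN', 'ON'];

    // Ordered patterns; \G anchors each match at the current offset.
    $patterns = [
        'whitespace' => '/\G\s+/',
        'integer'    => '/\G\d+/',
        'identifier' => '/\G[A-Za-z_][A-Za-z0-9_]*/',
        'string'     => "/\\G'(?:[^']|'')*'/",
        'symbol'     => '/\G(?:<=|>=|<>|[(),=<>*.;])/',
    ];

    $tokens = [];
    $offset = 0;
    $length = strlen($sql);
    while ($offset < $length) {
        $matched = false;
        foreach ($patterns as $type => $regex) {
            if (preg_match($regex, $sql, $m, 0, $offset) === 1) {
                // Keywords are matched case-insensitively.
                if ($type === 'identifier' && in_array(strtoupper($m[0]), $keywords, true)) {
                    $type = 'keyword';
                }
                if ($type !== 'whitespace' || !$ignore_whitespace) {
                    $tokens[] = ['type' => $type, 'text' => $m[0]];
                }
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new InvalidArgumentException("Unexpected character at offset $offset");
        }
    }
    return $tokens;
}
```

Calling sketch_tokenize_sql("select foo from boop") yields four tokens with whitespace ignored, and seven when the second argument is false.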


Some thoughts on re2c and lemon:

  • Not enough samples provided on the manual pages. As an OSS documentation maintainer myself, I know that this is easier to complain about than to actually fix.
  • I got stuck early on with re2c due to having an old version (the one packaged with Debian) which didn't support case-insensitive matching. If you run into this problem, check your re2c version. If `re2c -v` isn't even able to output a version, then it's certainly too old.
  • What the @!#& is the "error" non-terminal for in lemon? The documentation talks about using it to gracefully deal with parse errors, but doesn't talk about how to use it.

If all you want is some sample output, have a look at this:

<?php
print_r(PDO_User::parseSQL("SELECT foo,bar as b,baz
FROM boop LEFT JOIN doop as d
ON (bid=did)"));



Array
(
    [type] => statement
    [statement] => select
    [fields] => Array
        (
            [0] => foo
            [1] => Array
                (
                    [type] => alias
                    [field] => bar
                    [as] => b
                )
            [2] => baz
        )
    [from] => Array
        (
            [type] => join
            [table1] => boop
            [join-type] => left
            [table2] => Array
                (
                    [type] => alias
                    [table] => doop
                    [as] => d
                )
            [on] => Array
                (
                    [type] => condition
                    [op1] => bid
                    [condition] => =
                    [op2] => did
                )
        )

    [modifiers] =>
    [terminating-semicolon] =>
)
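One way to see what a tree like this is good for is to walk it in userspace. The sketch below copies the sample output into a literal array and renders it back to SQL. The helper name sketch_render_node() is hypothetical, it only understands the four node types that appear in this one sample, and the two empty entries are assumed to be null (print_r shows them as blank, so they could just as well be false or an empty string).

```php
<?php
// The parse tree from the sample output, copied by hand.
// Assumption: the blank 'modifiers' and 'terminating-semicolon' values are null.
$tree = [
    'type' => 'statement',
    'statement' => 'select',
    'fields' => ['foo', ['type' => 'alias', 'field' => 'bar', 'as' => 'b'], 'baz'],
    'from' => [
        'type' => 'join',
        'table1' => 'boop',
        'join-type' => 'left',
        'table2' => ['type' => 'alias', 'table' => 'doop', 'as' => 'd'],
        'on' => ['type' => 'condition', 'op1' => 'bid', 'condition' => '=', 'op2' => 'did'],
    ],
    'modifiers' => null,
    'terminating-semicolon' => null,
];

// Hypothetical helper: recursively render one node back to SQL text.
function sketch_render_node($node): string
{
    if (!is_array($node)) {
        return (string) $node; // bare field or table name
    }
    switch ($node['type']) {
        case 'alias':
            // Aliased fields use 'field', aliased tables use 'table'.
            $name = $node['field'] ?? $node['table'];
            return sketch_render_node($name) . ' AS ' . $node['as'];
        case 'join':
            return sketch_render_node($node['table1'])
                . ' ' . strtoupper($node['join-type']) . ' JOIN '
                . sketch_render_node($node['table2'])
                . ' ON (' . sketch_render_node($node['on']) . ')';
        case 'condition':
            return sketch_render_node($node['op1'])
                . ' ' . $node['condition'] . ' '
                . sketch_render_node($node['op2']);
        case 'statement':
            // Only handles the SELECT shape seen in the sample.
            $fields = array_map('sketch_render_node', $node['fields']);
            return strtoupper($node['statement']) . ' ' . implode(', ', $fields)
                . ' FROM ' . sketch_render_node($node['from']);
    }
    throw new InvalidArgumentException('Unknown node type: ' . $node['type']);
}

echo sketch_render_node($tree), "\n";
// prints: SELECT foo, bar AS b, baz FROM boop LEFT JOIN doop AS d ON (bid = did)
```

A userspace driver doing "decidedly non-SQL" work would presumably dispatch on the same type keys rather than printing SQL, but the recursive walk would look much the same.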
