Jan 8, 2015

HHVM Extension Writing, Part IV

Hopefully you've had some time to digest the first three parts of my HHVM Extension Writing series, and you're ready to embark into the world of Resources. They're not quite as exciting as Objects, and they're certainly less common in new extensions since Objects are so much more versatile, but they were an integral part of PHP's history prior to PHP5, and they're still in heavy use by features such as streams, curl, and most database connectors.

We'll be continuing with the same examples repo at https://github.com/sgolemon/hhvm-extension-writing where I've already landed a skeleton for example3 with commit 144e9698b5.

Resource

Defining a new resource type shares some similarity with Objects, except that you won't be defining anything in systemlib initially, because a resource doesn't have an API in and of itself. It's just an opaque pointer used by seemingly unrelated functions and methods. We'll define some global functions in the second half of this post when we start accessing the resource. Any wonder Objects are winning out?

As with the Object example, we'll be implementing a bit of filesystem access to demonstrate how one might use resources.
class Example3File : public SweepableResourceData {
 public:
  DECLARE_RESOURCE_ALLOCATION_NO_SWEEP(Example3File)
  CLASSNAME_IS("example3-file")
  const String& o_getClassNameHook() const override { return classnameof(); }

  Example3File(const String& filename, const String& mode) {
    m_file = fopen(filename.c_str(), mode.c_str());
    if (!m_file) {
      throw Object(SystemLib::AllocExceptionObject(
        "Unable to open file"
      ));
    }
  }

  ~Example3File() { sweep(); }
  void sweep() { close(); }

  void close() {
    if (m_file) {
      fclose(m_file);
      m_file = nullptr;
    }
  }

  bool isInvalid() const override {
    return !m_file;
  }

  FILE* m_file{nullptr};
};

The first three lines of our class are basically boilerplate. DECLARE_RESOURCE_ALLOCATION_NO_SWEEP handles some specifics of the MemoryManager: unlike objects, which are allocated from userspace, resources are allocated from C++, and a bare C++ new won't do a "smart" allocation unless this macro tells it to. The CLASSNAME_IS macro, and the o_getClassNameHook() virtual below it, define the "resource name" as seen from PHP when you call var_dump() or get_resource_type().

As with the Object version of this example, the class destructor is called automatically when the resource variable falls out of scope, while sweep() is invoked if the variable is still "live" at the end of a request. Since we want the same behavior in both cases, we simply chain one through the other and on to their actual purpose: closing the file.

isInvalid() exists for Resource types because it's very common to invoke functions like fclose() whose purpose is to make the resource no longer usable as its normal type. HHVM's mechanism for dealing with this is very different from PHP's, but the end result is the same: so long as your class has some way of knowing that it's "dead", HHVM can detect it and report accordingly in var_dump() and other calls.
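To see the same lifecycle outside HHVM, here is a minimal standalone sketch in plain C++17. It drops SweepableResourceData and the HHVM macros entirely (so it is not an HHVM resource), throws std::runtime_error in place of a PHP exception, and the class name FileHandleSketch is hypothetical. It only shows how the destructor, sweep(), close() and isInvalid() chain together.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical standalone analogue of Example3File (no HHVM base class).
class FileHandleSketch {
 public:
  FileHandleSketch(const std::string& filename, const std::string& mode)
      : m_file(std::fopen(filename.c_str(), mode.c_str())) {
    if (!m_file) {
      throw std::runtime_error("Unable to open file");
    }
  }
  FileHandleSketch(const FileHandleSketch&) = delete;
  FileHandleSketch& operator=(const FileHandleSketch&) = delete;

  // Destructor and sweep() both funnel into close().
  ~FileHandleSketch() { sweep(); }
  void sweep() { close(); }

  // Safe to call more than once: after the first call it is a no-op.
  void close() {
    if (m_file) {
      std::fclose(m_file);
      m_file = nullptr;
    }
  }

  // The handle counts as "dead" once its FILE* has been released.
  bool isInvalid() const { return !m_file; }

 private:
  std::FILE* m_file{nullptr};
};
```

Because close() resets m_file to nullptr, a later destructor or sweep() call is harmless, which is the property that lets an fclose()-style function leave the object in a detectable "dead" state.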

So now that we've defined a Resource type, let's make a function to create instances and do things with them:
Resource HHVM_FUNCTION(example3_fopen, const String& filename, const String& mode) {
#ifdef NEWOBJ
  return Resource(NEWOBJ(Example3File)(filename, mode)); 
#else
  return Resource(newres<Example3File>(filename, mode));
#endif
}

void HHVM_FUNCTION(example3_fclose, const Resource& fp) {
  // By default, if fp is not of type "Example3File" and valid,
  // HHVM will throw an exception here
  auto f = fp.getTyped<Example3File>();
  f->close();
}

Variant HHVM_FUNCTION(example3_ftell, const Resource& fp) {
  // By passing "true" for "badTypeOkay", invalid resources
  // result in returning nullptr, rather than throwing an exception
  // So check the return value!
  auto f = fp.getTyped<Example3File>(true /* nullOkay */, true /* badTypeOkay */);
  if (!f) {
    raise_warning("Instance of example3-file resource expected");
    return init_null();
  }
  return (int64_t)ftell(f->m_file);
}

As you can see, allocating a resource is literally as simple as newing up a C++ class with the newres<T>() template (or if you're using an older version of HHVM, the NEWOBJ() macro) and passing it as an argument to Resource's constructor.

For getting at that C++ instance, I've presented two different, equally valid methods. The first is certainly more concise, but you may not want to throw exceptions (you've probably eschewed making an Object for a reason, after all). On the other hand, while the second form allows you to handle your error cases more explicitly, this particularly common case forced us to change our return type from int to the far less precise mixed. Take this into account when designing your APIs. Many extensions follow the latter pattern, using a simple macro to avoid excessive copypasta from one function to the next. I've added an example of that to the repo.

What's with the nullOkay arg? TL;DR version: It's a bit of legacy logic and doesn't really have a place in modern HHVM extensions. You can usually set it to the same value as you pass for badTypeOkay. Note that both of these args are false by default.
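The contract behind the two lookup styles can be modeled in plain C++17. In this hypothetical sketch, ToyResource, ToyFile, ToySocket and getTypedSketch() stand in for HHVM's resource base class, Example3File and Resource::getTyped<T>(). It uses dynamic_cast and models only badTypeOkay (not nullOkay), so read it as an illustration of "throw vs. hand back null", not as HHVM's implementation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for a resource base class and two concrete types.
struct ToyResource {
  virtual ~ToyResource() = default;
};
struct ToyFile : ToyResource {
  long position = 0;
};
struct ToySocket : ToyResource {};

// Models getTyped<T>(): throw on a type mismatch unless badTypeOkay is set,
// in which case return nullptr and let the caller decide what to do.
template <typename T>
T* getTypedSketch(ToyResource* res, bool badTypeOkay = false) {
  T* typed = dynamic_cast<T*>(res);
  if (!typed && !badTypeOkay) {
    throw std::invalid_argument("resource is not of the expected type");
  }
  return typed;
}

// Caller in the style of example3_ftell: a mismatch becomes -1 here
// (the HHVM version raises a warning and returns null instead).
long tellSketch(ToyResource* res) {
  ToyFile* f = getTypedSketch<ToyFile>(res, /* badTypeOkay */ true);
  if (!f) {
    return -1;
  }
  return f->position;
}
```

The design trade-off is the same one described above: the throwing form keeps the caller short, while the nullable form forces every caller to check the pointer but lets it pick its own error behavior.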

Update: In the initial version of this post, I used NEWOBJ() exclusively to allocate a new resource instance, but as @maide pointed out in IRC, that's been removed from the newest versions of HHVM and replaced with newres<T>(). We use an #ifdef to figure out which version we're dealing with because the HHVM team is lousy at updating the API version constant. ;p

The code so far is at commit 5f852d45cc.

What's next...

We've covered all the userspace data types, and the declaration of functions, classes, and resources. Next up, in Part V, we'll back up for a few moments and look at the build system so that we can start linking in external libraries. If you're already familiar with CMake, you can probably skip that chapter. Part VI is TBD at the moment, but I'll probably pick up a lot of the parts I glossed over in Parts I-IV; we'll see what comes out...

Jan 7, 2015

HHVM Extension Writing, Part III

In Part I of this series, we looked at the basic building blocks of a simple HHVM extension skeleton. Part II continued with three of the five "smart" datatypes used throughout the API. But now we get to have some fun. It's time to start looking into declaring classes with methods and properties and constants (oh my!).

I'll be continuing with the same git repository at https://github.com/sgolemon/hhvm-extension-writing where we left off at the end of Part II with commit 8e82e3e416.

Declaring methods

As with functions, the bulk of your class definition is going to appear in your systemlib file, maybe something like the following in ext_example1.php.
class Example1_Greeter {
  public function greet() {
    echo "Hello {$this->name}\n";
  }

  public function __construct(protected string $name = 'Stranger') {}
}

As you should expect by now, compiling your extension with this bit of code should mean that the Example1_Greeter class is now available to all requests and may be invoked like any other class definition. Let's apply what we already know from making native functions, and see how it works with methods...

  <<__Native>>
  public function getName(): string;

  <<__Native>>
  static public function DefaultGreeting(): string;

While you're probably tempted to rush off and add an HHVM_FE() and HHVM_FUNCTION() implementation to the C++ file, you'd only be half right. These aren't functions, they're methods, and as such have a different set of macros.
const StaticString
  s_Example1_Greeter("Example1_Greeter");

String HHVM_METHOD(Example1_Greeter, getName) {
  // o_get() returns a Variant, so convert it for the String return type
  return this_->o_get(s_name, false, s_Example1_Greeter).toString();
}

String HHVM_STATIC_METHOD(Example1_Greeter, DefaultGreeting) {
  return "Hello";
}

Meanwhile, in moduleInit(), we'll add:
  HHVM_ME(Example1_Greeter, getName);
  HHVM_STATIC_ME(Example1_Greeter, DefaultGreeting);

The code so far is at commit 335dca9573.

Similarly, properties and constants may be declared directly on the Hack definition of the class in your systemlib file. I won't bother showing it here, since you all know how to write PHP code, but I'll put an example or two in the git repo.

What's marginally more interesting, and something you can probably guess at from the coverage in Part I, is that you can declare class constants from C++, meaning that they can take on values defined in external headers or computed values. Let's add one from moduleInit().

  Native::registerClassConstant<KindOfStaticString>(s_Example1_Greeter.get(), s_DEFAULT_GREETING.get(), s_Hello.get());

In the examples repo, I've changed ->getName() to now use this constant, so that you can see it propagate up.

The code so far is at commit 007c314b2d.

Binding internal data

What we have so far is all well and good for simple classes, but most extension classes will need to store some opaque pointer from an external library somewhere on the class that can be easily referenced later on. For that, things start to get a little bit more complicated. To help clarify what we're doing, I'm going to wipe the slate clean by moving example1 to its own subdirectory, and starting fresh with example2.

Yeah, kinda messy changing my whole directory structure around midway, but would you rather I ran `git push --force`? Yeah, I thought not.

The code skeleton we're starting out with is at commit: 20dc4ef824 in example2/.

To illustrate something slightly less contrived than earlier examples, I'll be wrapping the POSIX FILE* object in a simple PHP class. Let's start with something basic in our systemlib, containing just a constructor:
<<__NativeData("Example2_File")>>
class Example2_File {
  <<__Native>>
  public function __construct(string $filename, string $mode): void;
}

A new user attribute has appeared! <<__NativeData("Example2_File")>> tells the runtime that this is no ordinary object. This object should be over-allocated with enough space to hold some internal C++ object, identified by the quoted name. In practice this is usually the name of the class it goes with, but it doesn't have to be. How does this hook up to internals? That comes next, within ext_example2.cpp, by adding #include "hphp/runtime/vm/native-data.h" and the following code:
const StaticString
  s_Example2_File("Example2_File");

class Example2_File {
 public:
  Example2_File() { /* new Example2_File */ }
  Example2_File(const Example2_File&) = delete;
  Example2_File& operator=(const Example2_File& src) {
    /* clone $instanceOfExample2_File */
    throw Object(SystemLib::AllocExceptionObject(
      "Cloning Example2_File is not allowed"
    ));
  }

  ~Example2_File() { sweep(); }
  void sweep() {
    if (m_file) {
      fclose(m_file);
      m_file = nullptr;
    }
  }

  FILE* m_file{nullptr};
};

void HHVM_METHOD(Example2_File, __construct, const String& filename, const String& mode) {
  auto data = Native::data<Example2_File>(this_);
  if (data->m_file) {
    throw Object(SystemLib::AllocExceptionObject(
      "File is already open!"
    ));
  }
  data->m_file = fopen(filename.c_str(), mode.c_str());
  if (!data->m_file) {
    String message("Unable to open ");
    message += filename + ": errno=" + String(errno);
    throw Object(SystemLib::AllocExceptionObject(message));
  }
}

And some glue code in moduleInit() to tie both the constructor and the data class into the class definition:
  HHVM_ME(Example2_File, __construct);

  Native::registerNativeDataInfo<Example2_File>(s_Example2_File.get());

The code so far is at commit 97b3cfd49e.

We're only opening (and ultimately closing) our file at this point, but these are really important stages in an object's lifecycle, so this is worth going through slowly. When a PHP script calls $o = new Example2_File(__FILE__, "r");, the first thing the engine does is allocate space for the object. This is done by adding sizeof(ObjectData) (the standard, base object size) to the size given by any NativeDataInfo associated with it. We made that association by calling Native::registerNativeDataInfo<T>(StringData* id);, where T is the C++ class type to allocate with the ObjectData, and id is the symbolic name we gave it in the systemlib file using <<__NativeData("id")>>.

Next, the engine invokes the constructor, providing a pointer to the object via a hidden ObjectData* this_ parameter in the C++ method's signature. From here, we can get access to our private data structure by using the Native::data<T>() accessor to jump to the correct offset from this_. At this point, we have access to a normal C++ object which just so happens to be bound to a PHP object, and we have the constructor parameters as well!
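The "header plus payload in one allocation, reached by a fixed offset" idea can be sketched in plain C++17. Everything here is hypothetical: ToyObjectHeader stands in for ObjectData, DemoFileData for Example2_File, and ToyNativeObject<T>::data() for Native::data<T>(). This toy places the payload after the header; HHVM's real memory layout is an internal detail and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Toy stand-in for ObjectData: the fixed, per-object header.
struct ToyObjectHeader {
  int refCount = 1;
};

// Hypothetical payload type, playing the role of Example2_File.
struct DemoFileData {
  long position = 0;
};

// One allocation of (header + padding + T). Assumes
// alignof(T) <= alignof(std::max_align_t), since malloc() is used.
template <typename T>
struct ToyNativeObject {
  static constexpr std::size_t offset() {
    // Round the header size up so T lands on a correctly aligned address.
    return (sizeof(ToyObjectHeader) + alignof(T) - 1) / alignof(T) * alignof(T);
  }

  static void* payload(ToyObjectHeader* obj) {
    return reinterpret_cast<char*>(obj) + offset();
  }

  static ToyObjectHeader* create() {
    void* mem = std::malloc(offset() + sizeof(T));
    if (!mem) throw std::bad_alloc();
    auto* obj = new (mem) ToyObjectHeader();
    new (payload(obj)) T();  // construct T in the over-allocated tail
    return obj;
  }

  // Analogue of Native::data<T>(this_): a fixed offset from the header.
  static T* data(ToyObjectHeader* obj) {
    return std::launder(static_cast<T*>(payload(obj)));
  }

  static void destroy(ToyObjectHeader* obj) {
    data(obj)->~T();  // payload destructor runs first
    obj->~ToyObjectHeader();
    std::free(obj);
  }
};
```

The point of the offset is that no extra pointer needs to be stored: given the object, the payload's address can always be recomputed.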

From here, there are two ways a PHP object might die. In the expected case, it runs out of references and is destructed during the course of the request's runtime. In this case, our auxiliary object has its destructor called as well, so that external pointers can be cleaned up nicely. The other time a PHP object can die is when the request is shutting down. This is somewhat more exceptional, since the memory manager is sweeping ALL request-local data, not necessarily in the most ideal order. It's up to your auxiliary class to deal with non-sweepable resources, but you can trust that the runtime will deal with the ones which are sweepable. This is handled by a secondary pseudo-destructor called sweep(). For simple implementations like ours, we want the regular destructor and sweep() to do the same thing, since the only members of this C++ class are external pointers. If we had sweepable members such as an HPHP::String, however, we'd want to avoid the implicit member destruction which comes with calling ~Example2_File(): it's entirely possible that the internal state of that String is no longer valid because it was swept first. Hence the need for a separate sweep() function.

TL;DR? - Just have your destructor call sweep(), and deal with external pointers in sweep(). That's good 90% of the time.

You might also have noticed that I'm throwing an exception in the assignment operator. This is normally used for handling a clone, where you'd probably duplicate the FILE* handle, but I realized midway that POSIX file streams don't really have that notion, so I took the easy way out and threw a standard exception. In practice, the implementation would probably look something like:
  Example2_File& operator=(const Example2_File& src) {
    /* copy/clone class members, then return self */
    if (m_file) {
      fclose(m_file);
      m_file = nullptr;
    }
    if (src.m_file) {
      m_file = fclone(src.m_file); // hypothetical: stdio has no fclone()
    }
    return *this;
  }
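For the curious, a rough stand-in for the missing fclone() can be assembled from POSIX fileno(), dup() and fdopen(). This is a hypothetical helper (fclone_sketch is my name, not a real API), and it is not a true clone: dup() makes both descriptors share a single file offset, so reads on one stream move the other. The caller also has to pass a mode compatible with how the original was opened.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <unistd.h>  // dup(), close()

// Hypothetical helper: duplicate the underlying descriptor and wrap it in a
// fresh FILE*. The two streams get separate buffers but share one offset.
FILE* fclone_sketch(FILE* src, const char* mode) {
  if (!src) return nullptr;
  if (std::fflush(src) != 0) return nullptr;  // push buffered writes to the fd
  int fd = dup(fileno(src));
  if (fd < 0) return nullptr;
  FILE* copy = fdopen(fd, mode);
  if (!copy) close(fd);  // fdopen failed; don't leak the descriptor
  return copy;
}
```

Given the shared offset, simply refusing to clone, as the example does, is arguably the more honest choice.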

But like I said, there doesn't seem to be an fclone() as such. Instead, let's add a few more methods to flesh out our class:
String HHVM_METHOD(Example2_File, read, int64_t len) {
  auto data = Native::data<Example2_File>(this_);
  String ret(len, ReserveString);
  auto slice = ret.bufferSlice();
  len = fread(slice.ptr, 1, len, data->m_file);
  return ret.setSize(len);
}

int64_t HHVM_METHOD(Example2_File, tell) {
  auto data = Native::data<Example2_File>(this_);
  return ftell(data->m_file);
}

bool HHVM_METHOD(Example2_File, seek, int64_t pos, int64_t whence) {
  if ((whence != SEEK_SET) && (whence != SEEK_CUR) && (whence != SEEK_END)) {
    raise_warning("Invalid seek-whence");
    return false;
  }
  auto data = Native::data<Example2_File>(this_);
  return 0 == fseek(data->m_file, pos, whence);
}
The code so far is at commit df4acf359ca.
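The same read/seek logic can be exercised on a bare FILE* outside HHVM. This sketch is hypothetical (readSketch and seekSketch are my names), returns std::string instead of HPHP::String, and reports an invalid whence by returning false rather than raising a PHP warning.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Read up to len bytes into a buffer sized once up front, then shrink it to
// what fread() actually returned (the ReserveString/setSize pattern).
std::string readSketch(std::FILE* fp, std::int64_t len) {
  if (!fp || len <= 0) return std::string();
  std::string ret(static_cast<std::size_t>(len), '\0');
  std::size_t got = std::fread(&ret[0], 1, ret.size(), fp);
  ret.resize(got);
  return ret;
}

// Reject anything other than the three standard whence values before seeking.
bool seekSketch(std::FILE* fp, std::int64_t pos, int whence) {
  if (whence != SEEK_SET && whence != SEEK_CUR && whence != SEEK_END) {
    return false;
  }
  return std::fseek(fp, static_cast<long>(pos), whence) == 0;
}
```

Note that the article's read() does not guard against a null m_file; the sketch adds that check, which may be worth copying if the constructor can fail halfway.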

Jan 6, 2015

HHVM Extension Writing, Part II

In our last installment I walked through setting up a dev environment and creating a simple HHVM extension which exposed some constants and global scope functions. Today, we'll expand on that by delving deeper into three of the five "Smart" types: String, Array, and Variant. The other two "Smart" types will be covered in Parts III (Objects) and IV (Resources) since they require a bit more explaining.

All code in the following examples can be found at https://github.com/sgolemon/hhvm-extension-writing and we'll be starting from where Part I left off: commit ad9618ac8c.

The String class

HPHP::String resembles C++'s std::string class in many ways, but also builds in several assumptions about how PHP strings should behave, is able to be encapsulated in a Variant (mixed) object, and performs common string related tasks, such as numeric conversion.

This post is going to highlight the most common features of the String class, but you should look through the header file yourself for a more in-depth exploration.

/* Basic inspection */
class String {
 public:
  const char* c_str() const;
  int size() const;
  bool empty() const { return size() == 0; }
  int length() const { return size(); }
  bool isNumeric() const;
  bool isInteger() const;
  bool isZero() const;
  bool toBoolean() const;
  char toByte() const;
  short toInt16() const;
  int toInt32() const;
  int64_t toInt64() const;
  double toDouble() const;
  std::string toCppString() const;

  char charAt(int pos) const;
  char operator[](int pos) const;
};

The meaning and use of these methods should all be straightforward. In practice, c_str(), size(), and empty() are going to cover 90% of your uses for reading values from the String class.

/* Creation */
class String {
 public:
  String(); // empty string
  String(const char* cstr);
  String(const std::string& cppstr);
  String(const String& hphpstr);
  String(int64_t num);
  String(double num);

  static StaticString FromCStr(const char* cstr);

  String(size_t cap, ReserveStringMode mode);
  MutableSlice bufferSlice();
  uint32_t capacity() const;
  const String& setSize(int len);
};

The constructors, as you can see, are generally built around making new runtime string values from an existing string or numeric value, and are again straight-forward to use. String::FromCStr() is a somewhat special case in that it creates a StaticString, rather than a String. While a String is cleaned up at the end of the request it was created in, StaticStrings live forever, and can even be shared between multiple requests. Because overuse of StaticString could easily lead to memory bloat, they're typically only used for defining persistent features (such as constant names/values) as seen in Part I.

The most interesting part of this API is the ReserveStringMode and MutableSlice. Ordinarily, you shouldn't save the pointer you get from String::c_str(), as it can potentially change between calls, and you generally shouldn't modify a String unless you know you own it anyway. If you do need to modify a string, call bufferSlice() on it. The MutableSlice structure you get back will contain a pointer to a (relatively) stable block of memory which can be populated. Here's an example:

String HHVM_FUNCTION(example1_count_preallocate) {
  /* 30 bytes: 3 per number: 'X, ' */
  String ret(30, ReserveString);
  auto slice = ret.bufferSlice();
  for (int i = 0; i < 10; ++i) {
    snprintf(slice.ptr + (i*3), 4, "%d, ", i);
  }
  /* Terminate just after the final digit ('9'), overwriting the ',' with a null byte */
  return ret.setSize((9*3) + 1);
}

This contrived example allocates enough space for 10 single-digit numbers, each followed by a comma and a space. It uses snprintf() to fill that buffer up, then truncates it to 28 characters, since the final ', ' isn't actually needed. You'll find this pattern in use anywhere an API expects you to provide it with a buffer to fill, such as in the intl extension where it calls into ICU.
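The same preallocate-fill-truncate pattern can be tried with plain std::string. This is an analogy, not HPHP::String: here resize() plays the role of both String(cap, ReserveString) and setSize(), and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Standalone analogue of example1_count_preallocate using std::string.
std::string count_preallocate_sketch() {
  // 30 bytes of "X, " plus one for the NUL that snprintf writes last.
  std::string ret(31, '\0');
  for (int i = 0; i < 10; ++i) {
    std::snprintf(&ret[i * 3], 4, "%d, ", i);
  }
  // Keep everything through the final digit, dropping the trailing ", ".
  ret.resize(9 * 3 + 1);
  return ret;
}
```

The extra byte matters: snprintf() with a size of 4 writes three characters plus a terminator, so the last iteration touches index 30.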

Another approach to building up a string from parts would be to use the concatenation operators (+ and +=), which allow you to simply append Strings to one another, as in the following:

String HHVM_FUNCTION(example1_count_concatenate) {
  String ret, delimiter(", ");
  for (int i = 0; i < 10; ++i) {
    if (i > 0) {
      ret += delimiter;
    }
    ret += String(i);
  }
  return ret;
}

There are costs and benefits to both versions. The former is more efficient: it does only one allocation, as opposed to at least 11 for the latter, and far less copying. On the other hand, the second version is far more readable and far less error-prone. For this contrived example, I'd call the second version "better", but there are certainly cases where the first version is superior.
The code so far is at commit: fa82b3cd70
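With std::string there is a middle ground worth knowing about: keep the readable append loop but reserve the final size first. This is a plain C++ sketch with hypothetical function names; String(cap, ReserveString) plays a broadly similar "allocate once up front" role in HHVM.

```cpp
#include <cassert>
#include <string>

// Readable version: grow the string piece by piece (may reallocate).
std::string count_concatenate_sketch() {
  std::string ret;
  const std::string delimiter = ", ";
  for (int i = 0; i < 10; ++i) {
    if (i > 0) {
      ret += delimiter;
    }
    ret += std::to_string(i);
  }
  return ret;
}

// Middle ground: same loop, but reserve the final size first so that ret
// itself should need only one allocation.
std::string count_concatenate_reserved_sketch() {
  std::string ret;
  ret.reserve(28);  // 10 digits + 9 ", " separators
  for (int i = 0; i < 10; ++i) {
    if (i > 0) {
      ret += ", ";
    }
    ret += std::to_string(i);
  }
  return ret;
}
```

This only works when the final size is easy to predict; when it isn't, the plain append loop is usually the better default.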

The Array Class

Arrays are the do-all bucket of "stuff" of the PHP language. They can behave like vectors, maps, sets, or weird hybrid hodgepodge containers without rhyme or reason. You already know how to interact with them from userspace, so let's take a look at how to interact with them from C++. As with Strings, we're only going to go into the most common API calls here, check out the header for the full story.

/* Core API */
class Array {
 public:
  static Array Create(); // array()
  static Array Create(const Variant& value); // array($value)
  static Array Create(const Variant& key, const Variant& value); // array($key => $value)

  /* Read */
  const Variant operator[](int64_t key) const;
  const Variant operator[](const String& key) const;
  const Variant operator[](const Variant& key) const;

  /* count($arr) */
  ssize_t count() const;

  /* array_key_exists($key, $arr); */
  bool exists(int64_t key) const;
  bool exists(const String& key, bool isKey = false) const;
  bool exists(const Variant& key, bool isKey = false) const;

  /* Write */
  void clear();

  /* $arr[$key] = $v; */
  void set(int64_t key, const Variant& v);
  void set(const String& key, const Variant& v, bool isKey = false);
  void set(const Variant& key, const Variant& v, bool isKey = false);

  void prepend(const Variant& v); // array_unshift($arr, $v);
  Variant dequeue();              // array_shift($arr);
  void append(const Variant& v);  // array_push($arr, $v); aka $arr[] = $v;
  Variant pop();                  // array_pop($arr);

  /* $arr[$key] =& $v; */
  void setRef(int64_t key, const Variant& v);
  void setRef(const String& key, const Variant& v, bool isKey = false);
  void setRef(const Variant& key, const Variant& v, bool isKey = false);

  /* $arr[] =& $v; */
  void appendRef(Variant& v);

  /* unset($arr[$key]); */
  void remove(int64_t key);
  void remove(const String& key, bool isKey = false);
  void remove(const Variant& key);
};

As you can see, the Array APIs mirror PHP's userspace API very closely, down to the read API using square-bracket notation just like PHP code. Let's write a couple new methods dealing with arrays as arguments and return values.
const StaticString
  s_name("name"),
  s_hello("hello"),
  s_Stranger("Stranger");

void HHVM_FUNCTION(example1_greet_options, const Array& options) {
  String name(s_Stranger);
  if (options.exists(s_name)) {
    name = options[s_name].toString();
  }
  bool hello = true;
  if (options.exists(s_hello)) {
    hello = options[s_hello].toBoolean();
  }
  g_context->write(hello ? "Hello " : "Goodbye ");
  g_context->write(name);
  g_context->write("\n");
}

Array HHVM_FUNCTION(example1_greet_make_options, const String& name, bool hello) {
  Array ret = Array::Create();
  if (!name.empty()) {
    ret.set(s_name, name);
  }
  ret.set(s_hello, hello);
  return ret;
}

Pretty similar syntax to writing PHP code, yeah?

The code so far is at commit: 3966bb1da1
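The "start with a default, override it if the key exists" shape of example1_greet_options carries over to standard C++17. In this hypothetical sketch a std::map of std::variant stands in for a PHP options array, and the function returns the greeting instead of writing it, so it is easy to check.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-in for a PHP options array: string keys, mixed values.
using OptionValue = std::variant<bool, std::string>;
using Options = std::map<std::string, OptionValue>;

// Same shape as example1_greet_options: default first, then override.
std::string greet_options_sketch(const Options& options) {
  std::string name = "Stranger";
  auto it = options.find("name");
  if (it != options.end()) {
    if (const auto* s = std::get_if<std::string>(&it->second)) {
      name = *s;
    }
  }
  bool hello = true;
  it = options.find("hello");
  if (it != options.end()) {
    if (const auto* b = std::get_if<bool>(&it->second)) {
      hello = *b;
    }
  }
  return (hello ? "Hello " : "Goodbye ") + name + "\n";
}
```

One difference from the HHVM version: Variant's toString()/toBoolean() convert whatever is stored, whereas this sketch simply ignores a value of the wrong type.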

The Variant Class

The last "smart" class doesn't represent a single PHP type; rather, it represents all types in a sort of meta-container which knows what it's holding, and knows how to convert between the concrete types. Variant is useful when you need to accept and/or return multiple possible types. For a start, let's list out the core API. Remember that there are far more methods than I'll cover here, and you can find the rest in the header file.

/* Creation/Assignment */
class Variant {
 public:
  Variant();
  Variant(bool bval);
  Variant(int64_t lval);
  Variant(double dval);
  Variant(const String& strval);
  Variant(const char* cstrval);
  Variant(const Array& arrval);
  Variant(const Resource& resval);
  Variant(const Object& objval);
  Variant(const Variant& val);

  template <typename T> Variant &operator=(const T &v);
};

These APIs together mean that a Variant may be initialized or assigned from any other variable type supported by userspace code. This becomes especially powerful when looking at Variant return types.
Variant HHVM_FUNCTION(example1_password, const String& guess) {
  if (guess.same(s_secret)) {
    return "Password accepted: A winner is you!";
  }
  return false;
}

These seemingly incompatible return types (const char* and bool) work because they are implicitly constructed into a Variant instance. Explicit types are generally preferred, because the IR can make better assumptions during optimization, but sometimes you just want your return values to be adaptable like that.
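Standard C++17 offers a loose analogue of "string or false" in std::variant. This is a hypothetical sketch: StringOrFalse and kSecret are assumptions (the article's s_secret value isn't shown). One trap worth knowing: HPHP::Variant has a dedicated const char* constructor, but with std::variant<bool, std::string> a bare string literal can select bool on pre-C++20 compilers, so the sketch wraps the message in std::string explicitly.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical analogue of returning "string or false" through Variant.
using StringOrFalse = std::variant<bool, std::string>;

// Assumed secret for the sketch; not taken from the article.
const std::string kSecret = "swordfish";

StringOrFalse password_sketch(const std::string& guess) {
  if (guess == kSecret) {
    // Explicit std::string so the literal cannot decay to bool.
    return std::string("Password accepted: A winner is you!");
  }
  return false;
}
```

As with Variant, every caller now has to ask which alternative it received, which is exactly the loss of precision that makes explicit return types preferable when they fit.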

/* Introspection and Unboxing */
class Variant {
 public:
  bool isNull() const;
  bool isBoolean() const;
  bool isInteger() const;
  bool isDouble() const;
  bool isNumeric(bool checkString = false) const;
  bool isString() const;
  bool isArray() const;
  bool isResource() const;
  bool isObject() const;

  bool toBoolean() const;
  int64_t toInt64() const;
  double toDouble() const;
  DataType toNumeric(int64_t &ival, double &dval, bool checkString = false) const;
  String toString() const;
  Array toArray() const;
  Resource toResource() const;
  Object toObject() const;
};

These APIs allow pulling a concrete data type out of a Variant so they can be operated on directly. Note that the to*() APIs will convert the type if necessary, even if the is*() call returned false, but that not all conversions make sense. Let's make a contrived example by implementing a simplistic var_dump():

<<__Native>>
function example1_var_dump(mixed $value): void;

void HHVM_FUNCTION(example1_var_dump, const Variant &value) {
  if (value.isNull()) {
    g_context->write("null\n");
    return;
  }
  if (value.isBoolean()) {
    g_context->write("bool(");
    g_context->write(value.toBoolean() ? "true" : "false");
    g_context->write(")\n");
    return;
  }
  if (value.isInteger()) {
    g_context->write("int(");
    g_context->write(String(value.toInt64()));
    g_context->write(")\n");
    return;
  }
  // etc...
}

The code so far is at commit: 8e82e3e416
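The is*()-then-convert chain in example1_var_dump maps naturally onto std::visit in C++17. This hypothetical sketch uses a closed set of five types (HPHP::Variant covers more), returns text instead of writing it, and formats doubles with std::to_string, which does not match PHP's float output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical closed set of "mixed" values.
using Mixed = std::variant<std::monostate, bool, std::int64_t, double, std::string>;

// Dispatch on the held type, mirroring the isNull()/isBoolean()/... chain.
std::string var_dump_sketch(const Mixed& value) {
  return std::visit([](const auto& v) -> std::string {
    using T = std::decay_t<decltype(v)>;
    if constexpr (std::is_same_v<T, std::monostate>) {
      return "null\n";
    } else if constexpr (std::is_same_v<T, bool>) {
      return std::string("bool(") + (v ? "true" : "false") + ")\n";
    } else if constexpr (std::is_same_v<T, std::int64_t>) {
      return "int(" + std::to_string(v) + ")\n";
    } else if constexpr (std::is_same_v<T, double>) {
      return "float(" + std::to_string(v) + ")\n";
    } else {
      return "string(" + std::to_string(v.size()) + ") \"" + v + "\"\n";
    }
  }, value);
}
```

Unlike the hand-written chain, a missing branch here fails to compile rather than silently falling through to "// etc...".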

What's next...

We'll continue in the next installment by exploring Objects. These get a bit more complicated with the introduction of visibility, properties, constants, inheritance, and internal data structures.

HHVM Extension Writing, Part I

I've written a number of blogposts and even one book over the years on writing extensions for PHP, but very little documentation is available for writing HHVM extensions.  This is kinda sad since I built a good portion of the latter's API. Let's fix that, starting with this article.

All the code in this post (and its followups) will be found at https://github.com/sgolemon/hhvm-extension-writing.

Setting up a build environment

The first thing you need to do is get all the dependencies in place.  I'm going to start from a clean install of Ubuntu 14.04 LTS and use the prebuilt HHVM binaries for Ubuntu. Other distros should work (with varying degrees of success), but sorting them all out is beyond the scope of this blog entry.

First, let's trust HHVM's package repo and pull in its package list. Then we can install the hhvm-dev package to pull in the binary along with all needed headers.

$ wget -O - http://dl.hhvm.com/conf/hhvm.gpg.key | \
  sudo apt-key add -
$ echo deb http://dl.hhvm.com/ubuntu trusty main | \
  sudo tee /etc/apt/sources.list.d/hhvm.list
$ sudo apt-get update

$ sudo apt-get install hhvm-dev

Creating an extension skeleton

The most basic, do-nothing extension imaginable requires two files: a C++ source file to declare itself, and a config.cmake file to describe what's being built. Let's start with the build file, which is a simple, single line:

HHVM_EXTENSION(example1 ext_example1.cpp)

This macro declares a new extension named "example1" with a single source file named "ext_example1.cpp". If we had multiple source files, we'd delimit them with a space (HHVM_EXTENSION(example1 ext_example1.cpp ex1lib.cpp utilex1.cpp etc.cpp))

The source file has a little more boilerplate, but fortunately it's also just a handful of lines:

#include "hphp/runtime/base/base-includes.h"

namespace HPHP {

class Example1Extension : public Extension {
 public:
  Example1Extension(): Extension("example1", "1.0") {}
} s_example1_extension;

HHVM_GET_MODULE(example1);

} // namespace HPHP

All we're doing here is exposing a specialization of the "Extension" class which gives itself the name "example1". It doesn't do anything more than declare itself into the runtime environment. Those familiar with PHP extension development can think of this as the zend_module_entry struct, with all callbacks and the function table set to NULL.

The code so far is at commit: 214e2e7be6
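The reason a bare global like s_example1_extension is enough to "declare itself" is a common C++ idiom: a global object's constructor runs when the library is loaded, before any request code. Here is a toy model of that idiom. ToyExtension, toyRegistry() and the rest are hypothetical and are not how HHVM's Extension class is implemented internally.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy registry; HHVM's own extension bookkeeping is internal.
struct ToyExtensionInfo {
  std::string name;
  std::string version;
};

// Function-local static avoids static-initialization-order problems.
std::vector<ToyExtensionInfo>& toyRegistry() {
  static std::vector<ToyExtensionInfo> registry;
  return registry;
}

// Base class whose constructor announces the extension, so merely defining
// a global instance is enough to register it.
class ToyExtension {
 public:
  ToyExtension(const std::string& name, const std::string& version) {
    toyRegistry().push_back({name, version});
  }
  virtual ~ToyExtension() = default;
};

class ToyExample1Extension : public ToyExtension {
 public:
  ToyExample1Extension() : ToyExtension("example1", "1.0") {}
} s_toy_example1_extension;  // global instance, constructed at load time
```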

Building an extension and testing it out

To build an extension, first run hphpize to generate a CMakeLists.txt file, then cmake . to generate a Makefile from that. Finally, issue make to actually build it. You should see output like the following:

$ hphpize
** hphpize complete, now run 'cmake . && make` to build
$ cmake .
-- Configuring for HHVM API version 20140829
-- Configuring done
-- Generating done
-- Build files have been written to: /home/username/hhvm-ext-writing
$ make
Scanning dependencies of target example1
[100%] Building CXX object CMakeFiles/example1.dir/ext_example1.cpp.o
[100%] Built target example1

Now we're ready to load it into our runtime. Start by creating a simple test file, tests/loaded.php:
<?php
var_dump(extension_loaded('example1'));
Then fire up hhvm: hhvm -d extension_dir=. -d hhvm.extensions[]=example1.so tests/loaded.php and you should see bool(true).

Adding functionality

The simplest way to add functionality is to write some Hack code. You could write straight PHP code, but you'll see in a few moments why Hack is preferable for extension systemlibs. Let's introduce a new file: ext_example1.php and link it into our project:
<?hh

function example1_hello() {
  echo "Hello World\n";
}

Then load it in during the moduleInit() (aka MINIT) phase:
class Example1Extension : public Extension {
 public:
  Example1Extension(): Extension("example1", "1.0") {}
  void moduleInit() override {
    loadSystemlib();
  }
} s_example1_extension;
And finally, add the following to your config.cmake file to embed it into the .so, where HHVM can load it from at runtime.
HHVM_EXTENSION(example1 ext_example1.cpp)
HHVM_SYSTEMLIB(example1 ext_example1.php)

Rebuild your extension according to the instructions above, then try it out:
$ hhvm -d extension_dir=. -d hhvm.extensions[]=example1.so tests/hello.php
Hello World

The code so far is at commit: 54782f157d

Bridging the gap

If all you wanted to do was write PHP code implementations, you could create a normal library for that. Extensions are for bridging PHP script into native code, so let's do that. Make a new entry in your systemlib file using some Hack-specific syntax:
<<__Native>>
function example1_greet(string $name, bool $hello = true): void;

The <<__Native>> UserAttribute tells HHVM that this is the declaration for an internal function. The Hack types tell the runtime which C++ types to pair the arguments with, and the usual rules for default arguments apply.

To pair it with an internal implementation, we'll add the following to ext_example1.cpp:

void HHVM_FUNCTION(example1_greet, const String& name, bool hello) {
  g_context->write(hello ? "Hello " : "Goodbye ");
  g_context->write(name);
  g_context->write("\n");
}


And link it to the systemlib by adding HHVM_FE(example1_greet); to moduleInit().

As you can see, internal functions are declared with the HHVM_FUNCTION() macro, where the first arg is the name of the function as exposed to userspace, and the remaining args map to the userspace function's argument signature. The argument types map according to the following table:

Hack type   C++ type (argument)   C++ type (return type)
void        N/A                   void
bool        bool                  bool
int         int64_t               int64_t
float       double                double
string      const String&         String
array       const Array&          Array
resource    const Resource&       Resource
object      const Object&         Object
ClassName   const Object&         Object
mixed       const Variant&        Variant
mixed&      VRefParam             N/A


Since this is Hack syntax, you may declare the types as soft (with an @) or nullable (with a question mark), but since these types are no longer limited to a single primitive, they need to be represented internally as the more generic const Variant& for arguments, or Variant for return types (essentially, mixed).

Reference arguments use the VRefParam type noted above. An example of which can be seen below:

<<__Native>>
function example1_life(mixed &$meaning): void;
void HHVM_FUNCTION(example1_life, VRefParam meaning) {
  meaning = 42;
}

The code so far is at commit: df16aca35e

Constants

Constants, like any other bit of PHP, may be declared in the systemlib file, or, if they depend on some native value (such as a define from an external library), they may be declared in moduleInit() using the Native::registerConstant() template, as in the following:
const StaticString s_EXAMPLE1_YEAR("EXAMPLE1_YEAR");

class Example1Extension: public Extension {
 public:
  Example1Extension(): Extension("example1", "1.0") {}
  void moduleInit() override {
    Native::registerConstant<KindOfInt64>(s_EXAMPLE1_YEAR.get(), 2015);
  }
} s_example1_extension;
The use of this function should be mostly obvious, in that it takes the name of a constant as a StringData* (which comes from a StaticString's .get() accessor), and a value appropriate to the constant's type. The type, in turn, is given as the function's template parameter and is one of the DataType enum values. The kinds correspond roughly to the basic PHP data types.
DataType             C++ type
KindOfNull           N/A
KindOfBoolean        bool
KindOfInt64          int64_t
KindOfDouble         double
KindOfStaticString   StringData*
The code so far is at commit: ad9618ac8c
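The idea of "the template argument picks the value's C++ type" can be modeled with a small trait. This is a hypothetical C++17 sketch: ToyDataType, ToyCppType and registerConstantSketch() only mirror the shape of DataType and Native::registerConstant<DT>(), not HHVM's actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical mirror of a few DataType kinds; not HHVM's real enum.
enum class ToyDataType { Boolean, Int64, Double };

// Trait mapping each kind to the C++ type its value must have.
template <ToyDataType DT> struct ToyCppType;
template <> struct ToyCppType<ToyDataType::Boolean> { using type = bool; };
template <> struct ToyCppType<ToyDataType::Int64> { using type = std::int64_t; };
template <> struct ToyCppType<ToyDataType::Double> { using type = double; };

using ToyConstantValue = std::variant<bool, std::int64_t, double>;

std::map<std::string, ToyConstantValue>& toyConstants() {
  static std::map<std::string, ToyConstantValue> table;
  return table;
}

// The template argument fixes the parameter's type, so whatever the caller
// passes is converted to the declared kind's C++ type before being stored.
template <ToyDataType DT>
void registerConstantSketch(const std::string& name,
                            typename ToyCppType<DT>::type value) {
  toyConstants()[name] = ToyConstantValue(value);
}
```

With this shape, registerConstantSketch<ToyDataType::Int64>("EXAMPLE1_YEAR", 2015) stores an int64_t even though 2015 is an int literal.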

What's next...

In the next part of this series, we'll look at the String, Array, and Variant types. Part III will continue with Objects, then Resources in Part IV.