Jan 7, 2015

HHVM Extension Writing, Part III

In Part I of this series, we looked at the basic building blocks of a simple HHVM extension skeleton. Part II continued with three of the five "smart" datatypes used throughout the API. But now we get to have some fun. It's time to start looking into declaring classes with methods and properties and constants (oh my!).

I'll be continuing with the same git repository at https://github.com/sgolemon/hhvm-extension-writing where we left off at the end of Part II with commit 8e82e3e416.

Declaring methods

As with functions, the bulk of your class definition is going to appear in your systemlib file, maybe something like the following in ext_example1.php.
class Example1_Greeter {
  public function greet() {
    echo "Hello {$this->name}\n";
  }

  public function __construct(protected string $name = 'Stranger') {}
}

As you should expect by now, compiling your extension with this bit of code should mean that the Example1_Greeter class is now available to all requests and may be invoked like any other class definition. Let's apply what we already know from making native functions, and see how it works with methods...

  <<__Native>>
  public function getName(): string;

  <<__Native>>
  static public function DefaultGreeting(): string;

While you're probably tempted to rush off and add an HHVM_FE() and HHVM_FUNCTION() implementation to the C++ file, you'd only be half right. These aren't functions, they're methods, and as such have a different set of macros.
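const StaticString
  s_Example1_Greeter("Example1_Greeter"),
  s_name("name");

String HHVM_METHOD(Example1_Greeter, getName) {
  // Read the protected $name property, using Example1_Greeter as the access context
  return this_->o_get(s_name, false, s_Example1_Greeter);
}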

String HHVM_STATIC_METHOD(Example1_Greeter, DefaultGreeting) {
  return "Hello";
}

Meanwhile, in moduleInit(), we'll add:
  HHVM_ME(Example1_Greeter, getName);
  HHVM_STATIC_ME(Example1_Greeter, DefaultGreeting);

The code so far is at commit 335dca9573.
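To make the "different set of macros" point concrete, here is a toy model of what a method macro has to do that a function macro doesn't: mangle a per-class name and slip in a hidden first parameter. Everything in it (FakeObject, FakeClass, the SKETCH_* macros, and the self_ name for the static case) is a made-up stand-in rather than HHVM's real definitions, so read it as a sketch of the idea only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the runtime's object and class types.
struct FakeObject { std::string name; };
struct FakeClass  { const char* name; };

// Simplified stand-ins for HHVM_METHOD / HHVM_STATIC_METHOD. The real macros
// differ in detail; the point is only that each one builds a per-class
// function name and prepends a hidden first parameter.
// (##__VA_ARGS__ is a GNU extension, accepted by GCC and Clang.)
#define SKETCH_METHOD(cn, fn, ...) \
  sketch_##cn##_m_##fn(FakeObject* const this_, ##__VA_ARGS__)
#define SKETCH_STATIC_METHOD(cn, fn, ...) \
  sketch_##cn##_s_##fn(const FakeClass* self_, ##__VA_ARGS__)

// Instance method: this_ is usable even though it never appears
// in the argument list we wrote.
std::string SKETCH_METHOD(Example1_Greeter, getName) {
  return this_->name;
}

// Static method: there is no instance, only a handle to the class.
std::string SKETCH_STATIC_METHOD(Example1_Greeter, DefaultGreeting) {
  (void)self_;  // unused in this sketch
  return "Hello";
}

// Usage: the caller supplies the receiver explicitly, as the engine would.
inline void sketchUsage() {
  FakeObject greeter{"Stranger"};
  assert(sketch_Example1_Greeter_m_getName(&greeter) == "Stranger");
}
```

The takeaway is that the body of a method looks just like the body of a function, except that a receiver has been handed to it behind the scenes.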

Similarly, properties and constants may be declared directly on the Hack definition of the class in your systemlib file. I won't bother showing it here, since you all know how to write PHP code, but I'll put an example or two in the git repo.

What's marginally more interesting, and something you can probably guess at from the coverage in Part I, is that you can declare class constants from C++, meaning that they can take on values defined in external headers or computed values. Let's add one from moduleInit().

  Native::registerClassConstant<KindOfStaticString>(s_Example1_Greeter.get(), s_DEFAULT_GREETING.get(), s_Hello.get());

In the examples repo, I've changed ->getName() to now use this constant, so that you can see it propagate up.

The code so far is at commit 007c314b2d.

Binding internal data

What we have so far is all well and good for simple classes, but most extension classes will need to store some opaque pointer from an external library somewhere on the class that can be easily referenced later on. For that, things start to get a little bit more complicated. To help clarify what we're doing, I'm going to wipe the slate clean by moving example1 to its own subdirectory, and starting fresh with example2.

Yeah, kinda messy changing my whole directory structure around midway, but would you rather I ran `git push --force`? Yeah, I thought not.

The code skeleton we're starting out with is at commit: 20dc4ef824 in example2/.

To illustrate something slightly less contrived than earlier examples, I'll be wrapping the POSIX FILE* object in a simple PHP class. Let's start with something basic in our systemlib, containing just a constructor:
<<__NativeData("Example2_File")>>
class Example2_File {
  <<__Native>>
  public function __construct(string $filename, string $mode): void;
}

A new user attribute has appeared! <<__NativeData("Example2_File")>> tells the runtime that this is no ordinary object. This object should be over-allocated with enough space to hold some internal C++ object, identified by the quoted name. In practice this is usually the name of the class it goes with, but it doesn't have to be. How does this hook up to internals? That comes next, within ext_example2.cpp by adding an #include "hphp/runtime/vm/native-data.h" and the following code:
const StaticString
  s_Example2_File("Example2_File");

class Example2_File {
 public:
  Example2_File() { /* new Example2_File */ }
  Example2_File(const Example2_File&) = delete;
  Example2_File& operator=(const Example2_File& src) {
    /* clone $instanceOfExample2_File */
    throw Object(SystemLib::AllocExceptionObject(
      "Cloning Example2_File is not allowed"
    ));
  }

  ~Example2_File() { sweep(); }
  void sweep() {
    if (m_file) {
      fclose(m_file);
      m_file = nullptr;
    }
  }

  FILE* m_file{nullptr};
};

void HHVM_METHOD(Example2_File, __construct, const String& filename, const String& mode) {
  auto data = Native::data<Example2_File>(this_);
  if (data->m_file) {
    throw Object(SystemLib::AllocExceptionObject(
      "File is already open!"
    ));
  }
  data->m_file = fopen(filename.c_str(), mode.c_str());
  if (!data->m_file) {
    String message("Unable to open ");
    message += filename + ": errno=" + String(errno);
    throw Object(SystemLib::AllocExceptionObject(message));
  }
}

And some glue code in moduleInit() to tie both the constructor and the data class into the class definition:
  HHVM_ME(Example2_File, __construct);

  Native::registerNativeDataInfo<Example2_File>(s_Example2_File.get());

The code so far is at commit 97b3cfd49e.

We're only opening (and ultimately closing) our file at this point, but these are really important stages in an object's lifecycle, so this is worth going through slowly. When a PHP script calls $o = new Example2_File(__FILE__, "r");, the first thing the engine does is allocate space for the object. This is done by adding sizeof(ObjectData) (the standard, base object size) to the size given by any NativeDataInfo associated with it. We made that association by calling Native::registerNativeDataInfo<T>(StringData* id);, where T is the C++ class type to allocate with the ObjectData, and id is the symbolic name we gave it in the systemlib file using <<__NativeData("id")>>.

Next, the engine invokes the constructor, providing a pointer to the object via a hidden ObjectData* this_ parameter in the C++ method's signature. From here, we can get access to our private data structure by using the Native::data<T>() accessor to jump to the correct offset from this_. At this point, we have access to a normal C++ object which just so happens to be bound to a PHP object, and we have constructor parameters as well!
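If the over-allocation trick feels like magic, a standalone model may help. The sketch below makes one allocation big enough for both pieces and finds the native data again with nothing but pointer arithmetic. FakeObjectData, allocWithNativeData(), nativeData() and release() are hypothetical names, and the choice to place the native data in front of the header is an assumption of this sketch, not a statement about the engine's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the engine's base object header.
struct FakeObjectData { int refcount = 1; };

// Round a size up so whatever follows it stays suitably aligned.
constexpr std::size_t alignedSize(std::size_t n) {
  constexpr std::size_t a = alignof(std::max_align_t);
  return (n + a - 1) / a * a;
}

// One allocation holds [ T native data | FakeObjectData header ].
template <class T>
FakeObjectData* allocWithNativeData() {
  const std::size_t offset = alignedSize(sizeof(T));
  void* block = std::malloc(offset + sizeof(FakeObjectData));
  if (!block) throw std::bad_alloc();
  new (block) T();  // default-construct the native data in place
  return new (static_cast<char*>(block) + offset) FakeObjectData();
}

// Model of the accessor: step from the object pointer to its native data.
template <class T>
T* nativeData(FakeObjectData* obj) {
  return reinterpret_cast<T*>(
      reinterpret_cast<char*>(obj) - alignedSize(sizeof(T)));
}

// Destroy both halves, then hand the single block back.
template <class T>
void release(FakeObjectData* obj) {
  T* data = nativeData<T>(obj);
  assert(data != nullptr);
  obj->~FakeObjectData();
  data->~T();
  std::free(data);  // data sits at the start of the malloc'd block
}

// Plays the role of the Example2_File data class.
struct FileData { std::FILE* m_file = nullptr; };
```

Calling allocWithNativeData<FileData>() and then nativeData<FileData>(obj) hands back the same FileData every time, which is all the constructor needs in order to stash its FILE* somewhere it can find again later.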

From here, there are two reasons a PHP object might die. In the expected case, it runs out of references and is destructed during the course of the request's runtime. In this case, our auxiliary object has its destructor called as well, so that external pointers can be cleaned up nicely. The other time a PHP object can die is when the request is shutting down. This is somewhat more exceptional since the memory manager is sweeping ALL request-local data, not necessarily in the most ideal order. It's up to your auxiliary class to deal with non-sweepable resources, but trust that the runtime will deal with resources which are. This is resolved by having a secondary pseudo-destructor called sweep();. For simple implementations like ours, we want the regular destructor and sweep to do the same thing, since the only members of this C++ class are external pointers. If we have sweepable resources such as an HPHP::String however, we'd want to avoid the implicit member destruction which comes with calling ~Example2_File(). It's entirely possible that the internal state of that String is no longer valid because it was swept first. Hence the need for a separate sweep() function.

TL;DR? - Just have your destructor call sweep(), and deal with external pointers in sweep(). That's good 90% of the time.
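Stripped of anything HHVM-specific, the pattern is small enough to try in isolation. The class below is a hypothetical stand-in (FileHandle and isOpen() are names invented for this sketch); the property worth noticing is that sweep() nulls the pointer after closing it, so it doesn't matter which of the two endings comes first, or whether both happen.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a native data class holding one external pointer.
class FileHandle {
 public:
  FileHandle() = default;
  FileHandle(const FileHandle&) = delete;
  FileHandle& operator=(const FileHandle&) = delete;

  // Normal death: the destructor simply defers to sweep().
  ~FileHandle() { sweep(); }

  // Releases only the external resource. Resetting m_file afterwards makes
  // a second call (or a later destructor run) a harmless no-op.
  void sweep() {
    if (m_file) {
      std::fclose(m_file);
      m_file = nullptr;
    }
  }

  bool isOpen() const { return m_file != nullptr; }

  std::FILE* m_file{nullptr};
};

// Sweeping something that was never opened, twice, must also be safe.
inline void sweepTwiceIsHarmless() {
  FileHandle h;
  h.sweep();
  h.sweep();
  assert(!h.isOpen());
}
```

Note that sweep() here touches nothing but the raw FILE*. Once a class also holds runtime-managed members, that's the part that stops being true of the destructor.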

You might also have noticed that I'm throwing an exception in the assignment operator. This is normally used for handling a clone, where you'd probably duplicate the FILE* handle, but I realized midway that POSIX file streams don't really have that notion, so I took the easy way out and threw a standard exception. In practice, the implementation would probably look something like:
  Example2_File& operator=(const Example2_File& src) {
    /* copy/clone class members, then return self */
    if (m_file) {
      fclose(m_file);
      m_file = nullptr;
    }
    if (src.m_file) {
      m_file = fclone(src.m_file);
    }
    return *this;
  }

But like I said, there doesn't seem to be an fclone() as such. Instead, let's add a few more methods to flesh out our class:
String HHVM_METHOD(Example2_File, read, int64_t len) {
  auto data = Native::data<Example2_File>(this_);
  String ret(len, ReserveString);
  auto slice = ret.bufferSlice();
  len = fread(slice.ptr, 1, len, data->m_file);
  return ret.setSize(len);
}

int64_t HHVM_METHOD(Example2_File, tell) {
  auto data = Native::data<Example2_File>(this_);
  return ftell(data->m_file);
}

bool HHVM_METHOD(Example2_File, seek, int64_t pos, int64_t whence) {
  if ((whence != SEEK_SET) && (whence != SEEK_CUR) && (whence != SEEK_END)) {
    raise_warning("Invalid seek-whence");
    return false;
  }
  auto data = Native::data<Example2_File>(this_);
  return 0 == fseek(data->m_file, pos, whence);
}

The code so far is at commit df4acf359ca.
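Coming back to the clone question for a moment: while there's no fclone(), POSIX does let you get from a stream to its descriptor and back, so something clone-ish can be approximated. The helper below is hypothetical (cloneStream() is a made-up name) and it is not a true copy: the two streams share one file offset, and each keeps its own buffer. Treat it as a sketch of the trade-off, not a recommendation.

```cpp
#include <cassert>
#include <stdio.h>    // fileno() and fdopen() are POSIX additions to stdio
#include <unistd.h>   // dup(), close()

// Hypothetical helper: open a second stream over a duplicate of src's
// descriptor. `mode` must be compatible with how src was opened.
FILE* cloneStream(FILE* src, const char* mode) {
  assert(mode != nullptr);
  if (!src) return nullptr;

  fflush(src);                    // push any buffered writes to the descriptor
  int fd = dup(fileno(src));      // new descriptor, same open file description
  if (fd < 0) return nullptr;

  FILE* copy = fdopen(fd, mode);  // wrap the duplicate in its own stream
  if (!copy) close(fd);           // don't leak the descriptor on failure
  return copy;
}
```

Because the offset is shared, a seek on one stream moves the other as well, which is rarely what PHP code calling clone would expect. That's a decent argument for leaving the exception in place.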

2 comments:

  1. There is in fact a clone method: see http://linux.die.net/man/2/dup

    Replies
    1. dup() is for int file descriptor handles, not for FILE* streams.
