PHP documentation and sockets

PHP's documentation gets way too much credit.  I often hear people rave about how great it is.  Many of them are newbies, but I hear the same thing from experienced developers who've been writing PHP code for years.

Well, they're wrong.  PHP's documentation sucks.  And if you disagree, you're just plain wrong.

Actually, let me add some nuance to that.  It's not that the documentation sucks per se, it's that it sucks as documentation.

You see, a lot of PHP's documentation is written with an eye to beginners.  It has lots of examples and it actually does a very good job of showing you what's available and giving you a general idea of how to use it.  So in terms of a tutorial on how to use the language, the documentation is actually quite good.

The problem is that, sometimes, you don't need a tutorial.  You need actual documentation.  By that, I mean that sometimes you care less about the generalities and more about the particulars.  For instance, you might want to know exactly what a function returns in specific circumstances, or exactly what the behavior is when you pass a particular argument.  Software is about details, and these details matter.  However, PHP frequently elides these details in favor of a more tutorial-like format.  And while that might pass muster for a rookie developer, it's decidedly not OK from the perspective of a seasoned professional.

Case in point: the socket_read() function.  I had to deal with this function the other day.  The documentation page is rather short and I was less than pleased with what I found on it. 

By way of context, I was trying to talk to the OpenVPN management console, which runs on a UNIX domain socket.  We had a small class (lifted from another project) that basically provided a nice facade over the socket communication functions.  I'd noticed that, for some reason, the socket communication was slow.  And I mean really slow.  Like, a couple of seconds per call slow.  Remember, this is not a network call - this is to a domain socket on the same box.  It might not be the fastest way to do IPC, but it should still be reasonably quick.

So I did some experimentation.  Nothing fancy - just injecting some microtime() and var_dump() calls to get a general idea of how long things were taking.  Turns out that's all I needed.  It quickly became obvious that each call to the method that reads from the socket was taking about 1 second, which is completely absurd.
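For anyone who wants to try the same kind of crude timing, here is a minimal sketch.  The timeCall() helper and the usleep() stand-in are hypothetical names made up for illustration, not code from the class in question; the only real pieces are microtime() and var_dump().

```php
<?php
// Hypothetical helper: run a callable and return [result, elapsed seconds].
function timeCall(callable $fn): array
{
    $start = microtime(true);           // float seconds since the epoch
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Stand-in for the slow socket read; swap in the real call when measuring.
[$value, $elapsed] = timeCall(function () {
    usleep(10000);                      // pretend to do about 10 ms of work
    return 'done';
});

var_dump($value, $elapsed);
```

Wrapping each suspect call this way is usually enough to show which one is eating the time.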

For context, the code in that method was doing something like this (simplified for illustration):

$timeoutTime = time() + 30;
$message = '';
while (time() < $timeoutTime) {
    $character = socket_read($this->socket, 1);
    if ($character === '' || $character === false) {
        break;  // We're done reading
    }
    $message .= $character;
}

Looks reasonable, right?  After all, the documentation says that socket_read() will return the requested number of characters (in this case one), or false on error, or the empty string if there's no more data.  So this seems like it should work just fine.

Well...not so much.

The problem is with the last read.  It turns out that the documentation is wrong - socket_read() doesn't return the empty string when there's no more data.  In fact, I couldn't get it to return an empty string ever.  What actually happens is that it goes along happily until it exhausts the available data, and then it waits for more data.  So the last call just hangs until it reaches a timeout that's set on the connection (in our case, it was configured to 1 second) and then returns false.
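The hang is easy to reproduce in isolation.  This is a rough sketch, assuming a Linux-like system with the sockets extension loaded: a pair from socket_create_pair() stands in for the OpenVPN management socket, and the receive timeout is set to 0.2 seconds (an arbitrary value picked to keep the demo short; ours was 1 second).

```php
<?php
// Assumption: a connected AF_UNIX pair stands in for the real socket.
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
[$writer, $reader] = $pair;

// Assumption: 0.2 s receive timeout instead of the 1 s from the post.
socket_set_option($reader, SOL_SOCKET, SO_RCVTIMEO, ['sec' => 0, 'usec' => 200000]);

socket_write($writer, 'ab');            // only two bytes are ever sent

$first  = socket_read($reader, 1);      // buffered data, returns right away
$second = socket_read($reader, 1);      // same

$start  = microtime(true);
$third  = socket_read($reader, 1);      // nothing left: waits for the timeout
$waited = microtime(true) - $start;

var_dump($first, $second, $third, $waited);
```

On the systems I'd expect this to run on, the third read sits there for roughly the length of the timeout and then comes back false, not the empty string.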

So because we were relying on that "empty string on empty buffer" behavior to detect the end of input, calling that method always resulted in a one-second hang.  This was fairly easily fixed by just reading the data in much larger chunks and checking how much was actually returned to determine if we needed another read call.  But that's not the point.  The point is that we relied on what was in the documentation, and it was just totally wrong!
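A minimal sketch of that kind of fix is below.  It is not the actual code from our class: readAvailable() and the 4096-byte chunk size are hypothetical names and values, and the socket pair is only there to make the example self-contained.

```php
<?php
// Hypothetical helper: read in large chunks and stop on the first short read.
function readAvailable($socket, int $chunkSize = 4096): string
{
    $message = '';
    do {
        $chunk = socket_read($socket, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;                      // error, timeout, or nothing returned
        }
        $message .= $chunk;
        // A full chunk suggests more data may be waiting; a short one does not.
    } while (strlen($chunk) === $chunkSize);

    return $message;
}

// Usage, with a local socket pair standing in for the real connection.
socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair);
socket_write($pair[0], "hello\n");
$reply = readAvailable($pair[1]);
var_dump($reply);
```

The short-read check is a heuristic rather than a guarantee.  A reply that is an exact multiple of the chunk size still costs one more blocking read, and a stream socket is free to hand back a partial message, so anything that needs to be robust should look for the protocol's own end-of-message marker.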

And it's not like this is the first time I've been bitten by the PHP docs.  Historically, PHP has been very bad about documenting edge cases.  For example, what happens if a particular parameter is null?  What's the exact behavior if the parameters do not match the expected preconditions?  Or what about that "flags" parameter that a bunch of functions take?  Sometimes the available flags are well documented, but sometimes it's just an opaque one-line description that doesn't really tell you what the flag actually does.  It's a crapshoot.

To be fair, the PHP documentation is not the worst I've ever seen.  Not even close.  And it really is very good about providing helpful examples.  It's just that it errs on the side of being light on details, and software is details.
