PHP suckiness: XML

After weeks of mind-numbing IT-type stuff, I'm finally getting back into programming a little. I've been playing with the .NET XML libraries the past couple of days, in particular the System.Xml.XPath library, which I found quite handy for accessing XML configuration files. So, after reading up a bit on XPath, XSLT, and XML in general, I was naturally overcome with a fit of optimism and decided to look at converting LnBlog to store data in XML files.

Currently, LnBlog stores its data in "text files." What that really means is that it dumps each piece of entry metadata into a "name: value" line at the beginning of the file and then dumps all the body data after that. It's not a pretty format in terms of interoperability or standardization. However, when you look at it in a text editor, it is very easy to see what's going on. It's also easy to parse in code, as each piece of metadata is one line with a particular name, and everything else is the body.
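To give a feel for how little code that takes, here is a rough sketch of a reader for that kind of file. It is not LnBlog's actual code: the function name parse_entry() and the "Subject" and "Date" fields are made up for illustration, and it assumes the header simply ends at the first line that doesn't look like "name: value".

```php
<?php
// Hypothetical reader for a "name: value" header followed by a free-form body.
// Assumption: the header ends at the first line that is not "name: value".
function parse_entry($text) {
    $meta  = array();
    $lines = explode("\n", $text);
    $count = count($lines);
    $i     = 0;

    // Consume metadata lines from the top of the file.
    while ($i < $count && preg_match('/^(\w+):\s*(.*)$/', $lines[$i], $m)) {
        $meta[strtolower($m[1])] = rtrim($m[2]);
        $i++;
    }

    // Everything that is left is the body.
    $body = implode("\n", array_slice($lines, $i));
    return array('meta' => $meta, 'body' => $body);
}

$entry = parse_entry("Subject: Hello\nDate: 2006-01-01\nThis is the body.\nSecond line.");
```

The obvious weakness is that a body whose first line happens to look like "word: something" would be swallowed as metadata, which is exactly the sort of ad hoc corner a real format would take care of.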

This scheme works well enough, but it's obviously a bit ad hoc. A standard format like XML would be much better. And since PHP is geared mostly toward developing web applications, and XML is splattered all over the web like an over-sized fly on a windshield, I figured reading and writing XML files would be a cinch.

Little did I know.

You see, for LnBlog, because it's targeted at lower-end shared hosting environments, and because I didn't want to limit myself to a possible userbase of seven people, I use PHP 4. It seems that XML support has improved in PHP 5, but PHP 5 is still not as widely deployed as one might hope. So I'm stuck with the XML support in PHP 4, which is kind of crappy.

If you look at the PHP 4 documentation, there are several XML extensions available. However, the only one that's not optional or experimental, and hence the only one you can count on existing in the majority of installations, is the XML_Parser extension. What is this? It's a wrapper around expat, that's what. And that's my only option.

Don't get me wrong - it's not that expat is bad. It's just that it's not what I need. Expat is an event-driven parser, which means that you set up callback functions that get called when the parser encounters tags, attributes, etc. while scanning the data stream. The problem is, I need something more DOM-oriented. In particular, I just need something that will read the XML and parse it into an array or something based on the DOM.
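For anyone who hasn't used it, the event-driven style looks roughly like this. The handler names (on_start, on_end, on_text) and the sample document are invented for the example; the xml_* calls are the standard ones from the extension.

```php
<?php
// Hypothetical handlers that just record what the parser reports, in order.
$events = array();

function on_start($parser, $name, $attrs) {
    global $events;
    $events[] = "start $name";
}

function on_end($parser, $name) {
    global $events;
    $events[] = "end $name";
}

function on_text($parser, $data) {
    global $events;
    // Skip whitespace-only chunks between tags.
    if (trim($data) !== '') {
        $events[] = 'text ' . trim($data);
    }
}

$parser = xml_parser_create();
// Keep tag names as written; by default the extension upper-cases them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'on_start', 'on_end');
xml_set_character_data_handler($parser, 'on_text');
xml_parse($parser, '<entry><subject>Hello</subject></entry>', true);
```

All you get is a stream of "this just happened" calls. Anything resembling a tree, or even "which element does this text belong to", is state you have to track yourself inside the handlers.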

The closest thing to that in the XML_Parser extension is the xml_parse_into_struct() function, which parses the file into one or two arrays, depending on the number of arguments you give. These don't actually correspond to the DOM, but rather to the sequence in which tags, data, etc. were encountered. So, in other words, if I want to get the file data into my objects, I have to write a parser to parse the output of the XML parser.
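A minimal sketch of that second parser might look like the following. The build_nodes() function and the shape of the node arrays are my own invention for illustration; it relies only on the "tag", "type", "value", and "attributes" keys that xml_parse_into_struct() puts in its values array, and it throws away text that sits between child elements.

```php
<?php
// Hypothetical helper: rebuild a nested array from the flat list that
// xml_parse_into_struct() produces. $i is the current position in that list.
function build_nodes($values, &$i) {
    $nodes = array();
    $count = count($values);
    while ($i < $count) {
        $v = $values[$i++];
        if ($v['type'] == 'close') {
            break;      // end of the element we are currently inside
        }
        if ($v['type'] == 'cdata') {
            continue;   // text between child elements; ignored in this sketch
        }
        $node = array(
            'tag'        => $v['tag'],
            'attributes' => isset($v['attributes']) ? $v['attributes'] : array(),
            'value'      => isset($v['value']) ? $v['value'] : '',
            'children'   => array(),
        );
        if ($v['type'] == 'open') {
            // An "open" entry is followed by its children, then a "close".
            $node['children'] = build_nodes($values, $i);
        }
        $nodes[] = $node;
    }
    return $nodes;
}

$xml = '<entry><subject>Hello</subject><body>Some text</body></entry>';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parse_into_struct($parser, $xml, $values, $index);

$i = 0;
$tree = build_nodes($values, $i);
```

With that, $tree[0] is the entry element and $tree[0]['children'] holds subject and body. It works, but it's precisely the kind of code I was hoping the library would spare me.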

And did I mention writing XML files? What would be really nice is a few classes to handle creating nodes with the correct character encoding (handling character encoding in PHP is non-trivial), escape special characters as entities, and generally make sure the document is well-formed. But, of course, those classes don't exist. Or, rather, they exist in the PEAR repository, but I can't count on my users having shell access to install new modules. Hell, I don't have shell access to my web host, so I couldn't install PEAR modules if I wanted to. My only option is to write all the code myself. Granted, it's not a huge problem, so long as nobody ever uses a character set other than UTF-8, but it's still annoying.
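The do-it-yourself version, in its simplest form, is something like the sketch below. The xml_element() function is hypothetical, it assumes every string it is handed is already valid UTF-8, and it makes no attempt to check that element and attribute names are legal XML names.

```php
<?php
// Hypothetical helper: build one element as a string, escaping text and
// attribute values. Assumes UTF-8 input and trusts $name and the keys of $attrs.
function xml_element($name, $text, $attrs = array()) {
    $out = '<' . $name;
    foreach ($attrs as $key => $val) {
        $out .= ' ' . $key . '="' . htmlspecialchars($val, ENT_QUOTES) . '"';
    }
    return $out . '>' . htmlspecialchars($text, ENT_QUOTES) . '</' . $name . '>';
}

$subject = xml_element('subject', 'Tom & "Jerry" <3', array('lang' => 'en'));
// $subject is: <subject lang="en">Tom &amp; &quot;Jerry&quot; &lt;3</subject>

// A whole document is then just string concatenation.
$doc = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<entry>' . $subject . '</entry>';
```

Nothing here guarantees well-formedness beyond the escaping, and the encoding declaration is only true if the data really is UTF-8, which is the part that gets ugly as soon as it isn't.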

Maybe tomorrow I can rant about the truly brain-dead reference-passing semantics in PHP 4. I had a lovely time with that when I was trying to optimize the plugin system.
