Adventures in XML

This morning I created my first RSS feed. (I know, there’s one for this weblog, but Movable Type made that for me; I didn’t make it.) We’re starting work on overhauling biopsychology.com for the fourth edition of Rosenzweig, and an RSS feed makes sense for that site, so I’m figuring out how to produce one from the existing database.

So far, so good; NetNewsWire will read it, which means it is mostly clean. But there are still issues with the validator: all kinds of non-XML characters. I can filter some of them with a custom PHP function (mainly just a string of preg_replace() calls) between the database and the output page, but I can’t scan the output for every accented vowel.
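As a rough illustration of that kind of filter, here is a minimal sketch that strips characters outside the range XML 1.0 allows, using a single preg_replace() call. The function name strip_invalid_xml_chars() is hypothetical, and the sketch assumes the text coming out of the database is already UTF-8, which the post does not confirm.

```php
<?php
// Hypothetical helper: remove characters that XML 1.0 does not permit.
// Assumes $text is UTF-8; the /u modifier makes PCRE treat it as such.
function strip_invalid_xml_chars(string $text): string
{
    // XML 1.0 allows tab, LF, CR, and the ranges U+0020-U+D7FF,
    // U+E000-U+FFFD and U+10000-U+10FFFF. Everything else is dropped.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, '', $text);

    // preg_replace() returns null if the input is not valid UTF-8;
    // this sketch falls back to an empty string in that case.
    return $clean ?? '';
}
```

Accented vowels pass through untouched here, since they are legal XML characters as long as the declared encoding matches the bytes actually sent.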

So far, most of the PHP functions I’ve found for XML deal with going from XML to something else—taking an RSS feed and putting it in a web page, for instance. I’m going the other way, with “dirty” text which needs to be valid XML, and I’m not quite flying yet.

Later: I think the PHP function I’m looking for is htmlentities(). Still, the validator is complaining about character set and MIME type: “Your feed appears to be encoded as 'UTF-8', but your server is reporting 'ISO-8859-1'.” I don’t know if I can tweak the MIME type and character set, given that I need to send this through PHP. Maybe PHP can indicate the MIME type?
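PHP can indeed send the MIME type and character set itself, through header(). The sketch below shows one way to do that, along with an escaping helper. Both function names are hypothetical, and application/rss+xml is only one commonly used type (application/xml would also be reasonable). The helper uses htmlspecialchars() rather than htmlentities() because XML predefines only five entities, so named HTML entities such as &eacute; are not valid in a feed unless declared.

```php
<?php
// Hypothetical helper: build the Content-Type header line for the feed.
// The charset should match the encoding declared in the XML prolog.
function feed_content_type(string $charset = 'UTF-8'): string
{
    return 'Content-Type: application/rss+xml; charset=' . $charset;
}

// Hypothetical helper: escape only the characters XML treats as markup
// (&, <, >, and quotes), leaving accented letters as literal UTF-8.
function xml_escape(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Usage, at the top of the feed script before any output is sent:
//   header(feed_content_type());
//   echo '<title>' . xml_escape($row['title']) . '</title>';
// ($row is an assumed database row, not something from the post.)
```

Sending the header from PHP avoids touching the Apache configuration at all, provided nothing has been written to the output before the header() call.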

Even Later: I had been worried that monkeying with the MIME type in Apache would cause the PHP processor to ignore the file. That turns out to happen only when I make the change in httpd.conf; if I change the type in the mime.types file instead, all appears to be well.

Of course, htmlentities() doesn’t handle curly single apostrophes, so the Feed Validator still chokes. Oh, the humanity!
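One possible workaround for the curly quotes, sketched under the assumption that the text is UTF-8 by this point, is to swap them for numeric character references, which XML accepts without any entity declarations. The function name curly_quotes_to_numeric() is hypothetical.

```php
<?php
// Hypothetical helper: replace curly quotes with numeric character
// references. Assumes $text is UTF-8. Run this after any ampersand
// escaping, or the "&" in each reference will itself be escaped.
function curly_quotes_to_numeric(string $text): string
{
    return strtr($text, [
        "\u{2018}" => '&#8216;', // left single quote
        "\u{2019}" => '&#8217;', // right single quote / apostrophe
        "\u{201C}" => '&#8220;', // left double quote
        "\u{201D}" => '&#8221;', // right double quote
    ]);
}
```

If the feed is genuinely served as UTF-8, the quotes could also be left as literal characters; the mapping is mainly useful when the declared encoding is in doubt.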
