
Information architecture for athletics events: wild speculation and idealism

If anyone actually read through both of my previous posts on web coverage of major track meets and open data formats for training logs, they may have seen the link between them. On the other hand, that might be evidence that this person thinks enough like me to require professional help of some sort.

I described the team and technical skills I would want to provide good coverage of a track meet, but I didn’t describe much of the technology they might use to make it happen. I raised the question of whether a consultant could make a business out of setting up such coverage for multiple events, but didn’t address the issue of integrating such standardized coverage with a wildly heterogeneous array of event websites. The answer to this, as I hinted in the training logs post, is more information architecture—a service-based architecture.

More similarly dull ideas after the jump…

The difficulty with being a contractor charged with event coverage is that your work would be required to integrate with an existing event website. You can’t simply graft your own site onto theirs; it looks silly and unprofessional. You need to be able to provide a standardized information framework to your reporters, photographers, et al. (a Content Management Application or CMA, sometimes CMS for Content Management System) while at the same time producing data streams which can be presented appropriately by the event’s existing website (a Content Display Application or CDA).

Given that different events’ CDAs are likely to be wildly different, ranging from flat HTML pages to intricate applications, the only sane way to do this is to produce some kind of display-neutral information feeds (almost inevitably XML), either allowing the event to format them as necessary, or providing that formatting as an extra service through XSLT transformations. A service-oriented architecture, in other words. What a shock, right?
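To make "display-neutral" a bit more concrete, here is a minimal sketch of one feed rendered two ways. It uses plain Python from the standard library as a stand-in for XSLT, and the feed layout (the `items` and `item` elements, the `kind` and `event` attributes) is invented for illustration, not any existing standard.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical display-neutral feed; element and attribute names are invented.
FEED = """<items meet="Example Invitational">
  <item kind="recap" event="1500m men">Smith kicked hard over the last lap.</item>
  <item kind="quote" event="5000m women">I felt strong &amp; relaxed.</item>
</items>"""


def to_html(feed_xml: str) -> str:
    """Render the feed as an HTML list, for an event site with rich templates."""
    root = ET.fromstring(feed_xml)
    rows = []
    for item in root.iter("item"):
        kind = escape(item.get("kind", ""))
        event = escape(item.get("event", ""))
        body = escape(item.text or "")
        rows.append(f'<li class="{kind}"><b>{event}</b>: {body}</li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


def to_text(feed_xml: str) -> str:
    """Render the same feed as plain text, for a flat, minimal event site."""
    root = ET.fromstring(feed_xml)
    return "\n".join(
        f"[{item.get('event', '')}] {item.text or ''}" for item in root.iter("item")
    )


if __name__ == "__main__":
    print(to_html(FEED))
    print(to_text(FEED))
```

The point is only that the feed carries no presentation at all; each event's site (or the consultant, for a fee) decides which renderer to apply.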

This shouldn’t just be the strategy for our hypothetical consultant, either. It’s a reasonable strategy for all content providers. The one that sticks out as a good example is the timing company. I’ve mentioned “consumer-level” meet management software before; the guys who run major meets are using a very high grade of this sort of software, often something they’ve written and customized themselves. It wouldn’t surprise me to find out that Lynx Systems are already flinging a lot of XML around the venue on their private network, for example, which is already set up to allow field-event officials with wireless-enabled handheld devices to plug data directly into the scoring system using their FieldLynx software.

The next logical step is to produce the meet results data, item #1 in my list of content produced at a meet, as an XML feed which the event (and the assembled media) can then format and use as they need. It shouldn’t be difficult to produce a DTD or XML Schema for event results, together with a suite of XSLT stylesheets to produce HTML in whatever format is necessary from a conforming XML document. As an extension to this, the schema should allow for publishing of data from ChampionChip and other transponder-based timing systems. I’d love to see a major marathon make all the “chip” data from its race (or even just the elite race) available in a standard format, to allow interested hackers to create new visualizations of the data.
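As a rough idea of what a results feed might look like, here is a sketch that serializes one race to XML. No results schema exists that I know of, so every element and attribute name here (`results`, `event`, `performance`, `mark`) is an assumption, as is the `Result` class itself.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Result:
    place: int
    athlete: str
    team: str
    mark: str  # kept as text so "3:35.21" and "8.12m" both fit


def results_to_xml(meet: str, event: str, round_name: str, results: list[Result]) -> str:
    """Serialize one race's results into a hypothetical results document."""
    root = ET.Element("results", {"meet": meet})
    ev = ET.SubElement(root, "event", {"name": event, "round": round_name})
    # Emit performances in finishing order, whatever order they arrived in.
    for r in sorted(results, key=lambda r: r.place):
        perf = ET.SubElement(ev, "performance", {"place": str(r.place)})
        ET.SubElement(perf, "athlete").text = r.athlete
        ET.SubElement(perf, "team").text = r.team
        ET.SubElement(perf, "mark").text = r.mark
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(results_to_xml(
        "Example Invitational", "1500m", "final",
        [Result(2, "B. Runner", "Club B", "3:36.02"),
         Result(1, "A. Runner", "Club A", "3:35.21")],
    ))
```

Chip data would presumably hang off each `performance` as a list of split points, but that is a design question for whoever actually writes the schema.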

Now, do the same thinking about the other chunks of data I identified. Athlete quotes, race recaps, and analysis are all pretty easy to manage, since they’re largely chunks of text; the important part (and the part that makes them valuable, in my opinion) is the metadata that can be attached with a good schema. If quotes can be sorted by athlete, by race (i.e. rounds) and by event (i.e. which meet) as well as by date, that opens the door for sorting all quotes from a given athlete over several seasons by the athlete’s finish time at a particular distance, for example.
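Here is a small sketch of that last trick: quotes carrying enough metadata to be pulled for one athlete at one distance and ordered by finish time. The `Quote` fields and the sample data are made up; a real schema would settle the names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Quote:
    athlete: str
    meet: str
    event: str          # e.g. "1500m"
    round_name: str     # e.g. "heat", "final"
    said_on: date
    finish_seconds: float
    text: str


def quotes_by_finish_time(quotes, athlete, event):
    """Return one athlete's quotes at one distance, fastest race first."""
    selected = [q for q in quotes if q.athlete == athlete and q.event == event]
    return sorted(selected, key=lambda q: q.finish_seconds)


# Invented sample data spanning two seasons.
QUOTES = [
    Quote("A. Runner", "Meet One", "1500m", "final", date(2005, 6, 25), 218.4, "I went too early."),
    Quote("A. Runner", "Meet Two", "1500m", "final", date(2006, 6, 24), 215.2, "Everything clicked."),
    Quote("A. Runner", "Meet Two", "800m", "heat", date(2006, 6, 22), 107.9, "Just a tune-up."),
    Quote("B. Runner", "Meet Two", "1500m", "final", date(2006, 6, 24), 216.0, "Close, but not today."),
]

if __name__ == "__main__":
    for q in quotes_by_finish_time(QUOTES, "A. Runner", "1500m"):
        print(q.finish_seconds, q.meet, q.text)
```

None of this is hard once the metadata exists; the hard part is getting somebody to record it in the mixed zone.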

Here I’ll note that the USATF, and to a lesser extent the IAAF, are already producing quotes as a service at their events, though not in anything like this kind of format. What they do is have a team hanging out in the mixed zone; they sit in on the media interviews (or conduct their own if there’s an unfortunate athlete the media appears totally uninterested in) and look for two to four good sentences which they think will be useful to the wider media. Of course, this presumes (a) they know what’s going to be useful, and (b) they write it down more or less correctly. Then they print three to five of these quotes out (yes, on paper, from Word) and distribute them to the media on some kind of irregular basis. It might be worth trying to enforce some kind of metadata on this, particularly if you could convince them there was some kind of payoff, but otherwise the effort may just be duplicated. Before that, though, you’d need to convince them to make the quotes electronically available; right now that’s not happening.

Do the same thing with analysis. Our #5 articles are formatted in much the same way, but with different metadata. You could do that with a relational database, but that would be something monolithic and centralized; an XML schema would allow events to both own their own data and publish it for aggregation.

(I’m beginning to leave the bounds of reality here, but run with me for a bit longer.)

Now hook this in to the last three pieces I identified: still photos, audio, and video. The actual data formats for these are pretty well defined: GIF/JPG/PNG (almost certainly JPG) for still images, probably .mp3 for any audio other than a live stream (which is another issue entirely), and video… whatever works, though apparently Flash is the video standard with browser support now.

These are all formats with their own metadata conventions, so it’s worth considering whether to enforce use of those conventions (which would, incidentally, require photographers to caption their photos at upload time: not a practical requirement) or “wrap” them in an XML schema which would incorporate them by reference to a URL. (It might be possible to write a utility which would then retroactively insert the relevant metadata in the media file itself.) Either way, you’ll want a feed to keep track of them, and the concepts behind “enclosing” this kind of media in feeds are already nailed down. (And they’ve had a goofy buzzword, “podcast,” hung on them, even though that’s nothing at all like the use to which we’d be applying them.)
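For the "wrap by reference" option, the RSS 2.0 `enclosure` element already does most of the work: it points at a media file by URL and records its MIME type and length. A minimal sketch, with the helper names, the example.com URLs, and the sample caption all invented for illustration:

```python
import xml.etree.ElementTree as ET


def media_item(title, caption, url, mime_type, size_bytes):
    """Build one RSS <item> that references a media file instead of embedding it."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # The caption lives in the feed, so the photographer can add it after upload.
    ET.SubElement(item, "description").text = caption
    ET.SubElement(item, "enclosure", {
        "url": url,
        "type": mime_type,
        "length": str(size_bytes),
    })
    return item


def media_feed(meet, site_url, items):
    """Wrap the items in a bare-bones RSS 2.0 channel and return it as text."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{meet}: media"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Photos, audio and video from {meet}"
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    photo = media_item(
        "1500m final, last lap",
        "The leaders at the bell.",
        "https://example.com/media/1500m-final.jpg",  # hypothetical URL
        "image/jpeg",
        482113,
    )
    print(media_feed("Example Invitational", "https://example.com/", [photo]))
```

Richer metadata (athlete, event, round) would need either extra namespaced elements or a separate schema; the enclosure only covers where the file is and what kind of thing it is.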

Now you have reduced this varied collection of assets produced by your team to a collection of feeds, and the only remaining step is integrating that with the event’s website. That section, unfortunately, can’t really be described in general terms, because no two websites have much in common. There would need to be a process, though, for identifying the best way to present the data within the existing structure, then implementing the glue.

Now that I’ve thought all this stuff out, though, I’d like to throw a little cold water on the idea.

  1. I doubt there are (m)any events out there willing to pay for quality online coverage. Most of them still do not have a handle on the web and are praying for television to save them.
  2. If somebody else does this well, I’ll be happy to read their work, or freelance for them, or something. (There are people in the field who I wouldn’t like to see doing this first, but they are largely people I don’t expect to get it right.)
  3. I realized this weekend that the major drawback would be spending so much time on the road. I’m getting old for the track-gypsy life. Send me to Nationals once or twice a year, maybe a big international now and then, and I’m happy. I like my own bed, my own fridge, and my own cat.

However, in aggregate I’ve now written over three thousand words on the topic, so if someone wants to republish this in some kind of technical journal, we could make arrangements; in the meantime, I really ought to stop.
