
Open training data formats

While I’m shooting my mouth off about how other people ought to be doing things (and I’m incubating some more detailed and technical thoughts on that particular topic, incidentally), I’ve had some cause to think about training logs, particularly online ones, in recent days.

I’m skating on pretty thin ice when I talk about online training logs. For one thing, I keep my logs on paper—six or eight of the John Jerome né Jim Fixx logs from Random House, a few more random notebooks, etc. This year’s log is an IAAF pocket appointment calendar, and has the dates of all the major international races in it.

Also, I was partly responsible for one of the uglier and less-functional running logs on the web, back in the day; I’ve blocked most of that experience out of my memory, but in a quick 20/20 hindsight evaluation, we tried to do too much fancy stuff without getting the basics right.

On the other hand, through that experience, I have thought a lot about training logs, and I’ve actually been paid to write a quick review of some PalmOS-based logs. (Remember?)

Here’s one problem with every computer-based log I’ve ever seen: every athlete tracks different data. There is no simple way of describing RDBMS tables to allow for every idiosyncratic log habit. You need to accommodate both the old-school runner whose log is simply a wall calendar where they check off days they ran (or, at most, note the time) and the new-school data hound who is uploading HRM data, has a library of regular routes, and is tracking mileage on three rotating pairs of shoes. (This is a puzzle in itself; you need an entire table for shoes.) I used to track not only weekly mileage but my mileage over a trailing four-week window. Different data is generated by different kinds of runs, ranging from a normal training run to track work to racing. And, if you’re not convinced yet, consider triathlon training.
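To make the trailing-window idea concrete, here is a minimal Python sketch of how a log might compute it. The function name `trailing_mileage`, the `(date, miles)` tuple layout, and the sample dates are all made up for illustration; they don't come from any real log software.

```python
from datetime import date, timedelta


def trailing_mileage(runs, as_of, days=28):
    """Sum the miles of every run in the `days`-day window ending on `as_of`."""
    start = as_of - timedelta(days=days - 1)  # window includes both endpoints
    return sum(miles for day, miles in runs if start <= day <= as_of)


# Hypothetical log entries as (date, miles) pairs.
runs = [
    (date(2006, 3, 1), 5.0),
    (date(2006, 3, 15), 8.0),
    (date(2006, 3, 28), 10.0),
    (date(2006, 4, 2), 6.0),
]

four_week = trailing_mileage(runs, date(2006, 4, 2))          # trailing four weeks
one_week = trailing_mileage(runs, date(2006, 4, 2), days=7)   # trailing week
print(four_week, one_week)
```

The arithmetic is trivial; the hard part is that it only works if every entry records a distance, which is exactly the kind of thing a wall-calendar runner never writes down.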

The other problem is linked to the first: lock-in. Spend a few months using any log, and you have a few months of valuable training data locked up in that software without an easy way of getting it back out, even if the log isn’t doing what you want from it. Most developers of web-based logs see this lock-in as a feature, keeping users coming back week after week, but I think it’s a roadblock; users like me are reluctant to try new logs because we’re afraid we’ll be putting our training data in jail, like dropping money into a piggy bank that can’t be reopened. I’ve seen some logs nod to the idea of data export by producing flat pages of data which may be printed out. Printed out! On paper! Talk about regression.

And yet logging is a critical tool for runners of all levels. A log lets you step back from your day-in-day-out training and see what you’ve actually done; it shows your strengths and weaknesses, and it can show you where you screwed up and incurred injury or fatigue. A computer-based log offers the (as yet unrealized, as far as I know) potential to perform more intricate analysis, visualize data in clear and illuminating ways, and share both raw and summarized data with coaches and other advisors. It’s too useful a tool to be discarded simply because it’s difficult, and that’s why people are still trying.

So what we need is a flexible data model which allows a wide variety of data but mandates little, and applications which provide for import and export.
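As a rough sketch of what "allows a wide variety but mandates little" could look like in code, here is one possible shape in Python. Everything in it is an assumption of mine: the `Session` class, the choice of the date as the only required field, and the `extras` keys are hypothetical, not a proposed standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any


@dataclass
class Session:
    day: date                                             # the only mandatory field
    sport: str = "run"                                    # optional, with a default
    extras: dict[str, Any] = field(default_factory=dict)  # whatever else the athlete tracks


# The wall-calendar runner records nothing but the fact that they ran.
wall_calendar = Session(day=date(2006, 4, 2))

# The data hound records as much as they like (keys are illustrative).
data_hound = Session(
    day=date(2006, 4, 2),
    extras={"distance_mi": 6.0, "duration_s": 2700, "avg_hr": 148, "shoes": "pair B"},
)


def total(sessions, key):
    """Sum an optional numeric field, skipping sessions that never recorded it."""
    return sum(s.extras[key] for s in sessions if key in s.extras)


print(total([wall_calendar, data_hound], "distance_mi"))  # 6.0
```

The trade-off is that any analysis has to cope with missing fields, as `total` does here, rather than assuming every athlete filled in every column.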

The thing is, I think it’s possible to create that now. Specifically, I think it’s possible to describe such a data model in an XML Schema or DTD. Any application that could read and write XML data conforming to that schema/DTD would then be free to store the data however it chose (potentially competing on performance), or even to simply leave the data in XML and compete on ease of use. What’s more, by divorcing the data model from the application, it would be hypothetically possible for athletes to maintain their own data store, adding training sessions using whatever application they chose (on whatever platform was convenient!) and viewing and analyzing the data using potentially different applications.
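Here is a small sketch of the import/export half of that idea, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`traininglog`, `session`, `date`, and the per-session fields) are invented for illustration, since no such schema exists yet, and this sketch does not validate against a Schema or DTD; it only shows one program writing the data and another reading it back.

```python
import xml.etree.ElementTree as ET


def to_xml(sessions):
    """Serialize a list of dicts; 'date' becomes an attribute, the rest child elements."""
    root = ET.Element("traininglog")  # hypothetical root element name
    for s in sessions:
        el = ET.SubElement(root, "session", date=s["date"])
        for name, value in s.items():
            if name != "date":
                ET.SubElement(el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML back into dicts. Every value comes back as a string."""
    sessions = []
    for el in ET.fromstring(text).iter("session"):
        entry = {"date": el.get("date")}
        for child in el:
            entry[child.tag] = child.text
        sessions.append(entry)
    return sessions


log = [
    {"date": "2006-04-01"},                                          # checked-off day only
    {"date": "2006-04-02", "distance_mi": "6.0", "shoes": "pair B"},  # richer entry
]
exported = to_xml(log)
print(exported)
print(from_xml(exported) == log)
```

Because both functions agree only on the XML, either side could be swapped for a different application on a different platform, which is the whole point.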

Developers would be freed from creating end-to-end solutions; because they would be working with a standard data model, they could create data input managers customized to specific athletes or training programs, analysis engines, or even coaching bots. They could stop trying to lock in the few early adopters, and compete on features for a potentially much larger market. Also, it would open the doors to apples-to-apples comparison of aggregate training data, which might give a lift to the creative training commons we discussed a few months ago.

This might count as wishful thinking, but I think it stands up. Creating the schema would take a lot of work, and getting developers to buy in would take even more. I think the rewards would be significant, though, and worth the trouble.

Now Playing: It’s All Too Much from A Box Of Birds by The Church



I definitely share your “putting our training data in jail” concerns. That’s partly why I continue to maintain a pencil-and-paper log, summarizing the highlights in an Excel workbook. Not very cutting-edge, but I can usually find the workout I’m looking for in the Excel file just by doing a text search for the interval distance or workout venue (or whatever).
