When open data sucks
Originally published on 29th Jun 2015
Nobody talks much about the pitfalls of using open data in a real-life project. I’ll make a start.
My first real application of open data was to supplement the education pages on my wife’s local community website with some school performance information. We also used it to create this popular admissions mashup that helps new parents research local schools.
Around 25,000 people use the education pages each year, over 8,000 people have used the mashup and now I’m meeting people whose relocation to Bath was made easier by what we made. Win!
So you want me to use your data, huh?
It isn’t plain sailing. Annual maintenance has become a real drag and my shoulders slump a couple of times each year when faced with the prospect of updating all the latest education data. It’s sporadic and it changes and it’s hard.
Meanwhile I see lots of chatter among open data advocates about shiny data stores that spring up and then go totally unused. Some of it is just bad marketing but too much of it is trust: Does your data look robust enough for me to want to use it?
Do you really want to start an open data project?
The DfE is possibly one of the worst offenders, but lots of publishers fall into similar traps. Let’s bring the two problems together…
5 ways to make me want your data
Here’s a wishlist of things I’d like to see all data publishers do. I know from experience these would save time and make me consider investing effort in your data:
1. A publication schedule
I shouldn’t have to regularly check a website by hand to see if data has been updated. If you’re not publishing data live, please tell me when you’re doing an update or let me sign up to receive an email.
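In the absence of a schedule or a mailing list, the manual check can at least be automated. A minimal sketch, using only the standard library: issue a HEAD request and compare the server’s ETag/Last-Modified validators against the ones you saw last time (the URL here is hypothetical; whether the publisher’s server sends these headers at all is an assumption).

```python
import urllib.request

def fetch_validators(url):
    """Issue a HEAD request and return the (ETag, Last-Modified) headers.
    Many servers send these even for static CSV downloads, so you can
    detect an update without downloading the whole file."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("ETag"), resp.headers.get("Last-Modified")

def has_changed(previous, current):
    """Treat the dataset as updated when either validator differs."""
    return previous != current
```

Run it from a daily cron job, store the last-seen validators in a file, and you only ever open the website when something has actually changed.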
2. Limited field changes
The DfE is a shocker, mostly because schools are ridiculously politicised. This is the first year in four that I haven’t had to make major updates because all the metrics changed.
The Land Registry is a good example at the opposite end of the spectrum: Their data structure appears not to have changed since 1995. (It’s notable the public is better informed about house prices than the places we entrust with our children.)
I really don’t want data if it’s subject to immediate and substantial change – it’ll keep breaking my project and we can’t do the sexy historical analysis stuff.
3. Multiple formats + an API
I’m a flexible guy and happy to use a spreadsheet but if you want us to develop solutions that are more interesting than a static graph, we need programmatic access. Write once, update forever – and save us a bundle of time on maintenance.
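“Write once, update forever” in practice looks something like this: a loader that parses whatever the publisher puts at a stable URL into records keyed by column name. The URL is invented for illustration; the point is that if the location and field names stay put, this code never needs touching when new data lands.

```python
import csv
import io

# Hypothetical stable download URL -- the real DfE files move around
# each year, which is exactly the maintenance problem described above.
PERFORMANCE_CSV = "https://example.gov.uk/school-performance/latest.csv"

def load_performance(raw_bytes):
    """Parse a published CSV into a list of dicts keyed by column name.
    Works identically on this year's file and next year's, provided
    the publisher keeps the field names stable."""
    text = raw_bytes.decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

A proper API makes even this unnecessary, but a predictable file at a predictable address is already most of the way there.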
3b. Make API limits clear
When I find something interesting the first thing I do is check the API limit. This guides my approach on how best to use the data. Don’t bury it, pop your usage policy front and centre.
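When the limit is documented, consumers can build it into their code instead of hammering your servers. A hedged sketch of the usual approach: back off exponentially whenever the API answers HTTP 429 (the `request_fn` callback and its `(status, body)` return shape are assumptions for the example, not any particular publisher’s API).

```python
import time

def call_with_backoff(request_fn, max_retries=5, initial_delay=1.0):
    """Retry an API call with exponential backoff when the server
    signals rate limiting. request_fn should perform one request and
    return a (status_code, body) tuple."""
    delay = initial_delay
    for _ in range(max_retries):
        status, body = request_fn()
        if status != 429:  # anything but "Too Many Requests"
            return body
        time.sleep(delay)
        delay *= 2  # double the wait before each retry
    raise RuntimeError("rate limit still exceeded after retries")
```

None of this is exotic, but a consumer can only write it if the usage policy tells them what the limit is and how the server signals it.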
4. A decent search
I’m almost at the point of giving up on data.gov.uk because I just can’t find anything. When I type “annual civilian widget production” I’d probably like to see the latest national widget figures that were headlining the news last night – NOT every related document containing the word “widget” in no discernible order.
Data store search systems are really bad. If I can’t find it, I won’t use it.
5. A contact number
This is absolutely the best way to make publishers create good data. If your phone is likely to ring a lot with the same question from frustrated data consumers, you’ll put more effort into documentation and I’ll stop calling.
Open data shouldn’t be this frustrating. Pass this on 😉