Open data only works when it’s accessible, so which data platform do you choose?
Having had some success getting our local council to open up some data and help us organise our first event for Bath: Hacked, we quickly realised that static CSV files wouldn’t get us far on the longer journey.
Developers – indeed all of us – need well-organised data in a flexible format that’s easily understood if we’re to make open data useful.
That means setting up a data store.
I’m really chuffed to announce we just opened Bath’s official open data store using the Socrata platform – official post here.
Our new city data store (updated Nov 2014).
CKAN is currently a more common choice in the UK and I’ve been asked a lot why we chose Socrata. Some of the conversations had undertones of us having opted for “Yankee capitalists over open source heroes!” Given that I was heavily involved in this mildly contentious decision, I feel more comfortable splurging here rather than on the official Bath: Hacked website.
How we chose the platform
In 2014 there aren’t many options if you want a robust data platform. For us it was a straight-up battle of CKAN vs. Socrata.
For context, it’s worth pointing out early on that we’re currently a little unusual in the open data world: While we work closely with our local council, we aren’t an “official” group. We have no funding and work entirely on the goodwill of talented volunteers.
Therefore money is a massive issue for us.
I’ll break down the key questions and how we settled on a decision. Everything here was fiercely debated and these were my thoughts:
Initial setup and costs
CKAN may be open source but that doesn’t mean it’s a free turnkey solution: You need skills. Luckily Dan Hilton, a local developer, stepped straight in and was able to get us up and running with a demo install on a Digital Ocean droplet in about an hour. Swift.
Conversely Socrata needs ordering and configuring. They turned around our request and got us playing in about a week.
So here’s your first dilemma: How hard is your data store going to get hit? It’s a very tough question to answer in the early phases of an open data initiative.
Self-hosting CKAN gives you immediate issues of elasticity. Sure, you can get going with a cheap VPS but what happens when you load live data that’s prone to bursty activity? If your hosting isn’t elastic, how hard is it to transition a live system to larger servers or even the cloud later? Do you have the resources to handle it?
Meanwhile Socrata is a cloud-based platform; it’s elastic out of the box and the fixed fees came out lower than the CKAN managed alternative. This decision was a slam dunk: Socrata handles your setup, launch and growth seamlessly.
If you have Python and sysops skills in-house, CKAN is probably a winner for ongoing maintenance.
If you don’t have these on tap (and we didn’t) then CKAN starts to look very expensive. Conversely Socrata charges a fixed monthly fee for hosting, support and ongoing updates.
That means no surprise bills and nothing comes to a halt when your lone Python developer takes a holiday.
Open data is prone to attracting academics and hardcore developers (which is lovely) but I passionately believe open data is only worthwhile when it has the widest possible audience. That means lowering the technical skillset required to use open data.
CKAN is pretty okay at making data easy to find and download but it makes no attempt to go further.
Socrata includes data views, visualisation and mapping tools which, whilst basic and quirky, do allow quick and useful skims. It also goes further by easily turning datasets into APIs, something developers love.
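To give a flavour of what “datasets as APIs” means for a developer, here’s a minimal sketch of building a query against Socrata’s SODA endpoint, where each dataset is exposed at `/resource/<dataset id>.json` and filtered with `$`-prefixed parameters. The domain, dataset id and column name below are made up for illustration, and `soda_url` is my own helper, not part of any Socrata library.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, where=None, limit=100):
    """Build a SODA query URL for a dataset's JSON endpoint."""
    params = {"$limit": limit}
    if where:
        # $where takes a SQL-like filter expression on the dataset's columns.
        params["$where"] = where
    # urlencode percent-encodes the "$" and the filter text for us.
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Hypothetical domain, dataset id and column name, for illustration only.
url = soda_url("data.example.org", "abcd-1234", where="spaces > 50", limit=10)
print(url)
```

Fetching that URL with any HTTP client should return the matching rows as JSON, with no server-side code to write or host.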
Happy devs and happy non-geeks is a massive win. CKAN is way behind on this and it’s a clear flaw in a product that’s all about data democracy. Poor show.
Lock-in caused huge debate: What happens a year, 5 years or even a decade down the line? Can we get data out of one platform and into another easily?
I simply couldn’t see an easy answer. Whoever you pick, moving later is going to be painful.
We’ve mitigated some of the issues by positioning ourselves as the “glue” that pushes unfriendly council data into our friendly data store. Should the worst happen we still have the data – we’d just lose the pretty shop window.
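As a rough idea of what that “glue” can look like, here’s a small sketch that tidies a messy CSV export into clean records and keeps a platform-neutral JSON snapshot. The sample data and the `clean_rows` helper are invented for illustration; this isn’t our actual pipeline.

```python
import csv
import io
import json


def clean_rows(raw_csv):
    """Normalise headers and strip stray whitespace from every value."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = []
    for row in reader:
        rows.append({
            key.strip().lower().replace(" ", "_"): (value or "").strip()
            for key, value in row.items()
        })
    return rows


# Made-up sample standing in for an unfriendly council export.
raw = "Car Park , Spaces\nExample Street, 120\n"
records = clean_rows(raw)

# The snapshot is plain JSON, so it can be stored anywhere and
# reloaded into a different platform later.
snapshot = json.dumps(records, indent=2)
print(snapshot)
```

Because the cleaned copy lives outside the data store, switching platform means re-uploading records rather than reverse-engineering an export.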
It would still be painful, but the world wouldn’t end.
Ideally I’d love to see an open standard for data transfer but since both platforms benefit from customer lock-in, I’m guessing it’s unlikely to happen.
Right now, suck it up.
Final score: Socrata 3 – 1 CKAN
My experience did make me question the “open source is good” mantra that’s particularly prevalent in government right now. Our CKAN experience highlights an easily overlooked truth: Open source is anything but free.
Setting up CKAN ourselves would have been time-consuming and ultimately extremely expensive. Meanwhile CKAN’s managed alternatives, while broadly similar to Socrata in terms of ongoing cost, were hugely costly upfront.
Bath: Hacked has moved a long way in a very short space of time and while our problems are different to those of government bodies, I’m already pretty sure the open data revolution will be driven further by groups like us than anyone else.
That means we need an easy entry point and right now CKAN doesn’t offer that.
Socrata provides a swift, scalable solution at a predictable price.
That ticks all my boxes.