The accidental government data portal

I’ve fallen into a dream project that gets me paid to play with numbers and it’s turning into an interesting template for using open data to create both commercial and public value.

I’m currently working with Kyero.com, a real estate website in Spain.

The company serves over 300,000 buyers each month in 13 languages and has teams working in Spain, France, Russia (and now Bath). It is also run by a longtime pal of mine – Martin Dell – with whom I’ve been drinking beer, coffee and sharing web ideas for many years.

Kyero has also recently become very interested in open data.

Find the problem first!

Kyero’s foray into open data didn’t start with me yakking on about the wonders of open data. It started with a chat about pensioners.

The Spanish property market is heavily reliant on foreign buyers and if you were to paint a statistically typical picture of those buyers, they’d be a married British couple in their fifties whose dream is to spend their latter years in the sun.

Property buyers in Spain: Brave, happy… and vulnerable

It may be a well trodden path but the gravity of that decision isn’t to be underestimated.

Purchasing a property is a life changing decision; purchasing a property in a whole other country with a foreign language, unfamiliar laws, currency risk, an opaque and turbulent property market, using all your life savings, is nothing short of terrifying.

I wanted to help potentially vulnerable buyers understand the market. That isn’t just because I’m a nice guy: in real estate, helping buyers is also good business.

We realised if open data could help us establish trust by giving clear market guidance, buyers would be safer while we could gain an advantage over firms who simply “list and dismiss”.

Takeaway: Publishing data then trying to think of interesting things to do with it is all the wrong way around. To find practical uses for open data, find a real problem first then go look for data to help solve that problem.

Clarity trumps clever

Hardcore open data geeks are going to be disappointed at this point because we’re not going to start down a road of D3 visualisations, predictive modelling or pulsing heatmaps. The problem is a whole lot simpler.

The real barrier to understanding the Spanish property market is property journalism.

The market has a wide range of metrics available, mostly produced by the official notaries and registrars who oversee the property buying process. (Open data availability in Spain is excellent, with datos.gob.es and www.ine.es being our favourite sources. They produce very regular information to clear, consistent timetables.)

Government statisticians normally precede data releases with a densely worded press release. Property journalists then lift what they perceive to be the most interesting numbers (normally big ones) and churn out another densely packed 500 word piece on the property market.

Here’s a typical narrative we saw with last month’s mortgage approvals data:

Worrying huh? So we took the revolutionary step of plotting the same numbers on a graph:

Darn those pesky numbers getting in the way of a good story

With dozens of shouty pieces appearing each month in the press, we realised clarity and context are key to killing misleading headlines. Thus far we’ve gathered over a million lines of government data, sanitised it, fixed it to the right geographies and popped it on a very simple website at data.kyero.com.
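For the curious, here is a rough idea of what "sanitised it, fixed it to the right geographies" can mean in practice. This is a minimal Python sketch, not Kyero's actual pipeline: the alias table, the function names and the row layout are all hypothetical, and it assumes the source uses Spanish number formatting (dots for thousands, a comma for decimals).

```python
import unicodedata

# Hypothetical alias table: spelling variants mapped to one canonical name.
# A real table would be built up by hand as new variants turn up.
PROVINCE_ALIASES = {
    "alicante": "Alicante",
    "alacant": "Alicante",
    "alicante/alacant": "Alicante",
    "malaga": "Málaga",
}


def parse_spanish_number(text):
    """Convert a Spanish-formatted number such as '1.234,5' to a float."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return float(cleaned)


def normalise_province(name):
    """Ignore case, accents and stray whitespace, then look up the canonical name.

    Names missing from the alias table are returned trimmed but otherwise unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", name.strip().lower())
    key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return PROVINCE_ALIASES.get(key, name.strip())


def clean_row(row):
    """Clean one raw record (assumed layout: {'province': str, 'sales': str})."""
    return {
        "province": normalise_province(row["province"]),
        "sales": parse_spanish_number(row["sales"]),
    }
```

Calling clean_row({"province": " MÁLAGA ", "sales": "1.234,5"}) gives a record filed under "Málaga" with a numeric sales figure, which is all a chart needs.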

Our first attempt already answers the most basic of questions: Where does our fictitious British couple want to buy?

International buyers unsurprisingly love sun, sea and sand

What does the market look like where they want to buy?

Lifting, cleaning and publishing the key figures in one place felt revolutionary when we started

With so many data points to choose from, things can quickly get confusing so we self-imposed a simplicity rule: If we can’t do it with a big number, a simple map, a line or a bar chart then we probably won’t do it. Our audience doesn’t give prizes for clever JavaScript.

Takeaway: If you’re sweating on complex visualisations, you’re probably doing it wrong. Data’s first job is to be obvious to non-specialists and provide a clear and compelling narrative people can act upon.

20% numbers, 80% distribution

Getting your ideas in front of people takes way more energy than producing those ideas in the first place. Since this is a commercial project, we put more thought into distribution than I ever did with civic hacking.

The data.kyero.com public portal quickly became a sideshow. We found dumping numbers on the public in big chunks appeals to professionals and a subset of buyers – but not everyone. Instead, packaging data into discrete parts and setting it in the right context turned out to be where the value lies.

Our first quick win was to build thousands of local buyer guides using a combination of open, paid and internal data. Around 10,000 people have used these in the last month.

Simple guides turned out to be our biggest win

How did people find our first efforts? Good old fashioned search engine marketing – something very few data portals do: The buyer guides have obvious keywording, they use well structured markup and are localised into English and Spanish (with eleven additional languages coming online shortly).

These guides quickly created wider awareness of what we’re doing.

Shortly after putting those first – super simple! – pages up, we were asked by estate agents and market analysts for API access to our data, others wanted copy-and-paste website widgets and we also received a constant stream of press requests for quotes and graphics.

Meanwhile the audience for our work is already tens of thousands because the simple visuals, analysis and quotes we’re able to make quickly can appear in countless contexts.

That’s a clear brand win, especially at such an early stage.

The next three months will be about making our numbers even more portable. It turns out making open data even more open is good business, which brings us nicely onto my new love of government PDFs…

Takeaway: The best efforts of open data activists are often wasted because they go unseen. If you don’t have clear channels for getting your work out to a wide audience, stop.

Why I learned to love PDFs

Nothing depresses a civic hacker more than finding good data in a terrible Excel spreadsheet or worse, a PDF file. Data becomes impossible to use and therefore worthless.

In a commercial project, the opposite is true. The harder the better. If data is hard to extract the chances are nobody else will bother, which leaves you effectively owning public data.

A policy we adopted early in the Kyero data project was never to use data we couldn’t regularly update. While it hurts agility the net result is that our proprietary processes of data import and sanitisation mean we can extract, visualise and distribute new market data in a matter of minutes.

Most importantly we can quickly break stats down to the local level buyers care about.
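To make "local level" concrete, here is a small sketch of the kind of breakdown I mean: roll cleaned records up by area and year, then report the change on the previous year. It is an illustration under assumptions rather than our production code; the (area, year, value) record layout and both function names are made up for the example.

```python
from collections import defaultdict


def yearly_totals_by_area(records):
    """Sum values per area and year.

    `records` is an iterable of (area, year, value) tuples, an assumed layout.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for area, year, value in records:
        totals[area][year] += value
    return totals


def year_on_year_change(totals, area, year):
    """Percentage change against the previous year.

    Returns None when either year is missing or the previous total is zero.
    """
    by_year = totals.get(area, {})
    current = by_year.get(year)
    previous = by_year.get(year - 1)
    if current is None or not previous:
        return None
    return (current - previous) / previous * 100
```

Returning None instead of guessing matters here: a missing year should show up as a gap on the chart, not as a dramatic swing.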

National press and market analysts rarely do this. I wonder how many other companies could gain such a key competitive advantage simply by bothering to lift and clean data? It’s an open goal.

Takeaway: Don’t avoid data just because it isn’t perfect. The process of cleaning, mashing and embellishing public data is where tangible value lies.

Quit open, it’s just data

I have learned that my time spent drum banging in the early days of Bath: Hacked led to some bad habits. While open data is a worthy cause to champion, it can also lead to tunnel vision. I tried to create public value with a very limited toolbox – one that only contained open data. It’s a pointless pursuit.

Data wrangling to solve real public problems is not about open data. It’s about being open to data, period.

Kyero’s rapidly enlarging data platform already has a very mixed array of data: For every public dataset, there is internal data and there’s paid data. I learned to not really care whether it’s public or private data if it contributes to the overall picture we’re framing.

That puts the hand wringing over open data licensing into sharp focus.

As a data consumer I don’t care which licence you publish under, as long as it’s clear. You’re giving it away? I’ll publish it. Want attribution? I’ll publish that too.

The arguments are irrelevant, as long as you publish. It’s rare that we wouldn’t quote a source.

Takeaway: Licensing rules have little practical impact on data consumers and are likely a bigger barrier to publishers. Let’s stop talking about open data, and concentrate on useable data.

Real estate trojan horse?

While my head was down on this project, I watched from the sidelines as one of the most interesting data releases in the UK appeared: Nope, not Lidar – I mean the Indices of Deprivation.

It tells us where the poor people are.

It struck me as one of the most insightful datasets available for building a real picture of the UK in 2015. I saw the open data community work hard on a host of local choropleths based on the data (I think Alasdair Rae took the best run at it) and then… nothing. No great national coverage, no big debate, barely a conversation starter.

In austere times, that strikes me as a monumental fail.

I’d wager if this data were used by a major property portal (especially provocatively) the impact would be far wider than quiet chatter amongst data geeks.

Shocked? Good – now go do something about it

Fanciful? Not so much. At Kyero we’ve already started lifting unemployment data as a proxy for local economic health. Rightmove UK uses education performance data and publishes school catchment maps. Zoopla’s latest radio ad bangs on about data being central to its “smart property search”.

The social data trend has started, albeit very very slowly.

Perhaps the fastest way to move open data from largely unused, aspirational data silos to a more obviously useful resource is to reach out to existing large audiences?

Real estate portals provide tremendous power both to distribute and contextualise social data for the general public. That isn’t just useful, it incites debate on the big issues we prefer to tuck away on Channel 4.

Who knew estate agents could be so interesting?

Kudos to Kyero for being so open to new thinking: I get paid to do something I love and it is already making a real difference to real people. This looks like the very early days of something much bigger and I’ll try to post more as we go.