Food, Glorious Food; Data, Marvellous Data!
Last night the Secretary of State for Environment, Elizabeth Truss, launched Great British Food, a five-year campaign celebrating British food in this country and around the world, tearing down outdated stereotypes about a bland British diet and showing off the reality of how and what we eat and drink today.
As part of the Year of British Food and Defra’s commitment to open up its data, Defra will be publishing decades of data underpinning the Family Food Survey as open data.
The Family Food Survey – the world’s longest running survey of what Britons eat and drink
In 1939, the wartime government had a number of concerns about food policy: Do people have access to a sufficient variety of food for a healthy diet? Could the country produce enough to feed itself? Would high food prices force people to go hungry?
In 2015 we don’t have warships blockading the Channel and we don’t encourage families to ‘Dig for Victory’, but those policy questions are still completely relevant to today’s government, for very different reasons.
What started as the Wartime Food Survey in July 1940 became the National Food Survey, and in recent years the Family Food Survey. It’s the longest-running continuous household survey of its kind and will reach its 75th anniversary next year.
Family Food records what foods people buy and how much they spend. Because we know the quantities they purchase, we can derive estimates of nutritional intake from those records. Nowadays there are other, more targeted nutritional surveys, but none have the long-term trend data that the Family Food Survey can offer.
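As a rough illustration of how purchased quantities can be turned into a nutritional estimate, here is a minimal Python sketch. It is not the survey's actual method, which this post does not describe. The food names, quantities and energy values are hypothetical placeholders, and the function name `estimate_energy_kcal` is invented for the example.

```python
# Hypothetical purchases for one household: (food name, quantity in grams).
# These figures are placeholders, not survey data.
purchases = [("bread", 800), ("milk", 2000)]

# Hypothetical energy content in kcal per 100 g (placeholder values only).
energy_per_100g = {"bread": 250, "milk": 60}


def estimate_energy_kcal(purchases, energy_per_100g):
    """Sum the energy of each purchase: (grams / 100) * kcal per 100 g."""
    return sum(grams / 100 * energy_per_100g[food] for food, grams in purchases)


total_kcal = estimate_energy_kcal(purchases, energy_per_100g)
print(total_kcal)
```

An estimate like this describes food purchased rather than food eaten, so a real analysis would also need to consider factors such as waste.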
There is a wealth of Family Food (and National Food) survey estimates already available online. But there is also a lot of material, in both electronic and paper form, which the Food Statistics team in Defra have long wanted to find a way to make accessible. We are now going to do this as part of Defra’s Data Programme.
What does the data look like?
Because the survey has been running for so long, the data we hold varies significantly in detail and quality. The further back you go, the less detailed the data becomes, which will shape how people can use it.
For the purposes of our project, we’ve separated the data we want to publish into three tiers for release:
Tier 1: Family Food survey data at household level (1974 – 2000), currently published on the UK Data Service.
This is held in a number of formats but, at the lowest level, as tab-separated values. Metadata and classification data is often stored within PDF files. This data isn’t openly licensed, and is relatively difficult to access. We’re currently exploring with the UK Data Service whether they are able to host the data on their site under an open licence, without needing a user registration, or whether we will need to find a new way to publish the data. Either way we will make it discoverable on DATA.GOV.UK.
We are aiming to publish this data as open data in December 2015.
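For readers who have not worked with tab-separated values before, the sketch below shows one way such a file could be read using only Python's standard library. The column names (`household_id`, `food_code`, `quantity`) and the sample rows are invented for illustration. The real files will have their own layout, described in the accompanying metadata.

```python
import csv
import io

# Hypothetical sample content: column names and values are invented,
# not taken from the actual Family Food files.
SAMPLE_TSV = (
    "household_id\tfood_code\tquantity\n"
    "1\t101\t2.5\n"
    "1\t205\t0.4\n"
    "2\t101\t1.0\n"
)


def read_tsv(handle):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))


# io.StringIO stands in for a real file opened with open(path, newline="").
rows = read_tsv(io.StringIO(SAMPLE_TSV))

# Add up the recorded quantity for each food code.
totals = {}
for row in rows:
    code = row["food_code"]
    totals[code] = totals.get(code, 0.0) + float(row["quantity"])

print(totals)
```

Values are read as text, so numeric columns need converting before any arithmetic, as done with `float` here.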
Tier 2: Raw survey level data (2001 – present), unpublished.
Since 2001 the Office for National Statistics has run the survey on our behalf, and we have survey data within Microsoft Access databases. This is richer than data on the UK Data Archive, but it will need to be anonymised to be released. We will need to work with ONS and others to do this.
Our current timeline for release: Spring 2016.
Tier 3: Food Survey reports (1940 – 1970s), currently in paper form.
In a filing cabinet in Defra’s York office is a paper copy of every survey report produced since 1940. We are going to digitise these reports, extract the data from them and publish the content. They contain a vast amount of survey results and analysis, and represent a historical record of our shopping and eating habits. The underlying data has long been lost, but the reports are rich in statistical data. There are different cost and resource considerations here, but we will be working through them.
Our current timeline for release: Summer 2016.
There is a lot of work to do in scanning, reformatting and analysing this data to make sure it is both safe and suitable for publication and reuse. This is a big challenge for the team, covering everything from paper records to data that originated in various computer systems over a long period. In addition, the coding schemes in the data itself have evolved to reflect the times: we never knew there was such a thing as ‘welfare orange juice’ until we started this.
We also need to take care around personal data, particularly in the data from 2001 to the present, which is richer in detail. We will work with experts to ensure the data we publish is safe for people to reuse.
It’s become clear that the outcome will not be a single store of data that can be mined without some understanding of this evolution. What we will ultimately have, though, is a unique treasure trove of data for people to explore, and we can’t wait to see what users make of it.
What happens next?
We have set up a project team internally at Defra to manage the publication of Family Food Survey data. We are always keen to hear thoughts, opinions and comments from our user community, and from people and organisations outside Defra who might be excited about the idea of open Family Food Survey data. Tell us what you hope to be able to do with it. We will keep you updated on our progress in publishing the data via the @DefraGovUK and @DefraStats Twitter accounts, and using the #OpenDefra hashtag. You can also contact us via email@example.com.