Making data open for everyone
Over the past few years, there have been some exciting developments in open source tools and programming languages, business intelligence tools, big data, open data, and data visualization. These trends, and others, are changing the way we interact with and consume information and data. And that change is driving more organizations and governments to consider better ways to provide their data to more people.
The World Bank, for example, has a concerted effort underway to open its data in better and more visual ways. Google’s Public Data Explorer brings together large datasets from around the world into a single interface. For-profit providers like OpenGov and Socrata are helping local, state, and federal governments open their data (both internally and externally) in newer platforms.
We are firm believers in open data. (There are, of course, limitations to open data because of privacy or security, but that’s a discussion for another time). But open data is not simply about putting more data on the Internet. It’s not just only about posting files and telling people where to find them. To allow and encourage more people to use and interact with data, that data needs to be useful and readable not only by researchers, but also by the dad in northern Virginia or the student in rural Indiana who wants to know more about their public libraries.
Open data should be easy to access, analyze, and visualize
Many are working hard to provide more data in better ways, but we have a long way to go. Take, for example, the Congressional Budget Office (full disclosure, one of us used to work at CBO). Twice a year, CBO releases its Budget and Economic Outlook, which provides the 10-year budget projections for the federal government. Say you want to analyze 10-year budget projections for the Pell Grant program. You’d need to select “Get Data” and click on “Baseline Projections for Education” and then choose “Pell Grant Programs.” This brings you to a PDF report, where you can copy the data table you’re looking for into a format you can actually use (say, Excel). You would need to repeat the exercise to find projections for the 21 other programs for which the CBO provides data.
In another case, the Bureau of Labor Statistics has tried to provide users with query tools that avoid the use of PDFs, but still require extra steps to process. You can get the unemployment rate data through their Java Applet (which doesn’t work on all browsers, by the way), select the various series you want, and click “Get Data.” On the subsequent screen, you are given some basic formatting options, but the default display shows all of your data series as separate Excel files. You can then copy and paste or download each one and then piece them together.
Taking a step closer to the ideal of open data, the Institute of Museum and Library Services (IMLS) followed President Obama’s May 2013 executive order to make their data open in a machine-readable format. That’s great, but it only goes so far. The IMLS platform, for example, allows you to explore information about your own public library. But the data are labeled with variable names such as BRANLIB and BKMOB that are not intuitive or clear. Users then have to find the data dictionary to understand what data fields mean, how they’re defined, and how to use them.
These efforts to provide more data represent real progress, but often fail to be useful to the average person. They move from publishing data that are not readable (buried in PDFs or systems that allow the user to see only one record at a time) to data that are machine-readable (libraries of raw data files or APIs, from which data can be extracted using computer code). We now need to move from a world in which data are simply machine-readable to one in which data are human-readable.
This is not easy or cheap. Organizations need to build teams of trained staff, ensure data privacy concerns are addressed, and construct their own strategies to create better data platforms. But to have real success in open data, we need to help create data platforms that everyone can use.