Data Collaboratives for Official Statistics
On 18 December 2015, the Data Governance Project (DGP) took part in an OECD/PARIS21 event in Paris on the use of new data sources for official statistics. Leiden’s DGP lead Jos Berens presented the decision flow tool, a result of the benchmarking exercise carried out to map best practices for forging Data Collaboratives: mutually beneficial, sustainable efforts in which the value of data is shared between the public and private sectors.
These new types of data collaboratives can play a meaningful role in complementing official statistics. For example, corporate data on consumer purchases or mobile telephone usage can serve as proxies for socio-economic indicators. This holds particularly true for the developing world, where data are often incomplete. At the same time, there are significant obstacles to gaining access to such data and processing them into reliable indicators. This prompted PARIS21 (the Partnership in Statistics for Development in the 21st Century, an organization of National Statistical Offices hosted by the OECD) to organize a meeting to explore the potential and the possible pitfalls of this field.
The first key lesson from the meeting was the need to manage expectations. New data sources will not replace traditional household surveys and census data, which will always be needed to anchor results derived from monitoring new data streams. For example, the value of call detail records (CDRs) for complementing official statistics received significant attention during the meeting. These records can show how a population moves and communicates across a country. However, the selectivity and representativeness of such data will always have to be verified against traditional modes of information gathering. This includes establishing how many people typically use a given phone, and whether it often changes hands.
Finding common ground through responsible experimentation
Partners from the private or NGO sector that hold valuable data are often willing to contribute to the public good for a variety of reasons, including corporate social responsibility, creating new revenue streams, or forging new partnerships that can enhance impact on the ground. For example, in the Data for Development Challenge, telecom provider Orange securely shared call detail records from Ivory Coast and Senegal so that researchers could find insights relevant to policy making. Data sharing needs adequate ethical safeguards, however. When merging different datasets, analysts might be able to reveal sensitive information, and giving third parties access to large datasets may result in data leaks through technical or human error. Best practices are emerging, but there is no consensus yet on how to determine the right level of safeguards. Moving forward, it is essential to explore new ways of defining the value of data based on a benefit-risk analysis in a given context.
Transparency and inclusivity
Because of the potential risks associated with leveraging new data sources, it is important to be transparent about the way data are used. In the coming years, efforts need to be stepped up to raise awareness among citizens of how data can be, and already are being, used. Where appropriate, decision-making processes about data use have to become more inclusive. The Data Pop Alliance program on Data Literacy is an excellent example of how awareness can be raised and accountability strengthened by educating local populations about their data.
Data Collaboratives for Official Statistics
The Decision Flow tool presented by the DGP in Paris is a step-by-step manual for corporate data holders. It runs from defining the value proposition, through risk management and process monitoring, to evaluation and repetition. Moving forward, the DGP will assist corporate partners in applying these steps in practice. This effort is part of a larger network working on responsible data use, the International Data Responsibility Group. The group’s upcoming conference on 19 February 2016 in The Hague will address these issues in more detail. For more information regarding this conference, please contact firstname.lastname@example.org.
Leiden Centre for Data Science
Centre For Innovation
Peace Informatics Lab
New York University GovLab
International Data Responsibility Group
Leiden New Faculty
World Economic Forum Data-Driven Development
Data Pop Alliance
 The Data Governance Project is a collaboration between the NYU GovLab, the Leiden University Centre for Innovation and the World Economic Forum Global Agenda Council on Data-Driven Development.
 These issues were at the heart of the UN Global Pulse PAG meeting in The Hague on 23-24 October 2015.