The question of whether data is increasingly permeating our lives is obsolete. Data is everywhere: in our daily commute, our credit card transactions, our activities on the Web, our communications, our electricity consumption, and more.
The digital breadcrumbs we leave behind are already being collected, stored, processed and acted upon by data-fed artificial intelligence. This process is not new, but it is now faster, a lot faster, and on its way to becoming systematized.
This raises huge societal challenges: privacy, discrimination, inequality and, ultimately, free will. It has also generated high expectations for delivering better and timelier decisions, and for improving our understanding of the social fabric.
The movement seems irreversible. We can deny it, cope with it or try to get the best out of it.
Take a deep breath and dive into the data
The crisis in Western economies has led to a reduction in aid budgets or, at the very least, to budgetary prudence under the tight control of our elected representatives. Since accountability requires figures, organizations have begun to share more and more information, sometimes under pressure from civil society.
Donors are increasingly willing to open up their data in response to civil society’s demand for accountability. In doing so, these organizations have started improving their data literacy and data ecosystems.
Data ends up being more than a communication tool: open data now provides better information and facilitates aid coordination. A good example is the Humanitarian Data Exchange (HDX), the platform developed by the U.N. Office for the Coordination of Humanitarian Affairs, which allows populations, nongovernmental organizations, local governments and donors to share information almost in real time, making it easier to coordinate and allocate aid. It is currently being used in Nepal to support aid coordination after the earthquakes, and it was used in West Africa during the Ebola crisis.
Beyond the illusion of full understanding
The possibilities offered by the use of big data for development started to generate buzz a few years ago. We are now entering a period of healthy discussion, buzz-free or at least hype-reduced, on what data for development can do and how it can be leveraged.
Morten Jerven is right: Big data is not a miracle solution that will monitor the progress on the sustainable development goals in real time. Big data won’t replace “small data” and national statistics; big data won’t free the world from Ebola; big data won’t eradicate the seasonal flu.
Obviously, the list of what big data cannot do for development may be expanded as long as overly enthusiastic stakeholders overadvocate for it — most of the time, in order to get more visibility and funding for their organization in what is clearly a crucial year for the development community.
All-seeing, but also blind
Big data suffers from a number of well-documented issues.
Big data, especially data produced through social networks, suffers from what is called “selection bias.” For instance, Twitter, Facebook and smartphone users are not always representative of the average citizen. All too often, the poorest and those deprived of access to technology won’t be counted, despite being those most in need of sound development policies.
In a nutshell, big data is not — at least, not yet — big enough to cover all individuals. The question of sample representativeness thus remains highly relevant.
Nevertheless, reducing big data and data-driven development to social network mining is itself falling into the sample selection trap. Not unlike “small” data, big data also needs to be leveraged adequately. Different kinds of big data exist, for different uses. Mobile phone carriers do not need the latest smartphone to track you, nor do satellites need one to observe urbanization, night lights, methane or fertilizer emissions.
Techniques have been developed to take selection bias into account and — most of the time — these techniques rely on “small data.”
When used as a proxy to approach issues such as poverty, population density or price fluctuations, big data relies on benchmarks from more traditional sources, using decades of experience of data collection and statistical know-how.
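One common family of such corrections is post-stratification: each record in the biased sample is reweighted so that groups match their known shares in the population, and those shares come from “small data” such as a census. The sketch below is only an illustration of the idea, not a method attributed to any project mentioned here. The groups, values and census shares are invented for the example, and the function name is hypothetical.

```python
from collections import Counter


def poststratified_mean(records, population_shares):
    """Reweight a biased sample so each group counts as much as it does
    in the population.

    records: list of (group, value) pairs from the big-data sample.
    population_shares: group -> share of the population, taken from a
        traditional source such as a census (hypothetical numbers here).
    """
    counts = Counter(group for group, _ in records)
    if set(counts) != set(population_shares):
        raise ValueError("sample groups and benchmark groups must match")

    n = len(records)
    # Weight = population share / sample share for each group.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}

    total_weight = sum(weights[g] for g, _ in records)
    return sum(weights[g] * v for g, v in records) / total_weight


# Invented sample: urban users are heavily over-represented.
sample = [("urban", 100.0)] * 8 + [("rural", 40.0)] * 2

# Invented census benchmark: most people actually live in rural areas.
census_shares = {"urban": 0.4, "rural": 0.6}

naive = sum(v for _, v in sample) / len(sample)
adjusted = poststratified_mean(sample, census_shares)

print(f"naive mean:    {naive:.1f}")
print(f"adjusted mean: {adjusted:.1f}")
```

In this toy case the naive average is pulled toward the over-represented urban users, while the adjusted figure moves toward what the census benchmark implies. The correction is only as good as the benchmark, and it cannot help with groups that are entirely absent from the sample.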
Opposing big data and official statistics is thus misleading. We must approach data as an ecosystem with distinct and complementary uses, while also recognizing its different sets of actors. The newcomer in the field of “statistics for good” is obviously the private sector, and engaging with multiple, profit-oriented companies is a new challenge for the official statistics community.
Big data everywhere, but where do I find it?
Although big data is making headlines, one of the major obstacles to using it for development purposes is actually getting hold of the data. Not only does big data require specific and very technical skills, but big data also does not mean open data. Leaving aside real-time Web mining, big data is most often collected and stored by private companies. Although we, as consumers, are the data producers, we don’t collectively have access to it. These data are often at the core of companies’ business models, and sharing them would endanger their market position, their profitability and our privacy.
So although big data is everywhere in our data-driven society, access to it remains scarce. Nevertheless, innovative projects with mobile providers, including Orange and Telefonica, have recently paved the way for new types of partnerships with both researchers and governments.
Setting up a “data pool” and giving researchers and policymakers controlled access to data would be a strategic way to make inroads and accumulate proofs of concept, but that remains a huge challenge.
How can we incentivize private sector data sharing? Under which legal arrangements? How can we construct genuine global governance on data?
Mapping data use
A taxonomy of data use is a good starting point to understand which kinds of data, and thus which actors, can be mobilized in development projects. We need to keep in mind that we are still in an experimentation phase; we must be prudent about big data’s ability to inform and help solve the difficulties people face on the ground day to day.
Meanwhile, the use of mobile data to understand the spread of epidemics — and thus limit their expansion — is still controversial, with the debate reaching a climax during the Ebola crisis. Such examples show that big data is not only a technical issue; it affects society at large and often raises as many questions as it answers.
The unexpected gem to be found when mining big data for development is a renewed dialogue, with a new set of actors.
The mantra repeated by Emmanuel Letouzé, co-founder of the Data-Pop Alliance, is that big data is not chiefly about data; it is fundamentally about people. Indeed, the fact that most new data come from human activities allows, or should allow, people to make their voices heard as part of a public debate.
As economist Bill Easterly reminds us, while recommending Kentaro Toyama’s book “Geek Heresy”: Technology does not solve problems; people do.
This guest opinion is published in association with ID4D, an international blog for exchanges and constructive debates on development. Hosted and facilitated by the AFD, the French agency for development, ID4D is aimed at all development stakeholders.