Starting the public debate on data science ethics
How should machines make decisions about the different types of support people need to get back to work? Should government look at social media to understand what the public need from the justice system so they can design better mediation services? And how can public services predict demand by looking at searches for keywords on Google or GOV.UK?
These are all questions that 30 people spent a day considering in Sheffield on Saturday. They were taking part in a public dialogue about government use of data science, which the Government Data Science Partnership is running with Sciencewise and Ipsos MORI. Data science can be defined as the use of powerful computing techniques to make sense of huge amounts of new forms of data, in order to improve statistics, policy making and public services.
It offers huge opportunities, but also raises ethical challenges that we have not had to consider before. Because of this, we are creating an ethical framework that will give policy makers, data scientists and operational staff the confidence to innovate with data. Understanding what the public think about how government uses data science is a crucial part of developing this framework.
So what exactly is data science?
Data science is an incredibly complex topic. I’ve been working in this area for two years and still find it hard to explain in simple terms. It differs from more traditional statistics in three main ways:
- rather than creating statistics to answer a question, it finds alternative existing data from which we can infer an answer (eg searches for ‘flu symptoms’ in Google might indicate the state of the nation’s health)
- rather than a human recording the data, newer forms of data can only be collected by a computer (eg sensors on car park spaces to indicate when they are full)
- rather than a human looking for patterns in data, machines can learn the best way to solve a problem by going through millions of rows of data (eg Facebook’s facial recognition system learns how to recognise people’s faces from the way we tag our friends in photos)
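To make the first of these points a little more concrete, here is a minimal Python sketch of how one might check whether an existing data source tracks the thing we actually care about. The weekly figures and the variable names are invented purely for illustration and are not real search or health data. A strong correlation would only suggest that searches are a usable proxy; it would not prove it.

```python
from math import sqrt

# Hypothetical weekly figures, made up for this example only.
weekly_searches = [120, 150, 310, 480, 450, 260, 140]   # searches for 'flu symptoms'
weekly_gp_visits = [30, 36, 70, 110, 105, 62, 33]        # recorded flu-related GP visits


def pearson(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(weekly_searches, weekly_gp_visits)
print(f"Correlation between searches and GP visits: {r:.2f}")
```

A value close to 1 would mean the two series rise and fall together. Even then, questions about how accurate and representative the data is would remain, since not everyone searches online when they feel ill.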
The difficulty of explaining this term became clear at the end of our pilot session. We realised that we needed to spend at least half a day helping people understand what data science is, the data and techniques involved, the opportunities it brings and the issues we need to consider.
Data, data everywhere
Part of the introductory half day explored what data people generate about themselves. It was interesting to see how much data people generate, often without being aware that they are doing so or what that data is being used for. Examples included census data, surveys, applications to public services, sign-ups for commercial services, Google searches, smartphones, wearables and the internet of things.
The afternoon session was spent looking at case studies of potential government data science projects. What really struck me was that, despite the complexity of the subject, people were able to engage in a serious and pragmatic debate about the right thing to do.
They weighed the public benefit that each project could achieve (eg producing better population statistics to improve local service planning, understanding user needs to create a better justice system, prioritising food safety inspections to improve public health) against the ethical issues it raised (eg accuracy and representativeness of the data, privacy, consent). These are all questions that our first iteration of the ethics framework starts to address, and these workshops will really help to shape it.
Public dialogues are well suited to complex issues. The participants from each workshop will meet again at a second workshop in a few weeks. This will give people time to properly get to grips with data science so they can make informed comments.
Over the next month, we’ll be going to Taunton, London and Wolverhampton to explore these issues with nearly 100 people. This is my take as a policy maker thinking carefully about ethics. Watch this space and you’ll also hear from a policy maker in a department, an academic thinking about the theory of engaging the public in data debates and the moderators running the workshops.