We need to open up GP data

We’ve been working on a prototype that asks users to tell us the GP practice they are registered with. We’ve prototyped pages like this before.

Even though it wasn’t “real”, participants filled in the fields as if it were. We saw users enter:

  • the name of their practice, or something close to it
  • the name of the road their practice is on
  • the name of their GP, or the GP they normally see
  • their own home postcode

To learn anything more we need to watch users interacting with real data, to see how they react when the prototype isn’t hardcoded to find the right practice.

To build the prototype we needed access to basic data about all the GP practices in the country, and all the GPs practising at them. Getting hold of that data has been a trial.

Problems

First problem: access to data

  • It’s not all easy to find. There’s no obvious place to go to find data. Searching the internet is a reasonable start, but there were data sets we didn’t find out about until someone approached us after a show and tell to say they could point us to a better data set if we sent them an email.
  • Some public data is only available for money. The General Medical Council keep the register of general medical practitioners. Every practitioner is obliged by law to register with them before they can practise. To get full access to the list costs £720 per year – a significant barrier for small teams on a shoestring budget.
  • Restrictive licensing. The data available through the NHS Choices API is licensed under the NHS Choices Syndication Licence, which restricts what you can make with the data and how you can share what you make.

Second problem: data quality

  • Weird data formats. The NHS Choices data are described as being in CSV format – comma-separated values – but they use the not symbol (¬) as a separator and the encoding isn’t specified (we guessed ISO-8859-1 but it could be Windows-1252). Strictly speaking CSV is a pretty flexible format, but we’ve had a specification for it for over 15 years now.
  • ALL THE ORGANISATION DATA SERVICE DATA IS UPPERCASE. We want to show this data on screen in mixed case. Going from mixed case to uppercase is easy; the reverse is almost impossible with a complex data set.
  • The ODS data for General Medical Practitioners contains a lot of records that aren’t people. We need a list of practitioner names to search against, but almost half of the records have names like “GP IN ED PRESCRIBER”, “DR AT MIDDLE CHARE GROUP”, or “POOLED LIST”.
  • Duplicate data. NHS Choices publishes data for “GPs”, “GP Practices” and “GPs’ Staff”. The data for “GPs” and “GP Practices” are very similar and there’s no information on what they’re for.
  • Fields contain mixed data. The GP Practices data provided by NHS Choices has phone numbers and website URLs in the “County” field.
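
To illustrate the format and naming problems above, here is a minimal Python sketch rather than a definitive implementation. The sample rows, the column layout and the list of marker words are all invented for illustration, and the encoding is only the guess mentioned above.

```python
import csv
import io

# Invented sample rows using the not symbol as the separator.
# The column layout is an assumption, not the real NHS Choices layout.
SAMPLE = (
    "1¬Example Surgery¬1 High Street¬AB1 2CD\n"
    "2¬Another Practice¬2 Low Road¬EF3 4GH\n"
).encode("iso-8859-1")


def read_not_separated(raw, encoding="iso-8859-1"):
    """Decode the bytes with a guessed encoding, then split each row on the not symbol."""
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="¬"))


# Hypothetical marker words for records that are not people.
# A real list would have to be built by inspecting the data.
NON_PERSON_MARKERS = ("PRESCRIBER", "POOLED LIST", " GROUP")


def looks_like_person(name):
    """Crude filter: reject names containing any known non-person marker."""
    return not any(marker in name for marker in NON_PERSON_MARKERS)


rows = read_not_separated(SAMPLE)

# Naive title-casing shows why uppercase to mixed case is hard:
# str.title() capitalises after every non-letter and lowercases acronyms.
naive_apostrophe = "ST JOHN'S SURGERY".title()   # "St John'S Surgery"
naive_acronym = "NHS WALK-IN CENTRE".title()     # "Nhs Walk-In Centre"
```

A filter like `looks_like_person` only catches the patterns someone has already spotted, so it would miss new kinds of non-person record. That is part of why we would rather the data were clean at the source.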

A better way

More than three-quarters of the time we’ve spent working on this prototype has been spent fighting with data. It’s messy now, but we can build a brighter future. A future where teams can worry about their service and not about the data behind it. To do that we need to embrace registers.

Registers are sources of trustworthy data that service teams, analysts and policy experts can put to good use.

They are designed to be part of a wider ecosystem of interconnected parts. They specialise in storing and maintaining data, in a helpful, accessible, useable way.

We can use registers to break down the data into small chunks, cared for by the right organisation. For us, that could be three registers:

  1. A register of all practitioners allowed to practise, cared for by the General Medical Council.
  2. A register of all GP practices, cared for by the Organisation Data Service.
  3. A register of practitioners prescribing at practices, cared for by the NHS Prescription Service.
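
One way to picture how these three registers might fit together is sketched below in Python. It is an illustration under assumptions, not a proposed schema: the class names, field names and identifiers are invented, and the real custodians would define their own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Practitioner:
    """One entry in a hypothetical practitioner register."""
    practitioner_id: str
    name: str


@dataclass(frozen=True)
class Practice:
    """One entry in a hypothetical GP practice register."""
    practice_id: str
    name: str
    postcode: str


@dataclass(frozen=True)
class PrescribingLink:
    """One entry in a hypothetical register linking practitioners to practices."""
    practitioner_id: str
    practice_id: str


def practitioners_at(practice_id, practitioners, links):
    """Return the practitioners linked to a practice, in link order."""
    by_id = {p.practitioner_id: p for p in practitioners}
    return [
        by_id[link.practitioner_id]
        for link in links
        if link.practice_id == practice_id and link.practitioner_id in by_id
    ]
```

Each register holds one kind of thing and refers to the others only by identifier, so each custodian can maintain their own data without copying anyone else's.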

To properly serve the public interest, the registers should be free and open. They should be licensed under the Open Government Licence, which allows users to do anything with the data as long as they provide attribution.

Building those registers won’t be easy. We’ll need to work with the custodians to make sure they have the digital expertise they need both to manage their registers, and to understand their importance.

We must seek out opportunities to make data open as a side effect of delivering services. Open, well-documented, cared-for data enables smaller, less well-connected teams to do more and to do it faster. It will reduce the barriers to entry that favour large internal projects and large external suppliers. It’s our job to open up the information of the health service so that anyone, no matter their size or connections, can use it as their foundation.