… and how many addresses in that postcode, for that matter 🙂 After discussing the volume of addresses we expect to find in one town, we need to do something similar for postcodes. This post builds on that earlier work and refers to the calculations done in R there.
The address-postcode association is a more valuable piece of information than what we worked on the other day, and one that is carefully protected. Having worked on open address data for many months now, I find it peculiar how much effort is spent concealing that information, which is likely the only thing that gives the commercial address products any value. Different open data publishers give us “something” but never all of it, and never enough to reverse engineer the whole picture.
As I plan to run a series of crowdsourcing exercises to build that information, too, I need to understand in more detail what I am talking about.
First of all, what do we know about postcodes from published and reliable open data sources only?
- we know their centroid
- we know the main “populated place” the postcode is associated with, if any (“populated place” is Ordnance Survey's term for a city, town, village, hamlet, suburban area or other settlement)
- we know which ones are active and when they were introduced, and which ones were terminated, with the month and year in which that happened
- … and more, possibly not relevant to this investigation
… which means that – when our target geography is some town like Berkhamsted – we can easily say which postcodes are in scope and which not:
We have 611 postcodes in Berkhamsted.
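The selection above can be sketched as a simple filter. This is a minimal illustration in Python rather than the R used for the actual calculations; the field names, the sample postcodes and the rule that terminated postcodes are out of scope are all assumptions made for the example, not the layout of any real open data file.

```python
from datetime import date

# Hypothetical, hand-made records: field names and postcodes are made up
# for illustration and do not come from any real dataset.
postcodes = [
    {"postcode": "HP4 1AA", "populated_place": "Berkhamsted", "terminated": None},
    {"postcode": "HP4 1AB", "populated_place": "Berkhamsted", "terminated": date(2010, 6, 1)},
    {"postcode": "HP4 1NS", "populated_place": "Little Gaddesden", "terminated": None},
    {"postcode": "HP4 2AA", "populated_place": "Berkhamsted", "terminated": None},
]


def postcodes_in_scope(records, place):
    """Return the active postcodes whose populated place matches `place`.

    Assumption: a postcode with a termination date is out of scope.
    """
    return sorted(
        r["postcode"]
        for r in records
        if r["populated_place"] == place and r["terminated"] is None
    )


in_scope = postcodes_in_scope(postcodes, "Berkhamsted")
print(len(in_scope), in_scope)
```

Run against the full list of postcodes, the length of the result is the count quoted above.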
We know that postcodes are designed to cover about the same number of addresses each. Of course this is often not true, as populated places change over time, new developments are added etc., but we can at least use it as a reasonable expectation.
We expect about 14 addresses per postcode.
Now, to be safe, we check whether the addresses we see in Land Registry’s Price Paid data (LRPP) are consistent with this estimate.
We have properties in LRPP that are listed against 593 distinct postcodes. This is at first sight good news, as it sounds reasonable that no residential properties changed ownership over the last 20 years in only 1 - 593 / 611 = 2.9% of postcodes…
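The 2.9% figure is plain arithmetic on the two counts quoted so far. A minimal Python check (the function name is made up for this sketch):

```python
def share_without_sales(postcodes_with_sales, postcodes_total):
    """Fraction of postcodes in which no sale was recorded."""
    return 1 - postcodes_with_sales / postcodes_total


# 593 postcodes seen in LRPP, 611 postcodes in scope
share = share_without_sales(593, 611)
print(f"{share:.1%}")  # prints 2.9%
```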
… but the ownership change could have taken place long ago, or at least long enough ago for some of the postcodes to be no longer active. We may be counting as different postcodes some that simply changed name. Let’s check that:
We have 27 Berkhamsted postcodes in LRPP that are not in OSON (Ordnance Survey's Open Names) at all, not even associated with a different populated place. That is not too bad, less than 5%. There is more bad news, though:
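Finding the postcodes that appear in one source but not the other is a set difference. The sketch below is in Python rather than the original R, uses made-up postcodes, and assumes the two sources may format postcodes differently (case and spacing), which is why they are normalised first.

```python
def normalise(postcode):
    """Uppercase a postcode and drop all whitespace, e.g. 'hp4 1ab' -> 'HP41AB'."""
    return "".join(postcode.split()).upper()


# Hypothetical samples, not real data
lrpp_postcodes = {"HP4 1AA", "hp41ab", "HP4 9ZZ"}
oson_postcodes = {"HP4 1AA", "HP4 1AB", "HP4 1NS"}

# Postcodes seen in LRPP that OSON does not know about at all
missing_from_oson = (
    {normalise(p) for p in lrpp_postcodes}
    - {normalise(p) for p in oson_postcodes}
)
print(sorted(missing_from_oson))
```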
105 postcodes for Berkhamsted addresses in LRPP are not considered Berkhamsted postcodes in OSON! That’s a lot. The most likely explanation is that many addresses recorded as Berkhamsted’s in LRPP are associated with some other populated place in OSON:
Eight different locations are listed against those “Berkhamsted postcodes”: Dagnall, Ringshall, Little Gaddesden, Norcott Hill, Heath End, Potten End, Hudnall and Hudnall Corner. This is not unreasonable, as these are villages and hamlets nearby.
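Listing where those mismatched postcodes “really” belong amounts to a lookup followed by a count. Again this is a Python sketch with made-up postcodes and a hypothetical lookup table, not the R code behind the numbers above.

```python
from collections import Counter

# Hypothetical samples: postcodes that LRPP records under Berkhamsted...
lrpp_berkhamsted = ["HP4 1AA", "HP4 1NS", "HP4 1PA", "HP4 1PB"]
# ...and the populated place that OSON associates with each of them
oson_place = {
    "HP4 1AA": "Berkhamsted",
    "HP4 1NS": "Little Gaddesden",
    "HP4 1PA": "Potten End",
    "HP4 1PB": "Potten End",
}


def other_places(lrpp_postcodes, place_lookup, town):
    """Count, per populated place, the postcodes the lookup assigns elsewhere than `town`.

    Postcodes missing from the lookup are skipped here.
    """
    return Counter(
        place_lookup[p]
        for p in lrpp_postcodes
        if p in place_lookup and place_lookup[p] != town
    )


counts = other_places(lrpp_berkhamsted, oson_place, "Berkhamsted")
print(counts.most_common())
```

The keys of the resulting counter are the “other” places, and the sum of its values is the number of mismatched postcodes.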
This is going to complicate our lives, and reminds us of an important characteristic of the problem we’re facing: not only are there many ways to write an address wrong, but there are many correct ways to write the same address, too.
The full code for this blog post and more is here.