Making the most of our data – combining and re-using multiple sources

This post was created automatically via an RSS feed and was originally published at http://news.opendatacommunities.org/making-the-most-of-our-data-combining-and-re-using-multiple-sources/

This is the first of a series of articles outlining how you can get more out of OpenDataCommunities’ data sources.

In this article, I’ll focus on combining related datasets to create your own custom outputs. I’ll use the Help-to-Buy data to show you how to bring together DCLG’s statistics on completed Equity Loan sales, with HM Treasury’s on sales completed under the Mortgage Guarantee Scheme. Both sets are available at Local Authority and Postcode District levels.

The output will be this Google Spread Sheet, which you are free to download and re-use as you wish.

With help from Ric Roberts at Swirrl (thanks Ric!), I’ve developed a couple of queries (see below) to join together and retrieve data from the Equity Loan and Mortgage Guarantee datasets.

These queries use the SPARQL language. I won’t use this article to explain what SPARQL is and how it works: this will be covered in separate blogs from the OpenDataCommunities team. However, it is worth mentioning that although SPARQL is not for the faint hearted, it is similar to querying data using the more familiar SQL language and syntax. I’d encourage you to follow the links and experiment with these queries, and I would welcome feedback on how you get on.

If you’re not comfortable working with SPARQL, why not try building your own datasets using our GeoSelector and Spreadsheet builder productivity tools

Importing OpenDataCommunities’ data into Google spreadsheets – how it works

Once I had developed my SPARQL queries, the next step is to embed them in a Google Docs spreadsheet. To do this, I used Google Spreadsheet’s IMPORTDATA() function, described here.

To get it working, we need the IMPORTDATA function to reference the URL to the SPARQL query which generates the CSV – i.e.

IMPORTDATA(“<URL to Sparql query results in CSV format>”)

You can see an example of this in Cell A6 of the “Postcode Districts” tab in the Google sheet.

To get the relevant URL, follow these steps:
1. When editing a SPARQL query – like this one, switch on “Expert mode”
2. Scroll down the page, to the “Alternative formats” section.
3. Copy the URL labeled “CSV”.
4. Paste this URL into your IMPORTDATA() formula, remembering to surround it with double-quotes.

SPARQL queries – you can copy and paste these to http://opendatacommunities.org/sparql

Combining Equity Loan and Mortgage Guarantee data for Postcode Districts

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pc_dis_lbl ?equity_loans ?mortgage_guarantees  
  # comment these two out if you dont want the obs URIs returned
  ?equity_loan_obs ?mtg_obs 

WHERE {

  # help to buy equity loans by district 
  # http://opendatacommunities.org/graph/housing-market/help-to-buy/num-loans/loan-type/equity-loans-postcode-dis
   OPTIONAL {
      GRAPH <http://opendatacommunities.org/graph/housing-market/help-to-buy/num-loans/loan-type/equity-loans-postcode-dis> {
      ?equity_loan_obs <http://opendatacommunities.org/def/ontology/housing-market/help-to-buy/numLoansObs> ?equity_loans .
      ?equity_loan_obs <http://opendatacommunities.org/def/ontology/geography/refArea> ?pc_dis .
      ?equity_loan_obs        <http://opendatacommunities.org/def/ontology/time/refPeriod> ?equity_loan_period .
    }
  }

# mortgage guarantees by district
# http://opendatacommunities.org/graph/housing-market/help-to-buy/num-loans/mortagegaurantees/completed-loans-postcode-dis    
  OPTIONAL {
    GRAPH <http://opendatacommunities.org/graph/housing-market/help-to-buy/num-loans/mortagegaurantees/completed-loans-postcode-dis> {
      ?mtg_obs <http://opendatacommunities.org/def/ontology/housing-market/help-to-buy/numLoansObs> ?mortgage_guarantees .
      ?mtg_obs <http://opendatacommunities.org/def/ontology/geography/refArea> ?pc_dis .
      ?mtg_obs <http://opendatacommunities.org/def/ontology/time/refPeriod> ?mtg_period .
   }
 }

# adds labels for regions, where we've got them.
  OPTIONAL {?pc_dis rdfs:label ?pc_dis_lbl .}

 #change this to get the periods you want. (note that the postcode district data are available for different periods to districts)
  VALUES ?equity_loan_period { <http://reference.data.gov.uk/id/gregorian-interval/2013-04-01T00:00:00/P16M> }
 VALUES ?mtg_period { <http://reference.data.gov.uk/id/gregorian-interval/2013-10-08T00:00:00/P8M23D> }      
}

ORDER BY ?pc_dis_lbl  

Combining Equity Loan and Mortgage Guarantee data for Local Authorities

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?district_lbl ?equity_loans ?mortgage_guarantees 
# comment these two out if you dont want the obs URIs returned
 ?equity_loan_obs ?mtg_obs 

  WHERE {

    # help to buy equity loans by district 
    # http://opendatacommunities.org/data/housing-market/help-to-buy/num-loans/loan-type/equity-loans-ons
     GRAPH <http://opendatacommunities.org/graph/housing-market/help-to-buy/num-loans/loan-type/equity-loans-ons> {
      ?equity_loan_obs <http://opendatacommunities.org/def/ontology/housing-market/help-to-buy/numLoansObs> ?equity_loans .
      ?equity_loan_obs <http://opendatacommunities.org/def/ontology/geography/refArea> ?district .
     ?equity_loan_obs <http://opendatacommunities.org/def/ontology/time/refPeriod> ?equity_loan_period .
    }

     # mortgage guarantees by district
     # http://opendatacommunities.org/data/housing-market/help-to-buy/num-loans/mortagegaurantees/completed-loans-ons      
     GRAPH <http://opendatacommunities.org/graph/housing-market/help-to-buy/num-loans/mortagegaurantees/completed-loans-ons> {
      ?mtg_obs <http://opendatacommunities.org/def/ontology/housing-market/help-to-buy/numLoansObs> ?mortgage_guarantees .
?mtg_obs <http://opendatacommunities.org/def/ontology/geography/refArea> ?district .
      ?mtg_obs <http://opendatacommunities.org/def/ontology/time/refPeriod> ?mtg_period .
  }

 # adds labels for Districts where available.
  OPTIONAL {?district rdfs:label ?district_lbl }

 #change this to get the periods you want.
  VALUES ?equity_loan_period {             <http://reference.data.gov.uk/id/gregorian-interval/2013-04-01T00:00:00/P15M> }
  VALUES ?mtg_period { <http://reference.data.gov.uk/id/gregorian-interval/2013-10-08T00:00:00/P8M23D> }      
 }
 ORDER BY ?district_lbl
Posted in Posts from feeds Tagged with: , , , ,