Architecture of open data
A year ago we launched ArcGIS Open Data, a new capability of ArcGIS, with the explicit goal of making it easy and efficient for government to make authoritative data discoverable and accessible. It is included for any organization to use without additional cost. Our hope is that by removing the technical and acquisition barriers to open data initiatives, we can instead focus as a community on the experiences and capabilities that deliver meaningful impact within government, residential communities, and economic development.
While megacities are often highlighted for their open data initiatives, there are millions of government agencies across the world that also have the opportunity to benefit from sharing their information with the public. As former Philadelphia CDO Mark Headd wrote, “We’re not going to make progress in getting smaller governments to adopt open data if the cost of standing up a data portal has the same budget impact as the salary for a teacher, or a cop, or a firefighter, or a building inspector…”
To understand how to achieve these goals we thought it would be useful to share how we designed and built ArcGIS Open Data. Perhaps through discussing the internal architecture we can create more shared knowledge of implementation practices for successful open data initiatives.
Open Data Microservices
ArcGIS is a loosely coupled platform integrated through a shared information model and standardized service definitions. That’s a long way of saying microservices, or service-oriented architecture. This means that instead of a single, large application, ArcGIS Open Data is composed of many individual server applications, web applications, mobile applications, search engines, metadata catalogs, harvesters, and more.
This architecture pattern means that we can independently build, deploy, and manage small components of the overall application without affecting the other parts. This improves our ability to create new capabilities and introduce them without major disruptions. Individual engineers can work on a small component and build, test, and deploy it without requiring the entire team.
Architecture of Open Data Policy
Government open data initiatives should not be a technical, operational, or financial burden. Open data services are necessary for open government: they provide public access to information infrastructure and support the apps and tools that answer specific needs.
Our mantra is “keep the data where it is”, or as Chris Whong puts it, “Play it Where it Lies”: there is already an existing, functioning operational system for storing and updating the data. Key aspects of open data are that it is accurate, authoritative, and timely. Every time the data are exported, ETL’d, or ingested elsewhere, the copy reflects a single point in time and will fall out of date, and the process can introduce errors or lose precision and metadata. Instead, the data should be served as close to the operational source as possible.
Open Data is far more than financial transparency. Government GIS is the backbone of data-driven governments and smarter city initiatives. Discovery of and access to the data as services and information are as imperative to departments across government as they are to the residents and businesses that operate within the region. Therefore the architecture of open data closely matches the architecture of modern GIS infrastructure, with four major tiers of the application: Data Storage, Metadata Curation, Search, and Public Access.
Data Storage: managed locally or in the cloud
Servers are managed by the owners of the data, who create and maintain it as part of their operational mission to support the business applications that drive government. Police agencies maintain their servers for public safety, crime monitoring, and enforcement. Transportation departments manage roads, bus lines, transit schedules, and realtime vehicle tracking. Across a city or regional government, the data are already managed close to the source, meaning that the data are as accurate and recent as possible since it’s the same data that the government uses to function.
ArcGIS Online provides hosted services which allow for the cloud hosting of data for scalability and access through developer APIs. Hosted services scale to maintain performance as usage increases, meaning you don’t have to worry about how popular your open data become – we take care of the increased usage automatically.
While hosting data in the cloud with ArcGIS is optional, hosted data are not just parked for download. They power a wide swath of mobile and web applications for viewing, analysis, and editing through browsers and mobile devices, and the data can be taken offline for use without internet connectivity or enriched with other data. There is no charge for how much the data are accessed, only for how much is hosted.
Common API for the Web
Every single ArcGIS data and map service already has an HTTP, RESTful API. Broadly named geoservices, this API provides attribute, spatial, and temporal queries, as well as aggregation and analysis, with output as JSON. Web developers can therefore quickly use these services to build applications. Whenever you load a web map or an app, it is calling the same ArcGIS API that anyone can access.
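To make the idea concrete, here is a minimal Python sketch of how a client might assemble a geoservices query URL for a single layer. The layer URL, field name, and helper function are hypothetical and only for illustration; the query parameters (`where`, `outFields`, `geometry`, `f`, and so on) are the commonly used ones from the GeoServices REST query operation, and the example only builds the URL rather than calling a live service.

```python
from urllib.parse import urlencode


def build_query_url(layer_url, where="1=1", out_fields="*", bbox=None, max_records=None):
    """Build a GeoServices 'query' URL for one layer (illustrative helper, not an Esri API)."""
    params = {
        "where": where,           # SQL-like attribute filter
        "outFields": out_fields,  # comma-separated field list, or * for all fields
        "f": "json",              # ask for JSON output
    }
    if bbox is not None:
        # Spatial filter: keep features that intersect a lon/lat bounding box.
        xmin, ymin, xmax, ymax = bbox
        params["geometry"] = f"{xmin},{ymin},{xmax},{ymax}"
        params["geometryType"] = "esriGeometryEnvelope"
        params["inSR"] = "4326"
        params["spatialRel"] = "esriSpatialRelIntersects"
    if max_records is not None:
        params["resultRecordCount"] = str(max_records)
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


# Hypothetical layer URL; substitute a real service endpoint.
LAYER_URL = "https://example.com/arcgis/rest/services/Crime/FeatureServer/0"

attribute_url = build_query_url(LAYER_URL, where="offense = 'BURGLARY'", max_records=10)
spatial_url = build_query_url(LAYER_URL, bbox=(-75.3, 39.8, -75.0, 40.1))
print(attribute_url)
print(spatial_url)
```

Fetching either URL with any HTTP client would return the matching features as JSON, assuming the service exists and is publicly shared.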
Taking the “keep the data where it is” mantra to the extreme, we have also built open-source frameworks to integrate third-party APIs into ArcGIS Open Data. Koop is like a polyfill for web services, making them all match the GeoServices specification. So whether the data are from CKAN, GitHub, Socrata, or even government APIs like OpenFDA and the US Census American Community Survey, they can all have a common user experience and data export, and work with ArcGIS visualization and analysis.
ArcGIS Online is a cloud service that manages the catalog, or URL links to these services, as well as other relevant metadata, access permissions, and collaboration. Data providers register links to existing service URLs, or they can upload spatial, tabular, and other content such as documents (neighborhood watch reports), apps (311 reporting), or web maps (crime spotting dashboard). Items can carry lightweight metadata or standard ISO and FGDC metadata.
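The polyfill idea can be illustrated with a small adapter. The Python sketch below is not Koop's actual code (Koop is a separate open-source project with its own provider interface); it only shows the general pattern of reshaping flat records from some third-party API into a GeoServices-style feature response. The function name, the record keys, and the sample data are assumptions made for the example.

```python
def to_geoservices_response(records, lon_key="longitude", lat_key="latitude"):
    """Reshape flat records into a GeoServices-style point feature response.

    Illustrative adapter only; the key names are assumptions about the source API.
    """
    features = []
    for record in records:
        try:
            x = float(record[lon_key])
            y = float(record[lat_key])
        except (KeyError, TypeError, ValueError):
            continue  # skip records without usable coordinates
        # Everything that is not a coordinate becomes a feature attribute.
        attributes = {k: v for k, v in record.items() if k not in (lon_key, lat_key)}
        features.append({"attributes": attributes, "geometry": {"x": x, "y": y}})
    return {
        "geometryType": "esriGeometryPoint",
        "spatialReference": {"wkid": 4326},  # WGS84 longitude/latitude
        "features": features,
    }


# Made-up records standing in for a third-party API response.
sample_records = [
    {"id": 1, "name": "Library", "longitude": "-75.16", "latitude": "39.95"},
    {"id": 2, "name": "No location"},
]
response = to_geoservices_response(sample_records)
print(response)
```

Once every source is reshaped into the same response structure, the same search, export, and visualization code can work against all of them.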
Open Data can be organized into thematic groups (e.g. transit, safety, education) or by agency (Police Department, Dept of Public Works) or whatever organization makes sense for an agency. Groups can also be used to implement an approval workflow – designating some groups as unverified and others as production.
All of this organizing is done with metadata and requires no moving or updating of the data. This minimizes work by data providers and keeps the data up to date and reduces data errors.
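As a rough illustration of why this is cheap, the Python sketch below models a catalog item as a title, a service URL, and a set of group names. The class, the group names, and the `promote` helper are hypothetical and are not the ArcGIS Online data model; the point is only that approving a dataset changes group membership while the service URL, and the data behind it, stay untouched.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    """Hypothetical catalog entry: metadata plus a link to where the data lives."""
    title: str
    service_url: str
    groups: set = field(default_factory=set)


def promote(item, from_group="unverified", to_group="production"):
    """Move an item from a review group to a published group (metadata only)."""
    if from_group not in item.groups:
        raise ValueError(f"{item.title!r} is not in group {from_group!r}")
    item.groups.discard(from_group)
    item.groups.add(to_group)


def items_in_group(items, group):
    """Return the items that belong to the given group."""
    return [item for item in items if group in item.groups]


bus_stops = CatalogItem(
    title="Bus stops",
    service_url="https://example.com/arcgis/rest/services/Transit/FeatureServer/0",
    groups={"transit", "unverified"},
)
url_before = bus_stops.service_url
promote(bus_stops)
published = items_in_group([bus_stops], "production")
print([item.title for item in published])
```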
Open Data standards for the web
While ArcGIS Online provides search for recalling user content, ArcGIS Open Data has different user experience needs. Specifically, people searching for open datasets are performing discovery: they may know what they’re looking for, but not what data may be available. Therefore ArcGIS Open Data extends the ArcGIS Online content curation to add more formats that fulfill open data objectives. A set of metadata harvesters combines metadata from numerous sources including ArcGIS Online, Server, and Layers, as well as structured metadata and data statistics.
A combined search index builds categorical facets and a recommendation engine that helps people find the most relevant and high-quality data. Our harvesters calculate implicit quality scores on data based on metadata completeness, data coherency, and emergent utility by the public. While metadata is widely recognized as a valuable and often required element of data, it is often neglected. By providing a metric, we help data providers improve metadata quality, which in turn improves search as well as visualization and analysis.
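The scoring itself is not spelled out here, so the following Python sketch is only one plausible way to compute the metadata completeness part: the fraction of expected fields that are actually filled in. The field list and function names are assumptions for illustration, and data coherency and public usage are left out.

```python
# Hypothetical set of fields a harvester might expect on every dataset.
EXPECTED_FIELDS = ("title", "description", "tags", "license", "updated")


def is_filled(value):
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness_score(metadata, fields=EXPECTED_FIELDS):
    """Return the share of expected fields that are filled, between 0.0 and 1.0."""
    filled = sum(1 for name in fields if is_filled(metadata.get(name)))
    return filled / len(fields)


sparse = {"title": "Parcels", "description": "   ", "tags": []}
complete = {
    "title": "Parcels",
    "description": "Tax parcel boundaries maintained by the assessor.",
    "tags": ["property", "cadastral"],
    "license": "Public domain",
    "updated": "2015-06-01",
}
sparse_score = completeness_score(sparse)
complete_score = completeness_score(complete)
print(sparse_score, complete_score)
```

A score like this can be shown to data providers as a to-do list and used by search to rank better-described datasets higher.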
ArcGIS Open Data adds support for the many existing and emerging open data standards. Using the DCAT data.json specification, organizations’ data can be automatically included in national open data catalogs such as Data.gov, Data.gov.uk, Data.gov.au, and so on. OpenSearch adds integration with web browsers and developer SDKs, and RDFa adds linked data markup for the semantic web and search engines.
Above all of this, ArcGIS Open Data is a web application that uses the cloud catalog with links to on-premises or cloud-hosted data and gives you a self-styling interface to launch one or more open data sites. Within minutes, any organization can create and launch a new open data site. This can be the one site for the entire municipality, or you can create new sites for specific events, departments, or initiatives, as many as you want! There is no charge for the number of sites you create, and you maintain direct control over the styling and updates to your sites.
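For readers who have not seen a DCAT data.json catalog, the Python sketch below builds a minimal one from a list of catalog items. The field names follow our reading of the Project Open Data Metadata Schema v1.1 and should be checked against the specification before use; the input keys, sample values, and helper names are hypothetical.

```python
import json

SCHEMA_URL = "https://project-open-data.cio.gov/v1.1/schema"


def dataset_entry(item):
    """Map one catalog item (hypothetical input keys) to a data.json dataset entry."""
    return {
        "@type": "dcat:Dataset",
        "title": item["title"],
        "description": item["description"],
        "keyword": item["tags"],
        "modified": item["modified"],
        "identifier": item["url"],
        "accessLevel": "public",
        "publisher": {"@type": "org:Organization", "name": item["publisher"]},
        "contactPoint": {
            "@type": "vcard:Contact",
            "fn": item["contact_name"],
            "hasEmail": "mailto:" + item["contact_email"],
        },
        # The distribution points at the live service, not at an exported copy.
        "distribution": [
            {"@type": "dcat:Distribution", "accessURL": item["url"], "format": "Esri REST"}
        ],
    }


def build_catalog(items):
    """Wrap dataset entries in a catalog document."""
    return {
        "@type": "dcat:Catalog",
        "conformsTo": SCHEMA_URL,
        "dataset": [dataset_entry(item) for item in items],
    }


sample_items = [
    {
        "title": "Street centerlines",
        "description": "Road centerlines maintained by Public Works.",
        "tags": ["roads", "transportation"],
        "modified": "2015-06-01",
        "url": "https://example.com/arcgis/rest/services/Streets/FeatureServer/0",
        "publisher": "Example City",
        "contact_name": "GIS Team",
        "contact_email": "gis@example.com",
    }
]
catalog = build_catalog(sample_items)
catalog_json = json.dumps(catalog, indent=2)
print(catalog_json)
```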
Consider this as a way to easily manage and distribute open data capabilities across your government without maintaining complicated and divergent content management systems or catalogs.
Additionally, because of the federated catalog architecture, organizations can include open data from other government agencies. For example a state can include open data from their cities and counties – or a city could include federal data such as weather from NOAA and disaster response information from FEMA.
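A federated catalog can be thought of as a merge of several lists of links. The Python sketch below is a simplified illustration, not the ArcGIS Open Data implementation: each source contributes items with a `url` key (an assumption for the example), items are de-duplicated by service URL, and each item records which source it was first seen in.

```python
def federate(catalogs):
    """Merge {source_name: [item, ...]} into one list, de-duplicated by service URL.

    The first source to list a URL wins; the data itself is never copied.
    """
    merged = {}
    for source, items in catalogs.items():
        for item in items:
            key = item["url"].rstrip("/")
            if key not in merged:
                merged[key] = {**item, "source": source}
    return list(merged.values())


# Hypothetical catalogs: a city, and a state that also lists one of the city's layers.
catalogs = {
    "city": [
        {"title": "Bus stops", "url": "https://city.example.com/rest/Transit/0"},
        {"title": "Parks", "url": "https://city.example.com/rest/Parks/0"},
    ],
    "state": [
        {"title": "City bus stops", "url": "https://city.example.com/rest/Transit/0/"},
        {"title": "State highways", "url": "https://state.example.com/rest/Highways/0"},
    ],
}
combined = federate(catalogs)
for entry in combined:
    print(entry["source"], entry["title"])
```

Because the merged catalog only holds links and metadata, each agency keeps publishing from its own servers and the combined site reflects their updates.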
While ArcGIS Open Data includes a user interface translated into at least 25 languages, there are also some open-source web interfaces that we are experimenting with and would appreciate feedback or testing. Check out our Backbone.js and Ember.js projects.
All of this is powered by the Open Data API and uses the same ArcGIS Online items that you’ve already been maintaining.
We have a lot more planned for the public to better understand your data and leverage the advanced capabilities of ArcGIS to make better local decisions. And because of our web architecture we can explore these ideas together without requiring any changes to how you publish your data.
ArcGIS Open Data is included with the ArcGIS platform, and you can start contributing data to the Open Data Community right away. That means anyone who has ArcGIS can use ArcGIS Open Data right now at no additional cost, without any difficult procurement processes or budget-busting middleware to provide public access to data.
Let us know if you have any questions!