Don’t Forget Metadata!
I have many fond memories as a kid of our local libraries. I was fortunate to grow up a short walk away from
what was at the time the largest non-national library in Europe
as well as a closer local library in a grand
red-brick Victorian building.
One of those memories is the card indexes. I loved the card index (yes,
I was a metadata geek from an early age). Armed only with a question and some idea about keywords to
use you could riffle through wooden drawers full of pink, blue and beige cards and then wander off
to the stacks to find the answers. Magic stuff.
How does this translate when we are talking about data and datasets? Well, there are a number of existing
RDF ontologies that touch on the area of metadata, but probably the
most well-known and widely used one for describing dataset metadata is VoID.
VoID provides you with the means to identify and describe datasets as well as to provide metadata
ranging from contact and licensing information to example dataset resource URIs and patterns for accessing
the content of the dataset.
The question is where to place this metadata? One obvious answer is to add the metadata in to
the RDF dataset that it describes. This is great for the user that has chosen to access that data;
but in this loose book/dataset analogy, metadata that is contained inside your dataset is equivalent to
that US Library of Congress metadata that you find inside the front cover of a book.
Dataset metadata should be the card index to all the open data you publish.
Without it being in a single central location, your users must aimlessly wander the shelves, pulling down dataset after
dataset in search of what they want.
A better solution is to treat the metadata about datasets as a separate dataset in its own right.
Think of this as the downloadable / queryable card-index to your data library. Indeed the VoID recommendation
goes as far as to suggest that you use a “well-known” URL for the location of that dataset – sort of
the equivalent of having your virtual card-index always just on the left as you come through the door (that was
the sound of an analogy getting pushed to breaking point).
Metadata management doesn’t have to be hard. Our Data Platform product has VoID
metadata management built-in so when you update the dataset the metadata stays in sync. But regardless of what
method you use to publish open data, it is important to recognize that your metadata is one of the key components
to making your data discoverable.
Image by http://www.flickr.com/photos/mamsy/, CC BY 2.0, via Wikimedia Commons