Idle Thoughts on “Data Literacy” in the Library…

In part for a possible OU Library workshop, in part trying to mull over possible ideas for an upcoming ILI2015 workshop with Brian Kelly, I’ve been pondering what sorts of “data literacy” skills are in-scope for a typical academic library.

As a starting point, I wonder if this slicing is useful, based on the ideas of data management, discovery, reporting and sensemaking.


It identifies four different, though interconnected, sorts of activity, or concern:

  • Data curation questions – research focus – covering the management and dissemination of research data, as well as dissemination issues. This is mainly about policy, but begs the question about who to go to for the technical “data engineering” issues, and assumes that the researcher can do the data analysis/data science bits.
  • Data resourcing – teaching focus – finding and perhaps helping identify processes to preserve data for use in teaching context.
  • Data reporting – internal process focus – capturing, making sense of/analysing, and communicating data relating to library related resources or activities; to what extent should each librarian be able to use and invoke data as evidence relating to day job activities. Could include giving data to course teams about resource utilisation, research teams to demonstrate impact ito tracking downloads and use of OU published resources.
  • Date sensemaking – info skills focus – PROMPT in a data context, but also begging the question about who to go to for “data computing” applications or skills support (cf academic/scientific computing support, application training); also relates to ‘visual literacy’ in sense of interpreting data visualisations, methods for engaging in data storytelling and academic communication.

Poking in to each of those areas a little further, here’s what comes to mind at first thought…

Data Curation

The library is often the nexus of activity around archiving and publishing research papers as part of an open access archive (in the OU, this is via ORO: Open Research Online). Increasingly, funders (and publishers) require that researchers make data available too, often under an open data license. Into this box I’m thinking of those activities related to supporting the organisation, management, archiving, and publication of data related to research. It probably makes sense to frame this in the context of a formal lifecycle of a research project and either the various touchpoints that the lifecycle might have with the library, or those areas of the lifecycle where particular data issues arise. I’m sure such things exists, but what follows is an off-the-of-my-head informal take on it…!

Initial questions might relate to putting together (and costing) a research data management plan (planning/bidding, data quality policies, metadata plans etc). There might also be requests for advice about sharing data across research partners (which might extend privacy or data protection issues over and above any immediate local ones). In many cases, there may be concerns about linking to other datasets (for example, in terms of licensing or permissions, or relating to linked or derived data use; mapping is often a big concern here), or other, more mundane, operational issues (how do I share large datafiles that are too big to email?). Increasingly, there are likely to be publication/dissemination issues (how/where/in what format do I publish my data so it can be reused, how should I license it?) and legacy data management issues (how/where can I archive my data? what file formats should I use?). A researcher might also need support in thinking through consequences – or requirements – of managing data in a particular way. For example, particular dissemination or archiving requirements might inform the choice of data management solution from the start: if you use an Access database, or directory full of spreadsheets, during the project with one set of indexing, search or analysis requirements, you might find a certain amount of re-engineering work needs to be done in the dissemination phase if there is a requirement that the data is published at record level on a public webpage with different search or organisational requirements.

What is probably out of scope for the library in general terms, although it may be in scope for more specialised support units working out of the library, is providing support in actual technology decisions (as opposed to raising technology specification concerns…) or operations: choice of DBMS, for example, or database schema design. That said, who does provide this support, or whom should the library suggest might be able to provide such support services?

(Note that these practical, technical issues are totally in scope for the forthcoming OU course TM351 – Data management and analysis…;-)

Data resourcing

For the reference librarian, requests are likely to come in from teaching staff, students, or researchers about where to locate or access different sources of data for a particular task. For teaching staff, this might include identifying datasets that can be used in the context of a particular course, possibly over several years. This might require continuity of access via a persistent URL to different sorts of dataset: a fixed (historical) dataset, for example, or a current, “live” dataset, reporting the most recent figures month on month or year on year. Note that there may be some overlap with data management issues, for example, ensuring that data is both persistent and provided in a format that will remain appropriate for student use over several years.

Researchers too might have third party data discovery or access requests, particularly with respect to accessing commercial or privately licensed data. Again, there may be overlaps with data management concerns, such as how to managing secondary data/third party data appropriately so it doesn’t taint the future licensing or distribution of first party or derived data, for example.

Students, like researchers, might have very specific data access requests – either for particular datasets, or for specific facts – or require more general support, such as advice in citing or referencing sources of secondary data they have accessed or used.

Data reporting

In the data reporting bin, I’m thinking of various data reporting tasks the library might be asked to perform by teaching staff or researchers, as well data stuff that has to be done as internally within the library, by librarians, for themselves. That is, tasks within the library that require librarians to employ their own data handling skills.

So for example, a course team might want to know what library managed resources referenced from course material are being when and by how many students. Or learning analytics projects may request access to data to help build learner retention models.

A research team might be interested in number of research paper or data downloads from the local repository, or citation analyses, or other sources of bibliometric data, such as journal metrics or altmetrics, for assessing the impact of a particular project.

And within the library, there may be a need for working with and analysing data to support the daily operations of the library – staffing requirements on the helpdesk based on an analysis of how and when students call on it, perhaps – or to feed into future planning. Looking at journal productivity, for example, (how often journals are accessed, or cited, within the institution) when it comes to renewal (or subscription checking) time; or at a more technical level, building recommendation systems on top of library usage data. Monitoring the performance of particular areas of the library website through website analytic, or even linking out to other datasets and looking at the impact of library resource utilisation by individual students on their performance.

Date sensemaking

In this category, I’m lumping together a range of practical tools and skills to complement to the tools and skills that a library might nurture through information skills training activities. So for example, one are might be providing advice about how to visualise data as part of a communication or reporting activity, both in terms of general data literacy (use a bar chart, not a pie chart for this sort of data; switch the misleading colours off; sort the data to better communicate this rather than that, etc) as well as tool recommendations (try using this app to generate these sorts of charts, or this webservice to plot that sort of map). Another might be how to read, interpret, or critique a data visualisation (looking at crappy visualisations can help here!;-), or rate the quality of a dataset in much the same way you might rate the quality of an article.

At a more specialist level, there may be a need to service requests about what tools to use to work with a particular dataset, for example, a digital humanities researcher looking for advice on a text mining project?

I’m also not sure how far along the scale of search skills library support needs to go, or whether different levels of (specialist?) support need to be provided for undergrads, postgrads and researchers? Certainly, if your data is in a tabular format, even just as a Google spreadsheet, you become much more powerful as a user if you can frame complex data queries (pivot tables, any one?) or start customising SQL queries. Being able to merge datasets, filter them (by row, or by column), or facet them, cluster them or fuzzy join them are really powerful dataskills to have – and that can conveniently be developed within a single application such as OpenRefine!;-)

Note that there is likely to be some cross-over here also between the resource discovery role described above and helping folk develop their own data discovery and criticism skills.


So, is that a useful way of carving up the world of data, as the library might see it?

The four different perspectives on data related activities within the library described above cover not only data related support services offered by the library to other units, but also suggest a need for data related skills within the library to service its own operations.

What I guess I need to do is flesh out each of the topics with particular questions that exemplify the sort of question that might be asked in each context by different sorts of patron (researcher, educator, learner). If you have any suggestions/examples, please feel free to chip them in to the comments below…;-)

Posted in Benefits of open data, Data, Education, Posts from feeds, Smart communities Tagged with: , , , , , , , , ,