Habits and open data: Helping students develop a theory of scientific mind

(This article was originally published at BayesFactor: Software for Bayesian inference, and syndicated at StatsBlogs.)

This post is related to my open science talk with Candice Morey at Psychonomics 2015 in Chicago; also read Candice’s new post on the pragmatics: “A visit from the Ghost of Research Past”. In this post, we suggest three ideas that can be implemented in a lab setting to improve scientific practices, and encourage habits that make openness easier. These ideas are designed to be minimally effortful for the adviser, but to have a big impact on practice:

* Data partners: young scientists have a partner in another lab, with whom they swap data. The goal is to see if their data documentation is good enough that their partner can reproduce their main analysis with minimal interaction.
* Five-year plan: When a project is part-way through, students give a brief report detailing what they have done to ensure that the data and analyses will be comprehensible to members of the lab in five years' time, after they have left.
* Submission check: Before an article based on the project is first submitted, advisers should discuss with their advisees the pros and cons of opening their data and, if the data will be open, how they will be promoted online.

Betrayed by our habits

Science, like a lot of other things, is based largely on habit. We learn habits early in our careers, and most of them serve us well. Habits like checking our data for problems, such as bad coding or outliers, can keep us from being fooled. Other habits, like doing a final, full read-through of a paper before submission, save us work in the long run.

Other habits, however, can keep us from doing better science. Scientists value openness, at least in the abstract. Many scientists have had the frustrating experience of *closed* science: for instance, colleagues who do not share their data. Yet most science is not open, even though many tools to facilitate open science are freely available.

To us, the reasons seem obvious. Open science does not bring great immediate reward, and open practices are not part of most scientists' habits. This is natural; many scientists were trained before openness was easy and expected. Our habits were formed without the expectation, for instance, that our data would be open to everyone. Analyses are messy, badly documented, and full of ad hoc solutions to problems that we meant to fix later. If you weren't expecting your data to be open, then making them open requires work.

When we are faced with opening our data at paper submission or publication, then, our habits betray us. Our values may say "we should be open", but our real choice is *not* between open science and closed science; it is between "hours of work now with an uncertain payoff" and "no work now, and maybe no one will ask for it." The result is not a free choice about open science. Our habits have encumbered the choice with irrelevant issues, such as "I don't feel like doing this work right now; I'll do something more fun," and everything else is more fun.

If we had habits that were more attuned to the expectation of scientific openness, we might be able to do better. Forming such habits later in a career takes work, but forming them early in one’s career is much easier. We suggest here a number of things that senior researchers who run labs can do to help their advisees build better habits. None of these things require much work, but we believe that they can help ensure the next generation of scientists has better habits than the current one.

Helping young scientists form better habits: three ideas

The ideas presented here are designed with several features in mind. They all:
* Require minimal effort on the part of an advisor.
* Require little *marginal* effort from a young scientist. They may even save effort, since they encourage good practices and help avoid mistakes.
* Encourage development of a "theory of scientific mind": How do other scientists think about data and materials? What would they expect of a data set? Will others understand what I've done?
* Help young scientists *truly* have a choice about whether to be open. By the time the choice must be made, no extra work is necessary, so the decision can be driven by the arguments for and against open science instead of momentary pragmatic concerns.

These ideas are roughly ordered by where they would appear in an advisee's training. We should emphasize that none of them requires an adviser to promote it. Young scientists can do these things without their adviser's support to help build good habits.

Data partners

In the “data partner” scheme, young scientists in one lab partner with young scientists in another lab working on related topics. The goal of the data partner scheme is to help build an understanding of what information is necessary when sharing data, and to help catch analysis errors early on.

When collecting and analyzing their data, students should plan to share the data with their data partner, along with a short report containing an initial methods section and a description of the primary analysis (but without the numerical results). The data partner is expected to reproduce the primary analysis *without* interacting with the student. For that to be possible, the data must be well documented and the analysis described in sufficient detail. Details such as how the data are to be cleaned will be critical.

Once the data partner has attempted to reproduce the primary analysis, the two can discuss what was lacking. What could have been clearer? If the results could not be reproduced, why not? This builds the students' understanding of data analysis, develops their theory of scientific mind, and catches many mistakes early in a project. As a side benefit, the student has now created substantial documentation of the data set: precisely the information needed to release the data to others.

The five-year plan

One issue that often comes up when training students is turnover. A student often has "ownership" of a project, while the adviser is less involved, guiding the student without complete knowledge of the entire project. This can be problematic. When the student leaves, what if the adviser wants to send the data to someone? What if another student wants to re-analyze them to check a hunch? What if the lab wants to perform a meta-analysis?

A lab runs on data. Old materials (including stimuli), data, and analyses should be archived clearly enough that someone in the lab, years later, can use the materials or data or reproduce the analysis. This is part of being a good lab citizen.

When a project is mature, advisers should give the student time in a lab meeting to answer the question "What have you done to ensure that this project, including the materials, data, and statistical analysis, will be usable in five years?" This encourages students to think about the long-term usefulness of their data to others. Over the years, a formal meeting may become unnecessary as lab standards become more geared toward openness.

The submission check

As the name implies, the "submission check" is meant to occur before a project is submitted for publication. If all has gone well, the project should be well documented and ready to release. The work has been done; all that remains to be decided is whether the project will be open. If the answer is not predetermined by a journal or granting agency that requires open data and materials, the adviser should have a conversation with the young scientist: Should we open these data and materials? What are the arguments for opening them? Are there arguments against it?

If the decision is to open the data, the next question should be, "How will you promote the data and materials from this project?" For a young scientist thinking about the next stage of their career, promotion is critical. One advantage of open data is that it yields another product of the research that can be promoted; open materials and open code provide others. The student should be encouraged to think about how these can be leveraged to their advantage, and to follow through on their promotion ideas.

Promoting good habits and open science

For many scientists, open science is a difficult choice because it is encumbered by a number of unnecessary pragmatic concerns flowing from habits formed over many years. Openness is not truly a free choice, driven by the merits of open science. This need not be the case for the next generation of researchers. Senior researchers have an important role to play in helping their advisees form good habits and develop a theory of scientific mind. The data partner scheme, the five-year plan, and the submission check can help establish good lab practices, with the benefit that students will be prepared for a more open science.


The post Habits and open data: Helping students develop a theory of scientific mind appeared first on All About Statistics.