Monday, January 22, 2018

Sharing is caring? Open source data as a solution to the reproducibility crisis

As many others on this blog have already discussed, the reproducibility crisis is a serious concern shared by many scientists. While some blame the current culture of scientific achievement and others blame a widespread misapplication of statistics, the exact reasons behind this crisis are difficult to determine. In all of these discussions on scientific reproducibility, the question still remains: how do we fix it?

One proposed solution is sharing data. As Jeff Leek discusses in his article, there is much debate and fear about the idea of sharing data for increased transparency and discovery. For many years, scientific findings have been shared in journals, where researchers present the results and interpretations of their studies via descriptions and figures. This method has been the fundamental means of scientific progress, allowing scientists to build discoveries on the foundation laid by others. However, it is now debated whether papers are enough. In an increasingly connected world, should scientists also share their raw data, allowing others to truly dive into the analyses performed as well as search for new findings of their own? While open-source data could open up a new world of discovery, there are also potential risks: data sharers could lose an advantage in their field if others publish findings from their data first and without credit, and data analysts could misinterpret or improperly use a dataset without proper training. There are both pitfalls and advantages to data sharing, but as the science community begins to acknowledge and address the reproducibility crisis, open-source data is a very viable solution.


What does open-source data look like in practice? In the field of neuroscience, there are several organizations and research groups pioneering data sharing. One such group, Neurodata Without Borders, attempts to address the logistical problems of sharing data. One obstacle to open-source data is that different research groups use very specialized techniques and store data in distinct ways that can be difficult for an outside analyst to understand. The Neurodata Without Borders pilot project attempts “to develop a unified, extensible, open-source data format for cellular-based neurophysiology data.” With a unified format, this organization aims to make data sharing accessible and practical for scientists across the globe. In another pioneering effort to facilitate data sharing, a group of neuroscience laboratories across the world recently came together to form the “International Brain Lab.” This lab is a large collaboration and an exercise in reproducibility, in which laboratories in various locations will use the same tasks and protocols to develop a standard model of neural processing. The International Brain Lab’s “standard protocol attempts to address all possible sources of variability…. from the mice’s diets to the timing and quantity of light they are exposed to each day and the type of bedding they sleep on. Every experiment will be replicated in at least one separate lab, using identical protocols, before its results and data are made public.” With solutions such as these, perhaps the trend of irreproducibility in science will be replaced with a more positive trend of collaboration and unity in scientific discovery.
