Monday, January 18, 2016

The Morality of Reproducible Data

            As academic scientists, we are of course invested in our research, and (ideally) want to leave our own mark on our respective fields. The data that we generate and publish do not just contribute to personal edification, but also to the understanding of a topic on a global scale. With the wide accessibility of scientific journals, data are consistently reanalyzed, and published findings are applied to new experiments in a growing international research network. Keeping this in mind, it is more critical than ever before that the research community emphasize the importance of sound (and complete) data and experimental reproducibility.

            In his TED talk, Dan Ariely discussed how it was more likely for every person in a room to cheat a little than for just one person to cheat completely. Students would give themselves a “4” in lieu of the “2” they deserved, presumably so that they would get a slightly better reward while still maintaining a degree of self-respect. This is sadly applicable in the research world, in the form of “cherry-picking” data or withholding negative findings from papers. Findings may be smudged or spun in a certain light, or an incomplete story presented — not nearly enough to warrant a retraction, but just enough that the findings may not be entirely trustworthy. I agree with Jared Horvath’s Scientific American article in that funding provides a constant pressure for scientists to focus first on generating marketable data, and only second on generating complete or valid data. However, while funding is a legitimate concern, I do not think that it is an excuse to perform unsound research or twist results. Granted, this is easy for me to say as a graduate student with a guaranteed stipend, but labs that produce questionable data do more than fail to contribute to science; they actively detract from ongoing research. False data can mislead other researchers who may use these findings as a baseline for their own projects, which in turn could fail or lead to further misdirection. Withholding negative data could lead other labs to pursue the same dead ends, wasting precious grant money rediscovering what should already be public knowledge.

            After reading some of these articles, it seems that it should be easier than ever to make sure that research is well executed, given the formation of organizations such as the PLoS ONE Reproducibility Initiative and PubPeer. While these efforts should be taken with a grain of salt, they seem like a viable means for experts to help fact-check results and ensure that findings hold true. I do think there is a critical difference between difficult and irreproducible experiments, in that some procedures may have a low success rate due to the need for a high level of technical skill or a specialized setup. However, if even a group of experts in the same field cannot recapitulate a finding, something is likely at fault with the underlying experimental strategy or the published data.

1 comment:

  1. I agree with your assessment of the current state of scientific reproducibility (or lack thereof). Though there is a lot of negative press regarding flagrant exaggerations, or worse, blatant forgeries of data, I believe the field is self-correcting. Though a large amount of time, effort, and resources will be spent in the self-correction process, it still occurs. One of the most blatant instances of data forgery, the scandal at Duke University involving Anil Potti, is an example of science self-correcting. Though the harmful effects of his forged data can never be overstated, in this instance ethical scientists reviewed his data and discovered that it was fraudulent. As a researcher who uses fruit flies, I have the benefit of working in a very collaborative scientific environment. It is customary for labs to share their fly crosses and other reagents with others in the community, and it is because of this that the fruit fly field is so self-correcting. People will not hesitate to speak up if they cannot reproduce your results. I believe that if science continues to be this collaborative, then we can still separate the fraudulent from the real data.