Monday, January 18, 2016

The statistics of statistics

The largest issue, in my opinion, with irreproducibility and bias in scientific research is not a lack of awareness, and it is not even the fact that these problems exist. The greatest problem is a lack of understanding of why they occur, and a lack of self-awareness that you personally are capable of bias. Awareness itself is not the issue: as is obvious from the homework assignment, there have been many articles "shedding light" on the problem, and the term "irreproducible data" is so common that it is more a running lab joke than a real day-to-day concern. Most people blame the "publish or perish" culture, which puts enormous pressure on publishing data as soon as possible; the vague (whether on purpose or not) methods sections; and the use of statistics. The pressure to publish will never go away. The lack of transparency in methods sections also arises mostly from that same pressure, but also because small details can make a large difference, and the researcher is often not even aware of them. After my admittedly short time in science (six years), I do believe this issue has started to be addressed, and that it may be more a personal than a systemic problem.

The argument for blaming the use of statistics is a complicated one, because statistics can be extremely powerful and necessary, particularly as researchers move toward analyzing large data sets and "-omics"-type studies. The main argument is that statistics are either used too freely and with only a basic understanding, or that complicated statistics are used to make data seem more significant than they truly are. However, as pointed out in Jeremy Berg's blog, the deeper issue is that scientists do not really understand how the statistics work. Importantly, they do not understand the bias that is inherent in the statistics themselves, and why a significant fraction of experiments cannot be repeated exactly.
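
To make that concrete, here is a minimal sketch (my own illustration, not something from Berg's post) of how chance alone generates "significant" findings: if you run many experiments where no real effect exists, roughly 5% will still clear p < 0.05. The experiment counts and sample sizes below are arbitrary assumptions.

```python
# Minimal sketch: many true-null experiments tested at alpha = 0.05
# still yield "significant" results purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000   # hypothetical number of independent studies
n_per_group = 20         # hypothetical sample size per group

false_positives = 0
for _ in range(n_experiments):
    # Both groups are drawn from the same distribution: no real effect.
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives / n_experiments:.1%} of null experiments "
      "were 'significant', close to the 5% set by alpha alone.")
```

None of those "hits" will reproduce, because there was never anything to reproduce; they are the baseline noise that any field testing many hypotheses must expect.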

If you fully understand, statistically, how easily data can fail to reproduce, it is easier to swallow the thought that irreproducible data is commonplace and has always been a part of scientific research. This is essentially the argument made by John Ioannidis, who stresses that if we accept that most published findings are false or irreproducible, we can then strive to focus on what is true. As scientists we are taught always to question data and ideas, and that does not end just because something is "statistically significant". If we can accept this and focus on the ideas being presented, and on how to use the small amount of "true" data to move science forward, then irreproducibility is not as huge an issue as we once thought, as long as we are honest with our data and eliminate personal biases. In other words, it is OK to accept that there is a small baseline level of irreproducibility, but we cannot add to it with our personal biases, or that small level will become very large.
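
Ioannidis's point can be captured in a few lines of arithmetic. Below is a back-of-the-envelope version; the alpha, power, and prior values are my own illustrative assumptions, not figures from his paper.

```python
# PPV = share of "significant" findings that reflect a true effect.
alpha = 0.05   # false-positive rate per test
power = 0.80   # probability of detecting a real effect
prior = 0.10   # assumed fraction of tested hypotheses that are true

true_hits  = power * prior          # real effects correctly detected
false_hits = alpha * (1 - prior)    # null effects flagged by chance
ppv = true_hits / (true_hits + false_hits)

print(f"PPV = {ppv:.0%}")  # ~64%: even with decent power, over a
                           # third of significant findings here are
                           # false; at prior = 0.01 the PPV drops
                           # below 15%.
```

The lesson is that "p < 0.05" tells you far less than it seems to: when most tested hypotheses are wrong to begin with, a large share of significant results will be false, with no misconduct required.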

This leads to the issue of awareness: being aware of our own ability to bring bias into our research. As Dan Ariely pointed out in his TED talk, "a lot of people cheat a little bit". We all have what he refers to as a fudge factor. We are willing to cheat just a little, but in general most people (and here I really mean most scientists) do not make up data; we simply allow our biases to creep into our experiments, in both design and analysis. Ariely attributes part of this to social norms. We as a scientific community need to educate researchers on how to avoid experimental biases, and we need to accept that our intuition is not always correct.
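
One concrete way bias creeps into analysis, without anyone fabricating a single data point, is "peeking": checking the p-value as data come in and stopping as soon as it dips below 0.05. The sketch below is my own illustration, not something from Ariely's talk, and the batch sizes and limits are arbitrary; it shows the false-positive rate climbing well above the nominal 5% even when no effect exists.

```python
# Simulate optional stopping: test after every batch of new samples
# and stop the moment p < 0.05, even though both groups are null.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, max_n, batch = 2_000, 100, 10

stopped_early = 0
for _ in range(n_sims):
    a, b = [], []
    for _ in range(max_n // batch):
        a.extend(rng.normal(0, 1, batch))
        b.extend(rng.normal(0, 1, batch))
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:          # peek, and stop the moment it "works"
            stopped_early += 1
            break

print(f"False-positive rate with peeking: {stopped_early / n_sims:.1%}")
# Well above 5%: the t-test is only valid when the sample size is
# fixed in advance, not chosen by watching the p-value.
```

The researcher in this simulation never lies; they just stop collecting data at a flattering moment. That is exactly the kind of fudge factor Ariely describes.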
