Monday, January 16, 2017

Bias is human nature

Whenever I think about the issue of bias and irreproducibility in science, there are two quotes that come to mind. 
“73.6% of all statistics are made up.” – Mark Suster 
The second quote was popularized by Mark Twain, who attributed it to Benjamin Disraeli: 
“There are three kinds of lies: lies, damned lies, and statistics.” 
How do these quotes relate to the issues of reproducibility and bias in science? The irony of the first quote is that it is itself a made-up statistic, meant to demonstrate that people will parrot figures without first checking their veracity. The second quote highlights that statistics can be deceitful if misrepresented. Combine misleading statistics with the repetition of false information, and you have a crisis in the validity and reproducibility of scientific data. You do not have to go far to find proof of this phenomenon: This article discusses the source of hype around new cancer drugs, which stems from both journalists and scientists repeating statistics without understanding the full context of the situation. Yet it is not just scientists and journalists who do this. How many times have you or a Facebook friend read a statistic and then repeated it without understanding where that number came from? Misleading facts combined with repetition without confirmation mean it is very easy to fool ourselves into thinking there is something in the numbers when, in actuality, there is nothing.
To demonstrate how easy it is to fool ourselves, take a look at the graph below: 
These graphs look related, right? An r value of .666 is not terrible. Let’s add some labels.

Do you believe this graph? It seems pretty reasonable, right? But let’s look at what the graph actually represents.

Surprise! This is a spurious graph: the two variables have nothing to do with each other, yet they look related because of the way the data is presented.
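If you want to see how often this happens by chance alone, here is a minimal Python sketch. It is not the data behind the graph above. It generates pairs of unrelated, randomly drifting series and counts how many reach an r of at least .666 in either direction. The function names, the series length of 11 points, the number of trials, and the seed are all assumptions made up for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(length, rng):
    """A drifting series built by summing independent Gaussian steps."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0.0, 1.0)
        walk.append(value)
    return walk


def share_of_strong_correlations(trials=2000, length=11, threshold=0.666, seed=0):
    """Fraction of unrelated series pairs with |r| at or above the threshold.

    All defaults are illustrative assumptions, except the threshold,
    which is the r value quoted in the post.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    hits = 0
    for _ in range(trials):
        a = random_walk(length, rng)
        b = random_walk(length, rng)  # generated independently of a
        if abs(pearson_r(a, b)) >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    share = share_of_strong_correlations()
    print(f"Unrelated pairs with |r| >= 0.666: {share:.1%}")
```

Neither series in a pair knows anything about the other, yet short drifting series like these can line up surprisingly often. Search through enough unrelated data sets and a convincing-looking match is almost guaranteed to turn up.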
The point is, statistics is tricky. It is easy to ignore facts and justify what we want to see, especially when it benefits us. I think this may be a big reason why science is currently in a data crisis; it is not necessarily out of intentional malice, but rather because human beings are inherently biased, and we inherently make connections, false or not, between data sets. Of course, there are those who intentionally falsify data or manipulate data to fit their theories, but that’s a whole other topic.

1 comment:

  1. This post humorously highlights the nature of humans to see what they want to see. The number of people drowning in a pool and the number of movies Nicolas Cage has appeared in are discrete conditions that presumably have no impact on one another. However, anyone with no knowledge of either condition could conclude that they are related to one another. This also brings to light the issue of "correlation vs causation," a major flaw that causes us to conclude that data sets are related because they look like each other. This would be a more extreme example of the reproducibility crisis, as the data illustrated not only lack significance but would be considered "alternative facts."