Saturday, April 9, 2016

Nonparametric statistics in nonhuman primate research shouldn’t be taboo

My dissertation research has focused on defining the immune response of nonhuman primates (NHPs) infected with simian malaria parasites. One of the biggest challenges in NHP research is small sample sizes due to the cost of working with these models. To put it into perspective, one monkey can cost anywhere from $2,000 - $8,000 depending on the species and the specific experiments to be performed, and each animal costs $8 - $10 a day to house and feed. These costs add up quickly, so researchers are required to limit the number of animals used, and in most cases a monkey study of anywhere from 3-7 animals is considered “well-powered” by the NHP research community. However, as we have learned throughout the course, and based on my experience with the heterogeneous responses of outbred models like NHPs, this sample size is typically not sufficient to rigorously and fairly test most scientific and statistical hypotheses. Further, most of the time the data do not meet the assumptions of most parametric statistical tests, yet most papers will use these tests anyway to gain significance, or in other words to “p-hack”. This introduces bias into the NHP literature and is one reason why many people are skeptical of NHP research. To fix this problem, there needs to be appropriate funding available for NHP research so that studies can be properly powered to fairly assess the question being evaluated, or appropriate nonparametric statistical tests should be used. However, this becomes difficult with small sample sizes: many nonparametric tests require at least 5 subjects to even be capable of producing a p-value below 0.05.

I have experienced the burn of an underpowered experiment that requires nonparametric analysis firsthand. When I conducted one of the first experiments for my PhD, I had a result that was clearly significant based on the “bloody-obvious” test (see image to left). Prior to the experiment, I predicted the phenotype based on the literature and performed a power calculation, which indicated that I needed 5 animals to fairly test my hypothesis. My statistical hypothesis was that the mean parasite burden during a primary infection was different than the mean parasite burden during a relapse infection. Unfortunately, during the experiment one of my animals succumbed to the infection prior to having a relapse, which brought my sample size down to 4 animals. When I performed a Wilcoxon matched-pairs test (which I argue was the appropriate statistical test in this situation), I did not have enough data points to fairly test the hypothesis and got a p-value of 0.0625 despite the obvious phenotype; I should point out that the data weren't graphed correctly and should actually be connected by lines to show that the analysis was paired. When I presented this data, there was a huge debate over whether I had run the statistical test incorrectly, and many, including PIs, thought I shouldn't use nonparametric statistics even though this was clearly the appropriate test to perform. In the end, I succeeded in arguing my point and was able to report the phenotype in a publication even though we didn't reach significance: because of the death of one animal, the study lacked the power needed to assess the data by nonparametric statistical methods.
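That 0.0625 is not a near miss but a hard floor. The exact one-sided Wilcoxon matched-pairs test enumerates all 2^n sign assignments of the paired differences, so with n = 4 pairs even perfectly separated data can do no better than 1/16 = 0.0625, and a fifth animal is needed to get below 0.05 (at 1/32). A minimal sketch of the calculation, using made-up parasite-burden numbers (the values are illustrative, not my actual data):

```python
from itertools import product

def exact_wilcoxon_p(x, y):
    """Exact one-sided p-value for a Wilcoxon matched-pairs signed-rank test,
    computed by brute-force enumeration of all 2^n sign assignments.
    Assumes no zero and no tied |differences| (fine for illustration)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Rank the absolute differences, 1 = smallest.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # Observed test statistic W+ = sum of the ranks of positive differences.
    w_obs = sum(r for d, r in zip(diffs, ranks) if d > 0)
    # Under H0 each difference is +/- with probability 1/2; count sign
    # assignments whose W+ is at least as large as the one observed.
    hits = sum(1 for signs in product([False, True], repeat=n)
               if sum(r for s, r in zip(signs, ranks) if s) >= w_obs)
    return hits / 2 ** n

# Hypothetical paired parasite burdens (primary vs. relapse) for 4 animals:
primary = [9.2, 7.5, 8.1, 6.8]
relapse = [3.4, 2.1, 4.0, 1.9]
print(exact_wilcoxon_p(primary, relapse))  # 0.0625 -- the best possible with n = 4
```

Only one of the 16 sign assignments is as extreme as perfect separation, hence 1/16 no matter how dramatic the effect looks on the graph. With the planned fifth animal, the same perfect separation would have given 1/32, crossing the 0.05 threshold.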
Overall, I think that NHP researchers should embrace nonparametric statistics, even though it may require more resources to generate significant data; the benefit of producing reliable data that supports reproducible conclusions is worth it, even if it means running fewer experiments or hiring one less technician. I think that “less is more” when it comes to science, particularly in the realm of NHP research.

1 comment:

  1. Interesting quandary. That seems like a very difficult conundrum. It's unfortunate that one of the monkeys died, since that prevented a "significant" finding. I think research is really starting to move toward blindly running stats and only accepting what those statistics pop out in order to determine significance. In your data's case, it was obvious to look at the data and say that despite the p > 0.05, the graph really suggests a significant finding. Something I noticed while looking for something to report for our BadStats blog is that often, people will report a p < 0.05 significant finding when running an incorrect stat, but when looking at the graph, it really calls into question whether or not that difference is truly significant. It's really important to not just plug-and-chug numbers, but to spend the time to look at your data beyond just running stats. Kudos to you for sticking with the most accurate statistical test for your analysis, even though it shows an insignificant finding. It makes me very grateful not to have to do any monkey experiments!