My dissertation research has
focused on defining the immune response of nonhuman primates infected with
simian malaria parasites. One of the biggest challenges in nonhuman primate
research is small sample sizes due to the cost of performing research utilizing
these models. To put it into perspective, one monkey can cost anywhere from
$2,000 to $8,000 depending on the species and the specific experiments that will be
performed, and each animal costs $8 to $10 a day to house and feed. These costs
add up quickly, so researchers are required to limit the number of animals used,
and in most cases a monkey study consisting of anywhere from 3 to 7 animals is
considered “well-powered” by the NHP research community. However, as we have
learned throughout the course and based on my experience with the heterogeneous
responses of outbred models like NHPs, this sample size is typically not
sufficient to rigorously and fairly test most scientific and statistical
hypotheses. Further, the data often do not meet the assumptions of most
parametric statistical tests, yet many papers use these tests anyway to
claim significance, or in other words to “p-hack.” This introduces bias into the
nonhuman primate literature and provides a reason why many people are skeptical
of NHP research. To fix this problem, appropriate funding needs to be
available so that NHP studies can be properly powered to fairly assess the
question being evaluated, or appropriate nonparametric statistical tests should
be used. However, the latter becomes difficult with small sample sizes, as many
nonparametric tests require at least 5 subjects to obtain a p-value of less than 0.05.
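To make that concrete, here is a minimal sketch (in Python, purely for illustration; this is not how the analyses in my project were run) of the smallest p-value an exact Wilcoxon matched-pairs test can possibly return with n pairs and no ties: the most extreme possible outcome has one-sided probability 1/2^n and two-sided probability 2/2^n.

    # Smallest attainable exact p-values for a Wilcoxon matched-pairs
    # (signed-rank) test with n pairs and no ties. Even if every subject
    # changes in the same direction, p cannot fall below these floors.
    for n in range(3, 8):
        one_sided_floor = 1 / 2**n   # all differences in the predicted direction
        two_sided_floor = 2 / 2**n
        print(f"n={n}: one-sided floor = {one_sided_floor:.4f}, "
              f"two-sided floor = {two_sided_floor:.4f}")

With 4 pairs the floors are 0.0625 (one-sided) and 0.125 (two-sided), and only at 5 or 6 pairs does p < 0.05 become attainable at all.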
I have experienced the burn of an
underpowered experiment that required analysis by a nonparametric statistic
firsthand. When I conducted one of the first experiments for my PhD, I had
a result that was clearly significant based on the “bloody-obvious” test (see image to the left). Prior
to the experiment, I predicted the phenotype based on the literature and did a
power calculation, which stated that I needed 5 animals to fairly test my
hypothesis. My statistical hypothesis was that the mean parasite burden during
a primary infection was different from the mean parasite burden during a
relapse infection. Unfortunately, during the experiment, one of my animals
succumbed to the infection prior to having a relapse, which brought my sample
size down to 4 animals. When I performed a Wilcoxon matched-pairs test
(which I argue was the appropriate statistical test in this situation), I did not have enough data
points to fairly test the hypothesis and got a p-value of 0.0625 despite the obvious phenotype. (I should point out that the data were not graphed correctly; the points should be connected by lines to indicate that the analysis was paired.) When I
presented this data, there was a huge debate about whether I had run the
statistical test incorrectly, and many people, including PIs, thought I shouldn’t use
nonparametric statistics even though this was clearly the appropriate test to
perform. In the end, I succeeded in arguing my point and was able to report the
phenotype in a publication even though we did not have statistical significance,
because the death of one animal left the study without the power needed to assess
the data by nonparametric statistical methods.
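To illustrate what that looks like in practice, here is a minimal sketch using SciPy on made-up paired parasite-burden values (the numbers below are invented for illustration, not my actual data): even when every one of the four animals has a lower burden during relapse, the exact matched-pairs test bottoms out well above 0.05.

    from scipy.stats import wilcoxon

    # Hypothetical paired parasite burdens for 4 animals;
    # these values are invented purely for illustration.
    primary = [52000, 38000, 61000, 47000]
    relapse = [9000, 4000, 12000, 7500]

    # Exact two-sided Wilcoxon matched-pairs (signed-rank) test.
    result = wilcoxon(primary, relapse, alternative="two-sided")
    print(result.statistic, result.pvalue)   # p = 0.125, the floor for four pairs

Run the test one-sided instead and the best you can do with four pairs is 0.0625, so a p-value below 0.05 is mathematically out of reach no matter how obvious the phenotype is.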
Overall, I think that NHP researchers should
embrace nonparametric statistics even though it may require more resources to generate
significant data; the benefit of producing reliable data that supports
reproducible conclusions is key. The extra resources are well worth it
even if it means running fewer experiments or hiring one less
technician. I think that “less is more” when it comes to science,
particularly in the realm of NHP research.
Interesting quandary. That seems like a very difficult conundrum. It's unfortunate that one of the monkeys died, since that prevented a "significant" finding. I think research is really starting to move towards blindly running stats and only accepting whatever these statistics spit out in order to determine significance. In your data's case, it was obvious to look at the data and say that despite the p > 0.05, the graph really suggests a significant finding. Something I noticed while looking for something to report for our BadStats blog is that often people will report a p < 0.05 significant finding after running an incorrect stat, but when you look at the graph, it really calls into question whether or not that difference is truly significant. It's really important to not just plug-and-chug numbers, but to spend the time to look at your data beyond just running stats. Kudos to you for sticking with the most accurate statistical test for your analysis, even though it shows a non-significant finding. It makes me very grateful not to have to do any monkey experiments!