Goal of the Paper:
The goal of this paper was to determine the change in number and activation
status of peripheral blood T-cell subsets during two blood-stage infection
models of malaria. One model involved two short-course infections while the
other used a long-course infection. The investigators were interested in
whether T-cell subsets changed within each group over time and whether these
changes differed between the two infection models at the same time points.
Experimental Design:
Three animals were assigned to each group, for a total of six animals in the
cohort. An initial pre-infection sample was collected from each animal and
served as that macaque's baseline value. Samples were then taken at specific
time points after inoculation for analysis by CBC and flow cytometry. The data
were then compared against baseline at later points within each infection
model, and between infection models at matched time points.
Critiques of the Paper:
1. Repetitive t-tests were used to compare within and between groups when the
experimental design calls for a two-way ANOVA with an appropriate post-hoc
analysis.
The most egregious error in this paper is the use of repeated t-tests to
compare between and within groups given this experimental design. Figure 3 is
pasted above for reference and confirms this was the approach used by the
authors. According to the methods, a Student’s t-test was used to assess
whether there were differences between groups at different time points, and a
paired t-test was used to determine whether there were significant changes
within a group compared to the baseline value. Conceptually, the authors knew
that between-group analyses did not need a paired analysis and that
within-group analyses did. However, the approach that was used was still
incorrect. By performing repetitive t-tests, the authors inflated their type-I
error well above the established threshold of 0.05, and given the sample size
of the study, most of the statistically significant results are likely invalid.
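To put a rough number on that inflation, the sketch below (my own illustration, not the authors' calculation) computes the family-wise error rate for m independent tests each run at alpha = 0.05. The paper's tests are correlated, so this only indicates the direction and approximate scale of the problem.

```python
# Back-of-the-envelope family-wise error rate (FWER) for m independent
# hypothesis tests, each run at alpha = 0.05. The paper's repeated t-tests
# are correlated, so this is only illustrative of the scale of the inflation.
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} comparisons -> FWER ~= {fwer:.2f}")
```

Even ten uncorrected comparisons push the chance of at least one false positive toward 40%, which is why the scattered asterisks in the figures should be viewed with suspicion.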
The experimental design calls for a two-way, repeated measures ANOVA with an
appropriate post-hoc analysis to address the objective. The approach the
authors should have taken is to perform an initial two-way, repeated measures
ANOVA with group (i.e., infection model) and time point as the two factors.
The results of this analysis would indicate whether there were significant
effects of subject, group, and time, and whether an interaction effect was
occurring. If significant, the next step would have been a post-hoc analysis,
and because this experiment appears to be underpowered with only three animals
per group, only specific planned comparisons should be performed to conserve
alpha. Using an unplanned comparison approach would be unwise because it would
likely be too underpowered to identify any significant differences, especially
if a pairwise analysis were performed for every possible combination.
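As an illustration of what that analysis could look like, here is a minimal Python sketch of a mixed-design ANOVA (one between-subjects factor, one repeated within-subjects factor) using the pingouin package. The data file and column names ("animal", "group", "time", "cd4_count") are hypothetical placeholders, not the authors' actual variables.

```python
import pandas as pd
import pingouin as pg  # assumption: pingouin is available for a mixed-design ANOVA

# Hypothetical long-format table: one row per animal per time point.
# The file and column names are placeholders, not the authors' variables.
df = pd.read_csv("tcell_counts_long.csv")

# Two-way ANOVA with one between-subjects factor (infection model) and one
# repeated, within-subjects factor (time point).
aov = pg.mixed_anova(data=df, dv="cd4_count",
                     within="time", subject="animal",
                     between="group")
print(aov)

# Only if a main effect or the interaction is significant would a small set of
# planned post-hoc contrasts follow (e.g., a group difference at a
# pre-specified time point), to conserve alpha.
```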
2. Figures are poorly designed and do not clearly convey the relevant
information, and the captions are confusing and unclear.
The figures in this analysis clearly indicate that individuals are being
followed over time (see Figure 3 above). This is an appropriate representation
of the data, but unfortunately the other aspects of the figure are lacking.
For instance, the arrows on the figure indicate inoculation and drug
treatment. One group had a different inoculation and drug treatment regimen
than the other, and thus displaying both groups on the same graph is poor data
presentation. Further, the repetitive t-tests led the authors to use a strange
convention for denoting statistical significance between a time point and the
baseline value. With an appropriate two-way, repeated measures ANOVA this
could have been rectified. Overall, I would suggest that the data be graphed
separately by group, as sketched below, and that a table be generated to show
significant differences in outcome variables between groups, making the
results clearer and more effective for the reviewer/reader.
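A minimal sketch of that suggested layout, with one panel per infection model and one line per animal over time; the file and column names ("group", "animal", "day", "cd4_count") are hypothetical placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format data; one panel per infection model, one line per
# animal, so the two treatment regimens are never mixed on the same axes.
df = pd.read_csv("tcell_counts_long.csv")

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
for ax, (group, sub) in zip(axes, df.groupby("group")):
    for _, traj in sub.groupby("animal"):
        ax.plot(traj["day"], traj["cd4_count"], marker="o")
    ax.set_title(str(group))
    ax.set_xlabel("Day post-inoculation")
axes[0].set_ylabel("CD4+ T cells per uL")
fig.tight_layout()
plt.show()
```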
3. Biological conclusions should be questioned because treatment could be
considered a confounding/third variable.
One of the goals of the study was to determine whether there were differences
in T-cell responses between groups. Indeed, the two-way ANOVA mentioned above
would answer this question in the most appropriate statistical manner given
the experimental design. However, that approach assumes that drug intervention
and re-inoculation have no effect on the T-cell values. Based on my experience
with these drugs and this model, I would say that this is a fair assumption.
However, it is worth recognizing this aspect of the design and understanding
the appropriate statistics should that assumption not be made. If the drug
intervention were added to the current two-way ANOVA approach, this would
produce a three-way ANOVA. As we have learned in the course, it is virtually
impossible to interpret the results of a three-way ANOVA because of the
complexity of the experimental design and, thus, of the null hypotheses.
Therefore, if treatment were going to be a factor, a linear or nonlinear model
would likely be needed to determine the effect between groups, the effect
within each group over time, whether there was an effect of treatment, and
whether there were three-way interactions between the different factors.
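One way to specify such a model is a linear mixed-effects model with a random intercept per animal; the sketch below uses statsmodels with hypothetical file and column names ("group", "time", "treated", "cd4_count", "animal"). It shows the specification only, since with three animals per group the full three-way interaction would almost certainly not be estimable on the real data.

```python
import pandas as pd
import statsmodels.formula.api as smf  # assumption: statsmodels is available

# Hypothetical long-format data with an added indicator ("treated") for whether
# a drug treatment had been given before each sample; all file and column
# names are placeholders, not the authors' variables.
df = pd.read_csv("tcell_counts_long.csv")

# Linear mixed-effects model: fixed effects for infection model, time point,
# and treatment status (and their interactions), with a random intercept per
# animal to account for the repeated measures.
model = smf.mixedlm("cd4_count ~ C(group) * C(time) * C(treated)",
                    data=df, groups=df["animal"])
result = model.fit()
print(result.summary())
```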
Overall Conclusion:
This paper does not use appropriate statistics; it has poor data
representation, figure captions, and graphs; and its overall assumptions
should be questioned. Additionally, because the study is likely severely
underpowered, the conclusions drawn are probably erroneous and could largely
be false positives given the specific statistical approach that was taken.
Finally, I suspect the authors were “p-hacking” to achieve significance, which
is why they went with the t-tests rather than the ANOVA analysis that the
experiment calls for.
I agree that the t-tests should not have been used and that a two-way, repeated measures ANOVA would have been more appropriate for this experiment. However, as you mentioned, an n = 3 per group would likely not have yielded statistical significance with this approach even if the graphs suggested differences. In that situation it would have been necessary to repeat the experiment with larger sample sizes before considering whether to publish; but I work with mice, and, as was brought up in class, repeating the study may not be a feasible approach when working with non-human primates. I am curious whether the use of t-tests is more "tolerated" among scientists working with non-human primates than among those analyzing and publishing results from rodent models, since rodent colonies are less expensive to maintain and mice have much shorter lifespans, so replications can be produced in a shorter time. I wonder if there is some kind of open database that could be made available to scientists working with non-human primates so that they could compare across studies and see whether the trends observed with small sample sizes are worth further replication. A resource like that would help address the false-positive issue you bring up. Obviously, the best way to avoid it would be replicating with larger samples, as mentioned above, but I wonder whether that is really financially feasible in studies with non-human primates.