Banerjee et al., “A Role for Apoptosis-Inducing Factor in T Cell Development.” 2012
In reading papers through a statistically critical lens, I have found that the
most common error, at least in immunology papers, is a lack of statistics. Many
papers use statistics very little, if at all, and that is before asking whether
the tests they do use were chosen and applied validly. One example of this type
of BadStats is Banerjee et al. (2012), in which the authors show that Aif is
critical for reducing levels of reactive oxygen species (ROS) during T cell
development and that, without Aif, T cells are arrested at the double-negative
(DN) stage.
The statistical issue with this manuscript is a lack of planning and
forethought in the statistical analysis. This is evident first in the Materials
and Methods section. The “Statistical Analyses” subsection says only “Data were
analyzed using Student’s t test.” This suggests to me that no statistical
planning or study design took place before the experiments began. This is a
serious error on the part of the authors, because any statistical analysis
should be planned in advance to avoid introducing bias. Even without reading
further into the paper, it seems unlikely that every experiment in it would be
appropriately analyzed with a single test.
Additionally, describing the statistical approach in so little detail, and
using only a Student’s t test for every experiment in the paper, suggests to me
a clear lack of understanding of statistics. In my experience, the Student’s t
test is treated as a “catch-all” test, a way to put asterisks on figures and
make them look more important. It is an often-used test, and for good reasons,
but using only one statistical test throughout an entire paper suggests that it
was thrown in at the last minute before submission.
Upon further dissection of this manuscript, it also became clear that the
authors do not state whether the t test used was paired or unpaired. The
experiments in this paper do not all call for the same test. The matched bone
marrow chimera experiments, for which data are presented in Figure 2C, should
use a paired t test. Most of the other experiments, which compare Hq mice to WT
mice, should use an unpaired test.
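To make the paired/unpaired distinction concrete, here is a minimal sketch in
Python (standard library only) that computes both t statistics on the same
numbers. The values are made up for illustration and are not data from the
paper, and the function names are my own. It assumes equal variances for the
unpaired case, which is what the classic Student's t test assumes.

```python
import math
import statistics


def unpaired_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def paired_t(a, b):
    """Paired t statistic, computed on the within-pair differences, and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical matched measurements (NOT from the paper): each position is one pair.
wt = [10.0, 12.0, 14.0, 16.0]
hq = [9.0, 10.5, 12.5, 15.0]

t_unpaired, df_unpaired = unpaired_t(wt, hq)
t_paired, df_paired = paired_t(wt, hq)
print(f"unpaired: t = {t_unpaired:.2f}, df = {df_unpaired}")
print(f"paired:   t = {t_paired:.2f}, df = {df_paired}")
```

With these made-up numbers the same mean difference gives a far larger t
statistic when the pairing is respected, because the variability shared within
each pair cancels out. The two tests can therefore lead to different
conclusions, which is why a reader needs to know which one was run.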
Figure 2E shows cell counts using four different markers from four different
groups. This type of data is normally analyzed using ANOVA, but there is
insufficient detail in the experimental methods to know for sure. If the
authors prefer a test other than ANOVA, the rationale should be stated
explicitly in the methods and justified accordingly.
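As a rough sketch of what ANOVA does with several groups at once, the Python
below (standard library only) computes a one-way F statistic by hand. The
groups are invented numbers, not the paper's data, and the function names are
my own. The second helper shows one common argument for ANOVA over repeated t
tests: under the simplifying assumption of independent tests, the chance of at
least one false positive grows with every pairwise comparison.

```python
import statistics


def one_way_anova_f(groups):
    """One-way ANOVA F statistic with between- and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def familywise_error(n_groups, alpha=0.05):
    """Chance of >= 1 false positive across all pairwise tests, assuming independence."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return 1 - (1 - alpha) ** n_comparisons


# Hypothetical cell counts for four groups (NOT from the paper).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [6.0, 7.0, 8.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
print(f"familywise error for 4 groups: {familywise_error(4):.3f}")
```

Four groups mean six pairwise t tests, so the uncorrected false-positive risk
is well above the nominal 5% under that independence assumption. A single
ANOVA followed by a suitable post-hoc test keeps that risk under control.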
Additionally, a Student’s t test can be one-sample or two-sample. It seems
reasonable to assume that most of these tests are two-sample, because they
compare a wild-type phenotype to that of Hq mice. However, this should be
stated clearly in the methods section to make the analysis reproducible. The
vagueness of the statistical analysis makes both the statistics and the results
difficult to interpret.
Finally, there is no discussion of power calculations or of the sample size
needed to detect a difference between the two sets of animals. Because the
sample sizes are fairly low (n=3-5 in 2-4 independent experiments, depending on
the figure), real differences in ROS production, cell number, etc. may have
gone undetected by the experiments as designed.
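For a sense of scale, here is a back-of-the-envelope sample size sketch in
Python (standard library only). It uses the normal approximation for a
two-sided, two-sample comparison, so it slightly understates what an exact
t-based calculation would require at small n. The effect sizes (Cohen's d),
alpha = 0.05, and 80% power are my own illustrative choices, not values from
the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals per group to detect a standardized difference (Cohen's d)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample test with n animals per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


# Hypothetical effect sizes, chosen only for illustration.
for d in (1.0, 2.0):
    print(f"d = {d}: about {n_per_group(d)} animals per group for 80% power")
print(f"power with n = 4 per group and d = 1.0: {approx_power(1.0, 4):.2f}")
```

Under this approximation, groups of three to five animals are only adequately
powered for very large effects (around two standard deviations). For an effect
of one standard deviation, four animals per group would detect it less than a
third of the time.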
In conclusion, this paper lacks a clear statistical design and leaves the
reader in the dark about much of the statistical analysis. There is little to
say about the validity of the tests because the tests used are not explicitly
stated. In addition to scientific peer review of the validity of the
experimentation and the data itself, journals should try to have a statistician
review each manuscript before publication to ascertain that there is at least
sufficient detail to draw conclusions about the appropriateness of the tests
used.
I agree with your point that Student's t tests seem to be the catch-all test for scientists who don't understand statistics or which tests they should use for their data. It surprises me how few statistics appear in the papers that are getting published. You would think that journal editors or reviewers would be stricter about this, considering that they would want to avoid possible retractions. Hopefully in the future, journals will require more rigorous statistical testing before accepting papers.
As someone who also reads a lot of immunology papers, something that catches my eye a lot is the broad definitions of "upregulation" and "downregulation" of biomarkers. A lot of the cutoffs for declaring a cell "positive" or "upregulated" have no rationale associated with them, and appear to be completely arbitrary or based on the data post-hoc, which introduces another level of bias to their conclusions.
I have never seen a power analysis explained in any of the papers I've read as justification for animal numbers. The first time I heard about it was in a class on how to write a grant proposal, actually. I agree that including it in their paper would be useful.