Tuesday, April 19, 2016

Lack of Statistical Forethought? Do a t-test!

Banerjee et al., “A Role for Apoptosis-Inducing Factor in T Cell Development.” 2012

In reading papers through a statistically critical lens, I have found that the most prevalent error, at least in immunology papers, is the absence of statistics altogether. Many papers use statistics sparingly, if at all, and that is to say nothing of the validity of the tests chosen or how they were applied. One such example of this type of BadStats is Banerjee, H., et al., 2012, in which the authors show that Aif is critical for reducing levels of reactive oxygen species (ROS) during T cell development, and that without Aif, T cells are arrested at the DN stage.
The statistical issue with this manuscript is a lack of planning and forethought given to the statistical analysis. This is evidenced first by the Materials and Methods section: the subsection titled “Statistical Analyses” says only “Data were analyzed using Student’s t test.” This suggests to me that no statistical planning or study design took place before experimentation began. That is a gross error on the authors’ part, because any statistical analyses performed during data analysis should be pre-planned to avoid introducing bias, and, without reading any further into the paper, it seems unlikely that every experiment in the entire paper would be appropriately analyzed by a single test.
Additionally, describing the statistical approach in so little detail, and using only a Student’s t test for every experiment in the paper, suggests a shaky grasp of statistics. In my experience, the Student’s t test is a kind of “catch-all” statistical test used to put asterisks on figures and make them look more important. It is an often-used test, and for good reason, but leaning on a single analysis throughout an entire paper suggests it was thrown in at the last minute before submission.
Upon further dissection of this manuscript, it also becomes clear that the authors do not state whether the t test used was paired or unpaired. Not all of the experiments in this paper call for the same choice. Matched bone marrow chimera experiments, such as the one presented in Figure 2C, should use a paired t test. Most of the other experiments, in which Hq mice are compared to WT mice, should use an unpaired test.
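The difference matters in practice. Here is a minimal sketch using SciPy (with made-up numbers, not the paper's data) showing that the paired and unpaired tests can give quite different p values on the same measurements:

```python
from scipy import stats

# Hypothetical matched chimera-style data: each WT value is paired
# with an Hq value measured in the same recipient (invented numbers).
wt = [52.1, 48.3, 55.0, 50.7, 49.9]
hq = [44.2, 41.8, 47.5, 43.0, 42.6]

# Paired test: appropriate when measurements are matched within an animal
t_paired, p_paired = stats.ttest_rel(wt, hq)

# Unpaired test: appropriate when the two groups are independent animals
t_unpaired, p_unpaired = stats.ttest_ind(wt, hq)

print(f"paired p = {p_paired:.2g}, unpaired p = {p_unpaired:.2g}")
```

When the measurements really are matched, the paired test exploits the within-pair consistency and is usually more powerful; running an unpaired test on paired data throws that information away, and vice versa.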
Figure 2E shows cell counts for four different markers across four different groups. Data like these are normally analyzed with an ANOVA, since running separate pairwise t tests across four groups inflates the Type I error rate, but there is insufficient detail in the experimental methods to know what was actually done. If the authors preferred a test other than ANOVA, that rationale should be explicitly stated in the methods and justified accordingly.
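For illustration, a one-way ANOVA across four groups is a one-liner with SciPy (the counts below are invented, not the paper's data); a significant F statistic would then justify post-hoc pairwise comparisons:

```python
from scipy import stats

# Hypothetical cell counts for four groups (made-up numbers)
g1 = [120, 115, 130, 125]
g2 = [118, 122, 119, 121]
g3 = [90, 95, 88, 93]
g4 = [60, 65, 58, 62]

# One-way ANOVA: tests whether any group mean differs from the others
f, p = stats.f_oneway(g1, g2, g3, g4)
print(f"F = {f:.1f}, p = {p:.2g}")
```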

Additionally, a Student’s t test can be one-sample or two-sample. It seems reasonable to assume that most of these tests are two-sample, because a wild-type phenotype is being compared to that of Hq mice. However, this should be stated clearly in the methods section to make the analysis reproducible. The vagueness of the statistical reporting makes both the statistics and the results difficult to interpret.
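The distinction is easy to show with hypothetical fold-change data (assumed numbers, not from the paper): a one-sample test compares a group mean against a fixed reference value, while a two-sample test compares two group means against each other:

```python
from scipy import stats

# Made-up fold changes relative to a control (invented numbers)
wt = [1.02, 0.98, 1.05, 0.97, 1.01]
hq = [1.45, 1.38, 1.52, 1.41, 1.47]

# One-sample: does the Hq mean differ from a fixed reference of 1.0?
t1, p1 = stats.ttest_1samp(hq, popmean=1.0)

# Two-sample: does the Hq mean differ from the WT mean?
t2, p2 = stats.ttest_ind(hq, wt)

print(f"one-sample p = {p1:.2g}, two-sample p = {p2:.2g}")
```

The two tests answer different questions, which is exactly why the methods section needs to say which one was run.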
Finally, there is no discussion of power calculations, or of the sample size that would have been needed to detect a difference between the two sets of animals. Because the sample sizes are fairly low (n=3-5 in 2-4 independent experiments, depending on the figure), real differences in ROS production, cell number, etc. may simply have been undetectable by the experiments as performed.
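As a rough illustration of why this matters, the textbook normal-approximation sample-size formula for a two-sample t test, n per group ≈ 2(z<sub>1−α/2</sub> + z<sub>1−β</sub>)² / d², can be sketched in a few lines (this is a generic approximation, not anything the authors reported):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate animals needed per group for a two-sample t test,
    using the normal approximation (slightly underestimates small n)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# Even a "large" effect (Cohen's d = 1) calls for ~16 animals per group,
# and a huge effect (d = 2) still calls for ~4 per group.
print(n_per_group(1.0), n_per_group(2.0))
```

With n=3-5 per group, only very large effects are reliably detectable, which is why a pre-planned power calculation belongs in the methods.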

In conclusion, this paper lacks a clear statistical design and leaves the reader in the dark for much of the statistical analysis. There is little to say about the validity of the tests, because the tests used are never explicitly stated. Beyond scientific peer review of the experiments and the data themselves, journals should have a statistician review each manuscript before publication, to ensure there is at least enough detail to judge whether the tests used were appropriate.


  1. I agree with your point that Student's t tests seem to be the catch-all test for scientists who don't understand statistics or which tests suit their data. It surprises me how few statistics appear in the papers that are getting published. You would think that journal editors or reviewers would be stricter about this, considering that they would want to avoid possible retractions. Hopefully, in the future, journals will require more rigorous statistical testing before accepting papers.

  2. As someone who also reads a lot of immunology papers, something that catches my eye a lot is the broad definitions of "upregulation" and "downregulation" of biomarkers. A lot of the cutoffs for declaring a cell "positive" or "upregulated" have no rationale associated with them, and appear to be completely arbitrary or based on the data post-hoc, which introduces another level of bias to their conclusions.

    I have never seen a power analysis explained in any of the papers I've read as justification for animal numbers. The first time I heard about it was in a class on how to write a grant proposal, actually. I agree that including it in their paper would be useful.