In 2007, Toscano et al. published the article “Differential glycosylation of TH1, TH2 and TH-17 effector cells selectively regulates susceptibility to cell death” in Nature Immunology. This study reported that some T helper cell subsets (Th1 and Th17) were susceptible to galectin-1-mediated anti-inflammatory regulation while another (Th2) was resistant, owing to differing surface glycosylation patterns. The article contains eight multi-panel figures, one table, and six supplementary figures/tables and uses at least nine different experimental procedures, yet the authors report that their statistical testing consisted entirely of Mann-Whitney U-tests.
The Mann-Whitney U-test is a non-parametric test of the null hypothesis that values from two groups are drawn from the same population. It is appropriate only for comparing two groups in which all values are independent, and, technically, it is best applied when the values do not follow a normal distribution. Even setting this last technicality aside, there are very few instances in this paper in which the Mann-Whitney U-test was correctly applied. The most fundamental statistical errors are outlined below.
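To make the independence requirement concrete, here is a minimal pure-Python sketch of how the U statistic itself is computed. The function name `mann_whitney_u` and the data are hypothetical, and p-values and tie corrections (which library routines such as SciPy's provide) are left out. Every value in one group is compared with every value in the other, and the null distribution of U assumes each of those values is an independent draw.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y.

    Counts every (xi, yj) pair in which xi > yj; tied pairs count 0.5.
    The null distribution of U assumes every value is an independent draw.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical measurements from two independent groups.
group_a = [3.1, 4.5, 2.8, 5.0]
group_b = [1.2, 2.9, 2.0]

u_a = mann_whitney_u(group_a, group_b)  # 11.0
u_b = mann_whitney_u(group_b, group_a)  # 1.0
# U_a + U_b always equals n_a * n_b (here 4 * 3 = 12).
print(u_a, u_b, u_a + u_b == len(group_a) * len(group_b))
```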
More than two groups compared
The authors performed experiments comparing properties of three different groups of T cells, a design for which a one-way ANOVA would have been an appropriate test. However, they indicate significant comparisons between only two of the three groups, suggesting that they either ignored one group in statistical testing or performed multiple Mann-Whitney tests within each three-group experiment (Fig. 1, 2, 3b-e, 4b, 6b, 6d, Sup3, Sup5b, Sup6).
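For contrast, a minimal sketch of the one-way ANOVA F statistic, which tests all three groups at once. The function name and the Th1/Th2/Th17 values are hypothetical; the p-value would come from the F distribution with (k - 1, N - k) degrees of freedom, and the Kruskal-Wallis test is the usual rank-based alternative when normality is doubtful. The last lines show, under an independence assumption that pairwise tests on shared groups do not actually satisfy, roughly how running three separate tests inflates the false-positive rate.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for k independent groups.

    Returns (F, df_between, df_within); the p-value would come from the
    F distribution with those degrees of freedom (not computed here).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values for three T-cell subsets.
th1 = [1.0, 2.0, 3.0]
th2 = [4.0, 5.0, 6.0]
th17 = [7.0, 8.0, 9.0]
f_stat, df_b, df_w = one_way_anova_f(th1, th2, th17)  # (27.0, 2, 6)

# Three separate tests at alpha = 0.05: if they were independent, the chance
# of at least one false positive would be 1 - 0.95**3, about 0.14.
alpha = 0.05
familywise = 1 - (1 - alpha) ** 3
```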
Comparisons of groups with more than one explanatory variable
Several experiments compare a variable across these three groups of cells over time, with increasing dose, or under three different treatments; such designs require a two-way ANOVA (Fig. 1b-e, 3d, 3e, 4, 5a, 5b, 6c, 7c, Sup4).
The images below depict perhaps the most egregious examples of this.
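A two-way ANOVA splits the total variation into a cell-type effect, a treatment (or time, or dose) effect, and their interaction. The sketch below is a minimal version assuming a balanced design with independent replicates in every cell; the function name, factor layout, and numbers are hypothetical. If the same donor's cells are measured repeatedly over time, a repeated-measures or mixed-effects model would be needed instead of this plain version.

```python
def two_way_anova(data):
    """F statistics for a balanced two-way ANOVA with replication.

    data[i][j] is a list of replicate values for level i of factor A
    (e.g. cell type) and level j of factor B (e.g. treatment). Every cell
    must hold the same number (>= 2) of independent replicates.
    """
    a = len(data)
    b = len(data[0])
    n = len(data[0][0])
    cell_means = [[sum(cell) / n for cell in row] for row in data]
    a_means = [sum(row) / b for row in cell_means]
    b_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(a_means) / a
    # Main effects, interaction, and residual sums of squares.
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_err = sum((v - cell_means[i][j]) ** 2
                 for i in range(a) for j in range(b) for v in data[i][j])
    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_err = a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_A": (ss_a / df_a) / ms_err,
        "F_B": (ss_b / df_b) / ms_err,
        "F_AxB": (ss_ab / df_ab) / ms_err,
    }


# Hypothetical data: 3 cell types x 2 treatments x 2 replicates.
data = [
    [[1.0, 3.0], [3.0, 5.0]],
    [[2.0, 4.0], [4.0, 6.0]],
    [[3.0, 5.0], [8.0, 10.0]],
]
result = two_way_anova(data)  # F_A = 6.5, F_B = 13.5, F_AxB = 1.5
```

Each F would be compared with an F distribution whose degrees of freedom are the effect's df and the residual df (here 6); the interaction term is what a series of two-group tests cannot assess at all.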
Non-independent samples
Each experiment with human cells was conducted using a sample from a single human donor split into three groups. The cells in each group are therefore not independent, and a statistical test that accounts for repeated measures is required to analyze them appropriately. Similarly, the authors frequently state that their reported data represent the mean of several experimental replicates. In these cases, each replicate would need to be treated as paired for the purposes of statistical analysis, and the Mann-Whitney test is once again inappropriate (all figures).
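One standard non-parametric option for this kind of design is the Friedman test, swapped in here for illustration (a repeated-measures ANOVA or mixed-effects model would be the parametric alternatives, and the Wilcoxon signed-rank test covers the two-condition paired case). The sketch below, with hypothetical function names and donor data, ranks the three conditions within each donor, so donor-to-donor differences in baseline drop out. No tie correction is applied, and the statistic would be compared against a chi-square distribution with k - 1 degrees of freedom (exact tables are preferable for small samples).

```python
def average_ranks(values):
    """Rank values 1..k within one block, giving ties the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square statistic (no tie correction).

    blocks[d] holds one measurement per condition for donor d, with the
    conditions in the same order in every row.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical data: 4 donors, each split into three conditions.
donors = [[10, 20, 30], [11, 25, 28], [9, 18, 35], [12, 22, 31]]
q = friedman_statistic(donors)  # 8.0

# Same within-donor ordering, but with large donor-specific offsets.
shifted = [[10, 20, 30], [111, 125, 128], [9, 18, 35], [1012, 1022, 1031]]
q_shifted = friedman_statistic(shifted)  # also 8.0
```

Because ranking happens within each donor, adding a large constant to one donor's values leaves the statistic unchanged; a Mann-Whitney test on the pooled values has no way to use that pairing.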
I feel like this is one of those odd cases where the authors were told that they needed to do stats to get their paper published. I suspect the authors just threw in whatever statistical test showed significance and ran with it. Personally, I am a little surprised that the reviewers didn't do a better job of checking which statistical methods were used in the publication, given that this is Nature Immunology. But perhaps the reviewers just wanted stats and didn't care what kind. If reviewers don't understand how to use statistics properly, you have to wonder whether statistical practice will ever undergo the supposed "self-correction" that science also undergoes.