In 2007, Toscano et al. published the article “Differential glycosylation of TH1, TH2 and TH-17 effector cells selectively regulates susceptibility to cell death” in Nature Immunology. This study reported that some T helper cell subsets (Th1 and Th17) were susceptible to galectin-1-mediated anti-inflammatory regulation, whereas another (Th2) was resistant, owing to differing surface glycosylation patterns. The article contains eight multi-panel figures, one table, and six supplementary figures/tables, and it uses at least nine different experimental procedures. The authors report that their statistical testing consisted entirely of Mann-Whitney U-tests.
The Mann-Whitney U-test is a non-parametric analysis that tests the null hypothesis that values from two groups derive from the same population. It is only appropriate for statistical comparisons of two groups in which all values are independent, and technically, it is best applied when the values do not conform to a normal distribution. Even ignoring the last technicality, there are very few instances in this paper in which the Mann-Whitney U-test was correctly applied. The most fundamental statistical errors are outlined below.
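For reference, the two-group test described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the two samples are invented, the function names are hypothetical, and the p-value is obtained by enumerating every way of relabelling the pooled values, which is only practical for small groups.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)

def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating all relabellings of the pooled data."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    hits = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Count relabellings at least as extreme as the observed split.
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical measurements from two independent groups (illustration only).
group_a = [1.1, 2.3, 2.9, 3.4]
group_b = [4.0, 5.2, 6.1, 7.5]

u_stat = mann_whitney_u(group_a, group_b)
p_value = exact_two_sided_p(group_a, group_b)
print(u_stat, round(p_value, 4))
```

Note that the function accepts exactly two groups and treats every value as an independent observation; both conditions matter for the points raised in the following sections.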
More than two groups compared
The authors performed experiments comparing properties of three different groups of T cells, a design for which a one-way ANOVA would have been an appropriate test. However, they indicate significant comparisons only between two of the three groups, suggesting either that they ignored one group in statistical testing or that they performed multiple Mann-Whitney tests within each three-group experiment (Fig. 1, 2, 3b-e, 4b, 6b, 6d, Sup3, Sup5b, Sup6).
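Why repeated pairwise tests are a problem can be shown with a little arithmetic. The sketch below assumes a significance threshold of 0.05 (the paper's threshold is not restated here) and treats the pairwise tests as independent, which is only an approximation because the comparisons share groups. The function name is hypothetical.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Chance of at least one false positive across all pairwise tests.

    Assumes the tests are independent, which is an approximation.
    """
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    return n_tests, 1 - (1 - alpha) ** n_tests

n_tests, fwer = familywise_error_rate(3)
bonferroni_alpha = 0.05 / n_tests  # per-test threshold under a Bonferroni correction

print(n_tests, round(fwer, 4), round(bonferroni_alpha, 4))
```

Under these assumptions, three groups give three pairwise tests and a roughly 14% chance of at least one spurious "significant" result, nearly three times the nominal 5%. A single omnibus test, or a correction such as Bonferroni, keeps that rate under control.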
Comparisons of groups with more than one explanatory variable
Several experiments compare a variable in these three groups of cells over time, with increasing dose, or under three different treatments. These designs require a two-way ANOVA (Fig. 1b-e, 3d, 3e, 4, 5a, 5b, 6c, 7c, Sup4). The images below depict perhaps the most egregious examples of this.
Non-independent samples
Each experiment with human cells was conducted using a sample from a single human donor split into three groups. The cells in each group are therefore not independent, and analyzing them appropriately requires a statistical test that accounts for repeated measures. Similarly, the authors frequently state that their reported data represent the mean of several experimental replicates. In these cases, each replicate would need to be considered paired for the purposes of statistical analysis, and the Mann-Whitney test is once again inappropriate (all figures).
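The effect of ignoring pairing can be illustrated with a small sketch. Everything here is hypothetical: the donor values are invented, only two conditions are shown for simplicity (a three-group design would call for a repeated-measures ANOVA or a Friedman test), and the paired test used is an exact sign-flip permutation test rather than any method from the paper.

```python
from itertools import product

def paired_sign_flip_p(a, b):
    """Exact two-sided p-value for paired data via sign-flipping the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = total = 0
    # Under the null, each within-pair difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical data: one control and one treated value per donor.
# Donors differ widely at baseline, but every donor shifts upward with treatment.
control = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
treated = [12.0, 23.0, 31.5, 42.5, 53.5, 64.0]

p_paired = paired_sign_flip_p(treated, control)

# Unpaired view: Mann-Whitney U counts treated values exceeding control values,
# ignoring which donor each value came from.
u_unpaired = sum(t > c for t in treated for c in control)

print(p_paired, u_unpaired)
```

In this invented data set the paired analysis detects the consistent within-donor shift (p = 2/64), whereas the unpaired U statistic is 21 out of a possible 36, close to the null expectation of 18, because the variation between donors swamps the treatment effect. With real data the mismatch can run in either direction, which is why the choice of test has to follow the experimental design.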