Thursday, April 14, 2016

One Test Does Not Fit All

In 2007, Toscano et al. published the article “Differential glycosylation of TH1, TH2 and TH-17 effector cells selectively regulates susceptibility to cell death” in Nature Immunology. The study reported that some T helper cell subsets (Th1 and Th17) were susceptible to galectin-1-mediated anti-inflammatory regulation while another (Th2) was resistant, owing to differing surface glycosylation patterns. The article contains eight multi-panel figures, one table, and six supplementary figures/tables, and it uses at least nine different experimental procedures. Despite this variety, the authors report that their statistical testing consisted entirely of Mann-Whitney U-tests.

The Mann-Whitney U-test is a non-parametric analysis that tests the null hypothesis that values from two groups derive from the same population. It is only appropriate for statistical comparisons of two groups in which all values are independent, and technically, it is best applied when the values do not conform to a normal distribution. Even ignoring the last technicality, there are very few instances in this paper in which the Mann-Whitney U-test was correctly applied. The most fundamental statistical errors are outlined below.
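For readers who want to see what the test actually computes, here is a minimal sketch in plain Python. The function names and the example values are hypothetical and are not taken from the paper. The U statistic counts how often a value in one group exceeds a value in the other, and the exact p-value asks how many ways of relabeling the pooled values would give a U at least as far from its expected value.

```python
from itertools import combinations

def u_statistic(a, b):
    """Count (x, y) pairs with x > y, scoring ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def exact_two_sided_p(a, b):
    """Exact two-sided p-value by enumerating every relabeling of the pooled data.

    Only practical for small groups, since the number of relabelings grows quickly.
    """
    pooled = list(a) + list(b)
    n_a, n_total = len(a), len(pooled)
    center = len(a) * len(b) / 2          # expected U under the null hypothesis
    observed = abs(u_statistic(a, b) - center)
    hits = total = 0
    for idx in combinations(range(n_total), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n_total) if i not in chosen]
        total += 1
        if abs(u_statistic(group_a, group_b) - center) >= observed:
            hits += 1
    return hits / total

# Hypothetical, fully separated groups of three values each.
group_1 = [1, 2, 3]
group_2 = [4, 5, 6]
print(u_statistic(group_1, group_2), exact_two_sided_p(group_1, group_2))
```

Note that the enumeration treats every value as an independent observation and handles exactly two groups, which are the two assumptions discussed in the rest of this post.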

More than two groups compared

The authors performed experiments comparing properties of three different groups of T cells, a design for which a one-way ANOVA would have been an appropriate test. However, they only indicate significant comparisons between two of the three groups, suggesting that they either ignored one group in statistical testing or performed multiple Mann-Whitney tests within each three-group experiment (Fig. 1, 2, 3b-e, 4b, 6b, 6d, Sup3, Sup5b, Sup6).
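One reason repeated two-group tests are a problem is that the chance of at least one false positive grows with the number of comparisons. The sketch below is an illustration only: it assumes a significance threshold of 0.05 (not stated in this post) and treats the pairwise tests as independent, which is a simplification because pairwise comparisons share groups. The function name is hypothetical.

```python
from math import comb

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_groups = 3
n_pairs = comb(n_groups, 2)                # pairwise comparisons among three groups
fwer = familywise_error_rate(n_pairs)      # assumes alpha = 0.05 per test
bonferroni_alpha = 0.05 / n_pairs          # one simple per-test correction

print(n_pairs, round(fwer, 4), round(bonferroni_alpha, 4))
```

An omnibus test such as a one-way ANOVA, followed by corrected post hoc comparisons, is the usual way to keep the overall error rate at the nominal level.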

Comparisons of groups with more than one explanatory variable

Several experiments compare a variable in these three groups of cells over time, with increasing dose, or under three different treatments, designs that require a two-way ANOVA (Fig. 1b-e, 3d, 3e, 4, 5a, 5b, 6c, 7c, Sup4). The images below depict perhaps the most egregious examples of this.
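To make the two-factor structure concrete, here is a rough sketch of how a two-way ANOVA splits the variation into a cell-type effect, a second-factor effect, and their interaction. It is not the authors' analysis: the function name, factor labels, and numbers are all hypothetical, and the sketch assumes a balanced design with independent observations and reports only F statistics, not p-values.

```python
from statistics import mean

def two_way_anova_f(cells):
    """F statistics for a balanced two-way design.

    cells maps (factor_a_level, factor_b_level) -> list of replicate values.
    Assumes equal replicates per cell, independent observations, and
    nonzero within-cell variation.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))            # replicates per cell
    if r < 2 or any(len(v) != r for v in cells.values()):
        raise ValueError("need a balanced design with at least 2 replicates per cell")

    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for each main effect, the interaction, and the residual.
    ss_a = r * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = r * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = r * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_err = len(a_levels) * len(b_levels) * (r - 1)
    ms_err = ss_err / df_err
    return {
        "factor_a": (ss_a / df_a) / ms_err,
        "factor_b": (ss_b / df_b) / ms_err,
        "interaction": (ss_ab / (df_a * df_b)) / ms_err,
    }

# Hypothetical measurements: two cell types at two doses, two replicates each.
example = {
    ("Th1", "high"): [5, 7], ("Th1", "low"): [1, 3],
    ("Th2", "high"): [2, 4], ("Th2", "low"): [2, 4],
}
print(two_way_anova_f(example))
```

The interaction term is what answers a question such as "does the dose response differ between cell types", which a series of two-group tests cannot address directly.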

Non-independent samples

Each experiment with human cells was conducted using a sample from a single human donor split into three groups. Clearly then, the groups are not independent, and an appropriate analysis requires a statistical test that accounts for repeated measures. Similarly, the authors frequently state that their reported data represent the mean of several experimental replicates. In these cases, each replicate would need to be considered paired for the purposes of statistical analyses, and the Mann-Whitney test is once again inappropriate (all figures).
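A small sketch can show why pairing matters. The donor values below are invented for illustration, and the function name is hypothetical. The paired analysis here is an exact sign-flip permutation test on within-donor differences, chosen because it needs no libraries; it covers only two conditions, so a three-group donor design would need a repeated-measures analogue instead.

```python
from itertools import product

def paired_sign_flip_p(before, after):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [y - x for x, y in zip(before, after)]
    observed = abs(sum(diffs))
    hits = total = 0
    # Under the null, each within-pair difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed:
            hits += 1
    return hits / total

# Hypothetical data: six donors who differ a lot from one another,
# each showing a small but consistent increase after treatment.
untreated = [10, 20, 30, 40, 50, 60]
treated = [11, 22, 33, 43, 52, 61]

paired_p = paired_sign_flip_p(untreated, treated)

# Ignoring the pairing: the unpaired U statistic sits close to its
# null expectation of 18 (half of the 36 possible pairs).
unpaired_u = sum(x > y for x in untreated for y in treated)

print(paired_p, unpaired_u)
```

In this made-up data set the consistent within-donor shift is easy to detect once donors are paired, while an unpaired comparison is swamped by donor-to-donor variation. Mistreating dependent samples as independent can also distort results in the opposite direction, so the point is the mismatch between test and design rather than a loss of power specifically.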

1 comment:

  1. I feel like this is one of those weird cases where the authors were told that they needed to do stats to get their paper published. I feel the authors just picked whichever statistical test showed significance and ran with it. Personally, I am a little surprised that the reviewers didn't do a better job of monitoring what type of statistical methods were used in the publication, since this is Nature Immunology. But perhaps the reviewers just wanted stats and didn't care what kind. With reviewers not understanding how to properly use statistics, you have to wonder if stats will ever undergo the supposed "self-correction" that science also undergoes.