Friday, April 29, 2016

Non-parametric ANOVA: The Kruskal-Wallis test

Although we didn't have time to cover it in class, I wanted to briefly introduce the Kruskal-Wallis test, which is the non-parametric equivalent of a one-way ANOVA. I found this test while working on the BadStats assignment, where I came across a paper that tried to analyze survey data based on a 10-point scale. The paper totally butchered its analysis (trying to use parametric tests on non-parametric data), and I found that the authors should have used the Kruskal-Wallis test instead.

So, from a practical standpoint, I want to walk through how to use the Kruskal-Wallis test in Prism. Fortunately, Prism is a pretty fantastic software package that makes a lot of decisions for you when setting up this test. For this example, I made up a totally hypothetical experiment where we want to test how graduate student happiness varies between different years of students. Let's pretend we interviewed 20 students each from their 1st year, 3rd year, and 5th year of grad school and asked them to rate their happiness on a scale of 1 to 10, with 1 being "grad school has crushed my soul" and 10 being "I never want to leave grad school, this is amazing". Below are the survey results in a column table in Prism:

Next, select Analyze --> Column Analysis --> One-way ANOVA (and nonparametric).

This data is not paired, so choose "No matching or pairing". Then, below that, do not assume a Gaussian distribution (this is nonparametric data and it totally doesn't follow a normal distribution), so choose the "No. Use nonparametric test" option (really, it's that easy to switch from parametric to nonparametric tests in Prism!). As you can see, Prism suggests the Kruskal-Wallis test, which is exactly what we're looking for.
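For anyone without Prism, the same test can be sketched in Python with `scipy.stats.kruskal`. This is only a minimal illustration: the scores below are made up for the sketch (8 students per year, not the 20-per-year survey table from this post), and the variable names are my own.

```python
from scipy import stats

# Made-up happiness scores (1-10) for illustration only.
# These are NOT the survey values shown in the Prism table.
first_years = [8, 9, 7, 8, 10, 6, 9, 8]
third_years = [5, 6, 4, 7, 5, 6, 3, 5]
fifth_years = [2, 3, 1, 4, 2, 3, 2, 1]

# Kruskal-Wallis H test: ranks all scores together and asks whether
# the groups' rank distributions differ. Ties are corrected for.
h_stat, p_value = stats.kruskal(first_years, third_years, fifth_years)

print(f"H = {h_stat:.2f}, p = {p_value:.4g}")
```

As with Prism, a small p-value here only says that at least one group differs from the others; it doesn't say which one.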



If you want to make multiple comparisons, select that menu from the tabs at the top. In this scenario, we want to compare the mean ranks of each column to the others. You also have the option of making select comparisons or no multiple comparisons at all, depending on your experiment.


In the options menu, I chose to correct for multiple comparisons with the Dunn's test (which is the proper choice for planned multiple comparisons with the Kruskal-Wallis test). If you don't want to make corrections for multiple comparisons, choose the Fisher's LSD test (third option down). I also chose to report the multiplicity-adjusted p-values since we are making multiple comparisons and don't want to p-hack the established alpha level of 0.05 for the whole experiment.
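To make the Dunn's test step less of a black box, here is a rough sketch of the calculation in Python. It assumes the standard Dunn z statistic on pooled mean ranks with a tie correction, and it assumes the multiplicity-adjusted p-value is the uncorrected p-value multiplied by the number of comparisons and capped at 1 (a Bonferroni-style adjustment, which is my understanding of what Prism reports). The function name `dunn_test` and the scores are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def dunn_test(groups):
    """Pairwise Dunn's test on a dict of {group name: scores}.

    Returns {(name_a, name_b): (z, adjusted_p)}, where adjusted_p is the
    two-sided p-value times the number of comparisons, capped at 1.
    """
    names = list(groups)
    pooled = np.concatenate([np.asarray(groups[n], dtype=float) for n in names])
    ranks = stats.rankdata(pooled)  # tied scores share their average rank
    n_total = len(pooled)

    # Split the pooled ranks back into groups and take each mean rank.
    mean_rank, sizes, start = {}, {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = ranks[start:start + size].mean()
        sizes[name] = size
        start += size

    # Tie correction: sum of (t^3 - t) over each set of tied values.
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = (counts ** 3 - counts).sum() / (12.0 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12.0 - tie_term

    pairs = list(combinations(names, 2))
    results = {}
    for a, b in pairs:
        se = np.sqrt(base_var * (1.0 / sizes[a] + 1.0 / sizes[b]))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = 2.0 * stats.norm.sf(abs(z))  # two-sided p-value
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return results


# Made-up scores for illustration only (not the data from this post).
scores = {
    "1st year": [8, 9, 7, 8, 10, 6, 9, 8],
    "3rd year": [5, 6, 4, 7, 5, 6, 3, 5],
    "5th year": [2, 3, 1, 4, 2, 3, 2, 1],
}
dunn_results = dunn_test(scores)
for (a, b), (z, adj_p) in dunn_results.items():
    print(f"{a} vs {b}: z = {z:.2f}, adjusted p = {adj_p:.4g}")
```

Multiplying by the number of comparisons is what keeps the overall alpha at 0.05 across the whole family of comparisons rather than for each pair on its own.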

Here's the Kruskal-Wallis and multiple comparison results (also note that I graph the RANKS and not the raw survey data, as is proper for non-parametric data):
As you can see, these hypothetical grad students show significant differences in the ranks of their happiness levels between years, showing that this hypothetical grad school is a soul-crushing machine that increases its effects over time. Bless you, imaginary 5th years, I hope you escape soon.
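If you want to graph ranks outside of Prism, one way to get them is to rank all of the scores together and then split them back out by group. Here is a small sketch using `scipy.stats.rankdata`, again with made-up scores and my own variable names:

```python
import numpy as np
from scipy import stats

# Made-up scores for illustration only (not the data from this post).
scores = {
    "1st year": [8, 9, 7, 8, 10, 6, 9, 8],
    "3rd year": [5, 6, 4, 7, 5, 6, 3, 5],
    "5th year": [2, 3, 1, 4, 2, 3, 2, 1],
}

# Rank every score across all groups at once; ties get the average rank.
pooled = np.concatenate([np.asarray(v, dtype=float) for v in scores.values()])
pooled_ranks = stats.rankdata(pooled)

# Split the ranks back into their groups so each column can be plotted.
ranks_by_group, start = {}, 0
for name, values in scores.items():
    ranks_by_group[name] = pooled_ranks[start:start + len(values)]
    start += len(values)

mean_ranks = {name: r.mean() for name, r in ranks_by_group.items()}
for name, mean_rank in mean_ranks.items():
    print(f"{name}: mean rank = {mean_rank:.2f}")
```

The per-group arrays in `ranks_by_group` are what you would plot, and the mean ranks are the quantities that the multiple comparisons are comparing.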

Hopefully this was at least slightly helpful in demonstrating how to set up a Kruskal-Wallis test in Prism. Here's to hoping that scientists will actually use non-parametric tests on ordered and non-Gaussian data!

3 comments:

  1. Thanks for this Emily! Between the information in your previous post that has a flow chart of when to use which non-parametric test, and this example of setting up a non-parametric analysis, I think I have a better understanding of when to use which test. I know I will be dealing with non-parametric data in the near future, but I haven't been as comfortable with properly designing an experiment until now. This definitely helped.

  2. Great post! I'd be willing to bet that this test is used frequently in research related to pain. While biomarkers of pain transmission can be explicitly measured on a continuous scale, measurements of pain sensation are much more subjective, and pain sensation data is inherently non-Gaussian. It's good to know that there are statistical tools available to scientists that do not have the privilege of working exclusively with more "conventional" data types (i.e. explicit, accurate values on a continuous scale).
