The American Statistical Association has released a statement on P Values. If you haven't read it, you should.
There really isn't anything new here.
I've seen a few headlines already that have clearly misinterpreted the statement, such as (I paraphrase) "ASA Declares the P Value Dead," coupled with some variation of a dance-around-the-carcass meme.
We're still sort of left wondering: how, when, and why did the train jump the tracks?
Maybe it's too many people doing the scientific process and too few of them doing it grounded in a scientific philosophy. In other words, this one is on the scientists, not the statisticians.
Thanks for the link. I will definitely share it with my p-value-centric lab mates and contemporaries. Though there may be nothing new here, I think it was important that the ASA made the statement. There are a couple of points bouncing around my limited synapses here.
First, I disagree. Many parties are culpable, not just the scientists. True, ultimately it is the scientist (or science impersonator, which role I have played) who performs a poorly designed study, hacks his or her way to p < 0.05, and presses the ‘submit manuscript’ button; and the reviewer scientist (or impersonator, again) who offers to “accept the manuscript” for such marginally performed work. But education and training surely bear some of the responsibility here (let’s assume the education role and scientist role are separate even though they are often done by the same people). It makes me reflect on the Shiny New Toys post. Therein, Kelsey remarks on her observations from teaching a chi-squared test to Emory intro biolab students. Paraphrasing here, but the students went gaga upon finding p < 0.05, without considering much beyond that. My comment on her observation was, “Don’t sweat it. You got them to think about their data quantitatively and methods to analyze it. They’ll learn better experimental design and statistics-based testing as they go through their education.” I think I was too cavalier in this response. Instead of passing the responsibility to the next class, or instructor, or mentor, I think exposing students early to the ASA statement on p-values (and similar cautionary rules and expectations) might be appropriate, so we at least give them a foundation, rooted in the opinion of expert statisticians, that p is not the be-all, end-all ascribed to it by many of us.
Now for some real liberal latitude here: akin to the argument that gun makers and distributors bear some culpability when some nut job goes ape with his or her stash of weapons, should not the makers of Prism, or SigmaStat, or your favorite stat package bear some culpability for the overuse of p < 0.05? Their software auto-cranks out asterisks and “yes” when the magical threshold is hit. Mix one poorly trained scientist, the stat package du jour, an RFA deadline, and a dash of salt, and voila! “Look! p < 0.05. Prism said so!”
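To make that point concrete, here is a minimal, hypothetical sketch (in Python, not taken from Prism, SigmaStat, or any real package) of the threshold-only flagging logic being described: the asterisks depend on nothing but crossing a cutoff, with no regard for design, effect size, or multiple comparisons.

```python
# Hypothetical sketch of "auto-cranked" significance flags, for illustration only.
def significance_flag(p):
    """Map a p-value to the familiar asterisk codes, using the threshold alone."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"

# A p-value of 0.049 earns a star; 0.051 earns nothing, though the evidence is nearly identical.
for p in (0.0004, 0.03, 0.049, 0.051):
    print(f"p = {p} -> {significance_flag(p)}")
```

The point of the sketch is that the software's "yes" is a bright-line rule, and the interpretive work still falls entirely on the user.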
Murph asks, when, where, and why did the train jump the tracks? The train being p<0.05 as indicative of a real effect, and the tracks being the acknowledgement that this is all that is needed to get stuff published. As for the ASA, it seems like this has been building for quite some time. Why? Dunno. Could money have played a role?
As for science at the bench, the train seems to be chugging along. Slowing the train will require better education and a shift in norms on the part of reviewers and study sections. It seems unlikely scientists will change unless expectations do.
First, I'd like to address TJ's question about "when the train jumped the tracks," by which, I inferred, he meant, "when did null hypothesis testing go awry for testing scientific hypotheses?" It seems the ASA mentions this in their preamble. It's a historical problem that has been discussed for "decades," and now the ASA needs to run its own version of a PSA to really get these scientists in check, I suppose. I chalk a good portion of this up to intellectual laziness or the "publish or perish" culture science has cultivated; scientists are more concerned with getting published and securing their next grant than with rigorously monitoring their statistical methods. It's a shame if that is truly the case.
As for the statement itself, there was a specific sentence that stood out to me: "Researchers should bring many contextual factors into play to derive scientific inferences, including the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis." This is something TJ talked about all the time in our course, especially the importance of matching the statistical modeling to the study design. It seems like a self-evident statement to make, but perhaps some scientists are too caught up in pseudo-mathematical traditions to be so discerning about the statistical models they choose. It's all about time and efficiency, as I stated above.
One of the principles that stood out to me most in the ASA's statement was Principle #4. This statement is weird to me because I hear this from scientists all the time: "don't fiddle with your data until you get a significant p-value." It appears as though the ASA and the scientists I know are saying the same things. But if we again look at the cultural expectations scientists operate under versus the ideal environment the ASA presumes, we have a different story. Scientists are put under strain by journals to publish only positive results. Negative results are garbage to journals that want big discovery headlines. Therefore, any statistical test that yields a favorable p-value will be embraced, no matter how it is attained. This is an unfortunate practice, but it signals another need for cultural change in the scientific workplace. I suppose, in a way, culpability can be laid on the journal publishers and editors in this case.
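Since Principle #4 is precisely about this kind of selective analysis, a short simulation makes the point quantitative. The sketch below (Python with NumPy/SciPy; my own illustration, not anything from the ASA statement) draws two groups from the same distribution, so there is no real effect, and then compares one pre-specified test against a "try several analyses and keep the smallest p" strategy.

```python
# Simulation of why "fiddling until p < 0.05" inflates false positives.
# Both groups come from the SAME distribution, so every "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 2000
false_pos_single = 0
false_pos_hacked = 0

for _ in range(n_experiments):
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)

    # Honest analysis: one pre-specified test.
    p_single = stats.ttest_ind(a, b).pvalue

    # "Hacked" analysis: try several tests / post hoc tweaks and keep the best p-value.
    candidates = [
        stats.ttest_ind(a, b).pvalue,
        stats.mannwhitneyu(a, b).pvalue,
        stats.ttest_ind(a[:20], b[:20]).pvalue,  # peek at a smaller "interim" sample
        stats.ttest_ind(np.log(np.abs(a) + 1), np.log(np.abs(b) + 1)).pvalue,  # post hoc transform
    ]
    p_hacked = min(candidates)

    false_pos_single += p_single < 0.05
    false_pos_hacked += p_hacked < 0.05

print(f"False-positive rate, single pre-specified test: {false_pos_single / n_experiments:.3f}")
print(f"False-positive rate, best-of-several analyses:  {false_pos_hacked / n_experiments:.3f}")
```

Under the null, the single pre-specified test flags roughly 5% of experiments, as it should, while the keep-the-smallest-p strategy flags noticeably more; that gap is the quantitative reason the ASA warns against selective reporting and data dredging.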
Which brings me to my final point, which comes from engaging with my fellow students. To reflect on Vincent's comment, I think Vincent is treading a very shaky moral boundary. Software engineers shouldn't be liable for the misuse of their software. In fact, much of Prism is engineered to make sure it isn't misused (see: startup wizard guides). Making that analogy to gun sales is logically tenuous at best; these are two terribly different subjects. As for the body of your opinion, Vincent, I think you make a good point. You can't pass on the responsibility of education to the next class or professor. These parts of education could be the only statistical training these undergraduates get. So, in short, any critical analysis helps guard against statistical model misuse.