TJ posted an article earlier this month about the ASA issuing a statement on the p-value, saying that statistical techniques for testing hypotheses "...have more flaws than Facebook's privacy policies."
Some click-holing from article to article about statistics led me to discover that Basic and Applied Social Psychology (BASP) has actually banned the p-value since 2015, and more specifically the null hypothesis significance testing procedure, or NHSTP. BASP even states that prior to publication, all "vestiges of the NHSTP (p-values, t-values, F-values, statements about 'significant' differences or lack thereof)" must be removed. The rationale is that numbers are being generated where none truly exist, a problem I feel is most acute in more qualitative fields such as psychology. But BASP raised the same concerns about the p-value as the ASA statement: that p < 0.05 is "too easy" to pass (a quick simulation below shows why) and sometimes "serves as an excuse for lower quality research." While the ASA's concern is mainly directed at the people who perform the research (i.e., researchers are not properly trained in data analysis), BASP's concern arises from the nature of psychological research itself; the journal states that "banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP," and it even hopes other journals will follow suit.
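To see why "too easy" is a fair criticism, here is a quick simulation of my own (not from either article): run twenty comparisons where no real effect exists, and on average about one will still come out "significant" at p < 0.05 purely by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Twenty experiments in which the null hypothesis is true by
# construction: both groups are drawn from the same distribution.
n_experiments, n_per_group = 20, 30
false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    false_positives += p < 0.05

# With alpha = 0.05 we expect ~1 false positive in 20 null tests.
print(f"{false_positives} of {n_experiments} null comparisons passed p < 0.05")
```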
I certainly understand where BASP is coming from - in a field where the response or measured variables are more often qualitative than not, how does one effectively apply statistics to decide whether an effect is real? What should we do about data generated in the "hard sciences" that are more qualitative, such as characterization of cell morphology? What about clinical data that measure subjective things such as level of pain on a scale of 1-10? Is there an existing statistical tool or procedure that everyone could agree accurately measures "significance" without having to assign values or generate numbers to describe qualitative measurements?
Do you think we should abolish p-value significance testing for all research? Only psychology research? Or should the line be drawn between qualitative and quantitative research?
I am somewhat torn on this topic. I do feel that p-value significance testing has shifted the focus of research, since everyone is just trying to get "significant results." There is also a fair argument that if a field is qualitative, we should not be forcing it to be quantitative. However, aren't fields like psychology still aiming to find a "difference"? And how can we truly claim a difference if no p-value is involved? I believe we should look to non-parametric tests that use ranking systems to investigate these differences (a short sketch follows). Additionally, I think creating categories a priori can help in running statistical tests on qualitative measurements.
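To sketch what I mean, here is a toy example (the values are invented purely for illustration) using the rank-based Mann-Whitney U test on hypothetical 1-10 pain scores. Because the test uses only the ordering of the observations, it never has to pretend that the distance from 3 to 4 equals the distance from 7 to 8.

```python
from scipy import stats

# Hypothetical ordinal data: self-reported pain on a 1-10 scale for a
# control group and a treatment group. These are ranks, not true
# measurements, so a rank-based (non-parametric) test is appropriate.
control   = [7, 8, 6, 9, 7, 8, 5, 7, 8, 6]
treatment = [4, 5, 6, 3, 5, 4, 6, 5, 3, 4]

# Mann-Whitney U compares the groups using only the order of the
# observations, not their numeric distances.
u_stat, p = stats.mannwhitneyu(treatment, control, alternative="two-sided")
print(f"U = {u_stat}, p = {p:.4f}")
```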
I do feel this is an interesting stance that the ASA and BASP have taken, and I think that other fields should also start to weigh the pros and cons of banning the p-value.
I, like Arielle, am a bit split on the topic. The article TJ shared mentions that the statisticians are not exactly arguing against the p-value itself, but against how badly it is misused. It then addresses some common misconceptions, such as:
"A p-value, or statistical significance, does not measure the size of an effect or the importance of a result."
I am constantly reminded of this in my everyday research, where I study phenotypic changes that seem to be important for a bacterium to infect the lung environment of a CF patient. It is very difficult to know whether the effect I see in the lab is anywhere near relevant to what I would see in the CF lung. We try to get around this by using differentiated bronchial epithelial cells or by growing cultures supplemented with CF sputum, but in the end it's nearly impossible to know.
I do argue that there is a time and a place for the p-value. But the idea that it is the most important number to weigh when deciding whether an experiment or figure belongs in a manuscript is wrong.
Therefore, like BASP, I hope that research scientists can move away from using "just a number" to establish significance, and that reviewers and journals will critique papers based on effect size, relevance, etc.
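To make the effect-size point concrete with a simulated toy example of my own (not real data): given a large enough sample, even a negligible difference between groups yields a "significant" p-value, which is exactly why a p-value cannot stand in for effect size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two groups differing by a trivially small amount (0.05 standard
# deviations). With 10,000 samples per group, the difference is
# "statistically significant" yet practically meaningless.
n = 10_000
a = rng.normal(0.00, 1.0, n)
b = rng.normal(0.05, 1.0, n)

_, p = stats.ttest_ind(a, b)

# Cohen's d: the standardized mean difference, a common effect-size
# measure that the p-value tells you nothing about.
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd

print(f"p = {p:.4g} (significant by the usual cutoff), Cohen's d = {d:.3f} (tiny effect)")
```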
This is a really tough topic, as there are solid arguments on either side of the table. Personally, I believe that whether a p-value should be used to validate the significance of an experimental finding is entirely situation-dependent.
That is to say, there are certain instances where the use of a p-value is vital (comparing the efficacy of a novel drug therapy to the current clinical standard by measuring a quantifiable result free of bias), whereas other situations do not warrant a p-value because the quantified unit is subjective (think patient responses to "How do you feel today on a scale of 1-10?" after drug administration). For the first case, let's use the example of a generalized renal disease and a novel drug thought to improve overall kidney function relative to supportive care. For this drug to be considered effective, BUN and creatinine (BUN/CRE) levels on blood chemistry must decrease in response to therapy induction. This provides a quantifiable, standardized system of measurement free of potential bias (the same sample will give identical results when run at multiple independent institutions).
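A sketch of how that first, quantifiable scenario might be analyzed (all values hypothetical): a paired test on each patient's BUN before and after therapy induction, since each patient serves as their own control.

```python
from scipy import stats

# Hypothetical BUN values (mg/dL) for the same patients before and
# after starting the novel drug.
bun_before = [62, 58, 71, 65, 80, 59, 74, 68]
bun_after  = [48, 50, 60, 52, 66, 49, 61, 55]

# One-sided paired t-test: did BUN decrease after therapy induction?
t_stat, p = stats.ttest_rel(bun_after, bun_before, alternative="less")
print(f"t = {t_stat:.2f}, p = {p:.4f}")
```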
For the latter scenario, measuring efficacy by asking the patient "How are you feeling today?" yields entirely subjective data that may or may not bear any relation to the treatment. In this instance, using a p-value to determine drug efficacy is entirely incorrect. This could easily be demonstrated by comparing responses between blinded placebo and experimental drug study groups.