I’d like to raise a question that’s occurred to me several
times during this class. Many of the problem sets we’ve been given have
required the use of a one-tailed t-test, and TJ has been pretty dismissive of
two-tailed t-tests, saying that they're used by people who either aren't
willing to commit to a real hypothesis or haven't collected enough preliminary
data to make one. Before
this class, however, I was taught that two-tailed tests were preferable and
that one-tailed tests were usually used inappropriately for, as TJ would call
it, “p-hacking.”
I can see merit in both arguments. Typically, we have a
pretty good idea of what we expect to happen in our experiments, and we’ve
often done enough preliminary tests to formulate a hypothesis that does predict
a change in only one direction. In that case, why not get as much statistical
power as we can from our test and devote the whole alpha allowance to testing
our exact prediction? However, in my years of research, I've rarely been in a
situation where I only cared about a change in one direction. If you’re testing
a drug, you certainly want to know if it improves your disease condition, but
everyone who might ever be affected by that drug hopes that you’re also making
sure it doesn’t increase the severity of the disease. In my dissertation
research, I’ve often predicted that knocking out an anti-inflammatory regulator
in the gut will increase expression of pro-inflammatory mediators and decrease
levels of barrier-promoting proteins, but I’ve frequently seen that what really
happens is that the protein level of a pro-inflammatory cytokine increases, but
its mRNA levels decrease because of negative feedback, or that the expression
levels of a barrier protein increase because it’s being degraded more rapidly.
If I tried to do statistical tests that only looked for change in one
direction, I’d miss these realities even though my fundamental hypothesis about
increased inflammation was correct.
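To make that trade-off concrete, here's a minimal sketch (Python with numpy/scipy; the data are simulated and the effect size is invented, so treat it as an illustration rather than anything from my actual experiments):

```python
# A minimal sketch (simulated data, invented effect size) of how the
# alpha allowance differs between one- and two-tailed t-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=10.0, scale=2.0, size=8)   # hypothetical baseline group
knockout = rng.normal(loc=12.0, scale=2.0, size=8)  # hypothesized increase

# Two-tailed: alpha is split between both directions of change.
t_two, p_two = stats.ttest_ind(knockout, control, alternative='two-sided')

# One-tailed: the entire alpha allowance goes to the predicted increase.
t_one, p_one = stats.ttest_ind(knockout, control, alternative='greater')

print(f"two-tailed p = {p_two:.4f}")
print(f"one-tailed p = {p_one:.4f}")  # half of p_two when t > 0
```

When the difference lands in the predicted direction, the one-tailed p is exactly half the two-tailed p, which is the appeal; when it lands the other way, the one-tailed test tells you essentially nothing, which is the problem I described above.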
If you decide to do a one-tailed test and see a change in
the opposite direction, not only do you get no statistical info about it, but
technically, if you want to evaluate whether that change is significant, you
have to go back and repeat the study with a different statistical plan.
Now in reality, no one’s going to do that; they’re just going to go back and
run a different statistical test on the same data in hopes of getting a
significant result. At that point, you’re changing your stats for a p-value, no
way around it. It's even worse if you then rerun the test one-tailed in
the opposite direction, because then you're essentially performing a two-tailed
test with an alpha of 0.1. If you did decide to go by the book and repeat the
experiment planning for a two-tailed or a different one-tailed test, then at
the very least you're wasting time, lab resources, and taxpayer/donor money,
and frequently also animal lives or human samples, which raises an ethical dilemma.
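A quick simulated check (same hedges as above: Python/scipy, made-up data) shows why the two-sided p-value is just twice the smaller one-sided p-value, so rejecting whenever either one-tailed test comes up under 0.05 is the same rule as a two-tailed test at alpha = 0.10:

```python
# Sketch (simulated data) of why two opposite one-tailed tests at
# alpha = 0.05 amount to a two-tailed test at alpha = 0.10.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=10)
b = rng.normal(0.0, 1.0, size=10)

_, p_greater = stats.ttest_ind(a, b, alternative='greater')
_, p_less = stats.ttest_ind(a, b, alternative='less')
_, p_two = stats.ttest_ind(a, b, alternative='two-sided')

# The two one-sided p-values cover complementary tails, and the
# two-sided p-value is twice the smaller of them:
assert np.isclose(p_two, 2 * min(p_greater, p_less))

# So rejecting whenever EITHER one-sided p < 0.05 is the same rule as
# rejecting whenever p_two < 0.10 -- a two-tailed test at alpha = 0.10.
```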
It seems to me that there are very few times when it would
be appropriate to do a one-tailed test (for instance, you’re transfecting with
an overexpression plasmid; you only care if there’s an increase in expression
of your target, because if not, you’re going to repeat the transfection), and
the rest of the time, you should just power your studies appropriately for a
two-tailed t-test rather than risking missing a potentially important
biological result.
I'm not dismissive of one tail or the other. I say pick the tail that makes the most sense, scientifically.
I think you raise an interesting point in this post. I was also taught that your "default" should be two-tailed in order to cover all your bases, so to speak. Many things in our field especially are so intertwined that your expected result is often not the result you see in your data. I have always been taught to let the data speak for itself, and by choosing a one-tailed test without absolute certainty (i.e., previous experimental data and a superior knowledge of the system), you are handing the data a microphone and a speech to read. In well-defined systems, and perhaps in repeat experiments, a one-tailed test is great and can clearly boost your analysis. However, in a highly complicated system or a really wide-open question, even if you have a hypothesis supported by some previous observation from your lab or others, it would be wise to use a two-tailed test.
From my very limited experience, I think the decision to do a one-tailed vs. two-tailed test depends on how much confidence you have that your initial hypothesis will be correct. There's always going to be some amount of 'unknown' (otherwise, why do the experiment?), and it grows the less data there is out there to support the hypothesis. However, if you can make a compelling argument based on some previous data, then I think going with a one-tailed test is alright in some instances.
I also agree with your past experiences. It seems that any time I have done statistical analysis in the past, it has been a two-tailed test. I imagine this is because labs are always looking to get the most out of their data and, as you described, not waste time, money, animal lives, or precious samples. Science is theoretically there only to increase our knowledge of a particular subject, so I see no harm in using statistical methods that "kill two birds with one stone," so to speak. If anything, it's almost a shame not to look at both sides of possible responses every time. That being said, this position slightly lends itself to the discussion of publishing "negative" data... while it is highly important to know that something increases (or decreases) an effect, it is just as important to know that there is no effect.
I would say the test you choose really depends on the question you ask. For example, there are three types of clinical trial: the superiority trial, the non-inferiority trial, and the equivalence trial. For the first two, the hypothesis is one-sided, because you expect the new drug to bring patients a better / not-worse outcome (either longer survival or lower mortality) than the standard treatment or placebo. For the last one, the hypothesis is two-sided, because you expect the new drug to give the same effectiveness as the standard treatment.
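As an illustration, the one-sided non-inferiority hypothesis could be sketched like this in Python/scipy (the margin, data, and outcome scale are all invented for the example):

```python
# Hedged sketch of a one-sided non-inferiority test; the margin, data,
# and outcome scale here are all invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
new_drug = rng.normal(loc=5.0, scale=1.5, size=30)  # hypothetical outcome scores
standard = rng.normal(loc=5.2, scale=1.5, size=30)
margin = 0.5  # largest deficit still considered "not worse"

# H1: mean(new_drug) > mean(standard) - margin. Shifting the comparator
# by the margin turns this into an ordinary one-tailed two-sample t-test.
t, p = stats.ttest_ind(new_drug, standard - margin, alternative='greater')
print(f"non-inferiority one-sided p = {p:.4f}")
```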
All hypothesis testing should start from the hypotheses. Once you formulate the statistical hypothesis, the test follows from it. From a very practical view, I would say that, generally, for a pilot study a two-sided test is appropriate, since we have no idea what the direction of change will be. For a confirmatory study, which usually builds on previous work, a clearer statistical hypothesis is available, and your test should follow the hypothesis you propose.