Bayes Theorem – remember that one, mentioned way, way back in Lecture 2? No, it isn’t some new age way of predicting who you’ll be romantically involved with this winter, but it does give rise to a whole approach to inference, Bayesian statistics, a subfield of statistics many scientists should be paying attention to.
Up until this point, we’ve mostly been learning frequentist statistics rather than Bayesian statistics (heavy on the linear regressions, chi-squares, and correlations, less so on multiple comparisons, etc.). This is evident in our HistoryStats projects: we’ve been looking at the lives and work of some of the founders of the frequentist school of thought, like Neyman, Pearson, and Wald. How do we best describe these frequentist statisticians? Well, let’s borrow a simple, intuitive analogy described in this StackExchange forum. According to “user28,” having a frequentist frame of mind is like hearing your phone go off and relying only on a model of your home’s layout to infer which room the sound is coming from. Having a Bayesian frame of mind means you may have that model in mind, but you also take into account the places where you’ve mistakenly left the phone in the past. Simply put, frequentists treat data as a frequency, a repeatable random sample, while Bayesians treat data as observations from the sample actually collected. Furthermore, frequentists believe that parameters are fixed, whereas Bayesians treat parameters as unknown quantities that can be described by probability distributions. (So… wouldn’t that make Fisher’s maximum likelihood a closet Bayesian statistic?)
TJ gave us some great examples of Bayes Theorem applied to real life, like interpreting the probabilities in clinical trials of cancer treatments. However, we never really got to see how Bayesian inference affects an experiment’s statistics and experimental design.
To understand the experimental design angle, we first need to understand how, in general, our beliefs get updated or modified by Bayes Theorem. Let’s say you are going to flip a coin 10 times and you want a probability distribution to describe these coin flips. Let h represent the probability of heads, and let p(h) represent the distribution you settle on prior to any coin flips. Then the coin is flipped and way more heads come up than you’d expect from a fair coin, say 8 heads. Using Bayesian inference, we update our prior belief about the coin – it now looks unfair. Our new belief can be modeled as p(h | f), where f is the number of heads observed in those 10 flips. This expression is read as “what is the probability distribution of heads given the number of heads resulting from 10 tosses [in this case 8]?” This seems like a reasonable update as we pare down our hypotheses to fit our experimental data. Mathematically, the update looks like p(h | f) = u(h, f) × p(h), where u(h, f) is an updating factor written out as u(h, f) = l(f | h) / l(f). Here l(f | h) is a likelihood function, the probability of observing 8 heads given the parameter value we modeled in the beginning. The denominator of the updating factor is the likelihood of the data without conditioning on any particular parameter value. Because Bayesian statistics doesn’t treat the parameter as a single fixed value, that likelihood has to be averaged over all the values the parameter could take. Therefore, the likelihood of the data can be written as an integral, l(f) = ∫ l(f | h) p(h) dh (this is similar to a general expectation value), which is a weighted average of likelihoods across all possible parameter values. Simply put, the updating factor is a ratio that tells you which parameter values are most consistent with the data.
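To make the update concrete, here is a minimal sketch of the coin-flip update in Python. The grid of candidate h values, the flat prior, and the use of NumPy/SciPy are my own choices for illustration and aren’t part of the coin example above; the sum over the grid stands in for the integral l(f) = ∫ l(f | h) p(h) dh.

import numpy as np
from scipy.stats import binom

flips, heads = 10, 8                      # f: 8 heads observed in 10 tosses

h = np.linspace(0, 1, 1001)               # candidate values for the probability of heads
prior = np.ones_like(h) / len(h)          # p(h): a flat prior before any flips

likelihood = binom.pmf(heads, flips, h)   # l(f|h): chance of 8/10 heads at each candidate h

# l(f) = integral of l(f|h) p(h) dh, approximated here as a weighted sum over the grid
marginal = np.sum(likelihood * prior)

# p(h|f) = [l(f|h) / l(f)] * p(h): the updating factor times the prior
posterior = (likelihood / marginal) * prior

print("most probable h after 10 flips:", h[np.argmax(posterior)])   # shifts toward ~0.8

With a flat prior the posterior simply tracks the likelihood, so the most probable value of h lands near 0.8; a prior that favored a fair coin would pull that peak back toward 0.5.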
(Image caption: Darth Vader: crafty with a lightsaber and some conditional probabilities.)
How does this play out in the lab? Let’s take a hypothetical animal trial where dose concentrations of many drugs are tested on large numbers of animals to gauge their potencies. The lab wants to apply regression analyses to the different drugs based on the animals they inject the drugs into. For a single drug, the experimental design included six equally spaced doses given to ten mice each, so 60 animals to test a range of concentrations for that drug. The investigators measured the number of surviving mice one week after drug administration. It turns out that about 90% of mice died at high concentrations of the drug, while 10–20% died at low concentrations. After each of the experiments, maximum likelihood estimation was used to estimate an LD50 value (the dose at which the probability of a mouse dying is 50%). As it turns out, the investigators used results from the first few sets of experiments to build a distribution for the following experiments, in anticipation of constructing an updating factor like the one described above. In total, if 50 drugs are tested with a similar experimental design, the investigators can use these 50 LD50 values as a sample from a distribution of LD50 values.
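For the LD50 step, here is a rough sketch of how one maximum likelihood fit might look in Python. The logistic dose–response curve, the specific dose values, and the made-up death counts are assumptions for illustration, not details from the hypothetical trial above.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit              # the logistic function

doses  = np.array([1., 2., 3., 4., 5., 6.])  # six equally spaced doses
n_mice = np.full(6, 10)                      # ten mice per dose
deaths = np.array([1, 2, 4, 6, 8, 9])        # hypothetical deaths one week after dosing

def neg_log_lik(params):
    a, b = params                                        # intercept and slope on the logit scale
    p = np.clip(expit(a + b * doses), 1e-9, 1 - 1e-9)    # P(death) at each dose
    # binomial log-likelihood of the observed deaths, negated for the minimizer
    return -np.sum(deaths * np.log(p) + (n_mice - deaths) * np.log(1 - p))

fit = minimize(neg_log_lik, x0=[0.0, 1.0])   # maximum likelihood estimates of (a, b)
a_hat, b_hat = fit.x

ld50 = -a_hat / b_hat                        # dose where P(death) = 0.5
print("estimated LD50:", ld50)

Repeating a fit like this for each of the 50 drugs yields 50 LD50 estimates, and it is that collection of estimates the investigators can treat as a sample from a distribution of LD50 values, ready to serve as the prior in an updating factor for the next drug’s experiment.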
Overall, these Bayesian inferences and their statistics are mathematically rooted in Bayes Theorem, which is built on conditional probability. Those conditional probabilities make the framework easy to update and a noteworthy design for scientists to consider, because writing grant proposals on purely frequentist assumptions can be risky when we try to specify a model for data without any prior knowledge of the system.