When we fit models to data, we need to be careful to avoid
fancy mistakes. Regression functions can get pretty complicated,
which suits our wish to let the data explain more.
At that point, we can over-interpret the data and overlook
that another analysis might make more sense in our research context.
I just finished my honors thesis, and I had to persuade myself
not to play around with what I have learned from this statistics course,
which is advanced for an undergraduate student. The research I worked on asks whether a
plant, GBL, can inhibit the growth of a bacterium, ATCC6919. We are interested in
this bioactivity because it could be an alternative cure for infections caused by this
bacterium.
Extracts were made from three different parts of
the plant (leaves, branches, and seeds) using two different extraction
solvents: ethanol and water. The question my data analysis needs to answer is
not only whether the GBL extracts are active against ATCC6919 growth (%
inhibition > 50%), but also whether the tree part and the extraction solvent contribute
to the effectiveness of the extracts.
The ATCC6919 culture was treated with GBL extracts at a range
of concentrations, so the antibacterial investigation generates
many dose-response curves, like the one shown below. Since the plot shows
a clear trend, with higher percentage inhibition at higher
extract doses, there is a pretty good chance that we can find a regression model
that fits most of the data well. I could build a “dose-response regression
model” for the extracts. However, I recalled that we were interested in finding
the extracts that were active (% inhibition > 50%). A regression
model could therefore be a statistically perfect fit and still be scientific nonsense.
I had to discard the idea of a nonlinear regression curve model.
Then I wondered whether I could compare the inhibition
produced by the two extraction solvents by comparing the fit of the data to two
models. The best-fit slope of the regression line should be the difference
between the two group means. Thus, I defined a variable X for the extraction method,
and arbitrarily assigned X=1 to aqueous extracts and X=2 to
ethanolic extracts. The Y axis was the percentage inhibition of the extracts at
the same concentration. It would look like the linear regression graph shown below.
However, doing so would neglect the other factor, the tree part, which
can also contribute to the difference in inhibition. I would run into the
opposite problem of over-fitting the data: over-simplifying it.
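As a quick check of the claim that the slope equals the difference between the two group means when X is coded 1 and 2, here is a minimal Python sketch. The inhibition values are made up for illustration (they are not my thesis data), and the helper name ols_slope is my own.

```python
from statistics import mean

# Hypothetical % inhibition values at one fixed concentration (NOT real data).
aqueous = [32.0, 41.0, 38.0]    # coded X = 1
ethanolic = [55.0, 61.0, 58.0]  # coded X = 2

x = [1] * len(aqueous) + [2] * len(ethanolic)
y = aqueous + ethanolic


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


slope = ols_slope(x, y)
mean_difference = mean(ethanolic) - mean(aqueous)

# The two numbers agree because the two X codes are exactly one unit apart.
print(slope, mean_difference)
```

The sketch only uses the solvent as a predictor, so it has the same blind spot described above: the tree part is ignored.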
If we replace the regression models with a two-way ANOVA, it is
easier to see whether the plant part, the extraction method, or their
interaction makes the inhibition of the extracts differ. If you want something more basic, multiple Student's t-tests would work, too.
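To make the two-way ANOVA idea concrete, here is a rough Python sketch that computes the sums of squares and F ratios by hand for a balanced design (same number of replicates in every cell). The numbers are invented, the function name two_way_anova is my own, and the sketch stops at the F ratios: getting p-values would need an F distribution from a statistics package.

```python
from statistics import mean

# Hypothetical % inhibition at one concentration, 2 replicates per cell.
# Keys are (plant part, solvent); the values are made up for illustration.
data = {
    ("leaf", "water"): [30.0, 34.0],
    ("leaf", "ethanol"): [52.0, 56.0],
    ("branch", "water"): [20.0, 24.0],
    ("branch", "ethanol"): [40.0, 44.0],
    ("seed", "water"): [10.0, 14.0],
    ("seed", "ethanol"): [60.0, 64.0],
}


def two_way_anova(data):
    """Sums of squares and F ratios for a balanced two-way design."""
    parts = sorted({p for p, _ in data})
    solvents = sorted({s for _, s in data})
    a, b = len(parts), len(solvents)
    n = len(next(iter(data.values())))  # replicates per cell (assumed equal)

    grand = mean(v for cell in data.values() for v in cell)
    part_mean = {p: mean(v for s in solvents for v in data[(p, s)]) for p in parts}
    solv_mean = {s: mean(v for p in parts for v in data[(p, s)]) for s in solvents}
    cell_mean = {key: mean(cell) for key, cell in data.items()}

    # Main effects, interaction, and within-cell (error) sums of squares.
    ss_part = b * n * sum((part_mean[p] - grand) ** 2 for p in parts)
    ss_solv = a * n * sum((solv_mean[s] - grand) ** 2 for s in solvents)
    ss_inter = n * sum(
        (cell_mean[(p, s)] - part_mean[p] - solv_mean[s] + grand) ** 2
        for p in parts
        for s in solvents
    )
    ss_error = sum(
        (v - cell_mean[key]) ** 2 for key, cell in data.items() for v in cell
    )

    ms_error = ss_error / (a * b * (n - 1))
    return {
        "ss_part": ss_part,
        "ss_solvent": ss_solv,
        "ss_interaction": ss_inter,
        "ss_error": ss_error,
        "F_part": (ss_part / (a - 1)) / ms_error,
        "F_solvent": (ss_solv / (b - 1)) / ms_error,
        "F_interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }


result = two_way_anova(data)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

A large F ratio for the interaction term would suggest that the effect of the solvent depends on which part of the tree was extracted, which a single regression on solvent alone cannot show.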
To sum up, when we try to fit models to data, before
thinking about which type of regression model fits better, check whether
another (and simpler) method fits the question better.
It is important to determine what statistical test you are going to run before you begin the experiment. This is related to the experimental set-up that you plan to use. I think you're confusing "fitting a model to data" and picking a statistical model.
I'm not sure, but I agree that using an ANOVA will help you to determine differences across groups (ex: seed to leaf, [conc]1 to [conc]2, etc).
(Also, ethanol will inhibit growth on its own, so make sure you're running the right controls i.e.: ethanol only).
Huang, that isn't scientific nonsense at all. Your extracts have widely different potencies, which is yuuuuge! Pharmacologists like me are big fans of different potencies.
Transforming data to a common ceiling (100% inhibition) washed away your ability to compare the maximal effects, which apparently interest you more. Go back to the original data...don't do the %max transformation (you must have missed the class where I warned against that....)
It is good to know that the different potencies could be interesting.
But I think I did not transform my data here, since I made the graph straight from plotting extract concentration vs. percentage inhibition.
Because we were using the OD read to estimate turbidity of bacteria culture before and after treatment, the percentage inhibition can be 100% if there is no change in OD read. Please correct me if you think I did not spot where I transformed my data.