When it comes to fitting a model to your data, it is important to remember that a high R² value (close to 1.0) does not necessarily mean that your model is the best at describing the actual population.
Looking at similar data sets and studies can help you determine whether your model and R² are consistent with what has been done in your field. Jim Frost discusses how physical processes tend to be predictable and have low variation, so models of them tend to have high R² values. However, if you are a psychologist studying human behavior, which is highly variable, a high R² value could indicate that your model, or the sample it was fit to, does not represent the population well.
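
To make the point about variability concrete, here is a small Python simulation. It is an illustrative sketch, not something taken from Frost's article. The same underlying straight-line relationship is sampled twice, once with little noise and once with a lot, and R² is computed for an ordinary least-squares line fit to each sample. The slope, noise levels, and sample size are arbitrary assumptions.

```python
import random

def r_squared_simple(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return R²."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total variation
    return 1 - ss_res / ss_tot

def simulate(noise_sd, n=200, seed=1):
    """Draw n points from the assumed process y = 2x + Gaussian noise, then return R²."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, noise_sd) for x in xs]
    return r_squared_simple(xs, ys)

low_noise = simulate(noise_sd=0.5)    # stands in for a low-variation "physical" process
high_noise = simulate(noise_sd=10.0)  # stands in for highly variable behavioral data
print(f"low noise R^2:  {low_noise:.3f}")
print(f"high noise R^2: {high_noise:.3f}")
```

The underlying relationship is identical in both runs. Only the amount of noise differs, and that alone drives R² far apart. This is why a "good" R² depends on the field.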
Frost goes on to address a few possible explanations for a high R² value. For example, R² is already a biased statistic: because it is calculated from your sample data, it tends to overestimate how well the model fits the population. Another reason your R² may be too high is that you tried many different models on your data just to find the perfect fit. As we’ve discussed in class, it is very important to choose your statistical model before you perform any experiments.
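
One way to see the upward bias is to keep adding predictors that have nothing to do with the outcome. The Python sketch below is an illustration under assumed, made-up data and is not taken from Frost. Because each larger model contains the smaller one, an ordinary least-squares fit can never lower R² when a predictor is added. For comparison, it also prints adjusted R², a standard correction that the post does not discuss. Adjusted R² penalizes each extra predictor using 1 - (1 - R²)(n - 1)/(n - p - 1). All helper names here are hypothetical.

```python
import random

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_r_squared(rows, ys):
    """OLS with an intercept via the normal equations; rows hold predictor values. Returns R²."""
    X = [[1.0] + row for row in rows]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(len(X))) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * ys[i] for i in range(len(X))) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    my = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R² for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = random.Random(42)
n = 30
x = [rng.uniform(0, 10) for _ in range(n)]
ys = [1.5 * xi + rng.gauss(0, 5) for xi in x]                      # y depends only on x
junk = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(n)]    # predictors unrelated to y

results = []
for extra in range(0, 11, 2):
    rows = [[x[i]] + junk[i][:extra] for i in range(n)]
    r2 = fit_r_squared(rows, ys)
    p = 1 + extra
    adj = adjusted_r_squared(r2, n, p)
    results.append((p, r2, adj))
    print(f"predictors={p:2d}  R^2={r2:.3f}  adjusted R^2={adj:.3f}")
```

Plain R² creeps upward as the pure-noise predictors are added. Adjusted R² is always at or below R². This illustrates why trying model after model and keeping the best-looking R² tends to overstate how well the model describes the population.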
A very important problem
encountered when fitting a model to data is “overfitting.” This
means that your model is too complicated for your data set. If you think about it,
you can force a sufficiently complicated equation to fit your data perfectly. But remember,
your data are collected from a sample that is generally meant to represent a
population. An overly complicated model ends up fitting the random noise in your
sample and may not accurately predict future data points. Overfitting can make
your statistical values, including R², misleading. A larger sample size can help
overcome this problem and lets you reliably estimate the parameters of a more
complex model. Remember, as Frost said in an accompanying article:

“The more you want to learn, the larger your sample size must be.”
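
The Python sketch below illustrates overfitting, again on made-up data and not drawn from Frost. Ten noisy points are drawn from an assumed straight-line process. Two models are fit to them: an ordinary least-squares line, and a polynomial that passes exactly through all ten points. Both models are then scored on 200 fresh points from the same process. The helper names and all numbers are hypothetical choices for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_interpolating_polynomial(xs, ys):
    """Polynomial of degree n-1 through every point (Lagrange form): a deliberately overfit model."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p

def r_squared(model, xs, ys):
    """Fraction of the variation in ys explained by the model's predictions."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def mse(model, xs, ys):
    """Mean squared prediction error."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(7)

def sample(n):
    """Draw n points from the assumed 'true' process y = 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 2) for x in xs]
    return xs, ys

train_x, train_y = sample(10)   # the small sample we fit to
test_x, test_y = sample(200)    # "future data points" from the same population

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

for name, model in [("straight line", line), ("interpolating poly", poly)]:
    print(f"{name:20s} training R^2={r_squared(model, train_x, train_y):.3f}  "
          f"test MSE={mse(model, test_x, test_y):.1f}")
```

The polynomial earns a perfect training R² of 1.0 because it passes through every sample point, noise included. On new data, however, its prediction error is far larger than that of the simple line. Collecting more training points, as Frost suggests, is one way to give a more complex model enough information to estimate its parameters.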

I haven’t performed any
experiments in my laboratory that require fitting a model to data, so I would
be interested to know whether any of you have come across these issues. Do you
encounter R² values that are unusual for your field? Is overfitting a common issue in your lab?