## Wednesday, April 6, 2016

### Finding the Right Fit

The idea behind fitting a model is to find the best-fit values of the parameters that define it. According to Motulsky, a common mistake in statistical modeling is overfitting, that is, chasing a “perfect model”. The goal of a model is not to describe the data perfectly, since a model that does so may have too many variables and parameters to be useful. Nonlinear regression can fit any model that defines Y as a function of X and, as the name suggests, the relationship between Y and X can be curved.

In the example above we are given census data showing the U.S. population on the left; on the right are several model fits, ranging from exponential all the way to a sixth-degree polynomial. Most of the models seem to fit the data. However, if we extrapolate each fit to predict future population values, the behavior of the sixth-degree polynomial beyond the data range makes it a poor choice, and it can be rejected without any need to calculate goodness of fit (unless we are headed for complete and total annihilation).
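To make this concrete, here is a minimal sketch, assuming NumPy is available. The data are entirely synthetic (made-up exponential growth plus noise, not the census figures), and the helper name `fit_and_report` is only for illustration. It fits a quadratic and a sixth-degree polynomial to the same points, then compares how closely each matches the observed range and what each predicts beyond it.

```python
import numpy as np

# Synthetic, made-up data (NOT the census figures): smooth growth plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 21)        # rescaled "time" within the observed range
y = np.exp(0.8 * x) + rng.normal(0.0, 0.05, x.size)

x_future = np.linspace(1.0, 2.0, 5)   # points beyond the observed range


def fit_and_report(degree):
    """Least-squares polynomial fit; returns in-sample RSS and the extrapolation."""
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    forecast = np.polyval(coeffs, x_future)
    return rss, forecast


rss_low, forecast_low = fit_and_report(2)
rss_high, forecast_high = fit_and_report(6)

print("in-sample RSS, degree 2:", rss_low)
print("in-sample RSS, degree 6:", rss_high)
print("extrapolation, degree 2:", np.round(forecast_low, 2))
print("extrapolation, degree 6:", np.round(forecast_high, 2))
print("true curve (no noise):  ", np.round(np.exp(0.8 * x_future), 2))
```

The sixth-degree fit can never have a larger in-sample residual sum of squares than the quadratic, because the quadratic is a special case of it. That is exactly why a close in-sample fit alone says little about which model to trust outside the data range; comparing the printed extrapolations against the noise-free curve is the more telling check.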

In this last example, researchers set out to model the growth of the native Mexican turkey so they could estimate the period of maximum instantaneous growth and bring the birds to market at optimal weight. Previous literature had shown that the Richards model (in red) was the most appropriate function for estimating growth curves in poultry; this study, however, suggests that in this particular case a fourth-order polynomial (in blue) is better at estimating the maximum growth period.

Which model is right, and which one is wrong? This goes back to the issue of overfitting: a model with too few parameters won't fit the data well, while one with too many will fit, but its confidence intervals will be wide. If the goal is to predict future values, a model with too many parameters won't do it well, and if the goal is to interpret the parameter values scientifically, the confidence intervals (CIs) will be too wide to be useful. And finally, something that is only tangentially related to fitting models:

Sources: Census data polynomial curve fitting, Mexican Turkeys, Doge, and Motulsky.