Thursday, March 31, 2016

Use Prism sample data to understand model fitting

Having trouble fitting a model to your data? Don’t worry: Prism is there to help. The program walks you through the basics of model fitting and even includes sample data sets for you to practice on. Prism offers the following analyses for fitting lines and curves:
  • Linear regression
  • Deming linear regression (use when both X and Y variables are subject to error)
  • Nonlinear regression
  • Spline and Lowess (for curve fitting without selecting a model)
  • Interpolating from a standard curve

For simplicity’s sake, I’ll walk you through only linear and nonlinear regression model fitting in Prism.

Linear Regression
Linear regression is used when you can describe the data using the equation y=mx+b, where m is the slope and b is the y-intercept. Knowing x, you can predict y from the best-fit line that Prism calculates. To start, you can use the sample data Linear Regression – predicting the slope. You should get a data table that looks like this:


Minutes   Control (replicates)   Treated (replicates)
1.0       34, 29, 28             31, 29, 44
2.0       38, 49, 53             61, 89
3.0       57, 55                 78, 99, 77
4.0       65, 65, 50             93, 111, 109
5.0       76, 91, 84             109, 141
6.0       79, 93, 98             134, 145, 129
7.0       100, 107, 89           156, 134, 167
8.0       105, 123, 119          167, 180
9.0       121, 143, 134          178, 192, 175
10.0      135, 156               198, 203, 234

Next, it is time to analyze the data: click the Analyze button and then select linear regression. Prism reports the following results:

                                  Control             Treated
Best-fit values
  Slope                           12.42 ± 0.5679      17.96 ± 0.8684
  Y-intercept when X=0.0          17.42 ± 3.471       28.48 ± 5.446
  X-intercept when Y=0.0          -1.402              -1.585
  1/slope                         0.08050             0.05568
95% Confidence Intervals
  Slope                           11.26 to 13.59      16.17 to 19.75
  Y-intercept when X=0.0          10.28 to 24.55      17.26 to 39.69
  X-intercept when Y=0.0          -2.163 to -0.7630   -2.432 to -0.8818
Goodness of Fit
  R square                        0.9485              0.9448
  Sy.x                            8.440               13.13
Is slope significantly non-zero?
  F                               478.5               427.8
  DFn, DFd                        1.000, 26.00        1.000, 25.00
  P value                         < 0.0001            < 0.0001
  Deviation from zero?            Significant         Significant
Data
  Number of X values              10                  10
  Maximum number of Y replicates  3                   3
  Total number of values          28                  27
  Number of missing values        2                   3







When determining whether or not the model fits your data, take a look at the R² value (listed as "R square" in the results). This is called the coefficient of determination, and it tells you how well the best-fit line fits the data: it is the fraction of the variation in Y that the model explains. The closer the R² value is to 1, the better the fit. In this example, R² is 0.9485 for the control group and 0.9448 for the treated group. The good fit is further confirmed by the graph, in which the observed values of Y deviate little from the values the model predicts.
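
If you’d like to see where these numbers come from, the short Python sketch below recomputes the slope, Y-intercept and R² for the control group with ordinary least squares. It is only an illustration and not a description of how Prism works internally: the function name fit_line and the dictionary layout are my own choices, and the values are the control replicates from the data table above.

```python
# Control-group replicates from the sample data table, keyed by minute.
control = {
    1: [34, 29, 28],
    2: [38, 49, 53],
    3: [57, 55],
    4: [65, 65, 50],
    5: [76, 91, 84],
    6: [79, 93, 98],
    7: [100, 107, 89],
    8: [105, 123, 119],
    9: [121, 143, 134],
    10: [135, 156],
}


def fit_line(points):
    """Ordinary least squares for y = m*x + b; returns (m, b, r_squared)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Every replicate counts as its own (x, y) point.
points = [(x, y) for x, ys in control.items() for y in ys]
slope, intercept, r_squared = fit_line(points)
print(f"slope={slope:.2f}  intercept={intercept:.2f}  R2={r_squared:.4f}")

# Knowing x, predict y from the fitted line (here at 5.5 minutes).
predicted = slope * 5.5 + intercept
print(f"predicted y at x=5.5: {predicted:.1f}")
```

Run as written, the first line of output should agree with the Control column of the results (slope about 12.42, Y-intercept about 17.42, R² about 0.9485).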

Nonlinear Regression

Nonlinear regression is used when the relationship between x and y cannot be described by a straight line. These models use other functions to derive a best-fit curve and may depend on multiple independent variables. We’ll use an enzyme kinetics model to see how nonlinear regression can be used to fit your data. Use the sample data provided by Prism titled Enzyme Kinetics – Michaelis-Menten. The data table should look like this:


[Substrate]   Enzyme Activity (replicates)
2             265, 241, 195
4             521, 487, 505
6             662, 805, 754
8             885, 901, 898
10            884, 850
12            852, 914
14            932, 1110, 851
16            987, 954, 999
18            984, 961, 1105
20            954, 1021, 987


When analyzing the data, use the Michaelis-Menten model, Y = Vmax*X/(Km + X), where X is the substrate concentration and Y is the enzyme activity. The results of the Prism analysis are as follows:



                                  Enzyme Activity
Michaelis-Menten
Best-fit values
  Vmax                            1353
  Km                              5.886
Std. Error
  Vmax                            75.93
  Km                              0.9498
95% Confidence Intervals
  Vmax                            1197 to 1509
  Km                              3.933 to 7.839
Goodness of Fit
  Degrees of Freedom              26
  R square                        0.9041
  Absolute Sum of Squares         170343
  Sy.x                            80.94
Constraints
  Km                              Km > 0.0
Number of points
  Analyzed                        28
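
The same kind of fit can be sketched outside Prism in plain Python. The example below is a rough illustration under my own assumptions, not Prism’s algorithm: it fits Y = Vmax*X/(Km + X) by damped Gauss-Newton least squares, it starts from a crude initial guess (the largest observed activity for Vmax, and the substrate concentration whose mean activity is closest to half of that for Km), and every function name in it is made up for this post.

```python
# Enzyme activity replicates from the sample table, keyed by substrate concentration.
activity = {
    2: [265, 241, 195],
    4: [521, 487, 505],
    6: [662, 805, 754],
    8: [885, 901, 898],
    10: [884, 850],
    12: [852, 914],
    14: [932, 1110, 851],
    16: [987, 954, 999],
    18: [984, 961, 1105],
    20: [954, 1021, 987],
}
points = [(s, v) for s, values in activity.items() for v in values]


def predict(s, vmax, km):
    """Michaelis-Menten activity at substrate concentration s."""
    return vmax * s / (km + s)


def residual_ss(vmax, km):
    """Sum of squared differences between observed and predicted activity."""
    return sum((v - predict(s, vmax, km)) ** 2 for s, v in points)


def fit_michaelis_menten(vmax, km, max_iter=100, tol=1e-9):
    """Damped Gauss-Newton least squares, starting from the given guesses."""
    for _ in range(max_iter):
        # Build the 2x2 normal equations (J^T J) * step = J^T r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for s, v in points:
            d_vmax = s / (km + s)              # derivative of the model w.r.t. Vmax
            d_km = -vmax * s / (km + s) ** 2   # derivative of the model w.r.t. Km
            r = v - predict(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            g1 += d_vmax * r
            g2 += d_km * r
        det = a11 * a22 - a12 * a12
        step_vmax = (a22 * g1 - a12 * g2) / det
        step_km = (a11 * g2 - a12 * g1) / det

        # Halve the step while it would make Km non-positive or worsen the fit.
        current = residual_ss(vmax, km)
        scale = 1.0
        while scale > 1e-8 and (
            km + scale * step_km <= 0
            or residual_ss(vmax + scale * step_vmax, km + scale * step_km) > current
        ):
            scale /= 2
        vmax += scale * step_vmax
        km += scale * step_km
        if abs(scale * step_vmax) < tol * abs(vmax) and abs(scale * step_km) < tol * km:
            break
    return vmax, km


# Crude starting values taken from the data themselves (an assumption of this sketch).
vmax0 = max(v for _, v in points)
km0 = min(activity, key=lambda s: abs(sum(activity[s]) / len(activity[s]) - vmax0 / 2))

vmax, km = fit_michaelis_menten(vmax0, km0)
mean_v = sum(v for _, v in points) / len(points)
total_ss = sum((v - mean_v) ** 2 for _, v in points)
r_squared = 1 - residual_ss(vmax, km) / total_ss
print(f"Vmax={vmax:.0f}  Km={km:.3f}  R2={r_squared:.4f}")
```

If the iteration converges as expected, the printed values should land close to the best-fit values in the table (Vmax near 1353, Km near 5.886, R² near 0.9041). Nonlinear fits depend on their starting values, so a different initial guess may need more iterations.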




Notice again that the R² value is close to 1 and that, in the generated graph, the observed values deviate very little from the fitted curve. To show you that this model fits the data better than a straight line does, let’s see what happens if we fit a linear regression model to the same data instead.

                                  Enzyme Activity
Best-fit values
  Slope                           35.79 ± 4.456
  Y-intercept when X=0.0          408.6 ± 55.70
  X-intercept when Y=0.0          -11.42
  1/slope                         0.02794
95% Confidence Intervals
  Slope                           26.63 to 44.95
  Y-intercept when X=0.0          294.1 to 523.1
  X-intercept when Y=0.0          -19.32 to -6.649
Goodness of Fit
  R square                        0.7128
  Sy.x                            140.1
Is slope significantly non-zero?
  F                               64.53
  DFn, DFd                        1.000, 26.00
  P value                         < 0.0001
  Deviation from zero?            Significant
Data
  Number of X values              10
  Maximum number of Y replicates  3
  Total number of values          28
  Number of missing values        2

As you can see, the R² value drops from 0.9041 to 0.7128, and when you compare the linear and nonlinear best-fit lines, you see that the observed values deviate much more from the values predicted by the straight line. That is why choosing a model that fits the data is so important: if we use a poorly fitting model, we are unlikely to make accurate predictions of the dependent variable from the independent variable.
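
To make the comparison concrete, here is a small Python sketch that scores both models on the enzyme data using the best-fit values reported in the two results tables (Vmax = 1353 and Km = 5.886 for Michaelis-Menten; slope = 35.79 and Y-intercept = 408.6 for the straight line). The helper name r_squared is my own, and R² is computed here as 1 minus the residual sum of squares divided by the total sum of squares.

```python
# Enzyme activity replicates from the sample table, keyed by substrate concentration.
activity = {
    2: [265, 241, 195],
    4: [521, 487, 505],
    6: [662, 805, 754],
    8: [885, 901, 898],
    10: [884, 850],
    12: [852, 914],
    14: [932, 1110, 851],
    16: [987, 954, 999],
    18: [984, 961, 1105],
    20: [954, 1021, 987],
}
points = [(s, v) for s, values in activity.items() for v in values]

# Total variation of the observed activities around their mean.
mean_v = sum(v for _, v in points) / len(points)
total_ss = sum((v - mean_v) ** 2 for _, v in points)


def r_squared(model):
    """Fraction of the total variation explained by model(s)."""
    residual_ss = sum((v - model(s)) ** 2 for s, v in points)
    return 1 - residual_ss / total_ss


# Both models use the rounded best-fit values reported by Prism.
r2_linear = r_squared(lambda s: 35.79 * s + 408.6)
r2_michaelis_menten = r_squared(lambda s: 1353 * s / (5.886 + s))

print(f"linear R2           = {r2_linear:.4f}")
print(f"Michaelis-Menten R2 = {r2_michaelis_menten:.4f}")
```

Because the parameters are the rounded values from the tables, the two results should match the reported R² values to about three decimal places, with the Michaelis-Menten curve explaining clearly more of the variation than the straight line.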

For a more thorough walkthrough, as well as model fitting examples using the other analyses provided by Prism, click here

1 comment:

  1. Julia, this walkthrough really helped me understand model fitting on a whole new level. I think it really highlights reasons behind why a model must fit well to provide an accurate representation of dependent and independent variables. The topic of linear versus non-linear regression is one I have had trouble grasping in the past - putting things into context with the "y=mx+b" equation really helped me connect the dots. I finally feel confident in my ability to fit models to primary data for accurate analysis!
