Unbiased Research: Use Prism sample data to understand model fitting

Thursday, March 31, 2016

Having trouble fitting a model to your data? Don’t worry, because Prism is there to help. The program walks you through the basics of model fitting and even includes sample data sets to practice on. Prism offers the following analyses for fitting lines and curves:

• Linear regression
• Deming linear regression (use when both X and Y variables are subject to error)
• Nonlinear regression
• Spline and Lowess (for curve fitting without selecting a model)
• Interpolating from a standard curve

For simplicity’s sake, I’ll walk you through linear and nonlinear regression model fitting in Prism.

Linear Regression

Linear regression is used when you can describe the data using the equation y=mx+b, where m is the slope and b is the Y-intercept. Knowing x, you can predict y from the best-fit line that Prism calculates. To start, open the sample data set Linear Regression – predicting the slope. You should get a data table that looks like this:

Minutes	Control			Treated
1.0	34.	29.	28.	31.	29.	44.
2.0	38.	49.	53.	61.		89.
3.0	57.		55.	78.	99.	77.
4.0	65.	65.	50.	93.	111.	109.
5.0	76.	91.	84.		109.	141.
6.0	79.	93.	98.	134.	145.	129.
7.0	100.	107.	89.	156.	134.	167.
8.0	105.	123.	119.	167.		180.
9.0	121.	143.	134.	178.	192.	175.
10.0	135.	156.		198.	203.	234.

Next, analyze the data: click the Analyze button and then select linear regression.

	Control	Treated
Best-fit values
Slope	12.42 ± 0.5679	17.96 ± 0.8684
Y-intercept when X=0.0	17.42 ± 3.471	28.48 ± 5.446
X-intercept when Y=0.0	-1.402	-1.585
1/slope	0.08050	0.05568
95% Confidence Intervals
Slope	11.26 to 13.59	16.17 to 19.75
Y-intercept when X=0.0	10.28 to 24.55	17.26 to 39.69
X-intercept when Y=0.0	-2.163 to -0.7630	-2.432 to -0.8818
Goodness of Fit
R square	0.9485	0.9448
Sy.x	8.440	13.13
Is slope significantly non-zero?
F	478.5	427.8
DFn, DFd	1.000, 26.00	1.000, 25.00
P value	< 0.0001	< 0.0001
Deviation from zero?	Significant	Significant
Data
Number of X values	10	10
Maximum number of Y replicates	3	3
Total number of values	28	27
Number of missing values	2	3

When determining whether the model fits your data, look at the R² value. This is the coefficient of determination, the fraction of the variation in Y that the best-fit line accounts for. The closer R² is to 1, the better the fit. In this example, R² is 0.9485 for the control group and 0.9448 for the treated group. The graph further confirms the good fit: the observed Y values deviate little from the values the model predicts.
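
If you want to see where these numbers come from, the Control column can be refit outside Prism. The sketch below is a plain ordinary least squares calculation in Python, not Prism's own code. It assumes each replicate counts as a separate point and that blank cells are simply skipped, and the function and variable names are my own.

```python
from math import sqrt

# Control replicates from the sample table, keyed by minutes.
# Blank cells in the table are omitted (2 missing values, 28 points).
control = {
    1: [34, 29, 28],
    2: [38, 49, 53],
    3: [57, 55],
    4: [65, 65, 50],
    5: [76, 91, 84],
    6: [79, 93, 98],
    7: [100, 107, 89],
    8: [105, 123, 119],
    9: [121, 143, 134],
    10: [135, 156],
}


def linear_fit(groups):
    """Ordinary least squares of y on x, one point per replicate."""
    xs = [x for x, vals in groups.items() for _ in vals]
    ys = [y for vals in groups.values() for y in vals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope and intercept of y = m*x + b from the usual sums of squares.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R squared = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)

    return {
        "slope": slope,
        "intercept": intercept,
        "x_intercept": -intercept / slope,   # where the line crosses Y = 0
        "r_squared": 1 - ss_res / ss_tot,
        "sy_x": sqrt(ss_res / (n - 2)),      # residual SD, n - 2 degrees of freedom
        "n": n,
    }


fit = linear_fit(control)
for name, value in fit.items():
    print(f"{name}: {value:.4g}")
```

The printed slope, intercepts, R² and Sy.x should agree with the Control column of the results table to the precision shown there. Swapping in the Treated replicates lets you check the second column the same way.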

Nonlinear Regression

Nonlinear regression is used when the relationship between x and y cannot be described by a straight line. These models use other functions to derive a best-fit curve and may depend on multiple independent variables. We’ll use an enzyme kinetics model to see how nonlinear regression can be used to fit your data. Use the sample data set provided by Prism titled Enzyme Kinetics – Michaelis-Menten. The data table should look like this:

[Substrate]	Enzyme Activity
2.	265.	241.	195.
4.	521.	487.	505.
6.	662.	805.	754.
8.	885.	901.	898.
10.	884.	850.
12.	852.		914.
14.	932.	1110.	851.
16.	987.	954.	999.
18.	984.	961.	1105.
20.	954.	1021.	987.

When analyzing the data, use the Michaelis-Menten model. The results of the Prism analysis are as follows:

	Enzyme Activity
Michaelis-Menten
Best-fit values
Vmax	1353
Km	5.886
Std. Error
Vmax	75.93
Km	0.9498
95% Confidence Intervals
Vmax	1197 to 1509
Km	3.933 to 7.839
Goodness of Fit
Degrees of Freedom	26
R square	0.9041
Absolute Sum of Squares	170343
Sy.x	80.94
Constraints
Km	Km > 0.0
Number of points
Analyzed	28
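
The Michaelis-Menten model is Activity = Vmax·[S] / (Km + [S]). As a rough cross-check of the fit above, here is a minimal Python sketch that fits the same sample data by least squares. It is not the algorithm Prism uses. It relies on the fact that for any fixed Km the best Vmax has a closed form, so only Km needs a one-dimensional search. The search range of 0.01 to 100 for Km and all function names are assumptions of mine. A library routine such as SciPy's curve_fit would be the usual choice in practice, but this version needs only the standard library.

```python
# Enzyme activity replicates keyed by substrate concentration.
# Blank cells in the table are omitted (28 points in total).
activity = {
    2: [265, 241, 195],
    4: [521, 487, 505],
    6: [662, 805, 754],
    8: [885, 901, 898],
    10: [884, 850],
    12: [852, 914],
    14: [932, 1110, 851],
    16: [987, 954, 999],
    18: [984, 961, 1105],
    20: [954, 1021, 987],
}
points = [(s, v) for s, vals in activity.items() for v in vals]


def best_vmax(km):
    """For a fixed Km the model is linear in Vmax, so Vmax has a closed form."""
    shape = [s / (km + s) for s, _ in points]
    numerator = sum(f * v for f, (_, v) in zip(shape, points))
    return numerator / sum(f * f for f in shape)


def sse(km):
    """Residual sum of squares at this Km, using the best Vmax for it."""
    vmax = best_vmax(km)
    return sum((v - vmax * s / (km + s)) ** 2 for s, v in points)


def golden_section_min(func, lo, hi, iterations=60):
    """Minimise a one-dimensional function that has a single dip on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc, fd = func(c), func(d)
    for _ in range(iterations):
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = func(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = func(d)
    return (a + b) / 2


km = golden_section_min(sse, 0.01, 100.0)  # assumed search range for Km
vmax = best_vmax(km)

mean_v = sum(v for _, v in points) / len(points)
ss_tot = sum((v - mean_v) ** 2 for _, v in points)
r_squared = 1 - sse(km) / ss_tot

print(f"Vmax = {vmax:.4g}, Km = {km:.4g}, R squared = {r_squared:.4f}")
```

The result should land close to the best-fit values Prism reports. Small differences in the last digit are possible because the search method and stopping rule are different.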

Notice again that the R² value is close to 1 and that, in the generated graph, the observed values deviate very little from the predicted ones. To show that this model fits the data better than a straight line does, let’s see what happens if we fit a linear regression model to the same data.

Best-fit values
Slope	35.79 ± 4.456
Y-intercept when X=0.0	408.6 ± 55.70
X-intercept when Y=0.0	-11.42
1/slope	0.02794
95% Confidence Intervals
Slope	26.63 to 44.95
Y-intercept when X=0.0	294.1 to 523.1
X-intercept when Y=0.0	-19.32 to -6.649
Goodness of Fit
R square	0.7128
Sy.x	140.1
Is slope significantly non-zero?
F	64.53
DFn, DFd	1.000, 26.00
P value	< 0.0001
Deviation from zero?	Significant
Data
Number of X values	10
Maximum number of Y replicates	3
Total number of values	28
Number of missing values	2

As you can see, the R² value drops from 0.9041 to 0.7128, and when you compare the linear and nonlinear best-fit lines, the observed values deviate much more from the values the linear model predicts. That is why choosing a model that fits the data is so important: with a poorly fitting model, we are unlikely to make accurate predictions of the dependent variable from the independent variable.
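
To make the difference in predictions concrete, the short Python sketch below plugs each substrate concentration into both fitted equations and compares the result with the mean of the observed replicates. It uses only the rounded best-fit values reported in the two result tables, so treat the output as approximate. The function names and the choice of mean absolute error as a summary are mine, not something Prism reports.

```python
# Best-fit values copied (rounded) from the two Prism result tables.
VMAX, KM = 1353.0, 5.886          # Michaelis-Menten fit
SLOPE, INTERCEPT = 35.79, 408.6   # linear regression on the same data

# Enzyme activity replicates keyed by substrate concentration (blanks omitted).
activity = {
    2: [265, 241, 195],
    4: [521, 487, 505],
    6: [662, 805, 754],
    8: [885, 901, 898],
    10: [884, 850],
    12: [852, 914],
    14: [932, 1110, 851],
    16: [987, 954, 999],
    18: [984, 961, 1105],
    20: [954, 1021, 987],
}


def michaelis_menten(s):
    """Predicted activity from the nonlinear fit."""
    return VMAX * s / (KM + s)


def straight_line(s):
    """Predicted activity from the linear fit."""
    return INTERCEPT + SLOPE * s


rows = []
for s, vals in activity.items():
    observed = sum(vals) / len(vals)  # mean of the available replicates
    rows.append((s, observed, michaelis_menten(s), straight_line(s)))

print(f"{'[S]':>4} {'observed':>9} {'M-M':>8} {'linear':>8}")
for s, observed, mm, line in rows:
    print(f"{s:>4} {observed:>9.1f} {mm:>8.1f} {line:>8.1f}")

# Average distance between each model's prediction and the observed mean.
mae_mm = sum(abs(obs - mm) for _, obs, mm, _ in rows) / len(rows)
mae_line = sum(abs(obs - line) for _, obs, _, line in rows) / len(rows)
print(f"Mean absolute error: M-M {mae_mm:.1f}, linear {mae_line:.1f}")
```

The gap is widest at the ends of the concentration range, where the straight line cannot follow the curve as it levels off.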

For a more thorough walkthrough, as well as model-fitting examples using the other analyses provided by Prism, click here.

1 comment:

Unknown, April 6, 2016 at 1:02 PM
Julia, this walkthrough really helped me understand model fitting on a whole new level. I think it really highlights reasons behind why a model must fit well to provide an accurate representation of dependent and independent variables. The topic of linear versus non-linear regression is one I have had trouble grasping in the past - putting things into context with the "y=mx+b" equation really helped me connect the dots. I finally feel confident in my ability to fit models to primary data for accurate analysis!