Having trouble fitting a model to your data? Don’t worry, Prism is there to help. The program walks you through the basics of model fitting and even includes sample data sets for you to practice on. Prism offers the following analyses for fitting lines and curves:
- Linear regression
- Deming linear regression (use when both X and Y variables are subject to error)
- Nonlinear regression
- Spline and Lowess (for curve fitting without selecting a model)
- Interpolating from a standard curve
For simplicity’s sake, I’ll walk you through linear and nonlinear regression model fitting in Prism.
Linear Regression
Linear regression is used when you can describe the data using the equation y = mx + b, where m is the slope and b is the Y-intercept. Knowing x, you can predict y using the best-fit line calculated by Prism. To start, you can use the sample data Linear Regression – predicting the slope. You should get a data table that looks like this:
Minutes | Control | | | Treated | | |
---|---|---|---|---|---|---|
1.0 | 34. | 29. | 28. | 31. | 29. | 44. |
2.0 | 38. | 49. | 53. | 61. | 89. | |
3.0 | 57. | 55. | 78. | 99. | 77. | |
4.0 | 65. | 65. | 50. | 93. | 111. | 109. |
5.0 | 76. | 91. | 84. | 109. | 141. | |
6.0 | 79. | 93. | 98. | 134. | 145. | 129. |
7.0 | 100. | 107. | 89. | 156. | 134. | 167. |
8.0 | 105. | 123. | 119. | 167. | 180. | |
9.0 | 121. | 143. | 134. | 178. | 192. | 175. |
10.0 | 135. | 156. | 198. | 203. | 234. |
Next, it is time to analyze the data: click the Analyze button and then select linear regression. Prism reports the following results:
Parameter | Control | Treated |
---|---|---|
Best-fit values | ||
Slope | 12.42 ± 0.5679 | 17.96 ± 0.8684 |
Y-intercept when X=0.0 | 17.42 ± 3.471 | 28.48 ± 5.446 |
X-intercept when Y=0.0 | -1.402 | -1.585 |
1/slope | 0.08050 | 0.05568 |
95% Confidence Intervals | ||
Slope | 11.26 to 13.59 | 16.17 to 19.75 |
Y-intercept when X=0.0 | 10.28 to 24.55 | 17.26 to 39.69 |
X-intercept when Y=0.0 | -2.163 to -0.7630 | -2.432 to -0.8818 |
Goodness of Fit | ||
R square | 0.9485 | 0.9448 |
Sy.x | 8.440 | 13.13 |
Is slope significantly non-zero? | ||
F | 478.5 | 427.8 |
DFn, DFd | 1.000, 26.00 | 1.000, 25.00 |
P value | < 0.0001 | < 0.0001 |
Deviation from zero? | Significant | Significant |
Data | ||
Number of X values | 10 | 10 |
Maximum number of Y replicates | 3 | 3 |
Total number of values | 28 | 27 |
Number of missing values | 2 | 3 |
When determining how well the model fits your data, take a look at the R² value. This is called the coefficient of determination, and it gives you an idea of how well the best-fit line fits the data. The closer the R² value is to 1, the better the fit. In this example, R² is 0.9485 for the control group and 0.9448 for the treated group. The good fit is further confirmed by the graph, in which the observed values of Y deviate little from the values the model predicts.
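To make the "knowing x, you can predict y" idea concrete, here is a minimal Python sketch that plugs the best-fit slope and intercept from the results table into y = mx + b. The function names and the choice of x = 5 minutes are my own, purely for illustration; nothing here is fitted, the numbers are simply copied from Prism's output.

```python
# Best-fit values copied from Prism's linear regression results above.
fits = {
    "Control": {"slope": 12.42, "intercept": 17.42},
    "Treated": {"slope": 17.96, "intercept": 28.48},
}


def predict_y(x, slope, intercept):
    """Predict y from x using the line y = slope * x + intercept."""
    return slope * x + intercept


def x_intercept(slope, intercept):
    """The x value at which the line crosses y = 0."""
    return -intercept / slope


for group, params in fits.items():
    y_at_5 = predict_y(5.0, **params)  # x = 5 minutes is an arbitrary example
    x0 = x_intercept(**params)
    print(f"{group}: predicted y at x=5 is {y_at_5:.2f}, x-intercept is {x0:.3f}")
```

The x-intercepts computed this way agree with the "X-intercept when Y=0.0" row of the table to within rounding, which is a quick check that the reported slope and intercept are being read correctly.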
Nonlinear Regression
Nonlinear regression is used when the relationship between x and y cannot be described by a straight line. These models use different functions to derive a best-fit curve and may depend on multiple independent variables. We’ll use an enzyme kinetics model to see how nonlinear regression can be used to fit your data. Use the sample data provided by Prism titled Enzyme Kinetics – Michaelis-Menten. The data table should look like this:
[Substrate] | Enzyme Activity | ||
---|---|---|---|
2. | 265. | 241. | 195. |
4. | 521. | 487. | 505. |
6. | 662. | 805. | 754. |
8. | 885. | 901. | 898. |
10. | 884. | 850. | |
12. | 852. | 914. | |
14. | 932. | 1110. | 851. |
16. | 987. | 954. | 999. |
18. | 984. | 961. | 1105. |
20. | 954. | 1021. | 987. |
When analyzing the data, use the Michaelis-Menten model. The results of the Prism analysis are as follows:
Parameter | Enzyme Activity |
---|---|
Michaelis-Menten | |
Best-fit values | |
Vmax | 1353 |
Km | 5.886 |
Std. Error | |
Vmax | 75.93 |
Km | 0.9498 |
95% Confidence Intervals | |
Vmax | 1197 to 1509 |
Km | 3.933 to 7.839 |
Goodness of Fit | |
Degrees of Freedom | 26 |
R square | 0.9041 |
Absolute Sum of Squares | 170343 |
Sy.x | 80.94 |
Constraints | |
Km | Km > 0.0 |
Number of points | |
Analyzed | 28 |
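To see what Vmax and Km mean in practice, here is a small Python sketch of the standard Michaelis-Menten equation, Y = Vmax·X / (Km + X), evaluated with the best-fit values from the table above. The function name and the substrate concentrations I picked are illustrative assumptions, not anything Prism exposes.

```python
def michaelis_menten(substrate, vmax, km):
    """Predicted enzyme activity: Y = Vmax * X / (Km + X)."""
    return vmax * substrate / (km + substrate)


# Best-fit values copied from the results table (not re-fitted here).
VMAX, KM = 1353.0, 5.886

# When [Substrate] equals Km, the predicted activity is half of Vmax.
half_max = michaelis_menten(KM, VMAX, KM)
print(f"activity at [S] = Km: {half_max:.1f} (Vmax / 2 = {VMAX / 2:.1f})")

# Predicted activity at a few substrate concentrations from the data table.
for s in (2, 10, 20):
    print(f"[S] = {s}: predicted activity = {michaelis_menten(s, VMAX, KM):.0f}")
```

The curve rises steeply at low substrate concentrations and levels off toward Vmax, which is the shape a straight line cannot capture.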
Notice again that the R² value is close to 1 and that, in the generated graph, the observed values deviate very little from the predicted ones. To show that this model fits the data better than a straight line does, let’s see what happens when we fit a linear regression model to the same data.
Best-fit values | |
Slope | 35.79 ± 4.456 |
Y-intercept when X=0.0 | 408.6 ± 55.70 |
X-intercept when Y=0.0 | -11.42 |
1/slope | 0.02794 |
95% Confidence Intervals | |
Slope | 26.63 to 44.95 |
Y-intercept when X=0.0 | 294.1 to 523.1 |
X-intercept when Y=0.0 | -19.32 to -6.649 |
Goodness of Fit | |
R square | 0.7128 |
Sy.x | 140.1 |
Is slope significantly non-zero? | |
F | 64.53 |
DFn, DFd | 1.000, 26.00 |
P value | < 0.0001 |
Deviation from zero? | Significant |
Data | |
Number of X values | 10 |
Maximum number of Y replicates | 3 |
Total number of values | 28 |
Number of missing values | 2 |
As you can see, the R² value drops substantially, from 0.9041 to 0.7128, and when you compare the linear best-fit line with the nonlinear best-fit curve, you see that the observed values deviate much more from the predicted ones. That is why fitting the right model to the data is so important: if we use a model that fits poorly, we are unlikely to make accurate predictions about the dependent variable from the independent variable.
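If you want to check this comparison outside Prism, the sketch below recomputes R² for both models from the enzyme kinetics data table, using R² = 1 − SS_residual / SS_total. It is a minimal illustration under a few assumptions: the straight line is fitted by ordinary least squares in plain Python, the Michaelis-Menten curve is not re-fitted but simply evaluated with the Vmax and Km that Prism reported, and all function names are my own.

```python
# Enzyme kinetics sample data from the table above:
# substrate concentration -> replicate enzyme activity readings.
data = {
    2: [265, 241, 195],
    4: [521, 487, 505],
    6: [662, 805, 754],
    8: [885, 901, 898],
    10: [884, 850],
    12: [852, 914],
    14: [932, 1110, 851],
    16: [987, 954, 999],
    18: [984, 961, 1105],
    20: [954, 1021, 987],
}

# Flatten into parallel x/y lists (28 points in total).
xs = [x for x, reps in data.items() for _ in reps]
ys = [y for reps in data.values() for y in reps]


def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_y) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def michaelis_menten(s, vmax, km):
    """Y = Vmax * X / (Km + X)."""
    return vmax * s / (km + s)


slope, intercept = fit_line(xs, ys)
r2_linear = r_squared(ys, [slope * x + intercept for x in xs])

# Vmax and Km are copied from Prism's results table, not re-fitted here.
VMAX, KM = 1353.0, 5.886
r2_mm = r_squared(ys, [michaelis_menten(x, VMAX, KM) for x in xs])

print(f"linear: slope={slope:.2f}, intercept={intercept:.1f}, R^2={r2_linear:.4f}")
print(f"Michaelis-Menten: R^2={r2_mm:.4f}")
```

The linear fit should reproduce the slope, intercept, and R² that Prism reported, and the Michaelis-Menten R² should land close to Prism's value; small differences can come from using the rounded Vmax and Km.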
For a more thorough walkthrough, as well as model fitting examples using the other analyses provided by Prism, click here.
Julia, this walkthrough really helped me understand model fitting on a whole new level. I think it really highlights the reasons why a model must fit well to provide an accurate representation of dependent and independent variables. The topic of linear versus non-linear regression is one I have had trouble grasping in the past - putting things into context with the "y=mx+b" equation really helped me connect the dots. I finally feel confident in my ability to fit models to primary data for accurate analysis!