Monday, April 11, 2016

Statistical Tests - Selecting the least evil

I recently submitted a paper to a highly reputable journal that will remain unnamed. Two of the reviews were favorable, but the third reviewer had quite a few comments on the statistical analysis (in addition to other unfavorable comments that led to a rejection). The samples came from a series of patients (n=7) who had a very rare infectious disease. Serum was sampled over time, but the patients all came to care at different points in their illness and received various experimental therapies. This was clearly not an experimental design! We sought to determine whether these precious and difficult-to-obtain samples could teach us something about the pathophysiology of the disease process. The goal of the study was to examine a series of protein biomarkers in the serum of these patients and ask whether there were any associations between clinical disease manifestations and the level of a given biomarker.

From what I've learned in this class, no statistical analysis should be done on these data: they are purely observational and not rigorously controlled, the sample size is quite small, the samples were not randomly selected and probably do not accurately represent the larger population, and the SDs of the groups are probably not identical. However, the scientific community gives a certain weight to data that are statistically significant, so after sitting down with our branch statistician (yes, I'm lucky to have access to such a person), we decided to compare the mean acute-phase level of each biomarker (54 in all) in group A (moderate disease, n=5) vs. group B (severe disease, n=2). A standard t-test was done for each biomarker, and p-values were corrected for multiple comparisons using the false discovery rate. This at least gave us a panel of biomarkers to focus on for the paper, rather than just showing the raw data for all 54 biomarkers. Scientifically, the findings were quite interesting and made physiologic sense.
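For anyone wondering what a false discovery rate correction actually does to a list of p-values, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure, which is the most common FDR method but not necessarily the one our statistician used. The function name `benjamini_hochberg` and the p-values are made up for illustration; they are not the study's data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five biomarkers (NOT the study's data).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)

# Biomarkers whose adjusted p-value stays at or below a 5% FDR threshold.
selected = [i for i, q in enumerate(adj_p) if q <= 0.05]
print(adj_p)
print(selected)
```

In this made-up example, two of the biomarkers with raw p-values below 0.05 no longer make the cut after adjustment. That is the sense in which the correction narrows a long list down to a shorter panel.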
However, since there was more than one sample per patient in the acute phase, I should somehow correct for repeated measures, which I think would require me to do an ANOVA. But the stats guy said that we don't have enough data to do an ANOVA and that Student's t-tests were as good as it gets. So do I publish the data with the not-so-great statistical test or without one at all? Which would you believe as a reader? As a reviewer? I still don't have an answer to this. The most ironic thing about this study is that the most significant finding was not statistically significant! Clearly, IL-6 is associated with severe disease (filled red and black squares) and not with moderate disease (open circles and triangles). No need for statistics here.
