What is correlation? Or perhaps a better question is, what is a good correlation? The answer isn’t very straightforward. I made up the following data to see if there was a correlation between a person’s midi-chlorian count and the number of soft drinks they consumed during a year.
The correlation statistics are as follows:
The p-value is very small, so you could conclude that there is a strong relationship between midi-chlorian count and soft drink consumption: if there really were no relationship between the two, the chance of obtaining a correlation this strong would be very small. However, the r² value is also small, suggesting that only about 12% of the variation in midi-chlorian count is explained by variation in yearly soft drink intake.
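Since the raw data isn't shown here, the sketch below uses simulated data of roughly the same shape (n = 111 and a true correlation of about 0.35, so r² ≈ 0.12; both numbers are my assumptions, not the post's actual values) to show how r, r², and an approximate two-sided p-value are computed:

```python
import math
import random

random.seed(42)
n = 111
true_r = 0.35  # assumed effect size, chosen so r^2 is near 0.12

# Simulate x, then y as a mix of x and independent noise with correlation ~true_r.
x = [random.gauss(0, 1) for _ in range(n)]
y = [true_r * xi + math.sqrt(1 - true_r**2) * random.gauss(0, 1) for xi in x]

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the raw sums of squares."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)
t = r * math.sqrt((n - 2) / (1 - r**2))
# Two-sided p-value via the normal approximation to the t distribution
# (fine at 109 degrees of freedom; for exact values use scipy.stats.pearsonr).
p = math.erfc(abs(t) / math.sqrt(2))
print(f"r = {r:.3f}, r^2 = {r**2:.4f}, t = {t:.2f}, p ~ {p:.2e}")
```

The point of the simulation is that a correlation this weak still produces a tiny p-value at this sample size, which is exactly the tension the statistics above illustrate.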
So, which measure do you look at to judge the correlation? The p-value is very small, which suggests that the correlation is unlikely to have occurred by coincidence. However, it's important to remember that the p-value is highly dependent on sample size. This study sampled 111 individuals, and with a sample size that large, very small effect sizes can become statistically significant. The effect size here is small (r² = 0.1213), so we are left with the question: is an effect size of roughly 12% scientifically important? That is a difficult question to answer, and it's probably best left to the judgment of the scientist or the reader.

I think this problem raises an interesting question about the strength of correlations reported in the media. The news is full of correlation data between various categories, but how are the strengths of these correlations being judged? Do journalists and scientists look at low p-values and decide that a correlation is strong, or do they look at a high r² value and effect size? An alternative would be for journalists to publish the actual data and let readers decide whether the correlation is strong enough to warrant action or consideration.
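The dependence of the p-value on sample size follows directly from the t statistic for a correlation, t = r·sqrt((n − 2) / (1 − r²)): hold the effect size fixed and t grows with n, so the p-value shrinks. A small sketch (the sample sizes are illustrative, and the p-values use a normal approximation to the t distribution, which is rough at small n but fine for showing the trend):

```python
import math

r = 0.35  # fixed effect size, r^2 ~ 0.12 as in the post
for n in (20, 111, 1000):
    t = r * math.sqrt((n - 2) / (1 - r**2))
    # Approximate two-sided p-value (normal approximation to the t distribution).
    p = math.erfc(abs(t) / math.sqrt(2))
    print(f"n = {n:4d}: t = {t:5.2f}, p ~ {p:.2e}")
```

At n = 20 the same r = 0.35 is not even significant at the 0.05 level, while at n = 111 it comfortably is, which is why a small p-value alone says little about the strength of a correlation.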