This past week, researchers from the Department of Genetics at Albert Einstein College of Medicine in New York published a letter in Nature with a title that was pithy and shocking: "Evidence for a limit to human lifespan." GASP! "But this can't be true," some may say, "with advances in medicine, we're only continuing to lengthen the human lifespan!" First off, I think that's a bit of a teleological argument: it assumes that just because humans have a certain intelligence about the world, our purpose is to extend life or live forever. There's some sort of moral purpose implied in that answer. Anyway, that's not the point, but it does set up an interesting debate.
Maybe we humans aren't supposed to live forever. We forget that we are but specks on the lithograph of time; we're babies! The species Homo sapiens has existed for only about 200,000 years, on a planet that is 4.5 billion years old. Many things have lived before us, and many things will live after us.
Perhaps this sounds a bit nihilistic, but I was happy to hear these reports from the geneticists. Humans are single-handedly running roughshod over ourselves and our planet, making sure that life, in any form, is going to have a hard time existing here now that the global temperature has risen almost 2 degrees Fahrenheit over the past century. So yes, I was overjoyed, ecstatic, and relieved to hear that the execution of our planet Earth may be cut short -- until I learned that these geneticists had used statistics to reach their results. Oh great, I thought, another potential statistical mishap by people not formally trained in statistics.
The methods the study employed were actually pretty simple. Most people who have taken a basic stats class could probably follow everything up until the cubic smoothing splines. The authors plotted the maximum reported age at death (MRAD) for 534 people over the age of 110, or what they call "supercentenarians," gathered from the International Database on Longevity. These supercentenarians came from Japan, France, the US, and the UK. Two linear regressions were fit to subsets of the data, one for 1968-1994 and one for 1995-2006. The scientists found an increase in MRAD of 0.15 years per year before 1995 (r=0.68, p=0.0007) and a decrease in MRAD of 0.28 years per year from 1995-2006 (r=-0.35, p=0.27). They did not expand on the statistical shortcomings of that decreasing-MRAD fit. They ran the same procedure on other data series and found a similar overall trend -- that is, a significant increase up to the breakpoint and an insignificant decrease after it -- but did not discuss the weak correlation or significance.
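To make the procedure concrete, here is a minimal sketch of that two-segment regression on made-up MRAD values (the real numbers come from the International Database on Longevity, so the data below are purely illustrative):

```python
# Minimal sketch of the paper's breakpoint analysis on synthetic MRAD values.
# The slope, noise level, and breakpoint year are assumptions for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical yearly maximum reported age at death (MRAD), 1968-2006
years = np.arange(1968, 2007)
mrad = 110 + 0.15 * (years - 1968) + rng.normal(0, 1.5, size=years.size)

pre = years < 1995   # 1968-1994 segment
post = ~pre          # 1995-2006 segment

for label, mask in [("1968-1994", pre), ("1995-2006", post)]:
    slope, intercept, r, p, se = stats.linregress(years[mask], mrad[mask])
    print(f"{label}: slope = {slope:+.2f} yr/yr, r = {r:+.2f}, p = {p:.3f}")
```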
The initial criticisms of the work have been predictable, to say the least. Why didn't the scientists discuss their dismal p-values? Moreover, some say that seeing a significant increase in MRAD followed by a decrease, even an insignificant one, is still something. That, in my opinion, is an amateur cop-out. There are many more reasons to be critical of the work than just the terrible p-values, and it starts with the study design.
The main figure in Dong et al., and the bane of my existence these last four days...
First, it's really not clear why the scientists decided to use the arbitrary breakpoint between 1994 and 1995. Clarification of this would be helpful, if not crucial, since the entire crux of the paper's argument leans on this breakpoint and the analysis that follows from it (Figure 2a). It appears the breakpoint was chosen to support their initial claim that humans have reached a plateau in age advancement, and the decreasing linear-regression model of MRAD is used rhetorically rather than in the strict sense of an ad-hoc null-hypothesis significance test (NHST). Is it even right to use NHST here? I'd argue no. The claim they actually want to support is that the trend after the breakpoint is essentially zero, and a non-significant result cannot establish that on its own: detecting effect sizes near zero requires a much larger sample, which is hard to come by when the data exist on the margins of collectability.
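A quick back-of-the-envelope simulation makes the sample-size problem vivid. Assuming roughly a dozen annual MRAD points after the breakpoint and year-to-year noise on the order of a year and a half (both numbers are my assumptions, not the paper's), even a genuine 0.15 year-per-year trend would be declared "not significant" most of the time:

```python
# Rough power check under assumed numbers: with ~12 annual points (1995-2006)
# and noise of about 1.5 years, a real upward trend of 0.15 yr/yr is usually
# missed at alpha = 0.05, so a non-significant slope is thin evidence of a plateau.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
years = np.arange(1995, 2007)          # 12 data points
true_slope, sigma, alpha = 0.15, 1.5, 0.05

n_sim, rejections = 10_000, 0
for _ in range(n_sim):
    mrad = 115 + true_slope * (years - 1995) + rng.normal(0, sigma, years.size)
    _, _, _, p, _ = stats.linregress(years, mrad)
    rejections += p < alpha

print(f"Estimated power: {rejections / n_sim:.2f}")   # well under the usual 0.8
```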
The authors do make an implied comment on the power of their statistical tests, noting that they probably don't have a large enough sample size to make the case for a more robust statistical model. To address this, they applied a post-hoc sample expansion, using not just the MRAD but the second- through fifth-highest reported ages at death (RADs) each year; to me this looks more or less like a sneaky way of dealing with outliers. They concluded that the average RADs had not increased since 1968 and that "all series showed the same pattern as the MRAD." But they never reference an ANOVA, or any supplemental material, to test that claim about the average RADs; it appears they just eyeballed it. Another option would be to compare the slopes of the "plateau" regions with some sort of t-test.
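For what it's worth, one standard way to formalize that slope comparison is to fit a single regression with a breakpoint dummy and a year-by-period interaction; the t-test on the interaction term is the test for a change in slope at 1995. This is only a sketch on synthetic data, not the authors' analysis:

```python
# Testing for a slope change at 1995 via an interaction term.
# The data below are synthetic placeholders, not the IDL records.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
years = np.arange(1968, 2007)
mrad = 110 + 0.15 * (years - 1968) + rng.normal(0, 1.5, years.size)

df = pd.DataFrame({
    "year": years - 1968,                 # centered so the intercept refers to 1968
    "post": (years >= 1995).astype(int),  # breakpoint dummy
    "mrad": mrad,
})

fit = smf.ols("mrad ~ year * post", data=df).fit()
print(fit.summary().tables[1])   # the 'year:post' row tests the slope change
```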
In the end, the statistical procedures taken to prove their own point would leave any data scientist's head spinning; probably enough that most people just wouldn't take a hard look at them for too long. This is dangerous, but we can't say Nature hasn't committed the crime before: many times the journal has published funky stats simply because the title of the study was provocative (as demonstrated by our class this semester). Nonetheless, you have to tip your cap to scientists who wanted to make a headline; they surely did. Then perhaps bonk them on the head with that same cap and tell them to do better stats before making such grand claims.