Tuesday, April 12, 2016

No confidence interval for car make reliability?

I'm looking to buy a used car in the very near future, and there are some important attributes I'm looking for: performance, a gorgeous interior, a 2015 model year, and, yeah, it should be made on the other side of the pond.

But then I look at the stipend provided by my PhD program and I wonder what in the world it was that convinced me to become a biologist. And then I think of Toyota.

Out of curiosity, I take a peek at Consumer Reports' 2015 Annual Auto Reliability Survey. And I find the following plot:

A very informative plot. What they've done is survey subscribers on the vehicles they own, in total covering more than 740,000 vehicles. For each reported vehicle a reliability score is calculated and associated with that vehicle's model. A mean reliability for the model is calculated, and all the mean reliabilities for a make's models are averaged to generate the yellow dot, indicating the make's mean reliability. The blue spread indicates the range from the make's least reliable model on average to the make's most reliable model. But why didn't they use confidence intervals? After all, they do claim to be determining "predicted reliability scores."

A confidence interval is used when we sample a population to estimate its mean for some characteristic. Since we take samples rather than observe every individual in the population, our calculation will not give us the true mean, but it will hopefully be close. A confidence interval gives the range of values around our calculated sample mean that is expected to include the true population mean at a stated level of confidence, such as 95%.
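To make that concrete, here is a rough sketch of how such an interval could be computed in Python. The scores are numbers I made up for illustration, not Consumer Reports data, and the function name `mean_confidence_interval` is my own. I'm also assuming a normal approximation for the critical value; for a small sample, a t-distribution would be the more careful choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(scores, confidence=0.95):
    """Return (low, high) around the sample mean, using a normal approximation."""
    n = len(scores)
    sample_mean = mean(scores)
    sample_sd = stdev(scores)  # sample standard deviation (n - 1 denominator)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / sqrt(n)  # the "±" part of the interval
    return sample_mean - margin, sample_mean + margin


# Hypothetical reliability scores reported for one model (made up).
scores = [62, 71, 58, 66, 74, 69, 60, 65, 70, 63]
low, high = mean_confidence_interval(scores)
print(f"sample mean = {mean(scores):.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```

Note that the interval is symmetric around the sample mean by construction, and that it shrinks as the sample size grows.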

All that is needed to calculate the confidence interval is the sample size, the sample standard deviation, and the sample mean. But I can see why Consumer Reports decided not to have their plot display that. As a consumer, once I see the average of all the models' mean reliabilities, I don't need to see a spread telling me about the true mean. The true mean is not the goal of the reliability report, nor of the consumer who's in the market for a vehicle. The spread I want to see is the range from the make's least to most reliable model, indicating the variability in the reliability of the make's models. For example, if I were comparing Mazda's reliability to Subaru's, I would note that their means are very close, but the spread of Mazda's reliability is constrained and sits at the higher end of Subaru's reliability, which varies much more. This information might give me more confidence in Mazda's reliability than if I had simply compared the means. The confidence interval may resemble the worst-to-best spreads, but it simply does not mean the same thing. And once we have "outliers" that skew a spread in one direction (see Hyundai, BMW, Ford, Cadillac, Jeep), the symmetric ± error of a confidence interval begins to have less meaning to the consumer.
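Here is a small sketch of that difference, using two imaginary makes with the same mean reliability. All the numbers are invented for illustration, the `summarize` helper is my own, and the interval again assumes a normal approximation (crude with only five models per make, but enough to show the shape of the result).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize(model_means, confidence=0.95):
    """Summarize a make from its models' mean reliability scores."""
    make_mean = mean(model_means)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(model_means) / sqrt(len(model_means))
    return {
        "mean": make_mean,
        # Worst-to-best spread, like the blue bar in the plot.
        "range": (min(model_means), max(model_means)),
        # Symmetric interval around the mean (normal approximation).
        "ci": (make_mean - margin, make_mean + margin),
    }


# Hypothetical per-model mean scores for two made-up makes.
tight_make = [64, 66, 68, 70, 72]
outlier_make = [30, 74, 76, 78, 82]  # one very unreliable model

for name, data in [("tight", tight_make), ("outlier", outlier_make)]:
    s = summarize(data)
    print(name, s["mean"], s["range"], tuple(round(x, 1) for x in s["ci"]))
```

Both makes average 68, but the second one's worst-to-best range is lopsided, stretching much further below the mean than above it. The confidence interval, being mean ± a single margin, cannot show that lopsidedness.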

Perhaps if I were a chemical engineer or computer scientist, that Maserati pictured above (no reliability data available, but who cares when a car is that beautiful!) wouldn't be such a distant possibility. But for now, I'm going to look at Toyota. And it probably won't be 2015, either.