Sunday, April 10, 2016

Political Polls: A Quagmire of Sampling Errors

It is once again a presidential election year, which means the mainstream media is practically drowning our daily lives in a bloodbath of tortured poll results. Everywhere you look, there is yet another opinion poll showing how the candidates are faring against each other. However, before even glancing at any of these results, we should be questioning how each survey was designed and how the data were actually collected.

The end goal of these political opinion surveys is to infer how the whole population of US voters will vote in November. However, surveying every single American voter is impossible given the constraints of time, money, and each voter's willingness to answer truthfully. Instead, polls must base their estimates on a sample, a smaller subset of individuals drawn from the greater population. If the sample is chosen by random sampling, then the distribution of opinions among those people should resemble that of the entire population. Randomization is key here: if any parts of the population are intentionally (or unintentionally) excluded or over-represented in polling, then the sample distribution of opinions will not necessarily resemble that of the whole population.

Problems in proper random sampling have been hotly debated for decades, as Jill Lepore discusses in a recent article in The New Yorker, and the debates have only grown more contentious in recent years with changes in technology. For example, many opinion polls are conducted by randomly dialing phone numbers and collecting the responses of whoever picks up. However, telephone response rates are currently in the single digits, and people who willingly pick up the phone often have very different political behaviors than the general public. Furthermore, automated dialing of cell phones was banned by the 1991 Telephone Consumer Protection Act, so auto-dialed polling calls reach land lines but not cell phones. And once again, the demographics of people who still have land lines are not fully representative of the whole population, excluding younger voters in particular. After the UK elections in 2015, Nate Silver, the widely beloved political analyst behind FiveThirtyEight, highlighted several recent examples of inaccurate political polls and some of the reasons to worry about bad sampling, such as the inability to rely on contact by phone, unreliable online polling methods, Americans withholding their true opinions in polls, and herd opinions swaying voters.
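To make the coverage problem concrete, here is a minimal simulation sketch in Python. All of the numbers (landline coverage, support rates) are invented purely for illustration, not drawn from any real poll:

```python
import random

# Hypothetical illustration: simulate a population where landline owners
# lean differently than cell-only voters, then compare a landline-only
# sample against a true simple random sample.
random.seed(42)

N = 100_000
population = []
for _ in range(N):
    has_landline = random.random() < 0.45          # assume 45% landline coverage
    # assume landline owners support Candidate A at 58%, cell-only voters at 44%
    support_rate = 0.58 if has_landline else 0.44
    supports_a = random.random() < support_rate
    population.append((has_landline, supports_a))

true_support = sum(s for _, s in population) / N

# Poll 1: landline-only frame (what auto-dialing gives you);
# should land near 58%, well above the true value of roughly 50%
landline_frame = [s for hl, s in population if hl]
landline_poll = random.sample(landline_frame, 1000)
print(f"True population support:   {true_support:.1%}")
print(f"Landline-only poll:        {sum(landline_poll) / 1000:.1%}")

# Poll 2: simple random sample from the full population; should land near 50%
srs_poll = random.sample([s for _, s in population], 1000)
print(f"Simple random sample poll: {sum(srs_poll) / 1000:.1%}")
```

No amount of extra landline dialing fixes the first poll: the bias comes from who can be reached, not from how many are reached.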

Although political polls aspire to completely representative polling, accurate sampling is nearly impossible in our current culture for the reasons above. To compensate, many political polls use methods that scientists working in hypothesis-driven experiments would consider scientific heresy. In non-probability sampling, pollsters accept that not every individual has an equal chance of being selected from the population (i.e. younger voters with cell phones but no land lines are far less likely to be reached by phone-based polling). There are various methods to compensate for this, such as weighting the responses of some groups more heavily than others, as in the sketch below. For example, a phone-based poll could weight the opinions of younger or minority voters more heavily than the opinions of older white voters, who are more likely to have land lines. But remember: although this seems like a practical method to “fix” political polls, it makes for really, really bad science. Hypothesis-driven scientific studies only work on the assumption that all samples were RANDOMLY chosen from the population, i.e. the probability of choosing any one sample is equal for all samples in the population. Non-probability sampling without true randomized selection throws this assumption out the window, runs it over with a semi-truck, and lights the remains on fire. Never trust scientific experiments using non-probability methods; they are scientifically unsound and useless for predicting anything of value about the whole population. And for political polls, investigate the weighting methods used and decide for yourself how much you actually trust the reported results.
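As a deliberately simplified sketch of the weighting idea (all group sizes and support rates here are hypothetical; real polls use more elaborate schemes such as raking across many variables), here is how post-stratification weighting re-balances a skewed phone sample:

```python
# A minimal sketch of post-stratification weighting. Each group's answers
# are weighted by (population share) / (sample share) of that group.

# Hypothetical phone poll: 1000 respondents, grouped by age bracket
sample_counts = {"18-34": 120, "35-64": 530, "65+": 350}        # who picked up
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}  # e.g. census figures

# Observed support for a candidate within each group (hypothetical)
support_in_group = {"18-34": 0.62, "35-64": 0.51, "65+": 0.43}

n = sum(sample_counts.values())

# Unweighted estimate: the raw sample average, skewed toward older voters
unweighted = sum(sample_counts[g] * support_in_group[g] for g in sample_counts) / n

# Weighted estimate: re-balance each group to its population share
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}
weighted = sum((sample_counts[g] / n) * weights[g] * support_in_group[g]
               for g in sample_counts)

print(f"Unweighted estimate: {unweighted:.1%}")  # 49.5%
print(f"Weighted estimate:   {weighted:.1%}")    # 52.7%
```

Notice what the weighting cannot do: the 120 young respondents who answered a landline still stand in for all young voters, including the cell-only ones who were never reachable in the first place.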

Another important feature of political polls is their sampling error. By pure chance, any sample of a population will have a slightly larger or smaller proportion of subjects voting a certain way than the true population does. Pollsters only know the opinions of the small sample, not of the whole population, so the best way to show how sure you are that your sample represents the whole population is to calculate a confidence interval. A poll percentage by itself, without a confidence interval, is useless for making inferences about the population. By pure tradition, almost everyone uses a confidence level of 95%. Loosely speaking, this means the reported range of values has a 95% chance of containing the actual population value and a 5% chance of missing it. For example, let’s pretend you polled 100 people about whether they would vote for Bernie Sanders or Hillary Clinton in a state’s Democratic primary. If 54 of the 100 sampled voters say they would vote for Bernie, then the margin of error at 95% confidence is +/-9.77%, which means the range from 44.23% to 63.77% has a 95% chance of containing the true proportion of voters who will vote for Bernie.
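For reference, that +/-9.77% figure falls out of the standard normal-approximation formula for the margin of error of a proportion. The post doesn't say which formula the pollster would use, but this quick sketch reproduces the number:

```python
import math

# 95% margin of error for a sample proportion (normal approximation):
#   moe = z * sqrt(p * (1 - p) / n),  with z = 1.96 for 95% confidence
p, n, z = 0.54, 100, 1.96
moe = z * math.sqrt(p * (1 - p) / n)
print(f"margin of error: +/-{moe:.2%}")           # +/-9.77%
print(f"95% CI: {p - moe:.2%} to {p + moe:.2%}")  # 44.23% to 63.77%
```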


Increasing your sample size shrinks your confidence interval, as illustrated by Pew Research. This is because the margin of error is roughly proportional to the square root of 1/n, where n is the number of people sampled. So a bigger n leads to a smaller confidence interval.

If you repeat the Hillary vs. Bernie poll above with 1000 sampled voters and again find 54% support for Bernie, the margin of error at 95% confidence shrinks to +/-3.09%, so the range from 50.91% to 57.09% is 95% likely to contain the population percentage of voters supporting Bernie. It’s still the same sample percentage, but the 95% confidence interval is much smaller.
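The square-root scaling is easy to verify: ten times the sample only shrinks the margin by a factor of about 3.16, the square root of 10 (a quick check using the same formula as above):

```python
import math

# The sqrt(1/n) scaling in action: same 54% sample proportion, two sample sizes
def moe_95(p: float, n: int) -> float:
    """95% margin of error for a sample proportion (normal approximation)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 1000):
    print(f"n = {n:4d}: +/-{moe_95(0.54, n):.2%}")
# n =  100: +/-9.77%
# n = 1000: +/-3.09%
```

This is also why doubling a poll's sample buys surprisingly little precision: to halve the margin of error you need four times as many respondents.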


If you want to take a closer look at current political polls, I highly encourage you to browse the compiled list of national polls on FiveThirtyEight and search through the methods sections to see how each poll handles non-probability sampling, weighting, random sampling methods, and the determination of confidence intervals. The lack of standards between polls is both horrifying and fascinating, and it makes me question the results of all political polls, especially when methods aren’t clearly reported.

I’ll leave you with one interesting snippet from the methods section of an NBC/SurveyMonkey poll that actually acknowledges the inherent bias in its selection methods and explains how that bias was compensated for. It’s rare that a poll acknowledges its bias like this, so enjoy this unusually honest example.

[Screenshot of the NBC/SurveyMonkey poll methods section]

3 comments:

  1. Although there are several limitations on polling responses that make a completely representative sample almost impossible, it should be made clear to the general public that the results may be skewed by the lack of random sampling. There have been several times this election season when different news outlets reported polls that turned out to be wrong after the primaries were complete. As a member of the general public, I have not once heard those news outlets describe how inaccurate the sampling might be after such a poll is performed. I believe this does a disservice to people who follow particular news channels and could lead some viewers to stop watching. I also believe it misleads viewers. Given that there could be pressure to follow the popular vote (or perhaps the opposite), especially for citizens who have not been following the race closely, this could potentially skew election results. Not only does inaccurate, nonrandom sampling lead to polling errors, but it could sway voters to vote in particular ways, a bias in itself.

  2. I think Emily highlighted the major limitations, and in some cases, inappropriate experimental design of these polls. I would like to point out an additional issue: in the example from NBC, roughly 2000 of the people polled are not registered voters, meaning they will not even be voting in the election the polls are trying to predict. This is just one example of how shallow the sample pool is, if pollsters will include even unregistered voters in their analysis.

    But my greater comment is this: the argument made above shows the weakness of bias and sampling error, but also our lack of knowledge of how statistics affects our daily lives. In 2009, the National Center for Education Statistics reported that only 11% of high school students (in public schools) took a statistics course. Then, in higher education, statistics is only required for certain majors. That means many students will graduate with advanced degrees having never taken a statistics course. For that reason, many people rely on statisticians to do the work for them. This is no different when it comes to presidential election polling. It is clear from the methods section of the NBC/SurveyMonkey poll that trained statisticians conduct these surveys and that THEY are aware of the limitations of their polling. However, it is important that they disclose these limitations, and their impacts, to their audiences. As Anna stated, these polls can affect how others decide to cast their vote. Therefore, these polls could be having a much bigger impact on our elections than we see on the surface.

  3. I always knew that these polls were inaccurate to some degree, but I have to admit I did not think it was this bad. I think the general consensus is that most polling practices badly skew the data, even beyond not doing random sampling. The fact that they're trying to normalize their data based on a non-random sample and weight underrepresented votes is clearly bad practice. If these polls have trouble finding respondents, then they should clearly state the limits of their sample and not make overreaching claims. If there were a sample of 10 voters (very bad for a poll, but easier as an example), and 9 of the individuals were Caucasian males and 1 a Caucasian female, then it makes more sense to say the poll reflects the views of Caucasian males rather than weight the lone female's opinion and advertise it as encompassing all similar demographic viewpoints. We even want these kinds of specialized polls from time to time (or at least politicians do, when trying to appeal to certain constituents).
