Showing posts with label confidence interval of proportions. Show all posts

Tuesday, April 5, 2016

The color of the dress should give you confidence... intervals

A bit over a year ago, an otherwise unremarkable photo of a dress in poor lighting became an internet sensation when it was discovered that different people saw the colors of the fabric in dramatically different ways. There were two camps with little middle ground between them: the white and gold camp and the blue and black camp. People, myself included, were adamant that the other camp needed to have their eyes checked. Personally, I saw the dress as white and gold, and thought that people claiming to see it as blue and black were part of some elaborate hoax set up by ophthalmologists to generate checkup business.

To investigate this, I set up a poll and submitted the link to r/SampleSize, a small community on the popular website Reddit whose members are interested in taking surveys. I received a total of 45 replies: 27 responses for white and gold and 18 responses for blue and black. This indicates that only 60% of respondents saw the dress as white and gold, which was certainly not the hallmark of some great conspiracy, unless the secret ophthalmologist societies were in on my small-scale survey. The 95% confidence interval calculated from these results for the proportion of the population that sees the dress as white and gold was 45.45% to 72.98%. Phrased differently, if the same poll were repeated many times with fresh samples, about 95% of the intervals built this way would contain the true percentage of the whole population that sees the dress as white and gold; for this sample, that interval is 45.45-72.98%.
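
For readers who want to check the arithmetic, here is a minimal sketch of one way to compute such an interval. The post does not say which formula was used; the Wilson score interval is an assumption here, chosen because it comes out very close to the 45.45% to 72.98% reported above. The function name `wilson_interval` is made up for this illustration.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (assumed method, not confirmed by the post).

    z is the standard normal quantile; 1.959964 corresponds to a 95% level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n                      # observed proportion
    z2 = z * z
    denom = 1 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom    # interval midpoint, pulled toward 0.5
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width

# 27 of 45 respondents chose white and gold
low, high = wilson_interval(27, 45)
print(f"{low:.2%} to {high:.2%}")
```

Note that the interval is not symmetric around the observed 60%, which is one reason the simpler "estimate plus or minus a margin" formula gives slightly different numbers.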

Another poll regarding the dress colors was conducted on a more visible platform. This survey had 432 respondents, 172 of whom saw the dress as white and gold. This is a much smaller percentage (39.81%) than that observed in my poll. The 95% confidence interval from this survey came out to 35.30-44.50%. Comparing my results to the results from this survey, there are two different 95% confidence intervals that do not overlap. So what does this mean?

One possible reason for the disparity is response validity, or the lack thereof. I had created my poll well before taking this statistics class, so it fell victim to a few different design flaws. For one, my sample was neither random nor representative of the total population. Posting the poll on Reddit alone limited my sample to the part of the population that uses Reddit, which is demographically weighted towards young males who spend a lot of time in front of a computer. The poll on Survata included a much larger age range of participants, though data on gender and time spent in front of screens were not recorded. Additionally, I did not limit the poll to one response per person, so responses may not have been independent. I also did not provide a choice other than white and gold or blue and black, whereas the Survata poll did include a "neither" option. This may have skewed responses, as respondents who didn't see either color scheme may have selected the color scheme closest to what they saw, or just selected the first option on the survey, which was white and gold. These errors in my poll violated several assumptions that must hold for a confidence interval to be valid, and thus weaken the meaningfulness of the confidence interval calculated from the poll's data.
However, it should be noted that it's entirely possible for confidence intervals from two completely valid data sets not to overlap, or even for one of them to miss the true population value. A 95% confidence level only describes how often the procedure captures the true value over many repeated samples, so roughly 1 in 20 valid intervals is expected to miss.
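
To make that last point concrete, the following is a rough simulation sketch rather than anything taken from either poll. It assumes a hypothetical true proportion of 40%, a sample size of 432, and the Wilson score formula (the post does not state which interval method was used), then counts how often the interval from a perfectly random sample actually contains the true value. The names `wilson_interval` and `coverage` are made up for this illustration.

```python
import math
import random

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval at roughly the 95% level (assumed method)."""
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width

def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each respondent independently answers "white and gold" with probability true_p
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wilson_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials

# Hypothetical setup: the true share is 40% and every poll has 432 respondents
print(coverage(true_p=0.4, n=432, trials=2000))
```

The printed fraction should land near 0.95 rather than at 1.0: even with flawless sampling, some intervals miss the true value.
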
The infamous dress