Twitter lit up Saturday with news of a Reuters/Ipsos poll putting Romney 21 points ahead of his nearest competition. The Reuters story is here. The poll put Romney at 37%, Santorum and Paul at 16% each, and Gingrich at just 12%. Romney supporters were quick to seize on this as evidence that Gingrich’s (and Perry’s) criticisms of Bain Capital had not only failed but backfired, and that Romney was closing in on a big win that would all but secure the nomination.
But those who read to the bottom of the Reuters story found this important detail:
The Reuters/Ipsos poll was conducted online from January 10-13 with a sample of 995 South Carolina registered voters. It included 398 Republicans and 380 Democrats.
While Reuters and Ipsos partner on a high-quality live-interviewer poll, they also do some internet-based polling, and this poll is one of the latter. Branding both as a “Reuters/Ipsos Poll” makes it hard to tell the difference, at least until the last paragraph of the story. For an example of one of their live-interview polls, see this January “Reuters/Ipsos poll” report. Compare the ledes for the two stories, South Carolina first:
(Reuters) – Republican presidential candidate Mitt Romney has opened a wide lead over his rivals in the South Carolina primary election race, trouncing Newt Gingrich and gaining momentum in his march toward the party’s nomination, a Reuters/Ipsos poll shows.
And the January national live interviewer poll lede:
(Reuters) – U.S. presidential hopeful Mitt Romney has sailed farther ahead of rival Republican candidates nationally and narrowed President Barack Obama’s lead in the White House race, according to a Reuters/Ipsos poll on Tuesday.
Reuters chooses not to distinguish the two polling methodologies in its branding. Labels such as “Reuters/Ipsos Online Poll” or “Reuters/Ipsos Internet Panel” would be more distinct and would carry an implied warning about the difference. But Reuters doesn’t do that.
The chart above makes it clear that this South Carolina poll is quite an outlier for both Romney and Gingrich, though not for the other candidates. Because the Romney-vs.-Gingrich comparison is the highlight, the large positive outlier for Romney coupled with the large negative outlier for Gingrich makes the gap enormous: 37 to 12, a 25-point margin between the two candidates. In contrast, my polling trend estimate based on all other South Carolina polls puts them at 29.4 to 20.1, a margin of only 9.3 points.
Any poll can be an outlier. Outliers happen. But in this case the combination of internet methodology, large outliers in politically crucial directions, and a branding style that downplays the online nature of the data collection produces a misleading picture of the South Carolina race.
At the very least, and regardless of methodology, it would have been responsible of the Reuters story to point out how much this poll diverges from others recently completed in the state. That would alert readers and give them valuable context within which to understand this polling result.
Should we dismiss internet polling out of hand? No. But we should understand that internet polling is still an R&D project rather than a fully developed, statistically justified methodology. Random sampling theory is settled science and remains the basis of live-interview telephone polls (most of which now include cell phones). While non-response is a significant issue in these polls, the underlying theory of sampling is not subject to serious criticism. In contrast, internet-based polls (mostly) start with self-selected volunteers who sign up to participate in online “panels” of respondents. (Some online polls, such as those pioneered by Knowledge Networks, recruit panel members in whole or in part by first selecting a random telephone sample and then asking telephone respondents to join the panel. This mitigates the sampling issue but still involves another step of selection from phone to participation online.) Whatever the virtues of online panels, strict random sampling from the population is not one of them. And that means the theory of inference based on random sampling does not apply. For some “fundamentalists” that is the end of the story and grounds for dismissal of all online polls. I do not count myself among the fundamentalists, though it is foolish to dismiss their concerns cavalierly.
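To make concrete what random-sampling theory buys you, here is a minimal sketch of the conventional 95% margin-of-error calculation for a simple random sample. The subsample size of 398 Republicans comes from the Reuters story; the formula assumes pure random sampling with no design effect, which is exactly the assumption that self-selected internet panels cannot claim.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Conventional 95% margin of error for a simple random sample.

    Uses the worst-case proportion p = 0.5 and assumes no design
    effect -- valid for probability samples, but not for
    self-selected internet panels.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The Reuters/Ipsos story reported 398 Republicans in the sample.
moe = margin_of_error(398)
print(f"+/- {moe * 100:.1f} points")  # roughly +/- 4.9 points
```

Even at its face-value sample size, a probability sample of 398 carries a margin of error of nearly five points; for an opt-in panel, no comparable figure can be derived from sampling theory at all.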
Smart stats guys point out that even randomly selected telephone respondents are not perfect examples of statistical theory. With response rates usually under 20% (the exact figure depends on how response rate is calculated, and there are some issues here), there is a lot of self-selection in telephone polls as well. Most pollsters compensate for differential non-response by weighting the data to match several known demographic characteristics, which is a standard and justifiable practice, but it is also a simple form of statistical modeling that moves beyond the pure theory of random sampling. So this point of view asks: once you start modeling, why not admit it and embrace it, with either phone or internet samples?
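The demographic weighting described above can be sketched in its simplest form: each cell of a demographic variable gets a weight equal to its population share divided by its sample share. The numbers below are invented for illustration; real pollsters weight on several characteristics at once, often by iterative raking rather than this one-variable version.

```python
def cell_weights(sample_counts, population_shares):
    """Weight each demographic cell so the weighted sample
    matches known population shares (simple post-stratification)."""
    n = sum(sample_counts.values())
    return {cell: population_shares[cell] / (count / n)
            for cell, count in sample_counts.items()}

# Hypothetical sample where men are over-represented (55%)
# relative to an assumed 48/52 population split.
weights = cell_weights({"men": 550, "women": 450},
                       {"men": 0.48, "women": 0.52})
print(weights)  # men weighted down (below 1), women weighted up (above 1)
```

The point of the sketch is the one made in the paragraph above: even this routine, justifiable adjustment is a model, not pure random-sampling inference.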
A problem is that the “right” way to model the sample selection in online polls is not settled science. Some really smart people have developed sophisticated methods to compensate for this problem. But other online pollsters have different methods, some equally sophisticated but different, some just “different”. And there might be a charlatan or two out there as well. This is normal at the R&D stage of science. But it means that we have not come to any consensus about how to model internet panels “correctly”.
At the very least this uncertainty would seem to require more clarity and transparency when results from internet based polls are presented, especially from respected news organizations such as Reuters.
(Disclosure: In the past I had a business relationship with Polimetrix, now YouGov/Polimetrix, a prominent internet pollster.)