Florida Endgame

I’ve been a little busy with polling in Wisconsin this past week and updates fell behind. Here is a Monday night update for Florida. Thanks @JoeLenski for the nudge.

As the entire world knows by now, Romney has moved up substantially in the last week, while Gingrich’s week has not been a good one. The Fortnight Review above gives a zoomed-in look at the last two weeks. Note the gap between the pre-SC polls and the start of the post-SC polling on the 22nd. With a gap like that between polls, it might be better to just look at polls since the 22nd. If so, Mitt has gained about 10 points or a shade more, while Newt has dropped something like 8.

The end result is the standard trend putting Romney at 39.7 and Gingrich at 31.7, with Santorum at 11.5 and Paul at 10.2. The more sensitive redline estimate sees a bit more trend than the gray standard, putting Romney at 42.4, Gingrich at 28.2, Santorum at 12.5 and Paul at 10.8. The last fortnight of polling is pretty similar to the sensitive estimator.
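The “standard” vs. “sensitive” distinction here is largely a matter of smoothing bandwidth. As a rough illustration only (not the actual estimator behind these charts), both can be thought of as local linear regressions of poll results on field dates, with the sensitive version using a shorter window so it chases recent movement harder:

```python
# Illustrative sketch, not the estimator used for the trends above.
# A tricube-weighted local linear regression, evaluated at the most
# recent date; the only difference between "standard" and "sensitive"
# is the bandwidth.
import numpy as np

def local_linear_estimate(days, pcts, at_day, bandwidth):
    """Tricube-weighted local linear fit, evaluated at `at_day`."""
    u = np.clip(np.abs(days - at_day) / bandwidth, 0.0, 1.0)
    w = (1 - u**3) ** 3                       # tricube kernel weights
    sw = np.sqrt(w)                           # sqrt-weights for least squares
    X = np.column_stack([np.ones_like(days), days - at_day])
    # With X centered at at_day, the fitted intercept IS the trend
    # estimate at at_day.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], pcts * sw, rcond=None)
    return beta[0]

# Hypothetical two-week poll series: a candidate gaining ~0.7 pts/day.
days = np.arange(14, dtype=float)
rng = np.random.default_rng(0)
pcts = 30 + 0.7 * days + rng.normal(0, 1.5, 14)

standard = local_linear_estimate(days, pcts, at_day=13, bandwidth=14)
sensitive = local_linear_estimate(days, pcts, at_day=13, bandwidth=5)
```

With noisy data the short-bandwidth estimate bounces around more, which is exactly why a late “surge” can look bigger on the sensitive line than on the standard one.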

As always, I’ll stand with the standard trend. So there, @JoeLenski, those are my priors.

Here is the long run chart for Florida, for perspective on how much this race has moved.

Reuters Internet Poll in South Carolina GOP2012

Twitter lit up Saturday with news of a Reuters/Ipsos poll putting Romney 21 points ahead of his nearest competition. The Reuters story is here. The poll put Romney at 37%, Santorum and Paul at 16% each and Gingrich at just 12%. Romney supporters were quick to seize on this as evidence that Gingrich’s (and Perry’s) criticisms of Bain Capital had not only failed but backfired and that Romney was much closer to a big win that would all but secure the nomination.

But those who read to the bottom of the Reuters story found this important detail:

The Reuters/Ipsos poll was conducted online from January 10-13 with a sample of 995 South Carolina registered voters. It included 398 Republicans and 380 Democrats.

While Reuters and Ipsos partner on a high quality live-interviewer poll, they also do some internet-based polling, which is what this one is. Branding both as a “Reuters/Ipsos Poll” makes it hard to tell the difference, at least until the last paragraph of the story. For comparison, see this January “Reuters/Ipsos poll” report, which is one of their live-interview polls. And compare the ledes for the two stories, South Carolina first:

(Reuters) – Republican presidential candidate Mitt Romney has opened a wide lead over his rivals in the South Carolina primary election race, trouncing Newt Gingrich and gaining momentum in his march toward the party’s nomination, a Reuters/Ipsos poll shows.

And the January national live interviewer poll lede:

(Reuters) – U.S. presidential hopeful Mitt Romney has sailed farther ahead of rival Republican candidates nationally and narrowed President Barack Obama’s lead in the White House race, according to a Reuters/Ipsos poll on Tuesday.

Reuters chooses not to distinguish the two types of polling methodologies in their branding of the poll. Using “Reuters/Ipsos Online Poll” or “Reuters/Ipsos Internet Panel” would be more distinctive and carry an implied warning about the difference. But Reuters doesn’t do that.

The chart above makes it clear that this South Carolina poll is quite an outlier for both Romney and Gingrich, though not for the other candidates. But because the Romney vs Gingrich comparison is the highlight, the large positive outlier for Romney coupled with the large negative outlier for Gingrich makes the gap huge: 37 – 12 or a 25 point margin between the two candidates. In contrast, my polling trend estimate based on all other SC polls puts them at 29.4 – 20.1, a margin of only 9.3 points.

Any poll can become an outlier. Outliers happen. But in this case the combination of internet methodology, large outliers in politically crucial directions, and branding that downplays the online nature of the data collection produces a misleading picture of the South Carolina race.

At the very least, and regardless of methodology, it would have been responsible for the Reuters story to point out the discrepancy between this poll and others recently completed in the state. That would alert readers and give them valuable context within which to understand this polling result.

Should we dismiss internet polling out of hand? No. But we should understand that internet polling is still an R&D project rather than a fully developed, statistically justified methodology. Random sampling theory is settled science and remains the basis of live-interview telephone polls (now mostly including cell phones). While non-response is a significant issue in these polls, the underlying theory of sampling is not subject to serious criticism. In contrast, internet-based polls (mostly) start with self-selected volunteers who sign up to participate in online “panels” of respondents. (Some online polls, such as those pioneered by Knowledge Networks, recruit panel members in whole or in part by first selecting a random telephone sample and then asking telephone respondents to join the panel. This mitigates the sampling issue but still involves another step of selection from phone to participation online.) Whatever the virtues of online panels, strict random sampling from the population is not one of them. And that means the theory of inference based on random sampling does not apply. For some “fundamentalists” that is the end of the story and grounds for dismissal of all online polls. I do not count myself among the fundamentalists, though it is foolish to dismiss their concerns cavalierly.
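To make the contrast concrete, here is the kind of inference that random sampling theory licenses: a margin of error computed from the sample size alone. The sketch below uses the familiar 95% approximation applied to the 398 Republicans reported in the Reuters story; the point is that this calculation is only justified when the sample is (approximately) random, which a self-selected panel is not.

```python
# Textbook 95% margin of error for a proportion, valid under simple
# random sampling. For a self-selected online panel this formula has
# no theoretical justification.
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for proportion p from a
    simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Romney at 37% among the 398 Republicans in the Reuters/Ipsos sample:
moe = margin_of_error(0.37, 398)
print(round(100 * moe, 1))  # prints 4.7 -- IF this were a random sample
```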

Smart stats guys point out that even randomly selected telephone respondents are not perfect examples of statistical theory. With response rates usually under 20% (depending on how response rate is calculated, and there are some issues here), there is a lot of self-selection in telephone polls as well. Most pollsters compensate for differential non-response by weighting the data to match several known demographic characteristics, which is a standard and justifiable practice, but it is a simple form of statistical modeling that moves beyond the pure theory of random sampling. So this point of view says: once you start modeling, why not admit it and embrace it, with either phone or internet samples?
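The demographic weighting described above is commonly implemented as “raking” (iterative proportional fitting): scale respondent weights until the weighted sample matches known population margins on each variable in turn. A toy sketch, with invented categories and target shares:

```python
# Toy raking (iterative proportional fitting) sketch. The categories
# and population targets below are invented for illustration.
import numpy as np

def rake(weights, groups, targets, n_iter=50):
    """Adjust weights until weighted category shares match the
    target shares for each weighting variable."""
    w = weights.astype(float).copy()
    for _ in range(n_iter):
        for var, labels in groups.items():
            total = w.sum()
            factors = np.ones_like(w)
            for cat, share in targets[var].items():
                cat_weight = w[labels == cat].sum()
                if cat_weight > 0:
                    factors[labels == cat] = share * total / cat_weight
            w *= factors
    return w / w.sum()   # normalize so weights sum to 1

# Hypothetical raw sample: too many women and too many 65+ respondents.
sex = np.array(["F"] * 7 + ["M"] * 3)
age = np.array(["65+"] * 5 + ["<65"] * 5)
w = rake(np.ones(10),
         groups={"sex": sex, "age": age},
         targets={"sex": {"F": 0.52, "M": 0.48},
                  "age": {"65+": 0.20, "<65": 0.80}})
```

After raking, the weighted shares hit the targets. But the same machinery cannot, by itself, correct selection on characteristics you don’t weight on, and that is the nub of the online-panel debate.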

A problem is that the “right” way to model the sample selection in online polls is not settled science. Some really smart people have developed sophisticated methods to compensate for this problem. But other online pollsters have different methods, some equally sophisticated but different, some just “different”. And there might be a charlatan or two out there as well. This is normal at the R&D stage of science. But it means that we have not come to any consensus about how to model internet panels “correctly”.

At the very least this uncertainty would seem to require more clarity and transparency when results from internet based polls are presented, especially from respected news organizations such as Reuters.

(Disclosure: In the past I had a business relationship with Polimetrix, now YouGov/Polimetrix, a prominent internet pollster.)

Suffolk Tracking and Election Day in New Hampshire

Suffolk took their tracking poll into the field for one last night on Monday, so today we have a final update in New Hampshire.

I wrote yesterday about some differences between the Suffolk tracker and other polls. I’m pleased this morning to see the tracker moving back toward my trend estimates based on other polls. This is exactly what I’d expect if the Suffolk variation was random: today we see a return to the central tendency, reflected in my trend estimate.

On Monday, Suffolk had fallen to 33% for Romney, while my trend estimate based on other polling was holding at 38-39. This was the only major discrepancy, leading me to suspect Suffolk was becoming a bit of an exception. Today, the last night of polling finds Suffolk returning Romney support to 37%, comfortably close to my trend estimate of 39.4. The other Suffolk results are all quite close to my final non-Suffolk trends. This is good news from a polling consistency perspective. At this point there is relatively little disagreement. Of course today’s voters may or may not agree, as we discovered four years ago. But from a pure data perspective, the evidence has converged.

So now, mixing all polls, here is the final Fortnight Review for New Hampshire.

While Romney has clearly been trending down, he looks to end up at 37% regardless of which estimator we use. Ron Paul still looks to be a solid second place at 18% with virtually no trend over the fortnight.

The excitement remains between Huntsman and Santorum. Huntsman has been trending up but still trails Paul at 13.6. Here the more sensitive estimators suggest more of a trend than my standard estimator, putting Huntsman at 16-17, quite close to Paul. One always wants to hedge bets with late-surging candidates. But my “standard” estimator is standard for a reason, so I’ll stand by the 13.6 estimate, though I won’t hold it against you if you want to believe the red line at 16.5.

Santorum’s surge seems to have plateaued in New Hampshire, where he continues to do worse than nationally or in South Carolina. My estimate puts him at 12.3, probably headed for 4th place.

And then there is Newt Gingrich. The last few days suggest he has finally stopped falling, which he had been doing steadily for a long while. But at 9% it looks likely to be a distant 5th-place finish.

Rick Perry? 0.6%. Amazing.

This morning’s political shows include considerable speculation along the lines of “the polls mean nothing,” given Mitt’s comments on firings, Huntsman’s eventual snappy response about putting country first, and Gingrich’s attacks on Romney’s record at Bain Capital. Like all junkies, I too need to think exciting things are happening here at the end and that our “feel” for the last couple of days is a reliable guide to the outcome. But ultimately, I won’t go there. The reason to do polling, and to take the results seriously, is that most of the time we get reliable data that turn out pretty close to election outcomes, and we do this without last-second subjective modifications. So what do I expect tonight? I expect the gray standard trend estimator to be my best guess. I’ll stick to the data.

Postscript: I’m getting annoyed at late polling coming in. Every time I think we are “final,” another one comes out. Now it is Rasmussen, with a Monday-night-only sample. The results are consistent with Suffolk and the overall trends above, so it makes only a modest difference to the trend estimates above. But in the interest of completeness, and with the fervent hope this really is the “final” update, here are the trends including Rasmussen.