From: Dave Rusin
Date: Thu, 7 Dec 2000 03:26:34 -0600 (CST)
To:
Subject: Re: Simulated recount in Miami-Dade?

First, I should observe that a precinct-by-precinct analysis of Dade has already been done more or less as you suggest (I think). I believe it was the Miami Herald supplying data to Bruce Hansen (U. Wisc) but I can't find the pointer to this story right now. That's not a bad analysis, and basically it deflates Gore's claims, roughly because the precincts counted so far were already Democrat-leaning. But I think it might be missing a fact or two.

Now, your plan is like this:

   For each precinct,
      start with number of undervotes
      multiply by .20 (acceptance rate of Dade canvassers so far)
      partition among candidates according to precinct's ratios
   add for all 800 (?) precincts

I don't know where you got the 0.20; I thought I saw 0.24 in Boies' testimony:
   http://washingtonpost.com/wp-srv/onpolitics/elections/fl_contest_hearing120200.htm

My concern is that, as noted by the GOP, this method is sure to count one fifth of ALL undervotes as a vote for someone -- even if, for example, all the undervotes in a particular precinct were intended by the voters to be blank. That would be unjust, of course, even if it is unexpected.

I think there is a correction method which would address their concerns, and at the same time reassure the Dems who are concerned that there are precincts which lost unusually large numbers of ballots for reasons which affect the precincts differentially (sort of like this Palm Beach situation:
   http://www.herald.com/content/today/news/broward/digdocs/076650.htm
). There's no way to know for sure what the voter intended now, I guess, but we can at least make an allowance for abstentions.

Here's a new plan:

   For each precinct,
      start with number of undervotes
      subtract 0.4% of precinct total tally (replace with 0 if negative)
      multiply remainder by xxx
      partition among candidates according to precinct's ratios
   add for all 800 (?) precincts

I don't know what number xxx is. Perhaps you can get me the data as follows. The partial Dade recount recovered, what, about 250 votes for all candidates? And this was from -- I'm estimating here -- about 1100 undervotes there. How many ballots were cast altogether in those precincts? Maybe 130,000?

Taking these numbers for a moment, I'll show you what xxx is. Of any 130,000 Florida ballots cast, we can estimate that about 390 - 520 were intended by the voter to be blank. (I've estimated the no-vote rate to be 0.4% statewide; others have estimated about 0.3%.) Let me be consistent with my other models and stick with the 0.4% rate. Then of the 1100 (?) undervotes so far considered, we should set aside 520 deliberately-blank ballots, leaving 580 ballots which we then deduce were intended by the voter to mean something. In that case the 250 (?) actually recovered would translate into a 250/580 = 0.43 estimate for xxx. These numbers would suggest the Dade canvassers were not so conservative after all, managing to discern a voter intent in almost half of these marginally-punched ballots in which a vote was intended.

(I probably should stress that this 43% value for xxx was computed using the numbers I have had to guess at the most; if you want to pursue this, please respond with more data about the partial recount and I will help you compute a more valid number.)

I can tell you how the models will differ.
   Your model recovers 20% of the undervote in every precinct.
   I would have expected you to use 24% based on what I read.
   My model recovers 43% of: (undervote - 0.4% total vote).
There is NO DIFFERENCE between the last two models in precincts with undervotes which equal 0.9% of the total votes cast; my model recovers FEWER votes in precincts with smaller undervote rates, and MORE votes in precincts with larger undervote rates.
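
For anyone who wants to redo this arithmetic once real data are in hand, here is a minimal Python sketch of the flat-rate plan and the abstention-adjusted plan. The function names are made up for illustration, the inputs are the same guessed figures as above (250 recovered, 1100 undervotes, 130,000 ballots, 0.4% abstention), and the sample precinct at the end is purely hypothetical.

```python
# Sketch only: guessed inputs from this message, not real recount data.

def estimate_recovery_rate(recovered, undervotes, ballots_cast, abstain_rate=0.004):
    """Estimate xxx = recovered / (undervotes - deliberately blank ballots)."""
    intended = max(undervotes - abstain_rate * ballots_cast, 0.0)
    if intended == 0:
        raise ValueError("no undervotes left after removing abstentions")
    return recovered / intended


def flat_model(undervotes, rate):
    """Flat plan: a fixed fraction of every undervote becomes a vote."""
    return rate * undervotes


def adjusted_model(undervotes, ballots_cast, xxx, abstain_rate=0.004):
    """Adjusted plan: set aside expected abstentions first (floor at 0)."""
    return xxx * max(undervotes - abstain_rate * ballots_cast, 0.0)


def break_even_undervote_rate(flat_rate, xxx, abstain_rate=0.004):
    """Undervote rate u/T at which both plans recover the same number.

    Solves flat_rate * u = xxx * (u - abstain_rate * T) for u / T.
    """
    return xxx * abstain_rate / (xxx - flat_rate)


def partition(recovered, candidate_totals):
    """Split recovered votes in proportion to the precinct's counted totals."""
    total = sum(candidate_totals.values())
    return {name: recovered * votes / total for name, votes in candidate_totals.items()}


if __name__ == "__main__":
    xxx = estimate_recovery_rate(recovered=250, undervotes=1100, ballots_cast=130_000)
    print(f"xxx is about {xxx:.2f}")  # 250/580, about 0.43

    rate = break_even_undervote_rate(flat_rate=0.24, xxx=xxx)
    print(f"break-even undervote rate is about {rate:.1%}")  # about 0.9%

    # Hypothetical precinct, for illustration only.
    found = adjusted_model(undervotes=30, ballots_cast=2000, xxx=xxx)
    print(partition(found, {"Gore": 1200, "Bush": 770}))
```

The floor at zero in adjusted_model is what guarantees that no votes are "found" in a precinct whose undervote rate is below the assumed abstention rate.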
Again, that "0.9%" break-even figure will be different when I can get real data, but the rest of the comparison is valid (fewer votes recovered where the undervote rate is lower, more where it is higher), and in particular respects the concern that no votes should be "found" in some precincts (namely those with undervote rates under 0.4%). If it is true that Gore support is higher in precincts with higher undervote rates, this will improve Gore's edge somewhat.

This model could be improved further if we could determine some mechanism which explains when a precinct's abstention rate is higher or lower.

Now let me turn to another mathematical concern. In a precinct with 100 votes, counted as 60 Gore votes, 30 Bush votes, and 10 undervotes, we might split the undervotes as 6 Gore, 3 Bush, 1 abstain. In a precinct with 1000 votes counted as 600 Gore votes, 300 Bush votes, and 100 undervotes, we might split the undervotes as 60 Gore, 30 Bush, 10 abstain. However, I would be more confident that we had done a good job in the second case, since as we know (and can prove statistically) it's easier to have goofy spurts in the data among smaller data sets. At the very least, we have less round-off trouble: we could, for example, estimate a more accurate 4 abstentions in the second example (0.4% of 1000), giving Gore 64 and Bush 32 of the undervotes.

To some extent we see this already with county-wide data: even though we might expect, for example, a 0.4% undervote rate in all optical-scan counties, we see some variations among these counties -- especially among the smaller ones.

Your proposal, which apportions (some) undervotes among the candidates using the precinct's ratios, will therefore encounter some significant swings from precinct to precinct, simply because the precincts are much smaller than the counties. This reduces our confidence in the results. It's not clear that there's a lot we can do about this (except, of course, to actually count the ballots!). Here is a suggestion, though: keep the models we propose above, but decide the candidate proportions a little differently.
We might, for example, use the candidate totals for each precinct PLUS all precincts contiguous with it -- do you have an actual map from which this information can be gleaned? Or we might similarly lump each precinct together with "similar" ones, using variables which tend to affect voting patterns (income, race, education levels, or whatever).

I should probably point out that this adjustment would be a lot of work, and in the end may very well produce no real change in the estimates. What it would do is reduce the size of the error bars: rather than simply asserting that the expected Gore gain is 500 votes (say), we would be able to say that, assuming the model to be valid, the expected gain is between 400 and 600 votes with probability xxx. Whether this kind of accuracy is important might depend on whether, for example, you are performing this analysis for a newspaper article or for legal proceedings.

dave