Estimating the number of Iraqi deaths

For most Americans, any conversation regarding the number of Iraqis killed due to the invasion and subsequent occupation will be emotionally charged. There's no way around it, I suppose, given how most of us are pretty well brainwashed from early childhood on to believe that the US is a "shining beacon on the hill." For many, their sense of self-worth is likely to be negatively affected when their noses are rubbed in the nature of their own government's actions.

Which brings us back to the estimates of Iraqi deaths. If one goes back to the propaganda prior to the start of the war, the Iraq war was sold to the American public in part on the premise that the war would be short in duration and minimal in casualties. Now well into the fifth year, it is pretty clear that neither premise has been met. The question becomes: just how bad is it? Getting hold of a satisfactory answer is tricky, as it turns out. First off, war zones tend to be quite chaotic, and even if we were to assume that all parties involved wanted to report accurately on the dead and injured, we should not expect much in the way of accuracy. News reports will only be as accurate as the reporters in the field, and over time, it's become increasingly difficult for English-language news services to get reporters outside of the Green Zone. Hence even the best aggregators based on news reports, such as Iraq Body Count (IBC), are probably going to significantly undercount the number of Iraqis killed under occupation. It also seems reasonable to expect that the US military and whatever puppet regime is "governing" Iraq will be less than forthcoming regarding Iraqi casualties. Massive Iraqi casualties, especially civilian casualties, make the US government and its military look evil, and Iraq's puppet regimes look ineffective. We can expect that these particular organizations would be motivated, if anything, to suppress casualty information.

If the usual methods to assess a death toll are unacceptable, those interested in some approximation of the truth will search for an alternative method or methods. Starting with a 2004 paper published in the Lancet (a prestigious and reputable peer-reviewed medical journal) and followed up with a paper published in the same journal two years later, a team of researchers from Johns Hopkins University tried a novel approach - they used rigorous survey and sampling methodology in order to arrive at a closer approximation of the actual death toll. Their findings were nothing short of startling. The initial study, covering roughly the first year and a half following the invasion, estimated the number of Iraqi deaths caused by the invasion and occupation to be approximately 100,000. The second study estimated the Iraqi death toll, as of July 2006, to be around 601,000. More recently, ORB, using a different but equally rigorous survey and sampling methodology, estimated the Iraqi death toll to be about 1.2 million. To an American public (and of course America's ruling elites) such numbers were bound to be the height of political incorrectness, and given the reaction - especially by US right-wing authoritarians, but to some degree as well among self-described moderates and the so-called "respectable left" - it is safe to say that these three reports have touched a nerve or two. After all, the prospect of belonging not to a civilized and gentle world power (as most Americans would characterize their country), but rather to a barbaric and murderous regime, is a deeply unsettling one.

That's certainly not a "feel-good" message for a nation addicted to feeling good. The question becomes: how believable are these rather unpleasant numbers? To what extent do these numbers reflect reality (in other words, how externally valid or generalizable are these survey results)? To address that, it is useful at this point to discuss the sampling methodologies employed in these three studies. All three studies used forms of probability sampling, so it is worth explaining why probability sampling methods are usually preferable to nonprobability sampling methods. To give you a taste, here's a description reasonably similar to what I use in the classroom:
A probability sampling method is any method of sampling that utilizes some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have equal probabilities of being chosen.
The two Lancet studies were based on a form of probability sampling called cluster sampling:
The problem with random sampling methods when we have to sample a population that's dispersed across a wide geographic region is that you will have to cover a lot of ground geographically in order to get to each of the units you sampled. Imagine taking a simple random sample of all the residents of New York State in order to conduct personal interviews. By the luck of the draw you will wind up with respondents who come from all over the state. Your interviewers are going to have a lot of traveling to do. It is for precisely this problem that cluster or area random sampling was invented.
In cluster sampling, we follow these steps:
  • divide population into clusters (usually along geographic boundaries)
  • randomly sample clusters
  • measure all units within sampled clusters
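For readers who think better in code, the three steps above can be sketched roughly as follows. This is only a toy illustration, not the Lancet team's actual procedure: the function name `cluster_sample`, the made-up districts, and the household records are all my own assumptions.

```python
import random

def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """One-stage cluster sample: pick whole clusters at random, keep every unit in them."""
    rng = random.Random(seed)
    # Step 1: divide the population into clusters (here, by a grouping function).
    clusters = {}
    for unit in population:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    # Step 2: randomly sample clusters, without replacement.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: measure all units within the sampled clusters.
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 20 households spread evenly over 5 made-up districts.
households = [{"id": i, "district": f"D{i % 5}"} for i in range(20)]
sample = cluster_sample(households, lambda h: h["district"], n_clusters=2, seed=1)
```

Interviewers would then visit only the two chosen districts, which is the whole point: far less travel than a simple random sample scattered over every district.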
The ORB study employed yet another form of probability sampling called multistage sampling, which combines multiple probability sampling methods:
The most important principle here is that we can combine the simple methods described earlier in a variety of useful ways that help us address our sampling needs in the most efficient and effective manner possible. When we combine sampling methods, we call this multi-stage sampling.
For example, consider the idea of sampling New York State residents for face-to-face interviews. Clearly we would want to do some type of cluster sampling as the first stage of the process. We might sample townships or census tracts throughout the state. But in cluster sampling we would then go on to measure everyone in the clusters we select. Even if we are sampling census tracts we may not be able to measure everyone who is in the census tract. So, we might set up a stratified sampling process within the clusters. In this case, we would have a two-stage sampling process with stratified samples within cluster samples. Or, consider the problem of sampling students in grade schools. We might begin with a national sample of school districts stratified by economics and educational level. Within selected districts, we might do a simple random sample of schools. Within schools, we might do a simple random sample of classes or grades. And, within classes, we might even do a simple random sample of students. In this case, we have three or four stages in the sampling process and we use both stratified and simple random sampling. By combining different sampling methods we are able to achieve a rich variety of probabilistic sampling methods that can be used in a wide range of social research contexts.
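A rough sketch of the two-stage idea, under my own simplifying assumptions: clusters are drawn at random first, and then a simple random sample of units is drawn inside each chosen cluster (rather than a stratified one). The function name `two_stage_sample` and the census tracts below are invented for illustration and do not describe ORB's actual design.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sample: random clusters first, then random units within each."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters, without replacement.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample of units inside each chosen cluster
    # (the whole cluster is taken if it is smaller than the requested size).
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical census tracts, each holding 50 made-up household IDs.
tracts = {f"tract-{t}": [f"t{t}-h{h}" for h in range(50)] for t in range(10)}
picked = two_stage_sample(tracts, n_clusters=3, n_per_cluster=5, seed=42)
```

Compared with one-stage cluster sampling, the second stage means nobody has to interview every household in a tract, which is what makes surveys of this kind feasible in the first place.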
What these sampling methods buy researchers is the ability to infer that their results are not merely descriptive of the sample, but of the population from which the sample was derived. The Iraqi death toll estimates based on probability sampling methods, while horrifying to ponder, are believable.

Now, as for the estimate counter that I've been displaying on the upper left-hand corner of this blog, we can make a few statements about its accuracy. First, as Just Foreign Policy notes, the counter uses as a starting value the Lancet study's 2006 estimate of 601,000 Iraqi deaths and then uses a median estimate based on IBC's estimations subsequent to the Lancet study. Just Foreign Policy is even kind enough to give us the formula used to derive their regularly updated estimate:
We multiply the Lancet number as of July 2006 by the ratio of current IBC deaths divided by IBC deaths as of July 1, 2006 (43,394).
The formula used is:
Just Foreign Policy estimate = (Lancet estimate as of July 2006) * ( (Current IBC Deaths) / (IBC Deaths as of July 1, 2006) )
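To make the arithmetic concrete, here is a small sketch of that formula. The two constants come straight from the text above; the function name `jfp_estimate` and the "doubled" input are my own, chosen only to show how the scaling behaves, and are not real IBC figures.

```python
LANCET_JULY_2006 = 601_000  # Lancet-based starting value cited above
IBC_JULY_2006 = 43_394      # IBC deaths as of July 1, 2006, per the quoted formula

def jfp_estimate(current_ibc_deaths):
    """Scale the Lancet figure by the growth in the IBC count since July 2006."""
    return LANCET_JULY_2006 * (current_ibc_deaths / IBC_JULY_2006)

# Hypothetical input: if the IBC count had exactly doubled since July 2006,
# the estimate would double as well.
doubled = jfp_estimate(2 * IBC_JULY_2006)
```

In other words, the counter assumes the ratio between the Lancet figure and the IBC count has stayed constant since July 2006, which is why it can only ever be a rough extrapolation.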
Given that IBC's estimate tends to be conservative, the JFP estimate will probably be a bit conservative as well. Is the JFP counter "scientific"? No. But, given its consistency with externally valid studies, it does give us a rough ball-park idea of what is being done in our names.

I would also note that the sampling techniques used by ORB to estimate the Iraqi death toll were the same ones used to estimate the Rwanda genocide death toll - and those estimates were accepted at the time by, yes, you guessed it, the US government. It is also worth noting that sampling methodologies (along with whatever census records were available) were used to estimate the death toll from the Nazi Holocaust. Reputable estimates, for example, place the death toll among Europe's Jewish population at about 5 to 6 million. My understanding is that this number is routinely accepted among the German populace, and has been for quite some time. I doubt there are too many in Germany who "feel good" about what their nation did during WWII. Holocaust denial will often get one in legal hot water in Europe, and even where there are no legal constraints, Holocaust deniers are justifiably viewed as kooks and crackpots. We in the US also have no good reason to "feel good" about what our government has perpetrated in Iraq. Indeed, we will eventually be held accountable. Currently, our own genocide deniers hold sway in mainstream American discourse - indeed our early 21st century version of Holocaust denial is still quite politically correct. The statistics available, though, are enough to show such denial for what it is (pure falsehood), and I would not be the least surprised to see the deniers thoroughly discredited in the coming decades, much as was the case in Germany several decades ago.

