Thursday, August 13, 2009

To Sample or Not To Sample?

For over a decade, the idea of using statistical sampling instead of a straight count for the U.S. Census has been hotly debated in some circles. Broadly speaking, Democrats have favored the method, while Republicans have opposed it.

Democrats expect to gain from using sampling, since traditional methods tend to undercount Democratic constituencies: minorities, illegal immigrants, and transients. And for the same reasons, Republicans expect to lose. So their respective support and opposition aren't surprising. Naturally, their public reasoning cannot be so self-centered.

Republicans base their formal opposition in part on the U.S. Constitution, which states: "The actual Enumeration shall be made within three years after the first meeting of the Congress of the United States, and within every subsequent term of ten years." On their reading, the word "Enumeration" contains an implicit demand that each person be counted. The 14th Amendment includes similar language: "Representatives shall be apportioned among the several States according to their respective numbers, counting the whole number of persons in each State..." (emphasis mine). However, I find this line of argument unpersuasive. An "enumeration" can mean a "list or catalog", but it can also mean a "reckoning or count". The former definition would exclude sampling; the latter would not. The phrase "whole number of persons" might be more persuasive, except that it is pretty clearly used to void the original text's treatment of slaves: "Representatives and direct Taxes shall be apportioned among the several States which may be included within this Union, according to their respective Numbers, which shall be determined by adding to the whole Number of free Persons... three fifths of all other Persons." So the meaning of the Amendment is that every person shall be counted equally, i.e. there will be no more three-fifths rule.

Some Republicans also object that sampling can make the census less accurate. This is a better argument, since if true it is devastating to the whole basis of sampling. Technical Report 537, by Brown, Eaton, Freedman, Klein, Olshen, Wachter, Wells and Ylvisaker of the Department of Statistics, U.C. Berkeley, goes into great detail on the legal issues, techniques, and potential problems with sampling. The possibility that sampling will reduce accuracy is raised: "Will proposed adjustments to the census take out more error than they put in?" Unfortunately, the question is left unanswered: even these professors are unsure whether sampling will satisfy its intended purpose. The paper also points out that one problem sampling is meant to address - undercounting of certain groups - could be solved without sampling: "Census figures could be scaled up to match the demographic analysis totals for subgroups of the national population defined by age, sex and race (Section V). The people in a demographic group who are thought to be missing from the census would be added back, in proportion to the ones who are counted—state by state, block by block."
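To make the report's scaling proposal concrete, here is a minimal sketch of the arithmetic in Python. It is only my reading of the quoted passage, not the report's own procedure: the function name, the block and group labels, and all of the numbers are hypothetical, and it assumes a single national demographic-analysis total per group.

```python
from collections import defaultdict


def scale_to_demographic_totals(counts, da_totals):
    """Scale counted figures so each group sums to its national estimate.

    counts:    dict mapping (area, group) -> persons actually counted
    da_totals: dict mapping group -> national demographic-analysis total
    Returns a dict mapping (area, group) -> adjusted (possibly fractional) count.
    """
    # National counted total for each demographic group.
    counted_by_group = defaultdict(int)
    for (_area, group), n in counts.items():
        counted_by_group[group] += n

    adjusted = {}
    for (area, group), n in counts.items():
        counted = counted_by_group[group]
        if group not in da_totals or counted == 0:
            # No estimate for this group, or nobody counted: leave unadjusted.
            adjusted[(area, group)] = float(n)
        else:
            # Add the missing people back in proportion to those counted.
            adjusted[(area, group)] = n * da_totals[group] / counted
    return adjusted


# Hypothetical figures: group_a is thought to be undercounted by 10 people.
counts = {
    ("block_1", "group_a"): 90,
    ("block_2", "group_a"): 10,
    ("block_1", "group_b"): 50,
    ("block_2", "group_b"): 50,
}
da_totals = {"group_a": 110, "group_b": 100}

adjusted = scale_to_demographic_totals(counts, da_totals)
for key in sorted(adjusted):
    print(key, adjusted[key])
```

In this toy example the ten missing members of group_a are split nine to block_1 and one to block_2, simply because that is where the counted members live. The same arithmetic shows a limit of the approach: a block where nobody in a group was counted gets nobody added back.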

On the pro-sampling side, one of the most vocal supporters is Dr. Kenneth Prewitt, director of the Census Bureau from 1998 to 2001. He argues that sampling will be more accurate than a direct count, although Tech Report 537 seems to cast some doubt on this. Another point he makes is that the federal government already uses statistical data for policy-making all the time. This point, though, is easily refuted. The Census Act as amended in 1976 states:

...except for the determination of the population for purposes of apportionment of Representatives in Congress among the several States, the Secretary shall, if he considers it feasible, authorize the use of the statistical method known as "sampling" in carrying out the provisions of this title. [emphasis mine]

The emphasized portion was the basis of the U.S. Supreme Court's January 1999 decision in "Department of Commerce v. United States House of Representatives," which affirmed a 1998 district court ruling and held that the Census Act bars the use of sampling for apportionment.

My own take on the debate is that the Constitution demands that an accurate count be made by any means. I do not believe that a direct count is required. The census is a critical part of our representative democracy, though (which is why it is covered in our minimalist Constitution at all), and it is important that it be simple and transparent. The appearance of bias or political machination would erode civic trust in the most representative of our federal legislative bodies. A direct count that was just slightly less accurate than some other technique (sampling or something else) might still be favored due to its transparency.

The concerns raised in Tech Report 537 lead me to conclude that any improvement in accuracy from sampling would not be great enough or certain enough to justify using it for the census. But future developments could change this balance: direct counts could become far less accurate (for some reason) or indirect counts far more accurate.
