Standardized Tests: The Interpretation of Racial and Ethnic Gaps

Volume 2 Number 3
March 2000



The interpretation of standardized test scores is full of traps that news media, politicians and interested citizens commonly fall into. Racial and ethnic gaps, and particularly their trends, are not always what they seem. A perceived gap decrease can really be an increase, and vice versa. In this essay we show how to make sense of test-score data. Examples are taken from Maryland (MSPAP) and Texas (TAAS) statewide exams, the bar exam and the National Board of Medical Examiners (NBME) Exam Part I. A coherent pattern emerges.

The appearance in our courts of these learned gentlemen of the law, who can make black appear white and white appear black, is forbidden.

-- Andorran Decree of 1864

A Father's Lament
Most stories filed each week by wire services end up in the trash. Some make it beyond a circle of regional interest, like the one filed on March 10, 1996 by the Associated Press. It was a story out of Waco, Texas that caught the eye of editors across the country. We read it over breakfast 1500 miles from Waco. It tells of an angry father upset by the failure of his daughter to pass a state high-school exit exam. He decided to do something about it. The story began:

WACO, Texas (AP) - When Lester and Coque Gibson's son failed the state's basic skills test eight years ago, they were dismayed. Last year, when their 16-year-old daughter failed, they were appalled.

The middle-class black couple had always hoped their children would defy the odds and grasp the American dream. But education is the key.

So Gibson demanded an accounting of the school district's test scores. And when he spread the numbers across his desk, he was shocked: Seventy-five percent of the black students and 66 percent of the Hispanic students failed the test in 1995, compared with only 37 percent of the white students.

The school district blames poverty and poor parenting for the failure rates. But Gibson blames institutional racism - teachers, he says, have low expectations of minority children.

"If we're going to get blamed for the education of our kids, then we may as well take control of their educational destiny and take a shot at it," Gibson said.

When kids fail, parents get frustrated. Sometimes they get angry. Lester Gibson got angry. Good test scores mean a lot. Students need them to get into college. Policemen and firemen need them to get promoted. Job seekers need them to get in the door. In Texas you need them to get a high school diploma. All this produces lots of rhetoric, some of it passionate, some of it scholarly, but most of it nonsense. We will try to sort it out.

The Waco results are not surprising. We know minorities score lower than whites on standardized exams. Still, many other outcomes were possible. Whites failed at a 37 percent rate, blacks at 75 percent. But what if 50 percent of blacks had failed, or 60 percent or 90 percent? How likely was each of these possible results? We asked this question over breakfast a few years ago. More precisely, we asked: Given a white failure rate of 37 percent, what should we expect for the failure rates of blacks and Hispanics? A few simple assumptions provide the answers.

Suppose that the scores of each ethnic and racial group are normally distributed, that is, they fall on or close to a bell curve, like the distribution shown for whites in Figure 1. The distribution is scaled so that the area under it is unity. The white mean, in standard deviation (SD) units, is zero. The area under the curve from negative infinity to the passing score represents the fraction (0.37) of whites who failed the exam. That area is obtained when the passing score is -0.332 SD.

Of all group differences, the best studied is between whites and blacks. The black-white gap is also the most reproducible, the black mean lagging behind the white by about one standard deviation. Consequently, we can estimate the black distribution by shifting the white distribution to the left by 1.0 SD, as in Figure 2. When we do this, the area representing the failing fraction increases to 0.75. That is, if whites fail at the rate of 37 percent, a black-white gap of 1.0 SD implies that blacks will fail at the rate of 75 percent, in agreement with their observed failure rate in Waco. Thus, the Waco school district results for blacks and whites were consistent with standardized test results observed universally. (Details of the calculation are given in Appendix A.)
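
The calculation just described can be sketched in a few lines of Python. This is only an illustration under the essay's own assumptions (normal distributions with a common standard deviation); the function names are ours and do not come from any published source.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1 (the reference group's distribution).
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def passing_score(reference_fail_rate: float) -> float:
    """Cutoff, in SD units from the reference mean, below which the
    given fraction of the reference group falls."""
    return STANDARD_NORMAL.inv_cdf(reference_fail_rate)


def shifted_fail_rate(reference_fail_rate: float, gap_sd: float) -> float:
    """Failure rate of a group whose mean lies gap_sd below the reference
    mean, assuming both groups are normal with the same SD."""
    cutoff = passing_score(reference_fail_rate)
    # Shifting the curve left by gap_sd is equivalent to moving the cutoff
    # right by gap_sd on the unshifted curve.
    return STANDARD_NORMAL.cdf(cutoff + gap_sd)


if __name__ == "__main__":
    print(round(passing_score(0.37), 3))           # -0.332
    print(round(shifted_fail_rate(0.37, 1.0), 2))  # 0.75
```

With a reference failure rate of 0.37 and a gap of 1.0 SD, the sketch returns the passing score and the 75 percent failure rate quoted above.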

Nationally, the Hispanic-white gap is more variable than the black-white gap. Part of the reason is that the term Hispanic applies to several groups genetically far removed from one another. In Waco most Hispanics are of Mexican descent -- Mestizos. For the Mexican American performance gap we used two sources: SAT scores and data from the 1966 Coleman Report.

The College Board divides Hispanics into several subgroups: Puerto Rican, Mexican American and Other Hispanic. The 1998 SAT results yield a white-Mexican American gap of 0.69 SD. The Coleman Report found a gap of 0.84 SD. We used both values to predict a Hispanic failure rate between 64 and 69 percent, bracketing the observed Hispanic failure rate of 66 percent.
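
The bracket can be written out as a short worked calculation. Here Φ denotes the standard normal cumulative distribution function (our notation, not the essay's), λ is the passing score measured from the white mean, and F(Δ) is the failure rate of a group whose mean lies Δ below the white mean, under the same normality and common-SD assumptions.

```latex
\begin{align*}
\lambda &= \Phi^{-1}(0.37) \approx -0.332 \\
F(\Delta) &= \Phi(\lambda + \Delta) \\
F(0.69) &= \Phi(0.358) \approx 0.64 \\
F(0.84) &= \Phi(0.508) \approx 0.69
\end{align*}
```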

The Usual Suspects
Writing about test scores in the November 1993 issue of The Atlantic Monthly, Duke University professor Stanley Fish, renowned for scholarship in both law and literature, asserted: "Statistical studies have suggested that test scores reflect income and socioeconomic status. It has been demonstrated again and again that scores vary in relation to cultural background; the test's questions assume a certain uniformity in educational experience and lifestyle and penalize those who, for whatever reason, have had a different experience and lived different kinds of lives. In short, what is being measured by the SAT is not absolutes like native ability and merit but accidents like birth, social position, access to libraries, and the opportunity to take vacations or to take SAT prep courses."

Lani Guinier, Professor of Law at Harvard University, writing in the New York Times of June 24, 1997, argues, "But within every racial and ethnic group, test scores go up with family income. One explanation for this may be that students who come from better-off families can afford coaching for the test. Students from wealthier families also have other advantages. They are more likely to have been exposed to books and travel."

We know that test scores go up with family income. They also improve with socioeconomic status. Both trends are observed within all ethnic and racial groups. But before you blame income and socioeconomic status for the test score gaps, consider this:

Black children from the wealthiest families have mean SAT scores lower than white children from families below the poverty line.

Figure 3 shows how math SAT scores increase with family income for both whites and blacks, confirming Professor Guinier. However, black students from families earning more than $70,000 (1995 dollars) score lower than white students whose families earned less than $10,000. Figure 4 shows more of the same for the verbal SAT. Here too, the wealthiest blacks score below the poorest whites. (Complete data can be found in Appendix B.)

As for "social position, access to libraries, and the opportunity to take vacations or to take SAT prep courses," consider this:

Black children of parents with graduate degrees have lower SAT scores than white children of parents with a high-school diploma.

Figures 5 and 6 show, respectively, how math and verbal SAT scores for blacks and whites vary with parental levels of education. In both cases, black children of parents with graduate degrees score lower than white children whose parents have a high-school diploma. (They score higher than white children whose parents did not finish high school; see Appendix B.)

When Professor Fish asserts that test scores reflect income and socioeconomic status, he is, of course, correct. We cannot conclude, however, as he does, that either is to blame for the black-white SAT gap. Figures 3 through 6 show that the gap exists at every level of income and social advantage. In fact, it remains remarkably constant when economic and cultural levels are controlled.

Professor Guinier observes that within every racial and ethnic group, test scores go up with family income. Guinier leaves no doubt she is aware in detail of the SAT data. Her syllogism begins, ". . . students who come from better-off families can afford coaching for the test . . . They are more likely to have been exposed to books and travel." We are to complete it with: Minorities have less income and cultural exposure; therefore, minorities have lower scores.

More SAT data may be found in Appendix B. There, you will discover that Asians mostly sit on top of the heap; that whites, Mexican Americans and blacks follow in that order. Some details prove interesting. For example, whites enjoy a verbal advantage over Asians that disappears at high levels of income. Regrettably, the College Board no longer discloses these data. In 1996, it stopped publishing performance by income and parental education disaggregated by race and ethnicity.

Trends in the Gap
Test scores sell panty hose. Scores make news. Stories continue to appear months after they are released. Parents fret about them. Politicians demagogue them. Businesses make relocation decisions based on them. Property values vary with school performance, and school administrators put their jobs on the line. Everybody wants high pass rates. Pressure is everywhere. And the racial gap is never far from the surface.

Several states have their own statewide exams. Results are usually reported as pass rates, often at several achievement levels. The statewide exams are designed to maintain a constant level of difficulty from year to year, so that changes in performance signal progress or backsliding. A drop of even a few points is cause for hand-wringing. In contrast, improvements are celebrated.

Reporters on the education beat and their editors define the racial gap as a difference in pass rates between two groups, one of which is white. This can be dangerously misleading when used to track gap trends. For example, consider the Maryland School Performance Assessment Program (MSPAP). Between 1993 and 1999 the black-white gap for eighth-grade math increased from 36.8 to 42.3 percentage points, causing extensive hand-wringing. But spare the lotion. The difference between black and white performance actually decreased slightly over this time. We will explain.

Maryland is very candid in reporting the results of its statewide exam. The data are completely disaggregated, making gap analysis possible school by school, county by county, statewide and by race. Table 1 presents eighth-grade MSPAP results for the years 1993 through 1999.

MSPAP Grade 8 Mathematics, Percent Satisfactory
Group	1993	1994	1995	1996	1997	1998	1999
African American	11.4	15.3	19.0	17.2	19.5	21.3	22.2
White (not Hispanic)	48.2	53.1	54.8	57.8	60.7	61.8	64.5
Gap (percentage points)	36.8	37.8	35.8	40.6	41.2	40.5	42.3
Table 1. Percent of eighth graders at the satisfactory level in MSPAP math tests from 1993 to 1999. Black-white gaps are computed as pass rate differences.

Figure 7 graphically displays the trend in the black-white gap.

Six years after the 1993 administration of the eighth-grade math test, both whites and blacks show improved pass rates. Whites, however, appear to have improved more. But did they? Pass-rate differences are a very arbitrary way to measure racial gaps. Their principal virtue is convenience. If we want to track a difference between two populations, looking at the difference between their means is best. In this way we can see whether the populations are growing together or apart.

Given the pass rates of two groups, we can compute their mean difference. (See Appendix A for how.) Table 2 adds the mean difference to the MSPAP data.

MSPAP Grade 8 Mathematics, Percent Satisfactory
Group	1993	1994	1995	1996	1997	1998	1999
African American	11.4	15.3	19.0	17.2	19.5	21.3	22.2
White (not Hispanic)	48.2	53.1	54.8	57.8	60.7	61.8	64.5
Gap (percentage points)	36.8	37.8	35.8	40.6	41.2	40.5	42.3
Mean Difference (SD)*	1.16	1.10	1.00	1.17	1.13	1.10	1.14
Table 2. Percent of eighth graders at the satisfactory level in MSPAP math tests from 1993 to 1999. Black-white gaps are given as pass-rate differences in percentage points and as mean differences in SD. The percentage-point gap increases over time, but the mean difference between the distributions remains almost constant and in fact decreases slightly.
*A method for computing mean differences from pass rates is given in Appendix A.
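
As a rough sketch of how a mean difference can be recovered from two pass rates, the snippet below applies the inverse normal distribution to the 1993 and 1999 columns of the table. It assumes, as the essay does, normal distributions with a common SD; the function name is hypothetical.

```python
from statistics import NormalDist


def mean_difference(pass_rate_a: float, pass_rate_b: float) -> float:
    """Mean of group A minus mean of group B, in SD units, implied by the
    two groups' pass rates on the same cutoff.

    Assumes both score distributions are normal with a common SD.
    """
    inverse_cdf = NormalDist().inv_cdf
    # Each pass rate fixes the cutoff's distance from that group's mean;
    # the difference of the two distances is the difference of the means.
    return inverse_cdf(pass_rate_a) - inverse_cdf(pass_rate_b)


if __name__ == "__main__":
    # Satisfactory-level pass rates from the table (white, African American).
    print(round(mean_difference(0.482, 0.114), 2))  # 1993 -> 1.16
    print(round(mean_difference(0.645, 0.222), 2))  # 1999 -> 1.14
```

The percentage-point gap grows from 36.8 to 42.3 between these two years, while the implied mean difference moves only from 1.16 to 1.14 SD.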

Figure 8 shows both computations of the black-white gap displayed graphically.

For the years 1993 to 1999 the black-white mean difference remained nearly constant. To check our calculation, we also calculated black-white mean differences from pass rates at the excellent level. The mean difference is a property of the distributions, and should not depend on the region of the distribution curve from which it is computed. The average mean differences computed from the satisfactory-level and excellent-level pass rates were 1.11 SD (0.05) and 1.15 SD (0.05), respectively. (The numbers within parentheses are rms deviations.) Agreement is good, given the assumptions of the calculation and the dispersion of the data.

Table 3 shows pass rates at the excellent level, and the corresponding gaps.

MSPAP Grade 8 Mathematics, Percent Excellent
Group	1993	1994	1995	1996	1997	1998	1999
African American	0.260	0.463	0.868	0.942	1.29	1.80	2.56
White (not Hispanic)	5.94	7.42	8.96	11.7	13.0	16.6	22.1
Gap (percentage points)	5.68	6.96	8.09	10.8	11.7	14.8	19.5
Mean Difference (SD)	1.23	1.16	1.04	1.16	1.12	1.13	1.18
Table 3. Percent of eighth graders at the excellent level in MSPAP math tests from 1993 to 1999. The percentage point gap increases monotonically and sharply over time, but the mean difference between the black and white distributions remains almost constant.
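
The averages and rms deviations quoted for the two achievement levels can be checked directly from the Mean Difference rows of the tables. The sketch below assumes that "rms deviation" means the population standard deviation about the mean; that reading is ours.

```python
from statistics import fmean, pstdev

# Mean Difference (SD) rows, 1993 through 1999, copied from the tables.
SATISFACTORY = [1.16, 1.10, 1.00, 1.17, 1.13, 1.10, 1.14]
EXCELLENT = [1.23, 1.16, 1.04, 1.16, 1.12, 1.13, 1.18]


def summarize(mean_differences):
    """Return (average, rms deviation about the average), rounded to 0.01."""
    average = fmean(mean_differences)
    # pstdev divides by n, not n - 1, which matches an rms deviation.
    rms_deviation = pstdev(mean_differences)
    return round(average, 2), round(rms_deviation, 2)


if __name__ == "__main__":
    print(summarize(SATISFACTORY))  # (1.11, 0.05)
    print(summarize(EXCELLENT))     # (1.15, 0.05)
```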

Difficulties associated with expressing gaps as pass-rate differences are even more dramatically illustrated in Figure 9.

Calculating the gap as a difference in pass rates makes it appear that over time African American and white eighth graders spread apart on the MSPAP math test. Imagine the distress of the hand-wringers upon discovering that the eighth-grade math gap at the excellent level increased monotonically by 13.8 percentage points between 1993 and 1999. All for naught, because in fact the gap remained quite stable and even (insignificantly) narrowed over this time. We trust that reporters and editors who read Appendix A will render more appropriate accounts of test-score gaps in the future.
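
Why a pass-rate gap can widen while the mean difference stays fixed is easy to see numerically. The sketch below holds the mean difference at an illustrative 1.15 SD (a value we chose because it is close to the averages in the tables) and raises the higher-scoring group's pass rate. Under the normal, common-SD assumption the percentage-point gap keeps growing until the cutoff sits midway between the two means. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def pass_rate_gap(reference_pass_rate: float, mean_gap_sd: float) -> float:
    """Pass-rate difference (as a fraction) between a reference group and a
    group whose mean is mean_gap_sd lower, for a fixed mean difference."""
    # Cutoff in SD units measured from the reference group's mean.
    cutoff = STANDARD_NORMAL.inv_cdf(1.0 - reference_pass_rate)
    lower_group_pass_rate = 1.0 - STANDARD_NORMAL.cdf(cutoff + mean_gap_sd)
    return reference_pass_rate - lower_group_pass_rate


def widest_gap_pass_rate(mean_gap_sd: float) -> float:
    """Reference pass rate at which the pass-rate gap peaks: the cutoff
    then lies midway between the two group means."""
    return STANDARD_NORMAL.cdf(mean_gap_sd / 2.0)


if __name__ == "__main__":
    for rate in (0.06, 0.22, 0.48, 0.65):
        gap_points = 100.0 * pass_rate_gap(rate, 1.15)
        print(f"reference pass rate {rate:.2f}: gap {gap_points:.1f} points")
    print(f"gap peaks near a pass rate of {widest_gap_pass_rate(1.15):.2f}")
```

For any reference pass rate below the peak, rising scores in both groups enlarge the percentage-point gap even though the two distributions have not moved apart.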

Can Racial Gaps Be Narrowed?
Yes and no. The SAT oozes g. So long as the test retains its integrity, there is little chance that the black-white gap will narrow significantly. Spearman's hypothesis has held up too long to expect otherwise. Statewide tests are different. They are designed to pass most students, though this has not yet occurred universally. Maryland has not had much luck with its MSPAP tests. Pass rates are low all around, and the black-white gap stubbornly resists closing. Texas has made some progress in narrowing the pass-rate gap in its 10th grade high school exit exam required for graduation, but the exit exam is weakly g loaded. On the more g-loaded Texas Assessment of Academic Skills (TAAS) tests the achievement gap has resisted all attempts at narrowing.

As part of its comprehensive statewide testing program (TAAS), Texas requires its high-school students to pass an exit exam as a graduation requirement. The passing score is 70 percent. Students first attempt the exam in the 10th grade, and if necessary are given seven additional chances to pass. If after eight tries they do not pass, students may continue to take the exam after completing their formal schooling. When ultimately they do pass, they are awarded a diploma. We computed the black-white and Hispanic-white mean differences for first attempts at the exit exam from 1994 to 1999. Figure 10 displays the trends. A small but significant narrowing of the gaps is apparent.

Need a Lawyer?
In 1988, New York State's Chief Judge established a committee, The New York State Judicial Commission on Minorities. Its purpose was to study the presence and effects of racism in the state's courts. Buried in its final 2000-page report was the finding that minorities passed the New York bar exam at significantly lower rates than whites. The commission found that for the period spanning 1985 through 1988, first-attempt pass rates were 31.1 percent for blacks and 73.1 percent for whites. Applying the methods of Appendix A, we translated these pass rates to a corresponding black-white mean difference of 1.11 SD.

Several years later, commenting on the Commission's findings, Edna Wells Handy wrote in The New York Law Journal of April 1996, "Determining whether those pass rates have remained constant since the Commission's report must await the completion and dissemination of the national bar exam study presently being conducted by the Law School Admission Council." Ms. Handy was referring to the most ambitious study of law students ever attempted. The Law School Admission Council is the organization that administers the Law School Admission Test (LSAT). At the time Handy's article appeared, it was tracking 27,000 students who enrolled in U.S. law schools in the fall of 1991. The students were followed from law school entry to the bar exam. The Council issued its report in 1998, finding that 92 percent of white law-school graduates passed the bar exam on the first attempt, as did 61 percent of black graduates. This implies a black-white mean difference of 1.13 SD.

The Council also reported the results of repeated attempts at the bar exam. It found that eventually 97 percent of white and 78 percent of black law graduates passed, corresponding to a black-white mean difference of 1.11 SD.
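
The three bar-exam figures can be reproduced with the same inverse-normal conversion. This is a sketch under the essay's assumptions (normal distributions, common SD); the dictionary labels and function names are ours.

```python
from statistics import NormalDist


def mean_difference(pass_rate_a: float, pass_rate_b: float) -> float:
    """Mean of group A minus mean of group B, in SD units, implied by
    pass rates on a shared cutoff (normal, common-SD assumption)."""
    inverse_cdf = NormalDist().inv_cdf
    return inverse_cdf(pass_rate_a) - inverse_cdf(pass_rate_b)


# (white pass rate, black pass rate) as quoted in the text.
BAR_EXAM_PASS_RATES = {
    "New York, first attempt, 1985-1988": (0.731, 0.311),
    "Law School Admission Council, first attempt": (0.92, 0.61),
    "Law School Admission Council, eventual": (0.97, 0.78),
}


def bar_exam_mean_differences() -> dict:
    """Mean difference in SD units for each reported pair of pass rates."""
    return {
        label: round(mean_difference(white, black), 2)
        for label, (white, black) in BAR_EXAM_PASS_RATES.items()
    }


if __name__ == "__main__":
    for label, gap_sd in bar_exam_mean_differences().items():
        print(f"{label}: {gap_sd:.2f} SD")
```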

The one-plus SD gap between black and white lawyers stubbornly refused to go away. Others, however, viewed the Council's findings differently. "This study strongly refutes the myth that affirmative action policies tend to set students up for failure on the bar exam," hallucinated Henry Ramsey Jr., a retired California state judge and member of the committee that oversaw the study.

Tamar Lewin, covering the Council's report for the New York Times, characterized the Council's findings as "likely to provide important support for advocates of affirmative action." Her column appeared under the headline: "Minorities Achieve High Success Rate in Bar Exams, Study Says."

The fact is that affirmative action has stratified the bar by race and ability. Black lawyers lag behind their white colleagues in measured ability by about 1.1 SD. Affirmative action creates a racial gap at law-school entry that never goes away. When entrance credentials are controlled, racial differences mostly vanish. More than 20,000 adult blacks in the U.S. have an IQ of 130 or more, but because of affirmative action, the chance that your black lawyer will be one of them is vanishingly small.

Need a Doctor?
Medical school admission is uncommonly competitive, there being many more applicants than slots. The competition is so intense that if black applicants were held to the same admission standards as whites and Asians, we would turn out almost no black physicians.

We now have a double standard for admission to medical school brought about by affirmative action. As a result, two tiers of American physicians have emerged separated by race and ability.

We have seen that law students admitted under affirmative action do not measure up to their white and Asian peers as law-school graduates. Can we say the same for doctors? We will quantify the performance gap for physicians.

A benchmark for medical competence is the National Board of Medical Examiners (NBME) Exam Part I. Every medical student in the US must pass it to become a physician. Students take the exam two years before graduation. It is one of several ways the profession keeps itself honest. The most comprehensive study of NBME pass rates was published in 1994 by Beth Dawson et al (JAMA 1994 272:9 674-9). The authors examined the performance of every medical student in the US taking the June exam for the first time over the years 1986, 1987 and 1988. Dawson and her colleagues found that white medical students passed the NBME test at a rate of 87.7 percent and blacks at 48.9 percent. Again, using methods described in Appendix A, we found these pass rates equivalent to a black-white mean difference of 1.19 SD. Mean differences for the bar and NBME exams are conspicuously similar. The one-plus SD gap does not yield easily.
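
Written out, the NBME conversion is a two-term calculation. Φ denotes the standard normal cumulative distribution function (our notation), and the usual assumptions of normality and a common SD apply.

```latex
\begin{align*}
\Delta &= \Phi^{-1}(0.877) - \Phi^{-1}(0.489) \\
       &\approx 1.160 - (-0.028) \\
       &\approx 1.19 \ \text{SD}
\end{align*}
```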

Notably, when Dawson's study looked at entering students with similar academic credentials, the pass rates on the NBME exam were independent of race, pointing an accusing finger directly at affirmative action. For all its good intentions, affirmative action has created two levels of competence in American medicine, separated by a bit more than one standard deviation. When you are wheeled into the ER at 2:00 a.m., if you pray, pray that the black doctor who greets you entered medical school through the front door.

APPENDIX A. RELATIONSHIPS BETWEEN PASS RATES AND MEAN DIFFERENCES.

Assume all distributions are Gaussian with a common standard deviation. Let P(x) be the probability distribution for whites centered on x = 0. (Standard units are used throughout.) Let Δ be the difference between the white and minority distribution means (white - minority). Then the probability distribution for the minority group is P(x+Δ). If the passing fraction of whites is f_W, then the passing score, λ, is given by the solution of:

$$f_W = \int_{\lambda}^{\infty} P(x)\,dx \tag{A.1}$$

If the mean difference, Δ, is known for a minority group, the passing fraction of the minority group, f_M, may then be computed as:

$$f_M = \int_{\lambda}^{\infty} P(x+\Delta)\,dx \tag{A.2}$$

or more conveniently from the transformation:

$$f_M = \int_{\lambda+\Delta}^{\infty} P(x)\,dx \tag{A.3}$$

If passing rates are known for both whites and a minority group, the mean difference between the two distributions may be computed by solving (A.3) for Δ.

The distribution function, P(x), is given by

$$P(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}$$
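
One way to carry out the procedure numerically, without a table of the normal distribution, is to evaluate the upper-tail integral of P with the complementary error function and locate the passing score by bisection. The sketch below is our own illustration, not the method used to prepare the tables; all names are hypothetical.

```python
import math


def upper_tail(x: float) -> float:
    """Integral of the standard normal density from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cutoff_for_pass_rate(pass_rate: float, lo: float = -10.0,
                         hi: float = 10.0, iterations: int = 100) -> float:
    """Solve upper_tail(cutoff) = pass_rate by bisection.

    upper_tail decreases as the cutoff rises, so a tail that is still
    too large means the cutoff must move to the right.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) > pass_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_difference(white_pass_rate: float, minority_pass_rate: float) -> float:
    """Delta (white mean minus minority mean, in SD units) from two pass rates."""
    passing_score = cutoff_for_pass_rate(white_pass_rate)        # lambda
    shifted_cutoff = cutoff_for_pass_rate(minority_pass_rate)    # lambda + Delta
    return shifted_cutoff - passing_score


if __name__ == "__main__":
    # A 37 percent failure rate is a 63 percent pass rate.
    print(round(cutoff_for_pass_rate(0.63), 3))     # -0.332
    print(round(mean_difference(0.482, 0.114), 2))  # 1.16
```

The first line recovers the Waco passing score of -0.332 SD; the second recovers the 1993 MSPAP mean difference at the satisfactory level.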

APPENDIX B. SAT 1995 DATA AND GRAPHS

1995 SAT Scores vs. Family Income
Estimated Family Income	White Math	White Verbal	Black Math	Black Verbal	Asian Math	Asian Verbal	Mexican American Math	Mexican American Verbal
$0-10k	460	409	355	320	482	343	386	330
$10k-$20k	459	418	369	337	500	363	403	349
$20k-$30k	471	428	382	352	518	397	420	369
$30k-$40k	478	433	393	362	528	415	431	384
$40k-$50k	488	439	405	375	537	432	446	399
$50k-$60k	498	446	414	382	549	444	456	409
$60k-$70k	506	453	415	385	558	453	458	415
$70k plus	533	475	442	407	595	476	478	430

1995 SAT Scores vs. Parental Education
Highest Level of Parental Education	White Math	White Verbal	Black Math	Black Verbal	Asian Math	Asian Verbal	Mexican American Math	Mexican American Verbal
Less than HS diploma	418	374	347	308	478	338	389	331
HS diploma	459	414	371	340	502	382	422	376
Associate's degree	472	425	384	354	502	398	435	390
Bachelor's degree	513	460	409	379	550	426	468	420
Graduate degree	545	490	438	406	592	482	479	430
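
The income table lends itself to a quick arithmetic check of the comparisons made in the text. The sketch below copies the white and black columns of the family-income table and computes the white-minus-black gap in SAT points for each bracket, plus the comparison of the top black bracket with the bottom white bracket. Variable and function names are ours.

```python
# (income bracket, white math, white verbal, black math, black verbal),
# copied from the 1995 family-income table.
INCOME_ROWS = [
    ("$0-10k", 460, 409, 355, 320),
    ("$10k-$20k", 459, 418, 369, 337),
    ("$20k-$30k", 471, 428, 382, 352),
    ("$30k-$40k", 478, 433, 393, 362),
    ("$40k-$50k", 488, 439, 405, 375),
    ("$50k-$60k", 498, 446, 414, 382),
    ("$60k-$70k", 506, 453, 415, 385),
    ("$70k plus", 533, 475, 442, 407),
]


def gaps_by_bracket(rows):
    """White-minus-black gap in SAT points (math, verbal) per bracket."""
    return [(label, white_m - black_m, white_v - black_v)
            for label, white_m, white_v, black_m, black_v in rows]


def top_black_minus_bottom_white(rows):
    """Top-bracket black scores minus bottom-bracket white scores."""
    _, white_m, white_v, _, _ = rows[0]
    _, _, _, black_m, black_v = rows[-1]
    return black_m - white_m, black_v - white_v


if __name__ == "__main__":
    for label, math_gap, verbal_gap in gaps_by_bracket(INCOME_ROWS):
        print(f"{label}: math {math_gap}, verbal {verbal_gap}")
    print(top_black_minus_bottom_white(INCOME_ROWS))  # (-18, -2)
```

These gaps are in raw SAT points, not SD units, so they are not directly comparable to the mean differences computed elsewhere in the essay.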