A Fatal Flaw in the Collegiate Learning Assessment Test

By Kevin Possin, February 21, 2013

The Collegiate Learning Assessment (CLA) test has received a great deal of publicity. The Spellings Commission on the Future of Higher Education suggested using the CLA universally as a means of achieving better “accountability” in higher education—to ensure no undergraduate student left behind, so to speak.

Exactly how good is this test at measuring what its authors at the Council for Aid to Education (CAE) claim it measures—that is, the higher-order skills of critical thinking, analytic reasoning, problem solving, and written communication?

As a philosophy professor, I have found that the test itself is not the major problem; it’s the CAE’s scoring of the test that is fatally flawed in ways undetected by psychometricians, assessment officers, and administrators.

How does the CLA measure the enhancement of the crucial cognitive skills it purports to assess? By means of open-ended, real-world, performance-based tests. The CLA claims to assess these skills holistically, unlike multiple-choice tests, which define critical thinking as a discrete set of subskills that can be assessed separately.

There are three formats for the CLA: one Performance Task and two Analytic Tasks—Make-an-Argument and Critique-an-Argument. All three tasks are designed to measure how well students evaluate and analyze information and draw conclusions on the basis of that analysis. The CAE has published rubrics for scoring student performance on each task in terms of how well students assess the relevance and strength of evidence, recognize flawed arguments, recognize logical flaws (for example, mistaking mere correlation for causation), construct cogent arguments, critically review alternative positions, and recognize that a problem is complex and lacking a clear answer.

To see how successful the CLA is at measuring these critical-thinking skills, let’s begin by examining the Performance Task. On the right side of their computer screens, students are given access to a Document Library, consisting of information sources such as letters, research reports, newspaper clippings, diagrams, tables, and charts. Students are to use these in preparing their answers to questions that appear on the left side of their screens along with a response box, into which the students have ninety minutes to key their answers.

In a retired test titled “Crime Reduction” provided in the CAE’s promotional materials, students are asked, for example, to assess an argument that hiring more police “will only lead to more crime” based on a chart illustrating a positive correlation between the number of police and the number of crimes. Students are scored on the basis of (1) agreeing that more police are causing more crime, (2) suggesting that “more crime might necessitate more police,” (3) saying that correlation does not imply causation or that the relation could go either way, or (4) offering a possible common cause. Only the first of these options is treated as incorrect; the other three are treated as correct, but must be stated in terms of uncertainty.

On the one hand, I am glad to see that the first answer is deemed just plain wrong. When I began researching the CLA (for my own “Field Guide to Critical Thinking Assessment”) in 2008, I was struck by how any answer was accepted so long as one offered some reason for it, no matter its lack of justificatory power. Students were told, “Address the issue from any perspective—no answer is right.” This invited sophistry instead of critical thinking, rationalization instead of justification. My fears were explicitly confirmed at that time by Marc Chun, CAE Director of Product Strategy, during a web conference. These fears are not totally removed today, however, because the CAE says it develops its Performance Tasks “using sufficient information to permit multiple reasonable solutions—to ensure that students could arrive at three or four different conclusions based on a variety of evidence to back up each conclusion.”

It’s not clear, though, that the graders are properly assessing students for recognizing such differences in the degree to which conclusions are supported by data. According to the CAE, the student should express uncertainty rather than certainty in the explanation of the correlation between the size of the police force and the frequency of crime. But “uncertainty” is not good enough: To say merely that it might be wrong that more police lead to more crime, or that some other relation might explain the correlation, is platitudinous. This is an inductive case, after all, so error is, by definition, always logically possible. One needs to offer a more likely alternative explanation for this correlation, for example, that the increase in crime has caused the hiring of more police. Likewise, if the student offers a common-cause hypothesis, it must be plausible and not just some far-flung possibility.

Let’s now examine one of the two Analytic Tasks, Make-an-Argument. Students are given a prompt. The example currently provided in the CAE’s promotional materials is: “Government funding would be better spent on preventing crime than in dealing with criminals after the fact.” Students have forty-five minutes to take any position on the topic and argue for it. The critical-thinking criteria used in scoring student responses are: “Clarifying a position and supporting it with evidence; considering alternative viewpoints or counter points to their argument; developing logical, persuasive arguments; [and exhibiting] depth and complexity of thinking about the issues raised in the prompt.”

All’s well so far with respect to this task’s goals and rubric. A problem appears, however, when one looks closely at the CAE’s exemplary “high-quality response.” In the student’s six-paragraph response, I found (1) another platitudinous assertion, this time that it might be false that government funding would be better spent on crime prevention; (2) two strawman fallacies; (3) two false dichotomies; (4) a slippery-slope fallacy; (5) a contradictory argument that equally well supports the use of government funds for crime prevention; and (6) the failure to ultimately take any position on the issue. This student’s response read well, but it was just rhetoric and padding, not rational argumentation and criticism.

During a conference call with CAE’s Jeffrey Steedle and Marc Chun, the criticisms I just summarized were called “ad hoc.” This charge puzzled me, because it implied that my criticisms were without independent evidence. But the evidence I was using was the student responses provided by the CAE. If anything was an ad hoc rescue, it was their charge of my being ad hoc. Perhaps what they meant was that I was making a hasty generalization from a small sample. But I was not using a dangerously small random sample; I was using their examples that were offered as being representative of “high-quality” student responses. I can only presume that they don’t know what “ad hoc” means. And this naturally brings me to my conclusion.

After this examination of the CLA, I have discovered a fatal weakness among its strengths. Its goals are to be commended: measuring students’ higher-order skills of critical thinking, analytic reasoning, problem solving, and written communication is essential to higher learning. And its rubrics and scoring criteria, used to assess the students’ application of these higher-order skills, are spot on. However, while the graders seem to be looking for the right things, they are in fact falling for numerous informal fallacies, platitudes, and evasions, persuaded by arguments and criticisms that are simply not cogent. How can this happen?

I think it’s because the graders cannot see the trees for the forest. They are trained to take only a holistic view of critical thinking, ignoring the component skills of critical thinking that are often the focus of the multiple-choice items so disparaged by the CAE. CAE staff express skepticism that component generic critical-thinking skills can be clarified enough for study, instruction, and testing. But many of us in philosophy departments all over the world do it every day, as we teach courses in critical thinking and informal logic. We teach students how to identify and dissect arguments, classify arguments as inductive or deductive so as to apply the appropriate cogency conditions for their assessment, and identify and avoid popular formal and informal fallacies that result from not meeting those cogency conditions. We also instruct students on how to synthesize and apply all those component critical-thinking skills in the holistic tasks of discovering and arguing for the most rational position on an issue while critically reviewing competing positions and their arguments. One cannot succeed at the holistic task without first learning the component skills, just as one cannot build a brick staircase without using bricks.

Who, then, is so poorly constructing the answer keys and scoring the students’ responses to the CLA? According to Jeffrey Steedle, a measurement scientist at the CAE:

When we train scorers, we provide actual student responses as examples at each scale point. We also provide a document that we call “response features,” which catalogues common (valid) ideas that students may discuss in their responses. This is initially created by the person who developed the task, but it is commonly updated in light of what we see in student responses. The developers are a mix of measurement professionals affiliated with CAE and experienced CLA scorers. The majority of scorers have backgrounds in the liberal arts (predominantly English literature and composition) or education. They’re roughly split between having master’s degrees and PhDs. One requirement is prior experience evaluating student writing at the college level. (Personal communication, January 18, 2012, and February 1, 2012)

The task and response authors and the scorers come from a diversity of disciplines, such as measurement, English, and education, but not applied logic. It was only natural, then, that the CAE had the RAND Corporation do a reliability and validity study of the CLA using a diverse panel of forty-one faculty. But, as Richard Paul discovered, university professors “have little understanding of critical thinking nor how to teach for it, but also wrongly and confidently think they do.” Here is a typical sample of Paul’s findings:

Though the overwhelming majority [of the faculty] (89%) claimed critical thinking to be a primary objective of their instruction, only a small minority (19%) could give a clear explanation of what critical thinking is. When asked how they conceptualized truth, a surprising 41% of those who responded to the question said that knowledge, truth, and sound judgment are fundamentally a matter of personal preference or subjective taste. (Paul, Elder, and Bartell 1995)

Conducting a validity study to see if one’s staff of scorers is accurately measuring critical-thinking skills by correlating its results with the judgments of a diverse set of professors is, to paraphrase Wittgenstein (1953), like going out and buying several copies of the tabloids to assure oneself that their story about the UFO landing is true.

Students need a lot of help practicing component critical-thinking skills across many contexts in order for those skills to become sufficiently generic to be holistically transferable. This is exactly the kind of instruction and practice students receive in a dedicated critical-thinking or informal logic course. Content-specific courses like introduction to philosophy or chemistry, and content-independent courses such as symbolic logic, do not produce statistically significant gains in critical-thinking skills. So much, then, for leaving the task of magically enhancing critical-thinking skills to “immersion” and “critical thinking across the curriculum.”

To enhance their critical-thinking skills, students should be deliberately and explicitly studying critical thinking with the assistance of those with real expertise in those skills. Simply possessing a graduate degree is poor evidence of having acquired that expertise. Hence, the CAE must ensure that its authors and graders truly are experts in both the wide array of component critical-thinking skills and their holistic application to the projects of making rational decisions, solving problems, and writing position papers and critical reviews using cogent arguments and criticisms instead of fallacious ones. The CAE personnel may be excellent “measurement scientists,” ensuring the reliability of the CLA, but they appear to be missing the mark on its validity—measuring rhetorical skills instead of actual critical-thinking skills.

A related issue that I want to briefly mention is that the CLA is computer scored; graders confirm the computer-assigned scores on only 10% of student responses. The CAE reassures us that “CLA computer-assisted scoring is as accurate as two human scorers,” with the correlation of scores being .80–.88 between graders and .84–.93 between computer and grader. This may be evidence that the CAE’s scoring system is reliable, but consistency is a fickle virtue when one is consistently wrong. If the accuracy of the CLA graders is in doubt, and the computer-assisted grading system is strongly correlated with the graders’ scoring, then the accuracy of the computer-assigned grading is in doubt too.
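The gap between reliability (scorers agreeing with each other) and validity (scores tracking the skill supposedly measured) can be illustrated with a small numerical sketch. The eight scores below are entirely hypothetical, not CLA data, and the assumption that human graders reward rhetorical polish rather than cogency is built into the numbers by construction to mirror the charge made above. The helper `pearson` is a plain Pearson correlation written out by hand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for eight responses (illustrative only, not CLA data).
# true_skill: how cogent each response's argument actually is (assumed).
true_skill = [2, 5, 3, 4, 4, 3, 5, 2]
# grader: assumed human scores that track rhetorical polish, not cogency.
grader = [1, 2, 3, 4, 5, 6, 7, 8]
# computer: assumed machine scores trained to reproduce the graders' scores.
computer = [1.2, 1.8, 3.1, 4.1, 4.8, 6.2, 6.9, 8.1]

reliability = pearson(grader, computer)   # agreement between the two scorers
validity = pearson(computer, true_skill)  # agreement with the skill itself

print(f"computer vs. grader:     {reliability:.2f}")
print(f"computer vs. true skill: {validity:.2f}")
```

Under these assumptions the computer and the graders agree almost perfectly (a correlation near 1), while the computer's scores are essentially uncorrelated with the cogency of the arguments (a correlation near 0). High inter-scorer agreement, in other words, cannot by itself show that either scorer is measuring critical thinking.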

I have identified a chronic ailment on the part of the CLA, given a diagnosis, and offered a prescription, recommending a shift in the way the CLA is scored, so that student responses are judged more on the basis of component critical-thinking skills and less on the basis of rhetorical skills. I’m not optimistic that the patient will heed my advice, however, because doing so would come at the high price of rendering the CAE’s large database of past scores obsolete. I think it’s worth the price. (Download my complete report on the CLA’s flaws at


Paul, R., Elder, L., and Bartell, T. 1995. “Study of 38 Public Universities and 28 Private Universities to Determine Faculty Emphasis on Critical Thinking in Instruction: Executive Summary.” Accessed December 2012.

Wittgenstein, L. 1953. Philosophical Investigations. Malden, Mass.: Blackwell.

Kevin Possin is professor emeritus of philosophy at Winona State University in Minnesota.
