[Vishal Gupta] An old tool to assess AI’s ability

Published: Mar 9, 2023 - 05:24 Updated: Mar 9, 2023 - 05:24

“AI passes US medical licensing exam.” “ChatGPT passes law school exams despite ‘mediocre’ performance.” “Would ChatGPT get a Wharton MBA?”

Headlines such as these have recently touted (and often exaggerated) the successes of ChatGPT. These successes follow a long tradition of comparing AI’s abilities to those of human experts, such as Deep Blue’s chess victory over Garry Kasparov in 1997, IBM Watson’s “Jeopardy!” victory over Ken Jennings and Brad Rutter in 2011, and AlphaGo’s victory over Lee Se-dol at Go in 2016.

The implied subtext of these recent headlines is more alarmist: AI is coming for your job.

The problem is that most people have little AI literacy -- an understanding of when and how to use AI tools effectively. What we need is a straightforward, general-purpose framework, usable by anyone, for assessing the strengths and weaknesses of AI tools.

To meet this need, my research group turned to an old idea from education: Bloom’s Taxonomy. First published in 1956 and later revised in 2001, Bloom’s Taxonomy is a hierarchy describing levels of thinking in which higher levels represent more complex thought. Its six levels are: 1) Remember -- recall basic facts; 2) Understand -- explain concepts; 3) Apply -- use information in new situations; 4) Analyze -- draw connections between ideas; 5) Evaluate -- critique or justify a decision or opinion; and 6) Create -- produce original work.

These six levels are intuitive, even for non-experts, but specific enough to make meaningful assessments. Moreover, Bloom’s Taxonomy isn’t tied to a particular technology -- it applies to cognition broadly. We can use it to assess the strengths and limitations of ChatGPT or other AI tools.

My research group has begun assessing ChatGPT through the lens of Bloom’s Taxonomy by asking it to respond to variations on a prompt, each targeting a different level of cognition.

For example, we asked ChatGPT: “Suppose demand for COVID vaccines this winter is forecast to be 1 million doses plus or minus 300,000 doses. How much should we stock to meet 95 percent of demand?” This is an Apply task. We then modified the question, asking it to “Discuss the pros and cons of ordering 1.8 million vaccines” -- an Evaluate task. We compared the quality of the two responses and repeated this exercise for all six levels of the taxonomy.

Preliminary results are instructive. ChatGPT generally does well with Remember, Understand and Apply tasks but struggles with the more complex Analyze and Evaluate tasks. With the first prompt, ChatGPT responded well, applying and explaining a formula to suggest a reasonable vaccine quantity (albeit making a small arithmetic mistake in the process).
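For readers curious what such an Apply-level calculation might look like, here is a minimal sketch. It is not ChatGPT’s actual answer or the research group’s method; it assumes that “plus or minus 300,000 doses” means demand is normally distributed with a standard deviation of 300,000, and that “meet 95 percent of demand” means a 95 percent chance of no shortfall. The function name `stock_for_service_level` is hypothetical.

```python
from statistics import NormalDist


def stock_for_service_level(mean, std_dev, service_level):
    """Stock level Q such that P(demand <= Q) equals service_level.

    Assumption (not from the article): demand ~ Normal(mean, std_dev).
    This is the service-level quantile Q = mean + z * std_dev,
    where z is the standard normal quantile for service_level.
    """
    return NormalDist(mean, std_dev).inv_cdf(service_level)


# Forecast from the prompt: 1 million doses, +/- 300,000 read as one std. dev.
q = stock_for_service_level(1_000_000, 300_000, 0.95)
print(f"{q:,.0f}")  # prints "1,493,456"
```

Under these assumptions, the recommended stock is roughly 1.5 million doses (z ≈ 1.645), which puts the 1.8 million doses in the Evaluate prompt well above a 95 percent service level.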

With the second, however, ChatGPT waffled unconvincingly about having too many or too few vaccines. It made no quantitative assessment of these risks, did not account for the logistical challenges of cold storage for such an immense quantity and did not warn of the possibility that a vaccine-resistant variant might arise.

We are seeing similar behavior for other prompts across these taxonomy levels. Bloom’s Taxonomy thus lets us assess the technology with more nuance than a raw human-versus-AI comparison allows.

As for our doctor, lawyer and consultant, Bloom’s Taxonomy also offers a more nuanced view of how AI might someday reshape -- not replace -- these professions. Although AI may excel at Remember and Understand tasks, few people consult a doctor to inventory every possible symptom of a disease, ask a lawyer to recite case law verbatim, or hire a consultant to explain the theory of Porter’s Five Forces.

But we turn to experts for higher-level cognitive tasks. We value our doctor’s clinical judgment, our lawyer’s ability to synthesize precedent and advocate on our behalf, and a consultant’s ability to identify an out-of-the-box solution. These skills are Analyze, Evaluate and Create tasks, levels of cognition where AI technology currently falls short.

Using Bloom’s Taxonomy, we can see that effective human-AI collaboration will largely mean delegating lower-level cognitive tasks to AI so that we can focus our energy on higher-level, more complex ones. Instead of dwelling on whether an AI tool can compete with a human expert, then, we should be asking how well its capabilities can help foster human critical thinking, judgment and creativity.

Of course, Bloom’s Taxonomy has its own limitations. Many complex tasks involve multiple levels of the taxonomy, frustrating attempts at categorization. And Bloom’s Taxonomy does not directly address issues of bias or racism, a major concern in large-scale AI applications. But while imperfect, Bloom’s Taxonomy remains useful. It is simple enough for everyone to grasp, general-purpose enough to apply to a broad range of AI tools, and structured enough to ensure we ask a consistent, thorough set of questions of those tools.

Much as the rise of social media and fake news requires us to develop better media literacy, tools such as ChatGPT demand that we develop our AI literacy. Bloom’s Taxonomy offers a way to think about what AI can do -- and what it can’t -- as this type of technology becomes embedded in more parts of our lives.

Vishal Gupta

Vishal Gupta is an associate professor of data sciences and operations at the USC Marshall School of Business and holds a courtesy appointment in the department of industrial and systems engineering. He wrote this piece for the Los Angeles Times. -- Ed.

(Tribune Content Agency)

By Korea Herald (khnews@heraldcorp.com)