The SAT is to standardized testing what the floppy disk is to data storage.
Providers of some of the most popular standardized tests are rethinking their offerings as new AI tools are challenging traditional techniques for finding out what students know — and allowing new ways to give and score tests.
For instance, ETS, one of the oldest and largest players in standardized testing, is moving away from traditional college entrance exams like the SAT to focus on new approaches to measure the skills and persistence of students.
It’s been a period of upheaval for academic testing in general and for the 75-year-old nonprofit ETS in particular. During the pandemic, concerns about equity and accessibility prompted at least 1,600 colleges to make admissions tests like the SAT optional, at least temporarily. Then, earlier this year, ETS said that it would no longer administer the SAT for the College Board. A College Board spokesperson, Holly Stepp, says that because the group has moved fully to a digital format, “we now develop and administer the SAT and PSAT-related assessments directly.”
ETS launched a rebranding effort in April to focus on what it called “talent solutions” rather than just academic testing. It has also downsized as part of that shift: it offered buyouts earlier this year to a large number of its employees, after laying off 6 percent of its staff last September.
“The assessments that ETS will deliver in the future will be more behavioral than they are cognitive,” says Kara McWilliams, vice president of product innovation and development at ETS. “What that means is that we are going to create experiences that allow us to measure the behaviors of a user, not what the answer to the question is,” she adds. “So we want to look at things like perseverance. And when we’re thinking about how we build these [assessment] experiences, we’re creating nudges inside them [so] that we can understand things like, ‘Did you ask for a hint? Did you reach out to a friend? Did you ask for more time?’ So what are the behaviors that you’re using to get to the answer? We don’t really care what the answer is, but how did you get there?”
One example of the group’s new focus is its Skills for the Future initiative, a joint effort with the Carnegie Foundation for the Advancement of Teaching to reimagine assessments.
The goal of the effort is to move away from requiring students to stop everything to sit in a room to answer questions for a couple of hours, says Timothy Knowles, president of the Carnegie Foundation. Instead, he says, the group is experimenting with using data that schools have about their students, including from after-school activities like sports, clubs and internships, to measure and track progress on skills including communication, collaboration and critical thinking.
“The idea is to build an insight system that’d be useful for kids and families and educators,” he says. “So they would understand where people are on a developmental arc in terms of developing these skills that we know are predictive of success. So we’re figuring out ways of visualizing this in a way that’s not punitive or problematic for kids.”
Schools and school systems already have rich data that they don’t make much use of, he says. The question, he says, is “can you look at those data in different ways and extrapolate from those data the extent to which a young person is developing certain skills?”
The effort has partnered with education leaders in five states — Indiana, Nevada, North Carolina, Rhode Island and Wisconsin — to help pilot test the approach starting in January, Knowles says. Officials at ETS and the Carnegie Foundation say they will use new forms of AI to do things like review and tag existing student work, analyze state education data and run interactive assessments — though not all of these uses will be ready by January.
Experts are urging caution, however, especially when AI is used in analyzing data and building test questions.
“We still have a lot to learn as far as whether biases are baked into AI use,” says Nicol Turner Lee, director of the Center for Technology Innovation at the Brookings Institution. “AI is only as good as the training data, and if the training data is still skewed to more privileged students who have many more resources than those from underprivileged schools, that will have a negative impact on them.”
She points to a controversial experiment in 2020, during the height of the pandemic, when many schools had to close and operate remotely. Since many students could not take the in-person end-of-year exam offered by the International Baccalaureate Organization, the group decided to build a model to predict what the student scores would have been based on historical data.
“They developed an algorithm that essentially predicted which schools would have the higher likelihood of diploma-quality graduates,” she says.
Thousands of students complained about their resulting scores, and some governments launched formal investigations. “The algorithm itself did not take into account the location of the school and the resources of the schools,” says Turner Lee.
The researcher says ETS officials brought her in to speak at a recent event, where she shared her perspective and concerns about the approach of using AI in testing and assessment.
“Think about how hard we’ve worked to sort of address inequality in standardized testing,” she says. “You want to be cautious about going all in because the very datasets that are training the AI have the higher likelihood of being historically biased.”
Other test providers are experimenting with using AI to create new kinds of test questions.
Next year’s edition of the Program for International Student Assessment (PISA), an international test measuring the reading, mathematics and science literacy of 15-year-olds, is expected to include new kinds of “performance tasks” designed to see how students approach a problem. Those tasks will be scored by AI.
McWilliams, of ETS, says she’s had a “mindset shift” in the past year about how she thinks about AI in testing.
Last year, her focus was on using AI to help create traditional multiple-choice questions. These days, she says, “what I am really focused on now is dynamic generation of content on the fly. And not for multiple-choice questions, but for more experiential tasks that allow individuals to demonstrate what they know and can do most meaningfully.”
One example is a new tool called Authentic Interview Prep, which uses AI to help people hone their job interview skills.
“A lot of people get nervous when they do interviews,” she says. “And so what we’re trying to do is create experiences that allow people to understand how to have interviews more meaningfully. And AI does things like give me feedback on the tone of my voice or the rate of my speech or my eye contact with you. And then on the fly, it’ll give me a haptic on my watch and say, ‘Kara, calm down. You’re speaking too quickly.’ Or, ‘Make better eye contact.’”
Of course, that kind of test isn’t about getting into college or grad school. It’s a different kind of measurement than the SAT — which she says will still have some role for the foreseeable future: “Where I’m thinking now is, ‘What is the content we want to create to help people with the experiences that they’re engaging with daily?’”