Exploring AI-Generated Speech in Language Assessment
March 25, 2025

Sanshiroh Ogawa, a Second Language Acquisition doctoral student, receives funding from the Educational Testing Service for his research.
By Jessica Weiss ’05
Developing language proficiency tests with listening comprehension questions can be a time-consuming and resource-heavy process: after scripts and questions are drafted, researchers recruit speakers, then spend hours recording, editing and producing high-quality audio materials for assessments.
But as artificial intelligence advances and AI-generated voices become increasingly lifelike, some researchers are exploring whether such voices could be used in language proficiency testing, and what the implications of doing so would be.
“These AI voices are becoming so realistic, it is mind-blowing,” said Sanshiroh Ogawa, a doctoral student in the Second Language Acquisition program in the School of Languages, Literatures, and Cultures (SLLC). “It could have major implications for both research and practice.”
Ogawa, who also teaches Japanese, was recently awarded a $10,000 grant through the Educational Testing Service’s TOEFL® Young Students Research Program. His project will examine whether native Japanese speakers taking the TOEFL Junior® Test, a version of the well-known TOEFL English proficiency exam, can distinguish between human and AI-generated speech, and whether the type of voice makes a difference in their results. He plans to survey more than 400 test takers later this year.
A key component of Ogawa’s study is differential item functioning (DIF) analysis, a statistical method that will help determine whether AI-generated speech makes listening questions easier or harder for some test takers than human speech does. If AI voices change the difficulty of test items, the test may not be measuring the listening skills it is intended to measure.
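The article does not say which DIF procedure Ogawa will use. As a rough illustration only, the sketch below applies the Mantel-Haenszel procedure, one widely used DIF method, to a hypothetical design in which a "reference" group hears the human-voice version of an item and a "focal" group hears the AI-voice version. Test takers are matched on total score, and within each score level the odds of answering the item correctly are compared between the two groups. The function name and the data format are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses):
    """Mantel-Haenszel common odds ratio and delta-scale DIF for one item.

    Hypothetical input format: a list of (group, total_score, correct) tuples,
    where group is "reference" (e.g. human voice) or "focal" (e.g. AI voice),
    total_score is the matching variable, and correct is a bool for the item.
    """
    # Build one 2x2 table per total-score level:
    # A/B = reference correct/incorrect, C/D = focal correct/incorrect.
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, score, correct in responses:
        cell = strata[score]
        if group == "reference":
            cell["A" if correct else "B"] += 1
        elif group == "focal":
            cell["C" if correct else "D"] += 1
        else:
            raise ValueError(f"unknown group: {group!r}")

    # Pool the per-stratum odds ratios (sum A*D/N over sum B*C/N).
    numerator = denominator = 0.0
    for cell in strata.values():
        n = cell["A"] + cell["B"] + cell["C"] + cell["D"]
        numerator += cell["A"] * cell["D"] / n
        denominator += cell["B"] * cell["C"] / n
    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio is undefined for these data")

    alpha_mh = numerator / denominator
    # Delta-scale DIF: negative values mean the item is harder for the
    # focal group than for reference-group test takers with the same score.
    mh_d_dif = -2.35 * math.log(alpha_mh)
    return alpha_mh, mh_d_dif
```

An odds ratio near 1 (a delta near 0) suggests the item behaves similarly for both groups once overall ability is accounted for. A clearly negative delta would flag an item that is harder for listeners of the AI-voice version at the same overall score level.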
If the approach proves valid, however, Ogawa said AI-generated voices could drastically reduce the time and resources needed for voice recordings, ideally helping to lower the TOEFL’s often high cost to test takers. For example, the TOEFL iBT, another version of the exam used for university admissions and other purposes, currently costs $270, not including additional fees that may apply, and some people take it multiple times.
Looking ahead, Ogawa said he hopes to expand his research by exploring whether and how cognitive processing differs when listening to AI-generated speech versus human speech. He also emphasized the ethical considerations surrounding AI in language assessment.
“We need to make sure that no groups of people are disadvantaged by the use of AI,” he said. “For example, we may be able to use AI-generated speech for English tests targeting adults, but can we use it for tests targeting young learners without having negative impacts on them? How about learners with certain disabilities? I think there are many other vulnerable populations to consider.”