

Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR fails to transcribe speech verbatim. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions are twofold: first, we demonstrate that incorporating the reading text through prompting benefits verbatim transcription performance more than fine-tuning; second, we show that it is feasible to augment speech recognition tasks for end-to-end miscue detection. In two case studies, on children's read-aloud speech and adult atypical speech, our proposed strategies improve both verbatim transcription and miscue detection over the current state of the art.
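The trained end-to-end model itself is not reproduced here, but the core prompting idea can be approximated with off-the-shelf tools. The sketch below is a minimal illustration, not the paper's architecture: it assumes the openai-whisper package, a placeholder audio file `reading.wav`, and a simple word-level alignment standing in for the learned miscue detector. It passes the target passage as a decoder prompt to bias the model toward verbatim transcription, then flags substitutions, omissions, and insertions post hoc.

```python
# Minimal sketch (illustrative, not the paper's end-to-end model):
# prompt an off-the-shelf Whisper model with the target reading text,
# then flag miscues by aligning the transcript against that text.
import difflib

import whisper  # openai-whisper


def transcribe_with_prompt(audio_path: str, target_text: str) -> str:
    """Transcribe audio, conditioning the decoder on the target passage."""
    model = whisper.load_model("base")  # model size is an arbitrary choice
    result = model.transcribe(audio_path, initial_prompt=target_text)
    return result["text"]


def detect_miscues(target_text: str, transcript: str):
    """Post-hoc baseline: align target and spoken words, report mismatches."""
    target_words = target_text.lower().split()
    spoken_words = transcript.lower().split()
    miscues = []
    matcher = difflib.SequenceMatcher(None, target_words, spoken_words)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            miscues.append(("substitution", target_words[i1:i2], spoken_words[j1:j2]))
        elif op == "delete":
            miscues.append(("omission", target_words[i1:i2], []))
        elif op == "insert":
            miscues.append(("insertion", [], spoken_words[j1:j2]))
    return miscues


if __name__ == "__main__":
    passage = "the quick brown fox jumps over the lazy dog"  # placeholder passage
    transcript = transcribe_with_prompt("reading.wav", passage)
    for kind, expected, spoken in detect_miscues(passage, transcript):
        print(kind, expected, "->", spoken)
```

Note that this sketch still detects miscues after transcription; the paper's contribution is to fold both the prompting and the miscue labels into a single trained model, which the alignment step here only approximates.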

Related readings and updates.

Much of early literacy education happens at home with caretakers reading books to young children. Prior research demonstrates how having dialogue with children during co-reading can develop critical reading readiness skills, but most adult readers are unsure if and how to lead effective conversations. We present ContextQ, a tablet-based reading application to unobtrusively present auto-generated dialogic questions to caretakers to support this…

Podcasting has grown to be a popular and powerful medium for storytelling, news, and entertainment. Without transcripts, podcasts may be inaccessible to people who are hard-of-hearing, deaf, or deaf-blind. However, ensuring that auto-generated podcast transcripts are readable and accurate is a challenge. The text needs to accurately reflect the meaning of what was spoken and be easy to read. The Apple Podcasts catalog contains millions of podcast episodes, which we transcribe using automatic speech recognition (ASR) models. To evaluate the quality of our ASR output, we compare a small number of human-generated, or reference, transcripts to corresponding ASR transcripts.
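Comparing reference and ASR transcripts is typically summarized with word error rate (WER). The snippet below is a hedged sketch of that comparison using the jiwer package; the example transcripts are placeholders, not data from the Apple Podcasts catalog.

```python
# Sketch: score an ASR hypothesis against a human reference transcript with WER.
import jiwer

reference = "welcome back to the show today we talk about reading"
hypothesis = "welcome back to the show today we talked about reading"

# WER = (substitutions + deletions + insertions) / words in the reference
wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2%}")
```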
