A systematic scoping review mapping the evidence-based foundations for designing AI-powered speaking practice systems for adult English as a Second Language learners.
This scoping review systematically maps the theories and mechanisms that support the design of AI-powered speaking practice systems for adult English as a Second Language (ESL) learners. Following PRISMA-ScR guidelines, the review synthesized 17 empirical studies published between 2015 and 2025, selected through a mechanism-focused purposive sampling strategy employing an AI-human hybrid triangulation protocol.
The synthesis identifies six foundational learning theories—including Skill Acquisition Theory, the Noticing Hypothesis, and Transfer-Appropriate Processing—that justify specific design choices. Key findings indicate that explicit, multi-modal feedback (ASR for pronunciation, pending prompts for grammar, LLM dialogue for discourse) significantly outperforms single-mode correction.
Furthermore, embedding AI tools within structured pedagogical frameworks (e.g., BOPPPS) amplifies their effectiveness by fostering metacognition. The report proposes a "Theory-Mechanism-Design" (TMD) logic for feature validation to guide the development of business-ready AI-ESL solutions.
Synthesize 17 empirical studies (2015-2025) to map theories and mechanisms driving AI-powered speaking practice systems. Focus on why interventions work and how to translate them into product-ready features.
PRISMA-ScR guided purposive sampling across 6 databases (Scopus, Web of Science, EBSCO, JSTOR, ERIC, PsycInfo). AI-human hybrid screening processed 2,877 records, retaining 17 mechanism-rich studies.
An AI-enabled speaking coach for real-world, impromptu fluency—where learners rehearse spontaneous speech, receive multi-layered feedback, and build long-term autonomy.
This review synthesizes theories and mechanisms to guide the design of AI-powered systems that support adult ESL learners in developing speaking proficiency.
Insufficient time for meaningful, interactive speaking practice in typical classroom settings. Limited in-class time and large class sizes make individualized practice impossible.
Mingyan et al., 2025
Fear of negative evaluation and linguistic insecurity lead to reluctance to speak and hinder skill development. Foreign language anxiety significantly impacts willingness to communicate.
Zheng et al., 2025
Because of time constraints, teachers struggle to provide the consistent, individualized, and immediate feedback that effective learning requires.
Ngo et al., 2024; Sun, 2023
Structured using the Population-Concept-Context (PCC) framework
Scoping review with purposive, mechanism-rich sampling guided by PRISMA-ScR logic across six databases.
The AI-assisted review process replaced a second human screener with multi-model LLM consensus plus targeted human adjudication.
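The consensus-plus-adjudication idea can be sketched roughly as follows. This is a minimal illustration, not the review's actual protocol: the model names, the vote labels, the record IDs, and the unanimity threshold (`min_agreement=1.0`) are all assumptions made for the example.

```python
from collections import Counter


def screen_record(votes, min_agreement=1.0):
    """Return 'include', 'exclude', or 'human_review' for one record.

    votes: dict mapping a (hypothetical) model name to 'include' or 'exclude'.
    min_agreement: share of models that must agree for an automatic decision;
    it must exceed 0.5 so that a tie can never be decided automatically.
    """
    if not 0.5 < min_agreement <= 1.0:
        raise ValueError("min_agreement must be in (0.5, 1.0]")
    if not votes:
        return "human_review"
    label, count = Counter(votes.values()).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    # Models disagree: route the record to a human adjudicator.
    return "human_review"


# Hypothetical votes from three models on three records.
records = {
    "rec-001": {"model_a": "include", "model_b": "include", "model_c": "include"},
    "rec-002": {"model_a": "exclude", "model_b": "exclude", "model_c": "exclude"},
    "rec-003": {"model_a": "include", "model_b": "exclude", "model_c": "include"},
}
decisions = {rid: screen_record(v) for rid, v in records.items()}
```

With a unanimity threshold, only the record on which the models split (`rec-003`) reaches a human, which is the sense in which adjudication is "targeted".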
Primary evidence from included studies spanning practice scheduling, ASR feedback, LLM dialogue, and structured frameworks.
Explicit cues or elicited self-repairs outperformed generic transcripts or immediate supply of answers.
Dense scheduling accelerates procedural speech; interleaving supports transfer. Some metrics, such as silent pauses, may not follow the same pattern.
Peer collaboration or LLM partners added motivational and noticing advantages beyond solo AI loops.
Lesson frameworks (BOPPPS, interleaved routines) improved discourse management, confidence, and adherence.
About two-thirds of the studies focused on East Asian university learners, highlighting a key limitation in generalizability.
"Theory" refers to a principled explanation of how knowledge is acquired, retained, and transferred in second-language speaking.
These mechanisms represent the HOW—the specific instructional strategies and technologies that operationalize the theories.
Tune the inter-session interval (ISI) to retention goals; a shorter ISI favors procedural performance.
Blocked (AAA) for fluency, interleaved (ABC) for transfer.
Recycle constructions across prompts with gradual variation.
Immediate phoneme/stress highlights with clear exemplars. Reported effect size for explicit feedback: g = 0.86.
Withhold answers to prompt self-repair, deepening processing.
ASR+Peer or LLM partner
BOPPPS framework
Short, frequent sessions
Mastery gates + fallbacks
Self-assessments & reflection
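The scheduling mechanisms in this list (blocked AAA versus interleaved ABC ordering, with sessions separated by an ISI) can be sketched as a small planner. This is an illustrative sketch under stated assumptions: the function names, the task labels, and the one-session-per-task layout are hypothetical and not taken from the reviewed studies.

```python
def build_sessions(tasks, mode):
    """Return one list of task labels per practice session."""
    tasks = list(tasks)
    if mode == "blocked":
        # Each session repeats a single task: AAA, then BBB, then CCC.
        return [[task] * len(tasks) for task in tasks]
    if mode == "interleaved":
        # Each session cycles through every task once: ABC, ABC, ABC.
        return [list(tasks) for _ in tasks]
    raise ValueError(f"unknown mode: {mode!r}")


def schedule(tasks, mode, isi_days):
    """Pair each session with a start day, spaced isi_days apart."""
    return [(i * isi_days, session)
            for i, session in enumerate(build_sessions(tasks, mode))]


# A short ISI for fluency-oriented blocked practice, a longer ISI for
# transfer-oriented interleaved practice (both intervals are assumptions).
blocked_plan = schedule(["A", "B", "C"], "blocked", isi_days=1)
interleaved_plan = schedule(["A", "B", "C"], "interleaved", isi_days=7)
```

Both plans give every task the same total number of repetitions; only the ordering and the spacing between sessions differ, which keeps the two conditions comparable.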
Mapping theoretical foundations to instructional mechanisms and design outcomes
ACT-R + Desirable Difficulties
Blocked → Interleaved → Spaced practice sequence optimizes both proceduralization and long-term retention. Mobile micro-practice enables distributed practice.
SCT + Interactionist
Partner scaffolding combines ZPD-appropriate support with negotiated meaning. LLM agents can provide both scaffolding and corrective feedback.
Self-Efficacy + Graduated Difficulty
Mastery experiences build confidence when combined with appropriate challenge levels. Confidence nudges reinforce perceived capability.
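One way to picture graduated difficulty with mastery gates and fallbacks is a rolling success-rate check. The sketch below is hypothetical: the class name, the window size, and the advance/fallback thresholds (0.8 and 0.4) are illustrative assumptions, not values reported by the included studies.

```python
from collections import deque


class MasteryGate:
    """Raise or lower task difficulty from a rolling window of outcomes."""

    def __init__(self, max_level=5, window=5, advance_at=0.8, fallback_at=0.4):
        self.level = 1
        self.max_level = max_level
        self.advance_at = advance_at    # success rate needed to move up
        self.fallback_at = fallback_at  # success rate that triggers a step down
        self.recent = deque(maxlen=window)

    def record(self, success):
        """Log one attempt and return the (possibly updated) level."""
        self.recent.append(bool(success))
        if len(self.recent) < self.recent.maxlen:
            return self.level  # not enough evidence yet
        rate = sum(self.recent) / len(self.recent)
        if rate >= self.advance_at and self.level < self.max_level:
            self.level += 1
            self.recent.clear()  # start fresh at the new level
        elif rate <= self.fallback_at and self.level > 1:
            self.level -= 1
            self.recent.clear()
        return self.level


gate = MasteryGate(window=3)
levels = [gate.record(ok) for ok in [True, True, True, False, False, False]]
```

In this run the learner advances to level 2 after three successes and falls back to level 1 after three failures. Clearing the window after each move means a learner is judged only on attempts made at the current level.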
Four major findings synthesized from 17 empirical studies
Four evidence-based recommendations for AI-powered ESL speaking system design
Implications, limitations, and future directions
Majority of studies from East Asia (Japan, China, Taiwan). Findings may not generalize to all L1 backgrounds.
Most interventions lasted 4-12 weeks. Long-term retention data beyond 1 year remain limited.
No empirical studies yet with GPT-4 class LLMs for speaking practice. Theoretical extrapolation required.
RCTs comparing human tutors vs. LLM conversation partners for speaking outcomes.
Optimal ISI scheduling algorithms for speaking skill proceduralization.
Replication studies across diverse L1 backgrounds and cultural contexts.
This scoping review synthesizes six foundational theories and ten instructional mechanisms into an actionable framework for AI-powered ESL speaking system design. The evidence supports a Blocked → Interleaved → Spaced practice progression, explicit tri-modal feedback, and scaffolded partner interactions as high-priority design features.
Theory-grounded design transforms AI-ESL systems from simple drill tools into sophisticated learning environments that proceduralize speaking skills, build robust phonological categories, and foster learner confidence.
As LLM capabilities advance, the theoretical foundations and empirical mechanisms identified here provide a principled roadmap for developing next-generation speaking practice systems that genuinely support second language acquisition.
Selected citations from the 17 empirical studies reviewed
Anderson, J. R. (2015). Cognitive Psychology and Its Implications (8th ed.). Worth Publishers.
Bandura, A. (1997). Self-Efficacy: The Exercise of Control. W.H. Freeman.
Bjork, R. A., & Bjork, E. L. (2020). Desirable difficulties in theory and practice. Journal of Applied Research in Memory and Cognition, 9(4), 475-479.
Golonka, E. M., et al. (2014). Technologies for foreign language learning: A review. Computer Assisted Language Learning, 27(1), 70-105.
Li, S., & DeKeyser, R. M. (2019). Implicit and explicit instruction. In J. W. Schwieter & A. Benati (Eds.), The Cambridge Handbook of Language Learning. Cambridge University Press.
Suzuki, Y. (2021). Optimizing fluency training for speaking skills transfer. Studies in Second Language Acquisition, 43(5), 1037-1061.
Tomasello, M. (2003). Constructing a Language: A Usage-Based Theory of Language Acquisition. Harvard University Press.
Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press.
For the complete reference list, please see the full PDF document.