Memorising vocabulary is an important aspect of formal foreign-language learning. Advances in cognitive psychology have led to the development of adaptive learning systems that make vocabulary learning more efficient. One way these computer-based systems optimize learning is by measuring learning performance in real time to create optimal repetition schedules for individual learners. While such adaptive learning systems have been successfully applied to word learning using keyboard-based input, they have thus far seen little application in word learning where spoken instead of typed input is used. Here we present a framework for speech-based word learning using an adaptive model that was developed for and tested with typing-based word learning. We show that typing- and speech-based learning result in similar behavioral patterns that can be used to reliably estimate individual memory processes. We extend earlier findings demonstrating that a response-time based adaptive learning approach outperforms an accuracy-based, Leitner flashcard approach in learning efficiency (demonstrated by higher average accuracy and lower response times after a learning session). In short, we show that adaptive learning benefits transfer from typing-based learning, to speech based learning. Our work provides a basis for the development of language learning applications that use real-time pronunciation assessment software to score the accuracy of the learner’s pronunciations. We discuss the implications for our approach for the development of educationally relevant, adaptive speech-based learning applications.