Multilingual LibriSpeech (MLS) is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and covers 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, and Polish. It includes about 44.5K hours of English and a total of about 6K hours for the other languages. This icefall training recipe was created for the restructured version of the English split of the dataset, which is hosted on Hugging Face (linked below).

For more details, please visit:

Dataset: https://huggingface.co/datasets/parler-tts/mls_eng
Original MLS dataset link: https://www.openslr.org/94

On-the-fly feature computation

This recipe currently supports only on-the-fly filter bank (fbank) feature computation, because it does not pre-compute lhotse manifests or features. In principle, this means the dataset could be streamed from Hugging Face, but we have not tested this yet. We may add a version that pre-computes features to better match existing recipes.
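
To illustrate what "on-the-fly" means here, the following is a minimal NumPy sketch, not the code used by this recipe, lhotse, or icefall. It computes log mel filter bank features lazily for each example as it is consumed, so nothing is written to disk. The function names, the example keys ("samples", "sample_rate", "text"), and the parameter choices (80 mel bins, 25 ms frames, 10 ms shift, Hann window) are all assumptions made for illustration; they do not reflect the dataset's actual schema or the recipe's configuration.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(num_bins, n_fft, sample_rate):
    """Triangular mel filters with shape (num_bins, n_fft // 2 + 1)."""
    nyquist = sample_rate / 2.0
    fft_freqs = np.linspace(0.0, nyquist, n_fft // 2 + 1)
    # num_bins triangles need num_bins + 2 edge points, evenly spaced in mel.
    mel_edges = np.linspace(0.0, float(hz_to_mel(nyquist)), num_bins + 2)
    edges = mel_to_hz(mel_edges)
    filters = np.zeros((num_bins, fft_freqs.size))
    for i in range(num_bins):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        filters[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return filters


def log_fbank(samples, sample_rate=16000, num_bins=80, frame_ms=25, shift_ms=10):
    """Log mel filter bank features with shape (num_frames, num_bins)."""
    samples = np.asarray(samples, dtype=np.float64)
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    n_fft = 1 << (frame_len - 1).bit_length()  # next power of two
    num_frames = 1 + (len(samples) - frame_len) // shift
    # Index matrix that slices the waveform into overlapping frames.
    idx = shift * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = samples[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(num_bins, n_fft, sample_rate).T
    return np.log(np.maximum(mel, 1e-10))  # floor avoids log(0)


def iter_features(examples):
    """Lazily yield (text, features) pairs, computing features on demand.

    `examples` is any iterable of dicts with the hypothetical keys
    "samples", "sample_rate", and "text".
    """
    for ex in examples:
        yield ex["text"], log_fbank(ex["samples"], ex["sample_rate"])
```

Because `iter_features` is a generator over an arbitrary iterable, it would work equally well over a streamed source as over a local one, which is why on-the-fly computation is compatible with streaming in principle. The trade-off is that features are recomputed every epoch instead of being read from a pre-computed store.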