From 4e2a4fdcd8fe39dc18d9c2d4bf8b0f4dcc554981 Mon Sep 17 00:00:00 2001
From: Kinan Martin
Date: Wed, 16 Apr 2025 08:13:59 +0900
Subject: [PATCH] readme

---
 egs/mls_english/ASR/README.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 egs/mls_english/ASR/README.md

diff --git a/egs/mls_english/ASR/README.md b/egs/mls_english/ASR/README.md
new file mode 100644
index 000000000..bacc237db
--- /dev/null
+++ b/egs/mls_english/ASR/README.md
@@ -0,0 +1,19 @@
+# Introduction
+
+
+
+**Multilingual LibriSpeech (MLS)** is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, and Polish. It includes about 44.5K hours of English and a total of about 6K hours for the other languages. This icefall training recipe was created for the restructured version of the English split of the dataset, which is available on Hugging Face.
+
+
+
+For more details, please visit:
+
+- Dataset: https://huggingface.co/datasets/parler-tts/mls_eng
+- Original MLS dataset: https://www.openslr.org/94
+
+
+## On-the-fly feature computation
+
+This recipe currently supports only on-the-fly filter bank (fbank) feature computation, since `lhotse` manifests and fbank features are not precomputed in this recipe. In principle, this means the dataset could be streamed from Hugging Face, but we have not tested this yet. We may add a version that supports precomputed features to better match existing recipes.
+
+