Update README.md

Bailey Machiko Hirota 2025-08-14 17:02:44 +09:00 committed by GitHub
parent 8e186160d1
commit 556a3f0941
GPG Key ID: B5690EEEBB952194

@@ -5,7 +5,6 @@
**Multilingual LibriSpeech (MLS)** is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish. It includes about 44.5K hours of English and a total of about 6K hours for other languages. This icefall training recipe was created for the restructured version of the English split of the dataset available on Hugging Face below.
The dataset is available on Hugging Face. For more details, please visit:
- Dataset: https://huggingface.co/datasets/parler-tts/mls_eng
@@ -14,6 +13,7 @@ The dataset is available on Hugging Face. For more details, please visit:
## On-the-fly feature computation
-This recipe currently only supports on-the-fly feature bank computation, since `lhotse` manifests and feature banks are not pre-calculated in this recipe. This should mean that the dataset can be streamed from Hugging Face, but we have not tested this yet. We may add a version that supports pre-calculating features to better match existing recipes.
+This recipe currently only supports on-the-fly feature bank computation, since `lhotse` manifests and feature banks are not pre-calculated in this recipe. This should mean that the dataset can be streamed from Hugging Face, but we have not tested this yet. We may add a version that supports pre-calculating features to better match existing recipes.\
+<br>
-<!-- [./RESULTS.md](./RESULTS.md) contains the latest results. -->
+[./RESULTS.md](./RESULTS.md) contains the latest results. This MLS English recipe was primarily developed for use in the ```multi_ja_en``` Japanese-English bilingual pipeline, which is based on MLS English and ReazonSpeech.