Senyan Li e6a6727012
Add Tibetan Amdo dialect xbmu_amdo31 in egs (#706)
* add egs/xbmu_amdo31

* fix xbmu_amdo31/ASR/pruned_transducer_stateless5/train.py

* fix xbmu_amdo31/ASR/pruned_transducer_stateless5/asr_datamodule.py

* fix xbmu_amdo31/ASR/prepare.sh

* add RESULTS.md and README.md

* fix pruned_transducer_stateless5 decode.py

* add transducer stateless7

* fix transducer_stateless7

* fix RESULTS.md error

* Add pruned_transducer_stateless7 validation set results
2022-12-03 23:50:49 +08:00


# Introduction

## About the XBMU-AMDO31 corpus

XBMU-AMDO31 is an open-source Amdo Tibetan speech corpus published by Northwest Minzu University.
It is publicly available at https://huggingface.co/datasets/syzym/xbmu_amdo31

The corpus contains 31 hours of speech data, along with related resources
for building speech recognition systems, including transcribed texts and a
Tibetan pronunciation lexicon.
(The lexicon was built for the Lhasa dialect and is reused for the Amdo
dialect because of the uniformity of the Tibetan language.)
The dataset can be used to train models for Amdo Tibetan Automatic Speech Recognition (ASR).
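
For readers who want a local copy of the corpus, the sketch below shows one possible way to fetch it from the Hugging Face URL given above. It assumes the `huggingface_hub` package is installed. The helper names `dataset_url` and `download_corpus` and the target directory are hypothetical and are not part of this recipe, which may obtain the data differently in `prepare.sh`.

```python
# Minimal sketch, not part of the recipe: fetch XBMU-AMDO31 from Hugging Face.
# Assumption: `pip install huggingface_hub` has been run beforehand.

REPO_ID = "syzym/xbmu_amdo31"  # taken from the dataset URL in this README


def dataset_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face dataset page for the given repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def download_corpus(target_dir: str) -> str:
    """Download a snapshot of the dataset repository into target_dir.

    Returns the local path of the downloaded snapshot.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the corpus is hosted as a dataset repository
        local_dir=target_dir,
    )
```

Calling `download_corpus("download/xbmu_amdo31")` would place the files under that (hypothetical) directory. The layout the recipe scripts expect is not specified here, so the path may need adjusting.
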
This recipe includes several ASR models trained on XBMU-AMDO31.
[./RESULTS.md](./RESULTS.md) contains the latest results.