Senyan Li e6a6727012
Add Tibetan Amdo dialect xbmu_amdo31 in egs (#706)
* add egs/xbmu_amdo31

* fix xbmu_amdo31/ASR/pruned_transducer_stateless5/train.py

* fix xbmu_amdo31/ASR/pruned_transducer_stateless5/asr_datamodule.py

* fix xbmu_amdo31/ASR/prepare.sh

* add RESULTS.md and README.md

* fix pruned_transducer_stateless5 decode.py

* add transducer stateless7

* fix transducer_stateless7

* fix RESULTS.md error

* Add pruned_transducer_stateless7 validation set results
2022-12-03 23:50:49 +08:00


Introduction

About the XBMU-AMDO31 corpus

XBMU-AMDO31 is an open-source Amdo Tibetan speech corpus published by Northwest Minzu University. It is publicly available at https://huggingface.co/datasets/syzym/xbmu_amdo31.

The XBMU-AMDO31 dataset is a speech recognition corpus of the Amdo Tibetan dialect. It contains 31 hours of speech data together with resources for building speech recognition systems, including transcribed texts and a Tibetan pronunciation lexicon. (The lexicon is a Lhasa-dialect Tibetan lexicon that has been reused for the Amdo dialect because of the uniformity of the Tibetan language.) The dataset can be used to train models for Amdo Tibetan automatic speech recognition (ASR).

This recipe includes several ASR models trained on XBMU-AMDO31, such as pruned_transducer_stateless5 and pruned_transducer_stateless7.

./RESULTS.md contains the latest results.