zr_jin 0f1bc6f8af
Multi_zh-Hans Recipe (#1238)
* Init commit for recipes trained on multiple zh datasets.

* fbank extraction for thchs30

* added support for aishell1

* added support for aishell-2

* fixes

* fixes

* fixes

* added support for stcmds and primewords

* fixes

* added support for magicdata

script for fbank computation not done yet

* added script for magicdata fbank computation

* file permission fixed

* updated for the wenetspeech recipe

* updated

* Update preprocess_kespeech.py

* updated

* updated

* updated

* updated

* file permission fixed

* updated paths

* fixes

* added support for kespeech dev/test set fbank computation

* fixes for file permission

* refined support for KeSpeech

* added scripts for BPE model training

* updated

* init commit for the multi_zh-cn zipformer recipe

* disable speed perturbation by default

* updated

* updated

* added necessary files for the zipformer recipe

* removed redundant wenetspeech M and S sets

* updates for multi dataset decoding

* refined

* formatting issues fixed

* updated

* minor fixes

* this commit finalize the recipe (hopefully)

* fixed formatting issues

* minor fixes

* updated

* using soft links to reduce redundancy

* minor updates

* using soft links to reduce redundancy

* minor updates

* minor updates

* using soft links to reduce redundancy

* minor updates

* Update README.md

* minor updates

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* minor updates

* minor fixes

* fixed a formatting issue

* Update preprocess_kespeech.py

* Update prepare.sh

* Update egs/multi_zh-hans/ASR/local/compute_fbank_kespeech_splits.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/preprocess_kespeech.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* removed redundant files

* symlinks added

* minor updates

* added CI tests for `multi_zh-hans`

* minor fixes

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-09-13 11:57:05 +08:00

39 lines
982 B
Markdown

# Introduction
This recipe includes scripts for training Zipformer model using multiple Chinese datasets.
# Included Training Sets
1. THCHS-30
2. AiShell-{1,2,4}
3. ST-CMDS
4. Primewords
5. MagicData
6. Aidatatang_200zh
7. AliMeeting
8. WeNetSpeech
9. KeSpeech-ASR
|Datset| Number of hours| URL|
|---|---:|---|
|**TOTAL**|14,106|---|
|THCHS-30|35|https://www.openslr.org/18/|
|AiShell-1|170|https://www.openslr.org/33/|
|AiShell-2|1,000|http://www.aishelltech.com/aishell_2|
|AiShell-4|120|https://www.openslr.org/111/|
|ST-CMDS|110|https://www.openslr.org/38/|
|Primewords|99|https://www.openslr.org/47/|
|aidatatang_200zh|200|https://www.openslr.org/62/|
|MagicData|755|https://www.openslr.org/68/|
|AliMeeting|100|https://openslr.org/119/|
|WeNetSpeech|10,000|https://github.com/wenet-e2e/WenetSpeech|
|KeSpeech|1,542|https://github.com/KeSpeech/KeSpeech|
# Included Test Sets
1. Aishell-{1,2,4}
2. Aidatatang_200zh
3. AliMeeting
4. MagicData
5. KeSpeech-ASR
6. WeNetSpeech