15 Commits

Author SHA1 Message Date
jinzr
c2cb70fc22 Create generate_unique_lexicon.py 2023-12-20 18:58:40 +08:00
jinzr
2a1877486e Create convert_transcript_words_to_tokens.py 2023-12-20 16:51:45 +08:00
jinzr
ecfbd090af Delete convert_transcript_words_to_tokens.py 2023-12-20 15:05:57 +08:00
jinzr
6097d7363d Create convert_transcript_words_to_tokens.py 2023-12-20 11:31:06 +08:00
jinzr
852f5a6153 isort formatted 2023-11-09 10:56:48 +08:00
JinZr
de3daf6496 Merge branch 'dev/lm_multi_zh-hans' of https://github.com/JinZr/icefall into dev/lm_multi_zh-hans 2023-11-09 10:53:05 +08:00
JinZr
91da99ff52 updated 2023-11-09 10:51:41 +08:00
jinzr
3694e419fb Update prepare_lm_training_data.py 2023-11-08 11:52:01 +08:00
jinzr
d29efb7345 Update prepare_lm_training_data.py 2023-11-08 10:20:56 +08:00
jinzr
403e2e52ac Update prepare_lm_training_data.py 2023-11-08 10:20:10 +08:00
jinzr
7f53f59776 Update prepare_lm_training_data.py 2023-11-08 10:14:08 +08:00
jinzr
86c3dbec0e Update prepare_lm_training_data.py 2023-11-08 10:07:32 +08:00
jinzr
94f963baf8 Update prepare_lm_training_data.py 2023-11-08 10:05:29 +08:00
jinzr
1a11440014 minor updates 2023-11-08 09:57:57 +08:00
zr_jin
0f1bc6f8af Multi_zh-Hans Recipe (#1238)
* Init commit for recipes trained on multiple zh datasets.

* fbank extraction for thchs30

* added support for aishell1

* added support for aishell-2

* fixes

* fixes

* fixes

* added support for stcmds and primewords

* fixes

* added support for magicdata

script for fbank computation not done yet

* added script for magicdata fbank computation

* file permission fixed

* updated for the wenetspeech recipe

* updated

* Update preprocess_kespeech.py

* updated

* updated

* updated

* updated

* file permission fixed

* updated paths

* fixes

* added support for kespeech dev/test set fbank computation

* fixes for file permission

* refined support for KeSpeech

* added scripts for BPE model training

* updated

* init commit for the multi_zh-cn zipformer recipe

* disable speed perturbation by default

* updated

* updated

* added necessary files for the zipformer recipe

* removed redundant wenetspeech M and S sets

* updates for multi dataset decoding

* refined

* formatting issues fixed

* updated

* minor fixes

* this commit finalizes the recipe (hopefully)

* fixed formatting issues

* minor fixes

* updated

* using soft links to reduce redundancy

* minor updates

* using soft links to reduce redundancy

* minor updates

* minor updates

* using soft links to reduce redundancy

* minor updates

* Update README.md

* minor updates

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* minor updates

* minor fixes

* fixed a formatting issue

* Update preprocess_kespeech.py

* Update prepare.sh

* Update egs/multi_zh-hans/ASR/local/compute_fbank_kespeech_splits.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/preprocess_kespeech.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* removed redundant files

* symlinks added

* minor updates

* added CI tests for `multi_zh-hans`

* minor fixes

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-09-13 11:57:05 +08:00