38 Commits

Author SHA1 Message Date
zr_jin
4d047dc8b8 Merge c2cb70fc22ffd0a9cb8cbe107846ef3441a7d39c into d9ae8c02a0abdeddc5a4cf9fad72293eda134de3 2024-02-10 04:49:39 -07:00
Yifan Yang
5dfc3ed7f9 Fix buffer size of DynamicBucketingSampler (#1468)
* Fix buffer size

* Fix for flake8

---------

Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
2024-01-21 02:10:42 +08:00
Fangjun Kuang
8136ad775b Use high_freq -400 in computing fbank features. (#1447)
See also https://github.com/k2-fsa/sherpa-onnx/issues/514
2024-01-04 13:59:32 +08:00
jinzr
c2cb70fc22 Create generate_unique_lexicon.py 2023-12-20 18:58:40 +08:00
jinzr
2a1877486e Create convert_transcript_words_to_tokens.py 2023-12-20 16:51:45 +08:00
jinzr
ecfbd090af Delete convert_transcript_words_to_tokens.py 2023-12-20 15:05:57 +08:00
jinzr
6097d7363d Create convert_transcript_words_to_tokens.py 2023-12-20 11:31:06 +08:00
Fangjun Kuang
f85f0252a9 Add greedy search for streaming zipformer CTC. (#1415) 2023-12-13 17:34:12 +08:00
jinzr
39a02f7c30 added blank penalty 2023-11-17 17:06:23 +08:00
jinzr
a37408f663 Revert "Update decode.py"
This reverts commit 73e1237c2d5842ab0b0d3b5ab474c948fd8ff019.
2023-11-09 11:57:49 +08:00
jinzr
73e1237c2d Update decode.py 2023-11-09 11:50:39 +08:00
jinzr
16499a5ef6 Update decode.py 2023-11-09 11:37:18 +08:00
zr_jin
fb541ec60c Merge branch 'k2-fsa:master' into dev/lm_multi_zh-hans 2023-11-09 11:08:28 +08:00
jinzr
b4d91d24ac Update asr_datamodule.py 2023-11-09 11:02:36 +08:00
jinzr
7bd260fb5a Update decode.py 2023-11-09 11:01:21 +08:00
jinzr
852f5a6153 isort formatted 2023-11-09 10:56:48 +08:00
JinZr
de3daf6496 Merge branch 'dev/lm_multi_zh-hans' of https://github.com/JinZr/icefall into dev/lm_multi_zh-hans 2023-11-09 10:53:05 +08:00
JinZr
91da99ff52 updated 2023-11-09 10:51:41 +08:00
jinzr
8d20337d8a Update decode.py 2023-11-09 10:45:22 +08:00
jinzr
4c4c26fbb7 Update decode.py 2023-11-09 10:40:33 +08:00
jinzr
3694e419fb Update prepare_lm_training_data.py 2023-11-08 11:52:01 +08:00
jinzr
c54fdf9ff9 Update prepare_lm_data.sh 2023-11-08 11:42:46 +08:00
jinzr
3f89cb380a minor updates 2023-11-08 11:36:36 +08:00
jinzr
817413f899 minor updates 2023-11-08 10:53:34 +08:00
jinzr
d29efb7345 Update prepare_lm_training_data.py 2023-11-08 10:20:56 +08:00
jinzr
403e2e52ac Update prepare_lm_training_data.py 2023-11-08 10:20:10 +08:00
jinzr
7f53f59776 Update prepare_lm_training_data.py 2023-11-08 10:14:08 +08:00
jinzr
86c3dbec0e Update prepare_lm_training_data.py 2023-11-08 10:07:32 +08:00
jinzr
94f963baf8 Update prepare_lm_training_data.py 2023-11-08 10:05:29 +08:00
jinzr
1a11440014 minor updates 2023-11-08 09:57:57 +08:00
zr_jin
770c495484 minor fixes in the CTC decoding code (#1338) 2023-10-25 17:14:17 +08:00
zr_jin
f82bccfd63 Support CTC decoding for multi-zh_hans recipe (#1313) 2023-10-24 19:04:09 +08:00
jinzr
a006382941 Create prepare_lm_data.sh 2023-10-23 13:29:31 +08:00
zr_jin
d2bd0933b1 Compatibility with the latest Lhotse (#1314) 2023-10-17 21:22:32 +08:00
zr_jin
ef658d691e fixes for init value of diagnostics.TensorDiagnosticOptions (#1269)
* fixes for `diagnostics`

Replace `2 ** 22` with `512` as the default value of `diagnostics.TensorDiagnosticOptions`

also black formatted some scripts

* fixed formatting issues
2023-09-24 17:06:47 +08:00
Tiance Wang
7e1288af50 fix thchs-30 download command (#1260) 2023-09-19 16:46:36 +08:00
zr_jin
7cc2dae940 Fixes to incorporate with the latest Lhotse release (#1249) 2023-09-13 12:39:49 +08:00
zr_jin
0f1bc6f8af Multi_zh-Hans Recipe (#1238)
* Init commit for recipes trained on multiple zh datasets.

* fbank extraction for thchs30

* added support for aishell1

* added support for aishell-2

* fixes

* fixes

* fixes

* added support for stcmds and primewords

* fixes

* added support for magicdata

script for fbank computation not done yet

* added script for magicdata fbank computation

* file permission fixed

* updated for the wenetspeech recipe

* updated

* Update preprocess_kespeech.py

* updated

* updated

* updated

* updated

* file permission fixed

* updated paths

* fixes

* added support for kespeech dev/test set fbank computation

* fixes for file permission

* refined support for KeSpeech

* added scripts for BPE model training

* updated

* init commit for the multi_zh-cn zipformer recipe

* disable speed perturbation by default

* updated

* updated

* added necessary files for the zipformer recipe

* removed redundant wenetspeech M and S sets

* updates for multi dataset decoding

* refined

* formatting issues fixed

* updated

* minor fixes

* this commit finalize the recipe (hopefully)

* fixed formatting issues

* minor fixes

* updated

* using soft links to reduce redundancy

* minor updates

* using soft links to reduce redundancy

* minor updates

* minor updates

* using soft links to reduce redundancy

* minor updates

* Update README.md

* minor updates

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* minor updates

* minor fixes

* fixed a formatting issue

* Update preprocess_kespeech.py

* Update prepare.sh

* Update egs/multi_zh-hans/ASR/local/compute_fbank_kespeech_splits.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/preprocess_kespeech.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* removed redundant files

* symlinks added

* minor updates

* added CI tests for `multi_zh-hans`

* minor fixes

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-09-13 11:57:05 +08:00