82 Commits

Author SHA1 Message Date
Wei Kang
11d816d174
Add cumstomized score for hotwords (#1385)
* add custom score for each hotword

* Add more comments

* Fix deocde

* fix style

* minor fixes
2023-11-18 18:47:55 +08:00
Fangjun Kuang
666d69b20d
Rename train2.py to avoid confusion (#1386) 2023-11-17 18:12:59 +08:00
zr_jin
23913f6afd
Minor refinements for some stale but recently merged PRs (#1354)
* incorporate https://github.com/k2-fsa/icefall/pull/1269

* incorporate https://github.com/k2-fsa/icefall/pull/1301

* black formatted

* incorporate https://github.com/k2-fsa/icefall/pull/1162

* black formatted
2023-10-31 10:28:20 +08:00
zr_jin
1814bbb0e7
typo fixed (#1334) 2023-10-25 00:03:33 +08:00
zr_jin
d76c3fe472
Migrate zipformer model to other Chinese datasets (#1216)
added zipformer recipe for AISHELL-1
2023-10-24 16:24:46 +08:00
zr_jin
92ef561ff7
Minor fixes for torch.jit.script support (#1329) 2023-10-24 01:10:50 +08:00
zr_jin
d2bd0933b1
Compatibility with the latest Lhotse (#1314) 2023-10-17 21:22:32 +08:00
zr_jin
1ef349d120
[WIP] AISHELL-1 pruned transducer stateless7 streaming recipe (#1300)
* `pruned_transudcer_stateless7_streaming` for AISHELL-1

* Update train.py

* Update train2.py

* Update decode.py

* Update RESULTS.md
2023-10-16 16:28:16 +08:00
zr_jin
162ceaf4b3
fixes for data preparation (#1307)
Issue: #1306
2023-10-12 17:05:41 +08:00
zr_jin
0d09a44930
Update train.py (#1299) 2023-10-11 10:06:00 +08:00
Fangjun Kuang
f14b673408
Add HLG decoding with OpenFst on CPU for aishell conformer_ctc (#1279) 2023-10-01 13:46:16 +08:00
yaguang
8181d19860
check bbpe model exists in advance. (#1277) 2023-09-27 17:35:26 +08:00
yaguang
a5ba1133c4
Compatible with new lhotse versions. (#1278) 2023-09-27 17:33:38 +08:00
zr_jin
ef658d691e
fixes for init value of diagnostics.TensorDiagnosticOptions (#1269)
* fixes for `diagnostics`

Replace `2 ** 22` with `512` as the default value of `diagnostics.TensorDiagnosticOptions`

also black formatted some scripts

* fixed formatting issues
2023-09-24 17:06:47 +08:00
Fangjun Kuang
34e40a86b3
Fix exporting decoder model to onnx (#1264)
* Use torch.jit.script() to export the decoder model

See also https://github.com/k2-fsa/sherpa-onnx/issues/327
2023-09-22 09:57:15 +08:00
Fangjun Kuang
f5dc957d44
Fix CI tests (#1266) 2023-09-21 21:16:14 +08:00
zr_jin
7cc2dae940
Fixes to incorporate with the latest Lhotse release (#1249) 2023-09-13 12:39:49 +08:00
zr_jin
a81396b482
Use tokens.txt to replace bpe.model (#1162) 2023-08-12 16:53:59 +08:00
zr_jin
74806b744b
disable speed perturbation by default (#1176)
* disable speed perturbation by default

* minor fixes

* minor updates

* updated bash scripts to incorporate with the `speed-perturb` arg

* minor fixes

1. changed the naming scheme from `speed-perturb` to `perturb-speed` to align with the librispeech recipe

>> 00256a7669/egs/librispeech/ASR/local/compute_fbank_librispeech.py (L65)

2. changed arg type for `perturb-speed` to str2bool
2023-08-10 20:56:02 +08:00
zr_jin
856c0f2a60
fixed default param for an aishell recipe (#1159) 2023-07-04 19:12:39 +08:00
Wei Kang
219bba1310
zipformer wenetspeech (#1130)
* copy files

* update train.py

* small fixes

* Add decode.py

* Fix dataloader in decode.py

* add blank penalty

* Add blank-penalty to other decoding method

* Minor fixes

* add zipformer2 recipe

* Minor fixes

* Remove pruned7

* export and test models

* Replace bpe with tokens in export.py and pretrain.py

* Minor fixes

* Minor fixes

* Minor fixes

* Fix export

* Update results

* Fix zipformer-ctc

* Fix ci

* Fix ci

* Fix CI

* Fix CI

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-06-26 09:33:18 +08:00
Wei Kang
ba257efbcd
Add Context biasing (#1038)
* Add context biasing for librispeech

* Add context biasing for wenetspeech

* fix bugs

* Implement Aho-Corasick context graph

* fix some bugs

* Fixes to forward_one_step; add draw to context graph

* add output arc; fix black

* Fix wenetspeech tokenizer

* Minor fixes to the decode.py
2023-06-03 21:28:49 +08:00
Fangjun Kuang
7b0afbdc16
Remove cur_batch_idx (#1102) 2023-05-30 14:49:54 +08:00
marcoyang1998
585e7b224f
Aishell pruned_transducer_stateless7 (#962)
* Add pruned_transducer_stateless7 for Aishell

* update README.md

* update comments and small fixes
2023-05-23 11:04:33 +08:00
Wei Kang
80156dda09
Training with byte level BPE (AIShell) (#986)
* copy files from zipformer librispeech

* Add byte bpe training for aishell

* compile LG graph

* Support LG decoding

* Minor fixes

* black

* Minor fixes

* export & fix pretrain.py

* fix black

* Update RESULTS.md

* Fix export.py
2023-05-04 19:16:17 +08:00
Wei Kang
0efed1cec5
Fix path in aishell rnnlm training (#1016) 2023-04-20 23:09:31 +08:00
Wei Kang
5c65516e05
Fix aishell rnnlm training command (#1015) 2023-04-20 16:14:16 +08:00
marcoyang1998
d337398d29
Shallow fusion for Aishell (#954)
* add shallow fusion and LODR for aishell

* update RESULTS

* add save by iterations
2023-04-03 16:20:29 +08:00
Fangjun Kuang
35e21a0d2e
Fix torchscript export for aishell (#969) 2023-03-27 14:08:26 +08:00
Jason's Lab
6196b4a407
Add char-based language model training process for aishell. (#945)
* Add char-based language model training process for aishell.

Add soft link from librispeech/ASR/local/sort_lm_training_data.py to aishell/ASR/local/

---------

Co-authored-by: lichao <www.563042811@qq.com>
2023-03-16 09:52:11 +08:00
Fangjun Kuang
f5de2e90c6
Fix style issues. (#937) 2023-03-08 22:56:04 +08:00
pehonnet
07243d136a
remove key from result filename (#936)
Co-authored-by: pe-honnet <pe.honnet@telepathy.ai>
2023-03-08 21:06:07 +08:00
Meng Wei
74a2069f94
fix expired links (#856) 2023-01-28 14:43:47 +08:00
marcoyang
53454701cb fix segmentation fault 2022-11-22 11:39:21 +08:00
Desh Raj
d31db01037 manual correction of black formatting 2022-11-17 14:18:05 -05:00
Desh Raj
107df3b115 apply black on all files 2022-11-17 09:42:17 -05:00
Fangjun Kuang
60317120ca
Revert "Apply new Black style changes" 2022-11-17 20:19:32 +08:00
Desh Raj
d110b04ad3 apply new black formatting to all files 2022-11-16 13:06:43 -05:00
Fangjun Kuang
ff3f026381
Checkout the LM for aishell explicitly (#642) 2022-10-31 19:47:43 +08:00
Fangjun Kuang
d1f16a04bd
fix type hints for decode.py (#623) 2022-10-18 06:56:12 +08:00
LIyong.Guo
923b60a7c6
padding zeros (#591) 2022-09-28 21:20:33 +08:00
Fangjun Kuang
e18fa78c3a
Check that read_manifests_if_cached returns a non-empty dict. (#555) 2022-08-28 11:50:11 +08:00
Lucky Wong
9277c95bcd
Pruned transducer stateless2 for AISHELL-1 (#536)
* Fix not enough values to unpack error .

* [WIP] Pruned transducer stateless2 for AISHELL-1

* fix the style issue

* code format for black

* add pruned-transducer-stateless2 results for AISHELL-1

* simplify result
2022-08-22 10:17:26 +08:00
Lucky Wong
31686ac829
Fix not enough values to unpack error . (#533) 2022-08-18 10:45:06 +08:00
Wei Kang
5c17255eec
Sort results to make it more convenient to compare decoding results (#522)
* Sort result to make it more convenient to compare decoding results

* Add cut_id to recognition results

* add cut_id to results for all recipes

* Fix torch.jit.script

* Fix comments

* Minor fixes

* Fix torch.jit.tracing for Pytorch version before v1.9.0
2022-08-12 07:12:50 +08:00
Fangjun Kuang
5149788cb2
Fix computing averaged loss in the aishell recipe. (#523)
* Fix computing averaged loss in the aishell recipe.

* Set find_unused_parameters optionally.
2022-08-09 10:53:31 +08:00
boji123
3c9e7f733b
[debug] raise remind when git-lfs not available (#504)
* [debug] raise remind when git-lfs not available

* modify comment
2022-07-28 16:17:49 +08:00
Fangjun Kuang
385645d533
Fix get_transducer_model() for aishell. (#497)
PR #495 introduces an error. This commit fixes it.
2022-07-26 15:42:21 +08:00
Fangjun Kuang
d3fc4b031e
Support using aidatatang_200zh optionally in aishell training (#495)
* Use aidatatang_200zh optionally in aishell training.
2022-07-26 11:25:01 +08:00
Jun Wang
d792bdc9bc
fix typo (#445) 2022-06-25 11:00:53 +08:00