69 Commits

Author SHA1 Message Date
pkufool
93dd3f5887 Update results 2023-06-23 21:16:09 +08:00
pkufool
ae47b739f0 Fix export 2023-06-23 17:51:50 +08:00
pkufool
63e53bad59 Minor fixes 2023-06-21 18:13:24 +08:00
pkufool
a7d0588827 Minor fixes 2023-06-19 12:12:33 +08:00
pkufool
3ef74b0630 Minor fixes 2023-06-16 17:13:16 +08:00
pkufool
78ef1d1874 Replace bpe with tokens in export.py and pretrain.py 2023-06-16 16:56:35 +08:00
pkufool
389e191478 export and test models 2023-06-16 11:19:58 +08:00
pkufool
28d3f6df55 Remove pruned7 2023-06-15 10:30:34 +08:00
pkufool
bf36d1984e Minor fixes 2023-06-15 10:21:12 +08:00
pkufool
a1b12cf4e9 Merge branch 'master' into wenetspeech 2023-06-13 16:18:28 +08:00
pkufool
2d32ba5d5d add zipformer2 recipe 2023-06-13 16:18:20 +08:00
Wei Kang
ba257efbcd
Add Context biasing (#1038)
* Add context biasing for librispeech

* Add context biasing for wenetspeech

* fix bugs

* Implement Aho-Corasick context graph

* fix some bugs

* Fixes to forward_one_step; add draw to context graph

* add output arc; fix black

* Fix wenetspeech tokenizer

* Minor fixes to the decode.py
2023-06-03 21:28:49 +08:00
Fangjun Kuang
7b0afbdc16
Remove cur_batch_idx (#1102) 2023-05-30 14:49:54 +08:00
pkufool
04c3f9ab53 Minor fixes 2023-05-25 16:33:10 +08:00
pkufool
899f858659 Add blank-penalty to other decoding method 2023-05-25 12:20:29 +08:00
pkufool
961750c0a9 Merge branch 'master' into wenetspeech 2023-05-24 22:16:16 +08:00
pkufool
4f28e15a1d add blank penalty 2023-05-24 22:15:20 +08:00
Fangjun Kuang
1df71a6b38
add onnx export for stateless2 (#1086) 2023-05-23 16:11:00 +08:00
Fangjun Kuang
ea8b15309f
Add onnx export scripts for wenetspeech recipe. (#1085) 2023-05-23 13:32:14 +08:00
pkufool
42513d2e98 Fix dataloader in decode.py 2023-05-16 11:33:11 +08:00
pkufool
47565959d9 Merge branch 'master' into wenetspeech 2023-05-08 08:38:01 +08:00
marcoyang1998
d337398d29
Shallow fusion for Aishell (#954)
* add shallow fusion and LODR for aishell

* update RESULTS

* add save by iterations
2023-04-03 16:20:29 +08:00
marcoyang1998
c21b6a208b
Add finetuning script for aishell (#974)
* add aishell finetune scripts

* add an example bash script
2023-03-30 17:08:46 +08:00
Wei Kang
d74822d07b
Fix wenetspeech decoding speed (#953) 2023-03-21 21:35:32 +08:00
Fangjun Kuang
f5de2e90c6
Fix style issues. (#937) 2023-03-08 22:56:04 +08:00
pehonnet
07243d136a
remove key from result filename (#936)
Co-authored-by: pe-honnet <pe.honnet@telepathy.ai>
2023-03-08 21:06:07 +08:00
Yuekai Zhang
3c54333b06
fix bug (#796) 2022-12-28 11:20:38 +08:00
wzy
e83409cbe5
Filter the training data of T < S for Wenet train recipe (#753)
* filter the case of T < S for training data

* fix style issues

* fix style issues

* fix style issues

Co-authored-by: 张云斌 <zhangyunbin@MacBook-Air.local>
2022-12-11 20:16:10 +08:00
Cesc
be6e08f69a
fix wenet stateless5 jit export error (#735) 2022-12-05 23:35:10 +08:00
Fangjun Kuang
bd7fa2253d
Update the manifest statistics of the L subset of wenetspeech (#731) 2022-12-04 20:27:45 +08:00
marcoyang
53454701cb fix segmentation fault 2022-11-22 11:39:21 +08:00
Fangjun Kuang
903ef3b161 Add decode.py 2022-11-22 10:30:59 +08:00
Fangjun Kuang
96cff34d15 small fixes 2022-11-18 18:20:34 +08:00
Desh Raj
d31db01037 manual correction of black formatting 2022-11-17 14:18:05 -05:00
Desh Raj
107df3b115 apply black on all files 2022-11-17 09:42:17 -05:00
Fangjun Kuang
60317120ca
Revert "Apply new Black style changes" 2022-11-17 20:19:32 +08:00
Desh Raj
d110b04ad3 apply new black formatting to all files 2022-11-16 13:06:43 -05:00
Fangjun Kuang
1d494556fc update train.py 2022-11-14 16:26:53 +08:00
Fangjun Kuang
ab38f4a926 copy files 2022-11-14 14:51:03 +08:00
Fangjun Kuang
7f1c0e07b6
Remove onnx and onnxruntime from requirements.txt (#640)
* Remove onnx and onnxruntime from requirements.txt
2022-10-31 13:44:40 +08:00
Fangjun Kuang
d69bb826ed
Support exporting LSTM with projection to ONNX (#621)
* Support exporting LSTM with projection to ONNX

* Add missing files

* small fixes
2022-10-18 11:25:31 +08:00
Fangjun Kuang
d1f16a04bd
fix type hints for decode.py (#623) 2022-10-18 06:56:12 +08:00
Fangjun Kuang
c39cba5191
Support exporting to ONNX for the wenetspeech recipe (#615)
* Support exporting to ONNX for the wenetspeech recipe
2022-10-13 15:17:20 +08:00
LIyong.Guo
923b60a7c6
padding zeros (#591) 2022-09-28 21:20:33 +08:00
Fangjun Kuang
e18fa78c3a
Check that read_manifests_if_cached returns a non-empty dict. (#555) 2022-08-28 11:50:11 +08:00
Fangjun Kuang
d68b8e9120
Disable CUDA_LAUNCH_BLOCKING in wenetspeech recipes. (#554)
* Disable CUDA_LAUNCH_BLOCKING in wenetspeech recipes.

* minor fixes
2022-08-28 11:17:38 +08:00
yangsuxia
951b03f6d7
Add function display_and_save_batch in wenetspeech/pruned_transducer_stateless2/train.py (#528)
* Add function display_and_save_batch in egs/wenetspeech/ASR/pruned_transducer_stateless2/train.py

* Modify function: display_and_save_batch

* Delete empty line in pruned_transducer_stateless2/train.py

* Modify code format
2022-08-13 11:09:54 +08:00
Wei Kang
5c17255eec
Sort results to make it more convenient to compare decoding results (#522)
* Sort result to make it more convenient to compare decoding results

* Add cut_id to recognition results

* add cut_id to results for all recipes

* Fix torch.jit.script

* Fix comments

* Minor fixes

* Fix torch.jit.tracing for Pytorch version before v1.9.0
2022-08-12 07:12:50 +08:00
Mingshuang Luo
e538232485
change for pruned rnnt5 train.py (#519) 2022-08-04 12:29:39 +08:00
Weiji Zhuang
36eacaccb2
Fix preparing char based lang and add multiprocessing for wenetspeech text segmentation (#513)
* add multiprocessing for wenetspeech text segmentation

* Fix preparing char based lang for wenetspeech

* fix style

Co-authored-by: WeijiZhuang <zhuangweiji@xiaomi.com>
2022-08-03 19:19:40 +08:00