183 Commits

Author SHA1 Message Date
Desh Raj
9c2172c1c4
Zipformer for TedLium (#1125)
* initial commit for zipformer tedlium

* fix unk decoding

* add pretrained model and logs

* update for new AsrModel

* add option for choosing rnnt type

* add results with modified rnnt
2023-06-28 16:43:49 +08:00
Wei Kang
219bba1310
zipformer wenetspeech (#1130)
* copy files

* update train.py

* small fixes

* Add decode.py

* Fix dataloader in decode.py

* add blank penalty

* Add blank-penalty to other decoding method

* Minor fixes

* add zipformer2 recipe

* Minor fixes

* Remove pruned7

* export and test models

* Replace bpe with tokens in export.py and pretrain.py

* Minor fixes

* Minor fixes

* Minor fixes

* Fix export

* Update results

* Fix zipformer-ctc

* Fix ci

* Fix ci

* Fix CI

* Fix CI

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-06-26 09:33:18 +08:00
Zengwei Yao
0ad037d076
Add CTC loss option in zipformer recipe (#1111)
* add CTC loss option in zipformer recipe

* add ctc_decode.py

* support CTC model export, add jit_pretrained_ctc.py, pretrained_ctc.py

* update README.md and RESULTS.md

* add CI test
2023-06-14 14:27:29 +08:00
Wei Kang
ba257efbcd
Add Context biasing (#1038)
* Add context biasing for librispeech

* Add context biasing for wenetspeech

* fix bugs

* Implement Aho-Corasick context graph

* fix some bugs

* Fixes to forward_one_step; add draw to context graph

* add output arc; fix black

* Fix wenetspeech tokenizer

* Minor fixes to the decode.py
2023-06-03 21:28:49 +08:00
Zengwei Yao
a7e142b7ff
Support long audios recognition (#980)
* support long file transcription

* rename recipe as long_file_recog

* add docs

* support multi-gpu decoding

* style fix
2023-05-19 20:27:55 +08:00
Wei Kang
80156dda09
Training with byte level BPE (AIShell) (#986)
* copy files from zipformer librispeech

* Add byte bpe training for aishell

* compile LG graph

* Support LG decoding

* Minor fixes

* black

* Minor fixes

* export & fix pretrain.py

* fix black

* Update RESULTS.md

* Fix export.py
2023-05-04 19:16:17 +08:00
marcoyang1998
45c13e90e4
RNNLM rescore + Low-order density ratio (#1017)
* add rnnlm rescore + LODR

* add LODR in decode.py

* update RESULTS
2023-04-24 15:00:02 +08:00
marcoyang1998
34d1b07c3d
Modified beam search with RNNLM rescoring (#1002)
* add RNNLM rescore

* add shallow fusion and lm rescore for streaming zipformer

* minor fix

* update RESULTS.md

* fix yesno workflow, change from ubuntu-18.04 to ubuntu-latest
2023-04-17 16:43:00 +08:00
marcoyang1998
d337398d29
Shallow fusion for Aishell (#954)
* add shallow fusion and LODR for aishell

* update RESULTS

* add save by iterations
2023-04-03 16:20:29 +08:00
Zengwei Yao
2a5a75cb56
add option of using full attention for streaming model decoding (#975) 2023-03-30 14:30:13 +08:00
Zengwei Yao
bcc5923ab9
Support batch-wise forced-alignment (#970)
* support batch-wise forced-alignment based on beam search

* add length_norm to HypothesisList.topk()

* Use Hypothesis and HypothesisList instead
2023-03-28 23:24:24 +08:00
marcoyang1998
9ddd811925
Fix padding_idx (#942)
* fix padding_idx

* update RESULTS.md
2023-03-10 14:37:28 +08:00
Fangjun Kuang
f5de2e90c6
Fix style issues. (#937) 2023-03-08 22:56:04 +08:00
pehonnet
07243d136a
remove key from result filename (#936)
Co-authored-by: pe-honnet <pe.honnet@telepathy.ai>
2023-03-08 21:06:07 +08:00
marcoyang1998
c51e6c5b9c
fix typo (#916) 2023-02-20 19:04:57 +08:00
Fangjun Kuang
8d3810e289
Simplify ONNX export (#881)
* Simplify ONNX export

* Fix ONNX CI tests
2023-02-07 15:01:59 +08:00
marcoyang1998
1f0408b103
Support Transformer LM (#750)
* support transformer LM

* show number of parameters during training

* update docstring

* testing files for ppl calculation

* add lm wrampper for rnn and transformer LM

* apply lm wrapper in lm shallow fusion

* small updates

* update decode.py to support LM fusion and LODR

* add export.py

* update CI and workflow

* update decoding results

* fix CI

* remove transformer LM from CI test
2022-12-29 10:53:36 +08:00
Daniil
b293db4baf
Tedlium3 conformer ctc2 (#696)
* modify preparation

* small refacor

* add tedlium3 conformer_ctc2

* modify decode

* filter unk in decode

* add scaling converter

* address comments

* fix lambda function lhotse

* add implicit manifest shuffle

* refactor ctc_greedy_search

* import model arguments from train.py

* style fix

* fix ci test and last style issues

* update RESULTS

* fix RESULTS numbers

* fix label smoothing loss

* update model parameters number in RESULTS
2022-12-13 16:13:26 +08:00
Fangjun Kuang
6533f359c9
Fix CI (#726)
* Fix CI

* Disable shuffle for yesno.

See https://github.com/k2-fsa/icefall/issues/197
2022-12-02 10:53:06 +08:00
marcoyang1998
4b5bc480e8
Add low-order density ratio in RNNLM shallow fusion (#678)
* Support LODR in RNNLM shallow fusion

* fix style

* fix code style

* update workflow and CI

* update results

* propagate changes to stateless3

* add decoding results for stateless3+giga

* fix CI
2022-11-30 17:26:05 +08:00
huangruizhe
6693d907d3
shuffle full Librispeech data (#574)
* shuffled full/partial librispeech data

* fixed the code style issue

* Shuffled full librispeech data off-line

* Fixed style, addressed comments, and removed redandunt codes

* Used the suggested version of black

* Propagated the changes to other folders for librispeech (except
conformer_mmi and streaming_conformer_ctc)
2022-11-27 11:26:09 +08:00
Desh Raj
d31db01037 manual correction of black formatting 2022-11-17 14:18:05 -05:00
Desh Raj
107df3b115 apply black on all files 2022-11-17 09:42:17 -05:00
Fangjun Kuang
60317120ca
Revert "Apply new Black style changes" 2022-11-17 20:19:32 +08:00
Desh Raj
d110b04ad3 apply new black formatting to all files 2022-11-16 13:06:43 -05:00
Fangjun Kuang
cedf9aa24f
Fix shallow fusion and add CI tests for it (#676)
* Fix shallow fusion and add CI tests for it

* Fix -1 index in embedding introduced in the zipformer PR
2022-11-13 11:51:00 +08:00
Fangjun Kuang
7e82f87126
Add Zipformer from Dan (#672) 2022-11-12 18:11:19 +08:00
Fangjun Kuang
e334e570d8
Filter utterances with number_tokens > number_feature_frames. (#604) 2022-11-12 07:57:58 +08:00
Zengwei Yao
32de2766d5
Refactor getting timestamps in fsa-based decoding (#660)
* refactor getting timestamps for fsa-based decoding

* fix doc

* fix bug
2022-11-05 22:36:06 +08:00
Zengwei Yao
3600ce1b5f
Apply delay penalty on transducer (#654)
* add delay penalty

* fix CI

* fix CI
2022-11-04 16:10:09 +08:00
marcoyang
a2d7095c1c resolve conflicts 2022-11-04 11:37:42 +08:00
marcoyang
bdaeaae1ae resolve conflicts 2022-11-04 11:25:10 +08:00
Wei Kang
64aed2cdeb
Fix LG log file name (#657) 2022-11-03 23:12:35 +08:00
Wei Kang
163d929601
Add fast_beam_search_LG (#622)
* Add fast_beam_search_LG

* add fast_beam_search_LG to commonly used recipes

* fix ci

* fix ci

* Fix error
2022-11-03 16:29:30 +08:00
marcoyang
2a52b8c125 update docs 2022-11-03 11:10:21 +08:00
marcoyang
6c8d1f9ef5 update 2022-11-02 17:48:58 +08:00
marcoyang
0a46a39e24 update decoding commands 2022-11-02 17:25:31 +08:00
marcoyang
63d0a52dbd support RNNLM shallow fusion in stateless5 2022-11-02 16:37:29 +08:00
marcoyang
de2f5e3e6d support RNNLM shallow fusion for LSTM transducer 2022-11-02 16:15:56 +08:00
Wei Kang
d389524d45
remove tail padding for non-streaming models (#625) 2022-11-01 11:09:56 +08:00
Zengwei Yao
03668771d7
Get timestamps during decoding (#598)
* print out timestamps during decoding

* add word-level alignments

* support to compute mean symbol delay with word-level alignments

* print variance of symbol delay

* update doc

* support to compute delay for pruned_transducer_stateless4

* fix bug

* add doc
2022-11-01 10:24:00 +08:00
ezerhouni
9b671e1c21
Add Shallow fusion in modified_beam_search (#630)
* Add utility for shallow fusion

* test batch size == 1 without shallow fusion

* Use shallow fusion for modified-beam-search

* Modified beam search with ngram rescoring

* Fix code according to review

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-10-21 16:44:56 +08:00
Fangjun Kuang
d1f16a04bd
fix type hints for decode.py (#623) 2022-10-18 06:56:12 +08:00
Zengwei Yao
aa58c2ee02
Modify ActivationBalancer for speed (#612)
* add a probability to apply ActivationBalancer

* minor fix

* minor fix
2022-10-13 15:14:28 +08:00
Fangjun Kuang
1c07d2fb37
Remove all-in-one for onnx export (#614)
* Remove all-in-one for onnx export

* Exit on error for CI
2022-10-12 10:34:06 +08:00
Zengwei Yao
f3ad32777a
Gradient filter for training lstm model (#564)
* init files

* add gradient filter module

* refact getting median value

* add cutoff for grad filter

* delete comments

* apply gradient filter in LSTM module, to filter both input and params

* fix typing and refactor

* filter with soft mask

* rename lstm_transducer_stateless2 to lstm_transducer_stateless3

* fix typos, and update RESULTS.md

* minor fix

* fix return typing

* fix typo
2022-09-29 11:15:43 +08:00
LIyong.Guo
923b60a7c6
padding zeros (#591) 2022-09-28 21:20:33 +08:00
marcoyang1998
1e31fbcd7d
Add clamping operation in Eve optimizer for all scalar weights to avoid (#550)
non stable training in some scenarios. The clamping range is set to (-10,2).
 Note that this change may cause unexpected effect if you resume
training from a model that is trained without clamping.
2022-08-25 12:12:50 +08:00
Zengwei Yao
f2f5baf687
Use ScaledLSTM as streaming encoder (#479)
* add ScaledLSTM

* add RNNEncoderLayer and RNNEncoder classes in lstm.py

* add RNN and Conv2dSubsampling classes in lstm.py

* hardcode bidirectional=False

* link from pruned_transducer_stateless2

* link scaling.py pruned_transducer_stateless2

* copy from pruned_transducer_stateless2

* modify decode.py pretrained.py test_model.py train.py

* copy streaming decoding files from pruned_transducer_stateless2

* modify streaming decoding files

* simplified code in ScaledLSTM

* flat weights after scaling

* pruned2 -> pruned4

* link __init__.py

* fix style

* remove add_model_arguments

* modify .flake8

* fix style

* fix scale value in scaling.py

* add random combiner for training deeper model

* add using proj_size

* add scaling converter for ScaledLSTM

* support jit trace

* add using averaged model in export.py

* modify test_model.py, test if the model can be successfully exported by jit.trace

* modify pretrained.py

* support streaming decoding

* fix model.py

* Add cut_id to recognition results

* Add cut_id to recognition results

* do not pad in Conv subsampling module; add tail padding during decoding.

* update RESULTS.md

* minor fix

* fix doc

* update README.md

* minor change, filter infinite loss

* remove the condition of raise error

* modify type hint for the return value in model.py

* minor change

* modify RESULTS.md

Co-authored-by: pkufool <wkang.pku@gmail.com>
2022-08-19 14:38:45 +08:00
marcoyang1998
c74cec59e9
propagate changes from #525 to other librispeech recipes (#531)
* propaga changes from #525 to other librispeech recipes

* refactor display_and_save_batch to utils

* fixed typo

* reformat code style
2022-08-17 17:18:15 +08:00