mirror of https://github.com/k2-fsa/icefall.git synced 2025-08-08 09:32:20 +00:00

History

* Init commit for recipes trained on multiple zh datasets.

* fbank extraction for thchs30

* added support for aishell1

* added support for aishell-2

* fixes

* fixes

* fixes

* added support for stcmds and primewords

* fixes

* added support for magicdata

script for fbank computation not done yet

* added script for magicdata fbank computation

* file permission fixed

* updated for the wenetspeech recipe

* updated

* Update preprocess_kespeech.py

* updated

* updated

* updated

* updated

* file permission fixed

* updated paths

* fixes

* added support for kespeech dev/test set fbank computation

* fixes for file permission

* refined support for KeSpeech

* added scripts for BPE model training

* updated

* init commit for the multi_zh-cn zipformer recipe

* disable speed perturbation by default

* updated

* updated

* added necessary files for the zipformer recipe

* removed redundant wenetspeech M and S sets

* updates for multi dataset decoding

* refined

* formatting issues fixed

* updated

* minor fixes

* this commit finalize the recipe (hopefully)

* fixed formatting issues

* minor fixes

* updated

* using soft links to reduce redundancy

* minor updates

* using soft links to reduce redundancy

* minor updates

* minor updates

* using soft links to reduce redundancy

* minor updates

* Update README.md

* minor updates

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* minor updates

* minor fixes

* fixed a formatting issue

* Update preprocess_kespeech.py

* Update prepare.sh

* Update egs/multi_zh-hans/ASR/local/compute_fbank_kespeech_splits.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/preprocess_kespeech.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* removed redundant files

* symlinks added

* minor updates

* added CI tests for `multi_zh-hans`

* minor fixes

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

2023-09-13 11:57:05 +08:00

conformer_ctc

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

conformer_ctc2

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

conformer_ctc3

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

conformer_mmi

Add Zipformer-MMI (#746)

2022-12-11 21:30:39 +08:00

conv_emformer_transducer_stateless

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

conv_emformer_transducer_stateless2

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

local

Fix broken code in download_lm.py (#1046)

2023-05-08 20:48:17 +08:00

long_file_recog

Support long audios recognition (#980)

2023-05-19 20:27:55 +08:00

lstm_transducer_stateless

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

lstm_transducer_stateless2

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

lstm_transducer_stateless3

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned2_knowledge

Remove cur_batch_idx (#1102)

2023-05-30 14:49:54 +08:00

pruned_stateless_emformer_rnnt2

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless2

Add context biasing for zipformer recipe (#1204)

2023-08-28 19:37:32 +08:00

pruned_transducer_stateless3

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless4

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless5

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless6

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless7

doc str fixes (#1241)

2023-09-07 16:34:53 +08:00

pruned_transducer_stateless7_ctc

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless7_ctc_bs

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless7_streaming

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless7_streaming_multi

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

pruned_transducer_stateless8

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

streaming_conformer_ctc

shuffle full Librispeech for zipformer recipes (#869)

2023-02-03 11:54:57 +08:00

tdnn_lstm_ctc

support using mini librispeech in training (#1048)

2023-05-09 15:10:06 +08:00

transducer

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

transducer_lstm

Fix style issues. (#937)

2023-03-08 22:56:04 +08:00

transducer_stateless

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

transducer_stateless2

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

transducer_stateless_multi_datasets

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

zipformer

Multi_zh-Hans Recipe (#1238)

2023-09-13 11:57:05 +08:00

zipformer_mmi

Use tokens.txt to replace bpe.model (#1162)

2023-08-12 16:53:59 +08:00

.gitignore

Streaming Zipformer with multi-dataset (#984)

2023-04-21 15:43:28 +08:00

add_alignments.sh

Get alignments using lhotse workflows align-with-torchaudio (#888)

2023-02-08 21:54:35 +08:00

distillation_with_hubert.sh

Add docs for distillation (#812)

2023-01-11 16:45:24 +08:00

finetune.sh

Add adaption recipe for pruned_transducer_stateless7 (#1059)

2023-05-17 16:02:27 +08:00

generate-lm.sh

Add Zipformer-MMI (#746)

2022-12-11 21:30:39 +08:00

long_file_recog.sh

Support long audios recognition (#980)

2023-05-19 20:27:55 +08:00

prepare.sh

Add data preparation for the MuST-C speech translation corpus (#1107)

2023-06-05 15:49:41 +08:00

README.md

Add CTC loss option in zipformer recipe (#1111)

2023-06-14 14:27:29 +08:00

RESULTS-100hours.md

[Ready to merge]stateless6: states4 + hubert distillation. (#387)

2022-05-28 12:37:50 +08:00

RESULTS.md

Decode zipformer with external LMs (#1193)

2023-08-03 15:50:35 +08:00

shared

Refactoring (#4)

2021-08-04 14:53:02 +08:00

README.md

Introduction

Please refer to https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/index.html for how to run models in this recipe.

./RESULTS.md contains the latest results.

Transducers

Several folders in this directory have `transducer` in their name. The following table lists how they differ.

| Folder | Encoder | Decoder | Comment |
|---|---|---|---|
| `transducer` | Conformer | LSTM | |
| `transducer_stateless` | Conformer | Embedding + Conv1d | Using optimized_transducer for computing RNN-T loss |
| `transducer_stateless2` | Conformer | Embedding + Conv1d | Using torchaudio for computing RNN-T loss |
| `transducer_lstm` | LSTM | LSTM | |
| `transducer_stateless_multi_datasets` | Conformer | Embedding + Conv1d | Using data from GigaSpeech as extra training data |
| `pruned_transducer_stateless` | Conformer | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| `pruned_transducer_stateless2` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| `pruned_transducer_stateless3` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss + using GigaSpeech as extra training data |
| `pruned_transducer_stateless4` | Conformer (modified) | Embedding + Conv1d | Same as pruned_transducer_stateless2 + save averaged models periodically during training + delay penalty |
| `pruned_transducer_stateless5` | Conformer (modified) | Embedding + Conv1d | Same as pruned_transducer_stateless4 + more layers + random combiner |
| `pruned_transducer_stateless6` | Conformer (modified) | Embedding + Conv1d | Same as pruned_transducer_stateless4 + distillation with HuBERT |
| `pruned_transducer_stateless7` | Zipformer | Embedding + Conv1d | First experiment with Zipformer from Dan |
| `pruned_transducer_stateless7_ctc` | Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7, but with extra CTC head |
| `pruned_transducer_stateless7_ctc_bs` | Zipformer | Embedding + Conv1d | pruned_transducer_stateless7_ctc + blank skip |
| `pruned_transducer_stateless7_streaming` | Streaming Zipformer | Embedding + Conv1d | Streaming version of pruned_transducer_stateless7 |
| `pruned_transducer_stateless7_streaming_multi` | Streaming Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7_streaming, trained on LibriSpeech + GigaSpeech |
| `pruned_transducer_stateless8` | Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7, but using extra data from GigaSpeech |
| `pruned_stateless_emformer_rnnt2` | Emformer (from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR |
| `conv_emformer_transducer_stateless` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
| `conv_emformer_transducer_stateless2` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer with simplified memory for streaming ASR + mechanisms in reworked model |
| `lstm_transducer_stateless` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model |
| `lstm_transducer_stateless2` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + GigaSpeech (multi-dataset setup) |
| `lstm_transducer_stateless3` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + gradient filter + delay penalty |
| `zipformer` | Upgraded Zipformer | Embedding + Conv1d | The latest recipe |

The decoder in `transducer_stateless` is modified from the paper "RNN-Transducer with Stateless Prediction Network". We place an additional Conv1d layer right after the input embedding layer.
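
To make the "Embedding + Conv1d" idea concrete, here is a minimal, framework-free sketch in plain Python. It is not the recipe's actual code: the class name `StatelessDecoder`, its parameters, the random initialization, the depthwise (per-channel) kernel, and the zero left-padding are all assumptions chosen for illustration. It shows the one property that makes the decoder "stateless": the output at each position depends only on the last `context_size` tokens, not on the full history.

```python
import random
from typing import List


class StatelessDecoder:
    """Illustrative stateless decoder: embedding lookup + 1-D convolution.

    Hypothetical sketch; names and initialization are assumptions,
    not the recipe's implementation.
    """

    def __init__(self, vocab_size: int, embed_dim: int, context_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.context_size = context_size
        # Embedding table: one vector of length embed_dim per token id.
        self.embedding = [
            [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
            for _ in range(vocab_size)
        ]
        # Depthwise kernel (an assumption): context_size weights per channel.
        self.conv = [
            [rng.uniform(-1.0, 1.0) for _ in range(context_size)]
            for _ in range(embed_dim)
        ]

    def forward(self, tokens: List[int]) -> List[List[float]]:
        # Left-pad with zero vectors so each position sees context_size inputs.
        pad = [[0.0] * self.embed_dim for _ in range(self.context_size - 1)]
        embedded = pad + [self.embedding[t] for t in tokens]
        outputs = []
        for i in range(len(tokens)):
            window = embedded[i : i + self.context_size]
            # Convolve each channel over the window of recent token embeddings.
            outputs.append([
                sum(self.conv[c][k] * window[k][c] for k in range(self.context_size))
                for c in range(self.embed_dim)
            ])
        return outputs


decoder = StatelessDecoder(vocab_size=10, embed_dim=4, context_size=2)
a = decoder.forward([5, 1, 2])
b = decoder.forward([9, 1, 2])
# Same last two tokens, so the final outputs match even though the history differs.
assert a[-1] == b[-1]
```

Because no recurrent state is carried between steps, decoding only needs to keep the last `context_size` tokens of each hypothesis.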

CTC

| Folder | Encoder | Comment |
|---|---|---|
| `conformer_ctc` | Conformer | Use auxiliary attention head |
| `conformer_ctc2` | Reworked Conformer | Use auxiliary attention head |
| `conformer_ctc3` | Reworked Conformer | Streaming version + delay penalty |
| `zipformer` | Upgraded Zipformer | Use auxiliary transducer head |

MMI

| Folder | Encoder | Comment |
|---|---|---|
| `conformer_mmi` | Conformer | |
| `zipformer_mmi` | Zipformer | CTC warmup + use HP as decoding graph for decoding |