CTC attention model with reworked Conformer encoder and reworked Transformer decoder (#462 )

* ctc attention model with reworked conformer encoder and reworked transformer decoder

* remove unnecessary func

* resolve flake8 conflicts

* fix typos and modify the expr of ScaledEmbedding

* use original beam size

* minor changes to the scripts

* add rnn lm decoding

* minor changes

* check whether q k v weight is None

* style correction

* update results

* upload the decoding results of rnn-lm to the RESULTS

* Update egs/librispeech/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

2022-07-22 15:31:25 +08:00

conformer_ctc

fix typo (#445 )

2022-06-25 11:00:53 +08:00

conformer_ctc2

CTC attention model with reworked Conformer encoder and reworked Transformer decoder (#462 )

2022-07-22 15:31:25 +08:00

conformer_mmi

fix typo (#445 )

2022-06-25 11:00:53 +08:00

conv_emformer_transducer_stateless

Simplified memory bank for Emformer (#440 )

2022-07-12 19:19:58 +08:00

conv_emformer_transducer_stateless2

Simplified memory bank for Emformer (#440 )

2022-07-12 19:19:58 +08:00

local

[Ready to be merged] Add RNN-LM to Conformer-CTC decoding (#439 )

2022-06-23 19:37:03 +08:00

pruned_stateless_emformer_rnnt2

Fix exporting emformer with torchscript using torch 1.6.0 (#402 )

2022-06-07 09:19:37 +08:00

pruned_transducer_stateless

Using streaming conformer as transducer encoder (#380 )

2022-06-28 00:18:54 +08:00

pruned_transducer_stateless2

CTC attention model with reworked Conformer encoder and reworked Transformer decoder (#462 )

2022-07-22 15:31:25 +08:00

pruned_transducer_stateless3

Add RNN-LM rescoring in fast beam search (#475 )

2022-07-18 16:52:17 +08:00

pruned_transducer_stateless4

Using streaming conformer as transducer encoder (#380 )

2022-06-28 00:18:54 +08:00

pruned_transducer_stateless5

Rand combine update result (#467 )

2022-07-11 18:13:31 +08:00

pruned_transducer_stateless6

update multi_quantization installation (#469 )

2022-07-13 21:16:45 +08:00

streaming_conformer_ctc

fix typo (#445 )

2022-06-25 11:00:53 +08:00

tdnn_lstm_ctc

Replace load_manifest_lazy with load_manifest for MUSAN. (#412 )

2022-06-09 11:42:18 +08:00

transducer

fix typos (#409 )

2022-06-08 20:08:44 +08:00

transducer_lstm

fix typos (#409 )

2022-06-08 20:08:44 +08:00

transducer_stateless

Using streaming conformer as transducer encoder (#380 )

2022-06-28 00:18:54 +08:00

transducer_stateless2

fix typos (#409 )

2022-06-08 20:08:44 +08:00

transducer_stateless_multi_datasets

Replace load_manifest_lazy with load_manifest for MUSAN. (#412 )

2022-06-09 11:42:18 +08:00

.gitignore

Modified conformer with multi datasets (#312 )

2022-04-29 15:40:30 +08:00

distillation_with_hubert.sh

update multi_quantization installation (#469 )

2022-07-13 21:16:45 +08:00

prepare_giga_speech.sh

Use jsonl for CutSet in the LibriSpeech recipe. (#397 )

2022-06-06 10:19:16 +08:00

prepare.sh

[Ready to be merged] Add RNN-LM to Conformer-CTC decoding (#439 )

2022-06-23 19:37:03 +08:00

README.md

Simplified memory bank for Emformer (#440 )

2022-07-12 19:19:58 +08:00

RESULTS-100hours.md

[Ready to merge]stateless6: states4 + hubert distillation. (#387 )

2022-05-28 12:37:50 +08:00

RESULTS.md

CTC attention model with reworked Conformer encoder and reworked Transformer decoder (#462 )

2022-07-22 15:31:25 +08:00

shared

Refactoring (#4 )

2021-08-04 14:53:02 +08:00

README.md

Introduction

Please refer to https://icefall.readthedocs.io/en/latest/recipes/librispeech/index.html for how to run models in this recipe.

./RESULTS.md contains the latest results.

Transducers

Several folders in this directory have `transducer` in their names. The following table lists the differences among them.

| Folder | Encoder | Decoder | Comment |
|---|---|---|---|
| `transducer` | Conformer | LSTM | |
| `transducer_stateless` | Conformer | Embedding + Conv1d | Using optimized_transducer for computing RNN-T loss |
| `transducer_stateless2` | Conformer | Embedding + Conv1d | Using torchaudio for computing RNN-T loss |
| `transducer_lstm` | LSTM | LSTM | |
| `transducer_stateless_multi_datasets` | Conformer | Embedding + Conv1d | Using data from GigaSpeech as extra training data |
| `pruned_transducer_stateless` | Conformer | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| `pruned_transducer_stateless2` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| `pruned_transducer_stateless3` | Conformer (modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss + using GigaSpeech as extra training data |
| `pruned_transducer_stateless4` | Conformer (modified) | Embedding + Conv1d | Same as `pruned_transducer_stateless2` + save averaged models periodically during training |
| `pruned_transducer_stateless5` | Conformer (modified) | Embedding + Conv1d | Same as `pruned_transducer_stateless4` + more layers + random combiner |
| `pruned_transducer_stateless6` | Conformer (modified) | Embedding + Conv1d | Same as `pruned_transducer_stateless4` + distillation with HuBERT |
| `pruned_stateless_emformer_rnnt2` | Emformer (from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR |
| `conv_emformer_transducer_stateless` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
| `conv_emformer_transducer_stateless2` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer with simplified memory for streaming ASR + mechanisms in reworked model |

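The decoder in `transducer_stateless` is modified from the paper "Rnn-Transducer with Stateless Prediction Network". We place an additional Conv1d layer right after the input embedding layer, so the decoder output depends only on a fixed number of preceding tokens rather than on a recurrent state.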
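
To illustrate the "Embedding + Conv1d" decoder, here is a minimal pure-Python sketch of an embedding lookup followed by a one-dimensional convolution over the most recent tokens. It is not the code used in the recipes, which are built from PyTorch modules. The class name `StatelessDecoder`, its parameters (`vocab_size`, `embed_dim`, `context_size`, `blank_id`), the per-channel (depthwise) convolution, the left-padding with blank tokens, and the random weights are all assumptions made for illustration.

```python
import random


class StatelessDecoder:
    """Hypothetical sketch: embedding lookup + causal depthwise 1-D convolution."""

    def __init__(self, vocab_size, embed_dim, context_size, blank_id=0, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.context_size = context_size
        self.blank_id = blank_id
        # embedding[v] is the vector for token ID v (random here, learned in practice).
        self.embedding = [
            [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
            for _ in range(vocab_size)
        ]
        # conv_weight[k][d] is the weight of tap k on channel d (assumed: no bias).
        self.conv_weight = [
            [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
            for _ in range(context_size)
        ]

    def forward(self, tokens):
        """Return one output vector per input token."""
        # Assumed: left-pad with blanks so every position sees a full window.
        padded = [self.blank_id] * (self.context_size - 1) + list(tokens)
        embedded = [self.embedding[t] for t in padded]
        outputs = []
        for u in range(len(tokens)):
            # The window covers the current token and the context_size - 1 before it.
            window = embedded[u : u + self.context_size]
            outputs.append(
                [
                    sum(w[d] * x[d] for w, x in zip(self.conv_weight, window))
                    for d in range(self.embed_dim)
                ]
            )
        return outputs


decoder = StatelessDecoder(vocab_size=10, embed_dim=4, context_size=2)
out_a = decoder.forward([1, 2, 3, 4])
out_b = decoder.forward([7, 8, 3, 4])
# Both histories end in (3, 4), so the last output vectors are identical.
same_last_output = out_a[-1] == out_b[-1]
```

Because no recurrent state is carried between positions, two token histories that share their last `context_size` tokens produce the same decoder output at that position, which is what the usage lines at the end check.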