Introduction

Please refer to https://icefall.readthedocs.io/en/latest/recipes/librispeech/index.html for how to run models in this recipe.

./RESULTS.md contains the latest results.

Transducers

This folder contains several recipes whose names include the word transducer. The following table lists the differences among them.

|                                     | Encoder                   | Decoder            | Comment                                                                                   |
|-------------------------------------|---------------------------|--------------------|-------------------------------------------------------------------------------------------|
| transducer                          | Conformer                 | LSTM               |                                                                                           |
| transducer_stateless                | Conformer                 | Embedding + Conv1d | Using optimized_transducer for computing RNN-T loss                                       |
| transducer_stateless2               | Conformer                 | Embedding + Conv1d | Using torchaudio for computing RNN-T loss                                                 |
| transducer_lstm                     | LSTM                      | LSTM               |                                                                                           |
| transducer_stateless_multi_datasets | Conformer                 | Embedding + Conv1d | Using data from GigaSpeech as extra training data                                         |
| pruned_transducer_stateless         | Conformer                 | Embedding + Conv1d | Using k2 pruned RNN-T loss                                                                |
| pruned_transducer_stateless2        | Conformer (modified)      | Embedding + Conv1d | Using k2 pruned RNN-T loss                                                                |
| pruned_transducer_stateless3        | Conformer (modified)      | Embedding + Conv1d | Using k2 pruned RNN-T loss + using GigaSpeech as extra training data                      |
| pruned_transducer_stateless4        | Conformer (modified)      | Embedding + Conv1d | Same as pruned_transducer_stateless2 + save averaged models periodically during training  |
| pruned_transducer_stateless5        | Conformer (modified)      | Embedding + Conv1d | Same as pruned_transducer_stateless4 + more layers + random combiner                      |
| pruned_transducer_stateless6        | Conformer (modified)      | Embedding + Conv1d | Same as pruned_transducer_stateless4 + distillation with HuBERT                           |
| pruned_stateless_emformer_rnnt2     | Emformer (from torchaudio)| Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR                                          |
| conv_emformer_transducer_stateless  | ConvEmformer              | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model                       |
| conv_emformer_transducer_stateless2 | ConvEmformer              | Embedding + Conv1d | Using ConvEmformer with simplified memory for streaming ASR + mechanisms in reworked model |
| lstm_transducer_stateless           | LSTM                      | Embedding + Conv1d | Using LSTM with mechanisms in reworked model                                              |

The decoder in transducer_stateless is modified from the paper "Rnn-Transducer with Stateless Prediction Network". We place an additional Conv1d layer right after the input embedding layer.
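To make the "Embedding + Conv1d" decoder concrete, here is a minimal PyTorch sketch of a stateless prediction network: an embedding layer followed by a Conv1d over a small fixed left context of previous tokens. The class name, the context size of 2, and the depthwise-convolution choice are illustrative assumptions, not the exact implementation used in these recipes.

```python
# A minimal sketch of a stateless transducer decoder: embedding + Conv1d over a
# bounded token context (no recurrent state). Names and hyperparameters here
# are illustrative, not the exact ones used in icefall.
import torch
import torch.nn as nn


class StatelessDecoder(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # The Conv1d over the previous `context_size` tokens replaces an LSTM
        # prediction network; it only sees a bounded left context, hence "stateless".
        self.conv = nn.Conv1d(
            in_channels=embed_dim,
            out_channels=embed_dim,
            kernel_size=context_size,
            groups=embed_dim,  # depthwise convolution (an assumption in this sketch)
        )

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, context_size) IDs of the most recently emitted tokens
        emb = self.embedding(y)          # (batch, context_size, embed_dim)
        emb = emb.permute(0, 2, 1)       # (batch, embed_dim, context_size)
        out = self.conv(emb)             # (batch, embed_dim, 1)
        return out.permute(0, 2, 1)      # (batch, 1, embed_dim)


# Usage example: one decoder step for a batch of 4 utterances with 2-token context.
decoder = StatelessDecoder(vocab_size=500, embed_dim=512, context_size=2)
tokens = torch.randint(0, 500, (4, 2))
print(decoder(tokens).shape)  # torch.Size([4, 1, 512])
```

Because the decoder depends only on the last few tokens rather than the full history, its output can be cached per token context, which keeps decoding cheap compared with an LSTM prediction network.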