From 93a5c878f17fad4a9ea9f90d2ce2d1d10e759f69 Mon Sep 17 00:00:00 2001
From: Desh Raj <r.desh26@gmail.com>
Date: Tue, 13 Jun 2023 08:14:11 -0400
Subject: [PATCH] remove changes in librispeech

---
 egs/librispeech/ASR/README.md  |   2 +
 egs/librispeech/ASR/RESULTS.md | 622 ++++++++++++++++++++++++++++++++-
 2 files changed, 623 insertions(+), 1 deletion(-)

diff --git a/egs/librispeech/ASR/README.md b/egs/librispeech/ASR/README.md
index 9ffd78d5b..6f5ee7846 100644
--- a/egs/librispeech/ASR/README.md
+++ b/egs/librispeech/ASR/README.md
@@ -26,6 +26,7 @@ The following table lists the differences among them.
 | `pruned_transducer_stateless7_ctc`    | Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7, but with extra CTC head|
 | `pruned_transducer_stateless7_ctc_bs` | Zipformer | Embedding + Conv1d | pruned_transducer_stateless7_ctc + blank skip |
 | `pruned_transducer_stateless7_streaming` | Streaming Zipformer | Embedding + Conv1d | streaming version of pruned_transducer_stateless7 |
+| `pruned_transducer_stateless7_streaming_multi` | Streaming Zipformer | Embedding + Conv1d | same as pruned_transducer_stateless7_streaming, trained on LibriSpeech + GigaSpeech  |
 | `pruned_transducer_stateless8`        | Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7, but using extra data from GigaSpeech|
 | `pruned_stateless_emformer_rnnt2`     | Emformer(from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR|
 | `conv_emformer_transducer_stateless`  | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
@@ -33,6 +34,7 @@ The following table lists the differences among them.
 | `lstm_transducer_stateless`           | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model |
 | `lstm_transducer_stateless2`          | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + gigaspeech (multi-dataset setup) |
 | `lstm_transducer_stateless3`          | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + gradient filter + delay penalty |
+| `zipformer`                           | Upgraded Zipformer | Embedding + Conv1d | The latest recipe |
 
 The decoder in `transducer_stateless` is modified from the paper
 [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
diff --git a/egs/librispeech/ASR/RESULTS.md b/egs/librispeech/ASR/RESULTS.md
index ecb84eb01..b7f704e41 100644
--- a/egs/librispeech/ASR/RESULTS.md
+++ b/egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,537 @@
 ## Results
 
+### zipformer (zipformer + pruned stateless transducer)
+
+See <https://github.com/k2-fsa/icefall/pull/1058> for more details.
+
+[zipformer](./zipformer)
+
+#### Non-streaming
+
+##### normal-scaled model, number of model parameters: 65549011, i.e., 65.55 M
+
+The tensorboard log can be found at
+<https://tensorboard.dev/experiment/cBaoIabCQxSDsyZM7FzqZA/>
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15>
+
+You can use <https://github.com/k2-fsa/sherpa> to deploy it.
+
+| decoding method      | test-clean | test-other | comment            |
+|----------------------|------------|------------|--------------------|
+| greedy_search        | 2.27       | 5.1        | --epoch 30 --avg 9 |
+| modified_beam_search | 2.25       | 5.06       | --epoch 30 --avg 9 |
+| fast_beam_search     | 2.25       | 5.04       | --epoch 30 --avg 9 |
+| greedy_search        | 2.23       | 4.96       | --epoch 40 --avg 16 |
+| modified_beam_search | 2.21       | 4.91       | --epoch 40 --avg 16 |
+| fast_beam_search     | 2.24       | 4.93       | --epoch 40 --avg 16 |
+
+The training command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1,2,3"
+./zipformer/train.py \
+  --world-size 4 \
+  --num-epochs 40 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp \
+  --causal 0 \
+  --full-libri 1 \
+  --max-duration 1000
+```
+
+The decoding command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0"
+for m in greedy_search modified_beam_search fast_beam_search; do
+  ./zipformer/decode.py \
+    --epoch 30 \
+    --avg 9 \
+    --use-averaged-model 1 \
+    --exp-dir ./zipformer/exp \
+    --max-duration 600 \
+    --decoding-method $m
+done
+```
+
+##### small-scaled model, number of model parameters: 23285615, i.e., 23.3 M
+
+The tensorboard log can be found at
+<https://tensorboard.dev/experiment/53P4tL22TpO0UdiL0kPaLg/>
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-small-2023-05-16>
+
+You can use <https://github.com/k2-fsa/sherpa> to deploy it.
+
+| decoding method      | test-clean | test-other | comment            |
+|----------------------|------------|------------|--------------------|
+| greedy_search        | 2.64       | 6.14       | --epoch 30 --avg 8 |
+| modified_beam_search | 2.6        | 6.01       | --epoch 30 --avg 8 |
+| fast_beam_search     | 2.62       | 6.06       | --epoch 30 --avg 8 |
+| greedy_search        | 2.49       | 5.91       | --epoch 40 --avg 13 |
+| modified_beam_search | 2.46       | 5.83       | --epoch 40 --avg 13 |
+| fast_beam_search     | 2.46       | 5.87       | --epoch 40 --avg 13 |
+
+The training command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1"
+./zipformer/train.py \
+  --world-size 2 \
+  --num-epochs 40 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp-small \
+  --causal 0 \
+  --num-encoder-layers 2,2,2,2,2,2 \
+  --feedforward-dim 512,768,768,768,768,768 \
+  --encoder-dim 192,256,256,256,256,256 \
+  --encoder-unmasked-dim 192,192,192,192,192,192 \
+  --base-lr 0.04 \
+  --full-libri 1 \
+  --max-duration 1500
+```
+
+The decoding command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0"
+for m in greedy_search modified_beam_search fast_beam_search; do
+  ./zipformer/decode.py \
+    --epoch 40 \
+    --avg 13 \
+    --exp-dir zipformer/exp-small \
+    --max-duration 600 \
+    --causal 0 \
+    --decoding-method $m \
+    --num-encoder-layers 2,2,2,2,2,2 \
+    --feedforward-dim 512,768,768,768,768,768 \
+    --encoder-dim 192,256,256,256,256,256 \
+    --encoder-unmasked-dim 192,192,192,192,192,192
+done
+```
+
+##### large-scaled model, number of model parameters: 148439574, i.e., 148.4 M
+
+The tensorboard log can be found at
+<https://tensorboard.dev/experiment/HJ74wWYpQAGSzETkmQnrmQ/>
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-large-2023-05-16>
+
+You can use <https://github.com/k2-fsa/sherpa> to deploy it.
+
+| decoding method      | test-clean | test-other | comment            |
+|----------------------|------------|------------|--------------------|
+| greedy_search        | 2.12       | 4.91       | --epoch 30 --avg 9 |
+| modified_beam_search | 2.11       | 4.9        | --epoch 30 --avg 9 |
+| fast_beam_search     | 2.13       | 4.93       | --epoch 30 --avg 9 |
+| greedy_search        | 2.12       | 4.8        | --epoch 40 --avg 13 |
+| modified_beam_search | 2.11       | 4.7        | --epoch 40 --avg 13 |
+| fast_beam_search     | 2.13       | 4.78       | --epoch 40 --avg 13 |
+
+The training command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1,2,3"
+./zipformer/train.py \
+  --world-size 4 \
+  --num-epochs 40 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp-large \
+  --causal 0 \
+  --num-encoder-layers 2,2,4,5,4,2 \
+  --feedforward-dim 512,768,1536,2048,1536,768 \
+  --encoder-dim 192,256,512,768,512,256 \
+  --encoder-unmasked-dim 192,192,256,320,256,192 \
+  --full-libri 1 \
+  --max-duration 1000
+```
+
+The decoding command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0"
+for m in greedy_search modified_beam_search fast_beam_search; do
+  ./zipformer/decode.py \
+    --epoch 40 \
+    --avg 16 \
+    --exp-dir zipformer/exp-large \
+    --max-duration 600 \
+    --causal 0 \
+    --decoding-method $m \
+    --num-encoder-layers 2,2,4,5,4,2 \
+    --feedforward-dim 512,768,1536,2048,1536,768 \
+    --encoder-dim 192,256,512,768,512,256 \
+    --encoder-unmasked-dim 192,192,256,320,256,192
+done
+```
+
+#### Streaming
+
+##### normal-scaled model, number of model parameters: 66110931, i.e., 66.11 M
+
+The tensorboard log can be found at
+<https://tensorboard.dev/experiment/9rD0i6rMSWq1O61poWi71A>
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+<https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17>
+
+You can use <https://github.com/k2-fsa/sherpa> to deploy it.
+
+| decoding method      | chunk size | test-clean | test-other | decoding mode       | comment            |
+|----------------------|------------|------------|------------|---------------------|--------------------|
+| greedy_search        | 320ms      | 3.06       | 7.81       | simulated streaming | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128 |
+| greedy_search        | 320ms      | 3.06       | 7.79       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128 |
+| modified_beam_search | 320ms      | 3.01       | 7.69       | simulated streaming | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128 |
+| modified_beam_search | 320ms      | 3.05       | 7.69       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128 |
+| fast_beam_search     | 320ms      | 3.04       | 7.68       | simulated streaming | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128 |
+| fast_beam_search     | 320ms      | 3.07       | 7.69       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128 |
+| greedy_search        | 640ms      | 2.81       | 7.15       | simulated streaming | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256 |
+| greedy_search        | 640ms      | 2.84       | 7.16       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256 |
+| modified_beam_search | 640ms      | 2.79       | 7.05       | simulated streaming | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256 |
+| modified_beam_search | 640ms      | 2.81       | 7.11       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256 |
+| fast_beam_search     | 640ms      | 2.84       | 7.04       | simulated streaming | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256 |
+| fast_beam_search     | 640ms      | 2.83       | 7.1        | chunk-wise          | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256 |
+
+Note: For decoding mode, `simulated streaming` indicates feeding the full utterance during decoding using `decode.py`,
+  while `chunk-wise` indicates feeding a certain number of frames at a time using `streaming_decode.py`.
+
+The training command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1,2,3"
+./zipformer/train.py \
+  --world-size 4 \
+  --num-epochs 40 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp-causal \
+  --causal 1 \
+  --full-libri 1 \
+  --max-duration 1000
+```
+
+The simulated streaming decoding command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0"
+for m in greedy_search modified_beam_search fast_beam_search; do
+  ./zipformer/decode.py \
+    --epoch 30 \
+    --avg 8 \
+    --use-averaged-model 1 \
+    --exp-dir ./zipformer/exp-causal \
+    --causal 1 \
+    --chunk-size 16 \
+    --left-context-frames 128 \
+    --max-duration 600 \
+    --decoding-method $m
+done
+```
+
+The chunk-wise streaming decoding command is:
+```bash
+export CUDA_VISIBLE_DEVICES="0"
+for m in greedy_search modified_beam_search fast_beam_search; do
+  ./zipformer/streaming_decode.py \
+    --epoch 30 \
+    --avg 8 \
+    --use-averaged-model 1 \
+    --exp-dir ./zipformer/exp-causal \
+    --causal 1 \
+    --chunk-size 16 \
+    --left-context-frames 128 \
+    --num-decode-streams 2000 \
+    --decoding-method $m
+done
+```
+
+### pruned_transducer_stateless7 (Fine-tune with mux)
+
+See <https://github.com/k2-fsa/icefall/pull/1059> for more details.
+
+[pruned_transducer_stateless7](./pruned_transducer_stateless7)
+
+The tensorboard log can be found at
+<https://tensorboard.dev/experiment/MaNDZfO7RzW2Czzf3R2ZRA/>
+
+You can find the pretrained model and bpe model needed for fine-tuning at:
+<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11>
+
+You can find a fine-tuned model, fine-tuning logs, decoding logs, and decoding
+results at:
+<https://huggingface.co/yfyeung/icefall-asr-finetune-mux-pruned_transducer_stateless7-2023-05-19>
+
+You can use <https://github.com/k2-fsa/sherpa> to deploy it.
+
+Number of model parameters: 70369391, i.e., 70.37 M
+
+| decoding method      |    dev     |    test    | test-clean | test-other |      comment       |
+|----------------------|------------|------------|------------|------------|--------------------|
+| greedy_search        |   14.27    |   14.22    |    2.08    |    4.79    | --epoch 20 --avg 5 |
+| modified_beam_search |   14.22    |   14.08    |    2.06    |    4.72    | --epoch 20 --avg 5 |
+| fast_beam_search     |   14.23    |   14.17    |    2.08    |    4.09    | --epoch 20 --avg 5 |
+
+The training commands are:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1"
+
+./pruned_transducer_stateless7/finetune.py \
+  --world-size 2 \
+  --num-epochs 20 \
+  --start-epoch 1 \
+  --exp-dir pruned_transducer_stateless7/exp_giga_finetune \
+  --subset S \
+  --use-fp16 1 \
+  --base-lr 0.005 \
+  --lr-epochs 100 \
+  --lr-batches 100000 \
+  --bpe-model icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11/data/lang_bpe_500/bpe.model \
+  --do-finetune True \
+  --use-mux True \
+  --finetune-ckpt icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11/exp/pretrain.pt \
+  --max-duration 500
+```
+
+The decoding commands are:
+```bash
+# greedy_search
+./pruned_transducer_stateless7/decode.py \
+    --epoch 20 \
+    --avg 5 \
+    --use-averaged-model 1 \
+    --exp-dir ./pruned_transducer_stateless7/exp_giga_finetune \
+    --max-duration 600 \
+    --decoding-method greedy_search
+
+# modified_beam_search
+./pruned_transducer_stateless7/decode.py \
+    --epoch 20 \
+    --avg 5 \
+    --use-averaged-model 1 \
+    --exp-dir ./pruned_transducer_stateless7/exp_giga_finetune \
+    --max-duration 600 \
+    --decoding-method modified_beam_search \
+    --beam-size 4
+
+# fast_beam_search
+./pruned_transducer_stateless7/decode.py \
+    --epoch 20 \
+    --avg 5 \
+    --use-averaged-model 1 \
+    --exp-dir ./pruned_transducer_stateless7/exp_giga_finetune \
+    --max-duration 600 \
+    --decoding-method fast_beam_search \
+    --beam 20.0 \
+    --max-contexts 8 \
+    --max-states 64
+```
+
+### pruned_transducer_stateless7 (zipformer + multidataset(LibriSpeech + GigaSpeech + CommonVoice 13.0))
+
+See <https://github.com/k2-fsa/icefall/pull/1010> for more details.
+
+[pruned_transducer_stateless7](./pruned_transducer_stateless7)
+
+The tensorboard log can be found at
+<https://tensorboard.dev/experiment/SwdJoHgZSZWn8ph9aJLb8g/>
+
+You can find a pretrained model, training logs, decoding logs, and decoding
+results at:
+<https://huggingface.co/yfyeung/icefall-asr-multidataset-pruned_transducer_stateless7-2023-05-04>
+
+You can use <https://github.com/k2-fsa/sherpa> to deploy it.
+
+Number of model parameters: 70369391, i.e., 70.37 M
+
+| decoding method      | test-clean | test-other | comment            |
+|----------------------|------------|------------|--------------------|
+| greedy_search        | 1.91       | 4.06       | --epoch 30 --avg 7 |
+| modified_beam_search | 1.90       | 3.99       | --epoch 30 --avg 7 |
+| fast_beam_search     | 1.90       | 3.98       | --epoch 30 --avg 7 |
+
+
+The training commands are:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+
+./pruned_transducer_stateless7/train.py \
+  --world-size 8 \
+  --num-epochs 30 \
+  --use-multidataset 1 \
+  --use-fp16 1 \
+  --max-duration 750 \
+  --exp-dir pruned_transducer_stateless7/exp
+```
+
+The decoding commands are:
+```bash
+# greedy_search
+./pruned_transducer_stateless7/decode.py \
+    --epoch 30 \
+    --avg 7 \
+    --use-averaged-model 1 \
+    --exp-dir ./pruned_transducer_stateless7/exp \
+    --max-duration 600 \
+    --decoding-method greedy_search
+
+# modified_beam_search
+./pruned_transducer_stateless7/decode.py \
+    --epoch 30 \
+    --avg 7 \
+    --use-averaged-model 1 \
+    --exp-dir ./pruned_transducer_stateless7/exp \
+    --max-duration 600 \
+    --decoding-method modified_beam_search \
+    --beam-size 4
+
+# fast_beam_search
+./pruned_transducer_stateless7/decode.py \
+    --epoch 30 \
+    --avg 7 \
+    --use-averaged-model 1 \
+    --exp-dir ./pruned_transducer_stateless7/exp \
+    --max-duration 600 \
+    --decoding-method fast_beam_search \
+    --beam 20.0 \
+    --max-contexts 8 \
+    --max-states 64
+```
+
+### Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer + Multi-Dataset)
+
+#### [pruned_transducer_stateless7_streaming_multi](./pruned_transducer_stateless7_streaming_multi)
+
+See <https://github.com/k2-fsa/icefall/pull/984> for more details.
+
+You can find a pretrained model, training logs, decoding logs, and decoding
+results at: <https://huggingface.co/marcoyang/icefall-libri-giga-pruned-transducer-stateless7-streaming-2023-04-04>
+
+Number of model parameters: 70369391, i.e., 70.37 M
+
+##### Training on full LibriSpeech + full GigaSpeech (with giga_prob=0.9)
+
+The WERs are:
+
+
+| decoding method      | chunk size | test-clean | test-other | comment             | decoding mode        |
+|----------------------|------------|------------|------------|---------------------|----------------------|
+| greedy search        | 320ms      | 2.43       | 6.0        | --epoch 20 --avg 4  | simulated streaming  |
+| greedy search        | 320ms      | 2.47       | 6.13       | --epoch 20 --avg 4  | chunk-wise           |
+| fast beam search     | 320ms      | 2.43       | 5.99       | --epoch 20 --avg 4  | simulated streaming  |
+| fast beam search     | 320ms      | 2.8        | 6.46       | --epoch 20 --avg 4  | chunk-wise           |
+| modified beam search | 320ms      | 2.4        | 5.96       | --epoch 20 --avg 4  | simulated streaming  |
+| modified beam search | 320ms      | 2.42       | 6.03       | --epoch 20 --avg 4  | chunk-wise           |
+| greedy search        | 640ms      | 2.26       | 5.58       | --epoch 20 --avg 4  | simulated streaming  |
+| greedy search        | 640ms      | 2.33       | 5.76       | --epoch 20 --avg 4  | chunk-wise           |
+| fast beam search     | 640ms      | 2.27       | 5.54       | --epoch 20 --avg 4  | simulated streaming  |
+| fast beam search     | 640ms      | 2.37       | 5.75       | --epoch 20 --avg 4  | chunk-wise           |
+| modified beam search | 640ms      | 2.22       | 5.5        | --epoch 20 --avg 4  | simulated streaming  |
+| modified beam search | 640ms      | 2.25       | 5.69       | --epoch 20 --avg 4  | chunk-wise           |
+
+The model also performs well on GigaSpeech. It achieves the following WERs on the GigaSpeech dev and test sets:
+
+| decoding method      | chunk size | dev        | test | comment    | decoding mode       |
+|----------------------|------------|-----|------|------------|---------------------|
+| greedy search        | 320ms      | 12.08       | 11.98       | --epoch 20 --avg 4  | simulated streaming  |
+| greedy search        | 640ms      | 11.66       | 11.71       | --epoch 20 --avg 4  | simulated streaming  |
+| modified beam search | 320ms      | 11.95       | 11.83       | --epoch 20 --avg 4  | simulated streaming  |
+| modified beam search | 640ms      | 11.65       | 11.56       | --epoch 20 --avg 4  | simulated streaming  |
+
+
+Note: `simulated streaming` indicates feeding the full utterance during decoding using `decode.py`,
+while `chunk-wise` indicates feeding a certain number of frames at a time using `streaming_decode.py`.
+
+The training command is:
+
+```bash
+./pruned_transducer_stateless7_streaming_multi/train.py \
+  --world-size 4 \
+  --num-epochs 20 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir pruned_transducer_stateless7_streaming_multi/exp \
+  --full-libri 1 \
+  --giga-prob 0.9 \
+  --max-duration 750 \
+  --master-port 12345
+```
+
+The tensorboard log can be found at
+<https://tensorboard.dev/experiment/G4yDMLXGQXexf41i4MA2Tg/#scalars>
+
+The simulated streaming decoding command (e.g., chunk-size=320ms) is:
+```bash
+for m in greedy_search fast_beam_search modified_beam_search; do
+  ./pruned_transducer_stateless7_streaming_multi/decode.py \
+    --epoch 20 \
+    --avg 4 \
+    --exp-dir ./pruned_transducer_stateless7_streaming_multi/exp \
+    --max-duration 600 \
+    --decode-chunk-len 32 \
+    --right-padding 64 \
+    --decoding-method $m
+done
+```
+
+The chunk-wise streaming decoding command (e.g., chunk-size=320ms) is:
+```bash
+for m in greedy_search modified_beam_search fast_beam_search; do
+  ./pruned_transducer_stateless7_streaming_multi/streaming_decode.py \
+    --epoch 20 \
+    --avg 4 \
+    --exp-dir ./pruned_transducer_stateless7_streaming_multi/exp \
+    --decoding-method $m \
+    --decode-chunk-len 32 \
+    --num-decode-streams 2000
+done
+```
+
+
+#### Smaller model
+
+We also provide a very small version (only 6.1M parameters) of this setup. The training command for the small model is:
+
+```bash
+./pruned_transducer_stateless7_streaming_multi/train.py \
+  --world-size 4 \
+  --num-epochs 30 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir pruned_transducer_stateless7_streaming_multi/exp \
+  --full-libri 1 \
+  --giga-prob 0.9 \
+  --num-encoder-layers "2,2,2,2,2" \
+  --feedforward-dims "256,256,512,512,256" \
+  --nhead "4,4,4,4,4" \
+  --encoder-dims "128,128,128,128,128" \
+  --attention-dims "96,96,96,96,96" \
+  --encoder-unmasked-dims "96,96,96,96,96" \
+  --max-duration 1200 \
+  --master-port 12345
+```
+
+You can find this pretrained small model and its training logs, decoding logs, and decoding
+results at:
+<https://huggingface.co/marcoyang/icefall-libri-giga-pruned-transducer-stateless7-streaming-6M-2023-04-03>
+
+
+| decoding method      | chunk size | test-clean | test-other | comment             | decoding mode        |
+|----------------------|------------|------------|------------|---------------------|----------------------|
+| greedy search        | 320ms      | 5.95       | 15.03       | --epoch 30 --avg 1  | simulated streaming  |
+| greedy search        | 640ms      | 5.61       | 13.86       | --epoch 30 --avg 1  | simulated streaming  |
+| modified beam search | 320ms      | 5.72       | 14.34      | --epoch 30 --avg 1  | simulated streaming  |
+| modified beam search | 640ms      | 5.43       | 13.16      | --epoch 30 --avg 1  | simulated streaming  |
+| fast beam search | 320ms      | 5.88       | 14.45      | --epoch 30 --avg 1  | simulated streaming  |
+| fast beam search | 640ms      | 5.48       | 13.31      | --epoch 30 --avg 1  | simulated streaming  |
+
+This small model achieves the following WERs on GigaSpeech test and dev sets:
+
+| decoding method      | chunk size | dev | test | comment    | decoding mode       |
+|----------------------|------------|------------|------------|---------------------|----------------------|
+| greedy search        | 320ms      | 17.57       | 17.2     | --epoch 30 --avg 1  | simulated streaming  |
+| modified beam search | 320ms      | 16.98       | 11.98       | --epoch 30 --avg 1  | simulated streaming  |
+
+You can find the tensorboard logs at <https://tensorboard.dev/experiment/tAc5iXxTQrCQxky5O5OLyw/#scalars>.
+
 ### Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer)
 
 #### [pruned_transducer_stateless7_streaming](./pruned_transducer_stateless7_streaming)
@@ -53,7 +585,7 @@ The tensorboard log can be found at
 
 The simulated streaming decoding command (e.g., chunk-size=320ms) is:
 ```bash
-for $m in greedy_search fast_beam_search modified_beam_search; do
+for m in greedy_search fast_beam_search modified_beam_search; do
   ./pruned_transducer_stateless7_streaming/decode.py \
     --epoch 30 \
     --avg 9 \
@@ -76,6 +608,90 @@ for m in greedy_search modified_beam_search fast_beam_search; do
     --num-decode-streams 2000
 done
 ```
+We also support decoding with neural network LMs. After combining with language models, the WERs are:
+| decoding method      | chunk size | test-clean | test-other | comment             | decoding mode        |
+|----------------------|------------|------------|------------|---------------------|----------------------|
+| `modified_beam_search` | 320ms      | 3.11       | 7.93       | --epoch 30 --avg 9  | simulated streaming  |
+| `modified_beam_search_lm_shallow_fusion` | 320ms      | 2.58       | 6.65       | --epoch 30 --avg 9  | simulated streaming  |
+| `modified_beam_search_lm_rescore` | 320ms      | 2.59       | 6.86       | --epoch 30 --avg 9  | simulated streaming  |
+| `modified_beam_search_lm_rescore_LODR` | 320ms      | 2.52       | 6.73       | --epoch 30 --avg 9  | simulated streaming  |
+
+Please use the following command for `modified_beam_search_lm_shallow_fusion`:
+```bash
+for lm_scale in $(seq 0.15 0.01 0.38); do
+    for beam_size in 4 8 12; do
+        ./pruned_transducer_stateless7_streaming/decode.py \
+            --epoch 99 \
+            --avg 1 \
+            --use-averaged-model False \
+            --beam-size $beam_size \
+            --exp-dir ./pruned_transducer_stateless7_streaming/exp-large-LM \
+            --max-duration 600 \
+            --decode-chunk-len 32 \
+            --decoding-method modified_beam_search_lm_shallow_fusion \
+            --use-shallow-fusion 1 \
+            --lm-type rnn \
+            --lm-exp-dir rnn_lm/exp \
+            --lm-epoch 99 \
+            --lm-scale $lm_scale \
+            --lm-avg 1 \
+            --rnn-lm-embedding-dim 2048 \
+            --rnn-lm-hidden-dim 2048 \
+            --rnn-lm-num-layers 3 \
+            --lm-vocab-size 500
+    done
+done
+```
+
+Please use the following command for `modified_beam_search_lm_rescore`:
+```bash
+./pruned_transducer_stateless7_streaming/decode.py \
+    --epoch 30 \
+    --avg 9 \
+    --use-averaged-model True \
+    --beam-size 8 \
+    --exp-dir ./pruned_transducer_stateless7_streaming/exp \
+    --max-duration 600 \
+    --decode-chunk-len 32 \
+    --decoding-method modified_beam_search_lm_rescore \
+    --use-shallow-fusion 0 \
+    --lm-type rnn \
+    --lm-exp-dir rnn_lm/exp \
+    --lm-epoch 99 \
+    --lm-avg 1 \
+    --rnn-lm-embedding-dim 2048 \
+    --rnn-lm-hidden-dim 2048 \
+    --rnn-lm-num-layers 3 \
+    --lm-vocab-size 500
+```
+
+Please use the following command for `modified_beam_search_lm_rescore_LODR`:
+```bash
+./pruned_transducer_stateless7_streaming/decode.py \
+    --epoch 30 \
+    --avg 9 \
+    --use-averaged-model True \
+    --beam-size 8 \
+    --exp-dir ./pruned_transducer_stateless7_streaming/exp \
+    --max-duration 600 \
+    --decode-chunk-len 32 \
+    --decoding-method modified_beam_search_lm_rescore_LODR \
+    --use-shallow-fusion 0 \
+    --lm-type rnn \
+    --lm-exp-dir rnn_lm/exp \
+    --lm-epoch 99 \
+    --lm-avg 1 \
+    --rnn-lm-embedding-dim 2048 \
+    --rnn-lm-hidden-dim 2048 \
+    --rnn-lm-num-layers 3 \
+    --lm-vocab-size 500 \
+    --tokens-ngram 2 \
+    --backoff-id 500
+```
+
+A well-trained RNNLM can be found here: <https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm/tree/main>. The bi-gram used in LODR decoding
+can be found here: <https://huggingface.co/marcoyang/librispeech_bigram>.
+
 
 #### Smaller model
 
@@ -540,6 +1156,10 @@ for m in greedy_search fast_beam_search modified_beam_search ; do
 done
 ```
 
+Note that a small change was made to `pruned_transducer_stateless7/decoder.py` in
+this [PR](https://github.com/k2-fsa/icefall/pull/942) to address the
+problem of emitting the first symbol at the very beginning. If you need a
+model without this issue, please download the model from here: <https://huggingface.co/marcoyang/icefall-asr-librispeech-pruned-transducer-stateless7-2023-03-10>
 
 ### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T + gradient filter)