mirror of <https://github.com/k2-fsa/icefall.git>, synced 2025-12-11 06:55:27 +00:00

commit 93a5c878f1 (parent 494e88bcb7): "remove changes in librispeech"
The following table lists the differences among them.

| `pruned_transducer_stateless7_ctc` | Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7, but with extra CTC head |
| `pruned_transducer_stateless7_ctc_bs` | Zipformer | Embedding + Conv1d | pruned_transducer_stateless7_ctc + blank skip |
| `pruned_transducer_stateless7_streaming` | Streaming Zipformer | Embedding + Conv1d | Streaming version of pruned_transducer_stateless7 |
| `pruned_transducer_stateless7_streaming_multi` | Streaming Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7_streaming, trained on LibriSpeech + GigaSpeech |
| `pruned_transducer_stateless8` | Zipformer | Embedding + Conv1d | Same as pruned_transducer_stateless7, but using extra data from GigaSpeech |
| `pruned_stateless_emformer_rnnt2` | Emformer (from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR |
| `conv_emformer_transducer_stateless` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
| `lstm_transducer_stateless` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model |
| `lstm_transducer_stateless2` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + GigaSpeech (multi-dataset setup) |
| `lstm_transducer_stateless3` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + gradient filter + delay penalty |
| `zipformer` | Upgraded Zipformer | Embedding + Conv1d | The latest recipe |

The decoder in `transducer_stateless` is modified from the paper
[Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
## Results

### zipformer (zipformer + pruned stateless transducer)

See <https://github.com/k2-fsa/icefall/pull/1058> for more details.

[zipformer](./zipformer)

#### Non-streaming

##### normal-scaled model, number of model parameters: 65549011, i.e., 65.55 M

The tensorboard log can be found at
<https://tensorboard.dev/experiment/cBaoIabCQxSDsyZM7FzqZA/>

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.
| decoding method      | test-clean | test-other | comment             |
|----------------------|------------|------------|---------------------|
| greedy_search        | 2.27       | 5.1        | --epoch 30 --avg 9  |
| modified_beam_search | 2.25       | 5.06       | --epoch 30 --avg 9  |
| fast_beam_search     | 2.25       | 5.04       | --epoch 30 --avg 9  |
| greedy_search        | 2.23       | 4.96       | --epoch 40 --avg 16 |
| modified_beam_search | 2.21       | 4.91       | --epoch 40 --avg 16 |
| fast_beam_search     | 2.24       | 4.93       | --epoch 40 --avg 16 |
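The `--epoch E --avg N` pairs in the comment column refer to icefall's checkpoint averaging: decoding uses an element-wise average of the model parameters from the last N checkpoints up to epoch E. A toy sketch of the idea, using plain dicts of floats in place of tensors (illustrative only, not icefall's actual averaging code):

```python
def average_checkpoints(state_dicts):
    """Element-wise average of several model state dicts."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n
            for k in state_dicts[0]}

# e.g. --epoch 30 --avg 9 averages the checkpoints of epochs 22..30
checkpoints = [{"w": float(e)} for e in range(22, 31)]
print(average_checkpoints(checkpoints))  # {'w': 26.0}
```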
The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./zipformer/train.py \
  --world-size 4 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --causal 0 \
  --full-libri 1 \
  --max-duration 1000
```
The decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"

for m in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode.py \
    --epoch 30 \
    --avg 9 \
    --use-averaged-model 1 \
    --exp-dir ./zipformer/exp \
    --max-duration 600 \
    --decoding-method $m
done
```
##### small-scaled model, number of model parameters: 23285615, i.e., 23.3 M

The tensorboard log can be found at
<https://tensorboard.dev/experiment/53P4tL22TpO0UdiL0kPaLg/>

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-small-2023-05-16>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.
| decoding method      | test-clean | test-other | comment             |
|----------------------|------------|------------|---------------------|
| greedy_search        | 2.64       | 6.14       | --epoch 30 --avg 8  |
| modified_beam_search | 2.6        | 6.01       | --epoch 30 --avg 8  |
| fast_beam_search     | 2.62       | 6.06       | --epoch 30 --avg 8  |
| greedy_search        | 2.49       | 5.91       | --epoch 40 --avg 13 |
| modified_beam_search | 2.46       | 5.83       | --epoch 40 --avg 13 |
| fast_beam_search     | 2.46       | 5.87       | --epoch 40 --avg 13 |
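The test-clean/test-other numbers throughout these tables are word error rates: word-level edit distance between hypothesis and reference, divided by the reference length, in percent. A minimal self-contained sketch of the metric (not icefall's scoring code):

```python
def wer(ref, hyp):
    """Word error rate (%): word edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cur[j] = min(prev[j] + 1,                            # deletion
                         cur[j - 1] + 1,                         # insertion
                         prev[j - 1] + (r[i - 1] != h[j - 1]))   # substitution
        prev = cur
    return 100.0 * prev[len(h)] / len(r)

# one substitution (sat -> sit) + one deletion (the) over 6 reference words
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))  # 33.33
```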
The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1"

./zipformer/train.py \
  --world-size 2 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-small \
  --causal 0 \
  --num-encoder-layers 2,2,2,2,2,2 \
  --feedforward-dim 512,768,768,768,768,768 \
  --encoder-dim 192,256,256,256,256,256 \
  --encoder-unmasked-dim 192,192,192,192,192,192 \
  --base-lr 0.04 \
  --full-libri 1 \
  --max-duration 1500
```
The decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"

for m in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode.py \
    --epoch 40 \
    --avg 13 \
    --exp-dir zipformer/exp-small \
    --max-duration 600 \
    --causal 0 \
    --decoding-method $m \
    --num-encoder-layers 2,2,2,2,2,2 \
    --feedforward-dim 512,768,768,768,768,768 \
    --encoder-dim 192,256,256,256,256,256 \
    --encoder-unmasked-dim 192,192,192,192,192,192
done
```
##### large-scaled model, number of model parameters: 148439574, i.e., 148.4 M

The tensorboard log can be found at
<https://tensorboard.dev/experiment/HJ74wWYpQAGSzETkmQnrmQ/>

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-large-2023-05-16>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.
| decoding method      | test-clean | test-other | comment             |
|----------------------|------------|------------|---------------------|
| greedy_search        | 2.12       | 4.91       | --epoch 30 --avg 9  |
| modified_beam_search | 2.11       | 4.9        | --epoch 30 --avg 9  |
| fast_beam_search     | 2.13       | 4.93       | --epoch 30 --avg 9  |
| greedy_search        | 2.12       | 4.8        | --epoch 40 --avg 13 |
| modified_beam_search | 2.11       | 4.7        | --epoch 40 --avg 13 |
| fast_beam_search     | 2.13       | 4.78       | --epoch 40 --avg 13 |
The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./zipformer/train.py \
  --world-size 4 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-large \
  --causal 0 \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --full-libri 1 \
  --max-duration 1000
```
The decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"

for m in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode.py \
    --epoch 40 \
    --avg 16 \
    --exp-dir zipformer/exp-large \
    --max-duration 600 \
    --causal 0 \
    --decoding-method $m \
    --num-encoder-layers 2,2,4,5,4,2 \
    --feedforward-dim 512,768,1536,2048,1536,768 \
    --encoder-dim 192,256,512,768,512,256 \
    --encoder-unmasked-dim 192,192,256,320,256,192
done
```
#### streaming

##### normal-scaled model, number of model parameters: 66110931, i.e., 66.11 M

The tensorboard log can be found at
<https://tensorboard.dev/experiment/9rD0i6rMSWq1O61poWi71A>

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.
| decoding method      | chunk size | test-clean | test-other | decoding mode       | comment                                                       |
|----------------------|------------|------------|------------|---------------------|---------------------------------------------------------------|
| greedy_search        | 320ms      | 3.06       | 7.81       | simulated streaming | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128  |
| greedy_search        | 320ms      | 3.06       | 7.79       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128  |
| modified_beam_search | 320ms      | 3.01       | 7.69       | simulated streaming | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128  |
| modified_beam_search | 320ms      | 3.05       | 7.69       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128  |
| fast_beam_search     | 320ms      | 3.04       | 7.68       | simulated streaming | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128  |
| fast_beam_search     | 320ms      | 3.07       | 7.69       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 16 --left-context-frames 128  |
| greedy_search        | 640ms      | 2.81       | 7.15       | simulated streaming | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256  |
| greedy_search        | 640ms      | 2.84       | 7.16       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256  |
| modified_beam_search | 640ms      | 2.79       | 7.05       | simulated streaming | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256  |
| modified_beam_search | 640ms      | 2.81       | 7.11       | chunk-wise          | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256  |
| fast_beam_search     | 640ms      | 2.84       | 7.04       | simulated streaming | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256  |
| fast_beam_search     | 640ms      | 2.83       | 7.1        | chunk-wise          | --epoch 30 --avg 8 --chunk-size 32 --left-context-frames 256  |
Note: for the decoding mode, `simulated streaming` means feeding the full utterance during decoding using `decode.py`,
while `chunk-wise` means feeding a fixed number of frames at a time using `streaming_decode.py`.
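The chunk-wise mode can be pictured as slicing the utterance into fixed-size frame chunks while keeping a bounded left context, along these lines (a toy sketch with plain lists; `chunk_size` and `left_context` mirror `--chunk-size` and `--left-context-frames`, and nothing here is the actual streaming implementation):

```python
def chunk_wise_feed(frames, chunk_size=16, left_context=128):
    """Yield (left_context_frames, chunk) pairs as a streaming decoder sees them."""
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        context = frames[max(0, start - left_context):start]
        yield context, chunk

frames = list(range(50))  # 50 feature frames
chunks = list(chunk_wise_feed(frames, chunk_size=16, left_context=32))
print(len(chunks))  # 4 chunks: 16 + 16 + 16 + 2 frames
```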
The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./zipformer/train.py \
  --world-size 4 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-causal \
  --causal 1 \
  --full-libri 1 \
  --max-duration 1000
```
The simulated streaming decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"

for m in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode.py \
    --epoch 30 \
    --avg 8 \
    --use-averaged-model 1 \
    --exp-dir ./zipformer/exp-causal \
    --causal 1 \
    --chunk-size 16 \
    --left-context-frames 128 \
    --max-duration 600 \
    --decoding-method $m
done
```
The chunk-wise streaming decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"

for m in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/streaming_decode.py \
    --epoch 30 \
    --avg 8 \
    --use-averaged-model 1 \
    --exp-dir ./zipformer/exp-causal \
    --causal 1 \
    --chunk-size 16 \
    --left-context-frames 128 \
    --num-decode-streams 2000 \
    --decoding-method $m
done
```
### pruned_transducer_stateless7 (Fine-tune with mux)

See <https://github.com/k2-fsa/icefall/pull/1059> for more details.

[pruned_transducer_stateless7](./pruned_transducer_stateless7)

The tensorboard log can be found at
<https://tensorboard.dev/experiment/MaNDZfO7RzW2Czzf3R2ZRA/>

You can find the pretrained model and BPE model needed for fine-tuning at:
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11>

You can find a fine-tuned model, fine-tuning logs, decoding logs, and decoding
results at:
<https://huggingface.co/yfyeung/icefall-asr-finetune-mux-pruned_transducer_stateless7-2023-05-19>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

Number of model parameters: 70369391, i.e., 70.37 M
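Parameter counts such as 70369391 are just the sum, over every weight tensor in the model, of that tensor's element count (in PyTorch, `sum(p.numel() for p in model.parameters())`). A shape-based sketch with made-up layer shapes:

```python
from math import prod

def num_params(shapes):
    """Total parameter count given the shape of every weight tensor."""
    return sum(prod(s) for s in shapes)

# toy example: a 512x256 projection with bias, then a 256x500 output layer
print(num_params([(512, 256), (256,), (256, 500), (500,)]))  # 259828
```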
| decoding method      | dev   | test  | test-clean | test-other | comment            |
|----------------------|-------|-------|------------|------------|--------------------|
| greedy_search        | 14.27 | 14.22 | 2.08       | 4.79       | --epoch 20 --avg 5 |
| modified_beam_search | 14.22 | 14.08 | 2.06       | 4.72       | --epoch 20 --avg 5 |
| fast_beam_search     | 14.23 | 14.17 | 2.08       | 4.09       | --epoch 20 --avg 5 |
The training commands are:

```bash
export CUDA_VISIBLE_DEVICES="0,1"

./pruned_transducer_stateless7/finetune.py \
  --world-size 2 \
  --num-epochs 20 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless7/exp_giga_finetune \
  --subset S \
  --use-fp16 1 \
  --base-lr 0.005 \
  --lr-epochs 100 \
  --lr-batches 100000 \
  --bpe-model icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11/data/lang_bpe_500/bpe.model \
  --do-finetune True \
  --use-mux True \
  --finetune-ckpt icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11/exp/pretrain.pt \
  --max-duration 500
```
The decoding commands are:

```bash
# greedy_search
./pruned_transducer_stateless7/decode.py \
  --epoch 20 \
  --avg 5 \
  --use-averaged-model 1 \
  --exp-dir ./pruned_transducer_stateless7/exp_giga_finetune \
  --max-duration 600 \
  --decoding-method greedy_search

# modified_beam_search
./pruned_transducer_stateless7/decode.py \
  --epoch 20 \
  --avg 5 \
  --use-averaged-model 1 \
  --exp-dir ./pruned_transducer_stateless7/exp_giga_finetune \
  --max-duration 600 \
  --decoding-method modified_beam_search \
  --beam-size 4

# fast_beam_search
./pruned_transducer_stateless7/decode.py \
  --epoch 20 \
  --avg 5 \
  --use-averaged-model 1 \
  --exp-dir ./pruned_transducer_stateless7/exp_giga_finetune \
  --max-duration 600 \
  --decoding-method fast_beam_search \
  --beam 20.0 \
  --max-contexts 8 \
  --max-states 64
```
### pruned_transducer_stateless7 (zipformer + multidataset (LibriSpeech + GigaSpeech + CommonVoice 13.0))

See <https://github.com/k2-fsa/icefall/pull/1010> for more details.

[pruned_transducer_stateless7](./pruned_transducer_stateless7)

The tensorboard log can be found at
<https://tensorboard.dev/experiment/SwdJoHgZSZWn8ph9aJLb8g/>

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
<https://huggingface.co/yfyeung/icefall-asr-multidataset-pruned_transducer_stateless7-2023-05-04>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

Number of model parameters: 70369391, i.e., 70.37 M
| decoding method      | test-clean | test-other | comment            |
|----------------------|------------|------------|--------------------|
| greedy_search        | 1.91       | 4.06       | --epoch 30 --avg 7 |
| modified_beam_search | 1.90       | 3.99       | --epoch 30 --avg 7 |
| fast_beam_search     | 1.90       | 3.98       | --epoch 30 --avg 7 |
The training commands are:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless7/train.py \
  --world-size 8 \
  --num-epochs 30 \
  --use-multidataset 1 \
  --use-fp16 1 \
  --max-duration 750 \
  --exp-dir pruned_transducer_stateless7/exp
```
The decoding commands are:

```bash
# greedy_search
./pruned_transducer_stateless7/decode.py \
  --epoch 30 \
  --avg 7 \
  --use-averaged-model 1 \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --max-duration 600 \
  --decoding-method greedy_search

# modified_beam_search
./pruned_transducer_stateless7/decode.py \
  --epoch 30 \
  --avg 7 \
  --use-averaged-model 1 \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --max-duration 600 \
  --decoding-method modified_beam_search \
  --beam-size 4

# fast_beam_search
./pruned_transducer_stateless7/decode.py \
  --epoch 30 \
  --avg 7 \
  --use-averaged-model 1 \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --max-duration 600 \
  --decoding-method fast_beam_search \
  --beam 20.0 \
  --max-contexts 8 \
  --max-states 64
```
### Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer + Multi-Dataset)

#### [pruned_transducer_stateless7_streaming_multi](./pruned_transducer_stateless7_streaming_multi)

See <https://github.com/k2-fsa/icefall/pull/984> for more details.

You can find a pretrained model, training logs, decoding logs, and decoding
results at: <https://huggingface.co/marcoyang/icefall-libri-giga-pruned-transducer-stateless7-streaming-2023-04-04>

Number of model parameters: 70369391, i.e., 70.37 M

##### training on full librispeech + full gigaspeech (with giga_prob=0.9)

The WERs are:
| decoding method      | chunk size | test-clean | test-other | comment            | decoding mode       |
|----------------------|------------|------------|------------|--------------------|---------------------|
| greedy search        | 320ms      | 2.43       | 6.0        | --epoch 20 --avg 4 | simulated streaming |
| greedy search        | 320ms      | 2.47       | 6.13       | --epoch 20 --avg 4 | chunk-wise          |
| fast beam search     | 320ms      | 2.43       | 5.99       | --epoch 20 --avg 4 | simulated streaming |
| fast beam search     | 320ms      | 2.8        | 6.46       | --epoch 20 --avg 4 | chunk-wise          |
| modified beam search | 320ms      | 2.4        | 5.96       | --epoch 20 --avg 4 | simulated streaming |
| modified beam search | 320ms      | 2.42       | 6.03       | --epoch 20 --avg 4 | chunk-wise          |
| greedy search        | 640ms      | 2.26       | 5.58       | --epoch 20 --avg 4 | simulated streaming |
| greedy search        | 640ms      | 2.33       | 5.76       | --epoch 20 --avg 4 | chunk-wise          |
| fast beam search     | 640ms      | 2.27       | 5.54       | --epoch 20 --avg 4 | simulated streaming |
| fast beam search     | 640ms      | 2.37       | 5.75       | --epoch 20 --avg 4 | chunk-wise          |
| modified beam search | 640ms      | 2.22       | 5.5        | --epoch 20 --avg 4 | simulated streaming |
| modified beam search | 640ms      | 2.25       | 5.69       | --epoch 20 --avg 4 | chunk-wise          |
The model also has good WERs on GigaSpeech. The following WERs are achieved on GigaSpeech test and dev sets:

| decoding method      | chunk size | dev   | test  | comment            | decoding mode       |
|----------------------|------------|-------|-------|--------------------|---------------------|
| greedy search        | 320ms      | 12.08 | 11.98 | --epoch 20 --avg 4 | simulated streaming |
| greedy search        | 640ms      | 11.66 | 11.71 | --epoch 20 --avg 4 | simulated streaming |
| modified beam search | 320ms      | 11.95 | 11.83 | --epoch 20 --avg 4 | simulated streaming |
| modified beam search | 320ms      | 11.65 | 11.56 | --epoch 20 --avg 4 | simulated streaming |
Note: `simulated streaming` means feeding the full utterance during decoding using `decode.py`,
while `chunk-wise` means feeding a fixed number of frames at a time using `streaming_decode.py`.

The training command is:
```bash
./pruned_transducer_stateless7_streaming_multi/train.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir pruned_transducer_stateless7_streaming_multi/exp \
  --full-libri 1 \
  --giga-prob 0.9 \
  --max-duration 750 \
  --master-port 12345
```
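`--giga-prob 0.9` mixes the two corpora by drawing each training cut from GigaSpeech with probability 0.9 and from LibriSpeech otherwise (cf. lhotse's `CutSet.mux`). A toy sketch of weighted stream multiplexing (illustrative only, not the lhotse implementation):

```python
import itertools
import random

def mux(streams, weights, n, seed=0):
    """Interleave several (possibly infinite) iterators, drawing each item
    from stream i with probability proportional to weights[i]."""
    rng = random.Random(seed)
    its = [iter(s) for s in streams]
    return [next(its[rng.choices(range(len(its)), weights=weights)[0]])
            for _ in range(n)]

libri = itertools.cycle(["libri"])
giga = itertools.cycle(["giga"])
batch = mux([libri, giga], weights=[0.1, 0.9], n=1000)
print(batch.count("giga"))  # roughly 900 of the 1000 draws
```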
The tensorboard log can be found at
<https://tensorboard.dev/experiment/G4yDMLXGQXexf41i4MA2Tg/#scalars>
The simulated streaming decoding command (e.g., chunk-size=320ms) is:

```bash
for m in greedy_search fast_beam_search modified_beam_search; do
  ./pruned_transducer_stateless7_streaming_multi/decode.py \
    --epoch 20 \
    --avg 4 \
    --exp-dir ./pruned_transducer_stateless7_streaming_multi/exp \
    --max-duration 600 \
    --decode-chunk-len 32 \
    --right-padding 64 \
    --decoding-method $m
done
```
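The chunk sizes quoted in the tables relate to `--decode-chunk-len` through the feature frame shift: assuming the usual 10 ms fbank frame shift, `--decode-chunk-len 32` corresponds to the 320 ms chunks above. A one-line sanity check:

```python
def chunk_duration_ms(decode_chunk_len, frame_shift_ms=10):
    """Chunk duration implied by --decode-chunk-len, assuming 10 ms frames."""
    return decode_chunk_len * frame_shift_ms

print(chunk_duration_ms(32))  # 320
print(chunk_duration_ms(64))  # 640
```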
The streaming chunk-wise decoding command (e.g., chunk-size=320ms) is:

```bash
for m in greedy_search modified_beam_search fast_beam_search; do
  ./pruned_transducer_stateless7_streaming_multi/streaming_decode.py \
    --epoch 20 \
    --avg 4 \
    --exp-dir ./pruned_transducer_stateless7_streaming_multi/exp \
    --decoding-method $m \
    --decode-chunk-len 32 \
    --num-decode-streams 2000
done
```
#### Smaller model

We also provide a very small version (only 6.1M parameters) of this setup. The training command for the small model is:
```bash
./pruned_transducer_stateless7_streaming_multi/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir pruned_transducer_stateless7_streaming_multi/exp \
  --full-libri 1 \
  --giga-prob 0.9 \
  --num-encoder-layers "2,2,2,2,2" \
  --feedforward-dims "256,256,512,512,256" \
  --nhead "4,4,4,4,4" \
  --encoder-dims "128,128,128,128,128" \
  --attention-dims "96,96,96,96,96" \
  --encoder-unmasked-dims "96,96,96,96,96" \
  --max-duration 1200 \
  --master-port 12345
```
You can find this pretrained small model and its training logs, decoding logs, and decoding
results at:
<https://huggingface.co/marcoyang/icefall-libri-giga-pruned-transducer-stateless7-streaming-6M-2023-04-03>
| decoding method      | chunk size | test-clean | test-other | comment            | decoding mode       |
|----------------------|------------|------------|------------|--------------------|---------------------|
| greedy search        | 320ms      | 5.95       | 15.03      | --epoch 30 --avg 1 | simulated streaming |
| greedy search        | 640ms      | 5.61       | 13.86      | --epoch 30 --avg 1 | simulated streaming |
| modified beam search | 320ms      | 5.72       | 14.34      | --epoch 30 --avg 1 | simulated streaming |
| modified beam search | 640ms      | 5.43       | 13.16      | --epoch 30 --avg 1 | simulated streaming |
| fast beam search     | 320ms      | 5.88       | 14.45      | --epoch 30 --avg 1 | simulated streaming |
| fast beam search     | 640ms      | 5.48       | 13.31      | --epoch 30 --avg 1 | simulated streaming |
This small model achieves the following WERs on GigaSpeech test and dev sets:

| decoding method      | chunk size | dev   | test  | comment            | decoding mode       |
|----------------------|------------|-------|-------|--------------------|---------------------|
| greedy search        | 320ms      | 17.57 | 17.2  | --epoch 30 --avg 1 | simulated streaming |
| modified beam search | 320ms      | 16.98 | 11.98 | --epoch 30 --avg 1 | simulated streaming |

You can find the tensorboard logs at <https://tensorboard.dev/experiment/tAc5iXxTQrCQxky5O5OLyw/#scalars>.
### Streaming Zipformer-Transducer (Pruned Stateless Transducer + Streaming Zipformer)

#### [pruned_transducer_stateless7_streaming](./pruned_transducer_stateless7_streaming)
The simulated streaming decoding command (e.g., chunk-size=320ms) is:

```bash
for m in greedy_search fast_beam_search modified_beam_search; do
  ./pruned_transducer_stateless7_streaming/decode.py \
    --epoch 30 \
    --avg 9 \
    ...
done
```

```bash
for m in greedy_search modified_beam_search fast_beam_search; do
    ...
    --num-decode-streams 2000
done
```
We also support decoding with neural network LMs. After combining with language models, the WERs are:

| decoding method                          | chunk size | test-clean | test-other | comment            | decoding mode       |
|------------------------------------------|------------|------------|------------|--------------------|---------------------|
| `modified_beam_search`                   | 320ms      | 3.11       | 7.93       | --epoch 30 --avg 9 | simulated streaming |
| `modified_beam_search_lm_shallow_fusion` | 320ms      | 2.58       | 6.65       | --epoch 30 --avg 9 | simulated streaming |
| `modified_beam_search_lm_rescore`        | 320ms      | 2.59       | 6.86       | --epoch 30 --avg 9 | simulated streaming |
| `modified_beam_search_lm_rescore_LODR`   | 320ms      | 2.52       | 6.73       | --epoch 30 --avg 9 | simulated streaming |
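In shallow fusion, the external LM's log-probability is simply added to the transducer's score with weight `--lm-scale` during beam search. A toy sketch of the scoring rule (the probabilities and scale below are made up for illustration):

```python
from math import log

def fused_score(am_logprob, lm_logprob, lm_scale=0.3):
    """Shallow fusion: interpolate transducer and LM token scores in log space."""
    return am_logprob + lm_scale * lm_logprob

# choosing between two candidate tokens during beam search
cand = {"cat": fused_score(log(0.6), log(0.1)),
        "cap": fused_score(log(0.4), log(0.5))}
print(max(cand, key=cand.get))  # "cap": the LM overturns the acoustic choice
```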
Please use the following command for `modified_beam_search_lm_shallow_fusion`:

```bash
for lm_scale in $(seq 0.15 0.01 0.38); do
  for beam_size in 4 8 12; do
    ./pruned_transducer_stateless7_streaming/decode.py \
      --epoch 99 \
      --avg 1 \
      --use-averaged-model False \
      --beam-size $beam_size \
      --exp-dir ./pruned_transducer_stateless7_streaming/exp-large-LM \
      --max-duration 600 \
      --decode-chunk-len 32 \
      --decoding-method modified_beam_search_lm_shallow_fusion \
      --use-shallow-fusion 1 \
      --lm-type rnn \
      --lm-exp-dir rnn_lm/exp \
      --lm-epoch 99 \
      --lm-scale $lm_scale \
      --lm-avg 1 \
      --rnn-lm-embedding-dim 2048 \
      --rnn-lm-hidden-dim 2048 \
      --rnn-lm-num-layers 3 \
      --lm-vocab-size 500
  done
done
```
Please use the following command for `modified_beam_search_lm_rescore`:

```bash
./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 30 \
  --avg 9 \
  --use-averaged-model True \
  --beam-size 8 \
  --exp-dir ./pruned_transducer_stateless7_streaming/exp \
  --max-duration 600 \
  --decode-chunk-len 32 \
  --decoding-method modified_beam_search_lm_rescore \
  --use-shallow-fusion 0 \
  --lm-type rnn \
  --lm-exp-dir rnn_lm/exp \
  --lm-epoch 99 \
  --lm-avg 1 \
  --rnn-lm-embedding-dim 2048 \
  --rnn-lm-hidden-dim 2048 \
  --rnn-lm-num-layers 3 \
  --lm-vocab-size 500
```
Please use the following command for `modified_beam_search_lm_rescore_LODR`:

```bash
./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 30 \
  --avg 9 \
  --use-averaged-model True \
  --beam-size 8 \
  --exp-dir ./pruned_transducer_stateless7_streaming/exp \
  --max-duration 600 \
  --decode-chunk-len 32 \
  --decoding-method modified_beam_search_lm_rescore_LODR \
  --use-shallow-fusion 0 \
  --lm-type rnn \
  --lm-exp-dir rnn_lm/exp \
  --lm-epoch 99 \
  --lm-avg 1 \
  --rnn-lm-embedding-dim 2048 \
  --rnn-lm-hidden-dim 2048 \
  --rnn-lm-num-layers 3 \
  --lm-vocab-size 500 \
  --tokens-ngram 2 \
  --backoff-id 500
```
A well-trained RNNLM can be found here: <https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm/tree/main>. The bi-gram used in LODR decoding
can be found here: <https://huggingface.co/marcoyang/librispeech_bigram>.
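LODR extends shallow fusion by also subtracting a low-order n-gram score, here the bi-gram above, as an estimate of the transducer's internal LM. A toy sketch of the combined score (the scales are illustrative stand-ins for `--lm-scale` and the LODR weight, not tuned values from this recipe):

```python
def lodr_score(am_logp, lm_logp, bigram_logp, lm_scale=0.4, lodr_scale=0.16):
    """Density-ratio style fusion: add the neural LM score and subtract
    the low-order n-gram that approximates the model's internal LM."""
    return am_logp + lm_scale * lm_logp - lodr_scale * bigram_logp

# a token the bi-gram already rates as likely gets less extra credit
print(lodr_score(-1.0, -0.5, -0.2) < lodr_score(-1.0, -0.5, -2.0))  # True
```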
#### Smaller model

```bash
for m in greedy_search fast_beam_search modified_beam_search ; do
  ...
done
```
Note that a small change is made to `pruned_transducer_stateless7/decoder.py` in
this [PR](https://github.com/k2-fsa/icefall/pull/942) to address the
problem of emitting the first symbol at the very beginning. If you need a
model without this issue, please download the model from here: <https://huggingface.co/marcoyang/icefall-asr-librispeech-pruned-transducer-stateless7-2023-03-10>

### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T + gradient filter)