From a6f4bc77c8aee73f88d984f85fc4d4c8d0921b0e Mon Sep 17 00:00:00 2001
From: Fangjun Kuang
Date: Wed, 1 Jun 2022 08:32:36 +0800
Subject: [PATCH] Update results for streaming Emformer.

---
 egs/librispeech/ASR/README.md  |  1 +
 egs/librispeech/ASR/RESULTS.md | 64 ++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/egs/librispeech/ASR/README.md b/egs/librispeech/ASR/README.md
index a738b652f..e2aaa9d7e 100644
--- a/egs/librispeech/ASR/README.md
+++ b/egs/librispeech/ASR/README.md
@@ -22,6 +22,7 @@ The following table lists the differences among them.
 | `pruned_transducer_stateless4` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless2 + save averaged models periodically during training |
 | `pruned_transducer_stateless5` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + more layers + random combiner|
 | `pruned_transducer_stateless6` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + distillation with hubert|
+| `pruned_stateless_emformer_rnnt2` | Emformer(from torchaudio) | Embedding + Conv1d | Uses Emformer from torchaudio for streaming ASR|
 
 The decoder in `transducer_stateless` is modified from the paper
diff --git a/egs/librispeech/ASR/RESULTS.md b/egs/librispeech/ASR/RESULTS.md
index 453751ba5..15f72e55f 100644
--- a/egs/librispeech/ASR/RESULTS.md
+++ b/egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,69 @@
 ## Results
 
+### LibriSpeech BPE training results (Pruned Stateless Emformer RNN-T)
+
+[pruned_stateless_emformer_rnnt2](./pruned_stateless_emformer_rnnt2)
+
+Use [Emformer](https://arxiv.org/abs/2010.10759) from [torchaudio](https://github.com/pytorch/audio)
+for streaming ASR. The Emformer model is imported from torchaudio without modifications.
+
+| decoding method                     | test-clean | test-other | comment                                 |
+|-------------------------------------|------------|------------|-----------------------------------------|
+| greedy search (max sym per frame 1) | 4.28       | 11.42      | --epoch 39 --avg 6 --max-duration 600   |
+| modified beam search                | 4.22       | 11.16      | --epoch 39 --avg 6 --max-duration 600   |
+| fast beam search                    | 4.29       | 11.26      | --epoch 39 --avg 6 --max-duration 600   |
+
+The training commands are:
+```bash
+export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+
+./pruned_stateless_emformer_rnnt2/train.py \
+  --world-size 8 \
+  --num-epochs 40 \
+  --start-epoch 1 \
+  --exp-dir pruned_stateless_emformer_rnnt2/exp-full \
+  --full-libri 1 \
+  --use-fp16 0 \
+  --max-duration 200 \
+  --prune-range 5 \
+  --lm-scale 0.25 \
+  --master-port 12358 \
+  --num-encoder-layers 18 \
+  --left-context-length 128 \
+  --segment-length 8 \
+  --right-context-length 4
+```
+
+The tensorboard log can be found at
+
+
+The decoding commands are:
+```bash
+for m in greedy_search fast_beam_search modified_beam_search; do
+  for epoch in 39; do
+    for avg in 6; do
+      ./pruned_stateless_emformer_rnnt2/decode.py \
+        --epoch $epoch \
+        --avg $avg \
+        --use-averaged-model 1 \
+        --exp-dir pruned_stateless_emformer_rnnt2/exp-full \
+        --max-duration 50 \
+        --decoding-method $m \
+        --num-encoder-layers 18 \
+        --left-context-length 128 \
+        --segment-length 8 \
+        --right-context-length 4
+    done
+  done
+done
+```
+
+You can find a pretrained model, training logs, decoding logs, and decoding
+results at:
+
+
 ### LibriSpeech BPE training results (Pruned Stateless Transducer 5)
 
 [pruned_transducer_stateless5](./pruned_transducer_stateless5)
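
For readers who want to see how the encoder flags in the commands above relate to torchaudio's API, here is a minimal sketch, assuming the public `torchaudio.models.Emformer` interface in recent torchaudio. Only `num_layers`, `segment_length`, `left_context_length`, and `right_context_length` come from the training command; the input dimension, number of attention heads, and feed-forward size are illustrative placeholders, and the recipe's own wrapping (feature front-end, any rescaling of these lengths, and the pruned RNN-T decoder) is not shown.

```python
import torch
from torchaudio.models import Emformer

# Values taken from the training command in this patch:
#   --num-encoder-layers 18  --segment-length 8
#   --left-context-length 128  --right-context-length 4
# input_dim / num_heads / ffn_dim are placeholders (not stated in the patch).
encoder = Emformer(
    input_dim=512,
    num_heads=8,
    ffn_dim=2048,
    num_layers=18,
    segment_length=8,
    left_context_length=128,
    right_context_length=4,
)
encoder.eval()

batch_size = 1
feature_dim = 512
chunk_frames = 8 + 4  # segment_length + right_context_length (look-ahead)

# Streaming inference: feed one segment at a time and carry `states`
# across calls so the model can attend to its cached left context.
states = None
with torch.no_grad():
    for _ in range(3):  # three consecutive chunks of a hypothetical stream
        chunk = torch.randn(batch_size, chunk_frames, feature_dim)
        lengths = torch.full((batch_size,), chunk_frames)
        output, output_lengths, states = encoder.infer(chunk, lengths, states)
        # `output` holds the encoder frames for this segment only.
```

The carried `states` are what make the model streaming: each `infer()` call sees only one segment of frames plus the short right-context look-ahead, while the left-context history is kept inside the states rather than re-fed with every chunk.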