update README.md and RESULTS.md
parent 40d2bda318
commit 11ea660c86
@@ -47,6 +47,7 @@ We place an additional Conv1d layer right after the input embedding layer.

| `conformer-ctc`  | Conformer          | Use auxiliary attention head       |
| `conformer-ctc2` | Reworked Conformer | Use auxiliary attention head       |
| `conformer-ctc3` | Reworked Conformer | Streaming version + delay penalty  |
| `zipformer`      | Upgraded Zipformer | Use auxiliary transducer head      | The latest recipe |
# MMI

@@ -1,5 +1,69 @@

## Results

### zipformer (zipformer + pruned stateless transducer + CTC)

See <https://github.com/k2-fsa/icefall/pull/1111> for more details.

[zipformer](./zipformer)

#### Non-streaming

##### normal-scaled model, number of model parameters: 65805511, i.e., 65.81 M

The tensorboard log can be found at
<https://tensorboard.dev/experiment/Lo3Qlad7TP68ulM2K0ixgQ/>

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-transducer-ctc-2023-06-13>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

Results of the CTC head:

| decoding method         | test-clean | test-other | comment             |
|-------------------------|------------|------------|---------------------|
| ctc-decoding            | 2.40       | 5.66       | --epoch 40 --avg 16 |
| 1best                   | 2.46       | 5.11       | --epoch 40 --avg 16 |
| nbest                   | 2.46       | 5.11       | --epoch 40 --avg 16 |
| nbest-rescoring         | 2.37       | 4.93       | --epoch 40 --avg 16 |
| whole-lattice-rescoring | 2.37       | 4.88       | --epoch 40 --avg 16 |
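
The `test-clean` and `test-other` columns are word error rates (WERs) in percent. icefall computes them with its own scoring utilities; purely as an illustration of the metric, the hypothetical helper below computes WER as the word-level edit distance between hypothesis and reference divided by the number of reference words:

```python
# Minimal sketch of the WER metric (hypothetical helper, not icefall's
# scoring code): word-level edit distance over reference word count.
def wer_percent(ref: str, hyp: str) -> float:
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * dp[len(r)][len(h)] / max(len(r), 1)


# One substitution in six reference words -> 16.67% WER.
print(f"{wer_percent('the cat sat on the mat', 'the cat sat on a mat'):.2f}")
```
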
The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-ctc-rnnt \
  --causal 0 \
  --use-transducer 1 \
  --use-ctc 1 \
  --ctc-loss-scale 0.2 \
  --full-libri 1 \
  --max-duration 1000
```
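
Here `--use-transducer 1 --use-ctc 1` trains a transducer head together with an auxiliary CTC head on the shared encoder, and `--ctc-loss-scale 0.2` down-weights the CTC loss before it is added to the transducer objective. The snippet below is only a sketch of that weighting idea with made-up tensors and shapes: the recipe itself uses k2's pruned RNN-T loss inside `./zipformer/train.py`, whereas this sketch substitutes `torchaudio.functional.rnnt_loss` for brevity.

```python
# Hedged sketch of the joint objective implied by --use-transducer 1,
# --use-ctc 1 and --ctc-loss-scale 0.2: a transducer loss on the main head
# plus a down-weighted CTC loss on the auxiliary head.  All names and shapes
# are illustrative, not icefall's code.
import torch
import torch.nn.functional as F
import torchaudio

B, T, U, V = 2, 50, 10, 500  # batch, frames, target length, vocab size (blank = 0)

# Stand-ins for the two heads that share one encoder output.
joiner_logits = torch.randn(B, T, U + 1, V, requires_grad=True)  # transducer head
ctc_logits = torch.randn(B, T, V, requires_grad=True)            # auxiliary CTC head

targets = torch.randint(1, V, (B, U), dtype=torch.int32)
frame_lens = torch.full((B,), T, dtype=torch.int32)
target_lens = torch.full((B,), U, dtype=torch.int32)

rnnt = torchaudio.functional.rnnt_loss(
    joiner_logits, targets, frame_lens, target_lens, blank=0
)
ctc = F.ctc_loss(
    ctc_logits.log_softmax(-1).transpose(0, 1),  # (T, B, V), as ctc_loss expects
    targets.long(), frame_lens.long(), target_lens.long(), blank=0
)
loss = rnnt + 0.2 * ctc  # 0.2 corresponds to --ctc-loss-scale 0.2
loss.backward()
```
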
The decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"
for m in ctc-decoding 1best nbest nbest-rescoring whole-lattice-rescoring; do
  ./zipformer/ctc_decode.py \
    --epoch 40 \
    --avg 16 \
    --exp-dir zipformer/exp-ctc-rnnt \
    --use-transducer 1 \
    --use-ctc 1 \
    --max-duration 300 \
    --causal 0 \
    --num-paths 100 \
    --nbest-scale 1.0 \
    --hlg-scale 0.6 \
    --decoding-method $m
done
```
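
These decoding methods are lattice-based and implemented with k2 in `./zipformer/ctc_decode.py`; roughly, `1best`/`nbest` decode with an HLG graph (weighted by `--hlg-scale`) and the two rescoring methods additionally apply an n-gram LM to the lattice. As a self-contained illustration of the simplest relative of `ctc-decoding`, the hypothetical helper below performs greedy (best-path) CTC decoding:

```python
# Conceptual sketch of best-path CTC decoding: arg-max token per frame,
# collapse repeats, drop blanks.  Not icefall's implementation, which builds
# lattices with k2 and supports HLG / n-gram LM rescoring.
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list:
    """log_probs: (T, V) per-frame log-probabilities from the CTC head."""
    best = log_probs.argmax(dim=-1).tolist()
    out, prev = [], None
    for tok in best:
        if tok != blank and tok != prev:  # collapse repeats, drop blanks
            out.append(tok)
        prev = tok
    return out

# Toy example: frames emit blank, 7, 7, blank, 3 -> decoded token ids [7, 3].
frames = torch.full((5, 10), -10.0)
for t, tok in enumerate([0, 7, 7, 0, 3]):
    frames[t, tok] = 0.0
print(ctc_greedy_decode(frames))  # [7, 3]
```
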
### zipformer (zipformer + pruned stateless transducer)

See <https://github.com/k2-fsa/icefall/pull/1058> for more details.