Update streaming train and export commands

2025-09-04 10:57:11 +09:00
commit ecbe9851f0
parent bc2560cb7a
1 changed file with 39 additions and 10 deletions
--- a/egs/multi_ja_en/ASR/RESULTS.md
+++ b/egs/multi_ja_en/ASR/RESULTS.md
@@ -11,6 +11,7 @@ The training command is:
 ```shell
 ./zipformer/train.py \
  --world-size 8 \
+  --causal 1 \
  --num-epochs 10 \
  --start-epoch 1 \
  --use-fp16 1 \
@@ -82,6 +83,7 @@ The training command is:
 ```shell
 ./zipformer/train.py \
  --world-size 8 \
+  --causal 1 \
  --num-epochs 10 \
  --start-epoch 1 \
  --use-fp16 1 \
@@ -93,24 +95,51 @@ The training command is:
 The decoding command is:

 ```shell
-./zipformer/decode.py \
-    --epoch 10 \
-    --avg 1 \
-    --exp-dir ./zipformer/exp \
-    --decoding-method modified_beam_search \
-    --manifest-dir data/manifests
+TODO
 ```

-To export the model with onnx:
+To export the model with sherpa onnx:

 ```shell
-./zipformer/export-onnx.py \
+./zipformer/export-onnx-streaming.py \
  --tokens ./data/lang/bbpe_2000/tokens.txt \
  --use-averaged-model 0 \
  --epoch 10 \
  --avg 1 \
-  --decode-chunk-len 32 \
-  --exp-dir ./zipformer/exp
+  --exp-dir ./zipformer/exp-15k15k-streaming \
+  --num-encoder-layers "2,2,3,4,3,2" \
+  --downsampling-factor "1,2,4,8,4,2" \
+  --feedforward-dim "512,768,1024,1536,1024,768" \
+  --num-heads "4,4,4,8,4,4" \
+  --encoder-dim "192,256,384,512,384,256" \
+  --query-head-dim 32 \
+  --value-head-dim 12 \
+  --pos-head-dim 4 \
+  --pos-dim 48 \
+  --encoder-unmasked-dim "192,192,256,256,256,192" \
+  --cnn-module-kernel "31,31,15,15,15,31" \
+  --decoder-dim 512 \
+  --joiner-dim 512 \
+  --causal True \
+  --chunk-size 16 \
+  --left-context-frames 128 \
+  --fp16 True
+```
+
+(Adjust the `chunk-size` and `left-context-frames` as necessary)
+
+To export the model as Torchscript (`.jit`):
+
+```shell
+./zipformer/export.py \
+  --exp-dir ./zipformer/exp-15k15k-streaming \
+  --causal 1 \
+  --chunk-size 16 \
+  --left-context-frames 128 \
+  --tokens data/lang/bbpe_2000/tokens.txt \
+  --epoch 10 \
+  --avg 1 \
+  --jit 1
 ```

 You may also use decode chunk sizes `16`, `32`, `64`, `128`.
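
Review note: the closing line lists chunk sizes `16`, `32`, `64`, `128`, while the new export commands hard-code `--chunk-size 16`. A minimal dry-run sketch of exporting one TorchScript model per listed size is shown below. It only prints the commands and reuses the flags from the `export.py` invocation in this diff. Keeping `--left-context-frames` at 128 for every size, and treating the listed "decode chunk sizes" as valid `--chunk-size` values, are assumptions rather than something this commit states. The helper name `build_export_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Dry run: print one ./zipformer/export.py command per chunk size.
# Nothing is executed; pipe the output to `bash` only after checking it.
set -euo pipefail

exp_dir=./zipformer/exp-15k15k-streaming  # taken from the export command in this diff
chunk_sizes=(16 32 64 128)                # sizes listed at the end of RESULTS.md
left_context_frames=128                   # assumption: same left context for every size

# Hypothetical helper: format the export command for a single chunk size.
build_export_cmd() {
  local chunk_size=$1
  printf './zipformer/export.py --exp-dir %s --causal 1 --chunk-size %s --left-context-frames %s --tokens data/lang/bbpe_2000/tokens.txt --epoch 10 --avg 1 --jit 1\n' \
    "$exp_dir" "$chunk_size" "$left_context_frames"
}

for c in "${chunk_sizes[@]}"; do
  build_export_cmd "$c"
done
```

Printing the commands first makes it easy to confirm the flag set before starting several exports in a row.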