From ecbe9851f026f32b89620064ecfb748a960ea0de Mon Sep 17 00:00:00 2001
From: Kinan Martin
Date: Thu, 4 Sep 2025 10:57:11 +0900
Subject: [PATCH] Update streaming train and export commands

---
 egs/multi_ja_en/ASR/RESULTS.md | 49 +++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/egs/multi_ja_en/ASR/RESULTS.md b/egs/multi_ja_en/ASR/RESULTS.md
index e89c9e3b6..24dd42a26 100644
--- a/egs/multi_ja_en/ASR/RESULTS.md
+++ b/egs/multi_ja_en/ASR/RESULTS.md
@@ -11,6 +11,7 @@ The training command is:
 ```shell
 ./zipformer/train.py \
   --world-size 8 \
+  --causal 1 \
   --num-epochs 10 \
   --start-epoch 1 \
   --use-fp16 1 \
@@ -82,6 +83,7 @@ The training command is:
 ```shell
 ./zipformer/train.py \
   --world-size 8 \
+  --causal 1 \
   --num-epochs 10 \
   --start-epoch 1 \
   --use-fp16 1 \
@@ -93,24 +95,51 @@ The training command is:
 The decoding command is:

 ```shell
-./zipformer/decode.py \
-  --epoch 10 \
-  --avg 1 \
-  --exp-dir ./zipformer/exp \
-  --decoding-method modified_beam_search \
-  --manifest-dir data/manifests
+TODO
 ```

-To export the model with onnx:
+To export the model with sherpa onnx:

 ```shell
-./zipformer/export-onnx.py \
+./zipformer/export-onnx-streaming.py \
   --tokens ./data/lang/bbpe_2000/tokens.txt \
   --use-averaged-model 0 \
   --epoch 10 \
   --avg 1 \
-  --decode-chunk-len 32 \
-  --exp-dir ./zipformer/exp
+  --exp-dir ./zipformer/exp-15k15k-streaming \
+  --num-encoder-layers "2,2,3,4,3,2" \
+  --downsampling-factor "1,2,4,8,4,2" \
+  --feedforward-dim "512,768,1024,1536,1024,768" \
+  --num-heads "4,4,4,8,4,4" \
+  --encoder-dim "192,256,384,512,384,256" \
+  --query-head-dim 32 \
+  --value-head-dim 12 \
+  --pos-head-dim 4 \
+  --pos-dim 48 \
+  --encoder-unmasked-dim "192,192,256,256,256,192" \
+  --cnn-module-kernel "31,31,15,15,15,31" \
+  --decoder-dim 512 \
+  --joiner-dim 512 \
+  --causal True \
+  --chunk-size 16 \
+  --left-context-frames 128 \
+  --fp16 True
+```
+
+(Adjust the `chunk-size` and `left-context-frames` as necessary)
+
+To export the model as Torchscript (`.jit`):
+
+```shell
+./zipformer/export.py \
+  --exp-dir ./zipformer/exp-15k15k-streaming \
+  --causal 1 \
+  --chunk-size 16 \
+  --left-context-frames 128 \
+  --tokens data/lang/bbpe_2000/tokens.txt \
+  --epoch 10 \
+  --avg 1 \
+  --jit 1
 ```

 You may also use decode chunk sizes `16`, `32`, `64`, `128`.