export.py and uploaded models to HF

AmirHussein96 2025-09-15 21:22:40 -04:00
parent bb45822c81
commit e0593f3152
2 changed files with 90 additions and 41 deletions


@@ -65,7 +65,7 @@ You can find a pretrained model, training logs, decoding logs, and decoding results
 |------------------------------------|------------|------------|------------------------------------------|
 | modified beam search | 14.7 | 12.4 | --epoch 20, --avg 10, beam(10),pruned range 5 |
 | modified beam search | 15.5 | 13 | --epoch 20, --avg 10, beam(20),pruned range 5 |
-| modified beam search | 17.6 | 14.8 | --epoch 20, --avg 10, beam(10), pruned range 10 |
+| modified beam search | 18.2 | 14.8 | --epoch 20, --avg 10, beam(20), pruned range 10 |
@@ -77,10 +77,10 @@ To reproduce the above result, use the following commands for training:
 ./zipformer/train.py \
 --world-size 4 \
---num-epochs 30 \
+--num-epochs 25 \
 --start-epoch 1 \
 --use-fp16 1 \
---exp-dir zipformer/exp-st-medium-nohat800s-warmstep8k_baselr05_lrbatch5k_lrepoch6 \
+--exp-dir zipformer/exp-st-medium \
 --causal 0 \
 --num-encoder-layers 2,2,2,2,2,2 \
 --feedforward-dim 512,768,1024,1536,1024,768 \
@@ -88,8 +88,8 @@ To reproduce the above result, use the following commands for training:
 --encoder-unmasked-dim 192,192,256,256,256,192 \
 --max-duration 800 \
 --prune-range 10 \
---warm-step 8000 \
---lr-epochs 6 \
+--warm-step 5000 \
+--lr-epochs 8 \
 --base-lr 0.055 \
 --use-hat False
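The `--base-lr`, `--lr-epochs`, and `--warm-step` flags above feed Zipformer's Eden learning-rate schedule. As a rough illustration only — the decay formula and the `lr_batches` default of 7500 are assumptions from the standard Eden scheduler, not from this diff, and the `--warm-step` warm-up factor is omitted — the combined batch-wise and epoch-wise decay can be sketched as:

```python
# Hedged sketch of an Eden-style LR schedule; the authoritative version
# lives in icefall's optim.py, not here.
def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=8.0):
    # Both factors start near 1 and decay smoothly as step/epoch grow.
    batch_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With --base-lr 0.055 and --lr-epochs 8, the LR starts at the base value
# and decays over training:
print(round(eden_lr(0.055, step=0, epoch=0), 4))       # 0.055
print(eden_lr(0.055, step=100_000, epoch=20) < 0.055)  # True
```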
@@ -106,7 +106,7 @@ for method in modified_beam_search; do
 ./zipformer/decode.py \
 --epoch $epoch \
 --beam-size 20 \
---avg 13 \
+--avg 10 \
 --exp-dir ./zipformer/exp-st-medium-prun10 \
 --max-duration 800 \
 --decoding-method $method \
@@ -115,7 +115,8 @@ for method in modified_beam_search; do
 --encoder-dim 192,256,384,512,384,256 \
 --encoder-unmasked-dim 192,192,256,256,256,192 \
 --context-size 2 \
---use-averaged-model true
+--use-averaged-model true \
+--use-hat False
 done
 done
 ```
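The decode command's `--avg 10` averages the parameters of the last 10 checkpoints before decoding. A toy sketch of plain element-wise averaging, with dicts of floats standing in for `state_dict`s — icefall's `--use-averaged-model` path is more elaborate than this:

```python
def average_checkpoints(state_dicts):
    # Element-wise mean over checkpoints; real state_dicts hold tensors,
    # but plain floats illustrate the arithmetic.
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}

ckpts = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 1.0}]
print(average_checkpoints(ckpts))  # {'w': 2.0, 'b': 0.5}
```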


@@ -27,11 +27,16 @@ Usage:
 - For non-streaming model:
 ./zipformer/export.py \
---exp-dir ./zipformer/exp \
---bpe-model data/lang_bpe_500/bpe.model \
---epoch 30 \
---avg 9 \
---jit 1
+--exp-dir ./zipformer/exp-st-medium6 \
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--epoch 20 \
+--avg 10 \
+--use-averaged-model True \
+--jit 1 \
+--num-encoder-layers 2,2,2,2,2,2 \
+--feedforward-dim 512,768,1024,1536,1024,768 \
+--encoder-dim 192,256,384,512,384,256 \
+--encoder-unmasked-dim 192,192,256,256,256,192

 It will generate a file `jit_script.pt` in the given `exp_dir`. You can later
 load it by `torch.jit.load("jit_script.pt")`.
@@ -44,14 +49,19 @@ for how to use the exported models outside of icefall.
 - For streaming model:
 ./zipformer/export.py \
---exp-dir ./zipformer/exp \
+--exp-dir ./zipformer/exp-st-medium6 \
 --causal 1 \
---chunk-size 16 \
+--chunk-size 32 \
 --left-context-frames 128 \
---bpe-model data/lang_bpe_500/bpe.model \
---epoch 30 \
---avg 9 \
---jit 1
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--epoch 20 \
+--avg 10 \
+--use-averaged-model True \
+--jit 1 \
+--num-encoder-layers 2,2,2,2,2,2 \
+--feedforward-dim 512,768,1024,1536,1024,768 \
+--encoder-dim 192,256,384,512,384,256 \
+--encoder-unmasked-dim 192,192,256,256,256,192

 It will generate a file `jit_script_chunk_16_left_128.pt` in the given `exp_dir`.
 You can later load it by `torch.jit.load("jit_script_chunk_16_left_128.pt")`.
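The docstring above names the exported file `jit_script.pt` for the non-streaming model and `jit_script_chunk_16_left_128.pt` for a causal model with `--chunk-size 16 --left-context-frames 128`. A hypothetical helper mirroring that naming pattern (the real name is composed inside `export.py`; this is only an illustration of the convention):

```python
def jit_filename(causal, chunk_size=16, left_context_frames=128):
    # Non-streaming exports get a fixed name; streaming exports encode
    # the chunk size and left-context frames in the filename.
    if not causal:
        return "jit_script.pt"
    return f"jit_script_chunk_{chunk_size}_left_{left_context_frames}.pt"

print(jit_filename(False))          # jit_script.pt
print(jit_filename(True, 16, 128))  # jit_script_chunk_16_left_128.pt
```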
@@ -65,40 +75,69 @@ for how to use the exported models outside of icefall.
 - For non-streaming model:
 ./zipformer/export.py \
---exp-dir ./zipformer/exp \
---bpe-model data/lang_bpe_500/bpe.model \
---epoch 30 \
---avg 9
+--exp-dir ./zipformer/exp-st-medium6 \
+--causal 0 \
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--epoch 20 \
+--avg 10 \
+--use-averaged-model True \
+--num-encoder-layers 2,2,2,2,2,2 \
+--feedforward-dim 512,768,1024,1536,1024,768 \
+--encoder-dim 192,256,384,512,384,256 \
+--encoder-unmasked-dim 192,192,256,256,256,192

 - For streaming model:
 ./zipformer/export.py \
---exp-dir ./zipformer/exp \
+--exp-dir ./zipformer/exp-st-medium6 \
 --causal 1 \
---bpe-model data/lang_bpe_500/bpe.model \
---epoch 30 \
---avg 9
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--epoch 20 \
+--avg 10 \
+--use-averaged-model True \
+--num-encoder-layers 2,2,2,2,2,2 \
+--feedforward-dim 512,768,1024,1536,1024,768 \
+--encoder-dim 192,256,384,512,384,256 \
+--encoder-unmasked-dim 192,192,256,256,256,192

 It will generate a file `pretrained.pt` in the given `exp_dir`. You can later
 load it by `icefall.checkpoint.load_checkpoint()`.
 - For non-streaming model:
-To use the generated file with `zipformer/decode.py`,
+To use the generated file with `zipformer/decode_st.py`,
 you can do:

 cd /path/to/exp_dir
 ln -s pretrained.pt epoch-9999.pt

-cd /path/to/egs/librispeech/ASR
+cd /path/to/egs/iwslt_ta/ST
 ./zipformer/decode.py \
---exp-dir ./zipformer/exp \
+--exp-dir ./zipformer/exp-st-medium6 \
 --epoch 9999 \
 --avg 1 \
---max-duration 600 \
+--max-duration 800 \
 --decoding-method greedy_search \
---bpe-model data/lang_bpe_500/bpe.model
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--use-hat false
+
+./zipformer/decode.py \
+--exp-dir ./zipformer/exp-st-medium6 \
+--epoch 9999 \
+--avg 1 \
+--beam-size 20 \
+--max-duration 800 \
+--decoding-method modified_beam_search \
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--use-hat false \
+--use-averaged-model false \
+--num-encoder-layers 2,2,2,2,2,2 \
+--feedforward-dim 512,768,1024,1536,1024,768 \
+--encoder-dim 192,256,384,512,384,256 \
+--encoder-unmasked-dim 192,192,256,256,256,192

 - For streaming model:
@@ -107,11 +146,11 @@ To use the generated file with `zipformer/decode.py` and `zipformer/streaming_decode.py`,
 cd /path/to/exp_dir
 ln -s pretrained.pt epoch-9999.pt

-cd /path/to/egs/librispeech/ASR
+cd /path/to/egs/iwslt_ta/ST

 # simulated streaming decoding
 ./zipformer/decode.py \
---exp-dir ./zipformer/exp \
+--exp-dir ./zipformer/exp-st-medium6 \
 --epoch 9999 \
 --avg 1 \
 --max-duration 600 \
@@ -119,11 +158,17 @@ To use the generated file with `zipformer/decode.py` and `zipformer/streaming_decode.py`,
 --chunk-size 16 \
 --left-context-frames 128 \
 --decoding-method greedy_search \
---bpe-model data/lang_bpe_500/bpe.model
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--use-hat false \
+--use-averaged-model false \
+--num-encoder-layers 2,2,2,2,2,2 \
+--feedforward-dim 512,768,1024,1536,1024,768 \
+--encoder-dim 192,256,384,512,384,256 \
+--encoder-unmasked-dim 192,192,256,256,256,192

 # chunk-wise streaming decoding
 ./zipformer/streaming_decode.py \
---exp-dir ./zipformer/exp \
+--exp-dir ./zipformer/exp-st-medium6 \
 --epoch 9999 \
 --avg 1 \
 --max-duration 600 \
@@ -131,7 +176,13 @@ To use the generated file with `zipformer/decode.py` and `zipformer/streaming_decode.py`,
 --chunk-size 16 \
 --left-context-frames 128 \
 --decoding-method greedy_search \
---bpe-model data/lang_bpe_500/bpe.model
+--bpe-model data/lang_bpe_en_1000/bpe.model \
+--use-hat false \
+--use-averaged-model false \
+--num-encoder-layers 2,2,2,2,2,2 \
+--feedforward-dim 512,768,1024,1536,1024,768 \
+--encoder-dim 192,256,384,512,384,256 \
+--encoder-unmasked-dim 192,192,256,256,256,192

 Check ./pretrained.py for its usage.
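The decode commands above select between `greedy_search` and `modified_beam_search` via `--decoding-method`. Purely as an illustration of what greedy search means — this is a toy frame-wise arg-max with blank skipping, not icefall's transducer decoder:

```python
def greedy_search(posteriors, blank=0):
    # Take the highest-scoring symbol at each frame; emit it unless it
    # is the blank symbol. Beam search would instead keep several
    # competing hypotheses per frame.
    hyp = []
    for frame in posteriors:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != blank:
            hyp.append(best)
    return hyp

frames = [[0.1, 0.7, 0.2],   # arg-max -> token 1
          [0.8, 0.1, 0.1],   # arg-max -> blank, skipped
          [0.2, 0.2, 0.6]]   # arg-max -> token 2
print(greedy_search(frames))  # [1, 2]
```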
@@ -139,17 +190,14 @@ Note: If you don't want to train a model from scratch, we have
 provided one for you. You can get it at

 - non-streaming model:
-https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
-- streaming model:
-https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17
+https://huggingface.co/AmirHussein/zipformer-iwslt22-Ta

 with the following commands:

 sudo apt-get install git-lfs
 git lfs install
-git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
-git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17
+git clone https://huggingface.co/AmirHussein/zipformer-iwslt22-Ta

 # You will find the pre-trained models in exp dir
 """