mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-09 18:12:19 +00:00
169 lines
5.0 KiB
Markdown
169 lines
5.0 KiB
Markdown
## Results
|
|
|
|
#### Zipformer
|
|
|
|
| | dev | test | comment |
|
|
|------------------------------------|------------|------------|------------------------------------------|
|
|
| modified beam search | 21.87 | 29.04 | --epoch 25, --avg 5, --max-duration 500 |
|
|
|
|
The training command:
|
|
|
|
```
|
|
export CUDA_VISIBLE_DEVICES="0,1,2,3"
|
|
|
|
./zipformer/train.py \
|
|
--world-size 4 \
|
|
--num-epochs 25 \
|
|
--start-epoch 1 \
|
|
--use-fp16 1 \
|
|
--exp-dir zipformer/exp-asr-seame \
|
|
--causal 0 \
|
|
--num-encoder-layers 2,2,2,2,2,2 \
|
|
--feedforward-dim 512,768,1024,1024,1024,768 \
|
|
--encoder-dim 192,256,256,256,256,256 \
|
|
--encoder-unmasked-dim 192,192,192,192,192,192 \
|
|
--prune-range 10 \
|
|
--max-duration 500
|
|
```
|
|
|
|
The decoding command:
|
|
|
|
```
|
|
./zipformer/decode.py \
|
|
--epoch 25 \
|
|
--avg 5 \
|
|
--beam-size 10
|
|
--exp-dir ./zipformer/exp-asr-seame \
|
|
--max-duration 800 \
|
|
--num-encoder-layers 2,2,2,2,2,2 \
|
|
--feedforward-dim 512,768,1024,1024,1024,768 \
|
|
--encoder-dim 192,256,256,256,256,256 \
|
|
--encoder-unmasked-dim 192,192,192,192,192,192 \
|
|
--decoding-method modified_beam_search
|
|
```
|
|
|
|
The pretrained model is available at: <https://huggingface.co/AmirHussein/zipformer-seame>
|
|
|
|
|
|
### Zipformer-HAT
|
|
|
|
| | dev | test | comment |
|
|
|------------------------------------|------------|------------|------------------------------------------|
|
|
| modified beam search | 22.00 | 29.92 | --epoch 20, --avg 5, --max-duration 500 |
|
|
|
|
|
|
The training command for reproducing is given below:
|
|
|
|
```
|
|
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
|
|
|
|
./zipformer_hat/train.py \
|
|
--world-size 4 \
|
|
--num-epochs 25 \
|
|
--start-epoch 1 \
|
|
--base-lr 0.045 \
|
|
--lr-epochs 6 \
|
|
--use-fp16 1 \
|
|
--exp-dir zipformer_hat/exp \
|
|
--causal 0 \
|
|
--num-encoder-layers 2,2,2,2,2,2 \
|
|
--feedforward-dim 512,768,1024,1024,1024,768 \
|
|
--encoder-dim 192,256,256,256,256,256 \
|
|
--encoder-unmasked-dim 192,192,192,192,192,192 \
|
|
--prune-range 10 \
|
|
--max-duration 500
|
|
```
|
|
|
|
The decoding command is:
|
|
```
|
|
## modified beam search
|
|
./zipformer_hat/decode.py \
|
|
--epoch 25 --avg 5 --use-averaged-model True \
|
|
--beam-size 10 \
|
|
--causal 0 \
|
|
--exp-dir zipformer_hat/exp \
|
|
--bpe-model data_seame/lang_bpe_4000/bpe.model \
|
|
--max-duration 1000 \
|
|
--num-encoder-layers 2,2,2,2,2,2 \
|
|
--feedforward-dim 512,768,1024,1024,1024,768 \
|
|
--encoder-dim 192,256,256,256,256,256 \
|
|
--encoder-unmasked-dim 192,192,192,192,192,192 \
|
|
--decoding-method modified_beam_search
|
|
```
|
|
|
|
A pre-trained model and decoding logs can be found at <https://huggingface.co/AmirHussein/zipformer-hat-seame>
|
|
|
|
### Zipformer-HAT-LID
|
|
|
|
| | dev | test | comment |
|
|
|------------------------------------|------------|------------|------------------------------------------|
|
|
| modified beam search | 20.04 | 26.91 | --epoch 15, --avg 5, --max-duration 500 |
|
|
|
|
The training command for reproducing is given below:
|
|
|
|
```
|
|
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
|
|
|
|
./zipformer_hat_lid/train.py \
|
|
--world-size 4 \
|
|
--lid True \
|
|
--num-epochs 25 \
|
|
--start-epoch 1 \
|
|
--base-lr 0.045 \
|
|
--use-fp16 1 \
|
|
--lid-loss-scale 0.3 \
|
|
--exp-dir zipformer_hat_lid/exp \
|
|
--causal 0 \
|
|
--lid-output-layer 3 \
|
|
--num-encoder-layers 2,2,2,2,2,2 \
|
|
--feedforward-dim 512,768,1024,1024,1024,768 \
|
|
--encoder-dim 192,256,256,256,256,256 \
|
|
--encoder-unmasked-dim 192,192,192,192,192,192 \
|
|
--lids "<en>,<zh>" \
|
|
--prune-range 10 \
|
|
--freeze-main-model False \
|
|
--use-lid-encoder True \
|
|
--use-lid-joiner True \
|
|
--lid-num-encoder-layers 2,2,2 \
|
|
--lid-downsampling-factor 2,4,2 \
|
|
--lid-feedforward-dim 256,256,256 \
|
|
--lid-num-heads 4,4,4 \
|
|
--lid-encoder-dim 256,256,256 \
|
|
--lid-encoder-unmasked-dim 128,128,128 \
|
|
--lid-cnn-module-kernel 31,15,31 \
|
|
--max-duration 500
|
|
|
|
```
|
|
|
|
The decoding command is:
|
|
```
|
|
## modified beam search
|
|
python zipformer_hat_lid/decode.py \
|
|
--epoch $epoch --avg 5 --use-averaged-model True \
|
|
--beam-size 10 \
|
|
--lid False \
|
|
--lids "<en>,<zh>" \
|
|
--exp-dir zipformer_hat_lid/exp \
|
|
--bpe-model data_seame/lang_bpe_4000/bpe.model \
|
|
--max-duration 800 \
|
|
--num-encoder-layers 2,2,2,2,2,2 \
|
|
--feedforward-dim 512,768,1024,1024,1024,768 \
|
|
--encoder-dim 192,256,256,256,256,256 \
|
|
--encoder-unmasked-dim 192,192,192,192,192,192 \
|
|
--decoding-method modified_beam_search \
|
|
--lid-output-layer 3 \
|
|
--use-lid-encoder True \
|
|
--use-lid-joiner True \
|
|
--lid-num-encoder-layers 2,2,2 \
|
|
--lid-downsampling-factor 2,4,2 \
|
|
--lid-feedforward-dim 256,256,256 \
|
|
--lid-num-heads 4,4,4 \
|
|
--lid-encoder-dim 256,256,256 \
|
|
--lid-encoder-unmasked-dim 128,128,128 \
|
|
--lid-cnn-module-kernel 31,15,31
|
|
```
|
|
|
|
A pre-trained model and decoding logs can be found at <https://huggingface.co/AmirHussein/zipformer-hat-lid-seame>
|
|
|
|
|