icefall/egs/seame/ASR/RESULTS.md
2024-04-05 09:58:02 -04:00


## Results
### Zipformer
| decoding method | dev | test | comment |
|------------------------------------|------------|------------|------------------------------------------|
| modified beam search | 21.87 | 29.04 | --epoch 25, --avg 5, --max-duration 500 |

The training command:
```
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train.py \
--world-size 4 \
--num-epochs 25 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp-asr-seame \
--causal 0 \
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,1024,1024,1024,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
--prune-range 10 \
--max-duration 500
```
The decoding command:
```
./zipformer/decode.py \
--epoch 25 \
--avg 5 \
--beam-size 10 \
--exp-dir ./zipformer/exp-asr-seame \
--max-duration 800 \
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,1024,1024,1024,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
--decoding-method modified_beam_search
```
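The training and decoding commands repeat the same four encoder architecture flags, and a mismatch between them would likely prevent the checkpoint from loading. A minimal sketch, assuming a POSIX shell or bash, that keeps the shared flags in one variable; the name `ARCH_FLAGS` is hypothetical and not part of the recipe:

```shell
# Hypothetical helper variable (not part of the recipe): the encoder
# architecture flags shared by train.py and decode.py above.
ARCH_FLAGS="--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,1024,1024,1024,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192"

# $ARCH_FLAGS is left unquoted on purpose so the shell splits it into
# separate arguments. The command is printed rather than executed here;
# drop the leading echo to run it.
echo ./zipformer/decode.py \
  --epoch 25 \
  --avg 5 \
  --beam-size 10 \
  --exp-dir ./zipformer/exp-asr-seame \
  --max-duration 800 \
  $ARCH_FLAGS \
  --decoding-method modified_beam_search
```

The same variable can be appended to the `train.py` invocation, so both commands are built from a single definition.
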
The pretrained model is available at: <https://huggingface.co/AmirHussein/zipformer-seame>
### Zipformer-HAT
| decoding method | dev | test | comment |
|------------------------------------|------------|------------|------------------------------------------|
| modified beam search | 22.00 | 29.92 | --epoch 20, --avg 5, --max-duration 500 |

The training command to reproduce this result:
```
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
./zipformer_hat/train.py \
--world-size 4 \
--num-epochs 25 \
--start-epoch 1 \
--base-lr 0.045 \
--lr-epochs 6 \
--use-fp16 1 \
--exp-dir zipformer_hat/exp \
--causal 0 \
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,1024,1024,1024,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
--prune-range 10 \
--max-duration 500
```
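The command above makes six GPUs visible but sets `--world-size 4`. A small pre-flight check, sketched here under the assumption that each training rank uses exactly one visible GPU, can confirm that enough devices are exposed; the variable names `WORLD_SIZE` and `NUM_VISIBLE` are illustrative:

```shell
# Values copied from the training command above.
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
WORLD_SIZE=4

# Count the comma-separated device ids.
NUM_VISIBLE=$(printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)

if [ "$NUM_VISIBLE" -lt "$WORLD_SIZE" ]; then
  echo "need $WORLD_SIZE GPUs, only $NUM_VISIBLE visible" >&2
elif [ "$NUM_VISIBLE" -gt "$WORLD_SIZE" ]; then
  # Assumption: one GPU per rank, so the surplus devices would sit idle.
  echo "$((NUM_VISIBLE - WORLD_SIZE)) visible GPU(s) would go unused"
else
  echo "visible GPUs match --world-size"
fi
```
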
The decoding command is:
```
## modified beam search
./zipformer_hat/decode.py \
--epoch 25 --avg 5 --use-averaged-model True \
--beam-size 10 \
--causal 0 \
--exp-dir zipformer_hat/exp \
--bpe-model data_seame/lang_bpe_4000/bpe.model \
--max-duration 1000 \
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,1024,1024,1024,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
--decoding-method modified_beam_search
```
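With `--use-averaged-model True`, icefall recipes typically average the model over the range between `epoch - avg` and `epoch`, reading the checkpoints at both ends. Assuming this recipe follows the usual `epoch-N.pt` naming inside `--exp-dir` (an assumption, not confirmed by this file), a quick check before decoding might look like:

```shell
# Values copied from the decoding command above.
EXP_DIR=zipformer_hat/exp
EPOCH=25
AVG=5

# Assumed convention: the averaged model needs the checkpoints at
# epoch - avg and at epoch.
START=$((EPOCH - AVG))

for ckpt in "$EXP_DIR/epoch-$START.pt" "$EXP_DIR/epoch-$EPOCH.pt"; do
  if [ -f "$ckpt" ]; then
    echo "found   $ckpt"
  else
    echo "missing $ckpt"
  fi
done
```
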
A pre-trained model and decoding logs can be found at <https://huggingface.co/AmirHussein/zipformer-hat-seame>
### Zipformer-HAT-LID
| decoding method | dev | test | comment |
|------------------------------------|------------|------------|------------------------------------------|
| modified beam search | 20.04 | 26.91 | --epoch 15, --avg 5, --max-duration 500 |

The training command to reproduce this result:
```
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
./zipformer_hat_lid/train.py \
--world-size 4 \
--lid True \
--num-epochs 25 \
--start-epoch 1 \
--base-lr 0.045 \
--use-fp16 1 \
--lid-loss-scale 0.3 \
--exp-dir zipformer_hat_lid/exp \
--causal 0 \
--lid-output-layer 3 \
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,1024,1024,1024,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
--lids "<en>,<zh>" \
--prune-range 10 \
--freeze-main-model False \
--use-lid-encoder True \
--use-lid-joiner True \
--lid-num-encoder-layers 2,2,2 \
--lid-downsampling-factor 2,4,2 \
--lid-feedforward-dim 256,256,256 \
--lid-num-heads 4,4,4 \
--lid-encoder-dim 256,256,256 \
--lid-encoder-unmasked-dim 128,128,128 \
--lid-cnn-module-kernel 31,15,31 \
--max-duration 500
```
The decoding command is:
```
## modified beam search
# The result table above reports --epoch 15.
epoch=15
python zipformer_hat_lid/decode.py \
--epoch $epoch --avg 5 --use-averaged-model True \
--beam-size 10 \
--lid False \
--lids "<en>,<zh>" \
--exp-dir zipformer_hat_lid/exp \
--bpe-model data_seame/lang_bpe_4000/bpe.model \
--max-duration 800 \
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,1024,1024,1024,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
--decoding-method modified_beam_search \
--lid-output-layer 3 \
--use-lid-encoder True \
--use-lid-joiner True \
--lid-num-encoder-layers 2,2,2 \
--lid-downsampling-factor 2,4,2 \
--lid-feedforward-dim 256,256,256 \
--lid-num-heads 4,4,4 \
--lid-encoder-dim 256,256,256 \
--lid-encoder-unmasked-dim 128,128,128 \
--lid-cnn-module-kernel 31,15,31
```
A pre-trained model and decoding logs can be found at <https://huggingface.co/AmirHussein/zipformer-hat-lid-seame>
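
To compare the systems, the relative reduction of Zipformer-HAT-LID over the plain Zipformer baseline can be computed from the tables above. A minimal sketch using `awk`, assuming the table values are error rates (lower is better); the function name `rel_reduction` is illustrative:

```shell
# Relative reduction (%) of a new error rate with respect to a baseline.
rel_reduction() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.2f\n", 100 * (base - new) / base }'
}

# Zipformer (baseline) vs. Zipformer-HAT-LID, numbers from the tables above.
DEV_GAIN=$(rel_reduction 21.87 20.04)
TEST_GAIN=$(rel_reduction 29.04 26.91)

echo "dev:  ${DEV_GAIN}% relative reduction"
echo "test: ${TEST_GAIN}% relative reduction"
```

Under that assumption, the reduction is roughly 8% relative on dev and 7% relative on test.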