## Results

### IWSLT Tunisian training results (Stateless Pruned Transducer)

#### 2023-06-01

| Decoding method      | dev WER | test WER | comment              |
|----------------------|---------|----------|----------------------|
| modified beam search | 47.6    | 51.2     | --epoch 20, --avg 10 |
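
The commands below assume the data preparation stage has already been run. A minimal sketch, assuming this recipe follows the standard icefall layout and ships a `./prepare.sh` script (an assumption; the script is not shown in this file):

```bash
# Assumption: the recipe provides ./prepare.sh (the usual icefall convention)
# to download the IWSLT Tunisian data and compute features before training.
./prepare.sh
```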

The training command to reproduce this result is given below:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless5/exp \
  --max-duration 300 \
  --num-buckets 50
```

The tensorboard training log can be found at https://tensorboard.dev/experiment/yBijWJSPSGuBqMwTZ509lA/
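
If you retrain the model, the same curves can also be inspected locally. A minimal sketch, assuming the icefall convention that the training script writes TensorBoard event files under `<exp-dir>/tensorboard`:

```bash
# Assumption: train.py writes TensorBoard events to <exp-dir>/tensorboard,
# as is the convention in icefall recipes.
tensorboard --logdir pruned_transducer_stateless5/exp/tensorboard --port 6006
```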

The decoding command is:

```bash
for method in modified_beam_search; do
  ./pruned_transducer_stateless5/decode.py \
    --epoch 15 \
    --beam-size 20 \
    --avg 5 \
    --exp-dir ./pruned_transducer_stateless5/exp \
    --max-duration 400 \
    --decoding-method $method \
    --max-sym-per-frame 1 \
    --num-encoder-layers 12 \
    --dim-feedforward 1024 \
    --nhead 8 \
    --encoder-dim 256 \
    --decoder-dim 256 \
    --joiner-dim 256 \
    --use-averaged-model true
done
```
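
With `--use-averaged-model true`, the script averages the model over the last `--avg` epochs ending at `--epoch`, so the relevant `epoch-N.pt` checkpoints must still be present in `--exp-dir`. A quick check (a sketch, assuming the default `epoch-N.pt` naming used by icefall training scripts):

```bash
# List the saved checkpoints before choosing --epoch/--avg (sketch).
ls pruned_transducer_stateless5/exp/epoch-*.pt
```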

### IWSLT Tunisian training results (Zipformer)

#### 2023-06-01

You can find a pretrained model, training logs, decoding logs, and decoding results at: https://tensorboard.dev/experiment/yLE399ZPTzePG8B39jRyOw/

| Decoding method      | dev WER | test WER | comment              |
|----------------------|---------|----------|----------------------|
| modified beam search | 47.6    | 51.2     | --epoch 20, --avg 10 |

To reproduce the above result, use the following command for training:

Note: the model was trained on two V100 32GB GPUs.

```bash
export CUDA_VISIBLE_DEVICES="0,1"
./zipformer/train.py \
  --world-size 2 \
  --num-epochs 20 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --causal 0 \
  --num-encoder-layers 2,2,2,2,2,2 \
  --feedforward-dim 512,768,1024,1536,1024,768 \
  --encoder-dim 192,256,384,512,384,256 \
  --encoder-unmasked-dim 192,192,256,256,256,192 \
  --max-duration 800 \
  --prune-range 10
```

The decoding command is:

```bash
for method in modified_beam_search; do
  ./zipformer/decode.py \
    --epoch 20 \
    --beam-size 20 \
    --avg 13 \
    --exp-dir ./zipformer/exp \
    --max-duration 800 \
    --decoding-method $method \
    --num-encoder-layers 2,2,2,2,2,2 \
    --feedforward-dim 512,768,1024,1536,1024,768 \
    --encoder-dim 192,256,384,512,384,256 \
    --encoder-unmasked-dim 192,192,256,256,256,192 \
    --use-averaged-model true
done
```
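
Per-utterance recognition results and WER summaries are written by `decode.py` into the experiment directory; a sketch of how to skim them, assuming the layout used by current icefall recipes (a `<decoding-method>` subdirectory containing `wer-summary-*.txt` files):

```bash
# Assumption: decode.py saves its results under <exp-dir>/<decoding-method>/,
# including wer-summary-*.txt files, as in other icefall recipes.
cat zipformer/exp/modified_beam_search/wer-summary-*.txt
```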