icefall/egs/ami/ASR/RESULTS.md
Desh Raj db75627e92
[recipe] AMI Zipformer transducer (#698)
2022-11-26 10:00:45 +08:00

Results

AMI training results (Pruned Transducer)

2022-11-20

Zipformer (pruned_transducer_stateless7)

Zipformer encoder + non-recurrent (stateless) decoder. The decoder contains only an embedding layer, a Conv1d layer (kernel size 2), and a linear layer (to transform the tensor dimension).
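
The decoder structure described above can be sketched without any dependencies. This is a minimal illustration, not the icefall PyTorch module: the class name `StatelessDecoder`, the random weights, the left-padding with a blank token, and the ReLU after the convolution are all assumptions made for the example. The point it shows is that a kernel size of 2 makes the output depend only on the two most recent tokens.

```python
import random


class StatelessDecoder:
    """Hypothetical sketch: embedding -> Conv1d (kernel size 2) -> linear."""

    def __init__(self, vocab_size, embed_dim, out_dim, context_size=2,
                 blank_id=0, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.embed_dim = embed_dim
        self.context_size = context_size
        self.blank_id = blank_id
        self.embedding = rand_matrix(vocab_size, embed_dim)
        # One (embed_dim x embed_dim) weight matrix per kernel tap.
        self.conv = [rand_matrix(embed_dim, embed_dim)
                     for _ in range(context_size)]
        self.linear = rand_matrix(out_dim, embed_dim)

    def step(self, history):
        """Return the decoder output for the next position given past tokens."""
        # Keep only the last `context_size` tokens, left-padded with blank.
        padded = [self.blank_id] * self.context_size + list(history)
        context = padded[-self.context_size:]
        embedded = [self.embedding[token] for token in context]

        # Conv1d with kernel size == context size: one weighted sum per tap.
        hidden = [0.0] * self.embed_dim
        for tap, vector in zip(self.conv, embedded):
            for i, row in enumerate(tap):
                hidden[i] += sum(w * x for w, x in zip(row, vector))
        hidden = [max(0.0, h) for h in hidden]  # ReLU (assumed nonlinearity)

        # Linear layer that changes the output dimension.
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.linear]


decoder = StatelessDecoder(vocab_size=10, embed_dim=4, out_dim=3)
# Histories that share the same last two tokens give identical outputs.
print(decoder.step([5, 1, 2]) == decoder.step([9, 9, 1, 2]))
```

Because no recurrent state is carried between steps, the decoder output can be recomputed from the last two tokens alone, which is what makes it "stateless".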

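All results below come from a single model trained on the pooled data from four conditions: IHM (individual headset microphone), IHM+reverb, SDM (single distant microphone), and GSS-enhanced MDM (multiple distant microphones enhanced with guided source separation). Speed perturbation and MUSAN noise augmentation are applied on top of the pooled data.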

WERs for IHM:

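| decoding method      | dev   | test  | comment                                                                              |
|----------------------|-------|-------|--------------------------------------------------------------------------------------|
| greedy search        | 19.25 | 17.83 | --epoch 14 --avg 8 --max-duration 500                                                |
| modified beam search | 18.92 | 17.40 | --epoch 14 --avg 8 --max-duration 500 --beam-size 4                                  |
| fast beam search     | 19.44 | 18.04 | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8  |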

WERs for SDM:

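| decoding method      | dev   | test  | comment                                                                              |
|----------------------|-------|-------|--------------------------------------------------------------------------------------|
| greedy search        | 31.32 | 32.38 | --epoch 14 --avg 8 --max-duration 500                                                |
| modified beam search | 31.25 | 32.21 | --epoch 14 --avg 8 --max-duration 500 --beam-size 4                                  |
| fast beam search     | 31.11 | 32.10 | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8  |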

WERs for GSS-enhanced MDM:

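| decoding method      | dev   | test  | comment                                                                              |
|----------------------|-------|-------|--------------------------------------------------------------------------------------|
| greedy search        | 22.05 | 22.93 | --epoch 14 --avg 8 --max-duration 500                                                |
| modified beam search | 21.67 | 22.43 | --epoch 14 --avg 8 --max-duration 500 --beam-size 4                                  |
| fast beam search     | 22.21 | 22.83 | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8  |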
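
To compare decoding methods across conditions, it can help to look at relative rather than absolute WER differences. The sketch below copies the WERs from the three tables above and computes the relative WER reduction of modified beam search over greedy search. The dictionary layout and the function names `relative_reduction` and `summarize` are illustrative choices, not part of the recipe.

```python
# WERs (%) copied from the tables above, as (dev, test) per decoding method.
WER = {
    "IHM": {
        "greedy": (19.25, 17.83),
        "modified_beam": (18.92, 17.40),
        "fast_beam": (19.44, 18.04),
    },
    "SDM": {
        "greedy": (31.32, 32.38),
        "modified_beam": (31.25, 32.21),
        "fast_beam": (31.11, 32.10),
    },
    "GSS-MDM": {
        "greedy": (22.05, 22.93),
        "modified_beam": (21.67, 22.43),
        "fast_beam": (22.21, 22.83),
    },
}


def relative_reduction(baseline, system):
    """Relative WER reduction in percent; positive means `system` is better."""
    return 100.0 * (baseline - system) / baseline


def summarize(wer=WER, baseline="greedy", system="modified_beam"):
    """Return {condition: (dev reduction, test reduction)} in percent."""
    summary = {}
    for condition, methods in wer.items():
        base_dev, base_test = methods[baseline]
        sys_dev, sys_test = methods[system]
        summary[condition] = (
            relative_reduction(base_dev, sys_dev),
            relative_reduction(base_test, sys_test),
        )
    return summary


for condition, (dev, test) in summarize().items():
    print(f"{condition:>8}: dev {dev:.2f}%  test {test:.2f}%")
```

Passing `system="fast_beam"` gives the same comparison for fast beam search, where a negative value indicates a higher WER than greedy search.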

The training command to reproduce these results is:

export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless7/train.py \
  --world-size 4 \
  --num-epochs 15 \
  --exp-dir pruned_transducer_stateless7/exp \
  --max-duration 150 \
  --max-cuts 150 \
  --prune-range 5 \
  --lr-factor 5 \
  --lm-scale 0.25 \
  --use-fp16 True
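
As a rough guide to the batch size implied by this command, the sketch below multiplies the per-GPU caps by the number of GPUs. It assumes that `--max-duration` and `--max-cuts` apply to each of the `--world-size` processes separately, as is usual for distributed training with per-rank samplers; the function name `per_step_limits` is hypothetical. The values are upper bounds, and actual batches may be smaller.

```python
def per_step_limits(world_size, max_duration, max_cuts):
    """Upper bounds on audio seconds and cuts per optimizer step.

    Assumes every rank draws its own batch under the same caps.
    """
    return {
        "max_seconds_per_step": world_size * max_duration,
        "max_cuts_per_step": world_size * max_cuts,
    }


# Values taken from the training command above.
limits = per_step_limits(world_size=4, max_duration=150, max_cuts=150)
print(limits)
```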

The decoding commands are:

# greedy search
./pruned_transducer_stateless7/decode.py \
        --epoch 14 \
        --avg 8 \
        --exp-dir ./pruned_transducer_stateless7/exp \
        --max-duration 500 \
        --decoding-method greedy_search

# modified beam search
./pruned_transducer_stateless7/decode.py \
        --epoch 14 \
        --avg 8 \
        --exp-dir ./pruned_transducer_stateless7/exp \
        --max-duration 500 \
        --decoding-method modified_beam_search \
        --beam-size 4

# fast beam search
./pruned_transducer_stateless7/decode.py \
        --epoch 14 \
        --avg 8 \
        --exp-dir ./pruned_transducer_stateless7/exp \
        --max-duration 500 \
        --decoding-method fast_beam_search \
        --beam 4 \
        --max-contexts 4 \
        --max-states 8
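
The three commands share most of their arguments, so they can be generated from one place when decoding with every method. The following is a small sketch using only the flags that appear in the commands above, with the checkpoint settings (`--epoch 14 --avg 8`) taken from the results tables. The names `COMMON`, `METHOD_ARGS`, and `build_command` are illustrative.

```python
import shlex

# Arguments shared by all decoding runs (values from the results tables).
COMMON = [
    "./pruned_transducer_stateless7/decode.py",
    "--epoch", "14",
    "--avg", "8",
    "--exp-dir", "./pruned_transducer_stateless7/exp",
    "--max-duration", "500",
]

# Extra arguments per decoding method, as used in the commands above.
METHOD_ARGS = {
    "greedy_search": [],
    "modified_beam_search": ["--beam-size", "4"],
    "fast_beam_search": [
        "--beam", "4",
        "--max-contexts", "4",
        "--max-states", "8",
    ],
}


def build_command(method):
    """Return the argument list for one decoding method."""
    return COMMON + ["--decoding-method", method] + METHOD_ARGS[method]


for method in METHOD_ARGS:
    print(shlex.join(build_command(method)))
```

Each list can be passed to `subprocess.run(..., check=True)`, assuming the working directory is the recipe directory that contains `pruned_transducer_stateless7`.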

A pretrained model is available at https://huggingface.co/desh2608/icefall-asr-ami-pruned-transducer-stateless7

The TensorBoard training log can be found at https://tensorboard.dev/experiment/VH10QOTBTbuYpWx994Onrg/#scalars