Desh Raj c4aaf3ea3b
Add AliMeeting multi-condition training recipe (#751)
* add AliMeeting multi-domain recipe

* convert scripts to symbolic links
2022-12-10 18:15:23 +08:00

3.4 KiB

Results (CER)

2022-12-09

Zipformer (pruned_transducer_stateless7)

Zipformer encoder + non-current decoder. The decoder contains only an embedding layer, a Conv1d (with kernel size 2) and a linear layer (to transform tensor dim).

All the results below are using a single model that is trained by combining the following data: IHM, IHM+reverb, SDM, and GSS-enhanced MDM. Speed perturbation and MUSAN noise augmentation are applied on top of the pooled data.

WERs for IHM:

eval test comment
greedy search 10.13 12.21 --epoch 15 --avg 8 --max-duration 500
modified beam search 9.58 11.53 --epoch 15 --avg 8 --max-duration 500 --beam-size 4
fast beam search 9.92 12.07 --epoch 15 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8

WERs for SDM:

eval test comment
greedy search 23.70 26.41 --epoch 15 --avg 8 --max-duration 500
modified beam search 23.37 25.85 --epoch 15 --avg 8 --max-duration 500 --beam-size 4
fast beam search 23.60 26.38 --epoch 15 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8

WERs for GSS-enhanced MDM:

eval test comment
greedy search 12.24 14.99 --epoch 15 --avg 8 --max-duration 500
modified beam search 11.82 14.22 --epoch 15 --avg 8 --max-duration 500 --beam-size 4
fast beam search 12.30 14.98 --epoch 15 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8

The training command for reproducing is given below:

export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless7/train.py \
  --world-size 4 \
  --num-epochs 15 \
  --exp-dir pruned_transducer_stateless7/exp \
  --max-duration 300 \
  --max-cuts 100 \
  --prune-range 5 \
  --lr-factor 5 \
  --lm-scale 0.25 \
  --use-fp16 True

The decoding command is:

# greedy search
./pruned_transducer_stateless7/decode.py \
        --epoch 15 \
        --avg 8 \
        --exp-dir ./pruned_transducer_stateless7/exp \
        --max-duration 500 \
        --decoding-method greedy_search

# modified beam search
./pruned_transducer_stateless7/decode.py \
        --epoch 15 \
        --avg 8 \
        --exp-dir ./pruned_transducer_stateless7/exp \
        --max-duration 500 \
        --decoding-method modified_beam_search \
        --beam-size 4

# fast beam search
./pruned_transducer_stateless7/decode.py \
        --epoch 15 \
        --avg 8 \
        --exp-dir ./pruned_transducer_stateless5/exp \
        --max-duration 500 \
        --decoding-method fast_beam_search \
        --beam 4 \
        --max-contexts 4 \
        --max-states 8

Pretrained model is available at https://huggingface.co/desh2608/icefall-asr-alimeeting-pruned-transducer-stateless7

The tensorboard training log can be found at https://tensorboard.dev/experiment/EzmVahMMTb2YfKWXwQ2dyQ/#scalars