## Results

### AMI training results (Pruned Transducer)

#### 2022-11-20

#### Zipformer (pruned_transducer_stateless7)

Zipformer encoder + non-recurrent decoder. The decoder
contains only an embedding layer, a Conv1d (with kernel size 2) and a linear
layer (to transform tensor dim).

All the results below use a single model trained on a combination of the following
data: IHM, IHM+reverb, SDM, and GSS-enhanced MDM. Speed perturbation and MUSAN noise
augmentation are applied on top of the pooled data.

**WERs for IHM:**

| decoding method           | dev        | test       | comment                                  |
|---------------------------|------------|------------|------------------------------------------|
| greedy search             | 19.25      | 17.83      | --epoch 14 --avg 8 --max-duration 500    |
| modified beam search      | 18.92      | 17.40      | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 |
| fast beam search          | 19.44      | 18.04      | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8 |

**WERs for SDM:**

| decoding method           | dev        | test       | comment                                  |
|---------------------------|------------|------------|------------------------------------------|
| greedy search             | 31.32      | 32.38      | --epoch 14 --avg 8 --max-duration 500    |
| modified beam search      | 31.25      | 32.21      | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 |
| fast beam search          | 31.11      | 32.10      | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8 |

**WERs for GSS-enhanced MDM:**

| decoding method           | dev        | test       | comment                                  |
|---------------------------|------------|------------|------------------------------------------|
| greedy search             | 22.05      | 22.93      | --epoch 14 --avg 8 --max-duration 500    |
| modified beam search      | 21.67      | 22.43      | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 |
| fast beam search          | 22.21      | 22.83      | --epoch 14 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8 |

The training command to reproduce these results is given below:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless7/train.py \
  --world-size 4 \
  --num-epochs 15 \
  --exp-dir pruned_transducer_stateless7/exp \
  --max-duration 150 \
  --max-cuts 150 \
  --prune-range 5 \
  --lr-factor 5 \
  --lm-scale 0.25 \
  --use-fp16 True
```

The decoding commands are:

```
# greedy search
./pruned_transducer_stateless7/decode.py \
  --epoch 14 \
  --avg 8 \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --max-duration 500 \
  --decoding-method greedy_search

# modified beam search
./pruned_transducer_stateless7/decode.py \
  --epoch 14 \
  --avg 8 \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --max-duration 500 \
  --decoding-method modified_beam_search \
  --beam-size 4

# fast beam search
./pruned_transducer_stateless7/decode.py \
  --epoch 14 \
  --avg 8 \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --max-duration 500 \
  --decoding-method fast_beam_search \
  --beam 4 \
  --max-contexts 4 \
  --max-states 8
```

Pretrained model is available at

The tensorboard training log can be found at
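
To make the differences between decoding methods in the WER tables above easier to read, the following is a minimal sketch that computes the relative WER change of each beam search variant against greedy search and picks the lowest-WER method per condition. The numbers are copied from the tables; the names `WERS`, `relative_change`, and `best_method` are hypothetical helpers introduced here for illustration and are not part of the recipe.

```python
# WERs (%) copied from the tables above: {condition: {method: (dev, test)}}.
WERS = {
    "IHM": {
        "greedy search": (19.25, 17.83),
        "modified beam search": (18.92, 17.40),
        "fast beam search": (19.44, 18.04),
    },
    "SDM": {
        "greedy search": (31.32, 32.38),
        "modified beam search": (31.25, 32.21),
        "fast beam search": (31.11, 32.10),
    },
    "GSS-enhanced MDM": {
        "greedy search": (22.05, 22.93),
        "modified beam search": (21.67, 22.43),
        "fast beam search": (22.21, 22.83),
    },
}


def relative_change(baseline: float, candidate: float) -> float:
    """Relative WER change in percent; negative means the candidate is better."""
    return 100.0 * (candidate - baseline) / baseline


def best_method(condition: str, split: str = "test") -> str:
    """Return the decoding method with the lowest WER for a condition and split."""
    idx = {"dev": 0, "test": 1}[split]
    methods = WERS[condition]
    return min(methods, key=lambda name: methods[name][idx])


if __name__ == "__main__":
    for condition, methods in WERS.items():
        greedy_test = methods["greedy search"][1]
        for name, (_, test) in methods.items():
            if name == "greedy search":
                continue
            change = relative_change(greedy_test, test)
            print(f"{condition:18s} {name:22s} test {change:+.2f}% vs greedy")
        print(f"{condition:18s} best on test: {best_method(condition)}")
```

On the test sets, modified beam search gives the lowest WER for IHM and GSS-enhanced MDM, while fast beam search is lowest for SDM. The gaps are small in all three conditions, so greedy search remains a reasonable choice when decoding speed matters.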