## Results (CER) #### 2022-12-09 #### Zipformer (pruned_transducer_stateless7) Zipformer encoder + non-current decoder. The decoder contains only an embedding layer, a Conv1d (with kernel size 2) and a linear layer (to transform tensor dim). All the results below are using a single model that is trained by combining the following data: IHM, IHM+reverb, SDM, and GSS-enhanced MDM. Speed perturbation and MUSAN noise augmentation are applied on top of the pooled data. **WERs for IHM:** | | eval | test | comment | |---------------------------|------------|------------|------------------------------------------| | greedy search | 10.13 | 12.21 | --epoch 15 --avg 8 --max-duration 500 | | modified beam search | 9.58 | 11.53 | --epoch 15 --avg 8 --max-duration 500 --beam-size 4 | | fast beam search | 9.92 | 12.07 | --epoch 15 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8 | **WERs for SDM:** | | eval | test | comment | |---------------------------|------------|------------|------------------------------------------| | greedy search | 23.70 | 26.41 | --epoch 15 --avg 8 --max-duration 500 | | modified beam search | 23.37 | 25.85 | --epoch 15 --avg 8 --max-duration 500 --beam-size 4 | | fast beam search | 23.60 | 26.38 | --epoch 15 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8 | **WERs for GSS-enhanced MDM:** | | eval | test | comment | |---------------------------|------------|------------|------------------------------------------| | greedy search | 12.24 | 14.99 | --epoch 15 --avg 8 --max-duration 500 | | modified beam search | 11.82 | 14.22 | --epoch 15 --avg 8 --max-duration 500 --beam-size 4 | | fast beam search | 12.30 | 14.98 | --epoch 15 --avg 8 --max-duration 500 --beam-size 4 --max-contexts 4 --max-states 8 | The training command for reproducing is given below: ``` export CUDA_VISIBLE_DEVICES="0,1,2,3" ./pruned_transducer_stateless7/train.py \ --world-size 4 \ --num-epochs 15 \ --exp-dir pruned_transducer_stateless7/exp \ --max-duration 300 \ --max-cuts 100 \ --prune-range 5 \ --lr-factor 5 \ --lm-scale 0.25 \ --use-fp16 True ``` The decoding command is: ``` # greedy search ./pruned_transducer_stateless7/decode.py \ --epoch 15 \ --avg 8 \ --exp-dir ./pruned_transducer_stateless7/exp \ --max-duration 500 \ --decoding-method greedy_search # modified beam search ./pruned_transducer_stateless7/decode.py \ --epoch 15 \ --avg 8 \ --exp-dir ./pruned_transducer_stateless7/exp \ --max-duration 500 \ --decoding-method modified_beam_search \ --beam-size 4 # fast beam search ./pruned_transducer_stateless7/decode.py \ --epoch 15 \ --avg 8 \ --exp-dir ./pruned_transducer_stateless5/exp \ --max-duration 500 \ --decoding-method fast_beam_search \ --beam 4 \ --max-contexts 4 \ --max-states 8 ``` Pretrained model is available at The tensorboard training log can be found at