icefall/egs/commonvoice/ASR/RESULTS.md
Yifan Yang 8838fe0bd2
Zipformer for Common Voice (#997)
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-04-17 17:47:25 +08:00

Results

Common Voice BPE training results (Pruned Stateless Transducer 7)

pruned_transducer_stateless7

See #997 for more details.

Number of model parameters: 70369391, i.e., 70.37 M

The best WERs (%), as of 2023-04-17, for Common Voice English 13.0 (cv-corpus-13.0-2023-03-09/en) are:

|                      | Dev  | Test  |
|----------------------|------|-------|
| greedy search        | 9.96 | 12.54 |
| modified beam search | 9.86 | 12.48 |
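
For readers unfamiliar with the metric, the sketch below shows one common way to compute a corpus-level word error rate: total word-level edit distance divided by the total number of reference words. It is a minimal illustration, not the scoring code used by this recipe; the function name `word_error_rate` and the sample transcripts are hypothetical, and no text normalization is applied.

```python
def word_error_rate(refs, hyps):
    """Return corpus-level WER in percent for paired transcripts.

    Hypothetical helper for illustration; it splits on whitespace and
    applies no text normalization.
    """
    total_errors = 0
    total_ref_words = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        # prev[j]: edit distance between the ref words seen so far and h[:j]
        prev = list(range(len(h) + 1))
        for i, ref_word in enumerate(r, start=1):
            cur = [i]
            for j, hyp_word in enumerate(h, start=1):
                cur.append(min(
                    prev[j] + 1,                         # deletion
                    cur[j - 1] + 1,                      # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                ))
            prev = cur
        total_errors += prev[-1]
        total_ref_words += len(r)
    return 100.0 * total_errors / total_ref_words


# Made-up example: one substitution (sat -> sit) and one deletion (the)
# against a 6-word reference.
refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
print(f"{word_error_rate(refs, hyps):.2f}")  # 33.33
```

Summing errors over the whole set before dividing weights long utterances more heavily than averaging per-utterance rates would.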

To reproduce the above result, use the following commands for training:

export CUDA_VISIBLE_DEVICES="0,1,2,3"
./pruned_transducer_stateless7/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir pruned_transducer_stateless7/exp \
  --max-duration 550

and the following commands for decoding:

# greedy search
./pruned_transducer_stateless7/decode.py \
  --epoch 30 \
  --avg 5 \
  --decoding-method greedy_search \
  --exp-dir pruned_transducer_stateless7/exp \
  --bpe-model data/en/lang_bpe_500/bpe.model \
  --max-duration 600

# modified beam search
./pruned_transducer_stateless7/decode.py \
  --epoch 30 \
  --avg 5 \
  --decoding-method modified_beam_search \
  --beam-size 4 \
  --exp-dir pruned_transducer_stateless7/exp \
  --bpe-model data/en/lang_bpe_500/bpe.model \
  --max-duration 600
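
Both decoding commands pass `--epoch 30 --avg 5`. The sketch below illustrates one plausible reading of those flags: take the last 5 epoch checkpoints ending at epoch 30 and average their parameters element-wise. This is an assumption for illustration only. The `epoch-N.pt` file naming, the helper names, and the use of plain floats in place of tensors are all hypothetical, and the recipe's `decode.py` may instead rely on a running averaged model.

```python
def checkpoints_to_average(epoch, avg, exp_dir):
    """List the checkpoint paths selected by --epoch and --avg.

    Assumes checkpoints are named epoch-N.pt (hypothetical naming).
    """
    start = max(1, epoch - avg + 1)
    return [f"{exp_dir}/epoch-{i}.pt" for i in range(start, epoch + 1)]


def average_state_dicts(state_dicts):
    """Element-wise mean of parameter dicts that share the same keys.

    Values are plain floats here; real checkpoints would hold tensors.
    """
    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n for key in state_dicts[0]}


paths = checkpoints_to_average(30, 5, "pruned_transducer_stateless7/exp")
print(paths[0])   # pruned_transducer_stateless7/exp/epoch-26.pt
print(paths[-1])  # pruned_transducer_stateless7/exp/epoch-30.pt

# Toy parameters standing in for two loaded checkpoints.
print(average_state_dicts([{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 1.0}]))
```

Averaging several late checkpoints is a common way to reduce variance between epochs without any extra training.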

The pretrained model is available at https://huggingface.co/yfyeung/icefall-asr-cv-corpus-13.0-2023-03-09-en-pruned-transducer-stateless7-2023-04-17

The tensorboard log for training is available at https://tensorboard.dev/experiment/j4pJQty6RMOkMJtRySREKw/