icefall/RESULTS.md at 64c53640857d0b9c3fd63070c2f741d374051ce9

mirror of https://github.com/k2-fsa/icefall.git synced 2025-08-09 18:12:19 +00:00

Machiko Bailey da597ad782

Update RESULTS.md (#1873 )

2025-02-04 09:04:25 +08:00

Results

Zipformer

Non-streaming

The training command is:

./zipformer/train.py \
  --bilingual 1 \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --max-duration 600

The decoding command is:

./zipformer/decode.py \
    --epoch 28 \
    --avg 15 \
    --exp-dir ./zipformer/exp \
    --max-duration 600 \
    --decoding-method greedy_search

To export the model to ONNX:

./zipformer/export-onnx.py \
  --tokens data/lang_bbpe_2000/tokens.txt \
  --use-averaged-model 0 \
  --epoch 35 \
  --avg 1 \
  --exp-dir zipformer/exp \
  --num-encoder-layers "2,2,3,4,3,2" \
  --downsampling-factor "1,2,4,8,4,2" \
  --feedforward-dim "512,768,1024,1536,1024,768" \
  --num-heads "4,4,4,8,4,4" \
  --encoder-dim "192,256,384,512,384,256" \
  --query-head-dim 32 \
  --value-head-dim 12 \
  --pos-head-dim 4 \
  --pos-dim 48 \
  --encoder-unmasked-dim "192,192,256,256,256,192" \
  --cnn-module-kernel "31,31,15,15,15,31" \
  --decoder-dim 512 \
  --joiner-dim 512 \
  --causal False \
  --chunk-size "16,32,64,-1" \
  --left-context-frames "64,128,256,-1" \
  --fp16 True

Word Error Rates (WERs) are listed below:

| Zipformer WER (%) | ReazonSpeech dev | ReazonSpeech test | LibriSpeech test-clean | LibriSpeech test-other |
|---|---|---|---|---|
| greedy_search | 5.9 | 4.07 | 3.46 | 8.35 |
| modified_beam_search | 4.87 | 3.61 | 3.28 | 8.07 |

Character Error Rates (CERs) for Japanese are listed below:

| Decoding Method | In-Distribution CER | JSUT | CommonVoice | TEDx |
|---|---|---|---|---|
| greedy search | 12.56 | 6.93 | 9.75 | 9.67 |
| modified beam search | 11.59 | 6.97 | 9.55 | 9.51 |
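
For readers unfamiliar with the metrics, the WER and CER figures above are conventionally the edit distance between hypothesis and reference, divided by the reference length and expressed as a percentage. The following is a minimal sketch of that definition, not the scoring code used by the recipe: the function names `edit_distance` and `error_rate` are hypothetical, and any text normalization applied before scoring in the actual evaluation is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word"):
    """Corpus-level error rate in percent.

    unit="word" splits on whitespace (WER); unit="char" compares
    characters with spaces removed (CER). Both choices are assumptions
    made for this illustration.
    """
    total_errors = 0
    total_tokens = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        total_errors += edit_distance(r, h)
        total_tokens += len(r)
    return 100.0 * total_errors / total_tokens


if __name__ == "__main__":
    print(error_rate(["the cat sat on the mat"], ["the cat sit on mat"], unit="word"))
    print(error_rate(["こんにちは"], ["こんばんは"], unit="char"))
```

Errors are summed over the whole test set before dividing, so long utterances weigh more than short ones; averaging per-utterance rates instead would give a different number.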

The pre-trained model can be found here: https://huggingface.co/reazon-research/reazonspeech-k2-v2-ja-en/tree/main
