From 44b9730e826cb5d479672efb599c367621b3f1d6 Mon Sep 17 00:00:00 2001
From: marcoyang
Date: Fri, 5 Jan 2024 17:39:16 +0800
Subject: [PATCH] update RESULTS.md

---
 egs/spgispeech/ASR/RESULTS.md | 66 +++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/egs/spgispeech/ASR/RESULTS.md b/egs/spgispeech/ASR/RESULTS.md
index de9e35c5a..30a354e30 100644
--- a/egs/spgispeech/ASR/RESULTS.md
+++ b/egs/spgispeech/ASR/RESULTS.md
@@ -1,5 +1,71 @@
 ## Results
 
+### SPGISpeech BPE training results (Zipformer Transducer)
+
+#### 2024-01-05
+
+#### Zipformer encoder + embedding decoder
+
+Transducer: Zipformer encoder + stateless decoder.
+
+The WERs are:
+
+|                           | dev        | val        | comment                                  |
+|---------------------------|------------|------------|------------------------------------------|
+| greedy search             | 2.08       | 2.14       | --epoch 30 --avg 10                      |
+| modified beam search      | 2.05       | 2.09       | --epoch 30 --avg 10 --beam-size 4        |
+| fast beam search          | 2.07       | 2.17       | --epoch 30 --avg 10 --beam 20 --max-contexts 8 --max-states 64 |
+
+**NOTE:** SPGISpeech transcripts can be prepared in `ortho` or `norm` ways, which refer to whether the
+transcripts are orthographic or normalized. These WERs correspond to the normalized transcription
+scenario.
+
+The training command for reproducing is given below:
+
+```
+export CUDA_VISIBLE_DEVICES="0,1,2,3"
+
+python zipformer/train.py \
+  --world-size 4 \
+  --num-epochs 30 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp \
+  --num-workers 2 \
+  --max-duration 1000
+```
+
+The decoding command is:
+
+```
+# greedy search
+python ./zipformer/decode.py \
+  --epoch $epoch \
+  --avg $avg \
+  --exp-dir ./zipformer/exp \
+  --max-duration 1000 \
+  --decoding-method greedy_search
+
+# modified beam search
+python ./zipformer/decode.py \
+  --epoch $epoch \
+  --avg $avg \
+  --exp-dir ./zipformer/exp \
+  --max-duration 1000 \
+  --decoding-method modified_beam_search
+
+# fast beam search
+python ./zipformer/decode.py \
+  --epoch $epoch \
+  --avg $avg \
+  --exp-dir ./zipformer/exp \
+  --max-duration 1000 \
+  --decoding-method fast_beam_search \
+  --beam 4 \
+  --max-contexts 4 \
+  --max-states 8
+```
+
 ### SPGISpeech BPE training results (Pruned Transducer)
 
 #### 2022-05-11