From 4237127be2d81b30860dbf67a1563c373a1c5a46 Mon Sep 17 00:00:00 2001
From: jinzr
Date: Wed, 20 Mar 2024 12:39:21 +0800
Subject: [PATCH] added results on `zh-HK`

---
 egs/commonvoice/ASR/RESULTS.md | 83 ++++++++++++++++++++++++++++++----
 1 file changed, 74 insertions(+), 9 deletions(-)

diff --git a/egs/commonvoice/ASR/RESULTS.md b/egs/commonvoice/ASR/RESULTS.md
index 2c158d91d..411deb667 100644
--- a/egs/commonvoice/ASR/RESULTS.md
+++ b/egs/commonvoice/ASR/RESULTS.md
@@ -1,4 +1,73 @@
 ## Results
+
+### Commonvoice Cantonese (zh-HK) Char training results (Zipformer)
+
+See #1542 for more details.
+
+Number of model parameters: 72526519, i.e., 72.53 M
+
+The best CERs on CommonVoice 16.1 (cv-corpus-16.1-2023-12-06/zh-HK) are:
+
+|                      | Dev   | Test  | Note               |
+|----------------------|-------|-------|--------------------|
+| greedy_search        | 1.17  | 1.22  | --epoch 24 --avg 5 |
+| modified_beam_search | 0.98  | 1.11  | --epoch 24 --avg 5 |
+| fast_beam_search     | 1.08  | 1.27  | --epoch 24 --avg 5 |
+
+For cross-corpus validation on MDCC (without blank penalty),
+the best CERs are:
+
+|                      | Dev   | Test  | Note               |
+|----------------------|-------|-------|--------------------|
+| greedy_search        | 42.40 | 42.03 | --epoch 24 --avg 5 |
+| modified_beam_search | 39.73 | 39.19 | --epoch 24 --avg 5 |
+| fast_beam_search     | 42.14 | 41.98 | --epoch 24 --avg 5 |
+
+For cross-corpus validation on MDCC (with the blank penalty set to 2.2),
+the best CERs are:
+
+|                      | Dev   | Test  | Note                                   |
+|----------------------|-------|-------|----------------------------------------|
+| greedy_search        | 39.19 | 39.09 | --epoch 24 --avg 5 --blank-penalty 2.2 |
+| modified_beam_search | 37.73 | 37.65 | --epoch 24 --avg 5 --blank-penalty 2.2 |
+| fast_beam_search     | 37.73 | 37.74 | --epoch 24 --avg 5 --blank-penalty 2.2 |
+
+To reproduce the above results, use the following commands for training:
+
+```bash
+export CUDA_VISIBLE_DEVICES="0,1"
+./zipformer/train_char.py \
+  --world-size 2 \
+  --num-epochs 30 \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --exp-dir zipformer/exp \
+  --cv-manifest-dir data/zh-HK/fbank \
+  --language zh-HK \
+  --use-validated-set 1 \
+  --context-size 1 \
+  --max-duration 1000
+```
+
+and the following commands for decoding:
+
+```bash
+for method in greedy_search modified_beam_search fast_beam_search; do
+  ./zipformer/decode_char.py \
+    --epoch 24 \
+    --avg 5 \
+    --decoding-method $method \
+    --exp-dir zipformer/exp \
+    --cv-manifest-dir data/zh-HK/fbank \
+    --context-size 1 \
+    --language zh-HK
+done
+```
+
+Detailed experimental results and the pre-trained model are available at:
+
+
 ### GigaSpeech BPE training results (Pruned Stateless Transducer 7)
 
 #### [pruned_transducer_stateless7](./pruned_transducer_stateless7)
@@ -13,8 +82,8 @@ Results are:
 
 |                      | Dev   | Test  |
 |----------------------|-------|-------|
-| greedy search        | 9.96  | 12.54 |
-| modified beam search | 9.86  | 12.48 |
+| greedy_search        | 9.96  | 12.54 |
+| modified_beam_search | 9.86  | 12.48 |
 
 To reproduce the above result, use the following commands for training:
 
@@ -55,10 +124,6 @@ and the following commands for decoding:
 Pretrained model is available at
 
 
-The tensorboard log for training is available at
-
-
-
 ### Commonvoice (fr) BPE training results (Pruned Stateless Transducer 7_streaming)
 
 #### [pruned_transducer_stateless7_streaming](./pruned_transducer_stateless7_streaming)
@@ -73,9 +138,9 @@ Results are:
 
 | decoding method      | Test  |
 |----------------------|-------|
-| greedy search        | 9.95  |
-| modified beam search | 9.57  |
-| fast beam search     | 9.67  |
+| greedy_search        | 9.95  |
+| modified_beam_search | 9.57  |
+| fast_beam_search     | 9.67  |
 
 Note: This best result is trained on the full librispeech and gigaspeech, and then fine-tuned on the full commonvoice.
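The two MDCC tables in this patch differ only in `--blank-penalty 2.2`, so the effect of the penalty is easiest to read as a relative CER reduction: (CER without penalty - CER with penalty) / CER without penalty * 100. The sketch below computes that from the Test column. The helper name `rel_reduction` is hypothetical, and its inputs are simply the numbers copied from the two tables; nothing here calls icefall itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of icefall): relative CER reduction in percent,
# going from a baseline CER ($1, no blank penalty) to a penalized CER ($2).
rel_reduction() {
  awk -v base="$1" -v pen="$2" 'BEGIN { printf "%.2f\n", (base - pen) / base * 100 }'
}

# MDCC Test CERs copied from the two cross-corpus tables:
# first argument = without blank penalty, second = --blank-penalty 2.2.
rel_reduction 42.03 39.09  # greedy_search
rel_reduction 39.19 37.65  # modified_beam_search
rel_reduction 41.98 37.74  # fast_beam_search
```

On these Test numbers the penalty gives roughly a 4% relative reduction for modified_beam_search and roughly 10% for fast_beam_search; the same helper can be applied to the Dev column.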