added results on zh-HK

jinzr 2024-03-20 12:39:21 +08:00
parent 6993183dd7
commit 4237127be2


@@ -1,4 +1,73 @@
## Results
### Commonvoice Cantonese (zh-HK) Char training results (Zipformer)
See #1542 for more details.
Number of model parameters: 72526519, i.e., 72.53 M
The best CER for CommonVoice 16.1 (cv-corpus-16.1-2023-12-06/zh-HK) is as follows:
| decoding method      | Dev   | Test | Note               |
|----------------------|-------|------|--------------------|
| greedy_search | 1.17 | 1.22 | --epoch 24 --avg 5 |
| modified_beam_search | 0.98 | 1.11 | --epoch 24 --avg 5 |
| fast_beam_search | 1.08 | 1.27 | --epoch 24 --avg 5 |
When doing cross-corpus validation on MDCC (without blank penalty), the best CER is as follows:
| decoding method      | Dev   | Test | Note               |
|----------------------|-------|------|--------------------|
| greedy_search | 42.40 | 42.03| --epoch 24 --avg 5 |
| modified_beam_search | 39.73 | 39.19| --epoch 24 --avg 5 |
| fast_beam_search | 42.14 | 41.98| --epoch 24 --avg 5 |
When doing cross-corpus validation on MDCC (with the blank penalty set to 2.2; see the decoding sketch below), the best CER is as follows:
| decoding method      | Dev   | Test | Note                                   |
|----------------------|-------|------|----------------------------------------|
| greedy_search | 39.19 | 39.09| --epoch 24 --avg 5 --blank-penalty 2.2 |
| modified_beam_search | 37.73 | 37.65| --epoch 24 --avg 5 --blank-penalty 2.2 |
| fast_beam_search | 37.73 | 37.74| --epoch 24 --avg 5 --blank-penalty 2.2 |
To reproduce the above results, use the following command for training:
```bash
export CUDA_VISIBLE_DEVICES="0,1"
./zipformer/train_char.py \
--world-size 2 \
--num-epochs 30 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp \
--cv-manifest-dir data/zh-HK/fbank \
--language zh-HK \
--use-validated-set 1 \
--context-size 1 \
--max-duration 1000
```
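Training progress can be monitored with TensorBoard. The sketch below assumes the recipe writes its event files to `zipformer/exp/tensorboard`, as other icefall recipes do when `--use-tensorboard` is left at its default; adjust the path if a different `--exp-dir` is used.

```bash
# Sketch: view loss curves during/after training.
# Assumption: event files live under <exp-dir>/tensorboard, as in other icefall recipes.
tensorboard --logdir zipformer/exp/tensorboard --port 6006
```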
Use the following command for decoding:
```bash
for method in greedy_search modified_beam_search fast_beam_search; do
./zipformer/decode_char.py \
--epoch 24 \
--avg 5 \
--decoding-method $method \
--exp-dir zipformer/exp \
--cv-manifest-dir data/zh-HK/fbank \
--context-size 1 \
--language zh-HK
done
```
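For the cross-corpus MDCC results with a blank penalty reported above, the table notes indicate that `--blank-penalty 2.2` is passed to the decoding script. The sketch below shows one way to run that decoding; `data/mdcc/fbank` is a hypothetical placeholder for the prepared MDCC fbank manifests and should be replaced with the actual location.

```bash
# Sketch: cross-corpus decoding on MDCC with a blank penalty of 2.2.
# NOTE: data/mdcc/fbank is an assumed placeholder, not a path from the recipe.
for method in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode_char.py \
    --epoch 24 \
    --avg 5 \
    --decoding-method $method \
    --exp-dir zipformer/exp \
    --cv-manifest-dir data/mdcc/fbank \
    --context-size 1 \
    --language zh-HK \
    --blank-penalty 2.2
done
```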
Detailed experimental results and the pre-trained model are available at:
<https://huggingface.co/zrjin/icefall-asr-commonvoice-zh-HK-zipformer-2024-03-20>
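To fetch the pre-trained model locally, the Hugging Face repository can be cloned with Git LFS; this is a generic sketch rather than a command from the recipe itself.

```bash
# Sketch: download the pre-trained zh-HK Zipformer from Hugging Face (requires git-lfs).
git lfs install
git clone https://huggingface.co/zrjin/icefall-asr-commonvoice-zh-HK-zipformer-2024-03-20
```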
### Commonvoice (en) BPE training results (Pruned Stateless Transducer 7)
#### [pruned_transducer_stateless7](./pruned_transducer_stateless7)
@@ -13,8 +82,8 @@ Results are:
| decoding method      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search | 9.96 | 12.54 |
| modified_beam_search | 9.86 | 12.48 |
To reproduce the above results, use the following commands for training:
@@ -55,10 +124,6 @@ and the following commands for decoding:
The pretrained model is available at
<https://huggingface.co/yfyeung/icefall-asr-cv-corpus-13.0-2023-03-09-en-pruned-transducer-stateless7-2023-04-17>
The tensorboard log for training is available at
<https://tensorboard.dev/experiment/j4pJQty6RMOkMJtRySREKw/>
### Commonvoice (fr) BPE training results (Pruned Stateless Transducer 7_streaming)
#### [pruned_transducer_stateless7_streaming](./pruned_transducer_stateless7_streaming)
@@ -73,9 +138,9 @@ Results are:
| decoding method | Test |
|----------------------|-------|
| greedy_search | 9.95 |
| modified_beam_search | 9.57 |
| fast_beam_search | 9.67 |
Note: The best result was obtained by training on the full LibriSpeech and GigaSpeech corpora and then fine-tuning on the full CommonVoice data.