minor updates

commit 428579e3ac (parent fe35141e7e)
@@ -0,0 +1,19 @@
# Introduction

This recipe includes scripts for training a Zipformer model on both English and Chinese datasets.

# Included Training Sets

1. LibriSpeech (English)
2. AiShell-2 (Chinese)
3. TAL-CSASR (code-switching, Chinese and English)

| Dataset | Number of hours | URL |
|---|---:|---|
| LibriSpeech | 960 | https://www.openslr.org/12/ |
| AiShell-2 | 1,000 | http://www.aishelltech.com/aishell_2 |
| TAL-CSASR | 587 | https://ai.100tal.com/openData/voice |
| **TOTAL** | 2,547 | --- |
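
To give a concrete picture of how the three corpora are combined at training time, here is a minimal sketch using lhotse's `CutSet.mux`; the manifest filenames are assumptions, and the recipe's own data module may organize this differently:

```
# Minimal sketch of mixing the three training sets with lhotse.
# The manifest paths below are hypothetical; prepare.sh defines the real ones.
from lhotse import CutSet, load_manifest_lazy

librispeech = load_manifest_lazy("data/fbank/librispeech_cuts_train.jsonl.gz")
aishell2 = load_manifest_lazy("data/fbank/aishell2_cuts_train.jsonl.gz")
tal_csasr = load_manifest_lazy("data/fbank/tal_csasr_cuts_train.jsonl.gz")

# mux() interleaves the sources lazily; weighting by corpus hours keeps the
# language mix close to the proportions in the table above.
train_cuts = CutSet.mux(
    librispeech, aishell2, tal_csasr,
    weights=[960, 1000, 587],
)
```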
@@ -0,0 +1,44 @@
## Results

### Zh-En datasets BPE-based training results (non-streaming) on Zipformer model

This is [pull request #1265](https://github.com/k2-fsa/icefall/pull/1265) in icefall.

#### Non-streaming (Byte-Level BPE vocab_size=2000)

Best results (number of parameters: ~69M):

The training command:
```
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 35 \
  --use-fp16 1 \
  --max-duration 1000 \
  --num-workers 8
```
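
Note that `--max-duration` caps the total audio duration per batch in seconds, not the number of utterances. A rough sketch of the corresponding lhotse sampler, reusing `train_cuts` from the mixing sketch above:

```
# Hedged sketch: batching by total seconds of audio, as --max-duration does.
from lhotse.dataset.sampling import DynamicBucketingSampler

sampler = DynamicBucketingSampler(
    train_cuts,          # CutSet from the mixing sketch above
    max_duration=1000,   # at most ~1000 seconds of audio per batch
    shuffle=True,
)
```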

The decoding command:

```
for method in greedy_search modified_beam_search fast_beam_search; do
  ./zipformer/decode.py \
    --epoch 34 \
    --avg 19 \
    --decoding-method $method
done
```
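
Here `--epoch 34 --avg 19` asks decoding to average the weights of the last 19 epoch checkpoints (epochs 16 through 34). icefall ships its own helper for this; the plain-PyTorch sketch below only illustrates the idea, and the checkpoint layout it assumes (weights stored under a "model" key in `zipformer/exp/epoch-*.pt`) is an assumption:

```
# Plain-PyTorch illustration of checkpoint averaging.
import torch

def average_checkpoints(paths):
    avg = None
    for p in paths:
        # Assumed layout: each file stores the weights under the "model" key.
        state = torch.load(p, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# Epochs 16..34 inclusive: 19 checkpoints.
paths = [f"zipformer/exp/epoch-{i}.pt" for i in range(16, 35)]
averaged_state = average_checkpoints(paths)
```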

The Word Error Rates (WERs) listed below were produced by the checkpoint of the 20th epoch, using the byte-level BPE model (2000 tokens):

| Zipformer WER (%)     | TAL-CSASR dev | TAL-CSASR test |
|-----------------------|---------------|----------------|
| greedy_search         | 6.65          | 6.69           |
| modified_beam_search  | 6.46          | 6.51           |
| fast_beam_search      | 6.57          | 6.68           |

The pre-trained model can be found at https://huggingface.co/zrjin/icefall-asr-zipformer-multi-zh-en-2023-11-22. It was trained on the LibriSpeech 960-hour training set (with speed perturbation), the TAL-CSASR training set (with speed perturbation), and AiShell-2 (without speed perturbation).
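
One minimal way to fetch that checkpoint locally is with `huggingface_hub` (a sketch; the exact contents of the repository are not listed here):

```
# Download the released model repository from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("zrjin/icefall-asr-zipformer-multi-zh-en-2023-11-22")
print(local_dir)  # local directory with the released files
```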

@@ -13,7 +13,6 @@ dl_dir=$PWD/download
. shared/parse_options.sh || exit 1

vocab_sizes=(
  500
  2000
)
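
Each entry in `vocab_sizes` corresponds to a BPE model built during data preparation. As a hedged illustration only (not the recipe's actual script), a byte-level BPE model of each size could be trained with sentencepiece along these lines; the transcript path is an assumption:

```
# Illustrative sketch: training byte-level BPE models with sentencepiece.
import os
import sentencepiece as spm

for vocab_size in (500, 2000):
    out_dir = f"data/lang_bbpe_{vocab_size}"
    os.makedirs(out_dir, exist_ok=True)
    spm.SentencePieceTrainer.train(
        input="data/lang/transcript.txt",  # hypothetical transcript file
        model_prefix=f"{out_dir}/bbpe",
        vocab_size=vocab_size,
        model_type="bpe",
        character_coverage=1.0,
        byte_fallback=True,  # fall back to raw bytes for unseen characters
    )
```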