Introduction

This recipe includes scripts for training a Zipformer model on multiple Chinese datasets.

Included Training Sets

  1. THCHS-30
  2. AiShell-{1,2,4}
  3. ST-CMDS
  4. Primewords
  5. MagicData
  6. Aidatatang_200zh
  7. AliMeeting
  8. WeNetSpeech
  9. KeSpeech-ASR

Dataset            Hours    URL
-----------------  -------  ------------------------------------------
TOTAL              14,106   ---
THCHS-30           35       https://www.openslr.org/18/
AiShell-1          170      https://www.openslr.org/33/
AiShell-2          1,000    http://www.aishelltech.com/aishell_2
AiShell-4          120      https://www.openslr.org/111/
ST-CMDS            110      https://www.openslr.org/38/
Primewords         99       https://www.openslr.org/47/
aidatatang_200zh   200      https://www.openslr.org/62/
MagicData          755      https://www.openslr.org/68/
AliMeeting         100      https://openslr.org/119/
WeNetSpeech        10,000   https://github.com/wenet-e2e/WenetSpeech
KeSpeech           1,542    https://github.com/KeSpeech/KeSpeech
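
As a rough illustration of how unevenly the training data is distributed, the sketch below tabulates the per-dataset hours from the table and computes each corpus's share of the total. The `TRAINING_HOURS` dictionary and the `hour_shares` helper are hypothetical and not part of the recipe. Shares like these could inform sampling weights when mixing corpora, but that is an assumption; the recipe's actual mixing strategy is not described here.

```python
# Hypothetical helper, not part of the recipe.
# Per-corpus training hours copied from the table above.
# The TOTAL row is excluded; only individual datasets are summed.
TRAINING_HOURS = {
    "THCHS-30": 35,
    "AiShell-1": 170,
    "AiShell-2": 1_000,
    "AiShell-4": 120,
    "ST-CMDS": 110,
    "Primewords": 99,
    "aidatatang_200zh": 200,
    "MagicData": 755,
    "AliMeeting": 100,
    "WeNetSpeech": 10_000,
    "KeSpeech": 1_542,
}


def hour_shares(hours: dict[str, int]) -> dict[str, float]:
    """Return each corpus's fraction of the summed hours."""
    total = sum(hours.values())
    return {name: h / total for name, h in hours.items()}


if __name__ == "__main__":
    total = sum(TRAINING_HOURS.values())
    print(f"Sum of listed corpora: {total:,} hours")
    shares = hour_shares(TRAINING_HOURS)
    # Print corpora from largest to smallest share.
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<18} {share:6.2%}")
```

Summing the rows as listed gives 14,131 hours, with WeNetSpeech accounting for roughly 71% of it.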

Included Test Sets

  1. AiShell-{1,2,4}
  2. Aidatatang_200zh
  3. AliMeeting
  4. MagicData
  5. KeSpeech-ASR
  6. WeNetSpeech