Introduction

This recipe includes scripts for training a Zipformer model on multiple Chinese datasets.

Included Training Sets

  1. THCHS-30
  2. AiShell-{1,2,4}
  3. ST-CMDS
  4. Primewords
  5. MagicData
  6. Aidatatang_200zh
  7. AliMeeting
  8. WeNetSpeech
  9. KeSpeech-ASR

Dataset            Number of hours   URL
TOTAL              14,106            ---
THCHS-30           35                https://www.openslr.org/18/
AiShell-1          170               https://www.openslr.org/33/
AiShell-2          1,000             http://www.aishelltech.com/aishell_2
AiShell-4          120               https://www.openslr.org/111/
ST-CMDS            110               https://www.openslr.org/38/
Primewords         99                https://www.openslr.org/47/
aidatatang_200zh   200               https://www.openslr.org/62/
MagicData          755               https://www.openslr.org/68/
AliMeeting         100               https://openslr.org/119/
WeNetSpeech        10,000            https://github.com/wenet-e2e/WenetSpeech
KeSpeech           1,542             https://github.com/KeSpeech/KeSpeech

Included Test Sets

  1. AiShell-{1,2,4}
  2. Aidatatang_200zh
  3. AliMeeting
  4. MagicData
  5. KeSpeech-ASR
  6. WeNetSpeech