mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-09 18:12:19 +00:00
# Introduction

This recipe includes scripts for training a Zipformer model on multiple Chinese datasets.

# Included Training Sets

1. THCHS-30
2. AiShell-{1,2,4}
3. ST-CMDS
4. Primewords
5. MagicData
6. Aidatatang_200zh
7. AliMeeting
8. WeNetSpeech
9. KeSpeech-ASR

|Dataset|Number of hours|URL|
|---|---:|---|
|**TOTAL**|14,106|---|
|THCHS-30|35|https://www.openslr.org/18/|
|AiShell-1|170|https://www.openslr.org/33/|
|AiShell-2|1,000|http://www.aishelltech.com/aishell_2|
|AiShell-4|120|https://www.openslr.org/111/|
|ST-CMDS|110|https://www.openslr.org/38/|
|Primewords|99|https://www.openslr.org/47/|
|Aidatatang_200zh|200|https://www.openslr.org/62/|
|MagicData|755|https://www.openslr.org/68/|
|AliMeeting|100|https://openslr.org/119/|
|WeNetSpeech|10,000|https://github.com/wenet-e2e/WenetSpeech|
|KeSpeech|1,542|https://github.com/KeSpeech/KeSpeech|

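
To get a feel for how the training hours are distributed across corpora, the per-dataset figures from the table can be tallied with a short script. This is only a minimal sketch: the `HOURS` dictionary and the `shares` helper are illustrative names and are not part of the recipe. The per-dataset rows as listed add up to 14,131 hours, so the shares below are computed against that sum rather than against the TOTAL row.

```python
# Hours per training set, copied from the table in this README.
# HOURS and shares() are hypothetical names used only for illustration.
HOURS = {
    "THCHS-30": 35,
    "AiShell-1": 170,
    "AiShell-2": 1000,
    "AiShell-4": 120,
    "ST-CMDS": 110,
    "Primewords": 99,
    "Aidatatang_200zh": 200,
    "MagicData": 755,
    "AliMeeting": 100,
    "WeNetSpeech": 10000,
    "KeSpeech": 1542,
}


def shares(hours):
    """Return each dataset's fraction of the summed hours."""
    total = sum(hours.values())
    return {name: h / total for name, h in hours.items()}


if __name__ == "__main__":
    print(f"Sum of listed rows: {sum(HOURS.values()):,} hours")
    # Print the datasets from largest to smallest share.
    for name, frac in sorted(shares(HOURS).items(), key=lambda kv: -kv[1]):
        print(f"{name:<18}{frac:>7.1%}")
```

WeNetSpeech accounts for the bulk of the listed hours, which is worth keeping in mind when comparing results across the individual test sets.
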
# Included Test Sets

1. AiShell-{1,2,4}
2. Aidatatang_200zh
3. AliMeeting
4. MagicData
5. KeSpeech-ASR
6. WeNetSpeech