icefall/RESULTS.md at e22bc78f9827ce4059cd4598c19ad08415802c0a

Yuekai Zhang d5be739639

add distill whisper results (#1648 )

2024-06-13 00:20:04 +08:00

Results

SpeechIO Test Set Decoding Results

Unlocked SpeechIO test sets (ZH00001 ~ ZH00026)

Rank	Model	CER	Date
1	ximalaya_api_zh	1.72%	2023.12
2	aliyun_ftasr_api_zh	1.85%	2023.12
3	microsoft_batch_zh	2.40%	2023.12
4	bilibili_api_zh	2.90%	2023.09
5	tencent_api_zh	3.18%	2023.12
6	iflytek_lfasr_api_zh	3.32%	2023.12
7	aispeech_api_zh	3.62%	2023.12
8	whisper-large-ft-v1	4.32%	2024.04
9	whisper-large-ft-v0.5	4.60%	2024.04
10	whisper-large-ft-v1-distill	4.71%	2024.04
11	zipformer (70Mb)	6.17%	2023.10
12	whisper-large-ft-v0	6.34%	2023.03
13	baidu_pro_api_zh	7.29%	2023.12

Note: The API results above are from SPEECHIO. All results use the default text normalization method.
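For readers unfamiliar with the metric, CER can be sketched as the character-level edit distance between reference and hypothesis, divided by the number of reference characters. The function names below are illustrative and are not taken from the icefall or SpeechIO scoring scripts; text normalization, which the note above says is applied, is left out.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(
                min(
                    prev[j] + 1,             # deletion
                    cur[j - 1] + 1,          # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = cur
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER in percent: total errors / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(f"{cer(['今天天气很好'], ['今天天汽很好']):.2f}")
```

Real scoring normalizes both sides first (numbers, punctuation, casing), so values from this sketch are only comparable to the table if the same normalization is applied beforehand.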

whisper-large-ft-v1-distill was not trained with a distillation loss. It only adopts the model structure and parameter initialization from the distill-whisper paper: only the first and last layers of the decoder are retained.
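The layer selection described above can be sketched as follows. This is a hypothetical illustration, not the training code: the layer names are placeholders, and the count of 32 decoder layers is the Whisper large configuration. In a PyTorch model the same selection would be applied to the decoder's list of blocks before fine-tuning.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_first_and_last(layers: Sequence[T]) -> List[T]:
    """Return a two-layer decoder stack made of the first and last layers."""
    if len(layers) < 2:
        raise ValueError("need at least two decoder layers")
    return [layers[0], layers[-1]]


if __name__ == "__main__":
    # Placeholder names standing in for the 32 decoder blocks of Whisper large.
    teacher_layers = [f"decoder.blocks.{i}" for i in range(32)]
    print(keep_first_and_last(teacher_layers))
```

The retained layers keep their fine-tuned or pretrained weights as initialization; the encoder is left unchanged in this scheme.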

Details of all models

Model	Training Set	Note
zipformer	multi-hans-zh	decoding with transducer head and blank penalty 2.0
whisper-large-ft-v0	wenetspeech	greedy_search, 3 epochs
whisper-large-ft-v0.5	wenetspeech (updated)	wenetspeech update method, greedy_search, 2 epochs
whisper-large-ft-v1	wenetspeech (updated), other multi-hans-zh sets excluding datatang 200h	wenetspeech update method, greedy_search, 3 epochs
whisper-large-ft-v1-distill	wenetspeech (updated), other multi-hans-zh sets excluding datatang 200h	wenetspeech update method, greedy_search, 6 epochs

Detailed results (CER %)

Test Set ID	Scenario & content domain	bilibili_api_zh (2023.09)	whisper-large-ft-v0	whisper-large-ft-v1	zipformer
Avg (01-26)		2.9	6.34	4.32	6.17
SPEECHIO_ASR_ZH00001	News broadcast (新闻联播)	0.54	1.42	1.09	1.37
SPEECHIO_ASR_ZH00002	Interview (鲁豫有约)	2.78	4.76	3.21	4.67
SPEECHIO_ASR_ZH00003	TV program (天下足球)	0.81	2.17	1.70	2.71
SPEECHIO_ASR_ZH00004	Venue speech (罗振宇跨年)	1.48	2.53	1.86	2.54
SPEECHIO_ASR_ZH00005	Online education, popular science (李永乐)	1.47	4.27	1.95	3.12
SPEECHIO_ASR_ZH00006	Live stream, 王者荣耀 (张大仙 & 骚白)	5.85	12.55	9.46	12.86
SPEECHIO_ASR_ZH00007	Live-stream e-commerce (李佳琪 & 薇娅)	6.19	13.38	10.38	14.58
SPEECHIO_ASR_ZH00008	Offline training (老罗语录)	3.68	9.56	6.9	9.05
SPEECHIO_ASR_ZH00009	Podcast (故事FM)	3.18	5.66	3.78	5.4
SPEECHIO_ASR_ZH00010	Podcast (创业内幕)	3.51	7.84	4.36	6.4
SPEECHIO_ASR_ZH00011	Online education, criminal law exam prep (罗翔)	1.77	3.22	2.40	3.12
SPEECHIO_ASR_ZH00012	Online education, postgraduate exam prep (张雪峰)	2.11	5.98	3.03	4.41
SPEECHIO_ASR_ZH00013	Short video, film recaps (谷阿莫 & 牛叔说电影)	2.97	5.91	3.72	6.56
SPEECHIO_ASR_ZH00014	Short video, cooking (美式 & 烹饪)	3.56	6.03	4.92	8.14
SPEECHIO_ASR_ZH00015	Pingshu storytelling (单田芳, 白眉大侠)	4.72	8.77	7.92	9.1
SPEECHIO_ASR_ZH00016	Crosstalk (德云社专场)	3.01	5.24	4.15	5.59
SPEECHIO_ASR_ZH00017	Stand-up comedy (吐槽大会)	2.93	7.05	3.04	5.17
SPEECHIO_ASR_ZH00018	Children's cartoons (小猪佩奇 & 熊出没)	1.98	3.53	3.27	4.15
SPEECHIO_ASR_ZH00019	Sports commentary (NBA games)	2.32	6.89	4.39	6.66
SPEECHIO_ASR_ZH00020	Documentary (篮球人物)	1.51	4.16	3.04	4.2
SPEECHIO_ASR_ZH00021	Short video, car reviews (汽车之家)	1.75	4.77	2.69	4.17
SPEECHIO_ASR_ZH00022	Short video, luxury home tours (小艾大叔)	3.29	6.35	5.44	6.72
SPEECHIO_ASR_ZH00023	Short video, unboxing (Zeal & 无聊开箱)	2.18	8.99	4.08	7.94
SPEECHIO_ASR_ZH00024	Short video, farming (付老师)	4.80	10.81	6.06	8.64
SPEECHIO_ASR_ZH00025	Offline classroom, ancient Greek philosophy (石国鹏)	3.32	8.41	5.39	8.54
SPEECHIO_ASR_ZH00026	Radio program, ghost stories (张震)	3.70	4.52	4.06	4.67
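The Avg (01-26) row is consistent with an unweighted mean over the 26 test sets, not a mean weighted by character count. The check below reproduces it for two columns, using the per-set values copied from the table; the helper name `macro_average` is only illustrative.

```python
from statistics import mean

# Per-set CERs (%) for ZH00001 through ZH00026, copied from the table above.
bilibili_api_zh = [
    0.54, 2.78, 0.81, 1.48, 1.47, 5.85, 6.19, 3.68, 3.18, 3.51, 1.77, 2.11, 2.97,
    3.56, 4.72, 3.01, 2.93, 1.98, 2.32, 1.51, 1.75, 3.29, 2.18, 4.80, 3.32, 3.70,
]
whisper_large_ft_v1 = [
    1.09, 3.21, 1.70, 1.86, 1.95, 9.46, 10.38, 6.9, 3.78, 4.36, 2.40, 3.03, 3.72,
    4.92, 7.92, 4.15, 3.04, 3.27, 4.39, 3.04, 2.69, 5.44, 4.08, 6.06, 5.39, 4.06,
]


def macro_average(cers):
    """Unweighted mean of per-test-set CERs, rounded to two decimals."""
    return round(mean(cers), 2)


if __name__ == "__main__":
    print(f"bilibili_api_zh:     {macro_average(bilibili_api_zh):.2f}")
    print(f"whisper-large-ft-v1: {macro_average(whisper_large_ft_v1):.2f}")
```

Because every test set counts equally, a hard set such as ZH00007 moves the average as much as an easy one, regardless of how much audio each contains.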

Command for decoding using fine-tuned whisper:

git lfs install
git clone https://huggingface.co/yuekai/icefall_asr_multi-hans-zh_whisper
ln -s icefall_asr_multi-hans-zh_whisper/v1.1/epoch-3-avg-10.pt whisper/exp_large_v2/epoch-999.pt

python3 ./whisper/decode.py \
  --exp-dir whisper/exp_large_v2 \
  --model-name large-v2 \
  --epoch 999 --avg 1 \
  --start-index 0 --end-index 26 \
  --remove-whisper-encoder-input-length-restriction True \
  --manifest-dir data/fbank \
  --beam-size 1 --max-duration 50
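The `ln -s` step above exposes the downloaded checkpoint under the name `epoch-999.pt`, presumably so that `--epoch 999 --avg 1` selects it. A relative symlink target is resolved against the directory that contains the link, so the link can dangle depending on where the repository was cloned. A minimal sketch of the same step with an absolute target is shown here; the helper `link_checkpoint` is hypothetical and not part of icefall.

```python
import tempfile
from pathlib import Path


def link_checkpoint(src: Path, exp_dir: Path, epoch: int = 999) -> Path:
    """Create exp_dir/epoch-<epoch>.pt as a symlink to the absolute path of src."""
    exp_dir.mkdir(parents=True, exist_ok=True)
    link = exp_dir / f"epoch-{epoch}.pt"
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link from an earlier run
    link.symlink_to(src.resolve())
    return link


if __name__ == "__main__":
    # Demo in a temporary directory with a dummy checkpoint file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        ckpt = root / "icefall_asr_multi-hans-zh_whisper" / "v1.1" / "epoch-3-avg-10.pt"
        ckpt.parent.mkdir(parents=True)
        ckpt.write_bytes(b"dummy")
        link = link_checkpoint(ckpt, root / "whisper" / "exp_large_v2")
        print(link.name, "->", link.resolve())
```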

Command for decoding using pretrained zipformer:

git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24
cd icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24
git lfs pull --include "exp/pretrained.pt"
git lfs pull --include "data/lang_bpe_2000/*"
cd ..
ln -s ../icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24/exp/pretrained.pt zipformer/exp_pretrain/epoch-999.pt
ln -s ../icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24/data/lang_bpe_2000/ ./data
wget https://huggingface.co/pkufool/icefall-asr-zipformer-wenetspeech-20230615/resolve/main/data/lang_char/words.txt
mv words.txt ./data/lang_bpe_2000/

./zipformer/decode.py \
    --epoch 999 \
    --avg 1 \
    --blank-penalty 2.0 \
    --use-averaged-model false \
    --exp-dir ./zipformer/exp_pretrain \
    --max-duration 600 \
    --start-index 0 --end-index 26 \
    --manifest-dir data/fbank_kaldi \
    --decoding-method greedy_search
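The `--blank-penalty 2.0` option used above can be understood as subtracting a constant from the blank symbol's score before the greedy choice, which discourages the transducer from emitting blank and so reduces deletions. The sketch below illustrates that idea on a plain list of logits; it assumes the blank symbol has index 0 and is not the icefall implementation.

```python
from typing import Sequence


def greedy_token(
    logits: Sequence[float], blank_penalty: float = 0.0, blank_id: int = 0
) -> int:
    """Pick the highest-scoring token after penalizing the blank symbol."""
    adjusted = list(logits)
    adjusted[blank_id] -= blank_penalty
    return max(range(len(adjusted)), key=adjusted.__getitem__)


if __name__ == "__main__":
    frame_logits = [3.0, 2.5, 1.0]  # index 0 is blank in this toy example
    print(greedy_token(frame_logits))                     # blank wins without a penalty
    print(greedy_token(frame_logits, blank_penalty=2.0))  # token 1 wins with the penalty
```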

SpeechIO fbank features, decoding scripts, logs, and decoding results are available at part1 and part2.
