
# Results

## Zipformer PromptASR (zipformer + PromptASR + BERT text encoder)

The recipe is in [zipformer_prompt_asr](./zipformer_prompt_asr).

See https://github.com/k2-fsa/icefall/pull/1250 for the commit history and our paper https://arxiv.org/abs/2309.07414 for more details.

### Training on the medium subset, with content & style prompt, no context list

You can find a pre-trained model, training logs, decoding logs, and decoding results at: <>

Number of model parameters:

| decoding method | lh-test-clean WER (%) | lh-test-other WER (%) | comment |
|----------------------|------|------|---------|
| modified_beam_search | 2.64 | 5.55 | `--pre-text-transform mixed-punc --style-text-transform mixed-punc` |
| modified_beam_search | 2.82 | 6.03 | `--pre-text-transform upper-no-punc --style-text-transform upper-no-punc` |
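
For reference, a decoding invocation for the mixed-punc row could look like the sketch below. The script name `decode_bert.py` and the `--epoch`/`--avg` checkpoint-averaging flags are assumptions based on the usual icefall recipe layout; the prompt-related flags are the ones listed in the table.

```bash
# Hypothetical decoding command: decode_bert.py and --epoch/--avg are
# assumptions; the prompt flags are taken from the table above.
python ./zipformer_prompt_asr/decode_bert.py \
    --epoch 60 \
    --avg 10 \
    --exp-dir ./zipformer_prompt_asr/exp \
    --manifest-dir data/fbank \
    --bpe-model data/lang_bpe_500_fallback_coverage_0.99/bpe.model \
    --max-duration 1000 \
    --decoding-method modified_beam_search \
    --use-pre-text 1 \
    --use-style-prompt 1 \
    --pre-text-transform mixed-punc \
    --style-text-transform mixed-punc
```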

The training command is:

```bash
causal=0
subset=medium
memory_dropout_rate=0.05
text_encoder_type=BERT

python ./zipformer_prompt_asr/train_bert_encoder.py \
    --world-size 4 \
    --start-epoch 1 \
    --num-epochs 60 \
    --exp-dir ./zipformer_prompt_asr/exp \
    --use-fp16 True \
    --memory-dropout-rate $memory_dropout_rate \
    --causal $causal \
    --subset $subset \
    --manifest-dir data/fbank \
    --bpe-model data/lang_bpe_500_fallback_coverage_0.99/bpe.model \
    --max-duration 1000 \
    --text-encoder-type $text_encoder_type \
    --use-context-list 0 \
    --use-style-prompt 1
```

### Training on the medium subset, with content & style prompt, with context list

You can find a pre-trained model, training logs, decoding logs, and decoding results at: <>

Number of model parameters:

Utterance-level biasing:

| decoding method | lh-test-clean WER (%) | lh-test-other WER (%) | comment |
|----------------------|------|------|---------|
| modified_beam_search | 3.11 | 6.79 | `--use-pre-text 0 --use-style-prompt 0` |
| modified_beam_search | 2.82 | 6.03 | `--pre-text-transform upper-no-punc --style-text-transform upper-no-punc` |
| modified_beam_search | 2.64 | 5.55 | `--pre-text-transform mixed-punc --style-text-transform mixed-punc` |
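
The `--pre-text-transform` and `--style-text-transform` values describe how the prompt text is normalized before being fed to the BERT text encoder. A minimal sketch of the two styles (my reading of the flag names; the recipe's actual normalization code may differ):

```bash
# Hypothetical illustration of the two prompt styles; the recipe's
# actual text normalization may differ.
text="Hello, world! This is PromptASR."

# mixed-punc: keep the original mixed casing and punctuation.
echo "$text"    # -> Hello, world! This is PromptASR.

# upper-no-punc: upper-case and strip punctuation, like classic ASR transcripts.
echo "$text" | tr '[:lower:]' '[:upper:]' | tr -d '[:punct:]'
                # -> HELLO WORLD THIS IS PROMPTASR
```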

Word-level biasing:

The results are reported on the LibriSpeech test sets, using the biasing lists provided in https://arxiv.org/abs/2104.02194. You need to set `--use-ls-test-set 1` for the following table.

| decoding method | ls-test-clean WER (%) | ls-test-other WER (%) | comment |
|----------------------|------|------|---------|
| modified_beam_search | 2.69 | 5.28 | `--use-pre-text 0 --use-style-prompt 0` |
| modified_beam_search | 2.32 | 4.77 | `--use-ls-context-list 1 --pre-text-transform mixed-punc --style-text-transform mixed-punc --ls-distractors 0` |
| modified_beam_search | 2.36 | 4.91 | `--use-ls-context-list 1 --pre-text-transform mixed-punc --style-text-transform mixed-punc --ls-distractors 100` |
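
A decoding sketch for the last row of this table, again assuming `decode_bert.py` and the standard icefall `--epoch`/`--avg` flags; the biasing flags come from the table:

```bash
# Hypothetical word-level biasing decode on the LibriSpeech test sets;
# decode_bert.py and --epoch/--avg are assumptions, the biasing flags
# are from the table above.
python ./zipformer_prompt_asr/decode_bert.py \
    --epoch 60 \
    --avg 10 \
    --exp-dir ./zipformer_prompt_asr/exp \
    --manifest-dir data/fbank \
    --bpe-model data/lang_bpe_500_fallback_coverage_0.99/bpe.model \
    --decoding-method modified_beam_search \
    --use-ls-test-set 1 \
    --use-ls-context-list 1 \
    --ls-distractors 100 \
    --pre-text-transform mixed-punc \
    --style-text-transform mixed-punc
```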

Note: to train this model, first run `prepare_prompt_asr.sh` to prepare a manifest containing context words.

The training command is:


```bash
causal=0
subset=medium
memory_dropout_rate=0.05
text_encoder_type=BERT

# prepare the required data for context biasing training & decoding
./prepare_prompt_asr.sh --stage 0 --stop_stage 1

python ./zipformer_prompt_asr/train_bert_encoder.py \
    --world-size 4 \
    --start-epoch 1 \
    --num-epochs 60 \
    --exp-dir ./zipformer_prompt_asr/exp \
    --use-fp16 True \
    --memory-dropout-rate $memory_dropout_rate \
    --causal $causal \
    --subset $subset \
    --manifest-dir data/fbank \
    --bpe-model data/lang_bpe_500_fallback_coverage_0.99/bpe.model \
    --max-duration 1000 \
    --text-encoder-type $text_encoder_type \
    --use-context-list 1 \
    --top-k 10000 \
    --use-style-prompt 1
```