From 22f011e5ab13ce881121055c35e655ed5aa98c0d Mon Sep 17 00:00:00 2001
From: Guanbo Wang
Date: Mon, 11 Apr 2022 21:46:40 +0000
Subject: [PATCH] Update results

---
 egs/gigaspeech/ASR/README.md  |  2 +-
 egs/gigaspeech/ASR/RESULTS.md | 41 +++++++++++++++++++++++++++++------
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/egs/gigaspeech/ASR/README.md b/egs/gigaspeech/ASR/README.md
index 5ccbcd9ac..7796ef2a0 100644
--- a/egs/gigaspeech/ASR/README.md
+++ b/egs/gigaspeech/ASR/README.md
@@ -15,6 +15,6 @@ ln -sfv /path/to/GigaSpeech download/GigaSpeech
 ## Performance Record
 |     |  Dev  | Test  |
 |-----|-------|-------|
-| WER | 11.93 | 11.86 |
+| WER | 10.47 | 10.58 |
 
 See [RESULTS](/egs/gigaspeech/ASR/RESULTS.md) for details.
diff --git a/egs/gigaspeech/ASR/RESULTS.md b/egs/gigaspeech/ASR/RESULTS.md
index 5c8e8a84f..8ceeba39f 100644
--- a/egs/gigaspeech/ASR/RESULTS.md
+++ b/egs/gigaspeech/ASR/RESULTS.md
@@ -5,22 +5,23 @@
 #### 2022-04-06
 
 The best WER, as of 2022-04-06, for the gigaspeech is below
-(using HLG decoding + n-gram LM rescoring + attention decoder rescoring):
+
+Results using HLG decoding + n-gram LM rescoring + attention decoder rescoring:
 
 |     |  Dev  | Test  |
 |-----|-------|-------|
-| WER | 11.93 | 11.86 |
+| WER | 10.47 | 10.58 |
 
 Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
 
 | ngram_lm_scale | attention_scale |
 |----------------|-----------------|
-| 0.3 | 1.5 |
+| 0.5 | 1.3 |
 
 To reproduce the above result, use the following commands for training:
 
 ```
-cd egs/gigaspeech/ASR/conformer_ctc
+cd egs/gigaspeech/ASR
 ./prepare.sh
 export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
 ./conformer_ctc/train.py \
@@ -31,12 +32,12 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
   --lang-dir data/lang_bpe_500
 ```
 
-and the following command for decoding
+and the following command for decoding:
 
 ```
 ./conformer_ctc/decode.py \
-  --epoch 19 \
-  --avg 8 \
+  --epoch 18 \
+  --avg 6 \
   --method attention-decoder \
   --num-paths 1000 \
   --exp-dir conformer_ctc/exp_500 \
@@ -47,3 +48,29 @@ and the following command for decoding
 
 The tensorboard log for training is available at
+
+Results using HLG decoding + whole lattice rescoring:
+
+|     |  Dev  | Test  |
+|-----|-------|-------|
+| WER | 10.51 | 10.62 |
+
+Scale value used in whole lattice rescoring for the best WERs is:
+| lm_scale |
+|----------|
+| 0.2 |
+
+To reproduce the above result, use the training commands above, and the following command for decoding:
+
+```
+./conformer_ctc/decode.py \
+  --epoch 18 \
+  --avg 6 \
+  --method whole-lattice-rescoring \
+  --num-paths 1000 \
+  --exp-dir conformer_ctc/exp_500 \
+  --lang-dir data/lang_bpe_500 \
+  --max-duration 20 \
+  --num-workers 1
+```
+Note: the `whole-lattice-rescoring` method is about twice as fast as the `attention-decoder` method, with slightly worse WER.
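
A rough way to check the speed difference mentioned in the note above is to time both decoding runs on the same machine. This is only a sketch: it reuses the exact flags from the commands in the patch and wraps them with the standard `time` utility; actual wall-clock numbers will depend on the GPU, `--max-duration`, and data loading.

```
# Rough wall-clock comparison of the two rescoring methods.
# All flags are the same as in the commands above; timings vary with hardware.
time ./conformer_ctc/decode.py \
  --epoch 18 \
  --avg 6 \
  --method attention-decoder \
  --num-paths 1000 \
  --exp-dir conformer_ctc/exp_500 \
  --lang-dir data/lang_bpe_500 \
  --max-duration 20 \
  --num-workers 1

time ./conformer_ctc/decode.py \
  --epoch 18 \
  --avg 6 \
  --method whole-lattice-rescoring \
  --num-paths 1000 \
  --exp-dir conformer_ctc/exp_500 \
  --lang-dir data/lang_bpe_500 \
  --max-duration 20 \
  --num-workers 1
```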
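
To try these commands before the change is merged, the patch can be applied to a local checkout first. A minimal sketch, assuming the patch is saved as `update-results.patch` at the root of an icefall clone (both the file name and the clone path are placeholders):

```
# Preview and apply the patch in a local icefall checkout.
# update-results.patch and /path/to/icefall are placeholders.
cd /path/to/icefall
git apply --stat update-results.patch    # show which files would change
git apply --check update-results.patch   # verify the patch applies cleanly
git am update-results.patch              # apply it and create the commit
```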