diff --git a/egs/gigaspeech/ASR/README.md b/egs/gigaspeech/ASR/README.md
index c469b6cee..5ccbcd9ac 100644
--- a/egs/gigaspeech/ASR/README.md
+++ b/egs/gigaspeech/ASR/README.md
@@ -13,6 +13,8 @@ ln -sfv /path/to/GigaSpeech download/GigaSpeech
 ```

 ## Performance Record
-| |Dev|Test|
-|---|---|---|
-|WER |11.92|11.85|
+|     |  Dev  | Test  |
+|-----|-------|-------|
+| WER | 11.93 | 11.86 |
+
+See [RESULTS](/egs/gigaspeech/ASR/RESULTS.md) for details.
diff --git a/egs/gigaspeech/ASR/RESULTS.md b/egs/gigaspeech/ASR/RESULTS.md
new file mode 100644
index 000000000..5c8e8a84f
--- /dev/null
+++ b/egs/gigaspeech/ASR/RESULTS.md
@@ -0,0 +1,49 @@
+## Results
+
+### GigaSpeech BPE training results (Conformer-CTC)
+
+#### 2022-04-06
+
+The best WER for GigaSpeech, as of 2022-04-06, is below
+(using HLG decoding + n-gram LM rescoring + attention decoder rescoring):
+
+|     |  Dev  | Test  |
+|-----|-------|-------|
+| WER | 11.93 | 11.86 |
+
+Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
+| ngram_lm_scale | attention_scale |
+|----------------|-----------------|
+| 0.3            | 1.5             |
+
+
+To reproduce the above result, use the following commands for training:
+
+```
+cd egs/gigaspeech/ASR
+./prepare.sh
+export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+./conformer_ctc/train.py \
+  --max-duration 120 \
+  --num-workers 1 \
+  --world-size 8 \
+  --exp-dir conformer_ctc/exp_500 \
+  --lang-dir data/lang_bpe_500
+```
+
+and the following command for decoding:
+
+```
+./conformer_ctc/decode.py \
+  --epoch 19 \
+  --avg 8 \
+  --method attention-decoder \
+  --num-paths 1000 \
+  --exp-dir conformer_ctc/exp_500 \
+  --lang-dir data/lang_bpe_500 \
+  --max-duration 20 \
+  --num-workers 1
+```
+
+The tensorboard log for training is available at