From 3cecabdc84f9ae687fe41544d57adad8a7fe5610 Mon Sep 17 00:00:00 2001
From: marcoyang1998
Date: Thu, 27 Jul 2023 11:49:34 +0800
Subject: [PATCH] minor updates

---
 .../decoding-with-langugage-models/index.rst | 25 ++++++++++---------
 .../rescoring.rst                            | 12 ++++-----
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/docs/source/decoding-with-langugage-models/index.rst b/docs/source/decoding-with-langugage-models/index.rst
index 55649d9ee..6e5e3a4d9 100644
--- a/docs/source/decoding-with-langugage-models/index.rst
+++ b/docs/source/decoding-with-langugage-models/index.rst
@@ -6,22 +6,23 @@
 during decoding to improve the WER of transducer models. The following decoding
 methods with external langugage models are available:
 
-.. list-table:: Description of different decoding methods with external LM
+
+.. list-table:: LM-rescoring-based methods vs shallow-fusion-based methods
    :widths: 25 50
    :header-rows: 1
 
    * - Decoding method
-     - Description
-   * - `modified_beam_search`
-     - This one does not use language model. Beam search (i.e. really n-best decoding, the "beam" is the value of n), similar to the original RNN-T paper
-   * - `modified_beam_search_lm_shallow_fusion`
-     - As `modified_beam_search` but interpolate RNN-T scores with language model scores, also known as shallow fusion
-   * - `modified_beam_search_LODR`
-     - Low-order Density ratio. As `modified_beam_search_lm_shallow_fusion`, but subtract score of a (BPE-symbol-level) bigram backoff language model used as an approximation to the internal language model of RNN-T.
-   * - `modified_beam_search_lm_rescore`
-     - As `modified_beam_search`, but rescore the n-best hypotheses with external language model (e.g. RNNLM) and re-rank them.
-   * - `modified_beam_search_lm_rescore_LODR`
-     - As `modified_beam_search_lm_rescore`, but also subtract the score of a (BPE-symbol-level) bigram backoff language model during re-ranking.
+     - Description
+   * - ``modified_beam_search``
+     - Beam search (i.e. really n-best decoding, where the "beam" is the value of n), similar to the original RNN-T paper. Note that this method does not use an external language model.
+   * - ``modified_beam_search_lm_shallow_fusion``
+     - As ``modified_beam_search``, but interpolates RNN-T scores with language model scores, also known as shallow fusion.
+   * - ``modified_beam_search_LODR``
+     - As ``modified_beam_search_lm_shallow_fusion``, but subtracts the score of a (BPE-symbol-level) bigram backoff language model used as an approximation to the internal language model of RNN-T.
+   * - ``modified_beam_search_lm_rescore``
+     - As ``modified_beam_search``, but rescores the n-best hypotheses with an external language model (e.g. RNNLM) and re-ranks them.
+   * - ``modified_beam_search_lm_rescore_LODR``
+     - As ``modified_beam_search_lm_rescore``, but also subtracts the score of a (BPE-symbol-level) bigram backoff language model during re-ranking.
 
 .. toctree::
 
diff --git a/docs/source/decoding-with-langugage-models/rescoring.rst b/docs/source/decoding-with-langugage-models/rescoring.rst
index d71acc1e5..ee2e2113c 100644
--- a/docs/source/decoding-with-langugage-models/rescoring.rst
+++ b/docs/source/decoding-with-langugage-models/rescoring.rst
@@ -4,7 +4,7 @@ LM rescoring for Transducer
 =================================
 
 LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
-methods (see :ref:`shallow-fusion`, :ref:`LODR`), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
+methods (see :ref:`shallow_fusion`, :ref:`LODR`), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
 Rescoring is usually more efficient than shallow fusion since less computation is performed on the external LM. In this tutorial,
 we will show you how to use external LM to rescore the n-best hypotheses decoded from neural transducer models in `icefall <https://github.com/k2-fsa/icefall>`__.
 
@@ -225,23 +225,23 @@ Here, we benchmark the WERs and decoding speed of them:
      - beam=4
      - beam=8
      - beam=12
-   * - `modified_beam_search`
+   * - ``modified_beam_search``
      - 3.11/7.93; 132s
      - 3.1/7.95; 177s
      - 3.1/7.96; 210s
-   * - `modified_beam_search_lm_shallow_fusion`
+   * - ``modified_beam_search_lm_shallow_fusion``
      - 2.77/7.08; 262s
      - 2.62/6.65; 352s
      - 2.58/6.65; 488s
-   * - LODR
+   * - ``modified_beam_search_LODR``
      - 2.61/6.74; 400s
      - 2.45/6.38; 610s
      - 2.4/6.23; 870s
-   * - `modified_beam_search_lm_rescore`
+   * - ``modified_beam_search_lm_rescore``
      - 2.93/7.6; 156s
      - 2.67/7.11; 203s
      - 2.59/6.86; 255s
-   * - `modified_beam_search_lm_rescore_LODR`
+   * - ``modified_beam_search_lm_rescore_LODR``
      - 2.9/7.57; 160s
      - 2.63/7.04; 203s
      - 2.52/6.73; 263s
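
The decoding methods compared in the tables above differ only in how each hypothesis's total score is assembled. As a minimal sketch of the two rescoring variants (this is not code from this patch or from icefall; the names ``Hypothesis`` and ``rescore_nbest`` and the scale values are hypothetical)::

   # Sketch only: combined-score computation for the rescoring methods.
   # All names and scale values here are hypothetical, not icefall's API.
   from dataclasses import dataclass
   from typing import List

   @dataclass
   class Hypothesis:
       tokens: List[int]          # BPE token ids of one n-best hypothesis
       rnnt_score: float          # log-prob assigned by the transducer
       lm_score: float = 0.0      # log-prob from the external LM (e.g. RNNLM)
       bigram_score: float = 0.0  # log-prob from the bigram backoff LM

   def rescore_nbest(hyps: List[Hypothesis],
                     lm_scale: float = 0.4,
                     lodr_scale: float = 0.16,
                     use_lodr: bool = False) -> List[Hypothesis]:
       """Re-rank n-best hypotheses.

       use_lodr=False mirrors modified_beam_search_lm_rescore:
           total = rnnt_score + lm_scale * lm_score
       use_lodr=True mirrors modified_beam_search_lm_rescore_LODR, which
       also subtracts the score of the bigram backoff LM (an approximation
       to the internal LM of the transducer):
           total = rnnt_score + lm_scale * lm_score - lodr_scale * bigram_score
       """
       def total(h: Hypothesis) -> float:
           score = h.rnnt_score + lm_scale * h.lm_score
           if use_lodr:
               score -= lodr_scale * h.bigram_score
           return score

       # The hypothesis with the highest combined log-probability ranks first.
       return sorted(hyps, key=total, reverse=True)

   # Usage: re-rank two hypotheses with the LODR term enabled.
   nbest = [
       Hypothesis(tokens=[12, 7, 3], rnnt_score=-3.2, lm_score=-4.1, bigram_score=-6.0),
       Hypothesis(tokens=[12, 7, 9], rnnt_score=-3.5, lm_score=-2.4, bigram_score=-5.2),
   ]
   print(rescore_nbest(nbest, use_lodr=True)[0].tokens)

The shallow-fusion variants combine the same terms, but inside the beam-search loop at every decoding step rather than once per complete hypothesis, which is why they spend more computation on the external LM than rescoring does in the benchmark above.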