Update descriptions for different decoding methods with external LMs (#1185)

* add some descriptions

* minor updates
marcoyang1998 2023-07-27 12:08:20 +08:00 committed by GitHub
parent 80d922c158
commit 625b33e9ad
2 changed files with 30 additions and 5 deletions


@@ -4,6 +4,27 @@ Decoding with language models
This section describes how to use external language models
during decoding to improve the WER of transducer models.
The following decoding methods with external language models are available:

.. list-table:: Description of the different decoding methods with external LMs
   :widths: 25 50
   :header-rows: 1

   * - Decoding method
     - Description
   * - ``modified_beam_search``
     - Beam search (i.e. really n-best decoding, the "beam" is the value of n), similar to the original RNN-T paper. Note that this method does not use a language model.
   * - ``modified_beam_search_lm_shallow_fusion``
     - As ``modified_beam_search``, but interpolates RNN-T scores with external language model scores, also known as shallow fusion (see the score-combination sketch after this table).
   * - ``modified_beam_search_LODR``
     - As ``modified_beam_search_lm_shallow_fusion``, but subtracts the score of a (BPE-symbol-level) bigram backoff language model, used as an approximation to the internal language model of the RNN-T.
   * - ``modified_beam_search_lm_rescore``
     - As ``modified_beam_search``, but rescores the n-best hypotheses with an external language model (e.g. an RNNLM) and re-ranks them.
   * - ``modified_beam_search_lm_rescore_LODR``
     - As ``modified_beam_search_lm_rescore``, but also subtracts the score of a (BPE-symbol-level) bigram backoff language model during re-ranking.
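
The scoring rule shared by the shallow-fusion-based methods can be written in a few lines. Below is a minimal sketch, not icefall's actual implementation; the function name, arguments, and scale values are illustrative assumptions:

.. code-block:: python

   # A minimal sketch of how shallow fusion and LODR combine scores for
   # one candidate token during beam search. Names and scales are
   # illustrative, not icefall's actual API.
   def combined_score(
       transducer_logp: float,    # log-prob of the token from the RNN-T joiner
       lm_logp: float,            # log-prob from the external neural LM
       bigram_logp: float,        # log-prob from the BPE-level bigram backoff LM
       lm_scale: float = 0.3,     # weight of the external LM (shallow fusion)
       lodr_scale: float = -0.2,  # negative: the bigram score is subtracted (LODR)
       use_lodr: bool = True,
   ) -> float:
       # Shallow fusion: interpolate the transducer score with the
       # external LM score.
       score = transducer_logp + lm_scale * lm_logp
       # LODR: additionally subtract a bigram approximation of the
       # transducer's internal LM so that it is not counted twice.
       if use_lodr:
           score += lodr_scale * bigram_logp
       return score

Setting ``use_lodr=False`` recovers plain shallow fusion. The rescoring-based variants apply a similar combination, but only once per complete n-best hypothesis instead of at every decoding step.
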
.. toctree::
:maxdepth: 2


@@ -4,7 +4,11 @@ LM rescoring for Transducer
=================================
LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
methods (see :ref:`shallow_fusion`, :ref:`LODR`), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
Rescoring is usually more efficient than shallow fusion, since the external LM is queried once per complete hypothesis rather than at every decoding step.
In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
`icefall <https://github.com/k2-fsa/icefall>`__.
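
To make the idea concrete, here is a minimal sketch of n-best rescoring; the ``nbest`` structure, the ``lm_logp`` callable, and the ``lm_scale`` value are illustrative assumptions, not icefall's actual API:

.. code-block:: python

   # A minimal sketch of n-best rescoring with an external LM.
   # `nbest` holds (token_ids, transducer_logp) pairs produced by beam
   # search; `lm_logp` scores a complete hypothesis with the external LM.
   from typing import Callable, List, Tuple

   def rescore_nbest(
       nbest: List[Tuple[List[int], float]],
       lm_logp: Callable[[List[int]], float],
       lm_scale: float = 0.3,
   ) -> List[int]:
       # Interpolate each hypothesis' transducer score with its external
       # LM score, then return the highest-scoring hypothesis.
       def total(item: Tuple[List[int], float]) -> float:
           hyp, transducer_score = item
           return transducer_score + lm_scale * lm_logp(hyp)

       return max(nbest, key=total)[0]

Because the external LM is evaluated only once per complete hypothesis, the cost grows with the n-best size rather than with the number of decoding steps and beams.
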
@@ -225,23 +229,23 @@ Here, we benchmark the WERs and decoding speed of them:
- beam=4
- beam=8
- beam=12
-   * - `modified_beam_search`
+   * - ``modified_beam_search``
- 3.11/7.93; 132s
- 3.1/7.95; 177s
- 3.1/7.96; 210s
-   * - `modified_beam_search_lm_shallow_fusion`
+   * - ``modified_beam_search_lm_shallow_fusion``
- 2.77/7.08; 262s
- 2.62/6.65; 352s
- 2.58/6.65; 488s
-   * - LODR
+   * - ``modified_beam_search_LODR``
- 2.61/6.74; 400s
- 2.45/6.38; 610s
- 2.4/6.23; 870s
-   * - `modified_beam_search_lm_rescore`
+   * - ``modified_beam_search_lm_rescore``
- 2.93/7.6; 156s
- 2.67/7.11; 203s
- 2.59/6.86; 255s
-   * - `modified_beam_search_lm_rescore_LODR`
+   * - ``modified_beam_search_lm_rescore_LODR``
- 2.9/7.57; 160s
- 2.63/7.04; 203s
- 2.52/6.73; 263s