Mirror of https://github.com/k2-fsa/icefall.git
synced 2025-08-09 01:52:41 +00:00
deploy: 625b33e9ad15961239ea77d12472428d8006085d
This commit is contained in:
parent 48b954a308
commit 8600aedaa3
@@ -4,6 +4,27 @@ Decoding with language models

This section describes how to use external language models
during decoding to improve the WER of transducer models.

The following decoding methods with external language models are available:

.. list-table:: LM-rescoring-based methods vs shallow-fusion-based methods
   :widths: 25 50
   :header-rows: 1

   * - Decoding method
     - Description
   * - ``modified_beam_search``
     - Beam search (i.e. really n-best decoding; the "beam" is the value of n), similar to the original RNN-T paper. Note that this method does not use a language model.
   * - ``modified_beam_search_lm_shallow_fusion``
     - Same as ``modified_beam_search``, but interpolates the RNN-T scores with language model scores, also known as shallow fusion.
   * - ``modified_beam_search_LODR``
     - Same as ``modified_beam_search_lm_shallow_fusion``, but subtracts the score of a (BPE-symbol-level) bigram backoff language model, used as an approximation to the internal language model of the RNN-T.
   * - ``modified_beam_search_lm_rescore``
     - Same as ``modified_beam_search``, but rescores the n-best hypotheses with an external language model (e.g. an RNNLM) and re-ranks them.
   * - ``modified_beam_search_lm_rescore_LODR``
     - Same as ``modified_beam_search_lm_rescore``, but also subtracts the score of a (BPE-symbol-level) bigram backoff language model during re-ranking.
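The score combinations behind the shallow-fusion and LODR methods can be sketched as simple per-hypothesis score updates. The following is a minimal illustration, not icefall's actual implementation; the function names and the scale values (0.3, 0.2) are hypothetical examples.

```python
def shallow_fusion_score(rnnt_score, lm_score, lm_scale=0.3):
    # modified_beam_search_lm_shallow_fusion: interpolate the RNN-T
    # log-score with the external LM log-score.
    return rnnt_score + lm_scale * lm_score


def lodr_score(rnnt_score, lm_score, bigram_score,
               lm_scale=0.3, lodr_scale=0.2):
    # modified_beam_search_LODR: additionally subtract the score of a
    # (BPE-symbol-level) bigram backoff LM, which serves as an
    # approximation to the RNN-T's internal LM, so that language
    # information is not counted twice.
    return rnnt_score + lm_scale * lm_score - lodr_scale * bigram_score
```

In practice the scales are tuned on a development set; if the LM scale is too large, the LM score can dominate decoding and hurt WER.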

.. toctree::
   :maxdepth: 2

@@ -4,7 +4,11 @@ LM rescoring for Transducer
=================================

LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
methods (see :ref:`shallow_fusion`, :ref:`LODR`), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
Rescoring is usually more efficient than shallow fusion, since less computation is performed on the external LM.
In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
`icefall <https://github.com/k2-fsa/icefall>`__.
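The re-ranking idea behind rescoring can be sketched in a few lines. This is a hedged illustration, not icefall's code: ``lm_log_prob`` stands in for any external LM scorer (e.g. an RNNLM), and the 0.3 scale is an arbitrary example value.

```python
def rescore_nbest(hypotheses, lm_log_prob, lm_scale=0.3):
    """Re-rank n-best hypotheses from beam search.

    ``hypotheses`` is a list of (token_ids, transducer_score) pairs;
    ``lm_log_prob`` is any callable returning the external LM's
    log-probability for a token sequence.
    """
    rescored = [
        (tokens, score + lm_scale * lm_log_prob(tokens))
        for tokens, score in hypotheses
    ]
    # The hypothesis with the highest combined score becomes the output.
    return sorted(rescored, key=lambda h: h[1], reverse=True)


# Toy usage: a length penalty acting as a stand-in "LM".
toy_lm = lambda tokens: -2.0 * len(tokens)
nbest = [([5, 9, 2], -4.0), ([5, 9], -4.5)]
best_tokens, best_score = rescore_nbest(nbest, toy_lm)[0]
```

Note that the external LM is evaluated only on the final n-best list, not at every decoding step, which is why rescoring is cheaper than shallow fusion.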
@@ -225,23 +229,23 @@ Here, we benchmark their WERs and decoding speed:
     - beam=4
     - beam=8
     - beam=12
   * - ``modified_beam_search``
     - 3.11/7.93; 132s
     - 3.1/7.95; 177s
     - 3.1/7.96; 210s
   * - ``modified_beam_search_lm_shallow_fusion``
     - 2.77/7.08; 262s
     - 2.62/6.65; 352s
     - 2.58/6.65; 488s
   * - ``modified_beam_search_LODR``
     - 2.61/6.74; 400s
     - 2.45/6.38; 610s
     - 2.4/6.23; 870s
   * - ``modified_beam_search_lm_rescore``
     - 2.93/7.6; 156s
     - 2.67/7.11; 203s
     - 2.59/6.86; 255s
   * - ``modified_beam_search_lm_rescore_LODR``
     - 2.9/7.57; 160s
     - 2.63/7.04; 203s
     - 2.52/6.73; 263s
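To read the table: each cell gives the test-clean WER, the test-other WER, and the decoding time on test-clean. At beam=4, for example, LODR lowers the test-clean WER from 3.11 to 2.61, a roughly 16% relative reduction. A one-line check:

```python
# WERs on test-clean at beam=4, taken from the table above.
baseline, lodr = 3.11, 2.61
relative_reduction = (baseline - lodr) / baseline * 100
print(f"{relative_reduction:.1f}% relative WER reduction")
```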

@@ -229,7 +229,7 @@ $ beam_size_4 6.74 best for test-other
<p>Recall that the lowest WER we obtained in <a class="reference internal" href="shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a> with a beam size of 4 is <code class="docutils literal notranslate"><span class="pre">2.77/7.08</span></code>; LODR
indeed <strong>further improves</strong> the WER. We can do even better if we increase <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code>:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 3 </span><span class="caption-text">WER of LODR with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />
@@ -92,6 +92,36 @@
<h1>Decoding with language models<a class="headerlink" href="#decoding-with-language-models" title="Permalink to this heading"></a></h1>
<p>This section describes how to use external language models
during decoding to improve the WER of transducer models.</p>
<p>The following decoding methods with external language models are available:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 1 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 67%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Decoding method</p></th>
<th class="head"><p>Description</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code></p></td>
<td><p>Beam search (i.e. really n-best decoding; the “beam” is the value of n), similar to the original RNN-T paper. Note that this method does not use a language model.</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_shallow_fusion</span></code></p></td>
<td><p>Same as <code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code>, but interpolates the RNN-T scores with language model scores, also known as shallow fusion.</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_LODR</span></code></p></td>
<td><p>Same as <code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_shallow_fusion</span></code>, but subtracts the score of a (BPE-symbol-level) bigram backoff language model, used as an approximation to the internal language model of the RNN-T.</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore</span></code></p></td>
<td><p>Same as <code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code>, but rescores the n-best hypotheses with an external language model (e.g. an RNNLM) and re-ranks them.</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore_LODR</span></code></p></td>
<td><p>Same as <code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore</span></code>, but also subtracts the score of a (BPE-symbol-level) bigram backoff language model during re-ranking.</p></td>
</tr>
</tbody>
</table>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>

@@ -91,7 +91,11 @@
<section id="lm-rescoring-for-transducer">
<span id="rescoring"></span><h1>LM rescoring for Transducer<a class="headerlink" href="#lm-rescoring-for-transducer" title="Permalink to this heading"></a></h1>
<p>LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
methods (see <a class="reference internal" href="shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a>, <a class="reference internal" href="LODR.html#lodr"><span class="std std-ref">LODR for RNN Transducer</span></a>), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
Rescoring is usually more efficient than shallow fusion, since less computation is performed on the external LM.
In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
<a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.</p>
@@ -196,7 +200,7 @@ $ beam_size_4 7.6 best for test-other
<p>Great! We made some improvements! Increasing the size of the n-best hypotheses will further boost the performance;
see the following table:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 4 </span><span class="caption-text">WERs of LM rescoring with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
@@ -274,7 +278,7 @@ $ beam_size_4 7.57 best for test-other
<p>It’s slightly better than LM rescoring. If we further increase the beam size, we will see
further improvements from LM rescoring + LODR:</p>
<table class="docutils align-default" id="id2">
<caption><span class="caption-number">Table 5 </span><span class="caption-text">WERs of LM rescoring + LODR with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
@@ -304,7 +308,7 @@ further improvements from LM rescoring + LODR:</p>
<p>As mentioned earlier, LM rescoring is usually faster than shallow-fusion-based methods.
Here, we benchmark their WERs and decoding speed:</p>
<table class="docutils align-default" id="id3">
<caption><span class="caption-number">Table 6 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods (the numbers in each field are the WER on test-clean, the WER on test-other, and the decoding time on test-clean)</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />
@@ -319,27 +323,27 @@ Here, we benchmark their WERs and decoding speed:</p>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code></p></td>
<td><p>3.11/7.93; 132s</p></td>
<td><p>3.1/7.95; 177s</p></td>
<td><p>3.1/7.96; 210s</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_shallow_fusion</span></code></p></td>
<td><p>2.77/7.08; 262s</p></td>
<td><p>2.62/6.65; 352s</p></td>
<td><p>2.58/6.65; 488s</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_LODR</span></code></p></td>
<td><p>2.61/6.74; 400s</p></td>
<td><p>2.45/6.38; 610s</p></td>
<td><p>2.4/6.23; 870s</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore</span></code></p></td>
<td><p>2.93/7.6; 156s</p></td>
<td><p>2.67/7.11; 203s</p></td>
<td><p>2.59/6.86; 255s</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore_LODR</span></code></p></td>
<td><p>2.9/7.57; 160s</p></td>
<td><p>2.63/7.04; 203s</p></td>
<td><p>2.52/6.73; 263s</p></td>
</tr>

@@ -227,7 +227,7 @@ the LM score may dominate during decoding, leading to bad WER. A typical value o
</ul>
<p>Here, we also show how <cite>--beam-size</cite> affects the WER and decoding time:</p>
<table class="docutils align-default" id="id2">
<caption><span class="caption-number">Table 2 </span><span class="caption-text">WERs and decoding time (on test-clean) of shallow fusion with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />
File diff suppressed because one or more lines are too long