deploy: 625b33e9ad15961239ea77d12472428d8006085d

This commit is contained in:
marcoyang1998 2023-07-27 04:08:57 +00:00
parent 48b954a308
commit 8600aedaa3
7 changed files with 75 additions and 16 deletions

@ -4,6 +4,27 @@ Decoding with language models
This section describes how to use external language models
during decoding to improve the WER of transducer models.
The following decoding methods with external language models are available:
.. list-table:: Decoding methods that use external language models
:widths: 25 50
:header-rows: 1
* - Decoding method
- Description
* - ``modified_beam_search``
- Beam search (i.e. really n-best decoding; the "beam" is the value of n), similar to the original RNN-T paper. Note that this method does not use a language model.
* - ``modified_beam_search_lm_shallow_fusion``
- As ``modified_beam_search``, but interpolates RNN-T scores with language model scores, also known as shallow fusion.
* - ``modified_beam_search_LODR``
- As ``modified_beam_search_lm_shallow_fusion``, but subtracts the score of a (BPE-symbol-level) bigram backoff language model, which is used as an approximation to the internal language model of the RNN-T.
* - ``modified_beam_search_lm_rescore``
- As ``modified_beam_search``, but rescores the n-best hypotheses with an external language model (e.g. an RNN LM) and re-ranks them.
* - ``modified_beam_search_lm_rescore_LODR``
- As ``modified_beam_search_lm_rescore``, but also subtracts the score of a (BPE-symbol-level) bigram backoff language model during re-ranking.
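The score combinations behind the shallow-fusion and LODR methods above can be sketched as follows. This is a minimal illustration under assumed names, not the actual icefall implementation; ``lm_scale`` and ``lodr_scale`` are tuning parameters.

```python
# A minimal sketch (not the icefall implementation) of how per-token
# log-probabilities are combined.  `am` is the transducer (RNN-T) score,
# `ext_lm` the external LM score, and `bigram` the score of the bigram
# backoff LM that approximates the transducer's internal LM.

def shallow_fusion_score(am: float, ext_lm: float, lm_scale: float) -> float:
    """modified_beam_search_lm_shallow_fusion: interpolate RNN-T and LM scores."""
    return am + lm_scale * ext_lm


def lodr_score(am: float, ext_lm: float, bigram: float,
               lm_scale: float, lodr_scale: float) -> float:
    """modified_beam_search_LODR: additionally subtract the bigram score."""
    return am + lm_scale * ext_lm - lodr_scale * bigram
```

Since all scores are log-probabilities, subtracting the scaled bigram term penalizes tokens that the (approximate) internal LM already predicts strongly.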
.. toctree::
:maxdepth: 2

@ -4,7 +4,11 @@ LM rescoring for Transducer
=================================
LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
methods (see :ref:`shallow_fusion`, :ref:`LODR`), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
Rescoring is usually more efficient than shallow fusion since less computation is performed on the external LM.
In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
`icefall <https://github.com/k2-fsa/icefall>`__.
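The re-ranking step can be sketched as follows. This is an illustrative snippet with assumed names, not the icefall API: each hypothesis keeps its transducer score, the external LM assigns a sentence-level log-probability, and the combined score decides the final ranking.

```python
# Illustrative n-best rescoring (assumed names; not the icefall API).
# `hyps` is a list of (text, transducer_score) pairs from beam search and
# `lm_score_fn` returns the external LM's log-probability of a sentence.

def rescore_nbest(hyps, lm_score_fn, lm_scale=0.5):
    """Re-rank n-best hypotheses by transducer score + scaled LM score."""
    scored = [
        (text, am_score + lm_scale * lm_score_fn(text))
        for text, am_score in hyps
    ]
    # Higher combined log-probability is better.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the external LM is evaluated only on the n final hypotheses rather than at every decoding step, this is typically much cheaper than shallow fusion.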
@ -225,23 +229,23 @@ Here, we benchmark the WERs and decoding speed of them:
- beam=4
- beam=8
- beam=12
* - `modified_beam_search`
* - ``modified_beam_search``
- 3.11/7.93; 132s
- 3.1/7.95; 177s
- 3.1/7.96; 210s
* - `modified_beam_search_lm_shallow_fusion`
* - ``modified_beam_search_lm_shallow_fusion``
- 2.77/7.08; 262s
- 2.62/6.65; 352s
- 2.58/6.65; 488s
* - LODR
* - ``modified_beam_search_LODR``
- 2.61/6.74; 400s
- 2.45/6.38; 610s
- 2.4/6.23; 870s
* - `modified_beam_search_lm_rescore`
* - ``modified_beam_search_lm_rescore``
- 2.93/7.6; 156s
- 2.67/7.11; 203s
- 2.59/6.86; 255s
* - `modified_beam_search_lm_rescore_LODR`
* - ``modified_beam_search_lm_rescore_LODR``
- 2.9/7.57; 160s
- 2.63/7.04; 203s
- 2.52/6.73; 263s

@ -229,7 +229,7 @@ $ beam_size_4 6.74 best for test-other
<p>Recall that the lowest WER we obtained in <a class="reference internal" href="shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a> with a beam size of 4 is <code class="docutils literal notranslate"><span class="pre">2.77/7.08</span></code>; LODR
indeed <strong>further improves</strong> the WER. We can do even better if we increase <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code>:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 2 </span><span class="caption-text">WER of LODR with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 3 </span><span class="caption-text">WER of LODR with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />

@ -92,6 +92,36 @@
<h1>Decoding with language models<a class="headerlink" href="#decoding-with-language-models" title="Permalink to this heading"></a></h1>
<p>This section describes how to use external language models
during decoding to improve the WER of transducer models.</p>
<p>The following decoding methods with external language models are available:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 1 </span><span class="caption-text">Decoding methods that use external language models</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 67%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Decoding method</p></th>
<th class="head"><p>Description</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code></p></td>
<td><p>Beam search (i.e. really n-best decoding; the “beam” is the value of n), similar to the original RNN-T paper. Note that this method does not use a language model.</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_shallow_fusion</span></code></p></td>
<td><p>As <code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code>, but interpolates RNN-T scores with language model scores, also known as shallow fusion.</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_LODR</span></code></p></td>
<td><p>As <code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_shallow_fusion</span></code>, but subtracts the score of a (BPE-symbol-level) bigram backoff language model, which is used as an approximation to the internal language model of the RNN-T.</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore</span></code></p></td>
<td><p>As <code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code>, but rescores the n-best hypotheses with an external language model (e.g. an RNN LM) and re-ranks them.</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore_LODR</span></code></p></td>
<td><p>As <code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore</span></code>, but also subtracts the score of a (BPE-symbol-level) bigram backoff language model during re-ranking.</p></td>
</tr>
</tbody>
</table>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>

@ -91,7 +91,11 @@
<section id="lm-rescoring-for-transducer">
<span id="rescoring"></span><h1>LM rescoring for Transducer<a class="headerlink" href="#lm-rescoring-for-transducer" title="Permalink to this heading"></a></h1>
<p>LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
methods (see <a class="reference internal" href="shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a>, <a class="reference internal" href="LODR.html#lodr"><span class="std std-ref">LODR for RNN Transducer</span></a>), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
Rescoring is usually more efficient than shallow fusion since less computation is performed on the external LM.
In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
<a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.</p>
@ -196,7 +200,7 @@ $ beam_size_4 7.6 best for test-other
<p>Great! We made some improvements! Increasing the size of the n-best hypotheses will further boost the performance;
see the following table:</p>
<table class="docutils align-default" id="id1">
<caption><span class="caption-number">Table 3 </span><span class="caption-text">WERs of LM rescoring with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 4 </span><span class="caption-text">WERs of LM rescoring with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
@ -274,7 +278,7 @@ $ beam_size_4 7.57 best for test-other
<p>It's slightly better than LM rescoring. If we further increase the beam size, we will see
further improvements from LM rescoring + LODR:</p>
<table class="docutils align-default" id="id2">
<caption><span class="caption-number">Table 4 </span><span class="caption-text">WERs of LM rescoring + LODR with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 5 </span><span class="caption-text">WERs of LM rescoring + LODR with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
@ -304,7 +308,7 @@ further improvements from LM rescoring + LODR:</p>
<p>As mentioned earlier, LM rescoring is usually faster than shallow-fusion-based methods.
Here, we benchmark their WERs and decoding speed:</p>
<table class="docutils align-default" id="id3">
<caption><span class="caption-number">Table 5 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field are WER on test-clean, WER on test-other, and decoding time on test-clean)</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 6 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field are WER on test-clean, WER on test-other, and decoding time on test-clean)</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />
@ -319,27 +323,27 @@ Here, we benchmark the WERs and decoding speed of them:</p>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p><cite>modified_beam_search</cite></p></td>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code></p></td>
<td><p>3.11/7.93; 132s</p></td>
<td><p>3.1/7.95; 177s</p></td>
<td><p>3.1/7.96; 210s</p></td>
</tr>
<tr class="row-odd"><td><p><cite>modified_beam_search_lm_shallow_fusion</cite></p></td>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_shallow_fusion</span></code></p></td>
<td><p>2.77/7.08; 262s</p></td>
<td><p>2.62/6.65; 352s</p></td>
<td><p>2.58/6.65; 488s</p></td>
</tr>
<tr class="row-even"><td><p>LODR</p></td>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_LODR</span></code></p></td>
<td><p>2.61/6.74; 400s</p></td>
<td><p>2.45/6.38; 610s</p></td>
<td><p>2.4/6.23; 870s</p></td>
</tr>
<tr class="row-odd"><td><p><cite>modified_beam_search_lm_rescore</cite></p></td>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore</span></code></p></td>
<td><p>2.93/7.6; 156s</p></td>
<td><p>2.67/7.11; 203s</p></td>
<td><p>2.59/6.86; 255s</p></td>
</tr>
<tr class="row-even"><td><p><cite>modified_beam_search_lm_rescore_LODR</cite></p></td>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_rescore_LODR</span></code></p></td>
<td><p>2.9/7.57; 160s</p></td>
<td><p>2.63/7.04; 203s</p></td>
<td><p>2.52/6.73; 263s</p></td>

@ -227,7 +227,7 @@ the LM score may dominate during decoding, leading to bad WER. A typical value o
</ul>
<p>Here, we also show how <cite>beam-size</cite> affects the WER and decoding time:</p>
<table class="docutils align-default" id="id2">
<caption><span class="caption-number">Table 1 </span><span class="caption-text">WERs and decoding time (on test-clean) of shallow fusion with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<caption><span class="caption-number">Table 2 </span><span class="caption-text">WERs and decoding time (on test-clean) of shallow fusion with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />

File diff suppressed because one or more lines are too long