commit e5fed5060b (parent c128646ff4)
@@ -30,7 +30,7 @@ of langugae model integration.
 First, let's have a look at some background information. As the predecessor of LODR, Density Ratio (DR) is first proposed `here <https://arxiv.org/abs/2002.11268>`_
 to address the language information mismatch between the training
 corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the test domain
-are acoustically similar, DR derives the following formular for decoding with Bayes' theorem:
+are acoustically similar, DR derives the following formula for decoding with Bayes' theorem:

 .. math::

@@ -41,7 +41,7 @@ are acoustically similar, DR derives the following formular for decoding with Ba


 where :math:`\lambda_1` and :math:`\lambda_2` are the weights of LM scores for target domain and source domain respectively.
-Here, the source domain LM is trained on the training corpus. The only difference in the above formular compared to
+Here, the source domain LM is trained on the training corpus. The only difference in the above formula compared to
 shallow fusion is the subtraction of the source domain LM.

 Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, the LM is
@@ -58,7 +58,7 @@ during decoding for transducer model:

 In LODR, an additional bi-gram LM estimated on the source domain (e.g training corpus) is required. Compared to DR,
 the only difference lies in the choice of source domain LM. According to the original `paper <https://arxiv.org/abs/2203.16776>`_,
-LODR achieves similar performance compared DR in both intra-domain and cross-domain settings.
+LODR achieves similar performance compared to DR in both intra-domain and cross-domain settings.
 As a bi-gram is much faster to evaluate, LODR is usually much faster.

 Now, we will show you how to use LODR in ``icefall``.
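The hunks above all document the same scoring rule: for each predicted token, the transducer's log-probability is combined with a positively weighted target-domain LM score and a negatively weighted source-domain LM score (a bi-gram in the LODR case). Below is a minimal sketch of that combination, assuming the per-token log-probabilities are already available; the function name and the 0.1 source-LM weight are illustrative only, not icefall's API (0.3 matches the typical ``--lm-scale`` discussed further down):

.. code-block:: python

    import math

    def combined_score(
        log_p_asr: float,        # log p(y_u | x, y_{1:u-1}) from the transducer
        log_p_target_lm: float,  # target-domain LM, e.g. an RNN LM
        log_p_source_lm: float,  # source-domain LM; a bi-gram for LODR
        lambda_1: float = 0.3,   # target LM weight
        lambda_2: float = 0.1,   # source LM weight (subtracted); illustrative value
    ) -> float:
        """score = log p + lambda_1 * log p_target - lambda_2 * log p_source."""
        return log_p_asr + lambda_1 * log_p_target_lm - lambda_2 * log_p_source_lm

    # Plain shallow fusion would omit the subtraction; DR/LODR additionally
    # penalize tokens that the source-domain LM over-predicts.
    print(combined_score(math.log(0.6), math.log(0.4), math.log(0.5)))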
@@ -139,7 +139,7 @@ A few parameters can be tuned to further boost the performance of shallow fusion
 - ``--lm-scale``

   Controls the scale of the LM. If too small, the external language model may not be fully utilized; if too large,
-  the LM score may dominant during decoding, leading to bad WER. A typical value of this is around 0.3.
+  the LM score might be dominant during decoding, leading to bad WER. A typical value of this is around 0.3.

 - ``--beam-size``

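The typical ``--lm-scale`` of 0.3 noted above is normally confirmed with a small sweep on a development set. A sketch of such a sweep, assuming a caller-supplied ``run_decode`` callable that decodes the dev set at a given scale and returns its WER (a hypothetical stand-in, not an icefall function):

.. code-block:: python

    from typing import Callable, Iterable, Tuple

    def tune_lm_scale(
        run_decode: Callable[[float], float],
        scales: Iterable[float] = (0.1, 0.2, 0.3, 0.4, 0.5),
    ) -> Tuple[float, float]:
        """Return (best_scale, best_wer) from a sweep around the typical 0.3."""
        results = {s: run_decode(s) for s in scales}
        best = min(results, key=results.get)
        return best, results[best]

    # Toy usage with a fake WER curve that bottoms out near 0.3:
    print(tune_lm_scale(lambda s: abs(s - 0.3) + 5.0))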
@@ -120,14 +120,14 @@ using that corpus.</p>
 <p>First, let’s have a look at some background information. As the predecessor of LODR, Density Ratio (DR) is first proposed <a class="reference external" href="https://arxiv.org/abs/2002.11268">here</a>
 to address the language information mismatch between the training
 corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the test domain
-are acoustically similar, DR derives the following formular for decoding with Bayes’ theorem:</p>
+are acoustically similar, DR derives the following formula for decoding with Bayes’ theorem:</p>
 <div class="math notranslate nohighlight">
 \[\text{score}\left(y_u|\mathit{x},y\right) =
 \log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
 \lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
 \lambda_2 \log p_{\text{Source LM}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]</div>
 <p>where <span class="math notranslate nohighlight">\(\lambda_1\)</span> and <span class="math notranslate nohighlight">\(\lambda_2\)</span> are the weights of LM scores for target domain and source domain respectively.
-Here, the source domain LM is trained on the training corpus. The only difference in the above formular compared to
+Here, the source domain LM is trained on the training corpus. The only difference in the above formula compared to
 shallow fusion is the subtraction of the source domain LM.</p>
 <p>Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, the LM is
 considered to be weak and can only capture low-level language information. Therefore, <a class="reference external" href="https://arxiv.org/abs/2203.16776">LODR</a> proposed to use
@@ -140,7 +140,7 @@ during decoding for transducer model:</p>
 \lambda_2 \log p_{\text{bi-gram}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]</div>
 <p>In LODR, an additional bi-gram LM estimated on the source domain (e.g training corpus) is required. Compared to DR,
 the only difference lies in the choice of source domain LM. According to the original <a class="reference external" href="https://arxiv.org/abs/2203.16776">paper</a>,
-LODR achieves similar performance compared DR in both intra-domain and cross-domain settings.
+LODR achieves similar performance compared to DR in both intra-domain and cross-domain settings.
 As a bi-gram is much faster to evaluate, LODR is usually much faster.</p>
 <p>Now, we will show you how to use LODR in <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.
 For illustration purpose, we will use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
@@ -223,7 +223,7 @@ A few parameters can be tuned to further boost the performance of shallow fusion
 <li><p><code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code></p>
 <blockquote>
 <div><p>Controls the scale of the LM. If too small, the external language model may not be fully utilized; if too large,
-the LM score may dominant during decoding, leading to bad WER. A typical value of this is around 0.3.</p>
+the LM score might be dominant during decoding, leading to bad WER. A typical value of this is around 0.3.</p>
 </div></blockquote>
 </li>
 <li><p><code class="docutils literal notranslate"><span class="pre">--beam-size</span></code></p>
File diff suppressed because one or more lines are too long