mirror of https://github.com/k2-fsa/icefall.git, synced 2025-12-11 06:55:27 +00:00
deploy: 11523c5b894f42ded965dcb974fef9a8a8122518
parent 9d2d2225d7, commit 48b954a308

184  _sources/decoding-with-langugage-models/LODR.rst.txt  (new file)
@@ -0,0 +1,184 @@

.. _LODR:

LODR for RNN Transducer
=======================

As a type of E2E model, neural transducers are usually considered to have an internal
language model, which learns language-level information from the training corpus.
In real-life scenarios, there is often a mismatch between the training corpus and the target corpus.
This mismatch can be a problem when decoding a neural transducer model with an external language model, as the internal
language model can act "against" the external LM. In this tutorial, we show how to use
`Low-order Density Ratio <https://arxiv.org/abs/2203.16776>`_ (LODR) to alleviate this effect and further improve the performance
of language model integration.

.. note::

   This tutorial is based on the recipe
   `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
   which is a streaming transducer model trained on `LibriSpeech`_.
   However, you can easily apply LODR to other recipes.
   If you encounter any problems, please open an issue at `icefall <https://github.com/k2-fsa/icefall/issues>`__.

.. note::

   For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However,
   you can change the testing set to any other domain (e.g. `GigaSpeech`_) and prepare the language models
   using that corpus.

First, let's have a look at some background information. As the predecessor of LODR, Density Ratio (DR) was first proposed `here <https://arxiv.org/abs/2002.11268>`_
to address the language information mismatch between the training
corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the test domain
are acoustically similar, DR derives the following formula for decoding with Bayes' theorem:

.. math::

   \text{score}\left(y_u|\mathit{x},y\right) =
   \log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
   \lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
   \lambda_2 \log p_{\text{Source LM}}\left(y_u|\mathit{x},y_{1:u-1}\right)

where :math:`\lambda_1` and :math:`\lambda_2` are the weights of the LM scores for the target domain and the source domain, respectively.
Here, the source domain LM is trained on the training corpus. Compared to
shallow fusion, the only difference in the above formula is the subtraction of the source domain LM.

Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, this LM is
considered to be weak and can only capture low-level language information. Therefore, `LODR <https://arxiv.org/abs/2203.16776>`__ proposed to use
a low-order n-gram LM as an approximation of the ILM of the neural transducer. This leads to the following formula
during decoding for transducer models:

.. math::

   \text{score}\left(y_u|\mathit{x},y\right) =
   \log p_{rnnt}\left(y_u|\mathit{x},y_{1:u-1}\right) +
   \lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
   \lambda_2 \log p_{\text{bi-gram}}\left(y_u|\mathit{x},y_{1:u-1}\right)

In LODR, an additional bi-gram LM estimated on the source domain (e.g. the training corpus) is required. Compared to DR,
the only difference lies in the choice of the source domain LM. According to the original `paper <https://arxiv.org/abs/2203.16776>`_,
LODR achieves performance similar to DR in both intra-domain and cross-domain settings.
As a bi-gram is much cheaper to evaluate, LODR is usually much faster.
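
To make the formulas above concrete, here is a minimal Python sketch of how the per-token scores could be combined inside beam search. The function name, scales, and toy distributions are illustrative only, not icefall's actual implementation; DR uses the same combination with a full source-domain LM in place of the bi-gram.

.. code-block:: python

   import numpy as np

   def lodr_score(logp_rnnt, logp_target_lm, logp_bigram,
                  lambda1=0.42, lambda2=0.24):
       """Per-token LODR score: transducer + scaled target LM - scaled bi-gram.

       Each argument is an array of shape (vocab_size,) holding
       log p(y_u | x, y_{1:u-1}) under the respective model.  Note that the
       tutorial below passes --ngram-lm-scale=-0.24, i.e. -lambda2.
       """
       return logp_rnnt + lambda1 * logp_target_lm - lambda2 * logp_bigram

   # toy uniform distributions over a 500-token vocabulary
   uniform = np.full(500, np.log(1.0 / 500))
   scores = lodr_score(uniform, uniform, uniform)
   best_token = int(np.argmax(scores))  # token chosen to extend this hypothesis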

Now, we will show you how to use LODR in ``icefall``.
For illustration purposes, we will use a pre-trained ASR model from this `link <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`_.
If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`.
The testing scenario here is intra-domain (we decode a model trained on `LibriSpeech`_ on the `LibriSpeech`_ test sets).

As the initial step, let's download the pre-trained model.

.. code-block:: bash

   $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
   $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
   $ git lfs pull --include "pretrained.pt"
   $ ln -s pretrained.pt epoch-99.pt  # create a symbolic link so that the checkpoint can be loaded
   $ popd  # the decoding commands below are run from the original directory

To test the model, let's have a look at the decoding results **without** using the LM. This can be done via the following command:

.. code-block:: bash

   $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
   $ ./pruned_transducer_stateless7_streaming/decode.py \
       --epoch 99 \
       --avg 1 \
       --use-averaged-model False \
       --exp-dir $exp_dir \
       --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
       --max-duration 600 \
       --decode-chunk-len 32 \
       --decoding-method modified_beam_search

The following WERs are achieved on test-clean and test-other:

.. code-block:: text

   For test-clean, WER of different settings are:
   beam_size_4 3.11 best for test-clean
   For test-other, WER of different settings are:
   beam_size_4 7.93 best for test-other

Then, we download the external language model and the bi-gram LM that are necessary for LODR.
Note that the bi-gram is estimated on the LibriSpeech 960-hour text.

.. code-block:: bash

   $ # download the external LM
   $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
   $ # create a symbolic link so that the checkpoint can be loaded
   $ pushd icefall-librispeech-rnn-lm/exp
   $ git lfs pull --include "pretrained.pt"
   $ ln -s pretrained.pt epoch-99.pt
   $ popd
   $
   $ # download the bi-gram
   $ git lfs install
   $ git clone https://huggingface.co/marcoyang/librispeech_bigram
   $ pushd data/lang_bpe_500
   $ ln -s ../../librispeech_bigram/2gram.fst.txt .
   $ popd

Then, we perform LODR decoding by setting ``--decoding-method`` to ``modified_beam_search_lm_LODR``:

.. code-block:: bash

   $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
   $ lm_dir=./icefall-librispeech-rnn-lm/exp
   $ lm_scale=0.42
   $ LODR_scale=-0.24
   $ ./pruned_transducer_stateless7_streaming/decode.py \
       --epoch 99 \
       --avg 1 \
       --use-averaged-model False \
       --beam-size 4 \
       --exp-dir $exp_dir \
       --max-duration 600 \
       --decode-chunk-len 32 \
       --decoding-method modified_beam_search_lm_LODR \
       --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
       --use-shallow-fusion 1 \
       --lm-type rnn \
       --lm-exp-dir $lm_dir \
       --lm-epoch 99 \
       --lm-scale $lm_scale \
       --lm-avg 1 \
       --rnn-lm-embedding-dim 2048 \
       --rnn-lm-hidden-dim 2048 \
       --rnn-lm-num-layers 3 \
       --lm-vocab-size 500 \
       --tokens-ngram 2 \
       --ngram-lm-scale $LODR_scale

There are two extra arguments that need to be given when doing LODR. ``--tokens-ngram`` specifies the order of the n-gram; as we
are using a bi-gram, we set it to 2. ``--ngram-lm-scale`` is the scale of the bi-gram; it should be a negative number,
as we are subtracting the bi-gram's score during decoding.

The decoding results obtained with the above command are shown below:

.. code-block:: text

   For test-clean, WER of different settings are:
   beam_size_4 2.61 best for test-clean
   For test-other, WER of different settings are:
   beam_size_4 6.74 best for test-other

Recall that the lowest WER we obtained in :ref:`shallow_fusion` with a beam size of 4 is ``2.77/7.08``; LODR
indeed **further improves** the WER. We can do even better if we increase ``--beam-size``:

.. list-table:: WER of LODR with different beam sizes
   :widths: 25 25 25
   :header-rows: 1

   * - Beam size
     - test-clean
     - test-other
   * - 4
     - 2.61
     - 6.74
   * - 8
     - 2.45
     - 6.38
   * - 12
     - 2.4
     - 6.23

12  _sources/decoding-with-langugage-models/index.rst.txt  (new file)
@@ -0,0 +1,12 @@

Decoding with language models
=============================

This section describes how to use external language models
during decoding to improve the WER of transducer models.

.. toctree::
   :maxdepth: 2

   shallow-fusion
   LODR
   rescoring

252  _sources/decoding-with-langugage-models/rescoring.rst.txt  (new file)
@@ -0,0 +1,252 @@

.. _rescoring:

LM rescoring for Transducer
=================================

LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
methods (see :ref:`shallow_fusion`, :ref:`LODR`), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
Rescoring is usually more efficient than shallow fusion, since less computation is performed on the external LM.
In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
`icefall <https://github.com/k2-fsa/icefall>`__.
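
The idea can be sketched in a few lines of Python. Assume beam search has already produced the ``n`` best complete hypotheses, each with a total transducer log-score; the external LM then scores each full hypothesis once, and the list is re-ranked. The names and the scale below are illustrative only, not icefall's actual implementation.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, List

   @dataclass
   class Hypothesis:
       tokens: List[int]   # decoded token sequence
       am_score: float     # total transducer (beam-search) log-score

   def rescore_nbest(hyps: List[Hypothesis],
                     lm_logprob: Callable[[List[int]], float],
                     lm_scale: float = 0.43) -> List[Hypothesis]:
       """Re-rank n-best hypotheses with an external LM.

       ``lm_logprob`` returns the total LM log-probability of a token
       sequence; the best hypothesis maximizes
       am_score + lm_scale * lm_logprob(tokens).
       """
       return sorted(
           hyps,
           key=lambda h: h.am_score + lm_scale * lm_logprob(h.tokens),
           reverse=True,
       )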

.. note::

   This tutorial is based on the recipe
   `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
   which is a streaming transducer model trained on `LibriSpeech`_.
   However, you can easily apply LM rescoring to other recipes.
   If you encounter any problems, please open an issue `here <https://github.com/k2-fsa/icefall/issues>`_.

.. note::

   For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However, you can change the testing set
   to any other domain (e.g. `GigaSpeech`_) and use an external LM trained on that domain.

.. HINT::

   We recommend using a GPU for decoding.

For illustration purposes, we will use a pre-trained ASR model from this `link <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`__.
If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`.

As the initial step, let's download the pre-trained model.

.. code-block:: bash

   $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
   $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
   $ git lfs pull --include "pretrained.pt"
   $ ln -s pretrained.pt epoch-99.pt  # create a symbolic link so that the checkpoint can be loaded
   $ popd  # the decoding commands below are run from the original directory

As usual, we first test the model's performance without the external LM. This can be done via the following command:

.. code-block:: bash

   $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
   $ ./pruned_transducer_stateless7_streaming/decode.py \
       --epoch 99 \
       --avg 1 \
       --use-averaged-model False \
       --exp-dir $exp_dir \
       --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
       --max-duration 600 \
       --decode-chunk-len 32 \
       --decoding-method modified_beam_search

The following WERs are achieved on test-clean and test-other:

.. code-block:: text

   For test-clean, WER of different settings are:
   beam_size_4 3.11 best for test-clean
   For test-other, WER of different settings are:
   beam_size_4 7.93 best for test-other

Now, we will try to improve the above WER numbers via external LM rescoring. We will download
a pre-trained LM from this `link <https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm>`__.

.. note::

   This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora.
   You may also train an RNN LM from scratch. Please refer to this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py>`__
   for training an RNN LM and this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py>`__ to train a transformer LM.

.. code-block:: bash

   $ # download the external LM
   $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
   $ # create a symbolic link so that the checkpoint can be loaded
   $ pushd icefall-librispeech-rnn-lm/exp
   $ git lfs pull --include "pretrained.pt"
   $ ln -s pretrained.pt epoch-99.pt
   $ popd

With the RNNLM available, we can rescore the n-best hypotheses generated by ``modified_beam_search``. Here,
``n`` is the number of beams, i.e. ``--beam-size``. The command for LM rescoring is
as follows. Note that ``--decoding-method`` is set to ``modified_beam_search_lm_rescore`` and ``--use-shallow-fusion``
is set to ``False``.

.. code-block:: bash

   $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
   $ lm_dir=./icefall-librispeech-rnn-lm/exp
   $ lm_scale=0.43
   $ ./pruned_transducer_stateless7_streaming/decode.py \
       --epoch 99 \
       --avg 1 \
       --use-averaged-model False \
       --beam-size 4 \
       --exp-dir $exp_dir \
       --max-duration 600 \
       --decode-chunk-len 32 \
       --decoding-method modified_beam_search_lm_rescore \
       --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
       --use-shallow-fusion 0 \
       --lm-type rnn \
       --lm-exp-dir $lm_dir \
       --lm-epoch 99 \
       --lm-scale $lm_scale \
       --lm-avg 1 \
       --rnn-lm-embedding-dim 2048 \
       --rnn-lm-hidden-dim 2048 \
       --rnn-lm-num-layers 3 \
       --lm-vocab-size 500

The following WERs are achieved on test-clean and test-other:

.. code-block:: text

   For test-clean, WER of different settings are:
   beam_size_4 2.93 best for test-clean
   For test-other, WER of different settings are:
   beam_size_4 7.6 best for test-other

Great! We made some improvements. Increasing the size of the n-best hypotheses will further boost the performance,
see the following table:

.. list-table:: WERs of LM rescoring with different beam sizes
   :widths: 25 25 25
   :header-rows: 1

   * - Beam size
     - test-clean
     - test-other
   * - 4
     - 2.93
     - 7.6
   * - 8
     - 2.67
     - 7.11
   * - 12
     - 2.59
     - 6.86

In fact, we can also apply LODR (see :ref:`LODR`) when doing LM rescoring. To do so, we need to
download the bi-gram required by LODR:

.. code-block:: bash

   $ # download the bi-gram
   $ git lfs install
   $ git clone https://huggingface.co/marcoyang/librispeech_bigram
   $ pushd data/lang_bpe_500
   $ ln -s ../../librispeech_bigram/2gram.arpa .
   $ popd

Then we can perform LM rescoring + LODR by changing the decoding method to ``modified_beam_search_lm_rescore_LODR``.

.. note::

   This decoding method requires `kenlm <https://github.com/kpu/kenlm>`_ as a dependency. You can install it
   via this command: ``pip install https://github.com/kpu/kenlm/archive/master.zip``.
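
After installing it, you can sanity-check that the bi-gram loads correctly. Below is a minimal sketch of the ``kenlm`` Python API; note that icefall applies the n-gram to BPE token sequences internally, so the example sentence is only there to illustrate the calls.

.. code-block:: python

   import kenlm

   # load the 2-gram ARPA model downloaded above
   model = kenlm.Model("data/lang_bpe_500/2gram.arpa")
   print(model.order)  # -> 2

   # total log10-probability of a space-separated token sequence,
   # including begin-of-sentence and end-of-sentence context
   print(model.score("HELLO WORLD", bos=True, eos=True))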

.. code-block:: bash

   $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
   $ lm_dir=./icefall-librispeech-rnn-lm/exp
   $ lm_scale=0.43
   $ ./pruned_transducer_stateless7_streaming/decode.py \
       --epoch 99 \
       --avg 1 \
       --use-averaged-model False \
       --beam-size 4 \
       --exp-dir $exp_dir \
       --max-duration 600 \
       --decode-chunk-len 32 \
       --decoding-method modified_beam_search_lm_rescore_LODR \
       --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
       --use-shallow-fusion 0 \
       --lm-type rnn \
       --lm-exp-dir $lm_dir \
       --lm-epoch 99 \
       --lm-scale $lm_scale \
       --lm-avg 1 \
       --rnn-lm-embedding-dim 2048 \
       --rnn-lm-hidden-dim 2048 \
       --rnn-lm-num-layers 3 \
       --lm-vocab-size 500

You should see the following WERs after executing the commands above:

.. code-block:: text

   For test-clean, WER of different settings are:
   beam_size_4 2.9 best for test-clean
   For test-other, WER of different settings are:
   beam_size_4 7.57 best for test-other

It's slightly better than LM rescoring. If we further increase the beam size, we will see
further improvements from LM rescoring + LODR:

.. list-table:: WERs of LM rescoring + LODR with different beam sizes
   :widths: 25 25 25
   :header-rows: 1

   * - Beam size
     - test-clean
     - test-other
   * - 4
     - 2.9
     - 7.57
   * - 8
     - 2.63
     - 7.04
   * - 12
     - 2.52
     - 6.73

As mentioned earlier, LM rescoring is usually faster than shallow-fusion-based methods.
Here, we benchmark their WERs and decoding speed:

.. list-table:: LM-rescoring-based methods vs. shallow-fusion-based methods (each field shows WER on test-clean / WER on test-other; decoding time on test-clean)
   :widths: 25 25 25 25
   :header-rows: 1

   * - Decoding method
     - beam=4
     - beam=8
     - beam=12
   * - ``modified_beam_search``
     - 3.11/7.93; 132s
     - 3.1/7.95; 177s
     - 3.1/7.96; 210s
   * - ``modified_beam_search_lm_shallow_fusion``
     - 2.77/7.08; 262s
     - 2.62/6.65; 352s
     - 2.58/6.65; 488s
   * - LODR
     - 2.61/6.74; 400s
     - 2.45/6.38; 610s
     - 2.4/6.23; 870s
   * - ``modified_beam_search_lm_rescore``
     - 2.93/7.6; 156s
     - 2.67/7.11; 203s
     - 2.59/6.86; 255s
   * - ``modified_beam_search_lm_rescore_LODR``
     - 2.9/7.57; 160s
     - 2.63/7.04; 203s
     - 2.52/6.73; 263s

.. note::

   Decoding was performed on a single 32G V100 with ``--max-duration`` set to 600.
   The decoding times here are only for reference; they may vary.

176  _sources/decoding-with-langugage-models/shallow-fusion.rst.txt  (new file)
@@ -0,0 +1,176 @@

.. _shallow_fusion:

Shallow fusion for Transducer
=================================

External language models (LMs) are commonly used to improve WERs for E2E ASR models.
This tutorial shows you how to perform shallow fusion with an external LM
to improve the word error rate of a transducer model.
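
During beam search, shallow fusion simply interpolates the transducer's per-token log-probability with the external LM's log-probability. In the notation used in :ref:`LODR`, the combined score is:

.. math::

   \text{score}\left(y_u|\mathit{x},y\right) =
   \log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
   \lambda \log p_{\text{LM}}\left(y_u|\mathit{x},y_{1:u-1}\right)

where :math:`\lambda` is the LM scale (the ``--lm-scale`` argument used below).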

.. note::

   This tutorial is based on the recipe
   `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
   which is a streaming transducer model trained on `LibriSpeech`_.
   However, you can easily apply shallow fusion to other recipes.
   If you encounter any problems, please open an issue at `icefall <https://github.com/k2-fsa/icefall/issues>`_.

.. note::

   For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However, you can change the testing set
   to any other domain (e.g. `GigaSpeech`_) and use an external LM trained on that domain.

.. HINT::

   We recommend using a GPU for decoding.

For illustration purposes, we will use a pre-trained ASR model from this `link <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`__.
If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`.

As the initial step, let's download the pre-trained model.

.. code-block:: bash

   $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
   $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
   $ git lfs pull --include "pretrained.pt"
   $ ln -s pretrained.pt epoch-99.pt  # create a symbolic link so that the checkpoint can be loaded
   $ popd  # the decoding commands below are run from the original directory

To test the model, let's have a look at the decoding results without using the LM. This can be done via the following command:

.. code-block:: bash

   $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
   $ ./pruned_transducer_stateless7_streaming/decode.py \
       --epoch 99 \
       --avg 1 \
       --use-averaged-model False \
       --exp-dir $exp_dir \
       --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
       --max-duration 600 \
       --decode-chunk-len 32 \
       --decoding-method modified_beam_search

The following WERs are achieved on test-clean and test-other:

.. code-block:: text

   For test-clean, WER of different settings are:
   beam_size_4 3.11 best for test-clean
   For test-other, WER of different settings are:
   beam_size_4 7.93 best for test-other

These are already good numbers! But we can further improve them by using shallow fusion with an external LM.
Training a language model usually takes a long time, so we can download a pre-trained LM from this `link <https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm>`__.

.. code-block:: bash

   $ # download the external LM
   $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
   $ # create a symbolic link so that the checkpoint can be loaded
   $ pushd icefall-librispeech-rnn-lm/exp
   $ git lfs pull --include "pretrained.pt"
   $ ln -s pretrained.pt epoch-99.pt
   $ popd

.. note::

   This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora.
   You may also train an RNN LM from scratch. Please refer to this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py>`__
   for training an RNN LM and this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py>`__ to train a transformer LM.

To use shallow fusion for decoding, we can execute the following command:

.. code-block:: bash

   $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
   $ lm_dir=./icefall-librispeech-rnn-lm/exp
   $ lm_scale=0.29
   $ ./pruned_transducer_stateless7_streaming/decode.py \
       --epoch 99 \
       --avg 1 \
       --use-averaged-model False \
       --beam-size 4 \
       --exp-dir $exp_dir \
       --max-duration 600 \
       --decode-chunk-len 32 \
       --decoding-method modified_beam_search_lm_shallow_fusion \
       --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
       --use-shallow-fusion 1 \
       --lm-type rnn \
       --lm-exp-dir $lm_dir \
       --lm-epoch 99 \
       --lm-scale $lm_scale \
       --lm-avg 1 \
       --rnn-lm-embedding-dim 2048 \
       --rnn-lm-hidden-dim 2048 \
       --rnn-lm-num-layers 3 \
       --lm-vocab-size 500

Note that we set ``--decoding-method modified_beam_search_lm_shallow_fusion`` and ``--use-shallow-fusion True``
to use shallow fusion. ``--lm-type`` specifies the type of neural LM we are going to use; you can choose
between ``rnn`` and ``transformer``. The following three arguments are associated with the RNN LM (see the sketch after this list):

- ``--rnn-lm-embedding-dim``

  The embedding dimension of the RNN LM.

- ``--rnn-lm-hidden-dim``

  The hidden dimension of the RNN LM.

- ``--rnn-lm-num-layers``

  The number of RNN layers in the RNN LM.
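
To see what these three flags configure, here is a minimal PyTorch sketch of an RNN LM with the dimensions used above. The class name is hypothetical and this is a generic illustration, not icefall's actual RNN LM implementation.

.. code-block:: python

   import torch
   import torch.nn as nn

   class SketchRnnLm(nn.Module):
       """Toy LSTM LM mirroring the three --rnn-lm-* flags."""

       def __init__(self, vocab_size=500, embedding_dim=2048,
                    hidden_dim=2048, num_layers=3):
           super().__init__()
           self.embed = nn.Embedding(vocab_size, embedding_dim)
           self.rnn = nn.LSTM(embedding_dim, hidden_dim,
                              num_layers=num_layers, batch_first=True)
           self.out = nn.Linear(hidden_dim, vocab_size)

       def forward(self, tokens: torch.Tensor) -> torch.Tensor:
           # tokens: (batch, seq) -> per-token logits over the vocabulary
           h, _ = self.rnn(self.embed(tokens))
           return self.out(h)

   logits = SketchRnnLm()(torch.zeros(1, 8, dtype=torch.long))  # (1, 8, 500)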

The decoding results obtained with the above command are shown below.

.. code-block:: text

   For test-clean, WER of different settings are:
   beam_size_4 2.77 best for test-clean
   For test-other, WER of different settings are:
   beam_size_4 7.08 best for test-other

The improvement from shallow fusion is very obvious! The relative WER reduction on test-other is around 10.5%.
A few parameters can be tuned to further boost the performance of shallow fusion:

- ``--lm-scale``

  Controls the scale of the LM. If too small, the external language model may not be fully utilized; if too large,
  the LM score may dominate during decoding, leading to bad WER. A typical value is around 0.3.

- ``--beam-size``

  The number of active paths in the search beam. It controls the trade-off between decoding efficiency and accuracy.

Here, we also show how ``--beam-size`` affects the WER and decoding time:

.. list-table:: WERs and decoding time (on test-clean) of shallow fusion with different beam sizes
   :widths: 25 25 25 25
   :header-rows: 1

   * - Beam size
     - test-clean
     - test-other
     - Decoding time on test-clean (s)
   * - 4
     - 2.77
     - 7.08
     - 262
   * - 8
     - 2.62
     - 6.65
     - 352
   * - 12
     - 2.58
     - 6.65
     - 488

As we can see, a larger beam size during shallow fusion improves the WER but is also slower.

@@ -34,3 +34,8 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.

    contributing/index
    huggingface/index
+
+.. toctree::
+   :maxdepth: 2
+
+   decoding-with-langugage-models/index

@@ -1,7 +1,7 @@
 Distillation with HuBERT
 ========================

-This tutorial shows you how to perform knowledge distillation in `icefall`_
+This tutorial shows you how to perform knowledge distillation in `icefall <https://github.com/k2-fsa/icefall>`_
 with the `LibriSpeech`_ dataset. The distillation method
 used here is called "Multi Vector Quantization Knowledge Distillation" (MVQ-KD).
 Please have a look at our paper `Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation <https://arxiv.org/abs/2211.00508>`_
@@ -13,7 +13,7 @@ for more details about MVQ-KD.
 `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_.
 Currently, we only implement MVQ-KD in this recipe. However, MVQ-KD is theoretically applicable to all recipes
 with only minor changes needed. Feel free to try out MVQ-KD in different recipes. If you
-encounter any problems, please open an issue here `icefall <https://github.com/k2-fsa/icefall/issues>`_.
+encounter any problems, please open an issue here `icefall <https://github.com/k2-fsa/icefall/issues>`__.

 .. note::

@@ -217,7 +217,7 @@ the following command.
     --exp-dir $exp_dir \
     --enable-distillation True

-You should get similar results as `here <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md#distillation-with-hubert>`_.
+You should get similar results as `here <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md#distillation-with-hubert>`__.

 That's all! Feel free to experiment with your own setups and report your results.
-If you encounter any problems during training, please open up an issue `here <https://github.com/k2-fsa/icefall/issues>`_.
+If you encounter any problems during training, please open up an issue `here <https://github.com/k2-fsa/icefall/issues>`__.

@@ -8,10 +8,10 @@ with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.

 .. Note::

-   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`_,
-   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`_,
-   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_,
-   `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`_,
+   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`__,
+   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`__,
+   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`__,
+   `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`__,
    We will take pruned_transducer_stateless4 as an example in this tutorial.

 .. HINT::

@@ -237,7 +237,7 @@ them, please modify ``./pruned_transducer_stateless4/train.py`` directly.

 .. NOTE::

-  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`_ are a little different from
+  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`__ are a little different from
   other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, ``--joiner-dim`` from commandline, so that you can train models with different size with pruned_transducer_stateless5.

@@ -529,13 +529,13 @@ Download pretrained models
 If you don't want to train from scratch, you can download the pretrained models
 by visiting the following links:

-  - `pruned_transducer_stateless <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12>`_
+  - `pruned_transducer_stateless <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12>`__

-  - `pruned_transducer_stateless2 <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29>`_
+  - `pruned_transducer_stateless2 <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29>`__

-  - `pruned_transducer_stateless4 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless4-2022-06-03>`_
+  - `pruned_transducer_stateless4 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless4-2022-06-03>`__

-  - `pruned_transducer_stateless5 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-2022-07-07>`_
+  - `pruned_transducer_stateless5 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-2022-07-07>`__

 See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
 for the details of the above pretrained models

@@ -45,9 +45,9 @@ the input features.

 We have three variants of Emformer models in ``icefall``.

-- ``pruned_stateless_emformer_rnnt2`` using Emformer from torchaudio, see `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_stateless_emformer_rnnt2>`_.
+- ``pruned_stateless_emformer_rnnt2`` using Emformer from torchaudio, see `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_stateless_emformer_rnnt2>`__.
 - ``conv_emformer_transducer_stateless`` using ConvEmformer implemented by ourself. Different from the Emformer in torchaudio,
   ConvEmformer has a convolution in each layer and uses the mechanisms in our reworked conformer model.
-  See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless>`_.
+  See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless>`__.
 - ``conv_emformer_transducer_stateless2`` using ConvEmformer implemented by ourself. The only difference from the above one is that
   it uses a simplified memory bank. See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless2>`_.

@@ -6,10 +6,10 @@ with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.

 .. Note::

-   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`_,
-   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`_,
-   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_,
-   `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`_,
+   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`__,
+   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`__,
+   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`__,
+   `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`__,
    We will take pruned_transducer_stateless4 as an example in this tutorial.

 .. HINT::

@@ -264,7 +264,7 @@ them, please modify ``./pruned_transducer_stateless4/train.py`` directly.

 .. NOTE::

-  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`_ are a little different from
+  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`__ are a little different from
   other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, ``--joiner-dim`` from commandline, so that you can train models with different size with pruned_transducer_stateless5.

@@ -6,7 +6,7 @@ with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.

 .. Note::

-   The tutorial is suitable for `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
+   The tutorial is suitable for `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`__,

 .. HINT::

@@ -642,7 +642,7 @@ Download pretrained models
 If you don't want to train from scratch, you can download the pretrained models
 by visiting the following links:

-- `pruned_transducer_stateless7_streaming <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`_
+- `pruned_transducer_stateless7_streaming <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`__

 See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
 for the details of the above pretrained models
@ -59,6 +59,9 @@
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
|
||||
@ -59,6 +59,9 @@
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
|
||||
@ -65,6 +65,9 @@
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
|
||||
@ -59,6 +59,9 @@
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
|
||||
292
decoding-with-langugage-models/LODR.html
Normal file
292
decoding-with-langugage-models/LODR.html
Normal file
@ -0,0 +1,292 @@
|
||||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" >
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>LODR for RNN Transducer — icefall 0.1 documentation</title>
|
||||
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
|
||||
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../_static/jquery.js"></script>
|
||||
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
||||
<script src="../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../genindex.html" />
|
||||
<link rel="search" title="Search" href="../search.html" />
|
||||
<link rel="next" title="LM rescoring for Transducer" href="rescoring.html" />
|
||||
<link rel="prev" title="Shallow fusion for Transducer" href="shallow-fusion.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../index.html" class="icon icon-home">
|
||||
icefall
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Decoding with language models</a><ul class="current">
|
||||
<li class="toctree-l2"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
|
||||
<li class="toctree-l2 current"><a class="current reference internal" href="#">LODR for RNN Transducer</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../index.html">icefall</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="index.html">Decoding with language models</a></li>
|
||||
<li class="breadcrumb-item active">LODR for RNN Transducer</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/LODR.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="lodr-for-rnn-transducer">
|
||||
<span id="lodr"></span><h1>LODR for RNN Transducer<a class="headerlink" href="#lodr-for-rnn-transducer" title="Permalink to this heading"></a></h1>
|
||||
<p>As a type of E2E model, neural transducers are usually considered as having an internal
|
||||
language model, which learns the language level information on the training corpus.
|
||||
In real-life scenario, there is often a mismatch between the training corpus and the target corpus space.
|
||||
This mismatch can be a problem when decoding for neural transducer models with language models as its internal
|
||||
language can act “against” the external LM. In this tutorial, we show how to use
|
||||
<a class="reference external" href="https://arxiv.org/abs/2203.16776">Low-order Density Ratio</a> to alleviate this effect to further improve the performance
|
||||
of langugae model integration.</p>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>This tutorial is based on the recipe
|
||||
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming">pruned_transducer_stateless7_streaming</a>,
|
||||
which is a streaming transducer model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>.
|
||||
However, you can easily apply LODR to other recipes.
|
||||
If you encounter any problems, please open an issue here <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">icefall</a>.</p>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>For simplicity, the training and testing corpus in this tutorial are the same (<a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>). However,
|
||||
you can change the testing set to any other domains (e.g <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a>) and prepare the language models
|
||||
using that corpus.</p>
|
||||
</div>
|
||||
<p>First, let’s have a look at some background information. As the predecessor of LODR, Density Ratio (DR) is first proposed <a class="reference external" href="https://arxiv.org/abs/2002.11268">here</a>
|
||||
to address the language information mismatch between the training
|
||||
corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the test domain
|
||||
are acoustically similar, DR derives the following formular for decoding with Bayes’ theorem:</p>
|
||||
<div class="math notranslate nohighlight">
|
||||
\[\text{score}\left(y_u|\mathit{x},y\right) =
|
||||
\log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
|
||||
\lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
|
||||
\lambda_2 \log p_{\text{Source LM}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]</div>
|
||||
<p>where <span class="math notranslate nohighlight">\(\lambda_1\)</span> and <span class="math notranslate nohighlight">\(\lambda_2\)</span> are the weights of LM scores for target domain and source domain respectively.
|
||||
Here, the source domain LM is trained on the training corpus. The only difference in the above formular compared to
|
||||
shallow fusion is the subtraction of the source domain LM.</p>
|
||||
<p>Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, the LM is
|
||||
considered to be weak and can only capture low-level language information. Therefore, <a class="reference external" href="https://arxiv.org/abs/2203.16776">LODR</a> proposed to use
|
||||
a low-order n-gram LM as an approximation of the ILM of the neural transducer. This leads to the following formula
|
||||
during decoding for transducer model:</p>
|
||||
<div class="math notranslate nohighlight">
|
||||
\[\text{score}\left(y_u|\mathit{x},y\right) =
|
||||
\log p_{rnnt}\left(y_u|\mathit{x},y_{1:u-1}\right) +
|
||||
\lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
|
||||
\lambda_2 \log p_{\text{bi-gram}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]</div>
|
||||
<p>In LODR, an additional bi-gram LM estimated on the source domain (e.g training corpus) is required. Comared to DR,
|
||||
the only difference lies in the choice of source domain LM. According to the original <a class="reference external" href="https://arxiv.org/abs/2203.16776">paper</a>,
|
||||
LODR achieves similar performance compared DR in both intra-domain and cross-domain settings.
|
||||
As a bi-gram is much faster to evaluate, LODR is usually much faster.</p>
|
||||
<p>Now, we will show you how to use LODR in <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.
|
||||
For illustration purpose, we will use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
|
||||
If you want to train your model from scratch, please have a look at <a class="reference internal" href="../recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html#non-streaming-librispeech-pruned-transducer-stateless"><span class="std std-ref">Pruned transducer statelessX</span></a>.
|
||||
The testing scenario here is intra-domain (we decode the model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> testing sets).</p>
|
||||
<p>As the initial step, let’s download the pre-trained model.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">"pretrained.pt"</span>
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>To test the model, let’s have a look at the decoding results <strong>without</strong> using LM. This can be done via the following command:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
|
||||
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>modified_beam_search
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The following WERs are achieved on test-clean and test-other:</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
|
||||
$ beam_size_4 3.11 best for test-clean
|
||||
$ For test-other, WER of different settings are:
|
||||
$ beam_size_4 7.93 best for test-other
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Then, we download the external language model and bi-gram LM that are necessary for LODR.
|
||||
Note that the bi-gram is estimated on the LibriSpeech 960 hours’ text.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the external LM</span>
|
||||
$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
|
||||
$<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-librispeech-rnn-lm/exp
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">"pretrained.pt"</span>
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt
|
||||
$<span class="w"> </span><span class="nb">popd</span>
|
||||
$
|
||||
$<span class="w"> </span><span class="c1"># download the bi-gram</span>
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
|
||||
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/marcoyang/librispeech_bigram
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>data/lang_bpe_500
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>../../librispeech_bigram/2gram.fst.txt<span class="w"> </span>.
|
||||
$<span class="w"> </span><span class="nb">popd</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Then, we perform LODR decoding by setting <code class="docutils literal notranslate"><span class="pre">--decoding-method</span></code> to <code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_LODR</span></code>:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
|
||||
$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
|
||||
$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.42
|
||||
$<span class="w"> </span><span class="nv">LODR_scale</span><span class="o">=</span>-0.24
|
||||
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_LODR<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model
|
||||
<span class="w"> </span>--use-shallow-fusion<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--tokens-ngram<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--ngram-lm-scale<span class="w"> </span><span class="nv">$LODR_scale</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>There are two extra arguments that need to be given when doing LODR. <code class="docutils literal notranslate"><span class="pre">--tokens-ngram</span></code> specifies the order of n-gram. As we
|
||||
are using a bi-gram, we set it to 2. <code class="docutils literal notranslate"><span class="pre">--ngram-lm-scale</span></code> is the scale of the bi-gram; it should be a negative number
|
||||
as we are subtracting the bi-gram’s score during decoding.</p>
|
||||
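<p>To make the roles of these two arguments concrete, below is a minimal, hypothetical sketch of how a single token's score is combined during LODR decoding. The function name and toy values are illustrative only; the actual implementation lives in icefall's <code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_LODR</span></code> decoding method:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>import math

def lodr_token_score(
    log_p_rnnt: float,    # transducer log-prob of the candidate token
    log_p_rnnlm: float,   # external RNN LM log-prob of the same token
    log_p_bigram: float,  # bi-gram (ILM approximation) log-prob
    lm_scale: float = 0.42,         # corresponds to --lm-scale
    ngram_lm_scale: float = -0.24,  # corresponds to --ngram-lm-scale (negative)
) -> float:
    # LODR adds the external LM score and subtracts the bi-gram score;
    # the subtraction is realized by the negative ngram_lm_scale.
    return log_p_rnnt + lm_scale * log_p_rnnlm + ngram_lm_scale * log_p_bigram

# Toy example: the external LM likes this token more than the bi-gram does,
# so its combined score is boosted relative to plain shallow fusion.
print(lodr_token_score(math.log(0.6), math.log(0.5), math.log(0.4)))
</pre></div>
</div>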
<p>The decoding results obtained with the above command are shown below:</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
|
||||
$ beam_size_4 2.61 best for test-clean
|
||||
$ For test-other, WER of different settings are:
|
||||
$ beam_size_4 6.74 best for test-other
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Recall that the lowest WER we obtained in <a class="reference internal" href="shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a> with beam size of 4 is <code class="docutils literal notranslate"><span class="pre">2.77/7.08</span></code>, LODR
|
||||
indeed <strong>further improves</strong> the WER. We can do even better if we increase <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code>:</p>
|
||||
<table class="docutils align-default" id="id1">
|
||||
<caption><span class="caption-number">Table 2 </span><span class="caption-text">WER of LODR with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
|
||||
<colgroup>
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 50%" />
|
||||
</colgroup>
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Beam size</p></th>
|
||||
<th class="head"><p>test-clean</p></th>
|
||||
<th class="head"><p>test-other</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>4</p></td>
|
||||
<td><p>2.61</p></td>
|
||||
<td><p>6.74</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>8</p></td>
|
||||
<td><p>2.45</p></td>
|
||||
<td><p>6.38</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>12</p></td>
|
||||
<td><p>2.4</p></td>
|
||||
<td><p>6.23</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="shallow-fusion.html" class="btn btn-neutral float-left" title="Shallow fusion for Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="rescoring.html" class="btn btn-neutral float-right" title="LM rescoring for Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2021, icefall development team.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
135
decoding-with-langugage-models/index.html
Normal file
135
decoding-with-langugage-models/index.html
Normal file
@ -0,0 +1,135 @@
|
||||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" >
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Decoding with language models — icefall 0.1 documentation</title>
|
||||
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
|
||||
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../_static/jquery.js"></script>
|
||||
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../genindex.html" />
|
||||
<link rel="search" title="Search" href="../search.html" />
|
||||
<link rel="next" title="Shallow fusion for Transducer" href="shallow-fusion.html" />
|
||||
<link rel="prev" title="Huggingface spaces" href="../huggingface/spaces.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../index.html" class="icon icon-home">
|
||||
icefall
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current"><a class="current reference internal" href="#">Decoding with language models</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../index.html">icefall</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item active">Decoding with language models</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/index.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="decoding-with-language-models">
|
||||
<h1>Decoding with language models<a class="headerlink" href="#decoding-with-language-models" title="Permalink to this heading"></a></h1>
|
||||
<p>This section describes how to use external langugage models
|
||||
during decoding to improve the WER of transducer models.</p>
|
||||
<div class="toctree-wrapper compound">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="../huggingface/spaces.html" class="btn btn-neutral float-left" title="Huggingface spaces" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="shallow-fusion.html" class="btn btn-neutral float-right" title="Shallow fusion for Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2021, icefall development team.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
386
decoding-with-langugage-models/rescoring.html
Normal file
386
decoding-with-langugage-models/rescoring.html
Normal file
@ -0,0 +1,386 @@
|
||||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" >
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>LM rescoring for Transducer — icefall 0.1 documentation</title>
|
||||
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
|
||||
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../_static/jquery.js"></script>
|
||||
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../genindex.html" />
|
||||
<link rel="search" title="Search" href="../search.html" />
|
||||
<link rel="prev" title="LODR for RNN Transducer" href="LODR.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../index.html" class="icon icon-home">
|
||||
icefall
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Decoding with language models</a><ul class="current">
|
||||
<li class="toctree-l2"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
|
||||
<li class="toctree-l2 current"><a class="current reference internal" href="#">LM rescoring for Transducer</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../index.html">icefall</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="index.html">Decoding with language models</a></li>
|
||||
<li class="breadcrumb-item active">LM rescoring for Transducer</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/rescoring.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="lm-rescoring-for-transducer">
|
||||
<span id="rescoring"></span><h1>LM rescoring for Transducer<a class="headerlink" href="#lm-rescoring-for-transducer" title="Permalink to this heading"></a></h1>
|
||||
<p>LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
|
||||
methods (see <span class="xref std std-ref">shallow-fusion</span>, <a class="reference internal" href="LODR.html#lodr"><span class="std std-ref">LODR for RNN Transducer</span></a>), rescoring is usually performed to re-rank the n-best hypotheses after beam search.
|
||||
Rescoring is usually more efficient than shallow fusion since less computation is performed on the external LM.
|
||||
In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
|
||||
<a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.</p>
|
||||
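<p>Before diving into the commands, here is a minimal sketch of the two-stage procedure: collect the n-best hypotheses (with their scores) from beam search, then add a scaled external LM score to each and re-rank. All names below are illustrative, not icefall's API:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[str, float]],      # (hypothesis, beam-search log score)
    lm_logprob: Callable[[str], float],  # external LM sentence log-prob
    lm_scale: float = 0.43,              # corresponds to --lm-scale
) -> List[Tuple[str, float]]:
    # The external LM is evaluated once per complete hypothesis, which is
    # why rescoring is cheaper than shallow fusion (no per-step LM calls).
    rescored = [(hyp, score + lm_scale * lm_logprob(hyp)) for hyp, score in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy usage with a stand-in LM that simply prefers shorter sentences:
print(rescore_nbest([("a b c", -1.2), ("a b see", -1.1)],
                    lm_logprob=lambda s: -float(len(s.split()))))
</pre></div>
</div>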
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>This tutorial is based on the recipe
|
||||
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming">pruned_transducer_stateless7_streaming</a>,
|
||||
which is a streaming transducer model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>.
|
||||
However, you can easily apply LM rescoring to other recipes.
|
||||
If you encounter any problems, please open an issue <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">here</a>.</p>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>For simplicity, the training and testing corpus in this tutorial is the same (<a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>). However, you can change the testing set
|
||||
to any other domain (e.g., <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a>) and use an external LM trained on that domain.</p>
|
||||
</div>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>We recommend you to use a GPU for decoding.</p>
|
||||
</div>
|
||||
<p>For illustration purpose, we will use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
|
||||
If you want to train your model from scratch, please have a look at <a class="reference internal" href="../recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html#non-streaming-librispeech-pruned-transducer-stateless"><span class="std std-ref">Pruned transducer statelessX</span></a>.</p>
|
||||
<p>As the initial step, let’s download the pre-trained model.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">"pretrained.pt"</span>
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>As usual, we first test the model’s performance without external LM. This can be done via the following command:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
|
||||
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>modified_beam_search
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The following WERs are achieved on test-clean and test-other:</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
|
||||
$ beam_size_4 3.11 best for test-clean
|
||||
$ For test-other, WER of different settings are:
|
||||
$ beam_size_4 7.93 best for test-other
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Now, we will try to improve the above WER numbers via external LM rescoring. We will download
|
||||
a pre-trained LM from this <a class="reference external" href="https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm">link</a>.</p>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>This is an RNN LM trained on the LibriSpeech text corpus. So it might not be ideal for other corpus.
|
||||
You may also train an RNN LM from scratch. Please refer to this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py">script</a>
|
||||
for training an RNN LM and this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py">script</a> for training a transformer LM.</p>
|
||||
</div>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the external LM</span>
|
||||
$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
|
||||
$<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-librispeech-rnn-lm/exp
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">"pretrained.pt"</span>
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt
|
||||
$<span class="w"> </span><span class="nb">popd</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>With the RNNLM available, we can rescore the n-best hypotheses generated from <cite>modified_beam_search</cite>. Here,
|
||||
<cite>n</cite> should be the number of beams, i.e <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code>. The command for LM rescoring is
|
||||
as follows. Note that the <code class="docutils literal notranslate"><span class="pre">--decoding-method</span></code> is set to <cite>modified_beam_search_lm_rescore</cite> and <code class="docutils literal notranslate"><span class="pre">--use-shallow-fusion</span></code>
|
||||
is set to <cite>False</cite>.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
|
||||
$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
|
||||
$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.43
|
||||
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_rescore<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model
|
||||
<span class="w"> </span>--use-shallow-fusion<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
|
||||
$ beam_size_4 2.93 best for test-clean
|
||||
$ For test-other, WER of different settings are:
|
||||
$ beam_size_4 7.6 best for test-other
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Great! We made some improvements! Increasing the size of the n-best hypotheses will further boost the performance,
|
||||
see the following table:</p>
|
||||
<table class="docutils align-default" id="id1">
|
||||
<caption><span class="caption-number">Table 3 </span><span class="caption-text">WERs of LM rescoring with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
|
||||
<colgroup>
|
||||
<col style="width: 33%" />
|
||||
<col style="width: 33%" />
|
||||
<col style="width: 33%" />
|
||||
</colgroup>
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Beam size</p></th>
|
||||
<th class="head"><p>test-clean</p></th>
|
||||
<th class="head"><p>test-other</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>4</p></td>
|
||||
<td><p>2.93</p></td>
|
||||
<td><p>7.6</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>8</p></td>
|
||||
<td><p>2.67</p></td>
|
||||
<td><p>7.11</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>12</p></td>
|
||||
<td><p>2.59</p></td>
|
||||
<td><p>6.86</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>In fact, we can also apply LODR (see <a class="reference internal" href="LODR.html#lodr"><span class="std std-ref">LODR for RNN Transducer</span></a>) when doing LM rescoring. To do so, we need to
|
||||
download the bi-gram required by LODR:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the bi-gram</span>
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
|
||||
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/marcoyang/librispeech_bigram
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>data/lang_bpe_500
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>../../librispeech_bigram/2gram.arpa<span class="w"> </span>.
|
||||
$<span class="w"> </span><span class="nb">popd</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Then we can performn LM rescoring + LODR by changing the decoding method to <cite>modified_beam_search_lm_rescore_LODR</cite>.</p>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>This decoding method requires the dependency of <a class="reference external" href="https://github.com/kpu/kenlm">kenlm</a>. You can install it
|
||||
via this command: <cite>pip install https://github.com/kpu/kenlm/archive/master.zip</cite>.</p>
|
||||
</div>
|
||||
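<p>Conceptually, this method extends the rescoring formula above with a subtracted bi-gram term, mirroring LODR. A hedged sketch using kenlm's Python API (the ARPA path comes from the download step above; the scales and helper name are illustrative):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>import kenlm  # pip install https://github.com/kpu/kenlm/archive/master.zip

bigram = kenlm.Model("data/lang_bpe_500/2gram.arpa")

def rescore_lodr(hyp: str, beam_score: float, rnnlm_logprob: float,
                 lm_scale: float = 0.43, bigram_scale: float = 0.24) -> float:
    # kenlm.Model.score returns a log10 probability; a real implementation
    # must keep all terms in the same log base before mixing them.
    return (beam_score
            + lm_scale * rnnlm_logprob
            - bigram_scale * bigram.score(hyp, bos=True, eos=True))
</pre></div>
</div>
<p>The full decoding command is as follows:</p>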
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
|
||||
$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
|
||||
$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.43
|
||||
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_rescore_LODR<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model
|
||||
<span class="w"> </span>--use-shallow-fusion<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>You should see the following WERs after executing the commands above:</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
|
||||
$ beam_size_4 2.9 best for test-clean
|
||||
$ For test-other, WER of different settings are:
|
||||
$ beam_size_4 7.57 best for test-other
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>It’s slightly better than LM rescoring. If we further increase the beam size, we will see
|
||||
further improvements from LM rescoring + LODR:</p>
|
||||
<table class="docutils align-default" id="id2">
|
||||
<caption><span class="caption-number">Table 4 </span><span class="caption-text">WERs of LM rescoring + LODR with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
|
||||
<colgroup>
|
||||
<col style="width: 33%" />
|
||||
<col style="width: 33%" />
|
||||
<col style="width: 33%" />
|
||||
</colgroup>
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Beam size</p></th>
|
||||
<th class="head"><p>test-clean</p></th>
|
||||
<th class="head"><p>test-other</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>4</p></td>
|
||||
<td><p>2.9</p></td>
|
||||
<td><p>7.57</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>8</p></td>
|
||||
<td><p>2.63</p></td>
|
||||
<td><p>7.04</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>12</p></td>
|
||||
<td><p>2.52</p></td>
|
||||
<td><p>6.73</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>As mentioned earlier, LM rescoring is usually faster than shallow-fusion based methods.
|
||||
Here, we benchmark their WERs and decoding speed:</p>
|
||||
<table class="docutils align-default" id="id3">
|
||||
<caption><span class="caption-number">Table 5 </span><span class="caption-text">LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
|
||||
<colgroup>
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 25%" />
|
||||
</colgroup>
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Decoding method</p></th>
|
||||
<th class="head"><p>beam=4</p></th>
|
||||
<th class="head"><p>beam=8</p></th>
|
||||
<th class="head"><p>beam=12</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p><cite>modified_beam_search</cite></p></td>
|
||||
<td><p>3.11/7.93; 132s</p></td>
|
||||
<td><p>3.1/7.95; 177s</p></td>
|
||||
<td><p>3.1/7.96; 210s</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p><cite>modified_beam_search_lm_shallow_fusion</cite></p></td>
|
||||
<td><p>2.77/7.08; 262s</p></td>
|
||||
<td><p>2.62/6.65; 352s</p></td>
|
||||
<td><p>2.58/6.65; 488s</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>LODR</p></td>
|
||||
<td><p>2.61/6.74; 400s</p></td>
|
||||
<td><p>2.45/6.38; 610s</p></td>
|
||||
<td><p>2.4/6.23; 870s</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p><cite>modified_beam_search_lm_rescore</cite></p></td>
|
||||
<td><p>2.93/7.6; 156s</p></td>
|
||||
<td><p>2.67/7.11; 203s</p></td>
|
||||
<td><p>2.59/6.86; 255s</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p><cite>modified_beam_search_lm_rescore_LODR</cite></p></td>
|
||||
<td><p>2.9/7.57; 160s</p></td>
|
||||
<td><p>2.63/7.04; 203s</p></td>
|
||||
<td><p>2.52/6.73; 263s</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>Decoding is performed with a single 32G V100, we set <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> to 600.
|
||||
The decoding times here are only for reference and may vary.</p>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="LODR.html" class="btn btn-neutral float-left" title="LODR for RNN Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
</div>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2021, icefall development team.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
||||
296
decoding-with-langugage-models/shallow-fusion.html
Normal file
296
decoding-with-langugage-models/shallow-fusion.html
Normal file
@ -0,0 +1,296 @@
|
||||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" >
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Shallow fusion for Transducer — icefall 0.1 documentation</title>
|
||||
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
|
||||
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script src="../_static/jquery.js"></script>
|
||||
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../genindex.html" />
|
||||
<link rel="search" title="Search" href="../search.html" />
|
||||
<link rel="next" title="LODR for RNN Transducer" href="LODR.html" />
|
||||
<link rel="prev" title="Decoding with language models" href="index.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
|
||||
|
||||
|
||||
<a href="../index.html" class="icon icon-home">
|
||||
icefall
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
|
||||
</ul>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Decoding with language models</a><ul class="current">
|
||||
<li class="toctree-l2 current"><a class="current reference internal" href="#">Shallow fusion for Transducer</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../index.html">icefall</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
|
||||
<li class="breadcrumb-item"><a href="index.html">Decoding with language models</a></li>
|
||||
<li class="breadcrumb-item active">Shallow fusion for Transducer</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/shallow-fusion.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="shallow-fusion-for-transducer">
|
||||
<span id="shallow-fusion"></span><h1>Shallow fusion for Transducer<a class="headerlink" href="#shallow-fusion-for-transducer" title="Permalink to this heading"></a></h1>
|
||||
<p>External language models (LM) are commonly used to improve WERs for E2E ASR models.
|
||||
This tutorial shows you how to perform <code class="docutils literal notranslate"><span class="pre">shallow</span> <span class="pre">fusion</span></code> with an external LM
|
||||
to improve the word error rate (WER) of a transducer model.</p>
|
||||
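<p>The idea is simple: at every step of beam search, the external LM's log-probability for a candidate token is interpolated with the ASR model's score. A minimal, hypothetical sketch (the names are illustrative, not icefall's internal API):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>def shallow_fusion_score(log_p_asr: float, log_p_lm: float,
                         lm_scale: float = 0.29) -> float:
    # Per-token interpolation applied when expanding each hypothesis;
    # lm_scale corresponds to the --lm-scale argument used below.
    return log_p_asr + lm_scale * log_p_lm

# Tokens the LM considers likely are penalized less, re-ranking the beam:
print(shallow_fusion_score(-1.0, -0.2))
</pre></div>
</div>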
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>This tutorial is based on the recipe
|
||||
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming">pruned_transducer_stateless7_streaming</a>,
|
||||
which is a streaming transducer model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>.
|
||||
However, you can easily apply shallow fusion to other recipes.
|
||||
If you encounter any problems, please open an issue in <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">icefall</a>.</p>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>For simplicity, the training and testing corpus in this tutorial is the same (<a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>). However, you can change the testing set
|
||||
to any other domain (e.g., <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a>) and use an external LM trained on that domain.</p>
|
||||
</div>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>We recommend you to use a GPU for decoding.</p>
|
||||
</div>
|
||||
<p>For illustration purpose, we will use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
|
||||
If you want to train your model from scratch, please have a look at <a class="reference internal" href="../recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html#non-streaming-librispeech-pruned-transducer-stateless"><span class="std std-ref">Pruned transducer statelessX</span></a>.</p>
|
||||
<p>As the initial step, let’s download the pre-trained model.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">"pretrained.pt"</span>
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>To test the model, let’s have a look at the decoding results without using LM. This can be done via the following command:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
|
||||
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>modified_beam_search
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The following WERs are achieved on test-clean and test-other:</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
|
||||
$ beam_size_4 3.11 best for test-clean
|
||||
$ For test-other, WER of different settings are:
|
||||
$ beam_size_4 7.93 best for test-other
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>These are already good numbers! But we can further improve it by using shallow fusion with external LM.
|
||||
Training a language model usually takes a long time, so we can download a pre-trained LM from this <a class="reference external" href="https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm">link</a>.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the external LM</span>
|
||||
$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
|
||||
$<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
|
||||
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-librispeech-rnn-lm/exp
|
||||
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">"pretrained.pt"</span>
|
||||
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt
|
||||
$<span class="w"> </span><span class="nb">popd</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>This is an RNN LM trained on the LibriSpeech text corpus. So it might not be ideal for other corpus.
|
||||
You may also train an RNN LM from scratch. Please refer to this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py">script</a>
|
||||
for training an RNN LM and this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py">script</a> for training a transformer LM.</p>
|
||||
</div>
|
||||
<p>To use shallow fusion for decoding, we can execute the following command:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
|
||||
$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
|
||||
$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.29
|
||||
$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_shallow_fusion<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model
|
||||
<span class="w"> </span>--use-shallow-fusion<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
|
||||
<span class="w"> </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Note that we set <code class="docutils literal notranslate"><span class="pre">--decoding-method</span> <span class="pre">modified_beam_search_lm_shallow_fusion</span></code> and <code class="docutils literal notranslate"><span class="pre">--use-shallow-fusion</span> <span class="pre">True</span></code>
to use shallow fusion. <code class="docutils literal notranslate"><span class="pre">--lm-type</span></code> specifies the type of neural LM to use; you can choose
between <code class="docutils literal notranslate"><span class="pre">rnn</span></code> and <code class="docutils literal notranslate"><span class="pre">transformer</span></code>. The following three arguments are specific to the RNN LM (the combined score used during search is shown after this list):</p>
<ul class="simple">
|
||||
<li><dl class="simple">
|
||||
<dt><code class="docutils literal notranslate"><span class="pre">--rnn-lm-embedding-dim</span></code></dt><dd><p>The embedding dimension of the RNN LM</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</li>
|
||||
<li><dl class="simple">
|
||||
<dt><code class="docutils literal notranslate"><span class="pre">--rnn-lm-hidden-dim</span></code></dt><dd><p>The hidden dimension of the RNN LM</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</li>
|
||||
<li><dl class="simple">
|
||||
<dt><code class="docutils literal notranslate"><span class="pre">--rnn-lm-num-layers</span></code></dt><dd><p>The number of RNN layers in the RNN LM.</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</li>
|
||||
</ul>
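<p>Concretely, shallow fusion adds the external LM's log-probability, scaled by <code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code>, to the transducer's log-probability at each step of the beam search. In the notation used by the Density Ratio and LODR formulas, the combined score is (a sketch of the standard shallow-fusion interpolation, not a quote of the decoding code):</p>
<div class="math notranslate nohighlight">
\[\text{score}\left(y_u|\mathit{x},y\right) =
\log p_{rnnt}\left(y_u|\mathit{x},y_{1:u-1}\right) +
\lambda \log p_{\text{LM}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]
</div>
<p>where <span class="math notranslate nohighlight">\(\lambda\)</span> corresponds to <code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code>.</p>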
<p>The decoding results obtained with the above command are shown below.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
|
||||
$ beam_size_4 2.77 best for test-clean
|
||||
$ For test-other, WER of different settings are:
|
||||
$ beam_size_4 7.08 best for test-other
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The improvement brought by shallow fusion is obvious! The relative WER reduction on test-other is around 10.5%
(i.e., down from a no-LM baseline of roughly 7.9 to 7.08).
A few parameters can be tuned to further boost the performance of shallow fusion:</p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code></p>
<blockquote>
<div><p>Controls the scale of the LM. If it is too small, the external language model may not be fully utilized; if it is too large,
the LM score may dominate during decoding, leading to a worse WER. A typical value is around 0.3 (see the sweep sketched after this list).</p>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--beam-size</span></code></p>
<blockquote>
<div><p>The number of active paths kept during beam search. It controls the trade-off between decoding efficiency and accuracy.</p>
</div></blockquote>
</li>
</ul>
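<p>Since the best value of <code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code> is corpus-dependent, a simple way to tune it is to sweep a few values on a held-out set and keep the one with the lowest WER. Below is a hedged sketch of such a sweep; every flag other than <code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code> is assumed to match the decoding command shown above:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Sketch: sweep the LM scale, then compare the resulting WERs.
for lm_scale in 0.1 0.2 0.29 0.4 0.5; do
    ./pruned_transducer_stateless7_streaming/decode.py \
        --epoch 99 --avg 1 --use-averaged-model False \
        --beam-size 4 --exp-dir $exp_dir \
        --max-duration 600 --decode-chunk-len 32 \
        --decoding-method modified_beam_search_lm_shallow_fusion \
        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
        --use-shallow-fusion 1 --lm-type rnn \
        --lm-exp-dir $lm_dir --lm-epoch 99 --lm-avg 1 \
        --lm-scale $lm_scale \
        --rnn-lm-embedding-dim 2048 --rnn-lm-hidden-dim 2048 \
        --rnn-lm-num-layers 3 --lm-vocab-size 500
done
</pre></div>
</div>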
<p>Here, we also show how <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code> affects the WER and decoding time:</p>
<table class="docutils align-default" id="id2">
|
||||
<caption><span class="caption-number">Table 1 </span><span class="caption-text">WERs and decoding time (on test-clean) of shallow fusion with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
|
||||
<colgroup>
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 25%" />
|
||||
<col style="width: 25%" />
|
||||
</colgroup>
|
||||
<thead>
|
||||
<tr class="row-odd"><th class="head"><p>Beam size</p></th>
|
||||
<th class="head"><p>test-clean</p></th>
|
||||
<th class="head"><p>test-other</p></th>
|
||||
<th class="head"><p>Decoding time on test-clean (s)</p></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr class="row-even"><td><p>4</p></td>
|
||||
<td><p>2.77</p></td>
|
||||
<td><p>7.08</p></td>
|
||||
<td><p>262</p></td>
|
||||
</tr>
|
||||
<tr class="row-odd"><td><p>8</p></td>
|
||||
<td><p>2.62</p></td>
|
||||
<td><p>6.65</p></td>
|
||||
<td><p>352</p></td>
|
||||
</tr>
|
||||
<tr class="row-even"><td><p>12</p></td>
|
||||
<td><p>2.58</p></td>
|
||||
<td><p>6.65</p></td>
|
||||
<td><p>488</p></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
<p>As we can see, a larger beam size during shallow fusion improves the WER, but also slows down decoding.</p>
</section>


</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="index.html" class="btn btn-neutral float-left" title="Decoding with language models" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="LODR.html" class="btn btn-neutral float-right" title="LODR for RNN Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>

<hr/>

<div role="contentinfo">
<p>© Copyright 2021, icefall development team.</p>
</div>

Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.


</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>

</body>
</html>
@@ -59,6 +59,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -51,6 +51,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -58,6 +58,9 @@
<li class="toctree-l2"><a class="reference internal" href="spaces.html">Huggingface spaces</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -58,6 +58,9 @@
<li class="toctree-l2"><a class="reference internal" href="spaces.html">Huggingface spaces</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -19,6 +19,7 @@
<script src="../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Decoding with language models" href="../decoding-with-langugage-models/index.html" />
<link rel="prev" title="Pre-trained models" href="pretrained-models.html" />
</head>

@@ -60,6 +61,9 @@
</li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>
@@ -144,6 +148,7 @@ the following YouTube channel by <a class="reference external" href="https://www
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="pretrained-models.html" class="btn btn-neutral float-left" title="Pre-trained models" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../decoding-with-langugage-models/index.html" class="btn btn-neutral float-right" title="Decoding with language models" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>

<hr/>

13
index.html
@@ -53,6 +53,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>
@@ -148,6 +151,16 @@ speech recognition recipes using <a class="reference external" href="https://git
</li>
</ul>
</div>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a><ul>
<li class="toctree-l2"><a class="reference internal" href="decoding-with-langugage-models/shallow-fusion.html">Shallow fusion for Transducer</a></li>
<li class="toctree-l2"><a class="reference internal" href="decoding-with-langugage-models/LODR.html">LODR for RNN Transducer</a></li>
<li class="toctree-l2"><a class="reference internal" href="decoding-with-langugage-models/rescoring.html">LM rescoring for Transducer</a></li>
</ul>
</li>
</ul>
</div>
</section>


@@ -76,6 +76,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -67,6 +67,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -75,6 +75,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -75,6 +75,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -74,6 +74,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -66,6 +66,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -68,6 +68,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -66,6 +66,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -66,6 +66,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -61,6 +61,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

BIN
objects.inv
Binary file not shown.
@@ -69,6 +69,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -69,6 +69,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -69,6 +69,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -69,6 +69,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -64,6 +64,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -72,6 +72,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -72,6 +72,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>
@@ -103,7 +106,7 @@
<section id="distillation-with-hubert">
<h1>Distillation with HuBERT<a class="headerlink" href="#distillation-with-hubert" title="Permalink to this heading"></a></h1>
<p>This tutorial shows you how to perform knowledge distillation in <a href="#id7"><span class="problematic" id="id8">`icefall`_</span></a>
<p>This tutorial shows you how to perform knowledge distillation in <a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset. The distillation method
used here is called “Multi Vector Quantization Knowledge Distillation” (MVQ-KD).
Please have a look at our paper <a class="reference external" href="https://arxiv.org/abs/2211.00508">Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation</a>
@@ -119,7 +122,7 @@ encounter any problems, please open an issue here <a class="reference external"
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <a href="#id9"><span class="problematic" id="id10">`icefall`_</span></a>.</p>
the environment for <a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
@@ -190,7 +193,7 @@ run <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/m
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>There are 5 stages in total, the first and second stage will be automatically skipped
when choosing to downloaded codebook indexes prepared by <a href="#id11"><span class="problematic" id="id12">`icefall`_</span></a>.
when choosing to downloaded codebook indexes prepared by <a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.
Of course, you can extract and compute the codebook indexes by yourself. This
will require you downloading a HuBERT-XL model and it can take a while for
the extraction of codebook indexes.</p>
@@ -226,10 +229,10 @@ and prepares MVQ-augmented training manifests.</p>
</div>
<p>Please see the
following screenshot for the output of an example execution.</p>
<figure class="align-center" id="id5">
<figure class="align-center" id="id4">
<a class="reference internal image-reference" href="../../../_images/distillation_codebook.png"><img alt="Downloading codebook indexes and preparing training manifest." src="../../../_images/distillation_codebook.png" style="width: 800px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 6 </span><span class="caption-text">Downloading codebook indexes and preparing training manifest.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 6 </span><span class="caption-text">Downloading codebook indexes and preparing training manifest.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
</figcaption>
</figure>
<div class="admonition hint">
@@ -241,10 +244,10 @@ set <code class="docutils literal notranslate"><span class="pre">use_extracted_c
<code class="docutils literal notranslate"><span class="pre">num_codebooks</span></code> by yourself.</p>
</div>
<p>Now, you should see the following files under the directory <code class="docutils literal notranslate"><span class="pre">./data/vq_fbank_layer36_cb8</span></code>.</p>
<figure class="align-center" id="id6">
<figure class="align-center" id="id5">
<a class="reference internal image-reference" href="../../../_images/distillation_directory.png"><img alt="MVQ-augmented training manifests" src="../../../_images/distillation_directory.png" style="width: 800px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 7 </span><span class="caption-text">MVQ-augmented training manifests.</span><a class="headerlink" href="#id6" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 7 </span><span class="caption-text">MVQ-augmented training manifests.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
</figcaption>
</figure>
<p>Whola! You are ready to perform knowledge distillation training now!</p>

@@ -72,6 +72,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -72,6 +72,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>
@@ -381,10 +384,10 @@ $<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w">
<p>Note there is a URL in the above output. Click it and you will see
the following screenshot:</p>
<blockquote>
<div><figure class="align-center" id="id9">
<div><figure class="align-center" id="id4">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/QOGSPBgsR8KzcRMmie9JGw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-pruned-transducer-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id9" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

@@ -72,6 +72,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -72,6 +72,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -72,6 +72,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -68,6 +68,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -68,6 +68,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -68,6 +68,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -67,6 +67,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -67,6 +67,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -62,6 +62,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -66,6 +66,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -67,6 +67,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -67,6 +67,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>
@@ -380,10 +383,10 @@ $<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w">
<p>Note there is a URL in the above output. Click it and you will see
the following screenshot:</p>
<blockquote>
<div><figure class="align-center" id="id4">
<div><figure class="align-center" id="id5">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-lstm-transducer-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 10 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 10 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

@@ -67,6 +67,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>
@@ -400,10 +403,10 @@ $<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w">
<p>Note there is a URL in the above output. Click it and you will see
the following screenshot:</p>
<blockquote>
<div><figure class="align-center" id="id10">
<div><figure class="align-center" id="id5">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/97VKXf80Ru61CnP2ALWZZg/"><img alt="TensorBoard screenshot" src="../../../_images/streaming-librispeech-pruned-transducer-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 9 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id10" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 9 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>

@@ -67,6 +67,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -58,6 +58,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>

@@ -54,6 +54,9 @@
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
</ul>

</div>
File diff suppressed because one or more lines are too long