diff --git a/_sources/decoding-with-langugage-models/LODR.rst.txt b/_sources/decoding-with-langugage-models/LODR.rst.txt
new file mode 100644
index 000000000..7ffa0c128
--- /dev/null
+++ b/_sources/decoding-with-langugage-models/LODR.rst.txt
@@ -0,0 +1,184 @@
+.. _LODR:
+
+LODR for RNN Transducer
+=======================
+
+
+As a type of E2E model, a neural transducer is usually considered to have an internal
+language model (ILM), which learns language-level information from the training corpus.
+In real-life scenarios, there is often a mismatch between the training corpus and the target corpus.
+This mismatch can be a problem when decoding a neural transducer with an external language model (LM),
+as the internal LM can act "against" the external LM. In this tutorial, we show how to use
+`Low-order Density Ratio <https://arxiv.org/abs/2203.16776>`_ (LODR) to alleviate this effect and further improve the performance
+of language model integration.
+
+.. note::
+
+    This tutorial is based on the recipe 
+    `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
+    which is a streaming transducer model trained on `LibriSpeech`_. 
+    However, you can easily apply LODR to other recipes.
+    If you encounter any problems, please open an issue at `icefall <https://github.com/k2-fsa/icefall/issues>`__.
+
+
+.. note::
+
+    For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However,
+    you can change the testing set to any other domain (e.g., `GigaSpeech`_) and prepare the language models
+    using that corpus.
+
+First, let's have a look at some background information. As the predecessor of LODR, Density Ratio (DR) was first proposed `here <https://arxiv.org/abs/2002.11268>`_
+to address the language information mismatch between the training
+corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the test domain
+are acoustically similar, DR derives the following formula for decoding using Bayes' theorem:
+
+.. math::
+
+    \text{score}\left(y_u|\mathit{x},y_{1:u-1}\right) =
+    \log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
+    \lambda_1 \log p_{\text{Target LM}}\left(y_u|y_{1:u-1}\right) -
+    \lambda_2 \log p_{\text{Source LM}}\left(y_u|y_{1:u-1}\right)
+
+
+where :math:`\lambda_1` and :math:`\lambda_2` are the weights of the LM scores for the target domain and the source domain, respectively.
+Here, the source domain LM is trained on the training corpus. Compared to
+shallow fusion (see :ref:`shallow_fusion`), the only difference in the above formula is the subtraction of the source domain LM score.
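+
+The DR formula can be sketched in a few lines of Python. This is a minimal illustration, not the ``icefall``
+implementation: the function ``dr_score`` and the toy probabilities are hypothetical. The two candidate tokens
+receive identical scores from the E2E model and the target LM and differ only in how strongly the source LM favours them.
+
+.. code-block:: python
+
+    import math
+
+
+    def dr_score(log_p_e2e, log_p_target, log_p_source, lambda_1, lambda_2):
+        """Density Ratio score of one candidate token y_u (hypothetical helper).
+
+        log_p_e2e:    log p(y_u | x, y_{1:u-1}) from the E2E model
+        log_p_target: log p_TargetLM(y_u | y_{1:u-1})
+        log_p_source: log p_SourceLM(y_u | y_{1:u-1})
+        """
+        return log_p_e2e + lambda_1 * log_p_target - lambda_2 * log_p_source
+
+
+    # Toy example: both tokens look equally good to the E2E model and the
+    # target LM, but the source LM strongly favours token "a".
+    common = dict(log_p_e2e=math.log(0.4), log_p_target=math.log(0.2),
+                  lambda_1=0.3, lambda_2=0.2)
+    score_a = dr_score(log_p_source=math.log(0.5), **common)
+    score_b = dr_score(log_p_source=math.log(0.05), **common)
+
+    # Shallow fusion (lambda_2 = 0) would tie the two tokens; DR prefers "b".
+    print(score_b > score_a)  # True
+
+Because log-probabilities are never positive, the subtracted term never lowers a score in absolute terms.
+What matters is the relative ranking: tokens that the source domain strongly favours gain less than the others.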
+
+Some works treat the predictor and the joiner of the neural transducer as its internal LM (ILM). However, this ILM is
+considered to be weak and can only capture low-level language information. Therefore, `LODR <https://arxiv.org/abs/2203.16776>`__ proposes to use
+a low-order n-gram LM as an approximation of the ILM of the neural transducer. This leads to the following formula
+for decoding a transducer model:
+
+.. math::
+
+    \text{score}\left(y_u|\mathit{x},y_{1:u-1}\right) =
+    \log p_{\text{RNNT}}\left(y_u|\mathit{x},y_{1:u-1}\right) +
+    \lambda_1 \log p_{\text{Target LM}}\left(y_u|y_{1:u-1}\right) -
+    \lambda_2 \log p_{\text{bi-gram}}\left(y_u|y_{u-1}\right)
+
+In LODR, an additional bi-gram LM estimated on the source domain (e.g., the training corpus) is required. Compared to DR,
+the only difference lies in the choice of the source domain LM. According to the original `paper <https://arxiv.org/abs/2203.16776>`_,
+LODR achieves performance similar to DR in both intra-domain and cross-domain settings.
+As a bi-gram is much cheaper to evaluate than a full source domain LM, LODR is usually much faster.
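+
+To see why a bi-gram is cheap, the sketch below estimates a toy add-k smoothed bi-gram over BPE token IDs.
+Evaluating :math:`\log p_{\text{bi-gram}}(y_u|y_{u-1})` is then a single table lookup per token. This is only an
+illustration under stated assumptions: the helper ``train_bigram``, the add-k smoothing and the toy corpus are
+hypothetical, whereas the bi-gram used later in this tutorial is a pre-built model downloaded from Hugging Face.
+
+.. code-block:: python
+
+    import math
+    from collections import Counter
+
+
+    def train_bigram(sentences, vocab_size, k=1.0):
+        """Estimate a hypothetical add-k smoothed bi-gram over token IDs 0..vocab_size-1.
+
+        Returns log_prob(cur, prev) = log p(cur | prev); prev may also be "<s>".
+        """
+        history_count = Counter()  # c(prev): how often prev is followed by a token
+        pair_count = Counter()     # c(prev, cur)
+        for tokens in sentences:
+            seq = ["<s>"] + list(tokens)
+            for prev, cur in zip(seq[:-1], seq[1:]):
+                history_count[prev] += 1
+                pair_count[(prev, cur)] += 1
+
+        def log_prob(cur, prev):
+            # (c(prev, cur) + k) / (c(prev) + k * V): two dictionary lookups, no network.
+            return math.log((pair_count[(prev, cur)] + k)
+                            / (history_count[prev] + k * vocab_size))
+
+        return log_prob
+
+
+    # Toy "source domain" corpus of BPE token IDs (hypothetical).
+    corpus = [[1, 2, 3], [1, 2, 2], [3, 1]]
+    log_p_bigram = train_bigram(corpus, vocab_size=4)
+    print(round(math.exp(log_p_bigram(2, 1)), 3))  # (2 + 1) / (2 + 4) = 0.5
+
+In the LODR formula, this log-probability is weighted by :math:`\lambda_2` and subtracted from the transducer score.
+A real system would estimate the n-gram with a dedicated toolkit and better smoothing than add-k.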
+
+Now, we will show you how to use LODR in ``icefall``.
+For illustration purposes, we will use a pre-trained ASR model from this `link <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`_.
+If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`.
+The testing scenario here is intra-domain (we decode the model trained on `LibriSpeech`_ on the `LibriSpeech`_ test sets).
+
+As the initial step, let's download the pre-trained model.
+
+.. code-block:: bash
+
+    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
+    $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+    $ git lfs pull --include "pretrained.pt"
+    $ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded
+
+To test the model, let's have a look at the decoding results **without** using an LM. This can be done via the following command:
+
+.. code-block:: bash
+
+    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
+    $ ./pruned_transducer_stateless7_streaming/decode.py \
+        --epoch 99 \
+        --avg 1 \
+        --use-averaged-model False \
+        --exp-dir $exp_dir \
+        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
+        --max-duration 600 \
+        --decode-chunk-len 32 \
+        --decoding-method modified_beam_search
+
+The following WERs are achieved on test-clean and test-other:
+
+.. code-block:: text
+
+    $ For test-clean, WER of different settings are:
+    $ beam_size_4	3.11	best for test-clean
+    $ For test-other, WER of different settings are:
+    $ beam_size_4	7.93	best for test-other
+
+Then, we download the external language model and the bi-gram LM that are necessary for LODR.
+Note that the bi-gram is estimated on the text of the 960 hours of LibriSpeech training data.
+
+.. code-block:: bash
+
+    $ # download the external LM
+    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm 
+    $ # create a symbolic link so that the checkpoint can be loaded
+    $ pushd icefall-librispeech-rnn-lm/exp
+    $ git lfs pull --include "pretrained.pt"
+    $ ln -s pretrained.pt epoch-99.pt 
+    $ popd
+    $
+    $ # download the bi-gram
+    $ git lfs install
+    $ git clone https://huggingface.co/marcoyang/librispeech_bigram
+    $ pushd data/lang_bpe_500
+    $ ln -s ../../librispeech_bigram/2gram.fst.txt .
+    $ popd
+
+Then, we perform LODR decoding by setting ``--decoding-method`` to ``modified_beam_search_lm_LODR``:
+
+.. code-block:: bash
+    
+    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+    $ lm_dir=./icefall-librispeech-rnn-lm/exp
+    $ lm_scale=0.42
+    $ LODR_scale=-0.24
+    $ ./pruned_transducer_stateless7_streaming/decode.py \
+        --epoch 99 \
+        --avg 1 \
+        --use-averaged-model False \
+        --beam-size 4 \
+        --exp-dir $exp_dir \
+        --max-duration 600 \
+        --decode-chunk-len 32 \
+        --decoding-method modified_beam_search_lm_LODR \
+        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
+        --use-shallow-fusion 1 \
+        --lm-type rnn \
+        --lm-exp-dir $lm_dir \
+        --lm-epoch 99 \
+        --lm-scale $lm_scale \
+        --lm-avg 1 \
+        --rnn-lm-embedding-dim 2048 \
+        --rnn-lm-hidden-dim 2048 \
+        --rnn-lm-num-layers 3 \
+        --lm-vocab-size 500 \
+        --tokens-ngram 2 \
+        --ngram-lm-scale $LODR_scale
+
+Two extra arguments need to be given when doing LODR. ``--tokens-ngram`` specifies the order of the n-gram. As we
+are using a bi-gram, we set it to 2. ``--ngram-lm-scale`` is the scale of the bi-gram score. It should be a negative number,
+as the bi-gram score is subtracted during decoding.
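+
+The sign convention can be checked with a small sketch. It assumes, without reproducing the actual code path
+inside ``decode.py``, that the scaled scores are simply added as in the LODR formula above. The helper names are
+hypothetical. Adding the bi-gram score with a negative ``--ngram-lm-scale`` is the same as subtracting it with
+:math:`\lambda_2 = 0.24`.
+
+.. code-block:: python
+
+    import math
+
+    lm_scale = 0.42     # value of --lm-scale in the command above (lambda_1)
+    lodr_scale = -0.24  # value of --ngram-lm-scale in the command above (-lambda_2)
+
+
+    def score_with_flags(log_p_rnnt, log_p_lm, log_p_bigram):
+        # Hypothetical: every term is *added*, using the scales as given on the command line.
+        return log_p_rnnt + lm_scale * log_p_lm + lodr_scale * log_p_bigram
+
+
+    def score_from_formula(log_p_rnnt, log_p_lm, log_p_bigram, lambda_1, lambda_2):
+        # The LODR formula, with the bi-gram term explicitly subtracted.
+        return log_p_rnnt + lambda_1 * log_p_lm - lambda_2 * log_p_bigram
+
+
+    args = (math.log(0.7), math.log(0.3), math.log(0.4))
+    print(math.isclose(score_with_flags(*args),
+                       score_from_formula(*args, lambda_1=0.42, lambda_2=0.24)))  # True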
+
+The decoding results obtained with the above command are shown below:
+
+.. code-block:: text
+
+    $ For test-clean, WER of different settings are:
+    $ beam_size_4	2.61	best for test-clean
+    $ For test-other, WER of different settings are:
+    $ beam_size_4	6.74	best for test-other
+
+Recall that the lowest WERs we obtained with :ref:`shallow_fusion` using a beam size of 4 are ``2.77/7.08``, so LODR
+indeed **further improves** the WER. We can do even better if we increase ``--beam-size``:
+
+.. list-table:: WER of LODR with different beam sizes
+   :widths: 25 25 50
+   :header-rows: 1
+
+   * - Beam size
+     - test-clean
+     - test-other
+   * - 4
+     - 2.61
+     - 6.74
+   * - 8
+     - 2.45
+     - 6.38
+   * - 12
+     - 2.4
+     - 6.23
\ No newline at end of file
diff --git a/_sources/decoding-with-langugage-models/index.rst.txt b/_sources/decoding-with-langugage-models/index.rst.txt
new file mode 100644
index 000000000..577ebbdfb
--- /dev/null
+++ b/_sources/decoding-with-langugage-models/index.rst.txt
@@ -0,0 +1,12 @@
+Decoding with language models
+=============================
+
+This section describes how to use external language models
+during decoding to improve the WER of transducer models.
+
+.. toctree::
+   :maxdepth: 2
+
+   shallow-fusion
+   LODR
+   rescoring
diff --git a/_sources/decoding-with-langugage-models/rescoring.rst.txt b/_sources/decoding-with-langugage-models/rescoring.rst.txt
new file mode 100644
index 000000000..d71acc1e5
--- /dev/null
+++ b/_sources/decoding-with-langugage-models/rescoring.rst.txt
@@ -0,0 +1,252 @@
+.. _rescoring:
+
+LM rescoring for Transducer
+=================================
+
+LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based
+methods (see :ref:`shallow_fusion`, :ref:`LODR`), which apply the external LM at every step of beam search, rescoring re-ranks the n-best hypotheses after beam search has finished.
+Rescoring is usually more efficient than shallow fusion, since the external LM only needs to score the n final hypotheses instead of every partial hypothesis expanded during the search.
+In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
+`icefall <https://github.com/k2-fsa/icefall>`__.
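+
+Conceptually, rescoring adds a weighted LM score to the score of each finished hypothesis and keeps the best one.
+The following minimal sketch illustrates this with hand-made scores; the ``Hypothesis`` class, the ``rescore``
+function and all numbers are hypothetical and not part of the ``icefall`` API.
+
+.. code-block:: python
+
+    from dataclasses import dataclass
+    from typing import List
+
+
+    @dataclass
+    class Hypothesis:
+        text: str
+        am_score: float  # total log-prob from the transducer beam search
+        lm_score: float  # total log-prob from the external LM, computed once per hypothesis
+
+
+    def rescore(nbest: List[Hypothesis], lm_scale: float) -> Hypothesis:
+        """Pick the hypothesis maximising am_score + lm_scale * lm_score."""
+        return max(nbest, key=lambda h: h.am_score + lm_scale * h.lm_score)
+
+
+    nbest = [  # a toy 2-best list with hypothetical scores
+        Hypothesis("the cats at on the mat", am_score=-5.0, lm_score=-20.0),
+        Hypothesis("the cat sat on the mat", am_score=-5.5, lm_score=-15.0),
+    ]
+    print(rescore(nbest, lm_scale=0.0).text)   # the cats at on the mat
+    print(rescore(nbest, lm_scale=0.43).text)  # the cat sat on the mat
+
+Since the LM only sees the n final hypotheses, its cost does not depend on how many partial hypotheses were
+explored during the search.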
+
+.. note::
+
+    This tutorial is based on the recipe 
+    `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
+    which is a streaming transducer model trained on `LibriSpeech`_. 
+    However, you can easily apply LM rescoring to other recipes.
+    If you encounter any problems, please open an issue `here <https://github.com/k2-fsa/icefall/issues>`_.
+
+.. note::
+
+    For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However, you can change the testing set
+    to any other domain (e.g., `GigaSpeech`_) and use an external LM trained on that domain.
+
+.. HINT::
+
+  We recommend using a GPU for decoding.
+
+For illustration purposes, we will use a pre-trained ASR model from this `link <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`__.
+If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`.
+
+As the initial step, let's download the pre-trained model.
+
+.. code-block:: bash
+
+    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
+    $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+    $ git lfs pull --include "pretrained.pt"
+    $ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded
+
+As usual, we first test the model's performance without external LM. This can be done via the following command:
+
+.. code-block:: bash
+
+    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
+    $ ./pruned_transducer_stateless7_streaming/decode.py \
+        --epoch 99 \
+        --avg 1 \
+        --use-averaged-model False \
+        --exp-dir $exp_dir \
+        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
+        --max-duration 600 \
+        --decode-chunk-len 32 \
+        --decoding-method modified_beam_search
+
+The following WERs are achieved on test-clean and test-other:
+
+.. code-block:: text
+
+    $ For test-clean, WER of different settings are:
+    $ beam_size_4	3.11	best for test-clean
+    $ For test-other, WER of different settings are:
+    $ beam_size_4	7.93	best for test-other
+
+Now, we will try to improve the above WER numbers via external LM rescoring. We will download 
+a pre-trained LM from this `link <https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm>`__.
+
+.. note::
+
+    This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora.
+    You may also train an RNN LM from scratch. Please refer to this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py>`__
+    for training an RNN LM and this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py>`__ for training a transformer LM.
+
+.. code-block:: bash
+
+    $ # download the external LM
+    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm 
+    $ # create a symbolic link so that the checkpoint can be loaded
+    $ pushd icefall-librispeech-rnn-lm/exp
+    $ git lfs pull --include "pretrained.pt"
+    $ ln -s pretrained.pt epoch-99.pt 
+    $ popd
+
+
+With the RNN LM available, we can rescore the n-best hypotheses generated by ``modified_beam_search``. Here,
+``n`` is the number of beams, i.e., ``--beam-size``. The command for LM rescoring is
+as follows. Note that ``--decoding-method`` is set to ``modified_beam_search_lm_rescore`` and ``--use-shallow-fusion``
+is set to ``0`` (``False``).
+
+.. code-block:: bash
+    
+    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+    $ lm_dir=./icefall-librispeech-rnn-lm/exp
+    $ lm_scale=0.43
+    $ ./pruned_transducer_stateless7_streaming/decode.py \
+        --epoch 99 \
+        --avg 1 \
+        --use-averaged-model False \
+        --beam-size 4 \
+        --exp-dir $exp_dir \
+        --max-duration 600 \
+        --decode-chunk-len 32 \
+        --decoding-method modified_beam_search_lm_rescore \
+        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
+        --use-shallow-fusion 0 \
+        --lm-type rnn \
+        --lm-exp-dir $lm_dir \
+        --lm-epoch 99 \
+        --lm-scale $lm_scale \
+        --lm-avg 1 \
+        --rnn-lm-embedding-dim 2048 \
+        --rnn-lm-hidden-dim 2048 \
+        --rnn-lm-num-layers 3 \
+        --lm-vocab-size 500
+
+.. code-block:: text
+
+    $ For test-clean, WER of different settings are:
+    $ beam_size_4	2.93	best for test-clean
+    $ For test-other, WER of different settings are:
+    $ beam_size_4	7.6	best for test-other
+
+Great! We made some improvements! Increasing the size of the n-best list (i.e., ``--beam-size``) further improves the WER,
+as shown in the following table:
+
+.. list-table:: WERs of LM rescoring with different beam sizes
+   :widths: 25 25 25
+   :header-rows: 1
+
+   * - Beam size
+     - test-clean
+     - test-other
+   * - 4
+     - 2.93
+     - 7.6
+   * - 8
+     - 2.67
+     - 7.11
+   * - 12
+     - 2.59
+     - 6.86
+
+In fact, we can also apply LODR (see :ref:`LODR`) when doing LM rescoring. To do so, we need to 
+download the bi-gram required by LODR:
+
+.. code-block:: bash
+
+    $ # download the bi-gram
+    $ git lfs install
+    $ git clone https://huggingface.co/marcoyang/librispeech_bigram
+    $ pushd data/lang_bpe_500
+    $ ln -s ../../librispeech_bigram/2gram.arpa .
+    $ popd
+
+Then we can perform LM rescoring + LODR by changing the decoding method to ``modified_beam_search_lm_rescore_LODR``.
+
+.. note:: 
+
+    This decoding method requires `kenlm <https://github.com/kpu/kenlm>`_. You can install it
+    via this command: ``pip install https://github.com/kpu/kenlm/archive/master.zip``.
+
+.. code-block:: bash
+    
+    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+    $ lm_dir=./icefall-librispeech-rnn-lm/exp
+    $ lm_scale=0.43
+    $ ./pruned_transducer_stateless7_streaming/decode.py \
+        --epoch 99 \
+        --avg 1 \
+        --use-averaged-model False \
+        --beam-size 4 \
+        --exp-dir $exp_dir \
+        --max-duration 600 \
+        --decode-chunk-len 32 \
+        --decoding-method modified_beam_search_lm_rescore_LODR \
+        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
+        --use-shallow-fusion 0 \
+        --lm-type rnn \
+        --lm-exp-dir $lm_dir \
+        --lm-epoch 99 \
+        --lm-scale $lm_scale \
+        --lm-avg 1 \
+        --rnn-lm-embedding-dim 2048 \
+        --rnn-lm-hidden-dim 2048 \
+        --rnn-lm-num-layers 3 \
+        --lm-vocab-size 500
+
+You should see the following WERs after executing the commands above:
+
+.. code-block:: text
+
+    $ For test-clean, WER of different settings are:
+    $ beam_size_4	2.9	best for test-clean
+    $ For test-other, WER of different settings are:
+    $ beam_size_4	7.57	best for test-other
+
+This is slightly better than LM rescoring alone. If we further increase the beam size, we will see
+further improvements from LM rescoring + LODR:
+
+.. list-table:: WERs of LM rescoring + LODR with different beam sizes
+   :widths: 25 25 25
+   :header-rows: 1
+
+   * - Beam size
+     - test-clean
+     - test-other
+   * - 4
+     - 2.9
+     - 7.57
+   * - 8
+     - 2.63
+     - 7.04
+   * - 12
+     - 2.52
+     - 6.73
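+
+LM rescoring + LODR adds one more sequence-level term: the bi-gram score of each hypothesis, weighted and subtracted.
+The sketch below assumes the bi-gram scores come from an ARPA toolkit such as ``kenlm``, whose ``Model.score``
+returns log10 probabilities, and converts them to natural log before combining them with the other scores. Whether
+and how ``decode.py`` performs this conversion is not shown here; the function name and all numbers are hypothetical.
+
+.. code-block:: python
+
+    import math
+    from typing import Sequence
+
+
+    def rescore_lodr(am_scores: Sequence[float], lm_scores: Sequence[float],
+                     bigram_scores_log10: Sequence[float],
+                     lm_scale: float, lodr_scale: float) -> int:
+        """Return the index of the best hypothesis under LM rescoring + LODR.
+
+        lodr_scale plays the role of lambda_2 (a positive weight on a subtracted term).
+        """
+        best_idx, best_total = -1, -math.inf
+        for i, (am, lm, bg10) in enumerate(zip(am_scores, lm_scores, bigram_scores_log10)):
+            bigram = bg10 * math.log(10.0)  # log10 -> natural log, same base as am/lm
+            total = am + lm_scale * lm - lodr_scale * bigram
+            if total > best_total:
+                best_idx, best_total = i, total
+        return best_idx
+
+
+    am = [-5.0, -5.2]            # transducer scores of a toy 2-best list
+    lm = [-12.0, -12.0]          # external LM scores (tied on purpose)
+    bigram_log10 = [-2.0, -4.0]  # hypothesis 0 is far more "typical" for the source bi-gram
+
+    print(rescore_lodr(am, lm, bigram_log10, lm_scale=0.43, lodr_scale=0.0))  # 0
+    print(rescore_lodr(am, lm, bigram_log10, lm_scale=0.43, lodr_scale=0.2))  # 1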
+
+As mentioned earlier, LM rescoring is usually faster than shallow-fusion-based methods.
+Here, we benchmark the WERs and decoding time of all these methods:
+
+.. list-table:: LM-rescoring-based methods vs shallow-fusion-based methods (each cell shows the WER on test-clean / WER on test-other; decoding time on test-clean)
+   :widths: 25 25 25 25
+   :header-rows: 1
+
+   * - Decoding method
+     - beam=4
+     - beam=8
+     - beam=12
+   * - `modified_beam_search`
+     - 3.11/7.93; 132s
+     - 3.1/7.95; 177s
+     - 3.1/7.96; 210s
+   * - `modified_beam_search_lm_shallow_fusion`
+     - 2.77/7.08; 262s
+     - 2.62/6.65; 352s
+     - 2.58/6.65; 488s
+   * - LODR
+     - 2.61/6.74; 400s
+     - 2.45/6.38; 610s
+     - 2.4/6.23; 870s
+   * - `modified_beam_search_lm_rescore`
+     - 2.93/7.6; 156s
+     - 2.67/7.11; 203s
+     - 2.59/6.86; 255s
+   * - `modified_beam_search_lm_rescore_LODR`
+     - 2.9/7.57; 160s
+     - 2.63/7.04; 203s
+     - 2.52/6.73; 263s
+
+.. note::
+
+    Decoding was performed on a single 32 GB V100 GPU with ``--max-duration`` set to 600.
+    The decoding times here are for reference only and may vary.
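+
+To put the timing column in perspective, the following snippet computes each method's decoding time relative to
+plain ``modified_beam_search`` at beam size 4, using only the numbers from the table above (so the same hardware
+caveat applies). The variable names are illustrative.
+
+.. code-block:: python
+
+    # Decoding time on test-clean in seconds at beam=4, copied from the table above.
+    times = {
+        "modified_beam_search": 132,
+        "modified_beam_search_lm_shallow_fusion": 262,
+        "LODR": 400,
+        "modified_beam_search_lm_rescore": 156,
+        "modified_beam_search_lm_rescore_LODR": 160,
+    }
+
+    baseline = times["modified_beam_search"]
+    slowdown = {name: t / baseline for name, t in times.items()}
+
+    for name, ratio in slowdown.items():
+        print(f"{name:<40} {ratio:.2f}x")
+
+At beam size 4, both rescoring variants add roughly 20% on top of plain beam search, whereas shallow fusion
+roughly doubles and LODR roughly triples the decoding time.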
\ No newline at end of file
diff --git a/_sources/decoding-with-langugage-models/shallow-fusion.rst.txt b/_sources/decoding-with-langugage-models/shallow-fusion.rst.txt
new file mode 100644
index 000000000..0d2837372
--- /dev/null
+++ b/_sources/decoding-with-langugage-models/shallow-fusion.rst.txt
@@ -0,0 +1,176 @@
+.. _shallow_fusion:
+
+Shallow fusion for Transducer
+=================================
+
+External language models (LM) are commonly used to improve WERs for E2E ASR models.
+This tutorial shows you how to perform ``shallow fusion`` with an external LM
+to improve the word-error-rate of a transducer model.
+
+.. note::
+
+    This tutorial is based on the recipe 
+    `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
+    which is a streaming transducer model trained on `LibriSpeech`_. 
+    However, you can easily apply shallow fusion to other recipes.
+    If you encounter any problems, please open an issue at `icefall <https://github.com/k2-fsa/icefall/issues>`__.
+
+.. note::
+
+    For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However, you can change the testing set
+    to any other domain (e.g., `GigaSpeech`_) and use an external LM trained on that domain.
+
+.. HINT::
+
+  We recommend using a GPU for decoding.
+
+For illustration purposes, we will use a pre-trained ASR model from this `link <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`__.
+If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`.
+
+As the initial step, let's download the pre-trained model.
+
+.. code-block:: bash
+
+    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
+    $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+    $ git lfs pull --include "pretrained.pt"
+    $ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded
+
+To test the model, let's have a look at the decoding results without using LM. This can be done via the following command:
+
+.. code-block:: bash
+
+    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
+    $ ./pruned_transducer_stateless7_streaming/decode.py \
+        --epoch 99 \
+        --avg 1 \
+        --use-averaged-model False \
+        --exp-dir $exp_dir \
+        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
+        --max-duration 600 \
+        --decode-chunk-len 32 \
+        --decoding-method modified_beam_search
+
+The following WERs are achieved on test-clean and test-other:
+
+.. code-block:: text
+
+    $ For test-clean, WER of different settings are:
+    $ beam_size_4	3.11	best for test-clean
+    $ For test-other, WER of different settings are:
+    $ beam_size_4	7.93	best for test-other
+
+These are already good numbers, but we can further improve them by using shallow fusion with an external LM.
+Since training a language model usually takes a long time, we download a pre-trained LM from this `link <https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm>`__.
+
+.. code-block:: bash
+
+    $ # download the external LM
+    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm 
+    $ # create a symbolic link so that the checkpoint can be loaded
+    $ pushd icefall-librispeech-rnn-lm/exp
+    $ git lfs pull --include "pretrained.pt"
+    $ ln -s pretrained.pt epoch-99.pt 
+    $ popd
+
+.. note::
+
+    This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora.
+    You may also train an RNN LM from scratch. Please refer to this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py>`__
+    for training an RNN LM and this `script <https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py>`__ for training a transformer LM.
+
+To use shallow fusion for decoding, we can execute the following command:
+
+.. code-block:: bash
+    
+    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+    $ lm_dir=./icefall-librispeech-rnn-lm/exp
+    $ lm_scale=0.29
+    $ ./pruned_transducer_stateless7_streaming/decode.py \
+        --epoch 99 \
+        --avg 1 \
+        --use-averaged-model False \
+        --beam-size 4 \
+        --exp-dir $exp_dir \
+        --max-duration 600 \
+        --decode-chunk-len 32 \
+        --decoding-method modified_beam_search_lm_shallow_fusion \
+        --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
+        --use-shallow-fusion 1 \
+        --lm-type rnn \
+        --lm-exp-dir $lm_dir \
+        --lm-epoch 99 \
+        --lm-scale $lm_scale \
+        --lm-avg 1 \
+        --rnn-lm-embedding-dim 2048 \
+        --rnn-lm-hidden-dim 2048 \
+        --rnn-lm-num-layers 3 \
+        --lm-vocab-size 500
+
+Note that we set ``--decoding-method modified_beam_search_lm_shallow_fusion`` and ``--use-shallow-fusion 1``
+to use shallow fusion. ``--lm-type`` specifies the type of neural LM we are going to use; you can choose
+either ``rnn`` or ``transformer``. The following three arguments describe the architecture of the RNN LM and should match the downloaded checkpoint:
+
+- ``--rnn-lm-embedding-dim``
+    The embedding dimension of the RNN LM
+
+- ``--rnn-lm-hidden-dim``
+    The hidden dimension of the RNN LM
+
+- ``--rnn-lm-num-layers``
+    The number of RNN layers in the RNN LM.
+
+
+The decoding results obtained with the above command are shown below.
+
+.. code-block:: text
+
+    $ For test-clean, WER of different settings are:
+    $ beam_size_4	2.77	best for test-clean
+    $ For test-other, WER of different settings are:
+    $ beam_size_4	7.08	best for test-other
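+
+The relative WER reduction is the absolute reduction divided by the baseline WER. A small helper (hypothetical,
+not part of ``icefall``) computes it from the WERs without LM and with shallow fusion at beam size 4:
+
+.. code-block:: python
+
+    def relative_wer_reduction(baseline: float, improved: float) -> float:
+        """Relative WER reduction in percent."""
+        return 100.0 * (baseline - improved) / baseline
+
+
+    # Beam size 4: modified_beam_search (3.11 / 7.93) vs. shallow fusion (2.77 / 7.08)
+    print(f"test-clean: {relative_wer_reduction(3.11, 2.77):.1f}%")  # test-clean: 10.9%
+    print(f"test-other: {relative_wer_reduction(7.93, 7.08):.1f}%")  # test-other: 10.7%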
+
+The improvement from shallow fusion is very obvious! The relative WER reduction on test-other is around 10.7%.
+A few parameters can be tuned to further boost the performance of shallow fusion:
+
+- ``--lm-scale`` 
+
+    Controls the scale of the LM. If it is too small, the external language model may not be fully utilized; if it is too large,
+    the LM score may dominate during decoding, leading to a worse WER. A typical value is around 0.3.
+
+- ``--beam-size`` 
+    
+    The number of active paths in the search beam. It controls the trade-off between decoding efficiency and accuracy.
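+
+The effect of ``--lm-scale`` can be illustrated on a single search step. In this toy sketch (hypothetical tokens and
+probabilities, not taken from the model), each candidate is scored as
+:math:`\log p_{\text{RNNT}} + s \log p_{\text{LM}}`, where :math:`s` is the LM scale. The acoustics slightly prefer
+``red``, the LM strongly prefers ``reed``, and ``reed`` has very weak acoustic support.
+
+.. code-block:: python
+
+    import math
+
+
+    def fused_scores(log_p_rnnt, log_p_lm, lm_scale):
+        """Shallow-fusion score of every candidate token at one search step."""
+        return {tok: log_p_rnnt[tok] + lm_scale * log_p_lm[tok] for tok in log_p_rnnt}
+
+
+    # Hypothetical per-token probabilities at one step.
+    log_p_rnnt = {"red": math.log(0.50), "read": math.log(0.45), "reed": math.log(0.05)}
+    log_p_lm = {"red": math.log(0.04), "read": math.log(0.16), "reed": math.log(0.80)}
+
+    for lm_scale in (0.0, 0.3, 3.0):
+        scores = fused_scores(log_p_rnnt, log_p_lm, lm_scale)
+        print(lm_scale, max(scores, key=scores.get))
+    # 0.0 red   (no LM: acoustics alone)
+    # 0.3 read  (the LM helps pick the contextually better word)
+    # 3.0 reed  (the LM dominates and overrides the acoustic evidence)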
+
+Here, we also show how ``--beam-size`` affects the WER and decoding time:
+
+.. list-table:: WERs and decoding time (on test-clean) of shallow fusion with different beam sizes
+   :widths: 25 25 25 25
+   :header-rows: 1
+
+   * - Beam size
+     - test-clean
+     - test-other
+     - Decoding time on test-clean (s)
+   * - 4
+     - 2.77
+     - 7.08
+     - 262
+   * - 8
+     - 2.62
+     - 6.65
+     - 352
+   * - 12
+     - 2.58
+     - 6.65
+     - 488
+
+As we can see, a larger beam size during shallow fusion improves the WER, but also makes decoding slower.
+
+
+
+
+
+
+
+ 
diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt
index 8d76eb68b..a7d365a15 100644
--- a/_sources/index.rst.txt
+++ b/_sources/index.rst.txt
@@ -34,3 +34,8 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
 
    contributing/index
    huggingface/index
+
+.. toctree::
+   :maxdepth: 2
+   
+   decoding-with-langugage-models/index
\ No newline at end of file
diff --git a/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt b/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt
index ea9f350cd..2e8d0893a 100644
--- a/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt
+++ b/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt
@@ -1,7 +1,7 @@
 Distillation with HuBERT
 ========================
 
-This tutorial shows you how to perform knowledge distillation in `icefall`_
+This tutorial shows you how to perform knowledge distillation in `icefall <https://github.com/k2-fsa/icefall>`_
 with the `LibriSpeech`_ dataset. The distillation method
 used here is called "Multi Vector Quantization Knowledge Distillation" (MVQ-KD).
 Please have a look at our paper `Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation <https://arxiv.org/abs/2211.00508>`_
@@ -13,7 +13,7 @@ for more details about MVQ-KD.
     `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_.
     Currently, we only implement MVQ-KD in this recipe. However, MVQ-KD is theoretically applicable to all recipes
     with only minor changes needed. Feel free to try out MVQ-KD in different recipes. If you
-    encounter any problems, please open an issue here `icefall <https://github.com/k2-fsa/icefall/issues>`_.
+    encounter any problems, please open an issue here `icefall <https://github.com/k2-fsa/icefall/issues>`__.
 
 .. note::
 
@@ -217,7 +217,7 @@ the following command.
     --exp-dir $exp_dir \
     --enable-distillation True
 
-You should get similar results as `here <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md#distillation-with-hubert>`_.
+You should get similar results as `here <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS-100hours.md#distillation-with-hubert>`__.
 
 That's all! Feel free to experiment with your own setups and report your results.
-If you encounter any problems during training, please open up an issue `here <https://github.com/k2-fsa/icefall/issues>`_.
+If you encounter any problems during training, please open up an issue `here <https://github.com/k2-fsa/icefall/issues>`__.
diff --git a/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt b/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt
index 42fd3df77..1bc1dd984 100644
--- a/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt
+++ b/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt
@@ -8,10 +8,10 @@ with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
 
 .. Note::
 
-   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`_,
-   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`_,
-   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_,
-   `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`_,
+   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`__,
+   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`__,
+   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`__, and
+   `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`__.
    We will take pruned_transducer_stateless4 as an example in this tutorial.
 
 .. HINT::
@@ -237,7 +237,7 @@ them, please modify ``./pruned_transducer_stateless4/train.py`` directly.
 
 .. NOTE::
 
-  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`_ are a little different from
+  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`__ are a little different from
   other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, ``--joiner-dim`` from commandline, so that you can train models with different size with pruned_transducer_stateless5.
 
 
@@ -529,13 +529,13 @@ Download pretrained models
 If you don't want to train from scratch, you can download the pretrained models
 by visiting the following links:
 
-  - `pruned_transducer_stateless <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12>`_
+  - `pruned_transducer_stateless <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12>`__
 
-  - `pruned_transducer_stateless2 <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29>`_
+  - `pruned_transducer_stateless2 <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29>`__
 
-  - `pruned_transducer_stateless4 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless4-2022-06-03>`_
+  - `pruned_transducer_stateless4 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless4-2022-06-03>`__
 
-  - `pruned_transducer_stateless5 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-2022-07-07>`_
+  - `pruned_transducer_stateless5 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-2022-07-07>`__
 
   See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
   for the details of the above pretrained models
diff --git a/_sources/recipes/Streaming-ASR/introduction.rst.txt b/_sources/recipes/Streaming-ASR/introduction.rst.txt
index e1382e77d..ac77a51d1 100644
--- a/_sources/recipes/Streaming-ASR/introduction.rst.txt
+++ b/_sources/recipes/Streaming-ASR/introduction.rst.txt
@@ -45,9 +45,9 @@ the input features.
 
 We have three variants of Emformer models in ``icefall``.
 
- - ``pruned_stateless_emformer_rnnt2`` using Emformer from torchaudio, see `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_stateless_emformer_rnnt2>`_.
+ - ``pruned_stateless_emformer_rnnt2`` using Emformer from torchaudio, see `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_stateless_emformer_rnnt2>`__.
  - ``conv_emformer_transducer_stateless`` using ConvEmformer implemented by ourself. Different from the Emformer in torchaudio,
    ConvEmformer has a convolution in each layer and uses the mechanisms in our reworked conformer model.
-   See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless>`_.
+   See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless>`__.
  - ``conv_emformer_transducer_stateless2`` using ConvEmformer implemented by ourself. The only difference from the above one is that
    it uses a simplified memory bank. See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless2>`_.
diff --git a/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt b/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt
index de7102ba8..2ca70bcf3 100644
--- a/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt
+++ b/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt
@@ -6,10 +6,10 @@ with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
 
 .. Note::
 
-   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`_,
-   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`_,
-   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_,
-   `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`_,
+   The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`__,
+   `pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`__,
+   `pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`__,
+   and `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`__.
    We will take pruned_transducer_stateless4 as an example in this tutorial.
 
 .. HINT::
@@ -264,7 +264,7 @@ them, please modify ``./pruned_transducer_stateless4/train.py`` directly.
 
 .. NOTE::
 
-  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`_ are a little different from
+  The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`__ are a little different from
   other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, ``--joiner-dim`` from commandline, so that you can train models with different size with pruned_transducer_stateless5.
 
 
diff --git a/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt b/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt
index f0e8961d7..8b75473c6 100644
--- a/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt
+++ b/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt
@@ -6,7 +6,7 @@ with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
 
 .. Note::
 
-   The tutorial is suitable for `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
+   The tutorial is suitable for `pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`__.
 
 .. HINT::
 
@@ -642,7 +642,7 @@ Download pretrained models
 If you don't want to train from scratch, you can download the pretrained models
 by visiting the following links:
 
-  - `pruned_transducer_stateless7_streaming <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`_
+  - `pruned_transducer_stateless7_streaming <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`__
 
   See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
   for the details of the above pretrained models
diff --git a/contributing/code-style.html b/contributing/code-style.html
index d7f643e6a..0e54abcab 100644
--- a/contributing/code-style.html
+++ b/contributing/code-style.html
@@ -59,6 +59,9 @@
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/contributing/doc.html b/contributing/doc.html
index 72b5f4056..c08322d68 100644
--- a/contributing/doc.html
+++ b/contributing/doc.html
@@ -59,6 +59,9 @@
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/contributing/how-to-create-a-recipe.html b/contributing/how-to-create-a-recipe.html
index c8df20ace..b2325ad10 100644
--- a/contributing/how-to-create-a-recipe.html
+++ b/contributing/how-to-create-a-recipe.html
@@ -65,6 +65,9 @@
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/contributing/index.html b/contributing/index.html
index 52819b7ef..1f9b31e3a 100644
--- a/contributing/index.html
+++ b/contributing/index.html
@@ -59,6 +59,9 @@
 </ul>
 </li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/decoding-with-langugage-models/LODR.html b/decoding-with-langugage-models/LODR.html
new file mode 100644
index 000000000..7181d098c
--- /dev/null
+++ b/decoding-with-langugage-models/LODR.html
@@ -0,0 +1,292 @@
+<!DOCTYPE html>
+<html class="writer-html5" lang="en" >
+<head>
+  <meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
+
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>LODR for RNN Transducer &mdash; icefall 0.1 documentation</title>
+      <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
+      <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  <!--[if lt IE 9]>
+    <script src="../_static/js/html5shiv.min.js"></script>
+  <![endif]-->
+  
+        <script src="../_static/jquery.js"></script>
+        <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
+        <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
+        <script src="../_static/doctools.js"></script>
+        <script src="../_static/sphinx_highlight.js"></script>
+        <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
+    <script src="../_static/js/theme.js"></script>
+    <link rel="index" title="Index" href="../genindex.html" />
+    <link rel="search" title="Search" href="../search.html" />
+    <link rel="next" title="LM rescoring for Transducer" href="rescoring.html" />
+    <link rel="prev" title="Shallow fusion for Transducer" href="shallow-fusion.html" /> 
+</head>
+
+<body class="wy-body-for-nav"> 
+  <div class="wy-grid-for-nav">
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search" >
+
+          
+          
+          <a href="../index.html" class="icon icon-home">
+            icefall
+          </a>
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
+              <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul class="current">
+<li class="toctree-l1 current"><a class="reference internal" href="index.html">Decoding with language models</a><ul class="current">
+<li class="toctree-l2"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
+<li class="toctree-l2 current"><a class="current reference internal" href="#">LODR for RNN Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
+</ul>
+</li>
+</ul>
+
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
+          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+          <a href="../index.html">icefall</a>
+      </nav>
+
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          <div role="navigation" aria-label="Page navigation">
+  <ul class="wy-breadcrumbs">
+      <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
+          <li class="breadcrumb-item"><a href="index.html">Decoding with language models</a></li>
+      <li class="breadcrumb-item active">LODR for RNN Transducer</li>
+      <li class="wy-breadcrumbs-aside">
+              <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/LODR.rst" class="fa fa-github"> Edit on GitHub</a>
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+             
+  <section id="lodr-for-rnn-transducer">
+<span id="lodr"></span><h1>LODR for RNN Transducer<a class="headerlink" href="#lodr-for-rnn-transducer" title="Permalink to this heading"></a></h1>
+<p>As a type of end-to-end (E2E) model, a neural transducer is usually considered to have an internal
+language model, which learns language-level information from the training corpus.
+In real-life scenarios, there is often a mismatch between the training corpus and the target-domain corpus.
+This mismatch can be a problem when decoding a neural transducer with an external language model (LM), because its internal
+language model can act “against” the external LM. In this tutorial, we show how to use
+<a class="reference external" href="https://arxiv.org/abs/2203.16776">Low-order Density Ratio</a> (LODR) to alleviate this effect and further improve the performance
+of language model integration.</p>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>This tutorial is based on the recipe
+<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming">pruned_transducer_stateless7_streaming</a>,
+which is a streaming transducer model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>.
+However, you can easily apply LODR to other recipes.
+If you encounter any problems, please open an issue at <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">icefall</a>.</p>
+</div>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>For simplicity, the training and testing corpora in this tutorial are the same (<a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>). However,
+you can change the test set to another domain (e.g., <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a>) and prepare the language models
+using text from that domain.</p>
+</div>
+<p>First, let’s have a look at some background information. As the predecessor of LODR, Density Ratio (DR) was first proposed <a class="reference external" href="https://arxiv.org/abs/2002.11268">here</a>
+to address the mismatch in language information between the training
+corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the target domain
+are acoustically similar, DR applies Bayes’ theorem to derive the following formula for decoding:</p>
+<div class="math notranslate nohighlight">
+\[\text{score}\left(y_u|\mathit{x},y_{1:u-1}\right) =
+\log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
+\lambda_1 \log p_{\text{Target LM}}\left(y_u|y_{1:u-1}\right) -
+\lambda_2 \log p_{\text{Source LM}}\left(y_u|y_{1:u-1}\right)\]</div>
+<p>where <span class="math notranslate nohighlight">\(\lambda_1\)</span> and <span class="math notranslate nohighlight">\(\lambda_2\)</span> are the weights of the target-domain and source-domain LM scores, respectively.
+Here, the source-domain LM is trained on the training corpus. Compared to
+shallow fusion, the only difference in the above formula is the subtraction of the source-domain LM score.</p>
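To make the combination concrete, here is a minimal Python sketch of the per-token DR score. It assumes the three log-probabilities for one candidate token have already been computed by the transducer, the target-domain LM and the source-domain LM. The helper names `density_ratio_score` and `shallow_fusion_score` and all numbers are hypothetical and are not part of icefall; the sketch only shows that shallow fusion is the special case with lambda_2 = 0.

```python
import math


def density_ratio_score(
    log_p_asr: float,
    log_p_target_lm: float,
    log_p_source_lm: float,
    lambda1: float,
    lambda2: float,
) -> float:
    """Per-token Density Ratio score (hypothetical helper, not icefall's API).

    score = log p(y_u | x, y_{1:u-1})
            + lambda1 * log p_target(y_u | y_{1:u-1})
            - lambda2 * log p_source(y_u | y_{1:u-1})
    """
    return log_p_asr + lambda1 * log_p_target_lm - lambda2 * log_p_source_lm


def shallow_fusion_score(log_p_asr: float, log_p_target_lm: float, lambda1: float) -> float:
    # Shallow fusion is DR without the source-domain LM term (lambda2 = 0).
    return density_ratio_score(log_p_asr, log_p_target_lm, 0.0, lambda1, 0.0)


# Made-up probabilities for a single candidate token.
asr, tgt, src = math.log(0.5), math.log(0.2), math.log(0.4)
print(f"DR:             {density_ratio_score(asr, tgt, src, lambda1=0.4, lambda2=0.2):.4f}")
print(f"Shallow fusion: {shallow_fusion_score(asr, tgt, lambda1=0.4):.4f}")
```

Because the source-domain log-probability is negative, subtracting it raises the score of tokens that the source domain finds unlikely, which is how DR counteracts the training-corpus prior.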
+<p>Some works treat the predictor and the joiner of the neural transducer as its internal LM (ILM). However, the ILM is
+considered to be weak and to capture only low-level language information. Therefore, <a class="reference external" href="https://arxiv.org/abs/2203.16776">LODR</a> proposes using
+a low-order n-gram LM as an approximation of the ILM of the neural transducer. This leads to the following formula
+for decoding a transducer model:</p>
+<div class="math notranslate nohighlight">
+\[\text{score}\left(y_u|\mathit{x},y_{1:u-1}\right) =
+\log p_{rnnt}\left(y_u|\mathit{x},y_{1:u-1}\right) +
+\lambda_1 \log p_{\text{Target LM}}\left(y_u|y_{1:u-1}\right) -
+\lambda_2 \log p_{\text{bi-gram}}\left(y_u|y_{u-1}\right)\]</div>
+<p>In LODR, an additional bi-gram LM estimated on the source domain (e.g., the training corpus) is required. Compared to DR,
+the only difference lies in the choice of the source-domain LM. According to the original <a class="reference external" href="https://arxiv.org/abs/2203.16776">paper</a>,
+LODR achieves performance similar to DR in both intra-domain and cross-domain settings.
+As a bi-gram is much cheaper to evaluate than a full source-domain LM, LODR is usually much faster.</p>
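The sketch below illustrates the LODR idea in plain Python: a toy token bi-gram with add-one smoothing plays the role of the source-domain LM, and its scaled log-probability is subtracted from each token's transducer score after the scaled target-LM score has been added. Everything here is hypothetical: `BigramLM`, `lodr_hyp_score`, the toy corpus and the per-token scores are made up for illustration. icefall itself loads its bi-gram from `2gram.fst.txt` rather than estimating one in Python.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramLM:
    """Toy add-k smoothed token bi-gram LM (hypothetical, for illustration)."""

    def __init__(self, corpus: Sequence[Sequence[str]], k: float = 1.0) -> None:
        self.k = k
        self.bigram_counts: Counter = Counter()
        self.history_counts: Counter = Counter()
        vocab = {"</s>"}  # tokens that can be predicted (<s> is never predicted)
        for sentence in corpus:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            vocab.update(sentence)
            for prev, cur in zip(tokens[:-1], tokens[1:]):
                self.bigram_counts[(prev, cur)] += 1
                self.history_counts[prev] += 1
        self.vocab_size = len(vocab)

    def log_prob(self, prev: str, cur: str) -> float:
        """log P(cur | prev) with add-k smoothing over the predictable vocabulary."""
        numerator = self.bigram_counts[(prev, cur)] + self.k
        denominator = self.history_counts[prev] + self.k * self.vocab_size
        return math.log(numerator / denominator)


def lodr_hyp_score(
    tokens: List[str],
    rnnt_logps: List[float],
    lm_logps: List[float],
    bigram: BigramLM,
    lm_scale: float,
    lodr_scale: float,
) -> float:
    """Sum of per-token LODR scores for one hypothesis.

    lm_scale plays the role of lambda_1 and lodr_scale (positive here) the
    role of lambda_2; the bi-gram term is subtracted, as in the formula above.
    """
    total = 0.0
    prev = "<s>"
    for tok, rnnt_lp, lm_lp in zip(tokens, rnnt_logps, lm_logps):
        total += rnnt_lp + lm_scale * lm_lp - lodr_scale * bigram.log_prob(prev, tok)
        prev = tok
    return total


# Made-up "training corpus" of BPE-like tokens and made-up per-token scores.
corpus = [["a", "b"], ["a", "c"], ["a", "b"]]
bigram = BigramLM(corpus)
score = lodr_hyp_score(
    ["a", "b"], [-0.1, -0.2], [-0.5, -0.6], bigram, lm_scale=0.42, lodr_scale=0.24
)
print(f"LODR score of the hypothesis 'a b': {score:.4f}")
```

A bi-gram lookup is a single table access per token, which is why LODR adds little cost compared with running a second neural LM as in DR.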
+<p>Now, we will show you how to use LODR in <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.
+For illustration purposes, we use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
+If you want to train your model from scratch, please have a look at <a class="reference internal" href="../recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html#non-streaming-librispeech-pruned-transducer-stateless"><span class="std std-ref">Pruned transducer statelessX</span></a>.
+The testing scenario here is intra-domain (we decode a model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> on the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> test sets).</p>
+<p>As the initial step, let’s download the pre-trained model.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
+$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">&quot;pretrained.pt&quot;</span>
+$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
+$<span class="w"> </span><span class="nb">popd</span>
+</pre></div>
+</div>
+<p>To test the model, let’s first look at the decoding results <strong>without</strong> using an LM. This can be done via the following command:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
+$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decoding-method<span class="w"> </span>modified_beam_search
+</pre></div>
+</div>
+<p>The following WERs are achieved on test-clean and test-other:</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>For test-clean, WER of different settings are:
+beam_size_4       3.11    best for test-clean
+For test-other, WER of different settings are:
+beam_size_4       7.93    best for test-other
+</pre></div>
+</div>
+<p>Next, we download the external language model and the bi-gram LM required for LODR.
+Note that the bi-gram is estimated on the transcripts of the 960 hours of LibriSpeech training data.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the external LM</span>
+$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
+$<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
+$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-librispeech-rnn-lm/exp
+$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">&quot;pretrained.pt&quot;</span>
+$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt
+$<span class="w"> </span><span class="nb">popd</span>
+$
+$<span class="w"> </span><span class="c1"># download the bi-gram</span>
+$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
+$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/marcoyang/librispeech_bigram
+$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>data/lang_bpe_500
+$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>../../librispeech_bigram/2gram.fst.txt<span class="w"> </span>.
+$<span class="w"> </span><span class="nb">popd</span>
+</pre></div>
+</div>
+<p>Then, we perform LODR decoding by setting <code class="docutils literal notranslate"><span class="pre">--decoding-method</span></code> to <code class="docutils literal notranslate"><span class="pre">modified_beam_search_lm_LODR</span></code>:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
+$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.42
+$<span class="w"> </span><span class="nv">LODR_scale</span><span class="o">=</span>-0.24
+$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_LODR<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-shallow-fusion<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--tokens-ngram<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--ngram-lm-scale<span class="w"> </span><span class="nv">$LODR_scale</span>
+</pre></div>
+</div>
+<p>Two extra arguments are required for LODR. <code class="docutils literal notranslate"><span class="pre">--tokens-ngram</span></code> specifies the order of the n-gram. As we
+are using a bi-gram, we set it to 2. <code class="docutils literal notranslate"><span class="pre">--ngram-lm-scale</span></code> is the scale of the bi-gram score; it should be negative
+because the bi-gram score is subtracted during decoding.</p>
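The sign convention can be checked with a few lines of Python. This is a sketch under an assumption: as the paragraph above implies, the decoder is taken to *add* `ngram_lm_scale` times the bi-gram log-probability, so a negative scale subtracts it (lambda_1 = `lm_scale`, lambda_2 = `-ngram_lm_scale`). The helper `fused_token_score` and the two candidates' scores are hypothetical; only the two scale values are copied from the command above. With `ngram_lm_scale = 0` (plain shallow fusion) the candidate favoured by the source-domain bi-gram wins, while the negative LODR scale lets the other candidate overtake it.

```python
def fused_token_score(
    log_p_rnnt: float,
    log_p_lm: float,
    log_p_bigram: float,
    lm_scale: float,
    ngram_lm_scale: float,
) -> float:
    # Assumed convention: each LM score is added after scaling, so a negative
    # ngram_lm_scale subtracts the bi-gram log-probability.
    return log_p_rnnt + lm_scale * log_p_lm + ngram_lm_scale * log_p_bigram


lm_scale = 0.42     # same value as $lm_scale in the command above
LODR_scale = -0.24  # same value as $LODR_scale in the command above

# Hypothetical (rnnt, external LM, bi-gram) log-probabilities for two candidates:
# "common" is very likely under the source-domain bi-gram, "rare" is not.
candidates = {
    "common": (-0.5, -1.0, -0.2),
    "rare": (-0.6, -1.0, -2.0),
}

for scale in (0.0, LODR_scale):
    scores = {
        name: fused_token_score(am, lm, bg, lm_scale, scale)
        for name, (am, lm, bg) in candidates.items()
    }
    best = max(scores, key=scores.get)
    print(f"ngram_lm_scale={scale:+.2f}: best candidate = {best}")
```

Passing a positive `--ngram-lm-scale` under this convention would instead reward tokens the internal LM already favours, which is the opposite of what LODR intends.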
+<p>The decoding results obtained with the above command are shown below:</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>For test-clean, WER of different settings are:
+beam_size_4       2.61    best for test-clean
+For test-other, WER of different settings are:
+beam_size_4       6.74    best for test-other
+</pre></div>
+</div>
+<p>Recall that the lowest WER we obtained with <a class="reference internal" href="shallow-fusion.html#shallow-fusion"><span class="std std-ref">Shallow fusion for Transducer</span></a> at a beam size of 4 is <code class="docutils literal notranslate"><span class="pre">2.77/7.08</span></code>, so LODR
+indeed <strong>further improves</strong> the WER. We can do even better by increasing <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code>:</p>
+<table class="docutils align-default" id="id1">
+<caption><span class="caption-number">Table 2 </span><span class="caption-text">WER of LODR with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
+<colgroup>
+<col style="width: 25%" />
+<col style="width: 25%" />
+<col style="width: 50%" />
+</colgroup>
+<thead>
+<tr class="row-odd"><th class="head"><p>Beam size</p></th>
+<th class="head"><p>test-clean</p></th>
+<th class="head"><p>test-other</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>4</p></td>
+<td><p>2.61</p></td>
+<td><p>6.74</p></td>
+</tr>
+<tr class="row-odd"><td><p>8</p></td>
+<td><p>2.45</p></td>
+<td><p>6.38</p></td>
+</tr>
+<tr class="row-even"><td><p>12</p></td>
+<td><p>2.4</p></td>
+<td><p>6.23</p></td>
+</tr>
+</tbody>
+</table>
+</section>
+
+
+           </div>
+          </div>
+          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
+        <a href="shallow-fusion.html" class="btn btn-neutral float-left" title="Shallow fusion for Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
+        <a href="rescoring.html" class="btn btn-neutral float-right" title="LM rescoring for Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
+    </div>
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>&#169; Copyright 2021, icefall development team.</p>
+  </div>
+
+  Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
+    <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
+    provided by <a href="https://readthedocs.org">Read the Docs</a>.
+   
+
+</footer>
+        </div>
+      </div>
+    </section>
+  </div>
+  <script>
+      jQuery(function () {
+          SphinxRtdTheme.Navigation.enable(true);
+      });
+  </script> 
+
+</body>
+</html>
\ No newline at end of file
diff --git a/decoding-with-langugage-models/index.html b/decoding-with-langugage-models/index.html
new file mode 100644
index 000000000..f83205d74
--- /dev/null
+++ b/decoding-with-langugage-models/index.html
@@ -0,0 +1,135 @@
+<!DOCTYPE html>
+<html class="writer-html5" lang="en" >
+<head>
+  <meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
+
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>Decoding with language models &mdash; icefall 0.1 documentation</title>
+      <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
+      <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  <!--[if lt IE 9]>
+    <script src="../_static/js/html5shiv.min.js"></script>
+  <![endif]-->
+  
+        <script src="../_static/jquery.js"></script>
+        <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
+        <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
+        <script src="../_static/doctools.js"></script>
+        <script src="../_static/sphinx_highlight.js"></script>
+    <script src="../_static/js/theme.js"></script>
+    <link rel="index" title="Index" href="../genindex.html" />
+    <link rel="search" title="Search" href="../search.html" />
+    <link rel="next" title="Shallow fusion for Transducer" href="shallow-fusion.html" />
+    <link rel="prev" title="Huggingface spaces" href="../huggingface/spaces.html" /> 
+</head>
+
+<body class="wy-body-for-nav"> 
+  <div class="wy-grid-for-nav">
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search" >
+
+          
+          
+          <a href="../index.html" class="icon icon-home">
+            icefall
+          </a>
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
+              <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul class="current">
+<li class="toctree-l1 current"><a class="current reference internal" href="#">Decoding with language models</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
+</ul>
+</li>
+</ul>
+
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
+          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+          <a href="../index.html">icefall</a>
+      </nav>
+
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          <div role="navigation" aria-label="Page navigation">
+  <ul class="wy-breadcrumbs">
+      <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
+      <li class="breadcrumb-item active">Decoding with language models</li>
+      <li class="wy-breadcrumbs-aside">
+              <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/index.rst" class="fa fa-github"> Edit on GitHub</a>
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+             
+  <section id="decoding-with-language-models">
+<h1>Decoding with language models<a class="headerlink" href="#decoding-with-language-models" title="Permalink to this heading"></a></h1>
<p>This section describes how to use external language models
+during decoding to improve the WER of transducer models.</p>
+<div class="toctree-wrapper compound">
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
+<li class="toctree-l1"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
+<li class="toctree-l1"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
+</ul>
+</div>
+</section>
+
+
+           </div>
+          </div>
+          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
+        <a href="../huggingface/spaces.html" class="btn btn-neutral float-left" title="Huggingface spaces" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
+        <a href="shallow-fusion.html" class="btn btn-neutral float-right" title="Shallow fusion for Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
+    </div>
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>&#169; Copyright 2021, icefall development team.</p>
+  </div>
+
+  Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
+    <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
+    provided by <a href="https://readthedocs.org">Read the Docs</a>.
+   
+
+</footer>
+        </div>
+      </div>
+    </section>
+  </div>
+  <script>
+      jQuery(function () {
+          SphinxRtdTheme.Navigation.enable(true);
+      });
+  </script> 
+
+</body>
+</html>
\ No newline at end of file
diff --git a/decoding-with-langugage-models/rescoring.html b/decoding-with-langugage-models/rescoring.html
new file mode 100644
index 000000000..363352202
--- /dev/null
+++ b/decoding-with-langugage-models/rescoring.html
@@ -0,0 +1,386 @@
+<!DOCTYPE html>
+<html class="writer-html5" lang="en" >
+<head>
+  <meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
+
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>LM rescoring for Transducer &mdash; icefall 0.1 documentation</title>
+      <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
+      <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  <!--[if lt IE 9]>
+    <script src="../_static/js/html5shiv.min.js"></script>
+  <![endif]-->
+  
+        <script src="../_static/jquery.js"></script>
+        <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
+        <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
+        <script src="../_static/doctools.js"></script>
+        <script src="../_static/sphinx_highlight.js"></script>
+    <script src="../_static/js/theme.js"></script>
+    <link rel="index" title="Index" href="../genindex.html" />
+    <link rel="search" title="Search" href="../search.html" />
+    <link rel="prev" title="LODR for RNN Transducer" href="LODR.html" /> 
+</head>
+
+<body class="wy-body-for-nav"> 
+  <div class="wy-grid-for-nav">
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search" >
+
+          
+          
+          <a href="../index.html" class="icon icon-home">
+            icefall
+          </a>
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
+              <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul class="current">
+<li class="toctree-l1 current"><a class="reference internal" href="index.html">Decoding with language models</a><ul class="current">
+<li class="toctree-l2"><a class="reference internal" href="shallow-fusion.html">Shallow fusion for Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
+<li class="toctree-l2 current"><a class="current reference internal" href="#">LM rescoring for Transducer</a></li>
+</ul>
+</li>
+</ul>
+
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
+          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+          <a href="../index.html">icefall</a>
+      </nav>
+
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          <div role="navigation" aria-label="Page navigation">
+  <ul class="wy-breadcrumbs">
+      <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
+          <li class="breadcrumb-item"><a href="index.html">Decoding with language models</a></li>
+      <li class="breadcrumb-item active">LM rescoring for Transducer</li>
+      <li class="wy-breadcrumbs-aside">
+              <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/rescoring.rst" class="fa fa-github"> Edit on GitHub</a>
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+             
+  <section id="lm-rescoring-for-transducer">
+<span id="rescoring"></span><h1>LM rescoring for Transducer<a class="headerlink" href="#lm-rescoring-for-transducer" title="Permalink to this heading"></a></h1>
<p>LM rescoring is a commonly used approach for incorporating information from an external LM. Unlike shallow-fusion-based
methods (see <a class="reference internal" href="shallow-fusion.html"><span class="std std-ref">Shallow fusion for Transducer</span></a>, <a class="reference internal" href="LODR.html#lodr"><span class="std std-ref">LODR for RNN Transducer</span></a>), which query the external LM at every step of beam search, rescoring re-ranks the n-best hypotheses only after beam search has finished.
Rescoring is therefore usually more efficient than shallow fusion, since less computation is performed on the external LM.
In this tutorial, we show how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in
<a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.</p>
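<p>To make the re-ranking step concrete, here is a minimal sketch of n-best rescoring in plain Python. It assumes that each hypothesis carries a transducer (AM) log-probability and an external-LM log-probability, and that hypotheses are ranked by <cite>am_score + lm_scale * lm_score</cite>. The <cite>Hypothesis</cite> class, its fields and the toy scores are hypothetical and are not part of icefall; the actual <cite>modified_beam_search_lm_rescore</cite> implementation may normalize or combine the scores differently.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>from dataclasses import dataclass
from typing import List


# Hypothetical container for one n-best entry (not an icefall class).
@dataclass
class Hypothesis:
    tokens: List[str]  # decoded tokens
    am_score: float    # log-prob from transducer beam search
    lm_score: float    # log-prob of the same tokens under the external LM


def rescore(nbest, lm_scale):
    """Re-rank an n-best list by am_score + lm_scale * lm_score (best first)."""
    return sorted(
        nbest,
        key=lambda h: h.am_score + lm_scale * h.lm_score,
        reverse=True,
    )


# Toy n-best list; the scores are made up for illustration.
nbest = [
    Hypothesis(["HELLO", "WORD"], am_score=-4.0, lm_score=-12.0),
    Hypothesis(["HELLO", "WORLD"], am_score=-4.5, lm_score=-8.0),
]
best = rescore(nbest, lm_scale=0.43)[0]
print(" ".join(best.tokens))  # HELLO WORLD
</pre></div>
</div>
<p>Without the LM (<cite>lm_scale=0</cite>) the first hypothesis wins on its AM score alone; with <cite>lm_scale=0.43</cite> the LM prefers the second hypothesis strongly enough to flip the ranking.</p>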
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>This tutorial is based on the recipe
+<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming">pruned_transducer_stateless7_streaming</a>,
+which is a streaming transducer model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>.
However, you can easily apply LM rescoring to other recipes.
+If you encounter any problems, please open an issue <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">here</a>.</p>
+</div>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
<p>For simplicity, the training and testing corpora in this tutorial are the same (<a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>). However, you can change the testing set
to any other domain (e.g., <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a>) and use an external LM trained on that domain.</p>
+</div>
+<div class="admonition hint">
+<p class="admonition-title">Hint</p>
<p>We recommend using a GPU for decoding.</p>
+</div>
<p>For illustration purposes, we will use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
+If you want to train your model from scratch, please have a look at <a class="reference internal" href="../recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html#non-streaming-librispeech-pruned-transducer-stateless"><span class="std std-ref">Pruned transducer statelessX</span></a>.</p>
+<p>As the initial step, let’s download the pre-trained model.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
+$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">&quot;pretrained.pt&quot;</span>
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
$<span class="w"> </span><span class="nb">popd</span>
+</pre></div>
+</div>
<p>As usual, we first evaluate the model without an external LM. This can be done via the following command:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
+$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
<span class="w">    </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decoding-method<span class="w"> </span>modified_beam_search
+</pre></div>
+</div>
+<p>The following WERs are achieved on test-clean and test-other:</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
+$ beam_size_4       3.11    best for test-clean
+$ For test-other, WER of different settings are:
+$ beam_size_4       7.93    best for test-other
+</pre></div>
+</div>
+<p>Now, we will try to improve the above WER numbers via external LM rescoring. We will download
+a pre-trained LM from this <a class="reference external" href="https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm">link</a>.</p>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
<p>This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora.
You may also train an RNN LM from scratch. Please refer to this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py">script</a>
for training an RNN LM and this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py">script</a> for training a Transformer LM.</p>
+</div>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the external LM</span>
+$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-librispeech-rnn-lm/exp
$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">&quot;pretrained.pt&quot;</span>
$<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt
+$<span class="w"> </span><span class="nb">popd</span>
+</pre></div>
+</div>
<p>With the RNN LM available, we can rescore the n-best hypotheses generated by <cite>modified_beam_search</cite>. Here,
<cite>n</cite> is the number of beams, i.e., <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code>. The command for LM rescoring is
as follows. Note that <code class="docutils literal notranslate"><span class="pre">--decoding-method</span></code> is set to <cite>modified_beam_search_lm_rescore</cite> and <code class="docutils literal notranslate"><span class="pre">--use-shallow-fusion</span></code>
is set to <cite>0</cite> (i.e., <cite>False</cite>).</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
+$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.43
+$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_rescore<span class="w"> </span><span class="se">\</span>
<span class="w">    </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-shallow-fusion<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span>
+</pre></div>
+</div>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
+$ beam_size_4       2.93    best for test-clean
+$ For test-other, WER of different settings are:
+$ beam_size_4       7.6     best for test-other
+</pre></div>
+</div>
<p>Great! LM rescoring improves the WERs on both test sets. Increasing the size of the n-best list (i.e., the beam size) improves them further,
as shown in the following table:</p>
+<table class="docutils align-default" id="id1">
+<caption><span class="caption-number">Table 3 </span><span class="caption-text">WERs of LM rescoring with different beam sizes</span><a class="headerlink" href="#id1" title="Permalink to this table"></a></caption>
+<colgroup>
+<col style="width: 33%" />
+<col style="width: 33%" />
+<col style="width: 33%" />
+</colgroup>
+<thead>
+<tr class="row-odd"><th class="head"><p>Beam size</p></th>
+<th class="head"><p>test-clean</p></th>
+<th class="head"><p>test-other</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>4</p></td>
+<td><p>2.93</p></td>
+<td><p>7.6</p></td>
+</tr>
+<tr class="row-odd"><td><p>8</p></td>
+<td><p>2.67</p></td>
+<td><p>7.11</p></td>
+</tr>
+<tr class="row-even"><td><p>12</p></td>
+<td><p>2.59</p></td>
+<td><p>6.86</p></td>
+</tr>
+</tbody>
+</table>
<p>We can also apply LODR (see <a class="reference internal" href="LODR.html#lodr"><span class="std std-ref">LODR for RNN Transducer</span></a>) during LM rescoring. To do so, we first need to
download the bi-gram LM required by LODR:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the bi-gram</span>
+$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>install
+$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/marcoyang/librispeech_bigram
+$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>data/lang_bpe_500
+$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>../../librispeech_bigram/2gram.arpa<span class="w"> </span>.
+$<span class="w"> </span><span class="nb">popd</span>
+</pre></div>
+</div>
<p>Then we can perform LM rescoring + LODR by changing the decoding method to <cite>modified_beam_search_lm_rescore_LODR</cite>.</p>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
<p>This decoding method requires <a class="reference external" href="https://github.com/kpu/kenlm">kenlm</a>. You can install it
via this command: <code class="docutils literal notranslate"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">https://github.com/kpu/kenlm/archive/master.zip</span></code>.</p>
+</div>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
+$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.43
+$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_rescore_LODR<span class="w"> </span><span class="se">\</span>
<span class="w">    </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-shallow-fusion<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span>
+</pre></div>
+</div>
+<p>You should see the following WERs after executing the commands above:</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
+$ beam_size_4       2.9     best for test-clean
+$ For test-other, WER of different settings are:
+$ beam_size_4       7.57    best for test-other
+</pre></div>
+</div>
<p>This is slightly better than LM rescoring alone. Increasing the beam size brings
further improvements from LM rescoring + LODR:</p>
+<table class="docutils align-default" id="id2">
+<caption><span class="caption-number">Table 4 </span><span class="caption-text">WERs of LM rescoring + LODR with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
+<colgroup>
+<col style="width: 33%" />
+<col style="width: 33%" />
+<col style="width: 33%" />
+</colgroup>
+<thead>
+<tr class="row-odd"><th class="head"><p>Beam size</p></th>
+<th class="head"><p>test-clean</p></th>
+<th class="head"><p>test-other</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>4</p></td>
+<td><p>2.9</p></td>
+<td><p>7.57</p></td>
+</tr>
+<tr class="row-odd"><td><p>8</p></td>
+<td><p>2.63</p></td>
+<td><p>7.04</p></td>
+</tr>
+<tr class="row-even"><td><p>12</p></td>
+<td><p>2.52</p></td>
+<td><p>6.73</p></td>
+</tr>
+</tbody>
+</table>
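<p>Conceptually, LODR adds a third term to the rescoring score: the bi-gram, trained on source-domain text, serves as a low-order approximation of the transducer's internal LM, and its log-probability is subtracted. The sketch below assumes the combined score <cite>am + lm_scale * lm - lodr_scale * bigram</cite>, following the density-ratio idea; the function names, the tuple layout and the value of <cite>lodr_scale</cite> are hypothetical, and icefall's <cite>modified_beam_search_lm_rescore_LODR</cite> may weight the terms differently.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>def lodr_score(am, lm, bigram, lm_scale, lodr_scale):
    """Add the external LM score and subtract the bi-gram (internal-LM proxy)."""
    return am + lm_scale * lm - lodr_scale * bigram


def rescore_lodr(nbest, lm_scale, lodr_scale):
    """Return the best (text, am, lm, bigram) tuple under the LODR score."""
    return max(
        nbest,
        key=lambda h: lodr_score(h[1], h[2], h[3], lm_scale, lodr_scale),
    )


# Toy n-best list of (text, am, lm, bigram) log-probabilities; made-up values.
nbest = [
    ("hyp A", -4.0, -9.0, -10.0),
    ("hyp B", -4.2, -9.5, -14.0),
]
print(rescore_lodr(nbest, lm_scale=0.43, lodr_scale=0.0)[0])  # hyp A
print(rescore_lodr(nbest, lm_scale=0.43, lodr_scale=0.2)[0])  # hyp B
</pre></div>
</div>
<p>Hypothesis B is unlikely under the source-domain bi-gram, so subtracting that term gives back the credit the internal LM would have taken away, which flips the ranking.</p>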
<p>As mentioned earlier, LM rescoring is usually faster than shallow-fusion-based methods.
Here, we compare the WERs and decoding speed of these methods:</p>
+<table class="docutils align-default" id="id3">
<caption><span class="caption-number">Table 5 </span><span class="caption-text">LM-rescoring-based methods vs. shallow-fusion-based methods (each cell gives the WER on test-clean / WER on test-other; decoding time on test-clean)</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
+<colgroup>
+<col style="width: 25%" />
+<col style="width: 25%" />
+<col style="width: 25%" />
+<col style="width: 25%" />
+</colgroup>
+<thead>
+<tr class="row-odd"><th class="head"><p>Decoding method</p></th>
+<th class="head"><p>beam=4</p></th>
+<th class="head"><p>beam=8</p></th>
+<th class="head"><p>beam=12</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p><cite>modified_beam_search</cite></p></td>
+<td><p>3.11/7.93; 132s</p></td>
+<td><p>3.1/7.95; 177s</p></td>
+<td><p>3.1/7.96; 210s</p></td>
+</tr>
+<tr class="row-odd"><td><p><cite>modified_beam_search_lm_shallow_fusion</cite></p></td>
+<td><p>2.77/7.08; 262s</p></td>
+<td><p>2.62/6.65; 352s</p></td>
+<td><p>2.58/6.65; 488s</p></td>
+</tr>
+<tr class="row-even"><td><p>LODR</p></td>
+<td><p>2.61/6.74; 400s</p></td>
+<td><p>2.45/6.38; 610s</p></td>
+<td><p>2.4/6.23; 870s</p></td>
+</tr>
+<tr class="row-odd"><td><p><cite>modified_beam_search_lm_rescore</cite></p></td>
+<td><p>2.93/7.6; 156s</p></td>
+<td><p>2.67/7.11; 203s</p></td>
+<td><p>2.59/6.86; 255s</p></td>
+</tr>
+<tr class="row-even"><td><p><cite>modified_beam_search_lm_rescore_LODR</cite></p></td>
+<td><p>2.9/7.57; 160s</p></td>
+<td><p>2.63/7.04; 203s</p></td>
+<td><p>2.52/6.73; 263s</p></td>
+</tr>
+</tbody>
+</table>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
<p>Decoding was performed on a single 32 GB V100 GPU with <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> set to 600.
The decoding times are for reference only and may vary.</p>
+</div>
+</section>
+
+
+           </div>
+          </div>
+          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
+        <a href="LODR.html" class="btn btn-neutral float-left" title="LODR for RNN Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
+    </div>
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>&#169; Copyright 2021, icefall development team.</p>
+  </div>
+
+  Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
+    <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
+    provided by <a href="https://readthedocs.org">Read the Docs</a>.
+   
+
+</footer>
+        </div>
+      </div>
+    </section>
+  </div>
+  <script>
+      jQuery(function () {
+          SphinxRtdTheme.Navigation.enable(true);
+      });
+  </script> 
+
+</body>
+</html>
\ No newline at end of file
diff --git a/decoding-with-langugage-models/shallow-fusion.html b/decoding-with-langugage-models/shallow-fusion.html
new file mode 100644
index 000000000..50ef537ce
--- /dev/null
+++ b/decoding-with-langugage-models/shallow-fusion.html
@@ -0,0 +1,296 @@
+<!DOCTYPE html>
+<html class="writer-html5" lang="en" >
+<head>
+  <meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
+
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>Shallow fusion for Transducer &mdash; icefall 0.1 documentation</title>
+      <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
+      <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
+  <!--[if lt IE 9]>
+    <script src="../_static/js/html5shiv.min.js"></script>
+  <![endif]-->
+  
+        <script src="../_static/jquery.js"></script>
+        <script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
+        <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
+        <script src="../_static/doctools.js"></script>
+        <script src="../_static/sphinx_highlight.js"></script>
+    <script src="../_static/js/theme.js"></script>
+    <link rel="index" title="Index" href="../genindex.html" />
+    <link rel="search" title="Search" href="../search.html" />
+    <link rel="next" title="LODR for RNN Transducer" href="LODR.html" />
+    <link rel="prev" title="Decoding with language models" href="index.html" /> 
+</head>
+
+<body class="wy-body-for-nav"> 
+  <div class="wy-grid-for-nav">
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search" >
+
+          
+          
+          <a href="../index.html" class="icon icon-home">
+            icefall
+          </a>
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
+              <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../faqs.html">Frequently Asked Questions (FAQs)</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul class="current">
+<li class="toctree-l1 current"><a class="reference internal" href="index.html">Decoding with language models</a><ul class="current">
+<li class="toctree-l2 current"><a class="current reference internal" href="#">Shallow fusion for Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="LODR.html">LODR for RNN Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="rescoring.html">LM rescoring for Transducer</a></li>
+</ul>
+</li>
+</ul>
+
+        </div>
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
+          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+          <a href="../index.html">icefall</a>
+      </nav>
+
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          <div role="navigation" aria-label="Page navigation">
+  <ul class="wy-breadcrumbs">
+      <li><a href="../index.html" class="icon icon-home" aria-label="Home"></a></li>
+          <li class="breadcrumb-item"><a href="index.html">Decoding with language models</a></li>
+      <li class="breadcrumb-item active">Shallow fusion for Transducer</li>
+      <li class="wy-breadcrumbs-aside">
+              <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/decoding-with-langugage-models/shallow-fusion.rst" class="fa fa-github"> Edit on GitHub</a>
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+             
+  <section id="shallow-fusion-for-transducer">
+<span id="shallow-fusion"></span><h1>Shallow fusion for Transducer<a class="headerlink" href="#shallow-fusion-for-transducer" title="Permalink to this heading"></a></h1>
+<p>External language models (LMs) are commonly used to reduce the word error rate (WER) of E2E ASR models.
+This tutorial shows you how to perform <code class="docutils literal notranslate"><span class="pre">shallow</span> <span class="pre">fusion</span></code> with an external LM
+to improve the WER of a transducer model.</p>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>This tutorial is based on the recipe
+<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming">pruned_transducer_stateless7_streaming</a>,
+which is a streaming transducer model trained on <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>.
+However, you can easily apply shallow fusion to other recipes.
+If you encounter any problems, please open an issue at <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">icefall</a>.</p>
+</div>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>For simplicity, the training and test corpora in this tutorial are the same (<a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>). However, you can change the test set
+to any other domain (e.g., <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a>) and use an external LM trained on that domain.</p>
+</div>
+<div class="admonition hint">
+<p class="admonition-title">Hint</p>
+<p>We recommend using a GPU for decoding.</p>
+</div>
+<p>For illustration purposes, we will use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
+If you want to train your model from scratch, please have a look at <a class="reference internal" href="../recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html#non-streaming-librispeech-pruned-transducer-stateless"><span class="std std-ref">Pruned transducer statelessX</span></a>.</p>
+<p>As the initial step, let’s download the pre-trained model.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
+$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">&quot;pretrained.pt&quot;</span>
+$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
+$<span class="w"> </span><span class="nb">popd</span>
+</pre></div>
+</div>
+<p>To test the model, let’s first look at the decoding results without an LM. This can be done with the following command:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
+$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decoding-method<span class="w"> </span>modified_beam_search
+</pre></div>
+</div>
+<p>The following WERs are achieved on test-clean and test-other:</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
+$ beam_size_4       3.11    best for test-clean
+$ For test-other, WER of different settings are:
+$ beam_size_4       7.93    best for test-other
+</pre></div>
+</div>
+<p>These are already good numbers! But we can improve them further by using shallow fusion with an external LM.
+Since training a language model usually takes a long time, we download a pre-trained LM from this <a class="reference external" href="https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm">link</a>.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="c1"># download the external LM</span>
+$<span class="w"> </span><span class="nv">GIT_LFS_SKIP_SMUDGE</span><span class="o">=</span><span class="m">1</span><span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
+$<span class="w"> </span><span class="nb">pushd</span><span class="w"> </span>icefall-librispeech-rnn-lm/exp
+$<span class="w"> </span>git<span class="w"> </span>lfs<span class="w"> </span>pull<span class="w"> </span>--include<span class="w"> </span><span class="s2">&quot;pretrained.pt&quot;</span>
+$<span class="w"> </span>ln<span class="w"> </span>-s<span class="w"> </span>pretrained.pt<span class="w"> </span>epoch-99.pt<span class="w"> </span><span class="c1"># create a symbolic link so that the checkpoint can be loaded</span>
+$<span class="w"> </span><span class="nb">popd</span>
+</pre></div>
+</div>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora.
+You may also train an RNN LM from scratch. Please refer to this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/rnn_lm/train.py">script</a>
+for training an RNN LM and this <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/icefall/transformer_lm/train.py">script</a> for training a transformer LM.</p>
+</div>
+<p>To use shallow fusion for decoding, we can execute the following command:</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nv">exp_dir</span><span class="o">=</span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
+$<span class="w"> </span><span class="nv">lm_dir</span><span class="o">=</span>./icefall-librispeech-rnn-lm/exp
+$<span class="w"> </span><span class="nv">lm_scale</span><span class="o">=</span><span class="m">0</span>.29
+$<span class="w"> </span>./pruned_transducer_stateless7_streaming/decode.py<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-averaged-model<span class="w"> </span>False<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--beam-size<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--exp-dir<span class="w"> </span><span class="nv">$exp_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--max-duration<span class="w"> </span><span class="m">600</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decode-chunk-len<span class="w"> </span><span class="m">32</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--decoding-method<span class="w"> </span>modified_beam_search_lm_shallow_fusion<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--bpe-model<span class="w"> </span>./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--use-shallow-fusion<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-type<span class="w"> </span>rnn<span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-exp-dir<span class="w"> </span><span class="nv">$lm_dir</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-epoch<span class="w"> </span><span class="m">99</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-scale<span class="w"> </span><span class="nv">$lm_scale</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-avg<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-embedding-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-hidden-dim<span class="w"> </span><span class="m">2048</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--rnn-lm-num-layers<span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="se">\</span>
+<span class="w">    </span>--lm-vocab-size<span class="w"> </span><span class="m">500</span>
+</pre></div>
+</div>
+<p>Note that we set <code class="docutils literal notranslate"><span class="pre">--decoding-method</span> <span class="pre">modified_beam_search_lm_shallow_fusion</span></code> and <code class="docutils literal notranslate"><span class="pre">--use-shallow-fusion</span> <span class="pre">1</span></code>
+to enable shallow fusion. <code class="docutils literal notranslate"><span class="pre">--lm-type</span></code> specifies the type of neural LM to use; you can choose either <code class="docutils literal notranslate"><span class="pre">rnn</span></code> or <code class="docutils literal notranslate"><span class="pre">transformer</span></code>.
+The following three arguments configure the RNN LM and should match the configuration the LM was trained with:</p>
+<ul class="simple">
+<li><dl class="simple">
+<dt><code class="docutils literal notranslate"><span class="pre">--rnn-lm-embedding-dim</span></code></dt><dd><p>The embedding dimension of the RNN LM</p>
+</dd>
+</dl>
+</li>
+<li><dl class="simple">
+<dt><code class="docutils literal notranslate"><span class="pre">--rnn-lm-hidden-dim</span></code></dt><dd><p>The hidden dimension of the RNN LM</p>
+</dd>
+</dl>
+</li>
+<li><dl class="simple">
+<dt><code class="docutils literal notranslate"><span class="pre">--rnn-lm-num-layers</span></code></dt><dd><p>The number of RNN layers in the RNN LM.</p>
+</dd>
+</dl>
+</li>
+</ul>
+<p>The decoding results obtained with the above command are shown below.</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>$ For test-clean, WER of different settings are:
+$ beam_size_4       2.77    best for test-clean
+$ For test-other, WER of different settings are:
+$ beam_size_4       7.08    best for test-other
+</pre></div>
+</div>
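+<p>To quantify the gain over the no-LM baseline (3.11 on test-clean and 7.93 on test-other, both with beam size 4), the relative WER reduction can be computed with a small shell sketch. The helper name <code class="docutils literal notranslate"><span class="pre">rel_wer_reduction</span></code> is hypothetical and not part of icefall.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: relative WER reduction (%) of a new system over a baseline,
#   (baseline_wer - fused_wer) / baseline_wer * 100
rel_wer_reduction() {
  awk -v base="$1" -v fused="$2" 'BEGIN { printf "%.1f\n", (base - fused) / base * 100 }'
}

# WERs (%) from the result blocks: no LM vs. shallow fusion, both with beam size 4.
echo "test-clean: $(rel_wer_reduction 3.11 2.77)%"
echo "test-other: $(rel_wer_reduction 7.93 7.08)%"
```

+<p>It prints a relative reduction of 10.9% for test-clean and 10.7% for test-other.</p>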
+<p>The improvement from shallow fusion is substantial: the relative WER reduction is about 10.9% on test-clean and about 10.7% on test-other.
+A few parameters can be tuned to further boost the performance of shallow fusion:</p>
+<ul>
+<li><p><code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code></p>
+<blockquote>
+<div><p>Controls the scale of the LM score. If it is too small, the external language model may not be fully utilized; if it is too large,
+the LM score may dominate during decoding, leading to worse WERs. A typical value is around 0.3.</p>
+</div></blockquote>
+</li>
+<li><p><code class="docutils literal notranslate"><span class="pre">--beam-size</span></code></p>
+<blockquote>
+<div><p>The number of active paths in the search beam. It controls the trade-off between decoding efficiency and accuracy.</p>
+</div></blockquote>
+</li>
+</ul>
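+<p>Conceptually, shallow fusion scores each hypothesis as the transducer log-probability plus the LM log-probability multiplied by <code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code>. The shell sketch below assumes this standard log-linear form (icefall's implementation may differ in details, e.g. how blank symbols are handled) and uses made-up log-probabilities for two hypothetical competing hypotheses. The helper <code class="docutils literal notranslate"><span class="pre">fused_score</span></code> is our own illustration, not an icefall function.</p>

```shell
#!/usr/bin/env bash
# Sketch of a shallow-fusion score (assumed log-linear form):
#   fused = log P_transducer + lm_scale * log P_LM
# All log-probabilities below are made-up toy values, not icefall output.

# Hypothetical helper: $1 = transducer log-prob, $2 = LM log-prob, $3 = LM scale.
fused_score() {
  awk -v am="$1" -v lm="$2" -v s="$3" 'BEGIN { printf "%.3f\n", am + s * lm }'
}

# hyp_a: slightly better transducer score, but unlikely under the LM.
# hyp_b: slightly worse transducer score, but much more likely under the LM.
for scale in 0.0 0.29 1.0; do
  a=$(fused_score -4.0 -12.0 "$scale")
  b=$(fused_score -4.5 -8.0 "$scale")
  echo "lm_scale=$scale hyp_a=$a hyp_b=$b"
done
```

+<p>With <code class="docutils literal notranslate"><span class="pre">lm_scale=0.0</span></code> the acoustically better hyp_a wins (-4.000 vs. -4.500); at 0.29 the LM flips the ranking in favor of hyp_b (-7.480 vs. -6.820); at 1.0 the LM term makes up most of both scores, which illustrates why an overly large scale can let the LM dominate.</p>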
+<p>Here, we also show how <code class="docutils literal notranslate"><span class="pre">--beam-size</span></code> affects the WER and decoding time:</p>
+<table class="docutils align-default" id="id2">
+<caption><span class="caption-number">Table 1 </span><span class="caption-text">WERs and decoding time (on test-clean) of shallow fusion with different beam sizes</span><a class="headerlink" href="#id2" title="Permalink to this table"></a></caption>
+<colgroup>
+<col style="width: 25%" />
+<col style="width: 25%" />
+<col style="width: 25%" />
+<col style="width: 25%" />
+</colgroup>
+<thead>
+<tr class="row-odd"><th class="head"><p>Beam size</p></th>
+<th class="head"><p>test-clean</p></th>
+<th class="head"><p>test-other</p></th>
+<th class="head"><p>Decoding time on test-clean (s)</p></th>
+</tr>
+</thead>
+<tbody>
+<tr class="row-even"><td><p>4</p></td>
+<td><p>2.77</p></td>
+<td><p>7.08</p></td>
+<td><p>262</p></td>
+</tr>
+<tr class="row-odd"><td><p>8</p></td>
+<td><p>2.62</p></td>
+<td><p>6.65</p></td>
+<td><p>352</p></td>
+</tr>
+<tr class="row-even"><td><p>12</p></td>
+<td><p>2.58</p></td>
+<td><p>6.65</p></td>
+<td><p>488</p></td>
+</tr>
+</tbody>
+</table>
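+<p>As the table shows, a larger beam size during shallow fusion improves the WER but also slows down decoding. Note that going from a beam size of 8 to 12 brings no further gain on test-other.</p>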
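+<p>To make the trade-off in Table 1 concrete, the following sketch computes, relative to beam size 4, the change in test-clean WER and in decoding time for the larger beams. The numbers are copied from the table; the helper name <code class="docutils literal notranslate"><span class="pre">rel_change</span></code> is hypothetical.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: relative change (%) of a value with respect to a reference.
rel_change() {
  awk -v ref="$1" -v val="$2" 'BEGIN { printf "%.1f\n", (val - ref) / ref * 100 }'
}

# Beam size 4 is the reference row of Table 1 (test-clean WER 2.77, 262 s).
ref_wer=2.77
ref_time=262
# Each row: beam size, test-clean WER (%), decoding time on test-clean (s).
for row in "8 2.62 352" "12 2.58 488"; do
  set -- $row  # intentional word splitting into $1 $2 $3
  echo "beam=$1 wer_change=$(rel_change "$ref_wer" "$2")% time_change=$(rel_change "$ref_time" "$3")%"
done
```

+<p>Relative to beam size 4, beam size 12 lowers the test-clean WER by about 6.9% (relative) while taking about 86% longer to decode.</p>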
+</section>
+
+
+           </div>
+          </div>
+          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
+        <a href="index.html" class="btn btn-neutral float-left" title="Decoding with language models" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
+        <a href="LODR.html" class="btn btn-neutral float-right" title="LODR for RNN Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
+    </div>
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>&#169; Copyright 2021, icefall development team.</p>
+  </div>
+
+  Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
+    <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
+    provided by <a href="https://readthedocs.org">Read the Docs</a>.
+   
+
+</footer>
+        </div>
+      </div>
+    </section>
+  </div>
+  <script>
+      jQuery(function () {
+          SphinxRtdTheme.Navigation.enable(true);
+      });
+  </script> 
+
+</body>
+</html>
\ No newline at end of file
diff --git a/faqs.html b/faqs.html
index b2cd5952b..ffe125679 100644
--- a/faqs.html
+++ b/faqs.html
@@ -59,6 +59,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/genindex.html b/genindex.html
index 3ea6ec482..9ae9d458b 100644
--- a/genindex.html
+++ b/genindex.html
@@ -51,6 +51,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/huggingface/index.html b/huggingface/index.html
index 0cdd8fa13..570462c18 100644
--- a/huggingface/index.html
+++ b/huggingface/index.html
@@ -58,6 +58,9 @@
 <li class="toctree-l2"><a class="reference internal" href="spaces.html">Huggingface spaces</a></li>
 </ul>
 </li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/huggingface/pretrained-models.html b/huggingface/pretrained-models.html
index 18a696caf..1f1a0ffd5 100644
--- a/huggingface/pretrained-models.html
+++ b/huggingface/pretrained-models.html
@@ -58,6 +58,9 @@
 <li class="toctree-l2"><a class="reference internal" href="spaces.html">Huggingface spaces</a></li>
 </ul>
 </li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/huggingface/spaces.html b/huggingface/spaces.html
index b702ead36..c82af354c 100644
--- a/huggingface/spaces.html
+++ b/huggingface/spaces.html
@@ -19,6 +19,7 @@
     <script src="../_static/js/theme.js"></script>
     <link rel="index" title="Index" href="../genindex.html" />
     <link rel="search" title="Search" href="../search.html" />
+    <link rel="next" title="Decoding with language models" href="../decoding-with-langugage-models/index.html" />
     <link rel="prev" title="Pre-trained models" href="pretrained-models.html" /> 
 </head>
 
@@ -60,6 +61,9 @@
 </li>
 </ul>
 </li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
@@ -144,6 +148,7 @@ the following YouTube channel by <a class="reference external" href="https://www
           </div>
           <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
         <a href="pretrained-models.html" class="btn btn-neutral float-left" title="Pre-trained models" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
+        <a href="../decoding-with-langugage-models/index.html" class="btn btn-neutral float-right" title="Decoding with language models" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
     </div>
 
   <hr/>
diff --git a/index.html b/index.html
index 60ed00b9d..93c9b2ca5 100644
--- a/index.html
+++ b/index.html
@@ -53,6 +53,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
@@ -148,6 +151,16 @@ speech recognition recipes using <a class="reference external" href="https://git
 </li>
 </ul>
 </div>
+<div class="toctree-wrapper compound">
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="decoding-with-langugage-models/shallow-fusion.html">Shallow fusion for Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="decoding-with-langugage-models/LODR.html">LODR for RNN Transducer</a></li>
+<li class="toctree-l2"><a class="reference internal" href="decoding-with-langugage-models/rescoring.html">LM rescoring for Transducer</a></li>
+</ul>
+</li>
+</ul>
+</div>
 </section>
 
 
diff --git a/installation/index.html b/installation/index.html
index 01e817409..a7e1bd057 100644
--- a/installation/index.html
+++ b/installation/index.html
@@ -76,6 +76,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-model-state-dict.html b/model-export/export-model-state-dict.html
index 8e378efdc..0ee2fa000 100644
--- a/model-export/export-model-state-dict.html
+++ b/model-export/export-model-state-dict.html
@@ -67,6 +67,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-ncnn-conv-emformer.html b/model-export/export-ncnn-conv-emformer.html
index 7784202a7..f0521d694 100644
--- a/model-export/export-ncnn-conv-emformer.html
+++ b/model-export/export-ncnn-conv-emformer.html
@@ -75,6 +75,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-ncnn-lstm.html b/model-export/export-ncnn-lstm.html
index 5ee447022..4b3392faa 100644
--- a/model-export/export-ncnn-lstm.html
+++ b/model-export/export-ncnn-lstm.html
@@ -75,6 +75,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-ncnn-zipformer.html b/model-export/export-ncnn-zipformer.html
index 85909488d..45fa51fac 100644
--- a/model-export/export-ncnn-zipformer.html
+++ b/model-export/export-ncnn-zipformer.html
@@ -74,6 +74,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-ncnn.html b/model-export/export-ncnn.html
index 5990017be..583f82e95 100644
--- a/model-export/export-ncnn.html
+++ b/model-export/export-ncnn.html
@@ -66,6 +66,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-onnx.html b/model-export/export-onnx.html
index 1bd94d732..b52c539a2 100644
--- a/model-export/export-onnx.html
+++ b/model-export/export-onnx.html
@@ -68,6 +68,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-with-torch-jit-script.html b/model-export/export-with-torch-jit-script.html
index 6fd69f3e6..5f46f9ee1 100644
--- a/model-export/export-with-torch-jit-script.html
+++ b/model-export/export-with-torch-jit-script.html
@@ -66,6 +66,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/export-with-torch-jit-trace.html b/model-export/export-with-torch-jit-trace.html
index d29ab9121..b85004cfd 100644
--- a/model-export/export-with-torch-jit-trace.html
+++ b/model-export/export-with-torch-jit-trace.html
@@ -66,6 +66,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/model-export/index.html b/model-export/index.html
index d7fc088e7..6a4894bdb 100644
--- a/model-export/index.html
+++ b/model-export/index.html
@@ -61,6 +61,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/objects.inv b/objects.inv
index c5bbb969b..eacb8b263 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/recipes/Non-streaming-ASR/aishell/conformer_ctc.html b/recipes/Non-streaming-ASR/aishell/conformer_ctc.html
index cb49f3445..a966f5c64 100644
--- a/recipes/Non-streaming-ASR/aishell/conformer_ctc.html
+++ b/recipes/Non-streaming-ASR/aishell/conformer_ctc.html
@@ -69,6 +69,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/aishell/index.html b/recipes/Non-streaming-ASR/aishell/index.html
index a631035a7..8301b81d9 100644
--- a/recipes/Non-streaming-ASR/aishell/index.html
+++ b/recipes/Non-streaming-ASR/aishell/index.html
@@ -69,6 +69,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/aishell/stateless_transducer.html b/recipes/Non-streaming-ASR/aishell/stateless_transducer.html
index 5927c866c..717a5ab0e 100644
--- a/recipes/Non-streaming-ASR/aishell/stateless_transducer.html
+++ b/recipes/Non-streaming-ASR/aishell/stateless_transducer.html
@@ -69,6 +69,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html b/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html
index 57d7d377c..710fce058 100644
--- a/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html
+++ b/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html
@@ -69,6 +69,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/index.html b/recipes/Non-streaming-ASR/index.html
index 5095cc140..73a07bc7e 100644
--- a/recipes/Non-streaming-ASR/index.html
+++ b/recipes/Non-streaming-ASR/index.html
@@ -64,6 +64,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html b/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html
index 6d9cac4af..ba23bf7df 100644
--- a/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html
+++ b/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html
@@ -72,6 +72,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/librispeech/distillation.html b/recipes/Non-streaming-ASR/librispeech/distillation.html
index e806c3c84..70dcea8ca 100644
--- a/recipes/Non-streaming-ASR/librispeech/distillation.html
+++ b/recipes/Non-streaming-ASR/librispeech/distillation.html
@@ -72,6 +72,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
@@ -103,7 +106,7 @@
              
   <section id="distillation-with-hubert">
 <h1>Distillation with HuBERT<a class="headerlink" href="#distillation-with-hubert" title="Permalink to this heading"></a></h1>
-<p>This tutorial shows you how to perform knowledge distillation in <a href="#id7"><span class="problematic" id="id8">`icefall`_</span></a>
+<p>This tutorial shows you how to perform knowledge distillation in <a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>
 with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset. The distillation method
 used here is called “Multi Vector Quantization Knowledge Distillation” (MVQ-KD).
 Please have a look at our paper <a class="reference external" href="https://arxiv.org/abs/2211.00508">Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation</a>
@@ -119,7 +122,7 @@ encounter any problems, please open an issue here <a class="reference external"
 <div class="admonition note">
 <p class="admonition-title">Note</p>
 <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
-the environment for <a href="#id9"><span class="problematic" id="id10">`icefall`_</span></a>.</p>
+the environment for <a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.</p>
 </div>
 <div class="admonition hint">
 <p class="admonition-title">Hint</p>
@@ -190,7 +193,7 @@ run <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/m
 <div class="admonition note">
 <p class="admonition-title">Note</p>
 <p>There are 5 stages in total, the first and second stage will be automatically skipped
-when choosing to downloaded codebook indexes prepared by <a href="#id11"><span class="problematic" id="id12">`icefall`_</span></a>.
+when choosing to downloaded codebook indexes prepared by <a class="reference external" href="https://github.com/k2-fsa/icefall">icefall</a>.
 Of course, you can extract and compute the codebook indexes by yourself. This
 will require you downloading a HuBERT-XL model and it can take a while for
 the extraction of codebook indexes.</p>
@@ -226,10 +229,10 @@ and prepares MVQ-augmented training manifests.</p>
 </div>
 <p>Please see the
 following screenshot for the output of an example execution.</p>
-<figure class="align-center" id="id5">
+<figure class="align-center" id="id4">
 <a class="reference internal image-reference" href="../../../_images/distillation_codebook.png"><img alt="Downloading codebook indexes and preparing training manifest." src="../../../_images/distillation_codebook.png" style="width: 800px;" /></a>
 <figcaption>
-<p><span class="caption-number">Fig. 6 </span><span class="caption-text">Downloading codebook indexes and preparing training manifest.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
+<p><span class="caption-number">Fig. 6 </span><span class="caption-text">Downloading codebook indexes and preparing training manifest.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
 </figcaption>
 </figure>
 <div class="admonition hint">
@@ -241,10 +244,10 @@ set <code class="docutils literal notranslate"><span class="pre">use_extracted_c
 <code class="docutils literal notranslate"><span class="pre">num_codebooks</span></code> by yourself.</p>
 </div>
 <p>Now, you should see the following files under the directory <code class="docutils literal notranslate"><span class="pre">./data/vq_fbank_layer36_cb8</span></code>.</p>
-<figure class="align-center" id="id6">
+<figure class="align-center" id="id5">
 <a class="reference internal image-reference" href="../../../_images/distillation_directory.png"><img alt="MVQ-augmented training manifests" src="../../../_images/distillation_directory.png" style="width: 800px;" /></a>
 <figcaption>
-<p><span class="caption-number">Fig. 7 </span><span class="caption-text">MVQ-augmented training manifests.</span><a class="headerlink" href="#id6" title="Permalink to this image"></a></p>
+<p><span class="caption-number">Fig. 7 </span><span class="caption-text">MVQ-augmented training manifests.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
 </figcaption>
 </figure>
 <p>Whola! You are ready to perform knowledge distillation training now!</p>
diff --git a/recipes/Non-streaming-ASR/librispeech/index.html b/recipes/Non-streaming-ASR/librispeech/index.html
index de995da42..5809a5f19 100644
--- a/recipes/Non-streaming-ASR/librispeech/index.html
+++ b/recipes/Non-streaming-ASR/librispeech/index.html
@@ -72,6 +72,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html b/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html
index f0dfe4475..96ba89680 100644
--- a/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html
+++ b/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html
@@ -72,6 +72,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
@@ -381,10 +384,10 @@ $<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w">
 <p>Note there is a URL in the above output. Click it and you will see
 the following screenshot:</p>
 <blockquote>
-<div><figure class="align-center" id="id9">
+<div><figure class="align-center" id="id4">
 <a class="reference external image-reference" href="https://tensorboard.dev/experiment/QOGSPBgsR8KzcRMmie9JGw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-pruned-transducer-tensorboard-log.jpg" style="width: 600px;" /></a>
 <figcaption>
-<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id9" title="Permalink to this image"></a></p>
+<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
 </figcaption>
 </figure>
 </div></blockquote>
diff --git a/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html b/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html
index 3c3420c12..11a988c05 100644
--- a/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html
+++ b/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html
@@ -72,6 +72,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html b/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html
index b1bfe6c77..66ad3db1d 100644
--- a/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html
+++ b/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html
@@ -72,6 +72,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html b/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html
index 6dd89e5b8..e46e1a5dc 100644
--- a/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html
+++ b/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html
@@ -72,6 +72,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/timit/index.html b/recipes/Non-streaming-ASR/timit/index.html
index 8418740c5..08733f79e 100644
--- a/recipes/Non-streaming-ASR/timit/index.html
+++ b/recipes/Non-streaming-ASR/timit/index.html
@@ -68,6 +68,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html b/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html
index 0970c41af..9581f9367 100644
--- a/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html
+++ b/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html
@@ -68,6 +68,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html b/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html
index 0afe1777f..b39536ff4 100644
--- a/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html
+++ b/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html
@@ -68,6 +68,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/yesno/index.html b/recipes/Non-streaming-ASR/yesno/index.html
index 7704c24b9..742e738ae 100644
--- a/recipes/Non-streaming-ASR/yesno/index.html
+++ b/recipes/Non-streaming-ASR/yesno/index.html
@@ -67,6 +67,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Non-streaming-ASR/yesno/tdnn.html b/recipes/Non-streaming-ASR/yesno/tdnn.html
index 2a4849baa..c977d762c 100644
--- a/recipes/Non-streaming-ASR/yesno/tdnn.html
+++ b/recipes/Non-streaming-ASR/yesno/tdnn.html
@@ -67,6 +67,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Streaming-ASR/index.html b/recipes/Streaming-ASR/index.html
index 279eb8f32..b1cdcdbf4 100644
--- a/recipes/Streaming-ASR/index.html
+++ b/recipes/Streaming-ASR/index.html
@@ -62,6 +62,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Streaming-ASR/introduction.html b/recipes/Streaming-ASR/introduction.html
index edde29646..9e43a2d69 100644
--- a/recipes/Streaming-ASR/introduction.html
+++ b/recipes/Streaming-ASR/introduction.html
@@ -66,6 +66,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Streaming-ASR/librispeech/index.html b/recipes/Streaming-ASR/librispeech/index.html
index ba1d9fda4..bda684f82 100644
--- a/recipes/Streaming-ASR/librispeech/index.html
+++ b/recipes/Streaming-ASR/librispeech/index.html
@@ -67,6 +67,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html b/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html
index afd94794d..82e1e392e 100644
--- a/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html
+++ b/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html
@@ -67,6 +67,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
@@ -380,10 +383,10 @@ $<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w">
 <p>Note there is a URL in the above output. Click it and you will see
 the following screenshot:</p>
 <blockquote>
-<div><figure class="align-center" id="id4">
+<div><figure class="align-center" id="id5">
 <a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-lstm-transducer-tensorboard-log.png" style="width: 600px;" /></a>
 <figcaption>
-<p><span class="caption-number">Fig. 10 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
+<p><span class="caption-number">Fig. 10 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
 </figcaption>
 </figure>
 </div></blockquote>
diff --git a/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html b/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html
index 67195d4ab..654e02874 100644
--- a/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html
+++ b/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html
@@ -67,6 +67,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
@@ -400,10 +403,10 @@ $<span class="w"> </span>tensorboard<span class="w"> </span>dev<span class="w">
 <p>Note there is a URL in the above output. Click it and you will see
 the following screenshot:</p>
 <blockquote>
-<div><figure class="align-center" id="id10">
+<div><figure class="align-center" id="id5">
 <a class="reference external image-reference" href="https://tensorboard.dev/experiment/97VKXf80Ru61CnP2ALWZZg/"><img alt="TensorBoard screenshot" src="../../../_images/streaming-librispeech-pruned-transducer-tensorboard-log.jpg" style="width: 600px;" /></a>
 <figcaption>
-<p><span class="caption-number">Fig. 9 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id10" title="Permalink to this image"></a></p>
+<p><span class="caption-number">Fig. 9 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id5" title="Permalink to this image"></a></p>
 </figcaption>
 </figure>
 </div></blockquote>
diff --git a/recipes/Streaming-ASR/librispeech/zipformer_transducer.html b/recipes/Streaming-ASR/librispeech/zipformer_transducer.html
index 4d266b959..f9e370b8d 100644
--- a/recipes/Streaming-ASR/librispeech/zipformer_transducer.html
+++ b/recipes/Streaming-ASR/librispeech/zipformer_transducer.html
@@ -67,6 +67,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/recipes/index.html b/recipes/index.html
index 646f5378f..c53fc3c54 100644
--- a/recipes/index.html
+++ b/recipes/index.html
@@ -58,6 +58,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/search.html b/search.html
index cbe8c2152..bc5fbe20f 100644
--- a/search.html
+++ b/search.html
@@ -54,6 +54,9 @@
 <ul>
 <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
 <li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
+</ul>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="decoding-with-langugage-models/index.html">Decoding with language models</a></li>
 </ul>
 
         </div>
diff --git a/searchindex.js b/searchindex.js
index 090fd4fbf..62a6f9d6d 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributing/code-style", "contributing/doc", "contributing/how-to-create-a-recipe", "contributing/index", "faqs", "huggingface/index", "huggingface/pretrained-models", "huggingface/spaces", "index", "installation/index", "model-export/export-model-state-dict", "model-export/export-ncnn", "model-export/export-ncnn-conv-emformer", "model-export/export-ncnn-lstm", "model-export/export-ncnn-zipformer", "model-export/export-onnx", "model-export/export-with-torch-jit-script", "model-export/export-with-torch-jit-trace", "model-export/index", "recipes/Non-streaming-ASR/aishell/conformer_ctc", "recipes/Non-streaming-ASR/aishell/index", "recipes/Non-streaming-ASR/aishell/stateless_transducer", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/index", "recipes/Non-streaming-ASR/librispeech/conformer_ctc", "recipes/Non-streaming-ASR/librispeech/distillation", "recipes/Non-streaming-ASR/librispeech/index", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi", "recipes/Non-streaming-ASR/timit/index", "recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/yesno/index", "recipes/Non-streaming-ASR/yesno/tdnn", "recipes/Streaming-ASR/index", "recipes/Streaming-ASR/introduction", "recipes/Streaming-ASR/librispeech/index", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Streaming-ASR/librispeech/zipformer_transducer", "recipes/index"], "filenames": ["contributing/code-style.rst", "contributing/doc.rst", "contributing/how-to-create-a-recipe.rst", "contributing/index.rst", "faqs.rst", "huggingface/index.rst", "huggingface/pretrained-models.rst", "huggingface/spaces.rst", 
"index.rst", "installation/index.rst", "model-export/export-model-state-dict.rst", "model-export/export-ncnn.rst", "model-export/export-ncnn-conv-emformer.rst", "model-export/export-ncnn-lstm.rst", "model-export/export-ncnn-zipformer.rst", "model-export/export-onnx.rst", "model-export/export-with-torch-jit-script.rst", "model-export/export-with-torch-jit-trace.rst", "model-export/index.rst", "recipes/Non-streaming-ASR/aishell/conformer_ctc.rst", "recipes/Non-streaming-ASR/aishell/index.rst", "recipes/Non-streaming-ASR/aishell/stateless_transducer.rst", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/index.rst", "recipes/Non-streaming-ASR/librispeech/conformer_ctc.rst", "recipes/Non-streaming-ASR/librispeech/distillation.rst", "recipes/Non-streaming-ASR/librispeech/index.rst", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi.rst", "recipes/Non-streaming-ASR/timit/index.rst", "recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.rst", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/yesno/index.rst", "recipes/Non-streaming-ASR/yesno/tdnn.rst", "recipes/Streaming-ASR/index.rst", "recipes/Streaming-ASR/introduction.rst", "recipes/Streaming-ASR/librispeech/index.rst", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.rst", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst", "recipes/Streaming-ASR/librispeech/zipformer_transducer.rst", "recipes/index.rst"], "titles": ["Follow the code style", "Contributing to Documentation", "How to create a recipe", "Contributing", "Frequently Asked Questions (FAQs)", "Huggingface", "Pre-trained models", "Huggingface spaces", "Icefall", "Installation", "Export model.state_dict()", "Export to ncnn", "Export ConvEmformer transducer models to 
ncnn", "Export LSTM transducer models to ncnn", "Export streaming Zipformer transducer models to ncnn", "Export to ONNX", "Export model with torch.jit.script()", "Export model with torch.jit.trace()", "Model export", "Conformer CTC", "aishell", "Stateless Transducer", "TDNN-LSTM CTC", "Non Streaming ASR", "Conformer CTC", "Distillation with HuBERT", "LibriSpeech", "Pruned transducer statelessX", "TDNN-LSTM-CTC", "Zipformer CTC Blank Skip", "Zipformer MMI", "TIMIT", "TDNN-LiGRU-CTC", "TDNN-LSTM-CTC", "YesNo", "TDNN-CTC", "Streaming ASR", "Introduction", "LibriSpeech", "LSTM Transducer", "Pruned transducer statelessX", "Zipformer Transducer", "Recipes"], "terms": {"we": [0, 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "us": [0, 1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 24, 25, 28, 32, 33, 35, 37], "tool": [0, 4, 12], "make": [0, 1, 3, 12, 13, 14, 19, 21, 24, 37], "consist": [0, 21, 27, 39, 40, 41], "possibl": [0, 2, 3, 9, 19, 24], "black": 0, "format": [0, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "flake8": 0, "check": [0, 24], "qualiti": [0, 20], "isort": 0, "sort": [0, 9], "import": [0, 4, 12, 40, 41], "The": [0, 1, 2, 4, 7, 9, 10, 12, 13, 14, 19, 20, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "version": [0, 8, 9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 32, 33, 40], "abov": [0, 4, 9, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40, 41], "ar": [0, 1, 3, 4, 9, 10, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41, 42], "22": [0, 9, 12, 13, 24, 32, 33, 35], "3": [0, 4, 8, 10, 11, 15, 18, 22, 25, 27, 28, 29, 30, 35, 39, 40, 41], "0": [0, 1, 8, 10, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "5": [0, 11, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "4": [0, 4, 8, 10, 11, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 
41], "10": [0, 8, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "1": [0, 8, 10, 11, 15, 16, 17, 18, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "after": [0, 1, 7, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "run": [0, 2, 4, 7, 9, 12, 13, 14, 15, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "command": [0, 1, 4, 9, 10, 12, 13, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "git": [0, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "clone": [0, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "http": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "github": [0, 2, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "com": [0, 2, 6, 7, 9, 10, 12, 13, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "k2": [0, 2, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 39, 40, 41], "fsa": [0, 2, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 24, 27, 29, 30, 39, 40, 41], "icefal": [0, 2, 3, 4, 6, 7, 10, 11, 15, 16, 17, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "cd": [0, 1, 2, 4, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "pip": [0, 1, 4, 9, 12, 15, 21], "instal": [0, 1, 4, 5, 7, 8, 10, 11, 15, 18, 25, 27, 29, 30, 35, 39, 40, 41], "pre": [0, 3, 5, 7, 8, 9, 11, 18, 25], "commit": 0, "whenev": 0, "you": [0, 1, 2, 4, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "automat": [0, 7, 25], "hook": 0, "invok": 0, "fail": [0, 9], "If": [0, 2, 4, 7, 12, 13, 14, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "ani": [0, 9, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40], "your": [0, 1, 2, 5, 7, 8, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 
30, 32, 33, 35, 39, 40, 41], "wa": [0, 9, 10, 24, 28], "success": [0, 9, 12, 13], "pleas": [0, 1, 2, 4, 7, 9, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "fix": [0, 4, 9, 12, 13, 14, 24], "issu": [0, 4, 9, 12, 13, 24, 25, 40, 41], "report": [0, 4, 9, 25], "some": [0, 1, 10, 12, 13, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "i": [0, 1, 2, 4, 7, 9, 10, 11, 12, 13, 14, 15, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "e": [0, 2, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "modifi": [0, 11, 18, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "file": [0, 2, 7, 8, 10, 12, 13, 14, 16, 17, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "place": [0, 9, 10, 21, 24, 28], "so": [0, 7, 8, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "statu": 0, "failur": 0, "see": [0, 1, 7, 9, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "which": [0, 2, 7, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 40, 41], "ha": [0, 2, 8, 11, 12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 37, 39, 40, 41], "been": [0, 11, 12, 13, 14, 21], "befor": [0, 1, 10, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "further": 0, "chang": [0, 4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "all": [0, 6, 7, 10, 12, 13, 14, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "again": [0, 12, 13, 35], "should": [0, 2, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "succe": 0, "thi": [0, 2, 3, 4, 5, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "time": [0, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "succeed": 0, "want": [0, 9, 10, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 
35, 37, 39, 40, 41], "can": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "do": [0, 2, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "Or": 0, "without": [0, 5, 7, 19, 24], "your_changed_fil": 0, "py": [0, 2, 4, 9, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "sphinx": 1, "write": [1, 2, 3], "have": [1, 2, 6, 7, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "prepar": [1, 3, 10], "environ": [1, 4, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 32, 33, 35, 40, 41], "doc": [1, 10, 37], "r": [1, 9, 12, 13, 14, 32, 33], "requir": [1, 9, 14, 25, 40, 41], "txt": [1, 9, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "set": [1, 4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "up": [1, 9, 10, 12, 13, 14, 19, 22, 24, 25, 27, 28, 29, 30, 40, 41], "readi": [1, 19, 24, 25], "refer": [1, 2, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 22, 24, 27, 28, 29, 32, 33, 35, 37, 40, 41], "restructuredtext": 1, "primer": 1, "familiar": 1, "build": [1, 9, 10, 12, 13, 14, 19, 21, 24], "local": [1, 9, 27, 29, 30, 39, 40, 41], "preview": 1, "what": [1, 2, 9, 12, 13, 14, 21, 37], "look": [1, 2, 6, 9, 12, 13, 14, 19, 21, 22, 24, 25], "like": [1, 2, 7, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40], "publish": [1, 10, 20], "html": [1, 2, 4, 9, 11, 12, 13, 14, 15, 16, 17, 27, 39, 40, 41], "gener": [1, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "view": [1, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40, 41], "follow": [1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "python3": [1, 4, 9, 13, 14], "m": [1, 9, 12, 13, 14, 21, 27, 29, 30, 32, 33, 39, 40, 41], "server": [1, 7, 9, 39], "It": [1, 2, 5, 9, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "print": 
[1, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "serv": [1, 27, 29, 30, 39, 40, 41], "port": [1, 25, 27, 29, 30, 39, 40, 41], "8000": [1, 35], "open": [1, 8, 10, 12, 13, 14, 20, 21, 24, 25], "browser": [1, 5, 7, 27, 29, 30, 39, 40, 41], "go": [1, 9, 19, 21, 24, 27, 29, 30, 39, 40, 41], "read": [2, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "code": [2, 3, 4, 8, 12, 13, 14, 19, 24, 25, 27, 28, 32, 33, 35, 37, 40, 41], "style": [2, 3, 8], "adjust": 2, "sytl": 2, "design": 2, "python": [2, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 24, 27, 29, 30, 39, 40, 41], "recommend": [2, 9, 19, 21, 22, 24, 25, 27, 40, 41], "test": [2, 8, 10, 11, 18, 19, 21, 22, 24, 25, 28, 29, 32, 33], "valid": [2, 9, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "dataset": [2, 4, 9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "lhots": [2, 8, 10, 12, 13, 14, 19, 21, 24], "readthedoc": [2, 9], "io": [2, 9, 11, 12, 13, 14, 15, 16, 17, 27, 39, 40, 41], "en": [2, 9, 12], "latest": [2, 7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "index": [2, 9, 11, 12, 13, 14, 15, 16, 17, 39, 40, 41], "yesno": [2, 4, 8, 9, 23, 35, 42], "veri": [2, 3, 12, 13, 14, 21, 32, 33, 35, 40, 41], "good": 2, "exampl": [2, 7, 8, 10, 12, 13, 14, 16, 17, 18, 25, 28, 32, 33, 35], "speech": [2, 7, 8, 9, 11, 20, 21, 35, 42], "pull": [2, 12, 13, 14, 15, 19, 21, 24, 37], "380": [2, 12, 33], "show": [2, 7, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "add": [2, 12, 13, 14, 19, 21, 22, 40, 42], "new": [2, 3, 7, 9, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 35, 39, 40, 41], "suppos": [2, 40, 41], "would": [2, 9, 10, 12, 13, 14, 24, 28, 40, 41], "name": [2, 4, 10, 12, 13, 14, 15, 19, 21, 27, 29, 30, 40, 41], "foo": [2, 17, 19, 24, 27, 29, 30, 39, 40, 41], "eg": [2, 4, 6, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "mkdir": [2, 12, 13, 19, 21, 22, 24, 28, 32, 
33, 35], "p": [2, 9, 12, 13, 21, 32, 33], "asr": [2, 4, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "touch": 2, "sh": [2, 9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "chmod": 2, "x": [2, 14, 37], "simpl": [2, 21], "own": [2, 25, 27, 40, 41], "otherwis": [2, 12, 13, 14, 19, 21, 24, 25, 27, 29, 30, 39, 40, 41], "librispeech": [2, 4, 6, 8, 10, 12, 13, 14, 15, 16, 17, 23, 24, 25, 27, 28, 29, 30, 36, 37, 39, 40, 41, 42], "assum": [2, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 39, 40, 41], "fanci": 2, "call": [2, 4, 15, 25], "bar": [2, 17, 19, 24, 27, 29, 30, 39, 40, 41], "organ": 2, "wai": [2, 3, 18, 27, 29, 30, 37, 39, 40, 41], "readm": [2, 19, 21, 22, 24, 28, 32, 33, 35], "md": [2, 6, 10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "asr_datamodul": [2, 4, 9], "pretrain": [2, 10, 12, 13, 14, 15, 17, 19, 21, 22, 24, 28, 32, 33, 35], "For": [2, 4, 6, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "instanc": [2, 4, 6, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "tdnn": [2, 4, 9, 20, 23, 26, 31, 34], "its": [2, 10, 11, 12, 13, 14, 17, 21, 29], "directori": [2, 8, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "structur": [2, 14], "descript": [2, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "contain": [2, 8, 10, 11, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41, 42], "inform": [2, 10, 19, 21, 22, 24, 27, 28, 29, 32, 33, 35, 37, 39, 40, 41], "g": [2, 9, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "wer": [2, 9, 10, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "etc": [2, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "provid": [2, 7, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41, 42], "pytorch": [2, 4, 8, 12, 13, 14, 21], "dataload": [2, 9], "take": [2, 10, 25, 27, 
35, 40, 41], "input": [2, 10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35, 37], "checkpoint": [2, 9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "save": [2, 9, 10, 13, 14, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "dure": [2, 4, 7, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "stage": [2, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "": [2, 9, 10, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "definit": [2, 12, 13], "neural": [2, 19, 24], "network": [2, 19, 21, 24, 27, 29, 30, 39, 40, 41], "script": [2, 8, 9, 17, 18, 19, 21, 22, 24, 25, 28, 32, 33, 35, 39], "infer": [2, 10, 12, 13], "tdnn_lstm_ctc": [2, 22, 28, 33], "conformer_ctc": [2, 19, 24], "get": [2, 7, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 35, 37, 39, 40, 41], "feel": [2, 25, 39], "result": [2, 6, 7, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "everi": [2, 10, 27, 29, 30, 39, 40, 41], "kept": [2, 27, 40, 41], "self": [2, 11, 14, 37], "toler": 2, "duplic": 2, "among": [2, 9], "differ": [2, 9, 12, 13, 14, 15, 19, 20, 24, 25, 27, 37, 39, 40, 41], "invoc": [2, 12, 13], "help": [2, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "blob": [2, 6, 10, 17, 27, 29, 30, 39, 40, 41], "master": [2, 6, 9, 10, 13, 14, 16, 17, 21, 25, 27, 29, 30, 39, 40, 41], "transform": [2, 19, 24, 39], "conform": [2, 16, 20, 21, 23, 26, 27, 29, 39, 40, 41], "base": [2, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "lstm": [2, 11, 17, 18, 20, 23, 26, 31, 36, 38], "attent": [2, 14, 21, 22, 25, 37, 40, 41], "lm": [2, 9, 21, 27, 28, 32, 33, 35, 40, 41], "rescor": [2, 22, 28, 30, 32, 33, 35], "demonstr": [2, 5, 7, 10, 15], "consid": [2, 14], "colab": 2, "notebook": 2, "welcom": 3, "There": [3, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "mani": [3, 40, 41], "two": [3, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], 
"them": [3, 5, 6, 7, 9, 12, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "To": [3, 7, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "document": [3, 8, 10, 11, 12, 13, 14, 15, 30], "repositori": [3, 12, 13, 14, 15], "recip": [3, 6, 8, 9, 10, 15, 19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 37, 39, 40, 41], "In": [3, 4, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 25, 28, 32, 33, 35, 37], "page": [3, 7, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "describ": [3, 5, 10, 12, 13, 15, 16, 17, 18, 19, 21, 22, 24, 27, 28, 32, 33, 40, 41], "how": [3, 5, 7, 8, 9, 12, 13, 14, 15, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "creat": [3, 8, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40], "data": [3, 10, 12, 13, 14, 15, 16, 17, 20], "train": [3, 4, 5, 7, 8, 10, 11, 16, 17, 18, 37], "decod": [3, 4, 7, 12, 13, 14, 17, 18], "model": [3, 5, 7, 8, 9, 11, 25, 37], "section": [4, 5, 9, 10, 15, 16, 17, 18, 19, 24], "collect": [4, 9], "user": [4, 9], "post": 4, "correspond": [4, 6, 7], "solut": 4, "One": 4, "torch": [4, 8, 9, 10, 11, 18, 19, 21, 24], "torchaudio": [4, 8, 37], "cu111": 4, "torchvis": 4, "11": [4, 9, 12, 13, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "f": [4, 9, 32, 33], "download": [4, 7, 8, 11, 18, 20, 25], "org": [4, 9, 20, 21, 27, 39, 40, 41], "whl": [4, 9], "torch_stabl": [4, 9], "throw": [4, 12, 13, 14], "error": [4, 9, 12, 13, 14, 24], "when": [4, 7, 12, 13, 14, 18, 21, 24, 25, 27, 29, 30, 40, 41], "specifi": [4, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "cuda": [4, 8, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 39, 40, 41], "while": [4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "That": [4, 12, 13, 25, 27, 39, 40, 41], "cu11": 4, "therefor": 4, "correct": 4, "log": [4, 9, 12, 13, 14, 28, 32, 33, 35], "traceback": 4, "most": [4, 40, 41], "recent": [4, 12, 13, 14], "last": 4, 
"line": [4, 9, 12, 13, 14, 27, 40, 41], "14": [4, 9, 10, 12, 13, 16, 19, 24, 27, 28, 29, 32, 39, 40, 41], "from": [4, 5, 7, 9, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "yesnoasrdatamodul": 4, "home": [4, 12, 13, 19, 24], "xxx": [4, 10, 12, 13, 14], "next": [4, 7, 9, 12, 13, 14, 24, 25, 27, 28, 29, 30, 39, 40, 41], "gen": [4, 7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "kaldi": [4, 7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "34": [4, 9, 12, 13], "datamodul": 4, "__init__": [4, 9, 10, 12, 13, 14, 19, 21, 24], "23": [4, 9, 12, 13, 14, 19, 21, 22, 24, 32, 33, 35], "util": [4, 9, 24], "add_eo": 4, "add_so": 4, "get_text": 4, "39": [4, 9, 12, 14, 21, 24, 28, 32], "tensorboard": [4, 9, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "summarywrit": 4, "miniconda3": 4, "env": 4, "yyi": 4, "lib": [4, 9, 14], "8": [4, 9, 10, 12, 13, 14, 19, 21, 24, 25, 27, 28, 29, 30, 35, 39, 40, 41], "site": [4, 9, 14], "packag": [4, 9, 14], "loosevers": 4, "uninstal": 4, "setuptool": [4, 9], "58": [4, 9, 24], "conda": [4, 9], "encount": [4, 9, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "dev": [4, 9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "yangyifan": 4, "anaconda3": 4, "dev20230112": 4, "cuda11": [4, 9], "6": [4, 9, 11, 18, 19, 21, 24, 27, 28, 32, 33, 39], "torch1": [4, 9], "13": [4, 9, 10, 12, 13, 14, 21, 22, 24, 28, 29, 32], "py3": [4, 9], "linux": [4, 7, 9, 11, 12, 13, 14, 15], "x86_64": [4, 9, 12], "egg": [4, 9], "24": [4, 9, 12, 13, 22, 28, 32, 33, 35], "_k2": [4, 9], "determinizeweightpushingtyp": 4, "handl": [4, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "except": [4, 10], "anoth": 4, "occur": 4, "pruned_transducer_stateless7_ctc_b": [4, 29], "104": 4, "30": [4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "rais": 4, "note": [4, 10, 12, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "re": [4, 19, 22, 24, 25, 27, 29, 30, 37, 39, 40, 41], 
"anaconda": 4, "maco": [4, 7, 11, 12, 13, 14, 15], "probabl": [4, 9, 21, 27, 29, 39, 40, 41], "variabl": [4, 9, 12, 13, 14, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "export": [4, 8, 9, 19, 21, 22, 24, 25, 28, 32, 33, 35], "dyld_library_path": 4, "conda_prefix": 4, "first": [4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "try": [4, 5, 7, 25, 27, 29, 30, 39, 40, 41], "find": [4, 5, 6, 7, 9, 10, 12, 13, 14, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "where": [4, 40], "locat": [4, 12], "libpython": 4, "abl": 4, "insid": [4, 17], "codna_prefix": 4, "ld_library_path": 4, "also": [5, 6, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40, 41], "within": [5, 7, 12, 13], "anyth": [5, 7], "space": [5, 8], "youtub": [5, 8, 24, 25, 27, 28, 29, 30, 39, 40, 41], "video": [5, 8, 24, 25, 27, 28, 29, 30, 39, 40, 41], "upload": [6, 7, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "huggingfac": [6, 8, 10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 29, 30, 32, 33, 35, 39], "co": [6, 7, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 28, 29, 30, 32, 33, 35, 39], "visit": [6, 7, 27, 29, 30, 39, 40, 41], "link": [6, 9, 10, 11, 27, 29, 30, 39, 40, 41], "search": [6, 7], "specif": [6, 15, 21], "aishel": [6, 8, 19, 21, 22, 23, 42], "gigaspeech": [6, 16, 39], "wenetspeech": [6, 16], "integr": 7, "framework": [7, 27, 40], "sherpa": [7, 11, 16, 17, 18, 39], "need": [7, 9, 10, 11, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "window": [7, 11, 12, 13, 14, 15], "even": [7, 9, 13], "ipad": 7, "phone": 7, "start": [7, 9, 10, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "address": [7, 9, 10, 12, 13, 14, 21, 27, 30, 39, 40, 41], "recognit": [7, 8, 11, 12, 13, 20, 21, 35, 42], "screenshot": [7, 19, 21, 22, 24, 25, 27, 35, 39, 40], "select": [7, 12, 13, 14, 27, 28, 32, 33, 35, 39, 40, 41], "languag": [7, 19, 21, 22], "current": [7, 12, 13, 21, 25, 37, 39, 40, 41, 42], "chines": [7, 
20, 21], "english": [7, 35, 39], "target": 7, "method": [7, 9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 39, 40, 41], "greedi": 7, "modified_beam_search": [7, 21, 25, 27, 29, 39, 40, 41], "choos": [7, 9, 25, 27, 29, 30, 39, 40, 41], "number": [7, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "activ": 7, "path": [7, 9, 10, 12, 13, 14, 17, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "either": [7, 19, 21, 22, 24, 40, 41], "record": [7, 13, 14, 19, 20, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "click": [7, 9, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "button": 7, "submit": 7, "wait": 7, "moment": 7, "an": [7, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20, 21, 24, 25, 27, 30, 35, 39, 40, 41], "bottom": [7, 27, 29, 30, 39, 40, 41], "part": [7, 9, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "tabl": [7, 12, 13, 14], "one": [7, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "subscrib": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "channel": [7, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "nadira": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "povei": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "www": [7, 9, 20, 24, 25, 27, 28, 29, 30, 39, 40, 41], "uc_vaumpkminz1pnkfxan9mw": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "toolkit": 8, "cudnn": 8, "2": [8, 10, 11, 18, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "frequent": 8, "ask": 8, "question": 8, "faq": 8, "oserror": 8, "libtorch_hip": 8, "cannot": [8, 12, 13, 14], "share": [8, 9], "object": [8, 9, 19, 21, 22, 27, 35, 39, 40], "attributeerror": 8, "modul": [8, 9, 12, 14, 29, 40], "distutil": 8, "attribut": [8, 14, 24], "importerror": 8, "libpython3": 8, "No": [8, 12, 13, 14, 35], "state_dict": [8, 18, 19, 21, 22, 24, 28, 32, 33, 35], "jit": [8, 11, 18, 24], "trace": [8, 11, 16, 18], "onnx": [8, 10, 18], "ncnn": [8, 18], "non": [8, 24, 37, 40, 42], "stream": [8, 11, 12, 13, 15, 18, 19, 24, 32, 33, 39, 42], "timit": [8, 23, 
32, 33, 42], "introduct": [8, 36, 42], "contribut": 8, "depend": [9, 19, 24], "step": [9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "99": [9, 12, 13, 14, 15], "who": 9, "about": [9, 12, 13, 14, 21, 25, 27, 30, 39, 40, 41], "suggest": [9, 27, 29, 30, 39, 40, 41], "virut": 9, "venv": 9, "my_env": 9, "sourc": [9, 10, 12, 13, 14, 19, 20, 21, 24], "bin": [9, 12, 13, 14, 19, 24], "order": [9, 12, 13, 14, 19, 22, 24, 28, 32, 33], "matter": [9, 12], "compil": [9, 12, 13, 19, 21, 24], "wheel": [9, 12], "same": [9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "don": [9, 12, 13, 14, 16, 19, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "t": [9, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "from_sourc": 9, "for_develop": 9, "alwai": [9, 10], "strongli": 9, "pythonpath": [9, 12, 13, 14], "point": [9, 10, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "folder": [9, 10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "tmp": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "setup": [9, 12, 19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 40, 41], "put": [9, 12, 13, 29, 40], "sever": [9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "switch": [9, 19, 24, 30], "just": [9, 12, 13, 14, 37], "virtualenv": 9, "cpython3": 9, "final": [9, 10, 12, 13, 24, 28], "64": [9, 10, 12, 21, 40], "1540m": 9, "creator": 9, "cpython3posix": 9, "dest": 9, "ceph": [9, 10, 19, 21, 24], "fj": [9, 10, 12, 13, 14, 21, 24], "fangjun": [9, 10, 12, 13, 14, 21, 24], "clear": 9, "fals": [9, 10, 12, 13, 14, 19, 21, 24, 25], "no_vcs_ignor": 9, "global": 9, "seeder": 9, "fromappdata": 9, "bundl": 9, "via": [9, 11, 16, 17, 18], "copi": [9, 37], "app_data_dir": 9, "root": [9, 12, 13, 14], "v": [9, 12, 13, 14, 24, 32, 33], "irtualenv": 9, "ad": [9, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40, 41], "seed": 9, "21": [9, 10, 12, 19, 21, 24, 32, 33], "57": [9, 13, 24, 28], "36": [9, 
12, 21, 24, 25], "bashactiv": 9, "cshellactiv": 9, "fishactiv": 9, "powershellactiv": 9, "pythonactiv": 9, "xonshactiv": 9, "dev20210822": 9, "cpu": [9, 10, 12, 13, 14, 16, 19, 27, 29, 30, 35, 40, 41], "9": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 35, 39, 40, 41], "nightli": 9, "2bcpu": 9, "cp38": 9, "linux_x86_64": 9, "mb": [9, 12, 13, 14], "________________________________": 9, "185": [9, 19, 24, 35], "kb": [9, 12, 13, 14, 32, 33], "graphviz": 9, "17": [9, 10, 12, 13, 14, 19, 24, 32, 33, 39], "none": [9, 19, 24], "18": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 32, 33, 39, 40, 41], "cach": [9, 14], "manylinux1_x86_64": 9, "831": [9, 21, 33], "type": [9, 10, 12, 13, 14, 19, 21, 24, 27, 29, 30, 35, 37, 39, 40, 41], "extens": 9, "typing_extens": 9, "26": [9, 12, 13, 14, 21, 24, 33], "successfulli": [9, 12, 13, 14], "req": 9, "7b1b76ge": 9, "q": 9, "audioread": 9, "soundfil": 9, "post1": 9, "py2": 9, "7": [9, 10, 11, 14, 18, 19, 22, 24, 27, 28, 32, 33, 39, 40], "97": [9, 12, 19], "cytoolz": 9, "manylinux_2_17_x86_64": 9, "manylinux2014_x86_64": 9, "dataclass": 9, "h5py": 9, "manylinux_2_12_x86_64": 9, "manylinux2010_x86_64": 9, "684": [9, 19, 35], "intervaltre": 9, "lilcom": 9, "numpi": 9, "15": [9, 10, 12, 13, 14, 21, 22, 24, 32, 35], "40": [9, 12, 13, 14, 22, 24, 28, 32, 33], "pyyaml": 9, "662": 9, "tqdm": 9, "62": [9, 24, 28], "76": [9, 35], "73": 9, "alreadi": [9, 10], "satisfi": 9, "2a1410b": 9, "clean": [9, 14, 19, 21, 24, 25, 27, 28, 29, 30, 39, 40, 41], "toolz": 9, "55": [9, 12, 22, 24, 32], "sortedcontain": 9, "29": [9, 14, 15, 19, 21, 22, 24, 28, 29, 32, 33], "cffi": 9, "411": [9, 14, 24], "pycpars": 9, "20": [9, 10, 12, 14, 19, 21, 22, 24, 27, 28, 32, 33, 35, 40], "112": [9, 12, 13, 14], "pypars": 9, "67": 9, "done": [9, 10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "filenam": [9, 12, 13, 14, 15, 16, 17, 29, 30, 39, 41], "dev_2a1410b_clean": 9, "size": [9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], 
"342242": 9, "sha256": 9, "f683444afa4dc0881133206b4646a": 9, "9d0f774224cc84000f55d0a67f6e4a37997": 9, "store": [9, 24], "ephem": 9, "ftu0qysz": 9, "7f": 9, "7a": 9, "8e": 9, "a0bf241336e2e3cb573e1e21e5600952d49f5162454f2e612f": 9, "warn": 9, "built": 9, "invalid": [9, 24], "metadata": [9, 32, 33], "mandat": 9, "pep": 9, "440": 9, "packa": 9, "ging": 9, "deprec": [9, 21], "legaci": 9, "becaus": 9, "could": [9, 12, 13, 14, 19, 22], "A": [9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 39, 40, 41], "replac": [9, 12, 13], "discuss": 9, "regard": 9, "pypa": 9, "sue": 9, "8368": 9, "inter": 9, "valtre": 9, "sor": 9, "tedcontain": 9, "remot": 9, "enumer": 9, "500": [9, 10, 12, 13, 14, 21, 24, 30, 39], "count": 9, "100": [9, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "compress": 9, "308": [9, 19, 21, 22], "total": [9, 13, 14, 19, 21, 22, 24, 25, 27, 28, 35, 39, 40], "delta": 9, "263": [9, 13], "reus": 9, "307": 9, "102": [9, 14, 19], "pack": [9, 40, 41], "receiv": 9, "172": 9, "49": [9, 12, 13, 24, 33, 35], "kib": 9, "385": 9, "00": [9, 12, 19, 21, 22, 24, 28, 32, 33, 35], "resolv": 9, "kaldilm": 9, "tar": 9, "gz": 9, "48": [9, 12, 13, 19, 21], "574": 9, "kaldialign": 9, "sentencepiec": [9, 24], "96": 9, "41": [9, 12, 14, 19, 21, 32, 35], "absl": 9, "absl_pi": 9, "132": 9, "googl": [9, 27, 29, 30, 39, 40, 41], "auth": 9, "oauthlib": 9, "google_auth_oauthlib": 9, "grpcio": 9, "ment": 9, "12": [9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 27, 29, 30, 32, 35, 39, 40, 41], "requi": 9, "rement": 9, "protobuf": 9, "manylinux_2_5_x86_64": 9, "werkzeug": 9, "288": 9, "tensorboard_data_serv": 9, "google_auth": 9, "35": [9, 10, 12, 13, 14, 21, 24, 39], "152": 9, "request": [9, 37], "plugin": 9, "wit": 9, "tensorboard_plugin_wit": 9, "781": 9, "markdown": 9, "six": 9, "16": [9, 10, 12, 13, 14, 17, 19, 21, 22, 24, 27, 28, 32, 33, 35, 39, 40, 41], "cachetool": 9, "rsa": 9, "pyasn1": 9, "pyasn1_modul": 9, "155": 9, "requests_oauthlib": 9, "77": [9, 24], "urllib3": 9, "27": [9, 12, 13, 
14, 19, 21, 28, 33], "138": [9, 19, 21], "certifi": 9, "2017": 9, "2021": [9, 19, 22, 24, 28, 32, 33, 35], "145": 9, "charset": 9, "normal": [9, 28, 32, 33, 35, 40], "charset_norm": 9, "idna": 9, "59": [9, 12, 22, 24], "146": 9, "897233": 9, "eccb906cafcd45bf9a7e1a1718e4534254bfb": 9, "f4c0d0cbc66eee6c88d68a63862": 9, "85": 9, "7d": 9, "63": [9, 21], "f2dd586369b8797cb36d213bf3a84a789eeb92db93d2e723c9": 9, "etool": 9, "oaut": 9, "hlib": 9, "let": [9, 12, 13, 14, 19, 24], "u": [9, 12, 13, 14, 19, 21, 22, 24, 25, 35], "2023": [9, 12, 13, 14, 29], "05": [9, 10, 12, 13, 19, 21, 22, 24, 33], "main": [9, 19, 24, 37], "dl_dir": [9, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "waves_yesno": 9, "_______________________________________________________________": 9, "70m": 9, "06": [9, 10, 12, 22, 24, 28, 35], "54": [9, 13, 14, 24, 28, 32, 33], "4kb": 9, "02": [9, 10, 12, 13, 14, 21, 24, 27, 33, 39, 40], "19": [9, 10, 12, 13, 14, 19, 24, 28, 32, 33], "manifest": [9, 25], "45": [9, 12, 14, 19, 21, 24], "comput": [9, 10, 12, 13, 14, 19, 21, 22, 25, 27, 28, 30, 32, 33, 35, 39, 40, 41], "fbank": [9, 10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35], "199": [9, 24, 28], "info": [9, 10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35], "compute_fbank_yesno": 9, "65": [9, 12], "process": [9, 10, 12, 13, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "extract": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "featur": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "90": [9, 12], "212": 9, "60it": 9, "640": [9, 14], "304": [9, 13], "53it": 9, "51": [9, 12, 19, 24, 35], "lang": [9, 10, 21, 24, 30], "66": [9, 13], "project": 9, "csrc": [9, 24], "arpa_file_pars": 9, "cc": 9, "void": 9, "arpafilepars": 9, "std": 9, "istream": 9, "79": 9, "140": [9, 22], "gram": [9, 19, 21, 22, 27, 28, 30, 32, 33, 40, 41], "92": [9, 24], "hlg": [9, 28, 32, 33, 35], "28": [9, 12, 13, 21, 24, 28], "581": [9, 12, 28], "compile_hlg": 9, "124": [9, 19, 24], "lang_phon": [9, 22, 28, 32, 33, 
35], "582": 9, "lexicon": [9, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "171": [9, 22, 24, 32, 33], "convert": [9, 12, 13, 14, 24], "l": [9, 12, 13, 14, 21, 32, 33, 35], "pt": [9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "linv": [9, 21, 24, 35], "609": 9, "ctc_topo": 9, "max_token_id": 9, "610": 9, "52": [9, 19, 24], "load": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "fst": [9, 21, 35], "611": 9, "intersect": [9, 27, 40, 41], "613": 9, "lg": [9, 27, 30, 40, 41], "shape": [9, 14], "connect": [9, 10, 24, 27, 28, 39, 40, 41], "614": 9, "68": [9, 24], "70": 9, "class": [9, 24], "tensor": [9, 13, 14, 19, 21, 22, 24, 27, 35, 39, 40], "71": [9, 24, 28], "determin": 9, "615": 9, "74": [9, 10], "rag": 9, "raggedtensor": 9, "remov": [9, 19, 21, 22, 24, 28, 32, 33], "disambigu": 9, "symbol": [9, 21, 27, 40, 41], "616": 9, "91": 9, "remove_epsilon": 9, "617": 9, "arc": 9, "compos": 9, "h": 9, "619": 9, "106": [9, 13, 24], "109": [9, 19, 24], "111": [9, 24], "127": [9, 12, 13, 35], "now": [9, 12, 13, 14, 19, 24, 25, 27, 28, 29, 30, 32, 33, 39, 40, 41], "cuda_visible_devic": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "gpu": [9, 12, 13, 19, 21, 22, 24, 25, 27, 29, 30, 32, 33, 35, 39, 40, 41], "avail": [9, 10, 12, 13, 14, 19, 21, 24, 28, 32, 33, 35, 39], "case": [9, 10, 12, 13, 14, 27, 29, 30, 39, 40, 41], "segment": 9, "fault": 9, "core": 9, "dump": 9, "protocol_buffers_python_implement": 9, "more": [9, 12, 13, 14, 19, 24, 25, 35, 37, 39, 40], "674": 9, "interest": [9, 25, 27, 29, 30, 39, 40, 41], "given": [9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 40, 41], "below": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40], "04": [9, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33], "759": [9, 21], "481": 9, "482": 9, "exp_dir": [9, 12, 13, 14, 21, 24, 25, 27, 29, 30, 40, 41], "posixpath": [9, 12, 13, 14, 21, 24], "exp": [9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 
25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "lang_dir": [9, 21, 24], "lr": [9, 21, 39], "01": [9, 12, 21, 22, 24, 25, 29], "feature_dim": [9, 10, 12, 13, 14, 19, 21, 24, 35], "weight_decai": 9, "1e": 9, "start_epoch": 9, "best_train_loss": [9, 10, 12, 13, 14], "inf": [9, 10, 12, 13, 14], "best_valid_loss": [9, 10, 12, 13, 14], "best_train_epoch": [9, 10, 12, 13, 14], "best_valid_epoch": [9, 10, 13, 14], "batch_idx_train": [9, 10, 12, 13, 14], "log_interv": [9, 10, 12, 13, 14], "reset_interv": [9, 10, 12, 13, 14], "valid_interv": [9, 10, 12, 13, 14], "beam_siz": [9, 10, 21], "reduct": [9, 12, 13, 29], "sum": 9, "use_double_scor": [9, 19, 24, 35], "true": [9, 10, 12, 13, 14, 19, 21, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "world_siz": [9, 25], "master_port": 9, "12354": 9, "num_epoch": 9, "42": [9, 13, 19, 24, 35], "feature_dir": [9, 24], "max_dur": [9, 24], "bucketing_sampl": [9, 24], "num_bucket": [9, 24], "concatenate_cut": [9, 24], "duration_factor": [9, 24], "gap": [9, 24], "on_the_fly_feat": [9, 24], "shuffl": [9, 24], "return_cut": [9, 24], "num_work": [9, 24], "env_info": [9, 10, 12, 13, 14, 19, 21, 24], "releas": [9, 10, 12, 13, 14, 19, 21, 24], "sha1": [9, 10, 12, 13, 14, 19, 21, 24], "3b7f09fa35e72589914f67089c0da9f196a92ca4": 9, "date": [9, 10, 12, 13, 14, 19, 21, 24], "mon": [9, 13, 14], "mai": [9, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41, 42], "6fcfced": 9, "cu118": 9, "branch": [9, 10, 12, 13, 14, 19, 21, 24, 29], "30bde4b": 9, "thu": [9, 10, 12, 13, 14, 21, 24, 28], "37": [9, 13, 19, 21, 24, 32], "47": [9, 12, 13, 14, 19, 24], "dev20230512": 9, "torch2": 9, "hostnam": [9, 10, 12, 13, 14, 21], "host": [9, 10], "ip": [9, 10, 12, 13, 14, 21], "761": 9, "168": [9, 28], "764": 9, "495": 9, "devic": [9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 40, 41], "791": [9, 28], "cut": [9, 24], "244": 9, "852": 9, "149": [9, 12, 24], "singlecutsampl": 9, "205": [9, 24], "853": 9, "218": [9, 13], "252": 9, "986": 9, "422": 9, 
"epoch": [9, 10, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "batch": [9, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "loss": [9, 12, 13, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "065": 9, "over": [9, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "2436": 9, "frame": [9, 21, 27, 29, 40, 41], "tot_loss": 9, "352": [9, 24], "4561": 9, "2828": 9, "7076": 9, "22192": 9, "691": 9, "444": 9, "9002": 9, "18067": 9, "996": 9, "2555": 9, "2695": 9, "484": 9, "34971": 9, "217": [9, 19, 24], "4688": 9, "251": [9, 32, 33], "75": [9, 12], "389": [9, 22, 24], "2532": 9, "637": 9, "1139": 9, "1592": 9, "859": 9, "1629": 9, "094": 9, "0767": 9, "118": [9, 24], "350": 9, "06778": 9, "395": 9, "789": 9, "01056": 9, "016": 9, "009022": 9, "009985": 9, "271": [9, 10, 13], "01088": 9, "497": 9, "01174": 9, "01077": 9, "747": 9, "01087": 9, "783": 9, "921": 9, "01045": 9, "008957": 9, "009903": 9, "374": 9, "01092": 9, "598": [9, 24], "01169": 9, "01065": 9, "824": 9, "862": [9, 13], "865": [9, 13], "555": 9, "08": [9, 14, 24, 28, 30, 32, 33, 35, 39], "483": 9, "264": [9, 14], "lm_dir": [9, 24], "search_beam": [9, 19, 24, 35], "output_beam": [9, 19, 24, 35], "min_active_st": [9, 19, 24, 35], "max_active_st": [9, 19, 24, 35], "10000": [9, 19, 24, 35], "avg": [9, 10, 12, 13, 14, 15, 16, 17, 21, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "487": 9, "273": [9, 10, 21], "513": 9, "291": 9, "averag": [9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "521": 9, "675": 9, "204": [9, 14, 24], "until": [9, 24, 29], "923": 9, "241": [9, 19], "transcript": [9, 19, 20, 21, 22, 24, 27, 28, 32, 33, 39, 40, 41], "recog": [9, 21, 24], "test_set": [9, 35], "924": 9, "558": 9, "240": [9, 19, 35], "ins": [9, 24, 35], "del": [9, 24, 35], "sub": [9, 24, 35], "925": 9, "249": [9, 13], "wrote": [9, 24], "detail": [9, 11, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "stat": [9, 24], "err": [9, 21, 
24], "316": [9, 24], "congratul": [9, 12, 13, 14, 19, 22, 24, 28, 32, 33, 35], "fun": [9, 12, 13], "debug": 9, "variou": [9, 15, 18, 42], "problem": [9, 25], "period": [10, 12], "disk": 10, "optim": [10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "other": [10, 13, 14, 15, 21, 24, 25, 27, 28, 32, 33, 35, 37, 40, 41, 42], "relat": [10, 19, 21, 24, 28, 32, 33, 35], "resum": [10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "howev": [10, 13, 25], "onli": [10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "strip": 10, "reduc": [10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "each": [10, 12, 13, 15, 19, 21, 22, 24, 27, 29, 30, 37, 39, 40, 41], "well": [10, 35, 42], "usag": [10, 12, 13, 14, 16, 17, 28, 32, 33, 35], "pruned_transducer_stateless3": [10, 16, 37], "almost": [10, 27, 37, 40, 41], "dir": [10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "bpe": [10, 12, 13, 14, 15, 16, 17, 24, 27, 29, 30, 39, 40, 41], "lang_bpe_500": [10, 12, 13, 14, 15, 16, 17, 24, 27, 29, 30, 39, 40, 41], "dict": [10, 14], "csukuangfj": [10, 12, 13, 15, 19, 21, 22, 24, 28, 32, 33, 35, 39], "prune": [10, 14, 15, 21, 23, 25, 26, 36, 37, 38, 39, 41], "transduc": [10, 11, 15, 18, 20, 23, 25, 26, 36, 37, 38], "stateless3": [10, 12], "2022": [10, 12, 13, 14, 15, 21, 27, 29, 30, 39, 40], "lf": [10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 30, 32, 33, 35], "repo": [10, 15], "prefix": 10, "those": 10, "wave": [10, 12, 13, 14, 19, 24], "iter": [10, 12, 13, 14, 17, 27, 29, 30, 39, 40, 41], "1224000": 10, "greedy_search": [10, 21, 27, 29, 39, 40, 41], "test_wav": [10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "1089": [10, 12, 13, 14, 15, 24, 28], "134686": [10, 12, 13, 14, 15, 24, 28], "0001": [10, 12, 13, 14, 15, 24, 28], "wav": [10, 12, 13, 14, 15, 17, 19, 21, 22, 24, 27, 29, 30, 32, 33, 35, 39, 40, 41], "1221": [10, 12, 13, 24, 28], "135766": [10, 12, 13, 24, 28], "0002": [10, 12, 13, 
24, 28], "multipl": [10, 19, 21, 22, 24, 28, 32, 33, 35], "sound": [10, 12, 13, 14, 17, 18, 19, 21, 22, 24, 28, 32, 33, 35], "Its": [10, 12, 13, 14, 24], "output": [10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "09": [10, 13, 19, 21, 22, 24, 39], "233": [10, 12, 13], "265": 10, "50": [10, 12, 13, 14, 24, 27, 32, 39, 40, 41], "200": [10, 12, 13, 14, 19, 24, 25, 32, 33, 35], "3000": [10, 12, 13, 14], "80": [10, 12, 13, 14, 19, 21, 24], "subsampling_factor": [10, 13, 14, 19, 21, 24], "encoder_dim": [10, 12, 13, 14], "512": [10, 12, 13, 14, 19, 21, 24], "nhead": [10, 12, 14, 19, 21, 24, 27, 40], "dim_feedforward": [10, 12, 13, 21], "2048": [10, 12, 13, 14, 21], "num_encoder_lay": [10, 12, 13, 14, 21], "decoder_dim": [10, 12, 13, 14], "joiner_dim": [10, 12, 13, 14], "model_warm_step": [10, 12, 13], "4810e00d8738f1a21278b0156a42ff396a2d40ac": 10, "fri": 10, "oct": [10, 24], "03": [10, 13, 21, 24, 32, 33, 39], "miss": [10, 12, 13, 14, 21, 24], "cu102": [10, 12, 13, 14], "1013": 10, "c39cba5": 10, "dirti": [10, 12, 13, 19, 24], "jsonl": 10, "de": [10, 12, 13, 14, 21], "74279": [10, 12, 13, 14, 21], "0324160024": 10, "65bfd8b584": 10, "jjlbn": 10, "177": [10, 13, 14, 21, 22, 24], "203": [10, 24], "bpe_model": [10, 12, 13, 14, 24], "sound_fil": [10, 19, 21, 24, 35], "sample_r": [10, 19, 21, 24, 35], "16000": [10, 19, 21, 22, 24, 28, 29, 32, 33], "beam": [10, 39], "max_context": 10, "max_stat": 10, "context_s": [10, 12, 13, 14, 21], "max_sym_per_fram": [10, 21], "simulate_stream": 10, "decode_chunk_s": 10, "left_context": 10, "dynamic_chunk_train": 10, "causal_convolut": 10, "short_chunk_s": [10, 14, 40, 41], "25": [10, 12, 13, 19, 24, 27, 32, 33, 35, 40], "num_left_chunk": [10, 14], "blank_id": [10, 12, 13, 14, 21], "unk_id": 10, "vocab_s": [10, 12, 13, 14, 21], "612": 10, "458": 10, "disabl": [10, 12, 13], "giga": [10, 13, 39], "623": 10, "277": 10, "paramet": [10, 12, 13, 14, 16, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 39, 40, 41], 
"78648040": 10, "951": [10, 24], "285": [10, 21, 24], "construct": [10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35], "952": 10, "295": [10, 19, 21, 22, 24], "957": 10, "301": [10, 24], "700": 10, "329": [10, 13, 24], "912": 10, "388": 10, "earli": [10, 12, 13, 14, 24, 28], "nightfal": [10, 12, 13, 14, 24, 28], "THE": [10, 12, 13, 14, 24, 28], "yellow": [10, 12, 13, 14, 24, 28], "lamp": [10, 12, 13, 14, 24, 28], "light": [10, 12, 13, 14, 24, 28], "here": [10, 12, 13, 14, 19, 21, 22, 24, 25, 28, 37, 40], "AND": [10, 12, 13, 14, 24, 28], "THERE": [10, 12, 13, 14, 24, 28], "squalid": [10, 12, 13, 14, 24, 28], "quarter": [10, 12, 13, 14, 24, 28], "OF": [10, 12, 13, 14, 24, 28], "brothel": [10, 12, 13, 14, 24, 28], "god": [10, 24, 28], "AS": [10, 24, 28], "direct": [10, 24, 28], "consequ": [10, 24, 28], "sin": [10, 24, 28], "man": [10, 24, 28], "punish": [10, 24, 28], "had": [10, 24, 28], "her": [10, 24, 28], "love": [10, 24, 28], "child": [10, 24, 28], "whose": [10, 21, 24, 28], "ON": [10, 12, 24, 28], "THAT": [10, 24, 28], "dishonor": [10, 24, 28], "bosom": [10, 24, 28], "TO": [10, 24, 28], "parent": [10, 24, 28], "forev": [10, 24, 28], "WITH": [10, 24, 28], "race": [10, 24, 28], "descent": [10, 24, 28], "mortal": [10, 24, 28], "BE": [10, 24, 28], "bless": [10, 24, 28], "soul": [10, 24, 28], "IN": [10, 24, 28], "heaven": [10, 24, 28], "yet": [10, 12, 13, 24, 28], "THESE": [10, 24, 28], "thought": [10, 24, 28], "affect": [10, 24, 28], "hester": [10, 24, 28], "prynn": [10, 24, 28], "less": [10, 24, 28, 35, 40, 41], "hope": [10, 20, 24, 28], "than": [10, 13, 19, 21, 22, 24, 27, 28, 29, 30, 35, 39, 40, 41], "apprehens": [10, 24, 28], "390": 10, "down": [10, 19, 24, 27, 29, 30, 39, 40, 41], "reproduc": [10, 24], "ln": [10, 12, 13, 14, 15, 19, 24, 27, 29, 30, 39, 40, 41], "9999": [10, 29, 30, 39], "symlink": 10, "pass": [10, 14, 19, 21, 22, 24, 27, 29, 30, 37, 39, 40, 41], "max": [10, 12, 13, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "durat": [10, 19, 21, 22, 24, 25, 27, 
28, 29, 30, 32, 33, 35, 39, 40, 41], "600": [10, 24, 27, 29, 39, 40, 41], "reason": [10, 12, 13, 14, 40], "support": [11, 12, 13, 14, 19, 21, 24, 27, 29, 30, 37, 39, 40, 41], "zipform": [11, 15, 18, 23, 26, 36, 38], "convemform": [11, 18, 37], "perform": [11, 21, 25, 40], "platform": [11, 15], "android": [11, 12, 13, 14, 15], "raspberri": [11, 15], "pi": [11, 15], "\u7231\u82af\u6d3e": 11, "maix": 11, "iii": 11, "axera": 11, "rv1126": 11, "static": 11, "produc": [11, 27, 29, 30, 39, 40, 41], "binari": [11, 12, 13, 14, 19, 21, 22, 24, 27, 35, 39, 40], "everyth": 11, "pnnx": [11, 18], "torchscript": [11, 16, 17, 18], "encod": [11, 15, 17, 18, 19, 21, 22, 24, 27, 28, 29, 35, 37, 39, 40, 41], "option": [11, 15, 18, 21, 25, 28, 32, 33, 35], "int8": [11, 18], "quantiz": [11, 18, 25], "zengwei": [12, 14, 15, 30, 39], "conv": [12, 13], "emform": [12, 13, 16], "stateless2": [12, 13, 39], "07": [12, 13, 14, 19, 21, 22, 24], "ubuntu": [12, 13, 14], "work": [12, 13, 14, 24], "cpp": [12, 16], "pretrained_model": [12, 13, 14], "online_transduc": 12, "continu": [12, 13, 14, 15, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "git_lfs_skip_smudg": [12, 13, 14, 15], "includ": [12, 13, 14, 15, 27, 29, 30, 39, 40, 41], "jit_xxx": [12, 13, 14], "anywher": [12, 13], "submodul": 12, "updat": [12, 13, 14], "recurs": 12, "init": 12, "cmake": [12, 13, 19, 24], "dcmake_build_typ": [12, 19, 24], "dncnn_python": 12, "dncnn_build_benchmark": 12, "off": 12, "dncnn_build_exampl": 12, "dncnn_build_tool": 12, "j4": 12, "pwd": 12, "src": [12, 14], "compon": [12, 37], "execut": [12, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "ncnn2int8": [12, 13], "our": [12, 13, 14, 16, 17, 24, 25, 27, 37, 40, 41], "cpython": 12, "38": [12, 19, 21, 24, 32], "gnu": 12, "am": 12, "sai": [12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "But": [12, 27, 29, 30, 39, 40, 41], "doe": [12, 13, 14, 19, 21, 24, 35], "As": [12, 21, 24, 25], "long": 12, "later": [12, 13, 14, 19, 22, 24, 27, 28, 
29, 30, 32, 33, 39, 40, 41], "termin": 12, "tencent": [12, 13], "made": 12, "modif": [12, 21], "offic": 12, "synchron": 12, "offici": 12, "renam": [12, 13, 14], "conv_emformer_transducer_stateless2": [12, 37], "num": [12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "layer": [12, 13, 14, 21, 25, 27, 37, 39, 40, 41], "chunk": [12, 14, 15, 40, 41], "length": [12, 14, 21, 40, 41], "32": [12, 13, 14, 15, 19, 21, 22, 41], "cnn": [12, 14], "kernel": [12, 14, 21], "31": [12, 13, 14, 24], "left": [12, 14, 21, 40, 41], "context": [12, 21, 27, 37, 39, 40, 41], "right": [12, 21, 37, 40], "memori": [12, 19, 21, 24, 37], "dim": [12, 13, 14, 21, 27, 40], "configur": [12, 14, 21, 25, 28, 32, 33, 35], "accordingli": [12, 13, 14], "yourself": [12, 13, 14, 25, 40, 41], "tune": [12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "best": [12, 13, 14, 19, 22, 24], "combin": [12, 13, 14], "677": 12, "220": [12, 21, 22, 24], "681": 12, "229": [12, 19], "best_v": 12, "alid_epoch": 12, "subsampl": [12, 40, 41], "ing_factor": 12, "a34171ed85605b0926eebbd0463d059431f4f74a": 12, "wed": [12, 19, 21, 24], "dec": 12, "ver": 12, "ion": 12, "530e8a1": 12, "tue": [12, 24], "star": [12, 13, 14], "op": 12, "1220120619": [12, 13, 14], "7695ff496b": [12, 13, 14], "s9n4w": [12, 13, 14], "icefa": 12, "ll": 12, "transdu": 12, "cer": 12, "use_averaged_model": [12, 13, 14], "cnn_module_kernel": [12, 14], "left_context_length": 12, "chunk_length": 12, "right_context_length": 12, "memory_s": 12, "231": [12, 13, 14], "053": 12, "022": 12, "708": [12, 19, 21, 24, 35], "315": [12, 19, 21, 22, 24, 28], "75490012": 12, "318": [12, 13], "320": [12, 21], "682": 12, "lh": [12, 13, 14], "rw": [12, 13, 14], "kuangfangjun": [12, 13, 14], "289m": 12, "jan": [12, 13, 14], "289": 12, "roughli": [12, 13, 14], "equal": [12, 13, 14, 40, 41], "1024": [12, 13, 14, 39], "287": [12, 35], "1010k": [12, 13], "decoder_jit_trac": [12, 13, 14, 17, 39, 41], "283m": 12, "encoder_jit_trac": [12, 13, 14, 17, 39, 41], "0m": 
[12, 13], "joiner_jit_trac": [12, 13, 14, 17, 39, 41], "sure": [12, 13, 14], "found": [12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "param": [12, 13, 14], "503k": [12, 13], "437": [12, 13, 14], "142m": 12, "79k": 12, "5m": [12, 13], "488": [12, 13, 14], "text": [12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "architectur": [12, 13, 14, 39], "editor": [12, 13, 14], "content": [12, 13, 14], "compar": [12, 13, 14, 40], "283": [12, 14], "1010": [12, 13], "142": [12, 19, 22, 24], "503": [12, 13], "convers": [12, 13, 14], "half": [12, 13, 14, 27, 40, 41], "joiner": [12, 13, 14, 15, 17, 21, 27, 39, 40, 41], "default": [12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "float32": [12, 13, 14], "float16": [12, 13, 14], "occupi": [12, 13, 14], "byte": [12, 13, 14], "twice": [12, 13, 14], "smaller": [12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "fp16": [12, 13, 14, 27, 29, 30, 39, 40, 41], "won": [12, 13, 14, 15, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "token": [12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "accept": [12, 13, 14], "216": [12, 19, 24, 32, 33], "encoder_param_filenam": [12, 13, 14], "encoder_bin_filenam": [12, 13, 14], "decoder_param_filenam": [12, 13, 14], "decoder_bin_filenam": [12, 13, 14], "joiner_param_filenam": [12, 13, 14], "joiner_bin_filenam": [12, 13, 14], "sound_filenam": [12, 13, 14], "141": 12, "328": 12, "151": 12, "331": [12, 13, 24, 28], "176": [12, 21, 24], "336": 12, "106000": [12, 13, 14, 24, 28], "381": 12, "few": [12, 13, 14, 25], "7767517": [12, 13, 14], "1060": 12, "1342": 12, "in0": [12, 13, 14], "explan": [12, 13, 14], "three": [12, 13, 14, 17, 19, 21, 37], "magic": [12, 13, 14], "intermedi": [12, 13, 14], "mean": [12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "extra": [12, 13, 14, 21, 37, 40], "increment": [12, 13, 14], "1061": 12, "sherpametadata": [12, 13, 14], "sherpa_meta_data1": [12, 13, 14], "still": [12, 13, 14], "sinc": [12, 13, 14, 25, 
35, 39], "newli": [12, 13, 14], "must": [12, 13, 14, 40], "kei": [12, 13, 14, 24], "valu": [12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "eas": [12, 13, 14], "list": [12, 13, 14, 19, 21, 22, 24, 28, 32, 33], "pair": [12, 13, 14], "sad": [12, 13, 14], "rememb": [12, 13, 14], "anymor": [12, 13, 14], "flexibl": [12, 13, 14], "edit": [12, 13, 14], "arm": [12, 13, 14], "aarch64": [12, 13, 14], "onc": [12, 13], "mayb": [12, 13], "year": [12, 13], "_jit_trac": [12, 13], "56": [12, 13, 24, 32], "fp32": [12, 13], "doubl": [12, 13], "j": [12, 13, 19, 24], "scale": [12, 13, 19, 24, 25, 28, 30, 32, 33], "py38": [12, 13, 14], "arg": [12, 13], "wave_filenam": [12, 13], "16k": [12, 13], "hz": [12, 13, 32, 33], "mono": [12, 13], "calibr": [12, 13], "purpos": [12, 13], "cat": [12, 13], "eof": [12, 13], "calcul": [12, 13, 29, 40, 41], "has_gpu": [12, 13], "config": [12, 13], "use_vulkan_comput": [12, 13], "88": [12, 21], "conv_87": 12, "942385": [12, 13], "threshold": [12, 13, 29], "938493": 12, "968131": 12, "conv_88": 12, "442448": 12, "549335": 12, "167552": 12, "conv_89": 12, "228289": 12, "001738": 12, "871552": 12, "linear_90": 12, "976146": 12, "101789": 12, "115": [12, 13, 19, 24], "267128": 12, "linear_91": 12, "962030": 12, "162033": 12, "602713": 12, "linear_92": 12, "323041": 12, "853959": 12, "953129": 12, "linear_94": 12, "905416": 12, "648006": 12, "323545": 12, "linear_93": 12, "474093": 12, "200188": 12, "linear_95": 12, "888012": 12, "403563": 12, "483986": 12, "linear_96": 12, "856741": 12, "398679": 12, "524273": 12, "linear_97": 12, "635942": 12, "613655": 12, "590950": 12, "linear_98": 12, "460340": 12, "670146": 12, "398010": 12, "linear_99": 12, "532276": 12, "585537": 12, "119396": 12, "linear_101": 12, "585871": 12, "719224": 12, "205809": 12, "linear_100": 12, "751382": 12, "081648": 12, "linear_102": 12, "593344": 12, "450581": 12, "87": 12, "551147": 12, "linear_103": 12, "592681": 12, "705824": 12, "257959": 12, "linear_104": 12, "752957": 12, 
"980955": 12, "110489": 12, "linear_105": 12, "696240": 12, "877193": 12, "608953": 12, "linear_106": 12, "059659": 12, "643138": 12, "048950": 12, "linear_108": 12, "975461": 12, "589567": 12, "671457": 12, "linear_107": 12, "190381": 12, "515701": 12, "linear_109": 12, "710759": 12, "305635": 12, "082436": 12, "linear_110": 12, "531228": 12, "731162": 12, "159557": 12, "linear_111": 12, "528083": 12, "259322": 12, "211544": 12, "linear_112": 12, "148807": 12, "500842": 12, "087374": 12, "linear_113": 12, "592566": 12, "948851": 12, "166611": 12, "linear_115": 12, "437109": 12, "608947": 12, "642395": 12, "linear_114": 12, "193942": 12, "503904": 12, "linear_116": 12, "966980": 12, "200896": 12, "676392": 12, "linear_117": 12, "451303": 12, "061664": 12, "951344": 12, "linear_118": 12, "077262": 12, "965800": 12, "023804": 12, "linear_119": 12, "671615": 12, "847613": 12, "198460": 12, "linear_120": 12, "625638": 12, "131427": 12, "556595": 12, "linear_122": 12, "274080": 12, "888716": 12, "978189": 12, "linear_121": 12, "420480": 12, "429659": 12, "linear_123": 12, "826197": 12, "599617": 12, "281532": 12, "linear_124": 12, "396383": 12, "325849": 12, "335875": 12, "linear_125": 12, "337198": 12, "941410": 12, "221970": 12, "linear_126": 12, "699965": 12, "842878": 12, "224073": 12, "linear_127": 12, "775370": 12, "884215": 12, "696438": 12, "linear_129": 12, "872276": 12, "837319": 12, "254213": 12, "linear_128": 12, "180057": 12, "687883": 12, "linear_130": 12, "150427": 12, "454298": 12, "765789": 12, "linear_131": 12, "112692": 12, "924847": 12, "025545": 12, "linear_132": 12, "852893": 12, "116593": 12, "749626": 12, "linear_133": 12, "517084": 12, "024665": 12, "275314": 12, "linear_134": 12, "683807": 12, "878618": 12, "743618": 12, "linear_136": 12, "421055": 12, "322729": 12, "086264": 12, "linear_135": 12, "309880": 12, "917679": 12, "linear_137": 12, "827781": 12, "744595": 12, "33": [12, 13, 19, 20, 21, 24, 32], "915554": 12, "linear_138": 12, 
"422395": 12, "742882": 12, "402161": 12, "linear_139": 12, "527538": 12, "866123": 12, "849449": 12, "linear_140": 12, "128619": 12, "657793": 12, "266134": 12, "linear_141": 12, "839593": 12, "845993": 12, "021378": 12, "linear_143": 12, "442304": 12, "099039": 12, "889746": 12, "linear_142": 12, "325038": 12, "849592": 12, "linear_144": 12, "929444": 12, "618206": 12, "605080": 12, "linear_145": 12, "382126": 12, "321095": 12, "625010": 12, "linear_146": 12, "894987": 12, "867645": 12, "836517": 12, "linear_147": 12, "915313": 12, "906028": 12, "886522": 12, "linear_148": 12, "614287": 12, "908151": 12, "496181": 12, "linear_150": 12, "724932": 12, "485588": 12, "312899": 12, "linear_149": 12, "161146": 12, "606939": 12, "linear_151": 12, "164453": 12, "847355": 12, "719223": 12, "linear_152": 12, "086471": 12, "984121": 12, "222834": 12, "linear_153": 12, "099524": 12, "991601": 12, "816805": 12, "linear_154": 12, "054585": 12, "489706": 12, "286930": 12, "linear_155": 12, "389185": 12, "100321": 12, "963501": 12, "linear_157": 12, "982999": 12, "154796": 12, "637253": 12, "linear_156": 12, "537706": 12, "875190": 12, "linear_158": 12, "420287": 12, "502287": 12, "531588": 12, "linear_159": 12, "014746": 12, "423280": 12, "477261": 12, "linear_160": 12, "633553": 12, "715335": 12, "220921": 12, "linear_161": 12, "371849": 12, "117830": 12, "815203": 12, "linear_162": 12, "492933": 12, "126283": 12, "623318": 12, "linear_164": 12, "697504": 12, "825712": 12, "317358": 12, "linear_163": 12, "078367": 12, "008038": 12, "linear_165": 12, "023975": 12, "836278": 12, "577358": 12, "linear_166": 12, "860619": 12, "259792": 12, "493614": 12, "linear_167": 12, "380934": 12, "496160": 12, "107042": 12, "linear_168": 12, "691216": 12, "733317": 12, "831076": 12, "linear_169": 12, "723948": 12, "952728": 12, "129707": 12, "linear_171": 12, "034811": 12, "366547": 12, "665123": 12, "linear_170": 12, "356277": 12, "710501": 12, "linear_172": 12, "556884": 12, "729481": 12, 
"166058": 12, "linear_173": 12, "033039": 12, "207264": 12, "442120": 12, "linear_174": 12, "597379": 12, "658676": 12, "768131": 12, "linear_2": [12, 13], "293503": 12, "305265": 12, "877850": 12, "linear_1": [12, 13], "812222": 12, "766452": 12, "487047": 12, "linear_3": [12, 13], "999999": 12, "999755": 12, "031174": 12, "wish": [12, 13], "low": [12, 13], "accuraci": [12, 13, 20], "955k": 12, "18k": 12, "inparam": [12, 13], "inbin": [12, 13], "outparam": [12, 13], "outbin": [12, 13], "99m": 12, "78k": 12, "774k": [12, 13], "496": [12, 13, 24, 28], "774": [12, 13], "much": [12, 13], "linear": [12, 13, 21], "convolut": [12, 13, 29, 37, 40], "exact": [12, 13], "4x": [12, 13], "speed": [12, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "comparison": 12, "44": [12, 13, 24, 32, 33], "468000": [13, 17, 39], "lstm_transducer_stateless2": [13, 17, 39], "rnn": [13, 21, 27, 29, 39, 40, 41], "hidden": [13, 39], "222": [13, 22, 24], "is_pnnx": 13, "62e404dd3f3a811d73e424199b3408e309c06e1a": [13, 14], "6d7a559": [13, 14], "feb": [13, 14, 21], "147": [13, 14], "rnn_hidden_s": 13, "aux_layer_period": 13, "235": 13, "43": [13, 14, 24], "239": [13, 21], "472": 13, "595": 13, "324": 13, "83137520": 13, "596": 13, "325": 13, "257024": 13, "326": 13, "781812": 13, "327": 13, "84176356": 13, "182": [13, 14, 19, 28], "158": 13, "183": [13, 32, 33], "335": 13, "101": 13, "tracerwarn": [13, 14], "boolean": [13, 14], "might": [13, 14, 40, 41], "caus": [13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "incorrect": [13, 14, 21], "flow": [13, 14], "treat": [13, 14], "constant": [13, 14], "futur": [13, 14, 21, 42], "need_pad": 13, "bool": 13, "259": [13, 19], "180": [13, 19, 24], "339": 13, "207": [13, 22, 24], "84": [13, 19], "324m": 13, "321": [13, 19], "107": [13, 28], "318m": 13, "159m": 13, "21k": 13, "159": [13, 24, 35], "861": 13, "255": [13, 14], "425": [13, 24], "427": [13, 24], "266": [13, 14, 24, 28], "431": 13, "342": 13, "343": 13, "267": [13, 21, 32, 33], "379": 13, "268": [13, 24, 
28], "317m": 13, "317": 13, "conv_15": 13, "930708": 13, "972025": 13, "conv_16": 13, "978855": 13, "031788": 13, "456645": 13, "conv_17": 13, "868437": 13, "830528": 13, "218575": 13, "linear_18": 13, "107259": 13, "194808": 13, "293236": 13, "linear_19": 13, "193777": 13, "634748": 13, "401705": 13, "linear_20": 13, "259933": 13, "606617": 13, "722160": 13, "linear_21": 13, "186600": 13, "790260": 13, "512129": 13, "linear_22": 13, "759041": 13, "265832": 13, "050053": 13, "linear_23": 13, "931209": 13, "099090": 13, "979767": 13, "linear_24": 13, "324160": 13, "215561": 13, "321835": 13, "linear_25": 13, "800708": 13, "599352": 13, "284134": 13, "linear_26": 13, "492444": 13, "153369": 13, "274391": 13, "linear_27": 13, "660161": 13, "720994": 13, "46": [13, 19, 24], "674126": 13, "linear_28": 13, "415265": 13, "174434": 13, "007133": 13, "linear_29": 13, "038418": 13, "118534": 13, "724262": 13, "linear_30": 13, "072084": 13, "936867": 13, "259155": 13, "linear_31": 13, "342712": 13, "599489": 13, "282787": 13, "linear_32": 13, "340535": 13, "120308": 13, "701103": 13, "linear_33": 13, "846987": 13, "630030": 13, "985939": 13, "linear_34": 13, "686298": 13, "204571": 13, "607586": 13, "linear_35": 13, "904821": 13, "575518": 13, "756420": 13, "linear_36": 13, "806659": 13, "585589": 13, "118401": 13, "linear_37": 13, "402340": 13, "047157": 13, "162680": 13, "linear_38": 13, "174589": 13, "923361": 13, "030258": 13, "linear_39": 13, "178576": 13, "556058": 13, "807705": 13, "linear_40": 13, "901954": 13, "301267": 13, "956539": 13, "linear_41": 13, "839805": 13, "597429": 13, "716181": 13, "linear_42": 13, "178945": 13, "651595": 13, "895699": 13, "829245": 13, "627592": 13, "637907": 13, "746186": 13, "255032": 13, "167313": 13, "000000": 13, "999756": 13, "031013": 13, "345k": 13, "17k": 13, "218m": 13, "larger": [13, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "counterpart": 13, "bit": [13, 19, 21, 22, 24, 28, 35], "4532": 13, "stateless7": [14, 15], 
"pruned_transducer_stateless7_stream": [14, 15, 41], "len": [14, 15, 41], "feedforward": [14, 21, 27, 40], "384": [14, 24], "192": [14, 24], "unmask": 14, "256": [14, 32, 33], "downsampl": [14, 20], "factor": [14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "473": [14, 24], "246": [14, 21, 24, 32, 33], "477": 14, "warm_step": 14, "2000": [14, 22], "feedforward_dim": 14, "attention_dim": [14, 19, 21, 24], "encoder_unmasked_dim": 14, "zipformer_downsampling_factor": 14, "decode_chunk_len": 14, "257": [14, 21, 32, 33], "023": 14, "zipformer2": 14, "419": 14, "At": [14, 19, 24], "stack": 14, "downsampling_factor": 14, "037": 14, "655": 14, "346": 14, "68944004": 14, "347": 14, "260096": 14, "348": [14, 32], "716276": 14, "656": [14, 24], "349": 14, "69920376": 14, "351": 14, "353": 14, "174": [14, 24], "175": 14, "1344": 14, "assert": 14, "cached_len": 14, "num_lay": 14, "1348": 14, "cached_avg": 14, "1352": 14, "cached_kei": 14, "1356": 14, "cached_v": 14, "1360": 14, "cached_val2": 14, "1364": 14, "cached_conv1": 14, "1368": 14, "cached_conv2": 14, "1373": 14, "left_context_len": 14, "1884": 14, "x_size": 14, "2442": 14, "2449": 14, "2469": 14, "2473": 14, "2483": 14, "kv_len": 14, "k": [14, 27, 32, 33, 39, 40, 41], "2570": 14, "attn_output": 14, "bsz": 14, "num_head": 14, "seq_len": 14, "head_dim": 14, "2926": 14, "lorder": 14, "2652": 14, "2653": 14, "embed_dim": 14, "2666": 14, "1543": 14, "in_x_siz": 14, "1637": 14, "1643": 14, "in_channel": 14, "1571": 14, "1763": 14, "src1": 14, "src2": 14, "1779": 14, "dim1": 14, "1780": 14, "dim2": 14, "_trace": 14, "958": 14, "tracer": 14, "instead": [14, 21, 40], "tupl": 14, "namedtupl": 14, "absolut": 14, "know": [14, 25], "side": 14, "effect": 14, "strict": [14, 20], "allow": [14, 27, 40], "behavior": [14, 21], "_c": 14, "_create_method_from_trac": 14, "646": 14, "357": 14, "embedding_out": 14, "686": 14, "361": [14, 24, 28], "735": 14, "69": 14, "269m": 14, "53": [14, 19, 27, 28, 33, 39, 40], "269": [14, 19, 32, 33], 
"725": [14, 28], "1022k": 14, "266m": 14, "8m": 14, "509k": 14, "133m": 14, "152k": 14, "4m": 14, "1022": 14, "133": 14, "509": 14, "260": [14, 24], "360": 14, "365": 14, "280": [14, 24], "372": [14, 19], "state": [14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "026": 14, "410": 14, "2028": 14, "2547": 14, "2029": 14, "23316": 14, "23317": 14, "23318": 14, "23319": 14, "23320": 14, "amount": [14, 20], "pad": [14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "conv2dsubsampl": 14, "v2": [14, 19, 24], "arrai": 14, "23300": 14, "element": 14, "onnx_pretrain": 15, "onnxruntim": 15, "separ": 15, "deploi": [15, 19, 24], "repo_url": 15, "basenam": 15, "pushd": 15, "popd": 15, "tree": [16, 17, 19, 21, 22, 24, 28, 32, 33, 35, 39], "cpu_jit": [16, 19, 24, 27, 29, 30, 40, 41], "confus": 16, "move": [16, 27, 29, 30, 40, 41], "why": 16, "streaming_asr": [16, 17, 39, 40, 41], "conv_emform": 16, "offline_asr": [16, 27], "jit_pretrain": [17, 29, 30, 39], "baz": 17, "tutori": [19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 39, 40, 41], "learn": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "singl": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "1best": [19, 22, 24, 28, 29, 30, 32, 33], "automag": [19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "stop": [19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "control": [19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "By": [19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "musan": [19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "thei": [19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "intal": [19, 22], "initi": [19, 22], "sudo": [19, 22], "apt": [19, 22], "permiss": [19, 22], "commandlin": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "quit": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "often": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "experi": [19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "world": [19, 21, 22, 24, 25, 27, 28, 29, 30, 39, 40, 41], "multi": [19, 21, 22, 24, 25, 27, 29, 30, 37, 
39, 40, 41], "machin": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "ddp": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "implement": [19, 21, 22, 24, 25, 27, 29, 30, 37, 39, 40, 41], "present": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "second": [19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "utter": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "oom": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "v100": [19, 21, 22, 24], "nvidia": [19, 21, 22, 24], "due": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "usual": [19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "increas": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "weight": [19, 22, 24, 29, 30, 39], "decai": [19, 22, 24, 29, 30, 39], "warmup": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "function": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "get_param": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "realli": [19, 22, 24, 27, 29, 30, 39, 40, 41], "directli": [19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "perturb": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "actual": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "3x150": [19, 21, 22], "450": [19, 21, 22], "hour": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "These": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "rate": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "visual": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "logdir": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "labelsmooth": 19, "someth": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "tensorflow": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "press": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40, 41], "ctrl": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40, 41], "engw8ksktzqs24zbv5dgcg": 19, "22t11": 19, "scan": [19, 21, 22, 24, 27, 35, 39, 40], "116068": 19, "scalar": [19, 21, 22, 24, 27, 35, 39, 40], "listen": [19, 21, 22, 27, 35, 39, 40], "url": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "xxxx": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "saw": [19, 21, 22, 24, 
27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "consol": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "typic": [19, 21, 22, 24], "avoid": [19, 21, 24], "commonli": [19, 21, 22, 24, 28, 32, 33, 35], "nbest": [19, 24, 30], "lattic": [19, 22, 24, 27, 28, 32, 33, 40, 41], "score": [19, 24, 27, 40, 41], "uniqu": [19, 24, 27, 40, 41], "pkufool": [19, 22, 28], "icefall_asr_aishell_conformer_ctc": 19, "transcrib": [19, 21, 22, 24], "v1": [19, 22, 24, 28, 32, 33], "lang_char": [19, 21], "word": [19, 21, 22, 24, 28, 32, 33, 35], "bac009s0764w0121": [19, 21, 22], "bac009s0764w0122": [19, 21, 22], "bac009s0764w0123": [19, 21, 22], "tran": [19, 22, 24, 28, 32, 33], "graph": [19, 22, 24, 27, 28, 32, 33, 40, 41], "id": [19, 22, 24, 28, 32, 33], "conveni": [19, 22, 24, 25], "eo": [19, 22, 24], "easili": [19, 22, 24], "obtain": [19, 21, 22, 24, 28, 32, 33], "soxi": [19, 21, 22, 24, 28, 35], "sampl": [19, 21, 22, 24, 28, 29, 35, 40, 41], "precis": [19, 21, 22, 24, 27, 28, 35, 40, 41], "67263": [19, 21, 22], "cdda": [19, 21, 22, 24, 28, 35], "sector": [19, 21, 22, 24, 28, 35], "135k": [19, 21, 22], "256k": [19, 21, 22, 24], "sign": [19, 21, 22, 24, 35], "integ": [19, 21, 22, 24, 35], "pcm": [19, 21, 22, 24, 35], "65840": [19, 21, 22], "625": [19, 21, 22], "132k": [19, 21, 22], "64000": [19, 21, 22], "300": [19, 21, 22, 24, 25, 27, 40], "128k": [19, 21, 22, 35], "displai": [19, 21, 22, 24], "topologi": [19, 24], "707": [19, 24], "num_decoder_lay": [19, 24], "vgg_frontend": [19, 21, 24], "use_feat_batchnorm": [19, 24], "f2fd997f752ed11bbef4c306652c433e83f9cf12": 19, "sun": 19, "sep": 19, "33cfe45": 19, "d57a873": 19, "nov": [19, 24], "hw": 19, "kangwei": 19, "icefall_aishell3": 19, "k2_releas": 19, "tokens_fil": 19, "words_fil": [19, 24, 35], "num_path": [19, 24, 27, 40, 41], "ngram_lm_scal": [19, 24], "attention_decoder_scal": [19, 24], "nbest_scal": [19, 24], "sos_id": [19, 24], "eos_id": [19, 24], "num_class": [19, 24, 35], "4336": [19, 21], "242": [19, 24], "131": [19, 24], 
"134": 19, "275": 19, "293": [19, 24], "704": [19, 32], "369": [19, 24], "\u751a": [19, 21], "\u81f3": [19, 21], "\u51fa": [19, 21], "\u73b0": [19, 21], "\u4ea4": [19, 21], "\u6613": [19, 21], "\u51e0": [19, 21], "\u4e4e": [19, 21], "\u505c": [19, 21], "\u6b62": 19, "\u7684": [19, 21, 22], "\u60c5": [19, 21], "\u51b5": [19, 21], "\u4e00": [19, 21], "\u4e8c": [19, 21], "\u7ebf": [19, 21, 22], "\u57ce": [19, 21], "\u5e02": [19, 21], "\u867d": [19, 21], "\u7136": [19, 21], "\u4e5f": [19, 21, 22], "\u5904": [19, 21], "\u4e8e": [19, 21], "\u8c03": [19, 21], "\u6574": [19, 21], "\u4e2d": [19, 21, 22], "\u4f46": [19, 21, 22], "\u56e0": [19, 21], "\u4e3a": [19, 21], "\u805a": [19, 21], "\u96c6": [19, 21], "\u4e86": [19, 21, 22], "\u8fc7": [19, 21], "\u591a": [19, 21], "\u516c": [19, 21], "\u5171": [19, 21], "\u8d44": [19, 21], "\u6e90": [19, 21], "371": 19, "683": 19, "651": [19, 35], "654": 19, "659": 19, "752": 19, "887": 19, "340": 19, "370": 19, "\u751a\u81f3": [19, 22], "\u51fa\u73b0": [19, 22], "\u4ea4\u6613": [19, 22], "\u51e0\u4e4e": [19, 22], "\u505c\u6b62": 19, "\u60c5\u51b5": [19, 22], "\u4e00\u4e8c": [19, 22], "\u57ce\u5e02": [19, 22], "\u867d\u7136": [19, 22], "\u5904\u4e8e": [19, 22], "\u8c03\u6574": [19, 22], "\u56e0\u4e3a": [19, 22], "\u805a\u96c6": [19, 22], "\u8fc7\u591a": [19, 22], "\u516c\u5171": [19, 22], "\u8d44\u6e90": [19, 22], "n": [19, 25, 27, 29, 30, 32, 33, 39, 40, 41], "recor": [19, 24], "highest": [19, 24], "965": 19, "966": 19, "821": 19, "822": 19, "826": 19, "916": 19, "345": 19, "888": 19, "889": 19, "limit": [19, 21, 24, 37, 40], "upgrad": [19, 24], "pro": [19, 24], "finish": [19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 40, 41], "NOT": [19, 21, 24, 35], "checkout": [19, 24], "hlg_decod": [19, 24], "four": [19, 24], "messag": [19, 24, 27, 29, 30, 39, 40, 41], "nn_model": [19, 24], "use_gpu": [19, 24], "word_tabl": [19, 24], "caution": [19, 24], "forward": [19, 24, 29], "89": 19, "cu": [19, 24], "int": [19, 24], "char": [19, 24], "98": 19, 
"150": [19, 24], "693": [19, 32], "165": [19, 24], "nnet_output": [19, 24], "489": 19, "mandarin": 20, "corpu": 20, "beij": 20, "shell": 20, "technologi": 20, "ltd": 20, "400": 20, "peopl": 20, "accent": 20, "area": 20, "china": 20, "invit": 20, "particip": 20, "conduct": 20, "quiet": 20, "indoor": 20, "high": 20, "fidel": 20, "microphon": 20, "16khz": 20, "manual": 20, "95": 20, "through": 20, "profession": 20, "annot": 20, "inspect": 20, "free": [20, 25, 39], "academ": 20, "moder": 20, "research": 20, "field": 20, "openslr": 20, "ctc": [20, 23, 26, 30, 31, 34], "stateless": [20, 23, 27, 39, 40, 41], "head": [21, 37], "embed": [21, 27, 39, 40, 41], "conv1d": [21, 27, 39, 40, 41], "nn": [21, 27, 29, 30, 39, 40, 41], "tanh": 21, "borrow": 21, "ieeexplor": 21, "ieee": 21, "stamp": 21, "jsp": 21, "arnumb": 21, "9054419": 21, "predict": [21, 25, 27, 39, 40, 41], "charact": 21, "unit": 21, "vocabulari": 21, "87939824": 21, "optimized_transduc": 21, "technqiu": 21, "propos": [21, 37, 41], "improv": 21, "end": [21, 27, 29, 30, 35, 39, 40, 41], "furthermor": 21, "maximum": 21, "emit": 21, "per": [21, 27, 40, 41], "simplifi": [21, 37], "significantli": 21, "degrad": 21, "exactli": 21, "benchmark": 21, "unprun": 21, "advantag": 21, "minim": 21, "pruned_transducer_stateless": [21, 27, 37, 40], "altern": 21, "though": 21, "transducer_stateless_modifi": 21, "pr": 21, "gb": 21, "ram": 21, "small": [21, 32, 33, 35], "tri": 21, "prob": [21, 39], "appli": [21, 37], "219": [21, 24], "c": [21, 22, 27, 29, 30, 35, 39, 40, 41], "lagz6hrcqxoigbfd5e0y3q": 21, "03t14": 21, "8477": 21, "250": [21, 28], "sym": [21, 27, 40, 41], "beam_search": [21, 27, 40, 41], "decoding_method": 21, "beam_4": 21, "ensur": 21, "give": 21, "poor": 21, "531": [21, 22], "994": [21, 24], "027": 21, "encoder_out_dim": 21, "f4fefe4882bc0ae59af951da3f47335d5495ef71": 21, "50d2281": 21, "mar": 21, "0815224919": 21, "75d558775b": 21, "mmnv8": 21, "72": [21, 24], "248": 21, "878": [21, 33], "880": 21, "891": 21, 
"113": [21, 24], "userwarn": 21, "__floordiv__": 21, "round": 21, "toward": 21, "trunc": 21, "floor": 21, "neg": 21, "keep": [21, 27, 40, 41], "div": 21, "b": [21, 24, 32, 33], "rounding_mod": 21, "divis": 21, "x_len": 21, "163": [21, 24], "\u6ede": 21, "322": 21, "760": 21, "919": 21, "922": 21, "929": 21, "046": 21, "047": 21, "319": [21, 24], "798": 21, "214": [21, 24], "215": [21, 24, 28], "402": 21, "topk_hyp_index": 21, "topk_index": 21, "logit": 21, "583": [21, 33], "lji9mwuorlow3jkdhxwk8a": 22, "13t11": 22, "4454": 22, "icefall_asr_aishell_tdnn_lstm_ctc": 22, "858": [22, 24], "154": 22, "161": [22, 24], "536": 22, "539": 22, "917": 22, "129": 22, "\u505c\u6ede": 22, "statelessx": [23, 25, 26, 36, 37, 38], "mmi": [23, 26], "blank": [23, 26], "skip": [23, 25, 26, 27, 39, 40, 41], "distil": [23, 26], "hubert": [23, 26], "ligru": [23, 31], "full": [24, 25, 27, 29, 30, 39, 40, 41], "libri": [24, 25, 27, 29, 30, 39, 40, 41], "960": [24, 27, 29, 30, 39, 40, 41], "subset": [24, 27, 29, 30, 39, 40, 41], "3x960": [24, 27, 29, 30, 39, 40, 41], "2880": [24, 27, 29, 30, 39, 40, 41], "lzgnetjwrxc3yghnmd4kpw": 24, "24t16": 24, "4540": 24, "sentenc": 24, "piec": 24, "And": [24, 27, 29, 30, 39, 40, 41], "neither": 24, "nor": 24, "vocab": 24, "5000": 24, "033": 24, "537": 24, "538": 24, "full_libri": [24, 25], "406": 24, "464": 24, "548": 24, "776": 24, "652": [24, 35], "109226120": 24, "714": [24, 32], "206": 24, "944": 24, "1328": 24, "443": [24, 28], "2563": 24, "494": 24, "592": 24, "1715": 24, "52576": 24, "128": 24, "1424": 24, "807": 24, "506": 24, "808": [24, 32], "522": 24, "362": 24, "565": 24, "1477": 24, "2922": 24, "208": 24, "4295": 24, "52343": 24, "396": 24, "3584": 24, "432": 24, "433": 24, "680": [24, 32], "_pickl": 24, "unpicklingerror": 24, "hlg_modifi": 24, "g_4_gram": [24, 28, 32, 33], "875": [24, 28], "212k": 24, "267440": [24, 28], "1253": [24, 28], "535k": 24, "83": [24, 28], "77200": [24, 28], "154k": 24, "554": 24, 
"7178d67e594bc7fa89c2b331ad7bd1c62a6a9eb4": 24, "8d93169": 24, "601": 24, "758": 24, "025": 24, "broffel": 24, "osom": 24, "723": 24, "775": 24, "881": 24, "234": 24, "571": 24, "whole": [24, 28, 32, 33, 40, 41], "ngram": [24, 28, 32, 33], "857": 24, "979": 24, "980": 24, "055": 24, "117": 24, "051": 24, "363": 24, "959": [24, 33], "546": 24, "599": [24, 28], "833": 24, "834": 24, "915": 24, "076": 24, "110": 24, "397": 24, "999": [24, 27, 40, 41], "concaten": 24, "bucket": 24, "sampler": 24, "1000": 24, "ctc_decod": 24, "ngram_lm_rescor": 24, "attention_rescor": 24, "kind": [24, 27, 29, 30, 39, 40, 41], "105": 24, "221": 24, "125": [24, 35], "136": 24, "228": 24, "144": 24, "543": 24, "topo": 24, "547": 24, "729": 24, "702": 24, "703": 24, "545": 24, "279": 24, "122": 24, "126": 24, "135": [24, 35], "153": [24, 35], "945": 24, "475": 24, "191": [24, 32, 33], "398": 24, "515": 24, "w": [24, 32, 33], "deseri": 24, "441": 24, "fsaclass": 24, "loadfsa": 24, "const": 24, "string": 24, "c10": 24, "ignor": 24, "dummi": 24, "589": 24, "attention_scal": 24, "162": 24, "169": [24, 32, 33], "188": 24, "984": 24, "624": 24, "519": [24, 33], "632": 24, "645": [24, 35], "243": 24, "970": 24, "303": 24, "179": 24, "knowledg": 25, "_": 25, "vector": 25, "mvq": 25, "kd": 25, "paper": [25, 27, 39, 40, 41], "pruned_transducer_stateless4": [25, 27, 37, 40], "theoret": 25, "applic": 25, "minor": 25, "out": 25, "necessari": 25, "thing": 25, "distillation_with_hubert": 25, "Of": 25, "cours": 25, "xl": 25, "proce": 25, "960h": [25, 29], "use_extracted_codebook": 25, "augment": 25, "th": [25, 32, 33], "fine": 25, "embedding_lay": 25, "num_codebook": 25, "under": 25, "vq_fbank_layer36_cb8": 25, "whola": 25, "snippet": 25, "echo": 25, "awk": 25, "split": 25, "pruned_transducer_stateless6": 25, "12359": 25, "spec": 25, "aug": 25, "warp": 25, "enabl": 25, "argument": [25, 37], "paid": 25, "similar": [25, 29, 40, 41], "suitabl": [27, 39, 40, 41], "pruned_transducer_stateless2": [27, 37, 40], 
"pruned_transducer_stateless5": [27, 37, 40], "scroll": [27, 29, 30, 39, 40, 41], "scratch": [27, 29, 30, 39, 40, 41], "arxiv": [27, 39, 40, 41], "ab": [27, 39, 40, 41], "2206": [27, 39, 40, 41], "13236": [27, 39, 40, 41], "rework": [27, 37, 40], "daniel": [27, 40, 41], "joint": [27, 39, 40, 41], "contrari": [27, 39, 40, 41], "convent": [27, 39, 40, 41], "recurr": [27, 39, 40, 41], "2x": [27, 40, 41], "dimens": [27, 40, 41], "littl": [27, 40], "436000": [27, 29, 30, 39, 40, 41], "438000": [27, 29, 30, 39, 40, 41], "qogspbgsr8kzcrmmie9jgw": 27, "20t15": [27, 39, 40], "4468": [27, 39, 40], "210171": [27, 39, 40], "access": [27, 29, 30, 39, 40, 41], "6008": [27, 29, 30, 39, 40, 41], "localhost": [27, 29, 30, 39, 40, 41], "expos": [27, 29, 30, 39, 40, 41], "proxi": [27, 29, 30, 39, 40, 41], "bind_al": [27, 29, 30, 39, 40, 41], "both": [27, 29, 30, 37, 39, 40, 41], "lowest": [27, 29, 30, 39, 40, 41], "fast_beam_search": [27, 29, 39, 40, 41], "474000": [27, 39, 40, 41], "largest": [27, 40, 41], "posterior": [27, 29, 40, 41], "algorithm": [27, 40, 41], "pdf": [27, 30, 40, 41], "1211": [27, 40, 41], "3711": [27, 40, 41], "espnet": [27, 40, 41], "net": [27, 40, 41], "beam_search_transduc": [27, 40, 41], "basicli": [27, 40, 41], "topk": [27, 40, 41], "expand": [27, 40, 41], "mode": [27, 40, 41], "being": [27, 40, 41], "hardcod": [27, 40, 41], "composit": [27, 40, 41], "between": [27, 40, 41], "log_prob": [27, 40, 41], "hard": [27, 37, 40, 41], "2211": [27, 40, 41], "00484": [27, 40, 41], "rnnt": [27, 40, 41], "effici": [27, 40, 41], "fast_beam_search_lg": [27, 40, 41], "trivial": [27, 40, 41], "fast_beam_search_nbest": [27, 40, 41], "random_path": [27, 40, 41], "shortest": [27, 40, 41], "fast_beam_search_nbest_lg": [27, 40, 41], "logic": [27, 40, 41], "smallest": [27, 39, 40, 41], "icefall_asr_librispeech_tdnn": 28, "lstm_ctc": 28, "flac": 28, "116k": 28, "140k": 28, "343k": 28, "164k": 28, "105k": 28, "174k": 28, "pretraind": 28, "170": 28, "584": [28, 33], "209": 28, 
"245": 28, "098": 28, "099": 28, "methond": [28, 32, 33], "403": 28, "631": 28, "190": 28, "121": 28, "010": 28, "guidanc": 29, "bigger": 29, "simpli": 29, "discard": 29, "prevent": 29, "lconv": 29, "encourag": [29, 30, 39], "stabil": [29, 30], "doesn": 29, "warm": [29, 30], "xyozukpeqm62hbilud4upa": [29, 30], "ctc_guide_decode_b": 29, "pretrained_ctc": 29, "jit_pretrained_ctc": 29, "100h": 29, "yfyeung": 29, "wechat": 30, "zipformer_mmi": 30, "worker": [30, 39], "hp": 30, "tdnn_ligru_ctc": 32, "enough": [32, 33, 35], "luomingshuang": [32, 33], "icefall_asr_timit_tdnn_ligru_ctc": 32, "pretrained_average_9_25": 32, "fdhc0_si1559": [32, 33], "felc0_si756": [32, 33], "fmgd0_si1564": [32, 33], "ffprobe": [32, 33], "show_format": [32, 33], "nistspher": [32, 33], "database_id": [32, 33], "database_vers": [32, 33], "utterance_id": [32, 33], "dhc0_si1559": [32, 33], "sample_min": [32, 33], "4176": [32, 33], "sample_max": [32, 33], "5984": [32, 33], "bitrat": [32, 33], "258": [32, 33], "audio": [32, 33], "pcm_s16le": [32, 33], "s16": [32, 33], "elc0_si756": [32, 33], "1546": [32, 33], "1989": [32, 33], "mgd0_si1564": [32, 33], "7626": [32, 33], "10573": [32, 33], "660": 32, "695": 32, "697": 32, "210": [32, 33], "819": 32, "829": 32, "sil": [32, 33], "dh": [32, 33], "ih": [32, 33], "uw": [32, 33], "ah": [32, 33], "ii": [32, 33], "z": [32, 33], "aa": [32, 33], "ei": [32, 33], "dx": [32, 33], "d": [32, 33], "uh": [32, 33], "ng": [32, 33], "eh": [32, 33], "jh": [32, 33], "er": [32, 33], "ai": [32, 33], "hh": [32, 33], "aw": 32, "ae": [32, 33], "705": 32, "715": 32, "720": 32, "ch": 32, "icefall_asr_timit_tdnn_lstm_ctc": 33, "pretrained_average_16_25": 33, "816": 33, "827": 33, "387": 33, "unk": 33, "739": 33, "971": 33, "977": 33, "978": 33, "981": 33, "ow": 33, "ykubhb5wrmosxykid1z9eg": 35, "23t23": 35, "icefall_asr_yesno_tdnn": 35, "l_disambig": 35, "lexicon_disambig": 35, "arpa": 35, "0_0_0_1_0_0_0_1": 35, "0_0_1_0_0_0_1_0": 35, "0_0_1_0_0_1_1_1": 35, "0_0_1_0_1_0_0_1": 35, 
"0_0_1_1_0_0_0_1": 35, "0_0_1_1_0_1_1_0": 35, "0_0_1_1_1_0_0_0": 35, "0_0_1_1_1_1_0_0": 35, "0_1_0_0_0_1_0_0": 35, "0_1_0_0_1_0_1_0": 35, "0_1_0_1_0_0_0_0": 35, "0_1_0_1_1_1_0_0": 35, "0_1_1_0_0_1_1_1": 35, "0_1_1_1_0_0_1_0": 35, "0_1_1_1_1_0_1_0": 35, "1_0_0_0_0_0_0_0": 35, "1_0_0_0_0_0_1_1": 35, "1_0_0_1_0_1_1_1": 35, "1_0_1_1_0_1_1_1": 35, "1_0_1_1_1_1_0_1": 35, "1_1_0_0_0_1_1_1": 35, "1_1_0_0_1_0_1_1": 35, "1_1_0_1_0_1_0_0": 35, "1_1_0_1_1_0_0_1": 35, "1_1_0_1_1_1_1_0": 35, "1_1_1_0_0_1_0_1": 35, "1_1_1_0_1_0_1_0": 35, "1_1_1_1_0_0_1_0": 35, "1_1_1_1_1_0_0_0": 35, "1_1_1_1_1_1_1_1": 35, "54080": 35, "507": 35, "108k": 35, "ye": 35, "hebrew": 35, "NO": 35, "621": 35, "119": 35, "650": 35, "139": 35, "143": 35, "198": 35, "181": 35, "186": 35, "187": 35, "213": 35, "correctli": 35, "simplest": 35, "former": 37, "idea": 37, "achiev": 37, "mask": [37, 40, 41], "wenet": 37, "did": 37, "metion": 37, "complic": 37, "techniqu": 37, "bank": 37, "memor": 37, "histori": 37, "introduc": 37, "variant": 37, "pruned_stateless_emformer_rnnt2": 37, "conv_emformer_transducer_stateless": 37, "ourself": 37, "mechan": 37, "onlin": 39, "lstm_transducer_stateless": 39, "lower": 39, "prepare_giga_speech": 39, "cj2vtpiwqhkn9q1tx6ptpg": 39, "dynam": [40, 41], "causal": 40, "short": [40, 41], "2012": 40, "05481": 40, "flag": 40, "indic": [40, 41], "whether": 40, "sequenc": [40, 41], "uniformli": [40, 41], "seen": [40, 41], "97vkxf80ru61cnp2alwzzg": 40, "streaming_decod": [40, 41], "acoust": [40, 41], "wise": [40, 41], "parallel": [40, 41], "bath": [40, 41], "parallelli": [40, 41], "seem": 40, "benefit": 40, "mismatch": 40, "mdoel": 40, "320m": 41, "550": 41, "scriptmodul": 41, "jit_trace_export": 41, "jit_trace_pretrain": 41, "task": 42}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"follow": 0, "code": 0, "style": 0, "contribut": [1, 3], "document": 1, "how": [2, 10, 16, 17], "creat": [2, 9], "recip": [2, 42], "data": [2, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 
35, 39, 40, 41], "prepar": [2, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "train": [2, 6, 9, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "decod": [2, 9, 10, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "pre": [2, 6, 12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "model": [2, 6, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "frequent": 4, "ask": 4, "question": 4, "faq": 4, "oserror": 4, "libtorch_hip": 4, "so": 4, "cannot": 4, "open": 4, "share": 4, "object": 4, "file": [4, 15], "directori": 4, "attributeerror": 4, "modul": 4, "distutil": 4, "ha": 4, "attribut": 4, "version": 4, "importerror": 4, "libpython3": 4, "10": 4, "1": [4, 9, 12, 13, 14, 19, 21, 22, 24], "0": [4, 9], "No": 4, "huggingfac": [5, 7], "space": 7, "youtub": [7, 9], "video": [7, 9], "icefal": [8, 9, 12, 13, 14], "content": [8, 42], "instal": [9, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33], "cuda": 9, "toolkit": 9, "cudnn": 9, "pytorch": 9, "torchaudio": 9, "2": [9, 12, 13, 14, 19, 21, 22, 24], "k2": 9, "3": [9, 12, 13, 14, 19, 21, 24], "lhots": 9, "4": [9, 12, 13, 14], "download": [9, 12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "exampl": [9, 15, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "virtual": 9, "environ": 9, "activ": 9, "your": 9, "5": [9, 12, 13, 14], "test": [9, 12, 13, 14], "export": [10, 11, 12, 13, 14, 15, 16, 17, 18, 27, 29, 30, 39, 40, 41], "state_dict": [10, 27, 29, 30, 39, 40, 41], "when": [10, 16, 17], "us": [10, 16, 17, 27, 29, 30, 39, 40, 41], "run": 10, "py": 10, "ncnn": [11, 12, 13, 14], "convemform": 12, "transduc": [12, 13, 14, 21, 27, 39, 40, 41], "pnnx": [12, 13, 14], "via": [12, 13, 14], "torch": [12, 13, 14, 16, 17, 27, 29, 30, 39, 40, 41], "jit": [12, 13, 14, 16, 17, 27, 29, 30, 39, 40, 41], "trace": [12, 13, 14, 17, 39, 41], "torchscript": [12, 13, 14], "6": [12, 13, 14], "modifi": [12, 13, 14, 21], 
"encod": [12, 13, 14], "sherpa": [12, 13, 14, 15, 27, 40, 41], "7": [12, 13], "option": [12, 13, 19, 22, 24, 27, 29, 30, 39, 40, 41], "int8": [12, 13], "quantiz": [12, 13], "lstm": [13, 22, 28, 33, 39], "stream": [14, 23, 36, 37, 40, 41], "zipform": [14, 29, 30, 41], "onnx": 15, "sound": 15, "script": [16, 27, 29, 30, 40, 41], "conform": [19, 24, 37], "ctc": [19, 22, 24, 28, 29, 32, 33, 35], "configur": [19, 22, 24, 27, 29, 30, 39, 40, 41], "log": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "usag": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "case": [19, 21, 22, 24], "kaldifeat": [19, 21, 22, 24, 28, 32, 33, 35], "hlg": [19, 22, 24], "attent": [19, 24], "rescor": [19, 24], "colab": [19, 21, 22, 24, 28, 32, 33, 35], "notebook": [19, 21, 22, 24, 28, 32, 33, 35], "deploy": [19, 24], "c": [19, 24], "aishel": 20, "stateless": 21, "The": 21, "loss": 21, "todo": 21, "greedi": 21, "search": 21, "beam": 21, "tdnn": [22, 28, 32, 33, 35], "non": 23, "asr": [23, 36], "lm": 24, "comput": 24, "wer": 24, "n": 24, "gram": 24, "distil": 25, "hubert": 25, "codebook": 25, "index": 25, "librispeech": [26, 38], "prune": [27, 40], "statelessx": [27, 40], "pretrain": [27, 29, 30, 39, 40, 41], "deploi": [27, 40, 41], "infer": [28, 32, 33, 35], "blank": 29, "skip": 29, "mmi": 30, "timit": 31, "ligru": 32, "yesno": 34, "introduct": 37, "emform": 37, "which": 39, "simul": [40, 41], "real": [40, 41], "tabl": 42}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.todo": 2, "sphinx": 57}, "alltitles": {"Follow the code style": [[0, "follow-the-code-style"]], "Contributing to Documentation": [[1, "contributing-to-documentation"]], "How to create a recipe": [[2, "how-to-create-a-recipe"]], "Data Preparation": [[2, "data-preparation"], [21, 
"data-preparation"]], "Training": [[2, "training"], [9, "training"], [19, "training"], [21, "training"], [22, "training"], [24, "training"], [25, "training"], [27, "training"], [28, "training"], [29, "training"], [30, "training"], [32, "training"], [33, "training"], [35, "training"], [39, "training"], [40, "training"], [41, "training"]], "Decoding": [[2, "decoding"], [9, "decoding"], [19, "decoding"], [21, "decoding"], [22, "decoding"], [24, "decoding"], [25, "decoding"], [27, "decoding"], [28, "decoding"], [29, "decoding"], [30, "decoding"], [32, "decoding"], [33, "decoding"], [35, "decoding"], [39, "decoding"], [40, "decoding"], [41, "decoding"]], "Pre-trained model": [[2, "pre-trained-model"]], "Contributing": [[3, "contributing"]], "Frequently Asked Questions (FAQs)": [[4, "frequently-asked-questions-faqs"]], "OSError: libtorch_hip.so: cannot open shared object file: no such file or directory": [[4, "oserror-libtorch-hip-so-cannot-open-shared-object-file-no-such-file-or-directory"]], "AttributeError: module \u2018distutils\u2019 has no attribute \u2018version\u2019": [[4, "attributeerror-module-distutils-has-no-attribute-version"]], "ImportError: libpython3.10.so.1.0: cannot open shared object file: No such file or directory": [[4, "importerror-libpython3-10-so-1-0-cannot-open-shared-object-file-no-such-file-or-directory"]], "Huggingface": [[5, "huggingface"]], "Pre-trained models": [[6, "pre-trained-models"]], "Huggingface spaces": [[7, "huggingface-spaces"]], "YouTube Video": [[7, "youtube-video"], [9, "youtube-video"]], "Icefall": [[8, "icefall"]], "Contents:": [[8, null]], "Installation": [[9, "installation"]], "(0) Install CUDA toolkit and cuDNN": [[9, "install-cuda-toolkit-and-cudnn"]], "(1) Install PyTorch and torchaudio": [[9, "install-pytorch-and-torchaudio"]], "(2) Install k2": [[9, "install-k2"]], "(3) Install lhotse": [[9, "install-lhotse"]], "(4) Download icefall": [[9, "download-icefall"]], "Installation example": [[9, "installation-example"]], 
"(1) Create a virtual environment": [[9, "create-a-virtual-environment"]], "(2) Activate your virtual environment": [[9, "activate-your-virtual-environment"]], "(3) Install k2": [[9, "id1"]], "(4) Install lhotse": [[9, "id2"]], "(5) Download icefall": [[9, "id3"]], "Test Your Installation": [[9, "test-your-installation"]], "Data preparation": [[9, "data-preparation"], [19, "data-preparation"], [22, "data-preparation"], [24, "data-preparation"], [25, "data-preparation"], [27, "data-preparation"], [28, "data-preparation"], [29, "data-preparation"], [30, "data-preparation"], [32, "data-preparation"], [33, "data-preparation"], [35, "data-preparation"], [39, "data-preparation"], [40, "data-preparation"], [41, "data-preparation"]], "Export model.state_dict()": [[10, "export-model-state-dict"], [27, "export-model-state-dict"], [29, "export-model-state-dict"], [30, "export-model-state-dict"], [39, "export-model-state-dict"], [40, "export-model-state-dict"], [41, "export-model-state-dict"]], "When to use it": [[10, "when-to-use-it"], [16, "when-to-use-it"], [17, "when-to-use-it"]], "How to export": [[10, "how-to-export"], [16, "how-to-export"], [17, "how-to-export"]], "How to use the exported model": [[10, "how-to-use-the-exported-model"], [16, "how-to-use-the-exported-model"]], "Use the exported model to run decode.py": [[10, "use-the-exported-model-to-run-decode-py"]], "Export to ncnn": [[11, "export-to-ncnn"]], "Export ConvEmformer transducer models to ncnn": [[12, "export-convemformer-transducer-models-to-ncnn"]], "1. Download the pre-trained model": [[12, "download-the-pre-trained-model"], [13, "download-the-pre-trained-model"], [14, "download-the-pre-trained-model"]], "2. Install ncnn and pnnx": [[12, "install-ncnn-and-pnnx"], [13, "install-ncnn-and-pnnx"], [14, "install-ncnn-and-pnnx"]], "3. 
Export the model via torch.jit.trace()": [[12, "export-the-model-via-torch-jit-trace"], [13, "export-the-model-via-torch-jit-trace"], [14, "export-the-model-via-torch-jit-trace"]], "4. Export torchscript model via pnnx": [[12, "export-torchscript-model-via-pnnx"], [13, "export-torchscript-model-via-pnnx"], [14, "export-torchscript-model-via-pnnx"]], "5. Test the exported models in icefall": [[12, "test-the-exported-models-in-icefall"], [13, "test-the-exported-models-in-icefall"], [14, "test-the-exported-models-in-icefall"]], "6. Modify the exported encoder for sherpa-ncnn": [[12, "modify-the-exported-encoder-for-sherpa-ncnn"], [13, "modify-the-exported-encoder-for-sherpa-ncnn"], [14, "modify-the-exported-encoder-for-sherpa-ncnn"]], "7. (Optional) int8 quantization with sherpa-ncnn": [[12, "optional-int8-quantization-with-sherpa-ncnn"], [13, "optional-int8-quantization-with-sherpa-ncnn"]], "Export LSTM transducer models to ncnn": [[13, "export-lstm-transducer-models-to-ncnn"]], "Export streaming Zipformer transducer models to ncnn": [[14, "export-streaming-zipformer-transducer-models-to-ncnn"]], "Export to ONNX": [[15, "export-to-onnx"]], "sherpa-onnx": [[15, "sherpa-onnx"]], "Example": [[15, "example"]], "Download the pre-trained model": [[15, "download-the-pre-trained-model"], [19, "download-the-pre-trained-model"], [21, "download-the-pre-trained-model"], [22, "download-the-pre-trained-model"], [24, "download-the-pre-trained-model"], [28, "download-the-pre-trained-model"], [32, "download-the-pre-trained-model"], [33, "download-the-pre-trained-model"], [35, "download-the-pre-trained-model"]], "Export the model to ONNX": [[15, "export-the-model-to-onnx"]], "Decode sound files with exported ONNX models": [[15, "decode-sound-files-with-exported-onnx-models"]], "Export model with torch.jit.script()": [[16, "export-model-with-torch-jit-script"]], "Export model with torch.jit.trace()": [[17, "export-model-with-torch-jit-trace"]], "How to use the exported models": [[17, 
"how-to-use-the-exported-models"]], "Model export": [[18, "model-export"]], "Conformer CTC": [[19, "conformer-ctc"], [24, "conformer-ctc"]], "Configurable options": [[19, "configurable-options"], [22, "configurable-options"], [24, "configurable-options"], [27, "configurable-options"], [29, "configurable-options"], [30, "configurable-options"], [39, "configurable-options"], [40, "configurable-options"], [41, "configurable-options"]], "Pre-configured options": [[19, "pre-configured-options"], [22, "pre-configured-options"], [24, "pre-configured-options"], [27, "pre-configured-options"], [29, "pre-configured-options"], [30, "pre-configured-options"], [39, "pre-configured-options"], [40, "pre-configured-options"], [41, "pre-configured-options"]], "Training logs": [[19, "training-logs"], [21, "training-logs"], [22, "training-logs"], [24, "training-logs"], [27, "training-logs"], [29, "training-logs"], [30, "training-logs"], [39, "training-logs"], [40, "training-logs"], [41, "training-logs"]], "Usage examples": [[19, "usage-examples"], [21, "usage-examples"], [22, "usage-examples"], [24, "usage-examples"]], "Case 1": [[19, "case-1"], [21, "case-1"], [22, "case-1"], [24, "case-1"]], "Case 2": [[19, "case-2"], [21, "case-2"], [22, "case-2"], [24, "case-2"]], "Case 3": [[19, "case-3"], [21, "case-3"], [24, "case-3"]], "Pre-trained Model": [[19, "pre-trained-model"], [21, "pre-trained-model"], [22, "pre-trained-model"], [24, "pre-trained-model"], [28, "pre-trained-model"], [32, "pre-trained-model"], [33, "pre-trained-model"], [35, "pre-trained-model"]], "Install kaldifeat": [[19, "install-kaldifeat"], [21, "install-kaldifeat"], [22, "install-kaldifeat"], [24, "install-kaldifeat"], [28, "install-kaldifeat"], [32, "install-kaldifeat"], [33, "install-kaldifeat"]], "Usage": [[19, "usage"], [21, "usage"], [22, "usage"], [24, "usage"]], "CTC decoding": [[19, "ctc-decoding"], [24, "ctc-decoding"], [24, "id2"]], "HLG decoding": [[19, "hlg-decoding"], [19, "id2"], [22, 
"hlg-decoding"], [24, "hlg-decoding"], [24, "id3"]], "HLG decoding + attention decoder rescoring": [[19, "hlg-decoding-attention-decoder-rescoring"]], "Colab notebook": [[19, "colab-notebook"], [21, "colab-notebook"], [22, "colab-notebook"], [24, "colab-notebook"], [28, "colab-notebook"], [32, "colab-notebook"], [33, "colab-notebook"], [35, "colab-notebook"]], "Deployment with C++": [[19, "deployment-with-c"], [24, "deployment-with-c"]], "aishell": [[20, "aishell"]], "Stateless Transducer": [[21, "stateless-transducer"]], "The Model": [[21, "the-model"]], "The Loss": [[21, "the-loss"]], "Todo": [[21, "id1"]], "Greedy search": [[21, "greedy-search"]], "Beam search": [[21, "beam-search"]], "Modified Beam search": [[21, "modified-beam-search"]], "TDNN-LSTM CTC": [[22, "tdnn-lstm-ctc"]], "Non Streaming ASR": [[23, "non-streaming-asr"]], "HLG decoding + LM rescoring": [[24, "hlg-decoding-lm-rescoring"]], "HLG decoding + LM rescoring + attention decoder rescoring": [[24, "hlg-decoding-lm-rescoring-attention-decoder-rescoring"]], "Compute WER with the pre-trained model": [[24, "compute-wer-with-the-pre-trained-model"]], "HLG decoding + n-gram LM rescoring": [[24, "hlg-decoding-n-gram-lm-rescoring"]], "HLG decoding + n-gram LM rescoring + attention decoder rescoring": [[24, "hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring"]], "Distillation with HuBERT": [[25, "distillation-with-hubert"]], "Codebook index preparation": [[25, "codebook-index-preparation"]], "LibriSpeech": [[26, "librispeech"], [38, "librispeech"]], "Pruned transducer statelessX": [[27, "pruned-transducer-statelessx"], [40, "pruned-transducer-statelessx"]], "Usage example": [[27, "usage-example"], [29, "usage-example"], [30, "usage-example"], [39, "usage-example"], [40, "usage-example"], [41, "usage-example"]], "Export Model": [[27, "export-model"], [40, "export-model"], [41, "export-model"]], "Export model using torch.jit.script()": [[27, "export-model-using-torch-jit-script"], [29, 
"export-model-using-torch-jit-script"], [30, "export-model-using-torch-jit-script"], [40, "export-model-using-torch-jit-script"], [41, "export-model-using-torch-jit-script"]], "Download pretrained models": [[27, "download-pretrained-models"], [29, "download-pretrained-models"], [30, "download-pretrained-models"], [39, "download-pretrained-models"], [40, "download-pretrained-models"], [41, "download-pretrained-models"]], "Deploy with Sherpa": [[27, "deploy-with-sherpa"], [40, "deploy-with-sherpa"], [41, "deploy-with-sherpa"]], "TDNN-LSTM-CTC": [[28, "tdnn-lstm-ctc"], [33, "tdnn-lstm-ctc"]], "Inference with a pre-trained model": [[28, "inference-with-a-pre-trained-model"], [32, "inference-with-a-pre-trained-model"], [33, "inference-with-a-pre-trained-model"], [35, "inference-with-a-pre-trained-model"]], "Zipformer CTC Blank Skip": [[29, "zipformer-ctc-blank-skip"]], "Export models": [[29, "export-models"], [30, "export-models"], [39, "export-models"]], "Zipformer MMI": [[30, "zipformer-mmi"]], "TIMIT": [[31, "timit"]], "TDNN-LiGRU-CTC": [[32, "tdnn-ligru-ctc"]], "YesNo": [[34, "yesno"]], "TDNN-CTC": [[35, "tdnn-ctc"]], "Download kaldifeat": [[35, "download-kaldifeat"]], "Streaming ASR": [[36, "streaming-asr"]], "Introduction": [[37, "introduction"]], "Streaming Conformer": [[37, "streaming-conformer"]], "Streaming Emformer": [[37, "streaming-emformer"]], "LSTM Transducer": [[39, "lstm-transducer"]], "Which model to use": [[39, "which-model-to-use"]], "Export model using torch.jit.trace()": [[39, "export-model-using-torch-jit-trace"], [41, "export-model-using-torch-jit-trace"]], "Simulate streaming decoding": [[40, "simulate-streaming-decoding"], [41, "simulate-streaming-decoding"]], "Real streaming decoding": [[40, "real-streaming-decoding"], [41, "real-streaming-decoding"]], "Zipformer Transducer": [[41, "zipformer-transducer"]], "Recipes": [[42, "recipes"]], "Table of Contents": [[42, null]]}, "indexentries": {}})
\ No newline at end of file
+Search.setIndex({"docnames": ["contributing/code-style", "contributing/doc", "contributing/how-to-create-a-recipe", "contributing/index", "decoding-with-langugage-models/LODR", "decoding-with-langugage-models/index", "decoding-with-langugage-models/rescoring", "decoding-with-langugage-models/shallow-fusion", "faqs", "huggingface/index", "huggingface/pretrained-models", "huggingface/spaces", "index", "installation/index", "model-export/export-model-state-dict", "model-export/export-ncnn", "model-export/export-ncnn-conv-emformer", "model-export/export-ncnn-lstm", "model-export/export-ncnn-zipformer", "model-export/export-onnx", "model-export/export-with-torch-jit-script", "model-export/export-with-torch-jit-trace", "model-export/index", "recipes/Non-streaming-ASR/aishell/conformer_ctc", "recipes/Non-streaming-ASR/aishell/index", "recipes/Non-streaming-ASR/aishell/stateless_transducer", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/index", "recipes/Non-streaming-ASR/librispeech/conformer_ctc", "recipes/Non-streaming-ASR/librispeech/distillation", "recipes/Non-streaming-ASR/librispeech/index", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi", "recipes/Non-streaming-ASR/timit/index", "recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/yesno/index", "recipes/Non-streaming-ASR/yesno/tdnn", "recipes/Streaming-ASR/index", "recipes/Streaming-ASR/introduction", "recipes/Streaming-ASR/librispeech/index", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Streaming-ASR/librispeech/zipformer_transducer", "recipes/index"], "filenames": ["contributing/code-style.rst", "contributing/doc.rst", 
"contributing/how-to-create-a-recipe.rst", "contributing/index.rst", "decoding-with-langugage-models/LODR.rst", "decoding-with-langugage-models/index.rst", "decoding-with-langugage-models/rescoring.rst", "decoding-with-langugage-models/shallow-fusion.rst", "faqs.rst", "huggingface/index.rst", "huggingface/pretrained-models.rst", "huggingface/spaces.rst", "index.rst", "installation/index.rst", "model-export/export-model-state-dict.rst", "model-export/export-ncnn.rst", "model-export/export-ncnn-conv-emformer.rst", "model-export/export-ncnn-lstm.rst", "model-export/export-ncnn-zipformer.rst", "model-export/export-onnx.rst", "model-export/export-with-torch-jit-script.rst", "model-export/export-with-torch-jit-trace.rst", "model-export/index.rst", "recipes/Non-streaming-ASR/aishell/conformer_ctc.rst", "recipes/Non-streaming-ASR/aishell/index.rst", "recipes/Non-streaming-ASR/aishell/stateless_transducer.rst", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/index.rst", "recipes/Non-streaming-ASR/librispeech/conformer_ctc.rst", "recipes/Non-streaming-ASR/librispeech/distillation.rst", "recipes/Non-streaming-ASR/librispeech/index.rst", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi.rst", "recipes/Non-streaming-ASR/timit/index.rst", "recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.rst", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/yesno/index.rst", "recipes/Non-streaming-ASR/yesno/tdnn.rst", "recipes/Streaming-ASR/index.rst", "recipes/Streaming-ASR/introduction.rst", "recipes/Streaming-ASR/librispeech/index.rst", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.rst", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst", 
"recipes/Streaming-ASR/librispeech/zipformer_transducer.rst", "recipes/index.rst"], "titles": ["Follow the code style", "Contributing to Documentation", "How to create a recipe", "Contributing", "LODR for RNN Transducer", "Decoding with language models", "LM rescoring for Transducer", "Shallow fusion for Transducer", "Frequently Asked Questions (FAQs)", "Huggingface", "Pre-trained models", "Huggingface spaces", "Icefall", "Installation", "Export model.state_dict()", "Export to ncnn", "Export ConvEmformer transducer models to ncnn", "Export LSTM transducer models to ncnn", "Export streaming Zipformer transducer models to ncnn", "Export to ONNX", "Export model with torch.jit.script()", "Export model with torch.jit.trace()", "Model export", "Conformer CTC", "aishell", "Stateless Transducer", "TDNN-LSTM CTC", "Non Streaming ASR", "Conformer CTC", "Distillation with HuBERT", "LibriSpeech", "Pruned transducer statelessX", "TDNN-LSTM-CTC", "Zipformer CTC Blank Skip", "Zipformer MMI", "TIMIT", "TDNN-LiGRU-CTC", "TDNN-LSTM-CTC", "YesNo", "TDNN-CTC", "Streaming ASR", "Introduction", "LibriSpeech", "LSTM Transducer", "Pruned transducer statelessX", "Zipformer Transducer", "Recipes"], "terms": {"we": [0, 1, 2, 3, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "us": [0, 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 28, 29, 32, 36, 37, 39, 41], "tool": [0, 8, 16], "make": [0, 1, 3, 16, 17, 18, 23, 25, 28, 41], "consist": [0, 25, 31, 43, 44, 45], "possibl": [0, 2, 3, 13, 23, 28], "black": 0, "format": [0, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "flake8": 0, "check": [0, 28], "qualiti": [0, 24], "isort": 0, "sort": [0, 13], "import": [0, 8, 16, 44, 45], "The": [0, 1, 2, 4, 7, 8, 11, 13, 14, 16, 17, 18, 23, 24, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "version": [0, 12, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 36, 37, 
44], "abov": [0, 4, 6, 7, 8, 13, 14, 16, 17, 18, 19, 23, 24, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44, 45], "ar": [0, 1, 3, 4, 6, 7, 8, 13, 14, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45, 46], "22": [0, 13, 16, 17, 28, 36, 37, 39], "3": [0, 4, 6, 7, 8, 12, 14, 15, 19, 22, 26, 29, 31, 32, 33, 34, 39, 43, 44, 45], "0": [0, 1, 4, 6, 7, 12, 14, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "5": [0, 7, 15, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "4": [0, 4, 6, 7, 8, 12, 14, 15, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "10": [0, 7, 12, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "1": [0, 4, 6, 7, 12, 14, 15, 19, 20, 21, 22, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "after": [0, 1, 6, 11, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "run": [0, 2, 8, 11, 13, 16, 17, 18, 19, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "command": [0, 1, 4, 6, 7, 8, 13, 14, 16, 17, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "git": [0, 4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "clone": [0, 4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "http": [0, 1, 2, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "github": [0, 2, 6, 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "com": [0, 2, 6, 10, 11, 13, 14, 16, 17, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "k2": [0, 2, 8, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 43, 44, 45], "fsa": [0, 2, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 28, 31, 33, 34, 43, 44, 45], "icefal": [0, 2, 3, 4, 6, 7, 8, 10, 11, 14, 15, 19, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 
45, 46], "cd": [0, 1, 2, 8, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "pip": [0, 1, 6, 8, 13, 16, 19, 25], "instal": [0, 1, 4, 6, 8, 9, 11, 12, 14, 15, 19, 22, 29, 31, 33, 34, 39, 43, 44, 45], "pre": [0, 3, 4, 6, 7, 9, 11, 12, 13, 15, 22, 29], "commit": 0, "whenev": 0, "you": [0, 1, 2, 4, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "automat": [0, 11, 29], "hook": 0, "invok": 0, "fail": [0, 13], "If": [0, 2, 4, 6, 7, 8, 11, 16, 17, 18, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "ani": [0, 4, 6, 7, 13, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44], "your": [0, 1, 2, 4, 6, 7, 9, 11, 12, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "wa": [0, 13, 14, 28, 32], "success": [0, 13, 16, 17], "pleas": [0, 1, 2, 4, 6, 7, 8, 11, 13, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "fix": [0, 8, 13, 16, 17, 18, 28], "issu": [0, 4, 6, 7, 8, 13, 16, 17, 28, 29, 44, 45], "report": [0, 8, 13, 29], "some": [0, 1, 4, 6, 14, 16, 17, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "i": [0, 1, 2, 4, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "e": [0, 2, 4, 6, 7, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "modifi": [0, 15, 22, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "file": [0, 2, 11, 12, 14, 16, 17, 18, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "place": [0, 13, 14, 25, 28, 32], "so": [0, 4, 6, 7, 11, 12, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "statu": 0, "failur": 0, "see": [0, 1, 6, 7, 11, 13, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "which": [0, 2, 4, 6, 7, 11, 14, 16, 17, 18, 19, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 
37, 39, 44, 45], "ha": [0, 2, 12, 15, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 41, 43, 44, 45], "been": [0, 15, 16, 17, 18, 25], "befor": [0, 1, 14, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "further": [0, 4, 6, 7], "chang": [0, 4, 6, 7, 8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "all": [0, 10, 11, 14, 16, 17, 18, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "again": [0, 16, 17, 39], "should": [0, 2, 4, 6, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "succe": 0, "thi": [0, 2, 3, 4, 5, 6, 7, 8, 9, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "time": [0, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "succeed": 0, "want": [0, 4, 6, 7, 13, 14, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "can": [0, 1, 2, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "do": [0, 2, 4, 6, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "Or": 0, "without": [0, 4, 6, 7, 9, 11, 23, 28], "your_changed_fil": 0, "py": [0, 2, 4, 6, 7, 8, 13, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "sphinx": 1, "write": [1, 2, 3], "have": [1, 2, 4, 6, 7, 10, 11, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "prepar": [1, 3, 4, 14], "environ": [1, 8, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 36, 37, 39, 44, 45], "doc": [1, 14, 41], "r": [1, 13, 16, 17, 18, 36, 37], "requir": [1, 4, 6, 13, 18, 29, 44, 45], "txt": [1, 4, 13, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "set": [1, 4, 6, 7, 8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "up": [1, 13, 14, 16, 17, 18, 23, 26, 28, 29, 31, 32, 33, 34, 44, 45], "readi": [1, 23, 28, 29], "refer": [1, 2, 6, 7, 13, 14, 
15, 16, 17, 18, 20, 21, 23, 25, 26, 28, 31, 32, 33, 36, 37, 39, 41, 44, 45], "restructuredtext": 1, "primer": 1, "familiar": 1, "build": [1, 13, 14, 16, 17, 18, 23, 25, 28], "local": [1, 13, 31, 33, 34, 43, 44, 45], "preview": 1, "what": [1, 2, 13, 16, 17, 18, 25, 41], "look": [1, 2, 4, 6, 7, 10, 13, 16, 17, 18, 23, 25, 26, 28, 29], "like": [1, 2, 11, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44], "publish": [1, 14, 24], "html": [1, 2, 8, 13, 15, 16, 17, 18, 19, 20, 21, 31, 43, 44, 45], "gener": [1, 6, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "view": [1, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44, 45], "follow": [1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "python3": [1, 8, 13, 17, 18], "m": [1, 13, 16, 17, 18, 25, 31, 33, 34, 36, 37, 43, 44, 45], "server": [1, 11, 13, 43], "It": [1, 2, 6, 7, 9, 13, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "print": [1, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "serv": [1, 31, 33, 34, 43, 44, 45], "port": [1, 29, 31, 33, 34, 43, 44, 45], "8000": [1, 39], "open": [1, 4, 6, 7, 12, 14, 16, 17, 18, 24, 25, 28, 29], "browser": [1, 9, 11, 31, 33, 34, 43, 44, 45], "go": [1, 7, 13, 23, 25, 28, 31, 33, 34, 43, 44, 45], "read": [2, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "code": [2, 3, 8, 12, 16, 17, 18, 23, 28, 29, 31, 32, 36, 37, 39, 41, 44, 45], "style": [2, 3, 12], "adjust": 2, "sytl": 2, "design": 2, "python": [2, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 28, 31, 33, 34, 43, 44, 45], "recommend": [2, 6, 7, 13, 23, 25, 26, 28, 29, 31, 44, 45], "test": [2, 4, 12, 14, 15, 22, 23, 25, 26, 28, 29, 32, 33, 36, 37], "valid": [2, 13, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "dataset": [2, 8, 13, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "lhots": [2, 12, 14, 
16, 17, 18, 23, 25, 28], "readthedoc": [2, 13], "io": [2, 13, 15, 16, 17, 18, 19, 20, 21, 31, 43, 44, 45], "en": [2, 13, 16], "latest": [2, 11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "index": [2, 13, 15, 16, 17, 18, 19, 20, 21, 43, 44, 45], "yesno": [2, 8, 12, 13, 27, 39, 46], "veri": [2, 3, 7, 16, 17, 18, 25, 36, 37, 39, 44, 45], "good": [2, 7], "exampl": [2, 11, 12, 14, 16, 17, 18, 20, 21, 22, 29, 32, 36, 37, 39], "speech": [2, 11, 12, 13, 15, 24, 25, 39, 46], "pull": [2, 4, 6, 7, 16, 17, 18, 19, 23, 25, 28, 41], "380": [2, 16, 37], "show": [2, 4, 6, 7, 11, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "add": [2, 16, 17, 18, 23, 25, 26, 44, 46], "new": [2, 3, 11, 13, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 39, 43, 44, 45], "suppos": [2, 44, 45], "would": [2, 13, 14, 16, 17, 18, 28, 32, 44, 45], "name": [2, 8, 14, 16, 17, 18, 19, 23, 25, 31, 33, 34, 44, 45], "foo": [2, 21, 23, 28, 31, 33, 34, 43, 44, 45], "eg": [2, 8, 10, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "mkdir": [2, 16, 17, 23, 25, 26, 28, 32, 36, 37, 39], "p": [2, 4, 13, 16, 17, 25, 36, 37], "asr": [2, 4, 6, 7, 8, 10, 12, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "touch": 2, "sh": [2, 13, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "chmod": 2, "x": [2, 4, 18, 41], "simpl": [2, 25], "own": [2, 29, 31, 44, 45], "otherwis": [2, 16, 17, 18, 23, 25, 28, 29, 31, 33, 34, 43, 44, 45], "librispeech": [2, 4, 6, 7, 8, 10, 12, 14, 16, 17, 18, 19, 20, 21, 27, 28, 29, 31, 32, 33, 34, 40, 41, 43, 44, 45, 46], "assum": [2, 4, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 43, 44, 45], "fanci": 2, "call": [2, 8, 19, 29], "bar": [2, 21, 23, 28, 31, 33, 34, 43, 44, 45], "organ": 2, "wai": [2, 3, 22, 31, 33, 34, 41, 43, 44, 45], "readm": [2, 23, 25, 26, 28, 32, 36, 37, 39], "md": [2, 10, 14, 23, 25, 26, 28, 31, 32, 33, 34, 
36, 37, 39, 43, 44, 45], "asr_datamodul": [2, 8, 13], "pretrain": [2, 4, 6, 7, 14, 16, 17, 18, 19, 21, 23, 25, 26, 28, 32, 36, 37, 39], "For": [2, 4, 6, 7, 8, 10, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "instanc": [2, 8, 10, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "tdnn": [2, 8, 13, 24, 27, 30, 35, 38], "its": [2, 4, 14, 15, 16, 17, 18, 21, 25, 33], "directori": [2, 12, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "structur": [2, 18], "descript": [2, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "contain": [2, 12, 14, 15, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45, 46], "inform": [2, 4, 6, 14, 23, 25, 26, 28, 31, 32, 33, 36, 37, 39, 41, 43, 44, 45], "g": [2, 4, 6, 7, 13, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "wer": [2, 5, 13, 14, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "etc": [2, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "provid": [2, 11, 13, 14, 15, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45, 46], "pytorch": [2, 8, 12, 16, 17, 18, 25], "dataload": [2, 13], "take": [2, 7, 14, 29, 31, 39, 44, 45], "input": [2, 14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39, 41], "checkpoint": [2, 4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "save": [2, 13, 14, 17, 18, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "dure": [2, 4, 5, 7, 8, 11, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "stage": [2, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "": [2, 4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "definit": [2, 16, 17], "neural": [2, 4, 6, 7, 23, 28], "network": [2, 23, 25, 28, 31, 33, 34, 43, 44, 45], "script": [2, 6, 7, 12, 13, 21, 22, 23, 25, 26, 28, 29, 32, 36, 37, 39, 43], "infer": [2, 14, 16, 17], "tdnn_lstm_ctc": 
[2, 26, 32, 37], "conformer_ctc": [2, 23, 28], "get": [2, 11, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 39, 41, 43, 44, 45], "feel": [2, 29, 43], "result": [2, 4, 7, 10, 11, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "everi": [2, 14, 31, 33, 34, 43, 44, 45], "kept": [2, 31, 44, 45], "self": [2, 15, 18, 41], "toler": 2, "duplic": 2, "among": [2, 13], "differ": [2, 13, 16, 17, 18, 19, 23, 24, 28, 29, 31, 41, 43, 44, 45], "invoc": [2, 16, 17], "help": [2, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "blob": [2, 10, 14, 21, 31, 33, 34, 43, 44, 45], "master": [2, 6, 10, 13, 14, 17, 18, 20, 21, 25, 29, 31, 33, 34, 43, 44, 45], "transform": [2, 6, 7, 23, 28, 43], "conform": [2, 20, 24, 25, 27, 30, 31, 33, 43, 44, 45], "base": [2, 4, 7, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "lstm": [2, 15, 21, 22, 24, 27, 30, 35, 40, 42], "attent": [2, 18, 25, 26, 29, 41, 44, 45], "lm": [2, 4, 5, 7, 12, 13, 25, 31, 32, 36, 37, 39, 44, 45], "rescor": [2, 5, 12, 26, 32, 34, 36, 37, 39], "demonstr": [2, 9, 11, 14, 19], "consid": [2, 4, 18], "colab": 2, "notebook": 2, "welcom": 3, "There": [3, 4, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "mani": [3, 44, 45], "two": [3, 4, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "them": [3, 6, 9, 10, 11, 13, 16, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "To": [3, 4, 6, 7, 11, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "document": [3, 12, 14, 15, 16, 17, 18, 19, 34], "repositori": [3, 16, 17, 18, 19], "recip": [3, 4, 6, 7, 10, 12, 13, 14, 19, 23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 41, 43, 44, 45], "In": [3, 4, 6, 8, 11, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 32, 36, 37, 39, 41], "page": [3, 11, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "describ": [3, 5, 9, 14, 16, 17, 19, 20, 21, 22, 23, 25, 26, 28, 31, 32, 36, 37, 44, 45], "how": [3, 4, 5, 6, 7, 9, 
11, 12, 13, 16, 17, 18, 19, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "creat": [3, 4, 6, 7, 12, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44], "data": [3, 4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 24], "train": [3, 4, 6, 7, 8, 9, 11, 12, 14, 15, 20, 21, 22, 41], "decod": [3, 4, 8, 11, 12, 16, 17, 18, 21, 22], "model": [3, 4, 6, 7, 9, 11, 12, 13, 15, 29, 41], "As": [4, 6, 7, 16, 25, 28, 29], "type": [4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 28, 31, 33, 34, 39, 41, 43, 44, 45], "e2": [4, 7], "usual": [4, 6, 7, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "an": [4, 6, 7, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24, 25, 28, 29, 31, 34, 39, 43, 44, 45], "intern": 4, "languag": [4, 7, 11, 12, 23, 25, 26], "learn": [4, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "level": 4, "corpu": [4, 6, 7, 24], "real": 4, "life": 4, "scenario": 4, "often": [4, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "mismatch": [4, 44], "between": [4, 7, 31, 44, 45], "target": [4, 11], "space": [4, 9, 12], "problem": [4, 6, 7, 13, 29], "when": [4, 6, 8, 11, 16, 17, 18, 22, 25, 28, 29, 31, 33, 34, 44, 45], "act": 4, "against": 4, "extern": [4, 5, 6, 7], "tutori": [4, 6, 7, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 43, 44, 45], "low": [4, 16, 17], "order": [4, 13, 16, 17, 18, 23, 26, 28, 32, 36, 37], "densiti": 4, "ratio": 4, "allevi": 4, "effect": [4, 7, 18], "improv": [4, 5, 6, 7, 25], "perform": [4, 6, 7, 15, 25, 29, 44], "languga": 4, "integr": [4, 11], "pruned_transducer_stateless7_stream": [4, 6, 7, 18, 19, 45], "stream": [4, 6, 7, 12, 15, 16, 17, 19, 22, 23, 28, 36, 37, 43, 46], "howev": [4, 6, 7, 14, 17, 29], "easili": [4, 6, 7, 23, 26, 28], "appli": [4, 6, 7, 25, 41], "other": [4, 7, 14, 17, 18, 19, 25, 28, 29, 31, 32, 36, 37, 39, 41, 44, 45, 46], "encount": [4, 6, 7, 8, 13, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "here": [4, 6, 7, 14, 16, 17, 18, 23, 25, 26, 28, 29, 32, 41, 44], "simplic": [4, 6, 7], "same": [4, 6, 7, 13, 14, 16, 
17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "domain": [4, 6, 7], "gigaspeech": [4, 6, 7, 10, 20, 43], "first": [4, 6, 8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "let": [4, 6, 7, 13, 16, 17, 18, 23, 28], "background": 4, "predecessor": 4, "dr": 4, "propos": [4, 25, 41, 45], "address": [4, 11, 13, 14, 16, 17, 18, 25, 31, 34, 43, 44, 45], "sourc": [4, 13, 14, 16, 17, 18, 23, 24, 25, 28], "acoust": [4, 44, 45], "similar": [4, 29, 33, 44, 45], "deriv": 4, "formular": 4, "bay": 4, "theorem": 4, "text": [4, 6, 7, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "score": [4, 7, 23, 28, 31, 44, 45], "left": [4, 16, 18, 25, 44, 45], "y_u": 4, "mathit": 4, "y": 4, "right": [4, 16, 25, 41, 44], "log": [4, 8, 13, 16, 17, 18, 32, 36, 37, 39], "y_": 4, "u": [4, 13, 16, 17, 18, 23, 25, 26, 28, 29, 39], "lambda_1": 4, "p_": 4, "lambda_2": 4, "where": [4, 8, 44], "weight": [4, 23, 26, 28, 33, 34, 43], "respect": 4, "onli": [4, 6, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "compar": [4, 16, 17, 18, 44], "shallow": [4, 5, 12], "fusion": [4, 5, 12], "subtract": 4, "work": [4, 16, 17, 18, 28], "treat": [4, 17, 18], "predictor": 4, "joiner": [4, 16, 17, 18, 19, 21, 25, 31, 43, 44, 45], "weak": 4, "captur": 4, "therefor": [4, 8], "n": [4, 6, 23, 29, 31, 33, 34, 36, 37, 43, 44, 45], "gram": [4, 6, 13, 23, 25, 26, 31, 32, 34, 36, 37, 44, 45], "approxim": 4, "ilm": 4, "lead": [4, 7], "formula": 4, "rnnt": [4, 31, 44, 45], "bi": [4, 6], "addit": 4, "estim": 4, "comar": 4, "li": 4, "choic": 4, "accord": 4, "origin": 4, "paper": [4, 29, 31, 43, 44, 45], "achiev": [4, 6, 7, 41], "both": [4, 31, 33, 34, 41, 43, 44, 45], "intra": 4, "cross": 4, "much": [4, 16, 17], "faster": [4, 6], "evalu": 4, "now": [4, 6, 13, 16, 17, 18, 23, 28, 29, 31, 32, 33, 34, 36, 37, 43, 44, 45], "illustr": [4, 6, 7], "purpos": [4, 6, 7, 16, 17], "from": [4, 6, 7, 8, 9, 11, 13, 14, 16, 17, 18, 
19, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "link": [4, 6, 7, 10, 13, 14, 15, 31, 33, 34, 43, 44, 45], "scratch": [4, 6, 7, 31, 33, 34, 43, 44, 45], "prune": [4, 6, 7, 14, 18, 19, 25, 27, 29, 30, 40, 41, 42, 43, 45], "statelessx": [4, 6, 7, 27, 29, 30, 40, 41, 42], "initi": [4, 6, 7, 23, 26], "step": [4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "download": [4, 6, 7, 8, 11, 12, 15, 22, 24, 29], "git_lfs_skip_smudg": [4, 6, 7, 16, 17, 18, 19], "huggingfac": [4, 6, 7, 10, 12, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 33, 34, 36, 37, 39, 43], "co": [4, 6, 7, 10, 11, 14, 16, 17, 18, 19, 23, 24, 25, 26, 28, 32, 33, 34, 36, 37, 39, 43], "zengwei": [4, 6, 7, 16, 18, 19, 34, 43], "stateless7": [4, 6, 7, 18, 19], "2022": [4, 6, 7, 14, 16, 17, 18, 19, 25, 31, 33, 34, 43, 44], "12": [4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 31, 33, 34, 36, 39, 43, 44, 45], "29": [4, 6, 7, 13, 18, 19, 23, 25, 26, 28, 32, 33, 36, 37], "pushd": [4, 6, 7, 19], "exp": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "lf": [4, 6, 7, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 34, 36, 37, 39], "includ": [4, 6, 7, 16, 17, 18, 19, 31, 33, 34, 43, 44, 45], "pt": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "ln": [4, 6, 7, 14, 16, 17, 18, 19, 23, 28, 31, 33, 34, 43, 44, 45], "epoch": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "99": [4, 6, 7, 13, 16, 17, 18, 19], "symbol": [4, 6, 7, 13, 25, 31, 44, 45], "load": [4, 6, 7, 13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "done": [4, 6, 7, 13, 14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "via": [4, 6, 7, 13, 15, 20, 21, 22], "exp_dir": [4, 6, 7, 13, 16, 17, 18, 25, 28, 29, 31, 33, 34, 44, 45], "avg": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 21, 25, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], 
"averag": [4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "fals": [4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 28, 29], "dir": [4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "bpe": [4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 28, 31, 33, 34, 43, 44, 45], "lang_bpe_500": [4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 28, 31, 33, 34, 43, 44, 45], "max": [4, 6, 7, 14, 16, 17, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "durat": [4, 6, 7, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "600": [4, 6, 7, 14, 28, 31, 33, 43, 44, 45], "chunk": [4, 6, 7, 16, 18, 19, 44, 45], "len": [4, 6, 7, 18, 19, 45], "32": [4, 6, 7, 16, 17, 18, 19, 23, 25, 26, 45], "method": [4, 7, 11, 13, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 43, 44, 45], "modified_beam_search": [4, 6, 7, 11, 25, 29, 31, 33, 43, 44, 45], "clean": [4, 13, 18, 23, 25, 28, 29, 31, 32, 33, 34, 43, 44, 45], "beam_size_4": [4, 6, 7], "11": [4, 6, 7, 8, 13, 16, 17, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "best": [4, 6, 7, 16, 17, 18, 23, 26, 28], "7": [4, 6, 7, 13, 14, 15, 18, 22, 23, 26, 28, 31, 32, 36, 37, 43, 44], "93": [4, 6, 7], "Then": [4, 6], "necessari": [4, 29], "note": [4, 6, 7, 8, 14, 16, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "960": [4, 28, 31, 33, 34, 43, 44, 45], "hour": [4, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "ezerhouni": [4, 6, 7], "popd": [4, 6, 7, 19], "marcoyang": [4, 6], "librispeech_bigram": [4, 6], "2gram": [4, 6], "fst": [4, 13, 25, 39], "modified_beam_search_lm_lodr": 4, "lm_dir": [4, 6, 7, 13, 28], "lm_scale": [4, 6, 7], "42": [4, 13, 17, 23, 28, 39], "lodr_scal": 4, "24": [4, 8, 13, 16, 17, 26, 32, 36, 37, 39], "scale": [4, 6, 7, 16, 17, 23, 28, 29, 32, 34, 36, 37], "embed": [4, 6, 7, 25, 31, 43, 44, 45], "dim": [4, 6, 7, 16, 17, 18, 25, 31, 44], "2048": [4, 6, 7, 14, 16, 17, 18, 25], "hidden": [4, 6, 7, 17, 43], "num": [4, 6, 7, 16, 17, 18, 23, 25, 26, 28, 
29, 31, 33, 34, 43, 44, 45], "layer": [4, 6, 7, 16, 17, 18, 25, 29, 31, 41, 43, 44, 45], "vocab": [4, 6, 7, 28], "500": [4, 6, 7, 13, 14, 16, 17, 18, 25, 28, 34, 43], "token": [4, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "ngram": [4, 28, 32, 36, 37], "2": [4, 6, 7, 12, 14, 15, 22, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "extra": [4, 16, 17, 18, 25, 41, 44], "argument": [4, 7, 29, 41], "need": [4, 6, 11, 13, 14, 15, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "given": [4, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 44, 45], "specifi": [4, 7, 8, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "neg": [4, 25], "number": [4, 7, 11, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "obtain": [4, 7, 23, 25, 26, 28, 32, 36, 37], "shown": [4, 7], "below": [4, 7, 13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44], "61": [4, 6], "6": [4, 6, 7, 8, 13, 15, 22, 23, 25, 28, 31, 32, 36, 37, 43], "74": [4, 6, 13, 14], "recal": 4, "lowest": [4, 31, 33, 34, 43, 44, 45], "77": [4, 6, 7, 13, 28], "08": [4, 6, 7, 13, 18, 28, 32, 34, 36, 37, 39, 43], "inde": 4, "even": [4, 11, 13, 17], "better": [4, 6], "increas": [4, 6, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "8": [4, 6, 7, 8, 13, 14, 16, 17, 18, 23, 25, 28, 29, 31, 32, 33, 34, 39, 43, 44, 45], "45": [4, 6, 13, 16, 18, 23, 25, 28], "38": [4, 6, 16, 23, 25, 28, 36], "23": [4, 6, 8, 13, 16, 17, 18, 23, 25, 26, 28, 36, 37, 39], "section": [5, 8, 9, 13, 14, 19, 20, 21, 22, 23, 28], "langugag": 5, "transduc": [5, 12, 14, 15, 19, 22, 24, 27, 29, 30, 40, 41, 42], "lodr": [5, 12], "rnn": [5, 6, 7, 12, 17, 25, 31, 33, 43, 44, 45], "commonli": [6, 7, 23, 25, 26, 28, 32, 36, 37, 39], "approach": 6, "incorpor": 6, "unlik": 6, "re": [6, 8, 23, 26, 28, 29, 31, 33, 34, 41, 43, 44, 45], "rank": 6, "hypothes": 6, "search": [6, 7, 10, 11], "more": [6, 13, 16, 17, 18, 23, 28, 29, 39, 41, 43, 44], "effici": [6, 7, 31, 44, 45], "than": [6, 
14, 17, 23, 25, 26, 28, 31, 32, 33, 34, 39, 43, 44, 45], "sinc": [6, 16, 17, 18, 29, 39, 43], "less": [6, 14, 28, 32, 39, 44, 45], "comput": [6, 13, 14, 16, 17, 18, 23, 25, 26, 29, 31, 32, 34, 36, 37, 39, 43, 44, 45], "gpu": [6, 7, 13, 16, 17, 23, 25, 26, 28, 29, 31, 33, 34, 36, 37, 39, 43, 44, 45], "try": [6, 8, 9, 11, 29, 31, 33, 34, 43, 44, 45], "might": [6, 7, 17, 18, 44, 45], "ideal": [6, 7], "mai": [6, 7, 13, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45, 46], "also": [6, 7, 9, 10, 13, 14, 15, 16, 17, 18, 19, 21, 23, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44, 45], "With": 6, "rnnlm": 6, "avail": [6, 13, 14, 16, 17, 18, 23, 25, 28, 32, 36, 37, 39, 43], "modified_beam_search_lm_rescor": 6, "43": [6, 17, 18, 28], "great": 6, "made": [6, 16], "boost": [6, 7], "tabl": [6, 11, 16, 17, 18], "67": [6, 13], "59": [6, 13, 16, 26, 28], "86": 6, "fact": 6, "arpa": [6, 39], "performn": 6, "modified_beam_search_lm_rescore_lodr": 6, "depend": [6, 13, 23, 28], "kenlm": 6, "kpu": 6, "archiv": 6, "zip": 6, "execut": [6, 7, 16, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "9": [6, 13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 39, 43, 44, 45], "57": [6, 13, 17, 28, 32], "slightli": 6, "63": [6, 13, 25], "04": [6, 13, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37], "52": [6, 13, 23, 28], "73": [6, 13], "mention": 6, "earlier": 6, "benchmark": [6, 25], "speed": [6, 16, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "132": [6, 13], "95": [6, 24], "177": [6, 14, 17, 18, 25, 26, 28], "96": [6, 13], "210": [6, 36, 37], "modified_beam_search_lm_shallow_fus": [6, 7], "262": [6, 7], "62": [6, 7, 13, 28, 32], "65": [6, 7, 13, 16], "352": [6, 7, 13, 28], "58": [6, 7, 8, 13, 28], "488": [6, 7, 16, 17, 18], "400": [6, 24], "610": [6, 13], "870": 6, "156": 6, "203": [6, 14, 28], "255": [6, 17, 18], "160": 6, "263": [6, 13, 17], "singl": [6, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "32g": 6, "v100": [6, 23, 25, 26, 28], "vari": 6, "word": [7, 23, 25, 26, 28, 32, 36, 37, 
39], "error": [7, 8, 13, 16, 17, 18, 28], "rate": [7, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "These": [7, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "alreadi": [7, 13, 14], "But": [7, 16, 31, 33, 34, 43, 44, 45], "long": [7, 16], "true": [7, 13, 14, 16, 17, 18, 23, 25, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "either": [7, 11, 23, 25, 26, 28, 44, 45], "choos": [7, 11, 13, 29, 31, 33, 34, 43, 44, 45], "three": [7, 16, 17, 18, 21, 23, 25, 41], "associ": 7, "dimens": [7, 31, 44, 45], "obviou": 7, "rel": 7, "reduct": [7, 13, 16, 17, 33], "around": 7, "A": [7, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 43, 44, 45], "few": [7, 16, 17, 18, 29], "paramet": [7, 14, 16, 17, 18, 20, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 43, 44, 45], "tune": [7, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "control": [7, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "too": 7, "small": [7, 25, 36, 37, 39], "fulli": 7, "util": [7, 8, 13, 28], "larg": 7, "domin": 7, "bad": 7, "typic": [7, 23, 25, 26, 28], "valu": [7, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "activ": [7, 11], "path": [7, 11, 13, 14, 16, 17, 18, 21, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "trade": 7, "off": [7, 16], "accuraci": [7, 16, 17, 24], "larger": [7, 17, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "slower": 7, "collect": [8, 13], "user": [8, 13], "post": 8, "correspond": [8, 10, 11], "solut": 8, "One": 8, "torch": [8, 12, 13, 14, 15, 22, 23, 25, 28], "torchaudio": [8, 12, 41], "cu111": 8, "torchvis": 8, "f": [8, 13, 36, 37], "org": [8, 13, 24, 25, 31, 43, 44, 45], "whl": [8, 13], "torch_stabl": [8, 13], "throw": [8, 16, 17, 18], "cuda": [8, 12, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 43, 44, 45], "while": [8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "That": [8, 16, 17, 29, 31, 43, 44, 45], "cu11": 8, "correct": 8, "traceback": 8, "most": [8, 44, 45], "recent": [8, 16, 17, 18], "last": 8, "line": 
[8, 13, 16, 17, 18, 31, 44, 45], "14": [8, 13, 14, 16, 17, 20, 23, 28, 31, 32, 33, 36, 43, 44, 45], "yesnoasrdatamodul": 8, "home": [8, 16, 17, 23, 28], "xxx": [8, 14, 16, 17, 18], "next": [8, 11, 13, 16, 17, 18, 28, 29, 31, 32, 33, 34, 43, 44, 45], "gen": [8, 11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "kaldi": [8, 11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "34": [8, 13, 16, 17], "datamodul": 8, "__init__": [8, 13, 14, 16, 17, 18, 23, 25, 28], "add_eo": 8, "add_so": 8, "get_text": 8, "39": [8, 13, 16, 18, 25, 28, 32, 36], "tensorboard": [8, 13, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "summarywrit": 8, "miniconda3": 8, "env": 8, "yyi": 8, "lib": [8, 13, 18], "site": [8, 13, 18], "packag": [8, 13, 18], "loosevers": 8, "uninstal": 8, "setuptool": [8, 13], "conda": [8, 13], "dev": [8, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "yangyifan": 8, "anaconda3": 8, "dev20230112": 8, "cuda11": [8, 13], "torch1": [8, 13], "13": [8, 13, 14, 16, 17, 18, 25, 26, 28, 32, 33, 36], "py3": [8, 13], "linux": [8, 11, 13, 15, 16, 17, 18, 19], "x86_64": [8, 13, 16], "egg": [8, 13], "_k2": [8, 13], "determinizeweightpushingtyp": 8, "handl": [8, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "except": [8, 14], "anoth": 8, "occur": 8, "pruned_transducer_stateless7_ctc_b": [8, 33], "104": 8, "30": [8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "rais": 8, "anaconda": 8, "maco": [8, 11, 15, 16, 17, 18, 19], "probabl": [8, 13, 25, 31, 33, 43, 44, 45], "variabl": [8, 13, 16, 17, 18, 23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "export": [8, 12, 13, 23, 25, 26, 28, 29, 32, 36, 37, 39], "dyld_library_path": 8, "conda_prefix": 8, "find": [8, 9, 10, 11, 13, 14, 16, 17, 18, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "locat": [8, 16], "libpython": 8, "abl": 8, "insid": [8, 21], "codna_prefix": 8, "ld_library_path": 8, "within": [9, 11, 16, 17], "anyth": [9, 11], "youtub": [9, 12, 28, 29, 31, 32, 33, 34, 
43, 44, 45], "video": [9, 12, 28, 29, 31, 32, 33, 34, 43, 44, 45], "upload": [10, 11, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "visit": [10, 11, 31, 33, 34, 43, 44, 45], "specif": [10, 19, 25], "aishel": [10, 12, 23, 25, 26, 27, 46], "wenetspeech": [10, 20], "framework": [11, 31, 44], "sherpa": [11, 15, 20, 21, 22, 43], "window": [11, 15, 16, 17, 18, 19], "ipad": 11, "phone": 11, "start": [11, 13, 14, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "recognit": [11, 12, 15, 16, 17, 24, 25, 39, 46], "screenshot": [11, 23, 25, 26, 28, 29, 31, 39, 43, 44], "select": [11, 16, 17, 18, 31, 32, 36, 37, 39, 43, 44, 45], "current": [11, 16, 17, 25, 29, 41, 43, 44, 45, 46], "chines": [11, 24, 25], "english": [11, 39, 43], "greedi": 11, "record": [11, 17, 18, 23, 24, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "click": [11, 13, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "button": 11, "submit": 11, "wait": 11, "moment": 11, "bottom": [11, 31, 33, 34, 43, 44, 45], "part": [11, 13, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "one": [11, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "subscrib": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "channel": [11, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "nadira": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "povei": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "www": [11, 13, 24, 28, 29, 31, 32, 33, 34, 43, 44, 45], "uc_vaumpkminz1pnkfxan9mw": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "toolkit": 12, "cudnn": 12, "frequent": 12, "ask": 12, "question": 12, "faq": 12, "oserror": 12, "libtorch_hip": 12, "cannot": [12, 16, 17, 18], "share": [12, 13], "object": [12, 13, 23, 25, 26, 31, 39, 43, 44], "attributeerror": 12, "modul": [12, 13, 16, 18, 33, 44], "distutil": 12, "attribut": [12, 18, 28], "importerror": 12, "libpython3": 12, "No": [12, 16, 17, 18, 39], "state_dict": [12, 22, 23, 25, 26, 28, 32, 36, 37, 39], "jit": [12, 
15, 22, 28], "trace": [12, 15, 20, 22], "onnx": [12, 14, 22], "ncnn": [12, 22], "non": [12, 28, 41, 44, 46], "timit": [12, 27, 36, 37, 46], "introduct": [12, 40, 46], "contribut": 12, "who": 13, "about": [13, 16, 17, 18, 25, 29, 31, 34, 43, 44, 45], "suggest": [13, 31, 33, 34, 43, 44, 45], "virut": 13, "venv": 13, "my_env": 13, "bin": [13, 16, 17, 18, 23, 28], "matter": [13, 16], "compil": [13, 16, 17, 23, 25, 28], "wheel": [13, 16], "don": [13, 16, 17, 18, 20, 23, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "t": [13, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "from_sourc": 13, "for_develop": 13, "alwai": [13, 14], "strongli": 13, "pythonpath": [13, 16, 17, 18], "point": [13, 14, 23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "folder": [13, 14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "tmp": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "setup": [13, 16, 23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 44, 45], "put": [13, 16, 17, 33, 44], "sever": [13, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "switch": [13, 23, 28, 34], "just": [13, 16, 17, 18, 41], "virtualenv": 13, "cpython3": 13, "final": [13, 14, 16, 17, 28, 32], "64": [13, 14, 16, 25, 44], "1540m": 13, "creator": 13, "cpython3posix": 13, "dest": 13, "ceph": [13, 14, 23, 25, 28], "fj": [13, 14, 16, 17, 18, 25, 28], "fangjun": [13, 14, 16, 17, 18, 25, 28], "clear": 13, "no_vcs_ignor": 13, "global": 13, "seeder": 13, "fromappdata": 13, "bundl": 13, "copi": [13, 41], "app_data_dir": 13, "root": [13, 16, 17, 18], "v": [13, 16, 17, 18, 28, 36, 37], "irtualenv": 13, "ad": [13, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44, 45], "seed": 13, "21": [13, 14, 16, 23, 25, 28, 36, 37], "36": [13, 16, 25, 28, 29], "bashactiv": 13, "cshellactiv": 13, "fishactiv": 13, "powershellactiv": 13, "pythonactiv": 13, "xonshactiv": 13, "dev20210822": 13, "cpu": [13, 14, 16, 17, 18, 20, 23, 31, 33, 34, 39, 44, 45], "nightli": 13, 
"2bcpu": 13, "cp38": 13, "linux_x86_64": 13, "mb": [13, 16, 17, 18], "________________________________": 13, "185": [13, 23, 28, 39], "kb": [13, 16, 17, 18, 36, 37], "graphviz": 13, "17": [13, 14, 16, 17, 18, 23, 28, 36, 37, 43], "none": [13, 23, 28], "18": [13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 36, 37, 43, 44, 45], "cach": [13, 18], "manylinux1_x86_64": 13, "831": [13, 25, 37], "extens": 13, "typing_extens": 13, "26": [13, 16, 17, 18, 25, 28, 37], "successfulli": [13, 16, 17, 18], "req": 13, "7b1b76ge": 13, "q": 13, "audioread": 13, "soundfil": 13, "post1": 13, "py2": 13, "97": [13, 16, 23], "cytoolz": 13, "manylinux_2_17_x86_64": 13, "manylinux2014_x86_64": 13, "dataclass": 13, "h5py": 13, "manylinux_2_12_x86_64": 13, "manylinux2010_x86_64": 13, "684": [13, 23, 39], "intervaltre": 13, "lilcom": 13, "numpi": 13, "15": [13, 14, 16, 17, 18, 25, 26, 28, 36, 39], "40": [13, 16, 17, 18, 26, 28, 32, 36, 37], "pyyaml": 13, "662": 13, "tqdm": 13, "76": [13, 39], "satisfi": 13, "2a1410b": 13, "toolz": 13, "55": [13, 16, 26, 28, 36], "sortedcontain": 13, "cffi": 13, "411": [13, 18, 28], "pycpars": 13, "20": [13, 14, 16, 18, 23, 25, 26, 28, 31, 32, 36, 37, 39, 44], "112": [13, 16, 17, 18], "pypars": 13, "filenam": [13, 16, 17, 18, 19, 20, 21, 33, 34, 43, 45], "dev_2a1410b_clean": 13, "size": [13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "342242": 13, "sha256": 13, "f683444afa4dc0881133206b4646a": 13, "9d0f774224cc84000f55d0a67f6e4a37997": 13, "store": [13, 28], "ephem": 13, "ftu0qysz": 13, "7f": 13, "7a": 13, "8e": 13, "a0bf241336e2e3cb573e1e21e5600952d49f5162454f2e612f": 13, "warn": 13, "built": 13, "invalid": [13, 28], "metadata": [13, 36, 37], "mandat": 13, "pep": 13, "440": 13, "packa": 13, "ging": 13, "deprec": [13, 25], "legaci": 13, "becaus": 13, "could": [13, 16, 17, 18, 23, 26], "replac": [13, 16, 17], "discuss": 13, "regard": 13, "pypa": 13, "sue": 13, "8368": 13, "inter": 13, "valtre": 13, "sor": 13, "tedcontain": 13, "remot": 
13, "enumer": 13, "count": 13, "100": [13, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "compress": 13, "308": [13, 23, 25, 26], "total": [13, 17, 18, 23, 25, 26, 28, 29, 31, 32, 39, 43, 44], "delta": 13, "reus": 13, "307": 13, "102": [13, 18, 23], "pack": [13, 44, 45], "receiv": 13, "172": 13, "49": [13, 16, 17, 28, 37, 39], "kib": 13, "385": 13, "00": [13, 16, 23, 25, 26, 28, 32, 36, 37, 39], "resolv": 13, "kaldilm": 13, "tar": 13, "gz": 13, "48": [13, 16, 17, 23, 25], "574": 13, "kaldialign": 13, "sentencepiec": [13, 28], "41": [13, 16, 18, 23, 25, 36, 39], "absl": 13, "absl_pi": 13, "googl": [13, 31, 33, 34, 43, 44, 45], "auth": 13, "oauthlib": 13, "google_auth_oauthlib": 13, "grpcio": 13, "ment": 13, "requi": 13, "rement": 13, "protobuf": 13, "manylinux_2_5_x86_64": 13, "werkzeug": 13, "288": 13, "tensorboard_data_serv": 13, "google_auth": 13, "35": [13, 14, 16, 17, 18, 25, 28, 43], "152": 13, "request": [13, 41], "plugin": 13, "wit": 13, "tensorboard_plugin_wit": 13, "781": 13, "markdown": 13, "six": 13, "16": [13, 14, 16, 17, 18, 21, 23, 25, 26, 28, 31, 32, 36, 37, 39, 43, 44, 45], "cachetool": 13, "rsa": 13, "pyasn1": 13, "pyasn1_modul": 13, "155": 13, "requests_oauthlib": 13, "urllib3": 13, "27": [13, 16, 17, 18, 23, 25, 32, 37], "138": [13, 23, 25], "certifi": 13, "2017": 13, "2021": [13, 23, 26, 28, 32, 36, 37, 39], "145": 13, "charset": 13, "normal": [13, 32, 36, 37, 39, 44], "charset_norm": 13, "idna": 13, "146": 13, "897233": 13, "eccb906cafcd45bf9a7e1a1718e4534254bfb": 13, "f4c0d0cbc66eee6c88d68a63862": 13, "85": 13, "7d": 13, "f2dd586369b8797cb36d213bf3a84a789eeb92db93d2e723c9": 13, "etool": 13, "oaut": 13, "hlib": 13, "2023": [13, 16, 17, 18, 33], "05": [13, 14, 16, 17, 23, 25, 26, 28, 37], "main": [13, 23, 28, 41], "dl_dir": [13, 23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "waves_yesno": 13, "_______________________________________________________________": 13, "70m": 13, "06": [13, 14, 16, 26, 28, 32, 39], "54": [13, 17, 18, 28, 32, 36, 37], "4kb": 
13, "02": [13, 14, 16, 17, 18, 25, 28, 31, 37, 43, 44], "19": [13, 14, 16, 17, 18, 23, 28, 32, 36, 37], "manifest": [13, 29], "fbank": [13, 14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39], "199": [13, 28, 32], "info": [13, 14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39], "compute_fbank_yesno": 13, "process": [13, 14, 16, 17, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "extract": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "featur": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "90": [13, 16], "212": 13, "60it": 13, "640": [13, 18], "304": [13, 17], "53it": 13, "51": [13, 16, 23, 28, 39], "lang": [13, 14, 25, 28, 34], "66": [13, 17], "project": 13, "csrc": [13, 28], "arpa_file_pars": 13, "cc": 13, "void": 13, "arpafilepars": 13, "std": 13, "istream": 13, "79": 13, "140": [13, 26], "92": [13, 28], "hlg": [13, 32, 36, 37, 39], "28": [13, 16, 17, 25, 28, 32], "581": [13, 16, 32], "compile_hlg": 13, "124": [13, 23, 28], "lang_phon": [13, 26, 32, 36, 37, 39], "582": 13, "lexicon": [13, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "171": [13, 26, 28, 36, 37], "convert": [13, 16, 17, 18, 28], "l": [13, 16, 17, 18, 25, 36, 37, 39], "linv": [13, 25, 28, 39], "609": 13, "ctc_topo": 13, "max_token_id": 13, "611": 13, "intersect": [13, 31, 44, 45], "613": 13, "lg": [13, 31, 34, 44, 45], "shape": [13, 18], "connect": [13, 14, 28, 31, 32, 43, 44, 45], "614": 13, "68": [13, 28], "70": 13, "class": [13, 28], "tensor": [13, 17, 18, 23, 25, 26, 28, 31, 39, 43, 44], "71": [13, 28, 32], "determin": 13, "615": 13, "rag": 13, "raggedtensor": 13, "remov": [13, 23, 25, 26, 28, 32, 36, 37], "disambigu": 13, "616": 13, "91": 13, "remove_epsilon": 13, "617": 13, "arc": 13, "compos": 13, "h": 13, "619": 13, "106": [13, 17, 28], "109": [13, 23, 28], "111": [13, 28], "127": [13, 16, 17, 39], "cuda_visible_devic": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "case": [13, 14, 16, 17, 18, 31, 33, 34, 43, 44, 45], "segment": 13, 
"fault": 13, "core": 13, "dump": 13, "protocol_buffers_python_implement": 13, "674": 13, "interest": [13, 29, 31, 33, 34, 43, 44, 45], "759": [13, 25], "481": 13, "482": 13, "posixpath": [13, 16, 17, 18, 25, 28], "lang_dir": [13, 25, 28], "lr": [13, 25, 43], "01": [13, 16, 25, 26, 28, 29, 33], "feature_dim": [13, 14, 16, 17, 18, 23, 25, 28, 39], "weight_decai": 13, "1e": 13, "start_epoch": 13, "best_train_loss": [13, 14, 16, 17, 18], "inf": [13, 14, 16, 17, 18], "best_valid_loss": [13, 14, 16, 17, 18], "best_train_epoch": [13, 14, 16, 17, 18], "best_valid_epoch": [13, 14, 17, 18], "batch_idx_train": [13, 14, 16, 17, 18], "log_interv": [13, 14, 16, 17, 18], "reset_interv": [13, 14, 16, 17, 18], "valid_interv": [13, 14, 16, 17, 18], "beam_siz": [13, 14, 25], "sum": 13, "use_double_scor": [13, 23, 28, 39], "world_siz": [13, 29], "master_port": 13, "12354": 13, "num_epoch": 13, "feature_dir": [13, 28], "max_dur": [13, 28], "bucketing_sampl": [13, 28], "num_bucket": [13, 28], "concatenate_cut": [13, 28], "duration_factor": [13, 28], "gap": [13, 28], "on_the_fly_feat": [13, 28], "shuffl": [13, 28], "return_cut": [13, 28], "num_work": [13, 28], "env_info": [13, 14, 16, 17, 18, 23, 25, 28], "releas": [13, 14, 16, 17, 18, 23, 25, 28], "sha1": [13, 14, 16, 17, 18, 23, 25, 28], "3b7f09fa35e72589914f67089c0da9f196a92ca4": 13, "date": [13, 14, 16, 17, 18, 23, 25, 28], "mon": [13, 17, 18], "6fcfced": 13, "cu118": 13, "branch": [13, 14, 16, 17, 18, 23, 25, 28, 33], "30bde4b": 13, "thu": [13, 14, 16, 17, 18, 25, 28, 32], "37": [13, 17, 23, 25, 28, 36], "47": [13, 16, 17, 18, 23, 28], "dev20230512": 13, "torch2": 13, "hostnam": [13, 14, 16, 17, 18, 25], "host": [13, 14], "ip": [13, 14, 16, 17, 18, 25], "761": 13, "168": [13, 32], "764": 13, "495": 13, "devic": [13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 44, 45], "791": [13, 32], "cut": [13, 28], "244": 13, "852": 13, "149": [13, 16, 28], "singlecutsampl": 13, "205": [13, 28], "853": 13, "218": [13, 17], "252": 
13, "986": 13, "422": 13, "batch": [13, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "loss": [13, 16, 17, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "065": 13, "over": [13, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "2436": 13, "frame": [13, 25, 31, 33, 44, 45], "tot_loss": 13, "4561": 13, "2828": 13, "7076": 13, "22192": 13, "691": 13, "444": 13, "9002": 13, "18067": 13, "996": 13, "2555": 13, "2695": 13, "484": 13, "34971": 13, "217": [13, 23, 28], "4688": 13, "251": [13, 36, 37], "75": [13, 16], "389": [13, 26, 28], "2532": 13, "637": 13, "1139": 13, "1592": 13, "859": 13, "1629": 13, "094": 13, "0767": 13, "118": [13, 28], "350": 13, "06778": 13, "395": 13, "789": 13, "01056": 13, "016": 13, "009022": 13, "009985": 13, "271": [13, 14, 17], "01088": 13, "497": 13, "01174": 13, "01077": 13, "747": 13, "01087": 13, "783": 13, "921": 13, "01045": 13, "008957": 13, "009903": 13, "374": 13, "01092": 13, "598": [13, 28], "01169": 13, "01065": 13, "824": 13, "862": [13, 17], "865": [13, 17], "555": 13, "483": 13, "264": [13, 18], "search_beam": [13, 23, 28, 39], "output_beam": [13, 23, 28, 39], "min_active_st": [13, 23, 28, 39], "max_active_st": [13, 23, 28, 39], "10000": [13, 23, 28, 39], "487": 13, "273": [13, 14, 25], "513": 13, "291": 13, "521": 13, "675": 13, "204": [13, 18, 28], "until": [13, 28, 33], "923": 13, "241": [13, 23], "transcript": [13, 23, 24, 25, 26, 28, 31, 32, 36, 37, 43, 44, 45], "recog": [13, 25, 28], "test_set": [13, 39], "924": 13, "558": 13, "240": [13, 23, 39], "ins": [13, 28, 39], "del": [13, 28, 39], "sub": [13, 28, 39], "925": 13, "249": [13, 17], "wrote": [13, 28], "detail": [13, 15, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "stat": [13, 28], "err": [13, 25, 28], "316": [13, 28], "congratul": [13, 16, 17, 18, 23, 26, 28, 32, 36, 37, 39], "fun": [13, 16, 17], "debug": 13, "variou": [13, 19, 22, 46], "period": [14, 16], "disk": 14, "optim": [14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 
44, 45], "relat": [14, 23, 25, 28, 32, 36, 37, 39], "resum": [14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "strip": 14, "reduc": [14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "each": [14, 16, 17, 19, 23, 25, 26, 28, 31, 33, 34, 41, 43, 44, 45], "well": [14, 39, 46], "usag": [14, 16, 17, 18, 20, 21, 32, 36, 37, 39], "pruned_transducer_stateless3": [14, 20, 41], "almost": [14, 31, 41, 44, 45], "dict": [14, 18], "csukuangfj": [14, 16, 17, 19, 23, 25, 26, 28, 32, 36, 37, 39, 43], "stateless3": [14, 16], "repo": [14, 19], "prefix": 14, "those": 14, "wave": [14, 16, 17, 18, 23, 28], "iter": [14, 16, 17, 18, 21, 31, 33, 34, 43, 44, 45], "1224000": 14, "greedy_search": [14, 25, 31, 33, 43, 44, 45], "test_wav": [14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "1089": [14, 16, 17, 18, 19, 28, 32], "134686": [14, 16, 17, 18, 19, 28, 32], "0001": [14, 16, 17, 18, 19, 28, 32], "wav": [14, 16, 17, 18, 19, 21, 23, 25, 26, 28, 31, 33, 34, 36, 37, 39, 43, 44, 45], "1221": [14, 16, 17, 28, 32], "135766": [14, 16, 17, 28, 32], "0002": [14, 16, 17, 28, 32], "multipl": [14, 23, 25, 26, 28, 32, 36, 37, 39], "sound": [14, 16, 17, 18, 21, 22, 23, 25, 26, 28, 32, 36, 37, 39], "Its": [14, 16, 17, 18, 28], "output": [14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "09": [14, 17, 23, 25, 26, 28, 43], "233": [14, 16, 17], "265": 14, "50": [14, 16, 17, 18, 28, 31, 36, 43, 44, 45], "200": [14, 16, 17, 18, 23, 28, 29, 36, 37, 39], "3000": [14, 16, 17, 18], "80": [14, 16, 17, 18, 23, 25, 28], "subsampling_factor": [14, 17, 18, 23, 25, 28], "encoder_dim": [14, 16, 17, 18], "512": [14, 16, 17, 18, 23, 25, 28], "nhead": [14, 16, 18, 23, 25, 28, 31, 44], "dim_feedforward": [14, 16, 17, 25], "num_encoder_lay": [14, 16, 17, 18, 25], "decoder_dim": [14, 16, 17, 18], "joiner_dim": [14, 16, 17, 18], "model_warm_step": [14, 16, 17], "4810e00d8738f1a21278b0156a42ff396a2d40ac": 14, "fri": 14, "oct": [14, 28], "03": [14, 17, 25, 28, 36, 37, 
43], "miss": [14, 16, 17, 18, 25, 28], "cu102": [14, 16, 17, 18], "1013": 14, "c39cba5": 14, "dirti": [14, 16, 17, 23, 28], "jsonl": 14, "de": [14, 16, 17, 18, 25], "74279": [14, 16, 17, 18, 25], "0324160024": 14, "65bfd8b584": 14, "jjlbn": 14, "bpe_model": [14, 16, 17, 18, 28], "sound_fil": [14, 23, 25, 28, 39], "sample_r": [14, 23, 25, 28, 39], "16000": [14, 23, 25, 26, 28, 32, 33, 36, 37], "beam": [14, 43], "max_context": 14, "max_stat": 14, "context_s": [14, 16, 17, 18, 25], "max_sym_per_fram": [14, 25], "simulate_stream": 14, "decode_chunk_s": 14, "left_context": 14, "dynamic_chunk_train": 14, "causal_convolut": 14, "short_chunk_s": [14, 18, 44, 45], "25": [14, 16, 17, 23, 28, 31, 36, 37, 39, 44], "num_left_chunk": [14, 18], "blank_id": [14, 16, 17, 18, 25], "unk_id": 14, "vocab_s": [14, 16, 17, 18, 25], "612": 14, "458": 14, "disabl": [14, 16, 17], "giga": [14, 17, 43], "623": 14, "277": 14, "78648040": 14, "951": [14, 28], "285": [14, 25, 28], "construct": [14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39], "952": 14, "295": [14, 23, 25, 26, 28], "957": 14, "301": [14, 28], "700": 14, "329": [14, 17, 28], "912": 14, "388": 14, "earli": [14, 16, 17, 18, 28, 32], "nightfal": [14, 16, 17, 18, 28, 32], "THE": [14, 16, 17, 18, 28, 32], "yellow": [14, 16, 17, 18, 28, 32], "lamp": [14, 16, 17, 18, 28, 32], "light": [14, 16, 17, 18, 28, 32], "AND": [14, 16, 17, 18, 28, 32], "THERE": [14, 16, 17, 18, 28, 32], "squalid": [14, 16, 17, 18, 28, 32], "quarter": [14, 16, 17, 18, 28, 32], "OF": [14, 16, 17, 18, 28, 32], "brothel": [14, 16, 17, 18, 28, 32], "god": [14, 28, 32], "AS": [14, 28, 32], "direct": [14, 28, 32], "consequ": [14, 28, 32], "sin": [14, 28, 32], "man": [14, 28, 32], "punish": [14, 28, 32], "had": [14, 28, 32], "her": [14, 28, 32], "love": [14, 28, 32], "child": [14, 28, 32], "whose": [14, 25, 28, 32], "ON": [14, 16, 28, 32], "THAT": [14, 28, 32], "dishonor": [14, 28, 32], "bosom": [14, 28, 32], "TO": [14, 28, 32], "parent": [14, 28, 32], "forev": [14, 28, 
32], "WITH": [14, 28, 32], "race": [14, 28, 32], "descent": [14, 28, 32], "mortal": [14, 28, 32], "BE": [14, 28, 32], "bless": [14, 28, 32], "soul": [14, 28, 32], "IN": [14, 28, 32], "heaven": [14, 28, 32], "yet": [14, 16, 17, 28, 32], "THESE": [14, 28, 32], "thought": [14, 28, 32], "affect": [14, 28, 32], "hester": [14, 28, 32], "prynn": [14, 28, 32], "hope": [14, 24, 28, 32], "apprehens": [14, 28, 32], "390": 14, "down": [14, 23, 28, 31, 33, 34, 43, 44, 45], "reproduc": [14, 28], "9999": [14, 33, 34, 43], "symlink": 14, "pass": [14, 18, 23, 25, 26, 28, 31, 33, 34, 41, 43, 44, 45], "reason": [14, 16, 17, 18, 44], "support": [15, 16, 17, 18, 23, 25, 28, 31, 33, 34, 41, 43, 44, 45], "zipform": [15, 19, 22, 27, 30, 40, 42], "convemform": [15, 22, 41], "platform": [15, 19], "android": [15, 16, 17, 18, 19], "raspberri": [15, 19], "pi": [15, 19], "\u7231\u82af\u6d3e": 15, "maix": 15, "iii": 15, "axera": 15, "rv1126": 15, "static": 15, "produc": [15, 31, 33, 34, 43, 44, 45], "binari": [15, 16, 17, 18, 23, 25, 26, 28, 31, 39, 43, 44], "everyth": 15, "pnnx": [15, 22], "torchscript": [15, 20, 21, 22], "encod": [15, 19, 21, 22, 23, 25, 26, 28, 31, 32, 33, 39, 41, 43, 44, 45], "option": [15, 19, 22, 25, 29, 32, 36, 37, 39], "int8": [15, 22], "quantiz": [15, 22, 29], "conv": [16, 17], "emform": [16, 17, 20], "stateless2": [16, 17, 43], "07": [16, 17, 18, 23, 25, 26, 28], "ubuntu": [16, 17, 18], "cpp": [16, 20], "pretrained_model": [16, 17, 18], "online_transduc": 16, "continu": [16, 17, 18, 19, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "jit_xxx": [16, 17, 18], "anywher": [16, 17], "submodul": 16, "updat": [16, 17, 18], "recurs": 16, "init": 16, "cmake": [16, 17, 23, 28], "dcmake_build_typ": [16, 23, 28], "dncnn_python": 16, "dncnn_build_benchmark": 16, "dncnn_build_exampl": 16, "dncnn_build_tool": 16, "j4": 16, "pwd": 16, "src": [16, 18], "compon": [16, 41], "ncnn2int8": [16, 17], "our": [16, 17, 18, 20, 21, 28, 29, 31, 41, 44, 45], "cpython": 16, "gnu": 16, "am": 16, "sai": 
[16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "doe": [16, 17, 18, 23, 25, 28, 39], "later": [16, 17, 18, 23, 26, 28, 31, 32, 33, 34, 36, 37, 43, 44, 45], "termin": 16, "tencent": [16, 17], "modif": [16, 25], "offic": 16, "synchron": 16, "offici": 16, "renam": [16, 17, 18], "conv_emformer_transducer_stateless2": [16, 41], "length": [16, 18, 25, 44, 45], "cnn": [16, 18], "kernel": [16, 18, 25], "31": [16, 17, 18, 28], "context": [16, 25, 31, 41, 43, 44, 45], "memori": [16, 23, 25, 28, 41], "configur": [16, 18, 25, 29, 32, 36, 37, 39], "accordingli": [16, 17, 18], "yourself": [16, 17, 18, 29, 44, 45], "combin": [16, 17, 18], "677": 16, "220": [16, 25, 26, 28], "681": 16, "229": [16, 23], "best_v": 16, "alid_epoch": 16, "subsampl": [16, 44, 45], "ing_factor": 16, "a34171ed85605b0926eebbd0463d059431f4f74a": 16, "wed": [16, 23, 25, 28], "dec": 16, "ver": 16, "ion": 16, "530e8a1": 16, "tue": [16, 28], "star": [16, 17, 18], "op": 16, "1220120619": [16, 17, 18], "7695ff496b": [16, 17, 18], "s9n4w": [16, 17, 18], "icefa": 16, "ll": 16, "transdu": 16, "cer": 16, "use_averaged_model": [16, 17, 18], "cnn_module_kernel": [16, 18], "left_context_length": 16, "chunk_length": 16, "right_context_length": 16, "memory_s": 16, "231": [16, 17, 18], "053": 16, "022": 16, "708": [16, 23, 25, 28, 39], "315": [16, 23, 25, 26, 28, 32], "75490012": 16, "318": [16, 17], "320": [16, 25], "682": 16, "lh": [16, 17, 18], "rw": [16, 17, 18], "kuangfangjun": [16, 17, 18], "289m": 16, "jan": [16, 17, 18], "289": 16, "roughli": [16, 17, 18], "equal": [16, 17, 18, 44, 45], "1024": [16, 17, 18, 43], "287": [16, 39], "1010k": [16, 17], "decoder_jit_trac": [16, 17, 18, 21, 43, 45], "283m": 16, "encoder_jit_trac": [16, 17, 18, 21, 43, 45], "0m": [16, 17], "joiner_jit_trac": [16, 17, 18, 21, 43, 45], "sure": [16, 17, 18], "found": [16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "param": [16, 17, 18], "503k": [16, 17], "437": [16, 17, 18], "142m": 16, "79k": 16, "5m": [16, 
17], "architectur": [16, 17, 18, 43], "editor": [16, 17, 18], "content": [16, 17, 18], "283": [16, 18], "1010": [16, 17], "142": [16, 23, 26, 28], "503": [16, 17], "convers": [16, 17, 18], "half": [16, 17, 18, 31, 44, 45], "default": [16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "float32": [16, 17, 18], "float16": [16, 17, 18], "occupi": [16, 17, 18], "byte": [16, 17, 18], "twice": [16, 17, 18], "smaller": [16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "fp16": [16, 17, 18, 31, 33, 34, 43, 44, 45], "won": [16, 17, 18, 19, 23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "accept": [16, 17, 18], "216": [16, 23, 28, 36, 37], "encoder_param_filenam": [16, 17, 18], "encoder_bin_filenam": [16, 17, 18], "decoder_param_filenam": [16, 17, 18], "decoder_bin_filenam": [16, 17, 18], "joiner_param_filenam": [16, 17, 18], "joiner_bin_filenam": [16, 17, 18], "sound_filenam": [16, 17, 18], "141": 16, "328": 16, "151": 16, "331": [16, 17, 28, 32], "176": [16, 25, 28], "336": 16, "106000": [16, 17, 18, 28, 32], "381": 16, "7767517": [16, 17, 18], "1060": 16, "1342": 16, "in0": [16, 17, 18], "explan": [16, 17, 18], "magic": [16, 17, 18], "intermedi": [16, 17, 18], "mean": [16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "increment": [16, 17, 18], "1061": 16, "sherpametadata": [16, 17, 18], "sherpa_meta_data1": [16, 17, 18], "still": [16, 17, 18], "newli": [16, 17, 18], "must": [16, 17, 18, 44], "kei": [16, 17, 18, 28], "eas": [16, 17, 18], "list": [16, 17, 18, 23, 25, 26, 28, 32, 36, 37], "pair": [16, 17, 18], "sad": [16, 17, 18], "rememb": [16, 17, 18], "anymor": [16, 17, 18], "flexibl": [16, 17, 18], "edit": [16, 17, 18], "arm": [16, 17, 18], "aarch64": [16, 17, 18], "onc": [16, 17], "mayb": [16, 17], "year": [16, 17], "_jit_trac": [16, 17], "56": [16, 17, 28, 36], "fp32": [16, 17], "doubl": [16, 17], "j": [16, 17, 23, 28], "py38": [16, 17, 18], "arg": [16, 17], "wave_filenam": [16, 17], "16k": [16, 17], "hz": [16, 17, 36, 37], 
"mono": [16, 17], "calibr": [16, 17], "cat": [16, 17], "eof": [16, 17], "calcul": [16, 17, 33, 44, 45], "has_gpu": [16, 17], "config": [16, 17], "use_vulkan_comput": [16, 17], "88": [16, 25], "conv_87": 16, "942385": [16, 17], "threshold": [16, 17, 33], "938493": 16, "968131": 16, "conv_88": 16, "442448": 16, "549335": 16, "167552": 16, "conv_89": 16, "228289": 16, "001738": 16, "871552": 16, "linear_90": 16, "976146": 16, "101789": 16, "115": [16, 17, 23, 28], "267128": 16, "linear_91": 16, "962030": 16, "162033": 16, "602713": 16, "linear_92": 16, "323041": 16, "853959": 16, "953129": 16, "linear_94": 16, "905416": 16, "648006": 16, "323545": 16, "linear_93": 16, "474093": 16, "200188": 16, "linear_95": 16, "888012": 16, "403563": 16, "483986": 16, "linear_96": 16, "856741": 16, "398679": 16, "524273": 16, "linear_97": 16, "635942": 16, "613655": 16, "590950": 16, "linear_98": 16, "460340": 16, "670146": 16, "398010": 16, "linear_99": 16, "532276": 16, "585537": 16, "119396": 16, "linear_101": 16, "585871": 16, "719224": 16, "205809": 16, "linear_100": 16, "751382": 16, "081648": 16, "linear_102": 16, "593344": 16, "450581": 16, "87": 16, "551147": 16, "linear_103": 16, "592681": 16, "705824": 16, "257959": 16, "linear_104": 16, "752957": 16, "980955": 16, "110489": 16, "linear_105": 16, "696240": 16, "877193": 16, "608953": 16, "linear_106": 16, "059659": 16, "643138": 16, "048950": 16, "linear_108": 16, "975461": 16, "589567": 16, "671457": 16, "linear_107": 16, "190381": 16, "515701": 16, "linear_109": 16, "710759": 16, "305635": 16, "082436": 16, "linear_110": 16, "531228": 16, "731162": 16, "159557": 16, "linear_111": 16, "528083": 16, "259322": 16, "211544": 16, "linear_112": 16, "148807": 16, "500842": 16, "087374": 16, "linear_113": 16, "592566": 16, "948851": 16, "166611": 16, "linear_115": 16, "437109": 16, "608947": 16, "642395": 16, "linear_114": 16, "193942": 16, "503904": 16, "linear_116": 16, "966980": 16, "200896": 16, "676392": 16, "linear_117": 
16, "451303": 16, "061664": 16, "951344": 16, "linear_118": 16, "077262": 16, "965800": 16, "023804": 16, "linear_119": 16, "671615": 16, "847613": 16, "198460": 16, "linear_120": 16, "625638": 16, "131427": 16, "556595": 16, "linear_122": 16, "274080": 16, "888716": 16, "978189": 16, "linear_121": 16, "420480": 16, "429659": 16, "linear_123": 16, "826197": 16, "599617": 16, "281532": 16, "linear_124": 16, "396383": 16, "325849": 16, "335875": 16, "linear_125": 16, "337198": 16, "941410": 16, "221970": 16, "linear_126": 16, "699965": 16, "842878": 16, "224073": 16, "linear_127": 16, "775370": 16, "884215": 16, "696438": 16, "linear_129": 16, "872276": 16, "837319": 16, "254213": 16, "linear_128": 16, "180057": 16, "687883": 16, "linear_130": 16, "150427": 16, "454298": 16, "765789": 16, "linear_131": 16, "112692": 16, "924847": 16, "025545": 16, "linear_132": 16, "852893": 16, "116593": 16, "749626": 16, "linear_133": 16, "517084": 16, "024665": 16, "275314": 16, "linear_134": 16, "683807": 16, "878618": 16, "743618": 16, "linear_136": 16, "421055": 16, "322729": 16, "086264": 16, "linear_135": 16, "309880": 16, "917679": 16, "linear_137": 16, "827781": 16, "744595": 16, "33": [16, 17, 23, 24, 25, 28, 36], "915554": 16, "linear_138": 16, "422395": 16, "742882": 16, "402161": 16, "linear_139": 16, "527538": 16, "866123": 16, "849449": 16, "linear_140": 16, "128619": 16, "657793": 16, "266134": 16, "linear_141": 16, "839593": 16, "845993": 16, "021378": 16, "linear_143": 16, "442304": 16, "099039": 16, "889746": 16, "linear_142": 16, "325038": 16, "849592": 16, "linear_144": 16, "929444": 16, "618206": 16, "605080": 16, "linear_145": 16, "382126": 16, "321095": 16, "625010": 16, "linear_146": 16, "894987": 16, "867645": 16, "836517": 16, "linear_147": 16, "915313": 16, "906028": 16, "886522": 16, "linear_148": 16, "614287": 16, "908151": 16, "496181": 16, "linear_150": 16, "724932": 16, "485588": 16, "312899": 16, "linear_149": 16, "161146": 16, "606939": 16, 
"linear_151": 16, "164453": 16, "847355": 16, "719223": 16, "linear_152": 16, "086471": 16, "984121": 16, "222834": 16, "linear_153": 16, "099524": 16, "991601": 16, "816805": 16, "linear_154": 16, "054585": 16, "489706": 16, "286930": 16, "linear_155": 16, "389185": 16, "100321": 16, "963501": 16, "linear_157": 16, "982999": 16, "154796": 16, "637253": 16, "linear_156": 16, "537706": 16, "875190": 16, "linear_158": 16, "420287": 16, "502287": 16, "531588": 16, "linear_159": 16, "014746": 16, "423280": 16, "477261": 16, "linear_160": 16, "633553": 16, "715335": 16, "220921": 16, "linear_161": 16, "371849": 16, "117830": 16, "815203": 16, "linear_162": 16, "492933": 16, "126283": 16, "623318": 16, "linear_164": 16, "697504": 16, "825712": 16, "317358": 16, "linear_163": 16, "078367": 16, "008038": 16, "linear_165": 16, "023975": 16, "836278": 16, "577358": 16, "linear_166": 16, "860619": 16, "259792": 16, "493614": 16, "linear_167": 16, "380934": 16, "496160": 16, "107042": 16, "linear_168": 16, "691216": 16, "733317": 16, "831076": 16, "linear_169": 16, "723948": 16, "952728": 16, "129707": 16, "linear_171": 16, "034811": 16, "366547": 16, "665123": 16, "linear_170": 16, "356277": 16, "710501": 16, "linear_172": 16, "556884": 16, "729481": 16, "166058": 16, "linear_173": 16, "033039": 16, "207264": 16, "442120": 16, "linear_174": 16, "597379": 16, "658676": 16, "768131": 16, "linear_2": [16, 17], "293503": 16, "305265": 16, "877850": 16, "linear_1": [16, 17], "812222": 16, "766452": 16, "487047": 16, "linear_3": [16, 17], "999999": 16, "999755": 16, "031174": 16, "wish": [16, 17], "955k": 16, "18k": 16, "inparam": [16, 17], "inbin": [16, 17], "outparam": [16, 17], "outbin": [16, 17], "99m": 16, "78k": 16, "774k": [16, 17], "496": [16, 17, 28, 32], "774": [16, 17], "linear": [16, 17, 25], "convolut": [16, 17, 33, 41, 44], "exact": [16, 17], "4x": [16, 17], "comparison": 16, "44": [16, 17, 28, 36, 37], "468000": [17, 21, 43], "lstm_transducer_stateless2": [17, 21, 
43], "222": [17, 26, 28], "is_pnnx": 17, "62e404dd3f3a811d73e424199b3408e309c06e1a": [17, 18], "6d7a559": [17, 18], "feb": [17, 18, 25], "147": [17, 18], "rnn_hidden_s": 17, "aux_layer_period": 17, "235": 17, "239": [17, 25], "472": 17, "595": 17, "324": 17, "83137520": 17, "596": 17, "325": 17, "257024": 17, "326": 17, "781812": 17, "327": 17, "84176356": 17, "182": [17, 18, 23, 32], "158": 17, "183": [17, 36, 37], "335": 17, "101": 17, "tracerwarn": [17, 18], "boolean": [17, 18], "caus": [17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "incorrect": [17, 18, 25], "flow": [17, 18], "constant": [17, 18], "futur": [17, 18, 25, 46], "need_pad": 17, "bool": 17, "259": [17, 23], "180": [17, 23, 28], "339": 17, "207": [17, 26, 28], "84": [17, 23], "324m": 17, "321": [17, 23], "107": [17, 32], "318m": 17, "159m": 17, "21k": 17, "159": [17, 28, 39], "861": 17, "425": [17, 28], "427": [17, 28], "266": [17, 18, 28, 32], "431": 17, "342": 17, "343": 17, "267": [17, 25, 36, 37], "379": 17, "268": [17, 28, 32], "317m": 17, "317": 17, "conv_15": 17, "930708": 17, "972025": 17, "conv_16": 17, "978855": 17, "031788": 17, "456645": 17, "conv_17": 17, "868437": 17, "830528": 17, "218575": 17, "linear_18": 17, "107259": 17, "194808": 17, "293236": 17, "linear_19": 17, "193777": 17, "634748": 17, "401705": 17, "linear_20": 17, "259933": 17, "606617": 17, "722160": 17, "linear_21": 17, "186600": 17, "790260": 17, "512129": 17, "linear_22": 17, "759041": 17, "265832": 17, "050053": 17, "linear_23": 17, "931209": 17, "099090": 17, "979767": 17, "linear_24": 17, "324160": 17, "215561": 17, "321835": 17, "linear_25": 17, "800708": 17, "599352": 17, "284134": 17, "linear_26": 17, "492444": 17, "153369": 17, "274391": 17, "linear_27": 17, "660161": 17, "720994": 17, "46": [17, 23, 28], "674126": 17, "linear_28": 17, "415265": 17, "174434": 17, "007133": 17, "linear_29": 17, "038418": 17, "118534": 17, "724262": 17, "linear_30": 17, "072084": 17, "936867": 17, "259155": 17, "linear_31": 17, 
"342712": 17, "599489": 17, "282787": 17, "linear_32": 17, "340535": 17, "120308": 17, "701103": 17, "linear_33": 17, "846987": 17, "630030": 17, "985939": 17, "linear_34": 17, "686298": 17, "204571": 17, "607586": 17, "linear_35": 17, "904821": 17, "575518": 17, "756420": 17, "linear_36": 17, "806659": 17, "585589": 17, "118401": 17, "linear_37": 17, "402340": 17, "047157": 17, "162680": 17, "linear_38": 17, "174589": 17, "923361": 17, "030258": 17, "linear_39": 17, "178576": 17, "556058": 17, "807705": 17, "linear_40": 17, "901954": 17, "301267": 17, "956539": 17, "linear_41": 17, "839805": 17, "597429": 17, "716181": 17, "linear_42": 17, "178945": 17, "651595": 17, "895699": 17, "829245": 17, "627592": 17, "637907": 17, "746186": 17, "255032": 17, "167313": 17, "000000": 17, "999756": 17, "031013": 17, "345k": 17, "17k": 17, "218m": 17, "counterpart": 17, "bit": [17, 23, 25, 26, 28, 32, 39], "4532": 17, "feedforward": [18, 25, 31, 44], "384": [18, 28], "192": [18, 28], "unmask": 18, "256": [18, 36, 37], "downsampl": [18, 24], "factor": [18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "473": [18, 28], "246": [18, 25, 28, 36, 37], "477": 18, "warm_step": 18, "2000": [18, 26], "feedforward_dim": 18, "attention_dim": [18, 23, 25, 28], "encoder_unmasked_dim": 18, "zipformer_downsampling_factor": 18, "decode_chunk_len": 18, "257": [18, 25, 36, 37], "023": 18, "zipformer2": 18, "419": 18, "At": [18, 23, 28], "stack": 18, "downsampling_factor": 18, "037": 18, "655": 18, "346": 18, "68944004": 18, "347": 18, "260096": 18, "348": [18, 36], "716276": 18, "656": [18, 28], "349": 18, "69920376": 18, "351": 18, "353": 18, "174": [18, 28], "175": 18, "1344": 18, "assert": 18, "cached_len": 18, "num_lay": 18, "1348": 18, "cached_avg": 18, "1352": 18, "cached_kei": 18, "1356": 18, "cached_v": 18, "1360": 18, "cached_val2": 18, "1364": 18, "cached_conv1": 18, "1368": 18, "cached_conv2": 18, "1373": 18, "left_context_len": 18, "1884": 18, "x_size": 18, "2442": 18, "2449": 18, 
"2469": 18, "2473": 18, "2483": 18, "kv_len": 18, "k": [18, 31, 36, 37, 43, 44, 45], "2570": 18, "attn_output": 18, "bsz": 18, "num_head": 18, "seq_len": 18, "head_dim": 18, "2926": 18, "lorder": 18, "2652": 18, "2653": 18, "embed_dim": 18, "2666": 18, "1543": 18, "in_x_siz": 18, "1637": 18, "1643": 18, "in_channel": 18, "1571": 18, "1763": 18, "src1": 18, "src2": 18, "1779": 18, "dim1": 18, "1780": 18, "dim2": 18, "_trace": 18, "958": 18, "tracer": 18, "instead": [18, 25, 44], "tupl": 18, "namedtupl": 18, "absolut": 18, "know": [18, 29], "side": 18, "strict": [18, 24], "allow": [18, 31, 44], "behavior": [18, 25], "_c": 18, "_create_method_from_trac": 18, "646": 18, "357": 18, "embedding_out": 18, "686": 18, "361": [18, 28, 32], "735": 18, "69": 18, "269m": 18, "53": [18, 23, 31, 32, 37, 43, 44], "269": [18, 23, 36, 37], "725": [18, 32], "1022k": 18, "266m": 18, "8m": 18, "509k": 18, "133m": 18, "152k": 18, "4m": 18, "1022": 18, "133": 18, "509": 18, "260": [18, 28], "360": 18, "365": 18, "280": [18, 28], "372": [18, 23], "state": [18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "026": 18, "410": 18, "2028": 18, "2547": 18, "2029": 18, "23316": 18, "23317": 18, "23318": 18, "23319": 18, "23320": 18, "amount": [18, 24], "pad": [18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "conv2dsubsampl": 18, "v2": [18, 23, 28], "arrai": 18, "23300": 18, "element": 18, "onnx_pretrain": 19, "onnxruntim": 19, "separ": 19, "deploi": [19, 23, 28], "repo_url": 19, "basenam": 19, "tree": [20, 21, 23, 25, 26, 28, 32, 36, 37, 39, 43], "cpu_jit": [20, 23, 28, 31, 33, 34, 44, 45], "confus": 20, "move": [20, 31, 33, 34, 44, 45], "why": 20, "streaming_asr": [20, 21, 43, 44, 45], "conv_emform": 20, "offline_asr": [20, 31], "jit_pretrain": [21, 33, 34, 43], "baz": 21, "1best": [23, 26, 28, 32, 33, 34, 36, 37], "automag": [23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "stop": [23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "By": [23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 
43, 44, 45], "musan": [23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "thei": [23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "intal": [23, 26], "sudo": [23, 26], "apt": [23, 26], "permiss": [23, 26], "commandlin": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "quit": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "experi": [23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "world": [23, 25, 26, 28, 29, 31, 32, 33, 34, 43, 44, 45], "multi": [23, 25, 26, 28, 29, 31, 33, 34, 41, 43, 44, 45], "machin": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "ddp": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "implement": [23, 25, 26, 28, 29, 31, 33, 34, 41, 43, 44, 45], "present": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "second": [23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "utter": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "oom": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "nvidia": [23, 25, 26, 28], "due": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "decai": [23, 26, 28, 33, 34, 43], "warmup": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "function": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "get_param": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "realli": [23, 26, 28, 31, 33, 34, 43, 44, 45], "directli": [23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "perturb": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "actual": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "3x150": [23, 25, 26], "450": [23, 25, 26], "visual": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "logdir": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "labelsmooth": 23, "someth": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "tensorflow": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "press": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44, 45], "ctrl": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44, 45], "engw8ksktzqs24zbv5dgcg": 23, "22t11": 23, "scan": [23, 25, 26, 28, 31, 39, 43, 44], "116068": 23, "scalar": [23, 25, 26, 28, 31, 39, 43, 44], "listen": [23, 25, 26, 31, 39, 43, 44], "url": [23, 25, 26, 28, 31, 
33, 34, 39, 43, 44], "xxxx": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "saw": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "consol": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "avoid": [23, 25, 28], "nbest": [23, 28, 34], "lattic": [23, 26, 28, 31, 32, 36, 37, 44, 45], "uniqu": [23, 28, 31, 44, 45], "pkufool": [23, 26, 32], "icefall_asr_aishell_conformer_ctc": 23, "transcrib": [23, 25, 26, 28], "v1": [23, 26, 28, 32, 36, 37], "lang_char": [23, 25], "bac009s0764w0121": [23, 25, 26], "bac009s0764w0122": [23, 25, 26], "bac009s0764w0123": [23, 25, 26], "tran": [23, 26, 28, 32, 36, 37], "graph": [23, 26, 28, 31, 32, 36, 37, 44, 45], "id": [23, 26, 28, 32, 36, 37], "conveni": [23, 26, 28, 29], "eo": [23, 26, 28], "soxi": [23, 25, 26, 28, 32, 39], "sampl": [23, 25, 26, 28, 32, 33, 39, 44, 45], "precis": [23, 25, 26, 28, 31, 32, 39, 44, 45], "67263": [23, 25, 26], "cdda": [23, 25, 26, 28, 32, 39], "sector": [23, 25, 26, 28, 32, 39], "135k": [23, 25, 26], "256k": [23, 25, 26, 28], "sign": [23, 25, 26, 28, 39], "integ": [23, 25, 26, 28, 39], "pcm": [23, 25, 26, 28, 39], "65840": [23, 25, 26], "625": [23, 25, 26], "132k": [23, 25, 26], "64000": [23, 25, 26], "300": [23, 25, 26, 28, 29, 31, 44], "128k": [23, 25, 26, 39], "displai": [23, 25, 26, 28], "topologi": [23, 28], "707": [23, 28], "num_decoder_lay": [23, 28], "vgg_frontend": [23, 25, 28], "use_feat_batchnorm": [23, 28], "f2fd997f752ed11bbef4c306652c433e83f9cf12": 23, "sun": 23, "sep": 23, "33cfe45": 23, "d57a873": 23, "nov": [23, 28], "hw": 23, "kangwei": 23, "icefall_aishell3": 23, "k2_releas": 23, "tokens_fil": 23, "words_fil": [23, 28, 39], "num_path": [23, 28, 31, 44, 45], "ngram_lm_scal": [23, 28], "attention_decoder_scal": [23, 28], "nbest_scal": [23, 28], "sos_id": [23, 28], "eos_id": [23, 28], "num_class": [23, 28, 39], "4336": [23, 25], "242": [23, 28], "131": [23, 28], "134": 23, "275": 23, "293": [23, 28], "704": [23, 36], "369": [23, 28], "\u751a": [23, 25], 
"\u81f3": [23, 25], "\u51fa": [23, 25], "\u73b0": [23, 25], "\u4ea4": [23, 25], "\u6613": [23, 25], "\u51e0": [23, 25], "\u4e4e": [23, 25], "\u505c": [23, 25], "\u6b62": 23, "\u7684": [23, 25, 26], "\u60c5": [23, 25], "\u51b5": [23, 25], "\u4e00": [23, 25], "\u4e8c": [23, 25], "\u7ebf": [23, 25, 26], "\u57ce": [23, 25], "\u5e02": [23, 25], "\u867d": [23, 25], "\u7136": [23, 25], "\u4e5f": [23, 25, 26], "\u5904": [23, 25], "\u4e8e": [23, 25], "\u8c03": [23, 25], "\u6574": [23, 25], "\u4e2d": [23, 25, 26], "\u4f46": [23, 25, 26], "\u56e0": [23, 25], "\u4e3a": [23, 25], "\u805a": [23, 25], "\u96c6": [23, 25], "\u4e86": [23, 25, 26], "\u8fc7": [23, 25], "\u591a": [23, 25], "\u516c": [23, 25], "\u5171": [23, 25], "\u8d44": [23, 25], "\u6e90": [23, 25], "371": 23, "683": 23, "651": [23, 39], "654": 23, "659": 23, "752": 23, "887": 23, "340": 23, "370": 23, "\u751a\u81f3": [23, 26], "\u51fa\u73b0": [23, 26], "\u4ea4\u6613": [23, 26], "\u51e0\u4e4e": [23, 26], "\u505c\u6b62": 23, "\u60c5\u51b5": [23, 26], "\u4e00\u4e8c": [23, 26], "\u57ce\u5e02": [23, 26], "\u867d\u7136": [23, 26], "\u5904\u4e8e": [23, 26], "\u8c03\u6574": [23, 26], "\u56e0\u4e3a": [23, 26], "\u805a\u96c6": [23, 26], "\u8fc7\u591a": [23, 26], "\u516c\u5171": [23, 26], "\u8d44\u6e90": [23, 26], "recor": [23, 28], "highest": [23, 28], "965": 23, "966": 23, "821": 23, "822": 23, "826": 23, "916": 23, "345": 23, "888": 23, "889": 23, "limit": [23, 25, 28, 41, 44], "upgrad": [23, 28], "pro": [23, 28], "finish": [23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 44, 45], "NOT": [23, 25, 28, 39], "checkout": [23, 28], "hlg_decod": [23, 28], "four": [23, 28], "messag": [23, 28, 31, 33, 34, 43, 44, 45], "nn_model": [23, 28], "use_gpu": [23, 28], "word_tabl": [23, 28], "caution": [23, 28], "forward": [23, 28, 33], "89": 23, "cu": [23, 28], "int": [23, 28], "char": [23, 28], "98": 23, "150": [23, 28], "693": [23, 36], "165": [23, 28], "nnet_output": [23, 28], "489": 23, "mandarin": 24, "beij": 24, "shell": 24, "technologi": 
24, "ltd": 24, "peopl": 24, "accent": 24, "area": 24, "china": 24, "invit": 24, "particip": 24, "conduct": 24, "quiet": 24, "indoor": 24, "high": 24, "fidel": 24, "microphon": 24, "16khz": 24, "manual": 24, "through": 24, "profession": 24, "annot": 24, "inspect": 24, "free": [24, 29, 43], "academ": 24, "moder": 24, "research": 24, "field": 24, "openslr": 24, "ctc": [24, 27, 30, 34, 35, 38], "stateless": [24, 27, 31, 43, 44, 45], "head": [25, 41], "conv1d": [25, 31, 43, 44, 45], "nn": [25, 31, 33, 34, 43, 44, 45], "tanh": 25, "borrow": 25, "ieeexplor": 25, "ieee": 25, "stamp": 25, "jsp": 25, "arnumb": 25, "9054419": 25, "predict": [25, 29, 31, 43, 44, 45], "charact": 25, "unit": 25, "vocabulari": 25, "87939824": 25, "optimized_transduc": 25, "technqiu": 25, "end": [25, 31, 33, 34, 39, 43, 44, 45], "furthermor": 25, "maximum": 25, "emit": 25, "per": [25, 31, 44, 45], "simplifi": [25, 41], "significantli": 25, "degrad": 25, "exactli": 25, "unprun": 25, "advantag": 25, "minim": 25, "pruned_transducer_stateless": [25, 31, 41, 44], "altern": 25, "though": 25, "transducer_stateless_modifi": 25, "pr": 25, "gb": 25, "ram": 25, "tri": 25, "prob": [25, 43], "219": [25, 28], "c": [25, 26, 31, 33, 34, 39, 43, 44, 45], "lagz6hrcqxoigbfd5e0y3q": 25, "03t14": 25, "8477": 25, "250": [25, 32], "sym": [25, 31, 44, 45], "beam_search": [25, 31, 44, 45], "decoding_method": 25, "beam_4": 25, "ensur": 25, "give": 25, "poor": 25, "531": [25, 26], "994": [25, 28], "027": 25, "encoder_out_dim": 25, "f4fefe4882bc0ae59af951da3f47335d5495ef71": 25, "50d2281": 25, "mar": 25, "0815224919": 25, "75d558775b": 25, "mmnv8": 25, "72": [25, 28], "248": 25, "878": [25, 37], "880": 25, "891": 25, "113": [25, 28], "userwarn": 25, "__floordiv__": 25, "round": 25, "toward": 25, "trunc": 25, "floor": 25, "keep": [25, 31, 44, 45], "div": 25, "b": [25, 28, 36, 37], "rounding_mod": 25, "divis": 25, "x_len": 25, "163": [25, 28], "\u6ede": 25, "322": 25, "760": 25, "919": 25, "922": 25, "929": 25, "046": 25, 
"047": 25, "319": [25, 28], "798": 25, "214": [25, 28], "215": [25, 28, 32], "402": 25, "topk_hyp_index": 25, "topk_index": 25, "logit": 25, "583": [25, 37], "lji9mwuorlow3jkdhxwk8a": 26, "13t11": 26, "4454": 26, "icefall_asr_aishell_tdnn_lstm_ctc": 26, "858": [26, 28], "154": 26, "161": [26, 28], "536": 26, "539": 26, "917": 26, "129": 26, "\u505c\u6ede": 26, "mmi": [27, 30], "blank": [27, 30], "skip": [27, 29, 30, 31, 43, 44, 45], "distil": [27, 30], "hubert": [27, 30], "ligru": [27, 35], "full": [28, 29, 31, 33, 34, 43, 44, 45], "libri": [28, 29, 31, 33, 34, 43, 44, 45], "subset": [28, 31, 33, 34, 43, 44, 45], "3x960": [28, 31, 33, 34, 43, 44, 45], "2880": [28, 31, 33, 34, 43, 44, 45], "lzgnetjwrxc3yghnmd4kpw": 28, "24t16": 28, "4540": 28, "sentenc": 28, "piec": 28, "And": [28, 31, 33, 34, 43, 44, 45], "neither": 28, "nor": 28, "5000": 28, "033": 28, "537": 28, "538": 28, "full_libri": [28, 29], "406": 28, "464": 28, "548": 28, "776": 28, "652": [28, 39], "109226120": 28, "714": [28, 36], "206": 28, "944": 28, "1328": 28, "443": [28, 32], "2563": 28, "494": 28, "592": 28, "1715": 28, "52576": 28, "128": 28, "1424": 28, "807": 28, "506": 28, "808": [28, 36], "522": 28, "362": 28, "565": 28, "1477": 28, "2922": 28, "208": 28, "4295": 28, "52343": 28, "396": 28, "3584": 28, "432": 28, "433": 28, "680": [28, 36], "_pickl": 28, "unpicklingerror": 28, "hlg_modifi": 28, "g_4_gram": [28, 32, 36, 37], "875": [28, 32], "212k": 28, "267440": [28, 32], "1253": [28, 32], "535k": 28, "83": [28, 32], "77200": [28, 32], "154k": 28, "554": 28, "7178d67e594bc7fa89c2b331ad7bd1c62a6a9eb4": 28, "8d93169": 28, "601": 28, "758": 28, "025": 28, "broffel": 28, "osom": 28, "723": 28, "775": 28, "881": 28, "234": 28, "571": 28, "whole": [28, 32, 36, 37, 44, 45], "857": 28, "979": 28, "980": 28, "055": 28, "117": 28, "051": 28, "363": 28, "959": [28, 37], "546": 28, "599": [28, 32], "833": 28, "834": 28, "915": 28, "076": 28, "110": 28, "397": 28, "999": [28, 31, 44, 45], "concaten": 28, 
"bucket": 28, "sampler": 28, "1000": 28, "ctc_decod": 28, "ngram_lm_rescor": 28, "attention_rescor": 28, "kind": [28, 31, 33, 34, 43, 44, 45], "105": 28, "221": 28, "125": [28, 39], "136": 28, "228": 28, "144": 28, "543": 28, "topo": 28, "547": 28, "729": 28, "702": 28, "703": 28, "545": 28, "279": 28, "122": 28, "126": 28, "135": [28, 39], "153": [28, 39], "945": 28, "475": 28, "191": [28, 36, 37], "398": 28, "515": 28, "w": [28, 36, 37], "deseri": 28, "441": 28, "fsaclass": 28, "loadfsa": 28, "const": 28, "string": 28, "c10": 28, "ignor": 28, "dummi": 28, "589": 28, "attention_scal": 28, "162": 28, "169": [28, 36, 37], "188": 28, "984": 28, "624": 28, "519": [28, 37], "632": 28, "645": [28, 39], "243": 28, "970": 28, "303": 28, "179": 28, "knowledg": 29, "vector": 29, "mvq": 29, "kd": 29, "pruned_transducer_stateless4": [29, 31, 41, 44], "theoret": 29, "applic": 29, "minor": 29, "out": 29, "thing": 29, "distillation_with_hubert": 29, "Of": 29, "cours": 29, "xl": 29, "proce": 29, "960h": [29, 33], "use_extracted_codebook": 29, "augment": 29, "th": [29, 36, 37], "fine": 29, "embedding_lay": 29, "num_codebook": 29, "under": 29, "vq_fbank_layer36_cb8": 29, "whola": 29, "snippet": 29, "echo": 29, "awk": 29, "split": 29, "_": 29, "pruned_transducer_stateless6": 29, "12359": 29, "spec": 29, "aug": 29, "warp": 29, "enabl": 29, "paid": 29, "suitabl": [31, 43, 44, 45], "pruned_transducer_stateless2": [31, 41, 44], "pruned_transducer_stateless5": [31, 41, 44], "scroll": [31, 33, 34, 43, 44, 45], "arxiv": [31, 43, 44, 45], "ab": [31, 43, 44, 45], "2206": [31, 43, 44, 45], "13236": [31, 43, 44, 45], "rework": [31, 41, 44], "daniel": [31, 44, 45], "joint": [31, 43, 44, 45], "contrari": [31, 43, 44, 45], "convent": [31, 43, 44, 45], "recurr": [31, 43, 44, 45], "2x": [31, 44, 45], "littl": [31, 44], "436000": [31, 33, 34, 43, 44, 45], "438000": [31, 33, 34, 43, 44, 45], "qogspbgsr8kzcrmmie9jgw": 31, "20t15": [31, 43, 44], "4468": [31, 43, 44], "210171": [31, 43, 44], "access": 
[31, 33, 34, 43, 44, 45], "6008": [31, 33, 34, 43, 44, 45], "localhost": [31, 33, 34, 43, 44, 45], "expos": [31, 33, 34, 43, 44, 45], "proxi": [31, 33, 34, 43, 44, 45], "bind_al": [31, 33, 34, 43, 44, 45], "fast_beam_search": [31, 33, 43, 44, 45], "474000": [31, 43, 44, 45], "largest": [31, 44, 45], "posterior": [31, 33, 44, 45], "algorithm": [31, 44, 45], "pdf": [31, 34, 44, 45], "1211": [31, 44, 45], "3711": [31, 44, 45], "espnet": [31, 44, 45], "net": [31, 44, 45], "beam_search_transduc": [31, 44, 45], "basicli": [31, 44, 45], "topk": [31, 44, 45], "expand": [31, 44, 45], "mode": [31, 44, 45], "being": [31, 44, 45], "hardcod": [31, 44, 45], "composit": [31, 44, 45], "log_prob": [31, 44, 45], "hard": [31, 41, 44, 45], "2211": [31, 44, 45], "00484": [31, 44, 45], "fast_beam_search_lg": [31, 44, 45], "trivial": [31, 44, 45], "fast_beam_search_nbest": [31, 44, 45], "random_path": [31, 44, 45], "shortest": [31, 44, 45], "fast_beam_search_nbest_lg": [31, 44, 45], "logic": [31, 44, 45], "smallest": [31, 43, 44, 45], "icefall_asr_librispeech_tdnn": 32, "lstm_ctc": 32, "flac": 32, "116k": 32, "140k": 32, "343k": 32, "164k": 32, "105k": 32, "174k": 32, "pretraind": 32, "170": 32, "584": [32, 37], "209": 32, "245": 32, "098": 32, "099": 32, "methond": [32, 36, 37], "403": 32, "631": 32, "190": 32, "121": 32, "010": 32, "guidanc": 33, "bigger": 33, "simpli": 33, "discard": 33, "prevent": 33, "lconv": 33, "encourag": [33, 34, 43], "stabil": [33, 34], "doesn": 33, "warm": [33, 34], "xyozukpeqm62hbilud4upa": [33, 34], "ctc_guide_decode_b": 33, "pretrained_ctc": 33, "jit_pretrained_ctc": 33, "100h": 33, "yfyeung": 33, "wechat": 34, "zipformer_mmi": 34, "worker": [34, 43], "hp": 34, "tdnn_ligru_ctc": 36, "enough": [36, 37, 39], "luomingshuang": [36, 37], "icefall_asr_timit_tdnn_ligru_ctc": 36, "pretrained_average_9_25": 36, "fdhc0_si1559": [36, 37], "felc0_si756": [36, 37], "fmgd0_si1564": [36, 37], "ffprobe": [36, 37], "show_format": [36, 37], "nistspher": [36, 37], 
"database_id": [36, 37], "database_vers": [36, 37], "utterance_id": [36, 37], "dhc0_si1559": [36, 37], "sample_min": [36, 37], "4176": [36, 37], "sample_max": [36, 37], "5984": [36, 37], "bitrat": [36, 37], "258": [36, 37], "audio": [36, 37], "pcm_s16le": [36, 37], "s16": [36, 37], "elc0_si756": [36, 37], "1546": [36, 37], "1989": [36, 37], "mgd0_si1564": [36, 37], "7626": [36, 37], "10573": [36, 37], "660": 36, "695": 36, "697": 36, "819": 36, "829": 36, "sil": [36, 37], "dh": [36, 37], "ih": [36, 37], "uw": [36, 37], "ah": [36, 37], "ii": [36, 37], "z": [36, 37], "aa": [36, 37], "ei": [36, 37], "dx": [36, 37], "d": [36, 37], "uh": [36, 37], "ng": [36, 37], "eh": [36, 37], "jh": [36, 37], "er": [36, 37], "ai": [36, 37], "hh": [36, 37], "aw": 36, "ae": [36, 37], "705": 36, "715": 36, "720": 36, "ch": 36, "icefall_asr_timit_tdnn_lstm_ctc": 37, "pretrained_average_16_25": 37, "816": 37, "827": 37, "387": 37, "unk": 37, "739": 37, "971": 37, "977": 37, "978": 37, "981": 37, "ow": 37, "ykubhb5wrmosxykid1z9eg": 39, "23t23": 39, "icefall_asr_yesno_tdnn": 39, "l_disambig": 39, "lexicon_disambig": 39, "0_0_0_1_0_0_0_1": 39, "0_0_1_0_0_0_1_0": 39, "0_0_1_0_0_1_1_1": 39, "0_0_1_0_1_0_0_1": 39, "0_0_1_1_0_0_0_1": 39, "0_0_1_1_0_1_1_0": 39, "0_0_1_1_1_0_0_0": 39, "0_0_1_1_1_1_0_0": 39, "0_1_0_0_0_1_0_0": 39, "0_1_0_0_1_0_1_0": 39, "0_1_0_1_0_0_0_0": 39, "0_1_0_1_1_1_0_0": 39, "0_1_1_0_0_1_1_1": 39, "0_1_1_1_0_0_1_0": 39, "0_1_1_1_1_0_1_0": 39, "1_0_0_0_0_0_0_0": 39, "1_0_0_0_0_0_1_1": 39, "1_0_0_1_0_1_1_1": 39, "1_0_1_1_0_1_1_1": 39, "1_0_1_1_1_1_0_1": 39, "1_1_0_0_0_1_1_1": 39, "1_1_0_0_1_0_1_1": 39, "1_1_0_1_0_1_0_0": 39, "1_1_0_1_1_0_0_1": 39, "1_1_0_1_1_1_1_0": 39, "1_1_1_0_0_1_0_1": 39, "1_1_1_0_1_0_1_0": 39, "1_1_1_1_0_0_1_0": 39, "1_1_1_1_1_0_0_0": 39, "1_1_1_1_1_1_1_1": 39, "54080": 39, "507": 39, "108k": 39, "ye": 39, "hebrew": 39, "NO": 39, "621": 39, "119": 39, "650": 39, "139": 39, "143": 39, "198": 39, "181": 39, "186": 39, "187": 39, "213": 39, "correctli": 39, 
"simplest": 39, "former": 41, "idea": 41, "mask": [41, 44, 45], "wenet": 41, "did": 41, "metion": 41, "complic": 41, "techniqu": 41, "bank": 41, "memor": 41, "histori": 41, "introduc": 41, "variant": 41, "pruned_stateless_emformer_rnnt2": 41, "conv_emformer_transducer_stateless": 41, "ourself": 41, "mechan": 41, "onlin": 43, "lstm_transducer_stateless": 43, "lower": 43, "prepare_giga_speech": 43, "cj2vtpiwqhkn9q1tx6ptpg": 43, "dynam": [44, 45], "causal": 44, "short": [44, 45], "2012": 44, "05481": 44, "flag": 44, "indic": [44, 45], "whether": 44, "sequenc": [44, 45], "uniformli": [44, 45], "seen": [44, 45], "97vkxf80ru61cnp2alwzzg": 44, "streaming_decod": [44, 45], "wise": [44, 45], "parallel": [44, 45], "bath": [44, 45], "parallelli": [44, 45], "seem": 44, "benefit": 44, "mdoel": 44, "320m": 45, "550": 45, "scriptmodul": 45, "jit_trace_export": 45, "jit_trace_pretrain": 45, "task": 46}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"follow": 0, "code": 0, "style": 0, "contribut": [1, 3], "document": 1, "how": [2, 14, 20, 21], "creat": [2, 13], "recip": [2, 46], "data": [2, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "prepar": [2, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "train": [2, 10, 13, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "decod": [2, 5, 6, 7, 13, 14, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "pre": [2, 10, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "model": [2, 5, 10, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "lodr": [4, 6], "rnn": 4, "transduc": [4, 6, 7, 16, 17, 18, 25, 31, 43, 44, 45], "wer": [4, 6, 7, 28], "differ": [4, 6, 7], "beam": [4, 6, 7, 25], "size": [4, 6, 7], "languag": 5, "lm": [6, 28], "rescor": [6, 23, 28], "base": 6, "method": 6, "v": 6, "shallow": [6, 7], "fusion": [6, 7], "The": [6, 25], "number": 6, "each": 6, "field": 6, "i": 6, "test": [6, 
7, 13, 16, 17, 18], "clean": [6, 7], "other": 6, "time": [6, 7], "frequent": 8, "ask": 8, "question": 8, "faq": 8, "oserror": 8, "libtorch_hip": 8, "so": 8, "cannot": 8, "open": 8, "share": 8, "object": 8, "file": [8, 19], "directori": 8, "attributeerror": 8, "modul": 8, "distutil": 8, "ha": 8, "attribut": 8, "version": 8, "importerror": 8, "libpython3": 8, "10": 8, "1": [8, 13, 16, 17, 18, 23, 25, 26, 28], "0": [8, 13], "No": 8, "huggingfac": [9, 11], "space": 11, "youtub": [11, 13], "video": [11, 13], "icefal": [12, 13, 16, 17, 18], "content": [12, 46], "instal": [13, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37], "cuda": 13, "toolkit": 13, "cudnn": 13, "pytorch": 13, "torchaudio": 13, "2": [13, 16, 17, 18, 23, 25, 26, 28], "k2": 13, "3": [13, 16, 17, 18, 23, 25, 28], "lhots": 13, "4": [13, 16, 17, 18], "download": [13, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "exampl": [13, 19, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "virtual": 13, "environ": 13, "activ": 13, "your": 13, "5": [13, 16, 17, 18], "export": [14, 15, 16, 17, 18, 19, 20, 21, 22, 31, 33, 34, 43, 44, 45], "state_dict": [14, 31, 33, 34, 43, 44, 45], "when": [14, 20, 21], "us": [14, 20, 21, 31, 33, 34, 43, 44, 45], "run": 14, "py": 14, "ncnn": [15, 16, 17, 18], "convemform": 16, "pnnx": [16, 17, 18], "via": [16, 17, 18], "torch": [16, 17, 18, 20, 21, 31, 33, 34, 43, 44, 45], "jit": [16, 17, 18, 20, 21, 31, 33, 34, 43, 44, 45], "trace": [16, 17, 18, 21, 43, 45], "torchscript": [16, 17, 18], "6": [16, 17, 18], "modifi": [16, 17, 18, 25], "encod": [16, 17, 18], "sherpa": [16, 17, 18, 19, 31, 44, 45], "7": [16, 17], "option": [16, 17, 23, 26, 28, 31, 33, 34, 43, 44, 45], "int8": [16, 17], "quantiz": [16, 17], "lstm": [17, 26, 32, 37, 43], "stream": [18, 27, 40, 41, 44, 45], "zipform": [18, 33, 34, 45], "onnx": 19, "sound": 19, "script": [20, 31, 33, 34, 44, 45], "conform": [23, 28, 41], "ctc": [23, 26, 28, 32, 33, 36, 37, 39], "configur": [23, 26, 28, 31, 33, 34, 43, 44, 45], "log": 
[23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "usag": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "case": [23, 25, 26, 28], "kaldifeat": [23, 25, 26, 28, 32, 36, 37, 39], "hlg": [23, 26, 28], "attent": [23, 28], "colab": [23, 25, 26, 28, 32, 36, 37, 39], "notebook": [23, 25, 26, 28, 32, 36, 37, 39], "deploy": [23, 28], "c": [23, 28], "aishel": 24, "stateless": 25, "loss": 25, "todo": 25, "greedi": 25, "search": 25, "tdnn": [26, 32, 36, 37, 39], "non": 27, "asr": [27, 40], "comput": 28, "n": 28, "gram": 28, "distil": 29, "hubert": 29, "codebook": 29, "index": 29, "librispeech": [30, 42], "prune": [31, 44], "statelessx": [31, 44], "pretrain": [31, 33, 34, 43, 44, 45], "deploi": [31, 44, 45], "infer": [32, 36, 37, 39], "blank": 33, "skip": 33, "mmi": 34, "timit": 35, "ligru": 36, "yesno": 38, "introduct": 41, "emform": 41, "which": 43, "simul": [44, 45], "real": [44, 45], "tabl": 46}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.todo": 2, "sphinx": 57}, "alltitles": {"Follow the code style": [[0, "follow-the-code-style"]], "Contributing to Documentation": [[1, "contributing-to-documentation"]], "How to create a recipe": [[2, "how-to-create-a-recipe"]], "Data Preparation": [[2, "data-preparation"], [25, "data-preparation"]], "Training": [[2, "training"], [13, "training"], [23, "training"], [25, "training"], [26, "training"], [28, "training"], [29, "training"], [31, "training"], [32, "training"], [33, "training"], [34, "training"], [36, "training"], [37, "training"], [39, "training"], [43, "training"], [44, "training"], [45, "training"]], "Decoding": [[2, "decoding"], [13, "decoding"], [23, "decoding"], [25, "decoding"], [26, "decoding"], [28, "decoding"], [29, "decoding"], [31, "decoding"], [32, "decoding"], [33, "decoding"], 
[34, "decoding"], [36, "decoding"], [37, "decoding"], [39, "decoding"], [43, "decoding"], [44, "decoding"], [45, "decoding"]], "Pre-trained model": [[2, "pre-trained-model"]], "Contributing": [[3, "contributing"]], "LODR for RNN Transducer": [[4, "lodr-for-rnn-transducer"]], "WER of LODR with different beam sizes": [[4, "id1"]], "Decoding with language models": [[5, "decoding-with-language-models"]], "LM rescoring for Transducer": [[6, "lm-rescoring-for-transducer"]], "WERs of LM rescoring with different beam sizes": [[6, "id1"]], "WERs of LM rescoring + LODR with different beam sizes": [[6, "id2"]], "LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean)": [[6, "id3"]], "Shallow fusion for Transducer": [[7, "shallow-fusion-for-transducer"]], "WERs and decoding time (on test-clean) of shallow fusion with different beam sizes": [[7, "id2"]], "Frequently Asked Questions (FAQs)": [[8, "frequently-asked-questions-faqs"]], "OSError: libtorch_hip.so: cannot open shared object file: no such file or directory": [[8, "oserror-libtorch-hip-so-cannot-open-shared-object-file-no-such-file-or-directory"]], "AttributeError: module \u2018distutils\u2019 has no attribute \u2018version\u2019": [[8, "attributeerror-module-distutils-has-no-attribute-version"]], "ImportError: libpython3.10.so.1.0: cannot open shared object file: No such file or directory": [[8, "importerror-libpython3-10-so-1-0-cannot-open-shared-object-file-no-such-file-or-directory"]], "Huggingface": [[9, "huggingface"]], "Pre-trained models": [[10, "pre-trained-models"]], "Huggingface spaces": [[11, "huggingface-spaces"]], "YouTube Video": [[11, "youtube-video"], [13, "youtube-video"]], "Icefall": [[12, "icefall"]], "Contents:": [[12, null]], "Installation": [[13, "installation"]], "(0) Install CUDA toolkit and cuDNN": [[13, "install-cuda-toolkit-and-cudnn"]], "(1) Install PyTorch and torchaudio": [[13, 
"install-pytorch-and-torchaudio"]], "(2) Install k2": [[13, "install-k2"]], "(3) Install lhotse": [[13, "install-lhotse"]], "(4) Download icefall": [[13, "download-icefall"]], "Installation example": [[13, "installation-example"]], "(1) Create a virtual environment": [[13, "create-a-virtual-environment"]], "(2) Activate your virtual environment": [[13, "activate-your-virtual-environment"]], "(3) Install k2": [[13, "id1"]], "(4) Install lhotse": [[13, "id2"]], "(5) Download icefall": [[13, "id3"]], "Test Your Installation": [[13, "test-your-installation"]], "Data preparation": [[13, "data-preparation"], [23, "data-preparation"], [26, "data-preparation"], [28, "data-preparation"], [29, "data-preparation"], [31, "data-preparation"], [32, "data-preparation"], [33, "data-preparation"], [34, "data-preparation"], [36, "data-preparation"], [37, "data-preparation"], [39, "data-preparation"], [43, "data-preparation"], [44, "data-preparation"], [45, "data-preparation"]], "Export model.state_dict()": [[14, "export-model-state-dict"], [31, "export-model-state-dict"], [33, "export-model-state-dict"], [34, "export-model-state-dict"], [43, "export-model-state-dict"], [44, "export-model-state-dict"], [45, "export-model-state-dict"]], "When to use it": [[14, "when-to-use-it"], [20, "when-to-use-it"], [21, "when-to-use-it"]], "How to export": [[14, "how-to-export"], [20, "how-to-export"], [21, "how-to-export"]], "How to use the exported model": [[14, "how-to-use-the-exported-model"], [20, "how-to-use-the-exported-model"]], "Use the exported model to run decode.py": [[14, "use-the-exported-model-to-run-decode-py"]], "Export to ncnn": [[15, "export-to-ncnn"]], "Export ConvEmformer transducer models to ncnn": [[16, "export-convemformer-transducer-models-to-ncnn"]], "1. Download the pre-trained model": [[16, "download-the-pre-trained-model"], [17, "download-the-pre-trained-model"], [18, "download-the-pre-trained-model"]], "2. 
Install ncnn and pnnx": [[16, "install-ncnn-and-pnnx"], [17, "install-ncnn-and-pnnx"], [18, "install-ncnn-and-pnnx"]], "3. Export the model via torch.jit.trace()": [[16, "export-the-model-via-torch-jit-trace"], [17, "export-the-model-via-torch-jit-trace"], [18, "export-the-model-via-torch-jit-trace"]], "4. Export torchscript model via pnnx": [[16, "export-torchscript-model-via-pnnx"], [17, "export-torchscript-model-via-pnnx"], [18, "export-torchscript-model-via-pnnx"]], "5. Test the exported models in icefall": [[16, "test-the-exported-models-in-icefall"], [17, "test-the-exported-models-in-icefall"], [18, "test-the-exported-models-in-icefall"]], "6. Modify the exported encoder for sherpa-ncnn": [[16, "modify-the-exported-encoder-for-sherpa-ncnn"], [17, "modify-the-exported-encoder-for-sherpa-ncnn"], [18, "modify-the-exported-encoder-for-sherpa-ncnn"]], "7. (Optional) int8 quantization with sherpa-ncnn": [[16, "optional-int8-quantization-with-sherpa-ncnn"], [17, "optional-int8-quantization-with-sherpa-ncnn"]], "Export LSTM transducer models to ncnn": [[17, "export-lstm-transducer-models-to-ncnn"]], "Export streaming Zipformer transducer models to ncnn": [[18, "export-streaming-zipformer-transducer-models-to-ncnn"]], "Export to ONNX": [[19, "export-to-onnx"]], "sherpa-onnx": [[19, "sherpa-onnx"]], "Example": [[19, "example"]], "Download the pre-trained model": [[19, "download-the-pre-trained-model"], [23, "download-the-pre-trained-model"], [25, "download-the-pre-trained-model"], [26, "download-the-pre-trained-model"], [28, "download-the-pre-trained-model"], [32, "download-the-pre-trained-model"], [36, "download-the-pre-trained-model"], [37, "download-the-pre-trained-model"], [39, "download-the-pre-trained-model"]], "Export the model to ONNX": [[19, "export-the-model-to-onnx"]], "Decode sound files with exported ONNX models": [[19, "decode-sound-files-with-exported-onnx-models"]], "Export model with torch.jit.script()": [[20, "export-model-with-torch-jit-script"]], 
"Export model with torch.jit.trace()": [[21, "export-model-with-torch-jit-trace"]], "How to use the exported models": [[21, "how-to-use-the-exported-models"]], "Model export": [[22, "model-export"]], "Conformer CTC": [[23, "conformer-ctc"], [28, "conformer-ctc"]], "Configurable options": [[23, "configurable-options"], [26, "configurable-options"], [28, "configurable-options"], [31, "configurable-options"], [33, "configurable-options"], [34, "configurable-options"], [43, "configurable-options"], [44, "configurable-options"], [45, "configurable-options"]], "Pre-configured options": [[23, "pre-configured-options"], [26, "pre-configured-options"], [28, "pre-configured-options"], [31, "pre-configured-options"], [33, "pre-configured-options"], [34, "pre-configured-options"], [43, "pre-configured-options"], [44, "pre-configured-options"], [45, "pre-configured-options"]], "Training logs": [[23, "training-logs"], [25, "training-logs"], [26, "training-logs"], [28, "training-logs"], [31, "training-logs"], [33, "training-logs"], [34, "training-logs"], [43, "training-logs"], [44, "training-logs"], [45, "training-logs"]], "Usage examples": [[23, "usage-examples"], [25, "usage-examples"], [26, "usage-examples"], [28, "usage-examples"]], "Case 1": [[23, "case-1"], [25, "case-1"], [26, "case-1"], [28, "case-1"]], "Case 2": [[23, "case-2"], [25, "case-2"], [26, "case-2"], [28, "case-2"]], "Case 3": [[23, "case-3"], [25, "case-3"], [28, "case-3"]], "Pre-trained Model": [[23, "pre-trained-model"], [25, "pre-trained-model"], [26, "pre-trained-model"], [28, "pre-trained-model"], [32, "pre-trained-model"], [36, "pre-trained-model"], [37, "pre-trained-model"], [39, "pre-trained-model"]], "Install kaldifeat": [[23, "install-kaldifeat"], [25, "install-kaldifeat"], [26, "install-kaldifeat"], [28, "install-kaldifeat"], [32, "install-kaldifeat"], [36, "install-kaldifeat"], [37, "install-kaldifeat"]], "Usage": [[23, "usage"], [25, "usage"], [26, "usage"], [28, "usage"]], "CTC decoding": [[23, 
"ctc-decoding"], [28, "ctc-decoding"], [28, "id2"]], "HLG decoding": [[23, "hlg-decoding"], [23, "id2"], [26, "hlg-decoding"], [28, "hlg-decoding"], [28, "id3"]], "HLG decoding + attention decoder rescoring": [[23, "hlg-decoding-attention-decoder-rescoring"]], "Colab notebook": [[23, "colab-notebook"], [25, "colab-notebook"], [26, "colab-notebook"], [28, "colab-notebook"], [32, "colab-notebook"], [36, "colab-notebook"], [37, "colab-notebook"], [39, "colab-notebook"]], "Deployment with C++": [[23, "deployment-with-c"], [28, "deployment-with-c"]], "aishell": [[24, "aishell"]], "Stateless Transducer": [[25, "stateless-transducer"]], "The Model": [[25, "the-model"]], "The Loss": [[25, "the-loss"]], "Todo": [[25, "id1"]], "Greedy search": [[25, "greedy-search"]], "Beam search": [[25, "beam-search"]], "Modified Beam search": [[25, "modified-beam-search"]], "TDNN-LSTM CTC": [[26, "tdnn-lstm-ctc"]], "Non Streaming ASR": [[27, "non-streaming-asr"]], "HLG decoding + LM rescoring": [[28, "hlg-decoding-lm-rescoring"]], "HLG decoding + LM rescoring + attention decoder rescoring": [[28, "hlg-decoding-lm-rescoring-attention-decoder-rescoring"]], "Compute WER with the pre-trained model": [[28, "compute-wer-with-the-pre-trained-model"]], "HLG decoding + n-gram LM rescoring": [[28, "hlg-decoding-n-gram-lm-rescoring"]], "HLG decoding + n-gram LM rescoring + attention decoder rescoring": [[28, "hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring"]], "Distillation with HuBERT": [[29, "distillation-with-hubert"]], "Codebook index preparation": [[29, "codebook-index-preparation"]], "LibriSpeech": [[30, "librispeech"], [42, "librispeech"]], "Pruned transducer statelessX": [[31, "pruned-transducer-statelessx"], [44, "pruned-transducer-statelessx"]], "Usage example": [[31, "usage-example"], [33, "usage-example"], [34, "usage-example"], [43, "usage-example"], [44, "usage-example"], [45, "usage-example"]], "Export Model": [[31, "export-model"], [44, "export-model"], [45, 
"export-model"]], "Export model using torch.jit.script()": [[31, "export-model-using-torch-jit-script"], [33, "export-model-using-torch-jit-script"], [34, "export-model-using-torch-jit-script"], [44, "export-model-using-torch-jit-script"], [45, "export-model-using-torch-jit-script"]], "Download pretrained models": [[31, "download-pretrained-models"], [33, "download-pretrained-models"], [34, "download-pretrained-models"], [43, "download-pretrained-models"], [44, "download-pretrained-models"], [45, "download-pretrained-models"]], "Deploy with Sherpa": [[31, "deploy-with-sherpa"], [44, "deploy-with-sherpa"], [45, "deploy-with-sherpa"]], "TDNN-LSTM-CTC": [[32, "tdnn-lstm-ctc"], [37, "tdnn-lstm-ctc"]], "Inference with a pre-trained model": [[32, "inference-with-a-pre-trained-model"], [36, "inference-with-a-pre-trained-model"], [37, "inference-with-a-pre-trained-model"], [39, "inference-with-a-pre-trained-model"]], "Zipformer CTC Blank Skip": [[33, "zipformer-ctc-blank-skip"]], "Export models": [[33, "export-models"], [34, "export-models"], [43, "export-models"]], "Zipformer MMI": [[34, "zipformer-mmi"]], "TIMIT": [[35, "timit"]], "TDNN-LiGRU-CTC": [[36, "tdnn-ligru-ctc"]], "YesNo": [[38, "yesno"]], "TDNN-CTC": [[39, "tdnn-ctc"]], "Download kaldifeat": [[39, "download-kaldifeat"]], "Streaming ASR": [[40, "streaming-asr"]], "Introduction": [[41, "introduction"]], "Streaming Conformer": [[41, "streaming-conformer"]], "Streaming Emformer": [[41, "streaming-emformer"]], "LSTM Transducer": [[43, "lstm-transducer"]], "Which model to use": [[43, "which-model-to-use"]], "Export model using torch.jit.trace()": [[43, "export-model-using-torch-jit-trace"], [45, "export-model-using-torch-jit-trace"]], "Simulate streaming decoding": [[44, "simulate-streaming-decoding"], [45, "simulate-streaming-decoding"]], "Real streaming decoding": [[44, "real-streaming-decoding"], [45, "real-streaming-decoding"]], "Zipformer Transducer": [[45, "zipformer-transducer"]], "Recipes": [[46, 
"recipes"]], "Table of Contents": [[46, null]]}, "indexentries": {}})
\ No newline at end of file