diff --git a/_sources/decoding-with-langugage-models/LODR.rst.txt b/_sources/decoding-with-langugage-models/LODR.rst.txt new file mode 100644 index 000000000..7ffa0c128 --- /dev/null +++ b/_sources/decoding-with-langugage-models/LODR.rst.txt @@ -0,0 +1,184 @@ +.. _LODR: + +LODR for RNN Transducer +======================= + + +As a type of E2E model, neural transducers are usually considered to have an internal +language model (ILM), which learns language-level information from the training corpus. +In real-life scenarios, there is often a mismatch between the training corpus and the target corpus. +This mismatch can be a problem when decoding neural transducer models with external language models, because the internal +language model can act "against" the external LM. In this tutorial, we show how to use +`Low-order Density Ratio `_ (LODR) to alleviate this effect and further improve the performance +of language model integration. + +.. note:: + + This tutorial is based on the recipe + `pruned_transducer_stateless7_streaming `_, + which is a streaming transducer model trained on `LibriSpeech`_. + However, you can easily apply LODR to other recipes. + If you encounter any problems, please open an issue here `icefall `__. + + +.. note:: + + For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However, + you can change the testing set to any other domain (e.g., `GigaSpeech`_) and prepare the language models + using that corpus. + +First, let's have a look at some background information. As the predecessor of LODR, Density Ratio (DR) was first proposed `here `_ +to address the language information mismatch between the training +corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the target domain +are acoustically similar, DR derives the following formula for decoding with Bayes' theorem: + +.. 
math:: + + \text{score}\left(y_u|\mathit{x},y\right) = + \log p\left(y_u|\mathit{x},y_{1:u-1}\right) + + \lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) - + \lambda_2 \log p_{\text{Source LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) + + +where :math:`\lambda_1` and :math:`\lambda_2` are the weights of the LM scores for the target domain and the source domain respectively. +Here, the source domain LM is trained on the training corpus. The only difference in the above formula compared to +shallow fusion is the subtraction of the source domain LM score. + +Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, this internal LM is +considered to be weak and can only capture low-level language information. Therefore, `LODR `__ proposes to use +a low-order n-gram LM as an approximation of the ILM of the neural transducer. This leads to the following formula +during decoding for transducer models: + +.. math:: + + \text{score}\left(y_u|\mathit{x},y\right) = + \log p_{rnnt}\left(y_u|\mathit{x},y_{1:u-1}\right) + + \lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) - + \lambda_2 \log p_{\text{bi-gram}}\left(y_u|\mathit{x},y_{1:u-1}\right) + +In LODR, an additional bi-gram LM estimated on the source domain (e.g., the training corpus) is required. Compared to DR, +the only difference lies in the choice of the source domain LM. According to the original `paper `_, +LODR achieves performance similar to DR in both intra-domain and cross-domain settings. +As a bi-gram is much faster to evaluate, LODR is usually much faster. + +Now, we will show you how to use LODR in ``icefall``. +For illustration purposes, we will use a pre-trained ASR model from this `link `_. +If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`. +The testing scenario here is intra-domain (we decode the model trained on `LibriSpeech`_ on the `LibriSpeech`_ testing sets). 
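Before running the recipe, it may help to see the LODR formula above as code. The following is only a minimal sketch under stated assumptions: the function ``lodr_score``, the toy ``BIGRAM`` table and all probabilities are hypothetical and are not taken from ``icefall``. The two scales reuse the values from this tutorial (0.42 for the external LM, -0.24 for the bi-gram), with the bi-gram scale kept negative so that its score is effectively subtracted.

```python
import math

# Hypothetical bi-gram probabilities p(cur | prev) estimated on the source domain.
BIGRAM = {("<s>", "new"): 0.6, ("<s>", "knew"): 0.1}


def lodr_score(log_p_rnnt, log_p_target_lm, prev, cur,
               lm_scale=0.42, lodr_scale=-0.24):
    """Combine transducer, external LM and bi-gram log-probabilities.

    lodr_scale is negative, so adding the scaled bi-gram term
    subtracts the (approximate) internal LM score.
    """
    log_p_bigram = math.log(BIGRAM[(prev, cur)])
    return log_p_rnnt + lm_scale * log_p_target_lm + lodr_scale * log_p_bigram


# Toy candidates: token -> (transducer log-prob, target-domain LM log-prob).
candidates = {
    "new": (math.log(0.5), math.log(0.2)),
    "knew": (math.log(0.4), math.log(0.5)),
}

scores = {
    tok: lodr_score(log_p_rnnt, log_p_lm, "<s>", tok)
    for tok, (log_p_rnnt, log_p_lm) in candidates.items()
}
best_token = max(scores, key=scores.get)
print(best_token, scores)
```

In this toy setup the transducer alone prefers ``new``, but the target LM favours ``knew`` and the source-domain bi-gram penalises ``new``, so the combined score ranks ``knew`` first.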
+ +As the initial step, let's download the pre-trained model. + +.. code-block:: bash + + $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29 + $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp + $ git lfs pull --include "pretrained.pt" + $ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded + $ popd + +To test the model, let's have a look at the decoding results **without** using an LM. This can be done via the following command: + +.. code-block:: bash + + $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/ + $ ./pruned_transducer_stateless7_streaming/decode.py \ + --epoch 99 \ + --avg 1 \ + --use-averaged-model False \ + --exp-dir $exp_dir \ + --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \ + --max-duration 600 \ + --decode-chunk-len 32 \ + --decoding-method modified_beam_search + +The following WERs are achieved on test-clean and test-other: + +.. code-block:: text + + $ For test-clean, WER of different settings are: + $ beam_size_4 3.11 best for test-clean + $ For test-other, WER of different settings are: + $ beam_size_4 7.93 best for test-other + +Then, we download the external language model and the bi-gram LM that are necessary for LODR. +Note that the bi-gram is estimated on the LibriSpeech 960 hours' text. + +.. 
code-block:: bash + + $ # download the external LM + $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm + $ # create a symbolic link so that the checkpoint can be loaded + $ pushd icefall-librispeech-rnn-lm/exp + $ git lfs pull --include "pretrained.pt" + $ ln -s pretrained.pt epoch-99.pt + $ popd + $ + $ # download the bi-gram + $ git lfs install + $ git clone https://huggingface.co/marcoyang/librispeech_bigram + $ pushd data/lang_bpe_500 + $ ln -s ../../librispeech_bigram/2gram.fst.txt . + $ popd + +Then, we perform LODR decoding by setting ``--decoding-method`` to ``modified_beam_search_lm_LODR``: + +.. code-block:: bash + + $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp + $ lm_dir=./icefall-librispeech-rnn-lm/exp + $ lm_scale=0.42 + $ LODR_scale=-0.24 + $ ./pruned_transducer_stateless7_streaming/decode.py \ + --epoch 99 \ + --avg 1 \ + --use-averaged-model False \ + --beam-size 4 \ + --exp-dir $exp_dir \ + --max-duration 600 \ + --decode-chunk-len 32 \ + --decoding-method modified_beam_search_lm_LODR \ + --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \ + --use-shallow-fusion 1 \ + --lm-type rnn \ + --lm-exp-dir $lm_dir \ + --lm-epoch 99 \ + --lm-scale $lm_scale \ + --lm-avg 1 \ + --rnn-lm-embedding-dim 2048 \ + --rnn-lm-hidden-dim 2048 \ + --rnn-lm-num-layers 3 \ + --lm-vocab-size 500 \ + --tokens-ngram 2 \ + --ngram-lm-scale $LODR_scale + +There are two extra arguments that need to be given when doing LODR. ``--tokens-ngram`` specifies the order of the n-gram. As we +are using a bi-gram, we set it to 2. ``--ngram-lm-scale`` is the scale of the bi-gram. It should be a negative number +because we are subtracting the bi-gram's score during decoding. + +The decoding results obtained with the above command are shown below: + +.. 
code-block:: text + + $ For test-clean, WER of different settings are: + $ beam_size_4 2.61 best for test-clean + $ For test-other, WER of different settings are: + $ beam_size_4 6.74 best for test-other + +Recall that the lowest WER we obtained in :ref:`shallow_fusion` with beam size of 4 is ``2.77/7.08``, LODR +indeed **further improves** the WER. We can do even better if we increase ``--beam-size``: + +.. list-table:: WER of LODR with different beam sizes + :widths: 25 25 50 + :header-rows: 1 + + * - Beam size + - test-clean + - test-other + * - 4 + - 2.61 + - 6.74 + * - 8 + - 2.45 + - 6.38 + * - 12 + - 2.4 + - 6.23 \ No newline at end of file diff --git a/_sources/decoding-with-langugage-models/index.rst.txt b/_sources/decoding-with-langugage-models/index.rst.txt new file mode 100644 index 000000000..577ebbdfb --- /dev/null +++ b/_sources/decoding-with-langugage-models/index.rst.txt @@ -0,0 +1,12 @@ +Decoding with language models +============================= + +This section describes how to use external langugage models +during decoding to improve the WER of transducer models. + +.. toctree:: + :maxdepth: 2 + + shallow-fusion + LODR + rescoring diff --git a/_sources/decoding-with-langugage-models/rescoring.rst.txt b/_sources/decoding-with-langugage-models/rescoring.rst.txt new file mode 100644 index 000000000..d71acc1e5 --- /dev/null +++ b/_sources/decoding-with-langugage-models/rescoring.rst.txt @@ -0,0 +1,252 @@ +.. _rescoring: + +LM rescoring for Transducer +================================= + +LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based +methods (see :ref:`shallow-fusion`, :ref:`LODR`), rescoring is usually performed to re-rank the n-best hypotheses after beam search. +Rescoring is usually more efficient than shallow fusion since less computation is performed on the external LM. 
+In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in +`icefall `__. + +.. note:: + + This tutorial is based on the recipe + `pruned_transducer_stateless7_streaming `_, + which is a streaming transducer model trained on `LibriSpeech`_. + However, you can easily apply LM rescoring to other recipes. + If you encounter any problems, please open an issue `here `_. + +.. note:: + + For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However, you can change the testing set + to any other domain (e.g., `GigaSpeech`_) and use an external LM trained on that domain. + +.. HINT:: + + We recommend using a GPU for decoding. + +For illustration purposes, we will use a pre-trained ASR model from this `link `__. +If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`. + +As the initial step, let's download the pre-trained model. + +.. code-block:: bash + + $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29 + $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp + $ git lfs pull --include "pretrained.pt" + $ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded + $ popd + +As usual, we first test the model's performance without an external LM. This can be done via the following command: + +.. 
code-block:: bash + + $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/ + $ ./pruned_transducer_stateless7_streaming/decode.py \ + --epoch 99 \ + --avg 1 \ + --use-averaged-model False \ + --exp-dir $exp_dir \ + --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \ + --max-duration 600 \ + --decode-chunk-len 32 \ + --decoding-method modified_beam_search + +The following WERs are achieved on test-clean and test-other: + +.. code-block:: text + + $ For test-clean, WER of different settings are: + $ beam_size_4 3.11 best for test-clean + $ For test-other, WER of different settings are: + $ beam_size_4 7.93 best for test-other + +Now, we will try to improve the above WER numbers via external LM rescoring. We will download +a pre-trained LM from this `link `__. + +.. note:: + + This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora. + You may also train an RNN LM from scratch. Please refer to this `script `__ + for training an RNN LM and this `script `__ to train a transformer LM. + +.. code-block:: bash + + $ # download the external LM + $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm + $ # create a symbolic link so that the checkpoint can be loaded + $ pushd icefall-librispeech-rnn-lm/exp + $ git lfs pull --include "pretrained.pt" + $ ln -s pretrained.pt epoch-99.pt + $ popd + + +With the RNN LM available, we can rescore the n-best hypotheses generated from ``modified_beam_search``. Here, +``n`` should be the number of beams, i.e., ``--beam-size``. The command for LM rescoring is +as follows. Note that the ``--decoding-method`` is set to ``modified_beam_search_lm_rescore`` and ``--use-shallow-fusion`` +is set to ``False``. + +.. 
code-block:: bash + + $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp + $ lm_dir=./icefall-librispeech-rnn-lm/exp + $ lm_scale=0.43 + $ ./pruned_transducer_stateless7_streaming/decode.py \ + --epoch 99 \ + --avg 1 \ + --use-averaged-model False \ + --beam-size 4 \ + --exp-dir $exp_dir \ + --max-duration 600 \ + --decode-chunk-len 32 \ + --decoding-method modified_beam_search_lm_rescore \ + --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \ + --use-shallow-fusion 0 \ + --lm-type rnn \ + --lm-exp-dir $lm_dir \ + --lm-epoch 99 \ + --lm-scale $lm_scale \ + --lm-avg 1 \ + --rnn-lm-embedding-dim 2048 \ + --rnn-lm-hidden-dim 2048 \ + --rnn-lm-num-layers 3 \ + --lm-vocab-size 500 + +.. code-block:: text + + $ For test-clean, WER of different settings are: + $ beam_size_4 2.93 best for test-clean + $ For test-other, WER of different settings are: + $ beam_size_4 7.6 best for test-other + +Great! We made some improvements! Increasing the size of the n-best list will further boost the performance, +as shown in the following table: + +.. list-table:: WERs of LM rescoring with different beam sizes + :widths: 25 25 25 + :header-rows: 1 + + * - Beam size + - test-clean + - test-other + * - 4 + - 2.93 + - 7.6 + * - 8 + - 2.67 + - 7.11 + * - 12 + - 2.59 + - 6.86 + +In fact, we can also apply LODR (see :ref:`LODR`) when doing LM rescoring. To do so, we need to +download the bi-gram required by LODR: + +.. code-block:: bash + + $ # download the bi-gram + $ git lfs install + $ git clone https://huggingface.co/marcoyang/librispeech_bigram + $ pushd data/lang_bpe_500 + $ ln -s ../../librispeech_bigram/2gram.arpa . + $ popd + +Then we can perform LM rescoring + LODR by changing the decoding method to ``modified_beam_search_lm_rescore_LODR``. + +.. note:: + + This decoding method depends on `kenlm `_. 
You can install it + via this command: ``pip install https://github.com/kpu/kenlm/archive/master.zip``. + +.. code-block:: bash + + $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp + $ lm_dir=./icefall-librispeech-rnn-lm/exp + $ lm_scale=0.43 + $ ./pruned_transducer_stateless7_streaming/decode.py \ + --epoch 99 \ + --avg 1 \ + --use-averaged-model False \ + --beam-size 4 \ + --exp-dir $exp_dir \ + --max-duration 600 \ + --decode-chunk-len 32 \ + --decoding-method modified_beam_search_lm_rescore_LODR \ + --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \ + --use-shallow-fusion 0 \ + --lm-type rnn \ + --lm-exp-dir $lm_dir \ + --lm-epoch 99 \ + --lm-scale $lm_scale \ + --lm-avg 1 \ + --rnn-lm-embedding-dim 2048 \ + --rnn-lm-hidden-dim 2048 \ + --rnn-lm-num-layers 3 \ + --lm-vocab-size 500 + +You should see the following WERs after executing the commands above: + +.. code-block:: text + + $ For test-clean, WER of different settings are: + $ beam_size_4 2.9 best for test-clean + $ For test-other, WER of different settings are: + $ beam_size_4 7.57 best for test-other + +This is slightly better than LM rescoring alone. If we further increase the beam size, we will see +further improvements from LM rescoring + LODR: + +.. list-table:: WERs of LM rescoring + LODR with different beam sizes + :widths: 25 25 25 + :header-rows: 1 + + * - Beam size + - test-clean + - test-other + * - 4 + - 2.9 + - 7.57 + * - 8 + - 2.63 + - 7.04 + * - 12 + - 2.52 + - 6.73 + +As mentioned earlier, LM rescoring is usually faster than shallow-fusion-based methods. +Here, we benchmark their WERs and decoding speed: + +.. 
list-table:: LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on test-other and decoding time on test-clean) + :widths: 25 25 25 25 + :header-rows: 1 + + * - Decoding method + - beam=4 + - beam=8 + - beam=12 + * - `modified_beam_search` + - 3.11/7.93; 132s + - 3.1/7.95; 177s + - 3.1/7.96; 210s + * - `modified_beam_search_lm_shallow_fusion` + - 2.77/7.08; 262s + - 2.62/6.65; 352s + - 2.58/6.65; 488s + * - LODR + - 2.61/6.74; 400s + - 2.45/6.38; 610s + - 2.4/6.23; 870s + * - `modified_beam_search_lm_rescore` + - 2.93/7.6; 156s + - 2.67/7.11; 203s + - 2.59/6.86; 255s + * - `modified_beam_search_lm_rescore_LODR` + - 2.9/7.57; 160s + - 2.63/7.04; 203s + - 2.52/6.73; 263s + +.. note:: + + Decoding is performed with a single 32G V100, we set ``--max-duration`` to 600. + Decoding time here is only for reference and it may vary. \ No newline at end of file diff --git a/_sources/decoding-with-langugage-models/shallow-fusion.rst.txt b/_sources/decoding-with-langugage-models/shallow-fusion.rst.txt new file mode 100644 index 000000000..0d2837372 --- /dev/null +++ b/_sources/decoding-with-langugage-models/shallow-fusion.rst.txt @@ -0,0 +1,176 @@ +.. _shallow_fusion: + +Shallow fusion for Transducer +================================= + +External language models (LM) are commonly used to improve WERs for E2E ASR models. +This tutorial shows you how to perform ``shallow fusion`` with an external LM +to improve the word-error-rate of a transducer model. + +.. note:: + + This tutorial is based on the recipe + `pruned_transducer_stateless7_streaming `_, + which is a streaming transducer model trained on `LibriSpeech`_. + However, you can easily apply shallow fusion to other recipes. + If you encounter any problems, please open an issue here `icefall `_. + +.. note:: + + For simplicity, the training and testing corpus in this tutorial is the same (`LibriSpeech`_). 
However, you can change the testing set + to any other domain (e.g., `GigaSpeech`_) and use an external LM trained on that domain. + +.. HINT:: + + We recommend using a GPU for decoding. + +For illustration purposes, we will use a pre-trained ASR model from this `link `__. +If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`. + +As the initial step, let's download the pre-trained model. + +.. code-block:: bash + + $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29 + $ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp + $ git lfs pull --include "pretrained.pt" + $ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded + $ popd + +To test the model, let's have a look at the decoding results without using an LM. This can be done via the following command: + +.. code-block:: bash + + $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/ + $ ./pruned_transducer_stateless7_streaming/decode.py \ + --epoch 99 \ + --avg 1 \ + --use-averaged-model False \ + --exp-dir $exp_dir \ + --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \ + --max-duration 600 \ + --decode-chunk-len 32 \ + --decoding-method modified_beam_search + +The following WERs are achieved on test-clean and test-other: + +.. code-block:: text + + $ For test-clean, WER of different settings are: + $ beam_size_4 3.11 best for test-clean + $ For test-other, WER of different settings are: + $ beam_size_4 7.93 best for test-other + +These are already good numbers! But we can further improve them by using shallow fusion with an external LM. +Since training a language model usually takes a long time, we download a pre-trained LM from this `link `__. + +.. 
code-block:: bash + + $ # download the external LM + $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm + $ # create a symbolic link so that the checkpoint can be loaded + $ pushd icefall-librispeech-rnn-lm/exp + $ git lfs pull --include "pretrained.pt" + $ ln -s pretrained.pt epoch-99.pt + $ popd + +.. note:: + + This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora. + You may also train an RNN LM from scratch. Please refer to this `script `__ + for training an RNN LM and this `script `__ to train a transformer LM. + +To use shallow fusion for decoding, we can execute the following command: + +.. code-block:: bash + + $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp + $ lm_dir=./icefall-librispeech-rnn-lm/exp + $ lm_scale=0.29 + $ ./pruned_transducer_stateless7_streaming/decode.py \ + --epoch 99 \ + --avg 1 \ + --use-averaged-model False \ + --beam-size 4 \ + --exp-dir $exp_dir \ + --max-duration 600 \ + --decode-chunk-len 32 \ + --decoding-method modified_beam_search_lm_shallow_fusion \ + --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \ + --use-shallow-fusion 1 \ + --lm-type rnn \ + --lm-exp-dir $lm_dir \ + --lm-epoch 99 \ + --lm-scale $lm_scale \ + --lm-avg 1 \ + --rnn-lm-embedding-dim 2048 \ + --rnn-lm-hidden-dim 2048 \ + --rnn-lm-num-layers 3 \ + --lm-vocab-size 500 + +Note that we set ``--decoding-method modified_beam_search_lm_shallow_fusion`` and ``--use-shallow-fusion True`` +to use shallow fusion. ``--lm-type`` specifies the type of neural LM we are going to use; you can choose +either ``rnn`` or ``transformer``. 
The following three arguments are associated with the RNN LM: + +- ``--rnn-lm-embedding-dim`` + The embedding dimension of the RNN LM. + +- ``--rnn-lm-hidden-dim`` + The hidden dimension of the RNN LM. + +- ``--rnn-lm-num-layers`` + The number of RNN layers in the RNN LM. + + +The decoding results obtained with the above command are shown below. + +.. code-block:: text + + $ For test-clean, WER of different settings are: + $ beam_size_4 2.77 best for test-clean + $ For test-other, WER of different settings are: + $ beam_size_4 7.08 best for test-other + +The improvement from shallow fusion is very obvious! The relative WER reduction on test-other is around 10.7% (from 7.93 to 7.08). +A few parameters can be tuned to further boost the performance of shallow fusion: + +- ``--lm-scale`` + + Controls the scale of the LM. If too small, the external language model may not be fully utilized; if too large, + the LM score may dominate during decoding, leading to bad WER. A typical value is around 0.3. + +- ``--beam-size`` + + The number of active paths in the search beam. It controls the trade-off between decoding efficiency and accuracy. + +Here, we also show how ``--beam-size`` affects the WER and decoding time: + +.. list-table:: WERs and decoding time (on test-clean) of shallow fusion with different beam sizes + :widths: 25 25 25 25 + :header-rows: 1 + + * - Beam size + - test-clean + - test-other + - Decoding time on test-clean (s) + * - 4 + - 2.77 + - 7.08 + - 262 + * - 8 + - 2.62 + - 6.65 + - 352 + * - 12 + - 2.58 + - 6.65 + - 488 + +As we see, a larger beam size during shallow fusion improves the WER, but is also slower. + + + + + + + + diff --git a/_sources/index.rst.txt b/_sources/index.rst.txt index 8d76eb68b..a7d365a15 100644 --- a/_sources/index.rst.txt +++ b/_sources/index.rst.txt @@ -34,3 +34,8 @@ speech recognition recipes using `k2 `_. contributing/index huggingface/index + +.. 
toctree:: + :maxdepth: 2 + + decoding-with-langugage-models/index \ No newline at end of file diff --git a/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt b/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt index ea9f350cd..2e8d0893a 100644 --- a/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt +++ b/_sources/recipes/Non-streaming-ASR/librispeech/distillation.rst.txt @@ -1,7 +1,7 @@ Distillation with HuBERT ======================== -This tutorial shows you how to perform knowledge distillation in `icefall`_ +This tutorial shows you how to perform knowledge distillation in `icefall `_ with the `LibriSpeech`_ dataset. The distillation method used here is called "Multi Vector Quantization Knowledge Distillation" (MVQ-KD). Please have a look at our paper `Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation `_ @@ -13,7 +13,7 @@ for more details about MVQ-KD. `pruned_transducer_stateless4 `_. Currently, we only implement MVQ-KD in this recipe. However, MVQ-KD is theoretically applicable to all recipes with only minor changes needed. Feel free to try out MVQ-KD in different recipes. If you - encounter any problems, please open an issue here `icefall `_. + encounter any problems, please open an issue here `icefall `__. .. note:: @@ -217,7 +217,7 @@ the following command. --exp-dir $exp_dir \ --enable-distillation True -You should get similar results as `here `_. +You should get similar results as `here `__. That's all! Feel free to experiment with your own setups and report your results. -If you encounter any problems during training, please open up an issue `here `_. +If you encounter any problems during training, please open up an issue `here `__. 
diff --git a/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt b/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt index 42fd3df77..1bc1dd984 100644 --- a/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt +++ b/_sources/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt @@ -8,10 +8,10 @@ with the `LibriSpeech `_ dataset. .. Note:: - The tutorial is suitable for `pruned_transducer_stateless `_, - `pruned_transducer_stateless2 `_, - `pruned_transducer_stateless4 `_, - `pruned_transducer_stateless5 `_, + The tutorial is suitable for `pruned_transducer_stateless `__, + `pruned_transducer_stateless2 `__, + `pruned_transducer_stateless4 `__, + `pruned_transducer_stateless5 `__, We will take pruned_transducer_stateless4 as an example in this tutorial. .. HINT:: @@ -237,7 +237,7 @@ them, please modify ``./pruned_transducer_stateless4/train.py`` directly. .. NOTE:: - The options for `pruned_transducer_stateless5 `_ are a little different from + The options for `pruned_transducer_stateless5 `__ are a little different from other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, ``--joiner-dim`` from commandline, so that you can train models with different size with pruned_transducer_stateless5. 
@@ -529,13 +529,13 @@ Download pretrained models If you don't want to train from scratch, you can download the pretrained models by visiting the following links: - - `pruned_transducer_stateless `_ + - `pruned_transducer_stateless `__ - - `pruned_transducer_stateless2 `_ + - `pruned_transducer_stateless2 `__ - - `pruned_transducer_stateless4 `_ + - `pruned_transducer_stateless4 `__ - - `pruned_transducer_stateless5 `_ + - `pruned_transducer_stateless5 `__ See ``_ for the details of the above pretrained models diff --git a/_sources/recipes/Streaming-ASR/introduction.rst.txt b/_sources/recipes/Streaming-ASR/introduction.rst.txt index e1382e77d..ac77a51d1 100644 --- a/_sources/recipes/Streaming-ASR/introduction.rst.txt +++ b/_sources/recipes/Streaming-ASR/introduction.rst.txt @@ -45,9 +45,9 @@ the input features. We have three variants of Emformer models in ``icefall``. - - ``pruned_stateless_emformer_rnnt2`` using Emformer from torchaudio, see `LibriSpeech recipe `_. + - ``pruned_stateless_emformer_rnnt2`` using Emformer from torchaudio, see `LibriSpeech recipe `__. - ``conv_emformer_transducer_stateless`` using ConvEmformer implemented by ourself. Different from the Emformer in torchaudio, ConvEmformer has a convolution in each layer and uses the mechanisms in our reworked conformer model. - See `LibriSpeech recipe `_. + See `LibriSpeech recipe `__. - ``conv_emformer_transducer_stateless2`` using ConvEmformer implemented by ourself. The only difference from the above one is that it uses a simplified memory bank. See `LibriSpeech recipe `_. diff --git a/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt b/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt index de7102ba8..2ca70bcf3 100644 --- a/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt +++ b/_sources/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst.txt @@ -6,10 +6,10 @@ with the `LibriSpeech `_ dataset. .. 
Note:: - The tutorial is suitable for `pruned_transducer_stateless `_, - `pruned_transducer_stateless2 `_, - `pruned_transducer_stateless4 `_, - `pruned_transducer_stateless5 `_, + The tutorial is suitable for `pruned_transducer_stateless `__, + `pruned_transducer_stateless2 `__, + `pruned_transducer_stateless4 `__, + `pruned_transducer_stateless5 `__, We will take pruned_transducer_stateless4 as an example in this tutorial. .. HINT:: @@ -264,7 +264,7 @@ them, please modify ``./pruned_transducer_stateless4/train.py`` directly. .. NOTE:: - The options for `pruned_transducer_stateless5 `_ are a little different from + The options for `pruned_transducer_stateless5 `__ are a little different from other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, ``--joiner-dim`` from commandline, so that you can train models with different size with pruned_transducer_stateless5. diff --git a/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt b/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt index f0e8961d7..8b75473c6 100644 --- a/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt +++ b/_sources/recipes/Streaming-ASR/librispeech/zipformer_transducer.rst.txt @@ -6,7 +6,7 @@ with the `LibriSpeech `_ dataset. .. Note:: - The tutorial is suitable for `pruned_transducer_stateless7_streaming `_, + The tutorial is suitable for `pruned_transducer_stateless7_streaming `__, .. 
HINT:: @@ -642,7 +642,7 @@ Download pretrained models If you don't want to train from scratch, you can download the pretrained models by visiting the following links: - - `pruned_transducer_stateless7_streaming `_ + - `pruned_transducer_stateless7_streaming `__ See ``_ for the details of the above pretrained models diff --git a/contributing/code-style.html b/contributing/code-style.html index d7f643e6a..0e54abcab 100644 --- a/contributing/code-style.html +++ b/contributing/code-style.html @@ -59,6 +59,9 @@
  • Huggingface
  • + + diff --git a/contributing/doc.html b/contributing/doc.html index 72b5f4056..c08322d68 100644 --- a/contributing/doc.html +++ b/contributing/doc.html @@ -59,6 +59,9 @@
  • Huggingface
  • + + diff --git a/contributing/how-to-create-a-recipe.html b/contributing/how-to-create-a-recipe.html index c8df20ace..b2325ad10 100644 --- a/contributing/how-to-create-a-recipe.html +++ b/contributing/how-to-create-a-recipe.html @@ -65,6 +65,9 @@
  • Huggingface
  • + + diff --git a/contributing/index.html b/contributing/index.html index 52819b7ef..1f9b31e3a 100644 --- a/contributing/index.html +++ b/contributing/index.html @@ -59,6 +59,9 @@
  • Huggingface
  • + + diff --git a/decoding-with-langugage-models/LODR.html b/decoding-with-langugage-models/LODR.html new file mode 100644 index 000000000..7181d098c --- /dev/null +++ b/decoding-with-langugage-models/LODR.html @@ -0,0 +1,292 @@ + + + + + + + LODR for RNN Transducer — icefall 0.1 documentation + + + + + + + + + + + + + + + + + + +
    + + +
    + +
    +
    +
    + +
    +
    +
    +
    + +
    +

    LODR for RNN Transducer

    +

    As a type of E2E model, neural transducers are usually considered to have an internal +language model (ILM), which learns language-level information from the training corpus. +In real-life scenarios, there is often a mismatch between the training corpus and the target corpus. +This mismatch can be a problem when decoding neural transducer models with an external language model, as the internal +language model can act “against” the external LM. In this tutorial, we show how to use +Low-order Density Ratio (LODR) to alleviate this effect and further improve the performance +of language model integration.

    +
    +

    Note

    +

    This tutorial is based on the recipe +pruned_transducer_stateless7_streaming, +which is a streaming transducer model trained on LibriSpeech. +However, you can easily apply LODR to other recipes. +If you encounter any problems, please open an issue in icefall.

    +
    +
    +

    Note

    +

    For simplicity, the training and testing corpora in this tutorial are the same (LibriSpeech). However, +you can change the testing set to any other domain (e.g., GigaSpeech) and prepare the language models +using that corpus.

    +
    +

    First, let’s have a look at some background information. As the predecessor of LODR, Density Ratio (DR) was first proposed here +to address the language information mismatch between the training +corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the target domain +are acoustically similar, DR derives the following formula for decoding with Bayes’ theorem:

    +
    +\[\text{score}\left(y_u|\mathit{x},y\right) =
    +\log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
    +\lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
    +\lambda_2 \log p_{\text{Source LM}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]
    +

    where \(\lambda_1\) and \(\lambda_2\) are the weights of the LM scores for the target domain and the source domain, respectively. +Here, the source domain LM is trained on the training corpus. The only difference between the above formula and +shallow fusion is the subtraction of the source domain LM score.
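To make the formula concrete, here is a minimal Python sketch of the density ratio score for a single candidate token. The function name `density_ratio_score` and all numeric values are illustrative assumptions and are not part of icefall.

```python
import math


def density_ratio_score(
    log_p_asr: float,
    log_p_target_lm: float,
    log_p_source_lm: float,
    lambda_1: float,
    lambda_2: float,
) -> float:
    """Combine log-probabilities as in the density ratio formula above."""
    return log_p_asr + lambda_1 * log_p_target_lm - lambda_2 * log_p_source_lm


# Toy log-probabilities for one candidate token (assumed values).
log_p_asr = math.log(0.5)
log_p_target = math.log(0.2)
log_p_source = math.log(0.4)

# Density ratio: add the target LM score, subtract the source LM score.
dr = density_ratio_score(log_p_asr, log_p_target, log_p_source, 0.4, 0.2)

# Setting lambda_2 to 0 removes the subtraction, which is plain shallow fusion.
shallow_fusion = density_ratio_score(log_p_asr, log_p_target, log_p_source, 0.4, 0.0)

print(f"density ratio score:  {dr:.4f}")
print(f"shallow fusion score: {shallow_fusion:.4f}")
```

With `lambda_2 = 0` the function reduces to shallow fusion, which mirrors the statement that the subtraction is the only difference between the two.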

    +

    Some works treat the predictor and the joiner of the neural transducer as its internal LM (ILM). However, this ILM is +considered to be weak and to capture only low-level language information. Therefore, LODR proposes using +a low-order n-gram LM as an approximation of the ILM of the neural transducer. This leads to the following formula +for decoding with a transducer model:

    +
    +\[\text{score}\left(y_u|\mathit{x},y\right) =
    +\log p_{rnnt}\left(y_u|\mathit{x},y_{1:u-1}\right) +
    +\lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
    +\lambda_2 \log p_{\text{bi-gram}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]
    +

    In LODR, an additional bi-gram LM estimated on the source domain (e.g., the training corpus) is required. Compared to DR, +the only difference lies in the choice of the source domain LM. According to the original paper, +LODR achieves performance similar to DR in both intra-domain and cross-domain settings. +As a bi-gram is much faster to evaluate, LODR is usually much faster than DR.
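The idea can be sketched in a few lines of Python. This is a simplified illustration, not the icefall implementation: it assumes a word-level bi-gram with add-one smoothing estimated from a toy source corpus, whereas the tutorial uses a pre-built bi-gram over BPE tokens. The names `train_bigram` and `lodr_score` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Return a function giving add-one smoothed bi-gram log-probabilities."""
    vocab = {tok for sent in corpus for tok in sent} | {"<s>"}
    context_counts = Counter()
    bigram_counts = Counter()
    for sent in corpus:
        prev = "<s>"
        for tok in sent:
            context_counts[prev] += 1
            bigram_counts[(prev, tok)] += 1
            prev = tok

    def log_prob(prev, tok):
        numerator = bigram_counts[(prev, tok)] + 1
        denominator = context_counts[prev] + len(vocab)
        return math.log(numerator / denominator)

    return log_prob


def lodr_score(log_p_rnnt, log_p_target_lm, log_p_bigram, lambda_1, lambda_2):
    """LODR formula: the source LM of DR is replaced by a bi-gram."""
    return log_p_rnnt + lambda_1 * log_p_target_lm - lambda_2 * log_p_bigram


# Toy source-domain corpus (assumed): "cat" follows "the" more often than "dog".
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
bigram = train_bigram(corpus)

# Same transducer and target LM scores; only the bi-gram score differs.
frequent = lodr_score(-1.0, -1.0, bigram("the", "cat"), 0.42, 0.24)
rare = lodr_score(-1.0, -1.0, bigram("the", "dog"), 0.42, 0.24)
print(f"after 'the': cat={frequent:.4f}, dog={rare:.4f}")
```

In this toy setup the token that the source domain already favors receives the larger penalty, which is the intended effect of subtracting the approximate internal LM.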

    +

    Now, we will show you how to use LODR in icefall. +For illustration purposes, we will use a pre-trained ASR model from this link. +If you want to train your model from scratch, please have a look at Pruned transducer statelessX. +The testing scenario here is intra-domain (we decode a model trained on LibriSpeech on the LibriSpeech test sets).

    +

    As the initial step, let’s download the pre-trained model.

    +
    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
    +$ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
    +$ git lfs pull --include "pretrained.pt"
    +$ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded
    +$ popd # return to the recipe directory so that the relative paths below resolve
    +
    +
    +

    To test the model, let’s have a look at the decoding results without using an LM. This can be done via the following command:

    +
    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
    +$ ./pruned_transducer_stateless7_streaming/decode.py \
    +    --epoch 99 \
    +    --avg 1 \
    +    --use-averaged-model False \
    +    --exp-dir $exp_dir \
    +    --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
    +    --max-duration 600 \
    +    --decode-chunk-len 32 \
    +    --decoding-method modified_beam_search
    +
    +
    +

    The following WERs are achieved on test-clean and test-other:

    +
    $ For test-clean, WER of different settings are:
    +$ beam_size_4       3.11    best for test-clean
    +$ For test-other, WER of different settings are:
    +$ beam_size_4       7.93    best for test-other
    +
    +
    +

    Then, we download the external language model and the bi-gram LM that are necessary for LODR. +Note that the bi-gram is estimated on the text of the LibriSpeech 960-hour training set.

    +
    $ # download the external LM
    +$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
    +$ # create a symbolic link so that the checkpoint can be loaded
    +$ pushd icefall-librispeech-rnn-lm/exp
    +$ git lfs pull --include "pretrained.pt"
    +$ ln -s pretrained.pt epoch-99.pt
    +$ popd
    +$
    +$ # download the bi-gram
    +$ git lfs install
    +$ git clone https://huggingface.co/marcoyang/librispeech_bigram
    +$ pushd data/lang_bpe_500
    +$ ln -s ../../librispeech_bigram/2gram.fst.txt .
    +$ popd
    +
    +
    +

    Then, we perform LODR decoding by setting --decoding-method to modified_beam_search_lm_LODR:

    +
    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
    +$ lm_dir=./icefall-librispeech-rnn-lm/exp
    +$ lm_scale=0.42
    +$ LODR_scale=-0.24
    +$ ./pruned_transducer_stateless7_streaming/decode.py \
    +    --epoch 99 \
    +    --avg 1 \
    +    --use-averaged-model False \
    +    --beam-size 4 \
    +    --exp-dir $exp_dir \
    +    --max-duration 600 \
    +    --decode-chunk-len 32 \
    +    --decoding-method modified_beam_search_lm_LODR \
    +    --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
    +    --use-shallow-fusion 1 \
    +    --lm-type rnn \
    +    --lm-exp-dir $lm_dir \
    +    --lm-epoch 99 \
    +    --lm-scale $lm_scale \
    +    --lm-avg 1 \
    +    --rnn-lm-embedding-dim 2048 \
    +    --rnn-lm-hidden-dim 2048 \
    +    --rnn-lm-num-layers 3 \
    +    --lm-vocab-size 500 \
    +    --tokens-ngram 2 \
    +    --ngram-lm-scale $LODR_scale
    +
    +
    +

    There are two extra arguments that need to be given when doing LODR. --tokens-ngram specifies the order of the n-gram. As we +are using a bi-gram, we set it to 2. --ngram-lm-scale is the scale of the bi-gram. It should be a negative number +because we are subtracting the bi-gram’s score during decoding.
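The sign convention can be checked with a small Python sketch. It assumes that the decoder simply adds each scaled log-probability to the transducer score; the function names are hypothetical and the log-probabilities are made up.

```python
def score_with_flags(log_p_rnnt, log_p_lm, log_p_bigram, lm_scale, ngram_lm_scale):
    """Additive combination, as assumed for --lm-scale and --ngram-lm-scale."""
    return log_p_rnnt + lm_scale * log_p_lm + ngram_lm_scale * log_p_bigram


def score_with_formula(log_p_rnnt, log_p_lm, log_p_bigram, lambda_1, lambda_2):
    """The LODR formula, with an explicit subtraction of the bi-gram score."""
    return log_p_rnnt + lambda_1 * log_p_lm - lambda_2 * log_p_bigram


# Scales taken from the command above; log-probabilities are toy values.
with_flags = score_with_flags(-1.2, -2.5, -3.0, lm_scale=0.42, ngram_lm_scale=-0.24)
with_formula = score_with_formula(-1.2, -2.5, -3.0, lambda_1=0.42, lambda_2=0.24)

# A negative scale on an added term is the same as subtracting a positive one.
print(abs(with_flags - with_formula) < 1e-12)
```

Under that assumption, a negative --ngram-lm-scale of -0.24 corresponds to \(\lambda_2 = 0.24\) in the formula.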

    +

    The decoding results obtained with the above command are shown below:

    +
    $ For test-clean, WER of different settings are:
    +$ beam_size_4       2.61    best for test-clean
    +$ For test-other, WER of different settings are:
    +$ beam_size_4       6.74    best for test-other
    +
    +
    +

    Recall that the WER we obtained in Shallow fusion for Transducer with a beam size of 4 is 2.77/7.08 (test-clean/test-other). LODR +indeed further improves the WER. We can do even better if we increase --beam-size:

    + + +++++ + + + + + + + + + + + + + + + + + + + + +
    Table 2 WER of LODR with different beam sizes

    Beam size

    test-clean

    test-other

    4

    2.61

    6.74

    8

    2.45

    6.38

    12

    2.4

    6.23

    +
    + + +
    +
    + +
    +
    +
    +
    + + + + \ No newline at end of file diff --git a/decoding-with-langugage-models/index.html b/decoding-with-langugage-models/index.html new file mode 100644 index 000000000..f83205d74 --- /dev/null +++ b/decoding-with-langugage-models/index.html @@ -0,0 +1,135 @@ + + + + + + + Decoding with language models — icefall 0.1 documentation + + + + + + + + + + + + + + + + + +
    + + +
    + +
    +
    +
    + +
    +
    +
    +
    + +
    +

    Decoding with language models

    +

    This section describes how to use external language models +during decoding to improve the WER of transducer models.

    + +
    + + +
    +
    + +
    +
    +
    +
    + + + + \ No newline at end of file diff --git a/decoding-with-langugage-models/rescoring.html b/decoding-with-langugage-models/rescoring.html new file mode 100644 index 000000000..363352202 --- /dev/null +++ b/decoding-with-langugage-models/rescoring.html @@ -0,0 +1,386 @@ + + + + + + + LM rescoring for Transducer — icefall 0.1 documentation + + + + + + + + + + + + + + + + +
    + + +
    + +
    +
    +
    + +
    +
    +
    +
    + +
    +

    LM rescoring for Transducer

    +

    LM rescoring is a commonly used approach to incorporate external LM information. Unlike shallow-fusion-based +methods (see shallow-fusion, LODR for RNN Transducer), rescoring is usually performed to re-rank the n-best hypotheses after beam search. +Rescoring is usually more efficient than shallow fusion since less computation is performed on the external LM. +In this tutorial, we will show you how to use an external LM to rescore the n-best hypotheses decoded from neural transducer models in +icefall.
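As a rough illustration of n-best rescoring, the Python sketch below re-ranks a toy list of hypotheses by adding a scaled LM score to each beam search score. The helper `rescore_nbest`, the toy LM table and all scores are assumptions made for this example; they do not reflect the icefall code.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    hyps: List[Tuple[str, float]],
    lm_log_prob: Callable[[str], float],
    lm_scale: float,
) -> List[Tuple[str, float]]:
    """Add the scaled LM score to each hypothesis and sort best-first."""
    rescored = [(text, score + lm_scale * lm_log_prob(text)) for text, score in hyps]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy external LM: whole-sentence log-probabilities (assumed values).
toy_lm = {"recognize speech": -2.0, "wreck a nice beach": -9.0}

# Toy n-best list from beam search: (hypothesis, transducer log-score).
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]

before = rescore_nbest(nbest, toy_lm.__getitem__, lm_scale=0.0)[0][0]
after = rescore_nbest(nbest, toy_lm.__getitem__, lm_scale=0.43)[0][0]
print(f"best without LM: {before}")
print(f"best with LM:    {after}")
```

The external LM is evaluated once per finished hypothesis rather than at every search step, which is why rescoring needs less LM computation than shallow fusion.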

    +
    +

    Note

    +

    This tutorial is based on the recipe +pruned_transducer_stateless7_streaming, +which is a streaming transducer model trained on LibriSpeech. +However, you can easily apply LM rescoring to other recipes. +If you encounter any problems, please open an issue here.

    +
    +
    +

    Note

    +

    For simplicity, the training and testing corpora in this tutorial are the same (LibriSpeech). However, you can change the testing set +to any other domain (e.g., GigaSpeech) and use an external LM trained on that domain.

    +
    +
    +

    Hint

    +

    We recommend using a GPU for decoding.

    +
    +

    For illustration purposes, we will use a pre-trained ASR model from this link. +If you want to train your model from scratch, please have a look at Pruned transducer statelessX.

    +

    As the initial step, let’s download the pre-trained model.

    +
    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
    +$ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
    +$ git lfs pull --include "pretrained.pt"
    +$ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded
    +$ popd # return to the recipe directory so that the relative paths below resolve
    +
    +
    +

    As usual, we first test the model’s performance without an external LM. This can be done via the following command:

    +
    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
    +$ ./pruned_transducer_stateless7_streaming/decode.py \
    +    --epoch 99 \
    +    --avg 1 \
    +    --use-averaged-model False \
    +    --exp-dir $exp_dir \
    +    --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
    +    --max-duration 600 \
    +    --decode-chunk-len 32 \
    +    --decoding-method modified_beam_search
    +
    +
    +

    The following WERs are achieved on test-clean and test-other:

    +
    $ For test-clean, WER of different settings are:
    +$ beam_size_4       3.11    best for test-clean
    +$ For test-other, WER of different settings are:
    +$ beam_size_4       7.93    best for test-other
    +
    +
    +

    Now, we will try to improve the above WER numbers via external LM rescoring. We will download +a pre-trained LM from this link.

    +
    +

    Note

    +

    This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora. +You may also train an RNN LM from scratch. Please refer to this script +for training an RNN LM and this script for training a transformer LM.

    +
    +
    $ # download the external LM
    +$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
    +$ # create a symbolic link so that the checkpoint can be loaded
    +$ pushd icefall-librispeech-rnn-lm/exp
    +$ git lfs pull --include "pretrained.pt"
    +$ ln -s pretrained.pt epoch-99.pt
    +$ popd
    +
    +
    +

    With the RNN LM available, we can rescore the n-best hypotheses generated from modified_beam_search. Here, +n should be the number of beams, i.e., --beam-size. The command for LM rescoring is +as follows. Note that --decoding-method is set to modified_beam_search_lm_rescore and --use-shallow-fusion +is set to 0 (i.e., False).

    +
    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
    +$ lm_dir=./icefall-librispeech-rnn-lm/exp
    +$ lm_scale=0.43
    +$ ./pruned_transducer_stateless7_streaming/decode.py \
    +    --epoch 99 \
    +    --avg 1 \
    +    --use-averaged-model False \
    +    --beam-size 4 \
    +    --exp-dir $exp_dir \
    +    --max-duration 600 \
    +    --decode-chunk-len 32 \
    +    --decoding-method modified_beam_search_lm_rescore \
    +    --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
    +    --use-shallow-fusion 0 \
    +    --lm-type rnn \
    +    --lm-exp-dir $lm_dir \
    +    --lm-epoch 99 \
    +    --lm-scale $lm_scale \
    +    --lm-avg 1 \
    +    --rnn-lm-embedding-dim 2048 \
    +    --rnn-lm-hidden-dim 2048 \
    +    --rnn-lm-num-layers 3 \
    +    --lm-vocab-size 500
    +
    +
    +
    $ For test-clean, WER of different settings are:
    +$ beam_size_4       2.93    best for test-clean
    +$ For test-other, WER of different settings are:
    +$ beam_size_4       7.6     best for test-other
    +
    +
    +

    Great! We made some improvements! Increasing the number of n-best hypotheses further boosts the performance, +as shown in the following table:

    + + +++++ + + + + + + + + + + + + + + + + + + + + +
    Table 3 WERs of LM rescoring with different beam sizes

    Beam size

    test-clean

    test-other

    4

    2.93

    7.6

    8

    2.67

    7.11

    12

    2.59

    6.86

    +

    In fact, we can also apply LODR (see LODR for RNN Transducer) when doing LM rescoring. To do so, we need to +download the bi-gram required by LODR:

    +
    $ # download the bi-gram
    +$ git lfs install
    +$ git clone https://huggingface.co/marcoyang/librispeech_bigram
    +$ pushd data/lang_bpe_500
    +$ ln -s ../../librispeech_bigram/2gram.arpa .
    +$ popd
    +
    +
    +

    Then we can perform LM rescoring + LODR by changing the decoding method to modified_beam_search_lm_rescore_LODR.

    +
    +

    Note

    +

    This decoding method requires kenlm. You can install it +via this command: pip install https://github.com/kpu/kenlm/archive/master.zip.

    +
    +
    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
    +$ lm_dir=./icefall-librispeech-rnn-lm/exp
    +$ lm_scale=0.43
    +$ ./pruned_transducer_stateless7_streaming/decode.py \
    +    --epoch 99 \
    +    --avg 1 \
    +    --use-averaged-model False \
    +    --beam-size 4 \
    +    --exp-dir $exp_dir \
    +    --max-duration 600 \
    +    --decode-chunk-len 32 \
    +    --decoding-method modified_beam_search_lm_rescore_LODR \
    +    --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
    +    --use-shallow-fusion 0 \
    +    --lm-type rnn \
    +    --lm-exp-dir $lm_dir \
    +    --lm-epoch 99 \
    +    --lm-scale $lm_scale \
    +    --lm-avg 1 \
    +    --rnn-lm-embedding-dim 2048 \
    +    --rnn-lm-hidden-dim 2048 \
    +    --rnn-lm-num-layers 3 \
    +    --lm-vocab-size 500
    +
    +
    +

    You should see the following WERs after executing the commands above:

    +
    $ For test-clean, WER of different settings are:
    +$ beam_size_4       2.9     best for test-clean
    +$ For test-other, WER of different settings are:
    +$ beam_size_4       7.57    best for test-other
    +
    +
    +

    This is slightly better than LM rescoring alone. If we further increase the beam size, we will see +further improvements from LM rescoring + LODR:

    + + +++++ + + + + + + + + + + + + + + + + + + + + +
    Table 4 WERs of LM rescoring + LODR with different beam sizes

    Beam size

    test-clean

    test-other

    4

    2.9

    7.57

    8

    2.63

    7.04

    12

    2.52

    6.73

    +

    As mentioned earlier, LM rescoring is usually faster than shallow-fusion-based methods. +Here, we benchmark the WERs and decoding speeds of these methods:

    + + ++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Table 5 LM-rescoring-based methods vs shallow-fusion-based methods (each field shows the WER on test-clean / WER on test-other; decoding time on test-clean)

    Decoding method

    beam=4

    beam=8

    beam=12

    modified_beam_search

    3.11/7.93; 132s

    3.1/7.95; 177s

    3.1/7.96; 210s

    modified_beam_search_lm_shallow_fusion

    2.77/7.08; 262s

    2.62/6.65; 352s

    2.58/6.65; 488s

    LODR

    2.61/6.74; 400s

    2.45/6.38; 610s

    2.4/6.23; 870s

    modified_beam_search_lm_rescore

    2.93/7.6; 156s

    2.67/7.11; 203s

    2.59/6.86; 255s

    modified_beam_search_lm_rescore_LODR

    2.9/7.57; 160s

    2.63/7.04; 203s

    2.52/6.73; 263s

    +
    +

    Note

    +

    Decoding is performed with a single 32G V100, and --max-duration is set to 600. +The decoding times here are only for reference and may vary.
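To put the timings in Table 5 into perspective, the following Python sketch computes the slowdown of each LM-based method relative to plain modified_beam_search at beam size 4. The numbers are copied from Table 5; the helper name `slowdown` is an assumption made for this example.

```python
# Decoding time on test-clean at beam=4, in seconds (from Table 5).
BASELINE_SECONDS = 132  # modified_beam_search without an external LM
DECODING_SECONDS = {
    "modified_beam_search_lm_shallow_fusion": 262,
    "LODR": 400,
    "modified_beam_search_lm_rescore": 156,
    "modified_beam_search_lm_rescore_LODR": 160,
}


def slowdown(seconds: float, baseline: float = BASELINE_SECONDS) -> float:
    """Decoding time as a multiple of the baseline decoding time."""
    return seconds / baseline


# List the methods from fastest to slowest.
for method, seconds in sorted(DECODING_SECONDS.items(), key=lambda kv: kv[1]):
    print(f"{method}: {slowdown(seconds):.2f}x")
```

On these numbers, both rescoring variants stay close to the baseline decoding time, while the shallow-fusion-based methods take roughly two to three times as long.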

    +
    +
    + + +
    +
    + +
    +
    +
    +
    + + + + \ No newline at end of file diff --git a/decoding-with-langugage-models/shallow-fusion.html b/decoding-with-langugage-models/shallow-fusion.html new file mode 100644 index 000000000..50ef537ce --- /dev/null +++ b/decoding-with-langugage-models/shallow-fusion.html @@ -0,0 +1,296 @@ + + + + + + + Shallow fusion for Transducer — icefall 0.1 documentation + + + + + + + + + + + + + + + + + +
    + + +
    + +
    +
    +
    + +
    +
    +
    +
    + +
    +

    Shallow fusion for Transducer

    +

    External language models (LM) are commonly used to improve WERs for E2E ASR models. +This tutorial shows you how to perform shallow fusion with an external LM +to improve the word-error-rate of a transducer model.

    +
    +

    Note

    +

    This tutorial is based on the recipe +pruned_transducer_stateless7_streaming, +which is a streaming transducer model trained on LibriSpeech. +However, you can easily apply shallow fusion to other recipes. +If you encounter any problems, please open an issue in icefall.

    +
    +
    +

    Note

    +

    For simplicity, the training and testing corpora in this tutorial are the same (LibriSpeech). However, you can change the testing set +to any other domain (e.g., GigaSpeech) and use an external LM trained on that domain.

    +
    +
    +

    Hint

    +

    We recommend using a GPU for decoding.

    +
    +

    For illustration purposes, we will use a pre-trained ASR model from this link. +If you want to train your model from scratch, please have a look at Pruned transducer statelessX.

    +

    As the initial step, let’s download the pre-trained model.

    +
    $ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
    +$ pushd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
    +$ git lfs pull --include "pretrained.pt"
    +$ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded
    +$ popd # return to the recipe directory so that the relative paths below resolve
    +
    +
    +

    To test the model, let’s have a look at the decoding results without using an LM. This can be done via the following command:

    +
    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
    +$ ./pruned_transducer_stateless7_streaming/decode.py \
    +    --epoch 99 \
    +    --avg 1 \
    +    --use-averaged-model False \
    +    --exp-dir $exp_dir \
    +    --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
    +    --max-duration 600 \
    +    --decode-chunk-len 32 \
    +    --decoding-method modified_beam_search
    +
    +
    +

    The following WERs are achieved on test-clean and test-other:

    +
    $ For test-clean, WER of different settings are:
    +$ beam_size_4       3.11    best for test-clean
    +$ For test-other, WER of different settings are:
    +$ beam_size_4       7.93    best for test-other
    +
    +
    +

    These are already good numbers! But we can further improve them by using shallow fusion with an external LM. +Since training a language model usually takes a long time, we can download a pre-trained LM from this link.

    +
    $ # download the external LM
    +$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
    +$ # create a symbolic link so that the checkpoint can be loaded
    +$ pushd icefall-librispeech-rnn-lm/exp
    +$ git lfs pull --include "pretrained.pt"
    +$ ln -s pretrained.pt epoch-99.pt
    +$ popd
    +
    +
    +
    +

    Note

    +

    This is an RNN LM trained on the LibriSpeech text corpus, so it might not be ideal for other corpora. +You may also train an RNN LM from scratch. Please refer to this script +for training an RNN LM and this script for training a transformer LM.

    +
    +

    To use shallow fusion for decoding, we can execute the following command:

    +
    $ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
    +$ lm_dir=./icefall-librispeech-rnn-lm/exp
    +$ lm_scale=0.29
    +$ ./pruned_transducer_stateless7_streaming/decode.py \
    +    --epoch 99 \
    +    --avg 1 \
    +    --use-averaged-model False \
    +    --beam-size 4 \
    +    --exp-dir $exp_dir \
    +    --max-duration 600 \
    +    --decode-chunk-len 32 \
    +    --decoding-method modified_beam_search_lm_shallow_fusion \
    +    --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
    +    --use-shallow-fusion 1 \
    +    --lm-type rnn \
    +    --lm-exp-dir $lm_dir \
    +    --lm-epoch 99 \
    +    --lm-scale $lm_scale \
    +    --lm-avg 1 \
    +    --rnn-lm-embedding-dim 2048 \
    +    --rnn-lm-hidden-dim 2048 \
    +    --rnn-lm-num-layers 3 \
    +    --lm-vocab-size 500
    +
    +
    +

    Note that we set --decoding-method modified_beam_search_lm_shallow_fusion and --use-shallow-fusion 1 +to use shallow fusion. --lm-type specifies the type of neural LM we are going to use. You can choose +either rnn or transformer. The following three arguments are associated with the RNN LM:

    +
      +
    • +
      --rnn-lm-embedding-dim

      The embedding dimension of the RNN LM

      +
      +
      +
    • +
    • +
      --rnn-lm-hidden-dim

      The hidden dimension of the RNN LM

      +
      +
      +
    • +
    • +
      --rnn-lm-num-layers

      The number of RNN layers in the RNN LM.

      +
      +
      +
    • +
    +

    The decoding results obtained with the above command are shown below.

    +
    $ For test-clean, WER of different settings are:
    +$ beam_size_4       2.77    best for test-clean
    +$ For test-other, WER of different settings are:
    +$ beam_size_4       7.08    best for test-other
    +
    +
    +

    Shallow fusion gives a clear improvement: the relative WER reduction on test-other is around 10.7% (from 7.93 to 7.08). +A few parameters can be tuned to further boost the performance of shallow fusion:

    +
      +
    • --lm-scale

      +
      +

      Controls the scale of the LM. If it is too small, the external language model may not be fully utilized; if it is too large, +the LM score may dominate during decoding, leading to a worse WER. A typical value is around 0.3.

      +
      +
    • +
    • --beam-size

      +
      +

      The number of active paths in the search beam. It controls the trade-off between decoding efficiency and accuracy.

      +
      +
    • +
    +
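The effect of --lm-scale described above can be illustrated with a toy Python sketch of a single shallow fusion step. The candidate tokens, their log-probabilities and the helper `pick_token` are made-up assumptions chosen to show the three regimes; real decoding keeps --beam-size paths instead of a single best token.

```python
# Candidate token -> (transducer log-prob, external LM log-prob); toy values.
CANDIDATES = {
    "A": (-0.5, -3.0),  # acoustically likely, but unlikely under the LM
    "B": (-1.0, -0.5),  # plausible for both models
    "C": (-4.0, -0.1),  # favored by the LM only
}


def pick_token(candidates, lm_scale):
    """Return the token with the highest shallow fusion score."""
    return max(
        candidates,
        key=lambda tok: candidates[tok][0] + lm_scale * candidates[tok][1],
    )


for scale in (0.0, 0.3, 10.0):
    print(f"lm_scale={scale}: best token = {pick_token(CANDIDATES, scale)}")
```

With a scale of 0 the LM is ignored, a moderate scale lets both models contribute, and a very large scale lets the LM override the acoustic evidence.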

    Here, we also show how --beam-size affects the WER and decoding time:

    + + ++++++ + + + + + + + + + + + + + + + + + + + + + + + + +
    Table 1 WERs and decoding time (on test-clean) of shallow fusion with different beam sizes

    Beam size

    test-clean

    test-other

    Decoding time on test-clean (s)

    4

    2.77

    7.08

    262

    8

    2.62

    6.65

    352

    12

    2.58

    6.65

    488

    +

    As we see, a larger beam size during shallow fusion improves the WER, but is also slower.
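The trade-off in Table 1 can be quantified with a short Python sketch. The WERs and decoding times are copied from Table 1 and from the baseline result at beam size 4 (3.11/7.93); the helper `relative_wer_reduction` is an assumption made for this example, and the beam-size-4 baseline is used for all rows as a simplification.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# WERs of modified_beam_search without an LM at beam size 4.
BASELINE = {"test-clean": 3.11, "test-other": 7.93}

# Beam size -> (test-clean WER, test-other WER, decoding time in seconds).
SHALLOW_FUSION = {
    4: (2.77, 7.08, 262),
    8: (2.62, 6.65, 352),
    12: (2.58, 6.65, 488),
}

for beam, (clean, other, seconds) in SHALLOW_FUSION.items():
    clean_gain = relative_wer_reduction(BASELINE["test-clean"], clean)
    other_gain = relative_wer_reduction(BASELINE["test-other"], other)
    time_ratio = seconds / SHALLOW_FUSION[4][2]
    print(
        f"beam={beam}: test-clean -{clean_gain:.1f}%, "
        f"test-other -{other_gain:.1f}%, time x{time_ratio:.2f} vs beam=4"
    )
```

The same helper can be applied to the other tables to compare decoding methods on an equal footing.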

    +
    + + +
    +
    + +
    +
    +
    +
    + + + + \ No newline at end of file diff --git a/faqs.html b/faqs.html index b2cd5952b..ffe125679 100644 --- a/faqs.html +++ b/faqs.html @@ -59,6 +59,9 @@ + diff --git a/genindex.html b/genindex.html index 3ea6ec482..9ae9d458b 100644 --- a/genindex.html +++ b/genindex.html @@ -51,6 +51,9 @@ + diff --git a/huggingface/index.html b/huggingface/index.html index 0cdd8fa13..570462c18 100644 --- a/huggingface/index.html +++ b/huggingface/index.html @@ -58,6 +58,9 @@
  • Huggingface spaces
  • + + diff --git a/huggingface/pretrained-models.html b/huggingface/pretrained-models.html index 18a696caf..1f1a0ffd5 100644 --- a/huggingface/pretrained-models.html +++ b/huggingface/pretrained-models.html @@ -58,6 +58,9 @@
  • Huggingface spaces
  • + + diff --git a/huggingface/spaces.html b/huggingface/spaces.html index b702ead36..c82af354c 100644 --- a/huggingface/spaces.html +++ b/huggingface/spaces.html @@ -19,6 +19,7 @@ + @@ -60,6 +61,9 @@ + + @@ -144,6 +148,7 @@ the following YouTube channel by +
    diff --git a/index.html b/index.html index 60ed00b9d..93c9b2ca5 100644 --- a/index.html +++ b/index.html @@ -53,6 +53,9 @@ + @@ -148,6 +151,16 @@ speech recognition recipes using + + diff --git a/installation/index.html b/installation/index.html index 01e817409..a7e1bd057 100644 --- a/installation/index.html +++ b/installation/index.html @@ -76,6 +76,9 @@ + diff --git a/model-export/export-model-state-dict.html b/model-export/export-model-state-dict.html index 8e378efdc..0ee2fa000 100644 --- a/model-export/export-model-state-dict.html +++ b/model-export/export-model-state-dict.html @@ -67,6 +67,9 @@ + diff --git a/model-export/export-ncnn-conv-emformer.html b/model-export/export-ncnn-conv-emformer.html index 7784202a7..f0521d694 100644 --- a/model-export/export-ncnn-conv-emformer.html +++ b/model-export/export-ncnn-conv-emformer.html @@ -75,6 +75,9 @@ + diff --git a/model-export/export-ncnn-lstm.html b/model-export/export-ncnn-lstm.html index 5ee447022..4b3392faa 100644 --- a/model-export/export-ncnn-lstm.html +++ b/model-export/export-ncnn-lstm.html @@ -75,6 +75,9 @@ + diff --git a/model-export/export-ncnn-zipformer.html b/model-export/export-ncnn-zipformer.html index 85909488d..45fa51fac 100644 --- a/model-export/export-ncnn-zipformer.html +++ b/model-export/export-ncnn-zipformer.html @@ -74,6 +74,9 @@ + diff --git a/model-export/export-ncnn.html b/model-export/export-ncnn.html index 5990017be..583f82e95 100644 --- a/model-export/export-ncnn.html +++ b/model-export/export-ncnn.html @@ -66,6 +66,9 @@ + diff --git a/model-export/export-onnx.html b/model-export/export-onnx.html index 1bd94d732..b52c539a2 100644 --- a/model-export/export-onnx.html +++ b/model-export/export-onnx.html @@ -68,6 +68,9 @@ + diff --git a/model-export/export-with-torch-jit-script.html b/model-export/export-with-torch-jit-script.html index 6fd69f3e6..5f46f9ee1 100644 --- a/model-export/export-with-torch-jit-script.html +++ b/model-export/export-with-torch-jit-script.html @@ -66,6 +66,9 
@@ + diff --git a/model-export/export-with-torch-jit-trace.html b/model-export/export-with-torch-jit-trace.html index d29ab9121..b85004cfd 100644 --- a/model-export/export-with-torch-jit-trace.html +++ b/model-export/export-with-torch-jit-trace.html @@ -66,6 +66,9 @@ + diff --git a/model-export/index.html b/model-export/index.html index d7fc088e7..6a4894bdb 100644 --- a/model-export/index.html +++ b/model-export/index.html @@ -61,6 +61,9 @@ + diff --git a/objects.inv b/objects.inv index c5bbb969b..eacb8b263 100644 Binary files a/objects.inv and b/objects.inv differ diff --git a/recipes/Non-streaming-ASR/aishell/conformer_ctc.html b/recipes/Non-streaming-ASR/aishell/conformer_ctc.html index cb49f3445..a966f5c64 100644 --- a/recipes/Non-streaming-ASR/aishell/conformer_ctc.html +++ b/recipes/Non-streaming-ASR/aishell/conformer_ctc.html @@ -69,6 +69,9 @@ + diff --git a/recipes/Non-streaming-ASR/aishell/index.html b/recipes/Non-streaming-ASR/aishell/index.html index a631035a7..8301b81d9 100644 --- a/recipes/Non-streaming-ASR/aishell/index.html +++ b/recipes/Non-streaming-ASR/aishell/index.html @@ -69,6 +69,9 @@ + diff --git a/recipes/Non-streaming-ASR/aishell/stateless_transducer.html b/recipes/Non-streaming-ASR/aishell/stateless_transducer.html index 5927c866c..717a5ab0e 100644 --- a/recipes/Non-streaming-ASR/aishell/stateless_transducer.html +++ b/recipes/Non-streaming-ASR/aishell/stateless_transducer.html @@ -69,6 +69,9 @@ + diff --git a/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html b/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html index 57d7d377c..710fce058 100644 --- a/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html +++ b/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.html @@ -69,6 +69,9 @@ + diff --git a/recipes/Non-streaming-ASR/index.html b/recipes/Non-streaming-ASR/index.html index 5095cc140..73a07bc7e 100644 --- a/recipes/Non-streaming-ASR/index.html +++ b/recipes/Non-streaming-ASR/index.html @@ -64,6 +64,9 @@ + diff --git 
a/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html b/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html index 6d9cac4af..ba23bf7df 100644 --- a/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html +++ b/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html @@ -72,6 +72,9 @@ + diff --git a/recipes/Non-streaming-ASR/librispeech/distillation.html b/recipes/Non-streaming-ASR/librispeech/distillation.html index e806c3c84..70dcea8ca 100644 --- a/recipes/Non-streaming-ASR/librispeech/distillation.html +++ b/recipes/Non-streaming-ASR/librispeech/distillation.html @@ -72,6 +72,9 @@ + @@ -103,7 +106,7 @@

    Distillation with HuBERT

    -

    This tutorial shows you how to perform knowledge distillation in `icefall`_ +

    This tutorial shows you how to perform knowledge distillation in icefall with the LibriSpeech dataset. The distillation method used here is called “Multi Vector Quantization Knowledge Distillation” (MVQ-KD). Please have a look at our paper Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation @@ -119,7 +122,7 @@ encounter any problems, please open an issue here

    Note

    We assume you have read the page Installation and have setup -the environment for `icefall`_.

    +the environment for icefall.

    Hint

    @@ -190,7 +193,7 @@ run

    Note

    There are 5 stages in total, the first and second stage will be automatically skipped -when choosing to downloaded codebook indexes prepared by `icefall`_. +when choosing to downloaded codebook indexes prepared by icefall. Of course, you can extract and compute the codebook indexes by yourself. This will require you downloading a HuBERT-XL model and it can take a while for the extraction of codebook indexes.

    @@ -226,10 +229,10 @@ and prepares MVQ-augmented training manifests.

    Please see the following screenshot for the output of an example execution.

    -
    +
    Downloading codebook indexes and preparing training manifest.
    -

    Fig. 6 Downloading codebook indexes and preparing training manifest.

    +

    Fig. 6 Downloading codebook indexes and preparing training manifest.

    @@ -241,10 +244,10 @@ set use_extracted_c num_codebooks by yourself.

    Now, you should see the following files under the directory ./data/vq_fbank_layer36_cb8.

    -
    +
    MVQ-augmented training manifests
    -

    Fig. 7 MVQ-augmented training manifests.

    +

    Fig. 7 MVQ-augmented training manifests.

    Voilà! You are ready to perform knowledge distillation training now!

    diff --git a/recipes/Non-streaming-ASR/librispeech/index.html b/recipes/Non-streaming-ASR/librispeech/index.html index de995da42..5809a5f19 100644 --- a/recipes/Non-streaming-ASR/librispeech/index.html +++ b/recipes/Non-streaming-ASR/librispeech/index.html @@ -72,6 +72,9 @@ + diff --git a/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html b/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html index f0dfe4475..96ba89680 100644 --- a/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html +++ b/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.html @@ -72,6 +72,9 @@ + @@ -381,10 +384,10 @@ $ tensorboard dev

    Note there is a URL in the above output. Click it and you will see the following screenshot:

    -
    +
    TensorBoard screenshot
    -

    Fig. 5 TensorBoard screenshot.

    +

    Fig. 5 TensorBoard screenshot.

    diff --git a/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html b/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html index 3c3420c12..11a988c05 100644 --- a/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html +++ b/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.html @@ -72,6 +72,9 @@ + diff --git a/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html b/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html index b1bfe6c77..66ad3db1d 100644 --- a/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html +++ b/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.html @@ -72,6 +72,9 @@ + diff --git a/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html b/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html index 6dd89e5b8..e46e1a5dc 100644 --- a/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html +++ b/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.html @@ -72,6 +72,9 @@ + diff --git a/recipes/Non-streaming-ASR/timit/index.html b/recipes/Non-streaming-ASR/timit/index.html index 8418740c5..08733f79e 100644 --- a/recipes/Non-streaming-ASR/timit/index.html +++ b/recipes/Non-streaming-ASR/timit/index.html @@ -68,6 +68,9 @@ + diff --git a/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html b/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html index 0970c41af..9581f9367 100644 --- a/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html +++ b/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.html @@ -68,6 +68,9 @@ + diff --git a/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html b/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html index 0afe1777f..b39536ff4 100644 --- a/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html +++ b/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.html @@ -68,6 +68,9 @@ + diff --git a/recipes/Non-streaming-ASR/yesno/index.html b/recipes/Non-streaming-ASR/yesno/index.html index 7704c24b9..742e738ae 100644 --- a/recipes/Non-streaming-ASR/yesno/index.html +++ 
b/recipes/Non-streaming-ASR/yesno/index.html @@ -67,6 +67,9 @@ + diff --git a/recipes/Non-streaming-ASR/yesno/tdnn.html b/recipes/Non-streaming-ASR/yesno/tdnn.html index 2a4849baa..c977d762c 100644 --- a/recipes/Non-streaming-ASR/yesno/tdnn.html +++ b/recipes/Non-streaming-ASR/yesno/tdnn.html @@ -67,6 +67,9 @@ + diff --git a/recipes/Streaming-ASR/index.html b/recipes/Streaming-ASR/index.html index 279eb8f32..b1cdcdbf4 100644 --- a/recipes/Streaming-ASR/index.html +++ b/recipes/Streaming-ASR/index.html @@ -62,6 +62,9 @@ + diff --git a/recipes/Streaming-ASR/introduction.html b/recipes/Streaming-ASR/introduction.html index edde29646..9e43a2d69 100644 --- a/recipes/Streaming-ASR/introduction.html +++ b/recipes/Streaming-ASR/introduction.html @@ -66,6 +66,9 @@ + diff --git a/recipes/Streaming-ASR/librispeech/index.html b/recipes/Streaming-ASR/librispeech/index.html index ba1d9fda4..bda684f82 100644 --- a/recipes/Streaming-ASR/librispeech/index.html +++ b/recipes/Streaming-ASR/librispeech/index.html @@ -67,6 +67,9 @@ + diff --git a/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html b/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html index afd94794d..82e1e392e 100644 --- a/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html +++ b/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html @@ -67,6 +67,9 @@ + @@ -380,10 +383,10 @@ $ tensorboard dev

    Note there is a URL in the above output. Click it and you will see the following screenshot:

    -
    +
    TensorBoard screenshot
    -

    Fig. 10 TensorBoard screenshot.

    +

    Fig. 10 TensorBoard screenshot.

    diff --git a/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html b/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html index 67195d4ab..654e02874 100644 --- a/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html +++ b/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.html @@ -67,6 +67,9 @@ + @@ -400,10 +403,10 @@ $ tensorboard dev

    Note there is a URL in the above output. Click it and you will see the following screenshot:

    -
    +
    TensorBoard screenshot
    -

    Fig. 9 TensorBoard screenshot.

    +

    Fig. 9 TensorBoard screenshot.

    diff --git a/recipes/Streaming-ASR/librispeech/zipformer_transducer.html b/recipes/Streaming-ASR/librispeech/zipformer_transducer.html index 4d266b959..f9e370b8d 100644 --- a/recipes/Streaming-ASR/librispeech/zipformer_transducer.html +++ b/recipes/Streaming-ASR/librispeech/zipformer_transducer.html @@ -67,6 +67,9 @@ + diff --git a/recipes/index.html b/recipes/index.html index 646f5378f..c53fc3c54 100644 --- a/recipes/index.html +++ b/recipes/index.html @@ -58,6 +58,9 @@ + diff --git a/search.html b/search.html index cbe8c2152..bc5fbe20f 100644 --- a/search.html +++ b/search.html @@ -54,6 +54,9 @@ + diff --git a/searchindex.js b/searchindex.js index 090fd4fbf..62a6f9d6d 100644 --- a/searchindex.js +++ b/searchindex.js @@ -1 +1 @@ -Search.setIndex({"docnames": ["contributing/code-style", "contributing/doc", "contributing/how-to-create-a-recipe", "contributing/index", "faqs", "huggingface/index", "huggingface/pretrained-models", "huggingface/spaces", "index", "installation/index", "model-export/export-model-state-dict", "model-export/export-ncnn", "model-export/export-ncnn-conv-emformer", "model-export/export-ncnn-lstm", "model-export/export-ncnn-zipformer", "model-export/export-onnx", "model-export/export-with-torch-jit-script", "model-export/export-with-torch-jit-trace", "model-export/index", "recipes/Non-streaming-ASR/aishell/conformer_ctc", "recipes/Non-streaming-ASR/aishell/index", "recipes/Non-streaming-ASR/aishell/stateless_transducer", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/index", "recipes/Non-streaming-ASR/librispeech/conformer_ctc", "recipes/Non-streaming-ASR/librispeech/distillation", "recipes/Non-streaming-ASR/librispeech/index", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi", "recipes/Non-streaming-ASR/timit/index", 
"recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/yesno/index", "recipes/Non-streaming-ASR/yesno/tdnn", "recipes/Streaming-ASR/index", "recipes/Streaming-ASR/introduction", "recipes/Streaming-ASR/librispeech/index", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Streaming-ASR/librispeech/zipformer_transducer", "recipes/index"], "filenames": ["contributing/code-style.rst", "contributing/doc.rst", "contributing/how-to-create-a-recipe.rst", "contributing/index.rst", "faqs.rst", "huggingface/index.rst", "huggingface/pretrained-models.rst", "huggingface/spaces.rst", "index.rst", "installation/index.rst", "model-export/export-model-state-dict.rst", "model-export/export-ncnn.rst", "model-export/export-ncnn-conv-emformer.rst", "model-export/export-ncnn-lstm.rst", "model-export/export-ncnn-zipformer.rst", "model-export/export-onnx.rst", "model-export/export-with-torch-jit-script.rst", "model-export/export-with-torch-jit-trace.rst", "model-export/index.rst", "recipes/Non-streaming-ASR/aishell/conformer_ctc.rst", "recipes/Non-streaming-ASR/aishell/index.rst", "recipes/Non-streaming-ASR/aishell/stateless_transducer.rst", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/index.rst", "recipes/Non-streaming-ASR/librispeech/conformer_ctc.rst", "recipes/Non-streaming-ASR/librispeech/distillation.rst", "recipes/Non-streaming-ASR/librispeech/index.rst", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi.rst", "recipes/Non-streaming-ASR/timit/index.rst", "recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.rst", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.rst", 
"recipes/Non-streaming-ASR/yesno/index.rst", "recipes/Non-streaming-ASR/yesno/tdnn.rst", "recipes/Streaming-ASR/index.rst", "recipes/Streaming-ASR/introduction.rst", "recipes/Streaming-ASR/librispeech/index.rst", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.rst", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst", "recipes/Streaming-ASR/librispeech/zipformer_transducer.rst", "recipes/index.rst"], "titles": ["Follow the code style", "Contributing to Documentation", "How to create a recipe", "Contributing", "Frequently Asked Questions (FAQs)", "Huggingface", "Pre-trained models", "Huggingface spaces", "Icefall", "Installation", "Export model.state_dict()", "Export to ncnn", "Export ConvEmformer transducer models to ncnn", "Export LSTM transducer models to ncnn", "Export streaming Zipformer transducer models to ncnn", "Export to ONNX", "Export model with torch.jit.script()", "Export model with torch.jit.trace()", "Model export", "Conformer CTC", "aishell", "Stateless Transducer", "TDNN-LSTM CTC", "Non Streaming ASR", "Conformer CTC", "Distillation with HuBERT", "LibriSpeech", "Pruned transducer statelessX", "TDNN-LSTM-CTC", "Zipformer CTC Blank Skip", "Zipformer MMI", "TIMIT", "TDNN-LiGRU-CTC", "TDNN-LSTM-CTC", "YesNo", "TDNN-CTC", "Streaming ASR", "Introduction", "LibriSpeech", "LSTM Transducer", "Pruned transducer statelessX", "Zipformer Transducer", "Recipes"], "terms": {"we": [0, 1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "us": [0, 1, 2, 4, 5, 7, 8, 9, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 24, 25, 28, 32, 33, 35, 37], "tool": [0, 4, 12], "make": [0, 1, 3, 12, 13, 14, 19, 21, 24, 37], "consist": [0, 21, 27, 39, 40, 41], "possibl": [0, 2, 3, 9, 19, 24], "black": 0, "format": [0, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "flake8": 0, "check": [0, 24], "qualiti": [0, 20], "isort": 0, "sort": [0, 9], "import": [0, 
4, 12, 40, 41], "The": [0, 1, 2, 4, 7, 9, 10, 12, 13, 14, 19, 20, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "version": [0, 8, 9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 32, 33, 40], "abov": [0, 4, 9, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40, 41], "ar": [0, 1, 3, 4, 9, 10, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41, 42], "22": [0, 9, 12, 13, 24, 32, 33, 35], "3": [0, 4, 8, 10, 11, 15, 18, 22, 25, 27, 28, 29, 30, 35, 39, 40, 41], "0": [0, 1, 8, 10, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "5": [0, 11, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "4": [0, 4, 8, 10, 11, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "10": [0, 8, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "1": [0, 8, 10, 11, 15, 16, 17, 18, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "after": [0, 1, 7, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "run": [0, 2, 4, 7, 9, 12, 13, 14, 15, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "command": [0, 1, 4, 9, 10, 12, 13, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "git": [0, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "clone": [0, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "http": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "github": [0, 2, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "com": [0, 2, 6, 7, 9, 10, 12, 13, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "k2": [0, 2, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 39, 40, 41], "fsa": [0, 2, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 24, 27, 29, 30, 39, 40, 41], "icefal": [0, 2, 3, 4, 6, 7, 10, 11, 15, 16, 17, 18, 19, 21, 22, 24, 
25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "cd": [0, 1, 2, 4, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "pip": [0, 1, 4, 9, 12, 15, 21], "instal": [0, 1, 4, 5, 7, 8, 10, 11, 15, 18, 25, 27, 29, 30, 35, 39, 40, 41], "pre": [0, 3, 5, 7, 8, 9, 11, 18, 25], "commit": 0, "whenev": 0, "you": [0, 1, 2, 4, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "automat": [0, 7, 25], "hook": 0, "invok": 0, "fail": [0, 9], "If": [0, 2, 4, 7, 12, 13, 14, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "ani": [0, 9, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40], "your": [0, 1, 2, 5, 7, 8, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "wa": [0, 9, 10, 24, 28], "success": [0, 9, 12, 13], "pleas": [0, 1, 2, 4, 7, 9, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "fix": [0, 4, 9, 12, 13, 14, 24], "issu": [0, 4, 9, 12, 13, 24, 25, 40, 41], "report": [0, 4, 9, 25], "some": [0, 1, 10, 12, 13, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "i": [0, 1, 2, 4, 7, 9, 10, 11, 12, 13, 14, 15, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "e": [0, 2, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "modifi": [0, 11, 18, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "file": [0, 2, 7, 8, 10, 12, 13, 14, 16, 17, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "place": [0, 9, 10, 21, 24, 28], "so": [0, 7, 8, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "statu": 0, "failur": 0, "see": [0, 1, 7, 9, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "which": [0, 2, 7, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 40, 41], "ha": [0, 2, 8, 11, 12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 37, 39, 40, 
41], "been": [0, 11, 12, 13, 14, 21], "befor": [0, 1, 10, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "further": 0, "chang": [0, 4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "all": [0, 6, 7, 10, 12, 13, 14, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "again": [0, 12, 13, 35], "should": [0, 2, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "succe": 0, "thi": [0, 2, 3, 4, 5, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "time": [0, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "succeed": 0, "want": [0, 9, 10, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "can": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "do": [0, 2, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "Or": 0, "without": [0, 5, 7, 19, 24], "your_changed_fil": 0, "py": [0, 2, 4, 9, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "sphinx": 1, "write": [1, 2, 3], "have": [1, 2, 6, 7, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "prepar": [1, 3, 10], "environ": [1, 4, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 32, 33, 35, 40, 41], "doc": [1, 10, 37], "r": [1, 9, 12, 13, 14, 32, 33], "requir": [1, 9, 14, 25, 40, 41], "txt": [1, 9, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "set": [1, 4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "up": [1, 9, 10, 12, 13, 14, 19, 22, 24, 25, 27, 28, 29, 30, 40, 41], "readi": [1, 19, 24, 25], "refer": [1, 2, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 22, 24, 27, 28, 29, 32, 33, 35, 37, 40, 41], "restructuredtext": 1, "primer": 1, "familiar": 1, "build": [1, 9, 10, 12, 13, 14, 19, 21, 24], "local": [1, 9, 27, 29, 30, 39, 40, 41], "preview": 1, "what": [1, 2, 
9, 12, 13, 14, 21, 37], "look": [1, 2, 6, 9, 12, 13, 14, 19, 21, 22, 24, 25], "like": [1, 2, 7, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40], "publish": [1, 10, 20], "html": [1, 2, 4, 9, 11, 12, 13, 14, 15, 16, 17, 27, 39, 40, 41], "gener": [1, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "view": [1, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40, 41], "follow": [1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "python3": [1, 4, 9, 13, 14], "m": [1, 9, 12, 13, 14, 21, 27, 29, 30, 32, 33, 39, 40, 41], "server": [1, 7, 9, 39], "It": [1, 2, 5, 9, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "print": [1, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "serv": [1, 27, 29, 30, 39, 40, 41], "port": [1, 25, 27, 29, 30, 39, 40, 41], "8000": [1, 35], "open": [1, 8, 10, 12, 13, 14, 20, 21, 24, 25], "browser": [1, 5, 7, 27, 29, 30, 39, 40, 41], "go": [1, 9, 19, 21, 24, 27, 29, 30, 39, 40, 41], "read": [2, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "code": [2, 3, 4, 8, 12, 13, 14, 19, 24, 25, 27, 28, 32, 33, 35, 37, 40, 41], "style": [2, 3, 8], "adjust": 2, "sytl": 2, "design": 2, "python": [2, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 24, 27, 29, 30, 39, 40, 41], "recommend": [2, 9, 19, 21, 22, 24, 25, 27, 40, 41], "test": [2, 8, 10, 11, 18, 19, 21, 22, 24, 25, 28, 29, 32, 33], "valid": [2, 9, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "dataset": [2, 4, 9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "lhots": [2, 8, 10, 12, 13, 14, 19, 21, 24], "readthedoc": [2, 9], "io": [2, 9, 11, 12, 13, 14, 15, 16, 17, 27, 39, 40, 41], "en": [2, 9, 12], "latest": [2, 7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "index": [2, 9, 11, 12, 13, 14, 15, 16, 17, 39, 40, 41], "yesno": [2, 4, 8, 9, 23, 35, 42], "veri": [2, 3, 12, 13, 14, 21, 32, 33, 35, 40, 
41], "good": 2, "exampl": [2, 7, 8, 10, 12, 13, 14, 16, 17, 18, 25, 28, 32, 33, 35], "speech": [2, 7, 8, 9, 11, 20, 21, 35, 42], "pull": [2, 12, 13, 14, 15, 19, 21, 24, 37], "380": [2, 12, 33], "show": [2, 7, 9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "add": [2, 12, 13, 14, 19, 21, 22, 40, 42], "new": [2, 3, 7, 9, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 35, 39, 40, 41], "suppos": [2, 40, 41], "would": [2, 9, 10, 12, 13, 14, 24, 28, 40, 41], "name": [2, 4, 10, 12, 13, 14, 15, 19, 21, 27, 29, 30, 40, 41], "foo": [2, 17, 19, 24, 27, 29, 30, 39, 40, 41], "eg": [2, 4, 6, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "mkdir": [2, 12, 13, 19, 21, 22, 24, 28, 32, 33, 35], "p": [2, 9, 12, 13, 21, 32, 33], "asr": [2, 4, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "touch": 2, "sh": [2, 9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "chmod": 2, "x": [2, 14, 37], "simpl": [2, 21], "own": [2, 25, 27, 40, 41], "otherwis": [2, 12, 13, 14, 19, 21, 24, 25, 27, 29, 30, 39, 40, 41], "librispeech": [2, 4, 6, 8, 10, 12, 13, 14, 15, 16, 17, 23, 24, 25, 27, 28, 29, 30, 36, 37, 39, 40, 41, 42], "assum": [2, 9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 39, 40, 41], "fanci": 2, "call": [2, 4, 15, 25], "bar": [2, 17, 19, 24, 27, 29, 30, 39, 40, 41], "organ": 2, "wai": [2, 3, 18, 27, 29, 30, 37, 39, 40, 41], "readm": [2, 19, 21, 22, 24, 28, 32, 33, 35], "md": [2, 6, 10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "asr_datamodul": [2, 4, 9], "pretrain": [2, 10, 12, 13, 14, 15, 17, 19, 21, 22, 24, 28, 32, 33, 35], "For": [2, 4, 6, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "instanc": [2, 4, 6, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "tdnn": [2, 4, 9, 20, 23, 26, 31, 34], "its": [2, 10, 11, 12, 13, 14, 17, 21, 29], 
"directori": [2, 8, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "structur": [2, 14], "descript": [2, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "contain": [2, 8, 10, 11, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41, 42], "inform": [2, 10, 19, 21, 22, 24, 27, 28, 29, 32, 33, 35, 37, 39, 40, 41], "g": [2, 9, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "wer": [2, 9, 10, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "etc": [2, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "provid": [2, 7, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41, 42], "pytorch": [2, 4, 8, 12, 13, 14, 21], "dataload": [2, 9], "take": [2, 10, 25, 27, 35, 40, 41], "input": [2, 10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35, 37], "checkpoint": [2, 9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "save": [2, 9, 10, 13, 14, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "dure": [2, 4, 7, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "stage": [2, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "": [2, 9, 10, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "definit": [2, 12, 13], "neural": [2, 19, 24], "network": [2, 19, 21, 24, 27, 29, 30, 39, 40, 41], "script": [2, 8, 9, 17, 18, 19, 21, 22, 24, 25, 28, 32, 33, 35, 39], "infer": [2, 10, 12, 13], "tdnn_lstm_ctc": [2, 22, 28, 33], "conformer_ctc": [2, 19, 24], "get": [2, 7, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 35, 37, 39, 40, 41], "feel": [2, 25, 39], "result": [2, 6, 7, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "everi": [2, 10, 27, 29, 30, 39, 40, 41], "kept": [2, 27, 40, 41], "self": [2, 11, 14, 37], "toler": 2, "duplic": 2, "among": [2, 9], "differ": [2, 9, 12, 13, 14, 15, 19, 20, 24, 25, 27, 37, 39, 40, 41], "invoc": [2, 12, 13], "help": [2, 19, 21, 
22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "blob": [2, 6, 10, 17, 27, 29, 30, 39, 40, 41], "master": [2, 6, 9, 10, 13, 14, 16, 17, 21, 25, 27, 29, 30, 39, 40, 41], "transform": [2, 19, 24, 39], "conform": [2, 16, 20, 21, 23, 26, 27, 29, 39, 40, 41], "base": [2, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "lstm": [2, 11, 17, 18, 20, 23, 26, 31, 36, 38], "attent": [2, 14, 21, 22, 25, 37, 40, 41], "lm": [2, 9, 21, 27, 28, 32, 33, 35, 40, 41], "rescor": [2, 22, 28, 30, 32, 33, 35], "demonstr": [2, 5, 7, 10, 15], "consid": [2, 14], "colab": 2, "notebook": 2, "welcom": 3, "There": [3, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "mani": [3, 40, 41], "two": [3, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "them": [3, 5, 6, 7, 9, 12, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "To": [3, 7, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "document": [3, 8, 10, 11, 12, 13, 14, 15, 30], "repositori": [3, 12, 13, 14, 15], "recip": [3, 6, 8, 9, 10, 15, 19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 37, 39, 40, 41], "In": [3, 4, 7, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 25, 28, 32, 33, 35, 37], "page": [3, 7, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "describ": [3, 5, 10, 12, 13, 15, 16, 17, 18, 19, 21, 22, 24, 27, 28, 32, 33, 40, 41], "how": [3, 5, 7, 8, 9, 12, 13, 14, 15, 18, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "creat": [3, 8, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40], "data": [3, 10, 12, 13, 14, 15, 16, 17, 20], "train": [3, 4, 5, 7, 8, 10, 11, 16, 17, 18, 37], "decod": [3, 4, 7, 12, 13, 14, 17, 18], "model": [3, 5, 7, 8, 9, 11, 25, 37], "section": [4, 5, 9, 10, 15, 16, 17, 18, 19, 24], "collect": [4, 9], "user": [4, 9], "post": 4, "correspond": [4, 6, 7], "solut": 4, "One": 4, "torch": [4, 8, 9, 10, 11, 18, 19, 21, 24], "torchaudio": [4, 8, 37], "cu111": 4, "torchvis": 4, "11": [4, 9, 12, 
13, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "f": [4, 9, 32, 33], "download": [4, 7, 8, 11, 18, 20, 25], "org": [4, 9, 20, 21, 27, 39, 40, 41], "whl": [4, 9], "torch_stabl": [4, 9], "throw": [4, 12, 13, 14], "error": [4, 9, 12, 13, 14, 24], "when": [4, 7, 12, 13, 14, 18, 21, 24, 25, 27, 29, 30, 40, 41], "specifi": [4, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "cuda": [4, 8, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 39, 40, 41], "while": [4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "That": [4, 12, 13, 25, 27, 39, 40, 41], "cu11": 4, "therefor": 4, "correct": 4, "log": [4, 9, 12, 13, 14, 28, 32, 33, 35], "traceback": 4, "most": [4, 40, 41], "recent": [4, 12, 13, 14], "last": 4, "line": [4, 9, 12, 13, 14, 27, 40, 41], "14": [4, 9, 10, 12, 13, 16, 19, 24, 27, 28, 29, 32, 39, 40, 41], "from": [4, 5, 7, 9, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "yesnoasrdatamodul": 4, "home": [4, 12, 13, 19, 24], "xxx": [4, 10, 12, 13, 14], "next": [4, 7, 9, 12, 13, 14, 24, 25, 27, 28, 29, 30, 39, 40, 41], "gen": [4, 7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "kaldi": [4, 7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "34": [4, 9, 12, 13], "datamodul": 4, "__init__": [4, 9, 10, 12, 13, 14, 19, 21, 24], "23": [4, 9, 12, 13, 14, 19, 21, 22, 24, 32, 33, 35], "util": [4, 9, 24], "add_eo": 4, "add_so": 4, "get_text": 4, "39": [4, 9, 12, 14, 21, 24, 28, 32], "tensorboard": [4, 9, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "summarywrit": 4, "miniconda3": 4, "env": 4, "yyi": 4, "lib": [4, 9, 14], "8": [4, 9, 10, 12, 13, 14, 19, 21, 24, 25, 27, 28, 29, 30, 35, 39, 40, 41], "site": [4, 9, 14], "packag": [4, 9, 14], "loosevers": 4, "uninstal": 4, "setuptool": [4, 9], "58": [4, 9, 24], "conda": [4, 9], "encount": [4, 9, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "dev": [4, 9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], 
"yangyifan": 4, "anaconda3": 4, "dev20230112": 4, "cuda11": [4, 9], "6": [4, 9, 11, 18, 19, 21, 24, 27, 28, 32, 33, 39], "torch1": [4, 9], "13": [4, 9, 10, 12, 13, 14, 21, 22, 24, 28, 29, 32], "py3": [4, 9], "linux": [4, 7, 9, 11, 12, 13, 14, 15], "x86_64": [4, 9, 12], "egg": [4, 9], "24": [4, 9, 12, 13, 22, 28, 32, 33, 35], "_k2": [4, 9], "determinizeweightpushingtyp": 4, "handl": [4, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "except": [4, 10], "anoth": 4, "occur": 4, "pruned_transducer_stateless7_ctc_b": [4, 29], "104": 4, "30": [4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "rais": 4, "note": [4, 10, 12, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "re": [4, 19, 22, 24, 25, 27, 29, 30, 37, 39, 40, 41], "anaconda": 4, "maco": [4, 7, 11, 12, 13, 14, 15], "probabl": [4, 9, 21, 27, 29, 39, 40, 41], "variabl": [4, 9, 12, 13, 14, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "export": [4, 8, 9, 19, 21, 22, 24, 25, 28, 32, 33, 35], "dyld_library_path": 4, "conda_prefix": 4, "first": [4, 9, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "try": [4, 5, 7, 25, 27, 29, 30, 39, 40, 41], "find": [4, 5, 6, 7, 9, 10, 12, 13, 14, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "where": [4, 40], "locat": [4, 12], "libpython": 4, "abl": 4, "insid": [4, 17], "codna_prefix": 4, "ld_library_path": 4, "also": [5, 6, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40, 41], "within": [5, 7, 12, 13], "anyth": [5, 7], "space": [5, 8], "youtub": [5, 8, 24, 25, 27, 28, 29, 30, 39, 40, 41], "video": [5, 8, 24, 25, 27, 28, 29, 30, 39, 40, 41], "upload": [6, 7, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "huggingfac": [6, 8, 10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 29, 30, 32, 33, 35, 39], "co": [6, 7, 10, 12, 13, 14, 15, 19, 20, 21, 22, 24, 28, 29, 30, 32, 33, 35, 39], "visit": [6, 7, 27, 29, 30, 39, 40, 41], "link": [6, 9, 10, 11, 27, 29, 30, 39, 40, 41], "search": [6, 7], 
"specif": [6, 15, 21], "aishel": [6, 8, 19, 21, 22, 23, 42], "gigaspeech": [6, 16, 39], "wenetspeech": [6, 16], "integr": 7, "framework": [7, 27, 40], "sherpa": [7, 11, 16, 17, 18, 39], "need": [7, 9, 10, 11, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "window": [7, 11, 12, 13, 14, 15], "even": [7, 9, 13], "ipad": 7, "phone": 7, "start": [7, 9, 10, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "address": [7, 9, 10, 12, 13, 14, 21, 27, 30, 39, 40, 41], "recognit": [7, 8, 11, 12, 13, 20, 21, 35, 42], "screenshot": [7, 19, 21, 22, 24, 25, 27, 35, 39, 40], "select": [7, 12, 13, 14, 27, 28, 32, 33, 35, 39, 40, 41], "languag": [7, 19, 21, 22], "current": [7, 12, 13, 21, 25, 37, 39, 40, 41, 42], "chines": [7, 20, 21], "english": [7, 35, 39], "target": 7, "method": [7, 9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 39, 40, 41], "greedi": 7, "modified_beam_search": [7, 21, 25, 27, 29, 39, 40, 41], "choos": [7, 9, 25, 27, 29, 30, 39, 40, 41], "number": [7, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "activ": 7, "path": [7, 9, 10, 12, 13, 14, 17, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "either": [7, 19, 21, 22, 24, 40, 41], "record": [7, 13, 14, 19, 20, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "click": [7, 9, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "button": 7, "submit": 7, "wait": 7, "moment": 7, "an": [7, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20, 21, 24, 25, 27, 30, 35, 39, 40, 41], "bottom": [7, 27, 29, 30, 39, 40, 41], "part": [7, 9, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "tabl": [7, 12, 13, 14], "one": [7, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "subscrib": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "channel": [7, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "nadira": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "povei": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "www": [7, 9, 20, 24, 25, 27, 28, 
29, 30, 39, 40, 41], "uc_vaumpkminz1pnkfxan9mw": [7, 9, 24, 25, 27, 28, 29, 30, 39, 40, 41], "toolkit": 8, "cudnn": 8, "2": [8, 10, 11, 18, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "frequent": 8, "ask": 8, "question": 8, "faq": 8, "oserror": 8, "libtorch_hip": 8, "cannot": [8, 12, 13, 14], "share": [8, 9], "object": [8, 9, 19, 21, 22, 27, 35, 39, 40], "attributeerror": 8, "modul": [8, 9, 12, 14, 29, 40], "distutil": 8, "attribut": [8, 14, 24], "importerror": 8, "libpython3": 8, "No": [8, 12, 13, 14, 35], "state_dict": [8, 18, 19, 21, 22, 24, 28, 32, 33, 35], "jit": [8, 11, 18, 24], "trace": [8, 11, 16, 18], "onnx": [8, 10, 18], "ncnn": [8, 18], "non": [8, 24, 37, 40, 42], "stream": [8, 11, 12, 13, 15, 18, 19, 24, 32, 33, 39, 42], "timit": [8, 23, 32, 33, 42], "introduct": [8, 36, 42], "contribut": 8, "depend": [9, 19, 24], "step": [9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "99": [9, 12, 13, 14, 15], "who": 9, "about": [9, 12, 13, 14, 21, 25, 27, 30, 39, 40, 41], "suggest": [9, 27, 29, 30, 39, 40, 41], "virut": 9, "venv": 9, "my_env": 9, "sourc": [9, 10, 12, 13, 14, 19, 20, 21, 24], "bin": [9, 12, 13, 14, 19, 24], "order": [9, 12, 13, 14, 19, 22, 24, 28, 32, 33], "matter": [9, 12], "compil": [9, 12, 13, 19, 21, 24], "wheel": [9, 12], "same": [9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "don": [9, 12, 13, 14, 16, 19, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "t": [9, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "from_sourc": 9, "for_develop": 9, "alwai": [9, 10], "strongli": 9, "pythonpath": [9, 12, 13, 14], "point": [9, 10, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "folder": [9, 10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "tmp": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "setup": [9, 12, 19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 40, 41], "put": [9, 12, 13, 29, 40], "sever": [9, 10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 
32, 33, 35, 37, 39, 40, 41], "switch": [9, 19, 24, 30], "just": [9, 12, 13, 14, 37], "virtualenv": 9, "cpython3": 9, "final": [9, 10, 12, 13, 24, 28], "64": [9, 10, 12, 21, 40], "1540m": 9, "creator": 9, "cpython3posix": 9, "dest": 9, "ceph": [9, 10, 19, 21, 24], "fj": [9, 10, 12, 13, 14, 21, 24], "fangjun": [9, 10, 12, 13, 14, 21, 24], "clear": 9, "fals": [9, 10, 12, 13, 14, 19, 21, 24, 25], "no_vcs_ignor": 9, "global": 9, "seeder": 9, "fromappdata": 9, "bundl": 9, "via": [9, 11, 16, 17, 18], "copi": [9, 37], "app_data_dir": 9, "root": [9, 12, 13, 14], "v": [9, 12, 13, 14, 24, 32, 33], "irtualenv": 9, "ad": [9, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 37, 39, 40, 41], "seed": 9, "21": [9, 10, 12, 19, 21, 24, 32, 33], "57": [9, 13, 24, 28], "36": [9, 12, 21, 24, 25], "bashactiv": 9, "cshellactiv": 9, "fishactiv": 9, "powershellactiv": 9, "pythonactiv": 9, "xonshactiv": 9, "dev20210822": 9, "cpu": [9, 10, 12, 13, 14, 16, 19, 27, 29, 30, 35, 40, 41], "9": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 35, 39, 40, 41], "nightli": 9, "2bcpu": 9, "cp38": 9, "linux_x86_64": 9, "mb": [9, 12, 13, 14], "________________________________": 9, "185": [9, 19, 24, 35], "kb": [9, 12, 13, 14, 32, 33], "graphviz": 9, "17": [9, 10, 12, 13, 14, 19, 24, 32, 33, 39], "none": [9, 19, 24], "18": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 32, 33, 39, 40, 41], "cach": [9, 14], "manylinux1_x86_64": 9, "831": [9, 21, 33], "type": [9, 10, 12, 13, 14, 19, 21, 24, 27, 29, 30, 35, 37, 39, 40, 41], "extens": 9, "typing_extens": 9, "26": [9, 12, 13, 14, 21, 24, 33], "successfulli": [9, 12, 13, 14], "req": 9, "7b1b76ge": 9, "q": 9, "audioread": 9, "soundfil": 9, "post1": 9, "py2": 9, "7": [9, 10, 11, 14, 18, 19, 22, 24, 27, 28, 32, 33, 39, 40], "97": [9, 12, 19], "cytoolz": 9, "manylinux_2_17_x86_64": 9, "manylinux2014_x86_64": 9, "dataclass": 9, "h5py": 9, "manylinux_2_12_x86_64": 9, "manylinux2010_x86_64": 9, "684": [9, 19, 35], "intervaltre": 9, "lilcom": 9, "numpi": 9, "15": [9, 10, 12, 
13, 14, 21, 22, 24, 32, 35], "40": [9, 12, 13, 14, 22, 24, 28, 32, 33], "pyyaml": 9, "662": 9, "tqdm": 9, "62": [9, 24, 28], "76": [9, 35], "73": 9, "alreadi": [9, 10], "satisfi": 9, "2a1410b": 9, "clean": [9, 14, 19, 21, 24, 25, 27, 28, 29, 30, 39, 40, 41], "toolz": 9, "55": [9, 12, 22, 24, 32], "sortedcontain": 9, "29": [9, 14, 15, 19, 21, 22, 24, 28, 29, 32, 33], "cffi": 9, "411": [9, 14, 24], "pycpars": 9, "20": [9, 10, 12, 14, 19, 21, 22, 24, 27, 28, 32, 33, 35, 40], "112": [9, 12, 13, 14], "pypars": 9, "67": 9, "done": [9, 10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "filenam": [9, 12, 13, 14, 15, 16, 17, 29, 30, 39, 41], "dev_2a1410b_clean": 9, "size": [9, 10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "342242": 9, "sha256": 9, "f683444afa4dc0881133206b4646a": 9, "9d0f774224cc84000f55d0a67f6e4a37997": 9, "store": [9, 24], "ephem": 9, "ftu0qysz": 9, "7f": 9, "7a": 9, "8e": 9, "a0bf241336e2e3cb573e1e21e5600952d49f5162454f2e612f": 9, "warn": 9, "built": 9, "invalid": [9, 24], "metadata": [9, 32, 33], "mandat": 9, "pep": 9, "440": 9, "packa": 9, "ging": 9, "deprec": [9, 21], "legaci": 9, "becaus": 9, "could": [9, 12, 13, 14, 19, 22], "A": [9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 39, 40, 41], "replac": [9, 12, 13], "discuss": 9, "regard": 9, "pypa": 9, "sue": 9, "8368": 9, "inter": 9, "valtre": 9, "sor": 9, "tedcontain": 9, "remot": 9, "enumer": 9, "500": [9, 10, 12, 13, 14, 21, 24, 30, 39], "count": 9, "100": [9, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "compress": 9, "308": [9, 19, 21, 22], "total": [9, 13, 14, 19, 21, 22, 24, 25, 27, 28, 35, 39, 40], "delta": 9, "263": [9, 13], "reus": 9, "307": 9, "102": [9, 14, 19], "pack": [9, 40, 41], "receiv": 9, "172": 9, "49": [9, 12, 13, 24, 33, 35], "kib": 9, "385": 9, "00": [9, 12, 19, 21, 22, 24, 28, 32, 33, 35], "resolv": 9, "kaldilm": 9, "tar": 9, "gz": 9, "48": [9, 12, 13, 19, 21], "574": 9, "kaldialign": 9, "sentencepiec": [9, 24], "96": 9, "41": [9, 
12, 14, 19, 21, 32, 35], "absl": 9, "absl_pi": 9, "132": 9, "googl": [9, 27, 29, 30, 39, 40, 41], "auth": 9, "oauthlib": 9, "google_auth_oauthlib": 9, "grpcio": 9, "ment": 9, "12": [9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 27, 29, 30, 32, 35, 39, 40, 41], "requi": 9, "rement": 9, "protobuf": 9, "manylinux_2_5_x86_64": 9, "werkzeug": 9, "288": 9, "tensorboard_data_serv": 9, "google_auth": 9, "35": [9, 10, 12, 13, 14, 21, 24, 39], "152": 9, "request": [9, 37], "plugin": 9, "wit": 9, "tensorboard_plugin_wit": 9, "781": 9, "markdown": 9, "six": 9, "16": [9, 10, 12, 13, 14, 17, 19, 21, 22, 24, 27, 28, 32, 33, 35, 39, 40, 41], "cachetool": 9, "rsa": 9, "pyasn1": 9, "pyasn1_modul": 9, "155": 9, "requests_oauthlib": 9, "77": [9, 24], "urllib3": 9, "27": [9, 12, 13, 14, 19, 21, 28, 33], "138": [9, 19, 21], "certifi": 9, "2017": 9, "2021": [9, 19, 22, 24, 28, 32, 33, 35], "145": 9, "charset": 9, "normal": [9, 28, 32, 33, 35, 40], "charset_norm": 9, "idna": 9, "59": [9, 12, 22, 24], "146": 9, "897233": 9, "eccb906cafcd45bf9a7e1a1718e4534254bfb": 9, "f4c0d0cbc66eee6c88d68a63862": 9, "85": 9, "7d": 9, "63": [9, 21], "f2dd586369b8797cb36d213bf3a84a789eeb92db93d2e723c9": 9, "etool": 9, "oaut": 9, "hlib": 9, "let": [9, 12, 13, 14, 19, 24], "u": [9, 12, 13, 14, 19, 21, 22, 24, 25, 35], "2023": [9, 12, 13, 14, 29], "05": [9, 10, 12, 13, 19, 21, 22, 24, 33], "main": [9, 19, 24, 37], "dl_dir": [9, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "waves_yesno": 9, "_______________________________________________________________": 9, "70m": 9, "06": [9, 10, 12, 22, 24, 28, 35], "54": [9, 13, 14, 24, 28, 32, 33], "4kb": 9, "02": [9, 10, 12, 13, 14, 21, 24, 27, 33, 39, 40], "19": [9, 10, 12, 13, 14, 19, 24, 28, 32, 33], "manifest": [9, 25], "45": [9, 12, 14, 19, 21, 24], "comput": [9, 10, 12, 13, 14, 19, 21, 22, 25, 27, 28, 30, 32, 33, 35, 39, 40, 41], "fbank": [9, 10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35], "199": [9, 24, 28], "info": [9, 10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35], 
"compute_fbank_yesno": 9, "65": [9, 12], "process": [9, 10, 12, 13, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "extract": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "featur": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "90": [9, 12], "212": 9, "60it": 9, "640": [9, 14], "304": [9, 13], "53it": 9, "51": [9, 12, 19, 24, 35], "lang": [9, 10, 21, 24, 30], "66": [9, 13], "project": 9, "csrc": [9, 24], "arpa_file_pars": 9, "cc": 9, "void": 9, "arpafilepars": 9, "std": 9, "istream": 9, "79": 9, "140": [9, 22], "gram": [9, 19, 21, 22, 27, 28, 30, 32, 33, 40, 41], "92": [9, 24], "hlg": [9, 28, 32, 33, 35], "28": [9, 12, 13, 21, 24, 28], "581": [9, 12, 28], "compile_hlg": 9, "124": [9, 19, 24], "lang_phon": [9, 22, 28, 32, 33, 35], "582": 9, "lexicon": [9, 19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "171": [9, 22, 24, 32, 33], "convert": [9, 12, 13, 14, 24], "l": [9, 12, 13, 14, 21, 32, 33, 35], "pt": [9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "linv": [9, 21, 24, 35], "609": 9, "ctc_topo": 9, "max_token_id": 9, "610": 9, "52": [9, 19, 24], "load": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "fst": [9, 21, 35], "611": 9, "intersect": [9, 27, 40, 41], "613": 9, "lg": [9, 27, 30, 40, 41], "shape": [9, 14], "connect": [9, 10, 24, 27, 28, 39, 40, 41], "614": 9, "68": [9, 24], "70": 9, "class": [9, 24], "tensor": [9, 13, 14, 19, 21, 22, 24, 27, 35, 39, 40], "71": [9, 24, 28], "determin": 9, "615": 9, "74": [9, 10], "rag": 9, "raggedtensor": 9, "remov": [9, 19, 21, 22, 24, 28, 32, 33], "disambigu": 9, "symbol": [9, 21, 27, 40, 41], "616": 9, "91": 9, "remove_epsilon": 9, "617": 9, "arc": 9, "compos": 9, "h": 9, "619": 9, "106": [9, 13, 24], "109": [9, 19, 24], "111": [9, 24], "127": [9, 12, 13, 35], "now": [9, 12, 13, 14, 19, 24, 25, 27, 28, 29, 30, 32, 33, 39, 40, 41], "cuda_visible_devic": [9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], 
"gpu": [9, 12, 13, 19, 21, 22, 24, 25, 27, 29, 30, 32, 33, 35, 39, 40, 41], "avail": [9, 10, 12, 13, 14, 19, 21, 24, 28, 32, 33, 35, 39], "case": [9, 10, 12, 13, 14, 27, 29, 30, 39, 40, 41], "segment": 9, "fault": 9, "core": 9, "dump": 9, "protocol_buffers_python_implement": 9, "more": [9, 12, 13, 14, 19, 24, 25, 35, 37, 39, 40], "674": 9, "interest": [9, 25, 27, 29, 30, 39, 40, 41], "given": [9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 40, 41], "below": [9, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40], "04": [9, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33], "759": [9, 21], "481": 9, "482": 9, "exp_dir": [9, 12, 13, 14, 21, 24, 25, 27, 29, 30, 40, 41], "posixpath": [9, 12, 13, 14, 21, 24], "exp": [9, 10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "lang_dir": [9, 21, 24], "lr": [9, 21, 39], "01": [9, 12, 21, 22, 24, 25, 29], "feature_dim": [9, 10, 12, 13, 14, 19, 21, 24, 35], "weight_decai": 9, "1e": 9, "start_epoch": 9, "best_train_loss": [9, 10, 12, 13, 14], "inf": [9, 10, 12, 13, 14], "best_valid_loss": [9, 10, 12, 13, 14], "best_train_epoch": [9, 10, 12, 13, 14], "best_valid_epoch": [9, 10, 13, 14], "batch_idx_train": [9, 10, 12, 13, 14], "log_interv": [9, 10, 12, 13, 14], "reset_interv": [9, 10, 12, 13, 14], "valid_interv": [9, 10, 12, 13, 14], "beam_siz": [9, 10, 21], "reduct": [9, 12, 13, 29], "sum": 9, "use_double_scor": [9, 19, 24, 35], "true": [9, 10, 12, 13, 14, 19, 21, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "world_siz": [9, 25], "master_port": 9, "12354": 9, "num_epoch": 9, "42": [9, 13, 19, 24, 35], "feature_dir": [9, 24], "max_dur": [9, 24], "bucketing_sampl": [9, 24], "num_bucket": [9, 24], "concatenate_cut": [9, 24], "duration_factor": [9, 24], "gap": [9, 24], "on_the_fly_feat": [9, 24], "shuffl": [9, 24], "return_cut": [9, 24], "num_work": [9, 24], "env_info": [9, 10, 12, 13, 14, 19, 21, 24], "releas": [9, 10, 12, 13, 14, 19, 21, 24], "sha1": [9, 10, 12, 13, 14, 19, 21, 24], 
"3b7f09fa35e72589914f67089c0da9f196a92ca4": 9, "date": [9, 10, 12, 13, 14, 19, 21, 24], "mon": [9, 13, 14], "mai": [9, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41, 42], "6fcfced": 9, "cu118": 9, "branch": [9, 10, 12, 13, 14, 19, 21, 24, 29], "30bde4b": 9, "thu": [9, 10, 12, 13, 14, 21, 24, 28], "37": [9, 13, 19, 21, 24, 32], "47": [9, 12, 13, 14, 19, 24], "dev20230512": 9, "torch2": 9, "hostnam": [9, 10, 12, 13, 14, 21], "host": [9, 10], "ip": [9, 10, 12, 13, 14, 21], "761": 9, "168": [9, 28], "764": 9, "495": 9, "devic": [9, 10, 12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 40, 41], "791": [9, 28], "cut": [9, 24], "244": 9, "852": 9, "149": [9, 12, 24], "singlecutsampl": 9, "205": [9, 24], "853": 9, "218": [9, 13], "252": 9, "986": 9, "422": 9, "epoch": [9, 10, 12, 13, 14, 15, 16, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "batch": [9, 12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "loss": [9, 12, 13, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "065": 9, "over": [9, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "2436": 9, "frame": [9, 21, 27, 29, 40, 41], "tot_loss": 9, "352": [9, 24], "4561": 9, "2828": 9, "7076": 9, "22192": 9, "691": 9, "444": 9, "9002": 9, "18067": 9, "996": 9, "2555": 9, "2695": 9, "484": 9, "34971": 9, "217": [9, 19, 24], "4688": 9, "251": [9, 32, 33], "75": [9, 12], "389": [9, 22, 24], "2532": 9, "637": 9, "1139": 9, "1592": 9, "859": 9, "1629": 9, "094": 9, "0767": 9, "118": [9, 24], "350": 9, "06778": 9, "395": 9, "789": 9, "01056": 9, "016": 9, "009022": 9, "009985": 9, "271": [9, 10, 13], "01088": 9, "497": 9, "01174": 9, "01077": 9, "747": 9, "01087": 9, "783": 9, "921": 9, "01045": 9, "008957": 9, "009903": 9, "374": 9, "01092": 9, "598": [9, 24], "01169": 9, "01065": 9, "824": 9, "862": [9, 13], "865": [9, 13], "555": 9, "08": [9, 14, 24, 28, 30, 32, 33, 35, 39], "483": 9, "264": [9, 14], "lm_dir": [9, 24], "search_beam": [9, 19, 24, 35], "output_beam": [9, 19, 24, 35], 
"min_active_st": [9, 19, 24, 35], "max_active_st": [9, 19, 24, 35], "10000": [9, 19, 24, 35], "avg": [9, 10, 12, 13, 14, 15, 16, 17, 21, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "487": 9, "273": [9, 10, 21], "513": 9, "291": 9, "averag": [9, 10, 12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "521": 9, "675": 9, "204": [9, 14, 24], "until": [9, 24, 29], "923": 9, "241": [9, 19], "transcript": [9, 19, 20, 21, 22, 24, 27, 28, 32, 33, 39, 40, 41], "recog": [9, 21, 24], "test_set": [9, 35], "924": 9, "558": 9, "240": [9, 19, 35], "ins": [9, 24, 35], "del": [9, 24, 35], "sub": [9, 24, 35], "925": 9, "249": [9, 13], "wrote": [9, 24], "detail": [9, 11, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "stat": [9, 24], "err": [9, 21, 24], "316": [9, 24], "congratul": [9, 12, 13, 14, 19, 22, 24, 28, 32, 33, 35], "fun": [9, 12, 13], "debug": 9, "variou": [9, 15, 18, 42], "problem": [9, 25], "period": [10, 12], "disk": 10, "optim": [10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "other": [10, 13, 14, 15, 21, 24, 25, 27, 28, 32, 33, 35, 37, 40, 41, 42], "relat": [10, 19, 21, 24, 28, 32, 33, 35], "resum": [10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "howev": [10, 13, 25], "onli": [10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41, 42], "strip": 10, "reduc": [10, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "each": [10, 12, 13, 15, 19, 21, 22, 24, 27, 29, 30, 37, 39, 40, 41], "well": [10, 35, 42], "usag": [10, 12, 13, 14, 16, 17, 28, 32, 33, 35], "pruned_transducer_stateless3": [10, 16, 37], "almost": [10, 27, 37, 40, 41], "dir": [10, 12, 13, 14, 15, 16, 17, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "bpe": [10, 12, 13, 14, 15, 16, 17, 24, 27, 29, 30, 39, 40, 41], "lang_bpe_500": [10, 12, 13, 14, 15, 16, 17, 24, 27, 29, 30, 39, 40, 41], "dict": [10, 14], "csukuangfj": [10, 12, 13, 15, 19, 21, 22, 24, 28, 32, 33, 35, 39], "prune": [10, 14, 
15, 21, 23, 25, 26, 36, 37, 38, 39, 41], "transduc": [10, 11, 15, 18, 20, 23, 25, 26, 36, 37, 38], "stateless3": [10, 12], "2022": [10, 12, 13, 14, 15, 21, 27, 29, 30, 39, 40], "lf": [10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 30, 32, 33, 35], "repo": [10, 15], "prefix": 10, "those": 10, "wave": [10, 12, 13, 14, 19, 24], "iter": [10, 12, 13, 14, 17, 27, 29, 30, 39, 40, 41], "1224000": 10, "greedy_search": [10, 21, 27, 29, 39, 40, 41], "test_wav": [10, 12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "1089": [10, 12, 13, 14, 15, 24, 28], "134686": [10, 12, 13, 14, 15, 24, 28], "0001": [10, 12, 13, 14, 15, 24, 28], "wav": [10, 12, 13, 14, 15, 17, 19, 21, 22, 24, 27, 29, 30, 32, 33, 35, 39, 40, 41], "1221": [10, 12, 13, 24, 28], "135766": [10, 12, 13, 24, 28], "0002": [10, 12, 13, 24, 28], "multipl": [10, 19, 21, 22, 24, 28, 32, 33, 35], "sound": [10, 12, 13, 14, 17, 18, 19, 21, 22, 24, 28, 32, 33, 35], "Its": [10, 12, 13, 14, 24], "output": [10, 12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "09": [10, 13, 19, 21, 22, 24, 39], "233": [10, 12, 13], "265": 10, "50": [10, 12, 13, 14, 24, 27, 32, 39, 40, 41], "200": [10, 12, 13, 14, 19, 24, 25, 32, 33, 35], "3000": [10, 12, 13, 14], "80": [10, 12, 13, 14, 19, 21, 24], "subsampling_factor": [10, 13, 14, 19, 21, 24], "encoder_dim": [10, 12, 13, 14], "512": [10, 12, 13, 14, 19, 21, 24], "nhead": [10, 12, 14, 19, 21, 24, 27, 40], "dim_feedforward": [10, 12, 13, 21], "2048": [10, 12, 13, 14, 21], "num_encoder_lay": [10, 12, 13, 14, 21], "decoder_dim": [10, 12, 13, 14], "joiner_dim": [10, 12, 13, 14], "model_warm_step": [10, 12, 13], "4810e00d8738f1a21278b0156a42ff396a2d40ac": 10, "fri": 10, "oct": [10, 24], "03": [10, 13, 21, 24, 32, 33, 39], "miss": [10, 12, 13, 14, 21, 24], "cu102": [10, 12, 13, 14], "1013": 10, "c39cba5": 10, "dirti": [10, 12, 13, 19, 24], "jsonl": 10, "de": [10, 12, 13, 14, 21], "74279": [10, 12, 13, 14, 21], "0324160024": 10, "65bfd8b584": 10, "jjlbn": 10, "177": [10, 13, 14, 
21, 22, 24], "203": [10, 24], "bpe_model": [10, 12, 13, 14, 24], "sound_fil": [10, 19, 21, 24, 35], "sample_r": [10, 19, 21, 24, 35], "16000": [10, 19, 21, 22, 24, 28, 29, 32, 33], "beam": [10, 39], "max_context": 10, "max_stat": 10, "context_s": [10, 12, 13, 14, 21], "max_sym_per_fram": [10, 21], "simulate_stream": 10, "decode_chunk_s": 10, "left_context": 10, "dynamic_chunk_train": 10, "causal_convolut": 10, "short_chunk_s": [10, 14, 40, 41], "25": [10, 12, 13, 19, 24, 27, 32, 33, 35, 40], "num_left_chunk": [10, 14], "blank_id": [10, 12, 13, 14, 21], "unk_id": 10, "vocab_s": [10, 12, 13, 14, 21], "612": 10, "458": 10, "disabl": [10, 12, 13], "giga": [10, 13, 39], "623": 10, "277": 10, "paramet": [10, 12, 13, 14, 16, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 39, 40, 41], "78648040": 10, "951": [10, 24], "285": [10, 21, 24], "construct": [10, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33, 35], "952": 10, "295": [10, 19, 21, 22, 24], "957": 10, "301": [10, 24], "700": 10, "329": [10, 13, 24], "912": 10, "388": 10, "earli": [10, 12, 13, 14, 24, 28], "nightfal": [10, 12, 13, 14, 24, 28], "THE": [10, 12, 13, 14, 24, 28], "yellow": [10, 12, 13, 14, 24, 28], "lamp": [10, 12, 13, 14, 24, 28], "light": [10, 12, 13, 14, 24, 28], "here": [10, 12, 13, 14, 19, 21, 22, 24, 25, 28, 37, 40], "AND": [10, 12, 13, 14, 24, 28], "THERE": [10, 12, 13, 14, 24, 28], "squalid": [10, 12, 13, 14, 24, 28], "quarter": [10, 12, 13, 14, 24, 28], "OF": [10, 12, 13, 14, 24, 28], "brothel": [10, 12, 13, 14, 24, 28], "god": [10, 24, 28], "AS": [10, 24, 28], "direct": [10, 24, 28], "consequ": [10, 24, 28], "sin": [10, 24, 28], "man": [10, 24, 28], "punish": [10, 24, 28], "had": [10, 24, 28], "her": [10, 24, 28], "love": [10, 24, 28], "child": [10, 24, 28], "whose": [10, 21, 24, 28], "ON": [10, 12, 24, 28], "THAT": [10, 24, 28], "dishonor": [10, 24, 28], "bosom": [10, 24, 28], "TO": [10, 24, 28], "parent": [10, 24, 28], "forev": [10, 24, 28], "WITH": [10, 24, 28], "race": [10, 24, 28], "descent": [10, 24, 
28], "mortal": [10, 24, 28], "BE": [10, 24, 28], "bless": [10, 24, 28], "soul": [10, 24, 28], "IN": [10, 24, 28], "heaven": [10, 24, 28], "yet": [10, 12, 13, 24, 28], "THESE": [10, 24, 28], "thought": [10, 24, 28], "affect": [10, 24, 28], "hester": [10, 24, 28], "prynn": [10, 24, 28], "less": [10, 24, 28, 35, 40, 41], "hope": [10, 20, 24, 28], "than": [10, 13, 19, 21, 22, 24, 27, 28, 29, 30, 35, 39, 40, 41], "apprehens": [10, 24, 28], "390": 10, "down": [10, 19, 24, 27, 29, 30, 39, 40, 41], "reproduc": [10, 24], "ln": [10, 12, 13, 14, 15, 19, 24, 27, 29, 30, 39, 40, 41], "9999": [10, 29, 30, 39], "symlink": 10, "pass": [10, 14, 19, 21, 22, 24, 27, 29, 30, 37, 39, 40, 41], "max": [10, 12, 13, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "durat": [10, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "600": [10, 24, 27, 29, 39, 40, 41], "reason": [10, 12, 13, 14, 40], "support": [11, 12, 13, 14, 19, 21, 24, 27, 29, 30, 37, 39, 40, 41], "zipform": [11, 15, 18, 23, 26, 36, 38], "convemform": [11, 18, 37], "perform": [11, 21, 25, 40], "platform": [11, 15], "android": [11, 12, 13, 14, 15], "raspberri": [11, 15], "pi": [11, 15], "\u7231\u82af\u6d3e": 11, "maix": 11, "iii": 11, "axera": 11, "rv1126": 11, "static": 11, "produc": [11, 27, 29, 30, 39, 40, 41], "binari": [11, 12, 13, 14, 19, 21, 22, 24, 27, 35, 39, 40], "everyth": 11, "pnnx": [11, 18], "torchscript": [11, 16, 17, 18], "encod": [11, 15, 17, 18, 19, 21, 22, 24, 27, 28, 29, 35, 37, 39, 40, 41], "option": [11, 15, 18, 21, 25, 28, 32, 33, 35], "int8": [11, 18], "quantiz": [11, 18, 25], "zengwei": [12, 14, 15, 30, 39], "conv": [12, 13], "emform": [12, 13, 16], "stateless2": [12, 13, 39], "07": [12, 13, 14, 19, 21, 22, 24], "ubuntu": [12, 13, 14], "work": [12, 13, 14, 24], "cpp": [12, 16], "pretrained_model": [12, 13, 14], "online_transduc": 12, "continu": [12, 13, 14, 15, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "git_lfs_skip_smudg": [12, 13, 14, 15], "includ": [12, 13, 14, 15, 27, 29, 30, 39, 40, 41], 
"jit_xxx": [12, 13, 14], "anywher": [12, 13], "submodul": 12, "updat": [12, 13, 14], "recurs": 12, "init": 12, "cmake": [12, 13, 19, 24], "dcmake_build_typ": [12, 19, 24], "dncnn_python": 12, "dncnn_build_benchmark": 12, "off": 12, "dncnn_build_exampl": 12, "dncnn_build_tool": 12, "j4": 12, "pwd": 12, "src": [12, 14], "compon": [12, 37], "execut": [12, 19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "ncnn2int8": [12, 13], "our": [12, 13, 14, 16, 17, 24, 25, 27, 37, 40, 41], "cpython": 12, "38": [12, 19, 21, 24, 32], "gnu": 12, "am": 12, "sai": [12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "But": [12, 27, 29, 30, 39, 40, 41], "doe": [12, 13, 14, 19, 21, 24, 35], "As": [12, 21, 24, 25], "long": 12, "later": [12, 13, 14, 19, 22, 24, 27, 28, 29, 30, 32, 33, 39, 40, 41], "termin": 12, "tencent": [12, 13], "made": 12, "modif": [12, 21], "offic": 12, "synchron": 12, "offici": 12, "renam": [12, 13, 14], "conv_emformer_transducer_stateless2": [12, 37], "num": [12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "layer": [12, 13, 14, 21, 25, 27, 37, 39, 40, 41], "chunk": [12, 14, 15, 40, 41], "length": [12, 14, 21, 40, 41], "32": [12, 13, 14, 15, 19, 21, 22, 41], "cnn": [12, 14], "kernel": [12, 14, 21], "31": [12, 13, 14, 24], "left": [12, 14, 21, 40, 41], "context": [12, 21, 27, 37, 39, 40, 41], "right": [12, 21, 37, 40], "memori": [12, 19, 21, 24, 37], "dim": [12, 13, 14, 21, 27, 40], "configur": [12, 14, 21, 25, 28, 32, 33, 35], "accordingli": [12, 13, 14], "yourself": [12, 13, 14, 25, 40, 41], "tune": [12, 13, 14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "best": [12, 13, 14, 19, 22, 24], "combin": [12, 13, 14], "677": 12, "220": [12, 21, 22, 24], "681": 12, "229": [12, 19], "best_v": 12, "alid_epoch": 12, "subsampl": [12, 40, 41], "ing_factor": 12, "a34171ed85605b0926eebbd0463d059431f4f74a": 12, "wed": [12, 19, 21, 24], "dec": 12, "ver": 12, "ion": 12, "530e8a1": 12, "tue": [12, 24], "star": [12, 13, 14], "op": 12, 
"1220120619": [12, 13, 14], "7695ff496b": [12, 13, 14], "s9n4w": [12, 13, 14], "icefa": 12, "ll": 12, "transdu": 12, "cer": 12, "use_averaged_model": [12, 13, 14], "cnn_module_kernel": [12, 14], "left_context_length": 12, "chunk_length": 12, "right_context_length": 12, "memory_s": 12, "231": [12, 13, 14], "053": 12, "022": 12, "708": [12, 19, 21, 24, 35], "315": [12, 19, 21, 22, 24, 28], "75490012": 12, "318": [12, 13], "320": [12, 21], "682": 12, "lh": [12, 13, 14], "rw": [12, 13, 14], "kuangfangjun": [12, 13, 14], "289m": 12, "jan": [12, 13, 14], "289": 12, "roughli": [12, 13, 14], "equal": [12, 13, 14, 40, 41], "1024": [12, 13, 14, 39], "287": [12, 35], "1010k": [12, 13], "decoder_jit_trac": [12, 13, 14, 17, 39, 41], "283m": 12, "encoder_jit_trac": [12, 13, 14, 17, 39, 41], "0m": [12, 13], "joiner_jit_trac": [12, 13, 14, 17, 39, 41], "sure": [12, 13, 14], "found": [12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "param": [12, 13, 14], "503k": [12, 13], "437": [12, 13, 14], "142m": 12, "79k": 12, "5m": [12, 13], "488": [12, 13, 14], "text": [12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "architectur": [12, 13, 14, 39], "editor": [12, 13, 14], "content": [12, 13, 14], "compar": [12, 13, 14, 40], "283": [12, 14], "1010": [12, 13], "142": [12, 19, 22, 24], "503": [12, 13], "convers": [12, 13, 14], "half": [12, 13, 14, 27, 40, 41], "joiner": [12, 13, 14, 15, 17, 21, 27, 39, 40, 41], "default": [12, 13, 14, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "float32": [12, 13, 14], "float16": [12, 13, 14], "occupi": [12, 13, 14], "byte": [12, 13, 14], "twice": [12, 13, 14], "smaller": [12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "fp16": [12, 13, 14, 27, 29, 30, 39, 40, 41], "won": [12, 13, 14, 15, 19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "token": [12, 13, 14, 15, 19, 21, 22, 24, 28, 32, 33, 35], "accept": [12, 13, 14], "216": [12, 19, 24, 32, 33], "encoder_param_filenam": [12, 13, 14], "encoder_bin_filenam": [12, 13, 14], 
"decoder_param_filenam": [12, 13, 14], "decoder_bin_filenam": [12, 13, 14], "joiner_param_filenam": [12, 13, 14], "joiner_bin_filenam": [12, 13, 14], "sound_filenam": [12, 13, 14], "141": 12, "328": 12, "151": 12, "331": [12, 13, 24, 28], "176": [12, 21, 24], "336": 12, "106000": [12, 13, 14, 24, 28], "381": 12, "few": [12, 13, 14, 25], "7767517": [12, 13, 14], "1060": 12, "1342": 12, "in0": [12, 13, 14], "explan": [12, 13, 14], "three": [12, 13, 14, 17, 19, 21, 37], "magic": [12, 13, 14], "intermedi": [12, 13, 14], "mean": [12, 13, 14, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 37, 39, 40, 41], "extra": [12, 13, 14, 21, 37, 40], "increment": [12, 13, 14], "1061": 12, "sherpametadata": [12, 13, 14], "sherpa_meta_data1": [12, 13, 14], "still": [12, 13, 14], "sinc": [12, 13, 14, 25, 35, 39], "newli": [12, 13, 14], "must": [12, 13, 14, 40], "kei": [12, 13, 14, 24], "valu": [12, 13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "eas": [12, 13, 14], "list": [12, 13, 14, 19, 21, 22, 24, 28, 32, 33], "pair": [12, 13, 14], "sad": [12, 13, 14], "rememb": [12, 13, 14], "anymor": [12, 13, 14], "flexibl": [12, 13, 14], "edit": [12, 13, 14], "arm": [12, 13, 14], "aarch64": [12, 13, 14], "onc": [12, 13], "mayb": [12, 13], "year": [12, 13], "_jit_trac": [12, 13], "56": [12, 13, 24, 32], "fp32": [12, 13], "doubl": [12, 13], "j": [12, 13, 19, 24], "scale": [12, 13, 19, 24, 25, 28, 30, 32, 33], "py38": [12, 13, 14], "arg": [12, 13], "wave_filenam": [12, 13], "16k": [12, 13], "hz": [12, 13, 32, 33], "mono": [12, 13], "calibr": [12, 13], "purpos": [12, 13], "cat": [12, 13], "eof": [12, 13], "calcul": [12, 13, 29, 40, 41], "has_gpu": [12, 13], "config": [12, 13], "use_vulkan_comput": [12, 13], "88": [12, 21], "conv_87": 12, "942385": [12, 13], "threshold": [12, 13, 29], "938493": 12, "968131": 12, "conv_88": 12, "442448": 12, "549335": 12, "167552": 12, "conv_89": 12, "228289": 12, "001738": 12, "871552": 12, "linear_90": 12, "976146": 12, "101789": 12, "115": [12, 13, 19, 24], 
"267128": 12, "linear_91": 12, "962030": 12, "162033": 12, "602713": 12, "linear_92": 12, "323041": 12, "853959": 12, "953129": 12, "linear_94": 12, "905416": 12, "648006": 12, "323545": 12, "linear_93": 12, "474093": 12, "200188": 12, "linear_95": 12, "888012": 12, "403563": 12, "483986": 12, "linear_96": 12, "856741": 12, "398679": 12, "524273": 12, "linear_97": 12, "635942": 12, "613655": 12, "590950": 12, "linear_98": 12, "460340": 12, "670146": 12, "398010": 12, "linear_99": 12, "532276": 12, "585537": 12, "119396": 12, "linear_101": 12, "585871": 12, "719224": 12, "205809": 12, "linear_100": 12, "751382": 12, "081648": 12, "linear_102": 12, "593344": 12, "450581": 12, "87": 12, "551147": 12, "linear_103": 12, "592681": 12, "705824": 12, "257959": 12, "linear_104": 12, "752957": 12, "980955": 12, "110489": 12, "linear_105": 12, "696240": 12, "877193": 12, "608953": 12, "linear_106": 12, "059659": 12, "643138": 12, "048950": 12, "linear_108": 12, "975461": 12, "589567": 12, "671457": 12, "linear_107": 12, "190381": 12, "515701": 12, "linear_109": 12, "710759": 12, "305635": 12, "082436": 12, "linear_110": 12, "531228": 12, "731162": 12, "159557": 12, "linear_111": 12, "528083": 12, "259322": 12, "211544": 12, "linear_112": 12, "148807": 12, "500842": 12, "087374": 12, "linear_113": 12, "592566": 12, "948851": 12, "166611": 12, "linear_115": 12, "437109": 12, "608947": 12, "642395": 12, "linear_114": 12, "193942": 12, "503904": 12, "linear_116": 12, "966980": 12, "200896": 12, "676392": 12, "linear_117": 12, "451303": 12, "061664": 12, "951344": 12, "linear_118": 12, "077262": 12, "965800": 12, "023804": 12, "linear_119": 12, "671615": 12, "847613": 12, "198460": 12, "linear_120": 12, "625638": 12, "131427": 12, "556595": 12, "linear_122": 12, "274080": 12, "888716": 12, "978189": 12, "linear_121": 12, "420480": 12, "429659": 12, "linear_123": 12, "826197": 12, "599617": 12, "281532": 12, "linear_124": 12, "396383": 12, "325849": 12, "335875": 12, "linear_125": 
12, "337198": 12, "941410": 12, "221970": 12, "linear_126": 12, "699965": 12, "842878": 12, "224073": 12, "linear_127": 12, "775370": 12, "884215": 12, "696438": 12, "linear_129": 12, "872276": 12, "837319": 12, "254213": 12, "linear_128": 12, "180057": 12, "687883": 12, "linear_130": 12, "150427": 12, "454298": 12, "765789": 12, "linear_131": 12, "112692": 12, "924847": 12, "025545": 12, "linear_132": 12, "852893": 12, "116593": 12, "749626": 12, "linear_133": 12, "517084": 12, "024665": 12, "275314": 12, "linear_134": 12, "683807": 12, "878618": 12, "743618": 12, "linear_136": 12, "421055": 12, "322729": 12, "086264": 12, "linear_135": 12, "309880": 12, "917679": 12, "linear_137": 12, "827781": 12, "744595": 12, "33": [12, 13, 19, 20, 21, 24, 32], "915554": 12, "linear_138": 12, "422395": 12, "742882": 12, "402161": 12, "linear_139": 12, "527538": 12, "866123": 12, "849449": 12, "linear_140": 12, "128619": 12, "657793": 12, "266134": 12, "linear_141": 12, "839593": 12, "845993": 12, "021378": 12, "linear_143": 12, "442304": 12, "099039": 12, "889746": 12, "linear_142": 12, "325038": 12, "849592": 12, "linear_144": 12, "929444": 12, "618206": 12, "605080": 12, "linear_145": 12, "382126": 12, "321095": 12, "625010": 12, "linear_146": 12, "894987": 12, "867645": 12, "836517": 12, "linear_147": 12, "915313": 12, "906028": 12, "886522": 12, "linear_148": 12, "614287": 12, "908151": 12, "496181": 12, "linear_150": 12, "724932": 12, "485588": 12, "312899": 12, "linear_149": 12, "161146": 12, "606939": 12, "linear_151": 12, "164453": 12, "847355": 12, "719223": 12, "linear_152": 12, "086471": 12, "984121": 12, "222834": 12, "linear_153": 12, "099524": 12, "991601": 12, "816805": 12, "linear_154": 12, "054585": 12, "489706": 12, "286930": 12, "linear_155": 12, "389185": 12, "100321": 12, "963501": 12, "linear_157": 12, "982999": 12, "154796": 12, "637253": 12, "linear_156": 12, "537706": 12, "875190": 12, "linear_158": 12, "420287": 12, "502287": 12, "531588": 12, 
"linear_159": 12, "014746": 12, "423280": 12, "477261": 12, "linear_160": 12, "633553": 12, "715335": 12, "220921": 12, "linear_161": 12, "371849": 12, "117830": 12, "815203": 12, "linear_162": 12, "492933": 12, "126283": 12, "623318": 12, "linear_164": 12, "697504": 12, "825712": 12, "317358": 12, "linear_163": 12, "078367": 12, "008038": 12, "linear_165": 12, "023975": 12, "836278": 12, "577358": 12, "linear_166": 12, "860619": 12, "259792": 12, "493614": 12, "linear_167": 12, "380934": 12, "496160": 12, "107042": 12, "linear_168": 12, "691216": 12, "733317": 12, "831076": 12, "linear_169": 12, "723948": 12, "952728": 12, "129707": 12, "linear_171": 12, "034811": 12, "366547": 12, "665123": 12, "linear_170": 12, "356277": 12, "710501": 12, "linear_172": 12, "556884": 12, "729481": 12, "166058": 12, "linear_173": 12, "033039": 12, "207264": 12, "442120": 12, "linear_174": 12, "597379": 12, "658676": 12, "768131": 12, "linear_2": [12, 13], "293503": 12, "305265": 12, "877850": 12, "linear_1": [12, 13], "812222": 12, "766452": 12, "487047": 12, "linear_3": [12, 13], "999999": 12, "999755": 12, "031174": 12, "wish": [12, 13], "low": [12, 13], "accuraci": [12, 13, 20], "955k": 12, "18k": 12, "inparam": [12, 13], "inbin": [12, 13], "outparam": [12, 13], "outbin": [12, 13], "99m": 12, "78k": 12, "774k": [12, 13], "496": [12, 13, 24, 28], "774": [12, 13], "much": [12, 13], "linear": [12, 13, 21], "convolut": [12, 13, 29, 37, 40], "exact": [12, 13], "4x": [12, 13], "speed": [12, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "comparison": 12, "44": [12, 13, 24, 32, 33], "468000": [13, 17, 39], "lstm_transducer_stateless2": [13, 17, 39], "rnn": [13, 21, 27, 29, 39, 40, 41], "hidden": [13, 39], "222": [13, 22, 24], "is_pnnx": 13, "62e404dd3f3a811d73e424199b3408e309c06e1a": [13, 14], "6d7a559": [13, 14], "feb": [13, 14, 21], "147": [13, 14], "rnn_hidden_s": 13, "aux_layer_period": 13, "235": 13, "43": [13, 14, 24], "239": [13, 21], "472": 13, "595": 13, "324": 13, "83137520": 13, 
"596": 13, "325": 13, "257024": 13, "326": 13, "781812": 13, "327": 13, "84176356": 13, "182": [13, 14, 19, 28], "158": 13, "183": [13, 32, 33], "335": 13, "101": 13, "tracerwarn": [13, 14], "boolean": [13, 14], "might": [13, 14, 40, 41], "caus": [13, 14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "incorrect": [13, 14, 21], "flow": [13, 14], "treat": [13, 14], "constant": [13, 14], "futur": [13, 14, 21, 42], "need_pad": 13, "bool": 13, "259": [13, 19], "180": [13, 19, 24], "339": 13, "207": [13, 22, 24], "84": [13, 19], "324m": 13, "321": [13, 19], "107": [13, 28], "318m": 13, "159m": 13, "21k": 13, "159": [13, 24, 35], "861": 13, "255": [13, 14], "425": [13, 24], "427": [13, 24], "266": [13, 14, 24, 28], "431": 13, "342": 13, "343": 13, "267": [13, 21, 32, 33], "379": 13, "268": [13, 24, 28], "317m": 13, "317": 13, "conv_15": 13, "930708": 13, "972025": 13, "conv_16": 13, "978855": 13, "031788": 13, "456645": 13, "conv_17": 13, "868437": 13, "830528": 13, "218575": 13, "linear_18": 13, "107259": 13, "194808": 13, "293236": 13, "linear_19": 13, "193777": 13, "634748": 13, "401705": 13, "linear_20": 13, "259933": 13, "606617": 13, "722160": 13, "linear_21": 13, "186600": 13, "790260": 13, "512129": 13, "linear_22": 13, "759041": 13, "265832": 13, "050053": 13, "linear_23": 13, "931209": 13, "099090": 13, "979767": 13, "linear_24": 13, "324160": 13, "215561": 13, "321835": 13, "linear_25": 13, "800708": 13, "599352": 13, "284134": 13, "linear_26": 13, "492444": 13, "153369": 13, "274391": 13, "linear_27": 13, "660161": 13, "720994": 13, "46": [13, 19, 24], "674126": 13, "linear_28": 13, "415265": 13, "174434": 13, "007133": 13, "linear_29": 13, "038418": 13, "118534": 13, "724262": 13, "linear_30": 13, "072084": 13, "936867": 13, "259155": 13, "linear_31": 13, "342712": 13, "599489": 13, "282787": 13, "linear_32": 13, "340535": 13, "120308": 13, "701103": 13, "linear_33": 13, "846987": 13, "630030": 13, "985939": 13, "linear_34": 13, "686298": 13, "204571": 13, 
"607586": 13, "linear_35": 13, "904821": 13, "575518": 13, "756420": 13, "linear_36": 13, "806659": 13, "585589": 13, "118401": 13, "linear_37": 13, "402340": 13, "047157": 13, "162680": 13, "linear_38": 13, "174589": 13, "923361": 13, "030258": 13, "linear_39": 13, "178576": 13, "556058": 13, "807705": 13, "linear_40": 13, "901954": 13, "301267": 13, "956539": 13, "linear_41": 13, "839805": 13, "597429": 13, "716181": 13, "linear_42": 13, "178945": 13, "651595": 13, "895699": 13, "829245": 13, "627592": 13, "637907": 13, "746186": 13, "255032": 13, "167313": 13, "000000": 13, "999756": 13, "031013": 13, "345k": 13, "17k": 13, "218m": 13, "larger": [13, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "counterpart": 13, "bit": [13, 19, 21, 22, 24, 28, 35], "4532": 13, "stateless7": [14, 15], "pruned_transducer_stateless7_stream": [14, 15, 41], "len": [14, 15, 41], "feedforward": [14, 21, 27, 40], "384": [14, 24], "192": [14, 24], "unmask": 14, "256": [14, 32, 33], "downsampl": [14, 20], "factor": [14, 19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "473": [14, 24], "246": [14, 21, 24, 32, 33], "477": 14, "warm_step": 14, "2000": [14, 22], "feedforward_dim": 14, "attention_dim": [14, 19, 21, 24], "encoder_unmasked_dim": 14, "zipformer_downsampling_factor": 14, "decode_chunk_len": 14, "257": [14, 21, 32, 33], "023": 14, "zipformer2": 14, "419": 14, "At": [14, 19, 24], "stack": 14, "downsampling_factor": 14, "037": 14, "655": 14, "346": 14, "68944004": 14, "347": 14, "260096": 14, "348": [14, 32], "716276": 14, "656": [14, 24], "349": 14, "69920376": 14, "351": 14, "353": 14, "174": [14, 24], "175": 14, "1344": 14, "assert": 14, "cached_len": 14, "num_lay": 14, "1348": 14, "cached_avg": 14, "1352": 14, "cached_kei": 14, "1356": 14, "cached_v": 14, "1360": 14, "cached_val2": 14, "1364": 14, "cached_conv1": 14, "1368": 14, "cached_conv2": 14, "1373": 14, "left_context_len": 14, "1884": 14, "x_size": 14, "2442": 14, "2449": 14, "2469": 14, "2473": 14, "2483": 14, "kv_len": 14, "k": 
[14, 27, 32, 33, 39, 40, 41], "2570": 14, "attn_output": 14, "bsz": 14, "num_head": 14, "seq_len": 14, "head_dim": 14, "2926": 14, "lorder": 14, "2652": 14, "2653": 14, "embed_dim": 14, "2666": 14, "1543": 14, "in_x_siz": 14, "1637": 14, "1643": 14, "in_channel": 14, "1571": 14, "1763": 14, "src1": 14, "src2": 14, "1779": 14, "dim1": 14, "1780": 14, "dim2": 14, "_trace": 14, "958": 14, "tracer": 14, "instead": [14, 21, 40], "tupl": 14, "namedtupl": 14, "absolut": 14, "know": [14, 25], "side": 14, "effect": 14, "strict": [14, 20], "allow": [14, 27, 40], "behavior": [14, 21], "_c": 14, "_create_method_from_trac": 14, "646": 14, "357": 14, "embedding_out": 14, "686": 14, "361": [14, 24, 28], "735": 14, "69": 14, "269m": 14, "53": [14, 19, 27, 28, 33, 39, 40], "269": [14, 19, 32, 33], "725": [14, 28], "1022k": 14, "266m": 14, "8m": 14, "509k": 14, "133m": 14, "152k": 14, "4m": 14, "1022": 14, "133": 14, "509": 14, "260": [14, 24], "360": 14, "365": 14, "280": [14, 24], "372": [14, 19], "state": [14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "026": 14, "410": 14, "2028": 14, "2547": 14, "2029": 14, "23316": 14, "23317": 14, "23318": 14, "23319": 14, "23320": 14, "amount": [14, 20], "pad": [14, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "conv2dsubsampl": 14, "v2": [14, 19, 24], "arrai": 14, "23300": 14, "element": 14, "onnx_pretrain": 15, "onnxruntim": 15, "separ": 15, "deploi": [15, 19, 24], "repo_url": 15, "basenam": 15, "pushd": 15, "popd": 15, "tree": [16, 17, 19, 21, 22, 24, 28, 32, 33, 35, 39], "cpu_jit": [16, 19, 24, 27, 29, 30, 40, 41], "confus": 16, "move": [16, 27, 29, 30, 40, 41], "why": 16, "streaming_asr": [16, 17, 39, 40, 41], "conv_emform": 16, "offline_asr": [16, 27], "jit_pretrain": [17, 29, 30, 39], "baz": 17, "tutori": [19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 39, 40, 41], "learn": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "singl": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "1best": [19, 22, 24, 28, 29, 30, 32, 33], 
"automag": [19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "stop": [19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "control": [19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "By": [19, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "musan": [19, 22, 24, 25, 27, 29, 30, 39, 40, 41], "thei": [19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "intal": [19, 22], "initi": [19, 22], "sudo": [19, 22], "apt": [19, 22], "permiss": [19, 22], "commandlin": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "quit": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "often": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "experi": [19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "world": [19, 21, 22, 24, 25, 27, 28, 29, 30, 39, 40, 41], "multi": [19, 21, 22, 24, 25, 27, 29, 30, 37, 39, 40, 41], "machin": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "ddp": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "implement": [19, 21, 22, 24, 25, 27, 29, 30, 37, 39, 40, 41], "present": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "second": [19, 21, 22, 24, 25, 27, 29, 30, 35, 39, 40, 41], "utter": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "oom": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "v100": [19, 21, 22, 24], "nvidia": [19, 21, 22, 24], "due": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "usual": [19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "increas": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "weight": [19, 22, 24, 29, 30, 39], "decai": [19, 22, 24, 29, 30, 39], "warmup": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "function": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "get_param": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "realli": [19, 22, 24, 27, 29, 30, 39, 40, 41], "directli": [19, 21, 22, 24, 25, 27, 29, 30, 39, 40, 41], "perturb": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "actual": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "3x150": [19, 21, 22], "450": [19, 21, 22], "hour": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "These": [19, 21, 22, 24, 27, 
28, 29, 30, 32, 33, 35, 39, 40, 41], "rate": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "visual": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "logdir": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "labelsmooth": 19, "someth": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "tensorflow": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "press": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40, 41], "ctrl": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40, 41], "engw8ksktzqs24zbv5dgcg": 19, "22t11": 19, "scan": [19, 21, 22, 24, 27, 35, 39, 40], "116068": 19, "scalar": [19, 21, 22, 24, 27, 35, 39, 40], "listen": [19, 21, 22, 27, 35, 39, 40], "url": [19, 21, 22, 24, 27, 29, 30, 35, 39, 40], "xxxx": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "saw": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "consol": [19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "typic": [19, 21, 22, 24], "avoid": [19, 21, 24], "commonli": [19, 21, 22, 24, 28, 32, 33, 35], "nbest": [19, 24, 30], "lattic": [19, 22, 24, 27, 28, 32, 33, 40, 41], "score": [19, 24, 27, 40, 41], "uniqu": [19, 24, 27, 40, 41], "pkufool": [19, 22, 28], "icefall_asr_aishell_conformer_ctc": 19, "transcrib": [19, 21, 22, 24], "v1": [19, 22, 24, 28, 32, 33], "lang_char": [19, 21], "word": [19, 21, 22, 24, 28, 32, 33, 35], "bac009s0764w0121": [19, 21, 22], "bac009s0764w0122": [19, 21, 22], "bac009s0764w0123": [19, 21, 22], "tran": [19, 22, 24, 28, 32, 33], "graph": [19, 22, 24, 27, 28, 32, 33, 40, 41], "id": [19, 22, 24, 28, 32, 33], "conveni": [19, 22, 24, 25], "eo": [19, 22, 24], "easili": [19, 22, 24], "obtain": [19, 21, 22, 24, 28, 32, 33], "soxi": [19, 21, 22, 24, 28, 35], "sampl": [19, 21, 22, 24, 28, 29, 35, 40, 41], "precis": [19, 21, 22, 24, 27, 28, 35, 40, 41], "67263": [19, 21, 22], "cdda": [19, 21, 22, 24, 28, 35], "sector": [19, 21, 22, 24, 28, 35], "135k": [19, 21, 22], "256k": [19, 21, 22, 24], "sign": [19, 21, 22, 24, 35], "integ": [19, 21, 22, 24, 35], "pcm": [19, 
21, 22, 24, 35], "65840": [19, 21, 22], "625": [19, 21, 22], "132k": [19, 21, 22], "64000": [19, 21, 22], "300": [19, 21, 22, 24, 25, 27, 40], "128k": [19, 21, 22, 35], "displai": [19, 21, 22, 24], "topologi": [19, 24], "707": [19, 24], "num_decoder_lay": [19, 24], "vgg_frontend": [19, 21, 24], "use_feat_batchnorm": [19, 24], "f2fd997f752ed11bbef4c306652c433e83f9cf12": 19, "sun": 19, "sep": 19, "33cfe45": 19, "d57a873": 19, "nov": [19, 24], "hw": 19, "kangwei": 19, "icefall_aishell3": 19, "k2_releas": 19, "tokens_fil": 19, "words_fil": [19, 24, 35], "num_path": [19, 24, 27, 40, 41], "ngram_lm_scal": [19, 24], "attention_decoder_scal": [19, 24], "nbest_scal": [19, 24], "sos_id": [19, 24], "eos_id": [19, 24], "num_class": [19, 24, 35], "4336": [19, 21], "242": [19, 24], "131": [19, 24], "134": 19, "275": 19, "293": [19, 24], "704": [19, 32], "369": [19, 24], "\u751a": [19, 21], "\u81f3": [19, 21], "\u51fa": [19, 21], "\u73b0": [19, 21], "\u4ea4": [19, 21], "\u6613": [19, 21], "\u51e0": [19, 21], "\u4e4e": [19, 21], "\u505c": [19, 21], "\u6b62": 19, "\u7684": [19, 21, 22], "\u60c5": [19, 21], "\u51b5": [19, 21], "\u4e00": [19, 21], "\u4e8c": [19, 21], "\u7ebf": [19, 21, 22], "\u57ce": [19, 21], "\u5e02": [19, 21], "\u867d": [19, 21], "\u7136": [19, 21], "\u4e5f": [19, 21, 22], "\u5904": [19, 21], "\u4e8e": [19, 21], "\u8c03": [19, 21], "\u6574": [19, 21], "\u4e2d": [19, 21, 22], "\u4f46": [19, 21, 22], "\u56e0": [19, 21], "\u4e3a": [19, 21], "\u805a": [19, 21], "\u96c6": [19, 21], "\u4e86": [19, 21, 22], "\u8fc7": [19, 21], "\u591a": [19, 21], "\u516c": [19, 21], "\u5171": [19, 21], "\u8d44": [19, 21], "\u6e90": [19, 21], "371": 19, "683": 19, "651": [19, 35], "654": 19, "659": 19, "752": 19, "887": 19, "340": 19, "370": 19, "\u751a\u81f3": [19, 22], "\u51fa\u73b0": [19, 22], "\u4ea4\u6613": [19, 22], "\u51e0\u4e4e": [19, 22], "\u505c\u6b62": 19, "\u60c5\u51b5": [19, 22], "\u4e00\u4e8c": [19, 22], "\u57ce\u5e02": [19, 22], "\u867d\u7136": [19, 22], "\u5904\u4e8e": 
[19, 22], "\u8c03\u6574": [19, 22], "\u56e0\u4e3a": [19, 22], "\u805a\u96c6": [19, 22], "\u8fc7\u591a": [19, 22], "\u516c\u5171": [19, 22], "\u8d44\u6e90": [19, 22], "n": [19, 25, 27, 29, 30, 32, 33, 39, 40, 41], "recor": [19, 24], "highest": [19, 24], "965": 19, "966": 19, "821": 19, "822": 19, "826": 19, "916": 19, "345": 19, "888": 19, "889": 19, "limit": [19, 21, 24, 37, 40], "upgrad": [19, 24], "pro": [19, 24], "finish": [19, 21, 22, 24, 25, 27, 28, 32, 33, 35, 40, 41], "NOT": [19, 21, 24, 35], "checkout": [19, 24], "hlg_decod": [19, 24], "four": [19, 24], "messag": [19, 24, 27, 29, 30, 39, 40, 41], "nn_model": [19, 24], "use_gpu": [19, 24], "word_tabl": [19, 24], "caution": [19, 24], "forward": [19, 24, 29], "89": 19, "cu": [19, 24], "int": [19, 24], "char": [19, 24], "98": 19, "150": [19, 24], "693": [19, 32], "165": [19, 24], "nnet_output": [19, 24], "489": 19, "mandarin": 20, "corpu": 20, "beij": 20, "shell": 20, "technologi": 20, "ltd": 20, "400": 20, "peopl": 20, "accent": 20, "area": 20, "china": 20, "invit": 20, "particip": 20, "conduct": 20, "quiet": 20, "indoor": 20, "high": 20, "fidel": 20, "microphon": 20, "16khz": 20, "manual": 20, "95": 20, "through": 20, "profession": 20, "annot": 20, "inspect": 20, "free": [20, 25, 39], "academ": 20, "moder": 20, "research": 20, "field": 20, "openslr": 20, "ctc": [20, 23, 26, 30, 31, 34], "stateless": [20, 23, 27, 39, 40, 41], "head": [21, 37], "embed": [21, 27, 39, 40, 41], "conv1d": [21, 27, 39, 40, 41], "nn": [21, 27, 29, 30, 39, 40, 41], "tanh": 21, "borrow": 21, "ieeexplor": 21, "ieee": 21, "stamp": 21, "jsp": 21, "arnumb": 21, "9054419": 21, "predict": [21, 25, 27, 39, 40, 41], "charact": 21, "unit": 21, "vocabulari": 21, "87939824": 21, "optimized_transduc": 21, "technqiu": 21, "propos": [21, 37, 41], "improv": 21, "end": [21, 27, 29, 30, 35, 39, 40, 41], "furthermor": 21, "maximum": 21, "emit": 21, "per": [21, 27, 40, 41], "simplifi": [21, 37], "significantli": 21, "degrad": 21, "exactli": 21, 
"benchmark": 21, "unprun": 21, "advantag": 21, "minim": 21, "pruned_transducer_stateless": [21, 27, 37, 40], "altern": 21, "though": 21, "transducer_stateless_modifi": 21, "pr": 21, "gb": 21, "ram": 21, "small": [21, 32, 33, 35], "tri": 21, "prob": [21, 39], "appli": [21, 37], "219": [21, 24], "c": [21, 22, 27, 29, 30, 35, 39, 40, 41], "lagz6hrcqxoigbfd5e0y3q": 21, "03t14": 21, "8477": 21, "250": [21, 28], "sym": [21, 27, 40, 41], "beam_search": [21, 27, 40, 41], "decoding_method": 21, "beam_4": 21, "ensur": 21, "give": 21, "poor": 21, "531": [21, 22], "994": [21, 24], "027": 21, "encoder_out_dim": 21, "f4fefe4882bc0ae59af951da3f47335d5495ef71": 21, "50d2281": 21, "mar": 21, "0815224919": 21, "75d558775b": 21, "mmnv8": 21, "72": [21, 24], "248": 21, "878": [21, 33], "880": 21, "891": 21, "113": [21, 24], "userwarn": 21, "__floordiv__": 21, "round": 21, "toward": 21, "trunc": 21, "floor": 21, "neg": 21, "keep": [21, 27, 40, 41], "div": 21, "b": [21, 24, 32, 33], "rounding_mod": 21, "divis": 21, "x_len": 21, "163": [21, 24], "\u6ede": 21, "322": 21, "760": 21, "919": 21, "922": 21, "929": 21, "046": 21, "047": 21, "319": [21, 24], "798": 21, "214": [21, 24], "215": [21, 24, 28], "402": 21, "topk_hyp_index": 21, "topk_index": 21, "logit": 21, "583": [21, 33], "lji9mwuorlow3jkdhxwk8a": 22, "13t11": 22, "4454": 22, "icefall_asr_aishell_tdnn_lstm_ctc": 22, "858": [22, 24], "154": 22, "161": [22, 24], "536": 22, "539": 22, "917": 22, "129": 22, "\u505c\u6ede": 22, "statelessx": [23, 25, 26, 36, 37, 38], "mmi": [23, 26], "blank": [23, 26], "skip": [23, 25, 26, 27, 39, 40, 41], "distil": [23, 26], "hubert": [23, 26], "ligru": [23, 31], "full": [24, 25, 27, 29, 30, 39, 40, 41], "libri": [24, 25, 27, 29, 30, 39, 40, 41], "960": [24, 27, 29, 30, 39, 40, 41], "subset": [24, 27, 29, 30, 39, 40, 41], "3x960": [24, 27, 29, 30, 39, 40, 41], "2880": [24, 27, 29, 30, 39, 40, 41], "lzgnetjwrxc3yghnmd4kpw": 24, "24t16": 24, "4540": 24, "sentenc": 24, "piec": 24, "And": [24, 27, 29, 30, 
39, 40, 41], "neither": 24, "nor": 24, "vocab": 24, "5000": 24, "033": 24, "537": 24, "538": 24, "full_libri": [24, 25], "406": 24, "464": 24, "548": 24, "776": 24, "652": [24, 35], "109226120": 24, "714": [24, 32], "206": 24, "944": 24, "1328": 24, "443": [24, 28], "2563": 24, "494": 24, "592": 24, "1715": 24, "52576": 24, "128": 24, "1424": 24, "807": 24, "506": 24, "808": [24, 32], "522": 24, "362": 24, "565": 24, "1477": 24, "2922": 24, "208": 24, "4295": 24, "52343": 24, "396": 24, "3584": 24, "432": 24, "433": 24, "680": [24, 32], "_pickl": 24, "unpicklingerror": 24, "hlg_modifi": 24, "g_4_gram": [24, 28, 32, 33], "875": [24, 28], "212k": 24, "267440": [24, 28], "1253": [24, 28], "535k": 24, "83": [24, 28], "77200": [24, 28], "154k": 24, "554": 24, "7178d67e594bc7fa89c2b331ad7bd1c62a6a9eb4": 24, "8d93169": 24, "601": 24, "758": 24, "025": 24, "broffel": 24, "osom": 24, "723": 24, "775": 24, "881": 24, "234": 24, "571": 24, "whole": [24, 28, 32, 33, 40, 41], "ngram": [24, 28, 32, 33], "857": 24, "979": 24, "980": 24, "055": 24, "117": 24, "051": 24, "363": 24, "959": [24, 33], "546": 24, "599": [24, 28], "833": 24, "834": 24, "915": 24, "076": 24, "110": 24, "397": 24, "999": [24, 27, 40, 41], "concaten": 24, "bucket": 24, "sampler": 24, "1000": 24, "ctc_decod": 24, "ngram_lm_rescor": 24, "attention_rescor": 24, "kind": [24, 27, 29, 30, 39, 40, 41], "105": 24, "221": 24, "125": [24, 35], "136": 24, "228": 24, "144": 24, "543": 24, "topo": 24, "547": 24, "729": 24, "702": 24, "703": 24, "545": 24, "279": 24, "122": 24, "126": 24, "135": [24, 35], "153": [24, 35], "945": 24, "475": 24, "191": [24, 32, 33], "398": 24, "515": 24, "w": [24, 32, 33], "deseri": 24, "441": 24, "fsaclass": 24, "loadfsa": 24, "const": 24, "string": 24, "c10": 24, "ignor": 24, "dummi": 24, "589": 24, "attention_scal": 24, "162": 24, "169": [24, 32, 33], "188": 24, "984": 24, "624": 24, "519": [24, 33], "632": 24, "645": [24, 35], "243": 24, "970": 24, "303": 24, "179": 24, "knowledg": 
25, "_": 25, "vector": 25, "mvq": 25, "kd": 25, "paper": [25, 27, 39, 40, 41], "pruned_transducer_stateless4": [25, 27, 37, 40], "theoret": 25, "applic": 25, "minor": 25, "out": 25, "necessari": 25, "thing": 25, "distillation_with_hubert": 25, "Of": 25, "cours": 25, "xl": 25, "proce": 25, "960h": [25, 29], "use_extracted_codebook": 25, "augment": 25, "th": [25, 32, 33], "fine": 25, "embedding_lay": 25, "num_codebook": 25, "under": 25, "vq_fbank_layer36_cb8": 25, "whola": 25, "snippet": 25, "echo": 25, "awk": 25, "split": 25, "pruned_transducer_stateless6": 25, "12359": 25, "spec": 25, "aug": 25, "warp": 25, "enabl": 25, "argument": [25, 37], "paid": 25, "similar": [25, 29, 40, 41], "suitabl": [27, 39, 40, 41], "pruned_transducer_stateless2": [27, 37, 40], "pruned_transducer_stateless5": [27, 37, 40], "scroll": [27, 29, 30, 39, 40, 41], "scratch": [27, 29, 30, 39, 40, 41], "arxiv": [27, 39, 40, 41], "ab": [27, 39, 40, 41], "2206": [27, 39, 40, 41], "13236": [27, 39, 40, 41], "rework": [27, 37, 40], "daniel": [27, 40, 41], "joint": [27, 39, 40, 41], "contrari": [27, 39, 40, 41], "convent": [27, 39, 40, 41], "recurr": [27, 39, 40, 41], "2x": [27, 40, 41], "dimens": [27, 40, 41], "littl": [27, 40], "436000": [27, 29, 30, 39, 40, 41], "438000": [27, 29, 30, 39, 40, 41], "qogspbgsr8kzcrmmie9jgw": 27, "20t15": [27, 39, 40], "4468": [27, 39, 40], "210171": [27, 39, 40], "access": [27, 29, 30, 39, 40, 41], "6008": [27, 29, 30, 39, 40, 41], "localhost": [27, 29, 30, 39, 40, 41], "expos": [27, 29, 30, 39, 40, 41], "proxi": [27, 29, 30, 39, 40, 41], "bind_al": [27, 29, 30, 39, 40, 41], "both": [27, 29, 30, 37, 39, 40, 41], "lowest": [27, 29, 30, 39, 40, 41], "fast_beam_search": [27, 29, 39, 40, 41], "474000": [27, 39, 40, 41], "largest": [27, 40, 41], "posterior": [27, 29, 40, 41], "algorithm": [27, 40, 41], "pdf": [27, 30, 40, 41], "1211": [27, 40, 41], "3711": [27, 40, 41], "espnet": [27, 40, 41], "net": [27, 40, 41], "beam_search_transduc": [27, 40, 41], "basicli": [27, 40, 
41], "topk": [27, 40, 41], "expand": [27, 40, 41], "mode": [27, 40, 41], "being": [27, 40, 41], "hardcod": [27, 40, 41], "composit": [27, 40, 41], "between": [27, 40, 41], "log_prob": [27, 40, 41], "hard": [27, 37, 40, 41], "2211": [27, 40, 41], "00484": [27, 40, 41], "rnnt": [27, 40, 41], "effici": [27, 40, 41], "fast_beam_search_lg": [27, 40, 41], "trivial": [27, 40, 41], "fast_beam_search_nbest": [27, 40, 41], "random_path": [27, 40, 41], "shortest": [27, 40, 41], "fast_beam_search_nbest_lg": [27, 40, 41], "logic": [27, 40, 41], "smallest": [27, 39, 40, 41], "icefall_asr_librispeech_tdnn": 28, "lstm_ctc": 28, "flac": 28, "116k": 28, "140k": 28, "343k": 28, "164k": 28, "105k": 28, "174k": 28, "pretraind": 28, "170": 28, "584": [28, 33], "209": 28, "245": 28, "098": 28, "099": 28, "methond": [28, 32, 33], "403": 28, "631": 28, "190": 28, "121": 28, "010": 28, "guidanc": 29, "bigger": 29, "simpli": 29, "discard": 29, "prevent": 29, "lconv": 29, "encourag": [29, 30, 39], "stabil": [29, 30], "doesn": 29, "warm": [29, 30], "xyozukpeqm62hbilud4upa": [29, 30], "ctc_guide_decode_b": 29, "pretrained_ctc": 29, "jit_pretrained_ctc": 29, "100h": 29, "yfyeung": 29, "wechat": 30, "zipformer_mmi": 30, "worker": [30, 39], "hp": 30, "tdnn_ligru_ctc": 32, "enough": [32, 33, 35], "luomingshuang": [32, 33], "icefall_asr_timit_tdnn_ligru_ctc": 32, "pretrained_average_9_25": 32, "fdhc0_si1559": [32, 33], "felc0_si756": [32, 33], "fmgd0_si1564": [32, 33], "ffprobe": [32, 33], "show_format": [32, 33], "nistspher": [32, 33], "database_id": [32, 33], "database_vers": [32, 33], "utterance_id": [32, 33], "dhc0_si1559": [32, 33], "sample_min": [32, 33], "4176": [32, 33], "sample_max": [32, 33], "5984": [32, 33], "bitrat": [32, 33], "258": [32, 33], "audio": [32, 33], "pcm_s16le": [32, 33], "s16": [32, 33], "elc0_si756": [32, 33], "1546": [32, 33], "1989": [32, 33], "mgd0_si1564": [32, 33], "7626": [32, 33], "10573": [32, 33], "660": 32, "695": 32, "697": 32, "210": [32, 33], "819": 32, 
"829": 32, "sil": [32, 33], "dh": [32, 33], "ih": [32, 33], "uw": [32, 33], "ah": [32, 33], "ii": [32, 33], "z": [32, 33], "aa": [32, 33], "ei": [32, 33], "dx": [32, 33], "d": [32, 33], "uh": [32, 33], "ng": [32, 33], "eh": [32, 33], "jh": [32, 33], "er": [32, 33], "ai": [32, 33], "hh": [32, 33], "aw": 32, "ae": [32, 33], "705": 32, "715": 32, "720": 32, "ch": 32, "icefall_asr_timit_tdnn_lstm_ctc": 33, "pretrained_average_16_25": 33, "816": 33, "827": 33, "387": 33, "unk": 33, "739": 33, "971": 33, "977": 33, "978": 33, "981": 33, "ow": 33, "ykubhb5wrmosxykid1z9eg": 35, "23t23": 35, "icefall_asr_yesno_tdnn": 35, "l_disambig": 35, "lexicon_disambig": 35, "arpa": 35, "0_0_0_1_0_0_0_1": 35, "0_0_1_0_0_0_1_0": 35, "0_0_1_0_0_1_1_1": 35, "0_0_1_0_1_0_0_1": 35, "0_0_1_1_0_0_0_1": 35, "0_0_1_1_0_1_1_0": 35, "0_0_1_1_1_0_0_0": 35, "0_0_1_1_1_1_0_0": 35, "0_1_0_0_0_1_0_0": 35, "0_1_0_0_1_0_1_0": 35, "0_1_0_1_0_0_0_0": 35, "0_1_0_1_1_1_0_0": 35, "0_1_1_0_0_1_1_1": 35, "0_1_1_1_0_0_1_0": 35, "0_1_1_1_1_0_1_0": 35, "1_0_0_0_0_0_0_0": 35, "1_0_0_0_0_0_1_1": 35, "1_0_0_1_0_1_1_1": 35, "1_0_1_1_0_1_1_1": 35, "1_0_1_1_1_1_0_1": 35, "1_1_0_0_0_1_1_1": 35, "1_1_0_0_1_0_1_1": 35, "1_1_0_1_0_1_0_0": 35, "1_1_0_1_1_0_0_1": 35, "1_1_0_1_1_1_1_0": 35, "1_1_1_0_0_1_0_1": 35, "1_1_1_0_1_0_1_0": 35, "1_1_1_1_0_0_1_0": 35, "1_1_1_1_1_0_0_0": 35, "1_1_1_1_1_1_1_1": 35, "54080": 35, "507": 35, "108k": 35, "ye": 35, "hebrew": 35, "NO": 35, "621": 35, "119": 35, "650": 35, "139": 35, "143": 35, "198": 35, "181": 35, "186": 35, "187": 35, "213": 35, "correctli": 35, "simplest": 35, "former": 37, "idea": 37, "achiev": 37, "mask": [37, 40, 41], "wenet": 37, "did": 37, "metion": 37, "complic": 37, "techniqu": 37, "bank": 37, "memor": 37, "histori": 37, "introduc": 37, "variant": 37, "pruned_stateless_emformer_rnnt2": 37, "conv_emformer_transducer_stateless": 37, "ourself": 37, "mechan": 37, "onlin": 39, "lstm_transducer_stateless": 39, "lower": 39, "prepare_giga_speech": 39, 
"cj2vtpiwqhkn9q1tx6ptpg": 39, "dynam": [40, 41], "causal": 40, "short": [40, 41], "2012": 40, "05481": 40, "flag": 40, "indic": [40, 41], "whether": 40, "sequenc": [40, 41], "uniformli": [40, 41], "seen": [40, 41], "97vkxf80ru61cnp2alwzzg": 40, "streaming_decod": [40, 41], "acoust": [40, 41], "wise": [40, 41], "parallel": [40, 41], "bath": [40, 41], "parallelli": [40, 41], "seem": 40, "benefit": 40, "mismatch": 40, "mdoel": 40, "320m": 41, "550": 41, "scriptmodul": 41, "jit_trace_export": 41, "jit_trace_pretrain": 41, "task": 42}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"follow": 0, "code": 0, "style": 0, "contribut": [1, 3], "document": 1, "how": [2, 10, 16, 17], "creat": [2, 9], "recip": [2, 42], "data": [2, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "prepar": [2, 9, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "train": [2, 6, 9, 12, 13, 14, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "decod": [2, 9, 10, 15, 19, 21, 22, 24, 25, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "pre": [2, 6, 12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "model": [2, 6, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "frequent": 4, "ask": 4, "question": 4, "faq": 4, "oserror": 4, "libtorch_hip": 4, "so": 4, "cannot": 4, "open": 4, "share": 4, "object": 4, "file": [4, 15], "directori": 4, "attributeerror": 4, "modul": 4, "distutil": 4, "ha": 4, "attribut": 4, "version": 4, "importerror": 4, "libpython3": 4, "10": 4, "1": [4, 9, 12, 13, 14, 19, 21, 22, 24], "0": [4, 9], "No": 4, "huggingfac": [5, 7], "space": 7, "youtub": [7, 9], "video": [7, 9], "icefal": [8, 9, 12, 13, 14], "content": [8, 42], "instal": [9, 12, 13, 14, 19, 21, 22, 24, 28, 32, 33], "cuda": 9, "toolkit": 9, "cudnn": 9, "pytorch": 9, "torchaudio": 9, "2": [9, 12, 13, 14, 19, 21, 22, 24], "k2": 9, "3": [9, 12, 13, 14, 19, 21, 24], "lhots": 9, "4": [9, 12, 13, 14], "download": [9, 
12, 13, 14, 15, 19, 21, 22, 24, 27, 28, 29, 30, 32, 33, 35, 39, 40, 41], "exampl": [9, 15, 19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "virtual": 9, "environ": 9, "activ": 9, "your": 9, "5": [9, 12, 13, 14], "test": [9, 12, 13, 14], "export": [10, 11, 12, 13, 14, 15, 16, 17, 18, 27, 29, 30, 39, 40, 41], "state_dict": [10, 27, 29, 30, 39, 40, 41], "when": [10, 16, 17], "us": [10, 16, 17, 27, 29, 30, 39, 40, 41], "run": 10, "py": 10, "ncnn": [11, 12, 13, 14], "convemform": 12, "transduc": [12, 13, 14, 21, 27, 39, 40, 41], "pnnx": [12, 13, 14], "via": [12, 13, 14], "torch": [12, 13, 14, 16, 17, 27, 29, 30, 39, 40, 41], "jit": [12, 13, 14, 16, 17, 27, 29, 30, 39, 40, 41], "trace": [12, 13, 14, 17, 39, 41], "torchscript": [12, 13, 14], "6": [12, 13, 14], "modifi": [12, 13, 14, 21], "encod": [12, 13, 14], "sherpa": [12, 13, 14, 15, 27, 40, 41], "7": [12, 13], "option": [12, 13, 19, 22, 24, 27, 29, 30, 39, 40, 41], "int8": [12, 13], "quantiz": [12, 13], "lstm": [13, 22, 28, 33, 39], "stream": [14, 23, 36, 37, 40, 41], "zipform": [14, 29, 30, 41], "onnx": 15, "sound": 15, "script": [16, 27, 29, 30, 40, 41], "conform": [19, 24, 37], "ctc": [19, 22, 24, 28, 29, 32, 33, 35], "configur": [19, 22, 24, 27, 29, 30, 39, 40, 41], "log": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "usag": [19, 21, 22, 24, 27, 29, 30, 39, 40, 41], "case": [19, 21, 22, 24], "kaldifeat": [19, 21, 22, 24, 28, 32, 33, 35], "hlg": [19, 22, 24], "attent": [19, 24], "rescor": [19, 24], "colab": [19, 21, 22, 24, 28, 32, 33, 35], "notebook": [19, 21, 22, 24, 28, 32, 33, 35], "deploy": [19, 24], "c": [19, 24], "aishel": 20, "stateless": 21, "The": 21, "loss": 21, "todo": 21, "greedi": 21, "search": 21, "beam": 21, "tdnn": [22, 28, 32, 33, 35], "non": 23, "asr": [23, 36], "lm": 24, "comput": 24, "wer": 24, "n": 24, "gram": 24, "distil": 25, "hubert": 25, "codebook": 25, "index": 25, "librispeech": [26, 38], "prune": [27, 40], "statelessx": [27, 40], "pretrain": [27, 29, 30, 39, 40, 41], "deploi": [27, 40, 41], 
"infer": [28, 32, 33, 35], "blank": 29, "skip": 29, "mmi": 30, "timit": 31, "ligru": 32, "yesno": 34, "introduct": 37, "emform": 37, "which": 39, "simul": [40, 41], "real": [40, 41], "tabl": 42}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.todo": 2, "sphinx": 57}, "alltitles": {"Follow the code style": [[0, "follow-the-code-style"]], "Contributing to Documentation": [[1, "contributing-to-documentation"]], "How to create a recipe": [[2, "how-to-create-a-recipe"]], "Data Preparation": [[2, "data-preparation"], [21, "data-preparation"]], "Training": [[2, "training"], [9, "training"], [19, "training"], [21, "training"], [22, "training"], [24, "training"], [25, "training"], [27, "training"], [28, "training"], [29, "training"], [30, "training"], [32, "training"], [33, "training"], [35, "training"], [39, "training"], [40, "training"], [41, "training"]], "Decoding": [[2, "decoding"], [9, "decoding"], [19, "decoding"], [21, "decoding"], [22, "decoding"], [24, "decoding"], [25, "decoding"], [27, "decoding"], [28, "decoding"], [29, "decoding"], [30, "decoding"], [32, "decoding"], [33, "decoding"], [35, "decoding"], [39, "decoding"], [40, "decoding"], [41, "decoding"]], "Pre-trained model": [[2, "pre-trained-model"]], "Contributing": [[3, "contributing"]], "Frequently Asked Questions (FAQs)": [[4, "frequently-asked-questions-faqs"]], "OSError: libtorch_hip.so: cannot open shared object file: no such file or directory": [[4, "oserror-libtorch-hip-so-cannot-open-shared-object-file-no-such-file-or-directory"]], "AttributeError: module \u2018distutils\u2019 has no attribute \u2018version\u2019": [[4, "attributeerror-module-distutils-has-no-attribute-version"]], "ImportError: libpython3.10.so.1.0: cannot open shared object file: No 
such file or directory": [[4, "importerror-libpython3-10-so-1-0-cannot-open-shared-object-file-no-such-file-or-directory"]], "Huggingface": [[5, "huggingface"]], "Pre-trained models": [[6, "pre-trained-models"]], "Huggingface spaces": [[7, "huggingface-spaces"]], "YouTube Video": [[7, "youtube-video"], [9, "youtube-video"]], "Icefall": [[8, "icefall"]], "Contents:": [[8, null]], "Installation": [[9, "installation"]], "(0) Install CUDA toolkit and cuDNN": [[9, "install-cuda-toolkit-and-cudnn"]], "(1) Install PyTorch and torchaudio": [[9, "install-pytorch-and-torchaudio"]], "(2) Install k2": [[9, "install-k2"]], "(3) Install lhotse": [[9, "install-lhotse"]], "(4) Download icefall": [[9, "download-icefall"]], "Installation example": [[9, "installation-example"]], "(1) Create a virtual environment": [[9, "create-a-virtual-environment"]], "(2) Activate your virtual environment": [[9, "activate-your-virtual-environment"]], "(3) Install k2": [[9, "id1"]], "(4) Install lhotse": [[9, "id2"]], "(5) Download icefall": [[9, "id3"]], "Test Your Installation": [[9, "test-your-installation"]], "Data preparation": [[9, "data-preparation"], [19, "data-preparation"], [22, "data-preparation"], [24, "data-preparation"], [25, "data-preparation"], [27, "data-preparation"], [28, "data-preparation"], [29, "data-preparation"], [30, "data-preparation"], [32, "data-preparation"], [33, "data-preparation"], [35, "data-preparation"], [39, "data-preparation"], [40, "data-preparation"], [41, "data-preparation"]], "Export model.state_dict()": [[10, "export-model-state-dict"], [27, "export-model-state-dict"], [29, "export-model-state-dict"], [30, "export-model-state-dict"], [39, "export-model-state-dict"], [40, "export-model-state-dict"], [41, "export-model-state-dict"]], "When to use it": [[10, "when-to-use-it"], [16, "when-to-use-it"], [17, "when-to-use-it"]], "How to export": [[10, "how-to-export"], [16, "how-to-export"], [17, "how-to-export"]], "How to use the exported model": [[10, 
"how-to-use-the-exported-model"], [16, "how-to-use-the-exported-model"]], "Use the exported model to run decode.py": [[10, "use-the-exported-model-to-run-decode-py"]], "Export to ncnn": [[11, "export-to-ncnn"]], "Export ConvEmformer transducer models to ncnn": [[12, "export-convemformer-transducer-models-to-ncnn"]], "1. Download the pre-trained model": [[12, "download-the-pre-trained-model"], [13, "download-the-pre-trained-model"], [14, "download-the-pre-trained-model"]], "2. Install ncnn and pnnx": [[12, "install-ncnn-and-pnnx"], [13, "install-ncnn-and-pnnx"], [14, "install-ncnn-and-pnnx"]], "3. Export the model via torch.jit.trace()": [[12, "export-the-model-via-torch-jit-trace"], [13, "export-the-model-via-torch-jit-trace"], [14, "export-the-model-via-torch-jit-trace"]], "4. Export torchscript model via pnnx": [[12, "export-torchscript-model-via-pnnx"], [13, "export-torchscript-model-via-pnnx"], [14, "export-torchscript-model-via-pnnx"]], "5. Test the exported models in icefall": [[12, "test-the-exported-models-in-icefall"], [13, "test-the-exported-models-in-icefall"], [14, "test-the-exported-models-in-icefall"]], "6. Modify the exported encoder for sherpa-ncnn": [[12, "modify-the-exported-encoder-for-sherpa-ncnn"], [13, "modify-the-exported-encoder-for-sherpa-ncnn"], [14, "modify-the-exported-encoder-for-sherpa-ncnn"]], "7. 
(Optional) int8 quantization with sherpa-ncnn": [[12, "optional-int8-quantization-with-sherpa-ncnn"], [13, "optional-int8-quantization-with-sherpa-ncnn"]], "Export LSTM transducer models to ncnn": [[13, "export-lstm-transducer-models-to-ncnn"]], "Export streaming Zipformer transducer models to ncnn": [[14, "export-streaming-zipformer-transducer-models-to-ncnn"]], "Export to ONNX": [[15, "export-to-onnx"]], "sherpa-onnx": [[15, "sherpa-onnx"]], "Example": [[15, "example"]], "Download the pre-trained model": [[15, "download-the-pre-trained-model"], [19, "download-the-pre-trained-model"], [21, "download-the-pre-trained-model"], [22, "download-the-pre-trained-model"], [24, "download-the-pre-trained-model"], [28, "download-the-pre-trained-model"], [32, "download-the-pre-trained-model"], [33, "download-the-pre-trained-model"], [35, "download-the-pre-trained-model"]], "Export the model to ONNX": [[15, "export-the-model-to-onnx"]], "Decode sound files with exported ONNX models": [[15, "decode-sound-files-with-exported-onnx-models"]], "Export model with torch.jit.script()": [[16, "export-model-with-torch-jit-script"]], "Export model with torch.jit.trace()": [[17, "export-model-with-torch-jit-trace"]], "How to use the exported models": [[17, "how-to-use-the-exported-models"]], "Model export": [[18, "model-export"]], "Conformer CTC": [[19, "conformer-ctc"], [24, "conformer-ctc"]], "Configurable options": [[19, "configurable-options"], [22, "configurable-options"], [24, "configurable-options"], [27, "configurable-options"], [29, "configurable-options"], [30, "configurable-options"], [39, "configurable-options"], [40, "configurable-options"], [41, "configurable-options"]], "Pre-configured options": [[19, "pre-configured-options"], [22, "pre-configured-options"], [24, "pre-configured-options"], [27, "pre-configured-options"], [29, "pre-configured-options"], [30, "pre-configured-options"], [39, "pre-configured-options"], [40, "pre-configured-options"], [41, 
"pre-configured-options"]], "Training logs": [[19, "training-logs"], [21, "training-logs"], [22, "training-logs"], [24, "training-logs"], [27, "training-logs"], [29, "training-logs"], [30, "training-logs"], [39, "training-logs"], [40, "training-logs"], [41, "training-logs"]], "Usage examples": [[19, "usage-examples"], [21, "usage-examples"], [22, "usage-examples"], [24, "usage-examples"]], "Case 1": [[19, "case-1"], [21, "case-1"], [22, "case-1"], [24, "case-1"]], "Case 2": [[19, "case-2"], [21, "case-2"], [22, "case-2"], [24, "case-2"]], "Case 3": [[19, "case-3"], [21, "case-3"], [24, "case-3"]], "Pre-trained Model": [[19, "pre-trained-model"], [21, "pre-trained-model"], [22, "pre-trained-model"], [24, "pre-trained-model"], [28, "pre-trained-model"], [32, "pre-trained-model"], [33, "pre-trained-model"], [35, "pre-trained-model"]], "Install kaldifeat": [[19, "install-kaldifeat"], [21, "install-kaldifeat"], [22, "install-kaldifeat"], [24, "install-kaldifeat"], [28, "install-kaldifeat"], [32, "install-kaldifeat"], [33, "install-kaldifeat"]], "Usage": [[19, "usage"], [21, "usage"], [22, "usage"], [24, "usage"]], "CTC decoding": [[19, "ctc-decoding"], [24, "ctc-decoding"], [24, "id2"]], "HLG decoding": [[19, "hlg-decoding"], [19, "id2"], [22, "hlg-decoding"], [24, "hlg-decoding"], [24, "id3"]], "HLG decoding + attention decoder rescoring": [[19, "hlg-decoding-attention-decoder-rescoring"]], "Colab notebook": [[19, "colab-notebook"], [21, "colab-notebook"], [22, "colab-notebook"], [24, "colab-notebook"], [28, "colab-notebook"], [32, "colab-notebook"], [33, "colab-notebook"], [35, "colab-notebook"]], "Deployment with C++": [[19, "deployment-with-c"], [24, "deployment-with-c"]], "aishell": [[20, "aishell"]], "Stateless Transducer": [[21, "stateless-transducer"]], "The Model": [[21, "the-model"]], "The Loss": [[21, "the-loss"]], "Todo": [[21, "id1"]], "Greedy search": [[21, "greedy-search"]], "Beam search": [[21, "beam-search"]], "Modified Beam search": [[21, 
"modified-beam-search"]], "TDNN-LSTM CTC": [[22, "tdnn-lstm-ctc"]], "Non Streaming ASR": [[23, "non-streaming-asr"]], "HLG decoding + LM rescoring": [[24, "hlg-decoding-lm-rescoring"]], "HLG decoding + LM rescoring + attention decoder rescoring": [[24, "hlg-decoding-lm-rescoring-attention-decoder-rescoring"]], "Compute WER with the pre-trained model": [[24, "compute-wer-with-the-pre-trained-model"]], "HLG decoding + n-gram LM rescoring": [[24, "hlg-decoding-n-gram-lm-rescoring"]], "HLG decoding + n-gram LM rescoring + attention decoder rescoring": [[24, "hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring"]], "Distillation with HuBERT": [[25, "distillation-with-hubert"]], "Codebook index preparation": [[25, "codebook-index-preparation"]], "LibriSpeech": [[26, "librispeech"], [38, "librispeech"]], "Pruned transducer statelessX": [[27, "pruned-transducer-statelessx"], [40, "pruned-transducer-statelessx"]], "Usage example": [[27, "usage-example"], [29, "usage-example"], [30, "usage-example"], [39, "usage-example"], [40, "usage-example"], [41, "usage-example"]], "Export Model": [[27, "export-model"], [40, "export-model"], [41, "export-model"]], "Export model using torch.jit.script()": [[27, "export-model-using-torch-jit-script"], [29, "export-model-using-torch-jit-script"], [30, "export-model-using-torch-jit-script"], [40, "export-model-using-torch-jit-script"], [41, "export-model-using-torch-jit-script"]], "Download pretrained models": [[27, "download-pretrained-models"], [29, "download-pretrained-models"], [30, "download-pretrained-models"], [39, "download-pretrained-models"], [40, "download-pretrained-models"], [41, "download-pretrained-models"]], "Deploy with Sherpa": [[27, "deploy-with-sherpa"], [40, "deploy-with-sherpa"], [41, "deploy-with-sherpa"]], "TDNN-LSTM-CTC": [[28, "tdnn-lstm-ctc"], [33, "tdnn-lstm-ctc"]], "Inference with a pre-trained model": [[28, "inference-with-a-pre-trained-model"], [32, "inference-with-a-pre-trained-model"], [33, 
"inference-with-a-pre-trained-model"], [35, "inference-with-a-pre-trained-model"]], "Zipformer CTC Blank Skip": [[29, "zipformer-ctc-blank-skip"]], "Export models": [[29, "export-models"], [30, "export-models"], [39, "export-models"]], "Zipformer MMI": [[30, "zipformer-mmi"]], "TIMIT": [[31, "timit"]], "TDNN-LiGRU-CTC": [[32, "tdnn-ligru-ctc"]], "YesNo": [[34, "yesno"]], "TDNN-CTC": [[35, "tdnn-ctc"]], "Download kaldifeat": [[35, "download-kaldifeat"]], "Streaming ASR": [[36, "streaming-asr"]], "Introduction": [[37, "introduction"]], "Streaming Conformer": [[37, "streaming-conformer"]], "Streaming Emformer": [[37, "streaming-emformer"]], "LSTM Transducer": [[39, "lstm-transducer"]], "Which model to use": [[39, "which-model-to-use"]], "Export model using torch.jit.trace()": [[39, "export-model-using-torch-jit-trace"], [41, "export-model-using-torch-jit-trace"]], "Simulate streaming decoding": [[40, "simulate-streaming-decoding"], [41, "simulate-streaming-decoding"]], "Real streaming decoding": [[40, "real-streaming-decoding"], [41, "real-streaming-decoding"]], "Zipformer Transducer": [[41, "zipformer-transducer"]], "Recipes": [[42, "recipes"]], "Table of Contents": [[42, null]]}, "indexentries": {}}) \ No newline at end of file +Search.setIndex({"docnames": ["contributing/code-style", "contributing/doc", "contributing/how-to-create-a-recipe", "contributing/index", "decoding-with-langugage-models/LODR", "decoding-with-langugage-models/index", "decoding-with-langugage-models/rescoring", "decoding-with-langugage-models/shallow-fusion", "faqs", "huggingface/index", "huggingface/pretrained-models", "huggingface/spaces", "index", "installation/index", "model-export/export-model-state-dict", "model-export/export-ncnn", "model-export/export-ncnn-conv-emformer", "model-export/export-ncnn-lstm", "model-export/export-ncnn-zipformer", "model-export/export-onnx", "model-export/export-with-torch-jit-script", "model-export/export-with-torch-jit-trace", "model-export/index", 
"recipes/Non-streaming-ASR/aishell/conformer_ctc", "recipes/Non-streaming-ASR/aishell/index", "recipes/Non-streaming-ASR/aishell/stateless_transducer", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/index", "recipes/Non-streaming-ASR/librispeech/conformer_ctc", "recipes/Non-streaming-ASR/librispeech/distillation", "recipes/Non-streaming-ASR/librispeech/index", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi", "recipes/Non-streaming-ASR/timit/index", "recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc", "recipes/Non-streaming-ASR/yesno/index", "recipes/Non-streaming-ASR/yesno/tdnn", "recipes/Streaming-ASR/index", "recipes/Streaming-ASR/introduction", "recipes/Streaming-ASR/librispeech/index", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless", "recipes/Streaming-ASR/librispeech/zipformer_transducer", "recipes/index"], "filenames": ["contributing/code-style.rst", "contributing/doc.rst", "contributing/how-to-create-a-recipe.rst", "contributing/index.rst", "decoding-with-langugage-models/LODR.rst", "decoding-with-langugage-models/index.rst", "decoding-with-langugage-models/rescoring.rst", "decoding-with-langugage-models/shallow-fusion.rst", "faqs.rst", "huggingface/index.rst", "huggingface/pretrained-models.rst", "huggingface/spaces.rst", "index.rst", "installation/index.rst", "model-export/export-model-state-dict.rst", "model-export/export-ncnn.rst", "model-export/export-ncnn-conv-emformer.rst", "model-export/export-ncnn-lstm.rst", "model-export/export-ncnn-zipformer.rst", "model-export/export-onnx.rst", "model-export/export-with-torch-jit-script.rst", "model-export/export-with-torch-jit-trace.rst", 
"model-export/index.rst", "recipes/Non-streaming-ASR/aishell/conformer_ctc.rst", "recipes/Non-streaming-ASR/aishell/index.rst", "recipes/Non-streaming-ASR/aishell/stateless_transducer.rst", "recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/index.rst", "recipes/Non-streaming-ASR/librispeech/conformer_ctc.rst", "recipes/Non-streaming-ASR/librispeech/distillation.rst", "recipes/Non-streaming-ASR/librispeech/index.rst", "recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst", "recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.rst", "recipes/Non-streaming-ASR/librispeech/zipformer_mmi.rst", "recipes/Non-streaming-ASR/timit/index.rst", "recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.rst", "recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.rst", "recipes/Non-streaming-ASR/yesno/index.rst", "recipes/Non-streaming-ASR/yesno/tdnn.rst", "recipes/Streaming-ASR/index.rst", "recipes/Streaming-ASR/introduction.rst", "recipes/Streaming-ASR/librispeech/index.rst", "recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.rst", "recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst", "recipes/Streaming-ASR/librispeech/zipformer_transducer.rst", "recipes/index.rst"], "titles": ["Follow the code style", "Contributing to Documentation", "How to create a recipe", "Contributing", "LODR for RNN Transducer", "Decoding with language models", "LM rescoring for Transducer", "Shallow fusion for Transducer", "Frequently Asked Questions (FAQs)", "Huggingface", "Pre-trained models", "Huggingface spaces", "Icefall", "Installation", "Export model.state_dict()", "Export to ncnn", "Export ConvEmformer transducer models to ncnn", "Export LSTM transducer models to ncnn", "Export streaming Zipformer transducer models to ncnn", "Export to ONNX", "Export model with torch.jit.script()", "Export model with torch.jit.trace()", "Model export", "Conformer CTC", "aishell", 
"Stateless Transducer", "TDNN-LSTM CTC", "Non Streaming ASR", "Conformer CTC", "Distillation with HuBERT", "LibriSpeech", "Pruned transducer statelessX", "TDNN-LSTM-CTC", "Zipformer CTC Blank Skip", "Zipformer MMI", "TIMIT", "TDNN-LiGRU-CTC", "TDNN-LSTM-CTC", "YesNo", "TDNN-CTC", "Streaming ASR", "Introduction", "LibriSpeech", "LSTM Transducer", "Pruned transducer statelessX", "Zipformer Transducer", "Recipes"], "terms": {"we": [0, 1, 2, 3, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "us": [0, 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 19, 22, 23, 24, 25, 26, 28, 29, 32, 36, 37, 39, 41], "tool": [0, 8, 16], "make": [0, 1, 3, 16, 17, 18, 23, 25, 28, 41], "consist": [0, 25, 31, 43, 44, 45], "possibl": [0, 2, 3, 13, 23, 28], "black": 0, "format": [0, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "flake8": 0, "check": [0, 28], "qualiti": [0, 24], "isort": 0, "sort": [0, 13], "import": [0, 8, 16, 44, 45], "The": [0, 1, 2, 4, 7, 8, 11, 13, 14, 16, 17, 18, 23, 24, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "version": [0, 12, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 36, 37, 44], "abov": [0, 4, 6, 7, 8, 13, 14, 16, 17, 18, 19, 23, 24, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44, 45], "ar": [0, 1, 3, 4, 6, 7, 8, 13, 14, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45, 46], "22": [0, 13, 16, 17, 28, 36, 37, 39], "3": [0, 4, 6, 7, 8, 12, 14, 15, 19, 22, 26, 29, 31, 32, 33, 34, 39, 43, 44, 45], "0": [0, 1, 4, 6, 7, 12, 14, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "5": [0, 7, 15, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "4": [0, 4, 6, 7, 8, 12, 14, 15, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "10": [0, 7, 12, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "1": [0, 4, 6, 7, 12, 14, 15, 19, 20, 21, 22, 29, 
31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "after": [0, 1, 6, 11, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "run": [0, 2, 8, 11, 13, 16, 17, 18, 19, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "command": [0, 1, 4, 6, 7, 8, 13, 14, 16, 17, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "git": [0, 4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "clone": [0, 4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "http": [0, 1, 2, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "github": [0, 2, 6, 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "com": [0, 2, 6, 10, 11, 13, 14, 16, 17, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "k2": [0, 2, 8, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 43, 44, 45], "fsa": [0, 2, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 28, 31, 33, 34, 43, 44, 45], "icefal": [0, 2, 3, 4, 6, 7, 8, 10, 11, 14, 15, 19, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "cd": [0, 1, 2, 8, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "pip": [0, 1, 6, 8, 13, 16, 19, 25], "instal": [0, 1, 4, 6, 8, 9, 11, 12, 14, 15, 19, 22, 29, 31, 33, 34, 39, 43, 44, 45], "pre": [0, 3, 4, 6, 7, 9, 11, 12, 13, 15, 22, 29], "commit": 0, "whenev": 0, "you": [0, 1, 2, 4, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "automat": [0, 11, 29], "hook": 0, "invok": 0, "fail": [0, 13], "If": [0, 2, 4, 6, 7, 8, 11, 16, 17, 18, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "ani": [0, 4, 6, 7, 13, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44], "your": [0, 1, 2, 4, 6, 7, 9, 11, 12, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 
32, 33, 34, 36, 37, 39, 43, 44, 45], "wa": [0, 13, 14, 28, 32], "success": [0, 13, 16, 17], "pleas": [0, 1, 2, 4, 6, 7, 8, 11, 13, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "fix": [0, 8, 13, 16, 17, 18, 28], "issu": [0, 4, 6, 7, 8, 13, 16, 17, 28, 29, 44, 45], "report": [0, 8, 13, 29], "some": [0, 1, 4, 6, 14, 16, 17, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "i": [0, 1, 2, 4, 7, 8, 11, 13, 14, 15, 16, 17, 18, 19, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "e": [0, 2, 4, 6, 7, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "modifi": [0, 15, 22, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "file": [0, 2, 11, 12, 14, 16, 17, 18, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "place": [0, 13, 14, 25, 28, 32], "so": [0, 4, 6, 7, 11, 12, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "statu": 0, "failur": 0, "see": [0, 1, 6, 7, 11, 13, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "which": [0, 2, 4, 6, 7, 11, 14, 16, 17, 18, 19, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 44, 45], "ha": [0, 2, 12, 15, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 41, 43, 44, 45], "been": [0, 15, 16, 17, 18, 25], "befor": [0, 1, 14, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "further": [0, 4, 6, 7], "chang": [0, 4, 6, 7, 8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "all": [0, 10, 11, 14, 16, 17, 18, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "again": [0, 16, 17, 39], "should": [0, 2, 4, 6, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "succe": 0, "thi": [0, 2, 3, 4, 5, 6, 7, 8, 9, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "time": [0, 13, 16, 17, 18, 23, 25, 26, 28, 
29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "succeed": 0, "want": [0, 4, 6, 7, 13, 14, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "can": [0, 1, 2, 4, 6, 7, 8, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "do": [0, 2, 4, 6, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "Or": 0, "without": [0, 4, 6, 7, 9, 11, 23, 28], "your_changed_fil": 0, "py": [0, 2, 4, 6, 7, 8, 13, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "sphinx": 1, "write": [1, 2, 3], "have": [1, 2, 4, 6, 7, 10, 11, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "prepar": [1, 3, 4, 14], "environ": [1, 8, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 36, 37, 39, 44, 45], "doc": [1, 14, 41], "r": [1, 13, 16, 17, 18, 36, 37], "requir": [1, 4, 6, 13, 18, 29, 44, 45], "txt": [1, 4, 13, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "set": [1, 4, 6, 7, 8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "up": [1, 13, 14, 16, 17, 18, 23, 26, 28, 29, 31, 32, 33, 34, 44, 45], "readi": [1, 23, 28, 29], "refer": [1, 2, 6, 7, 13, 14, 15, 16, 17, 18, 20, 21, 23, 25, 26, 28, 31, 32, 33, 36, 37, 39, 41, 44, 45], "restructuredtext": 1, "primer": 1, "familiar": 1, "build": [1, 13, 14, 16, 17, 18, 23, 25, 28], "local": [1, 13, 31, 33, 34, 43, 44, 45], "preview": 1, "what": [1, 2, 13, 16, 17, 18, 25, 41], "look": [1, 2, 4, 6, 7, 10, 13, 16, 17, 18, 23, 25, 26, 28, 29], "like": [1, 2, 11, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44], "publish": [1, 14, 24], "html": [1, 2, 8, 13, 15, 16, 17, 18, 19, 20, 21, 31, 43, 44, 45], "gener": [1, 6, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "view": [1, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44, 45], "follow": [1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 
37, 39, 43, 44, 45], "python3": [1, 8, 13, 17, 18], "m": [1, 13, 16, 17, 18, 25, 31, 33, 34, 36, 37, 43, 44, 45], "server": [1, 11, 13, 43], "It": [1, 2, 6, 7, 9, 13, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "print": [1, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "serv": [1, 31, 33, 34, 43, 44, 45], "port": [1, 29, 31, 33, 34, 43, 44, 45], "8000": [1, 39], "open": [1, 4, 6, 7, 12, 14, 16, 17, 18, 24, 25, 28, 29], "browser": [1, 9, 11, 31, 33, 34, 43, 44, 45], "go": [1, 7, 13, 23, 25, 28, 31, 33, 34, 43, 44, 45], "read": [2, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "code": [2, 3, 8, 12, 16, 17, 18, 23, 28, 29, 31, 32, 36, 37, 39, 41, 44, 45], "style": [2, 3, 12], "adjust": 2, "sytl": 2, "design": 2, "python": [2, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 28, 31, 33, 34, 43, 44, 45], "recommend": [2, 6, 7, 13, 23, 25, 26, 28, 29, 31, 44, 45], "test": [2, 4, 12, 14, 15, 22, 23, 25, 26, 28, 29, 32, 33, 36, 37], "valid": [2, 13, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "dataset": [2, 8, 13, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "lhots": [2, 12, 14, 16, 17, 18, 23, 25, 28], "readthedoc": [2, 13], "io": [2, 13, 15, 16, 17, 18, 19, 20, 21, 31, 43, 44, 45], "en": [2, 13, 16], "latest": [2, 11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "index": [2, 13, 15, 16, 17, 18, 19, 20, 21, 43, 44, 45], "yesno": [2, 8, 12, 13, 27, 39, 46], "veri": [2, 3, 7, 16, 17, 18, 25, 36, 37, 39, 44, 45], "good": [2, 7], "exampl": [2, 11, 12, 14, 16, 17, 18, 20, 21, 22, 29, 32, 36, 37, 39], "speech": [2, 11, 12, 13, 15, 24, 25, 39, 46], "pull": [2, 4, 6, 7, 16, 17, 18, 19, 23, 25, 28, 41], "380": [2, 16, 37], "show": [2, 4, 6, 7, 11, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "add": [2, 16, 17, 18, 23, 25, 26, 44, 46], "new": [2, 3, 11, 13, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 39, 43, 
44, 45], "suppos": [2, 44, 45], "would": [2, 13, 14, 16, 17, 18, 28, 32, 44, 45], "name": [2, 8, 14, 16, 17, 18, 19, 23, 25, 31, 33, 34, 44, 45], "foo": [2, 21, 23, 28, 31, 33, 34, 43, 44, 45], "eg": [2, 8, 10, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "mkdir": [2, 16, 17, 23, 25, 26, 28, 32, 36, 37, 39], "p": [2, 4, 13, 16, 17, 25, 36, 37], "asr": [2, 4, 6, 7, 8, 10, 12, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "touch": 2, "sh": [2, 13, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "chmod": 2, "x": [2, 4, 18, 41], "simpl": [2, 25], "own": [2, 29, 31, 44, 45], "otherwis": [2, 16, 17, 18, 23, 25, 28, 29, 31, 33, 34, 43, 44, 45], "librispeech": [2, 4, 6, 7, 8, 10, 12, 14, 16, 17, 18, 19, 20, 21, 27, 28, 29, 31, 32, 33, 34, 40, 41, 43, 44, 45, 46], "assum": [2, 4, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 43, 44, 45], "fanci": 2, "call": [2, 8, 19, 29], "bar": [2, 21, 23, 28, 31, 33, 34, 43, 44, 45], "organ": 2, "wai": [2, 3, 22, 31, 33, 34, 41, 43, 44, 45], "readm": [2, 23, 25, 26, 28, 32, 36, 37, 39], "md": [2, 10, 14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "asr_datamodul": [2, 8, 13], "pretrain": [2, 4, 6, 7, 14, 16, 17, 18, 19, 21, 23, 25, 26, 28, 32, 36, 37, 39], "For": [2, 4, 6, 7, 8, 10, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "instanc": [2, 8, 10, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "tdnn": [2, 8, 13, 24, 27, 30, 35, 38], "its": [2, 4, 14, 15, 16, 17, 18, 21, 25, 33], "directori": [2, 12, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "structur": [2, 18], "descript": [2, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "contain": [2, 12, 14, 15, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45, 46], "inform": [2, 4, 6, 14, 23, 25, 26, 28, 31, 32, 33, 36, 37, 39, 41, 
43, 44, 45], "g": [2, 4, 6, 7, 13, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "wer": [2, 5, 13, 14, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "etc": [2, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "provid": [2, 11, 13, 14, 15, 16, 17, 18, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45, 46], "pytorch": [2, 8, 12, 16, 17, 18, 25], "dataload": [2, 13], "take": [2, 7, 14, 29, 31, 39, 44, 45], "input": [2, 14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39, 41], "checkpoint": [2, 4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "save": [2, 13, 14, 17, 18, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "dure": [2, 4, 5, 7, 8, 11, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "stage": [2, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "": [2, 4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "definit": [2, 16, 17], "neural": [2, 4, 6, 7, 23, 28], "network": [2, 23, 25, 28, 31, 33, 34, 43, 44, 45], "script": [2, 6, 7, 12, 13, 21, 22, 23, 25, 26, 28, 29, 32, 36, 37, 39, 43], "infer": [2, 14, 16, 17], "tdnn_lstm_ctc": [2, 26, 32, 37], "conformer_ctc": [2, 23, 28], "get": [2, 11, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 39, 41, 43, 44, 45], "feel": [2, 29, 43], "result": [2, 4, 7, 10, 11, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "everi": [2, 14, 31, 33, 34, 43, 44, 45], "kept": [2, 31, 44, 45], "self": [2, 15, 18, 41], "toler": 2, "duplic": 2, "among": [2, 13], "differ": [2, 13, 16, 17, 18, 19, 23, 24, 28, 29, 31, 41, 43, 44, 45], "invoc": [2, 16, 17], "help": [2, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "blob": [2, 10, 14, 21, 31, 33, 34, 43, 44, 45], "master": [2, 6, 10, 13, 14, 17, 18, 20, 21, 25, 29, 31, 33, 34, 43, 44, 45], "transform": [2, 6, 7, 23, 28, 43], "conform": [2, 20, 24, 25, 27, 30, 31, 33, 43, 44, 45], 
"base": [2, 4, 7, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "lstm": [2, 15, 21, 22, 24, 27, 30, 35, 40, 42], "attent": [2, 18, 25, 26, 29, 41, 44, 45], "lm": [2, 4, 5, 7, 12, 13, 25, 31, 32, 36, 37, 39, 44, 45], "rescor": [2, 5, 12, 26, 32, 34, 36, 37, 39], "demonstr": [2, 9, 11, 14, 19], "consid": [2, 4, 18], "colab": 2, "notebook": 2, "welcom": 3, "There": [3, 4, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "mani": [3, 44, 45], "two": [3, 4, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "them": [3, 6, 9, 10, 11, 13, 16, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "To": [3, 4, 6, 7, 11, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "document": [3, 12, 14, 15, 16, 17, 18, 19, 34], "repositori": [3, 16, 17, 18, 19], "recip": [3, 4, 6, 7, 10, 12, 13, 14, 19, 23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 41, 43, 44, 45], "In": [3, 4, 6, 8, 11, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 29, 32, 36, 37, 39, 41], "page": [3, 11, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "describ": [3, 5, 9, 14, 16, 17, 19, 20, 21, 22, 23, 25, 26, 28, 31, 32, 36, 37, 44, 45], "how": [3, 4, 5, 6, 7, 9, 11, 12, 13, 16, 17, 18, 19, 22, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "creat": [3, 4, 6, 7, 12, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44], "data": [3, 4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 24], "train": [3, 4, 6, 7, 8, 9, 11, 12, 14, 15, 20, 21, 22, 41], "decod": [3, 4, 8, 11, 12, 16, 17, 18, 21, 22], "model": [3, 4, 6, 7, 9, 11, 12, 13, 15, 29, 41], "As": [4, 6, 7, 16, 25, 28, 29], "type": [4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 28, 31, 33, 34, 39, 41, 43, 44, 45], "e2": [4, 7], "usual": [4, 6, 7, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "an": [4, 6, 7, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24, 25, 28, 29, 31, 34, 39, 43, 44, 45], "intern": 4, "languag": [4, 7, 11, 12, 23, 25, 26], "learn": [4, 23, 25, 26, 28, 
31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "level": 4, "corpu": [4, 6, 7, 24], "real": 4, "life": 4, "scenario": 4, "often": [4, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "mismatch": [4, 44], "between": [4, 7, 31, 44, 45], "target": [4, 11], "space": [4, 9, 12], "problem": [4, 6, 7, 13, 29], "when": [4, 6, 8, 11, 16, 17, 18, 22, 25, 28, 29, 31, 33, 34, 44, 45], "act": 4, "against": 4, "extern": [4, 5, 6, 7], "tutori": [4, 6, 7, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 43, 44, 45], "low": [4, 16, 17], "order": [4, 13, 16, 17, 18, 23, 26, 28, 32, 36, 37], "densiti": 4, "ratio": 4, "allevi": 4, "effect": [4, 7, 18], "improv": [4, 5, 6, 7, 25], "perform": [4, 6, 7, 15, 25, 29, 44], "languga": 4, "integr": [4, 11], "pruned_transducer_stateless7_stream": [4, 6, 7, 18, 19, 45], "stream": [4, 6, 7, 12, 15, 16, 17, 19, 22, 23, 28, 36, 37, 43, 46], "howev": [4, 6, 7, 14, 17, 29], "easili": [4, 6, 7, 23, 26, 28], "appli": [4, 6, 7, 25, 41], "other": [4, 7, 14, 17, 18, 19, 25, 28, 29, 31, 32, 36, 37, 39, 41, 44, 45, 46], "encount": [4, 6, 7, 8, 13, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "here": [4, 6, 7, 14, 16, 17, 18, 23, 25, 26, 28, 29, 32, 41, 44], "simplic": [4, 6, 7], "same": [4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "domain": [4, 6, 7], "gigaspeech": [4, 6, 7, 10, 20, 43], "first": [4, 6, 8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "let": [4, 6, 7, 13, 16, 17, 18, 23, 28], "background": 4, "predecessor": 4, "dr": 4, "propos": [4, 25, 41, 45], "address": [4, 11, 13, 14, 16, 17, 18, 25, 31, 34, 43, 44, 45], "sourc": [4, 13, 14, 16, 17, 18, 23, 24, 25, 28], "acoust": [4, 44, 45], "similar": [4, 29, 33, 44, 45], "deriv": 4, "formular": 4, "bay": 4, "theorem": 4, "text": [4, 6, 7, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "score": [4, 7, 23, 28, 31, 44, 45], "left": [4, 16, 18, 25, 44, 45], "y_u": 4, "mathit": 4, "y": 4, "right": [4, 16, 25, 41, 44], 
"log": [4, 8, 13, 16, 17, 18, 32, 36, 37, 39], "y_": 4, "u": [4, 13, 16, 17, 18, 23, 25, 26, 28, 29, 39], "lambda_1": 4, "p_": 4, "lambda_2": 4, "where": [4, 8, 44], "weight": [4, 23, 26, 28, 33, 34, 43], "respect": 4, "onli": [4, 6, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45, 46], "compar": [4, 16, 17, 18, 44], "shallow": [4, 5, 12], "fusion": [4, 5, 12], "subtract": 4, "work": [4, 16, 17, 18, 28], "treat": [4, 17, 18], "predictor": 4, "joiner": [4, 16, 17, 18, 19, 21, 25, 31, 43, 44, 45], "weak": 4, "captur": 4, "therefor": [4, 8], "n": [4, 6, 23, 29, 31, 33, 34, 36, 37, 43, 44, 45], "gram": [4, 6, 13, 23, 25, 26, 31, 32, 34, 36, 37, 44, 45], "approxim": 4, "ilm": 4, "lead": [4, 7], "formula": 4, "rnnt": [4, 31, 44, 45], "bi": [4, 6], "addit": 4, "estim": 4, "comar": 4, "li": 4, "choic": 4, "accord": 4, "origin": 4, "paper": [4, 29, 31, 43, 44, 45], "achiev": [4, 6, 7, 41], "both": [4, 31, 33, 34, 41, 43, 44, 45], "intra": 4, "cross": 4, "much": [4, 16, 17], "faster": [4, 6], "evalu": 4, "now": [4, 6, 13, 16, 17, 18, 23, 28, 29, 31, 32, 33, 34, 36, 37, 43, 44, 45], "illustr": [4, 6, 7], "purpos": [4, 6, 7, 16, 17], "from": [4, 6, 7, 8, 9, 11, 13, 14, 16, 17, 18, 19, 23, 24, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "link": [4, 6, 7, 10, 13, 14, 15, 31, 33, 34, 43, 44, 45], "scratch": [4, 6, 7, 31, 33, 34, 43, 44, 45], "prune": [4, 6, 7, 14, 18, 19, 25, 27, 29, 30, 40, 41, 42, 43, 45], "statelessx": [4, 6, 7, 27, 29, 30, 40, 41, 42], "initi": [4, 6, 7, 23, 26], "step": [4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "download": [4, 6, 7, 8, 11, 12, 15, 22, 24, 29], "git_lfs_skip_smudg": [4, 6, 7, 16, 17, 18, 19], "huggingfac": [4, 6, 7, 10, 12, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 33, 34, 36, 37, 39, 43], "co": [4, 6, 7, 10, 11, 14, 16, 17, 18, 19, 23, 24, 25, 26, 28, 32, 33, 34, 36, 37, 39, 43], "zengwei": [4, 6, 7, 16, 18, 19, 34, 43], "stateless7": [4, 6, 7, 18, 19], 
"2022": [4, 6, 7, 14, 16, 17, 18, 19, 25, 31, 33, 34, 43, 44], "12": [4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 31, 33, 34, 36, 39, 43, 44, 45], "29": [4, 6, 7, 13, 18, 19, 23, 25, 26, 28, 32, 33, 36, 37], "pushd": [4, 6, 7, 19], "exp": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "lf": [4, 6, 7, 14, 16, 17, 18, 19, 23, 25, 26, 28, 32, 34, 36, 37, 39], "includ": [4, 6, 7, 16, 17, 18, 19, 31, 33, 34, 43, 44, 45], "pt": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "ln": [4, 6, 7, 14, 16, 17, 18, 19, 23, 28, 31, 33, 34, 43, 44, 45], "epoch": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "99": [4, 6, 7, 13, 16, 17, 18, 19], "symbol": [4, 6, 7, 13, 25, 31, 44, 45], "load": [4, 6, 7, 13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "done": [4, 6, 7, 13, 14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "via": [4, 6, 7, 13, 15, 20, 21, 22], "exp_dir": [4, 6, 7, 13, 16, 17, 18, 25, 28, 29, 31, 33, 34, 44, 45], "avg": [4, 6, 7, 13, 14, 16, 17, 18, 19, 20, 21, 25, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "averag": [4, 6, 7, 13, 14, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "fals": [4, 6, 7, 13, 14, 16, 17, 18, 23, 25, 28, 29], "dir": [4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "bpe": [4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 28, 31, 33, 34, 43, 44, 45], "lang_bpe_500": [4, 6, 7, 14, 16, 17, 18, 19, 20, 21, 28, 31, 33, 34, 43, 44, 45], "max": [4, 6, 7, 14, 16, 17, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "durat": [4, 6, 7, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "600": [4, 6, 7, 14, 28, 31, 33, 43, 44, 45], "chunk": [4, 6, 7, 16, 18, 19, 44, 45], "len": [4, 6, 7, 18, 19, 45], "32": [4, 6, 7, 16, 17, 18, 19, 23, 25, 26, 45], "method": [4, 7, 11, 13, 14, 23, 
25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 43, 44, 45], "modified_beam_search": [4, 6, 7, 11, 25, 29, 31, 33, 43, 44, 45], "clean": [4, 13, 18, 23, 25, 28, 29, 31, 32, 33, 34, 43, 44, 45], "beam_size_4": [4, 6, 7], "11": [4, 6, 7, 8, 13, 16, 17, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "best": [4, 6, 7, 16, 17, 18, 23, 26, 28], "7": [4, 6, 7, 13, 14, 15, 18, 22, 23, 26, 28, 31, 32, 36, 37, 43, 44], "93": [4, 6, 7], "Then": [4, 6], "necessari": [4, 29], "note": [4, 6, 7, 8, 14, 16, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "960": [4, 28, 31, 33, 34, 43, 44, 45], "hour": [4, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "ezerhouni": [4, 6, 7], "popd": [4, 6, 7, 19], "marcoyang": [4, 6], "librispeech_bigram": [4, 6], "2gram": [4, 6], "fst": [4, 13, 25, 39], "modified_beam_search_lm_lodr": 4, "lm_dir": [4, 6, 7, 13, 28], "lm_scale": [4, 6, 7], "42": [4, 13, 17, 23, 28, 39], "lodr_scal": 4, "24": [4, 8, 13, 16, 17, 26, 32, 36, 37, 39], "scale": [4, 6, 7, 16, 17, 23, 28, 29, 32, 34, 36, 37], "embed": [4, 6, 7, 25, 31, 43, 44, 45], "dim": [4, 6, 7, 16, 17, 18, 25, 31, 44], "2048": [4, 6, 7, 14, 16, 17, 18, 25], "hidden": [4, 6, 7, 17, 43], "num": [4, 6, 7, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "layer": [4, 6, 7, 16, 17, 18, 25, 29, 31, 41, 43, 44, 45], "vocab": [4, 6, 7, 28], "500": [4, 6, 7, 13, 14, 16, 17, 18, 25, 28, 34, 43], "token": [4, 16, 17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "ngram": [4, 28, 32, 36, 37], "2": [4, 6, 7, 12, 14, 15, 22, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "extra": [4, 16, 17, 18, 25, 41, 44], "argument": [4, 7, 29, 41], "need": [4, 6, 11, 13, 14, 15, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "given": [4, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 44, 45], "specifi": [4, 7, 8, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "neg": [4, 25], "number": [4, 7, 11, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 
43, 44, 45], "obtain": [4, 7, 23, 25, 26, 28, 32, 36, 37], "shown": [4, 7], "below": [4, 7, 13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44], "61": [4, 6], "6": [4, 6, 7, 8, 13, 15, 22, 23, 25, 28, 31, 32, 36, 37, 43], "74": [4, 6, 13, 14], "recal": 4, "lowest": [4, 31, 33, 34, 43, 44, 45], "77": [4, 6, 7, 13, 28], "08": [4, 6, 7, 13, 18, 28, 32, 34, 36, 37, 39, 43], "inde": 4, "even": [4, 11, 13, 17], "better": [4, 6], "increas": [4, 6, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "8": [4, 6, 7, 8, 13, 14, 16, 17, 18, 23, 25, 28, 29, 31, 32, 33, 34, 39, 43, 44, 45], "45": [4, 6, 13, 16, 18, 23, 25, 28], "38": [4, 6, 16, 23, 25, 28, 36], "23": [4, 6, 8, 13, 16, 17, 18, 23, 25, 26, 28, 36, 37, 39], "section": [5, 8, 9, 13, 14, 19, 20, 21, 22, 23, 28], "langugag": 5, "transduc": [5, 12, 14, 15, 19, 22, 24, 27, 29, 30, 40, 41, 42], "lodr": [5, 12], "rnn": [5, 6, 7, 12, 17, 25, 31, 33, 43, 44, 45], "commonli": [6, 7, 23, 25, 26, 28, 32, 36, 37, 39], "approach": 6, "incorpor": 6, "unlik": 6, "re": [6, 8, 23, 26, 28, 29, 31, 33, 34, 41, 43, 44, 45], "rank": 6, "hypothes": 6, "search": [6, 7, 10, 11], "more": [6, 13, 16, 17, 18, 23, 28, 29, 39, 41, 43, 44], "effici": [6, 7, 31, 44, 45], "than": [6, 14, 17, 23, 25, 26, 28, 31, 32, 33, 34, 39, 43, 44, 45], "sinc": [6, 16, 17, 18, 29, 39, 43], "less": [6, 14, 28, 32, 39, 44, 45], "comput": [6, 13, 14, 16, 17, 18, 23, 25, 26, 29, 31, 32, 34, 36, 37, 39, 43, 44, 45], "gpu": [6, 7, 13, 16, 17, 23, 25, 26, 28, 29, 31, 33, 34, 36, 37, 39, 43, 44, 45], "try": [6, 8, 9, 11, 29, 31, 33, 34, 43, 44, 45], "might": [6, 7, 17, 18, 44, 45], "ideal": [6, 7], "mai": [6, 7, 13, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45, 46], "also": [6, 7, 9, 10, 13, 14, 15, 16, 17, 18, 19, 21, 23, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44, 45], "With": 6, "rnnlm": 6, "avail": [6, 13, 14, 16, 17, 18, 23, 25, 28, 32, 36, 37, 39, 43], "modified_beam_search_lm_rescor": 6, "43": [6, 17, 18, 28], "great": 6, "made": [6, 16], "boost": [6, 7], 
"tabl": [6, 11, 16, 17, 18], "67": [6, 13], "59": [6, 13, 16, 26, 28], "86": 6, "fact": 6, "arpa": [6, 39], "performn": 6, "modified_beam_search_lm_rescore_lodr": 6, "depend": [6, 13, 23, 28], "kenlm": 6, "kpu": 6, "archiv": 6, "zip": 6, "execut": [6, 7, 16, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "9": [6, 13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 39, 43, 44, 45], "57": [6, 13, 17, 28, 32], "slightli": 6, "63": [6, 13, 25], "04": [6, 13, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37], "52": [6, 13, 23, 28], "73": [6, 13], "mention": 6, "earlier": 6, "benchmark": [6, 25], "speed": [6, 16, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "132": [6, 13], "95": [6, 24], "177": [6, 14, 17, 18, 25, 26, 28], "96": [6, 13], "210": [6, 36, 37], "modified_beam_search_lm_shallow_fus": [6, 7], "262": [6, 7], "62": [6, 7, 13, 28, 32], "65": [6, 7, 13, 16], "352": [6, 7, 13, 28], "58": [6, 7, 8, 13, 28], "488": [6, 7, 16, 17, 18], "400": [6, 24], "610": [6, 13], "870": 6, "156": 6, "203": [6, 14, 28], "255": [6, 17, 18], "160": 6, "263": [6, 13, 17], "singl": [6, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "32g": 6, "v100": [6, 23, 25, 26, 28], "vari": 6, "word": [7, 23, 25, 26, 28, 32, 36, 37, 39], "error": [7, 8, 13, 16, 17, 18, 28], "rate": [7, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "These": [7, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "alreadi": [7, 13, 14], "But": [7, 16, 31, 33, 34, 43, 44, 45], "long": [7, 16], "true": [7, 13, 14, 16, 17, 18, 23, 25, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "either": [7, 11, 23, 25, 26, 28, 44, 45], "choos": [7, 11, 13, 29, 31, 33, 34, 43, 44, 45], "three": [7, 16, 17, 18, 21, 23, 25, 41], "associ": 7, "dimens": [7, 31, 44, 45], "obviou": 7, "rel": 7, "reduct": [7, 13, 16, 17, 33], "around": 7, "A": [7, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 43, 44, 45], "few": [7, 16, 17, 18, 29], "paramet": [7, 14, 16, 17, 18, 20, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 43, 
44, 45], "tune": [7, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "control": [7, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "too": 7, "small": [7, 25, 36, 37, 39], "fulli": 7, "util": [7, 8, 13, 28], "larg": 7, "domin": 7, "bad": 7, "typic": [7, 23, 25, 26, 28], "valu": [7, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "activ": [7, 11], "path": [7, 11, 13, 14, 16, 17, 18, 21, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "trade": 7, "off": [7, 16], "accuraci": [7, 16, 17, 24], "larger": [7, 17, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "slower": 7, "collect": [8, 13], "user": [8, 13], "post": 8, "correspond": [8, 10, 11], "solut": 8, "One": 8, "torch": [8, 12, 13, 14, 15, 22, 23, 25, 28], "torchaudio": [8, 12, 41], "cu111": 8, "torchvis": 8, "f": [8, 13, 36, 37], "org": [8, 13, 24, 25, 31, 43, 44, 45], "whl": [8, 13], "torch_stabl": [8, 13], "throw": [8, 16, 17, 18], "cuda": [8, 12, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 43, 44, 45], "while": [8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "That": [8, 16, 17, 29, 31, 43, 44, 45], "cu11": 8, "correct": 8, "traceback": 8, "most": [8, 44, 45], "recent": [8, 16, 17, 18], "last": 8, "line": [8, 13, 16, 17, 18, 31, 44, 45], "14": [8, 13, 14, 16, 17, 20, 23, 28, 31, 32, 33, 36, 43, 44, 45], "yesnoasrdatamodul": 8, "home": [8, 16, 17, 23, 28], "xxx": [8, 14, 16, 17, 18], "next": [8, 11, 13, 16, 17, 18, 28, 29, 31, 32, 33, 34, 43, 44, 45], "gen": [8, 11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "kaldi": [8, 11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "34": [8, 13, 16, 17], "datamodul": 8, "__init__": [8, 13, 14, 16, 17, 18, 23, 25, 28], "add_eo": 8, "add_so": 8, "get_text": 8, "39": [8, 13, 16, 18, 25, 28, 32, 36], "tensorboard": [8, 13, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "summarywrit": 8, "miniconda3": 8, "env": 8, "yyi": 8, "lib": [8, 13, 18], "site": [8, 13, 18], "packag": [8, 13, 18], "loosevers": 8, "uninstal": 8, "setuptool": 
[8, 13], "conda": [8, 13], "dev": [8, 13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "yangyifan": 8, "anaconda3": 8, "dev20230112": 8, "cuda11": [8, 13], "torch1": [8, 13], "13": [8, 13, 14, 16, 17, 18, 25, 26, 28, 32, 33, 36], "py3": [8, 13], "linux": [8, 11, 13, 15, 16, 17, 18, 19], "x86_64": [8, 13, 16], "egg": [8, 13], "_k2": [8, 13], "determinizeweightpushingtyp": 8, "handl": [8, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "except": [8, 14], "anoth": 8, "occur": 8, "pruned_transducer_stateless7_ctc_b": [8, 33], "104": 8, "30": [8, 13, 16, 17, 18, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "rais": 8, "anaconda": 8, "maco": [8, 11, 15, 16, 17, 18, 19], "probabl": [8, 13, 25, 31, 33, 43, 44, 45], "variabl": [8, 13, 16, 17, 18, 23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "export": [8, 12, 13, 23, 25, 26, 28, 29, 32, 36, 37, 39], "dyld_library_path": 8, "conda_prefix": 8, "find": [8, 9, 10, 11, 13, 14, 16, 17, 18, 21, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "locat": [8, 16], "libpython": 8, "abl": 8, "insid": [8, 21], "codna_prefix": 8, "ld_library_path": 8, "within": [9, 11, 16, 17], "anyth": [9, 11], "youtub": [9, 12, 28, 29, 31, 32, 33, 34, 43, 44, 45], "video": [9, 12, 28, 29, 31, 32, 33, 34, 43, 44, 45], "upload": [10, 11, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "visit": [10, 11, 31, 33, 34, 43, 44, 45], "specif": [10, 19, 25], "aishel": [10, 12, 23, 25, 26, 27, 46], "wenetspeech": [10, 20], "framework": [11, 31, 44], "sherpa": [11, 15, 20, 21, 22, 43], "window": [11, 15, 16, 17, 18, 19], "ipad": 11, "phone": 11, "start": [11, 13, 14, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "recognit": [11, 12, 15, 16, 17, 24, 25, 39, 46], "screenshot": [11, 23, 25, 26, 28, 29, 31, 39, 43, 44], "select": [11, 16, 17, 18, 31, 32, 36, 37, 39, 43, 44, 45], "current": [11, 16, 17, 25, 29, 41, 43, 44, 45, 46], "chines": [11, 24, 25], "english": [11, 39, 43], "greedi": 11, 
"record": [11, 17, 18, 23, 24, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "click": [11, 13, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "button": 11, "submit": 11, "wait": 11, "moment": 11, "bottom": [11, 31, 33, 34, 43, 44, 45], "part": [11, 13, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "one": [11, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "subscrib": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "channel": [11, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "nadira": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "povei": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "www": [11, 13, 24, 28, 29, 31, 32, 33, 34, 43, 44, 45], "uc_vaumpkminz1pnkfxan9mw": [11, 13, 28, 29, 31, 32, 33, 34, 43, 44, 45], "toolkit": 12, "cudnn": 12, "frequent": 12, "ask": 12, "question": 12, "faq": 12, "oserror": 12, "libtorch_hip": 12, "cannot": [12, 16, 17, 18], "share": [12, 13], "object": [12, 13, 23, 25, 26, 31, 39, 43, 44], "attributeerror": 12, "modul": [12, 13, 16, 18, 33, 44], "distutil": 12, "attribut": [12, 18, 28], "importerror": 12, "libpython3": 12, "No": [12, 16, 17, 18, 39], "state_dict": [12, 22, 23, 25, 26, 28, 32, 36, 37, 39], "jit": [12, 15, 22, 28], "trace": [12, 15, 20, 22], "onnx": [12, 14, 22], "ncnn": [12, 22], "non": [12, 28, 41, 44, 46], "timit": [12, 27, 36, 37, 46], "introduct": [12, 40, 46], "contribut": 12, "who": 13, "about": [13, 16, 17, 18, 25, 29, 31, 34, 43, 44, 45], "suggest": [13, 31, 33, 34, 43, 44, 45], "virut": 13, "venv": 13, "my_env": 13, "bin": [13, 16, 17, 18, 23, 28], "matter": [13, 16], "compil": [13, 16, 17, 23, 25, 28], "wheel": [13, 16], "don": [13, 16, 17, 18, 20, 23, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "t": [13, 16, 17, 18, 19, 20, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "from_sourc": 13, "for_develop": 13, "alwai": [13, 14], "strongli": 13, "pythonpath": [13, 16, 17, 18], "point": [13, 14, 23, 26, 28, 29, 31, 33, 34, 43, 44, 
45], "folder": [13, 14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "tmp": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "setup": [13, 16, 23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 44, 45], "put": [13, 16, 17, 33, 44], "sever": [13, 14, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "switch": [13, 23, 28, 34], "just": [13, 16, 17, 18, 41], "virtualenv": 13, "cpython3": 13, "final": [13, 14, 16, 17, 28, 32], "64": [13, 14, 16, 25, 44], "1540m": 13, "creator": 13, "cpython3posix": 13, "dest": 13, "ceph": [13, 14, 23, 25, 28], "fj": [13, 14, 16, 17, 18, 25, 28], "fangjun": [13, 14, 16, 17, 18, 25, 28], "clear": 13, "no_vcs_ignor": 13, "global": 13, "seeder": 13, "fromappdata": 13, "bundl": 13, "copi": [13, 41], "app_data_dir": 13, "root": [13, 16, 17, 18], "v": [13, 16, 17, 18, 28, 36, 37], "irtualenv": 13, "ad": [13, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 41, 43, 44, 45], "seed": 13, "21": [13, 14, 16, 23, 25, 28, 36, 37], "36": [13, 16, 25, 28, 29], "bashactiv": 13, "cshellactiv": 13, "fishactiv": 13, "powershellactiv": 13, "pythonactiv": 13, "xonshactiv": 13, "dev20210822": 13, "cpu": [13, 14, 16, 17, 18, 20, 23, 31, 33, 34, 39, 44, 45], "nightli": 13, "2bcpu": 13, "cp38": 13, "linux_x86_64": 13, "mb": [13, 16, 17, 18], "________________________________": 13, "185": [13, 23, 28, 39], "kb": [13, 16, 17, 18, 36, 37], "graphviz": 13, "17": [13, 14, 16, 17, 18, 23, 28, 36, 37, 43], "none": [13, 23, 28], "18": [13, 16, 17, 18, 23, 25, 26, 28, 31, 32, 36, 37, 43, 44, 45], "cach": [13, 18], "manylinux1_x86_64": 13, "831": [13, 25, 37], "extens": 13, "typing_extens": 13, "26": [13, 16, 17, 18, 25, 28, 37], "successfulli": [13, 16, 17, 18], "req": 13, "7b1b76ge": 13, "q": 13, "audioread": 13, "soundfil": 13, "post1": 13, "py2": 13, "97": [13, 16, 23], "cytoolz": 13, "manylinux_2_17_x86_64": 13, "manylinux2014_x86_64": 13, "dataclass": 13, "h5py": 13, "manylinux_2_12_x86_64": 13, "manylinux2010_x86_64": 13, "684": [13, 
23, 39], "intervaltre": 13, "lilcom": 13, "numpi": 13, "15": [13, 14, 16, 17, 18, 25, 26, 28, 36, 39], "40": [13, 16, 17, 18, 26, 28, 32, 36, 37], "pyyaml": 13, "662": 13, "tqdm": 13, "76": [13, 39], "satisfi": 13, "2a1410b": 13, "toolz": 13, "55": [13, 16, 26, 28, 36], "sortedcontain": 13, "cffi": 13, "411": [13, 18, 28], "pycpars": 13, "20": [13, 14, 16, 18, 23, 25, 26, 28, 31, 32, 36, 37, 39, 44], "112": [13, 16, 17, 18], "pypars": 13, "filenam": [13, 16, 17, 18, 19, 20, 21, 33, 34, 43, 45], "dev_2a1410b_clean": 13, "size": [13, 14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "342242": 13, "sha256": 13, "f683444afa4dc0881133206b4646a": 13, "9d0f774224cc84000f55d0a67f6e4a37997": 13, "store": [13, 28], "ephem": 13, "ftu0qysz": 13, "7f": 13, "7a": 13, "8e": 13, "a0bf241336e2e3cb573e1e21e5600952d49f5162454f2e612f": 13, "warn": 13, "built": 13, "invalid": [13, 28], "metadata": [13, 36, 37], "mandat": 13, "pep": 13, "440": 13, "packa": 13, "ging": 13, "deprec": [13, 25], "legaci": 13, "becaus": 13, "could": [13, 16, 17, 18, 23, 26], "replac": [13, 16, 17], "discuss": 13, "regard": 13, "pypa": 13, "sue": 13, "8368": 13, "inter": 13, "valtre": 13, "sor": 13, "tedcontain": 13, "remot": 13, "enumer": 13, "count": 13, "100": [13, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "compress": 13, "308": [13, 23, 25, 26], "total": [13, 17, 18, 23, 25, 26, 28, 29, 31, 32, 39, 43, 44], "delta": 13, "reus": 13, "307": 13, "102": [13, 18, 23], "pack": [13, 44, 45], "receiv": 13, "172": 13, "49": [13, 16, 17, 28, 37, 39], "kib": 13, "385": 13, "00": [13, 16, 23, 25, 26, 28, 32, 36, 37, 39], "resolv": 13, "kaldilm": 13, "tar": 13, "gz": 13, "48": [13, 16, 17, 23, 25], "574": 13, "kaldialign": 13, "sentencepiec": [13, 28], "41": [13, 16, 18, 23, 25, 36, 39], "absl": 13, "absl_pi": 13, "googl": [13, 31, 33, 34, 43, 44, 45], "auth": 13, "oauthlib": 13, "google_auth_oauthlib": 13, "grpcio": 13, "ment": 13, "requi": 13, "rement": 13, "protobuf": 13, 
"manylinux_2_5_x86_64": 13, "werkzeug": 13, "288": 13, "tensorboard_data_serv": 13, "google_auth": 13, "35": [13, 14, 16, 17, 18, 25, 28, 43], "152": 13, "request": [13, 41], "plugin": 13, "wit": 13, "tensorboard_plugin_wit": 13, "781": 13, "markdown": 13, "six": 13, "16": [13, 14, 16, 17, 18, 21, 23, 25, 26, 28, 31, 32, 36, 37, 39, 43, 44, 45], "cachetool": 13, "rsa": 13, "pyasn1": 13, "pyasn1_modul": 13, "155": 13, "requests_oauthlib": 13, "urllib3": 13, "27": [13, 16, 17, 18, 23, 25, 32, 37], "138": [13, 23, 25], "certifi": 13, "2017": 13, "2021": [13, 23, 26, 28, 32, 36, 37, 39], "145": 13, "charset": 13, "normal": [13, 32, 36, 37, 39, 44], "charset_norm": 13, "idna": 13, "146": 13, "897233": 13, "eccb906cafcd45bf9a7e1a1718e4534254bfb": 13, "f4c0d0cbc66eee6c88d68a63862": 13, "85": 13, "7d": 13, "f2dd586369b8797cb36d213bf3a84a789eeb92db93d2e723c9": 13, "etool": 13, "oaut": 13, "hlib": 13, "2023": [13, 16, 17, 18, 33], "05": [13, 14, 16, 17, 23, 25, 26, 28, 37], "main": [13, 23, 28, 41], "dl_dir": [13, 23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "waves_yesno": 13, "_______________________________________________________________": 13, "70m": 13, "06": [13, 14, 16, 26, 28, 32, 39], "54": [13, 17, 18, 28, 32, 36, 37], "4kb": 13, "02": [13, 14, 16, 17, 18, 25, 28, 31, 37, 43, 44], "19": [13, 14, 16, 17, 18, 23, 28, 32, 36, 37], "manifest": [13, 29], "fbank": [13, 14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39], "199": [13, 28, 32], "info": [13, 14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39], "compute_fbank_yesno": 13, "process": [13, 14, 16, 17, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "extract": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "featur": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "90": [13, 16], "212": 13, "60it": 13, "640": [13, 18], "304": [13, 17], "53it": 13, "51": [13, 16, 23, 28, 39], "lang": [13, 14, 25, 28, 34], "66": [13, 17], "project": 13, "csrc": [13, 28], "arpa_file_pars": 13, "cc": 13, "void": 
13, "arpafilepars": 13, "std": 13, "istream": 13, "79": 13, "140": [13, 26], "92": [13, 28], "hlg": [13, 32, 36, 37, 39], "28": [13, 16, 17, 25, 28, 32], "581": [13, 16, 32], "compile_hlg": 13, "124": [13, 23, 28], "lang_phon": [13, 26, 32, 36, 37, 39], "582": 13, "lexicon": [13, 23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "171": [13, 26, 28, 36, 37], "convert": [13, 16, 17, 18, 28], "l": [13, 16, 17, 18, 25, 36, 37, 39], "linv": [13, 25, 28, 39], "609": 13, "ctc_topo": 13, "max_token_id": 13, "611": 13, "intersect": [13, 31, 44, 45], "613": 13, "lg": [13, 31, 34, 44, 45], "shape": [13, 18], "connect": [13, 14, 28, 31, 32, 43, 44, 45], "614": 13, "68": [13, 28], "70": 13, "class": [13, 28], "tensor": [13, 17, 18, 23, 25, 26, 28, 31, 39, 43, 44], "71": [13, 28, 32], "determin": 13, "615": 13, "rag": 13, "raggedtensor": 13, "remov": [13, 23, 25, 26, 28, 32, 36, 37], "disambigu": 13, "616": 13, "91": 13, "remove_epsilon": 13, "617": 13, "arc": 13, "compos": 13, "h": 13, "619": 13, "106": [13, 17, 28], "109": [13, 23, 28], "111": [13, 28], "127": [13, 16, 17, 39], "cuda_visible_devic": [13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "case": [13, 14, 16, 17, 18, 31, 33, 34, 43, 44, 45], "segment": 13, "fault": 13, "core": 13, "dump": 13, "protocol_buffers_python_implement": 13, "674": 13, "interest": [13, 29, 31, 33, 34, 43, 44, 45], "759": [13, 25], "481": 13, "482": 13, "posixpath": [13, 16, 17, 18, 25, 28], "lang_dir": [13, 25, 28], "lr": [13, 25, 43], "01": [13, 16, 25, 26, 28, 29, 33], "feature_dim": [13, 14, 16, 17, 18, 23, 25, 28, 39], "weight_decai": 13, "1e": 13, "start_epoch": 13, "best_train_loss": [13, 14, 16, 17, 18], "inf": [13, 14, 16, 17, 18], "best_valid_loss": [13, 14, 16, 17, 18], "best_train_epoch": [13, 14, 16, 17, 18], "best_valid_epoch": [13, 14, 17, 18], "batch_idx_train": [13, 14, 16, 17, 18], "log_interv": [13, 14, 16, 17, 18], "reset_interv": [13, 14, 16, 17, 18], "valid_interv": [13, 14, 16, 17, 18], "beam_siz": [13, 
14, 25], "sum": 13, "use_double_scor": [13, 23, 28, 39], "world_siz": [13, 29], "master_port": 13, "12354": 13, "num_epoch": 13, "feature_dir": [13, 28], "max_dur": [13, 28], "bucketing_sampl": [13, 28], "num_bucket": [13, 28], "concatenate_cut": [13, 28], "duration_factor": [13, 28], "gap": [13, 28], "on_the_fly_feat": [13, 28], "shuffl": [13, 28], "return_cut": [13, 28], "num_work": [13, 28], "env_info": [13, 14, 16, 17, 18, 23, 25, 28], "releas": [13, 14, 16, 17, 18, 23, 25, 28], "sha1": [13, 14, 16, 17, 18, 23, 25, 28], "3b7f09fa35e72589914f67089c0da9f196a92ca4": 13, "date": [13, 14, 16, 17, 18, 23, 25, 28], "mon": [13, 17, 18], "6fcfced": 13, "cu118": 13, "branch": [13, 14, 16, 17, 18, 23, 25, 28, 33], "30bde4b": 13, "thu": [13, 14, 16, 17, 18, 25, 28, 32], "37": [13, 17, 23, 25, 28, 36], "47": [13, 16, 17, 18, 23, 28], "dev20230512": 13, "torch2": 13, "hostnam": [13, 14, 16, 17, 18, 25], "host": [13, 14], "ip": [13, 14, 16, 17, 18, 25], "761": 13, "168": [13, 32], "764": 13, "495": 13, "devic": [13, 14, 16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 44, 45], "791": [13, 32], "cut": [13, 28], "244": 13, "852": 13, "149": [13, 16, 28], "singlecutsampl": 13, "205": [13, 28], "853": 13, "218": [13, 17], "252": 13, "986": 13, "422": 13, "batch": [13, 16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "loss": [13, 16, 17, 23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "065": 13, "over": [13, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "2436": 13, "frame": [13, 25, 31, 33, 44, 45], "tot_loss": 13, "4561": 13, "2828": 13, "7076": 13, "22192": 13, "691": 13, "444": 13, "9002": 13, "18067": 13, "996": 13, "2555": 13, "2695": 13, "484": 13, "34971": 13, "217": [13, 23, 28], "4688": 13, "251": [13, 36, 37], "75": [13, 16], "389": [13, 26, 28], "2532": 13, "637": 13, "1139": 13, "1592": 13, "859": 13, "1629": 13, "094": 13, "0767": 13, "118": [13, 28], "350": 13, "06778": 13, "395": 13, "789": 13, "01056": 13, "016": 13, "009022": 13, "009985": 13, 
"271": [13, 14, 17], "01088": 13, "497": 13, "01174": 13, "01077": 13, "747": 13, "01087": 13, "783": 13, "921": 13, "01045": 13, "008957": 13, "009903": 13, "374": 13, "01092": 13, "598": [13, 28], "01169": 13, "01065": 13, "824": 13, "862": [13, 17], "865": [13, 17], "555": 13, "483": 13, "264": [13, 18], "search_beam": [13, 23, 28, 39], "output_beam": [13, 23, 28, 39], "min_active_st": [13, 23, 28, 39], "max_active_st": [13, 23, 28, 39], "10000": [13, 23, 28, 39], "487": 13, "273": [13, 14, 25], "513": 13, "291": 13, "521": 13, "675": 13, "204": [13, 18, 28], "until": [13, 28, 33], "923": 13, "241": [13, 23], "transcript": [13, 23, 24, 25, 26, 28, 31, 32, 36, 37, 43, 44, 45], "recog": [13, 25, 28], "test_set": [13, 39], "924": 13, "558": 13, "240": [13, 23, 39], "ins": [13, 28, 39], "del": [13, 28, 39], "sub": [13, 28, 39], "925": 13, "249": [13, 17], "wrote": [13, 28], "detail": [13, 15, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "stat": [13, 28], "err": [13, 25, 28], "316": [13, 28], "congratul": [13, 16, 17, 18, 23, 26, 28, 32, 36, 37, 39], "fun": [13, 16, 17], "debug": 13, "variou": [13, 19, 22, 46], "period": [14, 16], "disk": 14, "optim": [14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "relat": [14, 23, 25, 28, 32, 36, 37, 39], "resum": [14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "strip": 14, "reduc": [14, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "each": [14, 16, 17, 19, 23, 25, 26, 28, 31, 33, 34, 41, 43, 44, 45], "well": [14, 39, 46], "usag": [14, 16, 17, 18, 20, 21, 32, 36, 37, 39], "pruned_transducer_stateless3": [14, 20, 41], "almost": [14, 31, 41, 44, 45], "dict": [14, 18], "csukuangfj": [14, 16, 17, 19, 23, 25, 26, 28, 32, 36, 37, 39, 43], "stateless3": [14, 16], "repo": [14, 19], "prefix": 14, "those": 14, "wave": [14, 16, 17, 18, 23, 28], "iter": [14, 16, 17, 18, 21, 31, 33, 34, 43, 44, 45], "1224000": 14, "greedy_search": [14, 25, 31, 33, 43, 44, 45], "test_wav": [14, 16, 
17, 18, 19, 23, 25, 26, 28, 32, 36, 37, 39], "1089": [14, 16, 17, 18, 19, 28, 32], "134686": [14, 16, 17, 18, 19, 28, 32], "0001": [14, 16, 17, 18, 19, 28, 32], "wav": [14, 16, 17, 18, 19, 21, 23, 25, 26, 28, 31, 33, 34, 36, 37, 39, 43, 44, 45], "1221": [14, 16, 17, 28, 32], "135766": [14, 16, 17, 28, 32], "0002": [14, 16, 17, 28, 32], "multipl": [14, 23, 25, 26, 28, 32, 36, 37, 39], "sound": [14, 16, 17, 18, 21, 22, 23, 25, 26, 28, 32, 36, 37, 39], "Its": [14, 16, 17, 18, 28], "output": [14, 16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "09": [14, 17, 23, 25, 26, 28, 43], "233": [14, 16, 17], "265": 14, "50": [14, 16, 17, 18, 28, 31, 36, 43, 44, 45], "200": [14, 16, 17, 18, 23, 28, 29, 36, 37, 39], "3000": [14, 16, 17, 18], "80": [14, 16, 17, 18, 23, 25, 28], "subsampling_factor": [14, 17, 18, 23, 25, 28], "encoder_dim": [14, 16, 17, 18], "512": [14, 16, 17, 18, 23, 25, 28], "nhead": [14, 16, 18, 23, 25, 28, 31, 44], "dim_feedforward": [14, 16, 17, 25], "num_encoder_lay": [14, 16, 17, 18, 25], "decoder_dim": [14, 16, 17, 18], "joiner_dim": [14, 16, 17, 18], "model_warm_step": [14, 16, 17], "4810e00d8738f1a21278b0156a42ff396a2d40ac": 14, "fri": 14, "oct": [14, 28], "03": [14, 17, 25, 28, 36, 37, 43], "miss": [14, 16, 17, 18, 25, 28], "cu102": [14, 16, 17, 18], "1013": 14, "c39cba5": 14, "dirti": [14, 16, 17, 23, 28], "jsonl": 14, "de": [14, 16, 17, 18, 25], "74279": [14, 16, 17, 18, 25], "0324160024": 14, "65bfd8b584": 14, "jjlbn": 14, "bpe_model": [14, 16, 17, 18, 28], "sound_fil": [14, 23, 25, 28, 39], "sample_r": [14, 23, 25, 28, 39], "16000": [14, 23, 25, 26, 28, 32, 33, 36, 37], "beam": [14, 43], "max_context": 14, "max_stat": 14, "context_s": [14, 16, 17, 18, 25], "max_sym_per_fram": [14, 25], "simulate_stream": 14, "decode_chunk_s": 14, "left_context": 14, "dynamic_chunk_train": 14, "causal_convolut": 14, "short_chunk_s": [14, 18, 44, 45], "25": [14, 16, 17, 23, 28, 31, 36, 37, 39, 44], "num_left_chunk": [14, 18], "blank_id": 
[14, 16, 17, 18, 25], "unk_id": 14, "vocab_s": [14, 16, 17, 18, 25], "612": 14, "458": 14, "disabl": [14, 16, 17], "giga": [14, 17, 43], "623": 14, "277": 14, "78648040": 14, "951": [14, 28], "285": [14, 25, 28], "construct": [14, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37, 39], "952": 14, "295": [14, 23, 25, 26, 28], "957": 14, "301": [14, 28], "700": 14, "329": [14, 17, 28], "912": 14, "388": 14, "earli": [14, 16, 17, 18, 28, 32], "nightfal": [14, 16, 17, 18, 28, 32], "THE": [14, 16, 17, 18, 28, 32], "yellow": [14, 16, 17, 18, 28, 32], "lamp": [14, 16, 17, 18, 28, 32], "light": [14, 16, 17, 18, 28, 32], "AND": [14, 16, 17, 18, 28, 32], "THERE": [14, 16, 17, 18, 28, 32], "squalid": [14, 16, 17, 18, 28, 32], "quarter": [14, 16, 17, 18, 28, 32], "OF": [14, 16, 17, 18, 28, 32], "brothel": [14, 16, 17, 18, 28, 32], "god": [14, 28, 32], "AS": [14, 28, 32], "direct": [14, 28, 32], "consequ": [14, 28, 32], "sin": [14, 28, 32], "man": [14, 28, 32], "punish": [14, 28, 32], "had": [14, 28, 32], "her": [14, 28, 32], "love": [14, 28, 32], "child": [14, 28, 32], "whose": [14, 25, 28, 32], "ON": [14, 16, 28, 32], "THAT": [14, 28, 32], "dishonor": [14, 28, 32], "bosom": [14, 28, 32], "TO": [14, 28, 32], "parent": [14, 28, 32], "forev": [14, 28, 32], "WITH": [14, 28, 32], "race": [14, 28, 32], "descent": [14, 28, 32], "mortal": [14, 28, 32], "BE": [14, 28, 32], "bless": [14, 28, 32], "soul": [14, 28, 32], "IN": [14, 28, 32], "heaven": [14, 28, 32], "yet": [14, 16, 17, 28, 32], "THESE": [14, 28, 32], "thought": [14, 28, 32], "affect": [14, 28, 32], "hester": [14, 28, 32], "prynn": [14, 28, 32], "hope": [14, 24, 28, 32], "apprehens": [14, 28, 32], "390": 14, "down": [14, 23, 28, 31, 33, 34, 43, 44, 45], "reproduc": [14, 28], "9999": [14, 33, 34, 43], "symlink": 14, "pass": [14, 18, 23, 25, 26, 28, 31, 33, 34, 41, 43, 44, 45], "reason": [14, 16, 17, 18, 44], "support": [15, 16, 17, 18, 23, 25, 28, 31, 33, 34, 41, 43, 44, 45], "zipform": [15, 19, 22, 27, 30, 40, 42], "convemform": [15, 
22, 41], "platform": [15, 19], "android": [15, 16, 17, 18, 19], "raspberri": [15, 19], "pi": [15, 19], "\u7231\u82af\u6d3e": 15, "maix": 15, "iii": 15, "axera": 15, "rv1126": 15, "static": 15, "produc": [15, 31, 33, 34, 43, 44, 45], "binari": [15, 16, 17, 18, 23, 25, 26, 28, 31, 39, 43, 44], "everyth": 15, "pnnx": [15, 22], "torchscript": [15, 20, 21, 22], "encod": [15, 19, 21, 22, 23, 25, 26, 28, 31, 32, 33, 39, 41, 43, 44, 45], "option": [15, 19, 22, 25, 29, 32, 36, 37, 39], "int8": [15, 22], "quantiz": [15, 22, 29], "conv": [16, 17], "emform": [16, 17, 20], "stateless2": [16, 17, 43], "07": [16, 17, 18, 23, 25, 26, 28], "ubuntu": [16, 17, 18], "cpp": [16, 20], "pretrained_model": [16, 17, 18], "online_transduc": 16, "continu": [16, 17, 18, 19, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "jit_xxx": [16, 17, 18], "anywher": [16, 17], "submodul": 16, "updat": [16, 17, 18], "recurs": 16, "init": 16, "cmake": [16, 17, 23, 28], "dcmake_build_typ": [16, 23, 28], "dncnn_python": 16, "dncnn_build_benchmark": 16, "dncnn_build_exampl": 16, "dncnn_build_tool": 16, "j4": 16, "pwd": 16, "src": [16, 18], "compon": [16, 41], "ncnn2int8": [16, 17], "our": [16, 17, 18, 20, 21, 28, 29, 31, 41, 44, 45], "cpython": 16, "gnu": 16, "am": 16, "sai": [16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "doe": [16, 17, 18, 23, 25, 28, 39], "later": [16, 17, 18, 23, 26, 28, 31, 32, 33, 34, 36, 37, 43, 44, 45], "termin": 16, "tencent": [16, 17], "modif": [16, 25], "offic": 16, "synchron": 16, "offici": 16, "renam": [16, 17, 18], "conv_emformer_transducer_stateless2": [16, 41], "length": [16, 18, 25, 44, 45], "cnn": [16, 18], "kernel": [16, 18, 25], "31": [16, 17, 18, 28], "context": [16, 25, 31, 41, 43, 44, 45], "memori": [16, 23, 25, 28, 41], "configur": [16, 18, 25, 29, 32, 36, 37, 39], "accordingli": [16, 17, 18], "yourself": [16, 17, 18, 29, 44, 45], "combin": [16, 17, 18], "677": 16, "220": [16, 25, 26, 28], "681": 16, "229": [16, 23], "best_v": 16, "alid_epoch": 16, 
"subsampl": [16, 44, 45], "ing_factor": 16, "a34171ed85605b0926eebbd0463d059431f4f74a": 16, "wed": [16, 23, 25, 28], "dec": 16, "ver": 16, "ion": 16, "530e8a1": 16, "tue": [16, 28], "star": [16, 17, 18], "op": 16, "1220120619": [16, 17, 18], "7695ff496b": [16, 17, 18], "s9n4w": [16, 17, 18], "icefa": 16, "ll": 16, "transdu": 16, "cer": 16, "use_averaged_model": [16, 17, 18], "cnn_module_kernel": [16, 18], "left_context_length": 16, "chunk_length": 16, "right_context_length": 16, "memory_s": 16, "231": [16, 17, 18], "053": 16, "022": 16, "708": [16, 23, 25, 28, 39], "315": [16, 23, 25, 26, 28, 32], "75490012": 16, "318": [16, 17], "320": [16, 25], "682": 16, "lh": [16, 17, 18], "rw": [16, 17, 18], "kuangfangjun": [16, 17, 18], "289m": 16, "jan": [16, 17, 18], "289": 16, "roughli": [16, 17, 18], "equal": [16, 17, 18, 44, 45], "1024": [16, 17, 18, 43], "287": [16, 39], "1010k": [16, 17], "decoder_jit_trac": [16, 17, 18, 21, 43, 45], "283m": 16, "encoder_jit_trac": [16, 17, 18, 21, 43, 45], "0m": [16, 17], "joiner_jit_trac": [16, 17, 18, 21, 43, 45], "sure": [16, 17, 18], "found": [16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "param": [16, 17, 18], "503k": [16, 17], "437": [16, 17, 18], "142m": 16, "79k": 16, "5m": [16, 17], "architectur": [16, 17, 18, 43], "editor": [16, 17, 18], "content": [16, 17, 18], "283": [16, 18], "1010": [16, 17], "142": [16, 23, 26, 28], "503": [16, 17], "convers": [16, 17, 18], "half": [16, 17, 18, 31, 44, 45], "default": [16, 17, 18, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "float32": [16, 17, 18], "float16": [16, 17, 18], "occupi": [16, 17, 18], "byte": [16, 17, 18], "twice": [16, 17, 18], "smaller": [16, 17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "fp16": [16, 17, 18, 31, 33, 34, 43, 44, 45], "won": [16, 17, 18, 19, 23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "accept": [16, 17, 18], "216": [16, 23, 28, 36, 37], "encoder_param_filenam": [16, 17, 18], "encoder_bin_filenam": [16, 17, 18], 
"decoder_param_filenam": [16, 17, 18], "decoder_bin_filenam": [16, 17, 18], "joiner_param_filenam": [16, 17, 18], "joiner_bin_filenam": [16, 17, 18], "sound_filenam": [16, 17, 18], "141": 16, "328": 16, "151": 16, "331": [16, 17, 28, 32], "176": [16, 25, 28], "336": 16, "106000": [16, 17, 18, 28, 32], "381": 16, "7767517": [16, 17, 18], "1060": 16, "1342": 16, "in0": [16, 17, 18], "explan": [16, 17, 18], "magic": [16, 17, 18], "intermedi": [16, 17, 18], "mean": [16, 17, 18, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 41, 43, 44, 45], "increment": [16, 17, 18], "1061": 16, "sherpametadata": [16, 17, 18], "sherpa_meta_data1": [16, 17, 18], "still": [16, 17, 18], "newli": [16, 17, 18], "must": [16, 17, 18, 44], "kei": [16, 17, 18, 28], "eas": [16, 17, 18], "list": [16, 17, 18, 23, 25, 26, 28, 32, 36, 37], "pair": [16, 17, 18], "sad": [16, 17, 18], "rememb": [16, 17, 18], "anymor": [16, 17, 18], "flexibl": [16, 17, 18], "edit": [16, 17, 18], "arm": [16, 17, 18], "aarch64": [16, 17, 18], "onc": [16, 17], "mayb": [16, 17], "year": [16, 17], "_jit_trac": [16, 17], "56": [16, 17, 28, 36], "fp32": [16, 17], "doubl": [16, 17], "j": [16, 17, 23, 28], "py38": [16, 17, 18], "arg": [16, 17], "wave_filenam": [16, 17], "16k": [16, 17], "hz": [16, 17, 36, 37], "mono": [16, 17], "calibr": [16, 17], "cat": [16, 17], "eof": [16, 17], "calcul": [16, 17, 33, 44, 45], "has_gpu": [16, 17], "config": [16, 17], "use_vulkan_comput": [16, 17], "88": [16, 25], "conv_87": 16, "942385": [16, 17], "threshold": [16, 17, 33], "938493": 16, "968131": 16, "conv_88": 16, "442448": 16, "549335": 16, "167552": 16, "conv_89": 16, "228289": 16, "001738": 16, "871552": 16, "linear_90": 16, "976146": 16, "101789": 16, "115": [16, 17, 23, 28], "267128": 16, "linear_91": 16, "962030": 16, "162033": 16, "602713": 16, "linear_92": 16, "323041": 16, "853959": 16, "953129": 16, "linear_94": 16, "905416": 16, "648006": 16, "323545": 16, "linear_93": 16, "474093": 16, "200188": 16, "linear_95": 16, "888012": 16, 
"403563": 16, "483986": 16, "linear_96": 16, "856741": 16, "398679": 16, "524273": 16, "linear_97": 16, "635942": 16, "613655": 16, "590950": 16, "linear_98": 16, "460340": 16, "670146": 16, "398010": 16, "linear_99": 16, "532276": 16, "585537": 16, "119396": 16, "linear_101": 16, "585871": 16, "719224": 16, "205809": 16, "linear_100": 16, "751382": 16, "081648": 16, "linear_102": 16, "593344": 16, "450581": 16, "87": 16, "551147": 16, "linear_103": 16, "592681": 16, "705824": 16, "257959": 16, "linear_104": 16, "752957": 16, "980955": 16, "110489": 16, "linear_105": 16, "696240": 16, "877193": 16, "608953": 16, "linear_106": 16, "059659": 16, "643138": 16, "048950": 16, "linear_108": 16, "975461": 16, "589567": 16, "671457": 16, "linear_107": 16, "190381": 16, "515701": 16, "linear_109": 16, "710759": 16, "305635": 16, "082436": 16, "linear_110": 16, "531228": 16, "731162": 16, "159557": 16, "linear_111": 16, "528083": 16, "259322": 16, "211544": 16, "linear_112": 16, "148807": 16, "500842": 16, "087374": 16, "linear_113": 16, "592566": 16, "948851": 16, "166611": 16, "linear_115": 16, "437109": 16, "608947": 16, "642395": 16, "linear_114": 16, "193942": 16, "503904": 16, "linear_116": 16, "966980": 16, "200896": 16, "676392": 16, "linear_117": 16, "451303": 16, "061664": 16, "951344": 16, "linear_118": 16, "077262": 16, "965800": 16, "023804": 16, "linear_119": 16, "671615": 16, "847613": 16, "198460": 16, "linear_120": 16, "625638": 16, "131427": 16, "556595": 16, "linear_122": 16, "274080": 16, "888716": 16, "978189": 16, "linear_121": 16, "420480": 16, "429659": 16, "linear_123": 16, "826197": 16, "599617": 16, "281532": 16, "linear_124": 16, "396383": 16, "325849": 16, "335875": 16, "linear_125": 16, "337198": 16, "941410": 16, "221970": 16, "linear_126": 16, "699965": 16, "842878": 16, "224073": 16, "linear_127": 16, "775370": 16, "884215": 16, "696438": 16, "linear_129": 16, "872276": 16, "837319": 16, "254213": 16, "linear_128": 16, "180057": 16, "687883": 
16, "linear_130": 16, "150427": 16, "454298": 16, "765789": 16, "linear_131": 16, "112692": 16, "924847": 16, "025545": 16, "linear_132": 16, "852893": 16, "116593": 16, "749626": 16, "linear_133": 16, "517084": 16, "024665": 16, "275314": 16, "linear_134": 16, "683807": 16, "878618": 16, "743618": 16, "linear_136": 16, "421055": 16, "322729": 16, "086264": 16, "linear_135": 16, "309880": 16, "917679": 16, "linear_137": 16, "827781": 16, "744595": 16, "33": [16, 17, 23, 24, 25, 28, 36], "915554": 16, "linear_138": 16, "422395": 16, "742882": 16, "402161": 16, "linear_139": 16, "527538": 16, "866123": 16, "849449": 16, "linear_140": 16, "128619": 16, "657793": 16, "266134": 16, "linear_141": 16, "839593": 16, "845993": 16, "021378": 16, "linear_143": 16, "442304": 16, "099039": 16, "889746": 16, "linear_142": 16, "325038": 16, "849592": 16, "linear_144": 16, "929444": 16, "618206": 16, "605080": 16, "linear_145": 16, "382126": 16, "321095": 16, "625010": 16, "linear_146": 16, "894987": 16, "867645": 16, "836517": 16, "linear_147": 16, "915313": 16, "906028": 16, "886522": 16, "linear_148": 16, "614287": 16, "908151": 16, "496181": 16, "linear_150": 16, "724932": 16, "485588": 16, "312899": 16, "linear_149": 16, "161146": 16, "606939": 16, "linear_151": 16, "164453": 16, "847355": 16, "719223": 16, "linear_152": 16, "086471": 16, "984121": 16, "222834": 16, "linear_153": 16, "099524": 16, "991601": 16, "816805": 16, "linear_154": 16, "054585": 16, "489706": 16, "286930": 16, "linear_155": 16, "389185": 16, "100321": 16, "963501": 16, "linear_157": 16, "982999": 16, "154796": 16, "637253": 16, "linear_156": 16, "537706": 16, "875190": 16, "linear_158": 16, "420287": 16, "502287": 16, "531588": 16, "linear_159": 16, "014746": 16, "423280": 16, "477261": 16, "linear_160": 16, "633553": 16, "715335": 16, "220921": 16, "linear_161": 16, "371849": 16, "117830": 16, "815203": 16, "linear_162": 16, "492933": 16, "126283": 16, "623318": 16, "linear_164": 16, "697504": 16, 
"825712": 16, "317358": 16, "linear_163": 16, "078367": 16, "008038": 16, "linear_165": 16, "023975": 16, "836278": 16, "577358": 16, "linear_166": 16, "860619": 16, "259792": 16, "493614": 16, "linear_167": 16, "380934": 16, "496160": 16, "107042": 16, "linear_168": 16, "691216": 16, "733317": 16, "831076": 16, "linear_169": 16, "723948": 16, "952728": 16, "129707": 16, "linear_171": 16, "034811": 16, "366547": 16, "665123": 16, "linear_170": 16, "356277": 16, "710501": 16, "linear_172": 16, "556884": 16, "729481": 16, "166058": 16, "linear_173": 16, "033039": 16, "207264": 16, "442120": 16, "linear_174": 16, "597379": 16, "658676": 16, "768131": 16, "linear_2": [16, 17], "293503": 16, "305265": 16, "877850": 16, "linear_1": [16, 17], "812222": 16, "766452": 16, "487047": 16, "linear_3": [16, 17], "999999": 16, "999755": 16, "031174": 16, "wish": [16, 17], "955k": 16, "18k": 16, "inparam": [16, 17], "inbin": [16, 17], "outparam": [16, 17], "outbin": [16, 17], "99m": 16, "78k": 16, "774k": [16, 17], "496": [16, 17, 28, 32], "774": [16, 17], "linear": [16, 17, 25], "convolut": [16, 17, 33, 41, 44], "exact": [16, 17], "4x": [16, 17], "comparison": 16, "44": [16, 17, 28, 36, 37], "468000": [17, 21, 43], "lstm_transducer_stateless2": [17, 21, 43], "222": [17, 26, 28], "is_pnnx": 17, "62e404dd3f3a811d73e424199b3408e309c06e1a": [17, 18], "6d7a559": [17, 18], "feb": [17, 18, 25], "147": [17, 18], "rnn_hidden_s": 17, "aux_layer_period": 17, "235": 17, "239": [17, 25], "472": 17, "595": 17, "324": 17, "83137520": 17, "596": 17, "325": 17, "257024": 17, "326": 17, "781812": 17, "327": 17, "84176356": 17, "182": [17, 18, 23, 32], "158": 17, "183": [17, 36, 37], "335": 17, "101": 17, "tracerwarn": [17, 18], "boolean": [17, 18], "caus": [17, 18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "incorrect": [17, 18, 25], "flow": [17, 18], "constant": [17, 18], "futur": [17, 18, 25, 46], "need_pad": 17, "bool": 17, "259": [17, 23], "180": [17, 23, 28], "339": 17, "207": [17, 26, 28], 
"84": [17, 23], "324m": 17, "321": [17, 23], "107": [17, 32], "318m": 17, "159m": 17, "21k": 17, "159": [17, 28, 39], "861": 17, "425": [17, 28], "427": [17, 28], "266": [17, 18, 28, 32], "431": 17, "342": 17, "343": 17, "267": [17, 25, 36, 37], "379": 17, "268": [17, 28, 32], "317m": 17, "317": 17, "conv_15": 17, "930708": 17, "972025": 17, "conv_16": 17, "978855": 17, "031788": 17, "456645": 17, "conv_17": 17, "868437": 17, "830528": 17, "218575": 17, "linear_18": 17, "107259": 17, "194808": 17, "293236": 17, "linear_19": 17, "193777": 17, "634748": 17, "401705": 17, "linear_20": 17, "259933": 17, "606617": 17, "722160": 17, "linear_21": 17, "186600": 17, "790260": 17, "512129": 17, "linear_22": 17, "759041": 17, "265832": 17, "050053": 17, "linear_23": 17, "931209": 17, "099090": 17, "979767": 17, "linear_24": 17, "324160": 17, "215561": 17, "321835": 17, "linear_25": 17, "800708": 17, "599352": 17, "284134": 17, "linear_26": 17, "492444": 17, "153369": 17, "274391": 17, "linear_27": 17, "660161": 17, "720994": 17, "46": [17, 23, 28], "674126": 17, "linear_28": 17, "415265": 17, "174434": 17, "007133": 17, "linear_29": 17, "038418": 17, "118534": 17, "724262": 17, "linear_30": 17, "072084": 17, "936867": 17, "259155": 17, "linear_31": 17, "342712": 17, "599489": 17, "282787": 17, "linear_32": 17, "340535": 17, "120308": 17, "701103": 17, "linear_33": 17, "846987": 17, "630030": 17, "985939": 17, "linear_34": 17, "686298": 17, "204571": 17, "607586": 17, "linear_35": 17, "904821": 17, "575518": 17, "756420": 17, "linear_36": 17, "806659": 17, "585589": 17, "118401": 17, "linear_37": 17, "402340": 17, "047157": 17, "162680": 17, "linear_38": 17, "174589": 17, "923361": 17, "030258": 17, "linear_39": 17, "178576": 17, "556058": 17, "807705": 17, "linear_40": 17, "901954": 17, "301267": 17, "956539": 17, "linear_41": 17, "839805": 17, "597429": 17, "716181": 17, "linear_42": 17, "178945": 17, "651595": 17, "895699": 17, "829245": 17, "627592": 17, "637907": 17, 
"746186": 17, "255032": 17, "167313": 17, "000000": 17, "999756": 17, "031013": 17, "345k": 17, "17k": 17, "218m": 17, "counterpart": 17, "bit": [17, 23, 25, 26, 28, 32, 39], "4532": 17, "feedforward": [18, 25, 31, 44], "384": [18, 28], "192": [18, 28], "unmask": 18, "256": [18, 36, 37], "downsampl": [18, 24], "factor": [18, 23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "473": [18, 28], "246": [18, 25, 28, 36, 37], "477": 18, "warm_step": 18, "2000": [18, 26], "feedforward_dim": 18, "attention_dim": [18, 23, 25, 28], "encoder_unmasked_dim": 18, "zipformer_downsampling_factor": 18, "decode_chunk_len": 18, "257": [18, 25, 36, 37], "023": 18, "zipformer2": 18, "419": 18, "At": [18, 23, 28], "stack": 18, "downsampling_factor": 18, "037": 18, "655": 18, "346": 18, "68944004": 18, "347": 18, "260096": 18, "348": [18, 36], "716276": 18, "656": [18, 28], "349": 18, "69920376": 18, "351": 18, "353": 18, "174": [18, 28], "175": 18, "1344": 18, "assert": 18, "cached_len": 18, "num_lay": 18, "1348": 18, "cached_avg": 18, "1352": 18, "cached_kei": 18, "1356": 18, "cached_v": 18, "1360": 18, "cached_val2": 18, "1364": 18, "cached_conv1": 18, "1368": 18, "cached_conv2": 18, "1373": 18, "left_context_len": 18, "1884": 18, "x_size": 18, "2442": 18, "2449": 18, "2469": 18, "2473": 18, "2483": 18, "kv_len": 18, "k": [18, 31, 36, 37, 43, 44, 45], "2570": 18, "attn_output": 18, "bsz": 18, "num_head": 18, "seq_len": 18, "head_dim": 18, "2926": 18, "lorder": 18, "2652": 18, "2653": 18, "embed_dim": 18, "2666": 18, "1543": 18, "in_x_siz": 18, "1637": 18, "1643": 18, "in_channel": 18, "1571": 18, "1763": 18, "src1": 18, "src2": 18, "1779": 18, "dim1": 18, "1780": 18, "dim2": 18, "_trace": 18, "958": 18, "tracer": 18, "instead": [18, 25, 44], "tupl": 18, "namedtupl": 18, "absolut": 18, "know": [18, 29], "side": 18, "strict": [18, 24], "allow": [18, 31, 44], "behavior": [18, 25], "_c": 18, "_create_method_from_trac": 18, "646": 18, "357": 18, "embedding_out": 18, "686": 18, "361": [18, 28, 
32], "735": 18, "69": 18, "269m": 18, "53": [18, 23, 31, 32, 37, 43, 44], "269": [18, 23, 36, 37], "725": [18, 32], "1022k": 18, "266m": 18, "8m": 18, "509k": 18, "133m": 18, "152k": 18, "4m": 18, "1022": 18, "133": 18, "509": 18, "260": [18, 28], "360": 18, "365": 18, "280": [18, 28], "372": [18, 23], "state": [18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "026": 18, "410": 18, "2028": 18, "2547": 18, "2029": 18, "23316": 18, "23317": 18, "23318": 18, "23319": 18, "23320": 18, "amount": [18, 24], "pad": [18, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "conv2dsubsampl": 18, "v2": [18, 23, 28], "arrai": 18, "23300": 18, "element": 18, "onnx_pretrain": 19, "onnxruntim": 19, "separ": 19, "deploi": [19, 23, 28], "repo_url": 19, "basenam": 19, "tree": [20, 21, 23, 25, 26, 28, 32, 36, 37, 39, 43], "cpu_jit": [20, 23, 28, 31, 33, 34, 44, 45], "confus": 20, "move": [20, 31, 33, 34, 44, 45], "why": 20, "streaming_asr": [20, 21, 43, 44, 45], "conv_emform": 20, "offline_asr": [20, 31], "jit_pretrain": [21, 33, 34, 43], "baz": 21, "1best": [23, 26, 28, 32, 33, 34, 36, 37], "automag": [23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "stop": [23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "By": [23, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "musan": [23, 26, 28, 29, 31, 33, 34, 43, 44, 45], "thei": [23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "intal": [23, 26], "sudo": [23, 26], "apt": [23, 26], "permiss": [23, 26], "commandlin": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "quit": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "experi": [23, 25, 26, 28, 29, 31, 33, 34, 39, 43, 44, 45], "world": [23, 25, 26, 28, 29, 31, 32, 33, 34, 43, 44, 45], "multi": [23, 25, 26, 28, 29, 31, 33, 34, 41, 43, 44, 45], "machin": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "ddp": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "implement": [23, 25, 26, 28, 29, 31, 33, 34, 41, 43, 44, 45], "present": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "second": [23, 25, 26, 28, 29, 31, 33, 
34, 39, 43, 44, 45], "utter": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "oom": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "nvidia": [23, 25, 26, 28], "due": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "decai": [23, 26, 28, 33, 34, 43], "warmup": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "function": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "get_param": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "realli": [23, 26, 28, 31, 33, 34, 43, 44, 45], "directli": [23, 25, 26, 28, 29, 31, 33, 34, 43, 44, 45], "perturb": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "actual": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "3x150": [23, 25, 26], "450": [23, 25, 26], "visual": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "logdir": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "labelsmooth": 23, "someth": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "tensorflow": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "press": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44, 45], "ctrl": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44, 45], "engw8ksktzqs24zbv5dgcg": 23, "22t11": 23, "scan": [23, 25, 26, 28, 31, 39, 43, 44], "116068": 23, "scalar": [23, 25, 26, 28, 31, 39, 43, 44], "listen": [23, 25, 26, 31, 39, 43, 44], "url": [23, 25, 26, 28, 31, 33, 34, 39, 43, 44], "xxxx": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "saw": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "consol": [23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "avoid": [23, 25, 28], "nbest": [23, 28, 34], "lattic": [23, 26, 28, 31, 32, 36, 37, 44, 45], "uniqu": [23, 28, 31, 44, 45], "pkufool": [23, 26, 32], "icefall_asr_aishell_conformer_ctc": 23, "transcrib": [23, 25, 26, 28], "v1": [23, 26, 28, 32, 36, 37], "lang_char": [23, 25], "bac009s0764w0121": [23, 25, 26], "bac009s0764w0122": [23, 25, 26], "bac009s0764w0123": [23, 25, 26], "tran": [23, 26, 28, 32, 36, 37], "graph": [23, 26, 28, 31, 32, 36, 37, 44, 45], "id": [23, 26, 28, 32, 36, 37], "conveni": [23, 26, 28, 
29], "eo": [23, 26, 28], "soxi": [23, 25, 26, 28, 32, 39], "sampl": [23, 25, 26, 28, 32, 33, 39, 44, 45], "precis": [23, 25, 26, 28, 31, 32, 39, 44, 45], "67263": [23, 25, 26], "cdda": [23, 25, 26, 28, 32, 39], "sector": [23, 25, 26, 28, 32, 39], "135k": [23, 25, 26], "256k": [23, 25, 26, 28], "sign": [23, 25, 26, 28, 39], "integ": [23, 25, 26, 28, 39], "pcm": [23, 25, 26, 28, 39], "65840": [23, 25, 26], "625": [23, 25, 26], "132k": [23, 25, 26], "64000": [23, 25, 26], "300": [23, 25, 26, 28, 29, 31, 44], "128k": [23, 25, 26, 39], "displai": [23, 25, 26, 28], "topologi": [23, 28], "707": [23, 28], "num_decoder_lay": [23, 28], "vgg_frontend": [23, 25, 28], "use_feat_batchnorm": [23, 28], "f2fd997f752ed11bbef4c306652c433e83f9cf12": 23, "sun": 23, "sep": 23, "33cfe45": 23, "d57a873": 23, "nov": [23, 28], "hw": 23, "kangwei": 23, "icefall_aishell3": 23, "k2_releas": 23, "tokens_fil": 23, "words_fil": [23, 28, 39], "num_path": [23, 28, 31, 44, 45], "ngram_lm_scal": [23, 28], "attention_decoder_scal": [23, 28], "nbest_scal": [23, 28], "sos_id": [23, 28], "eos_id": [23, 28], "num_class": [23, 28, 39], "4336": [23, 25], "242": [23, 28], "131": [23, 28], "134": 23, "275": 23, "293": [23, 28], "704": [23, 36], "369": [23, 28], "\u751a": [23, 25], "\u81f3": [23, 25], "\u51fa": [23, 25], "\u73b0": [23, 25], "\u4ea4": [23, 25], "\u6613": [23, 25], "\u51e0": [23, 25], "\u4e4e": [23, 25], "\u505c": [23, 25], "\u6b62": 23, "\u7684": [23, 25, 26], "\u60c5": [23, 25], "\u51b5": [23, 25], "\u4e00": [23, 25], "\u4e8c": [23, 25], "\u7ebf": [23, 25, 26], "\u57ce": [23, 25], "\u5e02": [23, 25], "\u867d": [23, 25], "\u7136": [23, 25], "\u4e5f": [23, 25, 26], "\u5904": [23, 25], "\u4e8e": [23, 25], "\u8c03": [23, 25], "\u6574": [23, 25], "\u4e2d": [23, 25, 26], "\u4f46": [23, 25, 26], "\u56e0": [23, 25], "\u4e3a": [23, 25], "\u805a": [23, 25], "\u96c6": [23, 25], "\u4e86": [23, 25, 26], "\u8fc7": [23, 25], "\u591a": [23, 25], "\u516c": [23, 25], "\u5171": [23, 25], "\u8d44": [23, 25], 
"\u6e90": [23, 25], "371": 23, "683": 23, "651": [23, 39], "654": 23, "659": 23, "752": 23, "887": 23, "340": 23, "370": 23, "\u751a\u81f3": [23, 26], "\u51fa\u73b0": [23, 26], "\u4ea4\u6613": [23, 26], "\u51e0\u4e4e": [23, 26], "\u505c\u6b62": 23, "\u60c5\u51b5": [23, 26], "\u4e00\u4e8c": [23, 26], "\u57ce\u5e02": [23, 26], "\u867d\u7136": [23, 26], "\u5904\u4e8e": [23, 26], "\u8c03\u6574": [23, 26], "\u56e0\u4e3a": [23, 26], "\u805a\u96c6": [23, 26], "\u8fc7\u591a": [23, 26], "\u516c\u5171": [23, 26], "\u8d44\u6e90": [23, 26], "recor": [23, 28], "highest": [23, 28], "965": 23, "966": 23, "821": 23, "822": 23, "826": 23, "916": 23, "345": 23, "888": 23, "889": 23, "limit": [23, 25, 28, 41, 44], "upgrad": [23, 28], "pro": [23, 28], "finish": [23, 25, 26, 28, 29, 31, 32, 36, 37, 39, 44, 45], "NOT": [23, 25, 28, 39], "checkout": [23, 28], "hlg_decod": [23, 28], "four": [23, 28], "messag": [23, 28, 31, 33, 34, 43, 44, 45], "nn_model": [23, 28], "use_gpu": [23, 28], "word_tabl": [23, 28], "caution": [23, 28], "forward": [23, 28, 33], "89": 23, "cu": [23, 28], "int": [23, 28], "char": [23, 28], "98": 23, "150": [23, 28], "693": [23, 36], "165": [23, 28], "nnet_output": [23, 28], "489": 23, "mandarin": 24, "beij": 24, "shell": 24, "technologi": 24, "ltd": 24, "peopl": 24, "accent": 24, "area": 24, "china": 24, "invit": 24, "particip": 24, "conduct": 24, "quiet": 24, "indoor": 24, "high": 24, "fidel": 24, "microphon": 24, "16khz": 24, "manual": 24, "through": 24, "profession": 24, "annot": 24, "inspect": 24, "free": [24, 29, 43], "academ": 24, "moder": 24, "research": 24, "field": 24, "openslr": 24, "ctc": [24, 27, 30, 34, 35, 38], "stateless": [24, 27, 31, 43, 44, 45], "head": [25, 41], "conv1d": [25, 31, 43, 44, 45], "nn": [25, 31, 33, 34, 43, 44, 45], "tanh": 25, "borrow": 25, "ieeexplor": 25, "ieee": 25, "stamp": 25, "jsp": 25, "arnumb": 25, "9054419": 25, "predict": [25, 29, 31, 43, 44, 45], "charact": 25, "unit": 25, "vocabulari": 25, "87939824": 25, 
"optimized_transduc": 25, "technqiu": 25, "end": [25, 31, 33, 34, 39, 43, 44, 45], "furthermor": 25, "maximum": 25, "emit": 25, "per": [25, 31, 44, 45], "simplifi": [25, 41], "significantli": 25, "degrad": 25, "exactli": 25, "unprun": 25, "advantag": 25, "minim": 25, "pruned_transducer_stateless": [25, 31, 41, 44], "altern": 25, "though": 25, "transducer_stateless_modifi": 25, "pr": 25, "gb": 25, "ram": 25, "tri": 25, "prob": [25, 43], "219": [25, 28], "c": [25, 26, 31, 33, 34, 39, 43, 44, 45], "lagz6hrcqxoigbfd5e0y3q": 25, "03t14": 25, "8477": 25, "250": [25, 32], "sym": [25, 31, 44, 45], "beam_search": [25, 31, 44, 45], "decoding_method": 25, "beam_4": 25, "ensur": 25, "give": 25, "poor": 25, "531": [25, 26], "994": [25, 28], "027": 25, "encoder_out_dim": 25, "f4fefe4882bc0ae59af951da3f47335d5495ef71": 25, "50d2281": 25, "mar": 25, "0815224919": 25, "75d558775b": 25, "mmnv8": 25, "72": [25, 28], "248": 25, "878": [25, 37], "880": 25, "891": 25, "113": [25, 28], "userwarn": 25, "__floordiv__": 25, "round": 25, "toward": 25, "trunc": 25, "floor": 25, "keep": [25, 31, 44, 45], "div": 25, "b": [25, 28, 36, 37], "rounding_mod": 25, "divis": 25, "x_len": 25, "163": [25, 28], "\u6ede": 25, "322": 25, "760": 25, "919": 25, "922": 25, "929": 25, "046": 25, "047": 25, "319": [25, 28], "798": 25, "214": [25, 28], "215": [25, 28, 32], "402": 25, "topk_hyp_index": 25, "topk_index": 25, "logit": 25, "583": [25, 37], "lji9mwuorlow3jkdhxwk8a": 26, "13t11": 26, "4454": 26, "icefall_asr_aishell_tdnn_lstm_ctc": 26, "858": [26, 28], "154": 26, "161": [26, 28], "536": 26, "539": 26, "917": 26, "129": 26, "\u505c\u6ede": 26, "mmi": [27, 30], "blank": [27, 30], "skip": [27, 29, 30, 31, 43, 44, 45], "distil": [27, 30], "hubert": [27, 30], "ligru": [27, 35], "full": [28, 29, 31, 33, 34, 43, 44, 45], "libri": [28, 29, 31, 33, 34, 43, 44, 45], "subset": [28, 31, 33, 34, 43, 44, 45], "3x960": [28, 31, 33, 34, 43, 44, 45], "2880": [28, 31, 33, 34, 43, 44, 45], "lzgnetjwrxc3yghnmd4kpw": 28, 
"24t16": 28, "4540": 28, "sentenc": 28, "piec": 28, "And": [28, 31, 33, 34, 43, 44, 45], "neither": 28, "nor": 28, "5000": 28, "033": 28, "537": 28, "538": 28, "full_libri": [28, 29], "406": 28, "464": 28, "548": 28, "776": 28, "652": [28, 39], "109226120": 28, "714": [28, 36], "206": 28, "944": 28, "1328": 28, "443": [28, 32], "2563": 28, "494": 28, "592": 28, "1715": 28, "52576": 28, "128": 28, "1424": 28, "807": 28, "506": 28, "808": [28, 36], "522": 28, "362": 28, "565": 28, "1477": 28, "2922": 28, "208": 28, "4295": 28, "52343": 28, "396": 28, "3584": 28, "432": 28, "433": 28, "680": [28, 36], "_pickl": 28, "unpicklingerror": 28, "hlg_modifi": 28, "g_4_gram": [28, 32, 36, 37], "875": [28, 32], "212k": 28, "267440": [28, 32], "1253": [28, 32], "535k": 28, "83": [28, 32], "77200": [28, 32], "154k": 28, "554": 28, "7178d67e594bc7fa89c2b331ad7bd1c62a6a9eb4": 28, "8d93169": 28, "601": 28, "758": 28, "025": 28, "broffel": 28, "osom": 28, "723": 28, "775": 28, "881": 28, "234": 28, "571": 28, "whole": [28, 32, 36, 37, 44, 45], "857": 28, "979": 28, "980": 28, "055": 28, "117": 28, "051": 28, "363": 28, "959": [28, 37], "546": 28, "599": [28, 32], "833": 28, "834": 28, "915": 28, "076": 28, "110": 28, "397": 28, "999": [28, 31, 44, 45], "concaten": 28, "bucket": 28, "sampler": 28, "1000": 28, "ctc_decod": 28, "ngram_lm_rescor": 28, "attention_rescor": 28, "kind": [28, 31, 33, 34, 43, 44, 45], "105": 28, "221": 28, "125": [28, 39], "136": 28, "228": 28, "144": 28, "543": 28, "topo": 28, "547": 28, "729": 28, "702": 28, "703": 28, "545": 28, "279": 28, "122": 28, "126": 28, "135": [28, 39], "153": [28, 39], "945": 28, "475": 28, "191": [28, 36, 37], "398": 28, "515": 28, "w": [28, 36, 37], "deseri": 28, "441": 28, "fsaclass": 28, "loadfsa": 28, "const": 28, "string": 28, "c10": 28, "ignor": 28, "dummi": 28, "589": 28, "attention_scal": 28, "162": 28, "169": [28, 36, 37], "188": 28, "984": 28, "624": 28, "519": [28, 37], "632": 28, "645": [28, 39], "243": 28, "970": 28, 
"303": 28, "179": 28, "knowledg": 29, "vector": 29, "mvq": 29, "kd": 29, "pruned_transducer_stateless4": [29, 31, 41, 44], "theoret": 29, "applic": 29, "minor": 29, "out": 29, "thing": 29, "distillation_with_hubert": 29, "Of": 29, "cours": 29, "xl": 29, "proce": 29, "960h": [29, 33], "use_extracted_codebook": 29, "augment": 29, "th": [29, 36, 37], "fine": 29, "embedding_lay": 29, "num_codebook": 29, "under": 29, "vq_fbank_layer36_cb8": 29, "whola": 29, "snippet": 29, "echo": 29, "awk": 29, "split": 29, "_": 29, "pruned_transducer_stateless6": 29, "12359": 29, "spec": 29, "aug": 29, "warp": 29, "enabl": 29, "paid": 29, "suitabl": [31, 43, 44, 45], "pruned_transducer_stateless2": [31, 41, 44], "pruned_transducer_stateless5": [31, 41, 44], "scroll": [31, 33, 34, 43, 44, 45], "arxiv": [31, 43, 44, 45], "ab": [31, 43, 44, 45], "2206": [31, 43, 44, 45], "13236": [31, 43, 44, 45], "rework": [31, 41, 44], "daniel": [31, 44, 45], "joint": [31, 43, 44, 45], "contrari": [31, 43, 44, 45], "convent": [31, 43, 44, 45], "recurr": [31, 43, 44, 45], "2x": [31, 44, 45], "littl": [31, 44], "436000": [31, 33, 34, 43, 44, 45], "438000": [31, 33, 34, 43, 44, 45], "qogspbgsr8kzcrmmie9jgw": 31, "20t15": [31, 43, 44], "4468": [31, 43, 44], "210171": [31, 43, 44], "access": [31, 33, 34, 43, 44, 45], "6008": [31, 33, 34, 43, 44, 45], "localhost": [31, 33, 34, 43, 44, 45], "expos": [31, 33, 34, 43, 44, 45], "proxi": [31, 33, 34, 43, 44, 45], "bind_al": [31, 33, 34, 43, 44, 45], "fast_beam_search": [31, 33, 43, 44, 45], "474000": [31, 43, 44, 45], "largest": [31, 44, 45], "posterior": [31, 33, 44, 45], "algorithm": [31, 44, 45], "pdf": [31, 34, 44, 45], "1211": [31, 44, 45], "3711": [31, 44, 45], "espnet": [31, 44, 45], "net": [31, 44, 45], "beam_search_transduc": [31, 44, 45], "basicli": [31, 44, 45], "topk": [31, 44, 45], "expand": [31, 44, 45], "mode": [31, 44, 45], "being": [31, 44, 45], "hardcod": [31, 44, 45], "composit": [31, 44, 45], "log_prob": [31, 44, 45], "hard": [31, 41, 44, 45], 
"2211": [31, 44, 45], "00484": [31, 44, 45], "fast_beam_search_lg": [31, 44, 45], "trivial": [31, 44, 45], "fast_beam_search_nbest": [31, 44, 45], "random_path": [31, 44, 45], "shortest": [31, 44, 45], "fast_beam_search_nbest_lg": [31, 44, 45], "logic": [31, 44, 45], "smallest": [31, 43, 44, 45], "icefall_asr_librispeech_tdnn": 32, "lstm_ctc": 32, "flac": 32, "116k": 32, "140k": 32, "343k": 32, "164k": 32, "105k": 32, "174k": 32, "pretraind": 32, "170": 32, "584": [32, 37], "209": 32, "245": 32, "098": 32, "099": 32, "methond": [32, 36, 37], "403": 32, "631": 32, "190": 32, "121": 32, "010": 32, "guidanc": 33, "bigger": 33, "simpli": 33, "discard": 33, "prevent": 33, "lconv": 33, "encourag": [33, 34, 43], "stabil": [33, 34], "doesn": 33, "warm": [33, 34], "xyozukpeqm62hbilud4upa": [33, 34], "ctc_guide_decode_b": 33, "pretrained_ctc": 33, "jit_pretrained_ctc": 33, "100h": 33, "yfyeung": 33, "wechat": 34, "zipformer_mmi": 34, "worker": [34, 43], "hp": 34, "tdnn_ligru_ctc": 36, "enough": [36, 37, 39], "luomingshuang": [36, 37], "icefall_asr_timit_tdnn_ligru_ctc": 36, "pretrained_average_9_25": 36, "fdhc0_si1559": [36, 37], "felc0_si756": [36, 37], "fmgd0_si1564": [36, 37], "ffprobe": [36, 37], "show_format": [36, 37], "nistspher": [36, 37], "database_id": [36, 37], "database_vers": [36, 37], "utterance_id": [36, 37], "dhc0_si1559": [36, 37], "sample_min": [36, 37], "4176": [36, 37], "sample_max": [36, 37], "5984": [36, 37], "bitrat": [36, 37], "258": [36, 37], "audio": [36, 37], "pcm_s16le": [36, 37], "s16": [36, 37], "elc0_si756": [36, 37], "1546": [36, 37], "1989": [36, 37], "mgd0_si1564": [36, 37], "7626": [36, 37], "10573": [36, 37], "660": 36, "695": 36, "697": 36, "819": 36, "829": 36, "sil": [36, 37], "dh": [36, 37], "ih": [36, 37], "uw": [36, 37], "ah": [36, 37], "ii": [36, 37], "z": [36, 37], "aa": [36, 37], "ei": [36, 37], "dx": [36, 37], "d": [36, 37], "uh": [36, 37], "ng": [36, 37], "eh": [36, 37], "jh": [36, 37], "er": [36, 37], "ai": [36, 37], "hh": [36, 
37], "aw": 36, "ae": [36, 37], "705": 36, "715": 36, "720": 36, "ch": 36, "icefall_asr_timit_tdnn_lstm_ctc": 37, "pretrained_average_16_25": 37, "816": 37, "827": 37, "387": 37, "unk": 37, "739": 37, "971": 37, "977": 37, "978": 37, "981": 37, "ow": 37, "ykubhb5wrmosxykid1z9eg": 39, "23t23": 39, "icefall_asr_yesno_tdnn": 39, "l_disambig": 39, "lexicon_disambig": 39, "0_0_0_1_0_0_0_1": 39, "0_0_1_0_0_0_1_0": 39, "0_0_1_0_0_1_1_1": 39, "0_0_1_0_1_0_0_1": 39, "0_0_1_1_0_0_0_1": 39, "0_0_1_1_0_1_1_0": 39, "0_0_1_1_1_0_0_0": 39, "0_0_1_1_1_1_0_0": 39, "0_1_0_0_0_1_0_0": 39, "0_1_0_0_1_0_1_0": 39, "0_1_0_1_0_0_0_0": 39, "0_1_0_1_1_1_0_0": 39, "0_1_1_0_0_1_1_1": 39, "0_1_1_1_0_0_1_0": 39, "0_1_1_1_1_0_1_0": 39, "1_0_0_0_0_0_0_0": 39, "1_0_0_0_0_0_1_1": 39, "1_0_0_1_0_1_1_1": 39, "1_0_1_1_0_1_1_1": 39, "1_0_1_1_1_1_0_1": 39, "1_1_0_0_0_1_1_1": 39, "1_1_0_0_1_0_1_1": 39, "1_1_0_1_0_1_0_0": 39, "1_1_0_1_1_0_0_1": 39, "1_1_0_1_1_1_1_0": 39, "1_1_1_0_0_1_0_1": 39, "1_1_1_0_1_0_1_0": 39, "1_1_1_1_0_0_1_0": 39, "1_1_1_1_1_0_0_0": 39, "1_1_1_1_1_1_1_1": 39, "54080": 39, "507": 39, "108k": 39, "ye": 39, "hebrew": 39, "NO": 39, "621": 39, "119": 39, "650": 39, "139": 39, "143": 39, "198": 39, "181": 39, "186": 39, "187": 39, "213": 39, "correctli": 39, "simplest": 39, "former": 41, "idea": 41, "mask": [41, 44, 45], "wenet": 41, "did": 41, "metion": 41, "complic": 41, "techniqu": 41, "bank": 41, "memor": 41, "histori": 41, "introduc": 41, "variant": 41, "pruned_stateless_emformer_rnnt2": 41, "conv_emformer_transducer_stateless": 41, "ourself": 41, "mechan": 41, "onlin": 43, "lstm_transducer_stateless": 43, "lower": 43, "prepare_giga_speech": 43, "cj2vtpiwqhkn9q1tx6ptpg": 43, "dynam": [44, 45], "causal": 44, "short": [44, 45], "2012": 44, "05481": 44, "flag": 44, "indic": [44, 45], "whether": 44, "sequenc": [44, 45], "uniformli": [44, 45], "seen": [44, 45], "97vkxf80ru61cnp2alwzzg": 44, "streaming_decod": [44, 45], "wise": [44, 45], "parallel": [44, 45], "bath": [44, 45], 
"parallelli": [44, 45], "seem": 44, "benefit": 44, "mdoel": 44, "320m": 45, "550": 45, "scriptmodul": 45, "jit_trace_export": 45, "jit_trace_pretrain": 45, "task": 46}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"follow": 0, "code": 0, "style": 0, "contribut": [1, 3], "document": 1, "how": [2, 14, 20, 21], "creat": [2, 13], "recip": [2, 46], "data": [2, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "prepar": [2, 13, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "train": [2, 10, 13, 16, 17, 18, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "decod": [2, 5, 6, 7, 13, 14, 19, 23, 25, 26, 28, 29, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "pre": [2, 10, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "model": [2, 5, 10, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "lodr": [4, 6], "rnn": 4, "transduc": [4, 6, 7, 16, 17, 18, 25, 31, 43, 44, 45], "wer": [4, 6, 7, 28], "differ": [4, 6, 7], "beam": [4, 6, 7, 25], "size": [4, 6, 7], "languag": 5, "lm": [6, 28], "rescor": [6, 23, 28], "base": 6, "method": 6, "v": 6, "shallow": [6, 7], "fusion": [6, 7], "The": [6, 25], "number": 6, "each": 6, "field": 6, "i": 6, "test": [6, 7, 13, 16, 17, 18], "clean": [6, 7], "other": 6, "time": [6, 7], "frequent": 8, "ask": 8, "question": 8, "faq": 8, "oserror": 8, "libtorch_hip": 8, "so": 8, "cannot": 8, "open": 8, "share": 8, "object": 8, "file": [8, 19], "directori": 8, "attributeerror": 8, "modul": 8, "distutil": 8, "ha": 8, "attribut": 8, "version": 8, "importerror": 8, "libpython3": 8, "10": 8, "1": [8, 13, 16, 17, 18, 23, 25, 26, 28], "0": [8, 13], "No": 8, "huggingfac": [9, 11], "space": 11, "youtub": [11, 13], "video": [11, 13], "icefal": [12, 13, 16, 17, 18], "content": [12, 46], "instal": [13, 16, 17, 18, 23, 25, 26, 28, 32, 36, 37], "cuda": 13, "toolkit": 13, "cudnn": 13, "pytorch": 13, "torchaudio": 13, "2": [13, 16, 17, 18, 23, 25, 26, 28], 
"k2": 13, "3": [13, 16, 17, 18, 23, 25, 28], "lhots": 13, "4": [13, 16, 17, 18], "download": [13, 16, 17, 18, 19, 23, 25, 26, 28, 31, 32, 33, 34, 36, 37, 39, 43, 44, 45], "exampl": [13, 19, 23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "virtual": 13, "environ": 13, "activ": 13, "your": 13, "5": [13, 16, 17, 18], "export": [14, 15, 16, 17, 18, 19, 20, 21, 22, 31, 33, 34, 43, 44, 45], "state_dict": [14, 31, 33, 34, 43, 44, 45], "when": [14, 20, 21], "us": [14, 20, 21, 31, 33, 34, 43, 44, 45], "run": 14, "py": 14, "ncnn": [15, 16, 17, 18], "convemform": 16, "pnnx": [16, 17, 18], "via": [16, 17, 18], "torch": [16, 17, 18, 20, 21, 31, 33, 34, 43, 44, 45], "jit": [16, 17, 18, 20, 21, 31, 33, 34, 43, 44, 45], "trace": [16, 17, 18, 21, 43, 45], "torchscript": [16, 17, 18], "6": [16, 17, 18], "modifi": [16, 17, 18, 25], "encod": [16, 17, 18], "sherpa": [16, 17, 18, 19, 31, 44, 45], "7": [16, 17], "option": [16, 17, 23, 26, 28, 31, 33, 34, 43, 44, 45], "int8": [16, 17], "quantiz": [16, 17], "lstm": [17, 26, 32, 37, 43], "stream": [18, 27, 40, 41, 44, 45], "zipform": [18, 33, 34, 45], "onnx": 19, "sound": 19, "script": [20, 31, 33, 34, 44, 45], "conform": [23, 28, 41], "ctc": [23, 26, 28, 32, 33, 36, 37, 39], "configur": [23, 26, 28, 31, 33, 34, 43, 44, 45], "log": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "usag": [23, 25, 26, 28, 31, 33, 34, 43, 44, 45], "case": [23, 25, 26, 28], "kaldifeat": [23, 25, 26, 28, 32, 36, 37, 39], "hlg": [23, 26, 28], "attent": [23, 28], "colab": [23, 25, 26, 28, 32, 36, 37, 39], "notebook": [23, 25, 26, 28, 32, 36, 37, 39], "deploy": [23, 28], "c": [23, 28], "aishel": 24, "stateless": 25, "loss": 25, "todo": 25, "greedi": 25, "search": 25, "tdnn": [26, 32, 36, 37, 39], "non": 27, "asr": [27, 40], "comput": 28, "n": 28, "gram": 28, "distil": 29, "hubert": 29, "codebook": 29, "index": 29, "librispeech": [30, 42], "prune": [31, 44], "statelessx": [31, 44], "pretrain": [31, 33, 34, 43, 44, 45], "deploi": [31, 44, 45], "infer": [32, 36, 37, 39], "blank": 
33, "skip": 33, "mmi": 34, "timit": 35, "ligru": 36, "yesno": 38, "introduct": 41, "emform": 41, "which": 43, "simul": [44, 45], "real": [44, 45], "tabl": 46}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.todo": 2, "sphinx": 57}, "alltitles": {"Follow the code style": [[0, "follow-the-code-style"]], "Contributing to Documentation": [[1, "contributing-to-documentation"]], "How to create a recipe": [[2, "how-to-create-a-recipe"]], "Data Preparation": [[2, "data-preparation"], [25, "data-preparation"]], "Training": [[2, "training"], [13, "training"], [23, "training"], [25, "training"], [26, "training"], [28, "training"], [29, "training"], [31, "training"], [32, "training"], [33, "training"], [34, "training"], [36, "training"], [37, "training"], [39, "training"], [43, "training"], [44, "training"], [45, "training"]], "Decoding": [[2, "decoding"], [13, "decoding"], [23, "decoding"], [25, "decoding"], [26, "decoding"], [28, "decoding"], [29, "decoding"], [31, "decoding"], [32, "decoding"], [33, "decoding"], [34, "decoding"], [36, "decoding"], [37, "decoding"], [39, "decoding"], [43, "decoding"], [44, "decoding"], [45, "decoding"]], "Pre-trained model": [[2, "pre-trained-model"]], "Contributing": [[3, "contributing"]], "LODR for RNN Transducer": [[4, "lodr-for-rnn-transducer"]], "WER of LODR with different beam sizes": [[4, "id1"]], "Decoding with language models": [[5, "decoding-with-language-models"]], "LM rescoring for Transducer": [[6, "lm-rescoring-for-transducer"]], "WERs of LM rescoring with different beam sizes": [[6, "id1"]], "WERs of LM rescoring + LODR with different beam sizes": [[6, "id2"]], "LM-rescoring-based methods vs shallow-fusion-based methods (The numbers in each field is WER on test-clean, WER on 
test-other and decoding time on test-clean)": [[6, "id3"]], "Shallow fusion for Transducer": [[7, "shallow-fusion-for-transducer"]], "WERs and decoding time (on test-clean) of shallow fusion with different beam sizes": [[7, "id2"]], "Frequently Asked Questions (FAQs)": [[8, "frequently-asked-questions-faqs"]], "OSError: libtorch_hip.so: cannot open shared object file: no such file or directory": [[8, "oserror-libtorch-hip-so-cannot-open-shared-object-file-no-such-file-or-directory"]], "AttributeError: module \u2018distutils\u2019 has no attribute \u2018version\u2019": [[8, "attributeerror-module-distutils-has-no-attribute-version"]], "ImportError: libpython3.10.so.1.0: cannot open shared object file: No such file or directory": [[8, "importerror-libpython3-10-so-1-0-cannot-open-shared-object-file-no-such-file-or-directory"]], "Huggingface": [[9, "huggingface"]], "Pre-trained models": [[10, "pre-trained-models"]], "Huggingface spaces": [[11, "huggingface-spaces"]], "YouTube Video": [[11, "youtube-video"], [13, "youtube-video"]], "Icefall": [[12, "icefall"]], "Contents:": [[12, null]], "Installation": [[13, "installation"]], "(0) Install CUDA toolkit and cuDNN": [[13, "install-cuda-toolkit-and-cudnn"]], "(1) Install PyTorch and torchaudio": [[13, "install-pytorch-and-torchaudio"]], "(2) Install k2": [[13, "install-k2"]], "(3) Install lhotse": [[13, "install-lhotse"]], "(4) Download icefall": [[13, "download-icefall"]], "Installation example": [[13, "installation-example"]], "(1) Create a virtual environment": [[13, "create-a-virtual-environment"]], "(2) Activate your virtual environment": [[13, "activate-your-virtual-environment"]], "(3) Install k2": [[13, "id1"]], "(4) Install lhotse": [[13, "id2"]], "(5) Download icefall": [[13, "id3"]], "Test Your Installation": [[13, "test-your-installation"]], "Data preparation": [[13, "data-preparation"], [23, "data-preparation"], [26, "data-preparation"], [28, "data-preparation"], [29, "data-preparation"], [31, 
"data-preparation"], [32, "data-preparation"], [33, "data-preparation"], [34, "data-preparation"], [36, "data-preparation"], [37, "data-preparation"], [39, "data-preparation"], [43, "data-preparation"], [44, "data-preparation"], [45, "data-preparation"]], "Export model.state_dict()": [[14, "export-model-state-dict"], [31, "export-model-state-dict"], [33, "export-model-state-dict"], [34, "export-model-state-dict"], [43, "export-model-state-dict"], [44, "export-model-state-dict"], [45, "export-model-state-dict"]], "When to use it": [[14, "when-to-use-it"], [20, "when-to-use-it"], [21, "when-to-use-it"]], "How to export": [[14, "how-to-export"], [20, "how-to-export"], [21, "how-to-export"]], "How to use the exported model": [[14, "how-to-use-the-exported-model"], [20, "how-to-use-the-exported-model"]], "Use the exported model to run decode.py": [[14, "use-the-exported-model-to-run-decode-py"]], "Export to ncnn": [[15, "export-to-ncnn"]], "Export ConvEmformer transducer models to ncnn": [[16, "export-convemformer-transducer-models-to-ncnn"]], "1. Download the pre-trained model": [[16, "download-the-pre-trained-model"], [17, "download-the-pre-trained-model"], [18, "download-the-pre-trained-model"]], "2. Install ncnn and pnnx": [[16, "install-ncnn-and-pnnx"], [17, "install-ncnn-and-pnnx"], [18, "install-ncnn-and-pnnx"]], "3. Export the model via torch.jit.trace()": [[16, "export-the-model-via-torch-jit-trace"], [17, "export-the-model-via-torch-jit-trace"], [18, "export-the-model-via-torch-jit-trace"]], "4. Export torchscript model via pnnx": [[16, "export-torchscript-model-via-pnnx"], [17, "export-torchscript-model-via-pnnx"], [18, "export-torchscript-model-via-pnnx"]], "5. Test the exported models in icefall": [[16, "test-the-exported-models-in-icefall"], [17, "test-the-exported-models-in-icefall"], [18, "test-the-exported-models-in-icefall"]], "6. 
Modify the exported encoder for sherpa-ncnn": [[16, "modify-the-exported-encoder-for-sherpa-ncnn"], [17, "modify-the-exported-encoder-for-sherpa-ncnn"], [18, "modify-the-exported-encoder-for-sherpa-ncnn"]], "7. (Optional) int8 quantization with sherpa-ncnn": [[16, "optional-int8-quantization-with-sherpa-ncnn"], [17, "optional-int8-quantization-with-sherpa-ncnn"]], "Export LSTM transducer models to ncnn": [[17, "export-lstm-transducer-models-to-ncnn"]], "Export streaming Zipformer transducer models to ncnn": [[18, "export-streaming-zipformer-transducer-models-to-ncnn"]], "Export to ONNX": [[19, "export-to-onnx"]], "sherpa-onnx": [[19, "sherpa-onnx"]], "Example": [[19, "example"]], "Download the pre-trained model": [[19, "download-the-pre-trained-model"], [23, "download-the-pre-trained-model"], [25, "download-the-pre-trained-model"], [26, "download-the-pre-trained-model"], [28, "download-the-pre-trained-model"], [32, "download-the-pre-trained-model"], [36, "download-the-pre-trained-model"], [37, "download-the-pre-trained-model"], [39, "download-the-pre-trained-model"]], "Export the model to ONNX": [[19, "export-the-model-to-onnx"]], "Decode sound files with exported ONNX models": [[19, "decode-sound-files-with-exported-onnx-models"]], "Export model with torch.jit.script()": [[20, "export-model-with-torch-jit-script"]], "Export model with torch.jit.trace()": [[21, "export-model-with-torch-jit-trace"]], "How to use the exported models": [[21, "how-to-use-the-exported-models"]], "Model export": [[22, "model-export"]], "Conformer CTC": [[23, "conformer-ctc"], [28, "conformer-ctc"]], "Configurable options": [[23, "configurable-options"], [26, "configurable-options"], [28, "configurable-options"], [31, "configurable-options"], [33, "configurable-options"], [34, "configurable-options"], [43, "configurable-options"], [44, "configurable-options"], [45, "configurable-options"]], "Pre-configured options": [[23, "pre-configured-options"], [26, "pre-configured-options"], [28, 
"pre-configured-options"], [31, "pre-configured-options"], [33, "pre-configured-options"], [34, "pre-configured-options"], [43, "pre-configured-options"], [44, "pre-configured-options"], [45, "pre-configured-options"]], "Training logs": [[23, "training-logs"], [25, "training-logs"], [26, "training-logs"], [28, "training-logs"], [31, "training-logs"], [33, "training-logs"], [34, "training-logs"], [43, "training-logs"], [44, "training-logs"], [45, "training-logs"]], "Usage examples": [[23, "usage-examples"], [25, "usage-examples"], [26, "usage-examples"], [28, "usage-examples"]], "Case 1": [[23, "case-1"], [25, "case-1"], [26, "case-1"], [28, "case-1"]], "Case 2": [[23, "case-2"], [25, "case-2"], [26, "case-2"], [28, "case-2"]], "Case 3": [[23, "case-3"], [25, "case-3"], [28, "case-3"]], "Pre-trained Model": [[23, "pre-trained-model"], [25, "pre-trained-model"], [26, "pre-trained-model"], [28, "pre-trained-model"], [32, "pre-trained-model"], [36, "pre-trained-model"], [37, "pre-trained-model"], [39, "pre-trained-model"]], "Install kaldifeat": [[23, "install-kaldifeat"], [25, "install-kaldifeat"], [26, "install-kaldifeat"], [28, "install-kaldifeat"], [32, "install-kaldifeat"], [36, "install-kaldifeat"], [37, "install-kaldifeat"]], "Usage": [[23, "usage"], [25, "usage"], [26, "usage"], [28, "usage"]], "CTC decoding": [[23, "ctc-decoding"], [28, "ctc-decoding"], [28, "id2"]], "HLG decoding": [[23, "hlg-decoding"], [23, "id2"], [26, "hlg-decoding"], [28, "hlg-decoding"], [28, "id3"]], "HLG decoding + attention decoder rescoring": [[23, "hlg-decoding-attention-decoder-rescoring"]], "Colab notebook": [[23, "colab-notebook"], [25, "colab-notebook"], [26, "colab-notebook"], [28, "colab-notebook"], [32, "colab-notebook"], [36, "colab-notebook"], [37, "colab-notebook"], [39, "colab-notebook"]], "Deployment with C++": [[23, "deployment-with-c"], [28, "deployment-with-c"]], "aishell": [[24, "aishell"]], "Stateless Transducer": [[25, "stateless-transducer"]], "The Model": [[25, 
"the-model"]], "The Loss": [[25, "the-loss"]], "Todo": [[25, "id1"]], "Greedy search": [[25, "greedy-search"]], "Beam search": [[25, "beam-search"]], "Modified Beam search": [[25, "modified-beam-search"]], "TDNN-LSTM CTC": [[26, "tdnn-lstm-ctc"]], "Non Streaming ASR": [[27, "non-streaming-asr"]], "HLG decoding + LM rescoring": [[28, "hlg-decoding-lm-rescoring"]], "HLG decoding + LM rescoring + attention decoder rescoring": [[28, "hlg-decoding-lm-rescoring-attention-decoder-rescoring"]], "Compute WER with the pre-trained model": [[28, "compute-wer-with-the-pre-trained-model"]], "HLG decoding + n-gram LM rescoring": [[28, "hlg-decoding-n-gram-lm-rescoring"]], "HLG decoding + n-gram LM rescoring + attention decoder rescoring": [[28, "hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring"]], "Distillation with HuBERT": [[29, "distillation-with-hubert"]], "Codebook index preparation": [[29, "codebook-index-preparation"]], "LibriSpeech": [[30, "librispeech"], [42, "librispeech"]], "Pruned transducer statelessX": [[31, "pruned-transducer-statelessx"], [44, "pruned-transducer-statelessx"]], "Usage example": [[31, "usage-example"], [33, "usage-example"], [34, "usage-example"], [43, "usage-example"], [44, "usage-example"], [45, "usage-example"]], "Export Model": [[31, "export-model"], [44, "export-model"], [45, "export-model"]], "Export model using torch.jit.script()": [[31, "export-model-using-torch-jit-script"], [33, "export-model-using-torch-jit-script"], [34, "export-model-using-torch-jit-script"], [44, "export-model-using-torch-jit-script"], [45, "export-model-using-torch-jit-script"]], "Download pretrained models": [[31, "download-pretrained-models"], [33, "download-pretrained-models"], [34, "download-pretrained-models"], [43, "download-pretrained-models"], [44, "download-pretrained-models"], [45, "download-pretrained-models"]], "Deploy with Sherpa": [[31, "deploy-with-sherpa"], [44, "deploy-with-sherpa"], [45, "deploy-with-sherpa"]], "TDNN-LSTM-CTC": [[32, 
"tdnn-lstm-ctc"], [37, "tdnn-lstm-ctc"]], "Inference with a pre-trained model": [[32, "inference-with-a-pre-trained-model"], [36, "inference-with-a-pre-trained-model"], [37, "inference-with-a-pre-trained-model"], [39, "inference-with-a-pre-trained-model"]], "Zipformer CTC Blank Skip": [[33, "zipformer-ctc-blank-skip"]], "Export models": [[33, "export-models"], [34, "export-models"], [43, "export-models"]], "Zipformer MMI": [[34, "zipformer-mmi"]], "TIMIT": [[35, "timit"]], "TDNN-LiGRU-CTC": [[36, "tdnn-ligru-ctc"]], "YesNo": [[38, "yesno"]], "TDNN-CTC": [[39, "tdnn-ctc"]], "Download kaldifeat": [[39, "download-kaldifeat"]], "Streaming ASR": [[40, "streaming-asr"]], "Introduction": [[41, "introduction"]], "Streaming Conformer": [[41, "streaming-conformer"]], "Streaming Emformer": [[41, "streaming-emformer"]], "LSTM Transducer": [[43, "lstm-transducer"]], "Which model to use": [[43, "which-model-to-use"]], "Export model using torch.jit.trace()": [[43, "export-model-using-torch-jit-trace"], [45, "export-model-using-torch-jit-trace"]], "Simulate streaming decoding": [[44, "simulate-streaming-decoding"], [45, "simulate-streaming-decoding"]], "Real streaming decoding": [[44, "real-streaming-decoding"], [45, "real-streaming-decoding"]], "Zipformer Transducer": [[45, "zipformer-transducer"]], "Recipes": [[46, "recipes"]], "Table of Contents": [[46, null]]}, "indexentries": {}}) \ No newline at end of file