mirror of https://github.com/k2-fsa/icefall.git
synced 2025-08-09 18:12:19 +00:00

deploy: 099cd3a215a4f840bf6312b62aaa4693af31fc51

commit 45a5750eda (parent 67dbb25620)
@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: cfc3e6ecc44ed7573f700065af8738a7
+config: 3ca2e66d59e42ffdb5e0a5ba2153f99e
 tags: 645f666f9bcd5a90fca523b33c5a78b7
BIN  _images/librispeech-lstm-transducer-tensorboard-log.png  (new file, 413 KiB)
Binary file not shown.
@ -6,3 +6,4 @@ LibriSpeech

   tdnn_lstm_ctc
   conformer_ctc
   lstm_pruned_stateless_transducer
@ -0,0 +1,625 @@
Transducer
==========

.. hint::

   Please scroll down to the bottom of this page to find download links
   for pretrained models if you don't want to train a model from scratch.


This tutorial shows you how to train a transducer model
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.

We use pruned RNN-T to compute the loss.

.. note::

   You can find the paper about pruned RNN-T at the following address:

   `<https://arxiv.org/abs/2206.13236>`_

The transducer model consists of 3 parts:

  - Encoder, a.k.a. transcriber. We use an LSTM model.
  - Decoder, a.k.a. predictor. We use a model consisting of ``nn.Embedding``
    and ``nn.Conv1d``.
  - Joiner, a.k.a. the joint network.

.. caution::

   Contrary to conventional RNN-T models, we use a stateless decoder.
   That is, it has no recurrent connections.

.. hint::

   Since the encoder model is an LSTM, not a Transformer/Conformer, the
   resulting model is suitable for streaming/online ASR.

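To make the three parts concrete, below is a minimal, self-contained PyTorch
sketch. It is an illustration only: the class names, dimensions, and the plain
(unpruned) joiner are invented for this example and do not match the actual
recipe code, which you can find in the folders linked in the next section.

.. code-block:: python

   import torch
   import torch.nn as nn


   class StatelessDecoder(nn.Module):
       """Stateless "predictor": an embedding followed by a 1-D convolution
       over a fixed number of previous tokens -- no recurrent connections."""

       def __init__(self, vocab_size: int, dim: int, context_size: int = 2):
           super().__init__()
           self.embedding = nn.Embedding(vocab_size, dim)
           self.conv = nn.Conv1d(dim, dim, kernel_size=context_size)

       def forward(self, y: torch.Tensor) -> torch.Tensor:
           # y: (N, U) previous token IDs
           emb = self.embedding(y).permute(0, 2, 1)  # (N, C, U)
           return self.conv(emb).permute(0, 2, 1)    # (N, U', C)


   class Joiner(nn.Module):
       """Joint network combining encoder and decoder outputs."""

       def __init__(self, dim: int, vocab_size: int):
           super().__init__()
           self.output = nn.Linear(dim, vocab_size)

       def forward(self, enc: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
           # enc: (N, T, 1, C), dec: (N, 1, U, C) -> logits: (N, T, U, V)
           return self.output(torch.tanh(enc + dec))


   # Encoder, a.k.a. transcriber: a plain LSTM over acoustic features.
   encoder = nn.LSTM(input_size=80, hidden_size=512, num_layers=3, batch_first=True)

   features = torch.randn(2, 100, 80)       # (N, T, feature_dim)
   tokens = torch.randint(0, 500, (2, 10))  # (N, U) previous tokens

   enc_out, _ = encoder(features)                # (N, T, 512)
   dec_out = StatelessDecoder(500, 512)(tokens)  # (N, U-1, 512) for context_size=2
   logits = Joiner(512, 500)(enc_out.unsqueeze(2), dec_out.unsqueeze(1))
   print(logits.shape)  # torch.Size([2, 100, 9, 500])

The (pruned) RNN-T loss is then computed from logits of this ``(N, T, U, V)``
shape; with pruning, only a narrow band of the ``T x U`` grid is actually
evaluated, which keeps the loss computation affordable.
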
Which model to use
------------------

Currently, there are two folders containing LSTM stateless transducer training recipes:

  - ``(1)`` `<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless>`_

    This recipe uses only LibriSpeech during training.

  - ``(2)`` `<https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2>`_

    This recipe uses GigaSpeech + LibriSpeech during training.

``(1)`` and ``(2)`` use the same model architecture. The only difference is that ``(2)`` supports
multi-dataset training. Since ``(2)`` uses more data, it has a lower WER than ``(1)``, but it needs
more training time.

We use ``lstm_transducer_stateless2`` as an example below.

.. note::

   You need to download the `GigaSpeech <https://github.com/SpeechColab/GigaSpeech>`_ dataset
   to run ``(2)``. If you have only the ``LibriSpeech`` dataset available, feel free to use ``(1)``.

Data preparation
----------------

.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./prepare.sh

   # If you use (1), you can **skip** the following command
   $ ./prepare_giga_speech.sh

The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is to run it.

The data preparation contains several stages. You can use the following two
options:

  - ``--stage``
  - ``--stop-stage``

to control which stage(s) should be run. By default, all stages are executed.

For example,

.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./prepare.sh --stage 0 --stop-stage 0

means to run only stage 0.

To run stage 2 to stage 5, use:

.. code-block:: bash

   $ ./prepare.sh --stage 2 --stop-stage 5

.. hint::

   If you have pre-downloaded the `LibriSpeech <https://www.openslr.org/12>`_
   dataset and the `musan <http://www.openslr.org/17/>`_ dataset, say,
   they are saved in ``/tmp/LibriSpeech`` and ``/tmp/musan``, you can modify
   the ``dl_dir`` variable in ``./prepare.sh`` to point to ``/tmp`` so that
   ``./prepare.sh`` won't re-download them.

.. note::

   All files generated by ``./prepare.sh``, e.g., features, lexicon, etc.,
   are saved in the ``./data`` directory.

We provide the following YouTube video showing how to run ``./prepare.sh``.

.. note::

   To get the latest news about `next-gen Kaldi <https://github.com/k2-fsa>`_, please subscribe
   to the following YouTube channel by `Nadira Povey <https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_:

   `<https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_

.. youtube:: ofEIoJL-mGM

Training
--------

Configurable options
~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./lstm_transducer_stateless2/train.py --help

shows you the training options that can be passed from the command line.
The following options are used quite often:

  - ``--full-libri``

    If it's True, the training part uses all the training data, i.e.,
    960 hours. Otherwise, the training part uses only the subset
    ``train-clean-100``, which has 100 hours of training data.

    .. CAUTION::

       The training set is perturbed by speed with two factors: 0.9 and 1.1.
       If ``--full-libri`` is True, each epoch actually processes
       ``3x960 == 2880`` hours of data.

  - ``--num-epochs``

    It is the number of epochs to train. For instance,
    ``./lstm_transducer_stateless2/train.py --num-epochs 30`` trains for 30 epochs
    and generates ``epoch-1.pt``, ``epoch-2.pt``, ..., ``epoch-30.pt``
    in the folder ``./lstm_transducer_stateless2/exp``.

  - ``--start-epoch``

    It's used to resume training.
    ``./lstm_transducer_stateless2/train.py --start-epoch 10`` loads the
    checkpoint ``./lstm_transducer_stateless2/exp/epoch-9.pt`` and starts
    training from epoch 10, based on the state from epoch 9.

  - ``--world-size``

    It is used for multi-GPU single-machine DDP training.

      - (a) If it is 1, then no DDP training is used.

      - (b) If it is 2, then GPU 0 and GPU 1 are used for DDP training.

    The following shows some use cases with it.

      **Use case 1**: You have 4 GPUs, but you only want to use GPU 0 and
      GPU 2 for training. You can do the following:

        .. code-block:: bash

           $ cd egs/librispeech/ASR
           $ export CUDA_VISIBLE_DEVICES="0,2"
           $ ./lstm_transducer_stateless2/train.py --world-size 2

      **Use case 2**: You have 4 GPUs and you want to use all of them
      for training. You can do the following:

        .. code-block:: bash

           $ cd egs/librispeech/ASR
           $ ./lstm_transducer_stateless2/train.py --world-size 4

      **Use case 3**: You have 4 GPUs but you only want to use GPU 3
      for training. You can do the following:

        .. code-block:: bash

           $ cd egs/librispeech/ASR
           $ export CUDA_VISIBLE_DEVICES="3"
           $ ./lstm_transducer_stateless2/train.py --world-size 1

    .. caution::

       Only multi-GPU single-machine DDP training is implemented at present.
       Multi-GPU multi-machine DDP training will be added later.

  - ``--max-duration``

    It specifies the total number of seconds over all utterances in a
    batch, before **padding**.
    If you encounter CUDA OOM, please reduce it.

    .. HINT::

       Due to padding, the number of seconds of all utterances in a
       batch will usually be larger than ``--max-duration``.

       A larger value for ``--max-duration`` may cause OOM during training,
       while a smaller value may increase the training time. You have to
       tune it.

  - ``--giga-prob``

    The probability of selecting a batch from the ``GigaSpeech`` dataset
    (see the sketch after this list for an illustration).
    Note: It is available only for ``(2)``.

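To illustrate what ``--giga-prob`` controls, here is a small sketch of how a
multi-dataset loader can mix the two corpora. It is an assumption-level
illustration only: the real recipe builds its data loaders with lhotse, and
``libri_batches`` / ``giga_batches`` below are invented stand-ins.

.. code-block:: python

   import random
   from typing import Iterator


   def mixed_batches(
       libri_batches: Iterator, giga_batches: Iterator, giga_prob: float = 0.9
   ) -> Iterator:
       """Yield batches, drawing from GigaSpeech with probability giga_prob
       and from LibriSpeech otherwise (illustrative only)."""
       while True:
           source = giga_batches if random.random() < giga_prob else libri_batches
           try:
               yield next(source)
           except StopIteration:
               return


   libri = iter(range(100))        # stand-ins for LibriSpeech batches
   giga = iter(range(1000, 1100))  # stand-ins for GigaSpeech batches
   for _, batch in zip(range(5), mixed_batches(libri, giga, giga_prob=0.9)):
       print(batch)
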
Pre-configured options
~~~~~~~~~~~~~~~~~~~~~~

There are some training options, e.g., weight decay,
number of warmup steps, results dir, etc.,
that are not passed from the command line.
They are pre-configured by the function ``get_params()`` in
`lstm_transducer_stateless2/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/train.py>`_.

You don't need to change these pre-configured parameters. If you really need to change
them, please modify ``./lstm_transducer_stateless2/train.py`` directly.

Training logs
~~~~~~~~~~~~~

Training logs and checkpoints are saved in ``lstm_transducer_stateless2/exp``.
You will find the following files in that directory:

  - ``epoch-1.pt``, ``epoch-2.pt``, ...

    These are checkpoint files saved at the end of each epoch, containing model
    ``state_dict`` and optimizer ``state_dict``.
    To resume training from some checkpoint, say ``epoch-10.pt``, you can use:

      .. code-block:: bash

         $ ./lstm_transducer_stateless2/train.py --start-epoch 11

  - ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ...

    These are checkpoint files saved every ``--save-every-n`` batches,
    containing model ``state_dict`` and optimizer ``state_dict``.
    To resume training from some checkpoint, say ``checkpoint-436000.pt``, you can use:

      .. code-block:: bash

         $ ./lstm_transducer_stateless2/train.py --start-batch 436000

  - ``tensorboard/``

    This folder contains TensorBoard logs. Training loss, validation loss, learning
    rate, etc., are recorded in these logs. You can visualize them by:

      .. code-block:: bash

         $ cd lstm_transducer_stateless2/exp/tensorboard
         $ tensorboard dev upload --logdir . --description "LSTM transducer training for LibriSpeech with icefall"

    It will print something like the following:

      .. code-block::

         TensorFlow installation not found - running with reduced feature set.
         Upload started and will continue reading any new data as it's added to the logdir.

         To stop uploading, press Ctrl-C.

         New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/cj2vtPiwQHKN9Q1tx6PTpg/

         [2022-09-20T15:50:50] Started scanning logdir.
         Uploading 4468 scalars...
         [2022-09-20T15:53:02] Total uploaded: 210171 scalars, 0 tensors, 0 binary objects
         Listening for new data in logdir...

    Note that there is a URL in the above output. Click it and you will see
    the following screenshot:

      .. figure:: images/librispeech-lstm-transducer-tensorboard-log.png
         :width: 600
         :alt: TensorBoard screenshot
         :align: center
         :target: https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/

         TensorBoard screenshot.

    .. hint::

       If you don't have access to Google, you can use the following command
       to view the TensorBoard log locally:

         .. code-block:: bash

            cd lstm_transducer_stateless2/exp/tensorboard
            tensorboard --logdir . --port 6008

       It will print the following message:

         .. code-block::

            Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
            TensorBoard 2.8.0 at http://localhost:6008/ (Press CTRL+C to quit)

       Now start your browser and go to `<http://localhost:6008>`_ to view the TensorBoard
       logs.

  - ``log/log-train-xxxx``

    It is the detailed training log in text format, the same as the one
    printed to the console during training.

Usage example
~~~~~~~~~~~~~

You can use the following command to start the training using 8 GPUs:

.. code-block:: bash

   export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
   ./lstm_transducer_stateless2/train.py \
     --world-size 8 \
     --num-epochs 35 \
     --start-epoch 1 \
     --full-libri 1 \
     --exp-dir lstm_transducer_stateless2/exp \
     --max-duration 500 \
     --use-fp16 0 \
     --lr-epochs 10 \
     --num-workers 2 \
     --giga-prob 0.9

Decoding
--------

The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.

.. hint::

   There are two kinds of checkpoints:

     - (1) ``epoch-1.pt``, ``epoch-2.pt``, ..., which are saved at the end
       of each epoch. You can pass ``--epoch`` to
       ``lstm_transducer_stateless2/decode.py`` to use them.

     - (2) ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ..., which are saved
       every ``--save-every-n`` batches. You can pass ``--iter`` to
       ``lstm_transducer_stateless2/decode.py`` to use them.

   We suggest that you try both types of checkpoints and choose the one
   that produces the lowest WERs.

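The ``--avg`` option that appears in the decoding commands below averages the
parameters of several consecutive checkpoints before decoding, which usually
gives a small WER improvement over using a single checkpoint. The following is
a minimal sketch of plain ``state_dict`` averaging, for illustration only; the
actual ``decode.py`` can use a more elaborate scheme (e.g. when
``--use-averaged-model True`` is passed), and the ``"model"`` key assumed below
should be checked against your checkpoint files.

.. code-block:: python

   from pathlib import Path

   import torch


   def average_checkpoints(filenames, device="cpu"):
       """Average the model state_dicts stored in several checkpoint files."""
       avg = None
       for f in filenames:
           # Assumption: the model weights live under the "model" key.
           state = torch.load(f, map_location=device)["model"]
           if avg is None:
               avg = {k: v.clone().float() for k, v in state.items()}
           else:
               for k in avg:
                   avg[k] += state[k].float()
       for k in avg:
           avg[k] /= len(filenames)
       return avg


   exp_dir = Path("lstm_transducer_stateless2/exp")
   # e.g. --epoch 17 --avg 2 would average epoch-16.pt and epoch-17.pt:
   ckpts = [exp_dir / f"epoch-{i}.pt" for i in (16, 17)]
   # averaged = average_checkpoints(ckpts)
   # model.load_state_dict(averaged)  # then decode with the averaged weights
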
.. code-block:: bash

   $ cd egs/librispeech/ASR
   $ ./lstm_transducer_stateless2/decode.py --help

shows the options for decoding.

The following shows two examples (the first selects checkpoints by ``--epoch``,
the second by ``--iter``):

.. code-block:: bash

   for m in greedy_search fast_beam_search modified_beam_search; do
     for epoch in 17; do
       for avg in 1 2; do
         ./lstm_transducer_stateless2/decode.py \
           --epoch $epoch \
           --avg $avg \
           --exp-dir lstm_transducer_stateless2/exp \
           --max-duration 600 \
           --num-encoder-layers 12 \
           --rnn-hidden-size 1024 \
           --decoding-method $m \
           --use-averaged-model True \
           --beam 4 \
           --max-contexts 4 \
           --max-states 8 \
           --beam-size 4
       done
     done
   done

.. code-block:: bash

   for m in greedy_search fast_beam_search modified_beam_search; do
     for iter in 474000; do
       for avg in 8 10 12 14 16 18; do
         ./lstm_transducer_stateless2/decode.py \
           --iter $iter \
           --avg $avg \
           --exp-dir lstm_transducer_stateless2/exp \
           --max-duration 600 \
           --num-encoder-layers 12 \
           --rnn-hidden-size 1024 \
           --decoding-method $m \
           --use-averaged-model True \
           --beam 4 \
           --max-contexts 4 \
           --max-states 8 \
           --beam-size 4
       done
     done
   done

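For reference, the ``greedy_search`` method used above decodes the transducer
frame by frame, always taking the most probable symbol. Below is a highly
simplified, self-contained sketch of the idea; the toy ``decoder`` and
``joiner`` functions are random stand-ins invented for this example, and the
real implementation (which also handles batching, ``--max-sym-per-frame``,
etc.) lives in the recipe's decoding code.

.. code-block:: python

   import torch


   def greedy_search(joiner, decoder, enc_out, blank_id=0, context_size=2):
       """Simplified transducer greedy search for a single utterance.

       enc_out: (T, C) encoder output.
       decoder(prev_tokens) -> (C,) decoder output for the current context.
       joiner(enc_frame, dec_out) -> (V,) logits.
       """
       hyp = [blank_id] * context_size  # start from an all-blank context
       for t in range(enc_out.size(0)):
           dec_out = decoder(torch.tensor(hyp[-context_size:]))
           logits = joiner(enc_out[t], dec_out)
           token = int(logits.argmax())
           if token != blank_id:  # emit at most one symbol per frame here
               hyp.append(token)
       return hyp[context_size:]


   V, C, T = 500, 512, 50
   enc_out = torch.randn(T, C)


   def decoder(prev_tokens):  # hypothetical stand-in for the stateless decoder
       return torch.randn(C)


   def joiner(enc_frame, dec_out):  # hypothetical stand-in for the joiner
       return torch.randn(V)


   print(greedy_search(joiner, decoder, enc_out)[:10])
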
Export models
-------------

`lstm_transducer_stateless2/export.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/export.py>`_ supports exporting checkpoints from ``lstm_transducer_stateless2/exp`` in the following ways.

Export ``model.state_dict()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Checkpoints saved by ``lstm_transducer_stateless2/train.py`` also include
``optimizer.state_dict()``, which is useful for resuming training. But after training,
we are interested only in ``model.state_dict()``. You can use the following
command to extract ``model.state_dict()``:

.. code-block:: bash

   # Assume that --iter 468000 --avg 16 produces the smallest WER
   # (You can get such information after running ./lstm_transducer_stateless2/decode.py)

   iter=468000
   avg=16

   ./lstm_transducer_stateless2/export.py \
     --exp-dir ./lstm_transducer_stateless2/exp \
     --bpe-model data/lang_bpe_500/bpe.model \
     --iter $iter \
     --avg $avg

It will generate a file ``./lstm_transducer_stateless2/exp/pretrained.pt``.

.. hint::

   To use the generated ``pretrained.pt`` with ``lstm_transducer_stateless2/decode.py``,
   you can run:

   .. code-block:: bash

      cd lstm_transducer_stateless2/exp
      ln -s pretrained.pt epoch-9999.pt

   and then pass ``--epoch 9999 --avg 1 --use-averaged-model 0`` to
   ``./lstm_transducer_stateless2/decode.py``.

To use the exported model with ``./lstm_transducer_stateless2/pretrained.py``, you
can run:

.. code-block:: bash

   ./lstm_transducer_stateless2/pretrained.py \
     --checkpoint ./lstm_transducer_stateless2/exp/pretrained.pt \
     --bpe-model ./data/lang_bpe_500/bpe.model \
     --method greedy_search \
     /path/to/foo.wav \
     /path/to/bar.wav

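If you want to load ``pretrained.pt`` from your own Python code instead of the
provided scripts, it is an ordinary PyTorch checkpoint. The sketch below is an
assumption-laden illustration: the ``"model"`` key and the hypothetical
``get_transducer_model()`` constructor are placeholders, so consult
``pretrained.py`` in the recipe for the authoritative loading code.

.. code-block:: python

   import torch

   checkpoint = torch.load(
       "lstm_transducer_stateless2/exp/pretrained.pt", map_location="cpu"
   )
   # Assumption: the weights are stored under a "model" key; fall back to the
   # raw object if they are not.
   state_dict = checkpoint.get("model", checkpoint)

   # model = get_transducer_model(params)  # hypothetical model constructor
   # model.load_state_dict(state_dict)
   # model.eval()
   print(f"Loaded {len(state_dict)} tensors")
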
Export model using ``torch.jit.trace()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   iter=468000
   avg=16

   ./lstm_transducer_stateless2/export.py \
     --exp-dir ./lstm_transducer_stateless2/exp \
     --bpe-model data/lang_bpe_500/bpe.model \
     --iter $iter \
     --avg $avg \
     --jit-trace 1

It will generate 3 files:

  - ``./lstm_transducer_stateless2/exp/encoder_jit_trace.pt``
  - ``./lstm_transducer_stateless2/exp/decoder_jit_trace.pt``
  - ``./lstm_transducer_stateless2/exp/joiner_jit_trace.pt``

To use the generated files with ``./lstm_transducer_stateless2/jit_pretrained.py``:

.. code-block:: bash

   ./lstm_transducer_stateless2/jit_pretrained.py \
     --bpe-model ./data/lang_bpe_500/bpe.model \
     --encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt \
     --decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt \
     --joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt \
     /path/to/foo.wav \
     /path/to/bar.wav

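The three traced files can also be loaded directly with ``torch.jit.load()``
if you want to embed them in your own application. A short sketch follows; the
exact ``forward()`` signatures (input shapes, LSTM state arguments) are fixed
at trace time and are not spelled out here, so check ``jit_pretrained.py``
before feeding real inputs.

.. code-block:: python

   import torch

   exp = "lstm_transducer_stateless2/exp"
   # The traced modules are self-contained: no icefall source code is needed.
   encoder = torch.jit.load(f"{exp}/encoder_jit_trace.pt").eval()
   decoder = torch.jit.load(f"{exp}/decoder_jit_trace.pt").eval()
   joiner = torch.jit.load(f"{exp}/joiner_jit_trace.pt").eval()

   print(type(encoder), type(decoder), type(joiner))

   # Example (shapes are assumptions -- verify against jit_pretrained.py):
   # feats = torch.randn(1, 200, 80)  # (N, T, feature_dim) fbank features
   # enc_out, *rest = encoder(feats, torch.tensor([200]))
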
Export model for ncnn
~~~~~~~~~~~~~~~~~~~~~

We support exporting pretrained LSTM transducer models to
`ncnn <https://github.com/tencent/ncnn>`_ using
`pnnx <https://github.com/Tencent/ncnn/tree/master/tools/pnnx>`_.

First, let us install a modified version of ``ncnn``:

.. code-block:: bash

   git clone https://github.com/csukuangfj/ncnn
   cd ncnn
   git submodule update --recursive --init
   python3 setup.py bdist_wheel
   ls -lh dist/
   pip install ./dist/*.whl

   # now build pnnx
   cd tools/pnnx
   mkdir build
   cd build
   cmake ..
   make -j4
   export PATH=$PWD/src:$PATH

   ./src/pnnx

.. note::

   We assume that you have added the path to the binary ``pnnx`` to the
   environment variable ``PATH``.

Second, let us export the model using ``torch.jit.trace()`` in a format that is
suitable for ``pnnx``:

.. code-block:: bash

   iter=468000
   avg=16

   ./lstm_transducer_stateless2/export.py \
     --exp-dir ./lstm_transducer_stateless2/exp \
     --bpe-model data/lang_bpe_500/bpe.model \
     --iter $iter \
     --avg $avg \
     --pnnx 1

It will generate 3 files:

  - ``./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.pt``
  - ``./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.pt``
  - ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.pt``

Third, convert the TorchScript models to ``ncnn`` format:

.. code-block::

   pnnx ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.pt
   pnnx ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.pt
   pnnx ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.pt

It will generate the following files:

  - ``./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param``
  - ``./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin``
  - ``./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param``
  - ``./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin``
  - ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param``
  - ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin``

To use the above generated files, run:

.. code-block:: bash

   ./lstm_transducer_stateless2/ncnn-decode.py \
     --bpe-model-filename ./data/lang_bpe_500/bpe.model \
     --encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
     --encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
     --decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
     --decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
     --joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
     --joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
     /path/to/foo.wav

.. code-block:: bash

   ./lstm_transducer_stateless2/streaming-ncnn-decode.py \
     --bpe-model-filename ./data/lang_bpe_500/bpe.model \
     --encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
     --encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
     --decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
     --decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
     --joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
     --joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
     /path/to/foo.wav

To use the above generated files in C++, please see
`<https://github.com/k2-fsa/sherpa-ncnn>`_.

``sherpa-ncnn`` can generate a statically linked library that runs on Linux, Windows,
macOS, Raspberry Pi, etc.

Download pretrained models
--------------------------

If you don't want to train from scratch, you can download the pretrained models
by visiting the following links:

  - `<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>`_

  - `<https://huggingface.co/Zengwei/icefall-asr-librispeech-lstm-transducer-stateless-2022-08-18>`_

See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
for the details of the above pretrained models.

You can find more usage examples of the pretrained models at
`<https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/index.html>`_.
The remaining hunks in this commit touch only the generated HTML output of other pages:

  - In the contributing pages, figure caption numbers shift by one (Fig. 7 → Fig. 8,
    Fig. 8 → Fig. 9, Fig. 6 → Fig. 7) to make room for the newly added TensorBoard figure.
  - objects.inv: the binary Sphinx inventory is updated (binary file not shown).
  - In the recipe index pages and the Conformer CTC page, the navigation menus gain a new
    "Transducer" entry pointing to lstm_pruned_stateless_transducer.html, and the "next"
    links that previously pointed to TIMIT now point to the new Transducer page.
711
recipes/librispeech/lstm_pruned_stateless_transducer.html
Normal file
711
recipes/librispeech/lstm_pruned_stateless_transducer.html
Normal file
@ -0,0 +1,711 @@
|
||||
<!DOCTYPE html>
|
||||
<html class="writer-html5" lang="en" >
|
||||
<head>
|
||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
|
||||
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Transducer — icefall 0.1 documentation</title>
|
||||
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
|
||||
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
|
||||
<!--[if lt IE 9]>
|
||||
<script src="../../_static/js/html5shiv.min.js"></script>
|
||||
<![endif]-->
|
||||
|
||||
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/jquery.js"></script>
|
||||
<script src="../../_static/underscore.js"></script>
|
||||
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/js/theme.js"></script>
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
<link rel="next" title="TIMIT" href="../timit/index.html" />
|
||||
<link rel="prev" title="Conformer CTC" href="conformer_ctc.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
<div class="wy-grid-for-nav">
|
||||
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
||||
<div class="wy-side-scroll">
|
||||
<div class="wy-side-nav-search" >
|
||||
<a href="../../index.html" class="icon icon-home"> icefall
|
||||
</a>
|
||||
<div role="search">
|
||||
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
|
||||
<input type="text" name="q" placeholder="Search docs" />
|
||||
<input type="hidden" name="check_keywords" value="yes" />
|
||||
<input type="hidden" name="area" value="default" />
|
||||
</form>
|
||||
</div>
|
||||
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
|
||||
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li>
|
||||
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current">
|
||||
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
|
||||
<li class="toctree-l2 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
|
||||
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
||||
<li class="toctree-l3 current"><a class="current reference internal" href="#">Transducer</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#which-model-to-use">Which model to use</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#export-models">Export models</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="#download-pretrained-models">Download pretrained models</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</nav>
|
||||
|
||||
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
|
||||
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
||||
<a href="../../index.html">icefall</a>
|
||||
</nav>
|
||||
|
||||
<div class="wy-nav-content">
|
||||
<div class="rst-content">
|
||||
<div role="navigation" aria-label="Page navigation">
|
||||
<ul class="wy-breadcrumbs">
|
||||
<li><a href="../../index.html" class="icon icon-home"></a> »</li>
|
||||
<li><a href="../index.html">Recipes</a> »</li>
|
||||
<li><a href="index.html">LibriSpeech</a> »</li>
|
||||
<li>Transducer</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
<a href="https://github.com/k2-fsa/icefall/blob/master/icefall/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||
</li>
|
||||
</ul>
|
||||
<hr/>
|
||||
</div>
|
||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||
<div itemprop="articleBody">
|
||||
|
||||
<section id="transducer">
|
||||
<h1>Transducer<a class="headerlink" href="#transducer" title="Permalink to this heading"></a></h1>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>Please scroll down to the bottom of this page to find download links
|
||||
for pretrained models if you don’t want to train a model from scratch.</p>
|
||||
</div>
|
||||
<p>This tutorial shows you how to train a transducer model
|
||||
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
|
||||
<p>We use pruned RNN-T to compute the loss.</p>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>You can find the paper about pruned RNN-T at the following address:</p>
|
||||
<p><a class="reference external" href="https://arxiv.org/abs/2206.13236">https://arxiv.org/abs/2206.13236</a></p>
|
||||
</div>
|
||||
<p>The transducer model consists of 3 parts:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><p>Encoder, a.k.a, transcriber. We use an LSTM model</p></li>
|
||||
<li><p>Decoder, a.k.a, predictor. We use a model consisting of <code class="docutils literal notranslate"><span class="pre">nn.Embedding</span></code>
|
||||
and <code class="docutils literal notranslate"><span class="pre">nn.Conv1d</span></code></p></li>
|
||||
<li><p>Joiner, a.k.a, the joint network.</p></li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<div class="admonition caution">
|
||||
<p class="admonition-title">Caution</p>
|
||||
<p>Contrary to the conventional RNN-T models, we use a stateless decoder.
|
||||
That is, it has no recurrent connections.</p>
|
||||
</div>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>Since the encoder model is an LSTM, not Transformer/Conformer, the
|
||||
resulting model is suitable for streaming/online ASR.</p>
|
||||
</div>
|
||||
<section id="which-model-to-use">
|
||||
<h2>Which model to use<a class="headerlink" href="#which-model-to-use" title="Permalink to this heading"></a></h2>
|
||||
<p>Currently, there are two folders about LSTM stateless transducer training:</p>
|
||||
<blockquote>
|
||||
<div><ul>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">(1)</span></code> <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless">https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless</a></p>
|
||||
<p>This recipe uses only LibriSpeech during training.</p>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">(2)</span></code> <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2">https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2</a></p>
|
||||
<p>This recipe uses GigaSpeech + LibriSpeech during training.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p><code class="docutils literal notranslate"><span class="pre">(1)</span></code> and <code class="docutils literal notranslate"><span class="pre">(2)</span></code> use the same model architecture. The only difference is that <code class="docutils literal notranslate"><span class="pre">(2)</span></code> supports
|
||||
multi-dataset. Since <code class="docutils literal notranslate"><span class="pre">(2)</span></code> uses more data, it has a lower WER than <code class="docutils literal notranslate"><span class="pre">(1)</span></code> but it needs
|
||||
more training time.</p>
|
||||
<p>We use <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2</span></code> as an example below.</p>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>You need to download the <a class="reference external" href="https://github.com/SpeechColab/GigaSpeech">GigaSpeech</a> dataset
|
||||
to run <code class="docutils literal notranslate"><span class="pre">(2)</span></code>. If you have only <code class="docutils literal notranslate"><span class="pre">LibriSpeech</span></code> dataset available, feel free to use <code class="docutils literal notranslate"><span class="pre">(1)</span></code>.</p>
|
||||
</div>
|
||||
</section>
|
||||
<section id="data-preparation">
|
||||
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
$ ./prepare.sh
|
||||
|
||||
<span class="c1"># If you use (1), you can **skip** the following command</span>
|
||||
$ ./prepare_giga_speech.sh
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
|
||||
All you need to do is to run it.</p>
|
||||
<p>The data preparation contains several stages, you can use the following two
|
||||
options:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--stage</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code></p></li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
|
||||
<p>For example,</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>means to run only stage 0.</p>
|
||||
<p>To run stage 2 to stage 5, use:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>If you have pre-downloaded the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>
|
||||
dataset and the <a class="reference external" href="http://www.openslr.org/17/">musan</a> dataset, say,
|
||||
they are saved in <code class="docutils literal notranslate"><span class="pre">/tmp/LibriSpeech</span></code> and <code class="docutils literal notranslate"><span class="pre">/tmp/musan</span></code>, you can modify
|
||||
the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></code> variable in <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> to point to <code class="docutils literal notranslate"><span class="pre">/tmp</span></code> so that
|
||||
<code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> won’t re-download them.</p>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>All generated files by <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>, e.g., features, lexicon, etc,
|
||||
are saved in <code class="docutils literal notranslate"><span class="pre">./data</span></code> directory.</p>
|
||||
</div>
|
||||
<p>We provide the following YouTube video showing how to run <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>To get the latest news of <a class="reference external" href="https://github.com/k2-fsa">next-gen Kaldi</a>, please subscribe
|
||||
the following YouTube channel by <a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">Nadira Povey</a>:</p>
|
||||
<blockquote>
|
||||
<div><p><a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw</a></p>
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="video_wrapper" style="">
|
||||
<iframe allowfullscreen="true" src="https://www.youtube.com/embed/ofEIoJL-mGM" style="border: 0; height: 345px; width: 560px">
|
||||
</iframe></div></section>
|
||||
<section id="training">
|
||||
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
|
||||
<section id="configurable-options">
|
||||
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
$ ./lstm_transducer_stateless2/train.py --help
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>shows you the training options that can be passed from the commandline.
|
||||
The following options are used quite often:</p>
|
||||
<blockquote>
|
||||
<div><ul>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--full-libri</span></code></p>
|
||||
<p>If it’s True, the training part uses all the training data, i.e.,
|
||||
960 hours. Otherwise, the training part uses only the subset
|
||||
<code class="docutils literal notranslate"><span class="pre">train-clean-100</span></code>, which has 100 hours of training data.</p>
|
||||
<div class="admonition caution">
|
||||
<p class="admonition-title">Caution</p>
|
||||
<p>The training set is perturbed by speed with two factors: 0.9 and 1.1.
|
||||
If <code class="docutils literal notranslate"><span class="pre">--full-libri</span></code> is True, each epoch actually processes
|
||||
<code class="docutils literal notranslate"><span class="pre">3x960</span> <span class="pre">==</span> <span class="pre">2880</span></code> hours of data.</p>
|
||||
</div>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--num-epochs</span></code></p>
|
||||
<p>It is the number of epochs to train. For instance,
|
||||
<code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/train.py</span> <span class="pre">--num-epochs</span> <span class="pre">30</span></code> trains for 30 epochs
|
||||
and generates <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, <code class="docutils literal notranslate"><span class="pre">epoch-30.pt</span></code>
|
||||
in the folder <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp</span></code>.</p>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code></p>
|
||||
<p>It’s used to resume training.
|
||||
<code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/train.py</span> <span class="pre">--start-epoch</span> <span class="pre">10</span></code> loads the
|
||||
checkpoint <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/epoch-9.pt</span></code> and starts
|
||||
training from epoch 10, based on the state from epoch 9.</p>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--world-size</span></code></p>
|
||||
<p>It is used for multi-GPU single-machine DDP training.</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><ol class="loweralpha simple">
|
||||
<li><p>If it is 1, then no DDP training is used.</p></li>
|
||||
</ol>
|
||||
</li>
|
||||
<li><ol class="loweralpha simple" start="2">
|
||||
<li><p>If it is 2, then GPU 0 and GPU 1 are used for DDP training.</p></li>
|
||||
</ol>
|
||||
</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>The following shows some use cases with it.</p>
|
||||
<blockquote>
|
||||
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
|
||||
GPU 2 for training. You can do the following:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,2"</span>
|
||||
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">2</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
|
||||
for training. You can do the following:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">4</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
|
||||
for training. You can do the following:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"3"</span>
|
||||
$ ./lstm_transducer_stateless2/train.py --world-size <span class="m">1</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
</div></blockquote>
|
||||
<div class="admonition caution">
|
||||
<p class="admonition-title">Caution</p>
|
||||
<p>Only multi-GPU single-machine DDP training is implemented at present.
|
||||
Multi-GPU multi-machine DDP training will be added later.</p>
|
||||
</div>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--max-duration</span></code></p>
|
||||
<p>It specifies the number of seconds over all utterances in a
|
||||
batch, before <strong>padding</strong>.
|
||||
If you encounter CUDA OOM, please reduce it.</p>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>Due to padding, the number of seconds of all utterances in a
|
||||
batch will usually be larger than <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code>.</p>
|
||||
<p>A larger value for <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> may cause OOM during training,
|
||||
while a smaller value may increase the training time. You have to
|
||||
tune it.</p>
|
||||
</div>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">--giga-prob</span></code></p>
|
||||
<p>The probability to select a batch from the <code class="docutils literal notranslate"><span class="pre">GigaSpeech</span></code> dataset.
|
||||
Note: It is available only for <code class="docutils literal notranslate"><span class="pre">(2)</span></code>.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
</section>
|
||||
<section id="pre-configured-options">
|
||||
<h3>Pre-configured options<a class="headerlink" href="#pre-configured-options" title="Permalink to this heading"></a></h3>
|
||||
<p>There are some training options, e.g., weight decay,
|
||||
number of warmup steps, results dir, etc,
|
||||
that are not passed from the commandline.
|
||||
They are pre-configured by the function <code class="docutils literal notranslate"><span class="pre">get_params()</span></code> in
|
||||
<a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/train.py">lstm_transducer_stateless2/train.py</a></p>
|
||||
<p>You don’t need to change these pre-configured parameters. If you really need to change
|
||||
them, please modify <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/train.py</span></code> directly.</p>
|
||||
</section>
|
||||
<section id="training-logs">
|
||||
<h3>Training logs<a class="headerlink" href="#training-logs" title="Permalink to this heading"></a></h3>
|
||||
<p>Training logs and checkpoints are saved in <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/exp</span></code>.
|
||||
You will find the following files in that directory:</p>
|
||||
<blockquote>
|
||||
<div><ul>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …</p>
|
||||
<p>These are checkpoint files saved at the end of each epoch, containing model
|
||||
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
|
||||
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./lstm_transducer_stateless2/train.py --start-epoch <span class="m">11</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …</p>
|
||||
<p>These are checkpoint files saved every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches,
|
||||
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
|
||||
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./lstm_transducer_stateless2/train.py --start-batch <span class="m">436000</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">tensorboard/</span></code></p>
|
||||
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
|
||||
rate, etc, are recorded in these logs. You can visualize them by:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
|
||||
$ tensorboard dev upload --logdir . --description <span class="s2">"LSTM transducer training for LibriSpeech with icefall"</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
<p>It will print something like below:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">TensorFlow</span> <span class="n">installation</span> <span class="ow">not</span> <span class="n">found</span> <span class="o">-</span> <span class="n">running</span> <span class="k">with</span> <span class="n">reduced</span> <span class="n">feature</span> <span class="nb">set</span><span class="o">.</span>
|
||||
<span class="n">Upload</span> <span class="n">started</span> <span class="ow">and</span> <span class="n">will</span> <span class="k">continue</span> <span class="n">reading</span> <span class="nb">any</span> <span class="n">new</span> <span class="n">data</span> <span class="k">as</span> <span class="n">it</span><span class="s1">'s added to the logdir.</span>
|
||||
|
||||
<span class="n">To</span> <span class="n">stop</span> <span class="n">uploading</span><span class="p">,</span> <span class="n">press</span> <span class="n">Ctrl</span><span class="o">-</span><span class="n">C</span><span class="o">.</span>
|
||||
|
||||
<span class="n">New</span> <span class="n">experiment</span> <span class="n">created</span><span class="o">.</span> <span class="n">View</span> <span class="n">your</span> <span class="n">TensorBoard</span> <span class="n">at</span><span class="p">:</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">tensorboard</span><span class="o">.</span><span class="n">dev</span><span class="o">/</span><span class="n">experiment</span><span class="o">/</span><span class="n">cj2vtPiwQHKN9Q1tx6PTpg</span><span class="o">/</span>
|
||||
|
||||
<span class="p">[</span><span class="mi">2022</span><span class="o">-</span><span class="mi">09</span><span class="o">-</span><span class="mi">20</span><span class="n">T15</span><span class="p">:</span><span class="mi">50</span><span class="p">:</span><span class="mi">50</span><span class="p">]</span> <span class="n">Started</span> <span class="n">scanning</span> <span class="n">logdir</span><span class="o">.</span>
|
||||
<span class="n">Uploading</span> <span class="mi">4468</span> <span class="n">scalars</span><span class="o">...</span>
|
||||
<span class="p">[</span><span class="mi">2022</span><span class="o">-</span><span class="mi">09</span><span class="o">-</span><span class="mi">20</span><span class="n">T15</span><span class="p">:</span><span class="mi">53</span><span class="p">:</span><span class="mi">02</span><span class="p">]</span> <span class="n">Total</span> <span class="n">uploaded</span><span class="p">:</span> <span class="mi">210171</span> <span class="n">scalars</span><span class="p">,</span> <span class="mi">0</span> <span class="n">tensors</span><span class="p">,</span> <span class="mi">0</span> <span class="n">binary</span> <span class="n">objects</span>
|
||||
<span class="n">Listening</span> <span class="k">for</span> <span class="n">new</span> <span class="n">data</span> <span class="ow">in</span> <span class="n">logdir</span><span class="o">...</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
<p>Note that there is a URL in the above output. Click it and you will see
the following screenshot:</p>
|
||||
<blockquote>
|
||||
<div><figure class="align-center" id="id2">
|
||||
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../_images/librispeech-lstm-transducer-tensorboard-log.png" style="width: 600px;" /></a>
|
||||
<figcaption>
|
||||
<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
|
||||
</figcaption>
|
||||
</figure>
|
||||
</div></blockquote>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>If you don’t have access to Google, you can use the following command
to view the TensorBoard logs locally:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
|
||||
tensorboard --logdir . --port <span class="m">6008</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
<p>It will print the following message:</p>
|
||||
<blockquote>
|
||||
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Serving</span> <span class="n">TensorBoard</span> <span class="n">on</span> <span class="n">localhost</span><span class="p">;</span> <span class="n">to</span> <span class="n">expose</span> <span class="n">to</span> <span class="n">the</span> <span class="n">network</span><span class="p">,</span> <span class="n">use</span> <span class="n">a</span> <span class="n">proxy</span> <span class="ow">or</span> <span class="k">pass</span> <span class="o">--</span><span class="n">bind_all</span>
|
||||
<span class="n">TensorBoard</span> <span class="mf">2.8.0</span> <span class="n">at</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">localhost</span><span class="p">:</span><span class="mi">6008</span><span class="o">/</span> <span class="p">(</span><span class="n">Press</span> <span class="n">CTRL</span><span class="o">+</span><span class="n">C</span> <span class="n">to</span> <span class="n">quit</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div></blockquote>
|
||||
<p>Now start your browser and go to <a class="reference external" href="http://localhost:6008">http://localhost:6008</a> to view the TensorBoard
logs.</p>
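<p>If the training runs on a remote machine, you can forward the port to your local
computer first (a standard ssh trick; the user and host names below are placeholders):</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>ssh -L 6008:localhost:6008 user@remote-host
</pre></div>
</div>
</div></blockquote>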
|
||||
</div>
|
||||
<ul>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">log/log-train-xxxx</span></code></p>
|
||||
<p>It is the detailed training log in text format, same as the one
|
||||
you saw printed to the console during training.</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
</section>
|
||||
<section id="usage-example">
|
||||
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
|
||||
<p>You can use the following command to start the training using 8 GPUs:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0,1,2,3,4,5,6,7"</span>
|
||||
./lstm_transducer_stateless2/train.py <span class="se">\</span>
|
||||
--world-size <span class="m">8</span> <span class="se">\</span>
|
||||
--num-epochs <span class="m">35</span> <span class="se">\</span>
|
||||
--start-epoch <span class="m">1</span> <span class="se">\</span>
|
||||
--full-libri <span class="m">1</span> <span class="se">\</span>
|
||||
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
|
||||
--max-duration <span class="m">500</span> <span class="se">\</span>
|
||||
--use-fp16 <span class="m">0</span> <span class="se">\</span>
|
||||
--lr-epochs <span class="m">10</span> <span class="se">\</span>
|
||||
--num-workers <span class="m">2</span> <span class="se">\</span>
|
||||
--giga-prob <span class="m">0</span>.9
|
||||
</pre></div>
|
||||
</div>
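<p>If you only have a single GPU, a sketch of a reduced configuration is shown below.
The <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> value here is an assumption and should be tuned to fit your GPU memory:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">"0"</span>
./lstm_transducer_stateless2/train.py \
  --world-size 1 \
  --num-epochs 35 \
  --start-epoch 1 \
  --full-libri 1 \
  --exp-dir lstm_transducer_stateless2/exp \
  --max-duration 300 \
  --giga-prob 0.9
</pre></div>
</div>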
|
||||
</section>
|
||||
</section>
|
||||
<section id="decoding">
|
||||
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
|
||||
<p>The decoding part uses checkpoints saved by the training part, so you have
|
||||
to run the training part first.</p>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>There are two kinds of checkpoints:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><p>(1) <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, which are saved at the end
|
||||
of each epoch. You can pass <code class="docutils literal notranslate"><span class="pre">--epoch</span></code> to
|
||||
<code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/decode.py</span></code> to use them.</p></li>
|
||||
<li><p>(2) <code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …, which are saved
|
||||
every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches. You can pass <code class="docutils literal notranslate"><span class="pre">--iter</span></code> to
|
||||
<code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/decode.py</span></code> to use them.</p></li>
|
||||
</ul>
|
||||
<p>We suggest that you try both types of checkpoints and choose the one
|
||||
that produces the lowest WERs.</p>
|
||||
</div></blockquote>
|
||||
</div>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
|
||||
$ ./lstm_transducer_stateless2/decode.py --help
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>shows the options for decoding.</p>
|
||||
<p>The following shows two examples:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
|
||||
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">17</span><span class="p">;</span> <span class="k">do</span>
|
||||
<span class="k">for</span> avg <span class="k">in</span> <span class="m">1</span> <span class="m">2</span><span class="p">;</span> <span class="k">do</span>
|
||||
./lstm_transducer_stateless2/decode.py <span class="se">\</span>
|
||||
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
|
||||
--avg <span class="nv">$avg</span> <span class="se">\</span>
|
||||
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
|
||||
--max-duration <span class="m">600</span> <span class="se">\</span>
|
||||
--num-encoder-layers <span class="m">12</span> <span class="se">\</span>
|
||||
--rnn-hidden-size <span class="m">1024</span> <span class="se">\</span>
|
||||
--decoding-method <span class="nv">$m</span> <span class="se">\</span>
|
||||
--use-averaged-model True <span class="se">\</span>
|
||||
--beam <span class="m">4</span> <span class="se">\</span>
|
||||
--max-contexts <span class="m">4</span> <span class="se">\</span>
|
||||
--max-states <span class="m">8</span> <span class="se">\</span>
|
||||
--beam-size <span class="m">4</span>
|
||||
<span class="k">done</span>
|
||||
<span class="k">done</span>
|
||||
<span class="k">done</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
|
||||
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
|
||||
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
|
||||
./lstm_transducer_stateless2/decode.py <span class="se">\</span>
|
||||
--iter <span class="nv">$iter</span> <span class="se">\</span>
|
||||
--avg <span class="nv">$avg</span> <span class="se">\</span>
|
||||
--exp-dir lstm_transducer_stateless2/exp <span class="se">\</span>
|
||||
--max-duration <span class="m">600</span> <span class="se">\</span>
|
||||
--num-encoder-layers <span class="m">12</span> <span class="se">\</span>
|
||||
--rnn-hidden-size <span class="m">1024</span> <span class="se">\</span>
|
||||
--decoding-method <span class="nv">$m</span> <span class="se">\</span>
|
||||
--use-averaged-model True <span class="se">\</span>
|
||||
--beam <span class="m">4</span> <span class="se">\</span>
|
||||
--max-contexts <span class="m">4</span> <span class="se">\</span>
|
||||
--max-states <span class="m">8</span> <span class="se">\</span>
|
||||
--beam-size <span class="m">4</span>
|
||||
<span class="k">done</span>
|
||||
<span class="k">done</span>
|
||||
<span class="k">done</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="export-models">
|
||||
<h2>Export models<a class="headerlink" href="#export-models" title="Permalink to this heading"></a></h2>
|
||||
<p><a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/export.py">lstm_transducer_stateless2/export.py</a> supports to export checkpoints from <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/exp</span></code> in the following ways.</p>
|
||||
<section id="export-model-state-dict">
|
||||
<h3>Export <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code><a class="headerlink" href="#export-model-state-dict" title="Permalink to this heading"></a></h3>
|
||||
<p>Checkpoints saved by <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/train.py</span></code> also include
|
||||
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
|
||||
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
|
||||
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Assume that --iter 468000 --avg 16 produces the smallest WER</span>
|
||||
<span class="c1"># (You can get such information after running ./lstm_transducer_stateless2/decode.py)</span>
|
||||
|
||||
<span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
|
||||
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>
|
||||
|
||||
./lstm_transducer_stateless2/export.py <span class="se">\</span>
|
||||
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
|
||||
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--iter <span class="nv">$iter</span> <span class="se">\</span>
|
||||
--avg <span class="nv">$avg</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/pretrained.pt</span></code>.</p>
|
||||
<div class="admonition hint">
|
||||
<p class="admonition-title">Hint</p>
|
||||
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/decode.py</span></code>,
|
||||
you can run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> lstm_transducer_stateless2/exp
|
||||
ln -s pretrained.pt epoch-9999.pt
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch 9999 --avg 1 --use-averaged-model 0</span></code> to
<code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/decode.py</span></code>, as in the sketch below.</p>
|
||||
</div>
|
||||
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/pretrained.py</span></code>, you
|
||||
can run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/pretrained.py <span class="se">\</span>
|
||||
--checkpoint ./lstm_transducer_stateless2/exp/pretrained.pt <span class="se">\</span>
|
||||
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--method greedy_search <span class="se">\</span>
|
||||
/path/to/foo.wav <span class="se">\</span>
|
||||
/path/to/bar.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="export-model-using-torch-jit-trace">
|
||||
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.trace()</span></code><a class="headerlink" href="#export-model-using-torch-jit-trace" title="Permalink to this heading"></a></h3>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
|
||||
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>
|
||||
|
||||
./lstm_transducer_stateless2/export.py <span class="se">\</span>
|
||||
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
|
||||
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--iter <span class="nv">$iter</span> <span class="se">\</span>
|
||||
--avg <span class="nv">$avg</span> <span class="se">\</span>
|
||||
--jit-trace <span class="m">1</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>It will generate 3 files:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/encoder_jit_trace.pt</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/decoder_jit_trace.pt</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/joiner_jit_trace.pt</span></code></p></li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/jit_pretrained.py</span></code>:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/jit_pretrained.py <span class="se">\</span>
|
||||
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--encoder-model-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace.pt <span class="se">\</span>
|
||||
--decoder-model-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace.pt <span class="se">\</span>
|
||||
--joiner-model-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace.pt <span class="se">\</span>
|
||||
/path/to/foo.wav <span class="se">\</span>
|
||||
/path/to/bar.wav
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="export-model-for-ncnn">
|
||||
<h3>Export model for ncnn<a class="headerlink" href="#export-model-for-ncnn" title="Permalink to this heading"></a></h3>
|
||||
<p>We support exporting pretrained LSTM transducer models to
|
||||
<a class="reference external" href="https://github.com/tencent/ncnn">ncnn</a> using
|
||||
<a class="reference external" href="https://github.com/Tencent/ncnn/tree/master/tools/pnnx">pnnx</a>.</p>
|
||||
<p>First, let us install a modified version of <code class="docutils literal notranslate"><span class="pre">ncnn</span></code>:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git clone https://github.com/csukuangfj/ncnn
|
||||
<span class="nb">cd</span> ncnn
|
||||
git submodule update --recursive --init
|
||||
python3 setup.py bdist_wheel
|
||||
ls -lh dist/
|
||||
pip install ./dist/*.whl
|
||||
|
||||
<span class="c1"># now build pnnx</span>
|
||||
<span class="nb">cd</span> tools/pnnx
|
||||
mkdir build
|
||||
<span class="nb">cd</span> build
|
||||
make -j4
|
||||
<span class="nb">export</span> <span class="nv">PATH</span><span class="o">=</span><span class="nv">$PWD</span>/src:<span class="nv">$PATH</span>
|
||||
|
||||
./src/pnnx
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="admonition note">
|
||||
<p class="admonition-title">Note</p>
|
||||
<p>We assume that you have added the path to the binary <code class="docutils literal notranslate"><span class="pre">pnnx</span></code> to the
|
||||
environment variable <code class="docutils literal notranslate"><span class="pre">PATH</span></code>.</p>
|
||||
</div>
|
||||
<p>Second, let us export the model using <code class="docutils literal notranslate"><span class="pre">torch.jit.trace()</span></code> in a format that is suitable
for <code class="docutils literal notranslate"><span class="pre">pnnx</span></code>:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">iter</span><span class="o">=</span><span class="m">468000</span>
|
||||
<span class="nv">avg</span><span class="o">=</span><span class="m">16</span>
|
||||
|
||||
./lstm_transducer_stateless2/export.py <span class="se">\</span>
|
||||
--exp-dir ./lstm_transducer_stateless2/exp <span class="se">\</span>
|
||||
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||
--iter <span class="nv">$iter</span> <span class="se">\</span>
|
||||
--avg <span class="nv">$avg</span> <span class="se">\</span>
|
||||
--pnnx <span class="m">1</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>It will generate 3 files:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.pt</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.pt</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.pt</span></code></p></li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>Third, convert the torchscript model to <code class="docutils literal notranslate"><span class="pre">ncnn</span></code> format:</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pnnx</span> <span class="o">./</span><span class="n">lstm_transducer_stateless2</span><span class="o">/</span><span class="n">exp</span><span class="o">/</span><span class="n">encoder_jit_trace</span><span class="o">-</span><span class="n">pnnx</span><span class="o">.</span><span class="n">pt</span>
|
||||
<span class="n">pnnx</span> <span class="o">./</span><span class="n">lstm_transducer_stateless2</span><span class="o">/</span><span class="n">exp</span><span class="o">/</span><span class="n">decoder_jit_trace</span><span class="o">-</span><span class="n">pnnx</span><span class="o">.</span><span class="n">pt</span>
|
||||
<span class="n">pnnx</span> <span class="o">./</span><span class="n">lstm_transducer_stateless2</span><span class="o">/</span><span class="n">exp</span><span class="o">/</span><span class="n">joiner_jit_trace</span><span class="o">-</span><span class="n">pnnx</span><span class="o">.</span><span class="n">pt</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>It will generate the following files:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param</span></code></p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin</span></code></p></li>
|
||||
</ul>
|
||||
</div></blockquote>
|
||||
<p>To use the above generated files, run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/ncnn-decode.py \
  --bpe-model-filename ./data/lang_bpe_500/bpe.model \
  --encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
  --encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
  --decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
  --decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
  --joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
  --joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
  /path/to/foo.wav
</pre></div>
</div>
<p>For streaming decoding, run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/streaming-ncnn-decode.py \
  --bpe-model-filename ./data/lang_bpe_500/bpe.model \
  --encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
  --encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
  --decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
  --decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
  --joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
  --joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
  /path/to/foo.wav
</pre></div>
</div>
|
||||
<p>To use the above generated files in C++, please see
|
||||
<a class="reference external" href="https://github.com/k2-fsa/sherpa-ncnn">https://github.com/k2-fsa/sherpa-ncnn</a></p>
|
||||
<p>It can produce a statically linked library that runs on Linux, Windows,
macOS, Raspberry Pi, etc.</p>
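<p>A sketch of building it is shown below; please refer to the
<a class="reference external" href="https://github.com/k2-fsa/sherpa-ncnn">sherpa-ncnn</a> documentation for the
authoritative steps, as the exact commands may differ:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git clone https://github.com/k2-fsa/sherpa-ncnn
<span class="nb">cd</span> sherpa-ncnn
mkdir build
<span class="nb">cd</span> build
cmake ..
make -j4
</pre></div>
</div>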
|
||||
</section>
|
||||
</section>
|
||||
<section id="download-pretrained-models">
|
||||
<h2>Download pretrained models<a class="headerlink" href="#download-pretrained-models" title="Permalink to this heading"></a></h2>
|
||||
<p>If you don’t want to train from scratch, you can download the pretrained models
|
||||
by visiting the following links:</p>
|
||||
<blockquote>
|
||||
<div><ul class="simple">
|
||||
<li><p><a class="reference external" href="https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03">https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03</a></p></li>
|
||||
<li><p><a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-lstm-transducer-stateless-2022-08-18">https://huggingface.co/Zengwei/icefall-asr-librispeech-lstm-transducer-stateless-2022-08-18</a></p></li>
|
||||
</ul>
|
||||
<p>See <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md">https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md</a>
|
||||
for the details of the above pretrained models.</p>
|
||||
</div></blockquote>
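<p>For example, to download the first model above with
<a class="reference external" href="https://git-lfs.com">git-lfs</a> (a sketch; install git-lfs for your platform first):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git lfs install
git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
</pre></div>
</div>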
|
||||
<p>You can find more usages of the pretrained models in
|
||||
<a class="reference external" href="https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/index.html">https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/index.html</a></p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="conformer_ctc.html" class="btn btn-neutral float-left" title="Conformer CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="../timit/index.html" class="btn btn-neutral float-right" title="TIMIT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
<hr/>
|
||||
|
||||
<div role="contentinfo">
|
||||
<p>© Copyright 2021, icefall development team.</p>
|
||||
</div>
|
||||
|
||||
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
|
||||
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
|
||||
provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
||||
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
<script>
|
||||
jQuery(function () {
|
||||
SphinxRtdTheme.Navigation.enable(true);
|
||||
});
|
||||
</script>
|
||||
|
||||
</body>
|
||||
</html>
|
@ -53,6 +53,7 @@
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">Transducer</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
|
||||
|
@ -20,7 +20,7 @@
|
||||
<link rel="index" title="Index" href="../../genindex.html" />
|
||||
<link rel="search" title="Search" href="../../search.html" />
|
||||
<link rel="next" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" />
|
||||
<link rel="prev" title="Conformer CTC" href="../librispeech/conformer_ctc.html" />
|
||||
<link rel="prev" title="Transducer" href="../librispeech/lstm_pruned_stateless_transducer.html" />
|
||||
</head>
|
||||
|
||||
<body class="wy-body-for-nav">
|
||||
@ -95,7 +95,7 @@
|
||||
</div>
|
||||
</div>
|
||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||
<a href="../librispeech/conformer_ctc.html" class="btn btn-neutral float-left" title="Conformer CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="../librispeech/lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-left" title="Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||
<a href="tdnn_ligru_ctc.html" class="btn btn-neutral float-right" title="TDNN-LiGRU-CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||
</div>
|
||||
|
||||
|
@ -281,7 +281,7 @@ the following screenshot:</p>
|
||||
<div><figure class="align-center" id="id1">
|
||||
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/"><img alt="TensorBoard screenshot" src="../../_images/tdnn-tensorboard-log.png" style="width: 600px;" /></a>
|
||||
<figcaption>
|
||||
<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
|
||||
<p><span class="caption-number">Fig. 6 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
|
||||
</figcaption>
|
||||
</figure>
|
||||
</div></blockquote>
|
||||
|