Update doc about exporting LSTM models to ncnn (#914)

2025-12-11 06:55:27 +00:00 · 2023-02-17 12:50:13 +08:00 · 2023-02-17 12:50:13 +08:00 · 52d7cdd1a6
commit 52d7cdd1a6
parent c01175679e
9 changed files with 1484 additions and 885 deletions
--- a/docs/source/model-export/code/export-lstm-transducer-for-ncnn-output.txt
+++ b/docs/source/model-export/code/export-lstm-transducer-for-ncnn-output.txt
@@ -0,0 +1,18 @@
 2023-02-17 11:22:42,862 INFO [export-for-ncnn.py:222] device: cpu
 2023-02-17 11:22:42,865 INFO [export-for-ncnn.py:231] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'dim_feedforward': 2048, 'decoder_dim': 512, 'joiner_dim': 512, 'is_pnnx': False, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '62e404dd3f3a811d73e424199b3408e309c06e1a', 'k2-git-date': 'Mon Jan 30 10:26:16 2023', 'lhotse-version': '1.12.0.dev+missing.version.file', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': False, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'master', 'icefall-git-sha1': '6d7a559-dirty', 'icefall-git-date': 'Thu Feb 16 19:47:54 2023', 'icefall-path': '/star-fj/fangjun/open-source/icefall-2', 'k2-path': '/star-fj/fangjun/open-source/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-fj/fangjun/open-source/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-3-1220120619-7695ff496b-s9n4w', 'IP address': '10.177.6.147'}, 'epoch': 99, 'iter': 0, 'avg': 1, 'exp_dir': PosixPath('icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp'), 'bpe_model': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/bpe.model', 'context_size': 2, 'use_averaged_model': False, 'num_encoder_layers': 12, 'encoder_dim': 512, 'rnn_hidden_size': 1024, 'aux_layer_period': 0, 'blank_id': 0, 'vocab_size': 500}
 2023-02-17 11:22:42,865 INFO [export-for-ncnn.py:235] About to create model
 2023-02-17 11:22:43,239 INFO [train.py:472] Disable giga
 2023-02-17 11:22:43,249 INFO [checkpoint.py:112] Loading checkpoint from icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/epoch-99.pt
 2023-02-17 11:22:44,595 INFO [export-for-ncnn.py:324] encoder parameters: 83137520
 2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:325] decoder parameters: 257024
 2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:326] joiner parameters: 781812
 2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:327] total parameters: 84176356
 2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:329] Using torch.jit.trace()
 2023-02-17 11:22:44,596 INFO [export-for-ncnn.py:331] Exporting encoder
 2023-02-17 11:22:48,182 INFO [export-for-ncnn.py:158] Saved to icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.pt
 2023-02-17 11:22:48,183 INFO [export-for-ncnn.py:335] Exporting decoder
 /star-fj/fangjun/open-source/icefall-2/egs/librispeech/ASR/lstm_transducer_stateless2/decoder.py:101: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  need_pad = bool(need_pad)
 2023-02-17 11:22:48,259 INFO [export-for-ncnn.py:180] Saved to icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.pt
 2023-02-17 11:22:48,259 INFO [export-for-ncnn.py:339] Exporting joiner
 2023-02-17 11:22:48,304 INFO [export-for-ncnn.py:207] Saved to icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.pt
--- a/docs/source/model-export/code/generate-int-8-scale-table-for-lstm.txt
+++ b/docs/source/model-export/code/generate-int-8-scale-table-for-lstm.txt
@@ -0,0 +1,44 @@
 Don't Use GPU. has_gpu: 0, config.use_vulkan_compute: 1
 num encoder conv layers: 28
 num joiner conv layers: 3
 num files: 3
 Processing ../test_wavs/1089-134686-0001.wav
 Processing ../test_wavs/1221-135766-0001.wav
 Processing ../test_wavs/1221-135766-0002.wav
 Processing ../test_wavs/1089-134686-0001.wav
 Processing ../test_wavs/1221-135766-0001.wav
 Processing ../test_wavs/1221-135766-0002.wav
 ----------encoder----------
 conv_15                                  : max = 15.942385        threshold = 15.930708        scale = 7.972025
 conv_16                                  : max = 44.978855        threshold = 17.031788        scale = 7.456645
 conv_17                                  : max = 17.868437        threshold = 7.830528         scale = 16.218575
 linear_18                                : max = 3.107259         threshold = 1.194808         scale = 106.293236
 linear_19                                : max = 6.193777         threshold = 4.634748         scale = 27.401705
 linear_20                                : max = 9.259933         threshold = 2.606617         scale = 48.722160
 linear_21                                : max = 5.186600         threshold = 4.790260         scale = 26.512129
 linear_22                                : max = 9.759041         threshold = 2.265832         scale = 56.050053
 linear_23                                : max = 3.931209         threshold = 3.099090         scale = 40.979767
 linear_24                                : max = 10.324160        threshold = 2.215561         scale = 57.321835
 linear_25                                : max = 3.800708         threshold = 3.599352         scale = 35.284134
 linear_26                                : max = 10.492444        threshold = 3.153369         scale = 40.274391
 linear_27                                : max = 3.660161         threshold = 2.720994         scale = 46.674126
 linear_28                                : max = 9.415265         threshold = 3.174434         scale = 40.007133
 linear_29                                : max = 4.038418         threshold = 3.118534         scale = 40.724262
 linear_30                                : max = 10.072084        threshold = 3.936867         scale = 32.259155
 linear_31                                : max = 4.342712         threshold = 3.599489         scale = 35.282787
 linear_32                                : max = 11.340535        threshold = 3.120308         scale = 40.701103
 linear_33                                : max = 3.846987         threshold = 3.630030         scale = 34.985939
 linear_34                                : max = 10.686298        threshold = 2.204571         scale = 57.607586
 linear_35                                : max = 4.904821         threshold = 4.575518         scale = 27.756420
 linear_36                                : max = 11.806659        threshold = 2.585589         scale = 49.118401
 linear_37                                : max = 6.402340         threshold = 5.047157         scale = 25.162680
 linear_38                                : max = 11.174589        threshold = 1.923361         scale = 66.030258
 linear_39                                : max = 16.178576        threshold = 7.556058         scale = 16.807705
 linear_40                                : max = 12.901954        threshold = 5.301267         scale = 23.956539
 linear_41                                : max = 14.839805        threshold = 7.597429         scale = 16.716181
 linear_42                                : max = 10.178945        threshold = 2.651595         scale = 47.895699
 ----------joiner----------
 linear_2                                 : max = 24.829245        threshold = 16.627592        scale = 7.637907
 linear_1                                 : max = 10.746186        threshold = 5.255032         scale = 24.167313
 linear_3                                 : max = 1.000000         threshold = 0.999756         scale = 127.031013
 ncnn int8 calibration table create success, best wish for your int8 inference has a low accuracy loss...\(^0^)/...233...
--- a/docs/source/model-export/code/test-streaming-ncnn-decode-conv-emformer-transducer-libri.txt
+++ b/docs/source/model-export/code/test-streaming-ncnn-decode-conv-emformer-transducer-libri.txt
--- a/docs/source/model-export/code/test-streaming-ncnn-decode-lstm-transducer-libri.txt
+++ b/docs/source/model-export/code/test-streaming-ncnn-decode-lstm-transducer-libri.txt
@@ -0,0 +1,6 @@
 2023-02-17 11:37:30,861 INFO [streaming-ncnn-decode.py:255] {'tokens': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/tokens.txt', 'encoder_param_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param', 'encoder_bin_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin', 'decoder_param_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param', 'decoder_bin_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin', 'joiner_param_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param', 'joiner_bin_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin', 'sound_filename': './icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav'}
 2023-02-17 11:37:31,425 INFO [streaming-ncnn-decode.py:263] Constructing Fbank computer
 2023-02-17 11:37:31,427 INFO [streaming-ncnn-decode.py:266] Reading sound files: ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav
 2023-02-17 11:37:31,431 INFO [streaming-ncnn-decode.py:271] torch.Size([106000])
 2023-02-17 11:37:34,115 INFO [streaming-ncnn-decode.py:342] ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav
 2023-02-17 11:37:34,115 INFO [streaming-ncnn-decode.py:343] AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
--- a/docs/source/model-export/export-ncnn-conv-emformer.rst
+++ b/docs/source/model-export/export-ncnn-conv-emformer.rst
@@ -0,0 +1,749 @@
 .. _export_conv_emformer_transducer_models_to_ncnn:
 Export ConvEmformer transducer models to ncnn
 =============================================
 We use the pre-trained model from the following repository as an example:
  - `<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_
 We will show you step by step how to export it to `ncnn`_ and run it with `sherpa-ncnn`_.
 .. hint::
  We use ``Ubuntu 18.04``, ``torch 1.13``, and ``Python 3.8`` for testing.
 .. caution::
  Please use a more recent version of PyTorch. For instance, ``torch 1.8``
  may **not** work.
 1. Download the pre-trained model
 ---------------------------------
 .. hint::
  You can also refer to `<https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html#icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_ to download the pre-trained model.
  You have to install `git-lfs`_ before you continue.
 .. code-block:: bash
  cd egs/librispeech/ASR
  GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
  git lfs pull --include "exp/pretrained-epoch-30-avg-10-averaged.pt"
  git lfs pull --include "data/lang_bpe_500/bpe.model"
  cd ..
 .. note::
  We downloaded ``exp/pretrained-xxx.pt``, not ``exp/cpu-jit_xxx.pt``.
 In the above code, we downloaded the pre-trained model into the directory
 ``egs/librispeech/ASR/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05``.
 .. _export_for_ncnn_install_ncnn_and_pnnx:
 2. Install ncnn and pnnx
 ------------------------
 .. code-block:: bash
  # We put ncnn into $HOME/open-source/ncnn
  # You can change it to anywhere you like
  cd $HOME
  mkdir -p open-source
  cd open-source
  git clone https://github.com/csukuangfj/ncnn
  cd ncnn
  git submodule update --recursive --init
  # Note: We don't use "python setup.py install" or "pip install ." here
  mkdir -p build-wheel
  cd build-wheel
  cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DNCNN_PYTHON=ON \
    -DNCNN_BUILD_BENCHMARK=OFF \
    -DNCNN_BUILD_EXAMPLES=OFF \
    -DNCNN_BUILD_TOOLS=ON \
  ..
  make -j4
  cd ..
  # Note: $PWD here is $HOME/open-source/ncnn
  export PYTHONPATH=$PWD/python:$PYTHONPATH
  export PATH=$PWD/tools/pnnx/build/src:$PATH
  export PATH=$PWD/build-wheel/tools/quantize:$PATH
  # Now build pnnx
  cd tools/pnnx
  mkdir build
  cd build
  cmake ..
  make -j4
  ./src/pnnx
 Congratulations! You have successfully installed the following components:
  - ``pnnx``, which is an executable located in
    ``$HOME/open-source/ncnn/tools/pnnx/build/src``. We will use
    it to convert models exported by ``torch.jit.trace()``.
  - ``ncnn2int8``, which is an executable located in
    ``$HOME/open-source/ncnn/build-wheel/tools/quantize``. We will use
    it to quantize our models to ``int8``.
  - ``ncnn.cpython-38-x86_64-linux-gnu.so``, which is a Python module located
    in ``$HOME/open-source/ncnn/python/ncnn``.
    .. note::
      I am using ``Python 3.8``, so it
      is ``ncnn.cpython-38-x86_64-linux-gnu.so``. If you use a different
      version, say, ``Python 3.9``, the name would be
      ``ncnn.cpython-39-x86_64-linux-gnu.so``.
      Also, if you are not using Linux, the file name would also be different.
      But that does not matter. As long as you can compile it, it should work.
 We have set up ``PYTHONPATH`` so that you can use ``import ncnn`` in your
 Python code. We have also set up ``PATH`` so that you can use
 ``pnnx`` and ``ncnn2int8`` later in your terminal.
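 Before continuing, it can help to confirm that both environment variables
 took effect. The following is a minimal sketch, not part of icefall: the
 helper name ``check_ncnn_setup`` is made up for illustration, and it only
 checks that the two executables and the ``ncnn`` module can be located.

```python
import importlib.util
import shutil


def check_ncnn_setup():
    """Report whether the tools and the module from this step can be found."""
    return {
        # Executables that should be reachable through PATH.
        "pnnx": shutil.which("pnnx") is not None,
        "ncnn2int8": shutil.which("ncnn2int8") is not None,
        # Python module that should be reachable through PYTHONPATH.
        "ncnn": importlib.util.find_spec("ncnn") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ncnn_setup().items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

 If any entry is reported as not found, re-run the ``export`` commands above
 in the current terminal.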
 .. caution::
  Please don't use `<https://github.com/tencent/ncnn>`_.
  We have made some modifications to the official `ncnn`_.
  We will synchronize `<https://github.com/csukuangfj/ncnn>`_ periodically
  with the official one.
 3. Export the model via torch.jit.trace()
 -----------------------------------------
 First, let us create a symlink so that our pre-trained model is available under the name ``epoch-30.pt``:
 .. code-block::
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp
  ln -s pretrained-epoch-30-avg-10-averaged.pt epoch-30.pt
  cd ../..
 Next, we use the following code to export our model:
 .. code-block:: bash
  dir=./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/
  ./conv_emformer_transducer_stateless2/export-for-ncnn.py \
    --exp-dir $dir/exp \
    --bpe-model $dir/data/lang_bpe_500/bpe.model \
    --epoch 30 \
    --avg 1 \
    --use-averaged-model 0 \
    \
    --num-encoder-layers 12 \
    --chunk-length 32 \
    --cnn-module-kernel 31 \
    --left-context-length 32 \
    --right-context-length 8 \
    --memory-size 32 \
    --encoder-dim 512
 .. hint::
  We have renamed our model to ``epoch-30.pt`` so that we can use ``--epoch 30``.
  There is only one pre-trained model, so we use ``--avg 1 --use-averaged-model 0``.
  If you have trained a model yourself and have all checkpoints
  available, please first use ``decode.py`` to tune ``--epoch`` and ``--avg``,
  and select the best combination with ``--use-averaged-model 1``.
 .. note::
  You will see the following log output:
  .. literalinclude:: ./code/export-conv-emformer-transducer-for-ncnn-output.txt
  The log shows the model has ``75490012`` parameters, i.e., ``~75 M``.
  .. code-block::
    ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt
    -rw-r--r-- 1 kuangfangjun root 289M Jan 11 12:05 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt
  You can see that the file size of the pre-trained model is ``289 MB``, which
  is roughly equal to ``75490012*4/1024/1024 = 287.97 MB``.
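 The size estimate above can be reproduced with a few lines of Python. This
 is only a sketch of the arithmetic: the function name ``expected_size_mib``
 is ours, and it assumes every parameter is stored as a 4-byte ``float32``
 with no other data in the checkpoint.

```python
def expected_size_mib(num_params, bytes_per_param=4):
    """Estimate the size of a checkpoint from its parameter count."""
    return num_params * bytes_per_param / 1024 / 1024


if __name__ == "__main__":
    # 75490012 is the total parameter count reported in the export log.
    print(f"{expected_size_mib(75490012):.2f} MB")
```

 The small gap between the estimate and the ``289 MB`` on disk is expected,
 since the file holds more than the raw parameter values.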
 After running ``conv_emformer_transducer_stateless2/export-for-ncnn.py``,
 we will get the following files:
 .. code-block:: bash
  ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*pnnx*
  -rw-r--r-- 1 kuangfangjun root 1010K Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.pt
  -rw-r--r-- 1 kuangfangjun root  283M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.pt
  -rw-r--r-- 1 kuangfangjun root  3.0M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.pt
 .. _conv-emformer-step-4-export-torchscript-model-via-pnnx:
 4. Export torchscript model via pnnx
 ------------------------------------
 .. hint::
  Make sure you have set up the ``PATH`` environment variable. Otherwise,
  it will throw an error saying that ``pnnx`` could not be found.
 Now, it's time to export our models to `ncnn`_ via ``pnnx``.
 .. code-block::
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  pnnx ./encoder_jit_trace-pnnx.pt
  pnnx ./decoder_jit_trace-pnnx.pt
  pnnx ./joiner_jit_trace-pnnx.pt
 It will generate the following files:
 .. code-block:: bash
  ls -lh  icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*ncnn*{bin,param}
  -rw-r--r-- 1 kuangfangjun root 503K Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  437 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 142M Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  79K Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 1.5M Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  488 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param
 There are two types of files:
 - ``param``: It is a text file containing the model architectures. You can
  use a text editor to view its content.
 - ``bin``: It is a binary file containing the model parameters.
 We compare the file sizes of the models below before and after converting via ``pnnx``:
 .. see https://tableconvert.com/restructuredtext-generator
 +----------------------------------+------------+
 | File name                        | File size  |
 +==================================+============+
 | encoder_jit_trace-pnnx.pt        | 283 MB     |
 +----------------------------------+------------+
 | decoder_jit_trace-pnnx.pt        | 1010 KB    |
 +----------------------------------+------------+
 | joiner_jit_trace-pnnx.pt         | 3.0 MB     |
 +----------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin  | 142 MB     |
 +----------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin  | 503 KB     |
 +----------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin   | 1.5 MB     |
 +----------------------------------+------------+
 You can see that each model file after conversion is about half the size
 of the corresponding file before conversion:
  - encoder: 283 MB vs 142 MB
  - decoder: 1010 KB vs 503 KB
  - joiner: 3.0 MB vs 1.5 MB
 The reason is that by default ``pnnx`` converts ``float32`` parameters
 to ``float16``. A ``float32`` parameter occupies 4 bytes, while a ``float16``
 parameter occupies 2 bytes. Thus, each file is about half as large after conversion.
 .. hint::
  If you use ``pnnx ./encoder_jit_trace-pnnx.pt fp16=0``, then ``pnnx``
  won't convert ``float32`` to ``float16``.
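 As a quick sanity check of the factor of two, the sketch below computes the
 size ratio for each model from the numbers in the table. The dictionary
 ``SIZES_KB`` and the function ``size_ratios`` are names chosen for this
 illustration; the sizes are the rounded values printed by ``ls -lh``, so the
 ratios are only approximately 2.

```python
# (size before pnnx, size after pnnx with fp16), both in KB,
# copied from the table above.
SIZES_KB = {
    "encoder": (283 * 1024, 142 * 1024),
    "decoder": (1010, 503),
    "joiner": (3.0 * 1024, 1.5 * 1024),
}


def size_ratios(sizes=SIZES_KB):
    """Return before/after size ratios for each model."""
    return {name: before / after for name, (before, after) in sizes.items()}


if __name__ == "__main__":
    for name, ratio in size_ratios().items():
        print(f"{name}: {ratio:.2f}x")
```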
 5. Test the exported models in icefall
 --------------------------------------
 .. note::
  We assume you have set up the environment variable ``PYTHONPATH`` when
  building `ncnn`_.
 Now we have successfully converted our pre-trained model to `ncnn`_ format.
 The 6 generated files are what we need. You can use the following code to
 test the converted models:
 .. code-block:: bash
  ./conv_emformer_transducer_stateless2/streaming-ncnn-decode.py \
    --tokens ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/data/lang_bpe_500/tokens.txt \
    --encoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param \
    --encoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin \
    --decoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param \
    --decoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin \
    --joiner-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param \
    --joiner-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin \
    ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/test_wavs/1089-134686-0001.wav
 .. hint::
  `ncnn`_ supports only ``batch size == 1``, so ``streaming-ncnn-decode.py`` accepts
  only 1 wave file as input.
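 Since the command repeats the same directory many times, you may prefer to
 build the argument list programmatically. The sketch below is one way to do
 it, assuming the file layout shown above; ``build_decode_command`` is a
 hypothetical helper, not something provided by icefall. It only builds the
 list and does not run anything.

```python
from pathlib import Path


def build_decode_command(
    repo_dir,
    wave_file,
    script="./conv_emformer_transducer_stateless2/streaming-ncnn-decode.py",
):
    """Build the streaming-ncnn-decode.py command line as a list of strings."""
    repo = Path(repo_dir)
    exp = repo / "exp"
    cmd = [script, "--tokens", str(repo / "data/lang_bpe_500/tokens.txt")]
    for part in ("encoder", "decoder", "joiner"):
        # Each model consists of a .param file and a .bin file.
        stem = exp / f"{part}_jit_trace-pnnx.ncnn"
        cmd += [
            f"--{part}-param-filename", f"{stem}.param",
            f"--{part}-bin-filename", f"{stem}.bin",
        ]
    # Exactly one wave file, since only batch size 1 is supported.
    cmd.append(str(wave_file))
    return cmd


if __name__ == "__main__":
    repo = "./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05"
    print(" \\\n  ".join(build_decode_command(repo, f"{repo}/test_wavs/1089-134686-0001.wav")))
```

 The resulting list can be passed to ``subprocess.run(cmd, check=True)`` from
 ``egs/librispeech/ASR``.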
 The output is given below:
 .. literalinclude:: ./code/test-streaming-ncnn-decode-conv-emformer-transducer-libri.txt
 Congratulations! You have successfully exported a model from PyTorch to `ncnn`_!
 .. _conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn:
 6. Modify the exported encoder for sherpa-ncnn
 ----------------------------------------------
 In order to use the exported models in `sherpa-ncnn`_, we have to modify
 ``encoder_jit_trace-pnnx.ncnn.param``.
 Let us have a look at the first few lines of ``encoder_jit_trace-pnnx.ncnn.param``:
 .. code-block::
  7767517
  1060 1342
  Input                    in0                      0 1 in0
 **Explanation** of the above three lines:
  1. ``7767517``, it is a magic number and should not be changed.
  2. ``1060 1342``, the first number ``1060`` specifies the number of layers
     in this file, while ``1342`` specifies the number of intermediate outputs
     of this file.
  3. ``Input in0 0 1 in0``, ``Input`` is the layer type of this layer; ``in0``
     is the layer name of this layer; ``0`` means this layer has no input;
     ``1`` means this layer has one output; ``in0`` is the output name of
     this layer.
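 The layout described above can also be read programmatically. The following
 is a minimal sketch that parses only the first three lines, following the
 explanation given here; ``parse_param_header`` is a name we made up, and the
 sketch does not attempt to handle the rest of the ``param`` format.

```python
def parse_param_header(text):
    """Parse the magic number, the counts, and the first layer of a param file."""
    lines = text.splitlines()
    magic = int(lines[0])
    num_layers, num_intermediate_outputs = (int(x) for x in lines[1].split())

    fields = lines[2].split()
    layer_type, layer_name = fields[0], fields[1]
    num_inputs, num_outputs = int(fields[2]), int(fields[3])
    # Input names come first, followed by output names.
    inputs = fields[4:4 + num_inputs]
    outputs = fields[4 + num_inputs:4 + num_inputs + num_outputs]

    return {
        "magic": magic,
        "num_layers": num_layers,
        "num_intermediate_outputs": num_intermediate_outputs,
        "first_layer": {
            "type": layer_type,
            "name": layer_name,
            "inputs": inputs,
            "outputs": outputs,
        },
    }


if __name__ == "__main__":
    sample = "7767517\n1060 1342\nInput                    in0                      0 1 in0\n"
    print(parse_param_header(sample))
```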
 We need to add 1 extra line and also increment the number of layers.
 The result looks like below:
 .. code-block:: bash
  7767517
  1061 1342
  SherpaMetaData           sherpa_meta_data1        0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512
  Input                    in0                      0 1 in0
 **Explanation**
  1. ``7767517``, it is still the same
  2. ``1061 1342``, we have added an extra layer, so we need to update ``1060`` to ``1061``.
     We don't need to change ``1342`` since the newly added layer has no inputs or outputs.
  3. ``SherpaMetaData  sherpa_meta_data1  0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512``
     This line is newly added. Its explanation is given below:
      - ``SherpaMetaData`` is the type of this layer. Must be ``SherpaMetaData``.
      - ``sherpa_meta_data1`` is the name of this layer. Must be ``sherpa_meta_data1``.
      - ``0 0`` means this layer has no inputs or outputs. Must be ``0 0``.
      - ``0=1``, 0 is the key and 1 is the value. MUST be ``0=1``
      - ``1=12``, 1 is the key and 12 is the value of the
        parameter ``--num-encoder-layers`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``2=32``, 2 is the key and 32 is the value of the
        parameter ``--memory-size`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``3=31``, 3 is the key and 31 is the value of the
        parameter ``--cnn-module-kernel`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``4=8``, 4 is the key and 8 is the value of the
        parameter ``--left-context-length`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``5=32``, 5 is the key and 32 is the value of the
        parameter ``--chunk-length`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``6=8``, 6 is the key and 8 is the value of the
        parameter ``--right-context-length`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``7=512``, 7 is the key and 512 is the value of the
        parameter ``--encoder-dim`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      For ease of reference, we list the key-value pairs that you need to add
      in the following table. If your model has a different setting, please
      change the values for ``SherpaMetaData`` accordingly. Otherwise, you
      will be ``SAD``.
          +------+-----------------------------+
          | key  | value                       |
          +======+=============================+
          | 0    | 1 (fixed)                   |
          +------+-----------------------------+
          | 1    | ``--num-encoder-layers``    |
          +------+-----------------------------+
          | 2    | ``--memory-size``           |
          +------+-----------------------------+
          | 3    | ``--cnn-module-kernel``     |
          +------+-----------------------------+
          | 4    | ``--left-context-length``   |
          +------+-----------------------------+
          | 5    | ``--chunk-length``          |
          +------+-----------------------------+
          | 6    | ``--right-context-length``  |
          +------+-----------------------------+
          | 7    | ``--encoder-dim``           |
          +------+-----------------------------+
  4. ``Input in0 0 1 in0``. No need to change it.
 .. caution::
  When you add a new layer ``SherpaMetaData``, please remember to update the
  number of layers. In our case, update  ``1060`` to ``1061``. Otherwise,
  you will be SAD later.
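 If you prefer not to edit the file by hand, the two changes (insert one line
 and increment the layer count) can be scripted. The sketch below is an
 illustration only: ``add_sherpa_meta_data`` is a hypothetical helper, and
 the values for keys ``0`` to ``7`` must be supplied by you in the order of
 the table above. The values in the usage example are copied from the
 ``SherpaMetaData`` line shown in this section.

```python
def add_sherpa_meta_data(param_text, values):
    """Insert a SherpaMetaData layer and bump the layer count by one.

    values: the values for keys 0, 1, 2, ... in that order.
    """
    lines = param_text.splitlines()
    if any(line.split()[:1] == ["SherpaMetaData"] for line in lines[2:]):
        raise ValueError("SherpaMetaData layer is already present")

    num_layers, num_outputs = (int(x) for x in lines[1].split())
    key_values = " ".join(f"{key}={value}" for key, value in enumerate(values))
    meta = f"{'SherpaMetaData':<24} {'sherpa_meta_data1':<24} 0 0 {key_values}"

    # The new layer has no inputs or outputs, so only the first count changes.
    lines[1] = f"{num_layers + 1} {num_outputs}"
    lines.insert(2, meta)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "7767517\n1060 1342\nInput                    in0                      0 1 in0\n"
    print(add_sherpa_meta_data(original, [1, 12, 32, 31, 8, 32, 8, 512]), end="")
```

 To apply it to a real file, read ``encoder_jit_trace-pnnx.ncnn.param`` as
 text, pass its content to the function, and write the result back. Keeping
 a backup copy of the original file is a good idea.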
 .. hint::
  After adding the new layer ``SherpaMetaData``, you cannot use this model
  with ``streaming-ncnn-decode.py`` anymore since ``SherpaMetaData`` is
  supported only in `sherpa-ncnn`_.
 .. hint::
  `ncnn`_ is very flexible. You can add new layers to it just by text-editing
  the ``param`` file! You don't need to change the ``bin`` file.
 Now you can use this model in `sherpa-ncnn`_.
 Please refer to the following documentation:
  - Linux/macOS/Windows/arm/aarch64: `<https://k2-fsa.github.io/sherpa/ncnn/install/index.html>`_
  - ``Android``: `<https://k2-fsa.github.io/sherpa/ncnn/android/index.html>`_
  - ``iOS``: `<https://k2-fsa.github.io/sherpa/ncnn/ios/index.html>`_
  - Python: `<https://k2-fsa.github.io/sherpa/ncnn/python/index.html>`_
 We have a list of pre-trained models that have been exported for `sherpa-ncnn`_:
  - `<https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html>`_
    You can find more usages there.
 7. (Optional) int8 quantization with sherpa-ncnn
 ------------------------------------------------
 This step is optional.
 In this step, we describe how to quantize our model with ``int8``.
 Change :ref:`conv-emformer-step-4-export-torchscript-model-via-pnnx` to
 disable ``fp16`` when using ``pnnx``:
 .. code-block::
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  pnnx ./encoder_jit_trace-pnnx.pt fp16=0
  pnnx ./decoder_jit_trace-pnnx.pt
  pnnx ./joiner_jit_trace-pnnx.pt fp16=0
 .. note::
  We add ``fp16=0`` when exporting the encoder and joiner. `ncnn`_ does not
  support quantizing the decoder model yet. We will update this documentation
  once `ncnn`_ supports it (perhaps later in 2023).
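 To make explicit which models get ``fp16=0``, the sketch below builds the
 three ``pnnx`` command lines used in this step. The helper names
 ``pnnx_command`` and ``int8_preparation_commands`` are ours; the sketch only
 constructs the argument lists and does not invoke ``pnnx``.

```python
def pnnx_command(pt_file, fp16=True):
    """Return the pnnx command line for one traced model."""
    cmd = ["pnnx", pt_file]
    if not fp16:
        # Keep float32 weights; pnnx converts to float16 by default.
        cmd.append("fp16=0")
    return cmd


def int8_preparation_commands():
    """Commands for this step: fp16 is disabled for the encoder and joiner only."""
    return [
        pnnx_command("./encoder_jit_trace-pnnx.pt", fp16=False),
        pnnx_command("./decoder_jit_trace-pnnx.pt"),
        pnnx_command("./joiner_jit_trace-pnnx.pt", fp16=False),
    ]


if __name__ == "__main__":
    for cmd in int8_preparation_commands():
        print(" ".join(cmd))
```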
 It will generate the following files:
 .. code-block:: bash
  ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*_jit_trace-pnnx.ncnn.{param,bin}
  -rw-r--r-- 1 kuangfangjun root 503K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  437 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 283M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  79K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 3.0M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  488 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param
 Let us compare the file sizes again:
 +----------------------------------------+------------+
 | File name                              | File size  |
 +========================================+============+
 | encoder_jit_trace-pnnx.pt              | 283 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.pt              | 1010 KB    |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.pt               | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp16) | 1.5 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp32) | 3.0 MB     |
 +----------------------------------------+------------+
 You can see that the file sizes are doubled when we disable ``fp16``.
 .. note::
  You can again use ``streaming-ncnn-decode.py`` to test the exported models.
 Next, follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn`
 to modify ``encoder_jit_trace-pnnx.ncnn.param``.
 Change
 .. code-block:: bash
  7767517
  1060 1342
  Input                    in0                      0 1 in0
 to
 .. code-block:: bash
  7767517
  1061 1342
  SherpaMetaData           sherpa_meta_data1        0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512
  Input                    in0                      0 1 in0
 .. caution::
  Please follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn`
  to change the values for ``SherpaMetaData`` if your model uses a different setting.
 Next, let us compile `sherpa-ncnn`_ since we will quantize our models within
 `sherpa-ncnn`_.
 .. code-block:: bash
  # We will download sherpa-ncnn to $HOME/open-source/
  # You can change it to anywhere you like.
  cd $HOME
  mkdir -p open-source
  cd open-source
  git clone https://github.com/k2-fsa/sherpa-ncnn
  cd sherpa-ncnn
  mkdir build
  cd build
  cmake ..
  make -j 4
  ./bin/generate-int8-scale-table
  export PATH=$HOME/open-source/sherpa-ncnn/build/bin:$PATH
 The output of the above commands is:
 .. code-block:: bash
  (py38) kuangfangjun:build$ generate-int8-scale-table
  Please provide 10 arg. Currently given: 1
  Usage:
  generate-int8-scale-table encoder.param encoder.bin decoder.param decoder.bin joiner.param joiner.bin encoder-scale-table.txt joiner-scale-table.txt wave_filenames.txt
  Each line in wave_filenames.txt is a path to some 16k Hz mono wave file.
 We need to create a file ``wave_filenames.txt`` that lists the calibration wave
 files, one path per line. For testing purposes, we use the ``test_wavs``
 from the pre-trained model repository `<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  cat <<EOF > wave_filenames.txt
  ../test_wavs/1089-134686-0001.wav
  ../test_wavs/1221-135766-0001.wav
  ../test_wavs/1221-135766-0002.wav
  EOF
 Now we can calculate the scales needed for quantization with the calibration data:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  generate-int8-scale-table \
    ./encoder_jit_trace-pnnx.ncnn.param \
    ./encoder_jit_trace-pnnx.ncnn.bin \
    ./decoder_jit_trace-pnnx.ncnn.param \
    ./decoder_jit_trace-pnnx.ncnn.bin \
    ./joiner_jit_trace-pnnx.ncnn.param \
    ./joiner_jit_trace-pnnx.ncnn.bin \
    ./encoder-scale-table.txt \
    ./joiner-scale-table.txt \
    ./wave_filenames.txt
 The output logs are given below:
 .. literalinclude:: ./code/generate-int-8-scale-table-for-conv-emformer.txt
 It generates the following two files:
 .. code-block:: bash
  $ ls -lh encoder-scale-table.txt joiner-scale-table.txt
  -rw-r--r-- 1 kuangfangjun root 955K Jan 11 17:28 encoder-scale-table.txt
  -rw-r--r-- 1 kuangfangjun root  18K Jan 11 17:28 joiner-scale-table.txt
 .. caution::
  In practice, you need much more calibration data to compute a reliable scale table.
 Finally, let us use the scale table to quantize our models into ``int8``.
 .. code-block:: bash
  ncnn2int8
  usage: ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration table]
 First, we quantize the encoder model:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  ncnn2int8 \
    ./encoder_jit_trace-pnnx.ncnn.param \
    ./encoder_jit_trace-pnnx.ncnn.bin \
    ./encoder_jit_trace-pnnx.ncnn.int8.param \
    ./encoder_jit_trace-pnnx.ncnn.int8.bin \
    ./encoder-scale-table.txt
 Next, we quantize the joiner model:
 .. code-block:: bash
  ncnn2int8 \
    ./joiner_jit_trace-pnnx.ncnn.param \
    ./joiner_jit_trace-pnnx.ncnn.bin \
    ./joiner_jit_trace-pnnx.ncnn.int8.param \
    ./joiner_jit_trace-pnnx.ncnn.int8.bin \
    ./joiner-scale-table.txt
 The above two commands generate the following 4 files:
 .. code-block:: bash
  -rw-r--r-- 1 kuangfangjun root  99M Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.bin
  -rw-r--r-- 1 kuangfangjun root  78K Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.param
  -rw-r--r-- 1 kuangfangjun root 774K Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.bin
  -rw-r--r-- 1 kuangfangjun root  496 Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.param
 Congratulations! You have successfully quantized your model from ``float32`` to ``int8``.
 .. caution::
  ``ncnn.int8.param`` and ``ncnn.int8.bin`` must be used in pairs.
  You can replace ``ncnn.param`` and ``ncnn.bin`` with ``ncnn.int8.param``
  and ``ncnn.int8.bin`` in `sherpa-ncnn`_ if you like.
  For instance, to use only the ``int8`` encoder in ``sherpa-ncnn``, you can
  replace the following invocation:
    .. code-block:: bash
      cd egs/librispeech/ASR
      cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
      sherpa-ncnn \
        ../data/lang_bpe_500/tokens.txt \
        ./encoder_jit_trace-pnnx.ncnn.param \
        ./encoder_jit_trace-pnnx.ncnn.bin \
        ./decoder_jit_trace-pnnx.ncnn.param \
        ./decoder_jit_trace-pnnx.ncnn.bin \
        ./joiner_jit_trace-pnnx.ncnn.param \
        ./joiner_jit_trace-pnnx.ncnn.bin \
        ../test_wavs/1089-134686-0001.wav
  with
    .. code-block:: bash
      cd egs/librispeech/ASR
      cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
      sherpa-ncnn \
        ../data/lang_bpe_500/tokens.txt \
        ./encoder_jit_trace-pnnx.ncnn.int8.param \
        ./encoder_jit_trace-pnnx.ncnn.int8.bin \
        ./decoder_jit_trace-pnnx.ncnn.param \
        ./decoder_jit_trace-pnnx.ncnn.bin \
        ./joiner_jit_trace-pnnx.ncnn.param \
        ./joiner_jit_trace-pnnx.ncnn.bin \
        ../test_wavs/1089-134686-0001.wav
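 The pairing rule in the caution above can be checked mechanically before
 launching `sherpa-ncnn`_. The following is a minimal sketch, assuming the file
 naming used in this document (``*.ncnn.param`` with ``*.ncnn.bin``, and
 ``*.ncnn.int8.param`` with ``*.ncnn.int8.bin``). The helper name
 ``is_matched_pair`` is hypothetical and is not part of `sherpa-ncnn`_.

```python
from pathlib import PurePath


def is_matched_pair(param_file: str, bin_file: str) -> bool:
    """Return True if both files share one stem, e.g. both ``x.ncnn.int8.*``.

    Hypothetical helper: it compares file names only and never opens the files.
    """
    param = PurePath(param_file).name
    weights = PurePath(bin_file).name
    if not param.endswith(".param") or not weights.endswith(".bin"):
        return False
    # Strip the extensions and compare what is left.
    return param[: -len(".param")] == weights[: -len(".bin")]


if __name__ == "__main__":
    ok = is_matched_pair(
        "./encoder_jit_trace-pnnx.ncnn.int8.param",
        "./encoder_jit_trace-pnnx.ncnn.int8.bin",
    )
    print("matched" if ok else "mismatched")
```

 A check like this only catches naming mistakes; it does not verify the
 contents of either file.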
 The following table compares the file sizes again:
 +----------------------------------------+------------+
 | File name                              | File size  |
 +========================================+============+
 | encoder_jit_trace-pnnx.pt              | 283 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.pt              | 1010 KB    |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.pt               | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp16) | 1.5 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp32) | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.int8.bin   | 99 MB      |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.int8.bin    | 774 KB     |
 +----------------------------------------+------------+
 You can see that the file sizes of the model after ``int8`` quantization
 are much smaller.
 .. hint::
    Currently, only linear layers and convolutional layers are quantized
    with ``int8``, so you don't see an exact ``4x`` reduction in file sizes.
 .. note::
  You need to test the recognition accuracy after ``int8`` quantization.
 You can find the speed comparison at `<https://github.com/k2-fsa/sherpa-ncnn/issues/44>`_.
 That's it! Have fun with `sherpa-ncnn`_!
--- a/docs/source/model-export/export-ncnn-lstm.rst
+++ b/docs/source/model-export/export-ncnn-lstm.rst
@ -0,0 +1,644 @@
 .. _export_lstm_transducer_models_to_ncnn:
 Export LSTM transducer models to ncnn
 -------------------------------------
 We use the pre-trained model from the following repository as an example:
 `<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>`_
 We will show you step by step how to export it to `ncnn`_ and run it with `sherpa-ncnn`_.
 .. hint::
  We use ``Ubuntu 18.04``, ``torch 1.13``, and ``Python 3.8`` for testing.
 .. caution::
  Please use a more recent version of PyTorch. For instance, ``torch 1.8``
  may ``not`` work.
 1. Download the pre-trained model
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. hint::
  You have to install `git-lfs`_ before you continue.
 .. code-block:: bash
  cd egs/librispeech/ASR
  GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
  cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
  git lfs pull --include "exp/pretrained-iter-468000-avg-16.pt"
  git lfs pull --include "data/lang_bpe_500/bpe.model"
  cd ..
 .. note::
  We downloaded ``exp/pretrained-xxx.pt``, not ``exp/cpu-jit_xxx.pt``.
 In the above code, we downloaded the pre-trained model into the directory
 ``egs/librispeech/ASR/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03``.
 2. Install ncnn and pnnx
 ^^^^^^^^^^^^^^^^^^^^^^^^
 Please refer to :ref:`export_for_ncnn_install_ncnn_and_pnnx` .
 3. Export the model via torch.jit.trace()
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 First, let us create a symbolic link to our pre-trained model so that it can
 be loaded as ``epoch-99.pt``:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp
  ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt
  cd ../..
 Next, we use the following code to export our model:
 .. code-block:: bash
  dir=./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
  ./lstm_transducer_stateless2/export-for-ncnn.py \
    --exp-dir $dir/exp \
    --bpe-model $dir/data/lang_bpe_500/bpe.model \
    --epoch 99 \
    --avg 1 \
    --use-averaged-model 0 \
    --num-encoder-layers 12 \
    --encoder-dim 512 \
    --rnn-hidden-size 1024
 .. hint::
  We have renamed our model to ``epoch-99.pt`` so that we can use ``--epoch 99``.
  There is only one pre-trained model, so we use ``--avg 1 --use-averaged-model 0``.
  If you have trained a model by yourself and if you have all checkpoints
  available, please first use ``decode.py`` to tune ``--epoch --avg``
  and select the best combination with ``--use-averaged-model 1``.
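 The hint above relies on a naming convention: ``--epoch 99`` is matched with
 the file ``epoch-99.pt`` inside ``--exp-dir``. The following is a minimal
 sketch of the same ``ln -s`` step in Python, assuming that convention. The
 helper ``link_as_epoch`` is hypothetical and is not part of icefall; the demo
 runs in a temporary directory with an empty placeholder file.

```python
import os
import tempfile
from pathlib import Path


def link_as_epoch(exp_dir: Path, checkpoint: str, epoch: int) -> Path:
    """Create ``epoch-<epoch>.pt`` in exp_dir as a symlink to checkpoint.

    Mirrors ``ln -s <checkpoint> epoch-<epoch>.pt``. Hypothetical helper.
    """
    link = exp_dir / f"epoch-{epoch}.pt"
    if not link.is_symlink():
        # A relative target keeps the link valid if the repository is moved.
        os.symlink(checkpoint, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        exp = Path(d)
        # Empty placeholder standing in for the real checkpoint.
        (exp / "pretrained-iter-468000-avg-16.pt").write_bytes(b"")
        link = link_as_epoch(exp, "pretrained-iter-468000-avg-16.pt", 99)
        print(link.name, "->", os.readlink(link))
```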
 .. note::
  You will see the following log output:
  .. literalinclude:: ./code/export-lstm-transducer-for-ncnn-output.txt
  The log shows the model has ``84176356`` parameters, i.e., ``~84 M``.
  .. code-block::
    ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt
    -rw-r--r-- 1 kuangfangjun root 324M Feb 17 10:34 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt
  You can see that the file size of the pre-trained model is ``324 MB``, which
  is roughly equal to ``84176356*4/1024/1024 = 321.107 MB``.
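 The size estimate in the note above can be reproduced with a few lines of
 Python. This is a minimal sketch: the function name ``size_mib`` is
 hypothetical, and the only inputs are the parameter count from the log and the
 number of bytes per parameter (4 for ``float32``).

```python
def size_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Estimated storage in MiB for num_params parameters."""
    return num_params * bytes_per_param / 1024 / 1024


if __name__ == "__main__":
    # Parameter count reported in the export log above.
    print(f"{size_mib(84176356):.3f} MB")
```

 The estimate covers the raw parameters only, which is why it is slightly
 smaller than the ``324 MB`` reported by ``ls``.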
 After running ``lstm_transducer_stateless2/export-for-ncnn.py``,
 we will get the following files:
 .. code-block:: bash
  ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*pnnx.pt
  -rw-r--r-- 1 kuangfangjun root 1010K Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.pt
  -rw-r--r-- 1 kuangfangjun root  318M Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.pt
  -rw-r--r-- 1 kuangfangjun root  3.0M Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.pt
 .. _lstm-transducer-step-4-export-torchscript-model-via-pnnx:
 4. Export torchscript model via pnnx
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. hint::
  Make sure you have set up the ``PATH`` environment variable
  in :ref:`export_for_ncnn_install_ncnn_and_pnnx`. Otherwise,
  it will throw an error saying that ``pnnx`` could not be found.
 Now, it's time to export our models to `ncnn`_ via ``pnnx``.
 .. code-block:: bash
  cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
  pnnx ./encoder_jit_trace-pnnx.pt
  pnnx ./decoder_jit_trace-pnnx.pt
  pnnx ./joiner_jit_trace-pnnx.pt
 It will generate the following files:
 .. code-block:: bash
  ls -lh  icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*ncnn*{bin,param}
  -rw-r--r-- 1 kuangfangjun root 503K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  437 Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 159M Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  21K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 1.5M Feb 17 11:33 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  488 Feb 17 11:33 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param
 There are two types of files:
 - ``param``: It is a text file containing the model architectures. You can
  use a text editor to view its content.
 - ``bin``: It is a binary file containing the model parameters.
 We compare the file sizes of the models below before and after converting via ``pnnx``:
 .. see https://tableconvert.com/restructuredtext-generator
 +----------------------------------+------------+
 | File name                        | File size  |
 +==================================+============+
 | encoder_jit_trace-pnnx.pt        | 318 MB     |
 +----------------------------------+------------+
 | decoder_jit_trace-pnnx.pt        | 1010 KB    |
 +----------------------------------+------------+
 | joiner_jit_trace-pnnx.pt         | 3.0 MB     |
 +----------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin  | 159 MB     |
 +----------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin  | 503 KB     |
 +----------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin   | 1.5 MB     |
 +----------------------------------+------------+
 You can see that the file sizes of the models after conversion are about one half
 of the models before conversion:
  - encoder: 318 MB vs 159 MB
  - decoder: 1010 KB vs 503 KB
  - joiner: 3.0 MB vs 1.5 MB
 The reason is that by default ``pnnx`` converts ``float32`` parameters
 to ``float16``. A ``float32`` parameter occupies 4 bytes, while a ``float16``
 parameter occupies 2 bytes. Thus, the file size is halved after conversion.
 .. hint::
  If you use ``pnnx ./encoder_jit_trace-pnnx.pt fp16=0``, then ``pnnx``
  won't convert ``float32`` to ``float16``.
 5. Test the exported models in icefall
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. note::
  We assume you have set up the environment variable ``PYTHONPATH`` when
  building `ncnn`_.
 Now we have successfully converted our pre-trained model to `ncnn`_ format.
 The 6 generated files are all we need. You can use the following code to
 test the converted models:
 .. code-block:: bash
  python3 ./lstm_transducer_stateless2/streaming-ncnn-decode.py \
    --tokens ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/tokens.txt \
    --encoder-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param \
    --encoder-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin \
    --decoder-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param \
    --decoder-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin \
    --joiner-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param \
    --joiner-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin \
    ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav
 .. hint::
  `ncnn`_ supports only ``batch size == 1``, so ``streaming-ncnn-decode.py`` accepts
  only 1 wave file as input.
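 Since the script accepts only one wave file per run, decoding several files
 means invoking it once per file. The following is a minimal sketch that builds
 the argument list for each run from the flags shown above. The helper
 ``build_decode_cmd`` is hypothetical; it only assembles the command and leaves
 the actual execution (for example via ``subprocess.run``) to the caller.

```python
from typing import List


def build_decode_cmd(model_dir: str, wav: str) -> List[str]:
    """Build the arguments for one streaming-ncnn-decode.py run on one file."""
    exp = f"{model_dir}/exp"
    cmd = [
        "python3",
        "./lstm_transducer_stateless2/streaming-ncnn-decode.py",
        "--tokens",
        f"{model_dir}/data/lang_bpe_500/tokens.txt",
    ]
    for part in ("encoder", "decoder", "joiner"):
        stem = f"{exp}/{part}_jit_trace-pnnx.ncnn"
        cmd += [f"--{part}-param-filename", f"{stem}.param"]
        cmd += [f"--{part}-bin-filename", f"{stem}.bin"]
    cmd.append(wav)  # exactly one wave file, since batch size is 1
    return cmd


if __name__ == "__main__":
    model_dir = "./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03"
    for name in ("1089-134686-0001.wav", "1221-135766-0001.wav"):
        # Pass each list to subprocess.run(cmd, check=True) to actually decode.
        print(" ".join(build_decode_cmd(model_dir, f"{model_dir}/test_wavs/{name}")))
```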
 The output is given below:
 .. literalinclude:: ./code/test-streaming-ncnn-decode-lstm-transducer-libri.txt
 Congratulations! You have successfully exported a model from PyTorch to `ncnn`_!
 .. _lstm-modify-the-exported-encoder-for-sherpa-ncnn:
 6. Modify the exported encoder for sherpa-ncnn
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 In order to use the exported models in `sherpa-ncnn`_, we have to modify
 ``encoder_jit_trace-pnnx.ncnn.param``.
 Let us have a look at the first few lines of ``encoder_jit_trace-pnnx.ncnn.param``:
 .. code-block::
  7767517
  267 379
  Input                    in0                      0 1 in0
 **Explanation** of the above three lines:
  1. ``7767517``, it is a magic number and should not be changed.
  2. ``267 379``, the first number ``267`` specifies the number of layers
     in this file, while ``379`` specifies the number of intermediate outputs
     of this file.
  3. ``Input in0 0 1 in0``, ``Input`` is the layer type of this layer; ``in0``
     is the layer name of this layer; ``0`` means this layer has no input;
     ``1`` means this layer has one output; ``in0`` is the output name of
     this layer.
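 The three lines explained above can also be read programmatically. The
 following is a minimal sketch that parses only the header and the first layer
 line of a ``param`` file; it is not a full `ncnn`_ parser. The function name
 ``parse_param_header`` and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Union

HEADER = """7767517
267 379
Input                    in0                      0 1 in0
"""


def parse_param_header(text: str) -> Dict[str, Union[int, str, List[str]]]:
    """Parse the magic number, the counts line, and the first layer line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    magic = int(lines[0])
    num_layers, num_intermediate_outputs = (int(x) for x in lines[1].split())
    fields = lines[2].split()
    layer_type, layer_name = fields[0], fields[1]
    num_inputs, num_outputs = int(fields[2]), int(fields[3])
    # Input names come first, followed by output names.
    inputs = fields[4 : 4 + num_inputs]
    outputs = fields[4 + num_inputs : 4 + num_inputs + num_outputs]
    return {
        "magic": magic,
        "num_layers": num_layers,
        "num_intermediate_outputs": num_intermediate_outputs,
        "layer_type": layer_type,
        "layer_name": layer_name,
        "inputs": inputs,
        "outputs": outputs,
    }


if __name__ == "__main__":
    for key, value in parse_param_header(HEADER).items():
        print(f"{key}: {value}")
```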
 We need to add 1 extra line and also increment the number of layers.
 The result looks like below:
 .. code-block:: bash
  7767517
  268 379
  SherpaMetaData           sherpa_meta_data1        0 0 0=3 1=12 2=512 3=1024
  Input                    in0                      0 1 in0
 **Explanation**
  1. ``7767517``, it is still the same
  2. ``268 379``, we have added an extra layer, so we need to update ``267`` to ``268``.
     We don't need to change ``379`` since the newly added layer has no inputs or outputs.
  3. ``SherpaMetaData  sherpa_meta_data1  0 0 0=3 1=12 2=512 3=1024``
     This line is newly added. Its explanation is given below:
      - ``SherpaMetaData`` is the type of this layer. Must be ``SherpaMetaData``.
      - ``sherpa_meta_data1`` is the name of this layer. Must be ``sherpa_meta_data1``.
      - ``0 0`` means this layer has no inputs or outputs. Must be ``0 0``.
      - ``0=3``, 0 is the key and 3 is the value. MUST be ``0=3``
      - ``1=12``, 1 is the key and 12 is the value of the
        parameter ``--num-encoder-layers`` that you provided when running
        ``./lstm_transducer_stateless2/export-for-ncnn.py``.
      - ``2=512``, 2 is the key and 512 is the value of the
        parameter ``--encoder-dim`` that you provided when running
        ``./lstm_transducer_stateless2/export-for-ncnn.py``.
      - ``3=1024``, 3 is the key and 1024 is the value of the
        parameter ``--rnn-hidden-size`` that you provided when running
        ``./lstm_transducer_stateless2/export-for-ncnn.py``.
      For ease of reference, we list the key-value pairs that you need to add
      in the following table. If your model has a different setting, please
      change the values for ``SherpaMetaData`` accordingly. Otherwise, you
      will be ``SAD``.
          +------+-----------------------------+
          | key  | value                       |
          +======+=============================+
          | 0    | 3 (fixed)                   |
          +------+-----------------------------+
          | 1    | ``--num-encoder-layers``    |
          +------+-----------------------------+
          | 2    | ``--encoder-dim``           |
          +------+-----------------------------+
          | 3    | ``--rnn-hidden-size``       |
          +------+-----------------------------+
  4. ``Input in0 0 1 in0``. No need to change it.
 .. caution::
  When you add a new layer ``SherpaMetaData``, please remember to update the
  number of layers. In our case, update  ``267`` to ``268``. Otherwise,
  you will be SAD later.
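 The manual edit described above (add one line, increment the layer count) can
 also be scripted, which avoids forgetting the counter update. The following is
 a minimal sketch for the LSTM case only, where key ``0`` is fixed to ``3``.
 The function ``add_sherpa_meta_data`` is hypothetical and is not shipped with
 icefall or `sherpa-ncnn`_; it works on the text of the ``param`` file.

```python
def add_sherpa_meta_data(
    param_text: str,
    num_encoder_layers: int,
    encoder_dim: int,
    rnn_hidden_size: int,
) -> str:
    """Insert the SherpaMetaData line and increment the layer count by one."""
    lines = param_text.splitlines()
    if any(ln.split()[:1] == ["SherpaMetaData"] for ln in lines):
        return param_text  # already patched; do not add a second layer
    num_layers, num_outputs = (int(x) for x in lines[1].split())
    # Only the layer count changes; the new layer has no inputs or outputs.
    lines[1] = f"{num_layers + 1} {num_outputs}"
    meta = (
        f"{'SherpaMetaData':<24} {'sherpa_meta_data1':<24} 0 0 "
        f"0=3 1={num_encoder_layers} 2={encoder_dim} 3={rnn_hidden_size}"
    )
    lines.insert(2, meta)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "7767517\n267 379\nInput in0 0 1 in0\n"
    print(add_sherpa_meta_data(original, 12, 512, 1024), end="")
```

 To patch a real file, read ``encoder_jit_trace-pnnx.ncnn.param`` as text, pass
 it through the function with the values you used for export, and write the
 result back. Keeping a backup copy of the original file is advisable.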
 .. hint::
  After adding the new layer ``SherpaMetaData``, you cannot use this model
  with ``streaming-ncnn-decode.py`` anymore since ``SherpaMetaData`` is
  supported only in `sherpa-ncnn`_.
 .. hint::
  `ncnn`_ is very flexible. You can add new layers to it just by text-editing
  the ``param`` file! You don't need to change the ``bin`` file.
 Now you can use this model in `sherpa-ncnn`_.
 Please refer to the following documentation:
  - Linux/macOS/Windows/arm/aarch64: `<https://k2-fsa.github.io/sherpa/ncnn/install/index.html>`_
  - ``Android``: `<https://k2-fsa.github.io/sherpa/ncnn/android/index.html>`_
  - ``iOS``: `<https://k2-fsa.github.io/sherpa/ncnn/ios/index.html>`_
  - Python: `<https://k2-fsa.github.io/sherpa/ncnn/python/index.html>`_
 We have a list of pre-trained models that have been exported for `sherpa-ncnn`_:
  - `<https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html>`_
    You can find more usage examples there.
 7. (Optional) int8 quantization with sherpa-ncnn
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 This step is optional.
 In this step, we describe how to quantize our model with ``int8``.
 Change :ref:`lstm-transducer-step-4-export-torchscript-model-via-pnnx` to
 disable ``fp16`` when using ``pnnx``:
 .. code-block:: bash
  cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
  pnnx ./encoder_jit_trace-pnnx.pt fp16=0
  pnnx ./decoder_jit_trace-pnnx.pt
  pnnx ./joiner_jit_trace-pnnx.pt fp16=0
 .. note::
  We add ``fp16=0`` when exporting the encoder and joiner. `ncnn`_ does not
  support quantizing the decoder model yet. We will update this documentation
  once `ncnn`_ supports it (maybe later this year, 2023).
 .. code-block:: bash
  ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*_jit_trace-pnnx.ncnn.{param,bin}
  -rw-r--r-- 1 kuangfangjun root 503K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  437 Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 317M Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  21K Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 3.0M Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  488 Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param
 Let us compare the file sizes again:
 +----------------------------------------+------------+
 | File name                              | File size  |
 +========================================+============+
 | encoder_jit_trace-pnnx.pt              | 318 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.pt              | 1010 KB    |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.pt               | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 159 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp16) | 1.5 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 317 MB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp32) | 3.0 MB     |
 +----------------------------------------+------------+
 You can see that the file sizes are doubled when we disable ``fp16``.
 .. note::
  You can again use ``streaming-ncnn-decode.py`` to test the exported models.
 Next, follow :ref:`lstm-modify-the-exported-encoder-for-sherpa-ncnn`
 to modify ``encoder_jit_trace-pnnx.ncnn.param``.
 Change
 .. code-block:: bash
  7767517
  267 379
  Input                    in0                      0 1 in0
 to
 .. code-block:: bash
  7767517
  268 379
  SherpaMetaData           sherpa_meta_data1        0 0 0=3 1=12 2=512 3=1024
  Input                    in0                      0 1 in0
 .. caution::
  Please follow :ref:`lstm-modify-the-exported-encoder-for-sherpa-ncnn`
  to change the values for ``SherpaMetaData`` if your model uses a different setting.
 Next, let us compile `sherpa-ncnn`_ since we will quantize our models within
 `sherpa-ncnn`_.
 .. code-block:: bash
  # We will download sherpa-ncnn to $HOME/open-source/
  # You can change it to anywhere you like.
  cd $HOME
  mkdir -p open-source
  cd open-source
  git clone https://github.com/k2-fsa/sherpa-ncnn
  cd sherpa-ncnn
  mkdir build
  cd build
  cmake ..
  make -j 4
  ./bin/generate-int8-scale-table
  export PATH=$HOME/open-source/sherpa-ncnn/build/bin:$PATH
 The output of the above commands is:
 .. code-block:: bash
  (py38) kuangfangjun:build$ generate-int8-scale-table
  Please provide 10 arg. Currently given: 1
  Usage:
  generate-int8-scale-table encoder.param encoder.bin decoder.param decoder.bin joiner.param joiner.bin encoder-scale-table.txt joiner-scale-table.txt wave_filenames.txt
  Each line in wave_filenames.txt is a path to some 16k Hz mono wave file.
 We need to create a file ``wave_filenames.txt`` that lists the calibration wave
 files, one path per line. For testing purposes, we use the ``test_wavs``
 from the pre-trained model repository
 `<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>`_:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
  cat <<EOF > wave_filenames.txt
  ../test_wavs/1089-134686-0001.wav
  ../test_wavs/1221-135766-0001.wav
  ../test_wavs/1221-135766-0002.wav
  EOF
 Now we can calculate the scales needed for quantization with the calibration data:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
  generate-int8-scale-table \
    ./encoder_jit_trace-pnnx.ncnn.param \
    ./encoder_jit_trace-pnnx.ncnn.bin \
    ./decoder_jit_trace-pnnx.ncnn.param \
    ./decoder_jit_trace-pnnx.ncnn.bin \
    ./joiner_jit_trace-pnnx.ncnn.param \
    ./joiner_jit_trace-pnnx.ncnn.bin \
    ./encoder-scale-table.txt \
    ./joiner-scale-table.txt \
    ./wave_filenames.txt
 The output logs are given below:
 .. literalinclude:: ./code/generate-int-8-scale-table-for-lstm.txt
 It generates the following two files:
 .. code-block:: bash
  ls -lh encoder-scale-table.txt joiner-scale-table.txt
  -rw-r--r-- 1 kuangfangjun root 345K Feb 17 12:13 encoder-scale-table.txt
  -rw-r--r-- 1 kuangfangjun root  17K Feb 17 12:13 joiner-scale-table.txt
 .. caution::
  In practice, you need much more calibration data to compute a reliable scale table.
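 When collecting a larger calibration set, it helps to generate
 ``wave_filenames.txt`` automatically and to keep only files that match the
 format stated in the usage message (16 kHz, mono). The following is a minimal
 sketch using only the Python standard library. The helper names
 ``is_16k_mono`` and ``write_wave_list`` are hypothetical.

```python
import wave
from pathlib import Path
from typing import Iterable, List


def is_16k_mono(path: Path) -> bool:
    """Return True if path is a readable 16 kHz, single-channel wave file."""
    try:
        with wave.open(str(path), "rb") as f:
            return f.getframerate() == 16000 and f.getnchannels() == 1
    except (wave.Error, EOFError, OSError):
        return False


def write_wave_list(wavs: Iterable[Path], out_file: Path) -> List[Path]:
    """Write one path per line to out_file, keeping only 16 kHz mono files."""
    kept = [p for p in sorted(wavs) if is_16k_mono(p)]
    out_file.write_text("".join(f"{p}\n" for p in kept))
    return kept


if __name__ == "__main__":
    # Run from the exp/ directory, as in the commands above.
    kept = write_wave_list(Path("../test_wavs").glob("*.wav"), Path("wave_filenames.txt"))
    print(f"wrote {len(kept)} paths")
```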
 Finally, let us use the scale table to quantize our models into ``int8``.
 .. code-block:: bash
  ncnn2int8
  usage: ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration table]
 First, we quantize the encoder model:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
  ncnn2int8 \
    ./encoder_jit_trace-pnnx.ncnn.param \
    ./encoder_jit_trace-pnnx.ncnn.bin \
    ./encoder_jit_trace-pnnx.ncnn.int8.param \
    ./encoder_jit_trace-pnnx.ncnn.int8.bin \
    ./encoder-scale-table.txt
 Next, we quantize the joiner model:
 .. code-block:: bash
  ncnn2int8 \
    ./joiner_jit_trace-pnnx.ncnn.param \
    ./joiner_jit_trace-pnnx.ncnn.bin \
    ./joiner_jit_trace-pnnx.ncnn.int8.param \
    ./joiner_jit_trace-pnnx.ncnn.int8.bin \
    ./joiner-scale-table.txt
 The above two commands generate the following 4 files:
 .. code-block::
  -rw-r--r-- 1 kuangfangjun root 218M Feb 17 12:19 encoder_jit_trace-pnnx.ncnn.int8.bin
  -rw-r--r-- 1 kuangfangjun root  21K Feb 17 12:19 encoder_jit_trace-pnnx.ncnn.int8.param
  -rw-r--r-- 1 kuangfangjun root 774K Feb 17 12:19 joiner_jit_trace-pnnx.ncnn.int8.bin
  -rw-r--r-- 1 kuangfangjun root  496 Feb 17 12:19 joiner_jit_trace-pnnx.ncnn.int8.param
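 The two ``ncnn2int8`` invocations above differ only in the model name, so they
 can be generated from one template. The following is a minimal sketch that
 follows the usage string ``[inparam] [inbin] [outparam] [outbin] [calibration
 table]``. The helper ``ncnn2int8_cmd`` is hypothetical; it only builds the
 argument list.

```python
from typing import List


def ncnn2int8_cmd(stem: str, scale_table: str) -> List[str]:
    """Arguments in the order: inparam, inbin, outparam, outbin, table."""
    return [
        "ncnn2int8",
        f"{stem}.ncnn.param",
        f"{stem}.ncnn.bin",
        f"{stem}.ncnn.int8.param",
        f"{stem}.ncnn.int8.bin",
        scale_table,
    ]


if __name__ == "__main__":
    # The decoder is skipped, as in the commands above.
    for part in ("encoder", "joiner"):
        cmd = ncnn2int8_cmd(f"./{part}_jit_trace-pnnx", f"./{part}-scale-table.txt")
        print(" ".join(cmd))  # or subprocess.run(cmd, check=True)
```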
 Congratulations! You have successfully quantized your model from ``float32`` to ``int8``.
 .. caution::
  ``ncnn.int8.param`` and ``ncnn.int8.bin`` must be used in pairs.
  You can replace ``ncnn.param`` and ``ncnn.bin`` with ``ncnn.int8.param``
  and ``ncnn.int8.bin`` in `sherpa-ncnn`_ if you like.
  For instance, to use only the ``int8`` encoder in ``sherpa-ncnn``, you can
  replace the following invocation:
    .. code-block:: bash
      cd egs/librispeech/ASR
      cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
      sherpa-ncnn \
        ../data/lang_bpe_500/tokens.txt \
        ./encoder_jit_trace-pnnx.ncnn.param \
        ./encoder_jit_trace-pnnx.ncnn.bin \
        ./decoder_jit_trace-pnnx.ncnn.param \
        ./decoder_jit_trace-pnnx.ncnn.bin \
        ./joiner_jit_trace-pnnx.ncnn.param \
        ./joiner_jit_trace-pnnx.ncnn.bin \
        ../test_wavs/1089-134686-0001.wav
  with
    .. code-block:: bash
      cd egs/librispeech/ASR
      cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/
      sherpa-ncnn \
        ../data/lang_bpe_500/tokens.txt \
        ./encoder_jit_trace-pnnx.ncnn.int8.param \
        ./encoder_jit_trace-pnnx.ncnn.int8.bin \
        ./decoder_jit_trace-pnnx.ncnn.param \
        ./decoder_jit_trace-pnnx.ncnn.bin \
        ./joiner_jit_trace-pnnx.ncnn.param \
        ./joiner_jit_trace-pnnx.ncnn.bin \
        ../test_wavs/1089-134686-0001.wav
 The following table compares the file sizes again:
 +----------------------------------------+------------+
 | File name                              | File size  |
 +========================================+============+
 | encoder_jit_trace-pnnx.pt              | 318 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.pt              | 1010 KB    |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.pt               | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 159 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp16) | 1.5 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 317 MB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp32) | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.int8.bin   | 218 MB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.int8.bin    | 774 KB     |
 +----------------------------------------+------------+
 You can see that the file size of the joiner model after ``int8`` quantization
 is much smaller. However, the size of the encoder model is even larger than
 the ``fp16`` counterpart. The reason is that `ncnn`_ currently does not support
 quantizing ``LSTM`` layers into ``8-bit``. Please see
 `<https://github.com/Tencent/ncnn/issues/4532>`_ for details.
 .. hint::
    Currently, only linear layers and convolutional layers are quantized
    with ``int8``, so you don't see an exact ``4x`` reduction in file sizes.
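 The difference between the two models becomes clearer when the reduction
 factors are computed from the table above. The following is a minimal sketch;
 the sizes are copied from the table (converted to KB), and the function name
 ``reduction`` is hypothetical.

```python
SIZES_KB = {
    # name: (fp32 size in KB, int8 size in KB), copied from the table above
    "encoder": (317 * 1024, 218 * 1024),
    "joiner": (3.0 * 1024, 774),
}


def reduction(fp32_kb: float, int8_kb: float) -> float:
    """How many times smaller the int8 file is than the fp32 file."""
    return fp32_kb / int8_kb


if __name__ == "__main__":
    for name, (fp32_kb, int8_kb) in SIZES_KB.items():
        print(f"{name}: {reduction(fp32_kb, int8_kb):.2f}x smaller")
```

 The joiner comes close to the ideal ``4x``, while the encoder, which is
 dominated by ``LSTM`` layers, shrinks by less than ``1.5x``.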
 .. note::
  You need to test the recognition accuracy after ``int8`` quantization.
 That's it! Have fun with `sherpa-ncnn`_!
--- a/docs/source/model-export/export-ncnn.rst
+++ b/docs/source/model-export/export-ncnn.rst
@ -1,15 +1,26 @@
 Export to ncnn
 ==============
-We support exporting both
+We support exporting the following models
-`LSTM transducer models <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2>`_
+to `ncnn <https://github.com/tencent/ncnn>`_:
 and
 `ConvEmformer transducer models <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless2>`_
 to `ncnn <https://github.com/tencent/ncnn>`_.
-We also provide `<https://github.com/k2-fsa/sherpa-ncnn>`_
+  - `Zipformer transducer models <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_
-performing speech recognition using ``ncnn`` with exported models.
+
-It has been tested on Linux, macOS, Windows, ``Android``, and ``Raspberry Pi``.
+  - `LSTM transducer models <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2>`_
  - `ConvEmformer transducer models <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless2>`_
 We also provide `sherpa-ncnn`_
 for performing speech recognition using `ncnn`_ with exported models.
 It has been tested on the following platforms:
  - Linux
  - macOS
  - Windows
  - ``Android``
  - ``iOS``
  - ``Raspberry Pi``
  - `爱芯派 <https://wiki.sipeed.com/hardware/zh/>`_ (`MAIX-III AXera-Pi <https://wiki.sipeed.com/hardware/en/maixIII/ax-pi/axpi.html>`_).
 `sherpa-ncnn`_ is self-contained and can be statically linked to produce
 a binary containing everything needed. Please refer
@ -18,754 +29,7 @@ to its documentation for details:
 - `<https://k2-fsa.github.io/sherpa/ncnn/index.html>`_
-Export LSTM transducer models
+.. toctree::
 -----------------------------
-Please refer to :ref:`export-lstm-transducer-model-for-ncnn` for details.
+   export-ncnn-conv-emformer
-
+   export-ncnn-lstm
 Export ConvEmformer transducer models
 -------------------------------------
 We use the pre-trained model from the following repository as an example:
  - `<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_
 We will show you step by step how to export it to `ncnn`_ and run it with `sherpa-ncnn`_.
 .. hint::
  We use ``Ubuntu 18.04``, ``torch 1.10``, and ``Python 3.8`` for testing.
 .. caution::
  Please use a more recent version of PyTorch. For instance, ``torch 1.8``
  may ``not`` work.
 1. Download the pre-trained model
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. hint::
  You can also refer to `<https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html#icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_ to download the pre-trained model.
  You have to install `git-lfs`_ before you continue.
 .. code-block:: bash
  cd egs/librispeech/ASR
  GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
  git lfs pull --include "exp/pretrained-epoch-30-avg-10-averaged.pt"
  git lfs pull --include "data/lang_bpe_500/bpe.model"
  cd ..
 .. note::
  We download ``exp/pretrained-xxx.pt``, not ``exp/cpu-jit_xxx.pt``.
 In the above code, we download the pre-trained model into the directory
 ``egs/librispeech/ASR/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05``.
 2. Install ncnn and pnnx
 ^^^^^^^^^^^^^^^^^^^^^^^^
 .. code-block:: bash
  # We put ncnn into $HOME/open-source/ncnn
  # You can change it to anywhere you like
  cd $HOME
  mkdir -p open-source
  cd open-source
  git clone https://github.com/csukuangfj/ncnn
  cd ncnn
  git submodule update --recursive --init
  # Note: We don't use "python setup.py install" or "pip install ." here
  mkdir -p build-wheel
  cd build-wheel
  cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DNCNN_PYTHON=ON \
    -DNCNN_BUILD_BENCHMARK=OFF \
    -DNCNN_BUILD_EXAMPLES=OFF \
    -DNCNN_BUILD_TOOLS=ON \
  ..
  make -j4
  cd ..
  # Note: $PWD here is $HOME/open-source/ncnn
  export PYTHONPATH=$PWD/python:$PYTHONPATH
  export PATH=$PWD/tools/pnnx/build/src:$PATH
  export PATH=$PWD/build-wheel/tools/quantize:$PATH
  # Now build pnnx
  cd tools/pnnx
  mkdir build
  cd build
  cmake ..
  make -j4
  ./src/pnnx
 Congratulations! You have successfully installed the following components:
  - ``pnnx``, which is an executable located in
    ``$HOME/open-source/ncnn/tools/pnnx/build/src``. We will use
    it to convert models exported by ``torch.jit.trace()``.
  - ``ncnn2int8``, which is an executable located in
    ``$HOME/open-source/ncnn/build-wheel/tools/quantize``. We will use
    it to quantize our models to ``int8``.
  - ``ncnn.cpython-38-x86_64-linux-gnu.so``, which is a Python module located
    in ``$HOME/open-source/ncnn/python/ncnn``.
    .. note::
      I am using ``Python 3.8``, so it
      is ``ncnn.cpython-38-x86_64-linux-gnu.so``. If you use a different
      version, say, ``Python 3.9``, the name would be
      ``ncnn.cpython-39-x86_64-linux-gnu.so``.
      Also, if you are not using Linux, the file name would also be different.
      But that does not matter. As long as you can compile it, it should work.
 We have set up ``PYTHONPATH`` so that you can use ``import ncnn`` in your
 Python code. We have also set up ``PATH`` so that you can use
 ``pnnx`` and ``ncnn2int8`` later in your terminal.
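 Before you continue, you may want to confirm that both environment variables
 took effect. The following is a minimal sketch, not part of icefall; the
 helper name ``check_ncnn_setup`` is hypothetical. It only checks that the
 ``ncnn`` module can be located and that the two executables are on ``PATH``.

 .. code-block:: python

   import importlib.util
   import shutil


   def check_ncnn_setup(tools=("pnnx", "ncnn2int8")):
       """Return a dict telling which components this Python process can see."""
       # find_spec() locates the module without importing it
       status = {"ncnn (python module)": importlib.util.find_spec("ncnn") is not None}
       for tool in tools:
           # shutil.which() searches the directories listed in PATH
           status[tool] = shutil.which(tool) is not None
       return status


   if __name__ == "__main__":
       for name, ok in check_ncnn_setup().items():
           print(f"{name}: {'found' if ok else 'NOT found'}")

 If any entry is reported as ``NOT found``, re-run the ``export`` commands above
 in the same terminal.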
 .. caution::
  Please don't use `<https://github.com/tencent/ncnn>`_.
  We have made some modifications to the official `ncnn`_.
  We will synchronize `<https://github.com/csukuangfj/ncnn>`_ periodically
  with the official one.
 3. Export the model via torch.jit.trace()
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 First, let us rename our pre-trained model:
 .. code-block::
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp
  ln -s pretrained-epoch-30-avg-10-averaged.pt epoch-30.pt
  cd ../..
 Next, we use the following code to export our model:
 .. code-block:: bash
  dir=./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/
  ./conv_emformer_transducer_stateless2/export-for-ncnn.py \
    --exp-dir $dir/exp \
    --bpe-model $dir/data/lang_bpe_500/bpe.model \
    --epoch 30 \
    --avg 1 \
    --use-averaged-model 0 \
    \
    --num-encoder-layers 12 \
    --chunk-length 32 \
    --cnn-module-kernel 31 \
    --left-context-length 32 \
    --right-context-length 8 \
    --memory-size 32 \
    --encoder-dim 512
 .. hint::
  We have renamed our model to ``epoch-30.pt`` so that we can use ``--epoch 30``.
  There is only one pre-trained model, so we use ``--avg 1 --use-averaged-model 0``.
  If you have trained a model by yourself and if you have all checkpoints
  available, please first use ``decode.py`` to tune ``--epoch --avg``
  and select the best combination with ``--use-averaged-model 1``.
 .. note::
  You will see the following log output:
  .. literalinclude:: ./code/export-conv-emformer-transducer-for-ncnn-output.txt
  The log shows the model has ``75490012`` parameters, i.e., ``~75 M``.
  .. code-block::
    ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt
    -rw-r--r-- 1 kuangfangjun root 289M Jan 11 12:05 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt
  You can see that the file size of the pre-trained model is ``289 MB``, which
  is roughly ``75490012*4/1024/1024 = 287.97 MB``.
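 The size estimate above can be reproduced with a few lines of Python. This is
 only an illustrative sketch; the function name ``checkpoint_size_mb`` is
 hypothetical, and it assumes every parameter is stored as a 4-byte ``float32``
 with no other data in the file.

 .. code-block:: python

   def checkpoint_size_mb(num_params, bytes_per_param=4):
       """Estimate the size in MB (1 MB = 1024 * 1024 bytes) of the parameters."""
       return num_params * bytes_per_param / 1024 / 1024


   # 75490012 is the number of parameters reported in the log above
   print(f"{checkpoint_size_mb(75490012):.2f} MB")

 The small gap between this estimate and the ``289 MB`` reported by ``ls`` is
 expected, since a checkpoint stores more than raw parameter values.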
 After running ``conv_emformer_transducer_stateless2/export-for-ncnn.py``,
 we will get the following files:
 .. code-block:: bash
  ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*pnnx*
  -rw-r--r-- 1 kuangfangjun root 1010K Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.pt
  -rw-r--r-- 1 kuangfangjun root  283M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.pt
  -rw-r--r-- 1 kuangfangjun root  3.0M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.pt
 .. _conv-emformer-step-3-export-torchscript-model-via-pnnx:
 3. Export torchscript model via pnnx
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. hint::
  Make sure you have set up the ``PATH`` environment variable. Otherwise,
  it will throw an error saying that ``pnnx`` could not be found.
 Now, it's time to export our models to `ncnn`_ via ``pnnx``.
 .. code-block::
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  pnnx ./encoder_jit_trace-pnnx.pt
  pnnx ./decoder_jit_trace-pnnx.pt
  pnnx ./joiner_jit_trace-pnnx.pt
 It will generate the following files:
 .. code-block:: bash
  ls -lh  icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*ncnn*{bin,param}
  -rw-r--r-- 1 kuangfangjun root 503K Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  437 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 142M Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  79K Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 1.5M Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  488 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param
 There are two types of files:
 - ``param``: It is a text file containing the model architectures. You can
  use a text editor to view its content.
 - ``bin``: It is a binary file containing the model parameters.
 We compare the file sizes of the models below before and after converting via ``pnnx``:
 .. see https://tableconvert.com/restructuredtext-generator
 +----------------------------------+------------+
 | File name                        | File size  |
 +==================================+============+
 | encoder_jit_trace-pnnx.pt        | 283 MB     |
 +----------------------------------+------------+
 | decoder_jit_trace-pnnx.pt        | 1010 KB    |
 +----------------------------------+------------+
 | joiner_jit_trace-pnnx.pt         | 3.0 MB     |
 +----------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin  | 142 MB     |
 +----------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin  | 503 KB     |
 +----------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin   | 1.5 MB     |
 +----------------------------------+------------+
 You can see that the models after conversion are about half the size
 of the models before conversion:
  - encoder: 283 MB vs 142 MB
  - decoder: 1010 KB vs 503 KB
  - joiner: 3.0 MB vs 1.5 MB
 The reason is that by default ``pnnx`` converts ``float32`` parameters
 to ``float16``. A ``float32`` parameter occupies 4 bytes, while a ``float16``
 parameter occupies 2 bytes. Thus, the files are about half the size after conversion.
 .. hint::
  If you use ``pnnx ./encoder_jit_trace-pnnx.pt fp16=0``, then ``pnnx``
  won't convert ``float32`` to ``float16``.
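 The effect of ``fp16`` on file size can be sketched as follows. The helper
 ``size_after_pnnx`` is hypothetical and assumes the file consists only of
 parameters, so its results only approximate the sizes in the table above.

 .. code-block:: python

   import struct

   FP32_BYTES = struct.calcsize("<f")  # 4 bytes per float32 value
   FP16_BYTES = struct.calcsize("<e")  # 2 bytes per float16 value


   def size_after_pnnx(fp32_size, fp16=True):
       """Estimate the size after conversion, in the same unit as the input."""
       if not fp16:
           # Corresponds to passing fp16=0 to pnnx: parameters stay float32
           return fp32_size
       return fp32_size * FP16_BYTES / FP32_BYTES


   for name, size in [("encoder (MB)", 283), ("decoder (KB)", 1010), ("joiner (MB)", 3.0)]:
       print(name, size_after_pnnx(size))

 The estimates are close to, but not exactly, the measured ``142 MB``,
 ``503 KB``, and ``1.5 MB``.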
 4. Test the exported models in icefall
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. note::
  We assume you have set up the environment variable ``PYTHONPATH`` when
  building `ncnn`_.
 Now we have successfully converted our pre-trained model to `ncnn`_ format.
 The generated 6 files are what we need. You can use the following code to
 test the converted models:
 .. code-block:: bash
  ./conv_emformer_transducer_stateless2/streaming-ncnn-decode.py \
    --tokens ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/data/lang_bpe_500/tokens.txt \
    --encoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param \
    --encoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin \
    --decoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param \
    --decoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin \
    --joiner-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param \
    --joiner-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin \
    ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/test_wavs/1089-134686-0001.wav
 .. hint::
  `ncnn`_ supports only ``batch size == 1``, so ``streaming-ncnn-decode.py`` accepts
  only 1 wave file as input.
 The output is given below:
 .. literalinclude:: ./code/test-stremaing-ncnn-decode-conv-emformer-transducer-libri.txt
 Congratulations! You have successfully exported a model from PyTorch to `ncnn`_!
 .. _conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn:
 5. Modify the exported encoder for sherpa-ncnn
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 In order to use the exported models in `sherpa-ncnn`_, we have to modify
 ``encoder_jit_trace-pnnx.ncnn.param``.
 Let us have a look at the first few lines of ``encoder_jit_trace-pnnx.ncnn.param``:
 .. code-block::
  7767517
  1060 1342
  Input                    in0                      0 1 in0
 **Explanation** of the above three lines:
  1. ``7767517`` is a magic number and should not be changed.
  2. ``1060 1342``: the first number, ``1060``, specifies the number of layers
     in this file, while ``1342`` specifies the number of intermediate outputs
     of this file.
  3. ``Input in0 0 1 in0``, ``Input`` is the layer type of this layer; ``in0``
     is the layer name of this layer; ``0`` means this layer has no input;
     ``1`` means this layer has one output; ``in0`` is the output name of
     this layer.
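 To make the layout of these three lines concrete, here is a small sketch that
 parses them. It is for illustration only and is not how `ncnn`_ itself reads
 the file; the function name ``read_param_header`` and the dictionary keys are
 hypothetical.

 .. code-block:: python

   def read_param_header(lines):
       """Parse the magic number, the counts, and the first layer line."""
       magic = int(lines[0])
       num_layers, num_intermediate_outputs = (int(x) for x in lines[1].split())
       fields = lines[2].split()
       num_inputs, num_outputs = int(fields[2]), int(fields[3])
       return {
           "magic": magic,
           "num_layers": num_layers,
           "num_intermediate_outputs": num_intermediate_outputs,
           "layer_type": fields[0],
           "layer_name": fields[1],
           "num_inputs": num_inputs,
           "num_outputs": num_outputs,
           # Input names come first, followed by output names
           "names": fields[4:4 + num_inputs + num_outputs],
       }


   header = read_param_header([
       "7767517",
       "1060 1342",
       "Input                    in0                      0 1 in0",
   ])
   print(header)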
 We need to add 1 extra line and also increment the number of layers.
 The result looks like below:
 .. code-block:: bash
  7767517
  1061 1342
  SherpaMetaData           sherpa_meta_data1        0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512
  Input                    in0                      0 1 in0
 **Explanation**
  1. ``7767517``, it is still the same
  2. ``1061 1342``, we have added an extra layer, so we need to update ``1060`` to ``1061``.
     We don't need to change ``1342`` since the newly added layer has no inputs or outputs.
  3. ``SherpaMetaData  sherpa_meta_data1  0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512``
     This line is newly added. Its explanation is given below:
      - ``SherpaMetaData`` is the type of this layer. Must be ``SherpaMetaData``.
      - ``sherpa_meta_data1`` is the name of this layer. Must be ``sherpa_meta_data1``.
      - ``0 0`` means this layer has no inputs or outputs. Must be ``0 0``.
      - ``0=1``, 0 is the key and 1 is the value. MUST be ``0=1``
      - ``1=12``, 1 is the key and 12 is the value of the
        parameter ``--num-encoder-layers`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``2=32``, 2 is the key and 32 is the value of the
        parameter ``--memory-size`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``3=31``, 3 is the key and 31 is the value of the
        parameter ``--cnn-module-kernel`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``4=8``, 4 is the key and 8 is the value of the
        parameter ``--left-context-length`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``5=32``, 5 is the key and 32 is the value of the
        parameter ``--chunk-length`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``6=8``, 6 is the key and 8 is the value of the
        parameter ``--right-context-length`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      - ``7=512``, 7 is the key and 512 is the value of the
        parameter ``--encoder-dim`` that you provided when running
        ``conv_emformer_transducer_stateless2/export-for-ncnn.py``.
      For ease of reference, we list the key-value pairs that you need to add
      in the following table. If your model has a different setting, please
      change the values for ``SherpaMetaData`` accordingly. Otherwise, you
      will be ``SAD``.
          +------+-----------------------------+
          | key  | value                       |
          +======+=============================+
          | 0    | 1 (fixed)                   |
          +------+-----------------------------+
          | 1    | ``--num-encoder-layers``    |
          +------+-----------------------------+
          | 2    | ``--memory-size``           |
          +------+-----------------------------+
          | 3    | ``--cnn-module-kernel``     |
          +------+-----------------------------+
          | 4    | ``--left-context-length``   |
          +------+-----------------------------+
          | 5    | ``--chunk-length``          |
          +------+-----------------------------+
          | 6    | ``--right-context-length``  |
          +------+-----------------------------+
          | 7    | ``--encoder-dim``           |
          +------+-----------------------------+
  4. ``Input in0 0 1 in0``. No need to change it.
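 If you prefer not to edit the file by hand, the change described above can be
 scripted. The following is a sketch under the assumption that the file starts
 with the magic number followed by the line holding the two counts; the
 function name ``add_sherpa_meta_data`` is hypothetical, and the values are the
 ones used in this example.

 .. code-block:: python

   def add_sherpa_meta_data(param_text, values):
       """Insert a SherpaMetaData line and increment the number of layers."""
       lines = param_text.splitlines()
       if any(line.startswith("SherpaMetaData") for line in lines):
           raise ValueError("SherpaMetaData is already present")
       num_layers, num_outputs = lines[1].split()
       # The new layer has no inputs or outputs, so only the first count changes
       lines[1] = f"{int(num_layers) + 1} {num_outputs}"
       pairs = " ".join(f"{key}={values[key]}" for key in sorted(values))
       lines.insert(2, f"{'SherpaMetaData':<25}{'sherpa_meta_data1':<25}0 0 {pairs}")
       return "\n".join(lines) + "\n"


   # Key-value pairs from the table above; change them if your model differs
   CONV_EMFORMER_VALUES = {0: 1, 1: 12, 2: 32, 3: 31, 4: 8, 5: 32, 6: 8, 7: 512}

   original = (
       "7767517\n"
       "1060 1342\n"
       "Input                    in0                      0 1 in0\n"
   )
   print(add_sherpa_meta_data(original, CONV_EMFORMER_VALUES), end="")

 To apply it to a real file, read ``encoder_jit_trace-pnnx.ncnn.param`` as text,
 pass its content to the function, and write the result back. Keep a backup of
 the original file first.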
 .. caution::
  When you add a new layer ``SherpaMetaData``, please remember to update the
  number of layers. In our case, update  ``1060`` to ``1061``. Otherwise,
  you will be SAD later.
 .. hint::
  After adding the new layer ``SherpaMetaData``, you cannot use this model
  with ``streaming-ncnn-decode.py`` anymore since ``SherpaMetaData`` is
  supported only in `sherpa-ncnn`_.
 .. hint::
  `ncnn`_ is very flexible. You can add new layers to it just by text-editing
  the ``param`` file! You don't need to change the ``bin`` file.
 Now you can use this model in `sherpa-ncnn`_.
 Please refer to the following documentation:
  - Linux/macOS/Windows/arm/aarch64: `<https://k2-fsa.github.io/sherpa/ncnn/install/index.html>`_
  - Android: `<https://k2-fsa.github.io/sherpa/ncnn/android/index.html>`_
  - Python: `<https://k2-fsa.github.io/sherpa/ncnn/python/index.html>`_
 We have a list of pre-trained models that have been exported for `sherpa-ncnn`_:
  - `<https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html>`_
    You can find more usage examples there.
 6. (Optional) int8 quantization with sherpa-ncnn
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 This step is optional.
 In this step, we describe how to quantize our model with ``int8``.
 Change :ref:`conv-emformer-step-3-export-torchscript-model-via-pnnx` to
 disable ``fp16`` when using ``pnnx``:
 .. code-block::
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  pnnx ./encoder_jit_trace-pnnx.pt fp16=0
  pnnx ./decoder_jit_trace-pnnx.pt
  pnnx ./joiner_jit_trace-pnnx.pt fp16=0
 .. note::
  We add ``fp16=0`` when exporting the encoder and joiner. `ncnn`_ does not
  support quantizing the decoder model yet. We will update this documentation
  once `ncnn`_ supports it (possibly within 2023).
 It will generate the following files:
 .. code-block:: bash
  ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*_jit_trace-pnnx.ncnn.{param,bin}
  -rw-r--r-- 1 kuangfangjun root 503K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  437 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 283M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  79K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param
  -rw-r--r-- 1 kuangfangjun root 3.0M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin
  -rw-r--r-- 1 kuangfangjun root  488 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param
 Let us compare the file sizes again:
 +----------------------------------------+------------+
 | File name                              | File size  |
 +========================================+============+
 | encoder_jit_trace-pnnx.pt              | 283 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.pt              | 1010 KB    |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.pt               | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp16) | 1.5 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp32) | 3.0 MB     |
 +----------------------------------------+------------+
 You can see that the file sizes are doubled when we disable ``fp16``.
 .. note::
  You can again use ``streaming-ncnn-decode.py`` to test the exported models.
 Next, follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn`
 to modify ``encoder_jit_trace-pnnx.ncnn.param``.
 Change
 .. code-block:: bash
  7767517
  1060 1342
  Input                    in0                      0 1 in0
 to
 .. code-block:: bash
  7767517
  1061 1342
  SherpaMetaData           sherpa_meta_data1        0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512
  Input                    in0                      0 1 in0
 .. caution::
  Please follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn`
  to change the values for ``SherpaMetaData`` if your model uses a different setting.
 Next, let us compile `sherpa-ncnn`_ since we will quantize our models within
 `sherpa-ncnn`_.
 .. code-block:: bash
  # We will download sherpa-ncnn to $HOME/open-source/
  # You can change it to anywhere you like.
  cd $HOME
  mkdir -p open-source
  cd open-source
  git clone https://github.com/k2-fsa/sherpa-ncnn
  cd sherpa-ncnn
  mkdir build
  cd build
  cmake ..
  make -j 4
  ./bin/generate-int8-scale-table
  export PATH=$HOME/open-source/sherpa-ncnn/build/bin:$PATH
 The output of the above commands is:
 .. code-block:: bash
  (py38) kuangfangjun:build$ generate-int8-scale-table
  Please provide 10 arg. Currently given: 1
  Usage:
  generate-int8-scale-table encoder.param encoder.bin decoder.param decoder.bin joiner.param joiner.bin encoder-scale-table.txt joiner-scale-table.txt wave_filenames.txt
  Each line in wave_filenames.txt is a path to some 16k Hz mono wave file.
 We need to create a file ``wave_filenames.txt`` that lists the wave files
 used for calibration. For testing purposes, we use the ``test_wavs``
 from the pre-trained model repository `<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  cat <<EOF > wave_filenames.txt
  ../test_wavs/1089-134686-0001.wav
  ../test_wavs/1221-135766-0001.wav
  ../test_wavs/1221-135766-0002.wav
  EOF
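 Since each line in ``wave_filenames.txt`` must point to a 16 kHz mono wave
 file, it can help to validate the list before computing the scale table. The
 sketch below uses only the Python standard library; the function names are
 hypothetical, and relative paths are resolved against the current directory,
 so run it from the directory that contains ``wave_filenames.txt``.

 .. code-block:: python

   import wave


   def is_16k_mono(path):
       """Return True if the wave file has a 16000 Hz sample rate and 1 channel."""
       with wave.open(path, "rb") as f:
           return f.getframerate() == 16000 and f.getnchannels() == 1


   def find_unsuitable_waves(list_file):
       """Return the listed paths that are not 16 kHz mono wave files."""
       unsuitable = []
       with open(list_file) as f:
           for line in f:
               path = line.strip()
               # Skip empty lines; everything else is treated as a file path
               if path and not is_16k_mono(path):
                   unsuitable.append(path)
       return unsuitable

 For instance, ``find_unsuitable_waves("wave_filenames.txt")`` returns an empty
 list when every listed file is suitable.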
 Now we can calculate the scales needed for quantization with the calibration data:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  generate-int8-scale-table \
    ./encoder_jit_trace-pnnx.ncnn.param \
    ./encoder_jit_trace-pnnx.ncnn.bin \
    ./decoder_jit_trace-pnnx.ncnn.param \
    ./decoder_jit_trace-pnnx.ncnn.bin \
    ./joiner_jit_trace-pnnx.ncnn.param \
    ./joiner_jit_trace-pnnx.ncnn.bin \
    ./encoder-scale-table.txt \
    ./joiner-scale-table.txt \
    ./wave_filenames.txt
 The output logs are given below:
 .. literalinclude:: ./code/generate-int-8-scale-table-for-conv-emformer.txt
 It generates the following two files:
 .. code-block:: bash
  $ ls -lh encoder-scale-table.txt joiner-scale-table.txt
  -rw-r--r-- 1 kuangfangjun root 955K Jan 11 17:28 encoder-scale-table.txt
  -rw-r--r-- 1 kuangfangjun root  18K Jan 11 17:28 joiner-scale-table.txt
 .. caution::
  These three files are only for demonstration. In practice, you need much more
  calibration data to compute a reliable scale table.
 Finally, let us use the scale table to quantize our models into ``int8``.
 .. code-block:: bash
  ncnn2int8
  usage: ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration table]
 First, we quantize the encoder model:
 .. code-block:: bash
  cd egs/librispeech/ASR
  cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
  ncnn2int8 \
    ./encoder_jit_trace-pnnx.ncnn.param \
    ./encoder_jit_trace-pnnx.ncnn.bin \
    ./encoder_jit_trace-pnnx.ncnn.int8.param \
    ./encoder_jit_trace-pnnx.ncnn.int8.bin \
    ./encoder-scale-table.txt
 Next, we quantize the joiner model:
 .. code-block:: bash
  ncnn2int8 \
    ./joiner_jit_trace-pnnx.ncnn.param \
    ./joiner_jit_trace-pnnx.ncnn.bin \
    ./joiner_jit_trace-pnnx.ncnn.int8.param \
    ./joiner_jit_trace-pnnx.ncnn.int8.bin \
    ./joiner-scale-table.txt
 The above two commands generate the following 4 files:
 .. code-block:: bash
  -rw-r--r-- 1 kuangfangjun root  99M Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.bin
  -rw-r--r-- 1 kuangfangjun root  78K Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.param
  -rw-r--r-- 1 kuangfangjun root 774K Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.bin
  -rw-r--r-- 1 kuangfangjun root  496 Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.param
 Congratulations! You have successfully quantized your model from ``float32`` to ``int8``.
 .. caution::
  ``ncnn.int8.param`` and ``ncnn.int8.bin`` must be used in pairs.
  You can replace ``ncnn.param`` and ``ncnn.bin`` with ``ncnn.int8.param``
  and ``ncnn.int8.bin`` in `sherpa-ncnn`_ if you like.
  For instance, to use only the ``int8`` encoder in ``sherpa-ncnn``, you can
  replace the following invocation:
    .. code-block::
      cd egs/librispeech/ASR
      cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
      sherpa-ncnn \
        ../data/lang_bpe_500/tokens.txt \
        ./encoder_jit_trace-pnnx.ncnn.param \
        ./encoder_jit_trace-pnnx.ncnn.bin \
        ./decoder_jit_trace-pnnx.ncnn.param \
        ./decoder_jit_trace-pnnx.ncnn.bin \
        ./joiner_jit_trace-pnnx.ncnn.param \
        ./joiner_jit_trace-pnnx.ncnn.bin \
        ../test_wavs/1089-134686-0001.wav
  with
    .. code-block::
      cd egs/librispeech/ASR
      cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
      sherpa-ncnn \
        ../data/lang_bpe_500/tokens.txt \
        ./encoder_jit_trace-pnnx.ncnn.int8.param \
        ./encoder_jit_trace-pnnx.ncnn.int8.bin \
        ./decoder_jit_trace-pnnx.ncnn.param \
        ./decoder_jit_trace-pnnx.ncnn.bin \
        ./joiner_jit_trace-pnnx.ncnn.param \
        ./joiner_jit_trace-pnnx.ncnn.bin \
        ../test_wavs/1089-134686-0001.wav
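 The pairing rule can be captured in a small helper that builds the argument
 list for ``sherpa-ncnn`` in the order used above. This is a sketch only; the
 function name ``sherpa_ncnn_args`` is hypothetical, and it refuses ``int8``
 for the decoder because this guide does not quantize it.

 .. code-block:: python

   def sherpa_ncnn_args(tokens, wav, int8=()):
       """Build the sherpa-ncnn command line, switching param/bin in pairs."""
       if "decoder" in int8:
           raise ValueError("the decoder is not quantized in this guide")
       args = ["sherpa-ncnn", tokens]
       for part in ("encoder", "decoder", "joiner"):
           suffix = "ncnn.int8" if part in int8 else "ncnn"
           stem = f"./{part}_jit_trace-pnnx.{suffix}"
           # param and bin always come from the same variant
           args += [f"{stem}.param", f"{stem}.bin"]
       args.append(wav)
       return args


   print(" ".join(sherpa_ncnn_args(
       "../data/lang_bpe_500/tokens.txt",
       "../test_wavs/1089-134686-0001.wav",
       int8=("encoder",),
   )))

 Passing ``int8=("encoder", "joiner")`` selects both quantized models.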
 The following table compares the file sizes again:
 +----------------------------------------+------------+
 | File name                              | File size  |
 +========================================+============+
 | encoder_jit_trace-pnnx.pt              | 283 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.pt              | 1010 KB    |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.pt               | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB     |
 +----------------------------------------+------------+
 | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp16) | 1.5 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB     |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.bin  (fp32) | 3.0 MB     |
 +----------------------------------------+------------+
 | encoder_jit_trace-pnnx.ncnn.int8.bin   | 99 MB      |
 +----------------------------------------+------------+
 | joiner_jit_trace-pnnx.ncnn.int8.bin    | 774 KB     |
 +----------------------------------------+------------+
 You can see that the file sizes of the model after ``int8`` quantization
 are much smaller.
 .. hint::
    Currently, only linear layers and convolutional layers are quantized
    with ``int8``, so you don't see an exact ``4x`` reduction in file sizes.
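 To see how far each model is from a ``4x`` reduction, we can compute the ratio
 between the ``fp32`` and ``int8`` sizes listed in the table. This is a
 back-of-the-envelope sketch; the variable and function names are hypothetical
 and the sizes are the rounded values reported by ``ls -lh``.

 .. code-block:: python

   # (fp32 size, int8 size) in KB, taken from the table above
   SIZES_KB = {
       "encoder": (283 * 1024, 99 * 1024),
       "joiner": (3.0 * 1024, 774),
   }


   def reduction_ratio(fp32_kb, int8_kb):
       """Return how many times smaller the int8 file is."""
       return fp32_kb / int8_kb


   for name, (fp32_kb, int8_kb) in SIZES_KB.items():
       print(f"{name}: {reduction_ratio(fp32_kb, int8_kb):.2f}x")

 The joiner comes close to ``4x``, while the encoder stays well below it.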
 .. note::
  You need to test the recognition accuracy after ``int8`` quantization.
 You can find the speed comparison at `<https://github.com/k2-fsa/sherpa-ncnn/issues/44>`_.
 That's it! Have fun with `sherpa-ncnn`_!
--- a/docs/source/model-export/export-onnx.rst
+++ b/docs/source/model-export/export-onnx.rst
@ -10,7 +10,7 @@ There is also a file named ``onnx_pretrained.py``, which you can use
 the exported `ONNX`_ model in Python with `onnxruntime`_ to decode sound files.
 Example
-=======
+-------
 In the following, we demonstrate how to export a streaming Zipformer pre-trained
 model from
--- a/docs/source/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.rst
+++ b/docs/source/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.rst
@ -515,132 +515,6 @@ To use the generated files with ``./lstm_transducer_stateless2/jit_pretrained``:
   Please see `<https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/english/server.html>`_
   for how to use the exported models in ``sherpa``.
 .. _export-lstm-transducer-model-for-ncnn:
 Export LSTM transducer models for ncnn
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We support exporting pretrained LSTM transducer models to
 `ncnn <https://github.com/tencent/ncnn>`_ using
 `pnnx <https://github.com/Tencent/ncnn/tree/master/tools/pnnx>`_.
 First, let us install a modified version of ``ncnn``:
 .. code-block:: bash
  git clone https://github.com/csukuangfj/ncnn
  cd ncnn
  git submodule update --recursive --init
  # Note: We don't use "python setup.py install" or "pip install ." here
  mkdir -p build-wheel
  cd build-wheel
  cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DNCNN_PYTHON=ON \
    -DNCNN_BUILD_BENCHMARK=OFF \
    -DNCNN_BUILD_EXAMPLES=OFF \
    -DNCNN_BUILD_TOOLS=ON \
    ..
  make -j4
  cd ..
  # Note: $PWD here is /path/to/ncnn
  export PYTHONPATH=$PWD/python:$PYTHONPATH
  export PATH=$PWD/tools/pnnx/build/src:$PATH
  export PATH=$PWD/build-wheel/tools/quantize:$PATH
  # now build pnnx
  cd tools/pnnx
  mkdir build
  cd build
  cmake ..
  make -j4
  ./src/pnnx
 .. note::
   We assume that you have added the path to the binary ``pnnx`` to the
   environment variable ``PATH``.
   We also assume that you have added ``build-wheel/tools/quantize`` to the environment
   variable ``PATH`` so that you are able to use ``ncnn2int8`` later.
 Second, let us export the model using ``torch.jit.trace()`` in a form that is
 suitable for ``pnnx``:
 .. code-block:: bash
  iter=468000
  avg=16
  ./lstm_transducer_stateless2/export-for-ncnn.py \
    --exp-dir ./lstm_transducer_stateless2/exp \
    --bpe-model data/lang_bpe_500/bpe.model \
    --iter $iter \
    --avg  $avg
 It will generate 3 files:
  - ``./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.pt``
  - ``./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.pt``
  - ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.pt``
 Third, convert the torchscript models to ``ncnn`` format:
 .. code-block::
   pnnx ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.pt
   pnnx ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.pt
   pnnx ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.pt
 It will generate the following files:
  - ``./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param``
  - ``./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin``
  - ``./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param``
  - ``./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin``
  - ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param``
  - ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin``
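 A quick way to confirm that all six files were produced is sketched below. The
 function name ``missing_ncnn_files`` is hypothetical; it only checks that the
 files listed above exist, not that their content is valid.

 .. code-block:: python

   from pathlib import Path


   def missing_ncnn_files(exp_dir):
       """Return the names of expected ncnn files that are absent from exp_dir."""
       expected = [
           f"{part}_jit_trace-pnnx.ncnn.{ext}"
           for part in ("encoder", "decoder", "joiner")
           for ext in ("param", "bin")
       ]
       return [name for name in expected if not (Path(exp_dir) / name).is_file()]


   print(missing_ncnn_files("./lstm_transducer_stateless2/exp"))

 An empty list means that every expected file is present.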
 To use the above generated files, run:
 .. code-block:: bash
  ./lstm_transducer_stateless2/ncnn-decode.py \
   --tokens ./data/lang_bpe_500/tokens.txt \
   --encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
   --encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
   --decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
   --decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
   --joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
   --joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
   /path/to/foo.wav
 .. code-block:: bash
  ./lstm_transducer_stateless2/streaming-ncnn-decode.py \
   --tokens ./data/lang_bpe_500/tokens.txt \
   --encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
   --encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
   --decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
   --decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
   --joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
   --joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
   /path/to/foo.wav
 To use the above generated files in C++, please see
 `<https://github.com/k2-fsa/sherpa-ncnn>`_.
 It can generate a statically linked executable that runs on Linux, Windows,
 macOS, Raspberry Pi, etc., without external dependencies.
 Download pretrained models
 --------------------------