Export to ncnn
We support exporting both LSTM transducer models and ConvEmformer transducer models to ncnn.
We also provide https://github.com/k2-fsa/sherpa-ncnn, which performs speech recognition using ncnn with the exported models. It has been tested on Linux, macOS, Windows, Android, and Raspberry Pi.
sherpa-ncnn is self-contained and can be statically linked to produce a binary containing everything needed. Please refer to its documentation for details:
Export LSTM transducer models
Please refer to Export LSTM transducer models for ncnn for details.
Export ConvEmformer transducer models
We use the pre-trained model from the following repository as an example:
We will show you step by step how to export it to ncnn and run it with sherpa-ncnn.
Hint
We use Ubuntu 18.04, torch 1.10, and Python 3.8 for testing.
Caution
Please use a more recent version of PyTorch. For instance, torch 1.8 may not work.
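A version check like the one below can catch an old PyTorch before the export is attempted. It is only a sketch: the helper names are hypothetical, and treating the tested version (1.10) as a minimum is an assumption rather than a documented requirement. Comparing parsed tuples avoids the pitfall that the string "1.10" sorts before "1.8".

```python
import re

# (1, 10) is the torch version used for testing in this document.
# Using it as a lower bound is an assumption, not a documented requirement.
TESTED_TORCH = (1, 10)


def version_tuple(version: str) -> tuple:
    """Parse (major, minor) from a version string such as '1.10.0+cu102'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_at_least_tested(version: str) -> bool:
    """Return True if the version is at least the one used for testing."""
    return version_tuple(version) >= TESTED_TORCH


print(is_at_least_tested("1.10.0+cu102"))
print(is_at_least_tested("1.8.1"))
```

In practice you would pass `torch.__version__` to `is_at_least_tested()`.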
1. Download the pre-trained model
Hint
You can also refer to https://k2-fsa.github.io/sherpa/cpp/pretrained_models/online_transducer.html#icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05 to download the pre-trained model.
You have to install git-lfs before you continue.
cd egs/librispeech/ASR
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
git lfs pull --include "exp/pretrained-epoch-30-avg-10-averaged.pt"
git lfs pull --include "data/lang_bpe_500/bpe.model"
cd ..
Note
We download exp/pretrained-xxx.pt, not exp/cpu-jit_xxx.pt.
In the above code, we download the pre-trained model into the directory egs/librispeech/ASR/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05.
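Because the clone above uses GIT_LFS_SKIP_SMUDGE=1, any file that was not pulled explicitly remains a small Git LFS pointer file. The following sketch checks whether the two files we need contain real data. The function names are hypothetical; the prefix is the first line of a pointer file as defined by the Git LFS specification.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file, per the Git LFS specification.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file is still an LFS pointer, not the real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unpulled(model_dir, relative_paths) -> list:
    """List the files that are missing or still LFS pointers."""
    unpulled = []
    for rel in relative_paths:
        path = Path(model_dir) / rel
        if not path.is_file() or is_lfs_pointer(path):
            unpulled.append(rel)
    return unpulled


if __name__ == "__main__":
    model_dir = "icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05"
    needed = [
        "exp/pretrained-epoch-30-avg-10-averaged.pt",
        "data/lang_bpe_500/bpe.model",
    ]
    # An empty list means both files were pulled successfully.
    print(find_unpulled(model_dir, needed))
```

Run it from egs/librispeech/ASR so that the relative directory name resolves.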
2. Install ncnn and pnnx
# We put ncnn into $HOME/open-source/ncnn
# You can change it to anywhere you like
cd $HOME
mkdir -p open-source
cd open-source
git clone https://github.com/csukuangfj/ncnn
cd ncnn
git submodule update --recursive --init
# Note: We don't use "python setup.py install" or "pip install ." here
mkdir -p build-wheel
cd build-wheel
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DNCNN_PYTHON=ON \
-DNCNN_BUILD_BENCHMARK=OFF \
-DNCNN_BUILD_EXAMPLES=OFF \
-DNCNN_BUILD_TOOLS=ON \
..
make -j4
cd ..
# Note: $PWD here is $HOME/open-source/ncnn
export PYTHONPATH=$PWD/python:$PYTHONPATH
export PATH=$PWD/tools/pnnx/build/src:$PATH
export PATH=$PWD/build-wheel/tools/quantize:$PATH
# Now build pnnx
cd tools/pnnx
mkdir build
cd build
cmake ..
make -j4
./src/pnnx
Congratulations! You have successfully installed the following components:
pnnx, which is an executable located in $HOME/open-source/ncnn/tools/pnnx/build/src. We will use it to convert models exported by torch.jit.trace().
ncnn2int8, which is an executable located in $HOME/open-source/ncnn/build-wheel/tools/quantize. We will use it to quantize our models to int8.
ncnn.cpython-38-x86_64-linux-gnu.so, which is a Python module located in $HOME/open-source/ncnn/python/ncnn.
Note
I am using Python 3.8, so it is ncnn.cpython-38-x86_64-linux-gnu.so. If you use a different version, say, Python 3.9, the name would be ncnn.cpython-39-x86_64-linux-gnu.so.
Also, if you are not using Linux, the file name would also be different. But that does not matter. As long as you can compile it, it should work.
We have set up PYTHONPATH so that you can use import ncnn in your Python code. We have also set up PATH so that you can use pnnx and ncnn2int8 later in your terminal.
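To confirm that PATH and PYTHONPATH took effect in the current shell, a small check such as the following may help. It is a sketch using only the Python standard library; the function name is hypothetical, and it only tests whether the names can be found, not whether the build itself is correct.

```python
import importlib.util
import shutil


def check_ncnn_setup() -> dict:
    """Report whether the tools from this step are reachable."""
    return {
        # Executables are looked up on PATH.
        "pnnx": shutil.which("pnnx") is not None,
        "ncnn2int8": shutil.which("ncnn2int8") is not None,
        # The Python module is looked up on sys.path (which includes PYTHONPATH).
        "import ncnn": importlib.util.find_spec("ncnn") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ncnn_setup().items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

If any entry is reported as not found, re-run the export commands for PYTHONPATH and PATH in the same terminal session.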
Caution
Please don’t use https://github.com/tencent/ncnn. We have made some modifications to the official ncnn.
We will synchronize https://github.com/csukuangfj/ncnn periodically with the official one.
3. Export the model via torch.jit.trace()
First, let us rename our pre-trained model:
cd egs/librispeech/ASR
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp
ln -s pretrained-epoch-30-avg-10-averaged.pt epoch-30.pt
cd ../..
Next, we use the following code to export our model:
dir=./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/
./conv_emformer_transducer_stateless2/export-for-ncnn.py \
--exp-dir $dir/exp \
--bpe-model $dir/data/lang_bpe_500/bpe.model \
--epoch 30 \
--avg 1 \
--use-averaged-model 0 \
\
--num-encoder-layers 12 \
--chunk-length 32 \
--cnn-module-kernel 31 \
--left-context-length 32 \
--right-context-length 8 \
--memory-size 32 \
--encoder-dim 512
Hint
We have renamed our model to epoch-30.pt so that we can use --epoch 30.
There is only one pre-trained model, so we use --avg 1 --use-averaged-model 0.
If you have trained a model by yourself and if you have all checkpoints available, please first use decode.py to tune --epoch --avg and select the best combination with --use-averaged-model 1.
Note
You will see the following log output:
2023-01-11 12:15:38,677 INFO [export-for-ncnn.py:220] device: cpu
2023-01-11 12:15:38,681 INFO [export-for-ncnn.py:229] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_v
alid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampl
ing_factor': 4, 'decoder_dim': 512, 'joiner_dim': 512, 'model_warm_step': 3000, 'env_info': {'k2-version': '1.23.2', 'k2-build-type':
'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'a34171ed85605b0926eebbd0463d059431f4f74a', 'k2-git-date': 'Wed Dec 14 00:06:38 2022',
'lhotse-version': '1.12.0.dev+missing.version.file', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': False, 'torch-cuda-vers
ion': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'fix-stateless3-train-2022-12-27', 'icefall-git-sha1': '530e8a1-dirty', '
icefall-git-date': 'Tue Dec 27 13:59:18 2022', 'icefall-path': '/star-fj/fangjun/open-source/icefall', 'k2-path': '/star-fj/fangjun/op
en-source/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-fj/fangjun/open-source/lhotse/lhotse/__init__.py', 'hostname': 'de-74279
-k2-train-3-1220120619-7695ff496b-s9n4w', 'IP address': '127.0.0.1'}, 'epoch': 30, 'iter': 0, 'avg': 1, 'exp_dir': PosixPath('icefa
ll-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp'), 'bpe_model': './icefall-asr-librispeech-conv-emformer-transdu
cer-stateless2-2022-07-05//data/lang_bpe_500/bpe.model', 'jit': False, 'context_size': 2, 'use_averaged_model': False, 'encoder_dim':
512, 'nhead': 8, 'dim_feedforward': 2048, 'num_encoder_layers': 12, 'cnn_module_kernel': 31, 'left_context_length': 32, 'chunk_length'
: 32, 'right_context_length': 8, 'memory_size': 32, 'blank_id': 0, 'vocab_size': 500}
2023-01-11 12:15:38,681 INFO [export-for-ncnn.py:231] About to create model
2023-01-11 12:15:40,053 INFO [checkpoint.py:112] Loading checkpoint from icefall-asr-librispeech-conv-emformer-transducer-stateless2-2
022-07-05/exp/epoch-30.pt
2023-01-11 12:15:40,708 INFO [export-for-ncnn.py:315] Number of model parameters: 75490012
2023-01-11 12:15:41,681 INFO [export-for-ncnn.py:318] Using torch.jit.trace()
2023-01-11 12:15:41,681 INFO [export-for-ncnn.py:320] Exporting encoder
2023-01-11 12:15:41,682 INFO [export-for-ncnn.py:149] chunk_length: 32, right_context_length: 8
The log shows that the model has 75490012 parameters, i.e., ~75 M.
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt
-rw-r--r-- 1 kuangfangjun root 289M Jan 11 12:05 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt
You can see that the file size of the pre-trained model is 289 MB, which is roughly 4 x 75 M bytes, since each float32 parameter occupies 4 bytes.
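The estimate can be reproduced with a few lines of arithmetic. This sketch takes the parameter count from the log above and assumes every parameter is stored as a 4-byte float32; ls -lh reports sizes in units of 1024 x 1024 bytes, so the result is expressed the same way. The checkpoint presumably also holds some non-parameter data, which would explain the small remaining difference.

```python
NUM_PARAMS = 75490012  # "Number of model parameters" from the export log
BYTES_PER_FLOAT32 = 4


def size_in_mib(num_params: int, bytes_per_param: int) -> float:
    """Size of the raw parameters in units of 1024 * 1024 bytes."""
    return num_params * bytes_per_param / 2**20


# About 288, close to the 289M shown by ls -lh.
print(f"{size_in_mib(NUM_PARAMS, BYTES_PER_FLOAT32):.0f}")
```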
After running conv_emformer_transducer_stateless2/export-for-ncnn.py, we will get the following files:
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*pnnx*
-rw-r--r-- 1 kuangfangjun root 1010K Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.pt
-rw-r--r-- 1 kuangfangjun root 283M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.pt
-rw-r--r-- 1 kuangfangjun root 3.0M Jan 11 12:15 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.pt
3. Export torchscript model via pnnx
Hint
Make sure you have set up the PATH environment variable. Otherwise, it will throw an error saying that pnnx could not be found.
Now, it’s time to export our models to ncnn via pnnx.
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
pnnx ./encoder_jit_trace-pnnx.pt
pnnx ./decoder_jit_trace-pnnx.pt
pnnx ./joiner_jit_trace-pnnx.pt
It will generate the following files:
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*ncnn*{bin,param}
-rw-r--r-- 1 kuangfangjun root 503K Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 437 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param
-rw-r--r-- 1 kuangfangjun root 142M Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 79K Jan 11 12:36 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param
-rw-r--r-- 1 kuangfangjun root 1.5M Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin
-rw-r--r-- 1 kuangfangjun root 488 Jan 11 12:38 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param
There are two types of files:
param: It is a text file containing the model architectures. You can use a text editor to view its content.
bin: It is a binary file containing the model parameters.
We compare the file sizes of the models below before and after converting via pnnx:
File name | File size |
---|---|
encoder_jit_trace-pnnx.pt | 283 MB |
decoder_jit_trace-pnnx.pt | 1010 KB |
joiner_jit_trace-pnnx.pt | 3.0 MB |
encoder_jit_trace-pnnx.ncnn.bin | 142 MB |
decoder_jit_trace-pnnx.ncnn.bin | 503 KB |
joiner_jit_trace-pnnx.ncnn.bin | 1.5 MB |
You can see that the file size of the models after converting is about one half of the models before converting:
encoder: 283 MB vs 142 MB
decoder: 1010 KB vs 503 KB
joiner: 3.0 MB vs 1.5 MB
The reason is that by default pnnx converts float32 parameters to float16. A float32 parameter occupies 4 bytes, while a float16 parameter occupies 2 bytes. Thus, the files are half the size after conversion.
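The halving can be checked against the numbers in the table. The sketch below uses the standard struct module to obtain the byte widths of single- and half-precision floats, then computes the before/after ratio for each model using the sizes listed above. The variable and function names are illustrative only.

```python
import struct

# Standard sizes of IEEE 754 single- and half-precision floats, in bytes.
FLOAT32_BYTES = struct.calcsize("<f")
FLOAT16_BYTES = struct.calcsize("<e")

# (before, after) file sizes copied from the table above.
# Each pair uses the same unit, so the ratio is unit-free.
SIZES = {
    "encoder": (283, 142),   # MB
    "decoder": (1010, 503),  # KB
    "joiner": (3.0, 1.5),    # MB
}


def shrink_ratios(sizes: dict) -> dict:
    """Return the before/after size ratio for each model."""
    return {name: before / after for name, (before, after) in sizes.items()}


print(f"expected ratio: {FLOAT32_BYTES / FLOAT16_BYTES:.2f}")
for name, ratio in shrink_ratios(SIZES).items():
    print(f"{name}: {ratio:.2f}")
```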
Hint
If you use pnnx ./encoder_jit_trace-pnnx.pt fp16=0, then pnnx won’t convert float32 to float16.
4. Test the exported models in icefall
Note
We assume you have set up the environment variable PYTHONPATH when building ncnn.
Now we have successfully converted our pre-trained model to ncnn format. The six generated files are what we need. You can use the following code to test the converted models:
./conv_emformer_transducer_stateless2/streaming-ncnn-decode.py \
--tokens ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/data/lang_bpe_500/tokens.txt \
--encoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin \
./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/test_wavs/1089-134686-0001.wav
Hint
ncnn supports only batch size == 1, so streaming-ncnn-decode.py accepts only 1 wave file as input.
The output is given below:
2023-01-11 14:02:12,216 INFO [streaming-ncnn-decode.py:320] {'tokens': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/data/lang_bpe_500/tokens.txt', 'encoder_param_filename': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param', 'encoder_bin_filename': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin', 'decoder_param_filename': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param', 'decoder_bin_filename': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin', 'joiner_param_filename': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param', 'joiner_bin_filename': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin', 'sound_filename': './icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/test_wavs/1089-134686-0001.wav'}
T 51 32
2023-01-11 14:02:13,141 INFO [streaming-ncnn-decode.py:328] Constructing Fbank computer
2023-01-11 14:02:13,151 INFO [streaming-ncnn-decode.py:331] Reading sound files: ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/test_wavs/1089-134686-0001.wav
2023-01-11 14:02:13,176 INFO [streaming-ncnn-decode.py:336] torch.Size([106000])
2023-01-11 14:02:17,581 INFO [streaming-ncnn-decode.py:380] ./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/test_wavs/1089-134686-0001.wav
2023-01-11 14:02:17,581 INFO [streaming-ncnn-decode.py:381] AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
Congratulations! You have successfully exported a model from PyTorch to ncnn!
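Since streaming-ncnn-decode.py accepts one wave file per run, decoding several files means repeating the long command from this step. The sketch below assembles the same argument list from a model directory and a wave file, using only the flags and file names shown in that command. The helper name build_decode_command is hypothetical.

```python
from pathlib import Path


def build_decode_command(model_dir: str, wav: str) -> list:
    """Build the argument list for streaming-ncnn-decode.py for one wave file."""
    d = Path(model_dir)
    cmd = [
        "./conv_emformer_transducer_stateless2/streaming-ncnn-decode.py",
        "--tokens",
        str(d / "data/lang_bpe_500/tokens.txt"),
    ]
    # Each of the three models has one .param file and one .bin file.
    for part in ("encoder", "decoder", "joiner"):
        stem = d / "exp" / f"{part}_jit_trace-pnnx.ncnn"
        cmd += [f"--{part}-param-filename", f"{stem}.param"]
        cmd += [f"--{part}-bin-filename", f"{stem}.bin"]
    cmd.append(wav)
    return cmd


if __name__ == "__main__":
    model_dir = "./icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05"
    wav = f"{model_dir}/test_wavs/1089-134686-0001.wav"
    print(" ".join(build_decode_command(model_dir, wav)))
```

To decode several files, call `subprocess.run(build_decode_command(model_dir, wav), check=True)` once per wave file from egs/librispeech/ASR.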
5. Modify the exported encoder for sherpa-ncnn
In order to use the exported models in sherpa-ncnn, we have to modify encoder_jit_trace-pnnx.ncnn.param.
Let us have a look at the first few lines of encoder_jit_trace-pnnx.ncnn.param:
7767517
1060 1342
Input in0 0 1 in0
Explanation of the above three lines:
7767517, it is a magic number and should not be changed.
1060 1342, the first number 1060 specifies the number of layers in this file, while 1342 specifies the number of intermediate outputs of this file.
Input in0 0 1 in0, Input is the layer type of this layer; in0 is the layer name of this layer; 0 means this layer has no input; 1 means this layer has one output; in0 is the output name of this layer.
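The three lines explained above can also be read programmatically. The following is a minimal sketch that parses only what this section describes (the magic number, the two counts, and the first layer line); it is not a full parser for the param format, and the function and key names are illustrative.

```python
def parse_param_header(text: str) -> dict:
    """Parse the first three lines of an ncnn param file."""
    lines = text.strip().splitlines()
    magic = int(lines[0])
    layer_count, output_count = (int(x) for x in lines[1].split())

    # Layer line: type, name, number of inputs, number of outputs,
    # followed by the input names and then the output names.
    fields = lines[2].split()
    num_inputs, num_outputs = int(fields[2]), int(fields[3])
    return {
        "magic": magic,
        "layer_count": layer_count,
        "intermediate_output_count": output_count,
        "layer_type": fields[0],
        "layer_name": fields[1],
        "input_names": fields[4:4 + num_inputs],
        "output_names": fields[4 + num_inputs:4 + num_inputs + num_outputs],
    }


header = parse_param_header("7767517\n1060 1342\nInput in0 0 1 in0\n")
print(header)
```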
We need to add 1 extra line and the result looks like below:
7767517
1061 1342
SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512
Input in0 0 1 in0
Explanation
7767517, it is still the same.
1061 1342, we have added an extra layer, so we need to update 1060 to 1061. We don’t need to change 1342 since the newly added layer has no inputs and outputs.
SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512
This line is newly added. Its explanation is given below:
SherpaMetaData is the type of this layer. Must be SherpaMetaData.
sherpa_meta_data1 is the name of this layer. Must be sherpa_meta_data1.
0 0 means this layer has no inputs and no outputs. Must be 0 0.
0=1, 0 is the key and 1 is the value. MUST be 0=1.
1=12, 1 is the key and 12 is the value of the parameter --num-encoder-layers that you provided when running conv_emformer_transducer_stateless2/export-for-ncnn.py.
2=32, 2 is the key and 32 is the value of the parameter --memory-size that you provided when running conv_emformer_transducer_stateless2/export-for-ncnn.py.
3=31, 3 is the key and 31 is the value of the parameter --cnn-module-kernel that you provided when running conv_emformer_transducer_stateless2/export-for-ncnn.py.
4=8, 4 is the key and 8 is the value of the parameter --left-context-length that you provided when running conv_emformer_transducer_stateless2/export-for-ncnn.py.
5=32, 5 is the key and 32 is the value of the parameter --chunk-length that you provided when running conv_emformer_transducer_stateless2/export-for-ncnn.py.
6=8, 6 is the key and 8 is the value of the parameter --right-context-length that you provided when running conv_emformer_transducer_stateless2/export-for-ncnn.py.
7=512, 7 is the key and 512 is the value of the parameter --encoder-dim that you provided when running conv_emformer_transducer_stateless2/export-for-ncnn.py.
For ease of reference, we list the key-value pairs that you need to add in the following table. If your model has a different setting, please change the values for SherpaMetaData accordingly. Otherwise, you will be SAD.
key | value |
---|---|
0 | 1 (fixed) |
1 | --num-encoder-layers |
2 | --memory-size |
3 | --cnn-module-kernel |
4 | --left-context-length |
5 | --chunk-length |
6 | --right-context-length |
7 | --encoder-dim |
Input in0 0 1 in0. No need to change it.
Caution
When you add a new layer SherpaMetaData, please remember to update the number of layers. In our case, update 1060 to 1061. Otherwise, you will be SAD later.
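The manual edit described in this step (insert one line, increase the layer count by one) can be scripted so that the count is never forgotten. The sketch below is one possible way to do it; the function name is hypothetical, and the values in META are copied from the example line in this section, so replace them with the settings of your own model.

```python
def add_sherpa_meta_data(param_text: str, meta: dict) -> str:
    """Insert a SherpaMetaData layer and increase the layer count by one."""
    lines = param_text.splitlines()
    if any(line.startswith("SherpaMetaData") for line in lines):
        raise ValueError("the param file already contains a SherpaMetaData layer")

    layer_count, output_count = (int(x) for x in lines[1].split())
    pairs = " ".join(f"{key}={meta[key]}" for key in sorted(meta))

    # The new layer has no inputs and no outputs, so only the first count changes.
    lines[1] = f"{layer_count + 1} {output_count}"
    lines.insert(2, f"SherpaMetaData sherpa_meta_data1 0 0 {pairs}")
    return "\n".join(lines) + "\n"


# Values copied from the example line in this section; key 0 must stay 1.
META = {0: 1, 1: 12, 2: 32, 3: 31, 4: 8, 5: 32, 6: 8, 7: 512}

original = "7767517\n1060 1342\nInput in0 0 1 in0\n"
print(add_sherpa_meta_data(original, META), end="")
```

To apply it to the real file, read encoder_jit_trace-pnnx.ncnn.param as text, pass its content to the function, and write the result back, preferably after keeping a copy of the original.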
Hint
After adding the new layer SherpaMetaData, you cannot use this model with streaming-ncnn-decode.py anymore since SherpaMetaData is supported only in sherpa-ncnn.
Hint
ncnn is very flexible. You can add new layers to it just by text-editing the param file! You don’t need to change the bin file.
Now you can use this model in sherpa-ncnn. Please refer to the following documentation:
Linux/macOS/Windows/arm/aarch64: https://k2-fsa.github.io/sherpa/ncnn/install/index.html
Android: https://k2-fsa.github.io/sherpa/ncnn/android/index.html
Python: https://k2-fsa.github.io/sherpa/ncnn/python/index.html
We have a list of pre-trained models that have been exported for sherpa-ncnn:
https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html
You can find more usages there.
6. (Optional) int8 quantization with sherpa-ncnn
This step is optional.
In this step, we describe how to quantize our model with int8.
Change 3. Export torchscript model via pnnx to disable fp16 when using pnnx:
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
pnnx ./encoder_jit_trace-pnnx.pt fp16=0
pnnx ./decoder_jit_trace-pnnx.pt
pnnx ./joiner_jit_trace-pnnx.pt fp16=0
Note
We add fp16=0 when exporting the encoder and joiner. ncnn does not support quantizing the decoder model yet. We will update this documentation once ncnn supports it (possibly later in 2023).
TODO(fangjun): Finish it.
Have fun with sherpa-ncnn!