mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-08 09:32:20 +00:00
add doc for int8 quantization with sherpa-ncnn (#832)
* add doc for int8 quantization with sherpa-ncnn * typo fixes
This commit is contained in:
parent
142420b3af
commit
958dbb3a1d
@ -0,0 +1,104 @@
|
||||
Don't Use GPU. has_gpu: 0, config.use_vulkan_compute: 1
|
||||
num encoder conv layers: 88
|
||||
num joiner conv layers: 3
|
||||
num files: 3
|
||||
Processing ../test_wavs/1089-134686-0001.wav
|
||||
Processing ../test_wavs/1221-135766-0001.wav
|
||||
Processing ../test_wavs/1221-135766-0002.wav
|
||||
Processing ../test_wavs/1089-134686-0001.wav
|
||||
Processing ../test_wavs/1221-135766-0001.wav
|
||||
Processing ../test_wavs/1221-135766-0002.wav
|
||||
----------encoder----------
|
||||
conv_87 : max = 15.942385 threshold = 15.938493 scale = 7.968131
|
||||
conv_88 : max = 35.442448 threshold = 15.549335 scale = 8.167552
|
||||
conv_89 : max = 23.228289 threshold = 8.001738 scale = 15.871552
|
||||
linear_90 : max = 3.976146 threshold = 1.101789 scale = 115.267128
|
||||
linear_91 : max = 6.962030 threshold = 5.162033 scale = 24.602713
|
||||
linear_92 : max = 12.323041 threshold = 3.853959 scale = 32.953129
|
||||
linear_94 : max = 6.905416 threshold = 4.648006 scale = 27.323545
|
||||
linear_93 : max = 6.905416 threshold = 5.474093 scale = 23.200188
|
||||
linear_95 : max = 1.888012 threshold = 1.403563 scale = 90.483986
|
||||
linear_96 : max = 6.856741 threshold = 5.398679 scale = 23.524273
|
||||
linear_97 : max = 9.635942 threshold = 2.613655 scale = 48.590950
|
||||
linear_98 : max = 6.460340 threshold = 5.670146 scale = 22.398010
|
||||
linear_99 : max = 9.532276 threshold = 2.585537 scale = 49.119396
|
||||
linear_101 : max = 6.585871 threshold = 5.719224 scale = 22.205809
|
||||
linear_100 : max = 6.585871 threshold = 5.751382 scale = 22.081648
|
||||
linear_102 : max = 1.593344 threshold = 1.450581 scale = 87.551147
|
||||
linear_103 : max = 6.592681 threshold = 5.705824 scale = 22.257959
|
||||
linear_104 : max = 8.752957 threshold = 1.980955 scale = 64.110489
|
||||
linear_105 : max = 6.696240 threshold = 5.877193 scale = 21.608953
|
||||
linear_106 : max = 9.059659 threshold = 2.643138 scale = 48.048950
|
||||
linear_108 : max = 6.975461 threshold = 4.589567 scale = 27.671457
|
||||
linear_107 : max = 6.975461 threshold = 6.190381 scale = 20.515701
|
||||
linear_109 : max = 3.710759 threshold = 2.305635 scale = 55.082436
|
||||
linear_110 : max = 7.531228 threshold = 5.731162 scale = 22.159557
|
||||
linear_111 : max = 10.528083 threshold = 2.259322 scale = 56.211544
|
||||
linear_112 : max = 8.148807 threshold = 5.500842 scale = 23.087374
|
||||
linear_113 : max = 8.592566 threshold = 1.948851 scale = 65.166611
|
||||
linear_115 : max = 8.437109 threshold = 5.608947 scale = 22.642395
|
||||
linear_114 : max = 8.437109 threshold = 6.193942 scale = 20.503904
|
||||
linear_116 : max = 3.966980 threshold = 3.200896 scale = 39.676392
|
||||
linear_117 : max = 9.451303 threshold = 6.061664 scale = 20.951344
|
||||
linear_118 : max = 12.077262 threshold = 3.965800 scale = 32.023804
|
||||
linear_119 : max = 9.671615 threshold = 4.847613 scale = 26.198460
|
||||
linear_120 : max = 8.625638 threshold = 3.131427 scale = 40.556595
|
||||
linear_122 : max = 10.274080 threshold = 4.888716 scale = 25.978189
|
||||
linear_121 : max = 10.274080 threshold = 5.420480 scale = 23.429659
|
||||
linear_123 : max = 4.826197 threshold = 3.599617 scale = 35.281532
|
||||
linear_124 : max = 11.396383 threshold = 7.325849 scale = 17.335875
|
||||
linear_125 : max = 9.337198 threshold = 3.941410 scale = 32.221970
|
||||
linear_126 : max = 9.699965 threshold = 4.842878 scale = 26.224073
|
||||
linear_127 : max = 8.775370 threshold = 3.884215 scale = 32.696438
|
||||
linear_129 : max = 9.872276 threshold = 4.837319 scale = 26.254213
|
||||
linear_128 : max = 9.872276 threshold = 7.180057 scale = 17.687883
|
||||
linear_130 : max = 4.150427 threshold = 3.454298 scale = 36.765789
|
||||
linear_131 : max = 11.112692 threshold = 7.924847 scale = 16.025545
|
||||
linear_132 : max = 11.852893 threshold = 3.116593 scale = 40.749626
|
||||
linear_133 : max = 11.517084 threshold = 5.024665 scale = 25.275314
|
||||
linear_134 : max = 10.683807 threshold = 3.878618 scale = 32.743618
|
||||
linear_136 : max = 12.421055 threshold = 6.322729 scale = 20.086264
|
||||
linear_135 : max = 12.421055 threshold = 5.309880 scale = 23.917679
|
||||
linear_137 : max = 4.827781 threshold = 3.744595 scale = 33.915554
|
||||
linear_138 : max = 14.422395 threshold = 7.742882 scale = 16.402161
|
||||
linear_139 : max = 8.527538 threshold = 3.866123 scale = 32.849449
|
||||
linear_140 : max = 12.128619 threshold = 4.657793 scale = 27.266134
|
||||
linear_141 : max = 9.839593 threshold = 3.845993 scale = 33.021378
|
||||
linear_143 : max = 12.442304 threshold = 7.099039 scale = 17.889746
|
||||
linear_142 : max = 12.442304 threshold = 5.325038 scale = 23.849592
|
||||
linear_144 : max = 5.929444 threshold = 5.618206 scale = 22.605080
|
||||
linear_145 : max = 13.382126 threshold = 9.321095 scale = 13.625010
|
||||
linear_146 : max = 9.894987 threshold = 3.867645 scale = 32.836517
|
||||
linear_147 : max = 10.915313 threshold = 4.906028 scale = 25.886522
|
||||
linear_148 : max = 9.614287 threshold = 3.908151 scale = 32.496181
|
||||
linear_150 : max = 11.724932 threshold = 4.485588 scale = 28.312899
|
||||
linear_149 : max = 11.724932 threshold = 5.161146 scale = 24.606939
|
||||
linear_151 : max = 7.164453 threshold = 5.847355 scale = 21.719223
|
||||
linear_152 : max = 13.086471 threshold = 5.984121 scale = 21.222834
|
||||
linear_153 : max = 11.099524 threshold = 3.991601 scale = 31.816805
|
||||
linear_154 : max = 10.054585 threshold = 4.489706 scale = 28.286930
|
||||
linear_155 : max = 12.389185 threshold = 3.100321 scale = 40.963501
|
||||
linear_157 : max = 9.982999 threshold = 5.154796 scale = 24.637253
|
||||
linear_156 : max = 9.982999 threshold = 8.537706 scale = 14.875190
|
||||
linear_158 : max = 8.420287 threshold = 6.502287 scale = 19.531588
|
||||
linear_159 : max = 25.014746 threshold = 9.423280 scale = 13.477261
|
||||
linear_160 : max = 45.633553 threshold = 5.715335 scale = 22.220921
|
||||
linear_161 : max = 20.371849 threshold = 5.117830 scale = 24.815203
|
||||
linear_162 : max = 12.492933 threshold = 3.126283 scale = 40.623318
|
||||
linear_164 : max = 20.697504 threshold = 4.825712 scale = 26.317358
|
||||
linear_163 : max = 20.697504 threshold = 5.078367 scale = 25.008038
|
||||
linear_165 : max = 9.023975 threshold = 6.836278 scale = 18.577358
|
||||
linear_166 : max = 34.860619 threshold = 7.259792 scale = 17.493614
|
||||
linear_167 : max = 30.380934 threshold = 5.496160 scale = 23.107042
|
||||
linear_168 : max = 20.691216 threshold = 4.733317 scale = 26.831076
|
||||
linear_169 : max = 9.723948 threshold = 3.952728 scale = 32.129707
|
||||
linear_171 : max = 21.034811 threshold = 5.366547 scale = 23.665123
|
||||
linear_170 : max = 21.034811 threshold = 5.356277 scale = 23.710501
|
||||
linear_172 : max = 10.556884 threshold = 5.729481 scale = 22.166058
|
||||
linear_173 : max = 20.033039 threshold = 10.207264 scale = 12.442120
|
||||
linear_174 : max = 11.597379 threshold = 2.658676 scale = 47.768131
|
||||
----------joiner----------
|
||||
linear_2 : max = 19.293503 threshold = 14.305265 scale = 8.877850
|
||||
linear_1 : max = 10.812222 threshold = 8.766452 scale = 14.487047
|
||||
linear_3 : max = 0.999999 threshold = 0.999755 scale = 127.031174
|
||||
ncnn int8 calibration table create success, best wish for your int8 inference has a low accuracy loss...\(^0^)/...233...
|
@ -204,7 +204,7 @@ Next, we use the following code to export our model:
|
||||
|
||||
.. literalinclude:: ./code/export-conv-emformer-transducer-for-ncnn-output.txt
|
||||
|
||||
The log shows the model has ``75490012`` number of parameters, i.e., ``~75 M``.
|
||||
The log shows the model has ``75490012`` parameters, i.e., ``~75 M``.
|
||||
|
||||
.. code-block::
|
||||
|
||||
@ -213,7 +213,7 @@ Next, we use the following code to export our model:
|
||||
-rw-r--r-- 1 kuangfangjun root 289M Jan 11 12:05 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/pretrained-epoch-30-avg-10-averaged.pt
|
||||
|
||||
You can see that the file size of the pre-trained model is ``289 MB``, which
|
||||
is roughly ``4 x 75 M``.
|
||||
is roughly ``75490012*4/1024/1024 = 287.97 MB``.
|
||||
|
||||
After running ``conv_emformer_transducer_stateless2/export-for-ncnn.py``,
|
||||
we will get the following files:
|
||||
@ -286,8 +286,8 @@ We compare the file sizes of the models below before and after converting via ``
|
||||
| joiner_jit_trace-pnnx.ncnn.bin | 1.5 MB |
|
||||
+----------------------------------+------------+
|
||||
|
||||
You can see that the file size of the models after converting is about one half
|
||||
of the models before converting:
|
||||
You can see that the file sizes of the models after conversion are about one half
|
||||
of the models before conversion:
|
||||
|
||||
- encoder: 283 MB vs 142 MB
|
||||
- decoder: 1010 KB vs 503 KB
|
||||
@ -338,6 +338,8 @@ The output is given below:
|
||||
Congratulations! You have successfully exported a model from PyTorch to `ncnn`_!
|
||||
|
||||
|
||||
.. _conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn:
|
||||
|
||||
5. Modify the exported encoder for sherpa-ncnn
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
@ -356,14 +358,15 @@ Let us have a look at the first few lines of ``encoder_jit_trace-pnnx.ncnn.param
|
||||
|
||||
1. ``7767517``, it is a magic number and should not be changed.
|
||||
2. ``1060 1342``, the first number ``1060`` specifies the number of layers
|
||||
in this file, while ``1342`` specifies the number intermediate outputs of
|
||||
this file
|
||||
in this file, while ``1342`` specifies the number of intermediate outputs
|
||||
of this file
|
||||
3. ``Input in0 0 1 in0``, ``Input`` is the layer type of this layer; ``in0``
|
||||
is the layer name of this layer; ``0`` means this layer has no input;
|
||||
``1`` means this layer has one output. ``in0`` is the output name of
|
||||
``1`` means this layer has one output; ``in0`` is the output name of
|
||||
this layer.
|
||||
|
||||
We need to add 1 extra line and the result looks like below:
|
||||
We need to add 1 extra line and also increment the number of layers.
|
||||
The result looks like below:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
@ -376,13 +379,13 @@ We need to add 1 extra line and the result looks like below:
|
||||
|
||||
1. ``7767517``, it is still the same
|
||||
2. ``1061 1342``, we have added an extra layer, so we need to update ``1060`` to ``1061``.
|
||||
We don't need to change ``1342`` since the newly added layer has no inputs and outputs.
|
||||
We don't need to change ``1342`` since the newly added layer has no inputs or outputs.
|
||||
3. ``SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512``
|
||||
This line is newly added. Its explanation is given below:
|
||||
|
||||
- ``SherpaMetaData`` is the type of this layer. Must be ``SherpaMetaData``.
|
||||
- ``sherpa_meta_data1`` is the name of this layer. Must be ``sherpa_meta_data1``.
|
||||
- ``0 0`` means this layer has no inputs and output. Must be ``0 0``
|
||||
- ``0 0`` means this layer has no inputs or output. Must be ``0 0``
|
||||
- ``0=1``, 0 is the key and 1 is the value. MUST be ``0=1``
|
||||
- ``1=12``, 1 is the key and 12 is the value of the
|
||||
parameter ``--num-encoder-layers`` that you provided when running
|
||||
@ -483,10 +486,286 @@ disable ``fp16`` when using ``pnnx``:
|
||||
|
||||
.. note::
|
||||
|
||||
We add ``fp16=0`` when exporting the encoder and joiner. ``ncnn`` does not
|
||||
We add ``fp16=0`` when exporting the encoder and joiner. `ncnn`_ does not
|
||||
support quantizing the decoder model yet. We will update this documentation
|
||||
once ``ncnn`` supports it. (Maybe in this year, 2023).
|
||||
once `ncnn`_ supports it. (Maybe in this year, 2023).
|
||||
|
||||
TODO(fangjun): Finish it.
|
||||
It will generate the following files
|
||||
|
||||
Have fun with `sherpa-ncnn`_!
|
||||
.. code-block:: bash
|
||||
|
||||
ls -lh icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/*_jit_trace-pnnx.ncnn.{param,bin}
|
||||
|
||||
-rw-r--r-- 1 kuangfangjun root 503K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.bin
|
||||
-rw-r--r-- 1 kuangfangjun root 437 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/decoder_jit_trace-pnnx.ncnn.param
|
||||
-rw-r--r-- 1 kuangfangjun root 283M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.bin
|
||||
-rw-r--r-- 1 kuangfangjun root 79K Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/encoder_jit_trace-pnnx.ncnn.param
|
||||
-rw-r--r-- 1 kuangfangjun root 3.0M Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.bin
|
||||
-rw-r--r-- 1 kuangfangjun root 488 Jan 11 15:56 icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/joiner_jit_trace-pnnx.ncnn.param
|
||||
|
||||
Let us compare again the file sizes:
|
||||
|
||||
+----------------------------------------+------------+
|
||||
| File name | File size |
|
||||
+----------------------------------------+------------+
|
||||
| encoder_jit_trace-pnnx.pt | 283 MB |
|
||||
+----------------------------------------+------------+
|
||||
| decoder_jit_trace-pnnx.pt | 1010 KB |
|
||||
+----------------------------------------+------------+
|
||||
| joiner_jit_trace-pnnx.pt | 3.0 MB |
|
||||
+----------------------------------------+------------+
|
||||
| encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB |
|
||||
+----------------------------------------+------------+
|
||||
| decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB |
|
||||
+----------------------------------------+------------+
|
||||
| joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB |
|
||||
+----------------------------------------+------------+
|
||||
| encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB |
|
||||
+----------------------------------------+------------+
|
||||
| joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB |
|
||||
+----------------------------------------+------------+
|
||||
|
||||
You can see that the file sizes are doubled when we disable ``fp16``.
|
||||
|
||||
.. note::
|
||||
|
||||
You can again use ``streaming-ncnn-decode.py`` to test the exported models.
|
||||
|
||||
Next, follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn`
|
||||
to modify ``encoder_jit_trace-pnnx.ncnn.param``.
|
||||
|
||||
Change
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
7767517
|
||||
1060 1342
|
||||
Input in0 0 1 in0
|
||||
|
||||
to
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
7767517
|
||||
1061 1342
|
||||
SherpaMetaData sherpa_meta_data1 0 0 0=1 1=12 2=32 3=31 4=8 5=32 6=8 7=512
|
||||
Input in0 0 1 in0
|
||||
|
||||
.. caution::
|
||||
|
||||
Please follow :ref:`conv-emformer-modify-the-exported-encoder-for-sherpa-ncnn`
|
||||
to change the values for ``SherpaMetaData`` if your model uses a different setting.
|
||||
|
||||
|
||||
Next, let us compile `sherpa-ncnn`_ since we will quantize our models within
|
||||
`sherpa-ncnn`_.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# We will download sherpa-ncnn to $HOME/open-source/
|
||||
# You can change it to anywhere you like.
|
||||
cd $HOME
|
||||
mkdir -p open-source
|
||||
|
||||
cd open-source
|
||||
git clone https://github.com/k2-fsa/sherpa-ncnn
|
||||
cd sherpa-ncnn
|
||||
mkdir build
|
||||
cd build
|
||||
cmake ..
|
||||
make -j 4
|
||||
|
||||
./bin/generate-int8-scale-table
|
||||
|
||||
export PATH=$HOME/open-source/sherpa-ncnn/build/bin:$PATH
|
||||
|
||||
The output of the above commands are:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
(py38) kuangfangjun:build$ generate-int8-scale-table
|
||||
Please provide 10 arg. Currently given: 1
|
||||
Usage:
|
||||
generate-int8-scale-table encoder.param encoder.bin decoder.param decoder.bin joiner.param joiner.bin encoder-scale-table.txt joiner-scale-table.txt wave_filenames.txt
|
||||
|
||||
Each line in wave_filenames.txt is a path to some 16k Hz mono wave file.
|
||||
|
||||
We need to create a file ``wave_filenames.txt``, in which we need to put
|
||||
some calibration wave files. For testing purpose, we put the ``test_wavs``
|
||||
from the pre-trained model repository `<https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05>`_
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
cd egs/librispeech/ASR
|
||||
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
|
||||
|
||||
cat <<EOF > wave_filenames.txt
|
||||
../test_wavs/1089-134686-0001.wav
|
||||
../test_wavs/1221-135766-0001.wav
|
||||
../test_wavs/1221-135766-0002.wav
|
||||
EOF
|
||||
|
||||
Now we can calculate the scales needed for quantization with the calibration data:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
cd egs/librispeech/ASR
|
||||
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
|
||||
|
||||
generate-int8-scale-table \
|
||||
./encoder_jit_trace-pnnx.ncnn.param \
|
||||
./encoder_jit_trace-pnnx.ncnn.bin \
|
||||
./decoder_jit_trace-pnnx.ncnn.param \
|
||||
./decoder_jit_trace-pnnx.ncnn.bin \
|
||||
./joiner_jit_trace-pnnx.ncnn.param \
|
||||
./joiner_jit_trace-pnnx.ncnn.bin \
|
||||
./encoder-scale-table.txt \
|
||||
./joiner-scale-table.txt \
|
||||
./wave_filenames.txt
|
||||
|
||||
The output logs are in the following:
|
||||
|
||||
.. literalinclude:: ./code/generate-int-8-scale-table-for-conv-emformer.txt
|
||||
|
||||
It generates the following two files:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ ls -lh encoder-scale-table.txt joiner-scale-table.txt
|
||||
-rw-r--r-- 1 kuangfangjun root 955K Jan 11 17:28 encoder-scale-table.txt
|
||||
-rw-r--r-- 1 kuangfangjun root 18K Jan 11 17:28 joiner-scale-table.txt
|
||||
|
||||
.. caution::
|
||||
|
||||
Definitely, you need more calibration data to compute the scale table.
|
||||
|
||||
Finally, let us use the scale table to quantize our models into ``int8``.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
ncnn2int8
|
||||
|
||||
usage: ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration table]
|
||||
|
||||
First, we quantize the encoder model:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
cd egs/librispeech/ASR
|
||||
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
|
||||
|
||||
ncnn2int8 \
|
||||
./encoder_jit_trace-pnnx.ncnn.param \
|
||||
./encoder_jit_trace-pnnx.ncnn.bin \
|
||||
./encoder_jit_trace-pnnx.ncnn.int8.param \
|
||||
./encoder_jit_trace-pnnx.ncnn.int8.bin \
|
||||
./encoder-scale-table.txt
|
||||
|
||||
Next, we quantize the joiner model:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
ncnn2int8 \
|
||||
./joiner_jit_trace-pnnx.ncnn.param \
|
||||
./joiner_jit_trace-pnnx.ncnn.bin \
|
||||
./joiner_jit_trace-pnnx.ncnn.int8.param \
|
||||
./joiner_jit_trace-pnnx.ncnn.int8.bin \
|
||||
./joiner-scale-table.txt
|
||||
|
||||
The above two commands generate the following 4 files:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
-rw-r--r-- 1 kuangfangjun root 99M Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.bin
|
||||
-rw-r--r-- 1 kuangfangjun root 78K Jan 11 17:34 encoder_jit_trace-pnnx.ncnn.int8.param
|
||||
-rw-r--r-- 1 kuangfangjun root 774K Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.bin
|
||||
-rw-r--r-- 1 kuangfangjun root 496 Jan 11 17:35 joiner_jit_trace-pnnx.ncnn.int8.param
|
||||
|
||||
Congratulations! You have successfully quantized your model from ``float32`` to ``int8``.
|
||||
|
||||
.. caution::
|
||||
|
||||
``ncnn.int8.param`` and ``ncnn.int8.bin`` must be used in pairs.
|
||||
|
||||
You can replace ``ncnn.param`` and ``ncnn.bin`` with ``ncnn.int8.param``
|
||||
and ``ncnn.int8.bin`` in `sherpa-ncnn`_ if you like.
|
||||
|
||||
For instance, to use only the ``int8`` encoder in ``sherpa-ncnn``, you can
|
||||
replace the following invocation:
|
||||
|
||||
.. code-block::
|
||||
|
||||
cd egs/librispeech/ASR
|
||||
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
|
||||
|
||||
sherpa-ncnn \
|
||||
../data/lang_bpe_500/tokens.txt \
|
||||
./encoder_jit_trace-pnnx.ncnn.param \
|
||||
./encoder_jit_trace-pnnx.ncnn.bin \
|
||||
./decoder_jit_trace-pnnx.ncnn.param \
|
||||
./decoder_jit_trace-pnnx.ncnn.bin \
|
||||
./joiner_jit_trace-pnnx.ncnn.param \
|
||||
./joiner_jit_trace-pnnx.ncnn.bin \
|
||||
../test_wavs/1089-134686-0001.wav
|
||||
|
||||
with
|
||||
|
||||
.. code-block::
|
||||
|
||||
cd egs/librispeech/ASR
|
||||
cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/
|
||||
|
||||
sherpa-ncnn \
|
||||
../data/lang_bpe_500/tokens.txt \
|
||||
./encoder_jit_trace-pnnx.ncnn.int8.param \
|
||||
./encoder_jit_trace-pnnx.ncnn.int8.bin \
|
||||
./decoder_jit_trace-pnnx.ncnn.param \
|
||||
./decoder_jit_trace-pnnx.ncnn.bin \
|
||||
./joiner_jit_trace-pnnx.ncnn.param \
|
||||
./joiner_jit_trace-pnnx.ncnn.bin \
|
||||
../test_wavs/1089-134686-0001.wav
|
||||
|
||||
|
||||
The following table compares again the file sizes:
|
||||
|
||||
|
||||
+----------------------------------------+------------+
|
||||
| File name | File size |
|
||||
+----------------------------------------+------------+
|
||||
| encoder_jit_trace-pnnx.pt | 283 MB |
|
||||
+----------------------------------------+------------+
|
||||
| decoder_jit_trace-pnnx.pt | 1010 KB |
|
||||
+----------------------------------------+------------+
|
||||
| joiner_jit_trace-pnnx.pt | 3.0 MB |
|
||||
+----------------------------------------+------------+
|
||||
| encoder_jit_trace-pnnx.ncnn.bin (fp16) | 142 MB |
|
||||
+----------------------------------------+------------+
|
||||
| decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB |
|
||||
+----------------------------------------+------------+
|
||||
| joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB |
|
||||
+----------------------------------------+------------+
|
||||
| encoder_jit_trace-pnnx.ncnn.bin (fp32) | 283 MB |
|
||||
+----------------------------------------+------------+
|
||||
| joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB |
|
||||
+----------------------------------------+------------+
|
||||
| encoder_jit_trace-pnnx.ncnn.int8.bin | 99 MB |
|
||||
+----------------------------------------+------------+
|
||||
| joiner_jit_trace-pnnx.ncnn.int8.bin | 774 KB |
|
||||
+----------------------------------------+------------+
|
||||
|
||||
You can see that the file sizes of the model after ``int8`` quantization
|
||||
are much smaller.
|
||||
|
||||
.. hint::
|
||||
|
||||
Currently, only linear layers and convolutional layers are quantized
|
||||
with ``int8``, so you don't see an exact ``4x`` reduction in file sizes.
|
||||
|
||||
.. note::
|
||||
|
||||
You need to test the recognition accuracy after ``int8`` quantization.
|
||||
|
||||
You can find the speed comparison at `<https://github.com/k2-fsa/sherpa-ncnn/issues/44>`_.
|
||||
|
||||
|
||||
That's it! Have fun with `sherpa-ncnn`_!
|
||||
|
Loading…
x
Reference in New Issue
Block a user