deploy: 4e249da2c402eb83e6206365c161693d2f5db070

This commit is contained in:
yfyeung 2022-12-26 06:30:55 +00:00
parent f4a927e35b
commit df2d5486f4
18 changed files with 1058 additions and 20 deletions

View File

@ -1,7 +1,7 @@
.. _export-model-with-torch-jit-script:
Export model with torch.jit.script()
===================================
====================================
In this section, we describe how to export a model via
``torch.jit.script()``.

View File

@ -703,7 +703,7 @@ It will show you the following message:
HLG decoding
^^^^^^^^^^^^
~~~~~~~~~~~~
.. code-block:: bash

View File

@ -888,7 +888,7 @@ It will show you the following message:
CTC decoding
^^^^^^^^^^^^
~~~~~~~~~~~~
.. code-block:: bash
@ -926,7 +926,7 @@ Its output is:
YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
HLG decoding
^^^^^^^^^^^^
~~~~~~~~~~~~
.. code-block:: bash
@ -966,7 +966,7 @@ The output is:
HLG decoding + n-gram LM rescoring
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
@ -1012,7 +1012,7 @@ The output is:
HLG decoding + n-gram LM rescoring + attention decoder rescoring
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash

View File

@ -7,5 +7,5 @@ LibriSpeech
tdnn_lstm_ctc
conformer_ctc
pruned_transducer_stateless
lstm_pruned_stateless_transducer
zipformer_mmi
zipformer_ctc_blankskip

View File

@ -499,9 +499,10 @@ can run:
Export model using ``torch.jit.script()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
./pruned_transducer_stateless4/export.py \
--exp-dir ./pruned_transducer_stateless4/exp \
--bpe-model data/lang_bpe_500/bpe.model \

View File

@ -0,0 +1,453 @@
Zipformer CTC Blank Skip
========================
.. hint::
Please scroll down to the bottom of this page to find download links
for pretrained models if you don't want to train a model from scratch.
This tutorial shows you how to train a Zipformer model based on the guidance from
a co-trained CTC model using the `blank skip method <https://arxiv.org/pdf/2210.16481.pdf>`_
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
.. note::
We use both the CTC and RNN-T losses to train. During the forward pass, the encoder output
is first used to calculate the CTC posterior probabilities; then, for each output frame,
if its blank posterior is larger than some threshold, the frame is simply discarded
from the encoder output. To prevent information loss, we also put a convolution module
similar to the one used in Conformer (referred to as “LConv”) before the frame reduction.
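If you are curious how this frame reduction works, the following is a minimal sketch of the idea
in PyTorch. It is only an illustration, not the actual icefall implementation: the function name
``skip_blank_frames``, the blank index ``0``, and the threshold ``0.9`` are assumptions, and the
real training code also handles batching, padding, and model warm-up.
.. code-block:: python

   import torch

   def skip_blank_frames(encoder_out, ctc_log_probs, blank_id=0, threshold=0.9):
       """Drop frames whose CTC blank posterior exceeds ``threshold``.

       encoder_out:   (1, T, C) encoder output (after the LConv module).
       ctc_log_probs: (1, T, V) CTC log-posteriors computed from ``encoder_out``.
       """
       blank_prob = ctc_log_probs[..., blank_id].exp()  # (1, T)
       keep = blank_prob[0] < threshold  # boolean mask over the T frames
       # Keep only the frames that are unlikely to be blank.
       return encoder_out[:, keep, :]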
Data preparation
----------------
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh
The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is to run it.
.. note::
We encourage you to read ``./prepare.sh``.
The data preparation contains several stages. You can use the following two
options:
- ``--stage``
- ``--stop-stage``
to control which stage(s) should be run. By default, all stages are executed.
For example,
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh --stage 0 --stop-stage 0
means to run only stage 0.
To run stage 2 to stage 5, use:
.. code-block:: bash
$ ./prepare.sh --stage 2 --stop-stage 5
.. hint::
If you have pre-downloaded the `LibriSpeech <https://www.openslr.org/12>`_
dataset and the `musan <http://www.openslr.org/17/>`_ dataset, say,
they are saved in ``/tmp/LibriSpeech`` and ``/tmp/musan``, you can modify
the ``dl_dir`` variable in ``./prepare.sh`` to point to ``/tmp`` so that
``./prepare.sh`` won't re-download them.
.. note::
All files generated by ``./prepare.sh``, e.g., features, lexicon, etc.,
are saved in the ``./data`` directory.
We provide the following YouTube video showing how to run ``./prepare.sh``.
.. note::
To get the latest news of `next-gen Kaldi <https://github.com/k2-fsa>`_, please subscribe to
the following YouTube channel by `Nadira Povey <https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_:
`<https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_
.. youtube:: ofEIoJL-mGM
Training
--------
For stability, the blank skip method is not applied until the model has warmed up.
Configurable options
~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --help
shows you the training options that can be passed from the commandline.
The following options are used quite often:
- ``--full-libri``
If it's True, the training part uses all the training data, i.e.,
960 hours. Otherwise, the training part uses only the subset
``train-clean-100``, which has 100 hours of training data.
.. CAUTION::
The training set is perturbed by speed with two factors: 0.9 and 1.1.
If ``--full-libri`` is True, each epoch actually processes
``3x960 == 2880`` hours of data.
- ``--num-epochs``
It is the number of epochs to train. For instance,
``./pruned_transducer_stateless7_ctc_bs/train.py --num-epochs 30`` trains for 30 epochs
and generates ``epoch-1.pt``, ``epoch-2.pt``, ..., ``epoch-30.pt``
in the folder ``./pruned_transducer_stateless7_ctc_bs/exp``.
- ``--start-epoch``
It's used to resume training.
``./pruned_transducer_stateless7_ctc_bs/train.py --start-epoch 10`` loads the
checkpoint ``./pruned_transducer_stateless7_ctc_bs/exp/epoch-9.pt`` and starts
training from epoch 10, based on the state from epoch 9.
- ``--world-size``
It is used for multi-GPU single-machine DDP training.
- (a) If it is 1, then no DDP training is used.
- (b) If it is 2, then GPU 0 and GPU 1 are used for DDP training.
The following shows some use cases with it.
**Use case 1**: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ export CUDA_VISIBLE_DEVICES="0,2"
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size 2
**Use case 2**: You have 4 GPUs and you want to use all of them
for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size 4
**Use case 3**: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ export CUDA_VISIBLE_DEVICES="3"
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size 1
.. caution::
Only multi-GPU single-machine DDP training is implemented at present.
Multi-GPU multi-machine DDP training will be added later.
- ``--max-duration``
It specifies the number of seconds over all utterances in a
batch, before **padding**.
If you encounter CUDA OOM, please reduce it.
.. HINT::
Due to padding, the number of seconds of all utterances in a
batch will usually be larger than ``--max-duration``.
A larger value for ``--max-duration`` may cause OOM during training,
while a smaller value may increase the training time. You have to
tune it.
Pre-configured options
~~~~~~~~~~~~~~~~~~~~~~
There are some training options, e.g., weight decay,
number of warmup steps, results dir, etc.,
that are not passed from the commandline.
They are pre-configured by the function ``get_params()`` in
`pruned_transducer_stateless7_ctc_bs/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc_bs/train.py>`_
You don't need to change these pre-configured parameters. If you really need to change
them, please modify ``./pruned_transducer_stateless7_ctc_bs/train.py`` directly.
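As a rough illustration, ``get_params()`` returns a dict-like object whose entries you can edit
directly in ``train.py``. The sketch below only shows the idea; the actual keys and default
values are defined in the recipe and may differ.
.. code-block:: python

   # A sketch only: the real get_params() in train.py defines many more keys,
   # and the values shown here are illustrative, not the recipe defaults.
   from icefall.utils import AttributeDict

   def get_params() -> AttributeDict:
       return AttributeDict(
           {
               "exp_dir": "pruned_transducer_stateless7_ctc_bs/exp",  # results dir
               "warm_step": 2000,   # number of warmup steps (illustrative)
               "log_interval": 50,  # illustrative
           }
       )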
Training logs
~~~~~~~~~~~~~
Training logs and checkpoints are saved in ``pruned_transducer_stateless7_ctc_bs/exp``.
You will find the following files in that directory:
- ``epoch-1.pt``, ``epoch-2.pt``, ...
These are checkpoint files saved at the end of each epoch, containing model
``state_dict`` and optimizer ``state_dict``.
To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
.. code-block:: bash
$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-epoch 11
- ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ...
These are checkpoint files saved every ``--save-every-n`` batches,
containing model ``state_dict`` and optimizer ``state_dict``.
To resume training from some checkpoint, say ``checkpoint-436000``, you can use:
.. code-block:: bash
$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-batch 436000
- ``tensorboard/``
This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by:
.. code-block:: bash
$ cd pruned_transducer_stateless7_ctc_bs/exp/tensorboard
$ tensorboard dev upload --logdir . --description "Zipformer CTC blank skip training for LibriSpeech with icefall"
It will print something like below:
.. code-block::
TensorFlow installation not found - running with reduced feature set.
Upload started and will continue reading any new data as it's added to the logdir.
To stop uploading, press Ctrl-C.
New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/xyOZUKpEQm62HBIlUD4uPA/
Note that there is a URL in the above output. Click it and you will see
the TensorBoard dashboard.
.. hint::
If you don't have access to Google, you can use the following command
to view the TensorBoard log locally:
.. code-block:: bash
cd pruned_transducer_stateless7_ctc_bs/exp/tensorboard
tensorboard --logdir . --port 6008
It will print the following message:
.. code-block::
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.8.0 at http://localhost:6008/ (Press CTRL+C to quit)
Now start your browser and go to `<http://localhost:6008>`_ to view the tensorboard
logs.
- ``log/log-train-xxxx``
It is the detailed training log in text format, same as the one
you saw printed to the console during training.
Usage example
~~~~~~~~~~~~~
You can use the following command to start the training using 4 GPUs:
.. code-block:: bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./pruned_transducer_stateless7_ctc_bs/train.py \
--world-size 4 \
--num-epochs 30 \
--start-epoch 1 \
--full-libri 1 \
--exp-dir pruned_transducer_stateless7_ctc_bs/exp \
--max-duration 600 \
--use-fp16 1
Decoding
--------
The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.
.. hint::
There are two kinds of checkpoints:
- (1) ``epoch-1.pt``, ``epoch-2.pt``, ..., which are saved at the end
of each epoch. You can pass ``--epoch`` to
``pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py`` to use them.
- (2) ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ..., which are saved
every ``--save-every-n`` batches. You can pass ``--iter`` to
``pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py`` to use them.
We suggest that you try both types of checkpoints and choose the one
that produces the lowest WERs.
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py --help
shows the options for decoding.
The following shows the example using ``epoch-*.pt``:
.. code-block:: bash
for m in greedy_search fast_beam_search modified_beam_search; do
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py \
--epoch 30 \
--avg 13 \
--exp-dir pruned_transducer_stateless7_ctc_bs/exp \
--max-duration 600 \
--decoding-method $m
done
To test the CTC branch, you can use the following command:
.. code-block:: bash
for m in ctc-decoding 1best; do
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py \
--epoch 30 \
--avg 13 \
--exp-dir pruned_transducer_stateless7_ctc_bs/exp \
--max-duration 600 \
--decoding-method $m
done
Export models
-------------
`pruned_transducer_stateless7_ctc_bs/export.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc_bs/export.py>`_ supports exporting checkpoints from ``pruned_transducer_stateless7_ctc_bs/exp`` in the following ways.
Export ``model.state_dict()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checkpoints saved by ``pruned_transducer_stateless7_ctc_bs/train.py`` also include
``optimizer.state_dict()``. It is useful for resuming training. But after training,
we are interested only in ``model.state_dict()``. You can use the following
command to extract ``model.state_dict()``.
.. code-block:: bash
./pruned_transducer_stateless7_ctc_bs/export.py \
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 30 \
--avg 13 \
--jit 0
It will generate a file ``./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt``.
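The exported ``pretrained.pt`` contains only the model weights. If you want to inspect it from
Python, a sketch like the following should work, assuming the export script stores the weights
under a ``"model"`` key:
.. code-block:: python

   import torch

   checkpoint = torch.load(
       "pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt",
       map_location="cpu",
   )
   # Assumption: the weights are stored under the "model" key; there is
   # no optimizer state in this file.
   state_dict = checkpoint["model"]
   num_params = sum(p.numel() for p in state_dict.values())
   print(f"Number of model parameters: {num_params}")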
.. hint::
To use the generated ``pretrained.pt`` for ``pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py``,
you can run:
.. code-block:: bash
cd pruned_transducer_stateless7_ctc_bs/exp
ln -s pretrained.pt epoch-9999.pt
And then pass ``--epoch 9999 --avg 1 --use-averaged-model 0`` to
``./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py``.
To use the exported model with ``./pruned_transducer_stateless7_ctc_bs/pretrained.py``, you
can run:
.. code-block:: bash
./pruned_transducer_stateless7_ctc_bs/pretrained.py \
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt \
--bpe-model ./data/lang_bpe_500/bpe.model \
--method greedy_search \
/path/to/foo.wav \
/path/to/bar.wav
To test the CTC branch using the exported model with ``./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py``:
.. code-block:: bash
./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py \
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt \
--bpe-model data/lang_bpe_500/bpe.model \
--method ctc-decoding \
--sample-rate 16000 \
/path/to/foo.wav \
/path/to/bar.wav
Export model using ``torch.jit.script()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
./pruned_transducer_stateless7_ctc_bs/export.py \
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 30 \
--avg 13 \
--jit 1
It will generate a file ``cpu_jit.pt`` in the given ``exp_dir``. You can later
load it by ``torch.jit.load("cpu_jit.pt")``.
Note ``cpu`` in the name ``cpu_jit.pt`` means the parameters when loaded into Python
are on CPU. You can use ``to("cuda")`` to move them to a CUDA device.
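For example, a minimal Python snippet to load the scripted model and move it to a GPU might look
like this (the forward signature of the scripted model is defined by the recipe and is not shown
here):
.. code-block:: python

   import torch

   # Load the scripted model; its parameters are on CPU after loading.
   model = torch.jit.load("pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt")
   model.eval()

   # Move the parameters to a CUDA device if one is available.
   if torch.cuda.is_available():
       model = model.to("cuda")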
To use the generated files with ``./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py``:
.. code-block:: bash
./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py \
--nn-model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt \
/path/to/foo.wav \
/path/to/bar.wav
To test the CTC branch using the generated files with ``./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py``:
.. code-block:: bash
./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py \
--model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt \
--bpe-model data/lang_bpe_500/bpe.model \
--method ctc-decoding \
--sample-rate 16000 \
/path/to/foo.wav \
/path/to/bar.wav
Download pretrained models
--------------------------
If you don't want to train from scratch, you can download the pretrained models
by visiting the following links:
- `<https://huggingface.co/yfyeung/icefall-asr-librispeech-pruned_transducer_stateless7_ctc_bs-2022-12-14>`_
See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
for the details of the above pretrained models.

View File

@ -272,7 +272,7 @@ You will find the following files in that directory:
Usage example
~~~~~~~~~~~~~
You can use the following command to start the training using 8 GPUs:
You can use the following command to start the training using 4 GPUs:
.. code-block:: bash
@ -382,7 +382,7 @@ can run:
/path/to/bar.wav
Export model using ``torch.jit.script()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash

Binary file not shown.

View File

@ -331,10 +331,10 @@ $ tensorboard dev upload --logdir . --name <span class="s2">&quot;Aishell confor
<p>Note there is a URL in the above output, click it and you will see
the following screenshot:</p>
<blockquote>
<div><figure class="align-center" id="id2">
<div><figure class="align-center" id="id3">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/WE1DocDqRRCOSAgmGyClhg/"><img alt="TensorBoard screenshot" src="../../../_images/aishell-conformer-ctc-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 2 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 2 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>
@ -748,6 +748,8 @@ Caution:
related to <span class="sb">`</span>model.forward<span class="sb">`</span> <span class="k">in</span> this file.
</pre></div>
</div>
<section id="id2">
<h3>HLG decoding<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt <span class="se">\</span>
@ -783,6 +785,7 @@ Caution:
<p>There is a Colab notebook showing you how to run a torch scripted model in C++.
Please see <a class="reference external" href="https://colab.research.google.com/drive/1Vh7RER7saTW01DtNbvr7CY7ovNZgmfWz?usp=sharing"><img alt="aishell asr conformer ctc torch script colab notebook" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
</section>
</section>
</section>

View File

@ -102,6 +102,7 @@
<li class="toctree-l2"><a class="reference internal" href="librispeech/conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="timit/index.html">TIMIT</a><ul>

View File

@ -53,6 +53,7 @@
<li class="toctree-l4 current"><a class="current reference internal" href="#">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
@ -337,10 +338,10 @@ $ tensorboard dev upload --logdir . --description <span class="s2">&quot;Conform
<p>Note there is a URL in the above output, click it and you will see
the following screenshot:</p>
<blockquote>
<div><figure class="align-center" id="id2">
<div><figure class="align-center" id="id4">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-conformer-ctc-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 4 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
<p><span class="caption-number">Fig. 4 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id4" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>
@ -931,6 +932,8 @@ Caution:
related to <span class="sb">`</span>model.forward<span class="sb">`</span> <span class="k">in</span> this file.
</pre></div>
</div>
<section id="id2">
<h3>CTC decoding<a class="headerlink" href="#id2" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ctc_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
@ -963,6 +966,9 @@ Caution:
<span class="n">YET</span> <span class="n">THESE</span> <span class="n">THOUGHTS</span> <span class="n">AFFECTED</span> <span class="n">HESTER</span> <span class="n">PRYNNE</span> <span class="n">LESS</span> <span class="n">WITH</span> <span class="n">HOPE</span> <span class="n">THAN</span> <span class="n">APPREHENSION</span>
</pre></div>
</div>
</section>
<section id="id3">
<h3>HLG decoding<a class="headerlink" href="#id3" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/hlg_decode <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
@ -996,6 +1002,9 @@ Caution:
<span class="n">YET</span> <span class="n">THESE</span> <span class="n">THOUGHTS</span> <span class="n">AFFECTED</span> <span class="n">HESTER</span> <span class="n">PRYNNE</span> <span class="n">LESS</span> <span class="n">WITH</span> <span class="n">HOPE</span> <span class="n">THAN</span> <span class="n">APPREHENSION</span>
</pre></div>
</div>
</section>
<section id="hlg-decoding-n-gram-lm-rescoring">
<h3>HLG decoding + n-gram LM rescoring<a class="headerlink" href="#hlg-decoding-n-gram-lm-rescoring" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/ngram_lm_rescore <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
@ -1035,6 +1044,9 @@ Caution:
<span class="n">YET</span> <span class="n">THESE</span> <span class="n">THOUGHTS</span> <span class="n">AFFECTED</span> <span class="n">HESTER</span> <span class="n">PRYNNE</span> <span class="n">LESS</span> <span class="n">WITH</span> <span class="n">HOPE</span> <span class="n">THAN</span> <span class="n">APPREHENSION</span>
</pre></div>
</div>
</section>
<section id="hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring">
<h3>HLG decoding + n-gram LM rescoring + attention decoder rescoring<a class="headerlink" href="#hlg-decoding-n-gram-lm-rescoring-attention-decoder-rescoring" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./bin/attention_rescore <span class="se">\</span>
--use_gpu <span class="nb">true</span> <span class="se">\</span>
--nn_model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt <span class="se">\</span>
@ -1084,6 +1096,7 @@ Caution:
<p>There is a Colab notebook showing you how to run a torch scripted model in C++.
Please see <a class="reference external" href="https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing"><img alt="librispeech asr conformer ctc torch script colab notebook" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
</section>
</section>
</section>

View File

@ -53,6 +53,7 @@
<li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
@ -102,6 +103,7 @@
<li class="toctree-l1"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l1"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l1"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l1"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
</ul>
</div>
</section>

View File

@ -53,6 +53,7 @@
<li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4 current"><a class="current reference internal" href="#">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
@ -578,6 +579,14 @@ can run:</p>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">3</span> <span class="se">\</span>
--jit <span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
load it by <code class="docutils literal notranslate"><span class="pre">torch.jit.load(&quot;cpu_jit.pt&quot;)</span></code>.</p>
<p>Note <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means the parameters when loaded into Python

View File

@ -53,6 +53,7 @@
<li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>

View File

@ -0,0 +1,554 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Zipformer CTC Blank Skip &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../../_static/jquery.js"></script>
<script src="../../../_static/underscore.js"></script>
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../../_static/doctools.js"></script>
<script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TIMIT" href="../timit/index.html" />
<link rel="prev" title="Zipformer MMI" href="zipformer_mmi.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4 current"><a class="current reference internal" href="#">Zipformer CTC Blank Skip</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">Zipformer CTC Blank Skip</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="zipformer-ctc-blank-skip">
<h1>Zipformer CTC Blank Skip<a class="headerlink" href="#zipformer-ctc-blank-skip" title="Permalink to this heading"></a></h1>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Please scroll down to the bottom of this page to find download links
for pretrained models if you don’t want to train a model from scratch.</p>
</div>
<p>This tutorial shows you how to train a Zipformer model based on the guidance from
a co-trained CTC model using <a class="reference external" href="https://arxiv.org/pdf/2210.16481.pdf">blank skip method</a>
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>We use both CTC and RNN-T loss to train. During the forward pass, the encoder output
is first used to calculate the CTC posterior probability; then for each output frame,
if its blank posterior is bigger than some threshold, it will be simply discarded
from the encoder output. To prevent information loss, we also put a convolution module
similar to the one used in conformer (referred to as “LConv”) before the frame reduction.</p>
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
All you need to do is to run it.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>We encourage you to read <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
</div>
<p>The data preparation contains several stages. You can use the following two
options:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--stage</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code></p></li>
</ul>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
</pre></div>
</div>
<p>means to run only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
</pre></div>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you have pre-downloaded the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>
dataset and the <a class="reference external" href="http://www.openslr.org/17/">musan</a> dataset, say,
they are saved in <code class="docutils literal notranslate"><span class="pre">/tmp/LibriSpeech</span></code> and <code class="docutils literal notranslate"><span class="pre">/tmp/musan</span></code>, you can modify
the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></code> variable in <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> to point to <code class="docutils literal notranslate"><span class="pre">/tmp</span></code> so that
<code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> won’t re-download them.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>All generated files by <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>, e.g., features, lexicon, etc,
are saved in <code class="docutils literal notranslate"><span class="pre">./data</span></code> directory.</p>
</div>
<p>We provide the following YouTube video showing how to run <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>To get the latest news of <a class="reference external" href="https://github.com/k2-fsa">next-gen Kaldi</a>, please subscribe
the following YouTube channel by <a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">Nadira Povey</a>:</p>
<blockquote>
<div><p><a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw</a></p>
</div></blockquote>
</div>
<div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://www.youtube.com/embed/ofEIoJL-mGM" style="border: 0; height: 345px; width: 560px">
</iframe></div></section>
<section id="training">
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<p>For stability, the blank skip method is not applied until the model has warmed up.</p>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --help
</pre></div>
</div>
<p>shows you the training options that can be passed from the commandline.
The following options are used quite often:</p>
<blockquote>
<div><ul>
<li><p><code class="docutils literal notranslate"><span class="pre">--full-libri</span></code></p>
<p>If it’s True, the training part uses all the training data, i.e.,
960 hours. Otherwise, the training part uses only the subset
<code class="docutils literal notranslate"><span class="pre">train-clean-100</span></code>, which has 100 hours of training data.</p>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>The training set is perturbed by speed with two factors: 0.9 and 1.1.
If <code class="docutils literal notranslate"><span class="pre">--full-libri</span></code> is True, each epoch actually processes
<code class="docutils literal notranslate"><span class="pre">3x960</span> <span class="pre">==</span> <span class="pre">2880</span></code> hours of data.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--num-epochs</span></code></p>
<p>It is the number of epochs to train. For instance,
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/train.py</span> <span class="pre">--num-epochs</span> <span class="pre">30</span></code> trains for 30 epochs
and generates <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, <code class="docutils literal notranslate"><span class="pre">epoch-30.pt</span></code>
in the folder <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/exp</span></code>.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code></p>
<p>It’s used to resume training.
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/train.py</span> <span class="pre">--start-epoch</span> <span class="pre">10</span></code> loads the
checkpoint <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/exp/epoch-9.pt</span></code> and starts
training from epoch 10, based on the state from epoch 9.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--world-size</span></code></p>
<p>It is used for multi-GPU single-machine DDP training.</p>
<blockquote>
<div><ul class="simple">
<li><ol class="loweralpha simple">
<li><p>If it is 1, then no DDP training is used.</p></li>
</ol>
</li>
<li><ol class="loweralpha simple" start="2">
<li><p>If it is 2, then GPU 0 and GPU 1 are used for DDP training.</p></li>
</ol>
</li>
</ul>
</div></blockquote>
<p>The following shows some use cases with it.</p>
<blockquote>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./pruned_transducer_stateless7_ctc_bs/train.py --world-size <span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
</div></blockquote>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>Only multi-GPU single-machine DDP training is implemented at present.
Multi-GPU multi-machine DDP training will be added later.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--max-duration</span></code></p>
<p>It specifies the number of seconds over all utterances in a
batch, before <strong>padding</strong>.
If you encounter CUDA OOM, please reduce it.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Due to padding, the number of seconds of all utterances in a
batch will usually be larger than <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code>.</p>
<p>A larger value for <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> may cause OOM during training,
while a smaller value may increase the training time. You have to
tune it.</p>
</div>
</li>
</ul>
</div></blockquote>
</section>
<section id="pre-configured-options">
<h3>Pre-configured options<a class="headerlink" href="#pre-configured-options" title="Permalink to this heading"></a></h3>
<p>There are some training options, e.g., weight decay,
number of warmup steps, results dir, etc,
that are not passed from the commandline.
They are pre-configured by the function <code class="docutils literal notranslate"><span class="pre">get_params()</span></code> in
<a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc_bs/train.py">pruned_transducer_stateless7_ctc_bs/train.py</a></p>
<p>You don’t need to change these pre-configured parameters. If you really need to change
them, please modify <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/train.py</span></code> directly.</p>
</section>
<section id="training-logs">
<h3>Training logs<a class="headerlink" href="#training-logs" title="Permalink to this heading"></a></h3>
<p>Training logs and checkpoints are saved in <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/exp</span></code>.
You will find the following files in that directory:</p>
<blockquote>
<div><ul>
<li><p><code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …</p>
<p>These are checkpoint files saved at the end of each epoch, containing model
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-epoch <span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …</p>
<p>These are checkpoint files saved every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches,
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless7_ctc_bs/train.py --start-batch <span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">tensorboard/</span></code></p>
<p>This folder contains tensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;Zipformer CTC blank skip training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
<p>It will print something like below:</p>
<blockquote>
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">TensorFlow</span> <span class="n">installation</span> <span class="ow">not</span> <span class="n">found</span> <span class="o">-</span> <span class="n">running</span> <span class="k">with</span> <span class="n">reduced</span> <span class="n">feature</span> <span class="nb">set</span><span class="o">.</span>
<span class="n">Upload</span> <span class="n">started</span> <span class="ow">and</span> <span class="n">will</span> <span class="k">continue</span> <span class="n">reading</span> <span class="nb">any</span> <span class="n">new</span> <span class="n">data</span> <span class="k">as</span> <span class="n">it</span><span class="s1">&#39;s added to the logdir.</span>
<span class="n">To</span> <span class="n">stop</span> <span class="n">uploading</span><span class="p">,</span> <span class="n">press</span> <span class="n">Ctrl</span><span class="o">-</span><span class="n">C</span><span class="o">.</span>
<span class="n">New</span> <span class="n">experiment</span> <span class="n">created</span><span class="o">.</span> <span class="n">View</span> <span class="n">your</span> <span class="n">TensorBoard</span> <span class="n">at</span><span class="p">:</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">tensorboard</span><span class="o">.</span><span class="n">dev</span><span class="o">/</span><span class="n">experiment</span><span class="o">/</span><span class="n">xyOZUKpEQm62HBIlUD4uPA</span><span class="o">/</span>
</pre></div>
</div>
</div></blockquote>
<p>Note there is a URL in the above output. Click it and you will see
tensorboard.</p>
</li>
</ul>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you don’t have access to Google, you can use the following command
to view the tensorboard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
<p>It will print the following message:</p>
<blockquote>
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Serving</span> <span class="n">TensorBoard</span> <span class="n">on</span> <span class="n">localhost</span><span class="p">;</span> <span class="n">to</span> <span class="n">expose</span> <span class="n">to</span> <span class="n">the</span> <span class="n">network</span><span class="p">,</span> <span class="n">use</span> <span class="n">a</span> <span class="n">proxy</span> <span class="ow">or</span> <span class="k">pass</span> <span class="o">--</span><span class="n">bind_all</span>
<span class="n">TensorBoard</span> <span class="mf">2.8.0</span> <span class="n">at</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">localhost</span><span class="p">:</span><span class="mi">6008</span><span class="o">/</span> <span class="p">(</span><span class="n">Press</span> <span class="n">CTRL</span><span class="o">+</span><span class="n">C</span> <span class="n">to</span> <span class="n">quit</span><span class="p">)</span>
</pre></div>
</div>
</div></blockquote>
<p>Now start your browser and go to <a class="reference external" href="http://localhost:6008">http://localhost:6008</a> to view the tensorboard
logs.</p>
</div>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">log/log-train-xxxx</span></code></p>
<p>It is the detailed training log in text format, same as the one
you saw printed to the console during training.</p>
</li>
</ul>
</div></blockquote>
</section>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless7_ctc_bs/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--use-fp16 <span class="m">1</span>
</pre></div>
</div>
</section>
</section>
<section id="decoding">
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>There are two kinds of checkpoints:</p>
<blockquote>
<div><ul class="simple">
<li><p>(1) <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, which are saved at the end
of each epoch. You can pass <code class="docutils literal notranslate"><span class="pre">--epoch</span></code> to
<code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py</span></code> to use them.</p></li>
<li><p>(2) <code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …, which are saved
every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches. You can pass <code class="docutils literal notranslate"><span class="pre">--iter</span></code> to
<code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py</span></code> to use them.</p></li>
</ul>
<p>We suggest that you try both types of checkpoints and choose the one
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py --help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows the example using <code class="docutils literal notranslate"><span class="pre">epoch-*.pt</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
<p>To test the CTC branch, you can use the following command:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> ctc-decoding 1best<span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
</pre></div>
</div>
</section>
<section id="export-models">
<h2>Export models<a class="headerlink" href="#export-models" title="Permalink to this heading"></a></h2>
<p><a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7_ctc_bs/export.py">pruned_transducer_stateless7_ctc_bs/export.py</a> supports exporting checkpoints from <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/exp</span></code> in the following ways.</p>
<section id="export-model-state-dict">
<h3>Export <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code><a class="headerlink" href="#export-model-state-dict" title="Permalink to this heading"></a></h3>
<p>Checkpoints saved by <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/train.py</span></code> also include
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--jit <span class="m">0</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt</span></code>.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless7_ctc_bs/exp
ln -s pretrained.pt epoch-9999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/ctc_guild_decode_bs.py</span></code>.</p>
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
</pre></div>
</div>
<p>To test the CTC branch using the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/pretrained_ctc.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless7_ctc_bs/exp/pretrained.pt <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--sample-rate <span class="m">16000</span> <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless7_ctc_bs/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">30</span> <span class="se">\</span>
--avg <span class="m">13</span> <span class="se">\</span>
--jit <span class="m">1</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
load it by <code class="docutils literal notranslate"><span class="pre">torch.jit.load(&quot;cpu_jit.pt&quot;)</span></code>.</p>
<p>Note that <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means the parameters, when loaded into Python,
are on CPU. You can use <code class="docutils literal notranslate"><span class="pre">to(&quot;cuda&quot;)</span></code> to move them to a CUDA device.</p>
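<p>If you want to quickly verify that the exported file loads, a minimal check like the
following can be used (a sketch; it assumes PyTorch is installed, and the commented-out
line additionally assumes a CUDA device is available):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python3 -c "
import torch

# load the scripted model; the parameters are on CPU by default
model = torch.jit.load('pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt')
model.eval()

# uncomment to move the parameters to a CUDA device
# model = model.to('cuda')
"
</pre></div>
</div>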
<p>To use the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained.py <span class="se">\</span>
--nn-model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
</pre></div>
</div>
<p>To test the CTC branch using the generated files with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless7_ctc_bs/jit_pretrained_ctc.py <span class="se">\</span>
--model-filename ./pruned_transducer_stateless7_ctc_bs/exp/cpu_jit.pt <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--method ctc-decoding <span class="se">\</span>
--sample-rate <span class="m">16000</span> <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
</pre></div>
</div>
</section>
</section>
<section id="download-pretrained-models">
<h2>Download pretrained models<a class="headerlink" href="#download-pretrained-models" title="Permalink to this heading"></a></h2>
<p>If you don&#8217;t want to train from scratch, you can download the pretrained models
by visiting the following links:</p>
<blockquote>
<div><ul class="simple">
<li><p><a class="reference external" href="https://huggingface.co/yfyeung/icefall-asr-librispeech-pruned_transducer_stateless7_ctc_bs-2022-12-14">https://huggingface.co/yfyeung/icefall-asr-librispeech-pruned_transducer_stateless7_ctc_bs-2022-12-14</a></p></li>
</ul>
<p>See <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md">https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md</a>
for the details of the above pretrained models.</p>
</div></blockquote>
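<p>One possible way to fetch a pretrained model from the link above is with
<code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">lfs</span></code> (a sketch; it assumes <code class="docutils literal notranslate"><span class="pre">git-lfs</span></code> is installed on your system):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># install git-lfs first, e.g. via your package manager, then:
git lfs install
git clone https://huggingface.co/yfyeung/icefall-asr-librispeech-pruned_transducer_stateless7_ctc_bs-2022-12-14
</pre></div>
</div>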
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="zipformer_mmi.html" class="btn btn-neutral float-left" title="Zipformer MMI" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../timit/index.html" class="btn btn-neutral float-right" title="TIMIT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@ -20,7 +20,7 @@
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TIMIT" href="../timit/index.html" />
<link rel="next" title="Zipformer CTC Blank Skip" href="zipformer_ctc_blankskip.html" />
<link rel="prev" title="Pruned transducer statelessX" href="pruned_transducer_stateless.html" />
</head>
@ -53,6 +53,7 @@
<li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4 current"><a class="current reference internal" href="#">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_ctc_blankskip.html">Zipformer CTC Blank Skip</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
@ -357,7 +358,7 @@ you saw printed to the console during training.</p>
</section>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 8 GPUs:</p>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./zipformer_mmi/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
@ -496,7 +497,7 @@ for the details of the above pretrained models</p>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="pruned_transducer_stateless.html" class="btn btn-neutral float-left" title="Pruned transducer statelessX" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../timit/index.html" class="btn btn-neutral float-right" title="TIMIT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
<a href="zipformer_ctc_blankskip.html" class="btn btn-neutral float-right" title="Zipformer CTC Blank Skip" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>

View File

@ -21,7 +21,7 @@
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" />
<link rel="prev" title="Zipformer MMI" href="../librispeech/zipformer_mmi.html" />
<link rel="prev" title="Zipformer CTC Blank Skip" href="../librispeech/zipformer_ctc_blankskip.html" />
</head>
<body class="wy-body-for-nav">
@ -106,7 +106,7 @@
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../librispeech/zipformer_mmi.html" class="btn btn-neutral float-left" title="Zipformer MMI" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../librispeech/zipformer_ctc_blankskip.html" class="btn btn-neutral float-left" title="Zipformer CTC Blank Skip" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="tdnn_ligru_ctc.html" class="btn btn-neutral float-right" title="TDNN-LiGRU-CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>

File diff suppressed because one or more lines are too long