deploy: ad475ec10dec864373099ba541cad5f743a4726b

This commit is contained in:
csukuangfj 2022-12-15 11:16:21 +00:00
parent d801f223b2
commit f4a927e35b
64 changed files with 4109 additions and 546 deletions

View File

@ -22,6 +22,14 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
installation/index installation/index
model-export/index model-export/index
.. toctree::
:maxdepth: 3
recipes/index recipes/index
.. toctree::
:maxdepth: 2
contributing/index contributing/index
huggingface/index huggingface/index

View File

@ -0,0 +1,10 @@
Non Streaming ASR
=================
.. toctree::
:maxdepth: 2
aishell/index
librispeech/index
timit/index
yesno/index

View File

@ -6,5 +6,6 @@ LibriSpeech
tdnn_lstm_ctc tdnn_lstm_ctc
conformer_ctc conformer_ctc
pruned_transducer_stateless
lstm_pruned_stateless_transducer lstm_pruned_stateless_transducer
zipformer_mmi zipformer_mmi

View File

@ -0,0 +1,545 @@
Pruned transducer statelessX
============================
This tutorial shows you how to run a conformer transducer model
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
.. Note::
The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`_,
`pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`_,
`pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_,
and `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`_.
We will take pruned_transducer_stateless4 as an example in this tutorial.
.. HINT::
We assume you have read the page :ref:`install icefall` and have set up
the environment for ``icefall``.
.. HINT::
We recommend using one or more GPUs to run this recipe.
.. hint::
Please scroll down to the bottom of this page to find download links
for pretrained models if you don't want to train a model from scratch.
We use pruned RNN-T to compute the loss.
.. note::
You can find the paper about pruned RNN-T at the following address:
`<https://arxiv.org/abs/2206.13236>`_
The transducer model consists of 3 parts:
- Encoder, a.k.a. the transcription network. We use a Conformer model (the reworked version by Daniel Povey).
- Decoder, a.k.a. the prediction network. We use a stateless model consisting of
``nn.Embedding`` and ``nn.Conv1d``.
- Joiner, a.k.a. the joint network.
.. caution::
Contrary to the conventional RNN-T models, we use a stateless decoder.
That is, it has no recurrent connections.
Data preparation
----------------
.. hint::
The data preparation is the same as for other recipes on the LibriSpeech dataset.
If you have finished this step, you can skip to ``Training`` directly.
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh
The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is to run it.
The data preparation contains several stages. You can use the following two
options:
- ``--stage``
- ``--stop-stage``
to control which stage(s) should be run. By default, all stages are executed.
For example,
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh --stage 0 --stop-stage 0
means to run only stage 0.
To run stage 2 to stage 5, use:
.. code-block:: bash
$ ./prepare.sh --stage 2 --stop-stage 5
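The effect of ``--stage`` and ``--stop-stage`` can be sketched in a few lines of Python. This is only an illustration, assuming that every stage whose index lies between the two values (inclusive) is executed. The function name ``stages_to_run`` and the total of 8 stages are hypothetical and are not taken from ``./prepare.sh``.

```python
from typing import List


def stages_to_run(stage: int, stop_stage: int, num_stages: int) -> List[int]:
    """Return the stage indices selected by --stage and --stop-stage.

    Assumption: a stage runs when stage <= index <= stop_stage.
    """
    return [s for s in range(num_stages) if stage <= s <= stop_stage]


# num_stages=8 is a placeholder, not the real number of stages in prepare.sh.
print(stages_to_run(0, 0, 8))  # [0]
print(stages_to_run(2, 5, 8))  # [2, 3, 4, 5]
```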
.. HINT::
If you have pre-downloaded the `LibriSpeech <https://www.openslr.org/12>`_
dataset and the `musan <http://www.openslr.org/17/>`_ dataset, say,
they are saved in ``/tmp/LibriSpeech`` and ``/tmp/musan``, you can modify
the ``dl_dir`` variable in ``./prepare.sh`` to point to ``/tmp`` so that
``./prepare.sh`` won't re-download them.
.. NOTE::
All files generated by ``./prepare.sh``, e.g., features, lexicon, etc.,
are saved in the ``./data`` directory.
We provide the following YouTube video showing how to run ``./prepare.sh``.
.. note::
To get the latest news of `next-gen Kaldi <https://github.com/k2-fsa>`_, please subscribe to
the following YouTube channel by `Nadira Povey <https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_:
`<https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_
.. youtube:: ofEIoJL-mGM
Training
--------
Configurable options
~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
shows you the training options that can be passed from the command line.
The following options are used quite often:
- ``--exp-dir``
The directory to save checkpoints, training logs, and TensorBoard logs.
- ``--full-libri``
If it's True, the training part uses all the training data, i.e.,
960 hours. Otherwise, the training part uses only the subset
``train-clean-100``, which has 100 hours of training data.
.. CAUTION::
The training set is perturbed by speed with two factors: 0.9 and 1.1.
If ``--full-libri`` is True, each epoch actually processes
``3x960 == 2880`` hours of data.
- ``--num-epochs``
It is the number of epochs to train. For instance,
``./pruned_transducer_stateless4/train.py --num-epochs 30`` trains for 30 epochs
and generates ``epoch-1.pt``, ``epoch-2.pt``, ..., ``epoch-30.pt``
in the folder ``./pruned_transducer_stateless4/exp``.
- ``--start-epoch``
It's used to resume training.
``./pruned_transducer_stateless4/train.py --start-epoch 10`` loads the
checkpoint ``./pruned_transducer_stateless4/exp/epoch-9.pt`` and starts
training from epoch 10, based on the state from epoch 9.
- ``--world-size``
It is used for multi-GPU single-machine DDP training.
- (a) If it is 1, then no DDP training is used.
- (b) If it is 2, then GPU 0 and GPU 1 are used for DDP training.
The following shows some use cases with it.
**Use case 1**: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ export CUDA_VISIBLE_DEVICES="0,2"
$ ./pruned_transducer_stateless4/train.py --world-size 2
**Use case 2**: You have 4 GPUs and you want to use all of them
for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size 4
**Use case 3**: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ export CUDA_VISIBLE_DEVICES="3"
$ ./pruned_transducer_stateless4/train.py --world-size 1
.. caution::
Only multi-GPU single-machine DDP training is implemented at present.
Multi-GPU multi-machine DDP training will be added later.
- ``--max-duration``
It specifies the number of seconds over all utterances in a
batch, before **padding**.
If you encounter CUDA OOM, please reduce it.
.. HINT::
Due to padding, the number of seconds of all utterances in a
batch will usually be larger than ``--max-duration``.
A larger value for ``--max-duration`` may cause OOM during training,
while a smaller value may increase the training time. You have to
tune it.
- ``--use-fp16``
If it is True, the model is trained with half precision. In our experiments,
half precision lets you train with a ``--max-duration`` that is twice as large,
which gives almost a 2X speedup.
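The remark about padding under ``--max-duration`` can be made concrete with a small calculation. The sketch below assumes that every utterance in a batch is padded to the length of the longest one. The helper name ``batch_durations`` and the example durations are made up for illustration.

```python
from typing import List, Tuple


def batch_durations(utt_seconds: List[float]) -> Tuple[float, float]:
    """Return (seconds before padding, seconds after padding) for one batch.

    Assumption: each utterance is padded to the longest utterance in the batch.
    """
    unpadded = sum(utt_seconds)
    padded = max(utt_seconds) * len(utt_seconds)
    return unpadded, padded


# Four utterances whose total duration fits within --max-duration 45.
unpadded, padded = batch_durations([12.0, 9.5, 15.0, 7.5])
print(unpadded, padded)  # 44.0 60.0
```

The batch holds 44 seconds of speech, but after padding the model processes the equivalent of 60 seconds, which is why memory use can exceed what ``--max-duration`` alone suggests.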
Pre-configured options
~~~~~~~~~~~~~~~~~~~~~~
There are some training options, e.g., number of encoder layers,
encoder dimension, decoder dimension, number of warmup steps, etc.,
that are not passed from the command line.
They are pre-configured by the function ``get_params()`` in
`pruned_transducer_stateless4/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/train.py>`_.
You don't need to change these pre-configured parameters. If you really need to change
them, please modify ``./pruned_transducer_stateless4/train.py`` directly.
.. NOTE::
The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`_ are a little different from
other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, and ``--joiner-dim`` from the command line, so that you can train models of different sizes with pruned_transducer_stateless5.
Training logs
~~~~~~~~~~~~~
Training logs and checkpoints are saved in ``--exp-dir`` (e.g. ``pruned_transducer_stateless4/exp``).
You will find the following files in that directory:
- ``epoch-1.pt``, ``epoch-2.pt``, ...
These are checkpoint files saved at the end of each epoch, containing model
``state_dict`` and optimizer ``state_dict``.
To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
.. code-block:: bash
$ ./pruned_transducer_stateless4/train.py --start-epoch 11
- ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ...
These are checkpoint files saved every ``--save-every-n`` batches,
containing model ``state_dict`` and optimizer ``state_dict``.
To resume training from some checkpoint, say ``checkpoint-436000``, you can use:
.. code-block:: bash
$ ./pruned_transducer_stateless4/train.py --start-batch 436000
- ``tensorboard/``
This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:
.. code-block:: bash
$ cd pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description "pruned transducer training for LibriSpeech with icefall"
It will print something like below:
.. code-block::
TensorFlow installation not found - running with reduced feature set.
Upload started and will continue reading any new data as it's added to the logdir.
To stop uploading, press Ctrl-C.
New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/QOGSPBgsR8KzcRMmie9JGw/
[2022-11-20T15:50:50] Started scanning logdir.
Uploading 4468 scalars...
[2022-11-20T15:53:02] Total uploaded: 210171 scalars, 0 tensors, 0 binary objects
Listening for new data in logdir...
Note there is a URL in the above output. Click it and you will see
the following screenshot:
.. figure:: images/librispeech-pruned-transducer-tensorboard-log.jpg
:width: 600
:alt: TensorBoard screenshot
:align: center
:target: https://tensorboard.dev/experiment/QOGSPBgsR8KzcRMmie9JGw/
TensorBoard screenshot.
.. hint::
If you don't have access to Google, you can use the following command
to view the tensorboard log locally:
.. code-block:: bash
cd pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port 6008
It will print the following message:
.. code-block::
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.8.0 at http://localhost:6008/ (Press CTRL+C to quit)
Now start your browser and go to `<http://localhost:6008>`_ to view the tensorboard
logs.
- ``log/log-train-xxxx``
It is the detailed training log in text format, same as the one
you saw printed to the console during training.
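The resuming rule for epoch checkpoints described above (``epoch-10.pt`` corresponds to ``--start-epoch 11``) can be sketched as a small helper. This is not part of ``icefall``; the function name ``next_start_epoch`` is hypothetical, and the sketch only looks at file names of the form ``epoch-N.pt``.

```python
import re
import tempfile
from pathlib import Path


def next_start_epoch(exp_dir: str) -> int:
    """Return the value to pass to --start-epoch, based on epoch-N.pt files."""
    epochs = []
    for path in Path(exp_dir).glob("epoch-*.pt"):
        match = re.fullmatch(r"epoch-(\d+)\.pt", path.name)
        if match:
            epochs.append(int(match.group(1)))
    # With no epoch checkpoint yet, training starts from epoch 1.
    return max(epochs) + 1 if epochs else 1


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as exp_dir:
    for name in ["epoch-9.pt", "epoch-10.pt", "checkpoint-436000.pt"]:
        (Path(exp_dir) / name).touch()
    print(next_start_epoch(exp_dir))  # 11
```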
Usage example
~~~~~~~~~~~~~
You can use the following command to start the training using 6 GPUs:
.. code-block:: bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5"
./pruned_transducer_stateless4/train.py \
--world-size 6 \
--num-epochs 30 \
--start-epoch 1 \
--exp-dir pruned_transducer_stateless4/exp \
--full-libri 1 \
--max-duration 300
Decoding
--------
The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.
.. hint::
There are two kinds of checkpoints:
- (1) ``epoch-1.pt``, ``epoch-2.pt``, ..., which are saved at the end
of each epoch. You can pass ``--epoch`` to
``pruned_transducer_stateless4/decode.py`` to use them.
- (2) ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ..., which are saved
every ``--save-every-n`` batches. You can pass ``--iter`` to
``pruned_transducer_stateless4/decode.py`` to use them.
We suggest that you try both types of checkpoints and choose the one
that produces the lowest WERs.
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
shows the options for decoding.
The following shows two examples (for two types of checkpoints):
.. code-block:: bash
for m in greedy_search fast_beam_search modified_beam_search; do
for epoch in 25 20; do
for avg in 7 5 3 1; do
./pruned_transducer_stateless4/decode.py \
--epoch $epoch \
--avg $avg \
--exp-dir pruned_transducer_stateless4/exp \
--max-duration 600 \
--decoding-method $m
done
done
done
.. code-block:: bash
for m in greedy_search fast_beam_search modified_beam_search; do
for iter in 474000; do
for avg in 8 10 12 14 16 18; do
./pruned_transducer_stateless4/decode.py \
--iter $iter \
--avg $avg \
--exp-dir pruned_transducer_stateless4/exp \
--max-duration 600 \
--decoding-method $m
done
done
done
.. Note::
The supported decoding methods are as follows:
- ``greedy_search`` : It takes the symbol with the largest posterior probability
at each frame as the decoding result.
- ``beam_search`` : It implements Algorithm 1 in https://arxiv.org/pdf/1211.3711.pdf and
`espnet/nets/beam_search_transducer.py <https://github.com/espnet/espnet/blob/master/espnet/nets/beam_search_transducer.py#L247>`_
is used as a reference. Basically, it keeps the top-k states for each frame and expands the kept states with their own contexts to
the next frame.
- ``modified_beam_search`` : It implements the same algorithm as ``beam_search`` above, but it
runs in batch mode with ``--max-sym-per-frame=1`` being hardcoded.
- ``fast_beam_search`` : It implements graph composition between the output ``log_probs`` and
given ``FSAs``. It is hard to describe the details in a few lines of text; you can read
our paper at https://arxiv.org/pdf/2211.00484.pdf or our `rnnt decode code in k2 <https://github.com/k2-fsa/k2/blob/master/k2/csrc/rnnt_decode.h>`_. ``fast_beam_search`` can decode with ``FSAs`` on GPU efficiently.
- ``fast_beam_search_LG`` : The same as ``fast_beam_search`` above, except that ``fast_beam_search`` uses
a trivial graph that has only one state, while ``fast_beam_search_LG`` uses an LG graph
(with N-gram LM).
- ``fast_beam_search_nbest`` : It produces the decoding results as follows:
- (1) Use ``fast_beam_search`` to get a lattice
- (2) Select ``num_paths`` paths from the lattice using ``k2.random_paths()``
- (3) Remove duplicates from the selected paths
- (4) Intersect the selected paths with the lattice and compute the
shortest path from the intersection result
- (5) The path with the largest score is used as the decoding output.
- ``fast_beam_search_nbest_LG`` : It implements the same logic as ``fast_beam_search_nbest``; the
only difference is that it uses ``fast_beam_search_LG`` to generate the lattice.
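To illustrate the idea behind ``greedy_search`` from the list above, here is a deliberately simplified sketch. In a real transducer the joiner output depends on the symbols emitted so far, so the scores cannot be precomputed as a fixed table. This sketch assumes such a table of per-frame log-probabilities anyway, at most one symbol per frame, and a blank symbol with ID 0. The function name and the data are hypothetical, and this is not the ``icefall`` implementation.

```python
from typing import List


def greedy_search(log_probs: List[List[float]], blank_id: int = 0) -> List[int]:
    """Pick the highest-scoring symbol of each frame and drop blanks.

    Assumption: log_probs[t][v] is the score of symbol v at frame t.
    """
    hyp = []
    for frame in log_probs:
        best = max(range(len(frame)), key=lambda i: frame[i])
        if best != blank_id:
            hyp.append(best)
    return hyp


# 4 frames, vocabulary of 3 symbols (0 is blank).
scores = [
    [-0.1, -3.0, -4.0],  # blank wins
    [-2.0, -0.2, -3.0],  # symbol 1 wins
    [-0.3, -2.0, -2.5],  # blank wins
    [-3.0, -2.0, -0.1],  # symbol 2 wins
]
print(greedy_search(scores))  # [1, 2]
```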
Export Model
------------
`pruned_transducer_stateless4/export.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/export.py>`_ supports exporting checkpoints from ``pruned_transducer_stateless4/exp`` in the following ways.
Export ``model.state_dict()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checkpoints saved by ``pruned_transducer_stateless4/train.py`` also include
``optimizer.state_dict()``. It is useful for resuming training. But after training,
we are interested only in ``model.state_dict()``. You can use the following
command to extract ``model.state_dict()``.
.. code-block:: bash
# Assume that --epoch 25 --avg 3 produces the smallest WER
# (You can get such information after running ./pruned_transducer_stateless4/decode.py)
epoch=25
avg=3
./pruned_transducer_stateless4/export.py \
--exp-dir ./pruned_transducer_stateless4/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch $epoch \
--avg $avg
It will generate a file ``./pruned_transducer_stateless4/exp/pretrained.pt``.
.. hint::
To use the generated ``pretrained.pt`` for ``pruned_transducer_stateless4/decode.py``,
you can run:
.. code-block:: bash
cd pruned_transducer_stateless4/exp
ln -s pretrained.pt epoch-999.pt
And then pass ``--epoch 999 --avg 1 --use-averaged-model 0`` to
``./pruned_transducer_stateless4/decode.py``.
To use the exported model with ``./pruned_transducer_stateless4/pretrained.py``, you
can run:
.. code-block:: bash
./pruned_transducer_stateless4/pretrained.py \
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt \
--bpe-model ./data/lang_bpe_500/bpe.model \
--method greedy_search \
/path/to/foo.wav \
/path/to/bar.wav
Export model using ``torch.jit.script()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
./pruned_transducer_stateless4/export.py \
--exp-dir ./pruned_transducer_stateless4/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 25 \
--avg 3 \
--jit 1
It will generate a file ``cpu_jit.pt`` in the given ``exp_dir``. You can later
load it by ``torch.jit.load("cpu_jit.pt")``.
Note that ``cpu`` in the name ``cpu_jit.pt`` means the parameters when loaded into Python
are on CPU. You can use ``to("cuda")`` to move them to a CUDA device.
.. NOTE::
You will need this ``cpu_jit.pt`` when deploying with the Sherpa framework.
Download pretrained models
--------------------------
If you don't want to train from scratch, you can download the pretrained models
by visiting the following links:
- `pruned_transducer_stateless <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12>`_
- `pruned_transducer_stateless2 <https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29>`_
- `pruned_transducer_stateless4 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless4-2022-06-03>`_
- `pruned_transducer_stateless5 <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-2022-07-07>`_
See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
for the details of the above pretrained models.
Deploy with Sherpa
------------------
Please see `<https://k2-fsa.github.io/sherpa/python/offline_asr/conformer/librispeech.html#>`_
for how to deploy the models in ``sherpa``.

View File

@ -0,0 +1,12 @@
Streaming ASR
=============
.. toctree::
:maxdepth: 1
introduction
.. toctree::
:maxdepth: 2
librispeech/index

View File

@ -0,0 +1,52 @@
Introduction
============
This page shows you how we implement streaming **X-former transducer** models for ASR.
.. HINT::
X-former transducer here means that the encoder of the transducer model uses Multi-Head Attention,
like `Conformer <https://arxiv.org/pdf/2005.08100.pdf>`_, `Emformer <https://arxiv.org/pdf/2010.10759.pdf>`_, etc.
Currently we have implemented two types of streaming models: one uses Conformer as the encoder, the other uses Emformer as the encoder.
Streaming Conformer
-------------------
The main idea of training a streaming model is to make the model see only limited context
at training time. We can achieve this by applying a mask to the output of self-attention.
In icefall, we implement the streaming conformer in the same way as `WeNet <https://arxiv.org/pdf/2012.05481.pdf>`_.
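The kind of mask mentioned above can be sketched in plain Python. The sketch assumes a chunk-based scheme: a frame may attend to every frame in its own chunk and to a limited number of chunks to its left. The function name ``chunk_attention_mask`` and its parameters are chosen for illustration and do not mirror the actual ``icefall`` or WeNet code.

```python
from typing import List


def chunk_attention_mask(
    num_frames: int, chunk_size: int, num_left_chunks: int = -1
) -> List[List[bool]]:
    """Build a mask where mask[i][j] is True if frame i may attend to frame j.

    Assumption: num_left_chunks < 0 means all left chunks are visible.
    """
    mask = []
    for i in range(num_frames):
        cur_chunk = i // chunk_size
        # Frames up to the end of the current chunk are visible.
        end = min((cur_chunk + 1) * chunk_size, num_frames)
        if num_left_chunks < 0:
            start = 0
        else:
            start = max((cur_chunk - num_left_chunks) * chunk_size, 0)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


# 6 frames, chunks of 2 frames, 1 chunk of left context.
for row in chunk_attention_mask(6, 2, 1):
    print("".join("1" if visible else "0" for visible in row))
# 110000
# 110000
# 111100
# 111100
# 001111
# 001111
```

No frame sees anything to the right of its own chunk, which is what bounds the latency of the streaming model.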
.. NOTE::
The conformer-transducer recipes for the LibriSpeech dataset, such as `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`_,
`pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`_,
`pruned_transducer_stateless3 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3>`_,
`pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_,
`pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`_
all support streaming.
.. NOTE::
Training a streaming conformer model in ``icefall`` is almost the same as training a
non-streaming model. All you need to do is pass several extra arguments.
See :doc:`Pruned transducer statelessX <librispeech/pruned_transducer_stateless>` for more details.
.. HINT::
If you want to adapt a non-streaming conformer model to be streaming, please refer
to `this pull request <https://github.com/k2-fsa/icefall/pull/454>`_.
Streaming Emformer
------------------
The Emformer model proposed `here <https://arxiv.org/pdf/2010.10759.pdf>`_ uses more
complicated techniques. It has a memory bank component to memorize history information.
What's more, it also introduces right context at training time by hard-copying part of
the input features.
We have three variants of Emformer models in ``icefall``.
- ``pruned_stateless_emformer_rnnt2`` using Emformer from torchaudio, see `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_stateless_emformer_rnnt2>`_.
- ``conv_emformer_transducer_stateless`` using ConvEmformer implemented by ourselves. Different from the Emformer in torchaudio,
ConvEmformer has a convolution in each layer and uses the mechanisms in our reworked conformer model.
See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless>`_.
- ``conv_emformer_transducer_stateless2`` using ConvEmformer implemented by ourselves. The only difference from the above one is that
it uses a simplified memory bank. See `LibriSpeech recipe <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless2>`_.

View File

@ -0,0 +1,9 @@
LibriSpeech
===========
.. toctree::
:maxdepth: 1
pruned_transducer_stateless
lstm_pruned_stateless_transducer

View File

@ -0,0 +1,735 @@
Pruned transducer statelessX
============================
This tutorial shows you how to run a **streaming** conformer transducer model
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
.. Note::
The tutorial is suitable for `pruned_transducer_stateless <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless>`_,
`pruned_transducer_stateless2 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2>`_,
`pruned_transducer_stateless4 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4>`_,
and `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5>`_.
We will take pruned_transducer_stateless4 as an example in this tutorial.
.. HINT::
We assume you have read the page :ref:`install icefall` and have set up
the environment for ``icefall``.
.. HINT::
We recommend using one or more GPUs to run this recipe.
.. hint::
Please scroll down to the bottom of this page to find download links
for pretrained models if you don't want to train a model from scratch.
We use pruned RNN-T to compute the loss.
.. note::
You can find the paper about pruned RNN-T at the following address:
`<https://arxiv.org/abs/2206.13236>`_
The transducer model consists of 3 parts:
- Encoder, a.k.a. the transcription network. We use a Conformer model (the reworked version by Daniel Povey).
- Decoder, a.k.a. the prediction network. We use a stateless model consisting of
``nn.Embedding`` and ``nn.Conv1d``.
- Joiner, a.k.a. the joint network.
.. caution::
Contrary to the conventional RNN-T models, we use a stateless decoder.
That is, it has no recurrent connections.
Data preparation
----------------
.. hint::
The data preparation is the same as for other recipes on the LibriSpeech dataset.
If you have finished this step, you can skip to ``Training`` directly.
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh
The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
All you need to do is to run it.
The data preparation contains several stages. You can use the following two
options:
- ``--stage``
- ``--stop-stage``
to control which stage(s) should be run. By default, all stages are executed.
For example,
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./prepare.sh --stage 0 --stop-stage 0
means to run only stage 0.
To run stage 2 to stage 5, use:
.. code-block:: bash
$ ./prepare.sh --stage 2 --stop-stage 5
.. HINT::
If you have pre-downloaded the `LibriSpeech <https://www.openslr.org/12>`_
dataset and the `musan <http://www.openslr.org/17/>`_ dataset, say,
they are saved in ``/tmp/LibriSpeech`` and ``/tmp/musan``, you can modify
the ``dl_dir`` variable in ``./prepare.sh`` to point to ``/tmp`` so that
``./prepare.sh`` won't re-download them.
.. NOTE::
All files generated by ``./prepare.sh``, e.g., features, lexicon, etc.,
are saved in the ``./data`` directory.
We provide the following YouTube video showing how to run ``./prepare.sh``.
.. note::
To get the latest news of `next-gen Kaldi <https://github.com/k2-fsa>`_, please subscribe to
the following YouTube channel by `Nadira Povey <https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_:
`<https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw>`_
.. youtube:: ofEIoJL-mGM
Training
--------
.. NOTE::
We put the streaming and non-streaming models in one recipe. To train a streaming model, you only
need to add **4** extra options compared with training a non-streaming model. These options are
``--dynamic-chunk-training``, ``--num-left-chunks``, ``--causal-convolution``, and ``--short-chunk-size``.
See the configurable options below for their meanings, or read https://arxiv.org/pdf/2012.05481.pdf for more details.
Configurable options
~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
shows you the training options that can be passed from the command line.
The following options are used quite often:
- ``--exp-dir``
The directory to save checkpoints, training logs, and TensorBoard logs.
- ``--full-libri``
If it's True, the training part uses all the training data, i.e.,
960 hours. Otherwise, the training part uses only the subset
``train-clean-100``, which has 100 hours of training data.
.. CAUTION::
The training set is perturbed by speed with two factors: 0.9 and 1.1.
If ``--full-libri`` is True, each epoch actually processes
``3x960 == 2880`` hours of data.
- ``--num-epochs``
It is the number of epochs to train. For instance,
``./pruned_transducer_stateless4/train.py --num-epochs 30`` trains for 30 epochs
and generates ``epoch-1.pt``, ``epoch-2.pt``, ..., ``epoch-30.pt``
in the folder ``./pruned_transducer_stateless4/exp``.
- ``--start-epoch``
It's used to resume training.
``./pruned_transducer_stateless4/train.py --start-epoch 10`` loads the
checkpoint ``./pruned_transducer_stateless4/exp/epoch-9.pt`` and starts
training from epoch 10, based on the state from epoch 9.
- ``--world-size``
It is used for multi-GPU single-machine DDP training.
- (a) If it is 1, then no DDP training is used.
- (b) If it is 2, then GPU 0 and GPU 1 are used for DDP training.
The following shows some use cases with it.
**Use case 1**: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ export CUDA_VISIBLE_DEVICES="0,2"
$ ./pruned_transducer_stateless4/train.py --world-size 2
**Use case 2**: You have 4 GPUs and you want to use all of them
for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size 4
**Use case 3**: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:
.. code-block:: bash
$ cd egs/librispeech/ASR
$ export CUDA_VISIBLE_DEVICES="3"
$ ./pruned_transducer_stateless4/train.py --world-size 1
.. caution::
Only multi-GPU single-machine DDP training is implemented at present.
Multi-GPU multi-machine DDP training will be added later.
- ``--max-duration``
It specifies the number of seconds over all utterances in a
batch, before **padding**.
If you encounter CUDA OOM, please reduce it.
.. HINT::
Due to padding, the number of seconds of all utterances in a
batch will usually be larger than ``--max-duration``.
A larger value for ``--max-duration`` may cause OOM during training,
while a smaller value may increase the training time. You have to
tune it.
- ``--use-fp16``
If it is True, the model is trained with half precision. In our experiments,
half precision lets you train with a ``--max-duration`` that is twice as large,
which gives almost a 2X speedup.
- ``--dynamic-chunk-training``
This flag indicates whether to train a streaming model. It
**MUST** be True if you want to train a streaming model.
- ``--short-chunk-size``
When training a streaming attention model with chunk masking, the chunk size
is either the maximum sequence length of the current batch or uniformly sampled from
(1, short_chunk_size). The default value is 25; you don't have to change it most of the time.
- ``--num-left-chunks``
It indicates how many chunks of left context can be seen when calculating attention.
The default value is 4; you don't have to change it most of the time.
- ``--causal-convolution``
Whether to use causal convolution in the conformer encoder layers. It must
be True when training a streaming model.
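The chunk-size choice described under ``--short-chunk-size`` can be sketched as follows. The description above only says that the chunk size is either the maximum sequence length of the batch or sampled from (1, short_chunk_size). How often each case is chosen is not stated here, so the probability ``p_full`` below is a hypothetical parameter, and treating both bounds as inclusive is also an assumption. This is not the ``icefall`` code.

```python
import random


def sample_chunk_size(max_len, short_chunk_size=25, p_full=0.5, rng=random):
    """Sample a chunk size for one training batch.

    Assumptions: p_full is the (hypothetical) probability of using the full
    sequence; otherwise the size is uniform in [1, short_chunk_size].
    """
    if rng.random() < p_full:
        # Full context: the whole sequence is a single chunk.
        return max_len
    return rng.randint(1, short_chunk_size)


rng = random.Random(0)  # fixed seed so the demo is repeatable
sizes = [sample_chunk_size(500, rng=rng) for _ in range(10)]
# Every sampled value is either the full length or a short chunk size.
print(all(s == 500 or 1 <= s <= 25 for s in sizes))  # True
```

Mixing full-context and short-chunk batches is what lets a single trained model be decoded with different chunk sizes.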
Pre-configured options
~~~~~~~~~~~~~~~~~~~~~~
There are some training options, e.g., number of encoder layers,
encoder dimension, decoder dimension, number of warmup steps, etc.,
that are not passed from the command line.
They are pre-configured by the function ``get_params()`` in
`pruned_transducer_stateless4/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/train.py>`_.
You don't need to change these pre-configured parameters. If you really need to change
them, please modify ``./pruned_transducer_stateless4/train.py`` directly.
.. NOTE::
The options for `pruned_transducer_stateless5 <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py>`_ are a little different from
other recipes. It allows you to configure ``--num-encoder-layers``, ``--dim-feedforward``, ``--nhead``, ``--encoder-dim``, ``--decoder-dim``, and ``--joiner-dim`` from the command line, so that you can train models of different sizes with pruned_transducer_stateless5.
Training logs
~~~~~~~~~~~~~
Training logs and checkpoints are saved in ``--exp-dir`` (e.g. ``pruned_transducer_stateless4/exp``).
You will find the following files in that directory:
- ``epoch-1.pt``, ``epoch-2.pt``, ...
These are checkpoint files saved at the end of each epoch, containing model
``state_dict`` and optimizer ``state_dict``.
To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
.. code-block:: bash
$ ./pruned_transducer_stateless4/train.py --start-epoch 11
- ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ...
These are checkpoint files saved every ``--save-every-n`` batches,
containing model ``state_dict`` and optimizer ``state_dict``.
To resume training from some checkpoint, say ``checkpoint-436000``, you can use:
.. code-block:: bash
$ ./pruned_transducer_stateless4/train.py --start-batch 436000
- ``tensorboard/``
This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:
.. code-block:: bash
$ cd pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description "pruned transducer training for LibriSpeech with icefall"
It will print something like below:
.. code-block::
TensorFlow installation not found - running with reduced feature set.
Upload started and will continue reading any new data as it's added to the logdir.
To stop uploading, press Ctrl-C.
New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/97VKXf80Ru61CnP2ALWZZg/
[2022-11-20T15:50:50] Started scanning logdir.
Uploading 4468 scalars...
[2022-11-20T15:53:02] Total uploaded: 210171 scalars, 0 tensors, 0 binary objects
Listening for new data in logdir...
Note there is a URL in the above output. Click it and you will see
the following screenshot:
.. figure:: images/streaming-librispeech-pruned-transducer-tensorboard-log.jpg
:width: 600
:alt: TensorBoard screenshot
:align: center
:target: https://tensorboard.dev/experiment/97VKXf80Ru61CnP2ALWZZg/
TensorBoard screenshot.
.. hint::
If you don't have access to Google, you can use the following command
to view the tensorboard log locally:
.. code-block:: bash
cd pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port 6008
It will print the following message:
.. code-block::
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.8.0 at http://localhost:6008/ (Press CTRL+C to quit)
Now start your browser and go to `<http://localhost:6008>`_ to view the tensorboard
logs.
- ``log/log-train-xxxx``
It is the detailed training log in text format, same as the one
you saw printed to the console during training.
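
The resume rule for epoch checkpoints described above (pass the number of the
last saved epoch plus one to ``--start-epoch``) can be scripted. The following
is a minimal sketch, not part of icefall: the helper name ``next_start_epoch``
is hypothetical, and it only assumes that epoch checkpoints are named
``epoch-N.pt`` as listed above.

.. code-block:: bash

  # Hypothetical helper (not part of icefall): print the value to pass to
  # --start-epoch, i.e. the largest N among the epoch-N.pt files plus one.
  # Prints 1 when the directory contains no epoch checkpoint.
  next_start_epoch() {
    exp_dir=$1
    latest=0
    for f in "$exp_dir"/epoch-*.pt; do
      [ -e "$f" ] || continue        # the glob matched nothing
      n=${f##*/epoch-}               # strip the directory and "epoch-" prefix
      n=${n%.pt}                     # strip the ".pt" suffix
      case $n in
        ''|*[!0-9]*) continue ;;     # skip names that are not plain numbers
      esac
      if [ "$n" -gt "$latest" ]; then
        latest=$n
      fi
    done
    echo $((latest + 1))
  }

  # Print the resume command instead of running it.
  echo ./pruned_transducer_stateless4/train.py \
    --exp-dir pruned_transducer_stateless4/exp \
    --start-epoch "$(next_start_epoch pruned_transducer_stateless4/exp)"

Note that a file such as ``epoch-999.pt`` is counted like any other epoch
checkpoint, so keep only real training checkpoints in the directory when using
a helper like this.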
Usage example
~~~~~~~~~~~~~
You can use the following command to start the training using 4 GPUs:
.. code-block:: bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./pruned_transducer_stateless4/train.py \
--world-size 4 \
--dynamic-chunk-training 1 \
--causal-convolution 1 \
--num-epochs 30 \
--start-epoch 1 \
--exp-dir pruned_transducer_stateless4/exp \
--full-libri 1 \
--max-duration 300
.. NOTE::
Compared with training a non-streaming model, you only need to add two extra options,
``--dynamic-chunk-training 1`` and ``--causal-convolution 1`` .
Decoding
--------
The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.
.. hint::
There are two kinds of checkpoints:
- (1) ``epoch-1.pt``, ``epoch-2.pt``, ..., which are saved at the end
of each epoch. You can pass ``--epoch`` to
``pruned_transducer_stateless4/decode.py`` to use them.
- (2) ``checkpoint-436000.pt``, ``checkpoint-438000.pt``, ..., which are saved
every ``--save-every-n`` batches. You can pass ``--iter`` to
``pruned_transducer_stateless4/decode.py`` to use them.
We suggest that you try both types of checkpoints and choose the one
that produces the lowest WERs.
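
Choosing the checkpoint with the lowest WER can be scripted once the WERs of
the individual decoding runs have been collected. The sketch below is only an
illustration: the helper ``best_run`` and the two-column input format
(run label, WER) are assumptions rather than something ``decode.py`` produces,
and the numbers are placeholders, not measured results.

.. code-block:: bash

  # Hypothetical helper: read "<run label> <WER in %>" lines from stdin and
  # print the line with the lowest WER.
  best_run() {
    LC_ALL=C sort -k 2,2n | head -n 1
  }

  # Placeholder values for illustration only; they are not real results.
  printf '%s\n' \
    'epoch-25-avg-3 3.9' \
    'epoch-20-avg-5 4.1' \
    'iter-474000-avg-10 4.0' | best_run

The same pipeline works for any number of runs, as long as each run
contributes exactly one line.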
.. tip::
To decode a streaming model, you can use either ``simulate streaming decoding`` in ``decode.py`` or
``real streaming decoding`` in ``streaming_decode.py``. The difference is that ``decode.py`` processes
all acoustic frames of an utterance at once with masking (i.e., the same as in training),
while ``streaming_decode.py`` processes the acoustic frames chunk by chunk (so it can only see limited context).
.. NOTE::
``simulate streaming decoding`` in ``decode.py`` and ``real streaming decoding`` in ``streaming_decode.py`` should
produce almost the same results given the same ``--decode-chunk-size`` and ``--left-context``.
Simulate streaming decoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
shows the options for decoding.
The following options are important for streaming models:
``--simulate-streaming``
If you want to decode a streaming model with ``decode.py``, you **MUST** set
``--simulate-streaming`` to ``True``. ``simulate`` here means the acoustic frames
are not processed frame by frame (or chunk by chunk); instead, the whole sequence
is processed at one time with masking (the same as training).
``--causal-convolution``
If True, the convolution module in encoder layers will be causal convolution.
This **MUST** be True when decoding with a streaming model.
``--decode-chunk-size``
For streaming models, we will calculate the chunk-wise attention, ``--decode-chunk-size``
indicates the chunk length (in frames after subsampling) for chunk-wise attention.
For ``simulate streaming decoding`` the ``decode-chunk-size`` is used to generate
the attention mask.
``--left-context``
``--left-context`` indicates how many left context frames (after subsampling) can be seen
for the current chunk when calculating chunk-wise attention. Normally, ``left-context`` should be equal
to ``decode-chunk-size * num-left-chunks``, where ``num-left-chunks`` is the option used
to train this model. For ``simulate streaming decoding`` the ``left-context`` is used to generate
the attention mask.
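
The relation between ``--left-context``, ``--decode-chunk-size`` and
``num-left-chunks`` is simple arithmetic. The sketch below uses illustrative
values: ``num_left_chunks=4`` is an assumption, so replace it with the value
used to train your model. The duration estimate additionally assumes a 10 ms
frame shift and a subsampling factor of 4, neither of which is stated on this
page.

.. code-block:: bash

  decode_chunk_size=16   # frames after subsampling, as in the examples below
  num_left_chunks=4      # assumed training value; use your own

  left_context=$((decode_chunk_size * num_left_chunks))
  echo "--decode-chunk-size $decode_chunk_size --left-context $left_context"

  # Assumed front end: 10 ms frame shift and 4x subsampling.
  frame_shift_ms=10
  subsampling_factor=4
  chunk_ms=$((decode_chunk_size * subsampling_factor * frame_shift_ms))
  echo "one chunk covers about $chunk_ms ms of audio"

With these values the computed ``left_context`` is 64, which matches the
``--left-context 64`` used together with ``--decode-chunk-size 16`` in the
decoding commands on this page.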
The following shows two examples (for the two types of checkpoints):
.. code-block:: bash
for m in greedy_search fast_beam_search modified_beam_search; do
for epoch in 25 20; do
for avg in 7 5 3 1; do
./pruned_transducer_stateless4/decode.py \
--epoch $epoch \
--avg $avg \
--simulate-streaming 1 \
--causal-convolution 1 \
--decode-chunk-size 16 \
--left-context 64 \
--exp-dir pruned_transducer_stateless4/exp \
--max-duration 600 \
--decoding-method $m
done
done
done
.. code-block:: bash
for m in greedy_search fast_beam_search modified_beam_search; do
for iter in 474000; do
for avg in 8 10 12 14 16 18; do
./pruned_transducer_stateless4/decode.py \
--iter $iter \
--avg $avg \
--simulate-streaming 1 \
--causal-convolution 1 \
--decode-chunk-size 16 \
--left-context 64 \
--exp-dir pruned_transducer_stateless4/exp \
--max-duration 600 \
--decoding-method $m
done
done
done
Real streaming decoding
~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$ cd egs/librispeech/ASR
$ ./pruned_transducer_stateless4/streaming_decode.py --help
shows the options for decoding.
The following options are important for streaming models:
``--decode-chunk-size``
For streaming models, we will calculate the chunk-wise attention, ``--decode-chunk-size``
indicates the chunk length (in frames after subsampling) for chunk-wise attention.
For ``real streaming decoding``, we will process ``decode-chunk-size`` acoustic frames at a time.
``--left-context``
``--left-context`` indicates how many left context frames (after subsampling) can be seen
for the current chunk when calculating chunk-wise attention. Normally, ``left-context`` should be equal
to ``decode-chunk-size * num-left-chunks``, where ``num-left-chunks`` is the option used
to train this model.
``--num-decode-streams``
The number of decoding streams that can be run in parallel (very similar to the ``batch size``).
For ``real streaming decoding``, the batches are packed dynamically. For example, if
``num-decode-streams`` equals 10, sequences 1 to 10 are decoded first. After a while,
suppose sequences 1 and 2 are done; then sequences 3 to 12 are processed in parallel in a batch.
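
The dynamic packing described for ``--num-decode-streams`` can be illustrated
with a toy simulation. This is a sketch of the idea only, not the code in
``streaming_decode.py``: the function ``simulate_streams`` and the
``id:chunks`` notation (sequence id and number of chunks it still needs) are
made up for this example.

.. code-block:: bash

  # Toy simulation of dynamic stream packing (not the icefall implementation).
  # Usage: simulate_streams NUM_STREAMS id:chunks [id:chunks ...]
  simulate_streams() {
    num_streams=$1
    shift
    waiting=$*           # sequences that have not started yet
    active=              # sequences currently being decoded
    step=0
    while [ -n "$waiting" ] || [ -n "$active" ]; do
      # Refill: move waiting sequences into the free slots.
      set -- $active
      free=$((num_streams - $#))
      rest=
      for item in $waiting; do
        if [ "$free" -gt 0 ]; then
          active="$active $item"
          free=$((free - 1))
        else
          rest="$rest $item"
        fi
      done
      waiting=$rest

      # Decode one chunk for every active stream in a single batch.
      step=$((step + 1))
      batch=
      next=
      for item in $active; do
        id=${item%%:*}
        left=$((${item##*:} - 1))
        batch="$batch $id"
        if [ "$left" -gt 0 ]; then
          next="$next $id:$left"   # not finished, keep its slot
        fi
      done
      echo "step $step:$batch"
      active=$next
    done
  }

  simulate_streams 3 1:1 2:1 3:2 4:2 5:1

With three streams, the first batch holds sequences 1 to 3. Sequences 1 and 2
finish after one chunk, so the second batch holds sequences 3 to 5, and the
last batch holds only sequence 4.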
.. NOTE::
We also tried adding ``--right-context`` in real streaming decoding, but it does not seem to improve
the results for all models; the reason might be the mismatch between training and decoding. You
can try decoding with ``--right-context`` to see if it helps. The default value is 0.
The following shows two examples (for the two types of checkpoints):
.. code-block:: bash
for m in greedy_search fast_beam_search modified_beam_search; do
for epoch in 25 20; do
for avg in 7 5 3 1; do
./pruned_transducer_stateless4/streaming_decode.py \
--epoch $epoch \
--avg $avg \
--decode-chunk-size 16 \
--left-context 64 \
--num-decode-streams 100 \
--exp-dir pruned_transducer_stateless4/exp \
--max-duration 600 \
--decoding-method $m
done
done
done
.. code-block:: bash
for m in greedy_search fast_beam_search modified_beam_search; do
for iter in 474000; do
for avg in 8 10 12 14 16 18; do
./pruned_transducer_stateless4/streaming_decode.py \
--iter $iter \
--avg $avg \
--decode-chunk-size 16 \
--left-context 64 \
--num-decode-streams 100 \
--exp-dir pruned_transducer_stateless4/exp \
--max-duration 600 \
--decoding-method $m
done
done
done
.. tip::
The supported decoding methods are as follows:
- ``greedy_search`` : It takes the symbol with largest posterior probability
of each frame as the decoding result.
- ``beam_search`` : It implements Algorithm 1 in https://arxiv.org/pdf/1211.3711.pdf and
`espnet/nets/beam_search_transducer.py <https://github.com/espnet/espnet/blob/master/espnet/nets/beam_search_transducer.py#L247>`_
is used as a reference. Basically, it keeps the top-k states for each frame and expands the kept states with their own contexts to the
next frame.
- ``modified_beam_search`` : It implements the same algorithm as ``beam_search`` above, but it
runs in batch mode with ``--max-sym-per-frame=1`` being hardcoded.
- ``fast_beam_search`` : It implements graph composition between the output ``log_probs`` and
given ``FSAs``. The details are hard to describe in a few lines of text; you can read
our paper at https://arxiv.org/pdf/2211.00484.pdf or our `rnnt decode code in k2 <https://github.com/k2-fsa/k2/blob/master/k2/csrc/rnnt_decode.h>`_. ``fast_beam_search`` can decode with ``FSAs`` on GPU efficiently.
- ``fast_beam_search_LG`` : The same as ``fast_beam_search`` above, except that ``fast_beam_search`` uses
a trivial graph that has only one state, while ``fast_beam_search_LG`` uses an LG graph
(with N-gram LM).
- ``fast_beam_search_nbest`` : It produces the decoding results as follows:
- (1) Use ``fast_beam_search`` to get a lattice
- (2) Select ``num_paths`` paths from the lattice using ``k2.random_paths()``
- (3) Remove duplicates from the selected paths
- (4) Intersect the selected paths with the lattice and compute the
shortest path from the intersection result
- (5) The path with the largest score is used as the decoding output.
- ``fast_beam_search_nbest_LG`` : It implements the same logic as ``fast_beam_search_nbest``; the
only difference is that it uses ``fast_beam_search_LG`` to generate the lattice.
.. NOTE::
``streaming_decode.py`` might support fewer decoding methods than ``decode.py``. If needed,
you can implement them yourself or file an issue in `icefall <https://github.com/k2-fsa/icefall/issues>`_ .
Export Model
------------
`pruned_transducer_stateless4/export.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/export.py>`_ supports exporting checkpoints from ``pruned_transducer_stateless4/exp`` in the following ways.
Export ``model.state_dict()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checkpoints saved by ``pruned_transducer_stateless4/train.py`` also include
``optimizer.state_dict()``. It is useful for resuming training. But after training,
we are interested only in ``model.state_dict()``. You can use the following
command to extract ``model.state_dict()``.
.. code-block:: bash
# Assume that --epoch 25 --avg 3 produces the smallest WER
# (You can get such information after running ./pruned_transducer_stateless4/decode.py)
epoch=25
avg=3
./pruned_transducer_stateless4/export.py \
--exp-dir ./pruned_transducer_stateless4/exp \
--streaming-model 1 \
--causal-convolution 1 \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch $epoch \
--avg $avg
.. caution::
``--streaming-model`` and ``--causal-convolution`` must be True to export
a streaming model.
It will generate a file ``./pruned_transducer_stateless4/exp/pretrained.pt``.
.. hint::
To use the generated ``pretrained.pt`` for ``pruned_transducer_stateless4/decode.py``,
you can run:
.. code-block:: bash
cd pruned_transducer_stateless4/exp
ln -s pretrained.pt epoch-999.pt
And then pass ``--epoch 999 --avg 1 --use-averaged-model 0`` to
``./pruned_transducer_stateless4/decode.py``.
To use the exported model with ``./pruned_transducer_stateless4/pretrained.py``, you
can run:
.. code-block:: bash
./pruned_transducer_stateless4/pretrained.py \
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt \
--simulate-streaming 1 \
--causal-convolution 1 \
--bpe-model ./data/lang_bpe_500/bpe.model \
--method greedy_search \
/path/to/foo.wav \
/path/to/bar.wav
Export model using ``torch.jit.script()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
./pruned_transducer_stateless4/export.py \
--exp-dir ./pruned_transducer_stateless4/exp \
--streaming-model 1 \
--causal-convolution 1 \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 25 \
--avg 3 \
--jit 1
.. caution::
``--streaming-model`` and ``--causal-convolution`` must be True to export
a streaming model.
It will generate a file ``cpu_jit.pt`` in the given ``exp_dir``. You can later
load it by ``torch.jit.load("cpu_jit.pt")``.
Note ``cpu`` in the name ``cpu_jit.pt`` means the parameters when loaded into Python
are on CPU. You can use ``to("cuda")`` to move them to a CUDA device.
.. NOTE::
You will need this ``cpu_jit.pt`` when deploying with the Sherpa framework.
Download pretrained models
--------------------------
If you don't want to train from scratch, you can download the pretrained models
by visiting the following links:
- `pruned_transducer_stateless <https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless_20220625>`_
- `pruned_transducer_stateless2 <https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless2_20220625>`_
- `pruned_transducer_stateless4 <https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless4_20220625>`_
- `pruned_transducer_stateless5 <https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless5_20220729>`_
See `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>`_
for the details of the above pretrained models.
Deploy with Sherpa
------------------
Please see `<https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/index.html#>`_
for how to deploy the models in ``sherpa``.


@ -13,7 +13,5 @@ We may add recipes for other tasks as well in the future.
:maxdepth: 2 :maxdepth: 2
:caption: Table of Contents :caption: Table of Contents
aishell/index Non-streaming-ASR/index
librispeech/index Streaming-ASR/index
timit/index
yesno/index


@ -40,10 +40,14 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Contributing</a><ul class="current"> <li class="toctree-l1 current"><a class="reference internal" href="index.html">Contributing</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="doc.html">Contributing to Documentation</a></li> <li class="toctree-l2"><a class="reference internal" href="doc.html">Contributing to Documentation</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Follow the code style</a></li> <li class="toctree-l2 current"><a class="current reference internal" href="#">Follow the code style</a></li>
@ -110,7 +114,7 @@ $ pre-commit install
<div><figure class="align-center" id="id2"> <div><figure class="align-center" id="id2">
<a class="reference internal image-reference" href="../_images/pre-commit-check.png"><img alt="../_images/pre-commit-check.png" src="../_images/pre-commit-check.png" style="width: 600px;" /></a> <a class="reference internal image-reference" href="../_images/pre-commit-check.png"><img alt="../_images/pre-commit-check.png" src="../_images/pre-commit-check.png" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 8 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Failed).</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 10 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Failed).</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
</figcaption> </figcaption>
</figure> </figure>
</div></blockquote> </div></blockquote>
@ -129,7 +133,7 @@ it should succeed this time:</p>
<div><figure class="align-center" id="id3"> <div><figure class="align-center" id="id3">
<a class="reference internal image-reference" href="../_images/pre-commit-check-success.png"><img alt="../_images/pre-commit-check-success.png" src="../_images/pre-commit-check-success.png" style="width: 600px;" /></a> <a class="reference internal image-reference" href="../_images/pre-commit-check-success.png"><img alt="../_images/pre-commit-check-success.png" src="../_images/pre-commit-check-success.png" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 9 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Succeeded).</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 11 </span><span class="caption-text">pre-commit hooks invoked by <code class="docutils literal notranslate"><span class="pre">git</span> <span class="pre">commit</span></code> (Succeeded).</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
</figcaption> </figcaption>
</figure> </figure>
</div></blockquote> </div></blockquote>


@ -40,10 +40,14 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Contributing</a><ul class="current"> <li class="toctree-l1 current"><a class="reference internal" href="index.html">Contributing</a><ul class="current">
<li class="toctree-l2 current"><a class="current reference internal" href="#">Contributing to Documentation</a></li> <li class="toctree-l2 current"><a class="current reference internal" href="#">Contributing to Documentation</a></li>
<li class="toctree-l2"><a class="reference internal" href="code-style.html">Follow the code style</a></li> <li class="toctree-l2"><a class="reference internal" href="code-style.html">Follow the code style</a></li>
@ -118,7 +122,7 @@ the following:</p>
<div><figure class="align-center" id="id1"> <div><figure class="align-center" id="id1">
<a class="reference internal image-reference" href="../_images/doc-contrib.png"><img alt="../_images/doc-contrib.png" src="../_images/doc-contrib.png" style="width: 600px;" /></a> <a class="reference internal image-reference" href="../_images/doc-contrib.png"><img alt="../_images/doc-contrib.png" src="../_images/doc-contrib.png" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 7 </span><span class="caption-text">View generated documentation locally with <code class="docutils literal notranslate"><span class="pre">python3</span> <span class="pre">-m</span> <span class="pre">http.server</span></code>.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 9 </span><span class="caption-text">View generated documentation locally with <code class="docutils literal notranslate"><span class="pre">python3</span> <span class="pre">-m</span> <span class="pre">http.server</span></code>.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
</figcaption> </figcaption>
</figure> </figure>
</div></blockquote> </div></blockquote>


@ -40,10 +40,14 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Contributing</a><ul class="current"> <li class="toctree-l1 current"><a class="reference internal" href="index.html">Contributing</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="doc.html">Contributing to Documentation</a></li> <li class="toctree-l2"><a class="reference internal" href="doc.html">Contributing to Documentation</a></li>
<li class="toctree-l2"><a class="reference internal" href="code-style.html">Follow the code style</a></li> <li class="toctree-l2"><a class="reference internal" href="code-style.html">Follow the code style</a></li>


@ -21,7 +21,7 @@
<link rel="index" title="Index" href="../genindex.html" /> <link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" /> <link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Contributing to Documentation" href="doc.html" /> <link rel="next" title="Contributing to Documentation" href="doc.html" />
<link rel="prev" title="TDNN-CTC" href="../recipes/yesno/tdnn.html" /> <link rel="prev" title="LSTM Transducer" href="../recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html" />
</head> </head>
<body class="wy-body-for-nav"> <body class="wy-body-for-nav">
@ -40,10 +40,14 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">Contributing</a><ul> <li class="toctree-l1 current"><a class="current reference internal" href="#">Contributing</a><ul>
<li class="toctree-l2"><a class="reference internal" href="doc.html">Contributing to Documentation</a></li> <li class="toctree-l2"><a class="reference internal" href="doc.html">Contributing to Documentation</a></li>
<li class="toctree-l2"><a class="reference internal" href="code-style.html">Follow the code style</a></li> <li class="toctree-l2"><a class="reference internal" href="code-style.html">Follow the code style</a></li>
@ -120,7 +124,7 @@ and code to <code class="docutils literal notranslate"><span class="pre">icefall
</div> </div>
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../recipes/yesno/tdnn.html" class="btn btn-neutral float-left" title="TDNN-CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="../recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-left" title="LSTM Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="doc.html" class="btn btn-neutral float-right" title="Contributing to Documentation" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="doc.html" class="btn btn-neutral float-right" title="Contributing to Documentation" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>


@ -40,7 +40,11 @@
<ul> <ul>
<li class="toctree-l1"><a class="reference internal" href="installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
</ul> </ul>


@ -40,10 +40,14 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Huggingface</a><ul> <li class="toctree-l1 current"><a class="current reference internal" href="#">Huggingface</a><ul>
<li class="toctree-l2"><a class="reference internal" href="pretrained-models.html">Pre-trained models</a></li> <li class="toctree-l2"><a class="reference internal" href="pretrained-models.html">Pre-trained models</a></li>


@ -40,10 +40,14 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Huggingface</a><ul class="current"> <li class="toctree-l1 current"><a class="reference internal" href="index.html">Huggingface</a><ul class="current">
<li class="toctree-l2 current"><a class="current reference internal" href="#">Pre-trained models</a></li> <li class="toctree-l2 current"><a class="current reference internal" href="#">Pre-trained models</a></li>


@ -39,10 +39,14 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Huggingface</a><ul class="current"> <li class="toctree-l1 current"><a class="reference internal" href="index.html">Huggingface</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="pretrained-models.html">Pre-trained models</a></li> <li class="toctree-l2"><a class="reference internal" href="pretrained-models.html">Pre-trained models</a></li>

@ -42,7 +42,11 @@
<ul> <ul>
<li class="toctree-l1"><a class="reference internal" href="installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
</ul> </ul>
@ -97,13 +101,29 @@ speech recognition recipes using <a class="reference external" href="https://git
<li class="toctree-l2"><a class="reference internal" href="model-export/export-ncnn.html">Export to ncnn</a></li> <li class="toctree-l2"><a class="reference internal" href="model-export/export-ncnn.html">Export to ncnn</a></li>
</ul> </ul>
</li> </li>
</ul>
</div>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a><ul> <li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a><ul>
<li class="toctree-l2"><a class="reference internal" href="recipes/aishell/index.html">aishell</a></li> <li class="toctree-l2"><a class="reference internal" href="recipes/Non-streaming-ASR/index.html">Non Streaming ASR</a><ul>
<li class="toctree-l2"><a class="reference internal" href="recipes/librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l3"><a class="reference internal" href="recipes/Non-streaming-ASR/aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="recipes/timit/index.html">TIMIT</a></li> <li class="toctree-l3"><a class="reference internal" href="recipes/Non-streaming-ASR/librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="recipes/yesno/index.html">YesNo</a></li> <li class="toctree-l3"><a class="reference internal" href="recipes/Non-streaming-ASR/timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="recipes/Non-streaming-ASR/yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="recipes/Streaming-ASR/index.html">Streaming ASR</a><ul>
<li class="toctree-l3"><a class="reference internal" href="recipes/Streaming-ASR/introduction.html">Introduction</a></li>
<li class="toctree-l3"><a class="reference internal" href="recipes/Streaming-ASR/librispeech/index.html">LibriSpeech</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a><ul> <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a><ul>
<li class="toctree-l2"><a class="reference internal" href="contributing/doc.html">Contributing to Documentation</a></li> <li class="toctree-l2"><a class="reference internal" href="contributing/doc.html">Contributing to Documentation</a></li>
<li class="toctree-l2"><a class="reference internal" href="contributing/code-style.html">Follow the code style</a></li> <li class="toctree-l2"><a class="reference internal" href="contributing/code-style.html">Follow the code style</a></li>

@ -64,7 +64,11 @@
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>

@ -56,7 +56,11 @@
<li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li> <li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>

@ -50,7 +50,11 @@
<li class="toctree-l2 current"><a class="current reference internal" href="#">Export to ncnn</a></li> <li class="toctree-l2 current"><a class="current reference internal" href="#">Export to ncnn</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
@ -83,7 +87,7 @@
<section id="export-to-ncnn"> <section id="export-to-ncnn">
<h1>Export to ncnn<a class="headerlink" href="#export-to-ncnn" title="Permalink to this heading"></a></h1> <h1>Export to ncnn<a class="headerlink" href="#export-to-ncnn" title="Permalink to this heading"></a></h1>
<p>We support exporting LSTM transducer models to <a class="reference external" href="https://github.com/tencent/ncnn">ncnn</a>.</p> <p>We support exporting LSTM transducer models to <a class="reference external" href="https://github.com/tencent/ncnn">ncnn</a>.</p>
<p>Please refer to <a class="reference internal" href="../recipes/librispeech/lstm_pruned_stateless_transducer.html#export-model-for-ncnn"><span class="std std-ref">Export model for ncnn</span></a> for details.</p> <p>Please refer to <a class="reference internal" href="../recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.html#export-model-for-ncnn"><span class="std std-ref">Export model for ncnn</span></a> for details.</p>
<p>We also provide <a class="reference external" href="https://github.com/k2-fsa/sherpa-ncnn">https://github.com/k2-fsa/sherpa-ncnn</a> <p>We also provide <a class="reference external" href="https://github.com/k2-fsa/sherpa-ncnn">https://github.com/k2-fsa/sherpa-ncnn</a>
performing speech recognition using <code class="docutils literal notranslate"><span class="pre">ncnn</span></code> with exported models. performing speech recognition using <code class="docutils literal notranslate"><span class="pre">ncnn</span></code> with exported models.
It has been tested on Linux, macOS, Windows, and Raspberry Pi. The project is It has been tested on Linux, macOS, Windows, and Raspberry Pi. The project is

@ -55,7 +55,11 @@
<li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li> <li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>

@ -55,7 +55,11 @@
<li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li> <li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>

@ -55,7 +55,11 @@
<li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li> <li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>

@ -50,7 +50,11 @@
<li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li> <li class="toctree-l2"><a class="reference internal" href="export-ncnn.html">Export to ncnn</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="../recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>

Binary file not shown.

@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Conformer CTC &mdash; icefall 0.1 documentation</title> <title>Conformer CTC &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Stateless Transducer" href="stateless_transducer.html" /> <link rel="next" title="Stateless Transducer" href="stateless_transducer.html" />
<link rel="prev" title="TDNN-LSTM CTC" href="tdnn_lstm_ctc.html" /> <link rel="prev" title="TDNN-LSTM CTC" href="tdnn_lstm_ctc.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,31 +40,31 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3 current"><a class="reference internal" href="index.html">aishell</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">aishell</a><ul class="current"> <li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li> <li class="toctree-l4 current"><a class="current reference internal" href="#">Conformer CTC</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">Conformer CTC</a><ul> <li class="toctree-l4"><a class="reference internal" href="stateless_transducer.html">Stateless Transducer</a></li>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li>
<li class="toctree-l4"><a class="reference internal" href="#deployment-with-c">Deployment with C++</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l3"><a class="reference internal" href="stateless_transducer.html">Stateless Transducer</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> </ul>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -73,19 +73,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">aishell</a></li> <li class="breadcrumb-item"><a href="index.html">aishell</a></li>
<li class="breadcrumb-item active">Conformer CTC</li> <li class="breadcrumb-item active">Conformer CTC</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/aishell/conformer_ctc.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/aishell/conformer_ctc.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -99,7 +100,7 @@
with the <a class="reference external" href="https://www.openslr.org/33">Aishell</a> dataset.</p> with the <a class="reference external" href="https://www.openslr.org/33">Aishell</a> dataset.</p>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<div class="admonition hint"> <div class="admonition hint">
@ -331,7 +332,7 @@ $ tensorboard dev upload --logdir . --name <span class="s2">&quot;Aishell confor
the following screenshot:</p> the following screenshot:</p>
<blockquote> <blockquote>
<div><figure class="align-center" id="id2"> <div><figure class="align-center" id="id2">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/WE1DocDqRRCOSAgmGyClhg/"><img alt="TensorBoard screenshot" src="../../_images/aishell-conformer-ctc-tensorboard-log.jpg" style="width: 600px;" /></a> <a class="reference external image-reference" href="https://tensorboard.dev/experiment/WE1DocDqRRCOSAgmGyClhg/"><img alt="TensorBoard screenshot" src="../../../_images/aishell-conformer-ctc-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 2 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 2 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
</figcaption> </figcaption>

@ -5,23 +5,23 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>aishell &mdash; icefall 0.1 documentation</title> <title>aishell &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TDNN-LSTM CTC" href="tdnn_lstm_ctc.html" /> <link rel="next" title="TDNN-LSTM CTC" href="tdnn_lstm_ctc.html" />
<link rel="prev" title="Recipes" href="../index.html" /> <link rel="prev" title="Non Streaming ASR" href="../index.html" />
</head> </head>
<body class="wy-body-for-nav"> <body class="wy-body-for-nav">
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,23 +40,31 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3 current"><a class="current reference internal" href="#">aishell</a><ul>
<li class="toctree-l2 current"><a class="current reference internal" href="#">aishell</a><ul> <li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="stateless_transducer.html">Stateless Transducer</a></li>
<li class="toctree-l3"><a class="reference internal" href="stateless_transducer.html">Stateless Transducer</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> </ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -65,18 +73,19 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item active">aishell</li> <li class="breadcrumb-item active">aishell</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/aishell/index.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/aishell/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -108,7 +117,7 @@ amount of data for new researchers in the field of speech recognition.</p>
</div> </div>
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../index.html" class="btn btn-neutral float-left" title="Recipes" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="../index.html" class="btn btn-neutral float-left" title="Non Streaming ASR" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="tdnn_lstm_ctc.html" class="btn btn-neutral float-right" title="TDNN-LSTM CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="tdnn_lstm_ctc.html" class="btn btn-neutral float-right" title="TDNN-LSTM CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>

@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Stateless Transducer &mdash; icefall 0.1 documentation</title> <title>Stateless Transducer &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="LibriSpeech" href="../librispeech/index.html" /> <link rel="next" title="LibriSpeech" href="../librispeech/index.html" />
<link rel="prev" title="Conformer CTC" href="conformer_ctc.html" /> <link rel="prev" title="Conformer CTC" href="conformer_ctc.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,32 +40,31 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3 current"><a class="reference internal" href="index.html">aishell</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">aishell</a><ul class="current"> <li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l4 current"><a class="current reference internal" href="#">Stateless Transducer</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">Stateless Transducer</a><ul> </ul>
<li class="toctree-l4"><a class="reference internal" href="#the-model">The Model</a></li> </li>
<li class="toctree-l4"><a class="reference internal" href="#the-loss">The Loss</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data Preparation</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li> </ul>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li> </li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul> </ul>
</li> </li>
</ul> </ul>
</li> <ul>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -74,19 +73,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">aishell</a></li> <li class="breadcrumb-item"><a href="index.html">aishell</a></li>
<li class="breadcrumb-item active">Stateless Transducer</li> <li class="breadcrumb-item active">Stateless Transducer</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/aishell/stateless_transducer.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/aishell/stateless_transducer.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -104,7 +104,7 @@ here. As you will see, there are no RNNs in the model.</p>
</div> </div>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<div class="admonition hint"> <div class="admonition hint">
@ -391,7 +391,7 @@ $ tensorboard dev upload --logdir . --name <span class="s2">&quot;Aishell transd
above output, click it and you will see the following screenshot:</p> above output, click it and you will see the following screenshot:</p>
<blockquote> <blockquote>
<div><figure class="align-center" id="id3"> <div><figure class="align-center" id="id3">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/laGZ6HrcQxOigbFD5E0Y3Q"><img alt="TensorBoard screenshot" src="../../_images/aishell-transducer_stateless_modified-tensorboard-log.png" style="width: 600px;" /></a> <a class="reference external image-reference" href="https://tensorboard.dev/experiment/laGZ6HrcQxOigbFD5E0Y3Q"><img alt="TensorBoard screenshot" src="../../../_images/aishell-transducer_stateless_modified-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 3 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 3 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
</figcaption> </figcaption>

@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TDNN-LSTM CTC &mdash; icefall 0.1 documentation</title> <title>TDNN-LSTM CTC &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Conformer CTC" href="conformer_ctc.html" /> <link rel="next" title="Conformer CTC" href="conformer_ctc.html" />
<link rel="prev" title="aishell" href="index.html" /> <link rel="prev" title="aishell" href="index.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,30 +40,31 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3 current"><a class="reference internal" href="index.html">aishell</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">aishell</a><ul class="current"> <li class="toctree-l4 current"><a class="current reference internal" href="#">TDNN-LSTM CTC</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">TDNN-LSTM CTC</a><ul> <li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li> <li class="toctree-l4"><a class="reference internal" href="stateless_transducer.html">Stateless Transducer</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l3"><a class="reference internal" href="stateless_transducer.html">Stateless Transducer</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> </ul>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -72,19 +73,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">aishell</a></li> <li class="breadcrumb-item"><a href="index.html">aishell</a></li>
<li class="breadcrumb-item active">TDNN-LSTM CTC</li> <li class="breadcrumb-item active">TDNN-LSTM CTC</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/aishell/tdnn_lstm_ctc.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/aishell/tdnn_lstm_ctc.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -98,7 +100,7 @@
with the <a class="reference external" href="https://www.openslr.org/33">Aishell</a> dataset.</p> with the <a class="reference external" href="https://www.openslr.org/33">Aishell</a> dataset.</p>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<div class="admonition hint"> <div class="admonition hint">
@ -326,7 +328,7 @@ $ tensorboard dev upload --logdir . --description <span class="s2">&quot;TDNN-LS
the following screenshot:</p> the following screenshot:</p>
<blockquote> <blockquote>
<div><figure class="align-center" id="id2"> <div><figure class="align-center" id="id2">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/LJI9MWUORLOw3jkdhxwk8A/"><img alt="TensorBoard screenshot" src="../../_images/aishell-tdnn-lstm-ctc-tensorboard-log.jpg" style="width: 600px;" /></a> <a class="reference external image-reference" href="https://tensorboard.dev/experiment/LJI9MWUORLOw3jkdhxwk8A/"><img alt="TensorBoard screenshot" src="../../../_images/aishell-tdnn-lstm-ctc-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 1 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 1 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
</figcaption> </figcaption>

@ -0,0 +1,151 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Non Streaming ASR &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="aishell" href="aishell/index.html" />
<link rel="prev" title="Recipes" href="../index.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current">
<li class="toctree-l2 current"><a class="current reference internal" href="#">Non Streaming ASR</a><ul>
<li class="toctree-l3"><a class="reference internal" href="aishell/index.html">aishell</a></li>
<li class="toctree-l3"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l3"><a class="reference internal" href="timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="yesno/index.html">YesNo</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li>
<li class="breadcrumb-item active">Non Streaming ASR</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="non-streaming-asr">
<h1>Non Streaming ASR<a class="headerlink" href="#non-streaming-asr" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="aishell/index.html">aishell</a><ul>
<li class="toctree-l2"><a class="reference internal" href="aishell/tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li>
<li class="toctree-l2"><a class="reference internal" href="aishell/conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l2"><a class="reference internal" href="aishell/stateless_transducer.html">Stateless Transducer</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a><ul>
<li class="toctree-l2"><a class="reference internal" href="librispeech/tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/zipformer_mmi.html">Zipformer MMI</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="timit/index.html">TIMIT</a><ul>
<li class="toctree-l2"><a class="reference internal" href="timit/tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li>
<li class="toctree-l2"><a class="reference internal" href="timit/tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="yesno/index.html">YesNo</a><ul>
<li class="toctree-l2"><a class="reference internal" href="yesno/tdnn.html">TDNN-CTC</a></li>
</ul>
</li>
</ul>
</div>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../index.html" class="btn btn-neutral float-left" title="Recipes" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="aishell/index.html" class="btn btn-neutral float-right" title="aishell" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

@ -5,22 +5,22 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Conformer CTC &mdash; icefall 0.1 documentation</title> <title>Conformer CTC &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="LSTM Transducer" href="lstm_pruned_stateless_transducer.html" /> <link rel="next" title="Pruned transducer statelessX" href="pruned_transducer_stateless.html" />
<link rel="prev" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" /> <link rel="prev" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,32 +40,32 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current"> <li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> <li class="toctree-l4 current"><a class="current reference internal" href="#">Conformer CTC</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">Conformer CTC</a><ul> <li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li> <li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li>
<li class="toctree-l4"><a class="reference internal" href="#deployment-with-c">Deployment with C++</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> </ul>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -74,19 +74,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li> <li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">Conformer CTC</li> <li class="breadcrumb-item active">Conformer CTC</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/librispeech/conformer_ctc.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/librispeech/conformer_ctc.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -100,7 +101,7 @@
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p> with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<div class="admonition hint"> <div class="admonition hint">
@ -337,7 +338,7 @@ $ tensorboard dev upload --logdir . --description <span class="s2">&quot;Conform
the following screenshot:</p> the following screenshot:</p>
<blockquote> <blockquote>
<div><figure class="align-center" id="id2"> <div><figure class="align-center" id="id2">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../_images/librispeech-conformer-ctc-tensorboard-log.png" style="width: 600px;" /></a> <a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-conformer-ctc-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 4 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 4 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
</figcaption> </figcaption>
@ -1090,7 +1091,7 @@ Please see <a class="reference external" href="https://colab.research.google.com
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="tdnn_lstm_ctc.html" class="btn btn-neutral float-left" title="TDNN-LSTM-CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="tdnn_lstm_ctc.html" class="btn btn-neutral float-left" title="TDNN-LSTM-CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-right" title="LSTM Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="pruned_transducer_stateless.html" class="btn btn-neutral float-right" title="Pruned transducer statelessX" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>
<hr/> <hr/>


@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>LibriSpeech &mdash; icefall 0.1 documentation</title> <title>LibriSpeech &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" /> <link rel="next" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" />
<link rel="prev" title="Stateless Transducer" href="../aishell/stateless_transducer.html" /> <link rel="prev" title="Stateless Transducer" href="../aishell/stateless_transducer.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,24 +40,32 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3 current"><a class="current reference internal" href="#">LibriSpeech</a><ul>
<li class="toctree-l2 current"><a class="current reference internal" href="#">LibriSpeech</a><ul> <li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li> <li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l3"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> </ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -66,18 +74,19 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item active">LibriSpeech</li> <li class="breadcrumb-item active">LibriSpeech</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/librispeech/index.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/librispeech/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -91,7 +100,7 @@
<ul> <ul>
<li class="toctree-l1"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> <li class="toctree-l1"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l1"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l1"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l1"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li> <li class="toctree-l1"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l1"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li> <li class="toctree-l1"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
</ul> </ul>
</div> </div>

View File

@ -0,0 +1,644 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Pruned transducer statelessX &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../../_static/jquery.js"></script>
<script src="../../../_static/underscore.js"></script>
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../../_static/doctools.js"></script>
<script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Zipformer MMI" href="zipformer_mmi.html" />
<link rel="prev" title="Conformer CTC" href="conformer_ctc.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4 current"><a class="current reference internal" href="#">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">Pruned transducer statelessX</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/librispeech/pruned_transducer_stateless.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="pruned-transducer-statelessx">
<h1>Pruned transducer statelessX<a class="headerlink" href="#pruned-transducer-statelessx" title="Permalink to this heading"></a></h1>
<p>This tutorial shows you how to run a conformer transducer model
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The tutorial is suitable for <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless">pruned_transducer_stateless</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2">pruned_transducer_stateless2</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4">pruned_transducer_stateless4</a>,
and <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5">pruned_transducer_stateless5</a>.
We use pruned_transducer_stateless4 as the example in this tutorial.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have set up
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>We recommend using one or more GPUs to run this recipe.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Please scroll down to the bottom of this page to find download links
for pretrained models if you don't want to train a model from scratch.</p>
</div>
<p>We use pruned RNN-T to compute the loss.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>You can find the paper about pruned RNN-T at the following address:</p>
<p><a class="reference external" href="https://arxiv.org/abs/2206.13236">https://arxiv.org/abs/2206.13236</a></p>
</div>
<p>The transducer model consists of 3 parts:</p>
<blockquote>
<div><ul class="simple">
<li><p>Encoder, a.k.a. the transcription network. We use a Conformer model (the reworked version by Daniel Povey).</p></li>
<li><p>Decoder, a.k.a. the prediction network. We use a stateless model consisting of
<code class="docutils literal notranslate"><span class="pre">nn.Embedding</span></code> and <code class="docutils literal notranslate"><span class="pre">nn.Conv1d</span></code>.</p></li>
<li><p>Joiner, a.k.a. the joint network.</p></li>
</ul>
</div></blockquote>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>Unlike conventional RNN-T models, we use a stateless decoder.
That is, the decoder has no recurrent connections.</p>
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The data preparation is the same as for the other recipes on the LibriSpeech dataset.
If you have already finished this step, you can skip directly to <code class="docutils literal notranslate"><span class="pre">Training</span></code>.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
All you need to do is to run it.</p>
<p>The data preparation consists of several stages. You can use the following two
options:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--stage</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code></p></li>
</ul>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
</pre></div>
</div>
<p>runs only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
</pre></div>
</div>
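<p>The stage selection described above can be sketched in shell as follows. This is a minimal illustration of the gating logic only, not the actual contents of <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>: the function name <code class="docutils literal notranslate"><span class="pre">run_stages</span></code> and the three stage descriptions are hypothetical.</p>

```shell
# Hypothetical sketch: run every stage i with stage <= i <= stop_stage.
# The stage descriptions below are placeholders, not the real prepare.sh stages.
run_stages() {
  stage=$1
  stop_stage=$2
  i=0
  for name in "download data" "prepare manifests" "compute features"; do
    # Skip stages before the requested start or after the requested stop.
    if [ "$stage" -le "$i" ]; then
      if [ "$i" -le "$stop_stage" ]; then
        echo "stage $i: $name"
      fi
    fi
    i=$((i + 1))
  done
}

# Equivalent in spirit to: ./prepare.sh --stage 1 --stop-stage 2
run_stages 1 2
```

<p>With the arguments <code class="docutils literal notranslate"><span class="pre">1</span> <span class="pre">2</span></code>, only the second and third placeholder stages are reported. Passing the same number twice selects a single stage.</p>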
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you have pre-downloaded the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>
dataset and the <a class="reference external" href="http://www.openslr.org/17/">musan</a> dataset, say,
they are saved in <code class="docutils literal notranslate"><span class="pre">/tmp/LibriSpeech</span></code> and <code class="docutils literal notranslate"><span class="pre">/tmp/musan</span></code>, you can modify
the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></code> variable in <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> to point to <code class="docutils literal notranslate"><span class="pre">/tmp</span></code> so that
<code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> won't re-download them.</p>
</div>
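<p>The idea behind the hint above can be sketched as a simple existence check. This is an assumption about the general approach, not a copy of what <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> does: the helper name <code class="docutils literal notranslate"><span class="pre">needs_download</span></code> is hypothetical, and the demo uses a temporary directory in place of <code class="docutils literal notranslate"><span class="pre">/tmp</span></code>.</p>

```shell
# Hypothetical helper: report whether a dataset directory already exists
# under the given download directory (the role played by dl_dir).
needs_download() {
  dl_dir=$1
  dataset=$2
  if [ -d "$dl_dir/$dataset" ]; then
    echo "skip: $dl_dir/$dataset already exists"
  else
    echo "download: $dl_dir/$dataset is missing"
  fi
}

# Demo with a throwaway directory that contains LibriSpeech but not musan.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/LibriSpeech"
needs_download "$demo_dir" LibriSpeech
needs_download "$demo_dir" musan
rm -rf "$demo_dir"
```

<p>In the demo, the first call reports that the data can be skipped and the second reports that it is missing, which mirrors why pointing <code class="docutils literal notranslate"><span class="pre">dl_dir</span></code> at a directory that already holds the datasets avoids a second download.</p>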
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>All files generated by <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>, e.g., features and the lexicon,
are saved in the <code class="docutils literal notranslate"><span class="pre">./data</span></code> directory.</p>
</div>
<p>We provide the following YouTube video showing how to run <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>To get the latest news about <a class="reference external" href="https://github.com/k2-fsa">next-gen Kaldi</a>, please subscribe to
the following YouTube channel by <a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">Nadira Povey</a>:</p>
<blockquote>
<div><p><a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw</a></p>
</div></blockquote>
</div>
<div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://www.youtube.com/embed/ofEIoJL-mGM" style="border: 0; height: 345px; width: 560px">
</iframe></div></section>
<section id="training">
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
The following options are used quite often:</p>
<blockquote>
<div><ul>
<li><p><code class="docutils literal notranslate"><span class="pre">--exp-dir</span></code></p>
<p>The directory in which checkpoints, training logs, and TensorBoard logs are saved.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--full-libri</span></code></p>
<p>If it's True, the training part uses all the training data, i.e.,
960 hours. Otherwise, the training part uses only the subset
<code class="docutils literal notranslate"><span class="pre">train-clean-100</span></code>, which has 100 hours of training data.</p>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>The training set is perturbed by speed with two factors: 0.9 and 1.1.
If <code class="docutils literal notranslate"><span class="pre">--full-libri</span></code> is True, each epoch actually processes
<code class="docutils literal notranslate"><span class="pre">3x960</span> <span class="pre">==</span> <span class="pre">2880</span></code> hours of data.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--num-epochs</span></code></p>
<p>It is the number of epochs to train. For instance,
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/train.py</span> <span class="pre">--num-epochs</span> <span class="pre">30</span></code> trains for 30 epochs
and generates <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, <code class="docutils literal notranslate"><span class="pre">epoch-30.pt</span></code>
in the folder <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp</span></code>.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code></p>
<p>It's used to resume training.
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/train.py</span> <span class="pre">--start-epoch</span> <span class="pre">10</span></code> loads the
checkpoint <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp/epoch-9.pt</span></code> and starts
training from epoch 10, based on the state from epoch 9.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--world-size</span></code></p>
<p>It is used for multi-GPU single-machine DDP training.</p>
<blockquote>
<div><ul class="simple">
<li><ol class="loweralpha simple">
<li><p>If it is 1, then no DDP training is used.</p></li>
</ol>
</li>
<li><ol class="loweralpha simple" start="2">
<li><p>If it is 2, then GPU 0 and GPU 1 are used for DDP training.</p></li>
</ol>
</li>
</ul>
</div></blockquote>
<p>The following shows some use cases with it.</p>
<blockquote>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
</div></blockquote>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>Only multi-GPU single-machine DDP training is implemented at present.
Multi-GPU multi-machine DDP training will be added later.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--max-duration</span></code></p>
<p>It specifies the maximum total duration, in seconds, of all utterances in a
batch, before <strong>padding</strong>.
If you encounter a CUDA out-of-memory (OOM) error, please reduce it.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Due to padding, the number of seconds of all utterances in a
batch will usually be larger than <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code>.</p>
<p>A larger value for <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> may cause OOM during training,
while a smaller value may increase the training time. You have to
tune it.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--use-fp16</span></code></p>
<p>If it is True, the model is trained with half precision. In our experiments,
half precision lets you train with a <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> that is twice as large,
which gives an almost 2x speedup.</p>
</li>
</ul>
</div></blockquote>
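<p>The arithmetic behind some of the options above can be sketched in a few lines of shell. This is only an illustration and not part of icefall: the helper names <code class="docutils literal notranslate"><span class="pre">count_visible_gpus</span></code>, <code class="docutils literal notranslate"><span class="pre">resume_checkpoint</span></code>, and <code class="docutils literal notranslate"><span class="pre">hours_per_epoch</span></code> are hypothetical, and the numbers are taken from the option descriptions.</p>

```shell
#!/bin/sh
# Hypothetical helpers that illustrate the option descriptions; not part of icefall.

# Number of GPUs listed in a CUDA_VISIBLE_DEVICES-style string, e.g. "0,2" -> 2.
# Assumes a non-empty, comma-separated list of device ids.
count_visible_gpus() (
  commas=$(printf '%s' "$1" | tr -cd ',' | wc -c | tr -d ' ')
  echo $((commas + 1))
)

# Checkpoint that "--start-epoch N" resumes from: epoch-(N-1).pt in the exp dir.
resume_checkpoint() (
  echo "$1/epoch-$(($2 - 1)).pt"
)

# Hours processed per epoch: the original data plus two speed-perturbed copies.
hours_per_epoch() (
  echo $((3 * $1))
)

devices="0,2"
echo "--world-size $(count_visible_gpus "$devices")"
echo "resumes from: $(resume_checkpoint pruned_transducer_stateless4/exp 10)"
echo "hours per epoch with --full-libri: $(hours_per_epoch 960)"
```

<p>With <code class="docutils literal notranslate"><span class="pre">devices=&quot;0,2&quot;</span></code> the sketch reports a world size of 2, which matches use case 1 above.</p>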
</section>
<section id="pre-configured-options">
<h3>Pre-configured options<a class="headerlink" href="#pre-configured-options" title="Permalink to this heading"></a></h3>
<p>There are some training options, e.g., number of encoder layers,
encoder dimension, decoder dimension, number of warmup steps, etc.,
that are not passed from the command line.
They are pre-configured by the function <code class="docutils literal notranslate"><span class="pre">get_params()</span></code> in
<a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/train.py">pruned_transducer_stateless4/train.py</a>.</p>
<p>You don't need to change these pre-configured parameters. If you really need to change
them, please modify <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/train.py</span></code> directly.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The options for <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py">pruned_transducer_stateless5</a> are a little different from
those of the other recipes. It allows you to configure <code class="docutils literal notranslate"><span class="pre">--num-encoder-layers</span></code>, <code class="docutils literal notranslate"><span class="pre">--dim-feedforward</span></code>, <code class="docutils literal notranslate"><span class="pre">--nhead</span></code>, <code class="docutils literal notranslate"><span class="pre">--encoder-dim</span></code>, <code class="docutils literal notranslate"><span class="pre">--decoder-dim</span></code>, and <code class="docutils literal notranslate"><span class="pre">--joiner-dim</span></code> from the command line, so that you can train models of different sizes with pruned_transducer_stateless5.</p>
</div>
</section>
<section id="training-logs">
<h3>Training logs<a class="headerlink" href="#training-logs" title="Permalink to this heading"></a></h3>
<p>Training logs and checkpoints are saved in <code class="docutils literal notranslate"><span class="pre">--exp-dir</span></code> (e.g. <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/exp</span></code>).
You will find the following files in that directory:</p>
<blockquote>
<div><ul>
<li><p><code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …</p>
<p>These are checkpoint files saved at the end of each epoch, containing model
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-epoch <span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …</p>
<p>These are checkpoint files saved every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches,
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-batch <span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">tensorboard/</span></code></p>
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc., are recorded in these logs. You can visualize them by running:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
<p>It will print something like below:</p>
<blockquote>
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">TensorFlow</span> <span class="n">installation</span> <span class="ow">not</span> <span class="n">found</span> <span class="o">-</span> <span class="n">running</span> <span class="k">with</span> <span class="n">reduced</span> <span class="n">feature</span> <span class="nb">set</span><span class="o">.</span>
<span class="n">Upload</span> <span class="n">started</span> <span class="ow">and</span> <span class="n">will</span> <span class="k">continue</span> <span class="n">reading</span> <span class="nb">any</span> <span class="n">new</span> <span class="n">data</span> <span class="k">as</span> <span class="n">it</span><span class="s1">&#39;s added to the logdir.</span>
<span class="n">To</span> <span class="n">stop</span> <span class="n">uploading</span><span class="p">,</span> <span class="n">press</span> <span class="n">Ctrl</span><span class="o">-</span><span class="n">C</span><span class="o">.</span>
<span class="n">New</span> <span class="n">experiment</span> <span class="n">created</span><span class="o">.</span> <span class="n">View</span> <span class="n">your</span> <span class="n">TensorBoard</span> <span class="n">at</span><span class="p">:</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">tensorboard</span><span class="o">.</span><span class="n">dev</span><span class="o">/</span><span class="n">experiment</span><span class="o">/</span><span class="n">QOGSPBgsR8KzcRMmie9JGw</span><span class="o">/</span>
<span class="p">[</span><span class="mi">2022</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">20</span><span class="n">T15</span><span class="p">:</span><span class="mi">50</span><span class="p">:</span><span class="mi">50</span><span class="p">]</span> <span class="n">Started</span> <span class="n">scanning</span> <span class="n">logdir</span><span class="o">.</span>
<span class="n">Uploading</span> <span class="mi">4468</span> <span class="n">scalars</span><span class="o">...</span>
<span class="p">[</span><span class="mi">2022</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">20</span><span class="n">T15</span><span class="p">:</span><span class="mi">53</span><span class="p">:</span><span class="mi">02</span><span class="p">]</span> <span class="n">Total</span> <span class="n">uploaded</span><span class="p">:</span> <span class="mi">210171</span> <span class="n">scalars</span><span class="p">,</span> <span class="mi">0</span> <span class="n">tensors</span><span class="p">,</span> <span class="mi">0</span> <span class="n">binary</span> <span class="n">objects</span>
<span class="n">Listening</span> <span class="k">for</span> <span class="n">new</span> <span class="n">data</span> <span class="ow">in</span> <span class="n">logdir</span><span class="o">...</span>
</pre></div>
</div>
</div></blockquote>
<p>Note there is a URL in the above output. Click it and you will see
the following screenshot:</p>
<blockquote>
<div><figure class="align-center" id="id7">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/QOGSPBgsR8KzcRMmie9JGw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-pruned-transducer-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id7" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>
</li>
</ul>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you don't have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
<p>It will print the following message:</p>
<blockquote>
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Serving</span> <span class="n">TensorBoard</span> <span class="n">on</span> <span class="n">localhost</span><span class="p">;</span> <span class="n">to</span> <span class="n">expose</span> <span class="n">to</span> <span class="n">the</span> <span class="n">network</span><span class="p">,</span> <span class="n">use</span> <span class="n">a</span> <span class="n">proxy</span> <span class="ow">or</span> <span class="k">pass</span> <span class="o">--</span><span class="n">bind_all</span>
<span class="n">TensorBoard</span> <span class="mf">2.8.0</span> <span class="n">at</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">localhost</span><span class="p">:</span><span class="mi">6008</span><span class="o">/</span> <span class="p">(</span><span class="n">Press</span> <span class="n">CTRL</span><span class="o">+</span><span class="n">C</span> <span class="n">to</span> <span class="n">quit</span><span class="p">)</span>
</pre></div>
</div>
</div></blockquote>
<p>Now start your browser and go to <a class="reference external" href="http://localhost:6008">http://localhost:6008</a> to view the TensorBoard
logs.</p>
</div>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">log/log-train-xxxx</span></code></p>
<p>It is the detailed training log in text format, the same as the one
printed to the console during training.</p>
</li>
</ul>
</div></blockquote>
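<p>Since <code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code> has to be one larger than the epoch of the checkpoint you resume from, it can be convenient to derive it from the files in the experiment directory. The following is a minimal sketch; the function name <code class="docutils literal notranslate"><span class="pre">next_start_epoch</span></code> is hypothetical, and the demo uses empty stand-in files in a temporary directory rather than real checkpoints.</p>

```shell
#!/bin/sh
# Hypothetical helper, not part of icefall: print the --start-epoch value that
# resumes from the newest epoch-N.pt in a directory (N + 1), or 1 if none exists.
next_start_epoch() (
  latest=0
  for f in "$1"/epoch-*.pt; do
    [ -e "$f" ] || continue        # the glob matched nothing
    n="${f##*/epoch-}"             # drop the directory and the "epoch-" prefix
    n="${n%.pt}"                   # drop the ".pt" suffix
    case "$n" in
      ''|*[!0-9]*) continue ;;     # ignore names that are not plain numbers
    esac
    if [ "$n" -gt "$latest" ]; then latest="$n"; fi
  done
  echo $((latest + 1))
)

# Demo on a throwaway directory with empty stand-in checkpoint files.
demo_dir=$(mktemp -d)
touch "$demo_dir/epoch-1.pt" "$demo_dir/epoch-2.pt" "$demo_dir/epoch-10.pt"
echo "./pruned_transducer_stateless4/train.py --start-epoch $(next_start_epoch "$demo_dir")"
rm -r "$demo_dir"
```

<p>The comparison is numeric, so <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code> is treated as newer than <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>. Any file whose name matches <code class="docutils literal notranslate"><span class="pre">epoch-N.pt</span></code> is counted, including symbolic links placed in the directory by hand.</p>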
</section>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading"></a></h3>
<p>You can use the following command to start the training using 6 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3,4,5&quot;</span>
./pruned_transducer_stateless4/train.py <span class="se">\</span>
--world-size <span class="m">6</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">300</span>
</pre></div>
</div>
</section>
</section>
<section id="decoding">
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading"></a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>There are two kinds of checkpoints:</p>
<blockquote>
<div><ul class="simple">
<li><p>(1) <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, which are saved at the end
of each epoch. You can pass <code class="docutils literal notranslate"><span class="pre">--epoch</span></code> to
<code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code> to use them.</p></li>
<li><p>(2) <code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …, which are saved
every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches. You can pass <code class="docutils literal notranslate"><span class="pre">--iter</span></code> to
<code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code> to use them.</p></li>
</ul>
<p>We suggest that you try both types of checkpoints and choose the one
that produces the lowest WERs.</p>
</div></blockquote>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
</pre></div>
</div>
<p>shows the options for decoding.</p>
<p>The following shows two examples (for two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
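<p>Each combination of decoding method, checkpoint, and <code class="docutils literal notranslate"><span class="pre">--avg</span></code> value is a separate decoding run, so it can help to list the runs before launching them. The sketch below only prints the command lines of the epoch-based loop; the function name <code class="docutils literal notranslate"><span class="pre">list_decode_runs</span></code> is hypothetical, and printing instead of running is an illustration, not a feature of <code class="docutils literal notranslate"><span class="pre">decode.py</span></code>.</p>

```shell
#!/bin/sh
# Dry run (hypothetical helper): print the decode.py command lines that the
# epoch-based loop would launch, without executing any of them.
list_decode_runs() (
  for m in greedy_search fast_beam_search modified_beam_search; do
    for epoch in 25 20; do
      for avg in 7 5 3 1; do
        echo "./pruned_transducer_stateless4/decode.py --epoch $epoch --avg $avg --exp-dir pruned_transducer_stateless4/exp --max-duration 600 --decoding-method $m"
      done
    done
  done
)

# 3 methods x 2 epochs x 4 averaging settings = 24 runs.
echo "number of decoding runs: $(list_decode_runs | wc -l | tr -d ' ')"
```

<p>The iteration-based loop can be counted the same way: 3 methods, 1 iteration, and 6 averaging settings give 18 runs.</p>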
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The supported decoding methods are as follows:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">greedy_search</span></code> : It takes the symbol with the largest posterior probability
at each frame as the decoding result.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">beam_search</span></code> : It implements Algorithm 1 in <a class="reference external" href="https://arxiv.org/pdf/1211.3711.pdf">https://arxiv.org/pdf/1211.3711.pdf</a> and
<a class="reference external" href="https://github.com/espnet/espnet/blob/master/espnet/nets/beam_search_transducer.py#L247">espnet/nets/beam_search_transducer.py</a>
is used as a reference. Basically, it keeps the top-k states for each frame and expands the kept states with their own contexts to
the next frame.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code> : It implements the same algorithm as <code class="docutils literal notranslate"><span class="pre">beam_search</span></code> above, but it
runs in batch mode with <code class="docutils literal notranslate"><span class="pre">--max-sym-per-frame=1</span></code> being hardcoded.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> : It implements graph composition between the output <code class="docutils literal notranslate"><span class="pre">log_probs</span></code> and
given <code class="docutils literal notranslate"><span class="pre">FSAs</span></code>. The details are hard to describe in a few lines; you can read
our paper at <a class="reference external" href="https://arxiv.org/pdf/2211.00484.pdf">https://arxiv.org/pdf/2211.00484.pdf</a> or our <a class="reference external" href="https://github.com/k2-fsa/k2/blob/master/k2/csrc/rnnt_decode.h">rnnt decode code in k2</a>. <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> can decode with <code class="docutils literal notranslate"><span class="pre">FSAs</span></code> on GPU efficiently.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search_LG</span></code> : It is the same as <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> above, except that <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> uses
a trivial graph that has only one state, while <code class="docutils literal notranslate"><span class="pre">fast_beam_search_LG</span></code> uses an LG graph
(with an N-gram LM).</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search_nbest</span></code> : It produces the decoding results as follows:</p>
<ul>
<li><ol class="arabic simple">
<li><p>Use <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> to get a lattice</p></li>
</ol>
</li>
<li><ol class="arabic simple" start="2">
<li><p>Select <code class="docutils literal notranslate"><span class="pre">num_paths</span></code> paths from the lattice using <code class="docutils literal notranslate"><span class="pre">k2.random_paths()</span></code></p></li>
</ol>
</li>
<li><ol class="arabic simple" start="3">
<li><p>Remove duplicates from the selected paths</p></li>
</ol>
</li>
<li><ol class="arabic simple" start="4">
<li><p>Intersect the selected paths with the lattice and compute the
shortest path from the intersection result</p></li>
</ol>
</li>
<li><ol class="arabic simple" start="5">
<li><p>The path with the largest score is used as the decoding output.</p></li>
</ol>
</li>
</ul>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search_nbest_LG</span></code> : It implements the same logic as <code class="docutils literal notranslate"><span class="pre">fast_beam_search_nbest</span></code>; the
only difference is that it uses <code class="docutils literal notranslate"><span class="pre">fast_beam_search_LG</span></code> to generate the lattice.</p></li>
</ul>
</div></blockquote>
</div>
</section>
<section id="export-model">
<h2>Export Model<a class="headerlink" href="#export-model" title="Permalink to this heading"></a></h2>
<p><a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/export.py">pruned_transducer_stateless4/export.py</a> supports exporting checkpoints from <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/exp</span></code> in the following ways.</p>
<section id="export-model-state-dict">
<h3>Export <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code><a class="headerlink" href="#export-model-state-dict" title="Permalink to this heading"></a></h3>
<p>Checkpoints saved by <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/train.py</span></code> also include
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Assume that --epoch 25 --avg 3 produces the smallest WER</span>
<span class="c1"># (You can get such information after running ./pruned_transducer_stateless4/decode.py)</span>
<span class="nv">epoch</span><span class="o">=</span><span class="m">25</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">3</span>
./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
</pre></div>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp/pretrained.pt</span></code>.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp
ln -s pretrained.pt epoch-999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/decode.py</span></code>.</p>
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading"></a></h3>
<p>Exporting with <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code> generates a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
load it with <code class="docutils literal notranslate"><span class="pre">torch.jit.load(&quot;cpu_jit.pt&quot;)</span></code>.</p>
<p>Note that <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means that the parameters are on the CPU
when loaded into Python. You can use <code class="docutils literal notranslate"><span class="pre">to(&quot;cuda&quot;)</span></code> to move them to a CUDA device.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>You will need this <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> when deploying with the Sherpa framework.</p>
</div>
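<p>A possible export command is sketched below. It reuses the arguments from the <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code> export and adds <code class="docutils literal notranslate"><span class="pre">--jit</span> <span class="pre">1</span></code>. The <code class="docutils literal notranslate"><span class="pre">--jit</span></code> flag is an assumption here, so please confirm it with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/export.py</span> <span class="pre">--help</span></code> first. The function name <code class="docutils literal notranslate"><span class="pre">jit_export_command</span></code> is hypothetical; it only prints the command so that you can review it before running it.</p>

```shell
#!/bin/sh
# Sketch of a TorchScript export command. ASSUMPTION: export.py accepts
# "--jit 1"; check ./pruned_transducer_stateless4/export.py --help before use.
# jit_export_command is a hypothetical helper that prints the command only.
jit_export_command() (
  echo ./pruned_transducer_stateless4/export.py \
    --exp-dir ./pruned_transducer_stateless4/exp \
    --bpe-model data/lang_bpe_500/bpe.model \
    --epoch "$1" \
    --avg "$2" \
    --jit 1
)

# Assume that --epoch 25 --avg 3 produced the smallest WER during decoding.
jit_export_command 25 3
```

<p>Run the printed command from <code class="docutils literal notranslate"><span class="pre">egs/librispeech/ASR</span></code> once the flags have been checked.</p>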
</section>
</section>
<section id="download-pretrained-models">
<h2>Download pretrained models<a class="headerlink" href="#download-pretrained-models" title="Permalink to this heading"></a></h2>
<p>If you don't want to train from scratch, you can download the pretrained models
by visiting the following links:</p>
<blockquote>
<div><ul class="simple">
<li><p><a class="reference external" href="https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless-2022-03-12">pruned_transducer_stateless</a></p></li>
<li><p><a class="reference external" href="https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless2-2022-04-29">pruned_transducer_stateless2</a></p></li>
<li><p><a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless4-2022-06-03">pruned_transducer_stateless4</a></p></li>
<li><p><a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless5-2022-07-07">pruned_transducer_stateless5</a></p></li>
</ul>
<p>See <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md">https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md</a>
for the details of the above pretrained models.</p>
</div></blockquote>
</section>
<section id="deploy-with-sherpa">
<h2>Deploy with Sherpa<a class="headerlink" href="#deploy-with-sherpa" title="Permalink to this heading"></a></h2>
<p>Please see <a class="reference external" href="https://k2-fsa.github.io/sherpa/python/offline_asr/conformer/librispeech.html#">https://k2-fsa.github.io/sherpa/python/offline_asr/conformer/librispeech.html#</a>
for how to deploy the models in <code class="docutils literal notranslate"><span class="pre">sherpa</span></code>.</p>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="conformer_ctc.html" class="btn btn-neutral float-left" title="Conformer CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="zipformer_mmi.html" class="btn btn-neutral float-right" title="Zipformer MMI" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>


@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TDNN-LSTM-CTC &mdash; icefall 0.1 documentation</title> <title>TDNN-LSTM-CTC &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Conformer CTC" href="conformer_ctc.html" /> <link rel="next" title="Conformer CTC" href="conformer_ctc.html" />
<link rel="prev" title="LibriSpeech" href="index.html" /> <link rel="prev" title="LibriSpeech" href="index.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,31 +40,32 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current"> <li class="toctree-l4 current"><a class="current reference internal" href="#">TDNN-LSTM-CTC</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">TDNN-LSTM-CTC</a><ul> <li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li> <li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li> <li class="toctree-l4"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
<li class="toctree-l3"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> </ul>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -73,19 +74,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li> <li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">TDNN-LSTM-CTC</li> <li class="breadcrumb-item active">TDNN-LSTM-CTC</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/librispeech/tdnn_lstm_ctc.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -98,7 +100,7 @@
<p>This tutorial shows you how to run a TDNN-LSTM-CTC model with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p> <p>This tutorial shows you how to run a TDNN-LSTM-CTC model with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have set up <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have set up
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<section id="data-preparation"> <section id="data-preparation">


@ -5,23 +5,23 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Zipformer MMI &mdash; icefall 0.1 documentation</title> <title>Zipformer MMI &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TIMIT" href="../timit/index.html" /> <link rel="next" title="TIMIT" href="../timit/index.html" />
<link rel="prev" title="LSTM Transducer" href="lstm_pruned_stateless_transducer.html" /> <link rel="prev" title="Pruned transducer statelessX" href="pruned_transducer_stateless.html" />
</head> </head>
<body class="wy-body-for-nav"> <body class="wy-body-for-nav">
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,31 +40,32 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current"> <li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li> <li class="toctree-l4 current"><a class="current reference internal" href="#">Zipformer MMI</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">Zipformer MMI</a><ul> </ul>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li> </li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
<li class="toctree-l4"><a class="reference internal" href="#export-models">Export models</a></li> </ul>
<li class="toctree-l4"><a class="reference internal" href="#download-pretrained-models">Download pretrained models</a></li> </li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul> </ul>
</li> </li>
</ul> </ul>
</li> <ul>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li> <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -73,19 +74,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li> <li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">Zipformer MMI</li> <li class="breadcrumb-item active">Zipformer MMI</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/librispeech/zipformer_mmi.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/librispeech/zipformer_mmi.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -493,7 +495,7 @@ for the details of the above pretrained models</p>
</div> </div>
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-left" title="LSTM Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="pruned_transducer_stateless.html" class="btn btn-neutral float-left" title="Pruned transducer statelessX" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../timit/index.html" class="btn btn-neutral float-right" title="TIMIT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="../timit/index.html" class="btn btn-neutral float-right" title="TIMIT" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>


@ -0,0 +1,136 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TIMIT &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../../_static/jquery.js"></script>
<script src="../../../_static/underscore.js"></script>
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../../_static/doctools.js"></script>
<script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" />
<link rel="prev" title="Zipformer MMI" href="../librispeech/zipformer_mmi.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">TIMIT</a><ul>
<li class="toctree-l4"><a class="reference internal" href="tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item active">TIMIT</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/timit/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="timit">
<h1>TIMIT<a class="headerlink" href="#timit" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li>
<li class="toctree-l1"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
</ul>
</div>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../librispeech/zipformer_mmi.html" class="btn btn-neutral float-left" title="Zipformer MMI" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="tdnn_ligru_ctc.html" class="btn btn-neutral float-right" title="TDNN-LiGRU-CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>


@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TDNN-LiGRU-CTC &mdash; icefall 0.1 documentation</title> <title>TDNN-LiGRU-CTC &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" /> <link rel="next" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" />
<link rel="prev" title="TIMIT" href="index.html" /> <link rel="prev" title="TIMIT" href="index.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,29 +40,30 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l3 current"><a class="reference internal" href="index.html">TIMIT</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">TIMIT</a><ul class="current"> <li class="toctree-l4 current"><a class="current reference internal" href="#">TDNN-LiGRU-CTC</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">TDNN-LiGRU-CTC</a><ul> <li class="toctree-l4"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> </ul>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -71,19 +72,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">TIMIT</a></li> <li class="breadcrumb-item"><a href="index.html">TIMIT</a></li>
<li class="breadcrumb-item active">TDNN-LiGRU-CTC</li> <li class="breadcrumb-item active">TDNN-LiGRU-CTC</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/timit/tdnn_ligru_ctc.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/timit/tdnn_ligru_ctc.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -96,7 +98,7 @@
<p>This tutorial shows you how to run a TDNN-LiGRU-CTC model with the <a class="reference external" href="https://data.deepai.org/timit.zip">TIMIT</a> dataset.</p> <p>This tutorial shows you how to run a TDNN-LiGRU-CTC model with the <a class="reference external" href="https://data.deepai.org/timit.zip">TIMIT</a> dataset.</p>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<section id="data-preparation"> <section id="data-preparation">


@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TDNN-LSTM-CTC &mdash; icefall 0.1 documentation</title> <title>TDNN-LSTM-CTC &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="YesNo" href="../yesno/index.html" /> <link rel="next" title="YesNo" href="../yesno/index.html" />
<link rel="prev" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" /> <link rel="prev" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,29 +40,30 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l3 current"><a class="reference internal" href="index.html">TIMIT</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">TIMIT</a><ul class="current"> <li class="toctree-l4"><a class="reference internal" href="tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li> <li class="toctree-l4 current"><a class="current reference internal" href="#">TDNN-LSTM-CTC</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">TDNN-LSTM-CTC</a><ul> </ul>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li> </li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li> <li class="toctree-l3"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li> </ul>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li> </li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul> </ul>
</li> </li>
</ul> </ul>
</li> <ul>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li> <li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
</ul> <li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -71,19 +72,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">TIMIT</a></li> <li class="breadcrumb-item"><a href="index.html">TIMIT</a></li>
<li class="breadcrumb-item active">TDNN-LSTM-CTC</li> <li class="breadcrumb-item active">TDNN-LSTM-CTC</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/timit/tdnn_lstm_ctc.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/timit/tdnn_lstm_ctc.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -96,7 +98,7 @@
<p>This tutorial shows you how to run a TDNN-LSTM-CTC model with the <a class="reference external" href="https://data.deepai.org/timit.zip">TIMIT</a> dataset.</p> <p>This tutorial shows you how to run a TDNN-LSTM-CTC model with the <a class="reference external" href="https://data.deepai.org/timit.zip">TIMIT</a> dataset.</p>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<section id="data-preparation"> <section id="data-preparation">


@ -5,21 +5,21 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>YesNo &mdash; icefall 0.1 documentation</title> <title>YesNo &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="TDNN-CTC" href="tdnn.html" /> <link rel="next" title="TDNN-CTC" href="tdnn.html" />
<link rel="prev" title="TDNN-LSTM-CTC" href="../timit/tdnn_lstm_ctc.html" /> <link rel="prev" title="TDNN-LSTM-CTC" href="../timit/tdnn_lstm_ctc.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,21 +40,29 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l3 current"><a class="current reference internal" href="#">YesNo</a><ul>
<li class="toctree-l2 current"><a class="current reference internal" href="#">YesNo</a><ul> <li class="toctree-l4"><a class="reference internal" href="tdnn.html">TDNN-CTC</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn.html">TDNN-CTC</a></li>
</ul> </ul>
</li> </li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> </ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -63,18 +71,19 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item active">YesNo</li> <li class="breadcrumb-item active">YesNo</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/yesno/index.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/yesno/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>


@ -5,22 +5,22 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TDNN-CTC &mdash; icefall 0.1 documentation</title> <title>TDNN-CTC &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Contributing" href="../../contributing/index.html" /> <link rel="next" title="Streaming ASR" href="../../Streaming-ASR/index.html" />
<link rel="prev" title="YesNo" href="index.html" /> <link rel="prev" title="YesNo" href="index.html" />
</head> </head>
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,28 +40,29 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Non Streaming ASR</a><ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l3"><a class="reference internal" href="../aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l3"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li> <li class="toctree-l3 current"><a class="reference internal" href="index.html">YesNo</a><ul class="current">
<li class="toctree-l2 current"><a class="reference internal" href="index.html">YesNo</a><ul class="current"> <li class="toctree-l4 current"><a class="current reference internal" href="#">TDNN-CTC</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">TDNN-CTC</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
<li class="toctree-l4"><a class="reference internal" href="#pre-trained-model">Pre-trained Model</a></li>
<li class="toctree-l4"><a class="reference internal" href="#colab-notebook">Colab notebook</a></li>
</ul> </ul>
</li> </li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../../Streaming-ASR/index.html">Streaming ASR</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> </ul>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -70,19 +71,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Non Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">YesNo</a></li> <li class="breadcrumb-item"><a href="index.html">YesNo</a></li>
<li class="breadcrumb-item active">TDNN-CTC</li> <li class="breadcrumb-item active">TDNN-CTC</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/yesno/tdnn.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Non-streaming-ASR/yesno/tdnn.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -188,7 +190,7 @@ for training as well as for decoding.</p>
</div></blockquote> </div></blockquote>
<div class="admonition hint"> <div class="admonition hint">
<p class="admonition-title">Hint</p> <p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup <p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have setup
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p> the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div> </div>
<div class="admonition hint"> <div class="admonition hint">
@ -281,7 +283,7 @@ $ tensorboard dev upload --logdir . --description <span class="s2">&quot;TDNN tr
the following screenshot:</p> the following screenshot:</p>
<blockquote> <blockquote>
<div><figure class="align-center" id="id1"> <div><figure class="align-center" id="id1">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/"><img alt="TensorBoard screenshot" src="../../_images/tdnn-tensorboard-log.png" style="width: 600px;" /></a> <a class="reference external image-reference" href="https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/"><img alt="TensorBoard screenshot" src="../../../_images/tdnn-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 6 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 6 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id1" title="Permalink to this image"></a></p>
</figcaption> </figcaption>
@ -548,7 +550,7 @@ $ ./tdnn/pretrained.py --help
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="index.html" class="btn btn-neutral float-left" title="YesNo" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="index.html" class="btn btn-neutral float-left" title="YesNo" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../../contributing/index.html" class="btn btn-neutral float-right" title="Contributing" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="../../Streaming-ASR/index.html" class="btn btn-neutral float-right" title="Streaming ASR" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>
<hr/> <hr/>

View File

@ -4,7 +4,7 @@
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" /> <meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TIMIT &mdash; icefall 0.1 documentation</title> <title>Streaming ASR &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
@ -20,8 +20,8 @@
<script src="../../_static/js/theme.js"></script> <script src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" /> <link rel="next" title="Introduction" href="introduction.html" />
<link rel="prev" title="Zipformer MMI" href="../librispeech/zipformer_mmi.html" /> <link rel="prev" title="TDNN-CTC" href="../Non-streaming-ASR/yesno/tdnn.html" />
</head> </head>
<body class="wy-body-for-nav"> <body class="wy-body-for-nav">
@ -40,20 +40,22 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l2"><a class="reference internal" href="../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="../librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l2 current"><a class="current reference internal" href="#">Streaming ASR</a><ul>
<li class="toctree-l2 current"><a class="current reference internal" href="#">TIMIT</a><ul> <li class="toctree-l3"><a class="reference internal" href="introduction.html">Introduction</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li> <li class="toctree-l3"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
@ -73,9 +75,9 @@
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../index.html">Recipes</a></li>
<li class="breadcrumb-item active">TIMIT</li> <li class="breadcrumb-item active">Streaming ASR</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/timit/index.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Streaming-ASR/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -83,12 +85,20 @@
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody"> <div itemprop="articleBody">
<section id="timit"> <section id="streaming-asr">
<h1>TIMIT<a class="headerlink" href="#timit" title="Permalink to this heading"></a></h1> <h1>Streaming ASR<a class="headerlink" href="#streaming-asr" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound"> <div class="toctree-wrapper compound">
<ul> <ul>
<li class="toctree-l1"><a class="reference internal" href="tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li> <li class="toctree-l1"><a class="reference internal" href="introduction.html">Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> </ul>
</div>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a><ul>
<li class="toctree-l2"><a class="reference internal" href="librispeech/pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
</ul>
</li>
</ul> </ul>
</div> </div>
</section> </section>
@ -97,8 +107,8 @@
</div> </div>
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../librispeech/zipformer_mmi.html" class="btn btn-neutral float-left" title="Zipformer MMI" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="../Non-streaming-ASR/yesno/tdnn.html" class="btn btn-neutral float-left" title="TDNN-CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="tdnn_ligru_ctc.html" class="btn btn-neutral float-right" title="TDNN-LiGRU-CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="introduction.html" class="btn btn-neutral float-right" title="Introduction" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>
<hr/> <hr/>

View File

@ -0,0 +1,179 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Introduction &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="LibriSpeech" href="librispeech/index.html" />
<link rel="prev" title="Streaming ASR" href="index.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="index.html">Streaming ASR</a><ul class="current">
<li class="toctree-l3 current"><a class="current reference internal" href="#">Introduction</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#streaming-conformer">Streaming Conformer</a></li>
<li class="toctree-l4"><a class="reference internal" href="#streaming-emformer">Streaming Emformer</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a></li>
</ul>
</li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="index.html">Streaming ASR</a></li>
<li class="breadcrumb-item active">Introduction</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Streaming-ASR/introduction.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="introduction">
<h1>Introduction<a class="headerlink" href="#introduction" title="Permalink to this heading"></a></h1>
<p>This page shows you how we implement streaming <strong>X-former transducer</strong> models for ASR.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Here, X-former transducer means a transducer model whose encoder uses multi-head attention,
such as <a class="reference external" href="https://arxiv.org/pdf/2005.08100.pdf">Conformer</a> or <a class="reference external" href="https://arxiv.org/pdf/2010.10759.pdf">Emformer</a>.</p>
</div>
<p>Currently we have implemented two types of streaming models: one uses Conformer as the encoder, the other uses Emformer.</p>
<section id="streaming-conformer">
<h2>Streaming Conformer<a class="headerlink" href="#streaming-conformer" title="Permalink to this heading"></a></h2>
<p>The main idea of training a streaming model is to make the model see only limited context
at training time. We achieve this by applying a mask in the self-attention computation.
In icefall, we implement the streaming conformer in the same way as <a class="reference external" href="https://arxiv.org/pdf/2012.05481.pdf">WeNet</a>.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The conformer-transducer recipes for the LibriSpeech dataset, such as <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless">pruned_transducer_stateless</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2">pruned_transducer_stateless2</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless3">pruned_transducer_stateless3</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4">pruned_transducer_stateless4</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5">pruned_transducer_stateless5</a>
all support streaming.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>Training a streaming conformer model in <code class="docutils literal notranslate"><span class="pre">icefall</span></code> is almost the same as training a
non-streaming model; all you need to do is pass several extra arguments.
See <a class="reference internal" href="librispeech/pruned_transducer_stateless.html"><span class="doc">Pruned transducer statelessX</span></a> for more details.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you want to adapt a non-streaming conformer model to be streaming, please refer
to <a class="reference external" href="https://github.com/k2-fsa/icefall/pull/454">this pull request</a>.</p>
</div>
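<p>The masking idea above can be illustrated with a minimal sketch of a chunk-wise attention mask. This is not icefall's actual code: the function name <code>chunk_attention_mask</code> and its parameters are hypothetical and chosen for illustration only, and plain Python lists stand in for tensors. Each frame may attend to every frame in its own chunk and to a limited number of chunks to its left.</p>

```python
def chunk_attention_mask(num_frames, chunk_size, num_left_chunks=None):
    """Build a num_frames x num_frames boolean mask (hypothetical helper).

    mask[q][k] is True when query frame q may attend to key frame k.
    A frame sees its whole chunk plus `num_left_chunks` chunks of history;
    None means unlimited history.
    """
    mask = []
    for q in range(num_frames):
        q_chunk = q // chunk_size
        if num_left_chunks is None:
            first = 0  # unlimited left context
        else:
            first = max(0, (q_chunk - num_left_chunks) * chunk_size)
        # The end of the current chunk is the right-most visible frame.
        last = min(num_frames, (q_chunk + 1) * chunk_size)
        visible = range(first, last)
        mask.append([k in visible for k in range(num_frames)])
    return mask


if __name__ == "__main__":
    for row in chunk_attention_mask(6, chunk_size=2, num_left_chunks=1):
        print("".join("x" if allowed else "." for allowed in row))
```

<p>In a real model, positions where the mask is False would typically be excluded from attention before the softmax, so no frame depends on audio beyond its own chunk.</p>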
</section>
<section id="streaming-emformer">
<h2>Streaming Emformer<a class="headerlink" href="#streaming-emformer" title="Permalink to this heading"></a></h2>
<p>The Emformer model proposed <a class="reference external" href="https://arxiv.org/pdf/2010.10759.pdf">here</a> uses more
complicated techniques. It has a memory bank component to memorize history information.
What's more, it introduces right context at training time by hard-copying part of
the input features.</p>
<p>We have three variants of Emformer models in <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">pruned_stateless_emformer_rnnt2</span></code> uses the Emformer from torchaudio. See <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_stateless_emformer_rnnt2">LibriSpeech recipe</a>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">conv_emformer_transducer_stateless</span></code> uses ConvEmformer, which we implemented ourselves. Unlike the Emformer in torchaudio,
ConvEmformer has a convolution in each layer and uses the mechanisms in our reworked conformer model.
See <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless">LibriSpeech recipe</a>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">conv_emformer_transducer_stateless2</span></code> also uses our own ConvEmformer. The only difference from the previous variant is that
it uses a simplified memory bank. See <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conv_emformer_transducer_stateless2">LibriSpeech recipe</a>.</p></li>
</ul>
</div></blockquote>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="index.html" class="btn btn-neutral float-left" title="Streaming ASR" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="librispeech/index.html" class="btn btn-neutral float-right" title="LibriSpeech" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@ -0,0 +1,134 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>LibriSpeech &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../../_static/jquery.js"></script>
<script src="../../../_static/underscore.js"></script>
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../../_static/doctools.js"></script>
<script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Pruned transducer statelessX" href="pruned_transducer_stateless.html" />
<link rel="prev" title="Introduction" href="../introduction.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Streaming ASR</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="../introduction.html">Introduction</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">LibriSpeech</a><ul>
<li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Streaming ASR</a></li>
<li class="breadcrumb-item active">LibriSpeech</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Streaming-ASR/librispeech/index.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="librispeech">
<h1>LibriSpeech<a class="headerlink" href="#librispeech" title="Permalink to this heading"></a></h1>
<div class="toctree-wrapper compound">
<ul>
<li class="toctree-l1"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l1"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
</ul>
</div>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../introduction.html" class="btn btn-neutral float-left" title="Introduction" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="pruned_transducer_stateless.html" class="btn btn-neutral float-right" title="Pruned transducer statelessX" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@ -5,23 +5,23 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>LSTM Transducer &mdash; icefall 0.1 documentation</title> <title>LSTM Transducer &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]> <!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script> <script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]--> <![endif]-->
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script> <script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script> <script src="../../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script> <script src="../../../_static/underscore.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script> <script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../_static/doctools.js"></script> <script src="../../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script> <script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../_static/js/theme.js"></script> <script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" /> <link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" /> <link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="Zipformer MMI" href="zipformer_mmi.html" /> <link rel="next" title="Contributing" href="../../../contributing/index.html" />
<link rel="prev" title="Conformer CTC" href="conformer_ctc.html" /> <link rel="prev" title="Pruned transducer statelessX" href="pruned_transducer_stateless.html" />
</head> </head>
<body class="wy-body-for-nav"> <body class="wy-body-for-nav">
@ -29,10 +29,10 @@
<nav data-toggle="wy-nav-shift" class="wy-nav-side"> <nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll"> <div class="wy-side-scroll">
<div class="wy-side-nav-search" > <div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> icefall <a href="../../../index.html" class="icon icon-home"> icefall
</a> </a>
<div role="search"> <div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" /> <input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" /> <input type="hidden" name="area" value="default" />
@ -40,32 +40,28 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current"> <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../installation/index.html">Installation</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../model-export/index.html">Model export</a></li> <li class="toctree-l2"><a class="reference internal" href="../../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">Recipes</a><ul class="current"> <li class="toctree-l2 current"><a class="reference internal" href="../index.html">Streaming ASR</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../aishell/index.html">aishell</a></li> <li class="toctree-l3"><a class="reference internal" href="../introduction.html">Introduction</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current"> <li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> <li class="toctree-l4"><a class="reference internal" href="pruned_transducer_stateless.html">Pruned transducer statelessX</a></li>
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l4 current"><a class="current reference internal" href="#">LSTM Transducer</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">LSTM Transducer</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#which-model-to-use">Which model to use</a></li>
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
<li class="toctree-l4"><a class="reference internal" href="#decoding">Decoding</a></li>
<li class="toctree-l4"><a class="reference internal" href="#export-models">Export models</a></li>
<li class="toctree-l4"><a class="reference internal" href="#download-pretrained-models">Download pretrained models</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l3"><a class="reference internal" href="zipformer_mmi.html">Zipformer MMI</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li> </ul>
<li class="toctree-l1"><a class="reference internal" href="../../huggingface/index.html">Huggingface</a></li> <ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
</div> </div>
@ -74,19 +70,20 @@
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" > <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i> <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">icefall</a> <a href="../../../index.html">icefall</a>
</nav> </nav>
<div class="wy-nav-content"> <div class="wy-nav-content">
<div class="rst-content"> <div class="rst-content">
<div role="navigation" aria-label="Page navigation"> <div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs"> <ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a></li> <li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">Recipes</a></li> <li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li> <li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">LSTM Transducer</li> <li class="breadcrumb-item active">LSTM Transducer</li>
<li class="wy-breadcrumbs-aside"> <li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst" class="fa fa-github"> Edit on GitHub</a> <a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Streaming-ASR/librispeech/lstm_pruned_stateless_transducer.rst" class="fa fa-github"> Edit on GitHub</a>
</li> </li>
</ul> </ul>
<hr/> <hr/>
@ -379,9 +376,9 @@ $ tensorboard dev upload --logdir . --description <span class="s2">&quot;LSTM tr
the following screenshot:</p> the following screenshot:</p>
<blockquote> <blockquote>
<div><figure class="align-center" id="id3"> <div><figure class="align-center" id="id3">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../_images/librispeech-lstm-transducer-tensorboard-log.png" style="width: 600px;" /></a> <a class="reference external image-reference" href="https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/"><img alt="TensorBoard screenshot" src="../../../_images/librispeech-lstm-transducer-tensorboard-log.png" style="width: 600px;" /></a>
<figcaption> <figcaption>
<p><span class="caption-number">Fig. 5 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p> <p><span class="caption-number">Fig. 8 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id3" title="Permalink to this image"></a></p>
</figcaption> </figcaption>
</figure> </figure>
</div></blockquote> </div></blockquote>
@ -701,8 +698,8 @@ for the details of the above pretrained models</p>
</div> </div>
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="conformer_ctc.html" class="btn btn-neutral float-left" title="Conformer CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="pruned_transducer_stateless.html" class="btn btn-neutral float-left" title="Pruned transducer statelessX" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="zipformer_mmi.html" class="btn btn-neutral float-right" title="Zipformer MMI" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="../../../contributing/index.html" class="btn btn-neutral float-right" title="Contributing" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>
<hr/> <hr/>

View File

@ -0,0 +1,825 @@
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Pruned transducer statelessX &mdash; icefall 0.1 documentation</title>
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../../_static/jquery.js"></script>
<script src="../../../_static/underscore.js"></script>
<script src="../../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../../../_static/doctools.js"></script>
<script src="../../../_static/sphinx_highlight.js"></script>
<script src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="next" title="LSTM Transducer" href="lstm_pruned_stateless_transducer.html" />
<link rel="prev" title="LibriSpeech" href="index.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home"> icefall
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="../../index.html">Recipes</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../../Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="../index.html">Streaming ASR</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="../introduction.html">Introduction</a></li>
<li class="toctree-l3 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
<li class="toctree-l4 current"><a class="current reference internal" href="#">Pruned transducer statelessX</a></li>
<li class="toctree-l4"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../huggingface/index.html">Huggingface</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">icefall</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home"></a></li>
<li class="breadcrumb-item"><a href="../../index.html">Recipes</a></li>
<li class="breadcrumb-item"><a href="../index.html">Streaming ASR</a></li>
<li class="breadcrumb-item"><a href="index.html">LibriSpeech</a></li>
<li class="breadcrumb-item active">Pruned transducer statelessX</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/k2-fsa/icefall/blob/master/docs/source/recipes/Streaming-ASR/librispeech/pruned_transducer_stateless.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="pruned-transducer-statelessx">
<h1>Pruned transducer statelessX<a class="headerlink" href="#pruned-transducer-statelessx" title="Permalink to this heading"></a></h1>
<p>This tutorial shows you how to run a <strong>streaming</strong> conformer transducer model
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The tutorial is suitable for <a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless">pruned_transducer_stateless</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless2">pruned_transducer_stateless2</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless4">pruned_transducer_stateless4</a>,
<a class="reference external" href="https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless5">pruned_transducer_stateless5</a>.
We will take pruned_transducer_stateless4 as an example in this tutorial.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>We assume you have read the page <a class="reference internal" href="../../../installation/index.html#install-icefall"><span class="std std-ref">Installation</span></a> and have set up
the environment for <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>We recommend using one or more GPUs to run this recipe.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Please scroll down to the bottom of this page to find download links
for pretrained models if you don't want to train a model from scratch.</p>
</div>
<p>We use pruned RNN-T to compute the loss.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>You can find the paper about pruned RNN-T at the following address:</p>
<p><a class="reference external" href="https://arxiv.org/abs/2206.13236">https://arxiv.org/abs/2206.13236</a></p>
</div>
<p>The transducer model consists of 3 parts:</p>
<blockquote>
<div><ul class="simple">
<li><p>Encoder, a.k.a. the transcription network. We use a Conformer model (the reworked version by Daniel Povey).</p></li>
<li><p>Decoder, a.k.a. the prediction network. We use a stateless model consisting of
<code class="docutils literal notranslate"><span class="pre">nn.Embedding</span></code> and <code class="docutils literal notranslate"><span class="pre">nn.Conv1d</span></code>.</p></li>
<li><p>Joiner, a.k.a. the joint network.</p></li>
</ul>
</div></blockquote>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>Unlike conventional RNN-T models, we use a stateless decoder.
That is, it has no recurrent connections.</p>
</div>
<section id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this heading"></a></h2>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The data preparation is the same as for the other recipes on the LibriSpeech dataset.
If you have already finished this step, you can skip to <code class="docutils literal notranslate"><span class="pre">Training</span></code> directly.</p>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh
</pre></div>
</div>
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
All you need to do is to run it.</p>
<p>The data preparation consists of several stages. You can use the following two
options:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">--stage</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code></p></li>
</ul>
</div></blockquote>
<p>to control which stage(s) should be run. By default, all stages are executed.</p>
<p>For example,</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./prepare.sh --stage <span class="m">0</span> --stop-stage <span class="m">0</span>
</pre></div>
</div>
<p>runs only stage 0.</p>
<p>To run stage 2 to stage 5, use:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./prepare.sh --stage <span class="m">2</span> --stop-stage <span class="m">5</span>
</pre></div>
</div>
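<p>The stage selection above can be sketched as follows. This is only an illustration of the usual
stage-gating idea, assuming that stage <code class="docutils literal notranslate"><span class="pre">k</span></code> runs when it lies between
<code class="docutils literal notranslate"><span class="pre">--stage</span></code> and <code class="docutils literal notranslate"><span class="pre">--stop-stage</span></code> inclusive. The helper
<code class="docutils literal notranslate"><span class="pre">stages_to_run</span></code> and the last stage number are hypothetical and are not taken from
<code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>

```shell
# Hypothetical helper (not part of prepare.sh): list the stages that would run.
# Assumption: stage k runs when stage <= k <= stop_stage.
stages_to_run() {
  stage=$1
  stop_stage=$2
  last_stage=$3   # assumed highest stage number, for illustration only
  k=0
  out=""
  while [ "$k" -le "$last_stage" ]; do
    if [ "$stage" -le "$k" ] && [ "$stop_stage" -ge "$k" ]; then
      out="${out:+$out }$k"   # append k, separated by a single space
    fi
    k=$((k + 1))
  done
  echo "$out"
}

stages_to_run 0 0 10   # only stage 0 is selected
stages_to_run 2 5 10   # stages 2 through 5 are selected
```

<p>Under this assumption, passing the same number to both options selects exactly one stage.</p>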
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you have pre-downloaded the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a>
dataset and the <a class="reference external" href="http://www.openslr.org/17/">musan</a> dataset, say,
they are saved in <code class="docutils literal notranslate"><span class="pre">/tmp/LibriSpeech</span></code> and <code class="docutils literal notranslate"><span class="pre">/tmp/musan</span></code>, you can modify
the <code class="docutils literal notranslate"><span class="pre">dl_dir</span></code> variable in <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> to point to <code class="docutils literal notranslate"><span class="pre">/tmp</span></code> so that
<code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> won't re-download them.</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>All files generated by <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>, e.g., features, lexicon, etc.,
are saved in the <code class="docutils literal notranslate"><span class="pre">./data</span></code> directory.</p>
</div>
<p>We provide the following YouTube video showing how to run <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>To get the latest news about <a class="reference external" href="https://github.com/k2-fsa">next-gen Kaldi</a>, please subscribe to
the following YouTube channel by <a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">Nadira Povey</a>:</p>
<blockquote>
<div><p><a class="reference external" href="https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw">https://www.youtube.com/channel/UC_VaumpkmINz1pNkFXAN9mw</a></p>
</div></blockquote>
</div>
<div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://www.youtube.com/embed/ofEIoJL-mGM" style="border: 0; height: 345px; width: 560px">
</iframe></div></section>
<section id="training">
<h2>Training<a class="headerlink" href="#training" title="Permalink to this heading"></a></h2>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The streaming and non-streaming models share one recipe. To train a streaming model, you only
need to add <strong>4</strong> extra options compared with training a non-streaming model. These options are
<code class="docutils literal notranslate"><span class="pre">--dynamic-chunk-training</span></code>, <code class="docutils literal notranslate"><span class="pre">--num-left-chunks</span></code>, <code class="docutils literal notranslate"><span class="pre">--causal-convolution</span></code>, <code class="docutils literal notranslate"><span class="pre">--short-chunk-size</span></code>.
You can see the configurable options below for their meanings or read <a class="reference external" href="https://arxiv.org/pdf/2012.05481.pdf">https://arxiv.org/pdf/2012.05481.pdf</a> for more details.</p>
</div>
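<p>As a rough sketch of the note above, the four streaming options can be appended to an ordinary
training command. The command is only printed, not executed. The values 25 and 4 are the defaults quoted in
this tutorial; writing the boolean options as <code class="docutils literal notranslate"><span class="pre">True</span></code> is an assumption, so
please check <code class="docutils literal notranslate"><span class="pre">train.py</span> <span class="pre">--help</span></code> for the accepted syntax. The function name
<code class="docutils literal notranslate"><span class="pre">streaming_train_cmd</span></code> is hypothetical.</p>

```shell
# Dry-run sketch: append the four streaming options to a base training command.
# Assumption: boolean options accept the literal value "True".
base_cmd="./pruned_transducer_stateless4/train.py --exp-dir pruned_transducer_stateless4/exp"

# Defaults quoted in this tutorial: short chunk size 25 and 4 left chunks.
streaming_opts="--dynamic-chunk-training True --causal-convolution True --short-chunk-size 25 --num-left-chunks 4"

streaming_train_cmd() {
  # Print the command instead of running it, so nothing is launched here.
  echo "$base_cmd $streaming_opts"
}

streaming_train_cmd
```

<p>To launch training for real, run the printed command from <code class="docutils literal notranslate"><span class="pre">egs/librispeech/ASR</span></code>.</p>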
<section id="configurable-options">
<h3>Configurable options<a class="headerlink" href="#configurable-options" title="Permalink to this heading"></a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --help
</pre></div>
</div>
<p>shows you the training options that can be passed from the command line.
The following options are used quite often:</p>
<blockquote>
<div><ul>
<li><p><code class="docutils literal notranslate"><span class="pre">--exp-dir</span></code></p>
<p>The directory in which checkpoints, training logs, and TensorBoard logs are saved.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--full-libri</span></code></p>
<p>If it's True, the training part uses all the training data, i.e.,
960 hours. Otherwise, the training part uses only the subset
<code class="docutils literal notranslate"><span class="pre">train-clean-100</span></code>, which has 100 hours of training data.</p>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>The training set is perturbed by speed with two factors: 0.9 and 1.1.
If <code class="docutils literal notranslate"><span class="pre">--full-libri</span></code> is True, each epoch actually processes
<code class="docutils literal notranslate"><span class="pre">3x960</span> <span class="pre">==</span> <span class="pre">2880</span></code> hours of data.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--num-epochs</span></code></p>
<p>It is the number of epochs to train. For instance,
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/train.py</span> <span class="pre">--num-epochs</span> <span class="pre">30</span></code> trains for 30 epochs
and generates <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, <code class="docutils literal notranslate"><span class="pre">epoch-30.pt</span></code>
in the folder <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp</span></code>.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code></p>
<p>It's used to resume training.
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/train.py</span> <span class="pre">--start-epoch</span> <span class="pre">10</span></code> loads the
checkpoint <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp/epoch-9.pt</span></code> and starts
training from epoch 10, based on the state from epoch 9.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--world-size</span></code></p>
<p>It is used for multi-GPU single-machine DDP training.</p>
<blockquote>
<div><ul class="simple">
<li><ol class="loweralpha simple">
<li><p>If it is 1, then no DDP training is used.</p></li>
</ol>
</li>
<li><ol class="loweralpha simple" start="2">
<li><p>If it is 2, then GPU 0 and GPU 1 are used for DDP training.</p></li>
</ol>
</li>
</ul>
</div></blockquote>
<p>The following shows some use cases with it.</p>
<blockquote>
<div><p><strong>Use case 1</strong>: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,2&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">2</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 2</strong>: You have 4 GPUs and you want to use all of them
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">4</span>
</pre></div>
</div>
</div></blockquote>
<p><strong>Use case 3</strong>: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ <span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;3&quot;</span>
$ ./pruned_transducer_stateless4/train.py --world-size <span class="m">1</span>
</pre></div>
</div>
</div></blockquote>
</div></blockquote>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p>Only multi-GPU single-machine DDP training is implemented at present.
Multi-GPU multi-machine DDP training will be added later.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--max-duration</span></code></p>
<p>It specifies the total number of seconds of all utterances in a
batch, before <strong>padding</strong>.
If you encounter CUDA OOM, please reduce it.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Due to padding, the number of seconds of all utterances in a
batch will usually be larger than <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code>.</p>
<p>A larger value for <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code> may cause OOM during training,
while a smaller value may increase the training time. You have to
tune it.</p>
</div>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--use-fp16</span></code></p>
<p>If it is True, the model is trained with half precision. In our experiments,
half precision lets you train with a two times larger <code class="docutils literal notranslate"><span class="pre">--max-duration</span></code>,
which gives an almost 2x speedup.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--dynamic-chunk-training</span></code></p>
<p>The flag that indicates whether to train a streaming model. It
<strong>MUST</strong> be True if you want to train a streaming model.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--short-chunk-size</span></code></p>
<p>When training a streaming attention model with chunk masking, the chunk size
is either the maximum sequence length of the current batch or uniformly sampled from
(1, short_chunk_size). The default value is 25; you don't have to change it most of the time.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--num-left-chunks</span></code></p>
<p>It indicates how much left context (in chunks) can be seen when calculating attention.
The default value is 4; you don't have to change it most of the time.</p>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--causal-convolution</span></code></p>
<p>Whether to use causal convolution in the conformer encoder layers. This must
be True when training a streaming model.</p>
</li>
</ul>
</div></blockquote>
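<p>Two pieces of bookkeeping from the option list above can be sketched in a few lines of shell:
choosing <code class="docutils literal notranslate"><span class="pre">--world-size</span></code> to match the GPUs exposed through
<code class="docutils literal notranslate"><span class="pre">CUDA_VISIBLE_DEVICES</span></code>, and finding the checkpoint that
<code class="docutils literal notranslate"><span class="pre">--start-epoch</span></code> resumes from. Both helper names are hypothetical and are not part of
icefall. The checkpoint helper assumes a start epoch of at least 2 and the default experiment directory used in this tutorial.</p>

```shell
# Hypothetical helpers; the names are illustrative and not part of icefall.

# Count the GPUs listed in a CUDA_VISIBLE_DEVICES-style string.
# One DDP process is used per visible GPU, so this is the matching --world-size.
world_size_for() {
  echo "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

# --start-epoch N resumes from epoch-(N-1).pt in the experiment directory.
# Assumption: N >= 2; the second argument overrides the experiment directory.
resume_checkpoint() {
  start_epoch=$1
  exp_dir=${2:-./pruned_transducer_stateless4/exp}
  echo "$exp_dir/epoch-$((start_epoch - 1)).pt"
}

world_size_for "0,2"    # two visible GPUs, as in use case 1
world_size_for "3"      # one visible GPU, as in use case 3
resume_checkpoint 10    # the checkpoint saved after epoch 9
```

<p>Checking that the resolved checkpoint file exists before restarting a long job can save a failed launch.</p>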
</section>
<section id="pre-configured-options">
<h3>Pre-configured options<a class="headerlink" href="#pre-configured-options" title="Permalink to this heading"></a></h3>
<p>There are some training options, e.g., number of encoder layers,
encoder dimension, decoder dimension, number of warmup steps, etc.,
that are not passed from the command line.
They are pre-configured by the function <code class="docutils literal notranslate"><span class="pre">get_params()</span></code> in
<a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/train.py">pruned_transducer_stateless4/train.py</a>.</p>
<p>You don't need to change these pre-configured parameters. If you really need to change
them, please modify <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/train.py</span></code> directly.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The options for <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless5/train.py">pruned_transducer_stateless5</a> are a little different from those of the
other recipes. It allows you to configure <code class="docutils literal notranslate"><span class="pre">--num-encoder-layers</span></code>, <code class="docutils literal notranslate"><span class="pre">--dim-feedforward</span></code>, <code class="docutils literal notranslate"><span class="pre">--nhead</span></code>, <code class="docutils literal notranslate"><span class="pre">--encoder-dim</span></code>, <code class="docutils literal notranslate"><span class="pre">--decoder-dim</span></code>, <code class="docutils literal notranslate"><span class="pre">--joiner-dim</span></code> from the command line, so that you can train models of different sizes with pruned_transducer_stateless5.</p>
</div>
</section>
<section id="training-logs">
<h3>Training logs<a class="headerlink" href="#training-logs" title="Permalink to this heading"></a></h3>
<p>Training logs and checkpoints are saved in <code class="docutils literal notranslate"><span class="pre">--exp-dir</span></code> (e.g. <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/exp</span></code>).
You will find the following files in that directory:</p>
<blockquote>
<div><ul>
<li><p><code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …</p>
<p>These are checkpoint files saved at the end of each epoch, containing model
<code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">epoch-10.pt</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-epoch <span class="m">11</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …</p>
<p>These are checkpoint files saved every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches,
containing model <code class="docutils literal notranslate"><span class="pre">state_dict</span></code> and optimizer <code class="docutils literal notranslate"><span class="pre">state_dict</span></code>.
To resume training from some checkpoint, say <code class="docutils literal notranslate"><span class="pre">checkpoint-436000</span></code>, you can use:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ ./pruned_transducer_stateless4/train.py --start-batch <span class="m">436000</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">tensorboard/</span></code></p>
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
$ tensorboard dev upload --logdir . --description <span class="s2">&quot;pruned transducer training for LibriSpeech with icefall&quot;</span>
</pre></div>
</div>
</div></blockquote>
<p>It will print something like below:</p>
<blockquote>
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">TensorFlow</span> <span class="n">installation</span> <span class="ow">not</span> <span class="n">found</span> <span class="o">-</span> <span class="n">running</span> <span class="k">with</span> <span class="n">reduced</span> <span class="n">feature</span> <span class="nb">set</span><span class="o">.</span>
<span class="n">Upload</span> <span class="n">started</span> <span class="ow">and</span> <span class="n">will</span> <span class="k">continue</span> <span class="n">reading</span> <span class="nb">any</span> <span class="n">new</span> <span class="n">data</span> <span class="k">as</span> <span class="n">it</span><span class="s1">&#39;s added to the logdir.</span>
<span class="n">To</span> <span class="n">stop</span> <span class="n">uploading</span><span class="p">,</span> <span class="n">press</span> <span class="n">Ctrl</span><span class="o">-</span><span class="n">C</span><span class="o">.</span>
<span class="n">New</span> <span class="n">experiment</span> <span class="n">created</span><span class="o">.</span> <span class="n">View</span> <span class="n">your</span> <span class="n">TensorBoard</span> <span class="n">at</span><span class="p">:</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">tensorboard</span><span class="o">.</span><span class="n">dev</span><span class="o">/</span><span class="n">experiment</span><span class="o">/</span><span class="mi">97</span><span class="n">VKXf80Ru61CnP2ALWZZg</span><span class="o">/</span>
<span class="p">[</span><span class="mi">2022</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">20</span><span class="n">T15</span><span class="p">:</span><span class="mi">50</span><span class="p">:</span><span class="mi">50</span><span class="p">]</span> <span class="n">Started</span> <span class="n">scanning</span> <span class="n">logdir</span><span class="o">.</span>
<span class="n">Uploading</span> <span class="mi">4468</span> <span class="n">scalars</span><span class="o">...</span>
<span class="p">[</span><span class="mi">2022</span><span class="o">-</span><span class="mi">11</span><span class="o">-</span><span class="mi">20</span><span class="n">T15</span><span class="p">:</span><span class="mi">53</span><span class="p">:</span><span class="mi">02</span><span class="p">]</span> <span class="n">Total</span> <span class="n">uploaded</span><span class="p">:</span> <span class="mi">210171</span> <span class="n">scalars</span><span class="p">,</span> <span class="mi">0</span> <span class="n">tensors</span><span class="p">,</span> <span class="mi">0</span> <span class="n">binary</span> <span class="n">objects</span>
<span class="n">Listening</span> <span class="k">for</span> <span class="n">new</span> <span class="n">data</span> <span class="ow">in</span> <span class="n">logdir</span><span class="o">...</span>
</pre></div>
</div>
</div></blockquote>
<p>Note there is a URL in the above output. Click it and you will see
the following screenshot:</p>
<blockquote>
<div><figure class="align-center" id="id7">
<a class="reference external image-reference" href="https://tensorboard.dev/experiment/97VKXf80Ru61CnP2ALWZZg/"><img alt="TensorBoard screenshot" src="../../../_images/streaming-librispeech-pruned-transducer-tensorboard-log.jpg" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 7 </span><span class="caption-text">TensorBoard screenshot.</span><a class="headerlink" href="#id7" title="Permalink to this image"></a></p>
</figcaption>
</figure>
</div></blockquote>
</li>
</ul>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>If you don't have access to Google, you can use the following command
to view the TensorBoard log locally:</p>
<blockquote>
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp/tensorboard
tensorboard --logdir . --port <span class="m">6008</span>
</pre></div>
</div>
</div></blockquote>
<p>It will print the following message:</p>
<blockquote>
<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Serving</span> <span class="n">TensorBoard</span> <span class="n">on</span> <span class="n">localhost</span><span class="p">;</span> <span class="n">to</span> <span class="n">expose</span> <span class="n">to</span> <span class="n">the</span> <span class="n">network</span><span class="p">,</span> <span class="n">use</span> <span class="n">a</span> <span class="n">proxy</span> <span class="ow">or</span> <span class="k">pass</span> <span class="o">--</span><span class="n">bind_all</span>
<span class="n">TensorBoard</span> <span class="mf">2.8.0</span> <span class="n">at</span> <span class="n">http</span><span class="p">:</span><span class="o">//</span><span class="n">localhost</span><span class="p">:</span><span class="mi">6008</span><span class="o">/</span> <span class="p">(</span><span class="n">Press</span> <span class="n">CTRL</span><span class="o">+</span><span class="n">C</span> <span class="n">to</span> <span class="n">quit</span><span class="p">)</span>
</pre></div>
</div>
</div></blockquote>
<p>Now start your browser and go to <a class="reference external" href="http://localhost:6008">http://localhost:6008</a> to view the tensorboard
logs.</p>
</div>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">log/log-train-xxxx</span></code></p>
<p>It is the detailed training log in text format, same as the one
you saw printed to the console during training.</p>
</li>
</ul>
</div></blockquote>
</section>
<section id="usage-example">
<h3>Usage example<a class="headerlink" href="#usage-example" title="Permalink to this heading">&#xf0c1;</a></h3>
<p>You can use the following command to start the training using 4 GPUs:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">export</span> <span class="nv">CUDA_VISIBLE_DEVICES</span><span class="o">=</span><span class="s2">&quot;0,1,2,3&quot;</span>
./pruned_transducer_stateless4/train.py <span class="se">\</span>
--world-size <span class="m">4</span> <span class="se">\</span>
--dynamic-chunk-training <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--num-epochs <span class="m">30</span> <span class="se">\</span>
--start-epoch <span class="m">1</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--full-libri <span class="m">1</span> <span class="se">\</span>
--max-duration <span class="m">300</span>
</pre></div>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>Compared with training a non-streaming model, you only need to add two extra options,
<code class="docutils literal notranslate"><span class="pre">--dynamic-chunk-training</span> <span class="pre">1</span></code> and <code class="docutils literal notranslate"><span class="pre">--causal-convolution</span> <span class="pre">1</span></code>.</p>
</div>
</section>
</section>
<section id="decoding">
<h2>Decoding<a class="headerlink" href="#decoding" title="Permalink to this heading">&#xf0c1;</a></h2>
<p>The decoding part uses checkpoints saved by the training part, so you have
to run the training part first.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>There are two kinds of checkpoints:</p>
<blockquote>
<div><ul class="simple">
<li><p>(1) <code class="docutils literal notranslate"><span class="pre">epoch-1.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">epoch-2.pt</span></code>, …, which are saved at the end
of each epoch. You can pass <code class="docutils literal notranslate"><span class="pre">--epoch</span></code> to
<code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code> to use them.</p></li>
<li><p>(2) <code class="docutils literal notranslate"><span class="pre">checkpoint-436000.pt</span></code>, <code class="docutils literal notranslate"><span class="pre">checkpoint-438000.pt</span></code>, …, which are saved
every <code class="docutils literal notranslate"><span class="pre">--save-every-n</span></code> batches. You can pass <code class="docutils literal notranslate"><span class="pre">--iter</span></code> to
<code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code> to use them.</p></li>
</ul>
<p>We suggest that you try both types of checkpoints and choose the one
that produces the lowest WERs.</p>
</div></blockquote>
</div>
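<p>To make the meaning of <code class="docutils literal notranslate"><span class="pre">--epoch</span></code> and <code class="docutils literal notranslate"><span class="pre">--avg</span></code> more concrete, the following sketch lists the epoch checkpoints that a given pair would cover. It assumes plain averaging over the last <code class="docutils literal notranslate"><span class="pre">avg</span></code> consecutive epoch checkpoints; this is an assumption, not something stated in this tutorial, and the real script may select checkpoints differently. The helper name <code class="docutils literal notranslate"><span class="pre">epoch_checkpoints</span></code> is hypothetical and not part of icefall.</p>

```python
def epoch_checkpoints(exp_dir, epoch, avg):
    """Return the epoch checkpoint files covered by --epoch and --avg.

    Assumption (not taken from the tutorial): the last `avg` epoch
    checkpoints ending at `epoch` are the ones that get averaged.
    """
    start = max(1, epoch - avg + 1)  # epoch checkpoints are numbered from 1
    return [f"{exp_dir}/epoch-{i}.pt" for i in range(start, epoch + 1)]


if __name__ == "__main__":
    for name in epoch_checkpoints("pruned_transducer_stateless4/exp", 25, 3):
        print(name)
```

<p>Under this assumption, a larger <code class="docutils literal notranslate"><span class="pre">--avg</span></code> simply reaches further back in training, which is why it is worth trying several values.</p>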
<div class="admonition tip">
<p class="admonition-title">Tip</p>
<p>To decode a streaming model, you can use either <code class="docutils literal notranslate"><span class="pre">simulate</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code> in <code class="docutils literal notranslate"><span class="pre">decode.py</span></code> or
<code class="docutils literal notranslate"><span class="pre">real</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code> in <code class="docutils literal notranslate"><span class="pre">streaming_decode.py</span></code>. The difference is that
<code class="docutils literal notranslate"><span class="pre">decode.py</span></code> processes all acoustic frames of an utterance at once with masking (i.e. the same as in training),
whereas <code class="docutils literal notranslate"><span class="pre">streaming_decode.py</span></code> processes the acoustic frames chunk by chunk (so it can only see limited context).</p>
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p><code class="docutils literal notranslate"><span class="pre">simulate</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code> in <code class="docutils literal notranslate"><span class="pre">decode.py</span></code> and <code class="docutils literal notranslate"><span class="pre">real</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code> in <code class="docutils literal notranslate"><span class="pre">streaming_decode.py</span></code> should
produce almost the same results given the same <code class="docutils literal notranslate"><span class="pre">--decode-chunk-size</span></code> and <code class="docutils literal notranslate"><span class="pre">--left-context</span></code>.</p>
</div>
<section id="simulate-streaming-decoding">
<h3>Simulate streaming decoding<a class="headerlink" href="#simulate-streaming-decoding" title="Permalink to this heading">&#xf0c1;</a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/decode.py --help
</pre></div>
</div>
<p>shows the options for decoding.
The following options are important for streaming models:</p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">--simulate-streaming</span></code></p>
<blockquote>
<div><p>If you want to decode a streaming model with <code class="docutils literal notranslate"><span class="pre">decode.py</span></code>, you <strong>MUST</strong> set
<code class="docutils literal notranslate"><span class="pre">--simulate-streaming</span></code> to <code class="docutils literal notranslate"><span class="pre">True</span></code>. <code class="docutils literal notranslate"><span class="pre">simulate</span></code> here means that the acoustic frames
are not processed frame by frame (or chunk by chunk). Instead, the whole sequence
is processed at once with masking (the same as in training).</p>
</div></blockquote>
<p><code class="docutils literal notranslate"><span class="pre">--causal-convolution</span></code></p>
<blockquote>
<div><p>If True, the convolution module in the encoder layers will be a causal convolution.
This <strong>MUST</strong> be True when decoding with a streaming model.</p>
</div></blockquote>
<p><code class="docutils literal notranslate"><span class="pre">--decode-chunk-size</span></code></p>
<blockquote>
<div><p>Streaming models use chunk-wise attention. <code class="docutils literal notranslate"><span class="pre">--decode-chunk-size</span></code>
indicates the chunk length (in frames after subsampling) for chunk-wise attention.
For <code class="docutils literal notranslate"><span class="pre">simulate</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code>, the <code class="docutils literal notranslate"><span class="pre">decode-chunk-size</span></code> is used to generate
the attention mask.</p>
</div></blockquote>
<p><code class="docutils literal notranslate"><span class="pre">--left-context</span></code></p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">--left-context</span></code> indicates how many left context frames (after subsampling) can be seen
by the current chunk when calculating chunk-wise attention. Normally, <code class="docutils literal notranslate"><span class="pre">left-context</span></code> should equal
<code class="docutils literal notranslate"><span class="pre">decode-chunk-size</span> <span class="pre">*</span> <span class="pre">num-left-chunks</span></code>, where <code class="docutils literal notranslate"><span class="pre">num-left-chunks</span></code> is the option used
to train this model. For <code class="docutils literal notranslate"><span class="pre">simulate</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code>, the <code class="docutils literal notranslate"><span class="pre">left-context</span></code> is used to generate
the attention mask.</p>
</div></blockquote>
</div></blockquote>
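<p>The way <code class="docutils literal notranslate"><span class="pre">--decode-chunk-size</span></code> and <code class="docutils literal notranslate"><span class="pre">--left-context</span></code> shape the attention mask can be illustrated with a small pure-Python sketch. This is only an illustration of chunk-wise masking under the description above, not the code used by <code class="docutils literal notranslate"><span class="pre">decode.py</span></code>; the function name <code class="docutils literal notranslate"><span class="pre">chunk_attention_mask</span></code> and the tiny sizes are made up for readability.</p>

```python
def chunk_attention_mask(num_frames, chunk_size, left_context):
    """Build a chunk-wise attention mask (illustrative sketch only).

    mask[i][j] is True if frame i may attend to frame j. A frame sees
    every frame of its own chunk plus `left_context` frames before the
    start of that chunk. All indices are in frames after subsampling.
    """
    mask = []
    for i in range(num_frames):
        chunk_start = (i // chunk_size) * chunk_size
        chunk_end = min(chunk_start + chunk_size, num_frames)
        first_visible = max(0, chunk_start - left_context)
        row = [False] * num_frames
        for j in range(first_visible, chunk_end):
            row[j] = True
        mask.append(row)
    return mask


if __name__ == "__main__":
    decode_chunk_size = 2  # tiny value so the mask is easy to read
    num_left_chunks = 1    # stands in for the value used in training
    left_context = decode_chunk_size * num_left_chunks
    for row in chunk_attention_mask(6, decode_chunk_size, left_context):
        print("".join("x" if visible else "." for visible in row))
```

<p>With the values used in the commands in this section, <code class="docutils literal notranslate"><span class="pre">--decode-chunk-size</span> <span class="pre">16</span></code> and <code class="docutils literal notranslate"><span class="pre">--left-context</span> <span class="pre">64</span></code> correspond to four left chunks.</p>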
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--simulate-streaming <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--simulate-streaming <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
</section>
<section id="real-streaming-decoding">
<h3>Real streaming decoding<a class="headerlink" href="#real-streaming-decoding" title="Permalink to this heading">&#xf0c1;</a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> egs/librispeech/ASR
$ ./pruned_transducer_stateless4/streaming_decode.py --help
</pre></div>
</div>
<p>shows the options for decoding.
The following options are important for streaming models:</p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">--decode-chunk-size</span></code></p>
<blockquote>
<div><p>Streaming models use chunk-wise attention. <code class="docutils literal notranslate"><span class="pre">--decode-chunk-size</span></code>
indicates the chunk length (in frames after subsampling) for chunk-wise attention.
For <code class="docutils literal notranslate"><span class="pre">real</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code>, we process <code class="docutils literal notranslate"><span class="pre">decode-chunk-size</span></code> acoustic frames at a time.</p>
</div></blockquote>
<p><code class="docutils literal notranslate"><span class="pre">--left-context</span></code></p>
<blockquote>
<div><p><code class="docutils literal notranslate"><span class="pre">--left-context</span></code> indicates how many left context frames (after subsampling) can be seen
by the current chunk when calculating chunk-wise attention. Normally, <code class="docutils literal notranslate"><span class="pre">left-context</span></code> should equal
<code class="docutils literal notranslate"><span class="pre">decode-chunk-size</span> <span class="pre">*</span> <span class="pre">num-left-chunks</span></code>, where <code class="docutils literal notranslate"><span class="pre">num-left-chunks</span></code> is the option used
to train this model.</p>
</div></blockquote>
<p><code class="docutils literal notranslate"><span class="pre">--num-decode-streams</span></code></p>
<blockquote>
<div><p>The number of decoding streams that can be run in parallel (very similar to the <code class="docutils literal notranslate"><span class="pre">batch</span> <span class="pre">size</span></code>).
For <code class="docutils literal notranslate"><span class="pre">real</span> <span class="pre">streaming</span> <span class="pre">decoding</span></code>, the batches are packed dynamically. For example, if
<code class="docutils literal notranslate"><span class="pre">num-decode-streams</span></code> equals 10, sequences 1 to 10 are decoded first. After a while,
suppose sequences 1 and 2 are done; then sequences 3 to 12 are processed in parallel as a batch.</p>
</div></blockquote>
</div></blockquote>
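<p>The dynamic packing described for <code class="docutils literal notranslate"><span class="pre">--num-decode-streams</span></code> can be sketched as a small simulation. This is a simplified model, not the logic of <code class="docutils literal notranslate"><span class="pre">streaming_decode.py</span></code>: it assumes every active stream consumes exactly one chunk per step, and the function name <code class="docutils literal notranslate"><span class="pre">simulate_stream_packing</span></code> is hypothetical.</p>

```python
from collections import deque


def simulate_stream_packing(chunks_per_utt, num_decode_streams):
    """Simulate dynamic batching of decoding streams (illustrative only).

    chunks_per_utt[k] is the number of chunks in utterance k. At every
    step each active stream processes one chunk; finished streams free
    their slot for the next waiting utterance. Returns, for each step,
    the sorted list of utterance indices decoded together.
    """
    waiting = deque(enumerate(chunks_per_utt))
    active = {}  # utterance index -> chunks still to process
    schedule = []
    while waiting or active:
        # Fill free slots with waiting utterances.
        while waiting and num_decode_streams > len(active):
            utt_id, num_chunks = waiting.popleft()
            active[utt_id] = num_chunks
        schedule.append(sorted(active))
        # Process one chunk per active stream and drop finished streams.
        for utt_id in list(active):
            active[utt_id] -= 1
            if 0 >= active[utt_id]:
                del active[utt_id]
    return schedule


if __name__ == "__main__":
    # Twelve utterances; the first two are short, the rest are longer.
    lengths = [1, 1] + [3] * 10
    for step, batch in enumerate(simulate_stream_packing(lengths, 10)):
        print(step, batch)
```

<p>With zero-based indices, the first step decodes utterances 0 to 9 and the second step decodes utterances 2 to 11, which mirrors the example of sequences 1 to 10 followed by sequences 3 to 12.</p>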
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>We also tried adding <code class="docutils literal notranslate"><span class="pre">--right-context</span></code> in real streaming decoding, but it does not seem to improve
performance for these models, possibly because of a mismatch between training and decoding. You
can try decoding with <code class="docutils literal notranslate"><span class="pre">--right-context</span></code> to see if it helps. The default value is 0.</p>
</div>
<p>The following shows two examples (for the two types of checkpoints):</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> epoch <span class="k">in</span> <span class="m">25</span> <span class="m">20</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">7</span> <span class="m">5</span> <span class="m">3</span> <span class="m">1</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/streaming_decode.py <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> m <span class="k">in</span> greedy_search fast_beam_search modified_beam_search<span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> iter <span class="k">in</span> <span class="m">474000</span><span class="p">;</span> <span class="k">do</span>
<span class="k">for</span> avg <span class="k">in</span> <span class="m">8</span> <span class="m">10</span> <span class="m">12</span> <span class="m">14</span> <span class="m">16</span> <span class="m">18</span><span class="p">;</span> <span class="k">do</span>
./pruned_transducer_stateless4/streaming_decode.py <span class="se">\</span>
--iter <span class="nv">$iter</span> <span class="se">\</span>
--avg <span class="nv">$avg</span> <span class="se">\</span>
--decode-chunk-size <span class="m">16</span> <span class="se">\</span>
--left-context <span class="m">64</span> <span class="se">\</span>
--num-decode-streams <span class="m">100</span> <span class="se">\</span>
--exp-dir pruned_transducer_stateless4/exp <span class="se">\</span>
--max-duration <span class="m">600</span> <span class="se">\</span>
--decoding-method <span class="nv">$m</span>
<span class="k">done</span>
<span class="k">done</span>
<span class="k">done</span>
</pre></div>
</div>
<div class="admonition tip">
<p class="admonition-title">Tip</p>
<p>The supported decoding methods are as follows:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">greedy_search</span></code> : It takes the symbol with the largest posterior probability
at each frame as the decoding result.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">beam_search</span></code> : It implements Algorithm 1 in <a class="reference external" href="https://arxiv.org/pdf/1211.3711.pdf">https://arxiv.org/pdf/1211.3711.pdf</a>, using
<a class="reference external" href="https://github.com/espnet/espnet/blob/master/espnet/nets/beam_search_transducer.py#L247">espnet/nets/beam_search_transducer.py</a>
as a reference. Basically, it keeps the top-k states for each frame and expands the kept states with their own contexts to
the next frame.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">modified_beam_search</span></code> : It implements the same algorithm as <code class="docutils literal notranslate"><span class="pre">beam_search</span></code> above, but it
runs in batch mode with <code class="docutils literal notranslate"><span class="pre">--max-sym-per-frame=1</span></code> being hardcoded.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> : It implements graph composition between the output <code class="docutils literal notranslate"><span class="pre">log_probs</span></code> and
given <code class="docutils literal notranslate"><span class="pre">FSAs</span></code>. The details are hard to describe in a few lines of text; you can read
our paper at <a class="reference external" href="https://arxiv.org/pdf/2211.00484.pdf">https://arxiv.org/pdf/2211.00484.pdf</a> or our <a class="reference external" href="https://github.com/k2-fsa/k2/blob/master/k2/csrc/rnnt_decode.h">rnnt decode code in k2</a>. <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> can decode with <code class="docutils literal notranslate"><span class="pre">FSAs</span></code> on GPU efficiently.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search_LG</span></code> : The same as <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> above, except that <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> uses
a trivial graph that has only one state, while <code class="docutils literal notranslate"><span class="pre">fast_beam_search_LG</span></code> uses an LG graph
(with N-gram LM).</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search_nbest</span></code> : It produces the decoding results as follows:</p>
<ul>
<li><ol class="arabic simple">
<li><p>Use <code class="docutils literal notranslate"><span class="pre">fast_beam_search</span></code> to get a lattice</p></li>
</ol>
</li>
<li><ol class="arabic simple" start="2">
<li><p>Select <code class="docutils literal notranslate"><span class="pre">num_paths</span></code> paths from the lattice using <code class="docutils literal notranslate"><span class="pre">k2.random_paths()</span></code></p></li>
</ol>
</li>
<li><ol class="arabic simple" start="3">
<li><p>Remove duplicates from the selected paths</p></li>
</ol>
</li>
<li><ol class="arabic simple" start="4">
<li><p>Intersect the selected paths with the lattice and compute the
shortest path from the intersection result</p></li>
</ol>
</li>
<li><ol class="arabic simple" start="5">
<li><p>The path with the largest score is used as the decoding output.</p></li>
</ol>
</li>
</ul>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">fast_beam_search_nbest_LG</span></code> : It implements the same logic as <code class="docutils literal notranslate"><span class="pre">fast_beam_search_nbest</span></code>; the
only difference is that it uses <code class="docutils literal notranslate"><span class="pre">fast_beam_search_LG</span></code> to generate the lattice.</p></li>
</ul>
</div></blockquote>
</div>
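<p>As a rough sketch of the simplest method above, <code class="docutils literal notranslate"><span class="pre">greedy_search</span></code>, the following toy code picks the highest-scoring symbol at each frame and keeps it unless it is the blank symbol. It is not the icefall implementation: it assumes at most one symbol per frame, and <code class="docutils literal notranslate"><span class="pre">toy_joiner</span></code>, <code class="docutils literal notranslate"><span class="pre">blank_id</span></code> and <code class="docutils literal notranslate"><span class="pre">context_size</span></code> are hypothetical stand-ins for the real joiner network and decoder context.</p>

```python
def greedy_search(frames, joiner, blank_id=0, context_size=2):
    """Toy transducer greedy search (illustrative sketch only).

    For every encoder frame, `joiner` returns one score per symbol given
    the frame and the last `context_size` emitted symbols. The best
    symbol is emitted unless it is blank. Assumes at most one symbol
    per frame.
    """
    hyp = [blank_id] * context_size  # initial context padded with blanks
    for frame in frames:
        scores = joiner(frame, hyp[-context_size:])
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != blank_id:
            hyp.append(best)
    return hyp[context_size:]


def toy_joiner(frame, context):
    """Hypothetical joiner: ignores the context and returns the frame scores."""
    return frame


if __name__ == "__main__":
    # Three frames, three symbols (index 0 is blank).
    frames = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.2, 0.1, 0.7]]
    print(greedy_search(frames, toy_joiner))
```

<p>The beam search variants differ mainly in keeping several hypotheses alive per frame instead of only the single best one.</p>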
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The decoding methods supported in <code class="docutils literal notranslate"><span class="pre">streaming_decode.py</span></code> might be fewer than those in <code class="docutils literal notranslate"><span class="pre">decode.py</span></code>. If needed,
you can implement them yourself or file an issue in <a class="reference external" href="https://github.com/k2-fsa/icefall/issues">icefall</a>.</p>
</div>
</section>
</section>
<section id="export-model">
<h2>Export Model<a class="headerlink" href="#export-model" title="Permalink to this heading">&#xf0c1;</a></h2>
<p><a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/export.py">pruned_transducer_stateless4/export.py</a> supports exporting checkpoints from <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/exp</span></code> in the following ways.</p>
<section id="export-model-state-dict">
<h3>Export <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code><a class="headerlink" href="#export-model-state-dict" title="Permalink to this heading">&#xf0c1;</a></h3>
<p>Checkpoints saved by <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/train.py</span></code> also include
<code class="docutils literal notranslate"><span class="pre">optimizer.state_dict()</span></code>. It is useful for resuming training. But after training,
we are interested only in <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>. You can use the following
command to extract <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code>.</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># Assume that --epoch 25 --avg 3 produces the smallest WER</span>
<span class="c1"># (You can get such information after running ./pruned_transducer_stateless4/decode.py)</span>
<span class="nv">epoch</span><span class="o">=</span><span class="m">25</span>
<span class="nv">avg</span><span class="o">=</span><span class="m">3</span>
./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--streaming-model <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="nv">$epoch</span> <span class="se">\</span>
--avg <span class="nv">$avg</span>
</pre></div>
</div>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p><code class="docutils literal notranslate"><span class="pre">--streaming-model</span></code> and <code class="docutils literal notranslate"><span class="pre">--causal-convolution</span></code> must both be True to export
a streaming model.</p>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/exp/pretrained.pt</span></code>.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>To use the generated <code class="docutils literal notranslate"><span class="pre">pretrained.pt</span></code> for <code class="docutils literal notranslate"><span class="pre">pruned_transducer_stateless4/decode.py</span></code>,
you can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span> pruned_transducer_stateless4/exp
ln -s pretrained.pt epoch-999.pt
</pre></div>
</div>
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
<code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/decode.py</span></code>.</p>
</div>
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./pruned_transducer_stateless4/pretrained.py</span></code>, you
can run:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/pretrained.py <span class="se">\</span>
--checkpoint ./pruned_transducer_stateless4/exp/pretrained.pt <span class="se">\</span>
--simulate-streaming <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model ./data/lang_bpe_500/bpe.model <span class="se">\</span>
--method greedy_search <span class="se">\</span>
/path/to/foo.wav <span class="se">\</span>
/path/to/bar.wav
</pre></div>
</div>
</section>
<section id="export-model-using-torch-jit-script">
<h3>Export model using <code class="docutils literal notranslate"><span class="pre">torch.jit.script()</span></code><a class="headerlink" href="#export-model-using-torch-jit-script" title="Permalink to this heading">&#xf0c1;</a></h3>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./pruned_transducer_stateless4/export.py <span class="se">\</span>
--exp-dir ./pruned_transducer_stateless4/exp <span class="se">\</span>
--streaming-model <span class="m">1</span> <span class="se">\</span>
--causal-convolution <span class="m">1</span> <span class="se">\</span>
--bpe-model data/lang_bpe_500/bpe.model <span class="se">\</span>
--epoch <span class="m">25</span> <span class="se">\</span>
--avg <span class="m">3</span> <span class="se">\</span>
--jit <span class="m">1</span>
</pre></div>
</div>
<div class="admonition caution">
<p class="admonition-title">Caution</p>
<p><code class="docutils literal notranslate"><span class="pre">--streaming-model</span></code> and <code class="docutils literal notranslate"><span class="pre">--causal-convolution</span></code> must both be True to export
a streaming model.</p>
</div>
<p>It will generate a file <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> in the given <code class="docutils literal notranslate"><span class="pre">exp_dir</span></code>. You can later
load it with <code class="docutils literal notranslate"><span class="pre">torch.jit.load(&quot;cpu_jit.pt&quot;)</span></code>.</p>
<p>Note that <code class="docutils literal notranslate"><span class="pre">cpu</span></code> in the name <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> means that the parameters are on the CPU
when loaded into Python. You can use <code class="docutils literal notranslate"><span class="pre">to(&quot;cuda&quot;)</span></code> to move them to a CUDA device.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>You will need this <code class="docutils literal notranslate"><span class="pre">cpu_jit.pt</span></code> when deploying with the Sherpa framework.</p>
</div>
</section>
</section>
<section id="download-pretrained-models">
<h2>Download pretrained models<a class="headerlink" href="#download-pretrained-models" title="Permalink to this heading">&#xf0c1;</a></h2>
<p>If you don't want to train from scratch, you can download the pretrained models
by visiting the following links:</p>
<blockquote>
<div><ul class="simple">
<li><p><a class="reference external" href="https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless_20220625">pruned_transducer_stateless</a></p></li>
<li><p><a class="reference external" href="https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless2_20220625">pruned_transducer_stateless2</a></p></li>
<li><p><a class="reference external" href="https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless4_20220625">pruned_transducer_stateless4</a></p></li>
<li><p><a class="reference external" href="https://huggingface.co/pkufool/icefall_librispeech_streaming_pruned_transducer_stateless5_20220729">pruned_transducer_stateless5</a></p></li>
</ul>
<p>See <a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md">https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md</a>
for details of the above pretrained models.</p>
</div></blockquote>
</section>
<section id="deploy-with-sherpa">
<h2>Deploy with Sherpa<a class="headerlink" href="#deploy-with-sherpa" title="Permalink to this heading">&#xf0c1;</a></h2>
<p>Please see <a class="reference external" href="https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/index.html#">https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/index.html#</a>
for how to deploy the models in <code class="docutils literal notranslate"><span class="pre">sherpa</span></code>.</p>
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="index.html" class="btn btn-neutral float-left" title="LibriSpeech" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-right" title="LSTM Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2021, icefall development team.</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>

View File

@ -20,7 +20,7 @@
<script src="../_static/js/theme.js"></script> <script src="../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../genindex.html" /> <link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" /> <link rel="search" title="Search" href="../search.html" />
<link rel="next" title="aishell" href="aishell/index.html" /> <link rel="next" title="Non Streaming ASR" href="Non-streaming-ASR/index.html" />
<link rel="prev" title="Export to ncnn" href="../model-export/export-ncnn.html" /> <link rel="prev" title="Export to ncnn" href="../model-export/export-ncnn.html" />
</head> </head>
@ -40,16 +40,18 @@
</div> </div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu"> </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Contents:</span></p> <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current"> <ul>
<li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="../installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="../model-export/index.html">Model export</a></li>
</ul>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">Recipes</a><ul> <li class="toctree-l1 current"><a class="current reference internal" href="#">Recipes</a><ul>
<li class="toctree-l2"><a class="reference internal" href="aishell/index.html">aishell</a></li> <li class="toctree-l2"><a class="reference internal" href="Non-streaming-ASR/index.html">Non Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a></li> <li class="toctree-l2"><a class="reference internal" href="Streaming-ASR/index.html">Streaming ASR</a></li>
<li class="toctree-l2"><a class="reference internal" href="timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="../huggingface/index.html">Huggingface</a></li>
</ul> </ul>
@ -86,26 +88,16 @@ Currently, only speech recognition recipes are provided.</p>
<div class="toctree-wrapper compound"> <div class="toctree-wrapper compound">
<p class="caption" role="heading"><span class="caption-text">Table of Contents</span></p> <p class="caption" role="heading"><span class="caption-text">Table of Contents</span></p>
<ul> <ul>
<li class="toctree-l1"><a class="reference internal" href="aishell/index.html">aishell</a><ul> <li class="toctree-l1"><a class="reference internal" href="Non-streaming-ASR/index.html">Non Streaming ASR</a><ul>
<li class="toctree-l2"><a class="reference internal" href="aishell/tdnn_lstm_ctc.html">TDNN-LSTM CTC</a></li> <li class="toctree-l2"><a class="reference internal" href="Non-streaming-ASR/aishell/index.html">aishell</a></li>
<li class="toctree-l2"><a class="reference internal" href="aishell/conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l2"><a class="reference internal" href="Non-streaming-ASR/librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="aishell/stateless_transducer.html">Stateless Transducer</a></li> <li class="toctree-l2"><a class="reference internal" href="Non-streaming-ASR/timit/index.html">TIMIT</a></li>
<li class="toctree-l2"><a class="reference internal" href="Non-streaming-ASR/yesno/index.html">YesNo</a></li>
</ul> </ul>
</li> </li>
<li class="toctree-l1"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a><ul> <li class="toctree-l1"><a class="reference internal" href="Streaming-ASR/index.html">Streaming ASR</a><ul>
<li class="toctree-l2"><a class="reference internal" href="librispeech/tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li> <li class="toctree-l2"><a class="reference internal" href="Streaming-ASR/introduction.html">Introduction</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/conformer_ctc.html">Conformer CTC</a></li> <li class="toctree-l2"><a class="reference internal" href="Streaming-ASR/librispeech/index.html">LibriSpeech</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
<li class="toctree-l2"><a class="reference internal" href="librispeech/zipformer_mmi.html">Zipformer MMI</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="timit/index.html">TIMIT</a><ul>
<li class="toctree-l2"><a class="reference internal" href="timit/tdnn_ligru_ctc.html">TDNN-LiGRU-CTC</a></li>
<li class="toctree-l2"><a class="reference internal" href="timit/tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="yesno/index.html">YesNo</a><ul>
<li class="toctree-l2"><a class="reference internal" href="yesno/tdnn.html">TDNN-CTC</a></li>
</ul> </ul>
</li> </li>
</ul> </ul>
@ -117,7 +109,7 @@ Currently, only speech recognition recipes are provided.</p>
</div> </div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer"> <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="../model-export/export-ncnn.html" class="btn btn-neutral float-left" title="Export to ncnn" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a> <a href="../model-export/export-ncnn.html" class="btn btn-neutral float-left" title="Export to ncnn" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="aishell/index.html" class="btn btn-neutral float-right" title="aishell" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a> <a href="Non-streaming-ASR/index.html" class="btn btn-neutral float-right" title="Non Streaming ASR" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div> </div>
<hr/> <hr/>

View File

@ -43,7 +43,11 @@
<ul> <ul>
<li class="toctree-l1"><a class="reference internal" href="installation/index.html">Installation</a></li> <li class="toctree-l1"><a class="reference internal" href="installation/index.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="model-export/index.html">Model export</a></li> <li class="toctree-l1"><a class="reference internal" href="model-export/index.html">Model export</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a></li> <li class="toctree-l1"><a class="reference internal" href="recipes/index.html">Recipes</a></li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li> <li class="toctree-l1"><a class="reference internal" href="contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li> <li class="toctree-l1"><a class="reference internal" href="huggingface/index.html">Huggingface</a></li>
</ul> </ul>

File diff suppressed because one or more lines are too long