mirror of https://github.com/k2-fsa/icefall.git, synced 2025-08-09 01:52:41 +00:00
deploy: 9ae2f3a3c5a3c2336ca236c984843c0e133ee307
This commit is contained in: parent 45a5750eda, commit a9ca50fb72
@ -1,4 +1,4 @@
|
|||||||
# Sphinx build info version 1
|
# Sphinx build info version 1
|
||||||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
|
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
|
||||||
config: 3ca2e66d59e42ffdb5e0a5ba2153f99e
|
config: cfc3e6ecc44ed7573f700065af8738a7
|
||||||
tags: 645f666f9bcd5a90fca523b33c5a78b7
|
tags: 645f666f9bcd5a90fca523b33c5a78b7
|
||||||
|
@ -1,5 +1,5 @@
|
|||||||
Transducer
|
LSTM Transducer
|
||||||
==========
|
===============
|
||||||
|
|
||||||
.. hint::
|
.. hint::
|
||||||
|
|
||||||
@ -7,7 +7,7 @@ Transducer
|
|||||||
for pretrained models if you don't want to train a model from scratch.
|
for pretrained models if you don't want to train a model from scratch.
|
||||||
|
|
||||||
|
|
||||||
This tutorial shows you how to train a transducer model
|
This tutorial shows you how to train an LSTM transducer model
|
||||||
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
|
with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
|
||||||
|
|
||||||
We use pruned RNN-T to compute the loss.
|
We use pruned RNN-T to compute the loss.
|
||||||
@ -20,9 +20,9 @@ We use pruned RNN-T to compute the loss.
|
|||||||
|
|
||||||
The transducer model consists of 3 parts:
|
The transducer model consists of 3 parts:
|
||||||
|
|
||||||
- Encoder, a.k.a, transcriber. We use an LSTM model
|
- Encoder, a.k.a. the transcription network. We use an LSTM model
|
||||||
- Decoder, a.k.a, predictor. We use a model consisting of ``nn.Embedding``
|
- Decoder, a.k.a. the prediction network. We use a stateless model consisting of
|
||||||
and ``nn.Conv1d``
|
``nn.Embedding`` and ``nn.Conv1d``
|
||||||
- Joiner, a.k.a, the joint network.
|
- Joiner, a.k.a. the joint network.
|
||||||
|
|
||||||
.. caution::
|
.. caution::
|
||||||
@ -74,7 +74,11 @@ Data preparation
|
|||||||
The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
|
The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
|
||||||
All you need to do is to run it.
|
All you need to do is to run it.
|
||||||
|
|
||||||
The data preparation contains several stages, you can use the following two
|
.. note::
|
||||||
|
|
||||||
|
We encourage you to read ``./prepare.sh``.
|
||||||
|
|
||||||
|
The data preparation contains several stages. You can use the following two
|
||||||
options:
|
options:
|
||||||
|
|
||||||
- ``--stage``
|
- ``--stage``
|
||||||
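The options above let you run only a subset of the preparation stages. A minimal runnable sketch of the stage-gating pattern such scripts commonly use (the variable names and stage count here are illustrative, not taken from ``./prepare.sh``):

```shell
# Illustrative stage gating: run only stages within [stage, stop_stage].
stage=0
stop_stage=5

for s in $(seq 0 15); do
  if [ "$s" -ge "$stage" ] && [ "$s" -le "$stop_stage" ]; then
    echo "would run stage $s"
  fi
done
```

Passing ``--stage`` and a stop option on the command line simply overrides these two variables before the checks run.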
@ -263,7 +267,7 @@ You will find the following files in that directory:
|
|||||||
|
|
||||||
- ``tensorboard/``
|
- ``tensorboard/``
|
||||||
|
|
||||||
This folder contains TensorBoard logs. Training loss, validation loss, learning
|
This folder contains TensorBoard logs. Training loss, validation loss, learning
|
||||||
rate, etc, are recorded in these logs. You can visualize them by:
|
rate, etc, are recorded in these logs. You can visualize them by:
|
||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
@ -287,7 +291,7 @@ You will find the following files in that directory:
|
|||||||
[2022-09-20T15:53:02] Total uploaded: 210171 scalars, 0 tensors, 0 binary objects
|
[2022-09-20T15:53:02] Total uploaded: 210171 scalars, 0 tensors, 0 binary objects
|
||||||
Listening for new data in logdir...
|
Listening for new data in logdir...
|
||||||
|
|
||||||
Note there is a URL in the above output, click it and you will see
|
Note there is a URL in the above output. Click it and you will see
|
||||||
the following screenshot:
|
the following screenshot:
|
||||||
|
|
||||||
.. figure:: images/librispeech-lstm-transducer-tensorboard-log.png
|
.. figure:: images/librispeech-lstm-transducer-tensorboard-log.png
|
||||||
@ -422,7 +426,7 @@ The following shows two examples:
|
|||||||
Export models
|
Export models
|
||||||
-------------
|
-------------
|
||||||
|
|
||||||
`lstm_transducer_stateless2/export.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/export.py>`_ supports to export checkpoints from ``lstm_transducer_stateless2/exp`` in the following ways.
|
`lstm_transducer_stateless2/export.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/export.py>`_ supports exporting checkpoints from ``lstm_transducer_stateless2/exp`` in the following ways.
|
||||||
|
|
||||||
Export ``model.state_dict()``
|
Export ``model.state_dict()``
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
@ -458,7 +462,7 @@ It will generate a file ``./lstm_transducer_stateless2/exp/pretrained.pt``.
|
|||||||
cd lstm_transducer_stateless2/exp
|
cd lstm_transducer_stateless2/exp
|
||||||
ln -s pretrained epoch-9999.pt
|
ln -s pretrained epoch-9999.pt
|
||||||
|
|
||||||
And then pass `--epoch 9999 --avg 1 --use-averaged-model 0` to
|
And then pass ``--epoch 9999 --avg 1 --use-averaged-model 0`` to
|
||||||
``./lstm_transducer_stateless2/decode.py``.
|
``./lstm_transducer_stateless2/decode.py``.
|
||||||
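The symlink trick above can be exercised on its own; a small runnable sketch (the temporary directory and the ``.pt`` suffix on the link target are assumptions for illustration):

```shell
# Sketch of the symlink trick: expose pretrained.pt under an epoch-style name
# so checkpoint-loading code can be pointed at it via --epoch 9999 --avg 1.
mkdir -p /tmp/lstm_exp
cd /tmp/lstm_exp
touch pretrained.pt
ln -sf pretrained.pt epoch-9999.pt
readlink epoch-9999.pt
```

Because the link resolves to the exported file, the decoding script sees a normal ``epoch-NNNN.pt`` checkpoint and needs no special handling.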
|
|
||||||
To use the exported model with ``./lstm_transducer_stateless2/pretrained.py``, you
|
To use the exported model with ``./lstm_transducer_stateless2/pretrained.py``, you
|
||||||
@ -506,6 +510,11 @@ To use the generated files with ``./lstm_transducer_stateless2/jit_pretrained``:
|
|||||||
/path/to/foo.wav \
|
/path/to/foo.wav \
|
||||||
/path/to/bar.wav
|
/path/to/bar.wav
|
||||||
|
|
||||||
|
.. hint::
|
||||||
|
|
||||||
|
Please see `<https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/english/server.html>`_
|
||||||
|
for how to use the exported models in ``sherpa``.
|
||||||
|
|
||||||
Export model for ncnn
|
Export model for ncnn
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
@ -576,37 +585,37 @@ It will generate the following files:
|
|||||||
- ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param``
|
- ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param``
|
||||||
- ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin``
|
- ``./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin``
|
||||||
|
|
||||||
To use the above generate files, run:
|
To use the above generated files, run:
|
||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
./lstm_transducer_stateless2/ncnn-decode.py \
|
./lstm_transducer_stateless2/ncnn-decode.py \
|
||||||
--bpe-model-filename ./data/lang_bpe_500/bpe.model \
|
--bpe-model-filename ./data/lang_bpe_500/bpe.model \
|
||||||
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
|
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
|
||||||
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
|
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
|
||||||
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
|
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
|
||||||
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
|
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
|
||||||
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
|
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
|
||||||
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
|
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
|
||||||
/path/to/foo.wav
|
/path/to/foo.wav
|
||||||
|
|
||||||
.. code-block:: bash
|
.. code-block:: bash
|
||||||
|
|
||||||
./lstm_transducer_stateless2/streaming-ncnn-decode.py \
|
./lstm_transducer_stateless2/streaming-ncnn-decode.py \
|
||||||
--bpe-model-filename ./data/lang_bpe_500/bpe.model \
|
--bpe-model-filename ./data/lang_bpe_500/bpe.model \
|
||||||
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
|
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param \
|
||||||
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
|
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin \
|
||||||
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
|
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param \
|
||||||
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
|
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin \
|
||||||
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
|
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param \
|
||||||
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
|
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin \
|
||||||
/path/to/foo.wav
|
/path/to/foo.wav
|
||||||
|
|
||||||
To use the above generated files in C++, please see
|
To use the above generated files in C++, please see
|
||||||
`<https://github.com/k2-fsa/sherpa-ncnn>`_
|
`<https://github.com/k2-fsa/sherpa-ncnn>`_
|
||||||
|
|
||||||
It is able to generate a static linked library that can be run on Linux, Windows,
|
It can generate a statically linked executable that runs on Linux, Windows,
|
||||||
macOS, Raspberry Pi, etc.
|
macOS, Raspberry Pi, etc., without external dependencies.
|
||||||
|
|
||||||
Download pretrained models
|
Download pretrained models
|
||||||
--------------------------
|
--------------------------
|
||||||
BIN objects.inv (binary file not shown)
@ -93,7 +93,7 @@ Currently, only speech recognition recipes are provided.</p>
|
|||||||
<li class="toctree-l1"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a><ul>
|
<li class="toctree-l1"><a class="reference internal" href="librispeech/index.html">LibriSpeech</a><ul>
|
||||||
<li class="toctree-l2"><a class="reference internal" href="librispeech/tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
<li class="toctree-l2"><a class="reference internal" href="librispeech/tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
||||||
<li class="toctree-l2"><a class="reference internal" href="librispeech/conformer_ctc.html">Conformer CTC</a></li>
|
<li class="toctree-l2"><a class="reference internal" href="librispeech/conformer_ctc.html">Conformer CTC</a></li>
|
||||||
<li class="toctree-l2"><a class="reference internal" href="librispeech/lstm_pruned_stateless_transducer.html">Transducer</a></li>
|
<li class="toctree-l2"><a class="reference internal" href="librispeech/lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
</li>
|
</li>
|
||||||
<li class="toctree-l1"><a class="reference internal" href="timit/index.html">TIMIT</a><ul>
|
<li class="toctree-l1"><a class="reference internal" href="timit/index.html">TIMIT</a><ul>
|
||||||
|
@ -19,7 +19,7 @@
|
|||||||
<script src="../../_static/js/theme.js"></script>
|
<script src="../../_static/js/theme.js"></script>
|
||||||
<link rel="index" title="Index" href="../../genindex.html" />
|
<link rel="index" title="Index" href="../../genindex.html" />
|
||||||
<link rel="search" title="Search" href="../../search.html" />
|
<link rel="search" title="Search" href="../../search.html" />
|
||||||
<link rel="next" title="Transducer" href="lstm_pruned_stateless_transducer.html" />
|
<link rel="next" title="LSTM Transducer" href="lstm_pruned_stateless_transducer.html" />
|
||||||
<link rel="prev" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" />
|
<link rel="prev" title="TDNN-LSTM-CTC" href="tdnn_lstm_ctc.html" />
|
||||||
</head>
|
</head>
|
||||||
|
|
||||||
@ -54,7 +54,7 @@
|
|||||||
<li class="toctree-l4"><a class="reference internal" href="#deployment-with-c">Deployment with C++</a></li>
|
<li class="toctree-l4"><a class="reference internal" href="#deployment-with-c">Deployment with C++</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
</li>
|
</li>
|
||||||
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">Transducer</a></li>
|
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
</li>
|
</li>
|
||||||
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
|
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
|
||||||
@ -1087,7 +1087,7 @@ Please see <a class="reference external" href="https://colab.research.google.com
|
|||||||
</div>
|
</div>
|
||||||
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
|
||||||
<a href="tdnn_lstm_ctc.html" class="btn btn-neutral float-left" title="TDNN-LSTM-CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
<a href="tdnn_lstm_ctc.html" class="btn btn-neutral float-left" title="TDNN-LSTM-CTC" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
|
||||||
<a href="lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-right" title="Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
<a href="lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-right" title="LSTM Transducer" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<hr/>
|
<hr/>
|
||||||
|
@ -46,7 +46,7 @@
|
|||||||
<li class="toctree-l2 current"><a class="current reference internal" href="#">LibriSpeech</a><ul>
|
<li class="toctree-l2 current"><a class="current reference internal" href="#">LibriSpeech</a><ul>
|
||||||
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
||||||
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
||||||
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">Transducer</a></li>
|
<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
</li>
|
</li>
|
||||||
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
|
<li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
|
||||||
@ -88,7 +88,7 @@
|
|||||||
<ul>
|
<ul>
|
||||||
<li class="toctree-l1"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
<li class="toctree-l1"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
||||||
<li class="toctree-l1"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
<li class="toctree-l1"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
||||||
<li class="toctree-l1"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">Transducer</a></li>
|
<li class="toctree-l1"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
</div>
|
</div>
|
||||||
</section>
|
</section>
|
||||||
|
@ -4,7 +4,7 @@
|
|||||||
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
|
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
|
||||||
|
|
||||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||||
<title>Transducer — icefall 0.1 documentation</title>
|
<title>LSTM Transducer — icefall 0.1 documentation</title>
|
||||||
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
|
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
|
||||||
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
|
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
|
||||||
<!--[if lt IE 9]>
|
<!--[if lt IE 9]>
|
||||||
@ -46,7 +46,7 @@
|
|||||||
<li class="toctree-l2 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
|
<li class="toctree-l2 current"><a class="reference internal" href="index.html">LibriSpeech</a><ul class="current">
|
||||||
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
<li class="toctree-l3"><a class="reference internal" href="tdnn_lstm_ctc.html">TDNN-LSTM-CTC</a></li>
|
||||||
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
<li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
|
||||||
<li class="toctree-l3 current"><a class="current reference internal" href="#">Transducer</a><ul>
|
<li class="toctree-l3 current"><a class="current reference internal" href="#">LSTM Transducer</a><ul>
|
||||||
<li class="toctree-l4"><a class="reference internal" href="#which-model-to-use">Which model to use</a></li>
|
<li class="toctree-l4"><a class="reference internal" href="#which-model-to-use">Which model to use</a></li>
|
||||||
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
|
<li class="toctree-l4"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
|
||||||
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
|
<li class="toctree-l4"><a class="reference internal" href="#training">Training</a></li>
|
||||||
@ -81,7 +81,7 @@
|
|||||||
<li><a href="../../index.html" class="icon icon-home"></a> »</li>
|
<li><a href="../../index.html" class="icon icon-home"></a> »</li>
|
||||||
<li><a href="../index.html">Recipes</a> »</li>
|
<li><a href="../index.html">Recipes</a> »</li>
|
||||||
<li><a href="index.html">LibriSpeech</a> »</li>
|
<li><a href="index.html">LibriSpeech</a> »</li>
|
||||||
<li>Transducer</li>
|
<li>LSTM Transducer</li>
|
||||||
<li class="wy-breadcrumbs-aside">
|
<li class="wy-breadcrumbs-aside">
|
||||||
<a href="https://github.com/k2-fsa/icefall/blob/master/icefall/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst" class="fa fa-github"> Edit on GitHub</a>
|
<a href="https://github.com/k2-fsa/icefall/blob/master/icefall/docs/source/recipes/librispeech/lstm_pruned_stateless_transducer.rst" class="fa fa-github"> Edit on GitHub</a>
|
||||||
</li>
|
</li>
|
||||||
@ -91,14 +91,14 @@
|
|||||||
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
||||||
<div itemprop="articleBody">
|
<div itemprop="articleBody">
|
||||||
|
|
||||||
<section id="transducer">
|
<section id="lstm-transducer">
|
||||||
<h1>Transducer<a class="headerlink" href="#transducer" title="Permalink to this heading"></a></h1>
|
<h1>LSTM Transducer<a class="headerlink" href="#lstm-transducer" title="Permalink to this heading"></a></h1>
|
||||||
<div class="admonition hint">
|
<div class="admonition hint">
|
||||||
<p class="admonition-title">Hint</p>
|
<p class="admonition-title">Hint</p>
|
||||||
<p>Please scroll down to the bottom of this page to find download links
|
<p>Please scroll down to the bottom of this page to find download links
|
||||||
for pretrained models if you don’t want to train a model from scratch.</p>
|
for pretrained models if you don’t want to train a model from scratch.</p>
|
||||||
</div>
|
</div>
|
||||||
<p>This tutorial shows you how to train a transducer model
|
<p>This tutorial shows you how to train an LSTM transducer model
|
||||||
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
|
with the <a class="reference external" href="https://www.openslr.org/12">LibriSpeech</a> dataset.</p>
|
||||||
<p>We use pruned RNN-T to compute the loss.</p>
|
<p>We use pruned RNN-T to compute the loss.</p>
|
||||||
<div class="admonition note">
|
<div class="admonition note">
|
||||||
@ -109,9 +109,9 @@ with the <a class="reference external" href="https://www.openslr.org/12">LibriSp
|
|||||||
<p>The transducer model consists of 3 parts:</p>
|
<p>The transducer model consists of 3 parts:</p>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<div><ul class="simple">
|
<div><ul class="simple">
|
||||||
<li><p>Encoder, a.k.a, transcriber. We use an LSTM model</p></li>
|
<li><p>Encoder, a.k.a. the transcription network. We use an LSTM model</p></li>
|
||||||
<li><p>Decoder, a.k.a, predictor. We use a model consisting of <code class="docutils literal notranslate"><span class="pre">nn.Embedding</span></code>
|
<li><p>Decoder, a.k.a. the prediction network. We use a stateless model consisting of
|
||||||
and <code class="docutils literal notranslate"><span class="pre">nn.Conv1d</span></code></p></li>
|
<code class="docutils literal notranslate"><span class="pre">nn.Embedding</span></code> and <code class="docutils literal notranslate"><span class="pre">nn.Conv1d</span></code></p></li>
|
||||||
<li><p>Joiner, a.k.a, the joint network.</p></li>
|
<li><p>Joiner, a.k.a. the joint network.</p></li>
|
||||||
</ul>
|
</ul>
|
||||||
</div></blockquote>
|
</div></blockquote>
|
||||||
@ -159,7 +159,11 @@ $ ./prepare_giga_speech.sh
|
|||||||
</div>
|
</div>
|
||||||
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
|
<p>The script <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code> handles the data preparation for you, <strong>automagically</strong>.
|
||||||
All you need to do is to run it.</p>
|
All you need to do is to run it.</p>
|
||||||
<p>The data preparation contains several stages, you can use the following two
|
<div class="admonition note">
|
||||||
|
<p class="admonition-title">Note</p>
|
||||||
|
<p>We encourage you to read <code class="docutils literal notranslate"><span class="pre">./prepare.sh</span></code>.</p>
|
||||||
|
</div>
|
||||||
|
<p>The data preparation contains several stages. You can use the following two
|
||||||
options:</p>
|
options:</p>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<div><ul class="simple">
|
<div><ul class="simple">
|
||||||
@ -344,7 +348,7 @@ To resume training from some checkpoint, say <code class="docutils literal notra
|
|||||||
</div></blockquote>
|
</div></blockquote>
|
||||||
</li>
|
</li>
|
||||||
<li><p><code class="docutils literal notranslate"><span class="pre">tensorboard/</span></code></p>
|
<li><p><code class="docutils literal notranslate"><span class="pre">tensorboard/</span></code></p>
|
||||||
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
|
<p>This folder contains TensorBoard logs. Training loss, validation loss, learning
|
||||||
rate, etc, are recorded in these logs. You can visualize them by:</p>
|
rate, etc, are recorded in these logs. You can visualize them by:</p>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
|
<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$ <span class="nb">cd</span> lstm_transducer_stateless2/exp/tensorboard
|
||||||
@ -368,7 +372,7 @@ $ tensorboard dev upload --logdir . --description <span class="s2">"LSTM tr
|
|||||||
</pre></div>
|
</pre></div>
|
||||||
</div>
|
</div>
|
||||||
</div></blockquote>
|
</div></blockquote>
|
||||||
<p>Note there is a URL in the above output, click it and you will see
|
<p>Note there is a URL in the above output. Click it and you will see
|
||||||
the following screenshot:</p>
|
the following screenshot:</p>
|
||||||
<blockquote>
|
<blockquote>
|
||||||
<div><figure class="align-center" id="id2">
|
<div><figure class="align-center" id="id2">
|
||||||
@ -498,7 +502,7 @@ $ ./lstm_transducer_stateless2/decode.py --help
|
|||||||
</section>
|
</section>
|
||||||
<section id="export-models">
|
<section id="export-models">
|
||||||
<h2>Export models<a class="headerlink" href="#export-models" title="Permalink to this heading"></a></h2>
|
<h2>Export models<a class="headerlink" href="#export-models" title="Permalink to this heading"></a></h2>
|
||||||
<p><a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/export.py">lstm_transducer_stateless2/export.py</a> supports to export checkpoints from <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/exp</span></code> in the following ways.</p>
|
<p><a class="reference external" href="https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/lstm_transducer_stateless2/export.py">lstm_transducer_stateless2/export.py</a> supports exporting checkpoints from <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/exp</span></code> in the following ways.</p>
|
||||||
<section id="export-model-state-dict">
|
<section id="export-model-state-dict">
|
||||||
<h3>Export <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code><a class="headerlink" href="#export-model-state-dict" title="Permalink to this heading"></a></h3>
|
<h3>Export <code class="docutils literal notranslate"><span class="pre">model.state_dict()</span></code><a class="headerlink" href="#export-model-state-dict" title="Permalink to this heading"></a></h3>
|
||||||
<p>Checkpoints saved by <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/train.py</span></code> also include
|
<p>Checkpoints saved by <code class="docutils literal notranslate"><span class="pre">lstm_transducer_stateless2/train.py</span></code> also include
|
||||||
@ -527,7 +531,7 @@ you can run:</p>
|
|||||||
ln -s pretrained epoch-9999.pt
|
ln -s pretrained epoch-9999.pt
|
||||||
</pre></div>
|
</pre></div>
|
||||||
</div>
|
</div>
|
||||||
<p>And then pass <cite>–epoch 9999 –avg 1 –use-averaged-model 0</cite> to
|
<p>And then pass <code class="docutils literal notranslate"><span class="pre">--epoch</span> <span class="pre">9999</span> <span class="pre">--avg</span> <span class="pre">1</span> <span class="pre">--use-averaged-model</span> <span class="pre">0</span></code> to
|
||||||
<code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/decode.py</span></code>.</p>
|
<code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/decode.py</span></code>.</p>
|
||||||
</div>
|
</div>
|
||||||
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/pretrained.py</span></code>, you
|
<p>To use the exported model with <code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/pretrained.py</span></code>, you
|
||||||
@ -572,6 +576,11 @@ can run:</p>
|
|||||||
/path/to/bar.wav
|
/path/to/bar.wav
|
||||||
</pre></div>
|
</pre></div>
|
||||||
</div>
|
</div>
|
||||||
|
<div class="admonition hint">
|
||||||
|
<p class="admonition-title">Hint</p>
|
||||||
|
<p>Please see <a class="reference external" href="https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/english/server.html">https://k2-fsa.github.io/sherpa/python/streaming_asr/lstm/english/server.html</a>
|
||||||
|
for how to use the exported models in <code class="docutils literal notranslate"><span class="pre">sherpa</span></code>.</p>
|
||||||
|
</div>
|
||||||
</section>
|
</section>
|
||||||
<section id="export-model-for-ncnn">
|
<section id="export-model-for-ncnn">
|
||||||
<h3>Export model for ncnn<a class="headerlink" href="#export-model-for-ncnn" title="Permalink to this heading"></a></h3>
|
<h3>Export model for ncnn<a class="headerlink" href="#export-model-for-ncnn" title="Permalink to this heading"></a></h3>
|
||||||
@ -639,25 +648,33 @@ for <code class="docutils literal notranslate"><span class="pre">pnnx</span></co
|
|||||||
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin</span></code></p></li>
|
<li><p><code class="docutils literal notranslate"><span class="pre">./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin</span></code></p></li>
|
||||||
</ul>
|
</ul>
|
||||||
</div></blockquote>
|
</div></blockquote>
|
||||||
<p>To use the above generate files, run:</p>
|
<p>To use the above generated files, run:</p>
|
||||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>
|
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/ncnn-decode.py <span class="se">\</span>
|
||||||
|
--bpe-model-filename ./data/lang_bpe_500/bpe.model <span class="se">\</span>
|
||||||
|
--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
|
||||||
|
--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
|
||||||
|
--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
|
||||||
|
--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
|
||||||
|
--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param <span class="se">\</span>
|
||||||
|
--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
|
||||||
|
/path/to/foo.wav
|
||||||
</pre></div>
|
</pre></div>
|
||||||
</div>
|
</div>
|
||||||
-<dl class="simple">
-<dt>./lstm_transducer_stateless2/ncnn-decode.py </dt><dd><p>–bpe-model-filename ./data/lang_bpe_500/bpe.model –encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param –encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin –decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param –decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin –joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param –joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin /path/to/foo.wav</p>
-</dd>
-</dl>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>./lstm_transducer_stateless2/streaming-ncnn-decode.py <span class="se">\</span>
+--bpe-model-filename ./data/lang_bpe_500/bpe.model <span class="se">\</span>
+--encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
+--encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
+--decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param <span class="se">\</span>
+--decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
+--joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param <span class="se">\</span>
+--joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin <span class="se">\</span>
+/path/to/foo.wav
 </pre></div>
 </div>
-<dl class="simple">
-<dt>./lstm_transducer_stateless2/streaming-ncnn-decode.py </dt><dd><p>–bpe-model-filename ./data/lang_bpe_500/bpe.model –encoder-param-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.param –encoder-bin-filename ./lstm_transducer_stateless2/exp/encoder_jit_trace-pnnx.ncnn.bin –decoder-param-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.param –decoder-bin-filename ./lstm_transducer_stateless2/exp/decoder_jit_trace-pnnx.ncnn.bin –joiner-param-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.param –joiner-bin-filename ./lstm_transducer_stateless2/exp/joiner_jit_trace-pnnx.ncnn.bin /path/to/foo.wav</p>
-</dd>
-</dl>
 <p>To use the above generated files in C++, please see
 <a class="reference external" href="https://github.com/k2-fsa/sherpa-ncnn">https://github.com/k2-fsa/sherpa-ncnn</a></p>
-<p>It is able to generate a static linked library that can be run on Linux, Windows,
-macOS, Raspberry Pi, etc.</p>
+<p>It is able to generate a static linked executable that can be run on Linux, Windows,
+macOS, Raspberry Pi, etc, without external dependencies.</p>
 </section>
 </section>
 <section id="download-pretrained-models">
@@ -53,7 +53,7 @@
 </ul>
 </li>
 <li class="toctree-l3"><a class="reference internal" href="conformer_ctc.html">Conformer CTC</a></li>
-<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">Transducer</a></li>
+<li class="toctree-l3"><a class="reference internal" href="lstm_pruned_stateless_transducer.html">LSTM Transducer</a></li>
 </ul>
 </li>
 <li class="toctree-l2"><a class="reference internal" href="../timit/index.html">TIMIT</a></li>
@@ -20,7 +20,7 @@
 <link rel="index" title="Index" href="../../genindex.html" />
 <link rel="search" title="Search" href="../../search.html" />
 <link rel="next" title="TDNN-LiGRU-CTC" href="tdnn_ligru_ctc.html" />
-<link rel="prev" title="Transducer" href="../librispeech/lstm_pruned_stateless_transducer.html" />
+<link rel="prev" title="LSTM Transducer" href="../librispeech/lstm_pruned_stateless_transducer.html" />
 </head>

 <body class="wy-body-for-nav">
@@ -95,7 +95,7 @@
 </div>
 </div>
 <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
-<a href="../librispeech/lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-left" title="Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
+<a href="../librispeech/lstm_pruned_stateless_transducer.html" class="btn btn-neutral float-left" title="LSTM Transducer" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
 <a href="tdnn_ligru_ctc.html" class="btn btn-neutral float-right" title="TDNN-LiGRU-CTC" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
 </div>
File diff suppressed because one or more lines are too long