Add icefall tutorials for dummies. (#1220)

Mirror of https://github.com/k2-fsa/icefall.git (synced 2025-08-10 18:42:19 +00:00); commit fc2df07841, parent 9a47c08d08.

@ -95,4 +95,7 @@ rst_epilog = """
.. _k2: https://github.com/k2-fsa/k2
.. _lhotse: https://github.com/lhotse-speech/lhotse
.. _yesno: https://www.openslr.org/1/
.. _Next-gen Kaldi: https://github.com/k2-fsa
.. _Kaldi: https://github.com/kaldi-asr/kaldi
.. _lilcom: https://github.com/danpovey/lilcom
"""

180
docs/source/for-dummies/data-preparation.rst
Normal file
@ -0,0 +1,180 @@
.. _dummies_tutorial_data_preparation:

Data Preparation
================

After :ref:`dummies_tutorial_environment_setup`, we can start preparing the
data for training and decoding.

The first step is to prepare the data for training. We have already provided
`prepare.sh <https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh>`_,
which prepares everything required for training.

.. code-block:: bash

   cd /tmp/icefall
   export PYTHONPATH=/tmp/icefall:$PYTHONPATH
   cd egs/yesno/ASR

   ./prepare.sh

Note that each recipe in `icefall`_ contains a file ``prepare.sh``,
which you should run before you run anything else.

That is all you need for data preparation.

For the more curious
--------------------

If you are wondering how to prepare your own dataset, please refer to the following
URLs for more details:

- `<https://github.com/lhotse-speech/lhotse/tree/master/lhotse/recipes>`_

  It contains recipes for a variety of datasets. If you want to add your own
  dataset, please read the recipes in this folder first.

- `<https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/yesno.py>`_

  The `yesno`_ recipe in `lhotse`_.

If you already have a `Kaldi`_ dataset directory, which contains files like
``wav.scp``, ``feats.scp``, then you can refer to `<https://lhotse.readthedocs.io/en/latest/kaldi.html#example>`_.

A quick look at the generated files
-----------------------------------

``./prepare.sh`` puts generated files into two directories:

- ``download``
- ``data``

download
^^^^^^^^

The ``download`` directory contains downloaded dataset files:

.. code-block:: bash

   tree -L 1 ./download/

   ./download/
   |-- waves_yesno
   `-- waves_yesno.tar.gz

.. hint::

   Please refer to `<https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/yesno.py#L41>`_
   for how the data is downloaded and extracted.

data
^^^^

.. code-block:: bash

   tree ./data/

   ./data/
   |-- fbank
   |   |-- yesno_cuts_test.jsonl.gz
   |   |-- yesno_cuts_train.jsonl.gz
   |   |-- yesno_feats_test.lca
   |   `-- yesno_feats_train.lca
   |-- lang_phone
   |   |-- HLG.pt
   |   |-- L.pt
   |   |-- L_disambig.pt
   |   |-- Linv.pt
   |   |-- lexicon.txt
   |   |-- lexicon_disambig.txt
   |   |-- tokens.txt
   |   `-- words.txt
   |-- lm
   |   |-- G.arpa
   |   `-- G.fst.txt
   `-- manifests
       |-- yesno_recordings_test.jsonl.gz
       |-- yesno_recordings_train.jsonl.gz
       |-- yesno_supervisions_test.jsonl.gz
       `-- yesno_supervisions_train.jsonl.gz

   4 directories, 18 files

**data/manifests**:

  This directory contains manifests. They are used to generate files in
  ``data/fbank``.

  To give you an idea of what it contains, we examine the first few lines of
  the manifests related to the ``train`` dataset.

  .. code-block:: bash

     cd data/manifests
     gunzip -c yesno_recordings_train.jsonl.gz | head -n 3

  The output is given below:

  .. code-block:: bash

     {"id": "0_0_0_0_1_1_1_1", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_0_1_1_1_1.wav"}], "sampling_rate": 8000, "num_samples": 50800, "duration": 6.35, "channel_ids": [0]}
     {"id": "0_0_0_1_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_1_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48880, "duration": 6.11, "channel_ids": [0]}
     {"id": "0_0_1_0_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_1_0_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48160, "duration": 6.02, "channel_ids": [0]}

  Please refer to `<https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py#L300>`_
  for the meaning of each field per line.

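  As a quick sanity check on the recording fields shown above, ``duration``
  should equal ``num_samples / sampling_rate``. The following is a minimal
  sketch that uses only the Python standard library. The helper names
  ``read_jsonl_gz`` and ``expected_duration`` are ours and are not part of
  `lhotse`_; the sample record is a shortened copy of the first line above.

  .. code-block:: python

     import gzip
     import io
     import json


     def read_jsonl_gz(fileobj):
         """Yield one dict per non-empty line of a gzip-compressed JSON-lines file.

         ``fileobj`` may be a path or a binary file object.
         """
         with gzip.open(fileobj, mode="rt", encoding="utf-8") as f:
             for line in f:
                 line = line.strip()
                 if line:
                     yield json.loads(line)


     def expected_duration(recording):
         """Duration in seconds implied by num_samples and sampling_rate."""
         return recording["num_samples"] / recording["sampling_rate"]


     # In-memory stand-in for yesno_recordings_train.jsonl.gz so that the
     # sketch runs without the dataset (hypothetical, shortened record).
     sample = {
         "id": "0_0_0_0_1_1_1_1",
         "sampling_rate": 8000,
         "num_samples": 50800,
         "duration": 6.35,
     }
     buf = io.BytesIO(gzip.compress((json.dumps(sample) + "\n").encode("utf-8")))

     recordings = list(read_jsonl_gz(buf))
     for r in recordings:
         # The stored duration should agree with the value derived from samples.
         assert abs(expected_duration(r) - r["duration"]) < 1e-9
         print(r["id"], expected_duration(r))

  To run it on the real manifest, pass the path
  ``data/manifests/yesno_recordings_train.jsonl.gz`` to ``read_jsonl_gz``
  instead of the in-memory buffer.
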
  .. code-block:: bash

     gunzip -c yesno_supervisions_train.jsonl.gz | head -n 3

  The output is given below:

  .. code-block:: bash

     {"id": "0_0_0_0_1_1_1_1", "recording_id": "0_0_0_0_1_1_1_1", "start": 0.0, "duration": 6.35, "channel": 0, "text": "NO NO NO NO YES YES YES YES", "language": "Hebrew"}
     {"id": "0_0_0_1_0_1_1_0", "recording_id": "0_0_0_1_0_1_1_0", "start": 0.0, "duration": 6.11, "channel": 0, "text": "NO NO NO YES NO YES YES NO", "language": "Hebrew"}
     {"id": "0_0_1_0_0_1_1_0", "recording_id": "0_0_1_0_0_1_1_0", "start": 0.0, "duration": 6.02, "channel": 0, "text": "NO NO YES NO NO YES YES NO", "language": "Hebrew"}

  Please refer to `<https://github.com/lhotse-speech/lhotse/blob/master/lhotse/supervision.py#L510>`_
  for the meaning of each field per line.

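  In the three supervision lines shown above, the ``text`` field mirrors the
  ``id``: each ``0`` corresponds to ``NO`` and each ``1`` to ``YES``. The
  sketch below spells out that mapping. It is inferred only from these
  examples, and the helper name ``text_from_id`` is ours, not part of
  `lhotse`_.

  .. code-block:: python

     def text_from_id(utt_id):
         """Map an id such as "0_0_1_0" to the transcript "NO NO YES NO"."""
         words = []
         for digit in utt_id.split("_"):
             if digit not in ("0", "1"):
                 raise ValueError(f"unexpected symbol {digit!r} in {utt_id!r}")
             words.append("YES" if digit == "1" else "NO")
         return " ".join(words)


     transcript = text_from_id("0_0_0_1_0_1_1_0")
     print(transcript)
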
**data/fbank**:

  This directory contains the cut manifests, which embed everything from
  ``data/manifests``, together with the features for training.

  ``data/fbank/yesno_feats_train.lca`` contains the features for the train dataset.
  Features are compressed using `lilcom`_.

  ``data/fbank/yesno_cuts_train.jsonl.gz`` stores the `CutSet <https://github.com/lhotse-speech/lhotse/blob/master/lhotse/cut/set.py#L72>`_,
  which combines the `RecordingSet <https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py#L928>`_,
  `SupervisionSet <https://github.com/lhotse-speech/lhotse/blob/master/lhotse/supervision.py#L510>`_,
  and `FeatureSet <https://github.com/lhotse-speech/lhotse/blob/master/lhotse/features/base.py#L593>`_.

  To give you an idea of what it looks like, we can run the following command:

  .. code-block:: bash

     cd data/fbank

     gunzip -c yesno_cuts_train.jsonl.gz | head -n 3

  The output is given below:

  .. code-block:: bash

     {"id": "0_0_0_0_1_1_1_1-0", "start": 0, "duration": 6.35, "channel": 0, "supervisions": [{"id": "0_0_0_0_1_1_1_1", "recording_id": "0_0_0_0_1_1_1_1", "start": 0.0, "duration": 6.35, "channel": 0, "text": "NO NO NO NO YES YES YES YES", "language": "Hebrew"}], "features": {"type": "kaldi-fbank", "num_frames": 635, "num_features": 23, "frame_shift": 0.01, "sampling_rate": 8000, "start": 0, "duration": 6.35, "storage_type": "lilcom_chunky", "storage_path": "data/fbank/yesno_feats_train.lca", "storage_key": "0,13000,3570", "channels": 0}, "recording": {"id": "0_0_0_0_1_1_1_1", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_0_1_1_1_1.wav"}], "sampling_rate": 8000, "num_samples": 50800, "duration": 6.35, "channel_ids": [0]}, "type": "MonoCut"}
     {"id": "0_0_0_1_0_1_1_0-1", "start": 0, "duration": 6.11, "channel": 0, "supervisions": [{"id": "0_0_0_1_0_1_1_0", "recording_id": "0_0_0_1_0_1_1_0", "start": 0.0, "duration": 6.11, "channel": 0, "text": "NO NO NO YES NO YES YES NO", "language": "Hebrew"}], "features": {"type": "kaldi-fbank", "num_frames": 611, "num_features": 23, "frame_shift": 0.01, "sampling_rate": 8000, "start": 0, "duration": 6.11, "storage_type": "lilcom_chunky", "storage_path": "data/fbank/yesno_feats_train.lca", "storage_key": "16570,12964,2929", "channels": 0}, "recording": {"id": "0_0_0_1_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_0_1_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48880, "duration": 6.11, "channel_ids": [0]}, "type": "MonoCut"}
     {"id": "0_0_1_0_0_1_1_0-2", "start": 0, "duration": 6.02, "channel": 0, "supervisions": [{"id": "0_0_1_0_0_1_1_0", "recording_id": "0_0_1_0_0_1_1_0", "start": 0.0, "duration": 6.02, "channel": 0, "text": "NO NO YES NO NO YES YES NO", "language": "Hebrew"}], "features": {"type": "kaldi-fbank", "num_frames": 602, "num_features": 23, "frame_shift": 0.01, "sampling_rate": 8000, "start": 0, "duration": 6.02, "storage_type": "lilcom_chunky", "storage_path": "data/fbank/yesno_feats_train.lca", "storage_key": "32463,12936,2696", "channels": 0}, "recording": {"id": "0_0_1_0_0_1_1_0", "sources": [{"type": "file", "channels": [0], "source": "/tmp/icefall/egs/yesno/ASR/download/waves_yesno/0_0_1_0_0_1_1_0.wav"}], "sampling_rate": 8000, "num_samples": 48160, "duration": 6.02, "channel_ids": [0]}, "type": "MonoCut"}

  Note that ``yesno_cuts_train.jsonl.gz`` only stores the information about how to read the features.
  The actual features are stored separately in ``data/fbank/yesno_feats_train.lca``.

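  To make the point about cuts concrete, the sketch below reads the
  ``features`` entry of one cut and reports the shape of the feature matrix
  and the file that holds it, without touching the ``.lca`` file. It is an
  illustration under assumptions: the helper ``describe_features`` is ours,
  the ``cut`` dict is a shortened copy of the first line above, and the
  relation ``num_frames = duration / frame_shift`` is only observed in these
  three examples.

  .. code-block:: python

     # Shortened copy of the first cut shown above (fields we do not use
     # are omitted).
     cut = {
         "id": "0_0_0_0_1_1_1_1-0",
         "duration": 6.35,
         "features": {
             "type": "kaldi-fbank",
             "num_frames": 635,
             "num_features": 23,
             "frame_shift": 0.01,
             "storage_type": "lilcom_chunky",
             "storage_path": "data/fbank/yesno_feats_train.lca",
         },
     }


     def describe_features(cut):
         """Summarise where a cut's features live and what shape they have."""
         feats = cut["features"]
         # One frame every frame_shift seconds over the whole cut.
         expected_frames = round(cut["duration"] / feats["frame_shift"])
         return {
             "shape": (feats["num_frames"], feats["num_features"]),
             "expected_frames": expected_frames,
             "stored_in": feats["storage_path"],
         }


     info = describe_features(cut)
     print(info)
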
**data/lang_phone**:

  This directory contains the lexicon (``lexicon.txt``), the token and word
  tables (``tokens.txt`` and ``words.txt``), and compiled files such as
  ``L.pt`` and ``HLG.pt``.

**data/lm**:

  This directory contains language models.

39
docs/source/for-dummies/decoding.rst
Normal file
@ -0,0 +1,39 @@
.. _dummies_tutorial_decoding:

Decoding
========

After :ref:`dummies_tutorial_training`, we can start decoding.

The command to start the decoding is quite simple:

.. code-block:: bash

   cd /tmp/icefall
   export PYTHONPATH=/tmp/icefall:$PYTHONPATH
   cd egs/yesno/ASR

   # We use CPU for decoding by setting the following environment variable
   export CUDA_VISIBLE_DEVICES=""

   ./tdnn/decode.py

The output logs are given below:

.. literalinclude:: ./code/decoding-yesno.txt

For the more curious
--------------------

.. code-block:: bash

   ./tdnn/decode.py --help

will print the usage information about ``./tdnn/decode.py``. For instance, you
can specify:

- ``--epoch`` to select which checkpoint to use for decoding
- ``--avg`` to select how many checkpoints to use for model averaging

You usually try different combinations of ``--epoch`` and ``--avg`` and select
the one that leads to the lowest WER (`Word Error Rate <https://en.wikipedia.org/wiki/Word_error_rate>`_).

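To see how ``--epoch`` and ``--avg`` interact, note that the export logs in
the model export part of this tutorial report
``averaging ['tdnn/exp/epoch-13.pt', 'tdnn/exp/epoch-14.pt']`` for
``--epoch 14 --avg 2``. The sketch below reproduces that selection rule. It
is inferred from that log line only; the function name
``checkpoints_to_average`` is ours and is not the code used by
``./tdnn/decode.py``.

.. code-block:: python

   def checkpoints_to_average(epoch, avg, exp_dir="tdnn/exp"):
       """List the checkpoint files covered by --epoch and --avg.

       The last ``avg`` checkpoints ending at ``epoch`` are selected.
       """
       if avg < 1:
           raise ValueError("avg must be at least 1")
       start = epoch - avg + 1
       if start < 0:
           raise ValueError("not enough checkpoints for this avg")
       return [f"{exp_dir}/epoch-{i}.pt" for i in range(start, epoch + 1)]


   selected = checkpoints_to_average(epoch=14, avg=2)
   print(selected)

With ``avg=1`` the list contains only the checkpoint of the given epoch, so
no averaging takes place.
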
121
docs/source/for-dummies/environment-setup.rst
Normal file
@ -0,0 +1,121 @@
.. _dummies_tutorial_environment_setup:

Environment setup
=================

We will create an environment for `Next-gen Kaldi`_ that runs on ``CPU``
in this tutorial.

.. note::

   Since the `yesno`_ dataset used in this tutorial is very tiny, training on
   ``CPU`` works very well for it.

   If your dataset is very large, e.g., hundreds or thousands of hours of
   training data, please follow :ref:`install icefall` to install `icefall`_
   that works with ``GPU``.


Create a virtual environment
----------------------------

.. code-block:: bash

   virtualenv -p python3 /tmp/icefall_env

The above command creates a virtual environment in the directory ``/tmp/icefall_env``.
You can select any directory you want.

The output of the above command is given below:

.. code-block:: bash

   Already using interpreter /usr/bin/python3
   Using base prefix '/usr'
   New python executable in /tmp/icefall_env/bin/python3
   Also creating executable in /tmp/icefall_env/bin/python
   Installing setuptools, pkg_resources, pip, wheel...done.

Now we can activate the environment using:

.. code-block:: bash

   source /tmp/icefall_env/bin/activate

Install dependencies
--------------------

.. warning::

   Remember to activate your virtual environment before you continue!

After activating the virtual environment, we can use the following command
to install dependencies of `icefall`_:

.. hint::

   Remember that we will run this tutorial on ``CPU``, so we install
   only the dependencies required for running on ``CPU``.

.. code-block:: bash

   # Caution: Installation order matters!

   # We use torch 2.0.0 and torchaudio 2.0.0 in this tutorial.
   # Other versions should also work.

   pip install torch==2.0.0+cpu torchaudio==2.0.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

   # If you are using macOS or Windows, please use the following command to install torch and torchaudio
   # pip install torch==2.0.0 torchaudio==2.0.0 -f https://download.pytorch.org/whl/torch_stable.html

   # Now install k2
   # Please refer to https://k2-fsa.github.io/k2/installation/from_wheels.html#linux-cpu-example

   pip install k2==1.24.3.dev20230726+cpu.torch2.0.0 -f https://k2-fsa.github.io/k2/cpu.html

   # Install the latest version of lhotse

   pip install git+https://github.com/lhotse-speech/lhotse


Install icefall
---------------

We will put the source code of `icefall`_ into the directory ``/tmp``.
You can select any directory you want.

.. code-block:: bash

   cd /tmp
   git clone https://github.com/k2-fsa/icefall
   cd icefall
   pip install -r ./requirements.txt

.. code-block:: bash

   # Anytime we want to use icefall, we have to set the following
   # environment variable

   export PYTHONPATH=/tmp/icefall:$PYTHONPATH

.. hint::

   If you get the following error during this tutorial:

   .. code-block:: bash

      ModuleNotFoundError: No module named 'icefall'

   please set the above environment variable to fix it.


Congratulations! You have installed `icefall`_ successfully.

For the more curious
--------------------

`icefall`_ contains a collection of Python scripts and you don't need to
use ``python3 setup.py install`` or ``pip install icefall`` to install it.
All you need to do is to download the code and set the environment variable
``PYTHONPATH``.

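Setting ``PYTHONPATH`` works because Python adds its entries to ``sys.path``,
the list of directories searched on ``import``. The sketch below checks from
Python whether a module can be found, optionally after adding a directory to
``sys.path``. The helper name ``is_importable`` is ours, and ``/tmp/icefall``
is simply the directory used in this tutorial; adjust it if you cloned the
code elsewhere.

.. code-block:: python

   import importlib.util
   import sys


   def is_importable(module_name, extra_path=None):
       """Return True if ``module_name`` can be located on ``sys.path``.

       If ``extra_path`` is given, it is put at the front of ``sys.path``
       first, which has the same effect as listing it in PYTHONPATH.
       """
       if extra_path is not None and extra_path not in sys.path:
           sys.path.insert(0, extra_path)
       return importlib.util.find_spec(module_name) is not None


   # True only if the icefall source code was cloned into /tmp/icefall.
   print(is_importable("icefall", extra_path="/tmp/icefall"))

If this prints ``False``, the ``ModuleNotFoundError`` shown above is to be
expected when running the recipe scripts.
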
34
docs/source/for-dummies/index.rst
Normal file
@ -0,0 +1,34 @@
Icefall for dummies tutorial
============================

This tutorial walks you step by step through creating a simple
ASR (`Automatic Speech Recognition <https://en.wikipedia.org/wiki/Speech_recognition>`_)
system with `Next-gen Kaldi`_.

We use the `yesno`_ dataset for demonstration. We chose it for two reasons:

- It is quite tiny, containing only about 12 minutes of data.
- The training can be finished within 20 seconds on ``CPU``.

That also means you don't need a ``GPU`` to run this tutorial.

Let's get started!

Please follow the items below **sequentially**.

.. note::

   The :ref:`dummies_tutorial_data_preparation` runs only on Linux and on macOS.
   All other parts run on Linux, macOS, and Windows.

   Help from the community is appreciated to port the :ref:`dummies_tutorial_data_preparation`
   to Windows.

.. toctree::
   :maxdepth: 2

   ./environment-setup.rst
   ./data-preparation.rst
   ./training.rst
   ./decoding.rst
   ./model-export.rst

310
docs/source/for-dummies/model-export.rst
Normal file
@ -0,0 +1,310 @@
Model Export
============

There are three ways to export a pre-trained model.

- Export the model parameters via `model.state_dict() <https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=load_state_dict#torch.nn.Module.state_dict>`_
- Export via `torchscript <https://pytorch.org/docs/stable/jit.html>`_: either `torch.jit.script() <https://pytorch.org/docs/stable/generated/torch.jit.script.html#torch.jit.script>`_ or `torch.jit.trace() <https://pytorch.org/docs/stable/generated/torch.jit.trace.html>`_
- Export to `ONNX`_ via `torch.onnx.export() <https://pytorch.org/docs/stable/onnx.html>`_

Each method is explained below in detail.

Export the model parameters via model.state_dict()
---------------------------------------------------

The command for this kind of export is

.. code-block:: bash

   cd /tmp/icefall
   export PYTHONPATH=/tmp/icefall:$PYTHONPATH
   cd egs/yesno/ASR

   # assume that "--epoch 14 --avg 2" produces the lowest WER.

   ./tdnn/export.py --epoch 14 --avg 2

The output logs are given below:

.. code-block:: bash

   2023-08-16 20:42:03,912 INFO [export.py:76] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'epoch': 14, 'avg': 2, 'jit': False}
   2023-08-16 20:42:03,913 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
   2023-08-16 20:42:03,950 INFO [export.py:93] averaging ['tdnn/exp/epoch-13.pt', 'tdnn/exp/epoch-14.pt']
   2023-08-16 20:42:03,971 INFO [export.py:106] Not using torch.jit.script
   2023-08-16 20:42:03,974 INFO [export.py:111] Saved to tdnn/exp/pretrained.pt

We can see from the logs that the exported model is saved to the file ``tdnn/exp/pretrained.pt``.

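The ``averaging [...]`` line in the logs refers to model averaging over the
two listed checkpoints. The sketch below illustrates the idea in plain
Python, assuming that averaging means an element-wise mean of the parameters
that share a name. It is not the code used by ``./tdnn/export.py``, which
works on torch tensors; the function name ``average_state_dicts`` and the
toy values are ours.

.. code-block:: python

   def average_state_dicts(state_dicts):
       """Element-wise mean of several {name: list-of-floats} dicts.

       All dicts are assumed to have the same keys and list lengths.
       """
       n = len(state_dicts)
       if n == 0:
           raise ValueError("need at least one state dict")
       averaged = {}
       for name in state_dicts[0]:
           # Group the i-th value of this parameter across all checkpoints.
           columns = zip(*(sd[name] for sd in state_dicts))
           averaged[name] = [sum(col) / n for col in columns]
       return averaged


   # Toy stand-ins for two checkpoints (made-up numbers, one parameter).
   epoch_13 = {"output_linear.bias": [0.0, 2.0]}
   epoch_14 = {"output_linear.bias": [1.0, 4.0]}

   averaged = average_state_dicts([epoch_13, epoch_14])
   print(averaged)
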
To give you an idea of what ``tdnn/exp/pretrained.pt`` contains, we can inspect it in Python:

.. code-block:: python3

   >>> import torch
   >>> m = torch.load("tdnn/exp/pretrained.pt")
   >>> list(m.keys())
   ['model']
   >>> list(m["model"].keys())
   ['tdnn.0.weight', 'tdnn.0.bias', 'tdnn.2.running_mean', 'tdnn.2.running_var', 'tdnn.2.num_batches_tracked', 'tdnn.3.weight', 'tdnn.3.bias', 'tdnn.5.running_mean', 'tdnn.5.running_var', 'tdnn.5.num_batches_tracked', 'tdnn.6.weight', 'tdnn.6.bias', 'tdnn.8.running_mean', 'tdnn.8.running_var', 'tdnn.8.num_batches_tracked', 'output_linear.weight', 'output_linear.bias']

We can use ``tdnn/exp/pretrained.pt`` in the following way with ``./tdnn/decode.py``:

.. code-block:: bash

   cd tdnn/exp
   ln -s pretrained.pt epoch-99.pt
   cd ../..

   ./tdnn/decode.py --epoch 99 --avg 1

The output logs of the above command are given below:

.. code-block:: bash

   2023-08-16 20:45:48,089 INFO [decode.py:262] Decoding started
   2023-08-16 20:45:48,090 INFO [decode.py:263] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'feature_dim': 23, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 99, 'avg': 1, 'export': False, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': False, 'return_cuts': True, 'num_workers': 2, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': False, 'k2-git-sha1': 'ad79f1c699c684de9785ed6ca5edb805a41f78c3', 'k2-git-date': 'Wed Jul 26 09:30:42 2023', 'lhotse-version': '1.16.0.dev+git.aa073f6.clean', 'torch-version': '2.0.0', 'torch-cuda-available': False, 'torch-cuda-version': None, 'python-version': '3.1', 'icefall-git-branch': 'master', 'icefall-git-sha1': '9a47c08-clean', 'icefall-git-date': 'Mon Aug 14 22:10:50 2023', 'icefall-path': '/private/tmp/icefall', 'k2-path': '/private/tmp/icefall_env/lib/python3.11/site-packages/k2/__init__.py', 'lhotse-path': '/private/tmp/icefall_env/lib/python3.11/site-packages/lhotse/__init__.py', 'hostname': 'fangjuns-MacBook-Pro.local', 'IP address': '127.0.0.1'}}
   2023-08-16 20:45:48,092 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
   2023-08-16 20:45:48,103 INFO [decode.py:272] device: cpu
   2023-08-16 20:45:48,109 INFO [checkpoint.py:112] Loading checkpoint from tdnn/exp/epoch-99.pt
   2023-08-16 20:45:48,115 INFO [asr_datamodule.py:218] About to get test cuts
   2023-08-16 20:45:48,115 INFO [asr_datamodule.py:253] About to get test cuts
   2023-08-16 20:45:50,386 INFO [decode.py:203] batch 0/?, cuts processed until now is 4
   2023-08-16 20:45:50,556 INFO [decode.py:240] The transcripts are stored in tdnn/exp/recogs-test_set.txt
   2023-08-16 20:45:50,557 INFO [utils.py:564] [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
   2023-08-16 20:45:50,558 INFO [decode.py:248] Wrote detailed error stats to tdnn/exp/errs-test_set.txt
   2023-08-16 20:45:50,559 INFO [decode.py:315] Done!

We can see that it produces the same WER as before.

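The ``%WER`` line in the logs can be checked by hand: the WER is the number
of insertions, deletions, and substitutions divided by the number of
reference words. A small sketch of that arithmetic, using the counts from
the log line ``[1 / 240, 0 ins, 1 del, 0 sub ]`` (the function name
``wer_percent`` is ours):

.. code-block:: python

   def wer_percent(num_ins, num_del, num_sub, num_ref_words):
       """Word error rate in percent."""
       if num_ref_words <= 0:
           raise ValueError("num_ref_words must be positive")
       return 100.0 * (num_ins + num_del + num_sub) / num_ref_words


   wer = wer_percent(num_ins=0, num_del=1, num_sub=0, num_ref_words=240)
   print(f"{wer:.2f}%")  # prints 0.42%
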
We can also use it to decode files with the following command:

.. code-block:: bash

   # ./tdnn/pretrained.py requires kaldifeat
   #
   # Please refer to https://csukuangfj.github.io/kaldifeat/installation/from_wheels.html
   # for how to install kaldifeat

   pip install kaldifeat==1.25.0.dev20230726+cpu.torch2.0.0 -f https://csukuangfj.github.io/kaldifeat/cpu.html

   ./tdnn/pretrained.py \
     --checkpoint ./tdnn/exp/pretrained.pt \
     --HLG ./data/lang_phone/HLG.pt \
     --words-file ./data/lang_phone/words.txt \
     download/waves_yesno/0_0_0_1_0_0_0_1.wav \
     download/waves_yesno/0_0_1_0_0_0_1_0.wav

The output is given below:

.. code-block:: bash

   2023-08-16 20:53:19,208 INFO [pretrained.py:136] {'feature_dim': 23, 'num_classes': 4, 'sample_rate': 8000, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'checkpoint': './tdnn/exp/pretrained.pt', 'words_file': './data/lang_phone/words.txt', 'HLG': './data/lang_phone/HLG.pt', 'sound_files': ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']}
   2023-08-16 20:53:19,208 INFO [pretrained.py:142] device: cpu
   2023-08-16 20:53:19,208 INFO [pretrained.py:144] Creating model
   2023-08-16 20:53:19,212 INFO [pretrained.py:156] Loading HLG from ./data/lang_phone/HLG.pt
   2023-08-16 20:53:19,213 INFO [pretrained.py:160] Constructing Fbank computer
   2023-08-16 20:53:19,213 INFO [pretrained.py:170] Reading sound files: ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']
   2023-08-16 20:53:19,224 INFO [pretrained.py:176] Decoding started
   2023-08-16 20:53:19,304 INFO [pretrained.py:212]
   download/waves_yesno/0_0_0_1_0_0_0_1.wav:
   NO NO NO YES NO NO NO YES

   download/waves_yesno/0_0_1_0_0_0_1_0.wav:
   NO NO YES NO NO NO YES NO


   2023-08-16 20:53:19,304 INFO [pretrained.py:214] Decoding Done


Export via torch.jit.script()
-----------------------------

The command for this kind of export is

.. code-block:: bash

   cd /tmp/icefall
   export PYTHONPATH=/tmp/icefall:$PYTHONPATH
   cd egs/yesno/ASR

   # assume that "--epoch 14 --avg 2" produces the lowest WER.

   ./tdnn/export.py --epoch 14 --avg 2 --jit true

The output logs are given below:

.. code-block:: bash

   2023-08-16 20:47:44,666 INFO [export.py:76] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'epoch': 14, 'avg': 2, 'jit': True}
   2023-08-16 20:47:44,667 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
   2023-08-16 20:47:44,670 INFO [export.py:93] averaging ['tdnn/exp/epoch-13.pt', 'tdnn/exp/epoch-14.pt']
   2023-08-16 20:47:44,677 INFO [export.py:100] Using torch.jit.script
   2023-08-16 20:47:44,843 INFO [export.py:104] Saved to tdnn/exp/cpu_jit.pt

From the output logs we can see that the generated file is saved to ``tdnn/exp/cpu_jit.pt``.

Don't be confused by the name ``cpu_jit.pt``. The ``cpu`` part means the model is moved to
CPU before exporting. That means, when you load it with:

.. code-block:: python3

   import torch

   model = torch.jit.load("tdnn/exp/cpu_jit.pt")

you don't need to specify the argument `map_location <https://pytorch.org/docs/stable/generated/torch.jit.load.html#torch.jit.load>`_
and it resides on CPU by default.

To use ``tdnn/exp/cpu_jit.pt`` with `icefall`_ to decode files, we can use:

.. code-block:: bash

   # ./tdnn/jit_pretrained.py requires kaldifeat
   #
   # Please refer to https://csukuangfj.github.io/kaldifeat/installation/from_wheels.html
   # for how to install kaldifeat

   pip install kaldifeat==1.25.0.dev20230726+cpu.torch2.0.0 -f https://csukuangfj.github.io/kaldifeat/cpu.html


   ./tdnn/jit_pretrained.py \
     --nn-model ./tdnn/exp/cpu_jit.pt \
     --HLG ./data/lang_phone/HLG.pt \
     --words-file ./data/lang_phone/words.txt \
     download/waves_yesno/0_0_0_1_0_0_0_1.wav \
     download/waves_yesno/0_0_1_0_0_0_1_0.wav

The output is given below:

.. code-block:: bash

   2023-08-16 20:56:00,603 INFO [jit_pretrained.py:121] {'feature_dim': 23, 'num_classes': 4, 'sample_rate': 8000, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'nn_model': './tdnn/exp/cpu_jit.pt', 'words_file': './data/lang_phone/words.txt', 'HLG': './data/lang_phone/HLG.pt', 'sound_files': ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']}
   2023-08-16 20:56:00,603 INFO [jit_pretrained.py:127] device: cpu
   2023-08-16 20:56:00,603 INFO [jit_pretrained.py:129] Loading torchscript model
   2023-08-16 20:56:00,640 INFO [jit_pretrained.py:134] Loading HLG from ./data/lang_phone/HLG.pt
   2023-08-16 20:56:00,641 INFO [jit_pretrained.py:138] Constructing Fbank computer
   2023-08-16 20:56:00,641 INFO [jit_pretrained.py:148] Reading sound files: ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']
   2023-08-16 20:56:00,642 INFO [jit_pretrained.py:154] Decoding started
   2023-08-16 20:56:00,727 INFO [jit_pretrained.py:190]
   download/waves_yesno/0_0_0_1_0_0_0_1.wav:
   NO NO NO YES NO NO NO YES

   download/waves_yesno/0_0_1_0_0_0_1_0.wav:
   NO NO YES NO NO NO YES NO


   2023-08-16 20:56:00,727 INFO [jit_pretrained.py:192] Decoding Done

.. hint::

   We provide only code for ``torch.jit.script()``. You can try ``torch.jit.trace()``
   if you want.

Export via torch.onnx.export()
------------------------------

The command for this kind of export is

.. code-block:: bash

   cd /tmp/icefall
   export PYTHONPATH=/tmp/icefall:$PYTHONPATH
   cd egs/yesno/ASR

   # tdnn/export_onnx.py requires onnx and onnxruntime
   pip install onnx onnxruntime

   # assume that "--epoch 14 --avg 2" produces the lowest WER.

   ./tdnn/export_onnx.py \
     --epoch 14 \
     --avg 2

The output logs are given below:
|
||||||
|
|
||||||
|
.. code-block:: bash
|
||||||
|
|
||||||
|
2023-08-16 20:59:20,888 INFO [export_onnx.py:83] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'epoch': 14, 'avg': 2}
|
||||||
|
2023-08-16 20:59:20,888 INFO [lexicon.py:168] Loading pre-compiled data/lang_phone/Linv.pt
|
||||||
|
2023-08-16 20:59:20,892 INFO [export_onnx.py:100] averaging ['tdnn/exp/epoch-13.pt', 'tdnn/exp/epoch-14.pt']
|
||||||
|
================ Diagnostic Run torch.onnx.export version 2.0.0 ================
|
||||||
|
verbose: False, log level: Level.ERROR
|
||||||
|
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
|
||||||
|
|
||||||
|
2023-08-16 20:59:21,047 INFO [export_onnx.py:127] Saved to tdnn/exp/model-epoch-14-avg-2.onnx
|
||||||
|
2023-08-16 20:59:21,047 INFO [export_onnx.py:136] meta_data: {'model_type': 'tdnn', 'version': '1', 'model_author': 'k2-fsa', 'comment': 'non-streaming tdnn for the yesno recipe', 'vocab_size': 4}
|
||||||
|
2023-08-16 20:59:21,049 INFO [export_onnx.py:140] Generate int8 quantization models
|
||||||
|
2023-08-16 20:59:21,075 INFO [onnx_quantizer.py:538] Quantization parameters for tensor:"/Transpose_1_output_0" not specified
|
||||||
|
2023-08-16 20:59:21,081 INFO [export_onnx.py:151] Saved to tdnn/exp/model-epoch-14-avg-2.int8.onnx
|
||||||
|
|
||||||
|
We can see from the logs that it generates two files:
|
||||||
|
|
||||||
|
- ``tdnn/exp/model-epoch-14-avg-2.onnx`` (ONNX model with ``float32`` weights)
|
||||||
|
- ``tdnn/exp/model-epoch-14-avg-2.int8.onnx`` (ONNX model with ``int8`` weights)
|
||||||
|
|
||||||
|
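To make the relation between ``--epoch``, ``--avg`` and the file names in the
logs easier to see, here is a minimal sketch. It is not code from `icefall`_:
the helper names ``averaged_checkpoints`` and ``exported_onnx_files`` are
hypothetical, and the rule "average the last ``avg`` checkpoints ending at
``epoch``" is an assumption inferred from the single log line above.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def averaged_checkpoints(exp_dir: str, epoch: int, avg: int) -> List[str]:
    """Return the checkpoints that would be averaged.

    Assumption (inferred from the export log): the last `avg` checkpoints
    ending at `epoch` are used, i.e. epoch-avg+1 .. epoch.
    """
    start = epoch - avg + 1
    exp = PurePosixPath(exp_dir)
    return [str(exp / f"epoch-{i}.pt") for i in range(start, epoch + 1)]


def exported_onnx_files(exp_dir: str, epoch: int, avg: int) -> Tuple[str, str]:
    """Return the (float32, int8) ONNX file names as they appear in the log."""
    exp = PurePosixPath(exp_dir)
    stem = f"model-epoch-{epoch}-avg-{avg}"
    return str(exp / f"{stem}.onnx"), str(exp / f"{stem}.int8.onnx")


# Hypothetical usage with the values from this tutorial.
print(averaged_checkpoints("tdnn/exp", epoch=14, avg=2))
print(exported_onnx_files("tdnn/exp", epoch=14, avg=2))
```

If you export with a different ``--epoch`` or ``--avg``, remember to adjust the
model path passed to the decoding scripts accordingly.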
To use the generated ONNX model files for decoding with `onnxruntime`_, we can use

.. code-block:: bash

   # ./tdnn/onnx_pretrained.py requires kaldifeat
   #
   # Please refer to https://csukuangfj.github.io/kaldifeat/installation/from_wheels.html
   # for how to install kaldifeat

   pip install kaldifeat==1.25.0.dev20230726+cpu.torch2.0.0 -f https://csukuangfj.github.io/kaldifeat/cpu.html

   ./tdnn/onnx_pretrained.py \
     --nn-model ./tdnn/exp/model-epoch-14-avg-2.onnx \
     --HLG ./data/lang_phone/HLG.pt \
     --words-file ./data/lang_phone/words.txt \
     download/waves_yesno/0_0_0_1_0_0_0_1.wav \
     download/waves_yesno/0_0_1_0_0_0_1_0.wav

The output is given below:

.. code-block:: bash

   2023-08-16 21:03:24,260 INFO [onnx_pretrained.py:166] {'feature_dim': 23, 'sample_rate': 8000, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'nn_model': './tdnn/exp/model-epoch-14-avg-2.onnx', 'words_file': './data/lang_phone/words.txt', 'HLG': './data/lang_phone/HLG.pt', 'sound_files': ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']}
   2023-08-16 21:03:24,260 INFO [onnx_pretrained.py:171] device: cpu
   2023-08-16 21:03:24,260 INFO [onnx_pretrained.py:173] Loading onnx model ./tdnn/exp/model-epoch-14-avg-2.onnx
   2023-08-16 21:03:24,267 INFO [onnx_pretrained.py:176] Loading HLG from ./data/lang_phone/HLG.pt
   2023-08-16 21:03:24,270 INFO [onnx_pretrained.py:180] Constructing Fbank computer
   2023-08-16 21:03:24,273 INFO [onnx_pretrained.py:190] Reading sound files: ['download/waves_yesno/0_0_0_1_0_0_0_1.wav', 'download/waves_yesno/0_0_1_0_0_0_1_0.wav']
   2023-08-16 21:03:24,279 INFO [onnx_pretrained.py:196] Decoding started
   2023-08-16 21:03:24,318 INFO [onnx_pretrained.py:232]
   download/waves_yesno/0_0_0_1_0_0_0_1.wav:
   NO NO NO YES NO NO NO YES

   download/waves_yesno/0_0_1_0_0_0_1_0.wav:
   NO NO YES NO NO NO YES NO


   2023-08-16 21:03:24,318 INFO [onnx_pretrained.py:234] Decoding Done
|
||||||
|
|
||||||
|
.. note::

   To use the ``int8`` ONNX model for decoding, please use:

   .. code-block:: bash

      ./tdnn/onnx_pretrained.py \
        --nn-model ./tdnn/exp/model-epoch-14-avg-2.int8.onnx \
        --HLG ./data/lang_phone/HLG.pt \
        --words-file ./data/lang_phone/words.txt \
        download/waves_yesno/0_0_0_1_0_0_0_1.wav \
        download/waves_yesno/0_0_1_0_0_0_1_0.wav
|
||||||
|
|
||||||
|
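The decoding results shown above can be sanity-checked against the file names.
The following is a minimal sketch, assuming (as the two examples in the logs
suggest) that each `yesno`_ file name encodes its reference transcript, with
``0`` standing for ``NO`` and ``1`` for ``YES``. The helper
``expected_transcript`` is a hypothetical name and is not part of `icefall`_.

```python
from pathlib import PurePosixPath

# Assumed mapping from the digits in the file name to the spoken words.
LABELS = {"0": "NO", "1": "YES"}


def expected_transcript(wav_path: str) -> str:
    """Derive the reference transcript from a yesno file name."""
    stem = PurePosixPath(wav_path).stem  # e.g. "0_0_0_1_0_0_0_1"
    return " ".join(LABELS[digit] for digit in stem.split("_"))


# Results copied from the decoding logs in this tutorial.
decoded = {
    "download/waves_yesno/0_0_0_1_0_0_0_1.wav": "NO NO NO YES NO NO NO YES",
    "download/waves_yesno/0_0_1_0_0_0_1_0.wav": "NO NO YES NO NO NO YES NO",
}

for path, hyp in decoded.items():
    ref = expected_transcript(path)
    status = "OK" if hyp == ref else f"MISMATCH (expected: {ref})"
    print(f"{path}: {status}")
```

The same check can be applied to the output of the ``int8`` model to see
whether quantization changes the recognition results for these files.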
For the more curious
--------------------

If you are wondering how to deploy the model without ``torch``, please
continue reading. We will show how to use `sherpa-onnx`_ to run the
exported ONNX models. `sherpa-onnx`_ depends only on `onnxruntime`_ and
does not depend on ``torch``.

In this tutorial, we will only demonstrate the usage of `sherpa-onnx`_ with the
pre-trained model of the `yesno`_ recipe. Two other frameworks are also
available:

- `sherpa`_. It works with TorchScript models.
- `sherpa-ncnn`_. It works with models exported using :ref:`icefall_export_to_ncnn` with `ncnn`_.

Please see `<https://k2-fsa.github.io/sherpa/>`_ for further details.
|
39
docs/source/for-dummies/training.rst
Normal file
@ -0,0 +1,39 @@
|
|||||||
|
.. _dummies_tutorial_training:

Training
========

After :ref:`dummies_tutorial_data_preparation`, we can start training.

The command to start the training is quite simple:

.. code-block:: bash

   cd /tmp/icefall
   export PYTHONPATH=/tmp/icefall:$PYTHONPATH
   cd egs/yesno/ASR

   # We use CPU for training by setting the following environment variable
   export CUDA_VISIBLE_DEVICES=""

   ./tdnn/train.py

That's it!

You can find the training logs below:

.. literalinclude:: ./code/train-yesno.txt

For the more curious
--------------------

.. code-block:: bash

   ./tdnn/train.py --help

will print the usage information about ``./tdnn/train.py``. For instance, you
can specify the number of epochs to train and the location to save the training
results.

The training text logs are saved in ``tdnn/exp/log`` while the tensorboard
logs are in ``tdnn/exp/tensorboard``.
|
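To look at the most recent text log after a training run, something like the
following sketch may help. It only assumes that the logs are plain text files
placed directly in ``tdnn/exp/log``. It does not rely on any particular file
naming scheme, and the helpers ``newest_log`` and ``tail`` are hypothetical
names, not part of `icefall`_.

```python
from pathlib import Path
from typing import List, Optional


def newest_log(log_dir: str = "tdnn/exp/log") -> Optional[Path]:
    """Return the most recently modified file in log_dir, or None."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


def tail(path: Path, n: int = 5) -> List[str]:
    """Return the last n lines of a text file."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return f.read().splitlines()[-n:]


if __name__ == "__main__":
    log = newest_log()
    if log is None:
        print("No log files found in tdnn/exp/log")
    else:
        print(f"Newest log: {log}")
        for line in tail(log):
            print(line)
```

Run it from ``egs/yesno/ASR`` so that the relative path ``tdnn/exp/log``
resolves correctly.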
@ -20,6 +20,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
   :maxdepth: 2
   :caption: Contents:

   for-dummies/index.rst
   installation/index
   docker/index
   faqs

@ -6,6 +6,7 @@ This file shows how to use an ONNX model for decoding with onnxruntime.
Usage:

(1) Use a not quantized ONNX model, i.e., a float32 model

./tdnn/onnx_pretrained.py \
  --nn-model ./tdnn/exp/model-epoch-14-avg-2.onnx \
  --HLG ./data/lang_phone/HLG.pt \