mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-08 09:32:20 +00:00
157 lines
3.8 KiB
ReStructuredText
157 lines
3.8 KiB
ReStructuredText
How to create a recipe
|
|
======================
|
|
|
|
.. HINT::
|
|
|
|
Please read :ref:`follow the code style` to adjust your code style.
|
|
|
|
.. CAUTION::
|
|
|
|
``icefall`` is designed to be as Pythonic as possible. Please use
|
|
Python in your recipe if possible.
|
|
|
|
Data Preparation
|
|
----------------
|
|
|
|
We recommend you to prepare your training/test/validate dataset
|
|
with `lhotse <https://github.com/lhotse-speech/lhotse>`_.
|
|
|
|
Please refer to `<https://lhotse.readthedocs.io/en/latest/index.html>`_
|
|
for how to create a recipe in ``lhotse``.
|
|
|
|
.. HINT::
|
|
|
|
The ``yesno`` recipe in ``lhotse`` is a very good example.
|
|
|
|
Please refer to `<https://github.com/lhotse-speech/lhotse/pull/380>`_,
|
|
which shows how to add a new recipe to ``lhotse``.
|
|
|
|
Suppose you would like to add a recipe for a dataset named ``foo``.
|
|
You can do the following:
|
|
|
|
.. code-block::
|
|
|
|
$ cd egs
|
|
$ mkdir -p foo/ASR
|
|
$ cd foo/ASR
|
|
$ touch prepare.sh
|
|
$ chmod +x prepare.sh
|
|
|
|
If your dataset is very simple, please follow
|
|
`egs/yesno/ASR/prepare.sh <https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh>`_
|
|
to write your own ``prepare.sh``.
|
|
Otherwise, please refer to
|
|
`egs/librispeech/ASR/prepare.sh <https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh>`_
|
|
to prepare your data.
|
|
|
|
|
|
Training
|
|
--------
|
|
|
|
Assume you have a fancy model, called ``bar`` for the ``foo`` recipe, you can
|
|
organize your files in the following way:
|
|
|
|
.. code-block::
|
|
|
|
$ cd egs/foo/ASR
|
|
$ mkdir bar
|
|
$ cd bar
|
|
$ touch README.md model.py train.py decode.py asr_datamodule.py pretrained.py
|
|
|
|
For instance , the ``yesno`` recipe has a ``tdnn`` model and its directory structure
|
|
looks like the following:
|
|
|
|
.. code-block:: bash
|
|
|
|
egs/yesno/ASR/tdnn/
|
|
|-- README.md
|
|
|-- asr_datamodule.py
|
|
|-- decode.py
|
|
|-- model.py
|
|
|-- pretrained.py
|
|
`-- train.py
|
|
|
|
**File description**:
|
|
|
|
- ``README.md``
|
|
|
|
It contains information of this recipe, e.g., how to run it, what the WER is, etc.
|
|
|
|
- ``asr_datamodule.py``
|
|
|
|
It provides code to create PyTorch dataloaders with train/test/validation dataset.
|
|
|
|
- ``decode.py``
|
|
|
|
It takes as inputs the checkpoints saved during the training stage to decode the test
|
|
dataset(s).
|
|
|
|
- ``model.py``
|
|
|
|
It contains the definition of your fancy neural network model.
|
|
|
|
- ``pretrained.py``
|
|
|
|
We can use this script to do inference with a pre-trained model.
|
|
|
|
- ``train.py``
|
|
|
|
It contains training code.
|
|
|
|
|
|
.. HINT::
|
|
|
|
Please take a look at
|
|
|
|
- `egs/yesno/tdnn <https://github.com/k2-fsa/icefall/tree/master/egs/yesno/ASR/tdnn>`_
|
|
- `egs/librispeech/tdnn_lstm_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc>`_
|
|
- `egs/librispeech/conformer_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc>`_
|
|
|
|
to get a feel what the resulting files look like.
|
|
|
|
.. NOTE::
|
|
|
|
Every model in a recipe is kept to be as self-contained as possible.
|
|
We tolerate duplicate code among different recipes.
|
|
|
|
|
|
The training stage should be invocable by:
|
|
|
|
.. code-block::
|
|
|
|
$ cd egs/foo/ASR
|
|
$ ./bar/train.py
|
|
$ ./bar/train.py --help
|
|
|
|
|
|
Decoding
|
|
--------
|
|
|
|
Please refer to
|
|
|
|
- `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/conformer_ctc/decode.py>`_
|
|
|
|
If your model is transformer/conformer based.
|
|
|
|
- `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py>`_
|
|
|
|
If your model is TDNN/LSTM based, i.e., there is no attention decoder.
|
|
|
|
- `<https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/tdnn/decode.py>`_
|
|
|
|
If there is no LM rescoring.
|
|
|
|
The decoding stage should be invocable by:
|
|
|
|
.. code-block::
|
|
|
|
$ cd egs/foo/ASR
|
|
$ ./bar/decode.py
|
|
$ ./bar/decode.py --help
|
|
|
|
Pre-trained model
|
|
-----------------
|
|
|
|
Please demonstrate how to use your model for inference in ``egs/foo/ASR/bar/pretrained.py``.
|
|
If possible, please consider creating a Colab notebook to show that.
|