icefall/docs/source/contributing/how-to-create-a-recipe.rst
hairyputtar 3fb99400cf
fix typos (#1336)
* fix typo

* fix typo

* Update pruned_transducer_stateless.rst
2023-10-24 15:47:25 +08:00

157 lines
3.8 KiB
ReStructuredText

How to create a recipe
======================
.. HINT::
Please read :ref:`follow the code style` to adjust your code style.
.. CAUTION::
``icefall`` is designed to be as Pythonic as possible. Please use
Python in your recipe if possible.
Data Preparation
----------------
We recommend you to prepare your training/test/validate dataset
with `lhotse <https://github.com/lhotse-speech/lhotse>`_.
Please refer to `<https://lhotse.readthedocs.io/en/latest/index.html>`_
for how to create a recipe in ``lhotse``.
.. HINT::
The ``yesno`` recipe in ``lhotse`` is a very good example.
Please refer to `<https://github.com/lhotse-speech/lhotse/pull/380>`_,
which shows how to add a new recipe to ``lhotse``.
Suppose you would like to add a recipe for a dataset named ``foo``.
You can do the following:
.. code-block::
$ cd egs
$ mkdir -p foo/ASR
$ cd foo/ASR
$ touch prepare.sh
$ chmod +x prepare.sh
If your dataset is very simple, please follow
`egs/yesno/ASR/prepare.sh <https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh>`_
to write your own ``prepare.sh``.
Otherwise, please refer to
`egs/librispeech/ASR/prepare.sh <https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/prepare.sh>`_
to prepare your data.
Training
--------
Assume you have a fancy model, called ``bar`` for the ``foo`` recipe, you can
organize your files in the following way:
.. code-block::
$ cd egs/foo/ASR
$ mkdir bar
$ cd bar
$ touch README.md model.py train.py decode.py asr_datamodule.py pretrained.py
For instance , the ``yesno`` recipe has a ``tdnn`` model and its directory structure
looks like the following:
.. code-block:: bash
egs/yesno/ASR/tdnn/
|-- README.md
|-- asr_datamodule.py
|-- decode.py
|-- model.py
|-- pretrained.py
`-- train.py
**File description**:
- ``README.md``
It contains information of this recipe, e.g., how to run it, what the WER is, etc.
- ``asr_datamodule.py``
It provides code to create PyTorch dataloaders with train/test/validation dataset.
- ``decode.py``
It takes as inputs the checkpoints saved during the training stage to decode the test
dataset(s).
- ``model.py``
It contains the definition of your fancy neural network model.
- ``pretrained.py``
We can use this script to do inference with a pre-trained model.
- ``train.py``
It contains training code.
.. HINT::
Please take a look at
- `egs/yesno/tdnn <https://github.com/k2-fsa/icefall/tree/master/egs/yesno/ASR/tdnn>`_
- `egs/librispeech/tdnn_lstm_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc>`_
- `egs/librispeech/conformer_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc>`_
to get a feel what the resulting files look like.
.. NOTE::
Every model in a recipe is kept to be as self-contained as possible.
We tolerate duplicate code among different recipes.
The training stage should be invocable by:
.. code-block::
$ cd egs/foo/ASR
$ ./bar/train.py
$ ./bar/train.py --help
Decoding
--------
Please refer to
- `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/conformer_ctc/decode.py>`_
If your model is transformer/conformer based.
- `<https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py>`_
If your model is TDNN/LSTM based, i.e., there is no attention decoder.
- `<https://github.com/k2-fsa/icefall/blob/master/egs/yesno/ASR/tdnn/decode.py>`_
If there is no LM rescoring.
The decoding stage should be invocable by:
.. code-block::
$ cd egs/foo/ASR
$ ./bar/decode.py
$ ./bar/decode.py --help
Pre-trained model
-----------------
Please demonstrate how to use your model for inference in ``egs/foo/ASR/bar/pretrained.py``.
If possible, please consider creating a Colab notebook to show that.