WIP: Add doc for the LibriSpeech recipe. (#24)

* WIP: Add doc for the LibriSpeech recipe. * Add more doc for LibriSpeech recipe. * Add more doc for the LibriSpeech recipe. * More doc.
2021-08-24 20:28:32 +08:00 · 2021-08-24 20:28:32 +08:00 · 1bd5dcc8ac
commit 1bd5dcc8ac
parent 01da00dca0
13 changed files with 777 additions and 416 deletions
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@ -3,7 +3,7 @@
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

-icefall
+Icefall
 =======

 .. image:: _static/logo.png
--- a/docs/source/recipes/librispeech.rst
+++ b/docs/source/recipes/librispeech.rst
@ -1,2 +1,10 @@
 LibriSpeech
 ===========
+
+We provide the following models for the LibriSpeech dataset:
+
+.. toctree::
+   :maxdepth: 2
+
+   librispeech/tdnn_lstm_ctc
+   librispeech/conformer_ctc
--- a/docs/source/recipes/librispeech/conformer_ctc.rst
+++ b/docs/source/recipes/librispeech/conformer_ctc.rst
@ -0,0 +1,627 @@
+Conformer CTC
+=============
+
+This tutorial shows you how to run a Conformer CTC model
+with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
+
+
+.. HINT::
+
+  We assume you have read the page :ref:`install icefall` and have set up
+  the environment for ``icefall``.
+
+.. HINT::
+
+  We recommend using one or more GPUs to run this recipe.
+
+In this tutorial, you will learn:
+
+  - (1) How to prepare data for training and decoding
+  - (2) How to start the training, either with a single GPU or multiple GPUs
+  - (3) How to do decoding after training, with n-gram LM rescoring and attention decoder rescoring
+  - (4) How to use a pre-trained model, provided by us
+
+Data preparation
+----------------
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./prepare.sh
+
+The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
+All you need to do is to run it.
+
+The data preparation contains several stages. You can use the following two
+options:
+
+  - ``--stage``
+  - ``--stop-stage``
+
+to control which stage(s) should be run. By default, all stages are executed.
+
+
+For example,
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./prepare.sh --stage 0 --stop-stage 0
+
+runs only stage 0.
+
+To run stage 2 to stage 5, use:
+
+.. code-block:: bash
+
+  $ ./prepare.sh --stage 2 --stop-stage 5
+
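The two options act as an inclusive range over stage numbers. A minimal Python sketch of that selection logic is given below; `stages_to_run` and the stage count of 6 are hypothetical, chosen only for illustration, and are not taken from `prepare.sh`.

```python
from typing import List


def stages_to_run(stage: int, stop_stage: int, num_stages: int) -> List[int]:
    """Return the stage numbers selected by --stage and --stop-stage.

    Both bounds are treated as inclusive, matching the examples in the text.
    """
    return [s for s in range(num_stages) if stage <= s <= stop_stage]


# Hypothetical script with 6 stages, numbered 0..5.
print(stages_to_run(0, 0, 6))  # [0]
print(stages_to_run(2, 5, 6))  # [2, 3, 4, 5]
```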
+.. HINT::
+
+  If you have pre-downloaded the `LibriSpeech <https://www.openslr.org/12>`_
+  dataset and the `musan <http://www.openslr.org/17/>`_ dataset, say,
+  they are saved in ``/tmp/LibriSpeech`` and ``/tmp/musan``, you can modify
+  the ``dl_dir`` variable in ``./prepare.sh`` to point to ``/tmp`` so that
+  ``./prepare.sh`` won't re-download them.
+
+.. NOTE::
+
+  All files generated by ``./prepare.sh``, e.g., features, lexicon, etc.,
+  are saved in the ``./data`` directory.
+
+
+Training
+--------
+
+Configurable options
+~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/train.py --help
+
+shows you the training options that can be passed from the command line.
+The following options are used quite often:
+
+  - ``--full-libri``
+
+    If it's True, the training part uses all the training data, i.e.,
+    960 hours. Otherwise, the training part uses only the subset
+    ``train-clean-100``, which has 100 hours of training data.
+
+    .. CAUTION::
+
+      The training set is perturbed by speed with two factors: 0.9 and 1.1.
+      If ``--full-libri`` is True, each epoch actually processes
+      ``3x960 == 2880`` hours of data.
+
+  - ``--num-epochs``
+
+    It is the number of epochs to train. For instance,
+    ``./conformer_ctc/train.py --num-epochs 30`` trains for 30 epochs
+    and generates ``epoch-0.pt``, ``epoch-1.pt``, ..., ``epoch-29.pt``
+    in the folder ``./conformer_ctc/exp``.
+
+  - ``--start-epoch``
+
+    It's used to resume training.
+    ``./conformer_ctc/train.py --start-epoch 10`` loads the
+    checkpoint ``./conformer_ctc/exp/epoch-9.pt`` and starts
+    training from epoch 10, based on the state from epoch 9.
+
+  - ``--world-size``
+
+    It is used for multi-GPU single-machine DDP training.
+
+      - (a) If it is 1, then no DDP training is used.
+
+      - (b) If it is 2, then GPU 0 and GPU 1 are used for DDP training.
+
+    The following shows some use cases with it.
+
+      **Use case 1**: You have 4 GPUs, but you only want to use GPU 0 and
+      GPU 2 for training. You can do the following:
+
+        .. code-block:: bash
+
+          $ cd egs/librispeech/ASR
+          $ export CUDA_VISIBLE_DEVICES="0,2"
+          $ ./conformer_ctc/train.py --world-size 2
+
+      **Use case 2**: You have 4 GPUs and you want to use all of them
+      for training. You can do the following:
+
+        .. code-block:: bash
+
+          $ cd egs/librispeech/ASR
+          $ ./conformer_ctc/train.py --world-size 4
+
+      **Use case 3**: You have 4 GPUs but you only want to use GPU 3
+      for training. You can do the following:
+
+        .. code-block:: bash
+
+          $ cd egs/librispeech/ASR
+          $ export CUDA_VISIBLE_DEVICES="3"
+          $ ./conformer_ctc/train.py --world-size 1
+
+    .. CAUTION::
+
+      Only multi-GPU single-machine DDP training is implemented at present.
+      Multi-GPU multi-machine DDP training will be added later.
+
+  - ``--max-duration``
+
+    It specifies the number of seconds over all utterances in a
+    batch, before **padding**.
+    If you encounter CUDA OOM, please reduce it. For instance, if
+    you are using an NVIDIA V100 GPU, we recommend setting it to ``200``.
+
+    .. HINT::
+
+      Due to padding, the number of seconds of all utterances in a
+      batch will usually be larger than ``--max-duration``.
+
+      A larger value for ``--max-duration`` may cause OOM during training,
+      while a smaller value may increase the training time. You have to
+      tune it.
+
+
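To see why a batch can hold more seconds than `--max-duration` once it is padded, the sketch below compares the two totals for one batch. It assumes every utterance is padded to the length of the longest one in the batch; `batch_seconds` and the durations are hypothetical and are not part of `train.py`.

```python
from typing import List, Tuple


def batch_seconds(durations: List[float]) -> Tuple[float, float]:
    """Return (seconds before padding, seconds after padding) for one batch."""
    before = sum(durations)
    # Assumption: each utterance is padded to the longest one in the batch.
    after = len(durations) * max(durations)
    return before, after


# Hypothetical batch of four utterances, durations in seconds.
before, after = batch_seconds([12.0, 15.5, 9.0, 16.0])
print(before, after)  # 52.5 64.0
```

Here the batch fits within a limit of 60 seconds before padding, yet the padded batch occupies 64 seconds.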
+Pre-configured options
+~~~~~~~~~~~~~~~~~~~~~~
+
+There are some training options, e.g., learning rate,
+number of warmup steps, results dir, etc,
+that are not passed from the command line.
+They are pre-configured by the function ``get_params()`` in
+`conformer_ctc/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/conformer_ctc/train.py>`_.
+
+You don't need to change these pre-configured parameters. If you really need to change
+them, please modify ``./conformer_ctc/train.py`` directly.
+
+
+Training logs
+~~~~~~~~~~~~~
+
+Training logs and checkpoints are saved in ``conformer_ctc/exp``.
+You will find the following files in that directory:
+
+  - ``epoch-0.pt``, ``epoch-1.pt``, ...
+
+    These are checkpoint files, containing model ``state_dict`` and optimizer ``state_dict``.
+    To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
+
+      .. code-block:: bash
+
+        $ ./conformer_ctc/train.py --start-epoch 11
+
+  - ``tensorboard/``
+
+    This folder contains TensorBoard logs. Training loss, validation loss, learning
+    rate, etc, are recorded in these logs. You can visualize them by:
+
+      .. code-block:: bash
+
+        $ cd conformer_ctc/exp/tensorboard
+        $ tensorboard dev upload --logdir . --description "Conformer CTC training for LibriSpeech with icefall"
+
+    It will print something like below:
+
+      .. code-block::
+
+        TensorFlow installation not found - running with reduced feature set.
+        Upload started and will continue reading any new data as it's added to the logdir.
+
+        To stop uploading, press Ctrl-C.
+
+        New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/
+
+        [2021-08-24T16:42:43] Started scanning logdir.
+        Uploading 4540 scalars...
+
+    Note that there is a URL in the above output. Click it and you will see
+    the following screenshot:
+
+      .. figure:: images/librispeech-conformer-ctc-tensorboard-log.png
+         :width: 600
+         :alt: TensorBoard screenshot
+         :align: center
+         :target: https://tensorboard.dev/experiment/lzGnETjwRxC3yghNMd4kPw/
+
+         TensorBoard screenshot.
+
+  - ``log/log-train-xxxx``
+
+    It is the detailed training log in text format, same as the one
+    you saw printed to the console during training.
+
+Usage examples
+~~~~~~~~~~~~~~
+
+The following shows typical use cases:
+
+**Case 1**
+^^^^^^^^^^
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/train.py --max-duration 200 --full-libri 0
+
+It uses ``--max-duration`` of 200 to avoid OOM.  Also, it uses only
+a subset of the LibriSpeech data for training.
+
+
+**Case 2**
+^^^^^^^^^^
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ export CUDA_VISIBLE_DEVICES="0,3"
+  $ ./conformer_ctc/train.py --world-size 2
+
+It uses GPU 0 and GPU 3 for DDP training.
+
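The interaction between `CUDA_VISIBLE_DEVICES` and `--world-size` can be sketched as follows. The sketch assumes that rank `i` uses the `i`-th visible device, which is how CUDA renumbers devices; `physical_gpus` is a hypothetical helper, not an icefall function.

```python
from typing import List


def physical_gpus(cuda_visible_devices: str, world_size: int) -> List[int]:
    """Map DDP ranks 0..world_size-1 to physical GPU IDs."""
    visible = [int(i) for i in cuda_visible_devices.split(",") if i.strip()]
    if world_size > len(visible):
        raise ValueError("world size exceeds the number of visible GPUs")
    # Assumption: rank i runs on the i-th visible device.
    return visible[:world_size]


print(physical_gpus("0,3", 2))  # [0, 3]
print(physical_gpus("3", 1))    # [3]
```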
+**Case 3**
+^^^^^^^^^^
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/train.py --num-epochs 10 --start-epoch 3
+
+It loads checkpoint ``./conformer_ctc/exp/epoch-2.pt`` and starts
+training from epoch 3. Also, it trains for 10 epochs.
+
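The checkpoint naming described for `--num-epochs` and `--start-epoch` can be summarized in a small sketch. The two helper names are hypothetical; only the `epoch-N.pt` naming and the "load the previous epoch" rule come from the text.

```python
from typing import List, Optional


def checkpoint_to_load(start_epoch: int) -> Optional[str]:
    """Checkpoint that --start-epoch resumes from, or None for a fresh run."""
    if start_epoch <= 0:
        return None
    return f"epoch-{start_epoch - 1}.pt"


def checkpoints_from_scratch(num_epochs: int) -> List[str]:
    """Checkpoints written by a fresh run with --num-epochs (0-based)."""
    return [f"epoch-{e}.pt" for e in range(num_epochs)]


print(checkpoint_to_load(3))                # epoch-2.pt
print(checkpoints_from_scratch(30)[-1])     # epoch-29.pt
```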
+Decoding
+--------
+
+The decoding part uses checkpoints saved by the training part, so you have
+to run the training part first.
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/decode.py --help
+
+shows the options for decoding.
+
+The commonly used options are:
+
+  - ``--method``
+
+    This specifies the decoding method.
+
+    The following command uses attention decoder for rescoring:
+
+    .. code-block::
+
+      $ cd egs/librispeech/ASR
+      $ ./conformer_ctc/decode.py --method attention-decoder --max-duration 30 --lattice-score-scale 0.5
+
+  - ``--lattice-score-scale``
+
+    It is used to scale down lattice scores so that we can get more unique
+    paths for rescoring.
+
+  - ``--max-duration``
+
+    It has the same meaning as the one during training. A larger
+    value may cause OOM.
+
+Pre-trained Model
+-----------------
+
+We have uploaded the pre-trained model to
+`<https://huggingface.co/pkufool/icefall_asr_librispeech_conformer_ctc>`_.
+
+We describe how to use the pre-trained model to transcribe a sound file or
+multiple sound files in the following.
+
+Install kaldifeat
+~~~~~~~~~~~~~~~~~
+
+`kaldifeat <https://github.com/csukuangfj/kaldifeat>`_ is used to
+extract features for a single sound file or multiple sound files
+at the same time.
+
+Please refer to `<https://github.com/csukuangfj/kaldifeat>`_ for installation.
+
+Download the pre-trained model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following commands describe how to download the pre-trained model:
+
+.. code-block::
+
+  $ cd egs/librispeech/ASR
+  $ mkdir tmp
+  $ cd tmp
+  $ git lfs install
+  $ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_conformer_ctc
+
+.. CAUTION::
+
+  You have to use ``git lfs`` to download the pre-trained model.
+
+After downloading, you will have the following files:
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ tree tmp
+
+.. code-block:: bash
+
+  tmp
+  `-- icefall_asr_librispeech_conformer_ctc
+      |-- README.md
+      |-- data
+      |   |-- lang_bpe
+      |   |   |-- HLG.pt
+      |   |   |-- bpe.model
+      |   |   |-- tokens.txt
+      |   |   `-- words.txt
+      |   `-- lm
+      |       `-- G_4_gram.pt
+      |-- exp
+      |   `-- pretraind.pt
+      `-- test_wavs
+          |-- 1089-134686-0001.flac
+          |-- 1221-135766-0001.flac
+          |-- 1221-135766-0002.flac
+          `-- trans.txt
+
+  6 directories, 11 files
+
+**File descriptions**:
+
+  - ``data/lang_bpe/HLG.pt``
+
+      It is the decoding graph.
+
+  - ``data/lang_bpe/bpe.model``
+
+      It is a sentencepiece model. You can use it to reproduce our results.
+
+  - ``data/lang_bpe/tokens.txt``
+
+      It contains tokens and their IDs, generated from ``bpe.model``.
+      Provided only for convenience so that you can look up the SOS/EOS ID easily.
+
+  - ``data/lang_bpe/words.txt``
+
+      It contains words and their IDs.
+
+  - ``data/lm/G_4_gram.pt``
+
+      It is a 4-gram LM, useful for LM rescoring.
+
+  - ``exp/pretraind.pt``
+
+      It contains pre-trained model parameters, obtained by averaging
+      checkpoints from ``epoch-15.pt`` to ``epoch-34.pt``.
+      Note: We have removed optimizer ``state_dict`` to reduce file size.
+
+  - ``test_wavs/*.flac``
+
+      It contains some test sound files from the LibriSpeech ``test-clean`` dataset.
+
+  - ``test_wavs/trans.txt``
+
+      It contains the reference transcripts for the sound files in ``test_wavs/``.
+
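The averaging behind the pre-trained parameters can be illustrated with a toy example. The sketch below assumes "averaging" means an element-wise mean over parameters and uses plain lists of floats in place of tensors; `average_checkpoints` is a hypothetical helper, not the icefall implementation.

```python
from typing import Dict, List


def average_checkpoints(
    state_dicts: List[Dict[str, List[float]]]
) -> Dict[str, List[float]]:
    """Element-wise mean of parameters that share the same names."""
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # Group the i-th element of this parameter across all checkpoints.
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# epoch-15.pt to epoch-34.pt inclusive is 20 checkpoints.
epochs = list(range(15, 35))
toy = [{"w": [float(e), 1.0]} for e in epochs]  # made-up parameter values
print(len(epochs), average_checkpoints(toy))  # 20 {'w': [24.5, 1.0]}
```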
+The information of the test sound files is listed below:
+
+.. code-block:: bash
+
+  $ soxi tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/*.flac
+
+  Input File     : 'tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac'
+  Channels       : 1
+  Sample Rate    : 16000
+  Precision      : 16-bit
+  Duration       : 00:00:06.62 = 106000 samples ~ 496.875 CDDA sectors
+  File Size      : 116k
+  Bit Rate       : 140k
+  Sample Encoding: 16-bit FLAC
+
+  Input File     : 'tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac'
+  Channels       : 1
+  Sample Rate    : 16000
+  Precision      : 16-bit
+  Duration       : 00:00:16.71 = 267440 samples ~ 1253.62 CDDA sectors
+  File Size      : 343k
+  Bit Rate       : 164k
+  Sample Encoding: 16-bit FLAC
+
+  Input File     : 'tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac'
+  Channels       : 1
+  Sample Rate    : 16000
+  Precision      : 16-bit
+  Duration       : 00:00:04.83 = 77200 samples ~ 361.875 CDDA sectors
+  File Size      : 105k
+  Bit Rate       : 174k
+  Sample Encoding: 16-bit FLAC
+
+  Total Duration of 3 files: 00:00:28.16
+
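The durations reported by `soxi` follow from the sample counts and the 16 kHz sample rate, and the CDDA sector figures follow from 75 sectors per second of audio. A short check using the numbers from the output above:

```python
SAMPLE_RATE = 16000            # Hz, from the soxi output
CDDA_SECTORS_PER_SECOND = 75   # CD audio uses 75 sectors per second

# Sample counts copied from the soxi output.
num_samples = {
    "1089-134686-0001.flac": 106000,
    "1221-135766-0001.flac": 267440,
    "1221-135766-0002.flac": 77200,
}

for name, n in num_samples.items():
    seconds = n / SAMPLE_RATE
    sectors = seconds * CDDA_SECTORS_PER_SECOND
    print(f"{name}: {seconds:.3f} s, {sectors:.3f} CDDA sectors")

total_seconds = sum(num_samples.values()) / SAMPLE_RATE
print(f"total: {total_seconds:.3f} s")
```

The total of 28.165 s agrees with the `00:00:28.16` that `soxi` reports.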
+Usage
+~~~~~
+
+.. code-block::
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/pretrained.py --help
+
+displays the help information.
+
+It supports three decoding methods:
+
+  - HLG decoding
+  - HLG + n-gram LM rescoring
+  - HLG + n-gram LM rescoring + attention decoder rescoring
+
+HLG decoding
+^^^^^^^^^^^^
+
+HLG decoding uses the best path of the decoding lattice as the decoding result.
+
+The command to run HLG decoding is:
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/pretrained.py \
+    --checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretraind.pt \
+    --words-file ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/words.txt \
+    --HLG ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac
+
+The output is given below:
+
+.. code-block::
+
+  2021-08-20 11:03:05,712 INFO [pretrained.py:217] device: cuda:0
+  2021-08-20 11:03:05,712 INFO [pretrained.py:219] Creating model
+  2021-08-20 11:03:11,345 INFO [pretrained.py:238] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
+  2021-08-20 11:03:18,442 INFO [pretrained.py:255] Constructing Fbank computer
+  2021-08-20 11:03:18,444 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
+  2021-08-20 11:03:18,507 INFO [pretrained.py:271] Decoding started
+  2021-08-20 11:03:18,795 INFO [pretrained.py:300] Use HLG decoding
+  2021-08-20 11:03:19,149 INFO [pretrained.py:339]
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac:
+  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac:
+  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED
+  BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac:
+  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+
+  2021-08-20 11:03:19,149 INFO [pretrained.py:341] Decoding Done
+
+HLG decoding + LM rescoring
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+It uses an n-gram LM to rescore the decoding lattice and the best
+path of the rescored lattice is the decoding result.
+
+The command to run HLG decoding + LM rescoring is:
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/pretrained.py \
+    --checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretraind.pt \
+    --words-file ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/words.txt \
+    --HLG ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt \
+    --method whole-lattice-rescoring \
+    --G ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt \
+    --ngram-lm-scale 0.8 \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac
+
+Its output is:
+
+.. code-block::
+
+  2021-08-20 11:12:17,565 INFO [pretrained.py:217] device: cuda:0
+  2021-08-20 11:12:17,565 INFO [pretrained.py:219] Creating model
+  2021-08-20 11:12:23,728 INFO [pretrained.py:238] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
+  2021-08-20 11:12:30,035 INFO [pretrained.py:246] Loading G from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt
+  2021-08-20 11:13:10,779 INFO [pretrained.py:255] Constructing Fbank computer
+  2021-08-20 11:13:10,787 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
+  2021-08-20 11:13:10,798 INFO [pretrained.py:271] Decoding started
+  2021-08-20 11:13:11,085 INFO [pretrained.py:305] Use HLG decoding + LM rescoring
+  2021-08-20 11:13:11,736 INFO [pretrained.py:339]
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac:
+  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac:
+  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED
+  BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac:
+  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+
+  2021-08-20 11:13:11,737 INFO [pretrained.py:341] Decoding Done
+
+HLG decoding + LM rescoring + attention decoder rescoring
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+It uses an n-gram LM to rescore the decoding lattice, extracts
+n paths from the rescored lattice, and rescores the extracted paths with
+an attention decoder. The path with the highest score is the decoding result.
+
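The text does not give the formula used to combine the scores. One plausible reading, shown here only as an assumption, is a weighted sum of the acoustic, n-gram LM, and attention decoder scores of each path, with the weights playing the role of `--ngram-lm-scale` and `--attention-decoder-scale`. The class, the function, and all score values below are hypothetical.

```python
from typing import List, NamedTuple


class Path(NamedTuple):
    text: str
    am_score: float    # acoustic score (log domain)
    lm_score: float    # n-gram LM score (log domain)
    att_score: float   # attention decoder score (log domain)


def best_path(paths: List[Path], ngram_lm_scale: float,
              attention_scale: float) -> Path:
    """Pick the path with the highest combined score.

    Assumption: the combined score is a weighted sum of the three scores.
    """
    def total(p: Path) -> float:
        return (p.am_score
                + ngram_lm_scale * p.lm_score
                + attention_scale * p.att_score)
    return max(paths, key=total)


# Two made-up candidate paths.
candidates = [
    Path("HYPOTHESIS A", am_score=-10.0, lm_score=-4.0, att_score=-3.0),
    Path("HYPOTHESIS B", am_score=-9.5, lm_score=-6.0, att_score=-5.0),
]
print(best_path(candidates, ngram_lm_scale=1.3, attention_scale=1.2).text)
```

With both scales set to 0 the acoustic score alone decides and the second candidate wins, which shows how the two scales shift the ranking.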
+The command to run HLG decoding + LM rescoring + attention decoder rescoring is:
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/pretrained.py \
+    --checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretraind.pt \
+    --words-file ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/words.txt \
+    --HLG ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt \
+    --method attention-decoder \
+    --G ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt \
+    --ngram-lm-scale 1.3 \
+    --attention-decoder-scale 1.2 \
+    --lattice-score-scale 0.5 \
+    --num-paths 100 \
+    --sos-id 1 \
+    --eos-id 1 \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac
+
+The output is below:
+
+.. code-block::
+
+  2021-08-20 11:19:11,397 INFO [pretrained.py:217] device: cuda:0
+  2021-08-20 11:19:11,397 INFO [pretrained.py:219] Creating model
+  2021-08-20 11:19:17,354 INFO [pretrained.py:238] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
+  2021-08-20 11:19:24,615 INFO [pretrained.py:246] Loading G from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt
+  2021-08-20 11:20:04,576 INFO [pretrained.py:255] Constructing Fbank computer
+  2021-08-20 11:20:04,584 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
+  2021-08-20 11:20:04,595 INFO [pretrained.py:271] Decoding started
+  2021-08-20 11:20:04,854 INFO [pretrained.py:313] Use HLG + LM rescoring + attention decoder rescoring
+  2021-08-20 11:20:05,805 INFO [pretrained.py:339]
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac:
+  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac:
+  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED
+  BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac:
+  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+
+  2021-08-20 11:20:05,805 INFO [pretrained.py:341] Decoding Done
+
+Colab notebook
+--------------
+
+We provide a Colab notebook for this recipe showing how to use a pre-trained model.
+
+|librispeech asr conformer ctc colab notebook|
+
+.. |librispeech asr conformer ctc colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
+   :target: https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing
+
+.. HINT::
+
+  Due to limited memory provided by Colab, you have to upgrade to Colab Pro to
+  run ``HLG decoding + LM rescoring`` and
+  ``HLG decoding + LM rescoring + attention decoder rescoring``.
+  Otherwise, you can only run ``HLG decoding`` with Colab.
+
+**Congratulations!** You have finished the LibriSpeech ASR recipe with
+Conformer CTC models in ``icefall``.
--- a/docs/source/recipes/librispeech/images/librispeech-conformer-ctc-tensorboard-log.png
+++ b/docs/source/recipes/librispeech/images/librispeech-conformer-ctc-tensorboard-log.png
--- a/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst
+++ b/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst
@ -0,0 +1,2 @@
+TDNN LSTM CTC
+=============
--- a/docs/source/recipes/yesno.rst
+++ b/docs/source/recipes/yesno.rst
@ -1,7 +1,7 @@
 yesno
 =====

-This page shows you how to run the ``yesno`` recipe. It contains:
+This page shows you how to run the `yesno <https://www.openslr.org/1>`_ recipe. It contains:

  - (1) Prepare data for training
  - (2) Train a TDNN model
--- a/egs/librispeech/ASR/conformer_ctc/README.md
+++ b/egs/librispeech/ASR/conformer_ctc/README.md
@ -1,351 +1,4 @@

-# How to use a pre-trained model to transcribe a sound file or multiple sound files
-
-(See the bottom of this document for the link to a colab notebook.)
-
-You need to prepare 4 files:
-
-  - a model checkpoint file, e.g., epoch-20.pt
-  - HLG.pt, the decoding graph
-  - words.txt, the word symbol table
-  - a sound file, whose sampling rate has to be 16 kHz.
-    Supported formats are those supported by `torchaudio.load()`,
-    e.g., wav and flac.
-
-Also, you need to install `kaldifeat`. Please refer to
-<https://github.com/csukuangfj/kaldifeat> for installation.
-
-```bash
-./conformer_ctc/pretrained.py --help
-```
-
-displays the help information.
-
-## HLG decoding
-
-Once you have the above files ready and have `kaldifeat` installed,
-you can run:
-
-```bash
-./conformer_ctc/pretrained.py \
-  --checkpoint /path/to/your/checkpoint.pt \
-  --words-file /path/to/words.txt \
-  --HLG /path/to/HLG.pt \
-  /path/to/your/sound.wav
-```
-
-and you will see the transcribed result.
-
-If you want to transcribe multiple files at the same time, you can use:
-
-```bash
-./conformer_ctc/pretrained.py \
-  --checkpoint /path/to/your/checkpoint.pt \
-  --words-file /path/to/words.txt \
-  --HLG /path/to/HLG.pt \
-  /path/to/your/sound1.wav \
-  /path/to/your/sound2.wav \
-  /path/to/your/sound3.wav
-```
-
-**Note**: This is the fastest decoding method.
-
-## HLG decoding + LM rescoring
-
-`./conformer_ctc/pretrained.py` also supports `whole lattice LM rescoring`
-and `attention decoder rescoring`.
-
-To use whole lattice LM rescoring, you also need the following files:
-
-  - G.pt, e.g., `data/lm/G_4_gram.pt` if you have run `./prepare.sh`
-
-The command to run decoding with LM rescoring is:
-
-```bash
-./conformer_ctc/pretrained.py \
-  --checkpoint /path/to/your/checkpoint.pt \
-  --words-file /path/to/words.txt \
-  --HLG /path/to/HLG.pt \
-  --method whole-lattice-rescoring \
-  --G data/lm/G_4_gram.pt \
-  --ngram-lm-scale 0.8 \
-  /path/to/your/sound1.wav \
-  /path/to/your/sound2.wav \
-  /path/to/your/sound3.wav
-```
-
-## HLG Decoding + LM rescoring + attention decoder rescoring
-
-To use attention decoder for rescoring, you need the following extra information:
-
-  - sos token ID
-  - eos token ID
-
-The command to run decoding with attention decoder rescoring is:
-
-```bash
-./conformer_ctc/pretrained.py \
-  --checkpoint /path/to/your/checkpoint.pt \
-  --words-file /path/to/words.txt \
-  --HLG /path/to/HLG.pt \
-  --method attention-decoder \
-  --G data/lm/G_4_gram.pt \
-  --ngram-lm-scale 1.3 \
-  --attention-decoder-scale 1.2 \
-  --lattice-score-scale 0.5 \
-  --num-paths 100 \
-  --sos-id 1 \
-  --eos-id 1 \
-  /path/to/your/sound1.wav \
-  /path/to/your/sound2.wav \
-  /path/to/your/sound3.wav
-```
-
-# Decoding with a pre-trained model in action
-
-We have uploaded a pre-trained model to <https://huggingface.co/pkufool/conformer_ctc>
-
-The following shows the steps about the usage of the provided pre-trained model.
-
-### (1) Download the pre-trained model
-
-```bash
-sudo apt-get install git-lfs
-cd /path/to/icefall/egs/librispeech/ASR
-git lfs install
-mkdir tmp
-cd tmp
-git clone https://huggingface.co/pkufool/conformer_ctc
-```
-
-**CAUTION**: You have to install `git-lfst` to download the pre-trained model.
-
-You will find the following files:
-
-```
-tmp
-`-- conformer_ctc
-    |-- README.md
-    |-- data
-    |   |-- lang_bpe
-    |   |   |-- HLG.pt
-    |   |   |-- bpe.model
-    |   |   |-- tokens.txt
-    |   |   `-- words.txt
-    |   `-- lm
-    |       `-- G_4_gram.pt
-    |-- exp
-    |   `-- pretraind.pt
-    `-- test_wavs
-        |-- 1089-134686-0001.flac
-        |-- 1221-135766-0001.flac
-        |-- 1221-135766-0002.flac
-        `-- trans.txt
-
-6 directories, 11 files
-```
-
-**File descriptions**:
-
-  - `data/lang_bpe/HLG.pt`
-
-      It is the decoding graph.
-
-  - `data/lang_bpe/bpe.model`
-
-      It is a sentencepiece model. You can use it to reproduce our results.
-
-  - `data/lang_bpe/tokens.txt`
-
-      It contains tokens and their IDs, generated from `bpe.model`.
-      Provided only for convienice so that you can look up the SOS/EOS ID easily.
-
-  - `data/lang_bpe/words.txt`
-
-      It contains words and their IDs.
-
-  - `data/lm/G_4_gram.pt`
-
-      It is a 4-gram LM, useful for LM rescoring.
-
-  - `exp/pretrained.pt`
-
-      It contains pre-trained model parameters, obtained by averaging
-      checkpoints from `epoch-15.pt` to `epoch-34.pt`.
-      Note: We have removed optimizer `state_dict` to reduce file size.
-
-  - `test_waves/*.flac`
-
-      It contains some test sound files from LibriSpeech `test-clean` dataset.
-
-  - `test_waves/trans.txt`
-
-      It contains the reference transcripts for the sound files in `test_waves/`.
-
-The information of the test sound files is listed below:
-
-```
-$ soxi tmp/conformer_ctc/test_wavs/*.flac
-
-Input File     : 'tmp/conformer_ctc/test_wavs/1089-134686-0001.flac'
-Channels       : 1
-Sample Rate    : 16000
-Precision      : 16-bit
-Duration       : 00:00:06.62 = 106000 samples ~ 496.875 CDDA sectors
-File Size      : 116k
-Bit Rate       : 140k
-Sample Encoding: 16-bit FLAC
-
-Input File     : 'tmp/conformer_ctc/test_wavs/1221-135766-0001.flac'
-Channels       : 1
-Sample Rate    : 16000
-Precision      : 16-bit
-Duration       : 00:00:16.71 = 267440 samples ~ 1253.62 CDDA sectors
-File Size      : 343k
-Bit Rate       : 164k
-Sample Encoding: 16-bit FLAC
-
-Input File     : 'tmp/conformer_ctc/test_wavs/1221-135766-0002.flac'
-Channels       : 1
-Sample Rate    : 16000
-Precision      : 16-bit
-Duration       : 00:00:04.83 = 77200 samples ~ 361.875 CDDA sectors
-File Size      : 105k
-Bit Rate       : 174k
-Sample Encoding: 16-bit FLAC
-
-Total Duration of 3 files: 00:00:28.16
-```
-
-### (2) Use HLG decoding
-
-```bash
-cd /path/to/icefall/egs/librispeech/ASR
-
-./conformer_ctc/pretrained.py \
-  --checkpoint ./tmp/conformer_ctc/exp/pretraind.pt \
-  --words-file ./tmp/conformer_ctc/data/lang_bpe/words.txt \
-  --HLG ./tmp/conformer_ctc/data/lang_bpe/HLG.pt \
-  ./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac \
-  ./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac \
-  ./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac
-```
-
-The output is given below:
-
-```
-2021-08-20 11:03:05,712 INFO [pretrained.py:217] device: cuda:0
-2021-08-20 11:03:05,712 INFO [pretrained.py:219] Creating model
-2021-08-20 11:03:11,345 INFO [pretrained.py:238] Loading HLG from ./tmp/conformer_ctc/data/lang_bpe/HLG.pt
-2021-08-20 11:03:18,442 INFO [pretrained.py:255] Constructing Fbank computer
-2021-08-20 11:03:18,444 INFO [pretrained.py:265] Reading sound files: ['./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0002.flac']
-2021-08-20 11:03:18,507 INFO [pretrained.py:271] Decoding started
-2021-08-20 11:03:18,795 INFO [pretrained.py:300] Use HLG decoding
-2021-08-20 11:03:19,149 INFO [pretrained.py:339]
-./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac:
-AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
-
-./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac:
-GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED
-BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
-
-./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac:
-YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
-
-
-2021-08-20 11:03:19,149 INFO [pretrained.py:341] Decoding Done
-```
-
-### (3) Use HLG decoding + LM rescoring
-
-```bash
-./conformer_ctc/pretrained.py \
-  --checkpoint ./tmp/conformer_ctc/exp/pretraind.pt \
-  --words-file ./tmp/conformer_ctc/data/lang_bpe/words.txt \
-  --HLG ./tmp/conformer_ctc/data/lang_bpe/HLG.pt \
-  --method whole-lattice-rescoring \
-  --G ./tmp/conformer_ctc/data/lm/G_4_gram.pt \
-  --ngram-lm-scale 0.8 \
-  ./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac \
-  ./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac \
-  ./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac
-```
-
-The output is:
-
-```
-2021-08-20 11:12:17,565 INFO [pretrained.py:217] device: cuda:0
-2021-08-20 11:12:17,565 INFO [pretrained.py:219] Creating model
-2021-08-20 11:12:23,728 INFO [pretrained.py:238] Loading HLG from ./tmp/conformer_ctc/data/lang_bpe/HLG.pt
-2021-08-20 11:12:30,035 INFO [pretrained.py:246] Loading G from ./tmp/conformer_ctc/data/lm/G_4_gram.pt
-2021-08-20 11:13:10,779 INFO [pretrained.py:255] Constructing Fbank computer
-2021-08-20 11:13:10,787 INFO [pretrained.py:265] Reading sound files: ['./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0002.flac']
-2021-08-20 11:13:10,798 INFO [pretrained.py:271] Decoding started
-2021-08-20 11:13:11,085 INFO [pretrained.py:305] Use HLG decoding + LM rescoring
-2021-08-20 11:13:11,736 INFO [pretrained.py:339]
-./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac:
-AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
-
-./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac:
-GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED
-BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
-
-./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac:
-YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
-
-
-2021-08-20 11:13:11,737 INFO [pretrained.py:341] Decoding Done
-```
-
-### (4) Use HLG decoding + LM rescoring + attention decoder rescoring
-
-```bash
-./conformer_ctc/pretrained.py \
-  --checkpoint ./tmp/conformer_ctc/exp/pretraind.pt \
-  --words-file ./tmp/conformer_ctc/data/lang_bpe/words.txt \
-  --HLG ./tmp/conformer_ctc/data/lang_bpe/HLG.pt \
-  --method attention-decoder \
-  --G ./tmp/conformer_ctc/data/lm/G_4_gram.pt \
-  --ngram-lm-scale 1.3 \
-  --attention-decoder-scale 1.2 \
-  --lattice-score-scale 0.5 \
-  --num-paths 100 \
-  --sos-id 1 \
-  --eos-id 1 \
-  ./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac \
-  ./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac \
-  ./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac
-```
-
-The output is:
-
-```
-2021-08-20 11:19:11,397 INFO [pretrained.py:217] device: cuda:0
-2021-08-20 11:19:11,397 INFO [pretrained.py:219] Creating model
-2021-08-20 11:19:17,354 INFO [pretrained.py:238] Loading HLG from ./tmp/conformer_ctc/data/lang_bpe/HLG.pt
-2021-08-20 11:19:24,615 INFO [pretrained.py:246] Loading G from ./tmp/conformer_ctc/data/lm/G_4_gram.pt
-2021-08-20 11:20:04,576 INFO [pretrained.py:255] Constructing Fbank computer
-2021-08-20 11:20:04,584 INFO [pretrained.py:265] Reading sound files: ['./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/conformer_ctc/test_wavs/1221-135766-0002.flac']
-2021-08-20 11:20:04,595 INFO [pretrained.py:271] Decoding started
-2021-08-20 11:20:04,854 INFO [pretrained.py:313] Use HLG + LM rescoring + attention decoder rescoring
-2021-08-20 11:20:05,805 INFO [pretrained.py:339]
-./tmp/conformer_ctc/test_wavs/1089-134686-0001.flac:
-AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
-
-./tmp/conformer_ctc/test_wavs/1221-135766-0001.flac:
-GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED
-BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
-
-./tmp/conformer_ctc/test_wavs/1221-135766-0002.flac:
-YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
-
-
-2021-08-20 11:20:05,805 INFO [pretrained.py:341] Decoding Done
-```
-
-**NOTE**: We provide a colab notebook for demonstration.
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
-
-Due to limited memory provided by Colab, you have to upgrade to Colab Pro to
-run `HLG decoding + LM rescoring` and `HLG decoding + LM rescoring + attention decoder rescoring`.
-Otherwise, you can only run `HLG decoding` with Colab.
+Please visit
+<https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html>
+for how to run this recipe.
--- a/egs/librispeech/ASR/conformer_ctc/decode.py
+++ b/egs/librispeech/ASR/conformer_ctc/decode.py
@ -57,28 +57,63 @@ def get_parser():
    parser.add_argument(
        "--epoch",
        type=int,
-        default=9,
+        default=34,
        help="It specifies the checkpoint to use for decoding. "
        "Note: Epoch counts from 0.",
    )
    parser.add_argument(
        "--avg",
        type=int,
-        default=1,
+        default=20,
        help="Number of checkpoints to average. Automatically select "
        "consecutive checkpoints before the checkpoint specified by "
        "'--epoch'. ",
    )

+    parser.add_argument(
+        "--method",
+        type=str,
+        default="attention-decoder",
+        help="""Decoding method.
+        Supported values are:
+            - (1) 1best. Extract the best path from the decoding lattice as the
+              decoding result.
+            - (2) nbest. Extract n paths from the decoding lattice; the path with
+              the highest score is the decoding result.
+            - (3) nbest-rescoring. Extract n paths from the decoding lattice and
+              rescore them with an n-gram LM (e.g., a 4-gram LM); the path with
+              the highest score is the decoding result.
+            - (4) whole-lattice-rescoring. Rescore the decoding lattice with an
+              n-gram LM (e.g., a 4-gram LM); the best path of the rescored
+              lattice is the decoding result.
+            - (5) attention-decoder. Extract n paths from the LM rescored lattice,
+              the path with the highest score is the decoding result.
+            - (6) nbest-oracle. Its WER is the lower bound that any n-best
+              rescoring method can achieve. Useful for debugging n-best
+              rescoring methods.
+        """,
+    )
+
+    parser.add_argument(
+        "--num-paths",
+        type=int,
+        default=100,
+        help="""Number of paths for n-best based decoding method.
+        Used only when "method" is one of the following values:
+        nbest, nbest-rescoring, attention-decoder, and nbest-oracle
+        """,
+    )
+
    parser.add_argument(
        "--lattice-score-scale",
        type=float,
        default=1.0,
-        help="The scale to be applied to `lattice.scores`."
-        "It's needed if you use any kinds of n-best based rescoring. "
-        "Currently, it is used when the decoding method is: nbest, "
-        "nbest-rescoring, attention-decoder, and nbest-oracle. "
-        "A smaller value results in more unique paths.",
+        help="""The scale to be applied to `lattice.scores`.
+        It's needed if you use any kind of n-best based rescoring.
+        Used only when "method" is one of the following values:
+        nbest, nbest-rescoring, attention-decoder, and nbest-oracle.
+        A smaller value results in more unique paths.
+        """,
    )

    return parser
@ -104,21 +139,6 @@ def get_params() -> AttributeDict:
            "min_active_states": 30,
            "max_active_states": 10000,
            "use_double_scores": True,
-            # Possible values for method:
-            #  - 1best
-            #  - nbest
-            #  - nbest-rescoring
-            #  - whole-lattice-rescoring
-            #  - attention-decoder
-            #  - nbest-oracle
-            #  "method": "nbest",
-            #  "method": "nbest-rescoring",
-            #  "method": "whole-lattice-rescoring",
-            "method": "attention-decoder",
-            #  "method": "nbest-oracle",
-            # num_paths is used when method is "nbest", "nbest-rescoring",
-            # attention-decoder, and nbest-oracle
-            "num_paths": 100,
        }
    )
    return params
@ -129,7 +149,7 @@ def decode_one_batch(
    model: nn.Module,
    HLG: k2.Fsa,
    batch: dict,
-    lexicon: Lexicon,
+    word_table: k2.SymbolTable,
    sos_id: int,
    eos_id: int,
    G: Optional[k2.Fsa] = None,
@ -163,8 +183,8 @@ def decode_one_batch(
        It is the return value from iterating
        `lhotse.dataset.K2SpeechRecognitionDataset`. See its documentation
        for the format of the `batch`.
-      lexicon:
-        It contains word symbol table.
+      word_table:
+        The word symbol table.
      sos_id:
        The token ID of the SOS.
      eos_id:
@ -217,7 +237,7 @@ def decode_one_batch(
            lattice=lattice,
            num_paths=params.num_paths,
            ref_texts=supervisions["text"],
-            lexicon=lexicon,
+            word_table=word_table,
            scale=params.lattice_score_scale,
        )

@ -237,7 +257,7 @@ def decode_one_batch(
            key = f"no_rescore-scale-{params.lattice_score_scale}-{params.num_paths}"  # noqa

        hyps = get_texts(best_path)
-        hyps = [[lexicon.word_table[i] for i in ids] for ids in hyps]
+        hyps = [[word_table[i] for i in ids] for ids in hyps]
        return {key: hyps}

    assert params.method in [
@ -283,7 +303,7 @@ def decode_one_batch(
    ans = dict()
    for lm_scale_str, best_path in best_path_dict.items():
        hyps = get_texts(best_path)
-        hyps = [[lexicon.word_table[i] for i in ids] for ids in hyps]
+        hyps = [[word_table[i] for i in ids] for ids in hyps]
        ans[lm_scale_str] = hyps
    return ans

@ -293,7 +313,7 @@ def decode_dataset(
    params: AttributeDict,
    model: nn.Module,
    HLG: k2.Fsa,
-    lexicon: Lexicon,
+    word_table: k2.SymbolTable,
    sos_id: int,
    eos_id: int,
    G: Optional[k2.Fsa] = None,
@ -309,8 +329,8 @@ def decode_dataset(
        The neural model.
      HLG:
        The decoding graph.
-      lexicon:
-        It contains word symbol table.
+      word_table:
+        It is the word symbol table.
      sos_id:
        The token ID for SOS.
      eos_id:
@ -344,7 +364,7 @@ def decode_dataset(
            model=model,
            HLG=HLG,
            batch=batch,
-            lexicon=lexicon,
+            word_table=word_table,
            G=G,
            sos_id=sos_id,
            eos_id=eos_id,
@ -540,7 +560,7 @@ def main():
            params=params,
            model=model,
            HLG=HLG,
-            lexicon=lexicon,
+            word_table=lexicon.word_table,
            G=G,
            sos_id=sos_id,
            eos_id=eos_id,
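
The options that this diff moves from `get_params()` to the command line can be sketched as follows. This is a minimal illustration, not the actual `decode.py`: the `choices=` validation, the helper `uses_num_paths`, and the helper `checkpoints_to_average` are hypothetical additions, and the averaging window (the `--avg` checkpoints ending at `--epoch`, inclusive) is an assumption based on the `--avg` help text.

```python
import argparse
from typing import List

# Method names as listed in the removed get_params() comment.
SUPPORTED_METHODS = [
    "1best",
    "nbest",
    "nbest-rescoring",
    "whole-lattice-rescoring",
    "attention-decoder",
    "nbest-oracle",
]

# Methods that extract n paths from the lattice (see the --num-paths help text).
NBEST_METHODS = {"nbest", "nbest-rescoring", "attention-decoder", "nbest-oracle"}


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--epoch", type=int, default=34)
    parser.add_argument("--avg", type=int, default=20)
    # `choices` is an addition of this sketch, not part of the diff.
    parser.add_argument(
        "--method",
        type=str,
        default="attention-decoder",
        choices=SUPPORTED_METHODS,
    )
    parser.add_argument("--num-paths", type=int, default=100)
    parser.add_argument("--lattice-score-scale", type=float, default=1.0)
    return parser


def uses_num_paths(method: str) -> bool:
    """Hypothetical helper: do --num-paths and --lattice-score-scale apply?"""
    return method in NBEST_METHODS


def checkpoints_to_average(epoch: int, avg: int) -> List[int]:
    """Hypothetical helper. Assumes the window ends at `epoch` (inclusive)."""
    start = max(epoch - avg + 1, 0)
    return list(range(start, epoch + 1))


args = get_parser().parse_args(["--method", "nbest-rescoring", "--num-paths", "50"])
print(args.method, args.num_paths, uses_num_paths(args.method))
print(checkpoints_to_average(args.epoch, args.avg))
```

Under these assumptions, the new defaults `--epoch 34 --avg 20` would average the checkpoints of epochs 15 through 34.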
--- a/egs/librispeech/ASR/conformer_ctc/train.py
+++ b/egs/librispeech/ASR/conformer_ctc/train.py
@ -74,6 +74,23 @@ def get_parser():
        help="Should various information be logged in tensorboard.",
    )

+    parser.add_argument(
+        "--num-epochs",
+        type=int,
+        default=35,
+        help="Number of epochs to train.",
+    )
+
+    parser.add_argument(
+        "--start-epoch",
+        type=int,
+        default=0,
+        help="""Resume training from this epoch.
+        If it is positive, the checkpoint is loaded from
+        conformer_ctc/exp/epoch-{start_epoch-1}.pt
+        """,
+    )
+
    return parser
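
The resume behaviour described in the `--start-epoch` help text can be sketched as below. The helpers `checkpoint_to_resume` and `epochs_to_run` are hypothetical names for illustration; the sketch also assumes that `--num-epochs` is the total epoch count (so training runs from `start_epoch` up to `num_epochs - 1`), which the diff itself does not show.

```python
from pathlib import Path
from typing import Optional


def checkpoint_to_resume(exp_dir: Path, start_epoch: int) -> Optional[Path]:
    """Return the checkpoint to load, or None when training starts from scratch.

    Follows the help text: a positive start_epoch loads epoch-{start_epoch-1}.pt.
    """
    if start_epoch <= 0:
        return None
    return exp_dir / f"epoch-{start_epoch - 1}.pt"


def epochs_to_run(start_epoch: int, num_epochs: int) -> range:
    """Assumption: num_epochs is the total, not the number of extra epochs."""
    return range(start_epoch, num_epochs)


exp_dir = Path("conformer_ctc/exp")
print(checkpoint_to_resume(exp_dir, 0))
print(checkpoint_to_resume(exp_dir, 10))
print(list(epochs_to_run(33, 35)))
```

Since epochs count from 0, resuming with `--start-epoch 10` continues from the checkpoint saved at the end of epoch 9.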


@ -103,11 +120,6 @@ def get_params() -> AttributeDict:

        - subsampling_factor:  The subsampling factor for the model.

-        - start_epoch:  If it is not zero, load checkpoint `start_epoch-1`
-                        and continue training from that checkpoint.
-
-        - num_epochs:  Number of epochs to train.
-
        - best_train_loss: Best training loss so far. It is used to select
                           the model that has the lowest training loss. It is
                           updated during the training.
@ -143,8 +155,6 @@ def get_params() -> AttributeDict:
            "feature_dim": 80,
            "weight_decay": 1e-6,
            "subsampling_factor": 4,
-            "start_epoch": 0,
-            "num_epochs": 20,
            "best_train_loss": float("inf"),
            "best_valid_loss": float("inf"),
            "best_train_epoch": -1,
--- a/egs/librispeech/ASR/tdnn_lstm_ctc/train.py
+++ b/egs/librispeech/ASR/tdnn_lstm_ctc/train.py
@ -75,6 +75,23 @@ def get_parser():
        help="Should various information be logged in tensorboard.",
    )

+    parser.add_argument(
+        "--num-epochs",
+        type=int,
+        default=20,
+        help="Number of epochs to train.",
+    )
+
+    parser.add_argument(
+        "--start-epoch",
+        type=int,
+        default=0,
+        help="""Resume training from this epoch.
+        If it is positive, the checkpoint is loaded from
+        tdnn_lstm_ctc/exp/epoch-{start_epoch-1}.pt
+        """,
+    )
+
    return parser


@ -104,11 +121,6 @@ def get_params() -> AttributeDict:

        - subsampling_factor:  The subsampling factor for the model.

-        - start_epoch:  If it is not zero, load checkpoint `start_epoch-1`
-                        and continue training from that checkpoint.
-
-        - num_epochs:  Number of epochs to train.
-
        - best_train_loss: Best training loss so far. It is used to select
                           the model that has the lowest training loss. It is
                           updated during the training.
@ -127,6 +139,8 @@ def get_params() -> AttributeDict:

        - log_interval:  Print training loss if batch_idx % log_interval is 0

+        - reset_interval: Reset statistics if batch_idx % reset_interval is 0
+
        - valid_interval:  Run validation if batch_idx % valid_interval is 0

        - beam_size: It is used in k2.ctc_loss
@ -143,14 +157,13 @@ def get_params() -> AttributeDict:
            "feature_dim": 80,
            "weight_decay": 5e-4,
            "subsampling_factor": 3,
-            "start_epoch": 0,
-            "num_epochs": 10,
            "best_train_loss": float("inf"),
            "best_valid_loss": float("inf"),
            "best_train_epoch": -1,
            "best_valid_epoch": -1,
            "batch_idx_train": 0,
            "log_interval": 10,
+            "reset_interval": 200,
            "valid_interval": 1000,
            "beam_size": 10,
            "reduction": "sum",
@ -398,8 +411,12 @@ def train_one_epoch(
    """
    model.train()

-    tot_loss = 0.0  # sum of losses over all batches
-    tot_frames = 0.0  # sum of frames over all batches
+    tot_loss = 0.0  # reset every params.reset_interval batches
+    tot_frames = 0.0  # reset every params.reset_interval batches
+
+    params.tot_loss = 0.0
+    params.tot_frames = 0.0
+
    for batch_idx, batch in enumerate(train_dl):
        params.batch_idx_train += 1
        batch_size = len(batch["supervisions"]["text"])
@ -426,6 +443,9 @@ def train_one_epoch(
        tot_loss += loss_cpu
        tot_avg_loss = tot_loss / tot_frames

+        params.tot_frames += params.train_frames
+        params.tot_loss += loss_cpu
+
        if batch_idx % params.log_interval == 0:
            logging.info(
                f"Epoch {params.cur_epoch}, batch {batch_idx}, "
@ -433,6 +453,22 @@ def train_one_epoch(
                f"total avg loss: {tot_avg_loss:.4f}, "
                f"batch size: {batch_size}"
            )
+            if tb_writer is not None:
+                tb_writer.add_scalar(
+                    "train/current_loss",
+                    loss_cpu / params.train_frames,
+                    params.batch_idx_train,
+                )
+
+                tb_writer.add_scalar(
+                    "train/tot_avg_loss",
+                    tot_avg_loss,
+                    params.batch_idx_train,
+                )
+
+        if batch_idx > 0 and batch_idx % params.reset_interval == 0:
+            tot_loss = 0
+            tot_frames = 0

        if batch_idx > 0 and batch_idx % params.valid_interval == 0:
            compute_validation_loss(
@ -449,7 +485,7 @@ def train_one_epoch(
                f"best valid epoch: {params.best_valid_epoch}"
            )

-    params.train_loss = tot_loss / tot_frames
+    params.train_loss = params.tot_loss / params.tot_frames

    if params.train_loss < params.best_train_loss:
        params.best_train_epoch = params.cur_epoch
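
The two sets of accumulators introduced in `train_one_epoch` can be sketched in isolation: one pair is reset every `reset_interval` batches and feeds the logged running average, while the other pair spans the whole epoch and yields `params.train_loss`. The function `track_losses` and its `(loss_sum, num_frames)` input format are hypothetical, chosen only to make the example self-contained.

```python
from typing import Iterable, List, Tuple


def track_losses(
    batches: Iterable[Tuple[float, float]], reset_interval: int
) -> Tuple[List[float], float]:
    """Return (windowed averages per batch, epoch-level average).

    Each item of `batches` is a hypothetical (loss_sum, num_frames) pair.
    """
    tot_loss = 0.0  # windowed, reset every reset_interval batches
    tot_frames = 0.0
    epoch_loss = 0.0  # never reset within the epoch
    epoch_frames = 0.0
    windowed = []
    for batch_idx, (loss, frames) in enumerate(batches):
        tot_loss += loss
        tot_frames += frames
        epoch_loss += loss
        epoch_frames += frames
        windowed.append(tot_loss / tot_frames)
        # Same condition as in the diff: skip batch 0, reset on multiples.
        if batch_idx > 0 and batch_idx % reset_interval == 0:
            tot_loss = 0.0
            tot_frames = 0.0
    return windowed, epoch_loss / epoch_frames


batches = [(10.0, 10.0), (20.0, 10.0), (30.0, 10.0), (40.0, 10.0)]
windowed, epoch_avg = track_losses(batches, reset_interval=2)
print(windowed)
print(epoch_avg)
```

The windowed value tracks recent batches more closely, which is why the epoch-level loss has to be accumulated separately once the reset is introduced.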
--- a/egs/yesno/ASR/README.md
+++ b/egs/yesno/ASR/README.md
@ -1,15 +1,14 @@
 ## Yesno recipe

-You can run the recipe with **CPU**.
+This is the simplest ASR recipe in `icefall`.

-
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
-
-The above Colab notebook finishes the training using **CPU**
-within two minutes (50 epochs in total).
-
-The WER is
+It can be run on CPU and takes less than 30 seconds to
+get the following WER:

 ```
 [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
 ```
+
+Please refer to
+<https://icefall.readthedocs.io/en/latest/recipes/yesno.html>
+for detailed instructions.
--- a/egs/yesno/ASR/tdnn/README.md
+++ b/egs/yesno/ASR/tdnn/README.md
@ -0,0 +1,8 @@
+
+## How to run this recipe
+
+You can find detailed instructions by visiting
+<https://icefall.readthedocs.io/en/latest/recipes/yesno.html>
+
+It describes how to run this recipe and how to use
+a pre-trained model with `./pretrained.py`.
--- a/icefall/decode.py
+++ b/icefall/decode.py
@ -22,8 +22,6 @@ import kaldialign
 import torch
 import torch.nn as nn

-from icefall.lexicon import Lexicon
-

 def _get_random_paths(
    lattice: k2.Fsa,
@ -623,7 +621,7 @@ def nbest_oracle(
    lattice: k2.Fsa,
    num_paths: int,
    ref_texts: List[str],
-    lexicon: Lexicon,
+    word_table: k2.SymbolTable,
    scale: float = 1.0,
 ) -> Dict[str, List[List[int]]]:
    """Select the best hypothesis given a lattice and a reference transcript.
@ -644,8 +642,8 @@ def nbest_oracle(
      ref_texts:
        A list of reference transcripts. Each entry contains
        space-separated words.
-      lexicon:
-        It is used to convert word IDs to word symbols.
+      word_table:
+        It is the word symbol table.
      scale:
        It's the scale applied to the lattice.scores. A smaller value
        yields more unique paths.
@ -680,7 +678,7 @@ def nbest_oracle(
        best_hyp_words = None
        min_error = float("inf")
        for hyp_words in hyps:
-            hyp_words = [lexicon.word_table[i] for i in hyp_words]
+            hyp_words = [word_table[i] for i in hyp_words]
            this_error = kaldialign.edit_distance(ref_words, hyp_words)["total"]
            if this_error < min_error:
                min_error = this_error
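
The selection loop in `nbest_oracle` after the `lexicon` to `word_table` change can be sketched without k2 or kaldialign. In this illustration a plain `dict` stands in for `k2.SymbolTable` (only `word_table[i]` lookups are needed), and the hand-written `edit_distance` is a stand-in for `kaldialign.edit_distance(...)["total"]`; `pick_oracle` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (stand-in for kaldialign)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(
                min(
                    prev[j] + 1,  # deletion
                    cur[j - 1] + 1,  # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = cur
    return prev[-1]


def pick_oracle(
    hyps: List[List[int]], ref_text: str, word_table: Dict[int, str]
) -> Optional[List[str]]:
    """Return the hypothesis (as words) closest to the reference."""
    ref_words = ref_text.strip().split()
    best_hyp_words = None
    min_error = float("inf")
    for hyp_ids in hyps:
        # Only the ID -> symbol mapping is needed, hence word_table
        # instead of a full Lexicon.
        hyp_words = [word_table[i] for i in hyp_ids]
        this_error = edit_distance(ref_words, hyp_words)
        if this_error < min_error:
            min_error = this_error
            best_hyp_words = hyp_words
    return best_hyp_words


word_table = {1: "YES", 2: "NO", 3: "MAYBE"}
hyps = [[1, 2], [1, 3], [2, 2]]
print(pick_oracle(hyps, "YES MAYBE", word_table))
```

Passing only the symbol table keeps the function independent of `icefall.lexicon`, which matches the import removed at the top of this file.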