Add docs for TDNN-LSTM-CTC

2025-08-13 12:02:21 +00:00 · 2021-08-24 20:32:57 +08:00 · 28352b16d7
commit 28352b16d7
parent 5552571d1e
3 changed files with 357 additions and 2 deletions
--- a/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst
+++ b/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst
@@ -0,0 +1,339 @@
+TDNN-LSTM-CTC
+=============
+
+This tutorial shows you how to run a TDNN-LSTM-CTC model with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
+
+
+.. HINT::
+
+  We assume you have read the page :ref:`install icefall` and have set up
+  the environment for ``icefall``.
+
+
+Data preparation
+----------------
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./prepare.sh
+
+The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
+All you need to do is to run it.
+
+The data preparation contains several stages. You can use the following two
+options:
+
+  - ``--stage``
+  - ``--stop-stage``
+
+to control which stage(s) should be run. By default, all stages are executed.
+
+
+For example,
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./prepare.sh --stage 0 --stop-stage 0
+
+runs only stage 0.
+
+To run stage 2 to stage 5, use:
+
+.. code-block:: bash
+
+  $ ./prepare.sh --stage 2 --stop-stage 5
+
+
+Training
+--------
+
+This section describes training of the TDNN-LSTM-CTC model, whose code is in
+the `tdnn_lstm_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc>`_
+folder.
+
+The command to run the training part is:
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ export CUDA_VISIBLE_DEVICES="0,1,2,3"
+  $ ./tdnn_lstm_ctc/train.py --world-size 4
+
+By default, it will run ``20`` epochs. Training logs and checkpoints are saved
+in ``tdnn_lstm_ctc/exp``.
+
+In ``tdnn_lstm_ctc/exp``, you will find the following files:
+
+  - ``epoch-0.pt``, ``epoch-1.pt``, ..., ``epoch-19.pt``
+
+    These are checkpoint files, containing model ``state_dict`` and optimizer ``state_dict``.
+    To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
+
+      .. code-block:: bash
+
+        $ ./tdnn_lstm_ctc/train.py --start-epoch 11
+
+  - ``tensorboard/``
+
+    This folder contains TensorBoard logs. Training loss, validation loss, learning
+    rate, etc., are recorded in these logs. You can visualize them with:
+
+      .. code-block:: bash
+
+        $ cd tdnn_lstm_ctc/exp/tensorboard
+        $ tensorboard dev upload --logdir . --description "TDNN LSTM training for librispeech with icefall"
+
+    It will print something like the following:
+
+      .. code-block::
+
+        TensorFlow installation not found - running with reduced feature set.
+        Upload started and will continue reading any new data as it's added to the logdir.
+
+        To stop uploading, press Ctrl-C.
+
+        New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+        [2021-08-23T23:49:41] Started scanning logdir.
+        [2021-08-23T23:49:42] Total uploaded: 135 scalars, 0 tensors, 0 binary objects
+        Listening for new data in logdir...
+
+    Note the URL in the above output. Click it to view the TensorBoard page.
+
+  - ``log/log-train-xxxx``
+
+    It is the detailed training log in text format, the same as the one
+    printed to the console during training.
+
+
+To see available training options, you can use:
+
+.. code-block:: bash
+
+  $ ./tdnn_lstm_ctc/train.py --help
+
+Other training options, e.g., learning rate, results dir, etc., are
+pre-configured in the function ``get_params()``
+in `tdnn_lstm_ctc/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/train.py>`_.
+Normally, you don't need to change them. You can change them by modifying the code, if
+you want.
+
+Decoding
+--------
+
+The decoding part uses checkpoints saved by the training part, so you have
+to run the training part first.
+
+The command for decoding is:
+
+.. code-block:: bash
+
+  $ export CUDA_VISIBLE_DEVICES="0"
+  $ ./tdnn_lstm_ctc/decode.py
+
+You will see the WER in the output log.
+
+Decoded results are saved in ``tdnn_lstm_ctc/exp``.
+
+.. code-block:: bash
+
+  $ ./tdnn_lstm_ctc/decode.py --help
+
+shows you the available decoding options.
+
+Some commonly used options are:
+
+  - ``--epoch``
+
+    It selects which checkpoint to use for decoding.
+    For instance, ``./tdnn_lstm_ctc/decode.py --epoch 10`` means to use
+    ``./tdnn_lstm_ctc/exp/epoch-10.pt`` for decoding.
+
+  - ``--avg``
+
+    It's related to model averaging. It specifies the number of checkpoints
+    to be averaged. The averaged model is used for decoding.
+    For example, the following command:
+
+      .. code-block:: bash
+
+        $ ./tdnn_lstm_ctc/decode.py --epoch 10 --avg 3
+
+    uses the average of ``epoch-8.pt``, ``epoch-9.pt`` and ``epoch-10.pt``
+    for decoding.
+
+  - ``--export``
+
+    If it is ``True``, e.g., ``./tdnn_lstm_ctc/decode.py --export 1``, the code
+    will save the averaged model to ``tdnn_lstm_ctc/exp/pretrained.pt``.
+    See :ref:`tdnn_lstm_ctc use a pre-trained model` for how to use it.
+
+.. HINT::
+
+   There are several decoding methods provided in `tdnn_lstm_ctc/decode.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py>`_. You can change the decoding method by modifying the ``method`` parameter in the function ``get_params()``.
+
+
+.. _tdnn_lstm_ctc use a pre-trained model:
+
+Pre-trained Model
+-----------------
+
+We have uploaded the pre-trained model to
+`<https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc>`_.
+
+The following shows you how to use the pre-trained model.
+
+Download the pre-trained model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ mkdir tmp
+  $ cd tmp
+  $ git lfs install
+  $ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
+
+.. CAUTION::
+
+  You have to use ``git lfs`` to download the pre-trained model.
+
+After downloading, you will have the following files:
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ tree tmp
+
+.. code-block:: bash
+
+  tmp/
+  `-- icefall_asr_librispeech_tdnn-lstm_ctc
+      |-- README.md
+      |-- data
+      |   |-- lang_phone
+      |   |   |-- HLG.pt
+      |   |   |-- tokens.txt
+      |   |   `-- words.txt
+      |   `-- lm
+      |       `-- G_4_gram.pt
+      |-- exp
+      |   `-- pretrained.pt
+      `-- test_wavs
+          |-- 1089-134686-0001.flac
+          |-- 1221-135766-0001.flac
+          |-- 1221-135766-0002.flac
+          `-- trans.txt
+  
+  6 directories, 10 files
+
+
+Download kaldifeat
+~~~~~~~~~~~~~~~~~~
+
+`kaldifeat <https://github.com/csukuangfj/kaldifeat>`_ is used for extracting
+features from a single or multiple sound files. Please refer to
+`<https://github.com/csukuangfj/kaldifeat>`_ to install ``kaldifeat`` first.
+
+Inference with a pre-trained model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./tdnn_lstm_ctc/pretrained.py --help
+
+shows the usage information of ``./tdnn_lstm_ctc/pretrained.py``.
+
+To decode with the ``1best`` method, you can use:
+
+.. code-block:: bash
+
+  ./tdnn_lstm_ctc/pretrained.py \
+    --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
+    --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
+    --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
+    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
+    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
+    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
+
+The output is:
+
+.. code-block::
+
+  2021-08-24 16:57:13,315 INFO [pretrained.py:168] device: cuda:0
+  2021-08-24 16:57:13,315 INFO [pretrained.py:170] Creating model
+  2021-08-24 16:57:18,331 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
+  2021-08-24 16:57:27,581 INFO [pretrained.py:199] Constructing Fbank computer
+  2021-08-24 16:57:27,584 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
+  2021-08-24 16:57:27,599 INFO [pretrained.py:215] Decoding started
+  2021-08-24 16:57:27,791 INFO [pretrained.py:245] Use HLG decoding
+  2021-08-24 16:57:28,098 INFO [pretrained.py:266]
+  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
+  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+  
+  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
+  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+  
+  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
+  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+  
+  
+  2021-08-24 16:57:28,099 INFO [pretrained.py:268] Decoding Done
+
+
+To decode with the ``whole-lattice-rescoring`` method, you can use:
+
+.. code-block:: bash
+
+  ./tdnn_lstm_ctc/pretrained.py \
+    --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
+    --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
+    --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
+    --method whole-lattice-rescoring \
+    --G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt \
+    --ngram-lm-scale 0.8 \
+    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
+    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
+    ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
+
+The decoding output is:
+
+.. code-block::
+
+  2021-08-24 16:39:24,725 INFO [pretrained.py:168] device: cuda:0
+  2021-08-24 16:39:24,725 INFO [pretrained.py:170] Creating model
+  2021-08-24 16:39:29,403 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
+  2021-08-24 16:39:40,631 INFO [pretrained.py:190] Loading G from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt
+  2021-08-24 16:39:53,098 INFO [pretrained.py:199] Constructing Fbank computer
+  2021-08-24 16:39:53,107 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
+  2021-08-24 16:39:53,121 INFO [pretrained.py:215] Decoding started
+  2021-08-24 16:39:53,443 INFO [pretrained.py:250] Use HLG decoding + LM rescoring
+  2021-08-24 16:39:54,010 INFO [pretrained.py:266]
+  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
+  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+  
+  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
+  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+  
+  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
+  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+  
+  
+  2021-08-24 16:39:54,010 INFO [pretrained.py:268] Decoding Done
+
+
+Colab notebook
+--------------
+
+We provide a Colab notebook for decoding with the pre-trained model.
+
+|librispeech-tdnn_lstm_ctc colab notebook|
+
+.. |librispeech-tdnn_lstm_ctc colab notebook| image:: https://colab.research.google.com/assets/colab-badge.svg
+   :target: https://colab.research.google.com/drive/1kNmDXNMwREi0rZGAOIAOJo93REBuOTcd
+
+
+**Congratulations!** You have finished the TDNN-LSTM-CTC recipe on librispeech in ``icefall``.
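
A minimal sketch of how the ``--epoch`` and ``--avg`` options described in the Decoding section of the tutorial map to checkpoint files. The helper name ``checkpoints_to_average`` is hypothetical and not part of ``icefall``. It only reproduces the rule stated in the tutorial, where ``--epoch 10 --avg 3`` selects ``epoch-8.pt``, ``epoch-9.pt`` and ``epoch-10.pt``.

```python
from pathlib import Path
from typing import List


def checkpoints_to_average(exp_dir: str, epoch: int, avg: int) -> List[Path]:
    """Return the `avg` consecutive checkpoints ending at `epoch`.

    Hypothetical helper for illustration; epochs are counted from 0.
    """
    if avg < 1:
        raise ValueError("avg must be at least 1")
    start = epoch - avg + 1
    if start < 0:
        raise ValueError("not enough checkpoints before the requested epoch")
    return [Path(exp_dir) / f"epoch-{i}.pt" for i in range(start, epoch + 1)]


if __name__ == "__main__":
    # Same selection as `./tdnn_lstm_ctc/decode.py --epoch 10 --avg 3`.
    for path in checkpoints_to_average("tdnn_lstm_ctc/exp", epoch=10, avg=3):
        print(path)
```

With ``--avg 1`` the list contains only the checkpoint named by ``--epoch``, so no averaging takes place.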
--- a/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py
+++ b/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py
@@ -54,7 +54,7 @@ def get_parser():
    parser.add_argument(
        "--epoch",
        type=int,
-        default=9,
+        default=19,
        help="It specifies the checkpoint to use for decoding."
        "Note: Epoch counts from 0.",
    )
@@ -66,6 +66,16 @@ def get_parser():
        "consecutive checkpoints before the checkpoint specified by "
        "'--epoch'. ",
    )
+    parser.add_argument(
+        "--export",
+        type=str2bool,
+        default=False,
+        help="""When enabled, the averaged model is saved to
+        tdnn_lstm_ctc/exp/pretrained.pt. Note: only model.state_dict() is saved.
+        pretrained.pt contains a dict {"model": model.state_dict()},
+        which can be loaded by `icefall.checkpoint.load_checkpoint()`.
+        """,
+    )
    return parser


@@ -408,6 +418,12 @@ def main():
        logging.info(f"averaging {filenames}")
        model.load_state_dict(average_checkpoints(filenames))

+    if params.export:
+        logging.info(f"Export averaged model to {params.exp_dir}/pretrained.pt")
+        torch.save(
+            {"model": model.state_dict()}, f"{params.exp_dir}/pretrained.pt"
+        )
+
    model.to(device)
    model.eval()

--- a/egs/librispeech/ASR/tdnn_lstm_ctc/train.py
+++ b/egs/librispeech/ASR/tdnn_lstm_ctc/train.py
@@ -144,7 +144,7 @@ def get_params() -> AttributeDict:
            "weight_decay": 5e-4,
            "subsampling_factor": 3,
            "start_epoch": 0,
-            "num_epochs": 10,
+            "num_epochs": 20,
            "best_train_loss": float("inf"),
            "best_valid_loss": float("inf"),
            "best_train_epoch": -1,
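
A small sketch of how ``start_epoch`` and ``num_epochs`` relate to the checkpoint names mentioned in the tutorial (``epoch-0.pt`` to ``epoch-19.pt``, and ``--start-epoch 11`` resuming from ``epoch-10.pt``). It assumes the training loop iterates over ``range(start_epoch, num_epochs)``, which is consistent with the tutorial but not shown in this diff. Both function names are hypothetical.

```python
from typing import List, Optional


def resume_checkpoint(start_epoch: int) -> Optional[str]:
    """Checkpoint to load before training, or None when starting from scratch."""
    if start_epoch <= 0:
        return None
    return f"epoch-{start_epoch - 1}.pt"


def epochs_to_run(start_epoch: int, num_epochs: int) -> List[int]:
    """Zero-based epochs that still have to be trained (assumed loop range)."""
    return list(range(start_epoch, num_epochs))


if __name__ == "__main__":
    # Defaults after this commit: start_epoch=0, num_epochs=20.
    print(resume_checkpoint(0), epochs_to_run(0, 20))
    # Resuming as in `./tdnn_lstm_ctc/train.py --start-epoch 11`.
    print(resume_checkpoint(11), epochs_to_run(11, 20))
```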