diff --git a/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst b/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst
new file mode 100644
index 000000000..778b6b7ee
--- /dev/null
+++ b/docs/source/recipes/librispeech/tdnn_lstm_ctc.rst
@@ -0,0 +1,339 @@
+TDNN-LSTM-CTC
+=============
+
+This tutorial shows you how to run a TDNN-LSTM-CTC model with the `LibriSpeech <https://www.openslr.org/12>`_ dataset.
+
+
+.. HINT::
+
+   We assume you have read the page :ref:`install icefall` and have set up
+   the environment for ``icefall``.
+
+
+Data preparation
+----------------
+
+.. code-block:: bash
+
+   $ cd egs/librispeech/ASR
+   $ ./prepare.sh
+
+The script ``./prepare.sh`` handles the data preparation for you, **automagically**.
+All you need to do is to run it.
+
+The data preparation contains several stages; you can use the following two
+options:
+
+  - ``--stage``
+  - ``--stop-stage``
+
+to control which stage(s) should be run. By default, all stages are executed.
+
+
+For example,
+
+.. code-block:: bash
+
+   $ cd egs/librispeech/ASR
+   $ ./prepare.sh --stage 0 --stop-stage 0
+
+means to run only stage 0.
+
+To run stage 2 to stage 5, use:
+
+.. code-block:: bash
+
+   $ ./prepare.sh --stage 2 --stop-stage 5
+
+
+Training
+--------
+
+We now describe the training of the TDNN-LSTM-CTC model, which is contained in
+the `tdnn_lstm_ctc <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc>`_
+folder.
+
+The command to run the training part is:
+
+.. code-block:: bash
+
+   $ cd egs/librispeech/ASR
+   $ export CUDA_VISIBLE_DEVICES="0,1,2,3"
+   $ ./tdnn_lstm_ctc/train.py --world-size 4
+
+By default, it will run ``20`` epochs. Training logs and checkpoints are saved
+in ``tdnn_lstm_ctc/exp``.
+
+In ``tdnn_lstm_ctc/exp``, you will find the following files:
+
+  - ``epoch-0.pt``, ``epoch-1.pt``, ..., ``epoch-19.pt``
+
+    These are checkpoint files, containing the model ``state_dict`` and the
+    optimizer ``state_dict`` (see the sketch at the end of this section for a
+    way to inspect them). To resume training from some checkpoint, say
+    ``epoch-10.pt``, you can use:
+
+    .. code-block:: bash
+
+       $ ./tdnn_lstm_ctc/train.py --start-epoch 11
+
+  - ``tensorboard/``
+
+    This folder contains TensorBoard logs. Training loss, validation loss,
+    learning rate, etc., are recorded in these logs. You can visualize them by:
+
+    .. code-block:: bash
+
+       $ cd tdnn_lstm_ctc/exp/tensorboard
+       $ tensorboard dev upload --logdir . --description "TDNN LSTM training for librispeech with icefall"
+
+    It will print something like below:
+
+    .. code-block::
+
+       TensorFlow installation not found - running with reduced feature set.
+       Upload started and will continue reading any new data as it's added to the logdir.
+
+       To stop uploading, press Ctrl-C.
+
+       New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+       [2021-08-23T23:49:41] Started scanning logdir.
+       [2021-08-23T23:49:42] Total uploaded: 135 scalars, 0 tensors, 0 binary objects
+       Listening for new data in logdir...
+
+    Note there is a URL in the above output. Click it to view the TensorBoard page.
+
+  - ``log/log-train-xxxx``
+
+    This is the detailed training log in text format, the same as the one
+    printed to the console during training.
+
+
+To see available training options, you can use:
+
+.. code-block:: bash
+
+   $ ./tdnn_lstm_ctc/train.py --help
+
+Other training options, e.g., the learning rate and the results dir, are
+pre-configured in the function ``get_params()``
+in `tdnn_lstm_ctc/train.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/train.py>`_.
+Normally, you don't need to change them; if you do, change them by modifying the code.
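+
+If you are curious about what a checkpoint actually contains, the following is
+a minimal sketch of how to inspect one. It assumes the ``torch.save`` layout
+described above; the exact set of keys may differ between icefall versions,
+and the path is only an example.
+
+.. code-block:: python
+
+   import torch
+
+   # Load a training checkpoint on CPU and list its top-level entries.
+   ckpt = torch.load("tdnn_lstm_ctc/exp/epoch-10.pt", map_location="cpu")
+   print(ckpt.keys())  # expect entries such as "model" and "optimizer"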
+
+Decoding
+--------
+
+The decoding part uses checkpoints saved by the training part, so you have
+to run the training part first.
+
+The command for decoding is:
+
+.. code-block:: bash
+
+   $ export CUDA_VISIBLE_DEVICES="0"
+   $ ./tdnn_lstm_ctc/decode.py
+
+You will see the WER in the output log.
+
+Decoded results are saved in ``tdnn_lstm_ctc/exp``.
+
+.. code-block:: bash
+
+   $ ./tdnn_lstm_ctc/decode.py --help
+
+shows you the available decoding options.
+
+Some commonly used options are:
+
+  - ``--epoch``
+
+    You can select which checkpoint to use for decoding.
+    For instance, ``./tdnn_lstm_ctc/decode.py --epoch 10`` means to use
+    ``./tdnn_lstm_ctc/exp/epoch-10.pt`` for decoding.
+
+  - ``--avg``
+
+    It's related to model averaging. It specifies the number of checkpoints
+    to average; the averaged model is used for decoding (see the sketch at
+    the end of this section). For example, the following command:
+
+    .. code-block:: bash
+
+       $ ./tdnn_lstm_ctc/decode.py --epoch 10 --avg 3
+
+    uses the average of ``epoch-8.pt``, ``epoch-9.pt`` and ``epoch-10.pt``
+    for decoding.
+
+  - ``--export``
+
+    If it is ``True``, i.e., ``./tdnn_lstm_ctc/decode.py --export 1``, the code
+    will save the averaged model to ``tdnn_lstm_ctc/exp/pretrained.pt``.
+    See :ref:`tdnn_lstm_ctc use a pre-trained model` for how to use it.
+
+.. HINT::
+
+   There are several decoding methods provided in
+   `tdnn_lstm_ctc/decode.py <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/tdnn_lstm_ctc/decode.py>`_.
+   You can change the decoding method by modifying the ``method`` parameter
+   in the function ``get_params()``.
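+
+The following is a simplified sketch of what ``--avg`` does conceptually: it
+averages the parameters of the last few epoch checkpoints. icefall implements
+this in ``icefall.checkpoint.average_checkpoints``; the code below is only an
+illustration, not the actual library implementation.
+
+.. code-block:: python
+
+   import torch
+
+   def average_state_dicts(filenames):
+       """Average the "model" state_dicts stored in the given checkpoints."""
+       avg = None
+       for f in filenames:
+           state = torch.load(f, map_location="cpu")["model"]
+           if avg is None:
+               avg = {k: v.clone().float() for k, v in state.items()}
+           else:
+               for k in avg:
+                   avg[k] += state[k].float()
+       for k in avg:
+           avg[k] /= len(filenames)
+       return avg
+
+   # ``--epoch 10 --avg 3`` corresponds to averaging these checkpoints:
+   avg = average_state_dicts(["epoch-8.pt", "epoch-9.pt", "epoch-10.pt"])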
+
+
+.. _tdnn_lstm_ctc use a pre-trained model:
+
+Pre-trained Model
+-----------------
+
+We have uploaded the pre-trained model to
+`<https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc>`_.
+
+The following shows you how to use the pre-trained model.
+
+Download the pre-trained model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+   $ cd egs/librispeech/ASR
+   $ mkdir tmp
+   $ cd tmp
+   $ git lfs install
+   $ git clone https://huggingface.co/pkufool/icefall_asr_librispeech_tdnn-lstm_ctc
+
+.. CAUTION::
+
+   You have to use ``git lfs`` to download the pre-trained model.
+
+After downloading, you will have the following files:
+
+.. code-block:: bash
+
+   $ cd egs/librispeech/ASR
+   $ tree tmp
+
+.. code-block:: bash
+
+   tmp/
+   `-- icefall_asr_librispeech_tdnn-lstm_ctc
+       |-- README.md
+       |-- data
+       |   |-- lang_phone
+       |   |   |-- HLG.pt
+       |   |   |-- tokens.txt
+       |   |   `-- words.txt
+       |   `-- lm
+       |       `-- G_4_gram.pt
+       |-- exp
+       |   `-- pretrained.pt
+       `-- test_wavs
+           |-- 1089-134686-0001.flac
+           |-- 1221-135766-0001.flac
+           |-- 1221-135766-0002.flac
+           `-- trans.txt
+
+   6 directories, 10 files
+
+
+Download kaldifeat
+~~~~~~~~~~~~~~~~~~
+
+`kaldifeat <https://github.com/csukuangfj/kaldifeat>`_ is used for extracting
+features from a single sound file or from multiple sound files. Please refer to
+`<https://github.com/csukuangfj/kaldifeat>`_ for how to install ``kaldifeat`` first.
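+
+The following is a minimal sketch of computing fbank features with
+``kaldifeat``, roughly what ``pretrained.py`` does internally. The option
+values and the file name are examples only; the exact configuration used by
+the recipe may differ.
+
+.. code-block:: python
+
+   import torchaudio
+   import kaldifeat
+
+   # Configure 80-dimensional fbank features for 16 kHz audio.
+   opts = kaldifeat.FbankOptions()
+   opts.frame_opts.samp_freq = 16000
+   opts.mel_opts.num_bins = 80
+   fbank = kaldifeat.Fbank(opts)
+
+   # Read a test wave and compute its features.
+   wave, sample_rate = torchaudio.load("1089-134686-0001.flac")
+   features = fbank(wave[0])  # 1-D waveform -> (num_frames, 80) tensor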
+
+Inference with a pre-trained model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: bash
+
+   $ cd egs/librispeech/ASR
+   $ ./tdnn_lstm_ctc/pretrained.py --help
+
+shows the usage information of ``./tdnn_lstm_ctc/pretrained.py``.
+
+To decode with the ``1best`` method, we can use:
+
+.. code-block:: bash
+
+   ./tdnn_lstm_ctc/pretrained.py \
+     --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
+     --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
+     --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
+     ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
+     ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
+     ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
+
+The output is:
+
+.. code-block::
+
+   2021-08-24 16:57:13,315 INFO [pretrained.py:168] device: cuda:0
+   2021-08-24 16:57:13,315 INFO [pretrained.py:170] Creating model
+   2021-08-24 16:57:18,331 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
+   2021-08-24 16:57:27,581 INFO [pretrained.py:199] Constructing Fbank computer
+   2021-08-24 16:57:27,584 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
+   2021-08-24 16:57:27,599 INFO [pretrained.py:215] Decoding started
+   2021-08-24 16:57:27,791 INFO [pretrained.py:245] Use HLG decoding
+   2021-08-24 16:57:28,098 INFO [pretrained.py:266]
+   ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
+   AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+
+   ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
+   GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+
+   ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
+   YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+
+
+   2021-08-24 16:57:28,099 INFO [pretrained.py:268] Decoding Done
+
+
+To decode with the ``whole-lattice-rescoring`` method, you can use:
+
+.. code-block:: bash
+
+   ./tdnn_lstm_ctc/pretrained.py \
+     --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
+     --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
+     --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
+     --method whole-lattice-rescoring \
+     --G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt \
+     --ngram-lm-scale 0.8 \
+     ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
+     ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
+     ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac
+
+The decoding output is:
+
+.. code-block::
+
+   2021-08-24 16:39:24,725 INFO [pretrained.py:168] device: cuda:0
+   2021-08-24 16:39:24,725 INFO [pretrained.py:170] Creating model
+   2021-08-24 16:39:29,403 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
+   2021-08-24 16:39:40,631 INFO [pretrained.py:190] Loading G from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt
+   2021-08-24 16:39:53,098 INFO [pretrained.py:199] Constructing Fbank computer
+   2021-08-24 16:39:53,107 INFO [pretrained.py:209] Reading sound files: ['./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac']
+   2021-08-24 16:39:53,121 INFO [pretrained.py:215] Decoding started
+   2021-08-24 16:39:53,443 INFO [pretrained.py:250] Use HLG decoding + LM rescoring
+   2021-08-24 16:39:54,010 INFO [pretrained.py:266]
+   ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac:
+   AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+
+   ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac:
+   GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+
+   ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac:
+   YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+
+
+   2021-08-24 16:39:54,010 INFO [pretrained.py:268] Decoding Done
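+
+As noted in the ``--export`` option above, ``pretrained.pt`` stores a dict
+``{"model": model.state_dict()}``. If you want to use it from your own script
+rather than through ``pretrained.py``, here is a minimal sketch of reading it;
+loading the weights requires a model instance with the matching architecture,
+so that step is only indicated in a comment.
+
+.. code-block:: python
+
+   import torch
+
+   ckpt = torch.load(
+       "./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt",
+       map_location="cpu",
+   )
+   state_dict = ckpt["model"]  # only the model parameters are stored
+   print(len(state_dict), "parameter tensors")
+   # A model with the same architecture as in training could now consume it:
+   # model.load_state_dict(state_dict)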
+ """, + ) return parser @@ -408,6 +418,12 @@ def main(): logging.info(f"averaging {filenames}") model.load_state_dict(average_checkpoints(filenames)) + if params.export: + logging.info(f"Export averaged model to {params.exp_dir}/pretrained.pt") + torch.save( + {"model": model.state_dict()}, f"{params.exp_dir}/pretrained.pt" + ) + model.to(device) model.eval() diff --git a/egs/librispeech/ASR/tdnn_lstm_ctc/train.py b/egs/librispeech/ASR/tdnn_lstm_ctc/train.py index 23e224f76..c18d9742b 100755 --- a/egs/librispeech/ASR/tdnn_lstm_ctc/train.py +++ b/egs/librispeech/ASR/tdnn_lstm_ctc/train.py @@ -144,7 +144,7 @@ def get_params() -> AttributeDict: "weight_decay": 5e-4, "subsampling_factor": 3, "start_epoch": 0, - "num_epochs": 10, + "num_epochs": 20, "best_train_loss": float("inf"), "best_valid_loss": float("inf"), "best_train_epoch": -1,