Add more doc for the recipe yesno.

This commit is contained in:
Fangjun Kuang 2021-08-24 00:00:47 +08:00
parent dcf71b31a5
commit 39554781b2
4 changed files with 112 additions and 6 deletions


@@ -310,7 +310,7 @@ The correct fix is:
.6.1 tensorboard-plugin-wit-1.8.0 urllib3-1.26.6 werkzeug-2.0.1
Test your Installation
Test Your Installation
----------------------
To test that your installation is successful, let us run

Binary file not shown.



@@ -19,7 +19,7 @@ This page shows you how to run the ``yesno`` recipe.
Data preparation
----------------
.. code-block::
.. code-block:: bash

   $ cd egs/yesno/ASR
   $ ./prepare.sh
@@ -64,17 +64,94 @@ The command to run the training part is:
.. code-block:: bash

   $ cd egs/yesno/ASR
   $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/train.py
By default, it runs for ``15`` epochs. Training logs and checkpoints are saved
in ``tdnn/exp``.
In ``tdnn/exp``, you will find the following files:

- ``epoch-0.pt``, ``epoch-1.pt``, ...

  These are checkpoint files, containing model parameters and the optimizer
  ``state_dict``.

  To resume training from some checkpoint, say ``epoch-10.pt``, you can use:

  .. code-block:: bash

     $ ./tdnn/train.py --start-epoch 11

- ``tensorboard/``

  This folder contains TensorBoard logs. Training loss, validation loss,
  learning rate, etc., are recorded in these logs. You can visualize them by:

  .. code-block:: bash

     $ cd tdnn/exp/tensorboard
     $ tensorboard dev upload --logdir . --description "TDNN training for yesno with icefall"
It will print something like below:

.. code-block::

   TensorFlow installation not found - running with reduced feature set.
   Upload started and will continue reading any new data as it's added to the logdir.
   To stop uploading, press Ctrl-C.
   New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
   [2021-08-23T23:49:41] Started scanning logdir.
   [2021-08-23T23:49:42] Total uploaded: 135 scalars, 0 tensors, 0 binary objects
   Listening for new data in logdir...
Note that there is a URL in the above output. Click it and you will see
the following screenshot:

.. figure:: images/yesno-tdnn-tensorboard-log.png
   :width: 600
   :alt: TensorBoard screenshot
   :align: center
   :target: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/

   TensorBoard screenshot.
- ``log/log-train-xxxx``

  This is the detailed training log in text format, the same as the one
  printed to the console during training.
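The checkpoint naming scheme above implies a simple resume rule: ``--start-epoch N`` loads ``epoch-(N-1).pt``. A minimal sketch of that rule (``checkpoint_to_load`` is a hypothetical helper for illustration, not part of icefall):

```python
from pathlib import Path
from typing import Optional


def checkpoint_to_load(exp_dir: str, start_epoch: int) -> Optional[Path]:
    """Return the checkpoint to resume from, or None when training
    starts from scratch (start_epoch == 0)."""
    if start_epoch <= 0:
        return None
    # --start-epoch 11 means "resume after epoch 10", so load epoch-10.pt
    return Path(exp_dir) / f"epoch-{start_epoch - 1}.pt"
```

For example, ``--start-epoch 11`` maps to ``tdnn/exp/epoch-10.pt``.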
To see available training options, you can use:

.. code-block:: bash

   $ ./tdnn/train.py --help
.. NOTE::

   By default, ``./tdnn/train.py`` uses GPU 0 for training if GPUs are available.
   If you have two GPUs, say, GPU 0 and GPU 1, and you want to use GPU 1 for
   training, you can run:

   .. code-block:: bash

      $ export CUDA_VISIBLE_DEVICES="1"
      $ ./tdnn/train.py

   Since the ``yesno`` dataset is very small, containing only 30 sound files
   for training, and the model in use is also very small, we use:

   .. code-block:: bash

      $ export CUDA_VISIBLE_DEVICES=""

   so that ``./tdnn/train.py`` uses the CPU during training.

   If you don't have GPUs, then you don't need to run
   ``export CUDA_VISIBLE_DEVICES=""``.
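The effect of ``CUDA_VISIBLE_DEVICES`` on device selection can be sketched as follows. ``training_device`` is a hypothetical helper, not icefall code; the real selection is done by PyTorch, and the ``cuda:0`` branch assumes a GPU is actually present:

```python
import os


def training_device() -> str:
    """Mimic how CUDA_VISIBLE_DEVICES influences device selection."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible == "":
        # An empty value hides all GPUs, so frameworks fall back to the CPU.
        return "cpu"
    # Unset, or set to e.g. "1": the first *visible* GPU becomes device 0
    # (assuming a GPU exists on the machine).
    return "cuda:0"
```

Note that setting the variable to ``"1"`` does not select ``cuda:1``; it remaps physical GPU 1 to logical device 0.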
Decoding
--------
@@ -85,10 +162,12 @@ The command for decoding is:
.. code-block:: bash

   $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/decode.py
You will see the WER in the output log.
Decoding results are saved in ``tdnn/exp``.
Decoded results are saved in ``tdnn/exp``.
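The WER reported in the decoding log is the word-level edit distance between the reference transcript and the decoded one, divided by the number of reference words. A minimal sketch of that computation (``word_error_rate`` is an illustrative helper, not icefall's actual scoring code):

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """WER = (substitutions + insertions + deletions) / number of
    reference words, computed with a standard edit-distance DP."""
    r, h = ref.split(), hyp.split()
    # dp[i][j]: edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # delete all of r[:i]
    for j in range(len(h) + 1):
        dp[0][j] = j  # insert all of h[:j]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(r)][len(h)] / len(r)
```

For the ``yesno`` data, a reference like ``"YES NO YES"`` decoded as ``"YES YES YES"`` has one substitution out of three words, i.e. a WER of 1/3.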
Colab notebook
--------------


@@ -60,6 +60,16 @@ def get_parser():
        help="Number of epochs to train.",
    )

    parser.add_argument(
        "--start-epoch",
        type=int,
        default=0,
        help="""Resume training from this epoch.
        If it is positive, it will load the checkpoint from
        tdnn/exp/epoch-{start_epoch-1}.pt
        """,
    )

    return parser
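As a quick check of the option added above, the two arguments can be exercised with a minimal stand-alone parser. Only ``--num-epochs`` and ``--start-epoch`` are re-created here; the rest of ``get_parser()`` is omitted:

```python
import argparse

# Minimal re-creation of the two options, to show how --start-epoch parses.
parser = argparse.ArgumentParser()
parser.add_argument("--num-epochs", type=int, default=15,
                    help="Number of epochs to train.")
parser.add_argument("--start-epoch", type=int, default=0,
                    help="Resume training from this epoch.")

args = parser.parse_args(["--start-epoch", "11"])
# args.start_epoch == 11, so training would resume from epoch-10.pt,
# while args.num_epochs keeps its default of 15.
```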
@@ -92,8 +102,6 @@ def get_params() -> AttributeDict:
    - start_epoch: If it is not zero, load checkpoint `start_epoch-1`
      and continue training from that checkpoint.

    - num_epochs: Number of epochs to train.

    - best_train_loss: Best training loss so far. It is used to select
      the model that has the lowest training loss. It is
      updated during training.
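The ``best_train_loss`` bookkeeping described in this docstring can be sketched as follows (``update_best`` is a hypothetical helper for illustration; the real training script updates ``params`` inline):

```python
# Hypothetical sketch of the "best loss" bookkeeping: after each epoch,
# keep the lowest training loss seen so far and the epoch that produced it.
def update_best(params: dict, epoch: int, train_loss: float) -> None:
    if train_loss < params["best_train_loss"]:
        params["best_train_loss"] = train_loss
        params["best_train_epoch"] = epoch


params = {"best_train_loss": float("inf"), "best_train_epoch": -1}
for epoch, loss in enumerate([1.9, 1.2, 1.4]):
    update_best(params, epoch, loss)
# params now records the best loss 1.2, reached at epoch 1
```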
@@ -420,6 +428,19 @@ def train_one_epoch(
                f"batch size: {batch_size}"
            )

            if tb_writer is not None:
                tb_writer.add_scalar(
                    "train/current_loss",
                    loss_cpu / params.train_frames,
                    params.batch_idx_train,
                )

                tb_writer.add_scalar(
                    "train/tot_avg_loss",
                    tot_avg_loss,
                    params.batch_idx_train,
                )
        if batch_idx > 0 and batch_idx % params.valid_interval == 0:
            compute_validation_loss(
                params=params,
@@ -434,6 +455,12 @@ def train_one_epoch(
                f" best valid loss: {params.best_valid_loss:.4f} "
                f"best valid epoch: {params.best_valid_epoch}"
            )

            if tb_writer is not None:
                tb_writer.add_scalar(
                    "train/valid_loss",
                    params.valid_loss,
                    params.batch_idx_train,
                )

    params.train_loss = tot_loss / tot_frames