Add more documentation for the yesno recipe.

Fangjun Kuang 2021-08-24 00:00:47 +08:00
parent dcf71b31a5
commit 39554781b2
4 changed files with 112 additions and 6 deletions


@@ -310,7 +310,7 @@ The correct fix is:
 .6.1 tensorboard-plugin-wit-1.8.0 urllib3-1.26.6 werkzeug-2.0.1
 
-Test your Installation
+Test Your Installation
 ----------------------
 
 To test that your installation is successful, let us run

(Binary file not shown: new image ``images/yesno-tdnn-tensorboard-log.png``, 121 KiB, the TensorBoard screenshot referenced below.)


@@ -19,7 +19,7 @@ This page shows you how to run the ``yesno`` recipe.
 Data preparation
 ----------------
 
-.. code-block::
+.. code-block:: bash
 
   $ cd egs/yesno/ASR
   $ ./prepare.sh
@@ -64,17 +64,94 @@ The command to run the training part is:
 .. code-block:: bash
 
   $ cd egs/yesno/ASR
+  $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/train.py
 
 By default, it will run ``15`` epochs. Training logs and checkpoints are saved
 in ``tdnn/exp``.
 
-To see the training options, you can use:
+In ``tdnn/exp``, you will find the following files:
+
+  - ``epoch-0.pt``, ``epoch-1.pt``, ...
+
+    These are checkpoint files, containing model parameters and optimizer ``state_dict``.
+    To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
+
+    .. code-block:: bash
+
+      $ ./tdnn/train.py --start-epoch 11
+
+  - ``tensorboard/``
+
+    This folder contains TensorBoard logs. Training loss, validation loss, learning
+    rate, etc., are recorded in these logs. You can visualize them by running:
+
+    .. code-block:: bash
+
+      $ cd tdnn/exp/tensorboard
+      $ tensorboard dev upload --logdir . --description "TDNN training for yesno with icefall"
+
+    It will print something like the following:
+
+    .. code-block::
+
+      TensorFlow installation not found - running with reduced feature set.
+      Upload started and will continue reading any new data as it's added to the logdir.
+
+      To stop uploading, press Ctrl-C.
+
+      New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+      [2021-08-23T23:49:41] Started scanning logdir.
+      [2021-08-23T23:49:42] Total uploaded: 135 scalars, 0 tensors, 0 binary objects
+      Listening for new data in logdir...
+
+    Note that there is a URL in the above output. Click it and you will see
+    the following screenshot:
+
+    .. figure:: images/yesno-tdnn-tensorboard-log.png
+       :width: 600
+       :alt: TensorBoard screenshot
+       :align: center
+       :target: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+       TensorBoard screenshot.
+
+  - ``log/log-train-xxxx``
+
+    It is the detailed training log in text format, the same as the one
+    printed to the console during training.
+
+To see available training options, you can use:
 
 .. code-block:: bash
 
   $ ./tdnn/train.py --help
 
+.. NOTE::
+
+  By default, ``./tdnn/train.py`` uses GPU 0 for training if GPUs are available.
+  If you have two GPUs, say, GPU 0 and GPU 1, and you want to use GPU 1 for
+  training, you can run:
+
+  .. code-block:: bash
+
+    $ export CUDA_VISIBLE_DEVICES="1"
+    $ ./tdnn/train.py
+
+  Since the ``yesno`` dataset is very small, containing only 30 sound files
+  for training, and the model in use is also very small, we use:
+
+  .. code-block:: bash
+
+    $ export CUDA_VISIBLE_DEVICES=""
+
+  so that ``./tdnn/train.py`` uses CPU during training.
+
+  If you don't have GPUs, then you don't need to
+  run ``export CUDA_VISIBLE_DEVICES=""``.
+
 Decoding
 --------
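
Setting ``CUDA_VISIBLE_DEVICES=""`` hides every GPU from the process, so PyTorch falls back to the CPU. A quick, generic way to verify the behavior described in the note above (a standalone snippet, not part of the recipe):

.. code-block:: python

  # With CUDA_VISIBLE_DEVICES="" exported before launching Python,
  # torch.cuda.is_available() returns False and we fall back to CPU.
  import torch

  device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")
  print(device)  # prints "cpu" when no GPU is visible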
@@ -85,10 +162,12 @@ The command for decoding is:
 .. code-block:: bash
 
+  $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/decode.py
 
 You will see the WER in the output log.
 
-Decoded results are saved in ``tdnn/exp``.
+Decoding results are saved in ``tdnn/exp``.
 
 Colab notebook
 --------------
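
The WER reported in the decoding log is a word-level edit distance between the reference and the hypothesis. A minimal, generic sketch of that computation (an illustration only, not icefall's implementation):

.. code-block:: python

  # Word error rate via Levenshtein distance over words.
  def wer(ref: str, hyp: str) -> float:
      r, h = ref.split(), hyp.split()
      # dp[i][j]: edits needed to turn the first i reference words
      # into the first j hypothesis words
      dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
      for i in range(len(r) + 1):
          dp[i][0] = i
      for j in range(len(h) + 1):
          dp[0][j] = j
      for i in range(1, len(r) + 1):
          for j in range(1, len(h) + 1):
              cost = 0 if r[i - 1] == h[j - 1] else 1
              dp[i][j] = min(
                  dp[i - 1][j] + 1,         # deletion
                  dp[i][j - 1] + 1,         # insertion
                  dp[i - 1][j - 1] + cost,  # substitution
              )
      return dp[len(r)][len(h)] / max(len(r), 1)

  print(wer("YES NO YES", "YES YES YES"))  # 0.3333...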


@@ -60,6 +60,16 @@ def get_parser():
         help="Number of epochs to train.",
     )
 
+    parser.add_argument(
+        "--start-epoch",
+        type=int,
+        default=0,
+        help="""Resume training from this epoch.
+        If it is positive, it will load checkpoint from
+        tdnn/exp/epoch-{start_epoch-1}.pt
+        """,
+    )
+
     return parser
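
Per the help text above, a positive ``--start-epoch`` maps to the checkpoint saved at the end of the previous epoch. A small sketch of that mapping (a hypothetical helper, not code from this commit):

.. code-block:: python

  # Map --start-epoch to the checkpoint to load, following the
  # naming convention in the help string (tdnn/exp/epoch-{N}.pt).
  def checkpoint_to_load(start_epoch: int, exp_dir: str = "tdnn/exp"):
      if start_epoch <= 0:
          return None  # train from scratch
      return f"{exp_dir}/epoch-{start_epoch - 1}.pt"

  assert checkpoint_to_load(11) == "tdnn/exp/epoch-10.pt"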
@@ -92,8 +102,6 @@ def get_params() -> AttributeDict:
     - start_epoch: If it is not zero, load checkpoint `start_epoch-1`
       and continue training from that checkpoint.
 
-    - num_epochs: Number of epochs to train.
-
     - best_train_loss: Best training loss so far. It is used to select
       the model that has the lowest training loss. It is
       updated during the training.
@@ -420,6 +428,19 @@ def train_one_epoch(
                 f"batch size: {batch_size}"
             )
 
+            if tb_writer is not None:
+                tb_writer.add_scalar(
+                    "train/current_loss",
+                    loss_cpu / params.train_frames,
+                    params.batch_idx_train,
+                )
+
+                tb_writer.add_scalar(
+                    "train/tot_avg_loss",
+                    tot_avg_loss,
+                    params.batch_idx_train,
+                )
+
         if batch_idx > 0 and batch_idx % params.valid_interval == 0:
             compute_validation_loss(
                 params=params,
@@ -434,6 +455,12 @@ def train_one_epoch(
                 f" best valid loss: {params.best_valid_loss:.4f} "
                 f"best valid epoch: {params.best_valid_epoch}"
             )
 
+            if tb_writer is not None:
+                tb_writer.add_scalar(
+                    "train/valid_loss",
+                    params.valid_loss,
+                    params.batch_idx_train,
+                )
 
     params.train_loss = tot_loss / tot_frames
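
The ``tb_writer`` in the hunks above is presumably a ``torch.utils.tensorboard.SummaryWriter``; this diff does not show its construction. A self-contained sketch of the same logging pattern:

.. code-block:: python

  # Log scalars the same way the training loop above does; the log_dir
  # matches the tdnn/exp/tensorboard folder mentioned in the docs.
  from torch.utils.tensorboard import SummaryWriter

  tb_writer = SummaryWriter(log_dir="tdnn/exp/tensorboard")
  for step, loss in enumerate([0.9, 0.7, 0.5]):
      tb_writer.add_scalar("train/current_loss", loss, step)  # tag, value, global step
  tb_writer.close()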