Mirror of https://github.com/k2-fsa/icefall.git, synced 2025-08-13 20:12:24 +00:00
Add more doc for the recipe yesno.

parent dcf71b31a5
commit 39554781b2
@@ -310,7 +310,7 @@ The correct fix is:

    .6.1 tensorboard-plugin-wit-1.8.0 urllib3-1.26.6 werkzeug-2.0.1

-Test your Installation
+Test Your Installation
 ----------------------

 To test that your installation is successful, let us run
BIN  docs/source/recipes/images/yesno-tdnn-tensorboard-log.png  Normal file
Binary file not shown. After: Width | Height | Size: 121 KiB
@@ -19,7 +19,7 @@ This page shows you how to run the ``yesno`` recipe.

 Data preparation
 ----------------

-.. code-block::
+.. code-block:: bash

   $ cd egs/yesno/ASR
   $ ./prepare.sh
@@ -64,17 +64,94 @@ The command to run the training part is:

 .. code-block:: bash

   $ cd egs/yesno/ASR
   $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/train.py

 By default, it will run ``15`` epochs. Training logs and checkpoints are saved
 in ``tdnn/exp``.

-To see the training options, you can use:
+In ``tdnn/exp``, you will find the following files:
+
+  - ``epoch-0.pt``, ``epoch-1.pt``, ...
+
+    These are checkpoint files, containing model parameters and optimizer ``state_dict``.
+    To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
+
+    .. code-block:: bash
+
+      $ ./tdnn/train.py --start-epoch 11
+
+  - ``tensorboard/``
+
+    This folder contains TensorBoard logs. Training loss, validation loss, learning
+    rate, etc., are recorded in these logs. You can visualize them by:
+
+    .. code-block:: bash
+
+      $ cd tdnn/exp/tensorboard
+      $ tensorboard dev upload --logdir . --description "TDNN training for yesno with icefall"
+
+    It will print something like below:
+
+    .. code-block::
+
+      TensorFlow installation not found - running with reduced feature set.
+      Upload started and will continue reading any new data as it's added to the logdir.
+
+      To stop uploading, press Ctrl-C.
+
+      New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+      [2021-08-23T23:49:41] Started scanning logdir.
+      [2021-08-23T23:49:42] Total uploaded: 135 scalars, 0 tensors, 0 binary objects
+      Listening for new data in logdir...
+
+    Note that there is a URL in the above output. Click it and you will see
+    the following screenshot:
+
+    .. figure:: images/yesno-tdnn-tensorboard-log.png
+      :width: 600
+      :alt: TensorBoard screenshot
+      :align: center
+      :target: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+      TensorBoard screenshot.
+
+  - ``log/log-train-xxxx``
+
+    It is the detailed training log in text format, the same as the one
+    printed to the console during training.
+
+To see available training options, you can use:

 .. code-block:: bash

   $ ./tdnn/train.py --help

+.. NOTE::
+
+  By default, ``./tdnn/train.py`` uses GPU 0 for training, if GPUs are available.
+  If you have two GPUs, say, GPU 0 and GPU 1, and you want to use GPU 1 for
+  training, you can run:
+
+  .. code-block:: bash
+
+    $ export CUDA_VISIBLE_DEVICES="1"
+    $ ./tdnn/train.py
+
+  Since the ``yesno`` dataset is very small, containing only 30 sound files
+  for training, and the model in use is also very small, we use:
+
+  .. code-block:: bash
+
+    $ export CUDA_VISIBLE_DEVICES=""
+
+  so that ``./tdnn/train.py`` uses CPU during training.
+
+  If you don't have GPUs, then you don't need to
+  run ``export CUDA_VISIBLE_DEVICES=""``.
+
 Decoding
 --------
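The NOTE in the hunk above relies on how ``CUDA_VISIBLE_DEVICES`` controls GPU visibility. A minimal, dependency-free sketch of those semantics (the helper below is hypothetical, not icefall code):

```python
def logical_device(env):
    # Sketch of CUDA_VISIBLE_DEVICES semantics (illustrative only):
    # - "" hides every GPU, so CUDA-aware frameworks such as PyTorch
    #   fall back to the CPU;
    # - "1" exposes only physical GPU 1, which the process then sees
    #   as logical device 0, so code still addresses it as cuda:0;
    # - unset leaves all GPUs visible.
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible == "":
        return "cpu"
    return "cuda:0"

print(logical_device({"CUDA_VISIBLE_DEVICES": ""}))   # cpu
print(logical_device({"CUDA_VISIBLE_DEVICES": "1"}))  # cuda:0
```

This is why exporting ``CUDA_VISIBLE_DEVICES="1"`` needs no further changes to the training script: the chosen GPU simply appears as device 0 inside the process.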
@@ -85,10 +162,12 @@ The command for decoding is:

 .. code-block:: bash

   $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/decode.py

 You will see the WER in the output log.
-Decoding results are saved in ``tdnn/exp``.
+
+Decoded results are saved in ``tdnn/exp``.

 Colab notebook
 --------------
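The decode step above reports a WER. As background (this is not icefall's scoring code), word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words; a minimal sketch:

```python
def wer(ref, hyp):
    # Word error rate via Levenshtein distance over words (illustrative).
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution / match
            )
    return d[len(r)][len(h)] / len(r)

# One substitution out of three reference words:
print(wer("YES NO YES", "YES YES YES"))  # 0.333...
```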
|
@ -60,6 +60,16 @@ def get_parser():
|
||||
help="Number of epochs to train.",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--start-epoch",
|
||||
type=int,
|
||||
default=0,
|
||||
help="""Resume training from from this epoch.
|
||||
If it is positive, it will load checkpoint from
|
||||
tdnn/exp/epoch-{start_epoch-1}.pt
|
||||
""",
|
||||
)
|
||||
|
||||
return parser
|
||||
|
||||
|
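The hunk above adds the ``--start-epoch`` option. A stand-alone sketch of the resume rule it implies (the helper name is hypothetical, not icefall's code): with ``--start-epoch N`` for positive ``N``, training resumes from ``tdnn/exp/epoch-{N-1}.pt``.

```python
import argparse
from typing import Optional

def checkpoint_to_load(start_epoch: int) -> Optional[str]:
    # Illustrative helper: map --start-epoch to the checkpoint file
    # that would be loaded, mirroring the help text in the diff.
    if start_epoch <= 0:
        return None  # fresh run: no checkpoint is loaded
    return f"tdnn/exp/epoch-{start_epoch - 1}.pt"

parser = argparse.ArgumentParser()
parser.add_argument("--start-epoch", type=int, default=0)
args = parser.parse_args(["--start-epoch", "11"])
print(checkpoint_to_load(args.start_epoch))  # tdnn/exp/epoch-10.pt
```

So ``./tdnn/train.py --start-epoch 11`` picks up from the state saved at the end of epoch 10, matching the documentation hunk earlier in this commit.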
||||
@ -92,8 +102,6 @@ def get_params() -> AttributeDict:
|
||||
- start_epoch: If it is not zero, load checkpoint `start_epoch-1`
|
||||
and continue training from that checkpoint.
|
||||
|
||||
- num_epochs: Number of epochs to train.
|
||||
|
||||
- best_train_loss: Best training loss so far. It is used to select
|
||||
the model that has the lowest training loss. It is
|
||||
updated during the training.
|
||||
@@ -420,6 +428,19 @@ def train_one_epoch(
                 f"batch size: {batch_size}"
             )

+            if tb_writer is not None:
+                tb_writer.add_scalar(
+                    "train/current_loss",
+                    loss_cpu / params.train_frames,
+                    params.batch_idx_train,
+                )
+
+                tb_writer.add_scalar(
+                    "train/tot_avg_loss",
+                    tot_avg_loss,
+                    params.batch_idx_train,
+                )
+
         if batch_idx > 0 and batch_idx % params.valid_interval == 0:
             compute_validation_loss(
                 params=params,
@@ -434,6 +455,12 @@ def train_one_epoch(
                 f" best valid loss: {params.best_valid_loss:.4f} "
                 f"best valid epoch: {params.best_valid_epoch}"
             )
+            if tb_writer is not None:
+                tb_writer.add_scalar(
+                    "train/valid_loss",
+                    params.valid_loss,
+                    params.batch_idx_train,
+                )

     params.train_loss = tot_loss / tot_frames
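Both code hunks guard every TensorBoard call with ``if tb_writer is not None``, so training still works when no writer was created. A tiny stand-in writer (used here instead of ``torch.utils.tensorboard.SummaryWriter`` to keep the sketch dependency-free) demonstrates the same pattern:

```python
class FakeWriter:
    """Stand-in for SummaryWriter that just records calls."""

    def __init__(self):
        self.scalars = []

    def add_scalar(self, tag, value, step):
        # SummaryWriter.add_scalar records (tag, scalar_value, global_step)
        self.scalars.append((tag, value, step))

def log_valid_loss(tb_writer, valid_loss, batch_idx_train):
    if tb_writer is not None:  # no-op when logging is disabled
        tb_writer.add_scalar("train/valid_loss", valid_loss, batch_idx_train)

w = FakeWriter()
log_valid_loss(w, 0.25, 100)
log_valid_loss(None, 0.25, 100)  # safe: nothing is logged, no crash
print(w.scalars)  # [('train/valid_loss', 0.25, 100)]
```

The same guard lets ``./tdnn/train.py`` run unchanged whether or not TensorBoard logging is enabled.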