diff --git a/docs/source/installation/index.rst b/docs/source/installation/index.rst
index efec1f389..e3ccf3e1e 100644
--- a/docs/source/installation/index.rst
+++ b/docs/source/installation/index.rst
@@ -310,7 +310,7 @@ The correct fix is:
 .6.1 tensorboard-plugin-wit-1.8.0 urllib3-1.26.6 werkzeug-2.0.1
 
-Test your Installation
+Test Your Installation
 ----------------------
 
 To test that your installation is successful, let us run
diff --git a/docs/source/recipes/images/yesno-tdnn-tensorboard-log.png b/docs/source/recipes/images/yesno-tdnn-tensorboard-log.png
new file mode 100644
index 000000000..3d2612c9c
Binary files /dev/null and b/docs/source/recipes/images/yesno-tdnn-tensorboard-log.png differ
diff --git a/docs/source/recipes/yesno.rst b/docs/source/recipes/yesno.rst
index c5a341759..5d549b06d 100644
--- a/docs/source/recipes/yesno.rst
+++ b/docs/source/recipes/yesno.rst
@@ -19,7 +19,7 @@ This page shows you how to run the ``yesno`` recipe.
 Data preparation
 ----------------
 
-.. code-block::
+.. code-block:: bash
 
   $ cd egs/yesno/ASR
   $ ./prepare.sh
@@ -64,17 +64,94 @@ The command to run the training part is:
 .. code-block:: bash
 
   $ cd egs/yesno/ASR
+  $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/train.py
 
 By default, it will run ``15`` epochs. Training logs and checkpoints
 are saved in ``tdnn/exp``.
 
-To see the training options, you can use:
+In ``tdnn/exp``, you will find the following files:
+
+  - ``epoch-0.pt``, ``epoch-1.pt``, ...
+
+    These are checkpoint files, containing model parameters and the
+    optimizer ``state_dict`` (a loading sketch follows this list).
+    To resume training from some checkpoint, say ``epoch-10.pt``, you can use:
+
+      .. code-block:: bash
+
+        $ ./tdnn/train.py --start-epoch 11
+
+  - ``tensorboard/``
+
+    This folder contains TensorBoard logs. Training loss, validation loss,
+    learning rate, etc., are recorded in these logs. You can visualize them by:
+
+      .. code-block:: bash
+
+        $ cd tdnn/exp/tensorboard
+        $ tensorboard dev upload --logdir . --description "TDNN training for yesno with icefall"
+
+    It will print something like the following:
+
+      .. code-block::
+
+        TensorFlow installation not found - running with reduced feature set.
+        Upload started and will continue reading any new data as it's added to the logdir.
+
+        To stop uploading, press Ctrl-C.
+
+        New experiment created. View your TensorBoard at: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+        [2021-08-23T23:49:41] Started scanning logdir.
+        [2021-08-23T23:49:42] Total uploaded: 135 scalars, 0 tensors, 0 binary objects
+        Listening for new data in logdir...
+
+    Note that there is a URL in the above output; click it and you will see
+    the following screenshot:
+
+      .. figure:: images/yesno-tdnn-tensorboard-log.png
+         :width: 600
+         :alt: TensorBoard screenshot
+         :align: center
+         :target: https://tensorboard.dev/experiment/yKUbhb5wRmOSXYkId1z9eg/
+
+         TensorBoard screenshot.
+
+  - ``log/log-train-xxxx``
+
+    It is the detailed training log in text format, the same as the one
+    printed to the console during training.
+
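+  For illustration only, here is a minimal sketch of how such a checkpoint
+  could be loaded by hand. It assumes the checkpoint ``dict`` stores its
+  entries under ``model`` and ``optimizer`` keys; the actual keys, and the
+  model built in ``tdnn/train.py``, may differ:
+
+      .. code-block:: python
+
+        import torch
+
+        # Hypothetical stand-ins for the model/optimizer built in tdnn/train.py.
+        model = torch.nn.Linear(23, 4)
+        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
+
+        # Load on CPU; the "model"/"optimizer" keys are assumptions.
+        checkpoint = torch.load("tdnn/exp/epoch-10.pt", map_location="cpu")
+        model.load_state_dict(checkpoint["model"])
+        optimizer.load_state_dict(checkpoint["optimizer"])
+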
+To see available training options, you can use:
 
 .. code-block:: bash
 
   $ ./tdnn/train.py --help
 
+.. NOTE::
+
+   By default, ``./tdnn/train.py`` uses GPU 0 for training if GPUs are
+   available. If you have two GPUs, say, GPU 0 and GPU 1, and you want to
+   use GPU 1 for training, you can run:
+
+    .. code-block:: bash
+
+      $ export CUDA_VISIBLE_DEVICES="1"
+      $ ./tdnn/train.py
+
+   Since the ``yesno`` dataset is very small, containing only 30 sound files
+   for training, and the model in use is also very small, we use:
+
+    .. code-block:: bash
+
+      $ export CUDA_VISIBLE_DEVICES=""
+
+   so that ``./tdnn/train.py`` uses CPU during training.
+
+   If you don't have GPUs, you don't need to
+   run ``export CUDA_VISIBLE_DEVICES=""``.
+
 Decoding
 --------
@@ -85,10 +162,12 @@ The command for decoding is:
 
 .. code-block:: bash
 
+  $ export CUDA_VISIBLE_DEVICES=""
   $ ./tdnn/decode.py
 
 You will see the WER in the output log.
-Decoding results are saved in ``tdnn/exp``.
+
+Decoded results are saved in ``tdnn/exp``.
 
 Colab notebook
 --------------
diff --git a/egs/yesno/ASR/tdnn/train.py b/egs/yesno/ASR/tdnn/train.py
index 04e1ab698..39c5ef3ef 100755
--- a/egs/yesno/ASR/tdnn/train.py
+++ b/egs/yesno/ASR/tdnn/train.py
@@ -60,6 +60,16 @@ def get_parser():
         help="Number of epochs to train.",
     )
 
+    parser.add_argument(
+        "--start-epoch",
+        type=int,
+        default=0,
+        help="""Resume training from this epoch.
+        If it is positive, it will load checkpoint from
+        tdnn/exp/epoch-{start_epoch-1}.pt
+        """,
+    )
+
     return parser
@@ -92,8 +102,6 @@ def get_params() -> AttributeDict:
 
     - start_epoch: If it is not zero, load checkpoint `start_epoch-1`
       and continue training from that checkpoint.
 
-    - num_epochs: Number of epochs to train.
-
     - best_train_loss: Best training loss so far. It is used to
       select the model that has the lowest training loss. It is
       updated during the training.
@@ -420,6 +428,19 @@ def train_one_epoch(
                 f"batch size: {batch_size}"
             )
 
+            if tb_writer is not None:
+                tb_writer.add_scalar(
+                    "train/current_loss",
+                    loss_cpu / params.train_frames,
+                    params.batch_idx_train,
+                )
+
+                tb_writer.add_scalar(
+                    "train/tot_avg_loss",
+                    tot_avg_loss,
+                    params.batch_idx_train,
+                )
+
         if batch_idx > 0 and batch_idx % params.valid_interval == 0:
             compute_validation_loss(
                 params=params,
@@ -434,6 +455,12 @@ def train_one_epoch(
                 f" best valid loss: {params.best_valid_loss:.4f} "
                 f"best valid epoch: {params.best_valid_epoch}"
             )
+            if tb_writer is not None:
+                tb_writer.add_scalar(
+                    "train/valid_loss",
+                    params.valid_loss,
+                    params.batch_idx_train,
+                )
 
         params.train_loss = tot_loss / tot_frames
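
For context on the ``tb_writer.add_scalar`` calls added above: ``tb_writer``
is expected to be a TensorBoard ``SummaryWriter``, whose construction is not
part of this diff. Below is a minimal, self-contained sketch of the same
logging pattern; the log directory and the dummy loop are illustrative, not
taken from ``train.py``:

.. code-block:: python

  from torch.utils.tensorboard import SummaryWriter

  # Illustrative writer; how train.py creates tb_writer is not shown in this diff.
  tb_writer = SummaryWriter(log_dir="tdnn/exp/tensorboard")

  for batch_idx_train, loss in enumerate([2.3, 1.7, 1.2]):  # dummy loss values
      # Same pattern as the diff: add_scalar(tag, scalar_value, global_step).
      tb_writer.add_scalar("train/current_loss", loss, batch_idx_train)

  tb_writer.close()  # flush event files so TensorBoard can read them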