The script ./prepare.sh handles the data preparation for you, automagically.
All you need to do is to run it.
The data preparation consists of several stages. You can use the following two
options:
--stage
--stop-stage
to control which stage(s) should be run. By default, all stages are executed.
For example,
$ cd egs/aishell/ASR
$ ./prepare.sh --stage 0 --stop-stage 0
means to run only stage 0.
To run stage 2 to stage 5, use:
$ ./prepare.sh --stage 2 --stop-stage 5
Hint
If you have pre-downloaded the Aishell
dataset and the musan dataset, say,
they are saved in /tmp/aishell and /tmp/musan, you can modify
the dl_dir variable in ./prepare.sh to point to /tmp so that
./prepare.sh won’t re-download them.
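For example, you could edit the variable in ./prepare.sh like this (the
original default shown in the comment is only illustrative; check the script
for the actual value):
# dl_dir=$PWD/download   # original value (illustrative)
dl_dir=/tmp              # use the pre-downloaded data under /tmp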
Hint
A 3-gram language model will be downloaded from HuggingFace. We assume you have
installed and initialized git-lfs. If not, you can install git-lfs by
$ sudo apt-get install git-lfs
$ git-lfs install
If you don’t have sudo permission, you can download the
git-lfs binary here, then add it to your PATH.
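For example, a minimal sudo-free installation sketch (the tarball and
directory names depend on the release you downloaded and are placeholders
here):
$ mkdir -p ~/bin
$ tar xf git-lfs-linux-amd64-vX.Y.Z.tar.gz   # the archive you downloaded
$ cp <extracted-dir>/git-lfs ~/bin/          # copy the git-lfs binary
$ export PATH=$HOME/bin:$PATH
$ git-lfs install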
Note
All files generated by ./prepare.sh, e.g., features, lexicon, etc.,
are saved in the ./data directory.
$ cd egs/aishell/ASR
$ ./conformer_ctc/train.py --help
shows you the training options that can be passed from the command line.
The following options are used quite often:
--exp-dir
The experiment folder to save logs and model checkpoints,
default ./conformer_ctc/exp.
--num-epochs
It is the number of epochs to train. For instance,
./conformer_ctc/train.py --num-epochs 30 trains for 30 epochs
and generates epoch-0.pt, epoch-1.pt, …, epoch-29.pt
in the folder set by --exp-dir.
--start-epoch
It’s used to resume training.
./conformer_ctc/train.py --start-epoch 10 loads the
checkpoint ./conformer_ctc/exp/epoch-9.pt and starts
training from epoch 10, based on the state from epoch 9.
--world-size
It is used for multi-GPU single-machine DDP training.
If it is 1, then no DDP training is used.
If it is 2, then GPU 0 and GPU 1 are used for DDP training.
The following shows some use cases with it.
Use case 1: You have 4 GPUs, but you only want to use GPU 0 and
GPU 2 for training. You can do the following:
$ cd egs/aishell/ASR
$ exportCUDA_VISIBLE_DEVICES="0,2"
$ ./conformer_ctc/train.py --world-size 2
Use case 2: You have 4 GPUs and you want to use all of them
for training. You can do the following:
$ cd egs/aishell/ASR
$ ./conformer_ctc/train.py --world-size 4
Use case 3: You have 4 GPUs but you only want to use GPU 3
for training. You can do the following:
$ cd egs/aishell/ASR
$ exportCUDA_VISIBLE_DEVICES="3"
$ ./conformer_ctc/train.py --world-size 1
Caution
Only multi-GPU single-machine DDP training is implemented at present.
Multi-GPU multi-machine DDP training will be added later.
--max-duration
It specifies the number of seconds over all utterances in a
batch, before padding.
If you encounter CUDA OOM, please reduce it. For instance, if
you are using a V100 NVIDIA GPU, we recommend setting it to 200.
Hint
Due to padding, the number of seconds of all utterances in a
batch will usually be larger than --max-duration.
A larger value for --max-duration may cause OOM during training,
while a smaller value may increase the training time. You have to
tune it.
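Putting the common options together, a typical training command might look
like the following (the values are only illustrative; tune them for your
hardware):
$ cd egs/aishell/ASR
$ export CUDA_VISIBLE_DEVICES="0,1"
$ ./conformer_ctc/train.py \
    --exp-dir ./conformer_ctc/exp \
    --world-size 2 \
    --num-epochs 30 \
    --max-duration 200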
There are some training options, e.g., weight decay,
number of warmup steps, etc.,
that are not passed from the command line.
They are pre-configured by the function get_params() in
conformer_ctc/train.py.
You don’t need to change these pre-configured parameters. If you really need to change
them, please modify ./conformer_ctc/train.py directly.
Caution
The training set is speed-perturbed with two factors: 0.9 and 1.1, so
each epoch actually processes 3 × 150 = 450 hours of data.
Training logs and checkpoints are saved in the folder set by --exp-dir
(default conformer_ctc/exp). You will find the following files in that directory:
epoch-0.pt, epoch-1.pt, …
These are checkpoint files, containing model state_dict and optimizer state_dict.
To resume training from some checkpoint, say epoch-10.pt, you can use:
$ ./conformer_ctc/train.py --start-epoch 11
tensorboard/
This folder contains TensorBoard logs. Training loss, validation loss, learning
rate, etc, are recorded in these logs. You can visualize them by:
$ cd conformer_ctc/exp/tensorboard
$ tensorboard dev upload --logdir . --name "Aishell conformer ctc training with icefall" --description "Training with new LabelSmoothing loss, see https://github.com/k2-fsa/icefall/pull/109"
It will print something like below:
TensorFlow installation not found - running with reduced feature set.
Upload started and will continue reading any new data as it's added to the logdir.
To stop uploading, press Ctrl-C.
New experiment created.
View your TensorBoard at: https://tensorboard.dev/experiment/engw8KSkTZqS24zBV5dgCg/
[2021-11-22T11:09:27] Started scanning logdir.
[2021-11-22T11:10:14] Total uploaded: 116068 scalars, 0 tensors, 0 binary objects
Listening for new data in logdir...
Note that there is a URL in the above output. Click it and you can view
the training curves and other logs in TensorBoard.
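If you prefer not to upload the logs, you can also view them locally with a
standard TensorBoard invocation (the port is arbitrary):
$ tensorboard --logdir conformer_ctc/exp/tensorboard --port 6006
Then open http://localhost:6006 in your browser.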
data/lang_char/tokens.txt
It contains tokens and their IDs.
Provided only for convenience so that you can look up the SOS/EOS ID easily.
data/lang_char/words.txt
It contains words and their IDs.
exp/pretrained.pt
It contains pre-trained model parameters, obtained by averaging
checkpoints from epoch-25.pt to epoch-84.pt.
Note: We have removed optimizer state_dict to reduce file size.
test_waves/*.wav
It contains some test sound files from the Aishell test dataset.
test_waves/trans.txt
It contains the reference transcripts for the sound files in test_waves/.
The information of the test sound files is listed below:
We provide a Colab notebook for this recipe showing how to use a pre-trained model.
Hint
Due to limited memory provided by Colab, you have to upgrade to Colab Pro to
run HLG decoding + attention decoder rescoring.
Otherwise, you can only run HLG decoding with Colab.
Congratulations! You have finished the aishell ASR recipe with
conformer CTC models in icefall.
If you want to deploy your trained model in C++, please read the following section.
$ mkdir build-release
$ cd build-release
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j hlg_decode
# You will find several binaries in `./bin`, including ./bin/hlg_decode.
Now you are ready to go!
Assume you have run:
$ cd k2/build-release
$ ln -s /path/to/icefall-asr-aishell-conformer-ctc ./
To view the usage of ./bin/hlg_decode, run:
$ ./bin/hlg_decode
It will show you the following message:
Please provide --nn_model
This file implements decoding with an HLG decoding graph.
Usage:
./bin/hlg_decode \
--use_gpu true \
--nn_model <path to torch scripted pt file> \
--hlg <path to HLG.pt> \
--word_table <path to words.txt> \
<path to foo.wav> \
<path to bar.wav> \
<more waves if any>
To see all possible options, use
./bin/hlg_decode --help
Caution:
- Only sound files (*.wav) with single channel are supported.
- It assumes the model is conformer_ctc/transformer.py from icefall.
If you use a different model, you have to change the code
related to `model.forward` in this file.
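As a concrete sketch, an invocation might look like the following. The file
names under the linked directory are assumptions (e.g., cpu_jit.pt for the
torch-scripted model and foo.wav as a test wave); substitute the files
actually present in your download:
$ ./bin/hlg_decode \
    --use_gpu true \
    --nn_model icefall-asr-aishell-conformer-ctc/exp/cpu_jit.pt \
    --hlg icefall-asr-aishell-conformer-ctc/data/lang_char/HLG.pt \
    --word_table icefall-asr-aishell-conformer-ctc/data/lang_char/words.txt \
    icefall-asr-aishell-conformer-ctc/test_waves/foo.wav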