Fangjun Kuang 12a2fd023e
Add doc about installation and usage (#7)
* Add readme.

* Add TOC.

* fix typos

* Minor fixes after review.
2021-08-12 12:44:04 +08:00

65 lines
1.5 KiB
Markdown

## Data preparation
If you want to use `./prepare.sh` to download everything for you,
you can just run
```
./prepare.sh
```
If you have pre-downloaded the LibriSpeech dataset, please
read `./prepare.sh` and modify it to point to the location
of your dataset so that it won't re-download it. After modification,
please run
```
./prepare.sh
```
The script `./prepare.sh` prepares features, lexicon, LMs, etc.
All generated files are saved in the folder `./data`.
**HINT:** `./prepare.sh` supports options `--stage` and `--stop-stage`.
## TDNN-LSTM CTC training
The folder `tdnn_lstm_ctc` contains scripts for CTC training
with TDNN-LSTM models.
Pre-configured parameters for training and decoding are set in the function
`get_params()` within `tdnn_lstm_ctc/train.py`
and `tdnn_lstm_ctc/decode.py`.
Parameters that can be passed from the command-line can be found by
```
./tdnn_lstm_ctc/train.py --help
./tdnn_lstm_ctc/decode.py --help
```
If you have 4 GPUs on a machine and want to use GPU 0, 2, 3 for
mutli-GPU training, you can run
```
export CUDA_VISIBLE_DEVICES="0,2,3"
./tdnn_lstm_ctc/train.py \
--master-port 12345 \
--world-size 3
```
If you want to decode by averaging checkpoints `epoch-8.pt`,
`epoch-9.pt` and `epoch-10.pt`, you can run
```
./tdnn_lstm_ctc/decode.py \
--epoch 10 \
--avg 3
```
## Conformer CTC training
The folder `conformer-ctc` contains scripts for CTC training
with conformer models. The steps of running the training and
decoding are similar to `tdnn_lstm_ctc`.