## Data preparation

If you want to use `./prepare.sh` to download everything for you,
you can just run

```
./prepare.sh
```

If you have pre-downloaded the LibriSpeech dataset, please
read `./prepare.sh` and modify it to point to the location
of your dataset so that the script won't re-download it. After
modification, please run

```
./prepare.sh
```
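
As a rough sketch of the modification mentioned above (the variable name `dl_dir` is an assumption; check how your copy of `prepare.sh` locates the download directory):

```
# Hypothetical edit inside prepare.sh: point the download directory at
# the location that already contains LibriSpeech, so the download stage
# finds the data instead of fetching it again.
# (The variable name dl_dir is an assumption; check your script.)
dl_dir=/path/to/existing/downloads
```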
The script `./prepare.sh` prepares features, the lexicon, LMs, etc.
All generated files are saved in the folder `./data`.

**HINT:** `./prepare.sh` supports the options `--stage` and `--stop-stage`.
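
For example, to run only part of the data preparation (the stage numbers here are illustrative; see the comments in `./prepare.sh` for what each stage does), you can run

```
./prepare.sh --stage 0 --stop-stage 3
```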
## TDNN-LSTM CTC training

The folder `tdnn_lstm_ctc` contains scripts for CTC training
with TDNN-LSTM models.

Pre-configured parameters for training and decoding are set in the function
`get_params()` within `tdnn_lstm_ctc/train.py`
and `tdnn_lstm_ctc/decode.py`.

Parameters that can be passed from the command line can be found by running

```
./tdnn_lstm_ctc/train.py --help
./tdnn_lstm_ctc/decode.py --help
```

If you have 4 GPUs on a machine and want to use GPUs 0, 2, and 3 for
multi-GPU training, you can run

```
export CUDA_VISIBLE_DEVICES="0,2,3"
./tdnn_lstm_ctc/train.py \
  --master-port 12345 \
  --world-size 3
```

If you want to decode by averaging the checkpoints `epoch-8.pt`,
`epoch-9.pt`, and `epoch-10.pt`, you can run

```
./tdnn_lstm_ctc/decode.py \
  --epoch 10 \
  --avg 3
```
## Conformer CTC training

The folder `conformer_ctc` contains scripts for CTC training
with Conformer models. The steps for running training and
decoding are similar to those for `tdnn_lstm_ctc`.
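
For example, by analogy with the `tdnn_lstm_ctc` commands above (the exact options are an assumption for the Conformer scripts; verify them with `--help`):

```
# Assumed to mirror tdnn_lstm_ctc; check ./conformer_ctc/train.py --help
# and ./conformer_ctc/decode.py --help before relying on these options.
export CUDA_VISIBLE_DEVICES="0,2,3"
./conformer_ctc/train.py --world-size 3

./conformer_ctc/decode.py --epoch 10 --avg 3
```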