Add doc about installation and usage (#7)

* Add readme.

* Add TOC.

* fix typos

* Minor fixes after review.
Fangjun Kuang 2021-08-12 12:44:04 +08:00 committed by GitHub
parent 5a0b9bcb23
commit 12a2fd023e
6 changed files with 134 additions and 134 deletions

View File

@@ -1 +1,60 @@
Working in progress.
# Table of Contents
- [Installation](#installation)
* [Install k2](#install-k2)
* [Install lhotse](#install-lhotse)
* [Install icefall](#install-icefall)
- [Run recipes](#run-recipes)
## Installation
`icefall` depends on [k2][k2] for FSA operations and [lhotse][lhotse] for
data preparation. To use `icefall`, you have to install its dependencies first.
The following subsections describe how to set up the environment.
CAUTION: There are various ways to set up the environment. What we describe
here is just one of them.
### Install k2
Please refer to [k2's installation documentation][k2-install] to install k2.
If you run into any issues while installing k2, please open an issue at
<https://github.com/k2-fsa/k2/issues>.
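After installing k2, a quick sanity check (a generic Python import test, not an official k2 command) is:

```bash
# Should print the path of the installed k2 package if the install succeeded.
python3 -c "import k2; print(k2.__file__)"
```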
### Install lhotse
Please refer to [lhotse's installation documentation][lhotse-install] to install
lhotse.
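For most users, installing lhotse from PyPI is enough (the linked documentation is the authoritative reference and also covers installing the latest version from GitHub):

```bash
pip install lhotse
# Generic import test to confirm the installation is visible to Python.
python3 -c "import lhotse; print(lhotse.__file__)"
```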
### Install icefall
`icefall` is a set of Python scripts. All you need to do is clone the
repository, install its Python requirements, and set the environment variable `PYTHONPATH`:
```bash
cd $HOME/open-source
git clone https://github.com/k2-fsa/icefall
cd icefall
pip install -r requirements.txt
export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH
```
To verify `icefall` was installed successfully, you can run:
```bash
python3 -c "import icefall; print(icefall.__file__)"
```
It should print the path to `icefall`.
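Note that the `export PYTHONPATH=...` setting only lasts for the current shell session. If you want it to persist (assuming you use bash and cloned `icefall` into `$HOME/open-source` as above), you can append it to your `~/.bashrc`:

```bash
# Assumes icefall was cloned into $HOME/open-source as in the example above.
echo 'export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH' >> ~/.bashrc
source ~/.bashrc
```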
## Run recipes
At present, only the LibriSpeech recipe is provided. Please
follow [egs/librispeech/ASR/README.md][LibriSpeech] to run it.
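As a rough orientation (the LibriSpeech README is the authoritative guide; the script names below are taken from it), a typical session looks like:

```bash
cd egs/librispeech/ASR

# Download and prepare data, features, lexicon, LMs, etc.
./prepare.sh

# Train a TDNN-LSTM CTC model (add --help to see configurable options).
./tdnn_lstm_ctc/train.py

# Decode with the trained model.
./tdnn_lstm_ctc/decode.py
```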
[LibriSpeech]: egs/librispeech/ASR/README.md
[k2-install]: https://k2.readthedocs.io/en/latest/installation/index.html
[k2]: https://github.com/k2-fsa/k2
[lhotse]: https://github.com/lhotse-speech/lhotse
[lhotse-install]: https://lhotse.readthedocs.io/en/latest/getting-started.html#installation

View File

@@ -1,121 +1,64 @@
-Run `./prepare.sh` to prepare the data.
-
-Run `./xxx_train.py` (to be added) to train a model.
-
-## Conformer-CTC
-
-Results of the pre-trained model from
-`<https://huggingface.co/GuoLiyong/snowfall_bpe_model/tree/main/exp-duration-200-feat_batchnorm-bpe-lrfactor5.0-conformer-512-8-noam>`
-are given below
-
-### HLG - no LM rescoring
-
-(output beam size is 8)
-
-#### 1-best decoding
-
-```
-[test-clean-no_rescore] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
-[test-other-no_rescore] %WER 7.03% [3682 / 52343, 220 ins, 1024 del, 2438 sub ]
-```
-
-#### n-best decoding
-
-For n=100,
-
-```
-[test-clean-no_rescore-100] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
-[test-other-no_rescore-100] %WER 7.14% [3737 / 52343, 275 ins, 1020 del, 2442 sub ]
-```
-
-For n=200,
-
-```
-[test-clean-no_rescore-200] %WER 3.16% [1660 / 52576, 125 ins, 378 del, 1157 sub ]
-[test-other-no_rescore-200] %WER 7.04% [3684 / 52343, 228 ins, 1012 del, 2444 sub ]
-```
-
-### HLG - with LM rescoring
-
-#### Whole lattice rescoring
-
-```
-[test-clean-lm_scale_0.8] %WER 2.77% [1456 / 52576, 150 ins, 210 del, 1096 sub ]
-[test-other-lm_scale_0.8] %WER 6.23% [3262 / 52343, 246 ins, 635 del, 2381 sub ]
-```
-
-WERs of different LM scales are:
-
-```
-For test-clean, WER of different settings are:
-lm_scale_0.8 2.77 best for test-clean
-lm_scale_0.9 2.87
-lm_scale_1.0 3.06
-lm_scale_1.1 3.34
-lm_scale_1.2 3.71
-lm_scale_1.3 4.18
-lm_scale_1.4 4.8
-lm_scale_1.5 5.48
-lm_scale_1.6 6.08
-lm_scale_1.7 6.79
-lm_scale_1.8 7.49
-lm_scale_1.9 8.14
-lm_scale_2.0 8.82
-
-For test-other, WER of different settings are:
-lm_scale_0.8 6.23 best for test-other
-lm_scale_0.9 6.37
-lm_scale_1.0 6.62
-lm_scale_1.1 6.99
-lm_scale_1.2 7.46
-lm_scale_1.3 8.13
-lm_scale_1.4 8.84
-lm_scale_1.5 9.61
-lm_scale_1.6 10.32
-lm_scale_1.7 11.17
-lm_scale_1.8 12.12
-lm_scale_1.9 12.93
-lm_scale_2.0 13.77
-```
-
-#### n-best LM rescoring
-
-n = 100
-
-```
-[test-clean-lm_scale_0.8] %WER 2.79% [1469 / 52576, 149 ins, 212 del, 1108 sub ]
-[test-other-lm_scale_0.8] %WER 6.36% [3329 / 52343, 259 ins, 666 del, 2404 sub ]
-```
-
-WERs of different LM scales are:
-
-```
-For test-clean, WER of different settings are:
-lm_scale_0.8 2.79 best for test-clean
-lm_scale_0.9 2.89
-lm_scale_1.0 3.03
-lm_scale_1.1 3.28
-lm_scale_1.2 3.52
-lm_scale_1.3 3.78
-lm_scale_1.4 4.04
-lm_scale_1.5 4.24
-lm_scale_1.6 4.45
-lm_scale_1.7 4.58
-lm_scale_1.8 4.7
-lm_scale_1.9 4.8
-lm_scale_2.0 4.92
-
-For test-other, WER of different settings are:
-lm_scale_0.8 6.36 best for test-other
-lm_scale_0.9 6.45
-lm_scale_1.0 6.64
-lm_scale_1.1 6.92
-lm_scale_1.2 7.25
-lm_scale_1.3 7.59
-lm_scale_1.4 7.88
-lm_scale_1.5 8.13
-lm_scale_1.6 8.36
-lm_scale_1.7 8.54
-lm_scale_1.8 8.71
-lm_scale_1.9 8.88
-lm_scale_2.0 9.02
-```
+## Data preparation
+
+If you want to use `./prepare.sh` to download everything for you,
+you can just run
+
+```
+./prepare.sh
+```
+
+If you have pre-downloaded the LibriSpeech dataset, please
+read `./prepare.sh` and modify it to point to the location
+of your dataset so that it won't re-download it. After modification,
+please run
+
+```
+./prepare.sh
+```
+
+The script `./prepare.sh` prepares features, lexicon, LMs, etc.
+All generated files are saved in the folder `./data`.
+
+**HINT:** `./prepare.sh` supports options `--stage` and `--stop-stage`.
+
+## TDNN-LSTM CTC training
+
+The folder `tdnn_lstm_ctc` contains scripts for CTC training
+with TDNN-LSTM models.
+
+Pre-configured parameters for training and decoding are set in the function
+`get_params()` within `tdnn_lstm_ctc/train.py`
+and `tdnn_lstm_ctc/decode.py`.
+
+Parameters that can be passed from the command line can be found by
+
+```
+./tdnn_lstm_ctc/train.py --help
+./tdnn_lstm_ctc/decode.py --help
+```
+
+If you have 4 GPUs on a machine and want to use GPU 0, 2, 3 for
+multi-GPU training, you can run
+
+```
+export CUDA_VISIBLE_DEVICES="0,2,3"
+./tdnn_lstm_ctc/train.py \
+  --master-port 12345 \
+  --world-size 3
+```
+
+If you want to decode by averaging checkpoints `epoch-8.pt`,
+`epoch-9.pt` and `epoch-10.pt`, you can run
+
+```
+./tdnn_lstm_ctc/decode.py \
+  --epoch 10 \
+  --avg 3
+```
+
+## Conformer CTC training
+
+The folder `conformer_ctc` contains scripts for CTC training
+with conformer models. The steps of running the training and
+decoding are similar to `tdnn_lstm_ctc`.
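The new README text above mentions that `./prepare.sh` supports `--stage` and `--stop-stage`, which let you resume or limit data preparation. The stage numbers below are hypothetical and should be checked against the comments inside `./prepare.sh`:

```bash
# Hypothetical example: run only stages 2 through 4 of data preparation;
# consult ./prepare.sh for what each stage number actually does.
./prepare.sh --stage 2 --stop-stage 4
```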

View File

@@ -16,6 +16,7 @@ import torch.nn as nn
 from conformer import Conformer
 from lhotse.utils import fix_random_seed
 from torch.nn.parallel import DistributedDataParallel as DDP
+from torch.nn.utils import clip_grad_norm_
 from torch.utils.tensorboard import SummaryWriter
 from transformer import Noam
@@ -114,7 +115,9 @@ def get_params() -> AttributeDict:
         - log_interval: Print training loss if batch_idx % log_interval` is 0
-        - valid_interval: Run validation if batch_idx % valid_interval` is 0
+        - valid_interval: Run validation if batch_idx % valid_interval is 0
+        - reset_interval: Reset statistics if batch_idx % reset_interval is 0
         - beam_size: It is used in k2.ctc_loss
@@ -124,19 +127,20 @@ def get_params() -> AttributeDict:
     """
     params = AttributeDict(
         {
-            "exp_dir": Path("conformer_ctc/exp"),
+            "exp_dir": Path("conformer_ctc/exp_new"),
             "lang_dir": Path("data/lang_bpe"),
             "feature_dim": 80,
-            "weight_decay": 0.0,
+            "weight_decay": 1e-6,
             "subsampling_factor": 4,
             "start_epoch": 0,
-            "num_epochs": 50,
+            "num_epochs": 20,
             "best_train_loss": float("inf"),
             "best_valid_loss": float("inf"),
             "best_train_epoch": -1,
             "best_valid_epoch": -1,
             "batch_idx_train": 0,
             "log_interval": 10,
+            "reset_interval": 200,
             "valid_interval": 3000,
             "beam_size": 10,
             "reduction": "sum",
@@ -440,6 +444,8 @@ def train_one_epoch(
     tot_att_loss = 0.0
     tot_frames = 0.0  # sum of frames over all batches
+    params.tot_loss = 0.0
+    params.tot_frames = 0.0
     for batch_idx, batch in enumerate(train_dl):
         params.batch_idx_train += 1
         batch_size = len(batch["supervisions"]["text"])
@@ -457,6 +463,7 @@
         optimizer.zero_grad()
         loss.backward()
+        clip_grad_norm_(model.parameters(), 5.0, 2.0)
         optimizer.step()
         loss_cpu = loss.detach().cpu().item()
@@ -468,6 +475,9 @@
         tot_ctc_loss += ctc_loss_cpu
         tot_att_loss += att_loss_cpu
+        params.tot_frames += params.train_frames
+        params.tot_loss += loss_cpu
         tot_avg_loss = tot_loss / tot_frames
         tot_avg_ctc_loss = tot_ctc_loss / tot_frames
         tot_avg_att_loss = tot_att_loss / tot_frames
@@ -516,6 +526,12 @@
                     tot_avg_loss,
                     params.batch_idx_train,
                 )
+            if batch_idx > 0 and batch_idx % params.reset_interval == 0:
+                tot_loss = 0.0  # sum of losses over all batches
+                tot_ctc_loss = 0.0
+                tot_att_loss = 0.0
+                tot_frames = 0.0  # sum of frames over all batches
         if batch_idx > 0 and batch_idx % params.valid_interval == 0:
             compute_validation_loss(
@@ -551,7 +567,7 @@
                 params.batch_idx_train,
             )
-    params.train_loss = tot_loss / tot_frames
+    params.train_loss = params.tot_loss / params.tot_frames
     if params.train_loss < params.best_train_loss:
         params.best_train_epoch = params.cur_epoch
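For readers unfamiliar with the gradient clipping added in this file: `torch.nn.utils.clip_grad_norm_` rescales all gradients in place so that their combined norm does not exceed a threshold. A minimal, self-contained sketch with a toy model (not the icefall training loop):

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

# Toy model and data, purely illustrative.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 10), torch.randn(8, 2)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
# Clip gradients to a maximum 2-norm of 5.0, mirroring the values used
# in the diff above (max_norm=5.0, norm_type=2.0).
clip_grad_norm_(model.parameters(), 5.0, 2.0)
optimizer.step()
```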

View File

@@ -4,12 +4,9 @@
 import math
 from typing import Dict, List, Optional, Tuple
-import k2
 import torch
 import torch.nn as nn
 from subsampling import Conv2dSubsampling, VggSubsampling
-from icefall.utils import get_texts
 from torch.nn.utils.rnn import pad_sequence
 # Note: TorchScript requires Dict/List/etc. to be fully typed.
@@ -274,9 +271,11 @@ class Transformer(nn.Module):
             device
         )
-        # TODO: Use eos_id as ignore_id.
-        # tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
-        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)
+        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
+        # TODO: Use length information to create the decoder padding mask
+        # We set the first column to False since the first column in ys_in_pad
+        # contains sos_id, which is the same as eos_id in our current setting.
+        tgt_key_padding_mask[:, 0] = False
         tgt = self.decoder_embed(ys_in_pad)  # (N, T) -> (N, T, C)
         tgt = self.decoder_pos(tgt)
@@ -339,9 +338,11 @@ class Transformer(nn.Module):
             device
         )
-        # TODO: Use eos_id as ignore_id.
-        # tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
-        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)
+        tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
+        # TODO: Use length information to create the decoder padding mask
+        # We set the first column to False since the first column in ys_in_pad
+        # contains sos_id, which is the same as eos_id in our current setting.
+        tgt_key_padding_mask[:, 0] = False
         tgt = self.decoder_embed(ys_in_pad)  # (B, T) -> (B, T, F)
         tgt = self.decoder_pos(tgt)
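To see why the new code resets the first column of the padding mask: `decoder_padding_mask` marks positions equal to `ignore_id` as padding, and since the decoder input starts with `sos_id`, which equals `eos_id` in this setup, the first (valid) position would otherwise be masked. A small illustration with a toy tensor (this mimics the masking semantics; it is not the actual `decoder_padding_mask` implementation):

```python
import torch

eos_id = 1  # in this setup sos_id == eos_id
# Toy decoder input: each row starts with sos_id and is padded with eos_id.
ys_in_pad = torch.tensor([[1, 5, 6, 1, 1],
                          [1, 7, 1, 1, 1]])

# Positions equal to eos_id are treated as padding (True means masked).
tgt_key_padding_mask = ys_in_pad == eos_id
# The first column holds sos_id (== eos_id), so it must not be masked.
tgt_key_padding_mask[:, 0] = False
print(tgt_key_padding_mask)
```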

View File

@@ -1,22 +1,2 @@
-## (To be filled in)
-
-It will contain:
-
-- How to run
-- WERs
-
-```bash
-cd $PWD/..
-./prepare.sh
-./tdnn_lstm_ctc/train.py
-```
-
-If you have 4 GPUs and want to use GPU 1 and GPU 3 for DDP training,
-you can do the following:
-
-```
-export CUDA_VISIBLE_DEVICES="1,3"
-./tdnn_lstm_ctc/train.py --world-size=2
-```
+Will add results later.

View File

@@ -1,3 +1,4 @@
 kaldilm
 kaldialign
 sentencepiece>=0.1.96
+tensorboard
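`tensorboard` is added to the requirements because the training script shown above imports `SummaryWriter` from `torch.utils.tensorboard`. To inspect the resulting logs one would typically run something like the following; the log directory is a guess based on the `exp_dir` in `get_params()` and may differ:

```bash
# Hypothetical log directory; check where SummaryWriter writes in train.py.
tensorboard --logdir conformer_ctc/exp_new
```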