Add readme.

Fangjun Kuang 2021-08-10 20:08:23 +08:00
parent 5a0b9bcb23
commit 55be10534d
6 changed files with 151 additions and 134 deletions

View File

@@ -1 +1,77 @@
Working in progress.
## Installation
`icefall` depends on [k2][k2] for FSA operations and [lhotse][lhotse] for
data preparation. To use `icefall`, you have to install its dependencies first.
The following subsections describe how to set up the environment.
CAUTION: There are various ways to set up the environment. What we describe
here is just one of them.
### Install k2
Please refer to [k2's installation documentation][k2-install] to install k2.
If you have any issues installing k2, please open an issue at
<https://github.com/k2-fsa/k2/issues>.
The following shows the minimal commands needed to install k2 from source:
```bash
mkdir $HOME/open-source
cd $HOME/open-source
git clone https://github.com/k2-fsa/k2.git
cd k2
mkdir build_release
cd build_release
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j _k2
export PYTHONPATH=$HOME/open-source/k2/k2/python:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/k2/build_release/lib:$PYTHONPATH
```
To check that k2 is installed successfully, please run
```
python3 -m k2.version
```
It should print information about the environment in which
k2 was built.
### Install lhotse
Please refer to [lhotse's installation documentation][lhotse-install] to install
lhotse.
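If you just want a quick start and your environment already has a compatible PyTorch,
installing lhotse from PyPI is typically enough; this is only one option, and the
linked documentation covers the details:
```bash
# One common way to install lhotse (see the lhotse documentation for
# alternatives and for supported PyTorch versions):
pip install lhotse
```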
### Install icefall
`icefall` is a set of Python scripts. All you need to do is set
the environment variable `PYTHONPATH`:
```
cd $HOME/open-source
git clone https://github.com/k2-fsa/icefall
cd icefall
pip install -r requirements.txt
export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH
```
To verify `icefall` was installed successfully, you can run:
```
python3 -c "import icefall; print(icefall.__file__)"
```
It should print the path to `icefall`.
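If you want these `PYTHONPATH` settings to survive new shell sessions, one option
(assuming the checkout locations used above and a bash shell) is to append the
exports to your shell startup file:
```bash
# Optional: persist the PYTHONPATH settings from the commands above.
# The paths assume everything was cloned under $HOME/open-source; adjust as needed.
cat >> ~/.bashrc <<'EOF'
export PYTHONPATH=$HOME/open-source/k2/k2/python:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/k2/build_release/lib:$PYTHONPATH
export PYTHONPATH=$HOME/open-source/icefall:$PYTHONPATH
EOF
```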
## Run recipes
Currently, only the LibriSpeech recipe is provided. Please
follow [egs/librispeech/ASR/README.md][LibriSpeech] to run it.
[LibriSpeech]: egs/librispeech/ASR/README.md
[k2-install]: https://k2.readthedocs.io/en/latest/installation/index.html#
[k2]: https://github.com/k2-fsa/k2
[lhotse]: https://github.com/lhotse-speech/lhotse
[lhotse-install]: https://lhotse.readthedocs.io/en/latest/getting-started.html#installation

View File

@@ -1,121 +1,64 @@
Run `./prepare.sh` to prepare the data.
## Data preparation
Run `./xxx_train.py` (to be added) to train a model.
## Conformer-CTC
Results of the pre-trained model from
`<https://huggingface.co/GuoLiyong/snowfall_bpe_model/tree/main/exp-duration-200-feat_batchnorm-bpe-lrfactor5.0-conformer-512-8-noam>`
are given below
### HLG - no LM rescoring
(output beam size is 8)
#### 1-best decoding
If you want to use `./prepare.sh` to download everything for you,
you can just run
```
[test-clean-no_rescore] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore] %WER 7.03% [3682 / 52343, 220 ins, 1024 del, 2438 sub ]
./prepare.sh
```
#### n-best decoding
For n=100,
If you have pre-downloaded the LibriSpeech dataset, please
read `./prepare.sh` and modify it to point to the location
of your dataset so that it won't re-download it. After modification,
please run
```
[test-clean-no_rescore-100] %WER 3.15% [1656 / 52576, 127 ins, 377 del, 1152 sub ]
[test-other-no_rescore-100] %WER 7.14% [3737 / 52343, 275 ins, 1020 del, 2442 sub ]
./prepare.sh
```
For n=200,
The script `./prepare.sh` prepares features, the lexicon, LMs, etc.
All generated files are saved in the folder `./data`.
HINT: `./prepare.sh` supports the options `--stage` and `--stop-stage`.
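For example, to run only a subset of the data preparation stages, you can pass these
options. The stage numbers below are purely illustrative; please look at `./prepare.sh`
to see what each stage actually does:
```bash
# Run only stages 0 through 3 of ./prepare.sh
# (illustrative stage numbers; check the script for their meaning).
./prepare.sh --stage 0 --stop-stage 3
```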
## TDNN-LSTM CTC training
The folder `tdnn_lstm_ctc` contains scripts for CTC training
with TDNN-LSTM models.
Pre-configured parameters for training and decoding are set in the function
`get_params()` within `tdnn_lstm_ctc/train.py`
and `tdnn_lstm_ctc/decode.py`.
Parameters that can be passed from the command line can be found by running
```
[test-clean-no_rescore-200] %WER 3.16% [1660 / 52576, 125 ins, 378 del, 1157 sub ]
[test-other-no_rescore-200] %WER 7.04% [3684 / 52343, 228 ins, 1012 del, 2444 sub ]
./tdnn_lstm_ctc/train.py --help
./tdnn_lstm_ctc/decode.py --help
```
### HLG - with LM rescoring
#### Whole lattice rescoring
If you have 4 GPUs on a machine and want to use GPUs 0, 2 and 3 for
multi-GPU training, you can run
```
[test-clean-lm_scale_0.8] %WER 2.77% [1456 / 52576, 150 ins, 210 del, 1096 sub ]
[test-other-lm_scale_0.8] %WER 6.23% [3262 / 52343, 246 ins, 635 del, 2381 sub ]
export CUDA_VISIBLE_DEVICES="0,2,3"
./tdnn_lstm_ctc/train.py \
--master-port 12345 \
--world-size 3
```
WERs of different LM scales are:
If you want to decode by averaging checkpoints `epoch-8.pt`,
`epoch-9.pt` and `epoch-10.pt`, you can run
```
For test-clean, WER of different settings are:
lm_scale_0.8 2.77 best for test-clean
lm_scale_0.9 2.87
lm_scale_1.0 3.06
lm_scale_1.1 3.34
lm_scale_1.2 3.71
lm_scale_1.3 4.18
lm_scale_1.4 4.8
lm_scale_1.5 5.48
lm_scale_1.6 6.08
lm_scale_1.7 6.79
lm_scale_1.8 7.49
lm_scale_1.9 8.14
lm_scale_2.0 8.82
For test-other, WER of different settings are:
lm_scale_0.8 6.23 best for test-other
lm_scale_0.9 6.37
lm_scale_1.0 6.62
lm_scale_1.1 6.99
lm_scale_1.2 7.46
lm_scale_1.3 8.13
lm_scale_1.4 8.84
lm_scale_1.5 9.61
lm_scale_1.6 10.32
lm_scale_1.7 11.17
lm_scale_1.8 12.12
lm_scale_1.9 12.93
lm_scale_2.0 13.77
./tdnn_lstm_ctc/decode.py \
--epoch 10 \
--avg 3
```
#### n-best LM rescoring
## Conformer CTC training
n = 100
```
[test-clean-lm_scale_0.8] %WER 2.79% [1469 / 52576, 149 ins, 212 del, 1108 sub ]
[test-other-lm_scale_0.8] %WER 6.36% [3329 / 52343, 259 ins, 666 del, 2404 sub ]
```
WERs of different LM scales are:
```
For test-clean, WER of different settings are:
lm_scale_0.8 2.79 best for test-clean
lm_scale_0.9 2.89
lm_scale_1.0 3.03
lm_scale_1.1 3.28
lm_scale_1.2 3.52
lm_scale_1.3 3.78
lm_scale_1.4 4.04
lm_scale_1.5 4.24
lm_scale_1.6 4.45
lm_scale_1.7 4.58
lm_scale_1.8 4.7
lm_scale_1.9 4.8
lm_scale_2.0 4.92
For test-other, WER of different settings are:
lm_scale_0.8 6.36 best for test-other
lm_scale_0.9 6.45
lm_scale_1.0 6.64
lm_scale_1.1 6.92
lm_scale_1.2 7.25
lm_scale_1.3 7.59
lm_scale_1.4 7.88
lm_scale_1.5 8.13
lm_scale_1.6 8.36
lm_scale_1.7 8.54
lm_scale_1.8 8.71
lm_scale_1.9 8.88
lm_scale_2.0 9.02
```
The folder `conformer_ctc` contains scripts for CTC training
with conformer models. The steps for running training and
decoding are similar to those for `tdnn_lstm_ctc`.
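As a rough sketch only, assuming the `conformer_ctc` scripts accept the same kind of
options as their `tdnn_lstm_ctc` counterparts (an assumption; run them with `--help`
to see the actual flags), training and decoding might look like:
```bash
# A sketch assuming conformer_ctc/train.py and conformer_ctc/decode.py
# mirror the tdnn_lstm_ctc command-line interface; the epoch/avg values
# below are illustrative only.
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./conformer_ctc/train.py --world-size 4
./conformer_ctc/decode.py --epoch 19 --avg 5
```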

View File

@@ -16,6 +16,7 @@ import torch.nn as nn
from conformer import Conformer
from lhotse.utils import fix_random_seed
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.nn.utils import clip_grad_norm_
from torch.utils.tensorboard import SummaryWriter
from transformer import Noam
@@ -114,7 +115,9 @@ def get_params() -> AttributeDict:
- log_interval: Print training loss if batch_idx % log_interval` is 0
- valid_interval: Run validation if batch_idx % valid_interval` is 0
- valid_interval: Run validation if batch_idx % valid_interval is 0
- reset_interval: Reset statistics if batch_idx % reset_interval is 0
- beam_size: It is used in k2.ctc_loss
@@ -124,19 +127,20 @@ def get_params() -> AttributeDict:
"""
params = AttributeDict(
{
"exp_dir": Path("conformer_ctc/exp"),
"exp_dir": Path("conformer_ctc/exp_new"),
"lang_dir": Path("data/lang_bpe"),
"feature_dim": 80,
"weight_decay": 0.0,
"weight_decay": 1e-6,
"subsampling_factor": 4,
"start_epoch": 0,
"num_epochs": 50,
"num_epochs": 20,
"best_train_loss": float("inf"),
"best_valid_loss": float("inf"),
"best_train_epoch": -1,
"best_valid_epoch": -1,
"batch_idx_train": 0,
"log_interval": 10,
"reset_interval": 200,
"valid_interval": 3000,
"beam_size": 10,
"reduction": "sum",
@@ -440,6 +444,8 @@ def train_one_epoch(
tot_att_loss = 0.0
tot_frames = 0.0 # sum of frames over all batches
params.tot_loss = 0.0
params.tot_frames = 0.0
for batch_idx, batch in enumerate(train_dl):
params.batch_idx_train += 1
batch_size = len(batch["supervisions"]["text"])
@@ -457,6 +463,7 @@
optimizer.zero_grad()
loss.backward()
clip_grad_norm_(model.parameters(), 5.0, 2.0)
optimizer.step()
loss_cpu = loss.detach().cpu().item()
@@ -468,6 +475,9 @@
tot_ctc_loss += ctc_loss_cpu
tot_att_loss += att_loss_cpu
params.tot_frames += params.train_frames
params.tot_loss += loss_cpu
tot_avg_loss = tot_loss / tot_frames
tot_avg_ctc_loss = tot_ctc_loss / tot_frames
tot_avg_att_loss = tot_att_loss / tot_frames
@@ -516,6 +526,12 @@
tot_avg_loss,
params.batch_idx_train,
)
if batch_idx > 0 and batch_idx % params.reset_interval == 0:
tot_loss = 0.0 # sum of losses over all batches
tot_ctc_loss = 0.0
tot_att_loss = 0.0
tot_frames = 0.0 # sum of frames over all batches
if batch_idx > 0 and batch_idx % params.valid_interval == 0:
compute_validation_loss(
@@ -551,7 +567,7 @@
params.batch_idx_train,
)
params.train_loss = tot_loss / tot_frames
params.train_loss = params.tot_loss / params.tot_frames
if params.train_loss < params.best_train_loss:
params.best_train_epoch = params.cur_epoch

View File

@@ -4,12 +4,9 @@
import math
from typing import Dict, List, Optional, Tuple
import k2
import torch
import torch.nn as nn
from subsampling import Conv2dSubsampling, VggSubsampling
from icefall.utils import get_texts
from torch.nn.utils.rnn import pad_sequence
# Note: TorchScript requires Dict/List/etc. to be fully typed.
@@ -274,9 +271,11 @@ class Transformer(nn.Module):
device
)
# TODO: Use eos_id as ignore_id.
# tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)
tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
# TODO: Use length information to create the decoder padding mask
# We set the first column to False since the first column in ys_in_pad
# contains sos_id, which is the same as eos_id in our current setting.
tgt_key_padding_mask[:, 0] = False
tgt = self.decoder_embed(ys_in_pad) # (N, T) -> (N, T, C)
tgt = self.decoder_pos(tgt)
@@ -339,9 +338,11 @@ class Transformer(nn.Module):
device
)
# TODO: Use eos_id as ignore_id.
# tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)
tgt_key_padding_mask = decoder_padding_mask(ys_in_pad, ignore_id=eos_id)
# TODO: Use length information to create the decoder padding mask
# We set the first column to False since the first column in ys_in_pad
# contains sos_id, which is the same as eos_id in our current setting.
tgt_key_padding_mask[:, 0] = False
tgt = self.decoder_embed(ys_in_pad) # (B, T) -> (B, T, F)
tgt = self.decoder_pos(tgt)

View File

@@ -1,22 +1,2 @@
## (To be filled in)
It will contain:
- How to run
- WERs
```bash
cd $PWD/..
./prepare.sh
./tdnn_lstm_ctc/train.py
```
If you have 4 GPUs and want to use GPU 1 and GPU 3 for DDP training,
you can do the following:
```
export CUDA_VISIBLE_DEVICES="1,3"
./tdnn_lstm_ctc/train.py --world-size=2
```
Will add results later.

View File

@@ -1,3 +1,4 @@
kaldilm
kaldialign
sentencepiece>=0.1.96
tensorboard