icefall

Archived

This repository has been archived on 2026-03-23. You can view files and clone it, but cannot push or open issues or pull requests.

Wang, Guanbo 5fe58de43c

GigaSpeech recipe (#120 )

* initial commit

* support download, data prep, and fbank

* on-the-fly feature extraction by default

* support BPE based lang

* support HLG for BPE

* small fix

* small fix

* chunked feature extraction by default

* Compute features for GigaSpeech by splitting the manifest.

* Fixes after review.

* Split manifests into 2000 pieces.

* set audio duration mismatch tolerance to 0.01

* small fix

* add conformer training recipe

* Add conformer.py without pre-commit checking

* lazy loading and use SingleCutSampler

* DynamicBucketingSampler

* use KaldifeatFbank to compute fbank for musan

* use pretrained language model and lexicon

* use 3gram to decode, 4gram to rescore

* Add decode.py

* Update .flake8

* Delete compute_fbank_gigaspeech.py

* Use BucketingSampler for valid and test dataloader

* Update params in train.py

* Use bpe_500

* update params in decode.py

* Decrease num_paths while CUDA OOM

* Added README

* Update RESULTS

* black

* Decrease num_paths while CUDA OOM

* Decode with post-processing

* Update results

* Remove lazy_load option

* Use default `storage_type`

* Keep the original tolerance

* Use split-lazy

* black

* Update pretrained model

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

2022-04-14 16:07:22 +08:00

.github/workflows

Fix CI. (#280 )

2022-03-31 10:43:02 +08:00

docker

add a docker file for some users (#87 )

2021-10-19 13:00:59 +08:00

docs

Update doc to clarify the installation order of dependencies. (#279 )

2022-03-30 18:50:54 +08:00

egs

GigaSpeech recipe (#120 )

2022-04-14 16:07:22 +08:00

icefall

GigaSpeech recipe (#120 )

2022-04-14 16:07:22 +08:00

test

RNN-T training for yesno. (#141 )

2021-12-07 21:44:37 +08:00

.flake8

GigaSpeech recipe (#120 )

2022-04-14 16:07:22 +08:00

.gitignore

GigaSpeech recipe (#120 )

2022-04-14 16:07:22 +08:00

.pre-commit-config.yaml

Modify init (#301 )

2022-04-10 23:29:28 +08:00

contributing.md

Add style check tools.

2021-07-15 17:36:48 +08:00

LICENSE

Add style check tools.

2021-07-15 17:36:48 +08:00

pyproject.toml

Modify init (#301 )

2022-04-10 23:29:28 +08:00

README.md

Update results for tedlium3 pruned RNN-T (#307 )

2022-04-11 22:19:26 +08:00

requirements-ci.txt

Support modified beam search in batch mode. (#264 )

2022-03-22 15:14:04 +08:00

requirements.txt

Add modified beam search for pruned rnn-t. (#248 )

2022-03-12 16:16:55 +08:00

setup.py

setup.py (#64 )

2021-10-01 16:43:08 +08:00

README.md

Installation

Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.

Recipes

Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.

We provide five recipes at present:

yesno

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
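The bracketed counts in the line above determine the reported percentage: WER is the sum of insertions, deletions, and substitutions divided by the number of reference words. A minimal sketch of that arithmetic follows; the function name `wer_percent` is made up for illustration and is not part of icefall.

```python
def wer_percent(ins: int, dels: int, subs: int, num_ref_words: int) -> float:
    """Word error rate in percent, computed from error counts.

    Hypothetical helper for illustration only; not an icefall function.
    """
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    return 100.0 * (ins + dels + subs) / num_ref_words


# Counts taken from the yesno result line: 1 error out of 240 reference words.
print(f"{wer_percent(ins=0, dels=1, subs=0, num_ref_words=240):.2f}")  # 0.42
```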

We provide a Colab notebook for this recipe.

LibriSpeech

We provide 4 models for this recipe:

Conformer CTC Model

The best WER we currently have is:

	test-clean	test-other
WER	2.42	5.73

We provide a Colab notebook to run a pre-trained conformer CTC model.

TDNN LSTM CTC Model

The WER for this model is:

	test-clean	test-other
WER	6.59	17.69

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

Transducer: Conformer encoder + LSTM decoder

This model uses a Conformer as the encoder and an LSTM as the decoder.

The best WER with greedy search is:

	test-clean	test-other
WER	3.07	7.51
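Greedy search for a transducer can be sketched as follows: for each encoder frame, take the highest-scoring symbol from the joiner; if it is blank, move to the next frame, otherwise emit it and query the joiner again with the updated history. This is a simplified illustration, not icefall's implementation. The `joiner` callable, the `toy_joiner` stand-in, and the `max_sym_per_frame` limit are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Joiner = Callable[[Sequence[float], List[int]], List[float]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def greedy_search(
    encoder_out: Sequence[Sequence[float]],
    joiner: Joiner,
    blank_id: int = 0,
    max_sym_per_frame: int = 3,  # illustrative cap, assumed for this sketch
) -> List[int]:
    """Return the greedy token sequence for one utterance."""
    hyp: List[int] = []
    for frame in encoder_out:
        for _ in range(max_sym_per_frame):
            best = argmax(joiner(frame, hyp))
            if best == blank_id:
                break  # blank: advance to the next frame
            hyp.append(best)  # non-blank: emit and stay on this frame
    return hyp


def toy_joiner(frame: Sequence[float], hyp: List[int]) -> List[float]:
    """Toy stand-in for a real joiner network (index 0 is blank).

    It returns the frame scores, but forces blank once the frame's top
    symbol has just been emitted, so each frame emits at most one token.
    """
    logits = list(frame)
    if hyp and hyp[-1] == argmax(frame):
        logits[0] = max(logits) + 1.0
    return logits


frames = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.0, 0.2, 0.8]]
print(greedy_search(frames, toy_joiner))  # [1, 2]
```

In a real model the joiner combines the encoder frame with the decoder output for the current history; the toy version only mimics that interface.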

We provide a Colab notebook to run a pre-trained RNN-T conformer model.

Transducer: Conformer encoder + Embedding decoder

This model uses a Conformer as the encoder. The decoder consists of one embedding layer and one convolutional layer.

The best WER using modified beam search with beam size 4 is:

	test-clean	test-other
WER	2.56	6.27

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model.

Aishell

We provide three models for this recipe: conformer CTC model, transducer stateless model, and TDNN LSTM CTC model.

Conformer CTC Model

The best CER we currently have is:

	test
CER	4.26
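CER is the character-level counterpart of WER: the edit distance between the reference and hypothesis character sequences, divided by the number of reference characters. The sketch below uses a standard Levenshtein dynamic program. The function names and the example strings are made up for illustration and do not come from the Aishell recipe.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cur.append(
                min(
                    prev[j] + 1,             # deletion
                    cur[j - 1] + 1,          # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = cur
    return prev[-1]


def error_rate_percent(ref: Sequence, hyp: Sequence) -> float:
    """Error rate in percent; pass characters for CER, words for WER."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Made-up example: one substituted and one deleted character out of six.
ref_chars = list("今天天气很好")
hyp_chars = list("今天天汽很")
print(f"CER: {error_rate_percent(ref_chars, hyp_chars):.2f}%")
```

Passing lists of words instead of characters gives WER with the same code.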

We provide a Colab notebook to run a pre-trained conformer CTC model.

Transducer Stateless Model

The best CER we currently have is:

	test
CER	4.68

We provide a Colab notebook to run a pre-trained Transducer Stateless model.

TDNN LSTM CTC Model

The CER for this model is:

	test
CER	10.16

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

TIMIT

We provide two models for this recipe: TDNN LSTM CTC model and TDNN LiGRU CTC model.

TDNN LSTM CTC Model

The best PER we currently have is:

	TEST
PER	19.71%

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

TDNN LiGRU CTC Model

The PER for this model is:

	TEST
PER	17.66%

We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model.

TED-LIUM3

We provide two models for this recipe: Transducer Stateless: Conformer encoder + Embedding decoder and Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Transducer Stateless: Conformer encoder + Embedding decoder

The best WER using modified beam search with beam size 4 is:

	dev	test
WER	6.91	6.33

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained Transducer Stateless model.

Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best WER using modified beam search with beam size 4 is:

	dev	test
WER	6.77	6.14

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.

Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c for how to do this.

We also provide a Colab notebook showing how to run a TorchScript model in k2 with C++.