I needed this in order to pull unreleased fixes. The last tagged version was too old (dating back to July 2023) and was not compatible with recent lhotse releases.

Introduction
icefall contains ASR recipes for various datasets using https://github.com/k2-fsa/k2.
You can use https://github.com/k2-fsa/sherpa to deploy models trained with icefall.
You can try pre-trained models from within your browser, without downloading or installing anything, by visiting https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition. See https://k2-fsa.github.io/icefall/huggingface/spaces.html for more details.
Installation
Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.
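As a quick post-installation sanity check, you can verify that the core dependencies import correctly. This is a minimal sketch that assumes you have installed torch, k2, and lhotse and put icefall on your PYTHONPATH as described in the documentation linked above:

```python
# Minimal post-installation sanity check (assumes torch, k2, and lhotse are
# installed and icefall is on PYTHONPATH, as described in the docs above).
import torch
import k2
import lhotse
import icefall

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("lhotse", lhotse.__version__)
print("k2 imported from", k2.__file__)
print("icefall imported from", icefall.__file__)
```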
Recipes
Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.
We provide the following recipes:
- yesno
- LibriSpeech
- GigaSpeech
- AMI
- Aishell
- Aishell2
- Aishell4
- TIMIT
- TED-LIUM3
- Aidatatang_200zh
- WenetSpeech
- Alimeeting
- Switchboard
- TAL_CSASR
yesno
This is the simplest ASR recipe in icefall and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER:
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
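The bracketed numbers are the error counts from which the WER is computed. As a minimal sketch of the arithmetic (not icefall's actual scoring code): WER = (insertions + deletions + substitutions) / reference words.

```python
# Hedged sketch of the WER arithmetic behind the result above:
# WER = (ins + del + sub) / reference word count.
def wer(ins: int, dels: int, subs: int, ref_words: int) -> float:
    return 100.0 * (ins + dels + subs) / ref_words

print(f"{wer(0, 1, 0, 240):.2f}%")  # -> 0.42%, matching the yesno result
```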
We provide a Colab notebook for this recipe:
LibriSpeech
Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md for the latest results.
We provide 5 models for this recipe:
- conformer CTC model
- TDNN LSTM CTC model
- Transducer: Conformer encoder + LSTM decoder
- Transducer: Conformer encoder + Embedding decoder
- Transducer: Zipformer encoder + Embedding decoder
Conformer CTC Model
The best WER we currently have is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |
We provide a Colab notebook to run a pre-trained conformer CTC model:
TDNN LSTM CTC Model
The WER for this model is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model:
Transducer: Conformer encoder + LSTM decoder
This model uses a Conformer encoder and an LSTM decoder.
The best WER with greedy search is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 3.07       | 7.51       |
We provide a Colab notebook to run a pre-trained RNN-T conformer model:
Transducer: Conformer encoder + Embedding decoder
This model uses a Conformer encoder. The decoder consists of an embedding layer and a convolutional layer (a sketch of this stateless decoder appears below).
The best WER using modified beam search with beam size 4 is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.56       | 6.27       |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model:
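To make the stateless decoder described above concrete, here is a minimal PyTorch sketch: an embedding layer followed by a depthwise 1-D convolution over a small label context. The hyper-parameters and layer arrangement are illustrative assumptions, not the exact icefall implementation:

```python
import torch
import torch.nn as nn

class StatelessDecoder(nn.Module):
    """Sketch of a stateless transducer decoder: one embedding layer plus one
    1-D convolution over the last `context_size` emitted labels. Dimensions
    are illustrative, not the values used in the icefall recipes."""
    def __init__(self, vocab_size: int, embed_dim: int = 512, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, embed_dim,
                              kernel_size=context_size, groups=embed_dim)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, context_size) previously emitted labels
        emb = self.embedding(y).permute(0, 2, 1)   # (batch, embed_dim, context)
        out = self.conv(emb).permute(0, 2, 1)      # (batch, 1, embed_dim)
        return torch.relu(out)

dec = StatelessDecoder(vocab_size=500)
y = torch.tensor([[3, 7]])        # the two previously emitted labels
print(dec(y).shape)               # torch.Size([1, 1, 512])
```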
k2 pruned RNN-T
| Encoder         | Params | test-clean | test-other | epochs | devices    |
|-----------------|--------|------------|------------|--------|------------|
| zipformer       | 65.5M  | 2.21       | 4.79       | 50     | 4 32G-V100 |
| zipformer-small | 23.2M  | 2.42       | 5.73       | 50     | 2 32G-V100 |
| zipformer-large | 148.4M | 2.06       | 4.63       | 50     | 4 32G-V100 |
| zipformer-large | 148.4M | 2.00       | 4.38       | 174    | 8 80G-A100 |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
k2 pruned RNN-T + GigaSpeech
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.78       | 4.08       |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
k2 pruned RNN-T + GigaSpeech + CommonVoice
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.90       | 3.98       |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
GigaSpeech
We provide three models for this recipe:
- Conformer CTC model
- Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.
- Transducer: Zipformer encoder + Embedding decoder
Conformer CTC
|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.51 | 10.73 |
| fast beam search     | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 |
Transducer: Zipformer encoder + Embedding decoder
|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.31 | 10.50 |
| fast beam search     | 10.26 | 10.48 |
| modified beam search | 10.25 | 10.38 |
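The decoding methods in the tables above (greedy search, fast beam search, modified beam search) differ in how hypotheses are expanded at each frame. As a rough illustration only, a frame-synchronous greedy search for a transducer can be sketched as follows; the decoder/joiner signatures here are assumptions for illustration, not icefall's actual decoding API:

```python
import torch

def greedy_search(encoder_out, decoder, joiner, blank_id=0,
                  context_size=2, max_sym_per_frame=4):
    """Hedged sketch of frame-synchronous greedy search for a transducer.

    encoder_out: (T, encoder_dim) frames from the encoder.
    decoder(prev_labels) and joiner(enc_frame, dec_out) are assumed callables
    used only to show the control flow.
    """
    hyp = [blank_id] * context_size                        # left context for a stateless decoder
    dec_out = decoder(torch.tensor([hyp[-context_size:]]))
    for t in range(encoder_out.size(0)):
        for _ in range(max_sym_per_frame):                 # bounded emissions per frame
            logits = joiner(encoder_out[t], dec_out)
            y = int(logits.argmax())
            if y == blank_id:                              # blank -> advance to the next frame
                break
            hyp.append(y)                                  # non-blank -> emit and update decoder
            dec_out = decoder(torch.tensor([hyp[-context_size:]]))
    return hyp[context_size:]                              # drop the initial blank context

# Toy demo with stub components (shapes only), just to exercise the control flow:
V, D, T = 10, 16, 20
enc = torch.randn(T, D)
stub_decoder = lambda labels: torch.randn(1, D)
stub_joiner = lambda enc_frame, dec_out: torch.randn(V)
print(greedy_search(enc, stub_decoder, stub_joiner))
```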
Aishell
We provide three models for this recipe: a conformer CTC model, a TDNN LSTM CTC model, and a Transducer Stateless model.
Conformer CTC Model
The best CER we currently have is:
|     | test |
|-----|------|
| CER | 4.26 |
TDNN LSTM CTC Model
The CER for this model is:
|     | test  |
|-----|-------|
| CER | 10.16 |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model:
Transducer Stateless Model
The best CER we currently have is:
|     | test |
|-----|------|
| CER | 4.38 |
We provide a Colab notebook to run a pre-trained TransducerStateless model:
Aishell2
We provide one model for this recipe: Transducer Stateless Model.
Transducer Stateless Model
The best WER we currently have is:
|     | dev-ios | test-ios |
|-----|---------|----------|
| WER | 5.32    | 5.56     |
Aishell4
We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)
The best CER we currently have is:
|     | test  |
|-----|-------|
| CER | 29.08 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model:
TIMIT
We provide two models for this recipe: TDNN LSTM CTC model and TDNN LiGRU CTC model.
TDNN LSTM CTC Model
The best PER we currently have is:
|     | TEST   |
|-----|--------|
| PER | 19.71% |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model:
TDNN LiGRU CTC Model
The PER for this model is:
|     | TEST   |
|-----|--------|
| PER | 17.66% |
We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model:
TED-LIUM3
We provide two models for this recipe: a Transducer Stateless model (Conformer encoder + Embedding decoder) and a Pruned Transducer Stateless model (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Transducer Stateless: Conformer encoder + Embedding decoder
The best WER using modified beam search with beam size 4 is:
|     | dev  | test |
|-----|------|------|
| WER | 6.91 | 6.33 |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
We provide a Colab notebook to run a pre-trained Transducer Stateless model:
Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
The best WER using modified beam search with beam size 4 is:
|     | dev  | test |
|-----|------|------|
| WER | 6.77 | 6.14 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model:
Aidatatang_200zh
We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
|                      | Dev  | Test |
|----------------------|------|------|
| greedy search        | 5.53 | 6.59 |
| fast beam search     | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model:
WenetSpeech
We provide two models for this recipe: Pruned stateless RNN-T_2 (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss) and Pruned stateless RNN-T_5 (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).
Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)
|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy search        | 7.80 | 8.75     | 13.49        |
| modified beam search | 7.76 | 8.71     | 13.41        |
| fast beam search     | 7.94 | 8.74     | 13.80        |
Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)
Streaming:
|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 8.78 | 10.12    | 16.16        |
| modified_beam_search | 8.53 | 9.95     | 15.81        |
| fast_beam_search     | 9.01 | 10.47    | 16.28        |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model:
Alimeeting
We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)
|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy search        | 31.77 | 34.66    |
| fast beam search     | 31.39 | 33.02    |
| modified beam search | 30.38 | 34.25    |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model:
TAL_CSASR
We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
The best results for Chinese CER (%) and English WER (%), respectively (zh: Chinese, en: English), are:
| decoding-method      | dev  | dev_zh | dev_en | test | test_zh | test_en |
|----------------------|------|--------|--------|------|---------|---------|
| greedy_search        | 7.30 | 6.48   | 19.19  | 7.39 | 6.66    | 19.13   |
| modified_beam_search | 7.15 | 6.35   | 18.95  | 7.22 | 6.50    | 18.70   |
| fast_beam_search     | 7.18 | 6.39   | 18.90  | 7.27 | 6.55    | 18.77   |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model:
Deployment with C++
Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.
Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c for how to do this.
We also provide a Colab notebook showing you how to run a torch-scripted model in k2 with C++.
Please see:
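As an illustration of what the C++ deployment path relies on, the sketch below exports a stand-in model to TorchScript so it can be loaded from C++ with LibTorch. The module and file name are placeholders; the icefall recipes ship their own export scripts for the real models:

```python
import torch
import torch.nn as nn

# Stand-in acoustic model used only to illustrate the export step; in practice
# you would export the trained icefall model via the recipe's export script.
class TinyAcousticModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(80, 500)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats).log_softmax(dim=-1)

model = TinyAcousticModel().eval()
scripted = torch.jit.script(model)   # produce a TorchScript module
scripted.save("cpu_jit.pt")          # illustrative file name
# From C++ (LibTorch): auto module = torch::jit::load("cpu_jit.pt");
```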