This repository has been archived on 2026-03-23. You can view files and clone it, but cannot push or open issues or pull requests.
Fujimoto Seiji c1ce7ca9e3 Add first cut at ReazonSpeech recipe
This recipe is mostly based on egs/csj, but tweaked to the point that it
can be run with the ReazonSpeech corpus.

That being said, there are some big caveats:

 * Currently the model quality is not very good. Actually, it is very
   bad. I trained a model with a 1000h corpus, and it resulted in >80%
   CER on JSUT.

 * The core issue seems to be that Zipformer is prone to ignoring
   utterances as silent segments. It often produces an empty hypothesis
   even though the audio actually contains human voice.

 * This issue has already been reported upstream but is not fully
   resolved as of Dec 2023.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

Introduction

icefall contains ASR recipes for various datasets using https://github.com/k2-fsa/k2.

You can use https://github.com/k2-fsa/sherpa to deploy models trained with icefall.

You can try pre-trained models from within your browser, without downloading or installing anything, by visiting https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition. See https://k2-fsa.github.io/icefall/huggingface/spaces.html for more details.

Installation

Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.

Recipes

Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.

We provide the following recipes:

yesno

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
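The WER line above is read as (insertions + deletions + substitutions) divided by the number of reference words. A minimal sketch of that arithmetic (the function name is illustrative, not icefall's scorer):

```python
def wer(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Word error rate as a percentage: (ins + del + sub) / ref_words."""
    return 100.0 * (ins + dels + subs) / ref_words

# The yesno result above: 0 insertions, 1 deletion, 0 substitutions
# over 240 reference words.
print(f"{wer(0, 1, 0, 240):.2f}")  # 0.42
```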

We provide a Colab notebook for this recipe: Open In Colab

LibriSpeech

Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md for the latest results.

We provide 5 models for this recipe:

Conformer CTC Model

The best WER we currently have is:

test-clean test-other
WER 2.42 5.73

We provide a Colab notebook to run a pre-trained conformer CTC model: Open In Colab

TDNN LSTM CTC Model

The WER for this model is:

test-clean test-other
WER 6.59 17.69

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: Open In Colab

Transducer: Conformer encoder + LSTM decoder

Using Conformer as encoder and LSTM as decoder.

The best WER with greedy search is:

test-clean test-other
WER 3.07 7.51

We provide a Colab notebook to run a pre-trained RNN-T conformer model: Open In Colab

Transducer: Conformer encoder + Embedding decoder

Using Conformer as encoder. The decoder consists of 1 embedding layer and 1 convolutional layer.

The best WER using modified beam search with beam size 4 is:

test-clean test-other
WER 2.56 6.27

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model: Open In Colab
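Several results above use "modified beam search with beam size 4". As intuition for why a beam helps over greedy decoding, here is a toy sketch (names like `step_score` are hypothetical; icefall's actual modified_beam_search additionally merges duplicate hypotheses and works on transducer output):

```python
import math

def beam_search(step_score, num_steps, vocab_size, beam_size=4):
    """Toy beam search. step_score(prefix, token) returns the
    log-probability of emitting `token` after `prefix`."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(num_steps):
        candidates = [
            (seq + (tok,), score + step_score(seq, tok))
            for seq, score in beams
            for tok in range(vocab_size)
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]  # keep the top `beam_size` hypotheses
    return beams[0][0]

# A contrived score table where the greedy first choice (token 0)
# leads to a dead end, so the beam recovers the better sequence (1, 1).
def step_score(prefix, tok):
    table = {
        (): [math.log(0.6), math.log(0.4)],   # greedy would pick 0 here
        (0,): [math.log(0.1), math.log(0.1)],
        (1,): [math.log(0.1), math.log(0.9)],
    }
    return table[prefix][tok]

print(beam_search(step_score, num_steps=2, vocab_size=2))  # (1, 1)
```

Greedy decoding commits to token 0 at the first step (probability 0.6) and ends with total log-probability log(0.06); the beam keeps the (1,) hypothesis alive and finds (1, 1) with log(0.36).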

k2 pruned RNN-T

Encoder Params test-clean test-other epochs devices
zipformer 65.5M 2.21 4.79 50 4 32G-V100
zipformer-small 23.2M 2.42 5.73 50 2 32G-V100
zipformer-large 148.4M 2.06 4.63 50 4 32G-V100
zipformer-large 148.4M 2.00 4.38 174 8 80G-A100

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

k2 pruned RNN-T + GigaSpeech

test-clean test-other
WER 1.78 4.08

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

k2 pruned RNN-T + GigaSpeech + CommonVoice

test-clean test-other
WER 1.90 3.98

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

GigaSpeech

We provide three models for this recipe:

Conformer CTC

Dev Test
WER 10.47 10.58

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

Dev Test
greedy search 10.51 10.73
fast beam search 10.50 10.69
modified beam search 10.40 10.51

Transducer: Zipformer encoder + Embedding decoder

Dev Test
greedy search 10.31 10.50
fast beam search 10.26 10.48
modified beam search 10.25 10.38

Aishell

We provide three models for this recipe: a conformer CTC model, a TDNN LSTM CTC model, and a Transducer Stateless model.

Conformer CTC Model

The best CER we currently have is:

test
CER 4.26
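CER here is the character error rate: the character-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of that computation (not icefall's actual scorer):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate as a percentage of reference characters."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)

print(cer("你好世界", "你好世间"))  # 25.0
```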

TDNN LSTM CTC Model

The CER for this model is:

test
CER 10.16

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: Open In Colab

Transducer Stateless Model

The best CER we currently have is:

test
CER 4.38

We provide a Colab notebook to run a pre-trained TransducerStateless model: Open In Colab

Aishell2

We provide one model for this recipe: Transducer Stateless Model.

Transducer Stateless Model

The best WER we currently have is:

dev-ios test-ios
WER 5.32 5.56

Aishell4

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)

The best CER we currently have is:

test
CER 29.08

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

TIMIT

We provide two models for this recipe: TDNN LSTM CTC model and TDNN LiGRU CTC model.

TDNN LSTM CTC Model

The best PER we currently have is:

TEST
PER 19.71%

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: Open In Colab

TDNN LiGRU CTC Model

The PER for this model is:

TEST
PER 17.66%

We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: Open In Colab

TED-LIUM3

We provide two models for this recipe: Transducer Stateless: Conformer encoder + Embedding decoder and Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Transducer Stateless: Conformer encoder + Embedding decoder

The best WER using modified beam search with beam size 4 is:

dev test
WER 6.91 6.33

Note: No auxiliary losses are used in the training and no LMs are used in the decoding.

We provide a Colab notebook to run a pre-trained Transducer Stateless model: Open In Colab

Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best WER using modified beam search with beam size 4 is:

dev test
WER 6.77 6.14

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

Aidatatang_200zh

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

Dev Test
greedy search 5.53 6.59
fast beam search 5.30 6.34
modified beam search 5.27 6.33

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

WenetSpeech

We provide two models for this recipe: Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss and Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)

Dev Test-Net Test-Meeting
greedy search 7.80 8.75 13.49
modified beam search 7.76 8.71 13.41
fast beam search 7.94 8.74 13.80

Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)

Streaming:

Dev Test-Net Test-Meeting
greedy_search 8.78 10.12 16.16
modified_beam_search 8.53 9.95 15.81
fast_beam_search 9.01 10.47 16.28

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model: Open In Colab

Alimeeting

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)

Eval Test-Net
greedy search 31.77 34.66
fast beam search 31.39 33.02
modified beam search 30.38 34.25

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

TAL_CSASR

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best results for Chinese CER (%) and English WER (%) are as follows (zh: Chinese, en: English):

decoding-method dev dev_zh dev_en test test_zh test_en
greedy_search 7.30 6.48 19.19 7.39 6.66 19.13
modified_beam_search 7.15 6.35 18.95 7.22 6.50 18.70
fast_beam_search 7.18 6.39 18.90 7.27 6.55 18.77

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: Open In Colab

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.

Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c for how to do this.

We also provide a Colab notebook, showing you how to run a torch scripted model in k2 with C++. Please see: Open In Colab
