
Introduction
icefall contains ASR recipes for various datasets using https://github.com/k2-fsa/k2.
You can use https://github.com/k2-fsa/sherpa to deploy models trained with icefall.
Installation
Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.
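After installing, a quick sanity check (a throwaway snippet, not part of icefall itself) is to confirm that the core dependencies are importable; icefall itself is typically made visible by adding its root directory to PYTHONPATH:

```python
# Minimal post-installation sanity check: icefall builds on PyTorch and k2.
import torch
import k2
import icefall  # noqa: F401  -- works once icefall's root is on PYTHONPATH

print("torch", torch.__version__, "| k2 loaded from", k2.__file__)
```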
Recipes
Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.
We provide the following recipes:
yesno
This is the simplest ASR recipe in icefall and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER:
`[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]`
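The bracketed numbers follow the usual WER definition: insertions plus deletions plus substitutions, divided by the number of reference words. A throwaway check (not icefall code) with the yesno numbers above:

```python
# WER = (insertions + deletions + substitutions) / reference word count.
# The yesno result above: 0 ins, 1 del, 0 sub over 240 reference words.
ins, dels, subs, ref_words = 0, 1, 0, 240
wer = 100.0 * (ins + dels + subs) / ref_words
print(f"%WER {wer:.2f}%")  # -> %WER 0.42%
```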
We provide a Colab notebook for this recipe.
LibriSpeech
Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md for the latest results.
We provide 4 models for this recipe:
- conformer CTC model
- TDNN LSTM CTC model
- Transducer: Conformer encoder + LSTM decoder
- Transducer: Conformer encoder + Embedding decoder
Conformer CTC Model
The best WER we currently have is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |
We provide a Colab notebook to run a pre-trained conformer CTC model.
TDNN LSTM CTC Model
The WER for this model is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.
Transducer: Conformer encoder + LSTM decoder
This model uses a Conformer as the encoder and an LSTM as the decoder.
The best WER with greedy search is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 3.07       | 7.51       |
We provide a Colab notebook to run a pre-trained RNN-T conformer model.
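For reference, greedy search is the simplest transducer decoding method: at each encoder frame, take the arg-max of the joiner output and advance the decoder only when a non-blank symbol is emitted. The sketch below is schematic rather than the actual icefall implementation; the `decoder` and `joiner` callables and their signatures are assumptions, and it emits at most one symbol per frame for brevity:

```python
import torch

def greedy_search(decoder, joiner, encoder_out, blank_id=0):
    """Schematic transducer greedy search; encoder_out has shape (T, C)."""
    hyp = []
    dec_out, state = decoder(blank_id, None)  # prime the decoder with blank
    for t in range(encoder_out.size(0)):
        logits = joiner(encoder_out[t : t + 1], dec_out)  # (1, vocab_size)
        token = logits.argmax(dim=-1).item()
        if token != blank_id:                 # emit at most one symbol per frame
            hyp.append(token)
            dec_out, state = decoder(token, state)  # advance only on emission
    return hyp
```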
Transducer: Conformer encoder + Embedding decoder
This model uses a Conformer as the encoder. The decoder consists of an embedding layer followed by a convolutional layer.
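A schematic PyTorch sketch of such a "stateless" decoder (the module and argument names are illustrative; the actual icefall code differs in details): token embeddings are mixed by a 1-D convolution over a small, fixed left context, so no recurrent state needs to be carried:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatelessDecoder(nn.Module):
    """Embedding + 1-D convolution over a fixed left context (sketch)."""

    def __init__(self, vocab_size: int, embed_dim: int, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, num_tokens) token ids
        embed = self.embedding(y).permute(0, 2, 1)               # (B, E, U)
        embed = F.pad(embed, (self.conv.kernel_size[0] - 1, 0))  # pad on the left
        return self.conv(embed).permute(0, 2, 1)                 # (B, U, E)

dec = StatelessDecoder(vocab_size=500, embed_dim=512)
print(dec(torch.randint(0, 500, (4, 10))).shape)                 # (4, 10, 512)
```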
The best WER using modified beam search with beam size 4 is:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.56       | 6.27       |
Note: No auxiliary losses are used during training, and no LMs are used during decoding.
We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model.
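To make the decoding method above concrete, here is a much-simplified, batch-size-1 sketch of modified beam search: every hypothesis in the beam emits at most one symbol per frame, and only the top `beam` hypotheses survive. The `decoder`/`joiner` interfaces are hypothetical (matching the stateless decoder sketched earlier), and duplicate-hypothesis merging is omitted:

```python
import torch

def modified_beam_search(decoder, joiner, encoder_out, beam=4,
                         blank_id=0, context_size=2):
    """Simplified sketch: keep top-`beam` hypotheses, <=1 emission per frame."""
    hyps = [([blank_id] * context_size, 0.0)]  # (tokens, log-probability)
    for t in range(encoder_out.size(0)):
        candidates = []
        for tokens, lp in hyps:
            y = torch.tensor(tokens[-context_size:]).unsqueeze(0)
            dec_out = decoder(y)[:, -1, :]     # stateless decoder output
            log_probs = joiner(encoder_out[t : t + 1], dec_out)
            log_probs = log_probs.log_softmax(dim=-1).squeeze(0)
            candidates.append((tokens, lp + log_probs[blank_id].item()))  # blank
            values, indices = log_probs.topk(beam)
            for p, tok in zip(values.tolist(), indices.tolist()):
                if tok != blank_id:
                    candidates.append((tokens + [tok], lp + p))           # extend
        hyps = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam]
    best = max(hyps, key=lambda h: h[1])[0]
    return best[context_size:]                 # strip the blank start context
```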
k2 pruned RNN-T
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.57       | 5.95       |
k2 pruned RNN-T + GigaSpeech
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.00       | 4.63       |
Aishell
We provide two models for this recipe: a conformer CTC model and a TDNN LSTM CTC model.
Conformer CTC Model
The best CER we currently have is:
|     | test |
|-----|------|
| CER | 4.26 |
We provide a Colab notebook to run a pre-trained conformer CTC model.
WenetSpeech
We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy search        | 7.80 | 8.75     | 13.49        |
| fast beam search     | 7.94 | 8.74     | 13.80        |
| modified beam search | 7.76 | 8.71     | 13.41        |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.
Alimeeting
We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.
Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)
|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy search        | 31.77 | 34.66    |
| fast beam search     | 31.39 | 33.02    |
| modified beam search | 30.38 | 34.25    |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.
Deployment with C++
Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.
Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c for how to do this.
We also provide a Colab notebook showing you how to run a torch-scripted model in k2 with C++.
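The export side, in outline, scripts the trained model with TorchScript and saves it to disk; the saved file can then be loaded from C++ via torch::jit::load with no Python dependency. A hedged, self-contained sketch (the tiny model and file name are stand-ins, not icefall's actual export.py):

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Stand-in for a trained icefall model whose forward pass is scriptable."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(80, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

model = TinyModel().eval()
scripted = torch.jit.script(model)  # compile to TorchScript
scripted.save("jit_model.pt")       # in C++: torch::jit::load("jit_model.pt")
```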