icefall/README.md at 0e57b30495fd802b7794b5c4430e1dd927c9a4a8

[Ready to merge] Pruned Transducer Stateless2 for WenetSpeech (char-based) (#349 )

* add char-based pruned-rnnt2 for wenetspeech

* style check

* style check

* change for export.py

* do some changes

* do some changes

* a small change for .flake8

* solve the conflicts

2022-05-23 17:13:01 +08:00


Installation

Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.

Recipes

Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.

We provide 8 recipes at present:

yesno

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
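The log line above reports 1 error (a single deletion) against 240 reference words. As a minimal sketch, assuming the usual definition of WER as insertions plus deletions plus substitutions divided by the number of reference words, the percentage can be reproduced as follows. The function name `wer_percent` is hypothetical and is not part of icefall.

```python
def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Return the word error rate in percent.

    Assumes WER = (insertions + deletions + substitutions) / reference words.
    """
    if ref_words <= 0:
        raise ValueError("ref_words must be positive")
    return 100.0 * (ins + dels + subs) / ref_words


# Counts taken from the yesno log line: 0 ins, 1 del, 0 sub, 240 reference words.
print(f"%WER {wer_percent(ins=0, dels=1, subs=0, ref_words=240):.2f}%")  # %WER 0.42%
```

Because the denominator is the reference length, insertions can push WER above 100% on a bad hypothesis.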

We provide a Colab notebook for this recipe.

LibriSpeech

Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md for the latest results.

We provide 6 models for this recipe:

Conformer CTC Model

The best WER we currently have is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

We provide a Colab notebook to run a pre-trained Conformer CTC model.

TDNN LSTM CTC Model

The WER for this model is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

Transducer: Conformer encoder + LSTM decoder

This model uses a Conformer as the encoder and an LSTM as the decoder.

The best WER with greedy search is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 3.07       | 7.51       |

We provide a Colab notebook to run a pre-trained RNN-T Conformer model.

Transducer: Conformer encoder + Embedding decoder

This model uses a Conformer as the encoder. The decoder consists of one embedding layer and one convolutional layer.

The best WER using modified beam search with beam size 4 is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.56       | 6.27       |

Note: No auxiliary losses are used in training and no LMs are used in decoding.

We provide a Colab notebook to run a pre-trained transducer Conformer + stateless decoder model.

k2 pruned RNN-T

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.57       | 5.95       |

k2 pruned RNN-T + GigaSpeech

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.00       | 4.63       |

Aishell

We provide three models for this recipe: Conformer CTC model, Transducer Stateless model, and TDNN LSTM CTC model.

Conformer CTC Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.26 |

We provide a Colab notebook to run a pre-trained Conformer CTC model.
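For readers unfamiliar with the metric, CER is commonly defined as the character-level analogue of WER: the edit distance between the reference and hypothesis character sequences, divided by the number of reference characters. The sketch below illustrates that definition only. It is not icefall's scoring code, the function names are hypothetical, and the example sentence is invented for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer_percent(ref: str, hyp: str) -> float:
    """Character error rate in percent, ignoring spaces (an assumption)."""
    ref_chars = [c for c in ref if not c.isspace()]
    hyp_chars = [c for c in hyp if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Invented example: the hypothesis drops one of six reference characters.
print(f"CER {cer_percent('今天天气很好', '今天天气好'):.2f}%")  # CER 16.67%
```

Scoring at the character level avoids the need for word segmentation, which is why it is the usual choice for Mandarin corpora such as Aishell.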

Transducer Stateless Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.68 |

We provide a Colab notebook to run a pre-trained Transducer Stateless model.

TDNN LSTM CTC Model

The CER for this model is:

|     | test  |
|-----|-------|
| CER | 10.16 |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

TIMIT

We provide two models for this recipe: TDNN LSTM CTC model and TDNN LiGRU CTC model.

TDNN LSTM CTC Model

The best PER we currently have is:

|     | TEST   |
|-----|--------|
| PER | 19.71% |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model.

TDNN LiGRU CTC Model

The PER for this model is:

|     | TEST   |
|-----|--------|
| PER | 17.66% |

We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model.

TED-LIUM3

We provide two models for this recipe: (1) Transducer Stateless: Conformer encoder + Embedding decoder, and (2) Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Transducer Stateless: Conformer encoder + Embedding decoder

The best WER using modified beam search with beam size 4 is:

|     | dev  | test |
|-----|------|------|
| WER | 6.91 | 6.33 |

Note: No auxiliary losses are used in training and no LMs are used in decoding.

We provide a Colab notebook to run a pre-trained Transducer Stateless model.

Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

The best WER using modified beam search with beam size 4 is:

|     | dev  | test |
|-----|------|------|
| WER | 6.77 | 6.14 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

GigaSpeech

We provide two models for this recipe: (1) Conformer CTC, and (2) Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Conformer CTC

|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.51 | 10.73 |
| fast beam search     | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 |

Aidatatang_200zh

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

|                      | Dev  | Test |
|----------------------|------|------|
| greedy search        | 5.53 | 6.59 |
| fast beam search     | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

WenetSpeech

We provide one model for this recipe: Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss.

Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy search        | 7.80 | 8.75     | 13.49        |
| fast beam search     | 7.94 | 8.74     | 13.80        |
| modified beam search | 7.76 | 8.71     | 13.41        |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model.

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.

Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c for how to do this.

We also provide a Colab notebook showing how to run a torch scripted model in k2 with C++.
