Reworked README.md (#1470)

* Rework README.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
This commit is contained in:
zr_jin 2024-01-23 16:26:24 +08:00 committed by GitHub
parent 5dfc3ed7f9
commit ebe97a07b0
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

439
README.md
View File

@ -2,46 +2,83 @@
<img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168> <img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168>
</div> </div>
## Introduction # Introduction
icefall contains ASR recipes for various datasets The icefall peoject contains speech related recipes for various datasets
using <https://github.com/k2-fsa/k2>. using [k2-fsa](https://github.com/k2-fsa/k2) and [lhotse](https://github.com/lhotse-speech/lhotse).
You can use <https://github.com/k2-fsa/sherpa> to deploy models You can use [sherpa](https://github.com/k2-fsa/sherpa), [sherpa-ncnn](https://github.com/k2-fsa/sherpa-ncnn) or [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) for deployment with models
trained with icefall. in icefall; these frameworks also support models not included in icefall; please refer to respective documents for more details.
You can try pre-trained models from within your browser without the need You can try pre-trained models from within your browser without the need
to download or install anything by visiting <https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition> to download or install anything by visiting this [huggingface space](https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition).
See <https://k2-fsa.github.io/icefall/huggingface/spaces.html> for more details. Please refer to [document](https://k2-fsa.github.io/icefall/huggingface/spaces.html) for more details.
## Installation # Installation
Please refer to <https://icefall.readthedocs.io/en/latest/installation/index.html> Please refer to [document](https://icefall.readthedocs.io/en/latest/installation/index.html)
for installation. for installation.
## Recipes # Recipes
Please refer to <https://icefall.readthedocs.io/en/latest/recipes/index.html> Please refer to [document](https://icefall.readthedocs.io/en/latest/recipes/index.html)
for more information. for more details.
We provide the following recipes: ## ASR: Automatic Speech Recognition
### Supported Datasets
- [yesno][yesno] - [yesno][yesno]
- [LibriSpeech][librispeech]
- [GigaSpeech][gigaspeech] - [Aidatatang_200zh][aidatatang_200zh]
- [AMI][ami]
- [Aishell][aishell] - [Aishell][aishell]
- [Aishell2][aishell2] - [Aishell2][aishell2]
- [Aishell4][aishell4] - [Aishell4][aishell4]
- [Alimeeting][alimeeting]
- [AMI][ami]
- [CommonVoice][commonvoice]
- [Corpus of Spontaneous Japanese][csj]
- [GigaSpeech][gigaspeech]
- [LibriCSS][libricss]
- [LibriSpeech][librispeech]
- [Libriheavy][libriheavy]
- [Multi-Dialect Broadcast News Arabic Speech Recognition][mgb2]
- [PeopleSpeech][peoplespeech]
- [SPGISpeech][spgispeech]
- [Switchboard][swbd]
- [TIMIT][timit] - [TIMIT][timit]
- [TED-LIUM3][tedlium3] - [TED-LIUM3][tedlium3]
- [Aidatatang_200zh][aidatatang_200zh]
- [WenetSpeech][wenetspeech]
- [Alimeeting][alimeeting]
- [Switchboard][swbd]
- [TAL_CSASR][tal_csasr] - [TAL_CSASR][tal_csasr]
- [Voxpopuli][voxpopuli]
- [XBMU-AMDO31][xbmu-amdo31]
- [WenetSpeech][wenetspeech]
More datasets will be added in the future.
### yesno ### Supported Models
The [LibriSpeech][librispeech] recipe supports the most comprehensive set of models, you are welcome to try them out.
#### CTC
- TDNN LSTM CTC
- Conformer CTC
- Zipformer CTC
#### MMI
- Conformer MMI
- Zipformer MMI
#### Transducer
- Conformer-based Encoder
- LSTM-based Encoder
- Zipformer-based Encoder
- LSTM-based Predictor
- [Stateless Predictor](https://research.google/pubs/rnn-transducer-with-stateless-prediction-network/)
If you are willing to contribute to icefall, please refer to [contributing](https://icefall.readthedocs.io/en/latest/contributing/index.html) for more details.
We would like to highlight the performance of some of the recipes here.
### [yesno][yesno]
This is the simplest ASR recipe in `icefall` and can be run on CPU. This is the simplest ASR recipe in `icefall` and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER: Training takes less than 30 seconds and gives you the following WER:
@ -52,350 +89,264 @@ Training takes less than 30 seconds and gives you the following WER:
We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing) We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
### LibriSpeech ### [LibriSpeech][librispeech]
Please see <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md> Please see [RESULTS.md](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md)
for the **latest** results. for the **latest** results.
We provide 5 models for this recipe: #### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc)
- [conformer CTC model][LibriSpeech_conformer_ctc]
- [TDNN LSTM CTC model][LibriSpeech_tdnn_lstm_ctc]
- [Transducer: Conformer encoder + LSTM decoder][LibriSpeech_transducer]
- [Transducer: Conformer encoder + Embedding decoder][LibriSpeech_transducer_stateless]
- [Transducer: Zipformer encoder + Embedding decoder][LibriSpeech_zipformer]
#### Conformer CTC Model
The best WER we currently have is:
| | test-clean | test-other | | | test-clean | test-other |
|-----|------------|------------| |-----|------------|------------|
| WER | 2.42 | 5.73 | | WER | 2.42 | 5.73 |
We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
#### TDNN LSTM CTC Model #### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc)
The WER for this model is:
| | test-clean | test-other | | | test-clean | test-other |
|-----|------------|------------| |-----|------------|------------|
| WER | 6.59 | 17.69 | | WER | 6.59 | 17.69 |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)
#### Transducer: Conformer encoder + LSTM decoder #### [Transducer (Conformer Encoder + LSTM Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
Using Conformer as encoder and LSTM as decoder. | | test-clean | test-other |
|---------------|------------|------------|
| greedy search | 3.07 | 7.51 |
The best WER with greedy search is: We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)
| | test-clean | test-other | #### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
|-----|------------|------------|
| WER | 3.07 | 7.51 |
We provide a Colab notebook to run a pre-trained RNN-T conformer model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing) | | test-clean | test-other |
|---------------------------------------|------------|------------|
#### Transducer: Conformer encoder + Embedding decoder | modified_beam_search (`beam_size=4`) | 2.56 | 6.27 |
Using Conformer as encoder. The decoder consists of 1 embedding layer
and 1 convolutional layer.
The best WER using modified beam search with beam size 4 is:
| | test-clean | test-other |
|-----|------------|------------|
| WER | 2.56 | 6.27 |
Note: No auxiliary losses are used in the training and no LMs are used
in the decoding.
We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)
#### k2 pruned RNN-T We provide a Colab notebook to run test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)
#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer)
WER (modified_beam_search `beam_size=4` unless further stated)
1. LibriSpeech-960hr
| Encoder | Params | test-clean | test-other | epochs | devices | | Encoder | Params | test-clean | test-other | epochs | devices |
|-----------------|--------|------------|------------|---------|------------| |-----------------|--------|------------|------------|---------|------------|
| zipformer | 65.5M | 2.21 | 4.79 | 50 | 4 32G-V100 | | Zipformer | 65.5M | 2.21 | 4.79 | 50 | 4 32G-V100 |
| zipformer-small | 23.2M | 2.42 | 5.73 | 50 | 2 32G-V100 | | Zipformer-small | 23.2M | 2.42 | 5.73 | 50 | 2 32G-V100 |
| zipformer-large | 148.4M | 2.06 | 4.63 | 50 | 4 32G-V100 | | Zipformer-large | 148.4M | 2.06 | 4.63 | 50 | 4 32G-V100 |
| zipformer-large | 148.4M | 2.00 | 4.38 | 174 | 8 80G-A100 | | Zipformer-large | 148.4M | 2.00 | 4.38 | 174 | 8 80G-A100 |
Note: No auxiliary losses are used in the training and no LMs are used 2. LibriSpeech-960hr + GigaSpeech
in the decoding.
#### k2 pruned RNN-T + GigaSpeech | Encoder | Params | test-clean | test-other |
|-----------------|--------|------------|------------|
| | test-clean | test-other | | Zipformer | 65.5M | 1.78 | 4.08 |
|-----|------------|------------|
| WER | 1.78 | 4.08 |
Note: No auxiliary losses are used in the training and no LMs are used
in the decoding.
#### k2 pruned RNN-T + GigaSpeech + CommonVoice
| | test-clean | test-other |
|-----|------------|------------|
| WER | 1.90 | 3.98 |
Note: No auxiliary losses are used in the training and no LMs are used
in the decoding.
### GigaSpeech 3. LibriSpeech-960hr + GigaSpeech + CommonVoice
We provide three models for this recipe: | Encoder | Params | test-clean | test-other |
|-----------------|--------|------------|------------|
| Zipformer | 65.5M | 1.90 | 3.98 |
- [Conformer CTC model][GigaSpeech_conformer_ctc]
- [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
- [Transducer: Zipformer encoder + Embedding decoder][GigaSpeech_zipformer]
#### Conformer CTC ### [GigaSpeech][gigaspeech]
#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/conformer_ctc)
| | Dev | Test | | | Dev | Test |
|-----|-------|-------| |-----|-------|-------|
| WER | 10.47 | 10.58 | | WER | 10.47 | 10.58 |
#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss #### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/pruned_transducer_stateless2)
Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss
| | Dev | Test | | | Dev | Test |
|----------------------|-------|-------| |----------------------|-------|-------|
| greedy search | 10.51 | 10.73 | | greedy_search | 10.51 | 10.73 |
| fast beam search | 10.50 | 10.69 | | fast_beam_search | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 | | modified_beam_search | 10.40 | 10.51 |
#### Transducer: Zipformer encoder + Embedding decoder #### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/zipformer)
| | Dev | Test | | | Dev | Test |
|----------------------|-------|-------| |----------------------|-------|-------|
| greedy search | 10.31 | 10.50 | | greedy_search | 10.31 | 10.50 |
| fast beam search | 10.26 | 10.48 | | fast_beam_search | 10.26 | 10.48 |
| modified beam search | 10.25 | 10.38 | | modified_beam_search | 10.25 | 10.38 |
### Aishell ### [Aishell][aishell]
We provide three models for this recipe: [conformer CTC model][Aishell_conformer_ctc], #### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/tdnn_lstm_ctc)
[TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc], and [Transducer Stateless Model][Aishell_pruned_transducer_stateless7],
#### Conformer CTC Model
The best CER we currently have is:
| | test |
|-----|------|
| CER | 4.26 |
#### TDNN LSTM CTC Model
The CER for this model is:
| | test | | | test |
|-----|-------| |-----|-------|
| CER | 10.16 | | CER | 10.16 |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
#### Transducer Stateless Model #### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/transducer_stateless)
The best CER we currently have is:
| | test | | | test |
|-----|------| |-----|------|
| CER | 4.38 | | CER | 4.38 |
We provide a Colab notebook to run a pre-trained TransducerStateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/zipformer)
WER (modified_beam_search `beam_size=4`)
| Encoder | Params | dev | test | epochs |
|-----------------|--------|-----|------|---------|
| Zipformer | 73.4M | 4.13| 4.40 | 55 |
| Zipformer-small | 30.2M | 4.40| 4.67 | 55 |
| Zipformer-large | 157.3M | 4.03| 4.28 | 56 |
### Aishell2 ### [Aishell4][aishell4]
We provide one model for this recipe: [Transducer Stateless Model][Aishell2_pruned_transducer_stateless5]. #### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell4/ASR/pruned_transducer_stateless5)
#### Transducer Stateless Model
The best WER we currently have is:
| | dev-ios | test-ios |
|-----|------------|------------|
| WER | 5.32 | 5.56 |
### Aishell4
We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5].
#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)
The best CER we currently have is:
1 Trained with all subsets:
| | test | | | test |
|-----|------------| |-----|------------|
| CER | 29.08 | | CER | 29.08 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
### TIMIT ### [TIMIT][timit]
We provide two models for this recipe: [TDNN LSTM CTC model][TIMIT_tdnn_lstm_ctc] #### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_lstm_ctc)
and [TDNN LiGRU CTC model][TIMIT_tdnn_ligru_ctc].
#### TDNN LSTM CTC Model | |TEST|
|---|----|
The best PER we currently have is:
||TEST|
|--|--|
|PER| 19.71% | |PER| 19.71% |
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)
#### TDNN LiGRU CTC Model #### [TDNN LiGRU CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_ligru_ctc)
The PER for this model is: | |TEST|
|---|----|
||TEST|
|--|--|
|PER| 17.66% | |PER| 17.66% |
We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
### TED-LIUM3 ### [TED-LIUM3][tedlium3]
We provide two models for this recipe: [Transducer Stateless: Conformer encoder + Embedding decoder][TED-LIUM3_transducer_stateless] and [Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TED-LIUM3_pruned_transducer_stateless]. #### [Transducer (Conformer Encoder + Embedding Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/transducer_stateless)
#### Transducer Stateless: Conformer encoder + Embedding decoder | | dev | test |
|--------------------------------------|-------|--------|
The best WER using modified beam search with beam size 4 is: | modified_beam_search (`beam_size=4`) | 6.91 | 6.33 |
| | dev | test |
|-----|-------|--------|
| WER | 6.91 | 6.33 |
Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
We provide a Colab notebook to run a pre-trained Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)
#### Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
The best WER using modified beam search with beam size 4 is:
| | dev | test |
|-----|-------|--------|
| WER | 6.77 | 6.14 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
### Aidatatang_200zh We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)
We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aidatatang_200zh_pruned_transducer_stateless2]. #### [Transducer (pruned_transducer_stateless)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/pruned_transducer_stateless)
#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss | | dev | test |
|--------------------------------------|-------|--------|
| modified_beam_search (`beam_size=4`) | 6.77 | 6.14 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
### [Aidatatang_200zh][aidatatang_200zh]
#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/aidatatang_200zh/ASR/pruned_transducer_stateless2)
| | Dev | Test | | | Dev | Test |
|----------------------|-------|-------| |----------------------|-------|-------|
| greedy search | 5.53 | 6.59 | | greedy_search | 5.53 | 6.59 |
| fast beam search | 5.30 | 6.34 | | fast_beam_search | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 | | modified_beam_search | 5.27 | 6.33 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
### WenetSpeech ### [WenetSpeech][wenetspeech]
We provide some models for this recipe: [Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2] and [Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless5]. #### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless2)
#### Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)
| | Dev | Test-Net | Test-Meeting | | | Dev | Test-Net | Test-Meeting |
|----------------------|-------|----------|--------------| |----------------------|-------|----------|--------------|
| greedy search | 7.80 | 8.75 | 13.49 | | greedy_search | 7.80 | 8.75 | 13.49 |
| modified beam search| 7.76 | 8.71 | 13.41 | | fast_beam_search | 7.94 | 8.74 | 13.80 |
| fast beam search | 7.94 | 8.74 | 13.80 | | modified_beam_search | 7.76 | 8.71 | 13.41 |
We provide a Colab notebook to run the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
#### [Transducer **Streaming** (pruned_transducer_stateless5) ](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless5)
#### Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)
**Streaming**:
| | Dev | Test-Net | Test-Meeting | | | Dev | Test-Net | Test-Meeting |
|----------------------|-------|----------|--------------| |----------------------|-------|----------|--------------|
| greedy_search | 8.78 | 10.12 | 16.16 | | greedy_search | 8.78 | 10.12 | 16.16 |
| modified_beam_search | 8.53| 9.95 | 15.81 |
| fast_beam_search| 9.01 | 10.47 | 16.28 | | fast_beam_search| 9.01 | 10.47 | 16.28 |
| modified_beam_search | 8.53| 9.95 | 15.81 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
### Alimeeting ### [Alimeeting][alimeeting]
We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Alimeeting_pruned_transducer_stateless2]. #### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/alimeeting/ASR/pruned_transducer_stateless2)
#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)
| | Eval | Test-Net | | | Eval | Test-Net |
|----------------------|--------|----------| |----------------------|--------|----------|
| greedy search | 31.77 | 34.66 | | greedy_search | 31.77 | 34.66 |
| fast beam search | 31.39 | 33.02 | | fast_beam_search | 31.39 | 33.02 |
| modified beam search | 30.38 | 34.25 | | modified_beam_search | 30.38 | 34.25 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
### TAL_CSASR ### [TAL_CSASR][tal_csasr]
We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TAL_CSASR_pruned_transducer_stateless5].
#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss #### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR/pruned_transducer_stateless5)
The best results for Chinese CER(%) and English WER(%) respectively (zh: Chinese, en: English): The best results for Chinese CER(%) and English WER(%) respectively (zh: Chinese, en: English):
|decoding-method | dev | dev_zh | dev_en | test | test_zh | test_en | |decoding-method | dev | dev_zh | dev_en | test | test_zh | test_en |
|--|--|--|--|--|--|--| |--|--|--|--|--|--|--|
|greedy_search| 7.30 | 6.48 | 19.19 |7.39| 6.66 | 19.13| |greedy_search| 7.30 | 6.48 | 19.19 |7.39| 6.66 | 19.13|
|modified_beam_search| 7.15 | 6.35 | 18.95 | 7.22| 6.50 | 18.70 |
|fast_beam_search| 7.18 | 6.39| 18.90 | 7.27| 6.55 | 18.77| |fast_beam_search| 7.18 | 6.39| 18.90 | 7.27| 6.55 | 18.77|
|modified_beam_search| 7.15 | 6.35 | 18.95 | 7.22| 6.50 | 18.70 |
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)
## Deployment with C++ ## TTS: Text-to-Speech
Once you have trained a model in icefall, you may want to deploy it with C++, ### Supported Datasets
without Python dependencies.
Please refer to the documentation - [LJSpeech][ljspeech]
<https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c> - [VCTK][vctk]
### Supported Models
- [VITS](https://arxiv.org/abs/2106.06103)
# Deployment with C++
Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.
Please refer to the [document](https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c)
for how to do this. for how to do this.
We also provide a Colab notebook, showing you how to run a torch scripted model in [k2][k2] with C++. We also provide a Colab notebook, showing you how to run a torch scripted model in [k2][k2] with C++.
Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing) Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing)
[LibriSpeech_tdnn_lstm_ctc]: egs/librispeech/ASR/tdnn_lstm_ctc
[LibriSpeech_conformer_ctc]: egs/librispeech/ASR/conformer_ctc
[LibriSpeech_transducer]: egs/librispeech/ASR/transducer
[LibriSpeech_transducer_stateless]: egs/librispeech/ASR/transducer_stateless
[LibriSpeech_zipformer]: egs/librispeech/ASR/zipformer
[Aishell_tdnn_lstm_ctc]: egs/aishell/ASR/tdnn_lstm_ctc
[Aishell_conformer_ctc]: egs/aishell/ASR/conformer_ctc
[Aishell_pruned_transducer_stateless7]: egs/aishell/ASR/pruned_transducer_stateless7_bbpe
[Aishell2_pruned_transducer_stateless5]: egs/aishell2/ASR/pruned_transducer_stateless5
[Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5
[TIMIT_tdnn_lstm_ctc]: egs/timit/ASR/tdnn_lstm_ctc
[TIMIT_tdnn_ligru_ctc]: egs/timit/ASR/tdnn_ligru_ctc
[TED-LIUM3_transducer_stateless]: egs/tedlium3/ASR/transducer_stateless
[TED-LIUM3_pruned_transducer_stateless]: egs/tedlium3/ASR/pruned_transducer_stateless
[GigaSpeech_conformer_ctc]: egs/gigaspeech/ASR/conformer_ctc
[GigaSpeech_pruned_transducer_stateless2]: egs/gigaspeech/ASR/pruned_transducer_stateless2
[GigaSpeech_zipformer]: egs/gigaspeech/ASR/zipformer
[Aidatatang_200zh_pruned_transducer_stateless2]: egs/aidatatang_200zh/ASR/pruned_transducer_stateless2
[WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
[WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
[Alimeeting_pruned_transducer_stateless2]: egs/alimeeting/ASR/pruned_transducer_stateless2
[TAL_CSASR_pruned_transducer_stateless5]: egs/tal_csasr/ASR/pruned_transducer_stateless5
[yesno]: egs/yesno/ASR [yesno]: egs/yesno/ASR
[librispeech]: egs/librispeech/ASR [librispeech]: egs/librispeech/ASR
[aishell]: egs/aishell/ASR [aishell]: egs/aishell/ASR
@ -411,3 +362,15 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
[ami]: egs/ami [ami]: egs/ami
[swbd]: egs/swbd/ASR [swbd]: egs/swbd/ASR
[k2]: https://github.com/k2-fsa/k2 [k2]: https://github.com/k2-fsa/k2
[commonvoice]: egs/commonvoice/ASR
[csj]: egs/csj/ASR
[libricss]: egs/libricss/SURT
[libriheavy]: egs/libriheavy/ASR
[mgb2]: egs/mgb2/ASR
[peoplespeech]: egs/peoplespeech/ASR
[spgispeech]: egs/spgispeech/ASR
[voxpopuli]: egs/voxpopuli/ASR
[xbmu-amdo31]: egs/xbmu-amdo31/ASR
[vctk]: egs/vctk/TTS
[ljspeech]: egs/ljspeech/TTS