Reworked README.md (#1470)

* Rework README.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2025-12-10 22:45:27 +00:00 · 2024-01-23 16:26:24 +08:00 · 2024-01-23 16:26:24 +08:00 · ebe97a07b0
commit ebe97a07b0
parent 5dfc3ed7f9
1 changed file with 201 additions and 238 deletions
--- a/README.md
+++ b/README.md
@@ -2,46 +2,83 @@
 <img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168>
 </div>
-## Introduction
+# Introduction
-icefall contains ASR recipes for various datasets
+The icefall project contains speech-related recipes for various datasets
-using <https://github.com/k2-fsa/k2>.
+using [k2-fsa](https://github.com/k2-fsa/k2) and [lhotse](https://github.com/lhotse-speech/lhotse).
-You can use <https://github.com/k2-fsa/sherpa> to deploy models
+You can use [sherpa](https://github.com/k2-fsa/sherpa), [sherpa-ncnn](https://github.com/k2-fsa/sherpa-ncnn) or [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) for deployment with models
-trained with icefall.
+in icefall. These frameworks also support models not included in icefall; please refer to their respective documentation for more details.
 You can try pre-trained models from within your browser without the need
-to download or install anything by visiting <https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition>
+to download or install anything by visiting this [huggingface space](https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition).
-See <https://k2-fsa.github.io/icefall/huggingface/spaces.html> for more details.
+Please refer to the [documentation](https://k2-fsa.github.io/icefall/huggingface/spaces.html) for more details.
-## Installation
+# Installation
-Please refer to <https://icefall.readthedocs.io/en/latest/installation/index.html>
+Please refer to the [documentation](https://icefall.readthedocs.io/en/latest/installation/index.html)
 for installation.
-## Recipes
+# Recipes
-Please refer to <https://icefall.readthedocs.io/en/latest/recipes/index.html>
+Please refer to the [documentation](https://icefall.readthedocs.io/en/latest/recipes/index.html)
-for more information.
+for more details.
-We provide the following recipes:
+## ASR: Automatic Speech Recognition
 ### Supported Datasets
  - [yesno][yesno]
-  - [LibriSpeech][librispeech]
+  
-  - [GigaSpeech][gigaspeech]
+  - [Aidatatang_200zh][aidatatang_200zh]
  - [AMI][ami]
  - [Aishell][aishell]
  - [Aishell2][aishell2]
  - [Aishell4][aishell4]
  - [Alimeeting][alimeeting]
  - [AMI][ami]
  - [CommonVoice][commonvoice]
  - [Corpus of Spontaneous Japanese][csj]
  - [GigaSpeech][gigaspeech]
  - [LibriCSS][libricss]
  - [LibriSpeech][librispeech]
  - [Libriheavy][libriheavy]
  - [Multi-Dialect Broadcast News Arabic Speech Recognition][mgb2]
  - [PeopleSpeech][peoplespeech]
  - [SPGISpeech][spgispeech]
  - [Switchboard][swbd]
  - [TIMIT][timit]
  - [TED-LIUM3][tedlium3]
  - [Aidatatang_200zh][aidatatang_200zh]
  - [WenetSpeech][wenetspeech]
  - [Alimeeting][alimeeting]
  - [Switchboard][swbd]
  - [TAL_CSASR][tal_csasr]
  - [Voxpopuli][voxpopuli]
  - [XBMU-AMDO31][xbmu-amdo31]
  - [WenetSpeech][wenetspeech]
 More datasets will be added in the future.
-### yesno
+### Supported Models
 The [LibriSpeech][librispeech] recipe supports the most comprehensive set of models; you are welcome to try them out.
 #### CTC 
  - TDNN LSTM CTC
  - Conformer CTC
  - Zipformer CTC
 #### MMI
  - Conformer MMI
  - Zipformer MMI
 #### Transducer
  - Conformer-based Encoder
  - LSTM-based Encoder
  - Zipformer-based Encoder
  - LSTM-based Predictor
  - [Stateless Predictor](https://research.google/pubs/rnn-transducer-with-stateless-prediction-network/)
 If you would like to contribute to icefall, please refer to [contributing](https://icefall.readthedocs.io/en/latest/contributing/index.html) for more details.
 We would like to highlight the performance of some of the recipes here.
 ### [yesno][yesno]
 This is the simplest ASR recipe in `icefall` and can be run on CPU.
 Training takes less than 30 seconds and gives you the following WER:
@@ -52,350 +89,264 @@ Training takes less than 30 seconds and gives you the following WER:
 We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
-### LibriSpeech
+### [LibriSpeech][librispeech]
-Please see <https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md>
+Please see [RESULTS.md](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md)
 for the **latest** results.
-We provide 5 models for this recipe:
+#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc)
 - [conformer CTC model][LibriSpeech_conformer_ctc]
 - [TDNN LSTM CTC model][LibriSpeech_tdnn_lstm_ctc]
 - [Transducer: Conformer encoder + LSTM decoder][LibriSpeech_transducer]
 - [Transducer: Conformer encoder + Embedding decoder][LibriSpeech_transducer_stateless]
 - [Transducer: Zipformer encoder + Embedding decoder][LibriSpeech_zipformer]
 #### Conformer CTC Model
 The best WER we currently have is:
 |     | test-clean | test-other |
 |-----|------------|------------|
 | WER | 2.42       | 5.73       |
-We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
-#### TDNN LSTM CTC Model
+#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc)
 The WER for this model is:
 |     | test-clean | test-other |
 |-----|------------|------------|
 | WER | 6.59       | 17.69      |
-We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)
-#### Transducer: Conformer encoder + LSTM decoder
+#### [Transducer (Conformer Encoder + LSTM Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
-Using Conformer as encoder and LSTM as decoder.
+|               | test-clean | test-other |
 |---------------|------------|------------|
 | greedy search | 3.07       | 7.51       |
-The best WER with greedy search is:
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)
-|     | test-clean | test-other |
+#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
 |-----|------------|------------|
 | WER | 3.07       | 7.51       |
-We provide a Colab notebook to run a pre-trained RNN-T conformer model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)
+|                                       | test-clean | test-other |
-
+|---------------------------------------|------------|------------|
-#### Transducer: Conformer encoder + Embedding decoder
+| modified_beam_search (`beam_size=4`) | 2.56       | 6.27       |
 Using Conformer as encoder. The decoder consists of 1 embedding layer
 and 1 convolutional layer.
 The best WER using modified beam search with beam size 4 is:
 |     | test-clean | test-other |
 |-----|------------|------------|
 | WER | 2.56       | 6.27       |
 Note: No auxiliary losses are used in the training and no LMs are used
 in the decoding.
 We provide a Colab notebook to run a pre-trained transducer conformer + stateless decoder model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)
-#### k2 pruned RNN-T
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)
 #### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer)
 WER (modified_beam_search `beam_size=4` unless otherwise stated)
 1. LibriSpeech-960hr
 | Encoder         | Params | test-clean | test-other | epochs  | devices    |
 |-----------------|--------|------------|------------|---------|------------|
-| zipformer       | 65.5M  | 2.21       | 4.79       | 50      | 4 32G-V100 |
+| Zipformer       | 65.5M  | 2.21       | 4.79       | 50      | 4 32G-V100 |
-| zipformer-small | 23.2M  | 2.42       | 5.73       | 50      | 2 32G-V100 |
+| Zipformer-small | 23.2M  | 2.42       | 5.73       | 50      | 2 32G-V100 |
-| zipformer-large | 148.4M | 2.06       | 4.63       | 50      | 4 32G-V100 |
+| Zipformer-large | 148.4M | 2.06       | 4.63       | 50      | 4 32G-V100 |
-| zipformer-large | 148.4M | 2.00       | 4.38       | 174     | 8 80G-A100 |
+| Zipformer-large | 148.4M | 2.00       | 4.38       | 174     | 8 80G-A100 |
-Note: No auxiliary losses are used in the training and no LMs are used
+2. LibriSpeech-960hr + GigaSpeech
 in the decoding.
-#### k2 pruned RNN-T + GigaSpeech
+| Encoder         | Params | test-clean | test-other |
-
+|-----------------|--------|------------|------------|
-|     | test-clean | test-other |
+| Zipformer | 65.5M   | 1.78       | 4.08       |
 |-----|------------|------------|
 | WER | 1.78       | 4.08       |
 Note: No auxiliary losses are used in the training and no LMs are used
 in the decoding.
 #### k2 pruned RNN-T + GigaSpeech + CommonVoice
 |     | test-clean | test-other |
 |-----|------------|------------|
 | WER | 1.90       | 3.98       |
 Note: No auxiliary losses are used in the training and no LMs are used
 in the decoding.
-### GigaSpeech
+3. LibriSpeech-960hr + GigaSpeech + CommonVoice
-We provide three models for this recipe:
+| Encoder         | Params | test-clean | test-other |
 |-----------------|--------|------------|------------|
 | Zipformer | 65.5M   | 1.90       | 3.98       |
 - [Conformer CTC model][GigaSpeech_conformer_ctc]
 - [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
 - [Transducer: Zipformer encoder + Embedding decoder][GigaSpeech_zipformer]
-#### Conformer CTC
+### [GigaSpeech][gigaspeech]
 #### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/conformer_ctc)
 |     |  Dev  | Test  |
 |-----|-------|-------|
 | WER | 10.47 | 10.58 |
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
+#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/pruned_transducer_stateless2)
 Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss
 |                      |  Dev  | Test  |
 |----------------------|-------|-------|
-|    greedy search     | 10.51 | 10.73 |
+|    greedy_search     | 10.51 | 10.73 |
-|   fast beam search   | 10.50 | 10.69 |
+|   fast_beam_search   | 10.50 | 10.69 |
-| modified beam search | 10.40 | 10.51 |
+| modified_beam_search | 10.40 | 10.51 |
-#### Transducer: Zipformer encoder + Embedding decoder
+#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/zipformer)
 |                      |  Dev  | Test  |
 |----------------------|-------|-------|
-|    greedy search     | 10.31 | 10.50 |
+|    greedy_search     | 10.31 | 10.50 |
-|   fast beam search   | 10.26 | 10.48 |
+|   fast_beam_search   | 10.26 | 10.48 |
-| modified beam search | 10.25 | 10.38 |
+| modified_beam_search | 10.25 | 10.38 |
-### Aishell
+### [Aishell][aishell]
-We provide three models for this recipe: [conformer CTC model][Aishell_conformer_ctc],
+#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/tdnn_lstm_ctc)
 [TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc], and [Transducer Stateless Model][Aishell_pruned_transducer_stateless7],
 #### Conformer CTC Model
 The best CER we currently have is:
 |     | test |
 |-----|------|
 | CER | 4.26 |
 #### TDNN LSTM CTC Model
 The CER for this model is:
 |     | test  |
 |-----|-------|
 | CER | 10.16 |
-We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model:  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
+We provide a Colab notebook to test the pre-trained model:  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
-#### Transducer Stateless Model
+#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/transducer_stateless)
 The best CER we currently have is:
 |     | test |
 |-----|------|
 | CER | 4.38 |
-We provide a Colab notebook to run a pre-trained TransducerStateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
 #### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/zipformer)
 WER (modified_beam_search `beam_size=4`) 
 | Encoder         | Params | dev | test | epochs  |
 |-----------------|--------|-----|------|---------|
 | Zipformer       | 73.4M  | 4.13| 4.40 | 55      |
 | Zipformer-small | 30.2M  | 4.40| 4.67 | 55      |
 | Zipformer-large | 157.3M | 4.03| 4.28 | 56      |
-### Aishell2
+### [Aishell4][aishell4]
-We provide one model for this recipe: [Transducer Stateless Model][Aishell2_pruned_transducer_stateless5].
+#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell4/ASR/pruned_transducer_stateless5)
 #### Transducer Stateless Model
 The best WER we currently have is:
 |     |   dev-ios  |  test-ios  |
 |-----|------------|------------|
 | WER |    5.32    |    5.56    |
 ### Aishell4
 We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5].
 #### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)
 The best CER we currently have is:
 1 Trained with all subsets: 
 |     |   test     |
 |-----|------------|
 | CER |   29.08    |
-
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
 We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
-### TIMIT
+### [TIMIT][timit]
-We provide two models for this recipe: [TDNN LSTM CTC model][TIMIT_tdnn_lstm_ctc]
+#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_lstm_ctc)
 and [TDNN LiGRU CTC model][TIMIT_tdnn_ligru_ctc].
-#### TDNN LSTM CTC Model
+|   |TEST|
-
+|---|----|
 The best PER we currently have is:
 ||TEST|
 |--|--|
 |PER| 19.71% |
-We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)
-#### TDNN LiGRU CTC Model
+#### [TDNN LiGRU CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_ligru_ctc)
-The PER for this model is:
+|   |TEST|
-
+|---|----|
 ||TEST|
 |--|--|
 |PER| 17.66% |
-We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model:  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
-### TED-LIUM3
+### [TED-LIUM3][tedlium3]
-We provide two models for this recipe: [Transducer Stateless: Conformer encoder + Embedding decoder][TED-LIUM3_transducer_stateless] and [Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TED-LIUM3_pruned_transducer_stateless].
+#### [Transducer (Conformer Encoder + Embedding Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/transducer_stateless)
-#### Transducer Stateless:  Conformer encoder + Embedding decoder
+|                                      |  dev  |  test  |
-
+|--------------------------------------|-------|--------|
-The best WER using modified beam search with beam size 4 is:
+| modified_beam_search (`beam_size=4`) |  6.91 |  6.33  |
 |     |  dev  |  test  |
 |-----|-------|--------|
 | WER |  6.91 |  6.33  |
 Note: No auxiliary losses are used in the training and no LMs are used in the decoding.
 We provide a Colab notebook to run a pre-trained Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)
 #### Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
 The best WER using modified beam search with beam size 4 is:
 |     |  dev  |  test  |
 |-----|-------|--------|
 | WER |  6.77 |  6.14  |
 We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
-### Aidatatang_200zh
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aidatatang_200zh_pruned_transducer_stateless2].
+#### [Transducer (pruned_transducer_stateless)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/pruned_transducer_stateless)
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
+|                                      |  dev  |  test  |
 |--------------------------------------|-------|--------|
 | modified_beam_search (`beam_size=4`) |  6.77 |  6.14  |
 We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
 ### [Aidatatang_200zh][aidatatang_200zh]
 #### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/aidatatang_200zh/ASR/pruned_transducer_stateless2)
 |                      |  Dev  | Test  |
 |----------------------|-------|-------|
-|    greedy search     | 5.53  | 6.59  |
+|    greedy_search     | 5.53  | 6.59  |
-|   fast beam search   | 5.30  | 6.34  |
+|   fast_beam_search   | 5.30  | 6.34  |
-| modified beam search | 5.27  | 6.33  |
+| modified_beam_search | 5.27  | 6.33  |
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
-### WenetSpeech
+### [WenetSpeech][wenetspeech]
-We provide some models for this recipe: [Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2] and [Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless5].
+#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless2)
 #### Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)
 |                      |  Dev  | Test-Net | Test-Meeting |
 |----------------------|-------|----------|--------------|
-|    greedy search     | 7.80  |  8.75    |  13.49       |
+|    greedy_search     | 7.80  |  8.75    |  13.49       |
-| modified beam search| 7.76  |  8.71    |  13.41       |
+|   fast_beam_search   | 7.94  |  8.74    |  13.80       |
-|   fast beam search   | 7.94  |  8.74    |  13.80       |
+| modified_beam_search | 7.76  |  8.71    |  13.41       |
 We provide a Colab notebook to run the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
 #### [Transducer **Streaming** (pruned_transducer_stateless5) ](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless5)
 #### Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)
 **Streaming**:
 |                      |  Dev  | Test-Net | Test-Meeting |
 |----------------------|-------|----------|--------------|
 | greedy_search | 8.78 | 10.12 | 16.16 |
 | modified_beam_search | 8.53| 9.95 | 15.81 |
 | fast_beam_search| 9.01 | 10.47 | 16.28 |
 | modified_beam_search | 8.53| 9.95 | 15.81 |
 We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
-### Alimeeting
+### [Alimeeting][alimeeting]
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Alimeeting_pruned_transducer_stateless2].
+#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/alimeeting/ASR/pruned_transducer_stateless2)
 #### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with far subset)
 |                      |  Eval  | Test-Net |
 |----------------------|--------|----------|
-|    greedy search     | 31.77  |  34.66   |
+|    greedy_search     | 31.77  |  34.66   |
-|   fast beam search   | 31.39  |  33.02   |
+|   fast_beam_search   | 31.39  |  33.02   |
-| modified beam search | 30.38  |  34.25   |
+| modified_beam_search | 30.38  |  34.25   |
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
-### TAL_CSASR
+### [TAL_CSASR][tal_csasr]
 We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TAL_CSASR_pruned_transducer_stateless5].
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
+#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR/pruned_transducer_stateless5)
 The best results for Chinese CER(%) and English WER(%) respectively (zh: Chinese, en: English):
 |decoding-method | dev | dev_zh | dev_en | test | test_zh | test_en |
 |--|--|--|--|--|--|--|
 |greedy_search| 7.30 | 6.48 | 19.19 |7.39| 6.66 | 19.13|
 |modified_beam_search| 7.15 | 6.35 | 18.95 | 7.22| 6.50 | 18.70 |
 |fast_beam_search| 7.18 | 6.39| 18.90 |  7.27| 6.55 | 18.77|
 |modified_beam_search| 7.15 | 6.35 | 18.95 | 7.22| 6.50 | 18.70 |
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)
+We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)
-## Deployment with C++
+## TTS: Text-to-Speech
-Once you have trained a model in icefall, you may want to deploy it with C++,
+### Supported Datasets
 without Python dependencies.
-Please refer to the documentation
+  - [LJSpeech][ljspeech]
-<https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c>
+  - [VCTK][vctk]
 ### Supported Models
  - [VITS](https://arxiv.org/abs/2106.06103)
 # Deployment with C++
 Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.
 Please refer to the [document](https://icefall.readthedocs.io/en/latest/recipes/Non-streaming-ASR/librispeech/conformer_ctc.html#deployment-with-c)
 for how to do this.
 We also provide a Colab notebook, showing you how to run a torch scripted model in [k2][k2] with C++.
 Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing)
 [LibriSpeech_tdnn_lstm_ctc]: egs/librispeech/ASR/tdnn_lstm_ctc
 [LibriSpeech_conformer_ctc]: egs/librispeech/ASR/conformer_ctc
 [LibriSpeech_transducer]: egs/librispeech/ASR/transducer
 [LibriSpeech_transducer_stateless]: egs/librispeech/ASR/transducer_stateless
 [LibriSpeech_zipformer]: egs/librispeech/ASR/zipformer
 [Aishell_tdnn_lstm_ctc]: egs/aishell/ASR/tdnn_lstm_ctc
 [Aishell_conformer_ctc]: egs/aishell/ASR/conformer_ctc
 [Aishell_pruned_transducer_stateless7]: egs/aishell/ASR/pruned_transducer_stateless7_bbpe
 [Aishell2_pruned_transducer_stateless5]: egs/aishell2/ASR/pruned_transducer_stateless5
 [Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5
 [TIMIT_tdnn_lstm_ctc]: egs/timit/ASR/tdnn_lstm_ctc
 [TIMIT_tdnn_ligru_ctc]: egs/timit/ASR/tdnn_ligru_ctc
 [TED-LIUM3_transducer_stateless]: egs/tedlium3/ASR/transducer_stateless
 [TED-LIUM3_pruned_transducer_stateless]: egs/tedlium3/ASR/pruned_transducer_stateless
 [GigaSpeech_conformer_ctc]: egs/gigaspeech/ASR/conformer_ctc
 [GigaSpeech_pruned_transducer_stateless2]: egs/gigaspeech/ASR/pruned_transducer_stateless2
 [GigaSpeech_zipformer]: egs/gigaspeech/ASR/zipformer
 [Aidatatang_200zh_pruned_transducer_stateless2]: egs/aidatatang_200zh/ASR/pruned_transducer_stateless2
 [WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
 [WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
 [Alimeeting_pruned_transducer_stateless2]: egs/alimeeting/ASR/pruned_transducer_stateless2
 [TAL_CSASR_pruned_transducer_stateless5]: egs/tal_csasr/ASR/pruned_transducer_stateless5
 [yesno]: egs/yesno/ASR
 [librispeech]: egs/librispeech/ASR
 [aishell]: egs/aishell/ASR
@@ -411,3 +362,15 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
 [ami]: egs/ami
 [swbd]: egs/swbd/ASR
 [k2]: https://github.com/k2-fsa/k2
 [commonvoice]: egs/commonvoice/ASR
 [csj]: egs/csj/ASR
 [libricss]: egs/libricss/SURT
 [libriheavy]: egs/libriheavy/ASR
 [mgb2]: egs/mgb2/ASR
 [peoplespeech]: egs/peoplespeech/ASR
 [spgispeech]: egs/spgispeech/ASR
 [voxpopuli]: egs/voxpopuli/ASR
 [xbmu-amdo31]: egs/xbmu-amdo31/ASR
 [vctk]: egs/vctk/TTS
 [ljspeech]: egs/ljspeech/TTS
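
Reviewer note: the tables in this README report WER, CER, and PER without defining them. The sketch below shows the usual edit-distance definition of these metrics, assuming corpus-level scoring (total edits divided by total reference tokens). It is not icefall's scoring code; the function names `edit_distance` and `error_rate` are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = cur
    return prev[-1]


def error_rate(refs, hyps, tokenize=str.split):
    """Corpus-level error rate in percent (assumed definition)."""
    errors = sum(
        edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps)
    )
    total = sum(len(tokenize(r)) for r in refs)
    return 100.0 * errors / total


# Word-level scoring (WER): one substitution over four reference words.
wer = error_rate(["yes no yes no"], ["yes no no no"])
# Character-level scoring (CER), as typically used for Chinese: one deletion
# over four reference characters.
cer = error_rate(["你好世界"], ["你好世"], tokenize=list)
print(f"WER = {wer:.2f}%, CER = {cer:.2f}%")
```

Only the tokenizer changes between the metrics: words for WER, characters for CER, and phones for PER.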