diff --git a/README.md b/README.md
index 240272712..50082fb45 100644
--- a/README.md
+++ b/README.md
@@ -80,7 +80,7 @@ If you are willing to contribute to icefall, please refer to [contributing](http
 
 We would like to highlight the performance of some of the recipes here.
 
-#### yesno
+#### [yesno][yesno]
 
 This is the simplest ASR recipe in `icefall` and can be run on CPU.
 Training takes less than 30 seconds and gives you the following WER:
@@ -96,10 +96,8 @@ We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.res
 
 Please see [RESULTS.md](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md)
 for the **latest** results.
 
-
 #### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc)
-
 |     | test-clean | test-other |
 |-----|------------|------------|
 | WER | 2.42       | 5.73       |
@@ -109,7 +107,6 @@ We provide a Colab notebook to test the pre-trained model: [![Open In Colab](htt
 
 #### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc)
-
 |     | test-clean | test-other |
 |-----|------------|------------|
 | WER | 6.59       | 17.69      |
 
@@ -119,9 +116,6 @@ We provide a Colab notebook to test the pre-trained model: [![Open In Colab](htt
 
 #### [Transducer (Conformer Encoder + LSTM Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
-
-The best WER is:
-
 |               | test-clean | test-other |
 |---------------|------------|------------|
 | greedy search | 3.07       | 7.51       |
 
@@ -254,8 +248,6 @@ We provide a Colab notebook to test the pre-trained model: [![Open In Colab](htt
 
 ### [TED-LIUM3][tedlium3]
 
-We provide two models for this recipe: [Transducer Stateless: Conformer encoder + Embedding decoder][TED-LIUM3_transducer_stateless] and [Pruned Transducer Stateless: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TED-LIUM3_pruned_transducer_stateless].
-
 #### [Transducer (Conformer Encoder + Embedding Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/transducer_stateless)
 
 |     | dev  | test |
@@ -323,7 +315,6 @@ We provide a Colab notebook to test the pre-trained model: [![Open In Colab](htt
 
 ### [TAL_CSASR][tal_csasr]
 
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][TAL_CSASR_pruned_transducer_stateless5].
 
 #### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR/pruned_transducer_stateless5)
 
@@ -336,6 +327,17 @@ The best results for Chinese CER(%) and English WER(%) respectively (zh: Chinese
 
 We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)
 
+## TTS: Text-to-Speech
+
+### Supported Datasets
+
+  - [LJSpeech][ljspeech]
+  - [VCTK][vctk]
+
+### Supported Models
+
+  - [VITS](https://arxiv.org/abs/2106.06103)
+
 # Deployment with C++
 
 Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.
@@ -370,4 +372,7 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
 [peoplespeech]: egs/peoplespeech/ASR
 [spgispeech]: egs/spgispeech/ASR
 [voxpopuli]: egs/voxpopuli/ASR
-[xbmu-amdo31]: egs/xbmu-amdo31/ASR
\ No newline at end of file
+[xbmu-amdo31]: egs/xbmu-amdo31/ASR
+
+[vctk]: egs/vctk/TTS
+[ljspeech]: egs/ljspeech/TTS
\ No newline at end of file