From 24b50a5bad73f03e9d1b6f3ed5a6f10ea651b815 Mon Sep 17 00:00:00 2001 From: Yifan Yang <64255737+yfyeung@users.noreply.github.com> Date: Mon, 8 May 2023 16:59:05 +0800 Subject: [PATCH] Update README.md (#1043) * Update README.md --- README.md | 151 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 92 insertions(+), 59 deletions(-) diff --git a/README.md b/README.md index 83ce0ac16..476aae6de 100644 --- a/README.md +++ b/README.md @@ -28,14 +28,15 @@ We provide the following recipes: - [yesno][yesno] - [LibriSpeech][librispeech] + - [GigaSpeech][gigaspeech] - [Aishell][aishell] + - [Aishell2][aishell2] + - [Aishell4][aishell4] - [TIMIT][timit] - [TED-LIUM3][tedlium3] - - [GigaSpeech][gigaspeech] - [Aidatatang_200zh][aidatatang_200zh] - [WenetSpeech][wenetspeech] - [Alimeeting][alimeeting] - - [Aishell4][aishell4] - [TAL_CSASR][tal_csasr] ### yesno @@ -46,9 +47,7 @@ Training takes less than 30 seconds and gives you the following WER: ``` [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ] ``` -We do provide a Colab notebook for this recipe. - -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing) +We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing) ### LibriSpeech @@ -82,7 +81,7 @@ The WER for this model is: |-----|------------|------------| | WER | 6.59 | 17.69 | -We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing) +We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing) #### Transducer: Conformer encoder + LSTM decoder @@ -118,19 +117,54 @@ We provide a Colab notebook to run a pre-trained transducer conformer + stateles | | test-clean | test-other | |-----|------------|------------| -| WER | 2.57 | 5.95 | +| WER | 2.15 | 5.20 | + +Note: No auxiliary losses are used in the training and no LMs are used +in the decoding. #### k2 pruned RNN-T + GigaSpeech | | test-clean | test-other | |-----|------------|------------| -| WER | 2.00 | 4.63 | +| WER | 1.78 | 4.08 | + +Note: No auxiliary losses are used in the training and no LMs are used +in the decoding. + +#### k2 pruned RNN-T + GigaSpeech + CommonVoice + +| | test-clean | test-other | +|-----|------------|------------| +| WER | 1.90 | 3.98 | + +Note: No auxiliary losses are used in the training and no LMs are used +in the decoding. + + +### GigaSpeech + +We provide two models for this recipe: [Conformer CTC model][GigaSpeech_conformer_ctc] +and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2]. + +#### Conformer CTC + +| | Dev | Test | +|-----|-------|-------| +| WER | 10.47 | 10.58 | + +#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss + +| | Dev | Test | +|----------------------|-------|-------| +| greedy search | 10.51 | 10.73 | +| fast beam search | 10.50 | 10.69 | +| modified beam search | 10.40 | 10.51 | ### Aishell -We provide two models for this recipe: [conformer CTC model][Aishell_conformer_ctc] -and [TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc]. +We provide three models for this recipe: [conformer CTC model][Aishell_conformer_ctc], +[TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc], and [Transducer Stateless Model][Aishell_pruned_transducer_stateless7], #### Conformer CTC Model @@ -140,20 +174,6 @@ The best CER we currently have is: |-----|------| | CER | 4.26 | - -We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg](https://colab.research.google.com/drive/1WnG17io5HEZ0Gn_cnh_VzK5QYOoiiklC?usp=sharing) - -#### Transducer Stateless Model - -The best CER we currently have is: - -| | test | -|-----|------| -| CER | 4.68 | - - -We provide a Colab notebook to run a pre-trained TransducerStateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing) - #### TDNN LSTM CTC Model The CER for this model is: @@ -164,6 +184,46 @@ The CER for this model is: We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing) +#### Transducer Stateless Model + +The best CER we currently have is: + +| | test | +|-----|------| +| CER | 4.38 | + +We provide a Colab notebook to run a pre-trained TransducerStateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing) + + +### Aishell2 + +We provide one model for this recipe: [Transducer Stateless Model][Aishell2_pruned_transducer_stateless5]. + +#### Transducer Stateless Model + +The best WER we currently have is: + +| | dev-ios | test-ios | +|-----|------------|------------| +| WER | 5.32 | 5.56 | + + +### Aishell4 + +We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5]. + +#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets) + +The best CER we currently have is: + +| | test | +|-----|------------| +| CER | 29.08 | + + +We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing) + + ### TIMIT We provide two models for this recipe: [TDNN LSTM CTC model][TIMIT_tdnn_lstm_ctc] @@ -187,7 +247,8 @@ The PER for this model is: |--|--| |PER| 17.66% | -We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11IT-k4HQIgQngXz1uvWsEYktjqQt7Tmb?usp=sharing) +We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing) + ### TED-LIUM3 @@ -215,24 +276,6 @@ The best WER using modified beam search with beam size 4 is: We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing) -### GigaSpeech - -We provide two models for this recipe: [Conformer CTC model][GigaSpeech_conformer_ctc] -and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2]. - -#### Conformer CTC - -| | Dev | Test | -|-----|-------|-------| -| WER | 10.47 | 10.58 | - -#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss - -| | Dev | Test | -|----------------------|-------|-------| -| greedy search | 10.51 | 10.73 | -| fast beam search | 10.50 | 10.69 | -| modified beam search | 10.40 | 10.51 | ### Aidatatang_200zh @@ -248,6 +291,7 @@ We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing) + ### WenetSpeech We provide some models for this recipe: [Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2] and [Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless5]. @@ -284,20 +328,6 @@ We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing) -### Aishell4 - -We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5]. - -#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets) - -The best CER(%) results: -| | test | -|----------------------|--------| -| greedy search | 29.89 | -| fast beam search | 28.91 | -| modified beam search | 29.08 | - -We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing) ### TAL_CSASR @@ -333,6 +363,9 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad [LibriSpeech_transducer_stateless]: egs/librispeech/ASR/transducer_stateless [Aishell_tdnn_lstm_ctc]: egs/aishell/ASR/tdnn_lstm_ctc [Aishell_conformer_ctc]: egs/aishell/ASR/conformer_ctc +[Aishell_pruned_transducer_stateless7]: egs/aishell/ASR/pruned_transducer_stateless7_bbpe +[Aishell2_pruned_transducer_stateless5]: egs/aishell2/ASR/pruned_transducer_stateless5 +[Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5 [TIMIT_tdnn_lstm_ctc]: egs/timit/ASR/tdnn_lstm_ctc [TIMIT_tdnn_ligru_ctc]: egs/timit/ASR/tdnn_ligru_ctc [TED-LIUM3_transducer_stateless]: egs/tedlium3/ASR/transducer_stateless @@ -343,17 +376,17 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad [WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2 [WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5 [Alimeeting_pruned_transducer_stateless2]: egs/alimeeting/ASR/pruned_transducer_stateless2 -[Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5 [TAL_CSASR_pruned_transducer_stateless5]: egs/tal_csasr/ASR/pruned_transducer_stateless5 [yesno]: egs/yesno/ASR [librispeech]: egs/librispeech/ASR [aishell]: egs/aishell/ASR +[aishell2]: egs/aishell2/ASR +[aishell4]: egs/aishell4/ASR [timit]: egs/timit/ASR [tedlium3]: egs/tedlium3/ASR [gigaspeech]: egs/gigaspeech/ASR [aidatatang_200zh]: egs/aidatatang_200zh/ASR [wenetspeech]: egs/wenetspeech/ASR [alimeeting]: egs/alimeeting/ASR -[aishell4]: egs/aishell4/ASR [tal_csasr]: egs/tal_csasr/ASR [k2]: https://github.com/k2-fsa/k2