We provide the following recipes:

  - [yesno][yesno]
  - [LibriSpeech][librispeech]
  - [GigaSpeech][gigaspeech]
  - [Aishell][aishell]
  - [Aishell2][aishell2]
  - [Aishell4][aishell4]
  - [TIMIT][timit]
  - [TED-LIUM3][tedlium3]
  - [Aidatatang_200zh][aidatatang_200zh]
  - [WenetSpeech][wenetspeech]
  - [Alimeeting][alimeeting]
  - [TAL_CSASR][tal_csasr]
### yesno

Training takes less than 30 seconds and gives you the following WER:

```
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
```
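
The bracketed counts decompose the word error rate: WER = (insertions + deletions + substitutions) / number of reference words, so the single deletion over 240 reference words above gives 1/240 ≈ 0.42%. A quick sanity check (the helper below is ours, not part of the recipe):

```python
def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Word error rate as a percentage."""
    return 100.0 * (ins + dels + subs) / ref_words


# The yesno result above: 0 insertions, 1 deletion, 0 substitutions, 240 words.
print(f"{wer_percent(0, 1, 0, 240):.2f}%")  # prints 0.42%
```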
We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
### LibriSpeech

The WER for the TDNN LSTM CTC model is:

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)

#### Transducer: Conformer encoder + LSTM decoder
#### k2 pruned RNN-T

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.15       | 5.20       |

Note: No auxiliary losses are used in the training and no LMs are used
in the decoding.
#### k2 pruned RNN-T + GigaSpeech

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.78       | 4.08       |

Note: No auxiliary losses are used in the training and no LMs are used
in the decoding.
#### k2 pruned RNN-T + GigaSpeech + CommonVoice

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 1.90       | 3.98       |

Note: No auxiliary losses are used in the training and no LMs are used
in the decoding.
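
For orientation, the "k2 pruned RNN-T loss" named above optimizes the standard transducer objective; the pruning only changes how the loss is evaluated. In the notation below (ours, not the recipes'), $\mathbf{x}_{1:T}$ are encoder frames, $\mathbf{y}_{1:U}$ is the reference token sequence, and $\mathcal{B}$ removes blanks from an alignment $\pi$:

$$
\mathcal{L} = -\log P(\mathbf{y} \mid \mathbf{x})
            = -\log \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{y})} \prod_{i=1}^{T+U} P\left(\pi_i \mid \mathbf{x}, \pi_{1:i-1}\right)
$$

The exact sum runs over a $T \times (U+1)$ lattice of (frame, symbol) positions; the pruned loss scores that lattice with a cheap linear joiner first and then evaluates the full joiner only in a narrow band of symbol positions per frame, which is what makes training fast and memory-efficient.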
### GigaSpeech

We provide two models for this recipe: [Conformer CTC model][GigaSpeech_conformer_ctc]
and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
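
The "Embedding decoder" in the second model is what icefall calls a stateless decoder: instead of an LSTM prediction network, it embeds only the last couple of emitted tokens and mixes them with a small 1-D convolution, so the decoder carries no recurrent state. The sketch below is a conceptual illustration only; the layer sizes, names, and activation are ours, not the recipe's code:

```python
import torch
import torch.nn as nn


class StatelessDecoder(nn.Module):
    """Prediction network that sees only the last `context_size` tokens."""

    def __init__(self, vocab_size: int, embed_dim: int, context_size: int = 2):
        super().__init__()
        self.context_size = context_size
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # A short 1-D conv mixes the limited left context.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, U) previously emitted symbol ids.
        emb = self.embedding(tokens).transpose(1, 2)            # (B, D, U)
        emb = nn.functional.pad(emb, (self.context_size - 1, 0))
        return torch.relu(self.conv(emb)).transpose(1, 2)       # (B, U, D)


class Joiner(nn.Module):
    """Combines encoder frames and decoder states into per-(t, u) logits."""

    def __init__(self, dim: int, vocab_size: int):
        super().__init__()
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, enc: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
        # enc: (B, T, D), dec: (B, U, D) -> logits: (B, T, U, vocab_size)
        return self.out(torch.tanh(enc.unsqueeze(2) + dec.unsqueeze(1)))
```

A full recipe pairs these two pieces with the Conformer encoder and trains on the joiner's logits with the pruned RNN-T loss.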
#### Conformer CTC

|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |
#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy search        | 10.51 | 10.73 |
| fast beam search     | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 |
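
The three decoding methods in the table differ only in how hypotheses are expanded over the encoder frames. Greedy search is the simplest: at each frame, take the arg-max of the joiner output and advance the decoder only when a non-blank token comes out. The sketch below is illustrative rather than icefall's implementation (which also batches utterances); it assumes modules shaped like the decoder/joiner sketch above:

```python
import torch


def greedy_search(encoder_out, decoder, joiner, blank_id=0,
                  context_size=2, max_sym_per_frame=3):
    """Decode a single utterance; encoder_out has shape (T, D)."""
    hyp = [blank_id] * context_size            # padded history for the stateless decoder
    dec_out = decoder(torch.tensor([hyp[-context_size:]]))[:, -1]   # (1, D)

    for t in range(encoder_out.size(0)):
        emitted = 0
        while emitted < max_sym_per_frame:
            # The joiner expects (B, T, D) and (B, U, D); use T = U = 1 here.
            logits = joiner(encoder_out[t : t + 1].unsqueeze(1),
                            dec_out.unsqueeze(1))
            token = int(logits.argmax(dim=-1))
            if token == blank_id:
                break                          # nothing more at this frame
            hyp.append(token)
            emitted += 1
            # Advance the decoder only on non-blank emissions.
            dec_out = decoder(torch.tensor([hyp[-context_size:]]))[:, -1]
    return hyp[context_size:]                  # strip the initial padding


# Example with the sketch modules above and random "encoder" frames:
#   dec, joi = StatelessDecoder(500, 256), Joiner(256, 500)
#   print(greedy_search(torch.randn(50, 256), dec, joi))
```

Roughly speaking, modified beam search keeps several such hypotheses in flight instead of one, and fast beam search constrains the expansion with an FST; both trade a little speed for the lower WERs shown above.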
### Aishell

We provide three models for this recipe: [conformer CTC model][Aishell_conformer_ctc],
[TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc], and [Transducer Stateless Model][Aishell_pruned_transducer_stateless7].

#### Conformer CTC Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.26 |

We provide a Colab notebook to run a pre-trained conformer CTC model.

#### TDNN LSTM CTC Model

We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
#### Transducer Stateless Model

The best CER we currently have is:

|     | test |
|-----|------|
| CER | 4.38 |

We provide a Colab notebook to run a pre-trained Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
### Aishell2

We provide one model for this recipe: [Transducer Stateless Model][Aishell2_pruned_transducer_stateless5].

#### Transducer Stateless Model

The best WER we currently have is:

|     | dev-ios | test-ios |
|-----|---------|----------|
| WER | 5.32    | 5.56     |
### Aishell4

We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5].

#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)

The best CER we currently have is:

|     | test  |
|-----|-------|
| CER | 29.08 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
### TIMIT

We provide two models for this recipe: [TDNN LSTM CTC model][TIMIT_tdnn_lstm_ctc]
and [TDNN LiGRU CTC model][TIMIT_tdnn_ligru_ctc].

The PER for the TDNN LiGRU CTC model is:

|     |        |
|-----|--------|
| PER | 17.66% |

We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11IT-k4HQIgQngXz1uvWsEYktjqQt7Tmb?usp=sharing)
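
"LiGRU" here stands for light gated recurrent units: a GRU simplified for speech recognition by dropping the reset gate, using a ReLU candidate activation, and batch-normalizing the input projections. The cell below is a minimal version written from that published description, not taken from the recipe:

```python
import torch
import torch.nn as nn


class LiGRUCell(nn.Module):
    """Light GRU cell: no reset gate, ReLU candidate, batch-norm on inputs."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.wz = nn.Linear(input_size, hidden_size, bias=False)   # update-gate input
        self.wh = nn.Linear(input_size, hidden_size, bias=False)   # candidate input
        self.uz = nn.Linear(hidden_size, hidden_size, bias=False)  # update-gate recurrence
        self.uh = nn.Linear(hidden_size, hidden_size, bias=False)  # candidate recurrence
        self.bn_z = nn.BatchNorm1d(hidden_size)
        self.bn_h = nn.BatchNorm1d(hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_size), h: (batch, hidden_size)
        z = torch.sigmoid(self.bn_z(self.wz(x)) + self.uz(h))    # update gate
        h_cand = torch.relu(self.bn_h(self.wh(x)) + self.uh(h))  # candidate state
        return z * h + (1.0 - z) * h_cand
```

In the recipe, such recurrent layers sit on top of TDNN layers and are trained with the CTC loss, as the directory name `tdnn_ligru_ctc` suggests.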
### TED-LIUM3

The best WER is obtained with modified beam search (beam size 4).

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
### Aidatatang_200zh

We provide one model for this recipe: Pruned stateless RNN-T (Conformer encoder + Embedding decoder + k2 pruned RNN-T loss).

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
### WenetSpeech

We provide some models for this recipe: [Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2] and [Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless5].

### Alimeeting

We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Alimeeting_pruned_transducer_stateless2].

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
### TAL_CSASR
[LibriSpeech_transducer_stateless]: egs/librispeech/ASR/transducer_stateless
[Aishell_tdnn_lstm_ctc]: egs/aishell/ASR/tdnn_lstm_ctc
[Aishell_conformer_ctc]: egs/aishell/ASR/conformer_ctc
[Aishell_pruned_transducer_stateless7]: egs/aishell/ASR/pruned_transducer_stateless7_bbpe
[Aishell2_pruned_transducer_stateless5]: egs/aishell2/ASR/pruned_transducer_stateless5
[Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5
[TIMIT_tdnn_lstm_ctc]: egs/timit/ASR/tdnn_lstm_ctc
[TIMIT_tdnn_ligru_ctc]: egs/timit/ASR/tdnn_ligru_ctc
[TED-LIUM3_transducer_stateless]: egs/tedlium3/ASR/transducer_stateless
[GigaSpeech_conformer_ctc]: egs/gigaspeech/ASR/conformer_ctc
[GigaSpeech_pruned_transducer_stateless2]: egs/gigaspeech/ASR/pruned_transducer_stateless2
[WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
[WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
[Alimeeting_pruned_transducer_stateless2]: egs/alimeeting/ASR/pruned_transducer_stateless2
[TAL_CSASR_pruned_transducer_stateless5]: egs/tal_csasr/ASR/pruned_transducer_stateless5
[yesno]: egs/yesno/ASR
[librispeech]: egs/librispeech/ASR
[aishell]: egs/aishell/ASR
[aishell2]: egs/aishell2/ASR
[aishell4]: egs/aishell4/ASR
[timit]: egs/timit/ASR
[tedlium3]: egs/tedlium3/ASR
[gigaspeech]: egs/gigaspeech/ASR
[aidatatang_200zh]: egs/aidatatang_200zh/ASR
[wenetspeech]: egs/wenetspeech/ASR
[alimeeting]: egs/alimeeting/ASR
[tal_csasr]: egs/tal_csasr/ASR
[k2]: https://github.com/k2-fsa/k2