mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-09 01:52:41 +00:00
parent
efbb577b88
commit
24b50a5bad
149
README.md
@@ -28,14 +28,15 @@ We provide the following recipes:
 
 - [yesno][yesno]
 - [LibriSpeech][librispeech]
+- [GigaSpeech][gigaspeech]
 - [Aishell][aishell]
+- [Aishell2][aishell2]
+- [Aishell4][aishell4]
 - [TIMIT][timit]
 - [TED-LIUM3][tedlium3]
-- [GigaSpeech][gigaspeech]
 - [Aidatatang_200zh][aidatatang_200zh]
 - [WenetSpeech][wenetspeech]
 - [Alimeeting][alimeeting]
-- [Aishell4][aishell4]
 - [TAL_CSASR][tal_csasr]
 
 ### yesno
@@ -46,9 +47,7 @@ Training takes less than 30 seconds and gives you the following WER:
 ```
 [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
 ```
-We do provide a Colab notebook for this recipe.
-
-[](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
+We provide a Colab notebook for this recipe: [](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
 
 
 ### LibriSpeech
@@ -118,19 +117,54 @@ We provide a Colab notebook to run a pre-trained transducer conformer + stateles
 
 | | test-clean | test-other |
 |-----|------------|------------|
-| WER | 2.57 | 5.95 |
+| WER | 2.15 | 5.20 |
 
+Note: No auxiliary losses are used in the training and no LMs are used
+in the decoding.
+
 #### k2 pruned RNN-T + GigaSpeech
 
 | | test-clean | test-other |
 |-----|------------|------------|
-| WER | 2.00 | 4.63 |
+| WER | 1.78 | 4.08 |
 
+Note: No auxiliary losses are used in the training and no LMs are used
+in the decoding.
 
+#### k2 pruned RNN-T + GigaSpeech + CommonVoice
+
+| | test-clean | test-other |
+|-----|------------|------------|
+| WER | 1.90 | 3.98 |
+
+Note: No auxiliary losses are used in the training and no LMs are used
+in the decoding.
+
+
+### GigaSpeech
+
+We provide two models for this recipe: [Conformer CTC model][GigaSpeech_conformer_ctc]
+and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
+
+#### Conformer CTC
+
+| | Dev | Test |
+|-----|-------|-------|
+| WER | 10.47 | 10.58 |
+
+#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
+
+| | Dev | Test |
+|----------------------|-------|-------|
+| greedy search | 10.51 | 10.73 |
+| fast beam search | 10.50 | 10.69 |
+| modified beam search | 10.40 | 10.51 |
+
+
 ### Aishell
 
-We provide two models for this recipe: [conformer CTC model][Aishell_conformer_ctc]
-and [TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc].
+We provide three models for this recipe: [conformer CTC model][Aishell_conformer_ctc],
+[TDNN LSTM CTC model][Aishell_tdnn_lstm_ctc], and [Transducer Stateless Model][Aishell_pruned_transducer_stateless7],
 
 #### Conformer CTC Model
 
@@ -140,20 +174,6 @@ The best CER we currently have is:
 |-----|------|
 | CER | 4.26 |
 
-
-We provide a Colab notebook to run a pre-trained conformer CTC model: [
-
-#### Transducer Stateless Model
-
-The best CER we currently have is:
-
-| | test |
-|-----|------|
-| CER | 4.68 |
-
-
-We provide a Colab notebook to run a pre-trained TransducerStateless model: [](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
-
 #### TDNN LSTM CTC Model
 
 The CER for this model is:
@@ -164,6 +184,46 @@ The CER for this model is:
 
 We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
 
+#### Transducer Stateless Model
+
+The best CER we currently have is:
+
+| | test |
+|-----|------|
+| CER | 4.38 |
+
+We provide a Colab notebook to run a pre-trained TransducerStateless model: [](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
+
+
+### Aishell2
+
+We provide one model for this recipe: [Transducer Stateless Model][Aishell2_pruned_transducer_stateless5].
+
+#### Transducer Stateless Model
+
+The best WER we currently have is:
+
+| | dev-ios | test-ios |
+|-----|------------|------------|
+| WER | 5.32 | 5.56 |
+
+
+### Aishell4
+
+We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5].
+
+#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)
+
+The best CER we currently have is:
+
+| | test |
+|-----|------------|
+| CER | 29.08 |
+
+
+We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
+
+
 ### TIMIT
 
 We provide two models for this recipe: [TDNN LSTM CTC model][TIMIT_tdnn_lstm_ctc]
@@ -187,7 +247,8 @@ The PER for this model is:
 |--|--|
 |PER| 17.66% |
 
-We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: [](https://colab.research.google.com/drive/11IT-k4HQIgQngXz1uvWsEYktjqQt7Tmb?usp=sharing)
+We provide a Colab notebook to run a pre-trained TDNN LiGRU CTC model: [](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
+
 
 ### TED-LIUM3
 
@@ -215,24 +276,6 @@ The best WER using modified beam search with beam size 4 is:
 
 We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
 
-### GigaSpeech
-
-We provide two models for this recipe: [Conformer CTC model][GigaSpeech_conformer_ctc]
-and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
-
-#### Conformer CTC
-
-| | Dev | Test |
-|-----|-------|-------|
-| WER | 10.47 | 10.58 |
-
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss
-
-| | Dev | Test |
-|----------------------|-------|-------|
-| greedy search | 10.51 | 10.73 |
-| fast beam search | 10.50 | 10.69 |
-| modified beam search | 10.40 | 10.51 |
 
 ### Aidatatang_200zh
 
@@ -248,6 +291,7 @@ We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder
 
 We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
 
+
 ### WenetSpeech
 
 We provide some models for this recipe: [Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2] and [Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless5].
@@ -284,20 +328,6 @@ We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder
 
 We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
 
-### Aishell4
-
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aishell4_pruned_transducer_stateless5].
-
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with all subsets)
-
-The best CER(%) results:
-| | test |
-|----------------------|--------|
-| greedy search | 29.89 |
-| fast beam search | 28.91 |
-| modified beam search | 29.08 |
-
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
 
 ### TAL_CSASR
 
@@ -333,6 +363,9 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
 [LibriSpeech_transducer_stateless]: egs/librispeech/ASR/transducer_stateless
 [Aishell_tdnn_lstm_ctc]: egs/aishell/ASR/tdnn_lstm_ctc
 [Aishell_conformer_ctc]: egs/aishell/ASR/conformer_ctc
+[Aishell_pruned_transducer_stateless7]: egs/aishell/ASR/pruned_transducer_stateless7_bbpe
+[Aishell2_pruned_transducer_stateless5]: egs/aishell2/ASR/pruned_transducer_stateless5
+[Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5
 [TIMIT_tdnn_lstm_ctc]: egs/timit/ASR/tdnn_lstm_ctc
 [TIMIT_tdnn_ligru_ctc]: egs/timit/ASR/tdnn_ligru_ctc
 [TED-LIUM3_transducer_stateless]: egs/tedlium3/ASR/transducer_stateless
@@ -343,17 +376,17 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
 [WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
 [WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
 [Alimeeting_pruned_transducer_stateless2]: egs/alimeeting/ASR/pruned_transducer_stateless2
-[Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5
 [TAL_CSASR_pruned_transducer_stateless5]: egs/tal_csasr/ASR/pruned_transducer_stateless5
 [yesno]: egs/yesno/ASR
 [librispeech]: egs/librispeech/ASR
 [aishell]: egs/aishell/ASR
+[aishell2]: egs/aishell2/ASR
+[aishell4]: egs/aishell4/ASR
 [timit]: egs/timit/ASR
 [tedlium3]: egs/tedlium3/ASR
 [gigaspeech]: egs/gigaspeech/ASR
 [aidatatang_200zh]: egs/aidatatang_200zh/ASR
 [wenetspeech]: egs/wenetspeech/ASR
 [alimeeting]: egs/alimeeting/ASR
-[aishell4]: egs/aishell4/ASR
 [tal_csasr]: egs/tal_csasr/ASR
 [k2]: https://github.com/k2-fsa/k2
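
Note on the yesno result line quoted in this diff: `[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]` reads as one error (a single deletion) over 240 reference words. The sketch below shows how that percentage follows from the counts, assuming the usual definition WER = (insertions + deletions + substitutions) / reference words. The helper names `parse_wer_line` and `wer_percent` are hypothetical and are not part of icefall.

```python
import re

# Summary line copied from the yesno section of the README diff.
LINE = "[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]"


def parse_wer_line(line):
    """Hypothetical helper: return (errors, ref_words, ins, dels, subs)."""
    m = re.search(
        r"\[(\d+) / (\d+), (\d+) ins, (\d+) del, (\d+) sub\s*\]", line
    )
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return tuple(int(g) for g in m.groups())


def wer_percent(ins, dels, subs, ref_words):
    """Word error rate in percent, assuming WER = (I + D + S) / N."""
    return 100.0 * (ins + dels + subs) / ref_words


errors, ref_words, ins, dels, subs = parse_wer_line(LINE)
# The total error count should equal the sum of the three error types.
assert errors == ins + dels + subs
print(f"{wer_percent(ins, dels, subs, ref_words):.2f}")  # 0.42
```

The same arithmetic applies to the CER figures in the tables, with characters counted instead of words.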