## Results
### Zipformer
#### Non-streaming
The training command is:
```shell
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 21 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --manifest-dir data/manifests
```
The decoding command is:
```shell
./zipformer/decode.py \
  --epoch 21 \
  --avg 15 \
  --exp-dir ./zipformer/exp \
  --max-duration 600 \
  --decoding-method greedy_search
```
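The `--avg 15` option averages the parameters of the last 15 epoch checkpoints before decoding, which typically gives a small but consistent WER improvement over any single checkpoint. A minimal sketch of the averaging idea, with checkpoints modelled as plain dicts of floats (the parameter names are illustrative, not icefall's actual state dict keys):

```python
# Sketch of checkpoint averaging (the idea behind --avg N): element-wise
# mean of the parameters of the last N checkpoints. In icefall these are
# PyTorch state_dicts; plain floats stand in for tensors here.

def average_checkpoints(checkpoints):
    """Return the element-wise average of a list of parameter dicts."""
    n = len(checkpoints)
    avg = {name: 0.0 for name in checkpoints[0]}
    for ckpt in checkpoints:
        for name, value in ckpt.items():
            avg[name] += value / n
    return avg

# Hypothetical checkpoints from three consecutive epochs:
epoch_19 = {"encoder.weight": 0.2, "joiner.bias": -1.0}
epoch_20 = {"encoder.weight": 0.4, "joiner.bias": -2.0}
epoch_21 = {"encoder.weight": 0.6, "joiner.bias": -3.0}

print(average_checkpoints([epoch_19, epoch_20, epoch_21]))
```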
To export the model to ONNX:
```shell
./zipformer/export-onnx.py \
  --tokens ./data/lang/bbpe_2000/tokens.txt \
  --use-averaged-model 0 \
  --epoch 21 \
  --avg 1 \
  --exp-dir ./zipformer/exp
```
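The `--tokens` file is a k2/icefall-style token table mapping each byte-level BPE symbol to its integer id, one `symbol id` pair per line. A small sketch of parsing such a table (the file content below is made up for illustration, not taken from this recipe's `bbpe_2000` vocabulary):

```python
import io

# Illustrative tokens.txt content: one "symbol id" pair per line.
example = io.StringIO("<blk> 0\n<sos/eos> 1\n\u2581the 2\n")

def load_tokens(fileobj):
    """Parse a token table into a {symbol: id} dict."""
    table = {}
    for line in fileobj:
        symbol, idx = line.rsplit(maxsplit=1)  # split off the trailing id
        table[symbol] = int(idx)
    return table

print(load_tokens(example))
```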
Word Error Rates (WERs) are listed below:
| Datasets             | ReazonSpeech | ReazonSpeech | LibriSpeech | LibriSpeech |
|----------------------|--------------|--------------|-------------|-------------|
| Zipformer WER (%)    | dev          | test         | test-clean  | test-other  |
| greedy_search        | 5.9          | 4.07         | 3.46        | 8.35        |
| modified_beam_search | 4.87         | 3.61         | 3.28        | 8.07        |
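WER is the number of word-level substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. A self-contained sketch of the metric via Levenshtein alignment (simplified: real scoring pipelines also normalize text and report the three error types separately):

```python
def wer(reference, hypothesis):
    """Word error rate between two whitespace-tokenized strings."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over 6 words:
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))  # prints 0.333
```

CER is the same computation applied to character sequences instead of words, which is why it is the standard metric for Japanese.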
We also include WERs (%) for common English ASR datasets:
| Corpus                   | WER (%) |
|--------------------------|---------|
| CommonVoice              | 29.03   |
| TED                      | 16.78   |
| MLS English (test-clean) | 8.64    |
|
And CERs (%) for common Japanese datasets:
| Corpus      | CER (%) |
|-------------|---------|
| JSUT        | 8.13    |
| CommonVoice | 9.82    |
| TEDx        | 11.64   |
|
The pre-trained model can be found here: [https://huggingface.co/reazon-research/reazonspeech-k2-v2-ja-en/tree/multi_ja_en_15k15k](https://huggingface.co/reazon-research/reazonspeech-k2-v2-ja-en/tree/multi_ja_en_15k15k)