mirror of https://github.com/k2-fsa/icefall.git

Update RESULTS.md

commit 0ca7595d25
parent 94cf8c3afb
@@ -8,20 +8,19 @@ The training command is:

 ```shell
 ./zipformer/train.py \
   --bilingual 1 \
   --world-size 4 \
-  --num-epochs 30 \
+  --num-epochs 21 \
   --start-epoch 1 \
   --use-fp16 1 \
   --exp-dir zipformer/exp \
   --max-duration 600 \
   --manifest-dir data/manifests
 ```

 The decoding command is:

 ```shell
 ./zipformer/decode.py \
-  --epoch 28 \
+  --epoch 21 \
   --avg 15 \
   --exp-dir ./zipformer/exp \
   --max-duration 600 \
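In the decoding command, `--epoch 21 --avg 15` follows icefall's usual checkpoint-averaging convention: the parameters of `epoch-7.pt` through `epoch-21.pt` (that is, epochs `epoch - avg + 1` through `epoch`) are averaged before decoding. A minimal sketch of that averaging, assuming the common icefall checkpoint layout with weights stored under a `model` key; the real logic lives in icefall's `average_checkpoints` helper:

```python
# Sketch of what --avg 15 with --epoch 21 implies: average the weights of
# the last 15 epoch checkpoints. The paths and the "model" key follow
# common icefall conventions and are assumptions, not taken from this diff.
import torch

def average_checkpoints(paths):
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# epochs 7..21 inclusive
paths = [f"zipformer/exp/epoch-{i}.pt" for i in range(7, 22)]
averaged = average_checkpoints(paths)
```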
@@ -31,8 +30,14 @@ The decoding command is:

 To export the model with onnx:

 ```shell
-./zipformer/export-onnx.py --tokens data/lang_bbpe_2000/tokens.txt --use-averaged-model 0 --epoch 35 --avg 1 --exp-dir zipformer/exp --num-encoder-layers "2,2,3,4,3,2" --downsampling-factor "1,2,4,8,4,2" --feedforward-dim "512,768,1024,1536,1024,768" --num-heads "4,4,4,8,4,4" --encoder-dim "192,256,384,512,384,256" --query-head-dim 32 --value-head-dim 12 --pos-head-dim 4 --pos-dim 48 --encoder-unmasked-dim "192,192,256,256,256,192" --cnn-module-kernel "31,31,15,15,15,31" --decoder-dim 512 --joiner-dim 512 --causal False --chunk-size "16,32,64,-1" --left-context-frames "64,128,256,-1" --fp16 True
+./zipformer/export-onnx.py \
+  --tokens ./data/lang/bbpe_2000/tokens.txt \
+  --use-averaged-model 0 \
+  --epoch 21 \
+  --avg 1 \
+  --exp-dir ./zipformer/exp
 ```

 Word Error Rates (WERs) listed below:

 | Datasets | ReazonSpeech | ReazonSpeech | LibriSpeech | LibriSpeech |
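icefall's `export-onnx.py` scripts typically write separate encoder, decoder, and joiner ONNX files into the experiment directory, named after the epoch and averaging settings. A minimal sketch of sanity-checking one of them with `onnxruntime`; the file name below is an assumption based on that convention, so adjust it to whatever the export actually produced:

```python
# Load the exported encoder with onnxruntime and list its I/O signature.
# "encoder-epoch-21-avg-1.onnx" is an assumed name following the usual
# icefall export-onnx.py pattern; check zipformer/exp for the real one.
import onnxruntime as ort

sess = ort.InferenceSession(
    "zipformer/exp/encoder-epoch-21-avg-1.onnx",
    providers=["CPUExecutionProvider"],
)
for inp in sess.get_inputs():
    print("input :", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output:", out.name, out.shape, out.type)
```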
@@ -42,11 +47,26 @@ Word Error Rates (WERs) listed below:

 | modified_beam_search | 4.87 | 3.61 | 3.28 | 8.07 |

-Character Error Rates (CERs) for Japanese listed below:
-
-| Decoding Method      | In-Distribution CER | JSUT | CommonVoice | TEDx  |
-| :------------------: | :-----------------: | :--: | :---------: | :---: |
-| greedy search        | 12.56               | 6.93 | 9.75        | 9.67  |
-| modified beam search | 11.59               | 6.97 | 9.55        | 9.51  |
+We also include WER% for common English ASR datasets:
+
+| Corpus                   | WER (%) |
+|--------------------------|---------|
+| LibriSpeech (test-clean) | 3.49    |
+| LibriSpeech (test-other) | 7.64    |
+| CommonVoice              | 39.87   |
+| TED                      | 23.92   |
+| MLS English (test-clean) | 10.16   |
+
+And CER% for common Japanese datasets:
+
+| Corpus      | CER (%) |
+|-------------|---------|
+| JSUT        | 10.04   |
+| CommonVoice | 10.39   |
+| TEDx        | 12.22   |
+
+Pre-trained model can be found here: https://huggingface.co/reazon-research/reazonspeech-k2-v2-ja-en/tree/main
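The WER and CER figures above are standard edit-distance rates; CER divides the character-level Levenshtein distance by the reference length, which is why it is the natural metric for Japanese. A minimal self-contained sketch of the computation (not icefall's scorer, which also writes per-utterance alignments):

```python
# Character error rate = Levenshtein distance over characters divided by
# the reference length. WER is the same computation over word lists,
# e.g. edit_distance(ref.split(), hyp.split()) / len(ref.split()).
def edit_distance(ref, hyp):
    # One-row dynamic-programming Levenshtein distance.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free if chars match)
            )
    return dp[-1]

def cer(ref: str, hyp: str) -> float:
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(cer("こんにちは", "こんにちわ"))  # 0.2: one substitution over five chars
```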
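The checkpoint linked above can also be fetched programmatically; a short sketch using the `huggingface_hub` package, where the repo id comes from the link and the rest is standard `snapshot_download` usage:

```python
# Fetch the released model from the Hugging Face Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="reazon-research/reazonspeech-k2-v2-ja-en")
print("Model files downloaded to:", local_dir)
```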