Machiko Bailey 0855b0338a
Merge japanese-to-english multilingual branch (#1860)
* add streaming support to reazonresearch

* update README for streaming

* Update RESULTS.md

* add onnx decode

---------

Co-authored-by: root <root@KDA03.cm.cluster>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Co-authored-by: root <root@KDA01.cm.cluster>
Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2025-02-04 01:33:09 +08:00

88 lines
2.3 KiB
Markdown

## Results
### Zipformer
#### Non-streaming
##### large-scaled model, number of model parameters: 159337842, i.e., 159.34 M
| decoding method | In-Distribution CER | JSUT | CommonVoice | TEDx | comment |
| :------------------: | :-----------------: | :--: | :---------: | :---: | :----------------: |
| greedy search | 4.2 | 6.7 | 7.84 | 17.9 | --epoch 39 --avg 7 |
| modified beam search | 4.13 | 6.77 | 7.69 | 17.82 | --epoch 39 --avg 7 |
The training command is:
```shell
./zipformer/train.py \
--world-size 8 \
--num-epochs 40 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp-large \
--causal 0 \
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 512,768,1536,2048,1536,768 \
--encoder-dim 192,256,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
--lang data/lang_char \
--max-duration 1600
```
The decoding command is:
```shell
./zipformer/decode.py \
--epoch 40 \
--avg 16 \
--exp-dir zipformer/exp-large \
--max-duration 600 \
--causal 0 \
--decoding-method greedy_search \
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 512,768,1536,2048,1536,768 \
--encoder-dim 192,256,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
--lang data/lang_char \
--blank-penalty 0
```
#### Streaming
We have not completed evaluation of our models yet and will add evaluation results here once it's completed.
The training command is:
```shell
./zipformer/train.py \
--world-size 8 \
--num-epochs 40 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp-large \
--causal 1 \
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 512,768,1536,2048,1536,768 \
--encoder-dim 192,256,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
--lang data/lang_char \
--max-duration 1600
```
The decoding command is:
```shell
./zipformer/streaming_decode.py \
--epoch 28 \
--avg 15 \
--causal 1 \
--chunk-size 32 \
--left-context-frames 256 \
--exp-dir ./zipformer/exp-large \
--lang data/lang_char \
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 512,768,1536,2048,1536,768 \
--encoder-dim 192,256,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192
```