icefall/egs/reazonspeech/ASR/RESULTS.md

## Results

### Zipformer

#### Non-streaming

##### large-scaled model, number of model parameters: 159337842, i.e., 159.34 M

|   decoding method    | In-Distribution CER | JSUT | CommonVoice | TEDx  |      comment       |
| :------------------: | :-----------------: | :--: | :---------: | :---: | :----------------: |
|    greedy search     |         4.2         | 6.7  |    7.84     | 17.9  | --epoch 39 --avg 7 |
| modified beam search |        4.13         | 6.77 |    7.69     | 17.82 | --epoch 39 --avg 7 |

The training command is:

```shell
./zipformer/train.py \
  --world-size 8 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-large \
  --causal 0 \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --lang data/lang_char \
  --max-duration 1600
```

The decoding command is:

```shell
./zipformer/decode.py \
    --epoch 40 \
    --avg 16 \
    --exp-dir zipformer/exp-large \
    --max-duration 600 \
    --causal 0 \
    --decoding-method greedy_search \
    --num-encoder-layers 2,2,4,5,4,2 \
    --feedforward-dim 512,768,1536,2048,1536,768 \
    --encoder-dim 192,256,512,768,512,256 \
    --encoder-unmasked-dim 192,192,256,320,256,192 \
    --lang data/lang_char \
    --blank-penalty 0
```

#### Streaming

We have not completed evaluation of our models yet and will add evaluation results here once it's completed.

The training command is:
```shell
./zipformer/train.py \
  --world-size 8 \
  --num-epochs 40 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-large \
  --causal 1 \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --lang data/lang_char \
  --max-duration 1600
```

The decoding command is:

```shell
./zipformer/streaming_decode.py \
  --epoch 28 \
  --avg 15 \
  --causal 1 \
  --chunk-size 32 \
  --left-context-frames 256 \
  --exp-dir ./zipformer/exp-large \
  --lang data/lang_char \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192
```