mirror of <https://github.com/k2-fsa/icefall.git>, synced 2025-08-09 10:02:22 +00:00

Update RESULTS.md, adding results and model links of zipformer-small/medium CTC/AED models (#1683)

Parent commit: 1c3d992a39, this commit: 785f3f0bcf
We place an additional Conv1d layer right after the input embedding layer.

| `conformer-ctc2` | Reworked Conformer | Use auxiliary attention head | |
| `conformer-ctc3` | Reworked Conformer | Streaming version + delay penalty | |
| `zipformer-ctc` | Zipformer | Use auxiliary attention head | |
| `zipformer` | Upgraded Zipformer | Use auxiliary transducer head / attention-decoder head | The latest recipe |
# MMI
See <https://github.com/k2-fsa/icefall/pull/1389> for more details.
#### Non-streaming

##### small-scale model, number of model parameters: 46282107, i.e., 46.3 M

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-small-ctc-attention-decoder-2024-07-09>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

| decoding method                       | test-clean | test-other | comment             |
|---------------------------------------|------------|------------|---------------------|
| ctc-decoding                          | 3.04       | 7.04       | --epoch 50 --avg 30 |
| attention-decoder-rescoring-no-ngram  | 2.45       | 6.08       | --epoch 50 --avg 30 |
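The test-clean/test-other figures above are word error rates (WER). As background, WER is the word-level edit distance between reference and hypothesis divided by the reference length; a minimal sketch (not icefall's actual scoring code) is:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / #reference words."""
    r, h = ref.split(), hyp.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(r)][len(h)] / len(r)
```

For example, `wer("the cat sat", "the cat sat down")` is one insertion over three reference words, about 0.33.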
The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1"
# For non-streaming model training:
./zipformer/train.py \
  --world-size 2 \
  --num-epochs 50 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-small \
  --full-libri 1 \
  --use-ctc 1 \
  --use-transducer 0 \
  --use-attention-decoder 1 \
  --ctc-loss-scale 0.1 \
  --attention-decoder-loss-scale 0.9 \
  --num-encoder-layers 2,2,2,2,2,2 \
  --feedforward-dim 512,768,768,768,768,768 \
  --encoder-dim 192,256,256,256,256,256 \
  --encoder-unmasked-dim 192,192,192,192,192,192 \
  --base-lr 0.04 \
  --max-duration 1700 \
  --master-port 12345
```
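Flags such as `--num-encoder-layers 2,2,2,2,2,2` supply one value per Zipformer encoder stack. A minimal sketch of how such comma-separated per-stack values can be parsed into integer lists (an illustration of the flag format; the actual parsing lives in the training script):

```python
def parse_per_stack(value: str) -> list:
    """Split a comma-separated flag value like '512,768,768,768,768,768'
    into one integer per encoder stack."""
    return [int(v) for v in value.split(",")]

num_encoder_layers = parse_per_stack("2,2,2,2,2,2")
feedforward_dim = parse_per_stack("512,768,768,768,768,768")
encoder_dim = parse_per_stack("192,256,256,256,256,256")

# Every per-stack list must describe the same number of stacks (6 here).
assert len(num_encoder_layers) == len(feedforward_dim) == len(encoder_dim) == 6
```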
|
||||
|
||||
The decoding command is:
|
||||
```bash
|
||||
export CUDA_VISIBLE_DEVICES="0"
|
||||
for m in ctc-decoding attention-decoder-rescoring-no-ngram; do
|
||||
./zipformer/ctc_decode.py \
|
||||
--epoch 50 \
|
||||
--avg 30 \
|
||||
--exp-dir zipformer/exp-small \
|
||||
--use-ctc 1 \
|
||||
--use-transducer 0 \
|
||||
--use-attention-decoder 1 \
|
||||
--attention-decoder-loss-scale 0.9 \
|
||||
--num-encoder-layers 2,2,2,2,2,2 \
|
||||
--feedforward-dim 512,768,768,768,768,768 \
|
||||
--encoder-dim 192,256,256,256,256,256 \
|
||||
--encoder-unmasked-dim 192,192,192,192,192,192 \
|
||||
--max-duration 100 \
|
||||
--causal 0 \
|
||||
--num-paths 100 \
|
||||
--decoding-method $m
|
||||
done
|
||||
```
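`--epoch 50 --avg 30` asks the decoding script to average model weights over checkpoints ending at epoch 50. Conceptually, averaging amounts to an element-wise mean of the parameters; a sketch with plain dicts of floats standing in for PyTorch state dicts (icefall's actual averaging is more involved):

```python
def average_checkpoints(state_dicts):
    """Element-wise average of parameter values across checkpoints.
    Plain lists of floats stand in for tensors here."""
    n = len(state_dicts)
    return {
        k: [sum(sd[k][i] for sd in state_dicts) / n
            for i in range(len(state_dicts[0][k]))]
        for k in state_dicts[0]
    }

# Two toy "checkpoints" with one parameter each:
ckpt_a = {"layer.weight": [1.0, 2.0]}
ckpt_b = {"layer.weight": [3.0, 4.0]}
print(average_checkpoints([ckpt_a, ckpt_b]))  # {'layer.weight': [2.0, 3.0]}
```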
##### medium-scale model, number of model parameters: 89987295, i.e., 90.0 M

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-ctc-attention-decoder-2024-07-08>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

| decoding method                       | test-clean | test-other | comment             |
|---------------------------------------|------------|------------|---------------------|
| ctc-decoding                          | 2.46       | 5.57       | --epoch 50 --avg 22 |
| attention-decoder-rescoring-no-ngram  | 2.23       | 4.98       | --epoch 50 --avg 22 |
The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"
# For non-streaming model training:
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 50 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --full-libri 1 \
  --use-ctc 1 \
  --use-transducer 0 \
  --use-attention-decoder 1 \
  --ctc-loss-scale 0.1 \
  --attention-decoder-loss-scale 0.9 \
  --max-duration 1200 \
  --master-port 12345
```
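With `--use-ctc 1 --use-attention-decoder 1`, training optimizes both heads jointly, and `--ctc-loss-scale 0.1` / `--attention-decoder-loss-scale 0.9` weight their losses. A minimal sketch of such a weighted combination (my reading of the flags, not icefall's exact loss code):

```python
CTC_LOSS_SCALE = 0.1
ATTENTION_DECODER_LOSS_SCALE = 0.9

def combined_loss(ctc_loss, attention_decoder_loss):
    """Weighted sum of the CTC and attention-decoder losses,
    matching the two --*-loss-scale flags above."""
    return (CTC_LOSS_SCALE * ctc_loss
            + ATTENTION_DECODER_LOSS_SCALE * attention_decoder_loss)

# With these scales, the attention-decoder loss dominates the gradient.
print(combined_loss(2.0, 1.0))
```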
The decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"
for m in ctc-decoding attention-decoder-rescoring-no-ngram; do
  ./zipformer/ctc_decode.py \
    --epoch 50 \
    --avg 22 \
    --exp-dir zipformer/exp \
    --use-ctc 1 \
    --use-transducer 0 \
    --use-attention-decoder 1 \
    --attention-decoder-loss-scale 0.9 \
    --max-duration 100 \
    --causal 0 \
    --num-paths 100 \
    --decoding-method $m
done
```
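The `attention-decoder-rescoring-no-ngram` method rescores candidate hypotheses (here up to `--num-paths 100`) with the attention-decoder head and keeps the best one. A conceptual sketch with hypothetical scores (the real implementation works on lattices, not flat n-best lists):

```python
def rescore_nbest(hypotheses):
    """Pick the hypothesis with the highest combined score.
    `hypotheses` is a list of (text, ctc_score, attn_score) tuples,
    where scores are log-probabilities (higher is better)."""
    return max(hypotheses, key=lambda h: h[1] + h[2])[0]

# Toy n-best list: the attention-decoder score breaks the CTC tie-up.
nbest = [
    ("the cat sat", -4.0, -3.5),   # combined: -7.5
    ("the cats at", -3.8, -4.5),   # combined: -8.3
]
print(rescore_nbest(nbest))  # the cat sat
```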
##### large-scale model, number of model parameters: 174319650, i.e., 174.3 M

You can find a pretrained model, training logs, decoding logs, and decoding results at:

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

Results of the CTC head:

| decoding method                       | test-clean | test-other | comment             |
|---------------------------------------|------------|------------|---------------------|
| ctc-decoding                          | 2.29       | 5.14       | --epoch 50 --avg 29 |