Mirror of <https://github.com/k2-fsa/icefall.git>, synced 2025-08-27 10:44:19 +00:00

Commit 4e7cdb5d7e (parent acdc333971): update RESULTS.md
@@ -1,5 +1,75 @@
## Results

### zipformer (zipformer + CTC/AED)

See <https://github.com/k2-fsa/icefall/pull/1389> for more details.

[zipformer](./zipformer)

#### Non-streaming

##### large-scale model, number of model parameters: 174319650, i.e., 174.3 M

You can find a pretrained model, training logs, decoding logs, and decoding results at:

<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-large-ctc-attention-decoder-2024-05-26>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

Results of the CTC head:

| decoding method                      | test-clean | test-other | comment             |
|--------------------------------------|------------|------------|---------------------|
| ctc-decoding                         | 2.29       | 5.14       | --epoch 50 --avg 29 |
| attention-decoder-rescoring-no-ngram | 2.1        | 4.57       | --epoch 50 --avg 29 |

The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

# For non-streaming model training:
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 50 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-large \
  --full-libri 1 \
  --use-ctc 1 \
  --use-transducer 0 \
  --use-attention-decoder 1 \
  --ctc-loss-scale 0.1 \
  --attention-decoder-loss-scale 0.9 \
  --num-encoder-layers 2,2,4,5,4,2 \
  --feedforward-dim 512,768,1536,2048,1536,768 \
  --encoder-dim 192,256,512,768,512,256 \
  --encoder-unmasked-dim 192,192,256,320,256,192 \
  --max-duration 1200 \
  --master-port 12345
```

The decoding command is:

```bash
export CUDA_VISIBLE_DEVICES="0"

for m in ctc-decoding attention-decoder-rescoring-no-ngram; do
  ./zipformer/ctc_decode.py \
    --epoch 50 \
    --avg 29 \
    --exp-dir zipformer/exp-large \
    --use-ctc 1 \
    --use-transducer 0 \
    --use-attention-decoder 1 \
    --attention-decoder-loss-scale 0.9 \
    --num-encoder-layers 2,2,4,5,4,2 \
    --feedforward-dim 512,768,1536,2048,1536,768 \
    --encoder-dim 192,256,512,768,512,256 \
    --encoder-unmasked-dim 192,192,256,320,256,192 \
    --max-duration 100 \
    --causal 0 \
    --num-paths 100 \
    --decoding-method $m
done
```

### zipformer (zipformer + pruned stateless transducer + CTC)

See <https://github.com/k2-fsa/icefall/pull/1111> for more details.

@ -72,6 +72,8 @@ class AsrModel(nn.Module):
            Whether to use transducer head. Default: True.
          use_ctc:
            Whether to use CTC head. Default: False.
          use_attention_decoder:
            Whether to use attention-decoder head. Default: False.
        """
        super().__init__()

@ -48,7 +48,8 @@ It supports training with:
  - transducer loss (default), with `--use-transducer True --use-ctc False`
  - ctc loss (not recommended), with `--use-transducer False --use-ctc True`
  - transducer loss & ctc loss, with `--use-transducer True --use-ctc True`
  - ctc loss & attention decoder loss, no transducer loss,
    with `--use-transducer False --use-ctc True --use-attention-decoder True`
"""
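
The supported combinations above imply at least one of the transducer and CTC heads must be on, and the attention decoder only appears alongside CTC. A small sketch of that kind of flag sanity check — the rules here are inferred from the list above, and icefall's actual validation in `train.py` may differ:

```python
def check_heads(use_transducer: bool, use_ctc: bool,
                use_attention_decoder: bool) -> None:
    """Reject flag combinations outside the supported list (assumed rules)."""
    if not (use_transducer or use_ctc):
        raise ValueError("at least one of --use-transducer/--use-ctc must be True")
    if use_attention_decoder and not use_ctc:
        raise ValueError("--use-attention-decoder True requires --use-ctc True")

check_heads(use_transducer=True, use_ctc=False, use_attention_decoder=False)  # default: ok
check_heads(use_transducer=False, use_ctc=True, use_attention_decoder=True)   # CTC/AED: ok
```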