Update results for streaming Emformer.
This commit is contained in: parent 1b909d9178, commit a6f4bc77c8

@@ -22,6 +22,7 @@ The following table lists the differences among them.
| `pruned_transducer_stateless4` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless2 + save averaged models periodically during training |
| `pruned_transducer_stateless5` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + more layers + random combiner |
| `pruned_transducer_stateless6` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + distillation with hubert |
| `pruned_stateless_emformer_rnnt2` | Emformer(from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR |

The decoder in `transducer_stateless` is modified from the paper

@@ -1,5 +1,69 @@
## Results

### LibriSpeech BPE training results (Pruned Stateless Emformer RNN-T)

[pruned_stateless_emformer_rnnt2](./pruned_stateless_emformer_rnnt2)

Use [Emformer](https://arxiv.org/abs/2010.10759) from [torchaudio](https://github.com/pytorch/audio)
for streaming ASR. The Emformer model is imported from torchaudio without modifications.

|                                     | test-clean | test-other | comment                                |
|-------------------------------------|------------|------------|----------------------------------------|
| greedy search (max sym per frame 1) | 4.28       | 11.42      | --epoch 39 --avg 6 --max-duration 600  |
| modified beam search                | 4.22       | 11.16      | --epoch 39 --avg 6 --max-duration 600  |
| fast beam search                    | 4.29       | 11.26      | --epoch 39 --avg 6 --max-duration 600  |

The training command is:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_stateless_emformer_rnnt2/train.py \
  --world-size 8 \
  --num-epochs 40 \
  --start-epoch 1 \
  --exp-dir pruned_stateless_emformer_rnnt2/exp-full \
  --full-libri 1 \
  --use-fp16 0 \
  --max-duration 200 \
  --prune-range 5 \
  --lm-scale 0.25 \
  --master-port 12358 \
  --num-encoder-layers 18 \
  --left-context-length 128 \
  --segment-length 8 \
  --right-context-length 4
```
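
For reference, the kind of streaming Emformer encoder configured by the flags above can be instantiated directly from `torchaudio.models.Emformer`. The sketch below is not the recipe's model code: `num_layers`, `segment_length`, `left_context_length`, and `right_context_length` mirror the command above, while `input_dim`, `num_heads`, `ffn_dim`, and the tensor shapes are illustrative assumptions.

```python
# Minimal sketch (not the icefall recipe code): build a streaming Emformer
# encoder with torchaudio and run it on a full utterance and chunk by chunk.
# input_dim / num_heads / ffn_dim are assumed values for illustration only.
import torch
import torchaudio

encoder = torchaudio.models.Emformer(
    input_dim=512,             # assumed feature dimension
    num_heads=8,               # assumed number of attention heads
    ffn_dim=2048,              # assumed feed-forward dimension
    num_layers=18,             # --num-encoder-layers 18
    segment_length=8,          # --segment-length 8
    left_context_length=128,   # --left-context-length 128
    right_context_length=4,    # --right-context-length 4
)

# Offline (non-streaming) forward pass: the input is right-padded with
# `right_context_length` extra frames, as required by the torchaudio API.
batch, frames = 2, 80
x = torch.randn(batch, frames + 4, 512)
lengths = torch.full((batch,), frames, dtype=torch.int64)
out, out_lengths = encoder(x, lengths)  # out: (batch, frames, 512)

# Streaming inference: feed one segment plus its right context at a time
# and carry the returned states across chunks.
states = None
chunk = torch.randn(batch, 8 + 4, 512)
chunk_lengths = torch.full((batch,), 8 + 4, dtype=torch.int64)
y, y_lengths, states = encoder.infer(chunk, chunk_lengths, states)
```
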
The tensorboard log can be found at
<https://tensorboard.dev/experiment/ZyiqhAhmRjmr49xml4ofLw/>

The decoding commands are:

```bash
for m in greedy_search fast_beam_search modified_beam_search; do
  for epoch in 39; do
    for avg in 6; do
      ./pruned_stateless_emformer_rnnt2/decode.py \
        --epoch $epoch \
        --avg $avg \
        --use-averaged-model 1 \
        --exp-dir pruned_stateless_emformer_rnnt2/exp-full \
        --max-duration 50 \
        --decoding-method $m \
        --num-encoder-layers 18 \
        --left-context-length 128 \
        --segment-length 8 \
        --right-context-length 4
    done
  done
done
```

You can find a pretrained model, training logs, decoding logs, and decoding
results at:
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01>
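
If you prefer to fetch the pretrained model and logs programmatically rather than cloning the repository by hand, a small sketch using `huggingface_hub` (not part of the recipe; requires `pip install huggingface_hub`) could look like this:

```python
# Sketch: download the pretrained model, logs, and decoding results from the
# Hugging Face Hub repository linked above. Not part of the icefall recipe.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01"
)
print("Files downloaded to:", local_dir)
```
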
### LibriSpeech BPE training results (Pruned Stateless Transducer 5)

[pruned_transducer_stateless5](./pruned_transducer_stateless5)