Update results for streaming Emformer.

Fangjun Kuang 2022-06-01 08:32:36 +08:00
parent 1b909d9178
commit a6f4bc77c8
2 changed files with 65 additions and 0 deletions

@@ -22,6 +22,7 @@ The following table lists the differences among them.
| `pruned_transducer_stateless4` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless2 + save averaged models periodically during training |
| `pruned_transducer_stateless5` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + more layers + random combiner|
| `pruned_transducer_stateless6` | Conformer(modified) | Embedding + Conv1d | same as pruned_transducer_stateless4 + distillation with hubert|
| `pruned_stateless_emformer_rnnt2` | Emformer(from torchaudio) | Embedding + Conv1d | Using Emformer from torchaudio for streaming ASR|
The decoder in `transducer_stateless` is modified from the paper

@@ -1,5 +1,69 @@
## Results
### LibriSpeech BPE training results (Pruned Stateless Emformer RNN-T)
[pruned_stateless_emformer_rnnt2](./pruned_stateless_emformer_rnnt2)
This recipe uses [Emformer](https://arxiv.org/abs/2010.10759) from [torchaudio](https://github.com/pytorch/audio)
for streaming ASR. The Emformer model is imported from torchaudio without modification.
| decoding method                     | test-clean (WER %) | test-other (WER %) | comment                                |
|-------------------------------------|--------------------|--------------------|----------------------------------------|
| greedy search (max sym per frame 1) | 4.28 | 11.42 | --epoch 39 --avg 6 --max-duration 600 |
| modified beam search | 4.22 | 11.16 | --epoch 39 --avg 6 --max-duration 600 |
| fast beam search | 4.29 | 11.26 | --epoch 39 --avg 6 --max-duration 600 |
The training commands are:
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
./pruned_stateless_emformer_rnnt2/train.py \
  --world-size 8 \
  --num-epochs 40 \
  --start-epoch 1 \
  --exp-dir pruned_stateless_emformer_rnnt2/exp-full \
  --full-libri 1 \
  --use-fp16 0 \
  --max-duration 200 \
  --prune-range 5 \
  --lm-scale 0.25 \
  --master-port 12358 \
  --num-encoder-layers 18 \
  --left-context-length 128 \
  --segment-length 8 \
  --right-context-length 4
```
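The `--world-size 8` value above matches the eight GPU ids listed in `CUDA_VISIBLE_DEVICES`. As a minimal sketch, assuming you want to keep the two in sync when training on a different number of GPUs, the count can be derived from the device list. The variable names `devices`, `n_commas`, and `world_size` are illustrative and not part of the recipe.

```bash
# Illustrative only: derive the world size from the visible device list
# instead of hard-coding it. Variable names here are hypothetical.
devices="0,1,2,3,4,5,6,7"

# Count the commas in the list; the number of devices is one more than that.
n_commas=$(printf '%s' "$devices" | tr -cd ',' | wc -c)
world_size=$((n_commas + 1))

echo "CUDA_VISIBLE_DEVICES=$devices -> --world-size $world_size"
```

The resulting value could then be passed to `train.py` as `--world-size "$world_size"`, with `CUDA_VISIBLE_DEVICES` exported from the same `devices` string.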
The tensorboard log can be found at
<https://tensorboard.dev/experiment/ZyiqhAhmRjmr49xml4ofLw/>
The decoding commands are:
```bash
for m in greedy_search fast_beam_search modified_beam_search; do
  for epoch in 39; do
    for avg in 6; do
      ./pruned_stateless_emformer_rnnt2/decode.py \
        --epoch "$epoch" \
        --avg "$avg" \
        --use-averaged-model 1 \
        --exp-dir pruned_stateless_emformer_rnnt2/exp-full \
        --max-duration 50 \
        --decoding-method "$m" \
        --num-encoder-layers 18 \
        --left-context-length 128 \
        --segment-length 8 \
        --right-context-length 4
    done
  done
done
```
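With `--use-averaged-model 1`, the pair `--epoch 39 --avg 6` selects a range of epochs to average. The following is a minimal sketch of that arithmetic. It assumes that `decode.py` follows the convention used by other icefall recipes, where the start epoch `epoch - avg` is excluded from the range, and that checkpoints are stored as `epoch-N.pt` inside `--exp-dir`. Both points are assumptions here and are not stated in this commit.

```bash
# Sketch under the assumptions named above; not taken from decode.py itself.
epoch=39
avg=6
exp_dir=pruned_stateless_emformer_rnnt2/exp-full

# Under the assumed convention the start epoch is excluded,
# so the averaged range begins one epoch later.
start=$((epoch - avg))
first=$((start + 1))

echo "averaged range: epochs $first to $epoch"
echo "checkpoints read (assumed names): $exp_dir/epoch-$start.pt and $exp_dir/epoch-$epoch.pt"
```

If the assumption holds, both boundary checkpoints need to be present in the experiment directory before decoding.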
You can find a pretrained model, training logs, decoding logs, and decoding
results at:
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-stateless-emformer-rnnt2-2022-06-01>
### LibriSpeech BPE training results (Pruned Stateless Transducer 5)
[pruned_transducer_stateless5](./pruned_transducer_stateless5)