From ffb25d12dcfda20f68459eb3915939f94f9c9104 Mon Sep 17 00:00:00 2001
From: luomingshuang <739314837@qq.com>
Date: Wed, 27 Jul 2022 16:41:23 +0800
Subject: [PATCH] add README.md and RESULTS.md

---
 egs/wenetspeech/ASR/README.md  |  1 +
 egs/wenetspeech/ASR/RESULTS.md | 78 ++++++++++++++++++++++++++++++++--
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/egs/wenetspeech/ASR/README.md b/egs/wenetspeech/ASR/README.md
index c92f1b4e6..44e631b4a 100644
--- a/egs/wenetspeech/ASR/README.md
+++ b/egs/wenetspeech/ASR/README.md
@@ -13,6 +13,7 @@ The following table lists the differences among them.

| | Encoder | Decoder | Comment |
|---------------------------------------|---------------------|--------------------|-----------------------------|
| `pruned_transducer_stateless2` | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss | |
+| `pruned_transducer_stateless5` | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss | |

The decoder in `transducer_stateless` is modified from the paper [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).

diff --git a/egs/wenetspeech/ASR/RESULTS.md b/egs/wenetspeech/ASR/RESULTS.md
index ea6658ddb..cc36ae4f2 100644
--- a/egs/wenetspeech/ASR/RESULTS.md
+++ b/egs/wenetspeech/ASR/RESULTS.md
@@ -1,12 +1,84 @@
## Results

### WenetSpeech char-based training results (offline and streaming) (Pruned Transducer 5)

#### 2022-07-22

Using the code from this PR https://github.com/k2-fsa/icefall/pull/447.

When training with the L subset, the CERs are

**Offline**:
|decoding-method| epoch | avg | use-averaged-model | DEV | TEST-NET | TEST-MEETING|
|-- | -- | -- | -- | -- | -- | --|
|greedy_search | 4 | 1 | True | 8.22 | 9.03 | 14.54|
|modified_beam_search | 4 | 1 | True | **8.17** | **9.04** | **14.44**|
|fast_beam_search | 4 | 1 | True | 8.29 | 9.00 | 14.93|

The command for reproducing the offline training is given below:
```
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless5/train.py \
  --lang-dir data/lang_char \
  --exp-dir pruned_transducer_stateless5/exp_L_offline \
  --world-size 8 \
  --num-epochs 15 \
  --start-epoch 2 \
  --max-duration 120 \
  --valid-interval 3000 \
  --model-warm-step 3000 \
  --save-every-n 8000 \
  --average-period 1000 \
  --training-subset L
```

The tensorboard training log can be found at https://tensorboard.dev/experiment/SvnN2jfyTB2Hjqu22Z7ZoQ/#scalars .
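The patch reports three decoding methods in the offline table but does not include the decoding command itself. A minimal sketch is given below; it assumes the usual icefall `decode.py` interface (`--epoch`, `--avg`, `--use-averaged-model`, `--exp-dir`, `--lang-dir`, `--max-duration`, `--decoding-method`) carries over unchanged to `pruned_transducer_stateless5`, so check the flags against the script before relying on it.

```
# Hedged sketch, not part of this patch: flag names follow common icefall
# decode.py conventions, and the values mirror the offline table above
# (epoch 4, avg 1, use-averaged-model True).
export CUDA_VISIBLE_DEVICES="0"

for method in greedy_search modified_beam_search fast_beam_search; do
  ./pruned_transducer_stateless5/decode.py \
    --epoch 4 \
    --avg 1 \
    --use-averaged-model True \
    --exp-dir pruned_transducer_stateless5/exp_L_offline \
    --lang-dir data/lang_char \
    --max-duration 100 \
    --decoding-method $method
done
```

Each iteration of the loop corresponds to one row of the offline table above.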
A pre-trained offline model and decoding logs can be found at

**Streaming**:
|decoding-method| epoch | avg | use-averaged-model | DEV | TEST-NET | TEST-MEETING|
|--|--|--|--|--|--|--|
| greedy_search | 7 | 1 | True | 8.78 | 10.12 | 16.16 |
| modified_beam_search | 7 | 1 | True | **8.53** | **9.95** | **15.81** |
| fast_beam_search | 7 | 1 | True | 9.01 | 10.47 | 16.28 |

The command for reproducing the streaming training is given below:
```
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless5/train.py \
  --lang-dir data/lang_char \
  --exp-dir pruned_transducer_stateless5/exp_L_streaming \
  --world-size 8 \
  --num-epochs 15 \
  --start-epoch 1 \
  --max-duration 140 \
  --valid-interval 3000 \
  --model-warm-step 3000 \
  --save-every-n 8000 \
  --average-period 1000 \
  --training-subset L \
  --dynamic-chunk-training True \
  --causal-convolution True \
  --short-chunk-size 25 \
  --num-left-chunks 4
```

The tensorboard training log can be found at https://tensorboard.dev/experiment/E2NXPVflSOKWepzJ1a1uDQ/#scalars .

A pre-trained streaming model and decoding logs can be found at

### WenetSpeech char-based training results (Pruned Transducer 2)

#### 2022-05-19

Using the code from this PR https://github.com/k2-fsa/icefall/pull/349.

-When training with the L subset, the WERs are
+When training with the L subset, the CERs are

| | dev | test-net | test-meeting | comment |
|------------------------------------|-------|----------|--------------|------------------------------------------|

@@ -72,7 +144,7 @@ avg=2
  --max-states 8
```

-When training with the M subset, the WERs are
+When training with the M subset, the CERs are

| | dev | test-net | test-meeting | comment |
|------------------------------------|--------|-----------|---------------|-------------------------------------------|
| greedy search | 10.40 | 11.31 | 19.64 | --epoch 29, --avg 11, --max-duration 600 |
| modified beam search (beam size 4) | 9.85 | 11.04 | 18.20 | --epoch 29, --avg 11, --max-duration 100 |
| fast beam search (set as default) | 10.18 | 11.10 | 19.32 | --epoch 29, --avg 11, --max-duration 1500 |

-When training with the S subset, the WERs are
+When training with the S subset, the CERs are

| | dev | test-net | test-meeting | comment |
|------------------------------------|--------|-----------|---------------|-------------------------------------------|
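Returning to the streaming `pruned_transducer_stateless5` model introduced above: the patch likewise gives only the streaming training command, not a decoding command. A hypothetical simulated-streaming decoding sketch follows; the chunk-related flags (`--simulate-streaming`, `--causal-convolution`, `--decode-chunk-size`, `--left-context`) are assumptions borrowed from other icefall streaming recipes and are not confirmed by this patch, so treat it purely as a starting point.

```
# Hedged sketch, not part of this patch: the streaming flags below are assumed
# from other icefall streaming recipes; epoch/avg values mirror the streaming
# table above (epoch 7, avg 1, use-averaged-model True).
export CUDA_VISIBLE_DEVICES="0"

./pruned_transducer_stateless5/decode.py \
  --epoch 7 \
  --avg 1 \
  --use-averaged-model True \
  --exp-dir pruned_transducer_stateless5/exp_L_streaming \
  --lang-dir data/lang_char \
  --max-duration 100 \
  --decoding-method modified_beam_search \
  --simulate-streaming True \
  --causal-convolution True \
  --decode-chunk-size 16 \
  --left-context 64
```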