add README.md and RESULTS.md

@@ -13,6 +13,7 @@ The following table lists the differences among them.

|                                | Encoder             | Decoder            | Comment                    |
|--------------------------------|---------------------|--------------------|----------------------------|
| `pruned_transducer_stateless2` | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |
| `pruned_transducer_stateless5` | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss |

The decoder in `transducer_stateless` is modified from the paper
[Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
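
Concretely, an "Embedding + Conv1d" decoder keeps no recurrent state: it embeds the last couple of output symbols and mixes them with a 1-D convolution over that short history. A minimal PyTorch sketch of the idea is given below; the class name, dimensions, and layer choices are illustrative assumptions, not the recipe's actual code.

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class StatelessDecoder(nn.Module):
    """Sketch of a stateless RNN-T prediction network: an embedding of the
    last `context_size` output symbols mixed by a 1-D convolution, with no
    recurrent state (dimensions and names here are illustrative)."""

    def __init__(self, vocab_size: int, embed_dim: int = 512, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.context_size = context_size
        # Depthwise Conv1d over the last `context_size` embedded symbols.
        self.conv = nn.Conv1d(
            embed_dim, embed_dim, kernel_size=context_size,
            groups=embed_dim, bias=False,
        )

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, U) label history -> output: (batch, U, embed_dim)
        emb = self.embedding(y).permute(0, 2, 1)   # (B, D, U)
        # Left-pad in time so position u only sees symbols up to u (causal).
        emb = F.pad(emb, (self.context_size - 1, 0))
        out = self.conv(emb).permute(0, 2, 1)      # (B, U, D)
        return torch.relu(out)
```

In the transducer recipes, the output of such a decoder is combined with the encoder output by a joiner network before the (pruned) RNN-T loss is computed.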
@@ -1,12 +1,84 @@
## Results

### WenetSpeech char-based training results (offline and streaming) (Pruned Transducer 5)

#### 2022-07-22

Using the code from this PR: https://github.com/k2-fsa/icefall/pull/447.

When training with the L subset, the CERs are

**Offline**:

| decoding-method      | epoch | avg | use-averaged-model | DEV      | TEST-NET | TEST-MEETING |
|----------------------|-------|-----|--------------------|----------|----------|--------------|
| greedy_search        | 4     | 1   | True               | 8.22     | 9.03     | 14.54        |
| modified_beam_search | 4     | 1   | True               | **8.17** | **9.04** | **14.44**    |
| fast_beam_search     | 4     | 1   | True               | 8.29     | 9.00     | 14.93        |

The command for reproducing the offline training is given below:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless5/train.py \
  --lang-dir data/lang_char \
  --exp-dir pruned_transducer_stateless5/exp_L_offline \
  --world-size 8 \
  --num-epochs 15 \
  --start-epoch 2 \
  --max-duration 120 \
  --valid-interval 3000 \
  --model-warm-step 3000 \
  --save-every-n 8000 \
  --average-period 1000 \
  --training-subset L
```

The tensorboard training log can be found at
<https://tensorboard.dev/experiment/SvnN2jfyTB2Hjqu22Z7ZoQ/#scalars>.

A pre-trained offline model and decoding logs can be found at
<https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless5_offline>.
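
For reference, a decoding run matching the best offline row above (epoch 4, avg 1, averaged model, modified_beam_search) would look roughly like the sketch below. The exact flags of `pruned_transducer_stateless5/decode.py` are not part of this diff, so the flag names and values here (in particular `--max-duration` and `--beam-size`) are assumptions based on the recipe's conventions rather than the verbatim command.

```
./pruned_transducer_stateless5/decode.py \
  --epoch 4 \
  --avg 1 \
  --use-averaged-model True \
  --exp-dir pruned_transducer_stateless5/exp_L_offline \
  --lang-dir data/lang_char \
  --max-duration 100 \
  --decoding-method modified_beam_search \
  --beam-size 4
```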

**Streaming**:

| decoding-method      | epoch | avg | use-averaged-model | DEV      | TEST-NET | TEST-MEETING |
|----------------------|-------|-----|--------------------|----------|----------|--------------|
| greedy_search        | 7     | 1   | True               | 8.78     | 10.12    | 16.16        |
| modified_beam_search | 7     | 1   | True               | **8.53** | **9.95** | **15.81**    |
| fast_beam_search     | 7     | 1   | True               | 9.01     | 10.47    | 16.28        |

The command for reproducing the streaming training is given below:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./pruned_transducer_stateless5/train.py \
  --lang-dir data/lang_char \
  --exp-dir pruned_transducer_stateless5/exp_L_streaming \
  --world-size 8 \
  --num-epochs 15 \
  --start-epoch 1 \
  --max-duration 140 \
  --valid-interval 3000 \
  --model-warm-step 3000 \
  --save-every-n 8000 \
  --average-period 1000 \
  --training-subset L \
  --dynamic-chunk-training True \
  --causal-convolution True \
  --short-chunk-size 25 \
  --num-left-chunks 4
```
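
The last four flags enable dynamic chunk training for streaming: roughly speaking, the encoder self-attention is restricted during training to a randomly sampled chunk size (capped by `--short-chunk-size`) plus `--num-left-chunks` chunks of left context, so the same model can later decode with a streaming attention mask. A minimal, illustrative sketch of such a chunk-based mask (not the recipe's implementation) is:

```
import torch

def chunk_attention_mask(seq_len: int, chunk_size: int, num_left_chunks: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: entry [i, j] is True if frame i may
    attend to frame j, i.e. j lies in i's chunk or in one of the
    `num_left_chunks` chunks immediately to its left."""
    chunk_idx = torch.arange(seq_len) // chunk_size
    q = chunk_idx.unsqueeze(1)  # query frame's chunk index, shape (L, 1)
    k = chunk_idx.unsqueeze(0)  # key frame's chunk index, shape (1, L)
    return (k <= q) & (k >= q - num_left_chunks)

# During training the chunk size is re-sampled per batch (e.g. uniformly up
# to the value of --short-chunk-size); at streaming inference a fixed chunk
# size and left context are used instead.
print(chunk_attention_mask(seq_len=8, chunk_size=2, num_left_chunks=1))
```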

The tensorboard training log can be found at
<https://tensorboard.dev/experiment/E2NXPVflSOKWepzJ1a1uDQ/#scalars>.

A pre-trained streaming model and decoding logs can be found at
<https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless5_streaming>.
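
A simulated-streaming decoding run matching the table above might look like the sketch below. The streaming-related flags and their values (`--simulate-streaming`, `--decode-chunk-size 16`, `--left-context 64`) are assumptions based on the recipe's conventions, not commands taken from this diff.

```
./pruned_transducer_stateless5/decode.py \
  --epoch 7 \
  --avg 1 \
  --use-averaged-model True \
  --exp-dir pruned_transducer_stateless5/exp_L_streaming \
  --lang-dir data/lang_char \
  --simulate-streaming True \
  --causal-convolution True \
  --decode-chunk-size 16 \
  --left-context 64 \
  --decoding-method greedy_search
```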

### WenetSpeech char-based training results (Pruned Transducer 2)

#### 2022-05-19

Using the code from this PR: https://github.com/k2-fsa/icefall/pull/349.

When training with the L subset, the CERs are

|                                    | dev   | test-net | test-meeting | comment                                    |
|------------------------------------|-------|----------|--------------|--------------------------------------------|
@@ -72,7 +144,7 @@ avg=2
  --max-states 8
```

When training with the M subset, the CERs are

|                                    | dev   | test-net | test-meeting | comment                                    |
|------------------------------------|-------|----------|--------------|--------------------------------------------|
@@ -81,7 +153,7 @@ When training with the M subset, the WERs are
| fast beam search (set as default)  | 10.18 | 11.10    | 19.32        | --epoch 29, --avg 11, --max-duration 1500  |

When training with the S subset, the CERs are

|                                    | dev   | test-net | test-meeting | comment                                    |
|------------------------------------|-------|----------|--------------|--------------------------------------------|