mirror of https://github.com/k2-fsa/icefall.git (synced 2025-08-09 01:52:41 +00:00)

commit e8eb0b94d9 (parent a92133ef96)

Updating RESULTS.md; fix in beam_search.py
@@ -9,13 +9,15 @@ for how to run models in this recipe.

 There are various folders containing the name `transducer` in this folder.
 The following table lists the differences among them.

 |                                       | Encoder             | Decoder            | Comment                                           |
-|---------------------------------------|-----------|--------------------|---------------------------------------------------|
+|---------------------------------------|---------------------|--------------------|---------------------------------------------------|
 | `transducer`                          | Conformer           | LSTM               |                                                   |
 | `transducer_stateless`                | Conformer           | Embedding + Conv1d |                                                   |
 | `transducer_lstm`                     | LSTM                | LSTM               |                                                   |
 | `transducer_stateless_multi_datasets` | Conformer           | Embedding + Conv1d | Using data from GigaSpeech as extra training data |
 | `pruned_transducer_stateless`         | Conformer           | Embedding + Conv1d | Using k2 pruned RNN-T loss                        |
+| `pruned_transducer_stateless2`        | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss                        |

 The decoder in `transducer_stateless` is modified from the paper
 [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
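The `Embedding + Conv1d` decoders in this table are "stateless": instead of an LSTM, the prediction network conditions only on the last few emitted symbols. A minimal PyTorch sketch of the idea (the dimensions, context width, and ReLU are illustrative assumptions, not the recipe's exact implementation):

```python
import torch
import torch.nn as nn


class StatelessDecoder(nn.Module):
    """Prediction network that sees only a fixed, small symbol context."""

    def __init__(self, vocab_size: int, embed_dim: int = 512, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # A Conv1d over the symbol axis replaces the recurrent state.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, num_symbols), previously emitted symbol ids.
        emb = self.embedding(y).permute(0, 2, 1)             # (B, E, U)
        # Left-pad so position u sees only symbols u-1 and u (causal).
        emb = nn.functional.pad(emb, (self.conv.kernel_size[0] - 1, 0))
        return torch.relu(self.conv(emb)).permute(0, 2, 1)   # (B, U, E)


# e.g. a 500-token BPE vocabulary, as these recipes use:
decoder = StatelessDecoder(vocab_size=500)
out = decoder(torch.randint(0, 500, (4, 10)))  # -> (4, 10, 512)
```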
@@ -1,5 +1,79 @@

 ## Results

+### LibriSpeech BPE training results (Pruned Transducer 2)
+
+This is with a reworked version of the conformer encoder, with many changes.
+
+[pruned_transducer_stateless2](./pruned_transducer_stateless2)
+
+using commit `34aad74a2c849542dd5f6359c9e6b527e8782fd6`.
+See <https://github.com/k2-fsa/icefall/pull/288>
+
+The WERs are:
+
+|                                     | test-clean | test-other | comment                                                                         |
+|-------------------------------------|------------|------------|---------------------------------------------------------------------------------|
+| greedy search (max sym per frame 1) | 2.62       | 6.37       | --epoch 25, --avg 8, --max-duration 600                                         |
+| fast beam search                    | 2.61       | 6.17       | --epoch 25, --avg 8, --max-duration 600 --decoding-method fast_beam_search      |
+| modified beam search                | 2.59       | 6.19       | --epoch 25, --avg 8, --max-duration 600 --decoding-method modified_beam_search  |
+
+The train and decode commands are:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp --world-size 8 --num-epochs 26 --full-libri 1 --max-duration 300`
+
+and:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp --epoch 25 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/UKI6z9BvT6iaUkXPxex1OA>
+
+The WERs for LibriSpeech 100 hours are:
+
+Trained with one job:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws1 --world-size 1 --num-epochs 40 --full-libri 0 --max-duration 300`
+
+and decoded with:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws1 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/AhnhooUBRPqTnaggoqo7lg> (learning rate
+schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment                                                |
+|-------------------------------------|------------|------------|--------------------------------------------------------|
+| greedy search (max sym per frame 1) | 7.12       | 18.42      | --epoch 19 --avg 8                                     |
+| greedy search (max sym per frame 1) | 6.71       | 17.77      | --epoch 29 --avg 8                                     |
+| fast beam search                    | 6.58       | 17.27      | --epoch 19 --avg 8 --decoding-method fast_beam_search  |
+
+Trained with two jobs:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws2 --world-size 2 --num-epochs 40 --full-libri 0 --max-duration 300`
+
+and decoded with:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws2 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/dvOC9wsrSdWrAIdsebJILg/>
+(learning rate schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment               |
+|-------------------------------------|------------|------------|-----------------------|
+| greedy search (max sym per frame 1) | 7.05       | 18.77      | --epoch 19, --avg 8   |
+| greedy search (max sym per frame 1) | 6.82       | 18.14      | --epoch 29, --avg 8   |
+| greedy search (max sym per frame 1) | 6.81       | 17.66      | --epoch 30, --avg 10  |
+
+Trained with 4 jobs:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws4 --world-size 4 --num-epochs 40 --full-libri 0 --max-duration 300`
+
+and decoded with:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws4 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/a3T0TyC0R5aLj5bmFbRErA/>
+(learning rate schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment               |
+|-------------------------------------|------------|------------|-----------------------|
+| greedy search (max sym per frame 1) | 7.31       | 19.55      | --epoch 19, --avg 8   |
+| greedy search (max sym per frame 1) | 7.08       | 18.59      | --epoch 29, --avg 8   |
+| greedy search (max sym per frame 1) | 6.86       | 18.29      | --epoch 30, --avg 10  |
+
 ### LibriSpeech BPE training results (Pruned Transducer)

 Conformer encoder + non-recurrent decoder. The decoder

@@ -23,6 +97,10 @@ The WERs are:

 | modified beam search (beam size 4)  | 2.56       | 6.27       | --epoch 42, --avg 11, --max-duration 100 |
 | beam search (beam size 4)           | 2.57       | 6.27       | --epoch 42, --avg 11, --max-duration 100 |
+
+
+
+
 The decoding time for `test-clean` and `test-other` is given below:
 (A V100 GPU with 32 GB RAM is used for decoding. Note: Not all GPU RAM is used during decoding.)
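The decode commands above pass `--epoch N --avg M`, which averages the parameters of the last M epoch checkpoints ending at epoch N before decoding. A minimal sketch of that averaging step, assuming icefall's usual `epoch-N.pt` file naming with weights stored under the `"model"` key (the real implementation lives in `icefall/checkpoint.py` and differs in detail):

```python
from typing import Dict, List

import torch


def average_checkpoints(paths: List[str]) -> Dict[str, torch.Tensor]:
    """Uniformly average model parameters across several checkpoints."""
    avg: Dict[str, torch.Tensor] = {}
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        for name, value in state.items():
            # Accumulate in float64 for numerical safety. Integer buffers
            # (e.g. BatchNorm counters) would need special casing in a real
            # implementation; this sketch averages everything.
            avg[name] = avg.get(name, 0) + value.to(torch.float64)
    return {k: (v / len(paths)).to(torch.float32) for k, v in avg.items()}


# --epoch 25 --avg 8 corresponds to averaging epochs 18 through 25:
paths = [f"pruned_transducer_stateless2/exp/epoch-{i}.pt" for i in range(18, 26)]
```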
@@ -89,7 +89,7 @@ def fast_beam_search(

         # (shape.NumElements(), 1, joiner_dim)
         # fmt: off
         current_encoder_out = torch.index_select(
-            encoder_out[:, t:t + 1, :], 0, shape.row_ids(1)
+            encoder_out[:, t:t + 1, :], 0, shape.row_ids(1).to(torch.int64)
        )
         # fmt: on
         logits = model.joiner(
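The beam_search.py fix casts the index to `int64`: k2's `row_ids()` returns an `int32` tensor, while `torch.index_select` expects a `LongTensor` index on many PyTorch versions. A standalone illustration of the pattern (the shapes and row ids are made up for the example):

```python
import torch

encoder_out = torch.randn(4, 1, 512)  # (N, 1, joiner_dim)
row_ids = torch.tensor([0, 0, 1, 3], dtype=torch.int32)  # k2 row ids are int32

# Without the cast, index_select can reject the int32 index;
# converting to int64 matches what the fix above does.
expanded = torch.index_select(encoder_out, 0, row_ids.to(torch.int64))
print(expanded.shape)  # torch.Size([4, 1, 512])
```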