Updating RESULTS.md; fix in beam_search.py

Date: 2022-04-11 20:56:11 +08:00
commit e8eb0b94d9
parent a92133ef96
3 changed files with 88 additions and 8 deletions
--- a/egs/librispeech/ASR/README.md
+++ b/egs/librispeech/ASR/README.md
@@ -10,12 +10,14 @@ There are various folders containing the name `transducer` in this folder.
 The following table lists the differences among them.

 |                                       | Encoder             | Decoder            | Comment                                           |
-|---------------------------------------|-----------|--------------------|---------------------------------------------------|
+|---------------------------------------|---------------------|--------------------|---------------------------------------------------|
 | `transducer`                          | Conformer           | LSTM               |                                                   |
 | `transducer_stateless`                | Conformer           | Embedding + Conv1d |                                                   |
 | `transducer_lstm`                     | LSTM                | LSTM               |                                                   |
 | `transducer_stateless_multi_datasets` | Conformer           | Embedding + Conv1d | Using data from GigaSpeech as extra training data |
 | `pruned_transducer_stateless`         | Conformer           | Embedding + Conv1d | Using k2 pruned RNN-T loss                        |
+| `pruned_transducer_stateless2`        | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss                        |
+

 The decoder in `transducer_stateless` is modified from the paper
 [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
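The "Embedding + Conv1d" decoder listed for the stateless variants keeps no recurrent state: its output depends only on the last few emitted tokens. The following is a minimal pure-Python sketch of that idea, not the icefall code. It assumes a context of the last 2 tokens, a depthwise convolution (each embedding channel has its own kernel, no bias) and a ReLU; the function name `stateless_decoder` and the toy weights are hypothetical.

```python
from typing import List, Sequence


def stateless_decoder(
    context: Sequence[int],
    embedding: List[List[float]],
    conv_weight: List[List[float]],
) -> List[float]:
    """Hypothetical "Embedding + Conv1d" prediction network (no recurrent state).

    context:     the last `context_size` token IDs, oldest first.
    embedding:   embedding[token] is a vector of length D.
    conv_weight: conv_weight[d][k] weights channel d at context position k
                 (assumed depthwise convolution, no bias).
    """
    context_size = len(conv_weight[0])
    if len(context) != context_size:
        raise ValueError("context must hold exactly context_size token IDs")
    dim = len(embedding[0])
    out: List[float] = []
    for d in range(dim):
        # Channel d only mixes its own embedding values across the context window.
        acc = sum(conv_weight[d][k] * embedding[tok][d] for k, tok in enumerate(context))
        out.append(max(acc, 0.0))  # ReLU
    return out


# Toy setup: vocabulary of 3 tokens, embedding dim 2, context size 2.
embedding = [[0.0, 0.0], [1.0, -1.0], [0.5, 2.0]]
conv_weight = [[1.0, 1.0], [1.0, 1.0]]
print(stateless_decoder([1, 2], embedding, conv_weight))  # [1.5, 1.0]
print(stateless_decoder([1, 1], embedding, conv_weight))  # [2.0, 0.0]
```

Since nothing outside `context` enters the computation, two hypotheses that end in the same `context_size` tokens get identical decoder outputs.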
--- a/egs/librispeech/ASR/RESULTS.md
+++ b/egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,79 @@
 ## Results

+### LibriSpeech BPE training results (Pruned Transducer 2)
+
+This recipe uses a heavily reworked version of the Conformer encoder.
+
+[pruned_transducer_stateless2](./pruned_transducer_stateless2)
+
+The results were obtained using commit `34aad74a2c849542dd5f6359c9e6b527e8782fd6`.
+See <https://github.com/k2-fsa/icefall/pull/288>
+
+The WERs are:
+
+|                                     | test-clean | test-other | comment                                                                         |
+|-------------------------------------|------------|------------|---------------------------------------------------------------------------------|
+| greedy search (max sym per frame 1) | 2.62       | 6.37       | --epoch 25, --avg 8, --max-duration 600                                         |
+| fast beam search                    | 2.61       | 6.17       | --epoch 25, --avg 8, --max-duration 600, --decoding-method fast_beam_search     |
+| modified beam search                | 2.59       | 6.19       | --epoch 25, --avg 8, --max-duration 600, --decoding-method modified_beam_search |
+
+
+The train and decode commands are:
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp --world-size 8 --num-epochs 26 --full-libri 1 --max-duration 300`
+and:
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp --epoch 25 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`
+
+The TensorBoard log is at <https://tensorboard.dev/experiment/UKI6z9BvT6iaUkXPxex1OA>
+
+
+The WERs for the 100-hour LibriSpeech subset (`--full-libri 0`) are:
+
+Trained with one job:
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws1 --world-size 1 --num-epochs 40 --full-libri 0 --max-duration 300`
+and decoded with:
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws1 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The TensorBoard log is at <https://tensorboard.dev/experiment/AhnhooUBRPqTnaggoqo7lg> (the learning rate
+schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment                                                 |
+|-------------------------------------|------------|------------|---------------------------------------------------------|
+| greedy search (max sym per frame 1) | 7.12       | 18.42      | --epoch 19, --avg 8                                     |
+| greedy search (max sym per frame 1) | 6.71       | 17.77      | --epoch 29, --avg 8                                     |
+| fast beam search                    | 6.58       | 17.27      | --epoch 19, --avg 8, --decoding-method fast_beam_search |
+
+Trained with two jobs:
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws2 --world-size 2 --num-epochs 40 --full-libri 0 --max-duration 300`
+and decoded with:
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws2 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The TensorBoard log is at <https://tensorboard.dev/experiment/dvOC9wsrSdWrAIdsebJILg/>
+(the learning rate schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment               |
+|-------------------------------------|------------|------------|-----------------------|
+| greedy search (max sym per frame 1) | 7.05       | 18.77      | --epoch 19, --avg 8   |
+| greedy search (max sym per frame 1) | 6.82       | 18.14      | --epoch 29, --avg 8   |
+| greedy search (max sym per frame 1) | 6.81       | 17.66      | --epoch 30, --avg 10  |
+
+
+Trained with four jobs:
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws4 --world-size 4 --num-epochs 40 --full-libri 0 --max-duration 300`
+and decoded with:
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws4 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+
+The TensorBoard log is at <https://tensorboard.dev/experiment/a3T0TyC0R5aLj5bmFbRErA/>
+(the learning rate schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment               |
+|-------------------------------------|------------|------------|-----------------------|
+| greedy search (max sym per frame 1) | 7.31       | 19.55      | --epoch 19, --avg 8   |
+| greedy search (max sym per frame 1) | 7.08       | 18.59      | --epoch 29, --avg 8   |
+| greedy search (max sym per frame 1) | 6.86       | 18.29      | --epoch 30, --avg 10  |
+
+
+
 ### LibriSpeech BPE training results (Pruned Transducer)

 Conformer encoder + non-recurrent decoder. The decoder
@@ -23,6 +97,10 @@ The WERs are:
 | modified beam search (beam size 4)  | 2.56       | 6.27       | --epoch 42, --avg 11, --max-duration 100 |
 | beam search (beam size 4)           | 2.57       | 6.27       | --epoch 42, --avg 11, --max-duration 100 |

+
+
+
+
 The decoding time for `test-clean` and `test-other` is given below:
 (A V100 GPU with 32 GB RAM is used for decoding. Note: Not all GPU RAM is used during decoding.)

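Every decode command in these results passes `--epoch` and `--avg`. A minimal sketch of what that pair is assumed to mean: average the model parameters of the `--avg` consecutive checkpoints ending at `--epoch`, so `--epoch 25 --avg 8` would use epochs 18 through 25. Parameters are plain dicts of floats rather than PyTorch state dicts, and `epochs_to_average` / `average_params` are hypothetical helpers, not icefall functions.

```python
from typing import Dict, List


def epochs_to_average(epoch: int, avg: int) -> List[int]:
    """Assumed mapping of `--epoch E --avg N` to checkpoint epochs (inclusive, ending at E)."""
    if avg < 1:
        raise ValueError("avg must be at least 1")
    start = epoch - avg + 1
    if start < 0:
        raise ValueError("avg is larger than the number of available epochs")
    return list(range(start, epoch + 1))


def average_params(checkpoints: List[Dict[str, float]]) -> Dict[str, float]:
    """Element-wise mean of each named parameter across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    return {
        name: sum(ckpt[name] for ckpt in checkpoints) / len(checkpoints)
        for name in names
    }


print(epochs_to_average(25, 8))  # [18, 19, 20, 21, 22, 23, 24, 25]
print(average_params([{"w": 1.0}, {"w": 2.0}, {"w": 6.0}]))  # {'w': 3.0}
```

Under the same assumption, the 100-hour rows decoded with `--epoch 19 --avg 8` would average epochs 12 through 19.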
--- a/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py
+++ b/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py
@@ -89,7 +89,7 @@ def fast_beam_search(
        # (shape.NumElements(), 1, joiner_dim)
        # fmt: off
        current_encoder_out = torch.index_select(
-            encoder_out[:, t:t + 1, :], 0, shape.row_ids(1)
+            encoder_out[:, t:t + 1, :], 0, shape.row_ids(1).to(torch.int64)
        )
        # fmt: on
        logits = model.joiner(
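The one-line fix casts `shape.row_ids(1)` to `torch.int64` before it reaches `torch.index_select`. k2 stores ragged row IDs as int32, and the PyTorch version in use at the time presumably required an int64 index here. The gather itself repeats each stream's encoder frame once per element of `shape`, which is what the `(shape.NumElements(), 1, joiner_dim)` comment describes. Below is a list-based sketch of that gather; the helper names are hypothetical, it drops the singleton time axis, and it needs neither k2 nor torch.

```python
from typing import List, Sequence


def row_splits_to_row_ids(row_splits: Sequence[int]) -> List[int]:
    """For a ragged array, map each element to the row (stream) that owns it."""
    row_ids: List[int] = []
    for row in range(len(row_splits) - 1):
        count = row_splits[row + 1] - row_splits[row]
        row_ids.extend([row] * count)
    return row_ids


def index_select_rows(frames: Sequence[List[float]], index: Sequence[int]) -> List[List[float]]:
    """Python analogue of torch.index_select(x, 0, index): gather rows by index."""
    return [list(frames[i]) for i in index]


# Two streams: stream 0 has 3 active elements, stream 1 has 2.
row_splits = [0, 3, 5]
row_ids = row_splits_to_row_ids(row_splits)
print(row_ids)  # [0, 0, 0, 1, 1]

# Encoder output at frame t, one vector per stream.
encoder_frame_t = [[0.1, 0.2], [0.3, 0.4]]
per_hyp = index_select_rows(encoder_frame_t, row_ids)
print(per_hyp)  # [[0.1, 0.2], [0.1, 0.2], [0.1, 0.2], [0.3, 0.4], [0.3, 0.4]]
```

In this picture the dtype cast changes nothing about which rows are selected; it only satisfies the index type check.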