mirror of https://github.com/k2-fsa/icefall.git (synced 2025-08-09 01:52:41 +00:00)

commit e8eb0b94d9 (parent a92133ef96)

Updating RESULTS.md; fix in beam_search.py
@@ -9,13 +9,15 @@ for how to run models in this recipe.

 There are various folders containing the name `transducer` in this folder.
 The following table lists the differences among them.

 |                                       | Encoder             | Decoder            | Comment                                           |
-|---------------------------------------|-----------|--------------------|---------------------------------------------------|
+|---------------------------------------|---------------------|--------------------|---------------------------------------------------|
 | `transducer`                          | Conformer           | LSTM               |                                                   |
 | `transducer_stateless`                | Conformer           | Embedding + Conv1d |                                                   |
 | `transducer_lstm`                     | LSTM                | LSTM               |                                                   |
 | `transducer_stateless_multi_datasets` | Conformer           | Embedding + Conv1d | Using data from GigaSpeech as extra training data |
 | `pruned_transducer_stateless`         | Conformer           | Embedding + Conv1d | Using k2 pruned RNN-T loss                        |
+| `pruned_transducer_stateless2`        | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss                        |

 The decoder in `transducer_stateless` is modified from the paper
 [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
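The `Embedding + Conv1d` decoders in this table are "stateless": instead of an LSTM, the prediction network conditions only on the last few emitted symbols. A minimal PyTorch sketch of the idea (the dimensions, context width, and ReLU are illustrative assumptions, not the recipe's exact implementation):

```python
import torch
import torch.nn as nn


class StatelessDecoder(nn.Module):
    """Prediction network that sees only a fixed, small symbol context."""

    def __init__(self, vocab_size: int, embed_dim: int = 512, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # A Conv1d over the symbol axis replaces the recurrent state.
        self.conv = nn.Conv1d(embed_dim, embed_dim, kernel_size=context_size)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, num_symbols), previously emitted symbol ids.
        emb = self.embedding(y).permute(0, 2, 1)             # (B, E, U)
        # Left-pad so position u sees only symbols u-1 and u (causal).
        emb = nn.functional.pad(emb, (self.conv.kernel_size[0] - 1, 0))
        return torch.relu(self.conv(emb)).permute(0, 2, 1)   # (B, U, E)


# e.g. a 500-token BPE vocabulary, as these recipes use:
decoder = StatelessDecoder(vocab_size=500)
out = decoder(torch.randint(0, 500, (4, 10)))  # -> (4, 10, 512)
```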
@@ -1,5 +1,79 @@

 ## Results

+### LibriSpeech BPE training results (Pruned Transducer 2)
+
+This is with a reworked version of the conformer encoder, with many changes.
+
+[pruned_transducer_stateless2](./pruned_transducer_stateless2)
+
+using commit `34aad74a2c849542dd5f6359c9e6b527e8782fd6`.
+See <https://github.com/k2-fsa/icefall/pull/288>
+
+The WERs are:
+
+|                                     | test-clean | test-other | comment                                                                         |
+|-------------------------------------|------------|------------|---------------------------------------------------------------------------------|
+| greedy search (max sym per frame 1) | 2.62       | 6.37       | --epoch 25, --avg 8, --max-duration 600                                         |
+| fast beam search                    | 2.61       | 6.17       | --epoch 25, --avg 8, --max-duration 600 --decoding-method fast_beam_search      |
+| modified beam search                | 2.59       | 6.19       | --epoch 25, --avg 8, --max-duration 600 --decoding-method modified_beam_search  |
+
+The train and decode commands are:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp --world-size 8 --num-epochs 26 --full-libri 1 --max-duration 300`
+
+and:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp --epoch 25 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/UKI6z9BvT6iaUkXPxex1OA>
+
+The WERs for LibriSpeech 100 hours are:
+
+Trained with one job:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws1 --world-size 1 --num-epochs 40 --full-libri 0 --max-duration 300`
+
+and decoded with:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws1 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/AhnhooUBRPqTnaggoqo7lg> (learning rate
+schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment                                                |
+|-------------------------------------|------------|------------|--------------------------------------------------------|
+| greedy search (max sym per frame 1) | 7.12       | 18.42      | --epoch 19 --avg 8                                     |
+| greedy search (max sym per frame 1) | 6.71       | 17.77      | --epoch 29 --avg 8                                     |
+| fast beam search                    | 6.58       | 17.27      | --epoch 19 --avg 8 --decoding-method fast_beam_search  |
+
+Trained with two jobs:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws2 --world-size 2 --num-epochs 40 --full-libri 0 --max-duration 300`
+
+and decoded with:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws2 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/dvOC9wsrSdWrAIdsebJILg/>
+(learning rate schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment               |
+|-------------------------------------|------------|------------|-----------------------|
+| greedy search (max sym per frame 1) | 7.05       | 18.77      | --epoch 19, --avg 8   |
+| greedy search (max sym per frame 1) | 6.82       | 18.14      | --epoch 29, --avg 8   |
+| greedy search (max sym per frame 1) | 6.81       | 17.66      | --epoch 30, --avg 10  |
+
+Trained with 4 jobs:
+
+`python3 ./pruned_transducer_stateless2/train.py --exp-dir=pruned_transducer_stateless2/exp_100h_ws4 --world-size 4 --num-epochs 40 --full-libri 0 --max-duration 300`
+
+and decoded with:
+
+`python3 ./pruned_transducer_stateless2/decode.py --exp-dir pruned_transducer_stateless2/exp_100h_ws4 --epoch 19 --avg 8 --bpe-model ./data/lang_bpe_500/bpe.model --max-duration 600`.
+
+The Tensorboard log is at <https://tensorboard.dev/experiment/a3T0TyC0R5aLj5bmFbRErA/>
+(learning rate schedule is not visible due to a since-fixed bug).
+
+|                                     | test-clean | test-other | comment               |
+|-------------------------------------|------------|------------|-----------------------|
+| greedy search (max sym per frame 1) | 7.31       | 19.55      | --epoch 19, --avg 8   |
+| greedy search (max sym per frame 1) | 7.08       | 18.59      | --epoch 29, --avg 8   |
+| greedy search (max sym per frame 1) | 6.86       | 18.29      | --epoch 30, --avg 10  |
+
 ### LibriSpeech BPE training results (Pruned Transducer)

 Conformer encoder + non-recurrent decoder. The decoder

@@ -23,6 +97,10 @@ The WERs are:

 | modified beam search (beam size 4)  | 2.56       | 6.27       | --epoch 42, --avg 11, --max-duration 100 |
 | beam search (beam size 4)           | 2.57       | 6.27       | --epoch 42, --avg 11, --max-duration 100 |
+
+
+
+
 The decoding time for `test-clean` and `test-other` is given below:
 (A V100 GPU with 32 GB RAM is used for decoding. Note: Not all GPU RAM is used during decoding.)
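The decode commands above pass `--epoch N --avg M`, which averages the parameters of the last M epoch checkpoints ending at epoch N before decoding. A minimal sketch of that averaging step, assuming icefall's usual `epoch-N.pt` file naming with weights stored under the `"model"` key (the real implementation lives in `icefall/checkpoint.py` and differs in detail):

```python
from typing import Dict, List

import torch


def average_checkpoints(paths: List[str]) -> Dict[str, torch.Tensor]:
    """Uniformly average model parameters across several checkpoints."""
    avg: Dict[str, torch.Tensor] = {}
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        for name, value in state.items():
            # Accumulate in float64 for numerical safety. Integer buffers
            # (e.g. BatchNorm counters) would need special casing in a real
            # implementation; this sketch averages everything.
            avg[name] = avg.get(name, 0) + value.to(torch.float64)
    return {k: (v / len(paths)).to(torch.float32) for k, v in avg.items()}


# --epoch 25 --avg 8 corresponds to averaging epochs 18 through 25:
paths = [f"pruned_transducer_stateless2/exp/epoch-{i}.pt" for i in range(18, 26)]
```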
@@ -89,7 +89,7 @@ def fast_beam_search(

         # (shape.NumElements(), 1, joiner_dim)
         # fmt: off
         current_encoder_out = torch.index_select(
-            encoder_out[:, t:t + 1, :], 0, shape.row_ids(1)
+            encoder_out[:, t:t + 1, :], 0, shape.row_ids(1).to(torch.int64)
        )
         # fmt: on
         logits = model.joiner(
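The beam_search.py fix casts the index to `int64`: k2's `row_ids()` returns an `int32` tensor, while `torch.index_select` expects a `LongTensor` index on many PyTorch versions. A standalone illustration of the pattern (the shapes and row ids are made up for the example):

```python
import torch

encoder_out = torch.randn(4, 1, 512)  # (N, 1, joiner_dim)
row_ids = torch.tensor([0, 0, 1, 3], dtype=torch.int32)  # k2 row ids are int32

# Without the cast, index_select can reject the int32 index;
# converting to int64 matches what the fix above does.
expanded = torch.index_select(encoder_out, 0, row_ids.to(torch.int64))
print(expanded.shape)  # torch.Size([4, 1, 512])
```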