Add commands for reproducing. (#40)

* Add commands for reproducing.

* Use --bucketing-sampler by default.
Fangjun Kuang 2021-09-09 13:50:31 +08:00 committed by GitHub
parent abadc71415
commit 7f8e3a673a
2 changed files with 28 additions and 2 deletions


@@ -21,6 +21,32 @@ To get more unique paths, we scaled the lattice.scores with 0.5 (see https://git
|test-clean|1.3|1.2|
|test-other|1.2|1.1|
You can use the following commands to reproduce our results:
```bash
git clone https://github.com/k2-fsa/icefall
cd icefall
# These results were obtained with commit ef233486; you may not need to check it out
# git checkout ef233486
cd egs/librispeech/ASR
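# Prepare the data (see ./prepare.sh for the exact stages it runs)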
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1,2,3"
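# Train a conformer CTC model on the 4 GPUs selected above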
python conformer_ctc/train.py --bucketing-sampler True \
--concatenate-cuts False \
--max-duration 200 \
--full-libri True \
--world-size 4
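# Decode with attention-decoder rescoring; --epoch and --avg control which checkpoints are averaged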
python conformer_ctc/decode.py --lattice-score-scale 0.5 \
--epoch 34 \
--avg 20 \
--method attention-decoder \
--max-duration 20 \
--num-paths 100
```
### LibriSpeech training results (Tdnn-Lstm)
#### 2021-08-24


@@ -82,14 +82,14 @@ class LibriSpeechAsrDataModule(DataModule):
         group.add_argument(
             "--max-duration",
             type=int,
-            default=500.0,
+            default=200.0,
             help="Maximum pooled recordings duration (seconds) in a "
             "single batch. You can reduce it if it causes CUDA OOM.",
         )
         group.add_argument(
             "--bucketing-sampler",
             type=str2bool,
-            default=False,
+            default=True,
             help="When enabled, the batches will come from buckets of "
             "similar duration (saves padding frames).",
         )
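For reference, the `--bucketing-sampler` option above is parsed with a `str2bool` converter so that values such as `True`/`False` can be passed on the command line (as in the training command earlier in this commit). icefall defines that helper elsewhere; the snippet below is only a minimal sketch of such a converter, not the project's actual implementation:

```python
import argparse


def str2bool(v):
    """Minimal sketch of an argparse 'type' that maps 'True'/'False' strings to bools."""
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if v.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean value, got {v!r}")


# Example mirroring the option added above.
parser = argparse.ArgumentParser()
parser.add_argument("--bucketing-sampler", type=str2bool, default=True)
args = parser.parse_args(["--bucketing-sampler", "False"])
print(args.bucketing_sampler)  # -> False
```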