Minor fixes.

2025-12-11 06:55:27 +00:00 · 2021-12-23 12:35:25 +08:00 · ca0d7c5795
commit ca0d7c5795
parent 9a62a0e7bc
8 changed files with 132 additions and 21 deletions
--- a/.github/workflows/run-pretrained-conformer-ctc.yml
+++ b/.github/workflows/run-pretrained-conformer-ctc.yml
--- a/.github/workflows/run-pretrained-transducer.yml
+++ b/.github/workflows/run-pretrained-transducer.yml
@@ -0,0 +1,109 @@
+# Copyright      2021  Fangjun Kuang (csukuangfj@gmail.com)
+
+# See ../../LICENSE for clarification regarding multiple authors
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+name: run-pre-trained-transducer
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+    types: [labeled]
+
+jobs:
+  run_pre_trained_transducer:
+    if: github.event.label.name == 'ready' || github.event_name == 'push'
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-18.04]
+        python-version: [3.7, 3.8, 3.9]
+        torch: ["1.10.0"]
+        torchaudio: ["0.10.0"]
+        k2-version: ["1.9.dev20211101"]
+
+      fail-fast: false
+
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+
+      - name: Setup Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v1
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install Python dependencies
+        run: |
+          python3 -m pip install --upgrade pip pytest
+          # numpy 1.20.x does not support python 3.6
+          pip install numpy==1.19
+          pip install torch==${{ matrix.torch }}+cpu torchaudio==${{ matrix.torchaudio }}+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
+          pip install k2==${{ matrix.k2-version }}+cpu.torch${{ matrix.torch }} -f https://k2-fsa.org/nightly/
+
+          python3 -m pip install git+https://github.com/lhotse-speech/lhotse
+          python3 -m pip install kaldifeat
+          # We are in ./icefall and there is a file: requirements.txt in it
+          pip install -r requirements.txt
+
+      - name: Install graphviz
+        shell: bash
+        run: |
+          python3 -m pip install -qq graphviz
+          sudo apt-get -qq install graphviz
+
+      - name: Download pre-trained model
+        shell: bash
+        run: |
+          sudo apt-get -qq install git-lfs tree sox
+          cd egs/librispeech/ASR
+          mkdir tmp
+          cd tmp
+          git lfs install
+          git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-bpe-500-2021-12-23
+
+          cd ..
+          tree tmp
+          soxi tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/*.wav
+          ls -lh tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/*.wav
+
+      - name: Run greedy search decoding
+        shell: bash
+        run: |
+          export PYTHONPATH=$PWD:$PYTHONPATH
+          cd egs/librispeech/ASR
+          ./transducer_stateless/pretrained.py \
+            --method greedy_search \
+            --checkpoint ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/exp/pretrained.pt \
+            --bpe-model ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/data/lang_bpe_500/bpe.model \
+            ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/1089-134686-0001.wav \
+            ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/1221-135766-0001.wav \
+            ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/1221-135766-0002.wav
+
+      - name: Run beam search decoding
+        shell: bash
+        run: |
+          export PYTHONPATH=$PWD:$PYTHONPATH
+          cd egs/librispeech/ASR
+          ./transducer_stateless/pretrained.py \
+            --method beam_search \
+            --beam-size 4 \
+            --checkpoint ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/exp/pretrained.pt \
+            --bpe-model ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/data/lang_bpe_500/bpe.model \
+            ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/1089-134686-0001.wav \
+            ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/1221-135766-0001.wav \
+            ./tmp/icefall-asr-librispeech-transducer-bpe-500-2021-12-23/test_wavs/1221-135766-0002.wav
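
Review note: the `strategy.matrix` block in this workflow varies only `python-version`, so it should expand to three jobs, each installing the same pinned CPU wheels. The sketch below reproduces that expansion and the requirement strings from the install step. It is an illustration in plain Python, not how GitHub Actions is implemented; the helper names `expand_matrix` and `pip_requirements` are hypothetical, and the Python versions are written as strings here for simplicity.

```python
from itertools import product

# Values copied from the `strategy.matrix` block of the workflow above.
matrix = {
    "os": ["ubuntu-18.04"],
    "python-version": ["3.7", "3.8", "3.9"],
    "torch": ["1.10.0"],
    "torchaudio": ["0.10.0"],
    "k2-version": ["1.9.dev20211101"],
}


def expand_matrix(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def pip_requirements(job):
    """Build the pinned requirement strings used by the install step."""
    return [
        f"torch=={job['torch']}+cpu",
        f"torchaudio=={job['torchaudio']}+cpu",
        f"k2=={job['k2-version']}+cpu.torch{job['torch']}",
    ]


jobs = expand_matrix(matrix)
for job in jobs:
    print(job["python-version"], pip_requirements(job))
```

Adding a second value to any other axis (for example another `torch` version) would double the job count, which is worth keeping in mind given that each job clones the pre-trained model.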
--- a/README.md
+++ b/README.md
@@ -71,7 +71,7 @@ The best WER with greedy search is:

 |     | test-clean | test-other |
 |-----|------------|------------|
-| WER | 3.16       | 7.71       |
+| WER | 3.07       | 7.51       |

 We provide a Colab notebook to run a pre-trained RNN-T conformer model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)

--- a/egs/librispeech/ASR/RESULTS.md
+++ b/egs/librispeech/ASR/RESULTS.md
@@ -2,7 +2,10 @@

 ### LibriSpeech BPE training results (Transducer)

-#### 2021-12-22
+#### Conformer encoder + embedding decoder
+
+Using commit `fb6a57e9e01dd8aae2af2a6b4568daad8bc8ab32`.
+
 Conformer encoder + non-recurrent decoder. The decoder
 contains only an embedding layer and a Conv1d (with kernel size 2).

@@ -60,8 +63,8 @@ avg=10
 ```


-#### 2021-12-17
-Using commit `cb04c8a7509425ab45fae888b0ca71bbbd23f0de`.
+#### Conformer encoder + LSTM decoder
+Using commit `TODO`.

 Conformer encoder + LSTM decoder.

@@ -69,9 +72,9 @@ The best WER is

 |     | test-clean | test-other |
 |-----|------------|------------|
-| WER | 3.16       | 7.71       |
+| WER | 3.07       | 7.51       |

-using `--epoch 26 --avg 12` with **greedy search**.
+using `--epoch 34 --avg 11` with **greedy search**.

 The training command to reproduce the above WER is:

@@ -80,19 +83,19 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3"

 ./transducer/train.py \
  --world-size 4 \
-  --num-epochs 30 \
+  --num-epochs 35 \
  --start-epoch 0 \
  --exp-dir transducer/exp-lr-2.5-full \
  --full-libri 1 \
-  --max-duration 250 \
+  --max-duration 180 \
  --lr-factor 2.5
 ```

 The decoding command is:

 ```
-epoch=26
-avg=12
+epoch=34
+avg=11

 ./transducer/decode.py \
  --epoch $epoch \
@@ -102,7 +105,7 @@ avg=12
  --max-duration 100
 ```

-You can find the tensorboard log at: <https://tensorboard.dev/experiment/PYIbeD6zRJez1ViXaRqqeg/>
+You can find the tensorboard log at: <https://tensorboard.dev/experiment/D7NQc3xqTpyVmWi5FnWjrA>


 ### LibriSpeech BPE training results (Conformer-CTC)
--- a/egs/librispeech/ASR/transducer/decode.py
+++ b/egs/librispeech/ASR/transducer/decode.py
@@ -70,14 +70,14 @@ def get_parser():
    parser.add_argument(
        "--epoch",
        type=int,
-        default=26,
+        default=34,
        help="It specifies the checkpoint to use for decoding. "
        "Note: Epoch counts from 0.",
    )
    parser.add_argument(
        "--avg",
        type=int,
-        default=12,
+        default=11,
        help="Number of checkpoints to average. Automatically select "
        "consecutive checkpoints before the checkpoint specified by "
        "'--epoch'. ",
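
Review note: with the new defaults `--epoch 34 --avg 11`, the help text implies that a run of consecutive checkpoints ending at the one named by `--epoch` is averaged. The sketch below shows which files that would select. It rests on two assumptions not confirmed by this diff: the window includes the `--epoch` checkpoint itself, and checkpoints are named `epoch-<n>.pt` inside the experiment directory. The function `checkpoints_to_average` is hypothetical.

```python
def checkpoints_to_average(epoch, avg, exp_dir="transducer/exp"):
    """List checkpoint paths for an averaging window that ends at `epoch`.

    Assumptions (not confirmed by the diff): the window is inclusive of
    `epoch`, spans `avg` consecutive epochs, and files are named
    `epoch-<n>.pt`.
    """
    if avg < 1:
        raise ValueError("avg must be at least 1")
    start = epoch - avg + 1
    if start < 0:
        # Epochs count from 0, so a window cannot start before epoch 0.
        raise ValueError(f"avg={avg} needs more checkpoints than epoch={epoch} provides")
    return [f"{exp_dir}/epoch-{i}.pt" for i in range(start, epoch + 1)]


selected = checkpoints_to_average(epoch=34, avg=11)
print(selected[0], "...", selected[-1], f"({len(selected)} files)")
```

Under these assumptions the new defaults cover epochs 24 through 34, which is consistent with raising `--num-epochs` to 35 in `train.py` (epochs count from 0, so the last checkpoint is epoch 34).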
--- a/egs/librispeech/ASR/transducer/export.py
+++ b/egs/librispeech/ASR/transducer/export.py
@@ -23,8 +23,8 @@ Usage:
 ./transducer/export.py \
  --exp-dir ./transducer/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
-  --epoch 26 \
-  --avg 12
+  --epoch 34 \
+  --avg 11

 It will generate a file exp_dir/pretrained.pt

@@ -66,7 +66,7 @@ def get_parser():
    parser.add_argument(
        "--epoch",
        type=int,
-        default=26,
+        default=34,
        help="It specifies the checkpoint to use for decoding. "
        "Note: Epoch counts from 0.",
    )
@@ -74,7 +74,7 @@ def get_parser():
    parser.add_argument(
        "--avg",
        type=int,
-        default=12,
+        default=11,
        help="Number of checkpoints to average. Automatically select "
        "consecutive checkpoints before the checkpoint specified by "
        "'--epoch'. ",
--- a/egs/librispeech/ASR/transducer/joiner.py
+++ b/egs/librispeech/ASR/transducer/joiner.py
@@ -16,7 +16,6 @@

 import torch
 import torch.nn as nn
-import torch.nn.functional as F


 class Joiner(nn.Module):
@@ -48,7 +47,7 @@ class Joiner(nn.Module):
        # Now decoder_out is (N, 1, U, C)

        logit = encoder_out + decoder_out
-        logit = F.tanh(logit)
+        logit = torch.tanh(logit)

        output = self.output_linear(logit)

--- a/egs/librispeech/ASR/transducer/train.py
+++ b/egs/librispeech/ASR/transducer/train.py
@@ -23,7 +23,7 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3"

 ./transducer/train.py \
  --world-size 4 \
-  --num-epochs 30 \
+  --num-epochs 35 \
  --start-epoch 0 \
  --exp-dir transducer/exp \
  --full-libri 1 \
@@ -92,7 +92,7 @@ def get_parser():
    parser.add_argument(
        "--num-epochs",
        type=int,
-        default=30,
+        default=35,
        help="Number of epochs to train.",
    )