Merge branch 'k2-fsa:master' into master

commit f3fd2792ae
Mingshuang Luo, 2021-10-14 16:40:19 +08:00, committed by GitHub
2 changed files with 196 additions and 102 deletions


@@ -429,6 +429,7 @@ After downloading, you will have the following files:
 |-- README.md
 |-- data
 | |-- lang_bpe
+| | |-- Linv.pt
 | | |-- HLG.pt
 | | |-- bpe.model
 | | |-- tokens.txt
@@ -446,6 +447,9 @@ After downloading, you will have the following files:
 6 directories, 11 files

 **File descriptions**:

+- ``data/lang_bpe/Linv.pt``
+
+  It is the lexicon file, with word IDs as labels and token IDs as aux_labels.

 - ``data/lang_bpe/HLG.pt``
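Such a pre-compiled lexicon can be inspected directly with k2. Below is a minimal sketch, assuming ``Linv.pt`` was written with ``torch.save(L_inv.as_dict(), ...)``, which is how icefall's ``Lexicon`` class reads it back; the path is illustrative:

.. code-block:: python

  import k2
  import torch

  # Load the inverted lexicon FST. As described above, its labels are
  # word IDs and its aux_labels are token (word-piece) IDs.
  L_inv = k2.Fsa.from_dict(
      torch.load("data/lang_bpe/Linv.pt", map_location="cpu")
  )
  print(L_inv.num_arcs)  # quick sanity check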
@@ -527,12 +531,58 @@ Usage
 displays the help information.

-It supports three decoding methods:
+It supports 4 decoding methods:

+- CTC decoding
 - HLG decoding
 - HLG + n-gram LM rescoring
 - HLG + n-gram LM rescoring + attention decoder rescoring

+CTC decoding
+^^^^^^^^^^^^
+
+CTC decoding uses the best path of the decoding lattice as the decoding result
+without any LM or lexicon.
+
+The command to run CTC decoding is:
+
+.. code-block:: bash
+
+  $ cd egs/librispeech/ASR
+  $ ./conformer_ctc/pretrained.py \
+    --checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt \
+    --lang-dir ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe \
+    --method ctc-decoding \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac \
+    ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac
+
+The output is given below:
+
+.. code-block::
+
+  2021-10-13 11:21:50,896 INFO [pretrained.py:236] device: cuda:0
+  2021-10-13 11:21:50,896 INFO [pretrained.py:238] Creating model
+  2021-10-13 11:21:56,669 INFO [pretrained.py:255] Constructing Fbank computer
+  2021-10-13 11:21:56,670 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
+  2021-10-13 11:21:56,683 INFO [pretrained.py:271] Decoding started
+  2021-10-13 11:21:57,341 INFO [pretrained.py:290] Building CTC topology
+  2021-10-13 11:21:57,625 INFO [lexicon.py:113] Loading pre-compiled tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/Linv.pt
+  2021-10-13 11:21:57,679 INFO [pretrained.py:299] Loading BPE model
+  2021-10-13 11:22:00,076 INFO [pretrained.py:314] Use CTC decoding
+  2021-10-13 11:22:00,087 INFO [pretrained.py:400]
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac:
+  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac:
+  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONOURED
+  BOSOM TO CONNECT HER PARENT FOR EVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
+
+  ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac:
+  YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION
+
+  2021-10-13 11:22:00,087 INFO [pretrained.py:402] Decoding Done
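The core of ``ctc-decoding`` can be reproduced with a handful of k2 calls. The following self-contained sketch uses a toy vocabulary and random posteriors; the beam and state-count values are illustrative, not necessarily the script's defaults:

.. code-block:: python

  import k2
  import torch

  # CTC topology for a toy vocabulary of 3 word-piece tokens plus blank (0).
  H = k2.ctc_topo(max_token=3, modified=False)

  # Fake log-posteriors for one 5-frame utterance: shape (N=1, T=5, C=4).
  nnet_output = torch.randn(1, 5, 4).log_softmax(dim=-1)
  # One row per utterance: (utterance index, start frame, number of frames).
  supervision_segments = torch.tensor([[0, 0, 5]], dtype=torch.int32)

  dense_fsa_vec = k2.DenseFsaVec(nnet_output, supervision_segments)
  lattice = k2.intersect_dense_pruned(
      H,
      dense_fsa_vec,
      search_beam=20.0,
      output_beam=8.0,
      min_active_states=30,
      max_active_states=10000,
  )
  best_path = k2.shortest_path(lattice, use_double_scores=True)
  # best_path.aux_labels holds the recognized token IDs; pretrained.py
  # passes them through the BPE model to obtain words.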
 HLG decoding
 ^^^^^^^^^^^^
@@ -545,8 +595,7 @@ The command to run HLG decoding is:
   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/pretrained.py \
     --checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt \
-    --words-file ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/words.txt \
-    --HLG ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt \
+    --lang-dir ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe \
     ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac \
     ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac \
     ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac
@@ -555,14 +604,14 @@ The output is given below:
 .. code-block::

-  2021-08-20 11:03:05,712 INFO [pretrained.py:217] device: cuda:0
-  2021-08-20 11:03:05,712 INFO [pretrained.py:219] Creating model
-  2021-08-20 11:03:11,345 INFO [pretrained.py:238] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
-  2021-08-20 11:03:18,442 INFO [pretrained.py:255] Constructing Fbank computer
-  2021-08-20 11:03:18,444 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
-  2021-08-20 11:03:18,507 INFO [pretrained.py:271] Decoding started
-  2021-08-20 11:03:18,795 INFO [pretrained.py:300] Use HLG decoding
-  2021-08-20 11:03:19,149 INFO [pretrained.py:339]
+  2021-10-13 11:25:19,458 INFO [pretrained.py:236] device: cuda:0
+  2021-10-13 11:25:19,458 INFO [pretrained.py:238] Creating model
+  2021-10-13 11:25:25,342 INFO [pretrained.py:255] Constructing Fbank computer
+  2021-10-13 11:25:25,343 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
+  2021-10-13 11:25:25,356 INFO [pretrained.py:271] Decoding started
+  2021-10-13 11:25:26,026 INFO [pretrained.py:327] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
+  2021-10-13 11:25:33,735 INFO [pretrained.py:359] Use HLG decoding
+  2021-10-13 11:25:34,013 INFO [pretrained.py:400]
   ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac:
   AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
@@ -573,7 +622,7 @@ The output is given below:
   ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac:
   YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION

-  2021-08-20 11:03:19,149 INFO [pretrained.py:341] Decoding Done
+  2021-10-13 11:25:34,014 INFO [pretrained.py:402] Decoding Done

 HLG decoding + LM rescoring
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -588,8 +637,7 @@ The command to run HLG decoding + LM rescoring is:
   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/pretrained.py \
     --checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt \
-    --words-file ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/words.txt \
-    --HLG ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt \
+    --lang-dir ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe \
     --method whole-lattice-rescoring \
     --G ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt \
     --ngram-lm-scale 0.8 \
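Before rescoring, ``pretrained.py`` prepares the 4-gram LM for composition with the decoding lattice. This sketch mirrors the same steps that appear in the code changes below:

.. code-block:: python

  import k2
  import torch

  G = k2.Fsa.from_dict(
      torch.load(
          "./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt",
          map_location="cpu",
      )
  )
  # Epsilon self-loops are required before composing G with the
  # whole decoding lattice.
  G = k2.add_epsilon_self_loops(G)
  G = k2.arc_sort(G)
  # Keep a copy of the LM scores so that they can be scaled by
  # --ngram-lm-scale during rescoring.
  G.lm_scores = G.scores.clone()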
@@ -601,15 +649,15 @@ Its output is:
 .. code-block::

-  2021-08-20 11:12:17,565 INFO [pretrained.py:217] device: cuda:0
-  2021-08-20 11:12:17,565 INFO [pretrained.py:219] Creating model
-  2021-08-20 11:12:23,728 INFO [pretrained.py:238] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
-  2021-08-20 11:12:30,035 INFO [pretrained.py:246] Loading G from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt
-  2021-08-20 11:13:10,779 INFO [pretrained.py:255] Constructing Fbank computer
-  2021-08-20 11:13:10,787 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
-  2021-08-20 11:13:10,798 INFO [pretrained.py:271] Decoding started
-  2021-08-20 11:13:11,085 INFO [pretrained.py:305] Use HLG decoding + LM rescoring
-  2021-08-20 11:13:11,736 INFO [pretrained.py:339]
+  2021-10-13 11:28:19,129 INFO [pretrained.py:236] device: cuda:0
+  2021-10-13 11:28:19,129 INFO [pretrained.py:238] Creating model
+  2021-10-13 11:28:23,531 INFO [pretrained.py:255] Constructing Fbank computer
+  2021-10-13 11:28:23,532 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
+  2021-10-13 11:28:23,544 INFO [pretrained.py:271] Decoding started
+  2021-10-13 11:28:24,141 INFO [pretrained.py:327] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
+  2021-10-13 11:28:30,752 INFO [pretrained.py:338] Loading G from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt
+  2021-10-13 11:28:48,308 INFO [pretrained.py:364] Use HLG decoding + LM rescoring
+  2021-10-13 11:28:48,815 INFO [pretrained.py:400]
   ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac:
   AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
@@ -620,7 +668,7 @@ Its output is:
   ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac:
   YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION

-  2021-08-20 11:13:11,737 INFO [pretrained.py:341] Decoding Done
+  2021-10-13 11:28:48,815 INFO [pretrained.py:402] Decoding Done

 HLG decoding + LM rescoring + attention decoder rescoring
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -636,8 +684,7 @@ The command to run HLG decoding + LM rescoring + attention decoder rescoring is:
   $ cd egs/librispeech/ASR
   $ ./conformer_ctc/pretrained.py \
     --checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt \
-    --words-file ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/words.txt \
-    --HLG ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt \
+    --lang-dir ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe \
     --method attention-decoder \
     --G ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt \
     --ngram-lm-scale 1.3 \
@@ -654,15 +701,15 @@ The output is below:
 .. code-block::

-  2021-08-20 11:19:11,397 INFO [pretrained.py:217] device: cuda:0
-  2021-08-20 11:19:11,397 INFO [pretrained.py:219] Creating model
-  2021-08-20 11:19:17,354 INFO [pretrained.py:238] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
-  2021-08-20 11:19:24,615 INFO [pretrained.py:246] Loading G from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt
-  2021-08-20 11:20:04,576 INFO [pretrained.py:255] Constructing Fbank computer
-  2021-08-20 11:20:04,584 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
-  2021-08-20 11:20:04,595 INFO [pretrained.py:271] Decoding started
-  2021-08-20 11:20:04,854 INFO [pretrained.py:313] Use HLG + LM rescoring + attention decoder rescoring
-  2021-08-20 11:20:05,805 INFO [pretrained.py:339]
+  2021-10-13 11:29:50,106 INFO [pretrained.py:236] device: cuda:0
+  2021-10-13 11:29:50,106 INFO [pretrained.py:238] Creating model
+  2021-10-13 11:29:56,063 INFO [pretrained.py:255] Constructing Fbank computer
+  2021-10-13 11:29:56,063 INFO [pretrained.py:265] Reading sound files: ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']
+  2021-10-13 11:29:56,077 INFO [pretrained.py:271] Decoding started
+  2021-10-13 11:29:56,770 INFO [pretrained.py:327] Loading HLG from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt
+  2021-10-13 11:30:04,023 INFO [pretrained.py:338] Loading G from ./tmp/icefall_asr_librispeech_conformer_ctc/data/lm/G_4_gram.pt
+  2021-10-13 11:30:18,163 INFO [pretrained.py:372] Use HLG + LM rescoring + attention decoder rescoring
+  2021-10-13 11:30:19,367 INFO [pretrained.py:400]
   ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac:
   AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
@@ -673,7 +720,7 @@ The output is below:
   ./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac:
   YET THESE THOUGHTS AFFECTED HESTER PRYNNE LESS WITH HOPE THAN APPREHENSION

-  2021-08-20 11:20:05,805 INFO [pretrained.py:341] Decoding Done
+  2021-10-13 11:30:19,367 INFO [pretrained.py:402] Decoding Done

 Colab notebook
 --------------

egs/librispeech/ASR/conformer_ctc/pretrained.py

@@ -1,5 +1,6 @@
 #!/usr/bin/env python3
-# Copyright 2021 Xiaomi Corp. (authors: Fangjun Kuang)
+# Copyright 2021 Xiaomi Corp. (authors: Fangjun Kuang,
+#                                        Mingshuang Luo)
 #
 # See ../../../../LICENSE for clarification regarding multiple authors
 #
@@ -23,6 +24,7 @@ from typing import List

 import k2
 import kaldifeat
+import sentencepiece as spm
 import torch
 import torchaudio
 from conformer import Conformer
@@ -34,6 +36,7 @@ from icefall.decode import (
     rescore_with_attention_decoder,
     rescore_with_whole_lattice,
 )
+from icefall.lexicon import Lexicon
 from icefall.utils import AttributeDict, get_texts
@@ -52,14 +55,10 @@ def get_parser():
     )

     parser.add_argument(
-        "--words-file",
+        "--lang-dir",
         type=str,
         required=True,
-        help="Path to words.txt",
-    )
-
-    parser.add_argument(
-        "--HLG", type=str, required=True, help="Path to HLG.pt."
+        help="Path to lang dir.",
     )

     parser.add_argument(
@@ -68,6 +67,10 @@ def get_parser():
         default="1best",
         help="""Decoding method.
         Possible values are:
+        (0) ctc-decoding - Use CTC decoding. It uses a sentence
+            piece model, i.e., lang_dir/bpe.model, to convert
+            word pieces to words. It needs neither a lexicon
+            nor an n-gram LM.
         (1) 1best - Use the best path as decoding output. Only
             the transformer encoder output is used for decoding.
             We call it HLG decoding.
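The word-piece-to-word conversion mentioned in the new help text needs only the sentence piece model. A minimal sketch, where the token IDs are placeholders:

.. code-block:: python

  import sentencepiece as spm

  bpe_model = spm.SentencePieceProcessor()
  bpe_model.load("data/lang_bpe/bpe.model")

  # decode() accepts a list of token-ID lists and returns
  # one string per utterance.
  hyps = bpe_model.decode([[57, 8, 423]])
  print(hyps)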
@@ -139,7 +142,7 @@ def get_parser():
     parser.add_argument(
         "--sos-id",
-        type=float,
+        type=int,
         default=1,
         help="""
         Used only when method is attention-decoder.
@@ -149,7 +152,7 @@ def get_parser():
     parser.add_argument(
         "--eos-id",
-        type=float,
+        type=int,
         default=1,
         help="""
         Used only when method is attention-decoder.
@@ -249,23 +252,6 @@ def main():
     model.to(device)
     model.eval()

-    logging.info(f"Loading HLG from {params.HLG}")
-    HLG = k2.Fsa.from_dict(torch.load(params.HLG, map_location="cpu"))
-    HLG = HLG.to(device)
-    if not hasattr(HLG, "lm_scores"):
-        # For whole-lattice-rescoring and attention-decoder
-        HLG.lm_scores = HLG.scores.clone()
-
-    if params.method in ["whole-lattice-rescoring", "attention-decoder"]:
-        logging.info(f"Loading G from {params.G}")
-        G = k2.Fsa.from_dict(torch.load(params.G, map_location="cpu"))
-        # Add epsilon self-loops to G as we will compose
-        # it with the whole lattice later
-        G = G.to(device)
-        G = k2.add_epsilon_self_loops(G)
-        G = k2.arc_sort(G)
-        G.lm_scores = G.scores.clone()
-
     logging.info("Constructing Fbank computer")
     opts = kaldifeat.FbankOptions()
     opts.device = device
@@ -299,52 +285,113 @@ def main():
         dtype=torch.int32,
     )

-    lattice = get_lattice(
-        nnet_output=nnet_output,
-        decoding_graph=HLG,
-        supervision_segments=supervision_segments,
-        search_beam=params.search_beam,
-        output_beam=params.output_beam,
-        min_active_states=params.min_active_states,
-        max_active_states=params.max_active_states,
-        subsampling_factor=params.subsampling_factor,
-    )
-
-    if params.method == "1best":
-        logging.info("Use HLG decoding")
-        best_path = one_best_decoding(
-            lattice=lattice, use_double_scores=params.use_double_scores
-        )
-    elif params.method == "whole-lattice-rescoring":
-        logging.info("Use HLG decoding + LM rescoring")
-        best_path_dict = rescore_with_whole_lattice(
-            lattice=lattice,
-            G_with_epsilon_loops=G,
-            lm_scale_list=[params.ngram_lm_scale],
-        )
-        best_path = next(iter(best_path_dict.values()))
-    elif params.method == "attention-decoder":
-        logging.info("Use HLG + LM rescoring + attention decoder rescoring")
-        rescored_lattice = rescore_with_whole_lattice(
-            lattice=lattice, G_with_epsilon_loops=G, lm_scale_list=None
-        )
-        best_path_dict = rescore_with_attention_decoder(
-            lattice=rescored_lattice,
-            num_paths=params.num_paths,
-            model=model,
-            memory=memory,
-            memory_key_padding_mask=memory_key_padding_mask,
-            sos_id=params.sos_id,
-            eos_id=params.eos_id,
-            nbest_scale=params.nbest_scale,
-            ngram_lm_scale=params.ngram_lm_scale,
-            attention_scale=params.attention_decoder_scale,
-        )
-        best_path = next(iter(best_path_dict.values()))
-
-    hyps = get_texts(best_path)
-    word_sym_table = k2.SymbolTable.from_file(params.words_file)
-    hyps = [[word_sym_table[i] for i in ids] for ids in hyps]
+    if params.method == "ctc-decoding":
+        logging.info("Use CTC decoding")
+        lexicon = Lexicon(params.lang_dir)
+        max_token_id = max(lexicon.tokens)
+        H = k2.ctc_topo(
+            max_token=max_token_id,
+            modified=False,
+            device=device,
+        )
+        bpe_model = spm.SentencePieceProcessor()
+        bpe_model.load(params.lang_dir + "/bpe.model")
+
+        lattice = get_lattice(
+            nnet_output=nnet_output,
+            decoding_graph=H,
+            supervision_segments=supervision_segments,
+            search_beam=params.search_beam,
+            output_beam=params.output_beam,
+            min_active_states=params.min_active_states,
+            max_active_states=params.max_active_states,
+            subsampling_factor=params.subsampling_factor,
+        )
+
+        best_path = one_best_decoding(
+            lattice=lattice, use_double_scores=params.use_double_scores
+        )
+        token_ids = get_texts(best_path)
+        hyps = bpe_model.decode(token_ids)
+        hyps = [s.split() for s in hyps]
+    elif params.method in [
+        "1best",
+        "whole-lattice-rescoring",
+        "attention-decoder",
+    ]:
+        logging.info(f"Loading HLG from {params.lang_dir}/HLG.pt")
+        HLG = k2.Fsa.from_dict(
+            torch.load(params.lang_dir + "/HLG.pt", map_location="cpu")
+        )
+        HLG = HLG.to(device)
+        if not hasattr(HLG, "lm_scores"):
+            # For whole-lattice-rescoring and attention-decoder
+            HLG.lm_scores = HLG.scores.clone()
+
+        if params.method in [
+            "whole-lattice-rescoring",
+            "attention-decoder",
+        ]:
+            logging.info(f"Loading G from {params.G}")
+            G = k2.Fsa.from_dict(torch.load(params.G, map_location="cpu"))
+            # Add epsilon self-loops to G as we will compose
+            # it with the whole lattice later
+            G = G.to(device)
+            G = k2.add_epsilon_self_loops(G)
+            G = k2.arc_sort(G)
+            G.lm_scores = G.scores.clone()
+
+        lattice = get_lattice(
+            nnet_output=nnet_output,
+            decoding_graph=HLG,
+            supervision_segments=supervision_segments,
+            search_beam=params.search_beam,
+            output_beam=params.output_beam,
+            min_active_states=params.min_active_states,
+            max_active_states=params.max_active_states,
+            subsampling_factor=params.subsampling_factor,
+        )
+
+        if params.method == "1best":
+            logging.info("Use HLG decoding")
+            best_path = one_best_decoding(
+                lattice=lattice, use_double_scores=params.use_double_scores
+            )
+        elif params.method == "whole-lattice-rescoring":
+            logging.info("Use HLG decoding + LM rescoring")
+            best_path_dict = rescore_with_whole_lattice(
+                lattice=lattice,
+                G_with_epsilon_loops=G,
+                lm_scale_list=[params.ngram_lm_scale],
+            )
+            best_path = next(iter(best_path_dict.values()))
+        elif params.method == "attention-decoder":
+            logging.info("Use HLG + LM rescoring + attention decoder rescoring")
+            rescored_lattice = rescore_with_whole_lattice(
+                lattice=lattice, G_with_epsilon_loops=G, lm_scale_list=None
+            )
+            best_path_dict = rescore_with_attention_decoder(
+                lattice=rescored_lattice,
+                num_paths=params.num_paths,
+                model=model,
+                memory=memory,
+                memory_key_padding_mask=memory_key_padding_mask,
+                sos_id=params.sos_id,
+                eos_id=params.eos_id,
+                nbest_scale=params.nbest_scale,
+                ngram_lm_scale=params.ngram_lm_scale,
+                attention_scale=params.attention_decoder_scale,
+            )
+            best_path = next(iter(best_path_dict.values()))
+
+        hyps = get_texts(best_path)
+        word_sym_table = k2.SymbolTable.from_file(
+            params.lang_dir + "/words.txt"
+        )
+        hyps = [[word_sym_table[i] for i in ids] for ids in hyps]
+    else:
+        raise ValueError(f"Unsupported decoding method: {params.method}")

     s = "\n"
     for filename, hyp in zip(params.sound_files, hyps):