From 0a244463c39c16252810b426a0e5e153451b7ef0 Mon Sep 17 00:00:00 2001
From: Fangjun Kuang
Date: Tue, 30 Jan 2024 19:29:33 +0800
Subject: [PATCH] WIP: Add doc about FST-based CTC forced alignment.

---
 docs/source/conf.py                           |  2 +
 .../fst-based-forced-alignment/index.rst      | 56 +++++++++++++++++++
 .../fst-based-forced-alignment/k2-based.rst   |  4 ++
 .../kaldi-based.rst                           | 31 ++++++++++
 docs/source/index.rst                         |  4 +-
 5 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 docs/source/fst-based-forced-alignment/index.rst
 create mode 100644 docs/source/fst-based-forced-alignment/k2-based.rst
 create mode 100644 docs/source/fst-based-forced-alignment/kaldi-based.rst

diff --git a/docs/source/conf.py b/docs/source/conf.py
index 5a534e126..ded6977ac 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -98,4 +98,6 @@ rst_epilog = """
 .. _Next-gen Kaldi: https://github.com/k2-fsa
 .. _Kaldi: https://github.com/kaldi-asr/kaldi
 .. _lilcom: https://github.com/danpovey/lilcom
+.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
+.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
 """
diff --git a/docs/source/fst-based-forced-alignment/index.rst b/docs/source/fst-based-forced-alignment/index.rst
new file mode 100644
index 000000000..a05dc5813
--- /dev/null
+++ b/docs/source/fst-based-forced-alignment/index.rst
@@ -0,0 +1,56 @@
+FST-based forced alignment
+==========================
+
+This section describes how to perform **FST-based** ``forced alignment`` with models
+trained by the `CTC`_ loss.
+
+We use `CTC FORCED ALIGNMENT API TUTORIAL `_
+from `torchaudio`_ as a reference in this section. The difference is that we are using an ``FST``-based approach.
+
+Two approaches for FST-based forced alignment will be described:
+
+  - `Kaldi`_-based
+  - `k2`_-based
+
+Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all.
+That is, you don't need to install `Kaldi`_ in order to use it. Instead,
+we will use `kaldi-decoder`_, which has ported the C++ decoding code from `Kaldi`_
+without depending on it.
+
+
+Differences between the two approaches
+--------------------------------------
+
+The following table compares the two approaches.
+
+.. list-table::
+
+   * - Features
+     - `Kaldi`_-based
+     - `k2`_-based
+   * - Support CUDA
+     - No
+     - Yes
+   * - Support CPU
+     - Yes
+     - Yes
+   * - Support batch processing
+     - No
+     - Yes on CUDA; No on CPU
+   * - Support streaming models
+     - Yes
+     - No
+   * - Support C++ APIs
+     - Yes
+     - Yes
+   * - Support Python APIs
+     - Yes
+     - Yes
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   kaldi-based
+   k2-based
diff --git a/docs/source/fst-based-forced-alignment/k2-based.rst b/docs/source/fst-based-forced-alignment/k2-based.rst
new file mode 100644
index 000000000..373e49f3e
--- /dev/null
+++ b/docs/source/fst-based-forced-alignment/k2-based.rst
@@ -0,0 +1,4 @@
+k2-based forced alignment
+=========================
+
+TODO(fangjun)
diff --git a/docs/source/fst-based-forced-alignment/kaldi-based.rst b/docs/source/fst-based-forced-alignment/kaldi-based.rst
new file mode 100644
index 000000000..1e66afcfa
--- /dev/null
+++ b/docs/source/fst-based-forced-alignment/kaldi-based.rst
@@ -0,0 +1,31 @@
+Kaldi-based forced alignment
+============================
+
+This section describes in detail how to use `kaldi-decoder`_
+for **FST-based** ``forced alignment`` with models trained by the `CTC`_ loss.
+
+We will use the test data
+from `CTC FORCED ALIGNMENT API TUTORIAL `_
+
+Prepare the environment
+-----------------------
+
+Before you continue, make sure you have set up `icefall`_ by following :ref:`install icefall`.
+
+.. hint::
+
+   You don't need to install `Kaldi`_. We will ``NOT`` use `Kaldi`_ below.
+
+Get the test data
+-----------------
+
+Compute log_probs
+-----------------
+
+Convert transcript to an FST graph
+----------------------------------
+
+Force aligner
+-------------
+
+
diff --git a/docs/source/index.rst b/docs/source/index.rst
index fb539d3f2..d46a4038f 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -25,7 +25,7 @@ speech recognition recipes using `k2 `_.
    docker/index
    faqs
    model-export/index
-
+   fst-based-forced-alignment/index
 
 .. toctree::
    :maxdepth: 3
@@ -40,5 +40,5 @@ speech recognition recipes using `k2 `_.
 
 .. toctree::
    :maxdepth: 2
-
+
    decoding-with-langugage-models/index