mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-26 18:24:18 +00:00
WIP: Add doc about FST-based CTC forced alignment.
This commit is contained in:
parent
37b975cac9
commit
0a244463c3
@@ -98,4 +98,6 @@ rst_epilog = """
.. _Next-gen Kaldi: https://github.com/k2-fsa
.. _Kaldi: https://github.com/kaldi-asr/kaldi
.. _lilcom: https://github.com/danpovey/lilcom
.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
"""
56
docs/source/fst-based-forced-alignment/index.rst
Normal file
@@ -0,0 +1,56 @@
FST-based forced alignment
==========================

This section describes how to perform **FST-based** ``forced alignment`` with models
trained with the `CTC`_ loss.

We use the `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
from `torchaudio`_ as a reference in this section. The difference is that we use an ``FST``-based approach.

Two approaches for FST-based forced alignment will be described:

- `Kaldi`_-based
- `k2`_-based

Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all.
That is, you don't need to install `Kaldi`_ to use it. Instead,
we use `kaldi-decoder`_, which ports the C++ decoding code from `Kaldi`_
without depending on it.
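For background, a CTC model emits one symbol per frame (one of which is a blank), and forced alignment searches for the frame-level symbol sequence that collapses to the given transcript. A minimal, dependency-free sketch of the collapsing rule (illustrative only; the function name is made up and this is not icefall code):

```python
def ctc_collapse(frame_symbols, blank="-"):
    """Collapse a frame-level CTC path: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in frame_symbols:
        if s != prev and s != blank:  # keep only the first of a run, skip blanks
            out.append(s)
        prev = s
    return out

# Both frame-level paths below are valid alignments of the transcript ['c', 'a', 't']
print(ctc_collapse(["c", "c", "-", "a", "-", "t", "t"]))  # ['c', 'a', 't']
print(ctc_collapse(["-", "c", "a", "a", "t", "-", "-"]))  # ['c', 'a', 't']
```

Forced alignment chooses, among all such paths, the one the model scores highest.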

Differences between the two approaches
--------------------------------------

The following table compares the two approaches.

.. list-table::
   :header-rows: 1

   * - Features
     - `Kaldi`_-based
     - `k2`_-based
   * - Supports CUDA
     - No
     - Yes
   * - Supports CPU
     - Yes
     - Yes
   * - Supports batch processing
     - No
     - Yes on CUDA; no on CPU
   * - Supports streaming models
     - Yes
     - No
   * - Supports C++ APIs
     - Yes
     - Yes
   * - Supports Python APIs
     - Yes
     - Yes

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   kaldi-based
   k2-based
4
docs/source/fst-based-forced-alignment/k2-based.rst
Normal file
@@ -0,0 +1,4 @@
k2-based forced alignment
=========================

TODO(fangjun)
31
docs/source/fst-based-forced-alignment/kaldi-based.rst
Normal file
@@ -0,0 +1,31 @@
Kaldi-based forced alignment
============================

This section describes in detail how to use `kaldi-decoder`_
for **FST-based** ``forced alignment`` with models trained with the `CTC`_ loss.

We will use the test data
from the `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_.

Prepare the environment
-----------------------

Before you continue, make sure you have set up `icefall`_ by following :ref:`install icefall`.

.. hint::

   You don't need to install `Kaldi`_. We will ``NOT`` use `Kaldi`_ below.

Get the test data
-----------------

Compute log_probs
-----------------
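The aligner consumes per-frame log-probabilities, i.e. a log-softmax over the acoustic model's output logits. In icefall these come from a CTC model's forward pass; the sketch below only illustrates the log-softmax computation itself, with made-up toy logits:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one frame of logits."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

frame_logits = [2.0, 1.0, 0.5]  # toy scores for [blank, 'a', 'b']
log_probs = log_softmax(frame_logits)

# exponentiated log-probs sum to 1
assert abs(sum(math.exp(x) for x in log_probs) - 1.0) < 1e-9
```

A real model produces one such vector per frame, giving a matrix of shape (num_frames, vocab_size).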

Convert transcript to an FST graph
----------------------------------
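The standard construction interleaves the transcript with blanks, allows a self-loop on every state, and allows skipping a blank only between two *different* non-blank tokens; any path through the resulting graph collapses to the transcript. A hand-rolled sketch of those states and transitions (illustrative; the real graph would be built with `kaldi-decoder` utilities, and the symbol ids here are made up):

```python
BLANK = 0

def ctc_states(transcript_ids):
    """Interleave the transcript with blanks: the states of the CTC graph."""
    symbols = [BLANK]
    for tok in transcript_ids:
        symbols.extend([tok, BLANK])
    return symbols

def ctc_transitions(symbols):
    """Allowed (src, dst) transitions between states of the CTC graph."""
    arcs = []
    for i, sym in enumerate(symbols):
        arcs.append((i, i))                 # stay: repeat this symbol
        if i + 1 < len(symbols):
            arcs.append((i, i + 1))         # advance to the next state
        # skip a blank, but only between two different non-blank tokens
        if (i + 2 < len(symbols)
                and symbols[i + 2] != BLANK
                and symbols[i + 2] != sym):
            arcs.append((i, i + 2))
    return arcs

symbols = ctc_states([5, 3])  # e.g. made-up ids for two tokens
print(symbols)  # [0, 5, 0, 3, 0]
```

The skip restriction is what forces a blank between repeated tokens, so that e.g. "ll" is not collapsed into "l".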

Force aligner
-------------
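Given per-frame log-probabilities, the aligner itself is a Viterbi search over the blank-interleaved CTC states: each frame either stays in a state, advances by one, or skips a blank between two different tokens. A self-contained toy sketch (not the `kaldi-decoder` API; the vocabulary and scores are made up):

```python
import math

NEG_INF = float("-inf")
BLANK = 0

def forced_align(log_probs, transcript):
    """Viterbi forced alignment over blank-interleaved CTC states.

    log_probs:  one list of per-symbol log-probabilities per frame.
    transcript: list of non-blank symbol ids.
    Returns one symbol id per frame.
    """
    sym = [BLANK]
    for tok in transcript:
        sym += [tok, BLANK]
    S, T = len(sym), len(log_probs)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][sym[0]]   # start on the leading blank...
    score[0][1] = log_probs[0][sym[1]]   # ...or directly on the first token
    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1][s], s)]                   # stay
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))   # advance
            if s >= 2 and sym[s] != BLANK and sym[s] != sym[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))   # skip a blank
            best, prev = max(cands)
            score[t][s] = best + log_probs[t][sym[s]]
            back[t][s] = prev
    # finish on the trailing blank or on the last token, whichever scores higher
    s = max(S - 1, S - 2, key=lambda i: score[T - 1][i])
    path = []
    for t in range(T - 1, -1, -1):
        path.append(sym[s])
        s = back[t][s]
    return path[::-1]

# toy vocabulary: {0: blank, 1: 'a'}; three frames, the middle one favors 'a'
log_probs = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)],
             [math.log(0.9), math.log(0.1)]]
print(forced_align(log_probs, [1]))  # [0, 1, 0]
```

The frame indices where each token first appears in the returned path give its start time; the run length gives its duration.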
@@ -25,7 +25,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.

   docker/index
   faqs
   model-export/index
   fst-based-forced-alignment/index

.. toctree::
   :maxdepth: 3