WIP: Add doc about FST-based CTC forced alignment.

2025-12-11 06:55:27 +00:00 · 2024-01-30 19:29:33 +08:00 · 2024-01-30 19:29:33 +08:00 · 0a244463c3
commit 0a244463c3
parent 37b975cac9
5 changed files with 95 additions and 2 deletions
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@ -98,4 +98,6 @@ rst_epilog = """
 .. _Next-gen Kaldi: https://github.com/k2-fsa
 .. _Kaldi: https://github.com/kaldi-asr/kaldi
 .. _lilcom: https://github.com/danpovey/lilcom
 .. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
 .. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
 """
--- a/docs/source/fst-based-forced-alignment/index.rst
+++ b/docs/source/fst-based-forced-alignment/index.rst
@ -0,0 +1,56 @@
 FST-based forced alignment
 ==========================
 This section describes how to perform **FST-based** ``forced alignment`` with models
 trained with the `CTC`_ loss.
 We use `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
 from `torchaudio`_ as a reference in this section. The difference is that we are using an ``FST``-based approach.
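To make the idea of an FST-based approach concrete, here is a minimal sketch in plain Python. It is not the `kaldi-decoder`_ or `k2`_ implementation. It expands the transcript into a linear chain of states (blank, token, blank, ...), which plays the role of the constraining graph, and runs a Viterbi search over it. The function name ``ctc_forced_align`` and the argument layout are assumptions made for this illustration.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the best per-frame label sequence that collapses to `tokens`.

    log_probs: list of T frames, each a list of V log-probabilities.
    tokens:    non-empty list of token IDs (none equal to `blank`).
    NOTE: illustrative helper, not an API of kaldi-decoder, k2, or torchaudio.
    """
    # States of the linear graph: blank, t1, blank, t2, ..., blank
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    num_states, num_frames = len(states), len(log_probs)

    score = [[-math.inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    # A path may start in the leading blank or directly in the first token.
    score[0][0] = log_probs[0][states[0]]
    score[0][1] = log_probs[0][states[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Predecessors: stay in place, advance by one state, or skip the
            # blank between two *different* tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and states[s] != blank and states[s] != states[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            score[t][s] = score[t - 1][best] + log_probs[t][states[s]]
            back[t][s] = best

    # A valid path ends in the last token or in the trailing blank.
    last = num_frames - 1
    s = num_states - 1
    if score[last][num_states - 2] > score[last][s]:
        s = num_states - 2
    if score[last][s] == -math.inf:
        raise ValueError("not enough frames to align the transcript")

    # Follow the back-pointers from the last frame to the first.
    path = []
    for t in range(last, -1, -1):
        path.append(states[s])
        s = back[t][s]
    path.reverse()
    return path
```

The returned list has one label per frame; consecutive runs of the same non-blank label give the frame span of each token. A real FST toolkit performs the same search over a compiled graph rather than a hand-written state list.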
 Two approaches for FST-based forced alignment will be described:
  - `Kaldi`_-based
  - `k2`_-based
 Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ itself, so
 you don't need to install `Kaldi`_ in order to use it. Instead,
 we will use `kaldi-decoder`_, which ports the C++ decoding code from `Kaldi`_
 without depending on it.
 Differences between the two approaches
 --------------------------------------
 The following table compares the two approaches.
 .. list-table::
 * - Features
   - `Kaldi`_-based
   - `k2`_-based
 * - Support CUDA
   - No
   - Yes
 * - Support CPU
   - Yes
   - Yes
 * - Support batch processing
   - No
   - Yes on CUDA; No on CPU
 * - Support streaming models
   - Yes
   - No
 * - Support C++ APIs
   - Yes
   - Yes
 * - Support Python APIs
   - Yes
   - Yes
 .. toctree::
   :maxdepth: 2
   :caption: Contents:
   kaldi-based
   k2-based
--- a/docs/source/fst-based-forced-alignment/k2-based.rst
+++ b/docs/source/fst-based-forced-alignment/k2-based.rst
@ -0,0 +1,4 @@
 k2-based forced alignment
 =========================
 TODO(fangjun)
--- a/docs/source/fst-based-forced-alignment/kaldi-based.rst
+++ b/docs/source/fst-based-forced-alignment/kaldi-based.rst
@ -0,0 +1,31 @@
 Kaldi-based forced alignment
 ============================
 This section describes in detail how to use `kaldi-decoder`_
 for **FST-based** ``forced alignment`` with models trained with the `CTC`_ loss.
 We will use the test data
 from `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_.
 Prepare the environment
 -----------------------
 Before you continue, make sure you have set up `icefall`_ by following :ref:`install icefall`.
 .. hint::
   You don't need to install `Kaldi`_. We will ``NOT`` use `Kaldi`_ below.
 Get the test data
 -----------------
 Compute log_probs
 -----------------
 Convert transcript to an FST graph
 ----------------------------------
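As a rough illustration of what such a graph can look like, the sketch below builds the arcs of a linear CTC alignment acceptor in plain Python. It does not use `kaldi-decoder`_ or any FST library; the helper names ``transcript_to_ctc_arcs`` and ``accepts`` and the ``(src, dst, label)`` tuple layout are assumptions made for this example. OpenFst-style toolkits typically reserve label 0 for epsilon, so token IDs would likely need remapping in a real graph, which is not handled here.

```python
def transcript_to_ctc_arcs(tokens, blank=0):
    """Build (src, dst, label) arcs of a linear CTC alignment graph.

    State 2*i   : waiting before token i (may consume blanks).
    State 2*i+1 : inside token i (may consume repeats of the token).
    NOTE: hypothetical helper for illustration only.
    """
    arcs = []
    n = len(tokens)
    for i, tok in enumerate(tokens):
        wait, inside = 2 * i, 2 * i + 1
        arcs.append((wait, wait, blank))        # optional blanks before the token
        arcs.append((wait, inside, tok))        # first frame of the token
        arcs.append((inside, inside, tok))      # token repeated over frames
        arcs.append((inside, wait + 2, blank))  # blank after the token
        if i + 1 < n and tokens[i + 1] != tok:
            # Different neighbouring tokens may follow each other directly.
            arcs.append((inside, inside + 2, tokens[i + 1]))
    last = 2 * n
    arcs.append((last, last, blank))            # trailing blanks
    finals = [last - 1, last] if n else [last]
    return arcs, finals


def accepts(arcs, finals, labels, start=0):
    """Check whether a per-frame label sequence is a path through the graph."""
    current = {start}
    for lab in labels:
        current = {dst for (src, dst, l) in arcs if src in current and l == lab}
    return bool(current & set(finals))
```

For the transcript ``[1, 2]``, the frame sequence ``[1, 1, 0, 2]`` is accepted while ``[1, 2, 1]`` is rejected. For a repeated token such as ``[1, 1]``, only sequences with a blank between the two tokens are accepted.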
 Force aligner
 -------------
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@ -25,7 +25,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
   docker/index
   faqs
   model-export/index
-
+   fst-based-forced-alignment/index
 .. toctree::
   :maxdepth: 3