From 0a244463c39c16252810b426a0e5e153451b7ef0 Mon Sep 17 00:00:00 2001
From: Fangjun Kuang <csukuangfj@gmail.com>
Date: Tue, 30 Jan 2024 19:29:33 +0800
Subject: [PATCH] WIP: Add doc about FST-based CTC forced alignment.

---
 docs/source/conf.py                           |  2 +
 .../fst-based-forced-alignment/index.rst      | 56 +++++++++++++++++++
 .../fst-based-forced-alignment/k2-based.rst   |  4 ++
 .../kaldi-based.rst                           | 31 ++++++++++
 docs/source/index.rst                         |  4 +-
 5 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 docs/source/fst-based-forced-alignment/index.rst
 create mode 100644 docs/source/fst-based-forced-alignment/k2-based.rst
 create mode 100644 docs/source/fst-based-forced-alignment/kaldi-based.rst

diff --git a/docs/source/conf.py b/docs/source/conf.py
index 5a534e126..ded6977ac 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -98,4 +98,6 @@ rst_epilog = """
 .. _Next-gen Kaldi: https://github.com/k2-fsa
 .. _Kaldi: https://github.com/kaldi-asr/kaldi
 .. _lilcom: https://github.com/danpovey/lilcom
+.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
+.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
 """
diff --git a/docs/source/fst-based-forced-alignment/index.rst b/docs/source/fst-based-forced-alignment/index.rst
new file mode 100644
index 000000000..a05dc5813
--- /dev/null
+++ b/docs/source/fst-based-forced-alignment/index.rst
@@ -0,0 +1,56 @@
+FST-based forced alignment
+==========================
+
+This section describes how to perform **FST-based** ``forced alignment`` with models
+trained by the `CTC`_ loss.
+
+We use `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
+from `torchaudio`_ as a reference in this section. The difference is that we are using an ``FST``-based approach.
+
+Two approaches for FST-based forced alignment will be described:
+
+  - `Kaldi`_-based
+  - `k2`_-based
+
+Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all.
+That is, you don't need to install `Kaldi`_ in order to use it. Instead,
+we will use `kaldi-decoder`_, which has ported the C++ decoding code from `Kaldi`_
+without depending on it.
+
+
+Differences between the two approaches
+--------------------------------------
+
+The following table compares the two approaches.
+
+.. list-table::
+
+ * - Features
+   - `Kaldi`_-based
+   - `k2`_-based
+ * - Support CUDA
+   - No
+   - Yes
+ * - Support CPU
+   - Yes
+   - Yes
+ * - Support batch processing
+   - No
+   - Yes on CUDA; No on CPU
+ * - Support streaming models
+   - Yes
+   - No
+ * - Support C++ APIs
+   - Yes
+   - Yes
+ * - Support Python APIs
+   - Yes
+   - Yes
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   kaldi-based
+   k2-based
diff --git a/docs/source/fst-based-forced-alignment/k2-based.rst b/docs/source/fst-based-forced-alignment/k2-based.rst
new file mode 100644
index 000000000..373e49f3e
--- /dev/null
+++ b/docs/source/fst-based-forced-alignment/k2-based.rst
@@ -0,0 +1,4 @@
+k2-based forced alignment
+=========================
+
+TODO(fangjun)
diff --git a/docs/source/fst-based-forced-alignment/kaldi-based.rst b/docs/source/fst-based-forced-alignment/kaldi-based.rst
new file mode 100644
index 000000000..1e66afcfa
--- /dev/null
+++ b/docs/source/fst-based-forced-alignment/kaldi-based.rst
@@ -0,0 +1,31 @@
+Kaldi-based forced alignment
+============================
+
+This section describes in detail how to use `kaldi-decoder`_
+for **FST-based** ``forced alignment`` with models trained by the `CTC`_ loss.
+
+We will use the test data
+from `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_.
+
+Prepare the environment
+-----------------------
+
+Before you continue, make sure you have set up `icefall`_ by following :ref:`install icefall`.
+
+.. hint::
+
+   You don't need to install `Kaldi`_. We will **not** use `Kaldi`_ below.
+
+Get the test data
+-----------------
+
+Compute log_probs
+-----------------
+
+Convert transcript to an FST graph
+----------------------------------
+
+Force aligner
+-------------
+
+
diff --git a/docs/source/index.rst b/docs/source/index.rst
index fb539d3f2..d46a4038f 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -25,7 +25,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
    docker/index
    faqs
    model-export/index
-
+   fst-based-forced-alignment/index
 
 .. toctree::
    :maxdepth: 3
@@ -40,5 +40,5 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
 
 .. toctree::
    :maxdepth: 2
-   
+
    decoding-with-langugage-models/index