mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-26 18:24:18 +00:00
WIP: Add doc about FST-based CTC forced alignment.
This commit is contained in:
parent
37b975cac9
commit
0a244463c3
@ -98,4 +98,6 @@ rst_epilog = """
.. _Next-gen Kaldi: https://github.com/k2-fsa
.. _Kaldi: https://github.com/kaldi-asr/kaldi
.. _lilcom: https://github.com/danpovey/lilcom
.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
"""
56
docs/source/fst-based-forced-alignment/index.rst
Normal file
@ -0,0 +1,56 @@
FST-based forced alignment
==========================

This section describes how to perform **FST-based** ``forced alignment`` with models
trained with the `CTC`_ loss.

We use `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
from `torchaudio`_ as a reference in this section. The difference is that we use an ``FST``-based approach.

Two approaches for FST-based forced alignment will be described:

- `Kaldi`_-based
- `k2`_-based

Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all.
That is, you don't need to install `Kaldi`_ in order to use it. Instead,
we will use `kaldi-decoder`_, which has ported the C++ decoding code from `Kaldi`_
without depending on it.

Differences between the two approaches
--------------------------------------

The following table summarizes the differences between the two approaches.

.. list-table::

   * - Features
     - `Kaldi`_-based
     - `k2`_-based
   * - Support CUDA
     - No
     - Yes
   * - Support CPU
     - Yes
     - Yes
   * - Support batch processing
     - No
     - Yes on CUDA; No on CPU
   * - Support streaming models
     - Yes
     - No
   * - Support C++ APIs
     - Yes
     - Yes
   * - Support Python APIs
     - Yes
     - Yes

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   kaldi-based
   k2-based
4
docs/source/fst-based-forced-alignment/k2-based.rst
Normal file
@ -0,0 +1,4 @@
k2-based forced alignment
=========================

TODO(fangjun)
31
docs/source/fst-based-forced-alignment/kaldi-based.rst
Normal file
@ -0,0 +1,31 @@
Kaldi-based forced alignment
============================

This section describes in detail how to use `kaldi-decoder`_
for **FST-based** ``forced alignment`` with models trained with the `CTC`_ loss.

We will use the test data
from `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_.

Prepare the environment
-----------------------

Before you continue, make sure you have set up `icefall`_ by following :ref:`install icefall`.

.. hint::

   You don't need to install `Kaldi`_. We will ``NOT`` use `Kaldi`_ below.

Get the test data
-----------------

Compute log_probs
-----------------

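This part of the document is still a placeholder. As a rough illustration only: for a CTC model, ``log_probs`` are the per-frame log-softmax of the model's output scores over the vocabulary. The sketch below uses plain Python with made-up scores. ``log_softmax`` and ``compute_log_probs`` are hypothetical helper names, not part of `icefall`_ or `kaldi-decoder`_; a real recipe would take the scores from a trained CTC model.

```python
import math


def log_softmax(logits):
    """Return log-probabilities for one frame of raw scores.

    Subtracting the maximum first keeps exp() from overflowing.
    """
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def compute_log_probs(frames):
    """Apply log_softmax to every frame (hypothetical helper)."""
    return [log_softmax(frame) for frame in frames]


# Made-up scores: 2 frames, vocabulary of 3 tokens (id 0 assumed to be blank).
frames = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]
log_probs = compute_log_probs(frames)
```

Each row of ``log_probs`` exponentiates to a probability distribution over the vocabulary, which is the form a forced aligner consumes.
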
Convert transcript to an FST graph
----------------------------------

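This part of the document is still a placeholder. One common way to express a transcript as a graph for CTC alignment is a linear acceptor in which every token may repeat, blanks may appear between tokens, and a blank is mandatory between two identical neighbouring tokens. The sketch below builds such a graph as a plain list of arcs. It is an assumption about the general idea only, not the graph format or API of `kaldi-decoder`_; ``transcript_to_arcs`` and ``accepts`` are hypothetical names, and blank is assumed to have id 0.

```python
def transcript_to_arcs(tokens, blank=0):
    """Build a linear CTC acceptor for a token-id sequence.

    Even state 2k means "k tokens emitted, currently on blank";
    odd state 2k+1 means "currently emitting token k".
    Returns (arcs, final_states), each arc being (src, dst, label).
    """
    assert blank not in tokens, "the transcript must not contain blank"
    n = len(tokens)
    arcs = []
    for k in range(n + 1):
        arcs.append((2 * k, 2 * k, blank))              # stay on blank
        if k == n:
            break
        tok = tokens[k]
        arcs.append((2 * k, 2 * k + 1, tok))            # enter token k
        arcs.append((2 * k + 1, 2 * k + 1, tok))        # repeat token k
        arcs.append((2 * k + 1, 2 * k + 2, blank))      # token k -> blank
        if k + 1 < n and tokens[k + 1] != tok:
            # Skipping the blank is only allowed between different tokens.
            arcs.append((2 * k + 1, 2 * k + 3, tokens[k + 1]))
    finals = {2 * n}
    if n > 0:
        finals.add(2 * n - 1)
    return arcs, finals


def accepts(arcs, finals, frame_labels):
    """Check whether a per-frame label sequence is a path in the graph."""
    table = {(src, label): dst for src, dst, label in arcs}
    state = 0
    for label in frame_labels:
        if (state, label) not in table:
            return False
        state = table[(state, label)]
    return state in finals


# Hypothetical token ids for a transcript with a repeated token.
arcs, finals = transcript_to_arcs([3, 3, 5])
```

With this graph, ``accepts(arcs, finals, [3, 0, 3, 5])`` holds, while ``[3, 3, 5]`` is rejected because the two identical tokens are not separated by a blank.
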
Force aligner
-------------

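This part of the document is still a placeholder. Conceptually, forced alignment searches for the best-scoring per-frame label sequence that collapses to the given transcript. The sketch below does this with a plain-Python Viterbi search over the blank-extended transcript. It is a stand-in for illustration under stated assumptions (blank id 0, ``log_probs`` given as a list of per-frame lists); it does not use or mirror the `kaldi-decoder`_ API, and ``force_align`` is a hypothetical name.

```python
import math


def force_align(log_probs, tokens, blank=0):
    """Return the best per-frame labels that collapse to `tokens`."""
    if not log_probs:
        raise ValueError("need at least one frame")
    # Blank-extended transcript: blank, t1, blank, t2, ..., blank.
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    num_states, num_frames = len(ext), len(log_probs)
    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, num_frames):
        for s in range(num_states):
            cands = [(score[t - 1][s], s)]                  # stay
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))  # skip a blank
            best, prev = max(cands)
            if best > neg_inf:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev
    # A valid path ends on the last blank or on the last token.
    ends = [num_states - 1] + ([num_states - 2] if num_states > 1 else [])
    s = max(ends, key=lambda i: score[num_frames - 1][i])
    if score[num_frames - 1][s] == neg_inf:
        raise ValueError("too few frames for this transcript")
    path = []
    for t in range(num_frames - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    return path[::-1]


# Made-up probabilities: 4 frames, vocabulary {0: blank, 1, 2}.
probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
log_probs = [[math.log(p) for p in frame] for frame in probs]
alignment = force_align(log_probs, [1, 2])
```

Here ``alignment`` assigns one label to each frame; consecutive frames carrying the same non-blank label form the time span of that token.
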
@ -25,7 +25,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.
   docker/index
   faqs
   model-export/index
   fst-based-forced-alignment/index

.. toctree::
   :maxdepth: 3