mirror of
https://github.com/k2-fsa/icefall.git
synced 2025-08-26 18:24:18 +00:00
WIP: Add doc about FST-based CTC forced alignment.
This commit is contained in:
parent
37b975cac9
commit
0a244463c3
@@ -98,4 +98,6 @@ rst_epilog = """
.. _Next-gen Kaldi: https://github.com/k2-fsa
.. _Kaldi: https://github.com/kaldi-asr/kaldi
.. _lilcom: https://github.com/danpovey/lilcom
.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
"""
56
docs/source/fst-based-forced-alignment/index.rst
Normal file
@@ -0,0 +1,56 @@
FST-based forced alignment
==========================

This section describes how to perform **FST-based** ``forced alignment`` with models
trained with the `CTC`_ loss.

We use the `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
from `torchaudio`_ as a reference in this section. The difference is that we use an ``FST``-based approach.

Two approaches for FST-based forced alignment will be described:

- `Kaldi`_-based
- `k2`_-based

Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all.
That is, you don't need to install `Kaldi`_ to use it. Instead,
we use `kaldi-decoder`_, which ports the C++ decoding code from `Kaldi`_
without depending on it.
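For background, a CTC model emits one symbol per frame (one of which is a blank), and forced alignment searches for the frame-level symbol sequence that collapses to the given transcript. A minimal, dependency-free sketch of the collapsing rule (illustrative only; the function name is made up and this is not icefall code):

```python
def ctc_collapse(frame_symbols, blank="-"):
    """Collapse a frame-level CTC path: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in frame_symbols:
        if s != prev and s != blank:  # keep only the first of a run, skip blanks
            out.append(s)
        prev = s
    return out

# Both frame-level paths below are valid alignments of the transcript ['c', 'a', 't']
print(ctc_collapse(["c", "c", "-", "a", "-", "t", "t"]))  # ['c', 'a', 't']
print(ctc_collapse(["-", "c", "a", "a", "t", "-", "-"]))  # ['c', 'a', 't']
```

Forced alignment chooses, among all such paths, the one the model scores highest.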

Differences between the two approaches
--------------------------------------

The following table compares the two approaches.

.. list-table::
   :header-rows: 1

   * - Features
     - `Kaldi`_-based
     - `k2`_-based
   * - Supports CUDA
     - No
     - Yes
   * - Supports CPU
     - Yes
     - Yes
   * - Supports batch processing
     - No
     - Yes on CUDA; no on CPU
   * - Supports streaming models
     - Yes
     - No
   * - Supports C++ APIs
     - Yes
     - Yes
   * - Supports Python APIs
     - Yes
     - Yes

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   kaldi-based
   k2-based
4
docs/source/fst-based-forced-alignment/k2-based.rst
Normal file
@@ -0,0 +1,4 @@
k2-based forced alignment
=========================

TODO(fangjun)
31
docs/source/fst-based-forced-alignment/kaldi-based.rst
Normal file
@@ -0,0 +1,31 @@
Kaldi-based forced alignment
============================

This section describes in detail how to use `kaldi-decoder`_
for **FST-based** ``forced alignment`` with models trained with the `CTC`_ loss.

We will use the test data
from the `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_.

Prepare the environment
-----------------------

Before you continue, make sure you have set up `icefall`_ by following :ref:`install icefall`.

.. hint::

   You don't need to install `Kaldi`_. We will ``NOT`` use `Kaldi`_ below.

Get the test data
-----------------

Compute log_probs
-----------------
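The aligner consumes per-frame log-probabilities, i.e. a log-softmax over the acoustic model's output logits. In icefall these come from a CTC model's forward pass; the sketch below only illustrates the log-softmax computation itself, with made-up toy logits:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one frame of logits."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

frame_logits = [2.0, 1.0, 0.5]  # toy scores for [blank, 'a', 'b']
log_probs = log_softmax(frame_logits)

# exponentiated log-probs sum to 1
assert abs(sum(math.exp(x) for x in log_probs) - 1.0) < 1e-9
```

A real model produces one such vector per frame, giving a matrix of shape (num_frames, vocab_size).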

Convert transcript to an FST graph
----------------------------------
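The standard construction interleaves the transcript with blanks, allows a self-loop on every state, and allows skipping a blank only between two *different* non-blank tokens; any path through the resulting graph collapses to the transcript. A hand-rolled sketch of those states and transitions (illustrative; the real graph would be built with `kaldi-decoder` utilities, and the symbol ids here are made up):

```python
BLANK = 0

def ctc_states(transcript_ids):
    """Interleave the transcript with blanks: the states of the CTC graph."""
    symbols = [BLANK]
    for tok in transcript_ids:
        symbols.extend([tok, BLANK])
    return symbols

def ctc_transitions(symbols):
    """Allowed (src, dst) transitions between states of the CTC graph."""
    arcs = []
    for i, sym in enumerate(symbols):
        arcs.append((i, i))                 # stay: repeat this symbol
        if i + 1 < len(symbols):
            arcs.append((i, i + 1))         # advance to the next state
        # skip a blank, but only between two different non-blank tokens
        if (i + 2 < len(symbols)
                and symbols[i + 2] != BLANK
                and symbols[i + 2] != sym):
            arcs.append((i, i + 2))
    return arcs

symbols = ctc_states([5, 3])  # e.g. made-up ids for two tokens
print(symbols)  # [0, 5, 0, 3, 0]
```

The skip restriction is what forces a blank between repeated tokens, so that e.g. "ll" is not collapsed into "l".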

Force aligner
-------------
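Given per-frame log-probabilities, the aligner itself is a Viterbi search over the blank-interleaved CTC states: each frame either stays in a state, advances by one, or skips a blank between two different tokens. A self-contained toy sketch (not the `kaldi-decoder` API; the vocabulary and scores are made up):

```python
import math

NEG_INF = float("-inf")
BLANK = 0

def forced_align(log_probs, transcript):
    """Viterbi forced alignment over blank-interleaved CTC states.

    log_probs:  one list of per-symbol log-probabilities per frame.
    transcript: list of non-blank symbol ids.
    Returns one symbol id per frame.
    """
    sym = [BLANK]
    for tok in transcript:
        sym += [tok, BLANK]
    S, T = len(sym), len(log_probs)

    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][sym[0]]   # start on the leading blank...
    score[0][1] = log_probs[0][sym[1]]   # ...or directly on the first token
    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1][s], s)]                   # stay
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))   # advance
            if s >= 2 and sym[s] != BLANK and sym[s] != sym[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))   # skip a blank
            best, prev = max(cands)
            score[t][s] = best + log_probs[t][sym[s]]
            back[t][s] = prev
    # finish on the trailing blank or on the last token, whichever scores higher
    s = max(S - 1, S - 2, key=lambda i: score[T - 1][i])
    path = []
    for t in range(T - 1, -1, -1):
        path.append(sym[s])
        s = back[t][s]
    return path[::-1]

# toy vocabulary: {0: blank, 1: 'a'}; three frames, the middle one favors 'a'
log_probs = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)],
             [math.log(0.9), math.log(0.1)]]
print(forced_align(log_probs, [1]))  # [0, 1, 0]
```

The frame indices where each token first appears in the returned path give its start time; the run length gives its duration.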
@@ -25,7 +25,7 @@ speech recognition recipes using `k2 <https://github.com/k2-fsa/k2>`_.

   docker/index
   faqs
   model-export/index
   fst-based-forced-alignment/index

.. toctree::
   :maxdepth: 3