## Task
We propose BTC/OTC to directly train an ASR system leveraging weak supervision, i.e., speech with non-verbatim transcripts.
<div style="display: flex; justify-content: space-between">
<figure style="flex: 2; text-align: center; margin: 5px;">
<img src="figures/sub.png" alt="Image 1" width="100%" />
<figcaption>Substitution error</figcaption>
</figure>
<figure style="flex: 2; text-align: center; margin: 5px;">
<img src="figures/ins.png" alt="Image 2" width="100%" />
<figcaption>Insertion error</figcaption>
</figure>
<figure style="flex: 2; text-align: center;margin: 5px;">
<img src="figures/del.png" alt="Image 3" width="100%" />
<figcaption>Deletion error</figcaption>
</figure>
</div>
<figcaption>Examples of errors in the transcript. The grey box is the verbatim transcript and the red box is the inaccurate transcript; inaccurate words are marked in bold.</figcaption> <br>
This is achieved by using a special token $\star$ to model uncertainties (i.e., substitution errors, insertion errors, and deletion errors)
within the WFST framework during training.\
Specifically, we modify $G(\mathbf{y})$ by adding a self-loop arc to each state and a bypass arc in parallel with each original arc.
<div style="text-align: center;">
  <figure style="text-align: center;">
<img src="figures/otc_g.png" alt="Image Alt Text" width="50%" />
<figcaption>OTC WFST representations of the transcript "a b"</figcaption>
</figure>
</div>
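
For concreteness, here is a minimal plain-Python sketch of this modification for the transcript "a b". It only illustrates the arc structure of $G_{\text{otc}}(\mathbf{y})$; the recipe itself builds these graphs with k2, and the names `STAR`, `self_loop_score`, and `bypass_score` are placeholders rather than identifiers from the recipe.

```python
# Sketch only: arcs are (src_state, dst_state, label, score).
STAR = "<star>"  # placeholder symbol for the special token

def make_otc_graph(transcript, self_loop_score=0.0, bypass_score=0.0):
    # Linear acceptor G(y): one arc per transcript word, states 0..len(y).
    arcs = [(i, i + 1, word, 0.0) for i, word in enumerate(transcript)]
    num_states = len(transcript) + 1

    otc_arcs = list(arcs)
    # Add a star self-loop arc to every state.
    for state in range(num_states):
        otc_arcs.append((state, state, STAR, self_loop_score))
    # Add a star bypass arc in parallel with every original arc.
    for src, dst, _word, _score in arcs:
        otc_arcs.append((src, dst, STAR, bypass_score))
    return sorted(otc_arcs)

print(make_otc_graph(["a", "b"]))
```

In practice the self-loop and bypass arcs carry penalty weights that discourage overusing $\star$; the sketch keeps them as parameters with a default of 0.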
After composing the modified WFST $G_{\text{otc}}(\mathbf{y})$ with $L$ and $T$, the OTC training graph is shown in this figure:
<figure style="text-align: center">
<img src="figures/otc_training_graph.drawio.png" alt="Image Alt Text" />
<figcaption>OTC training graph. The self-loop arcs and bypass arcs are highlighted in green and blue, respectively.</figcaption>
</figure>
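
As a hedged sketch, the composition might be written with k2 roughly as below (assuming `T`, `L`, and `G_otc` are k2 FSAs already prepared by the recipe; the function name `compile_otc_training_graph` is ours, not the recipe's):

```python
import k2

def compile_otc_training_graph(T: k2.Fsa, L: k2.Fsa, G_otc: k2.Fsa) -> k2.Fsa:
    # L o G_otc: expand transcript words into token sequences via the lexicon.
    LG = k2.connect(k2.compose(k2.arc_sort(L), k2.arc_sort(G_otc)))
    # T o (L o G_otc): attach the topology so the resulting graph accepts
    # frame-level token sequences (including blank and star) during training.
    return k2.connect(k2.compose(T, k2.arc_sort(LG)))
```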
The emission probability of $\star$ is the average probability of all non-blank tokens.
<div style="text-align: center;">
  <figure style="text-align: center;">
<img src="figures/otc_emission.drawio.png" alt="Image Alt Text" width="50%" />
<figcaption>OTC emission WFST</figcaption>
</figure>
</div>
The weight of $\star$ is the log of the average probability of "a" and "b": $\log \frac{e^{-1.2} + e^{-2.3}}{2} = -1.6$ for the first frame and $\log \frac{e^{-1.9} + e^{-0.5}}{2} = -1.0$ for the second frame.
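
This arithmetic can be checked directly from the per-frame log-probabilities. The PyTorch sketch below assumes a vocabulary of (&lt;blank&gt;, "a", "b"); only the "a" and "b" scores come from the figure above, and the blank scores are made up for illustration.

```python
import torch

# Per-frame log-probabilities over the vocabulary (columns: <blank>, "a", "b").
# The "a"/"b" values match the emission WFST figure; blank values are illustrative.
log_probs = torch.tensor([
    [-0.7, -1.2, -2.3],  # frame 1
    [-1.1, -1.9, -0.5],  # frame 2
])

# Star log-probability per frame: log of the *average* probability of the
# non-blank tokens, i.e. logsumexp over non-blank columns minus log(count).
non_blank = log_probs[:, 1:]
star_log_prob = torch.logsumexp(non_blank, dim=1) - torch.log(
    torch.tensor(float(non_blank.shape[1]))
)
print(star_log_prob)  # approximately tensor([-1.61, -0.97]), i.e. -1.6 and -1.0
```
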
## Description of the recipe