# Introduction
This is a weakly supervised ASR recipe for the LibriSpeech dataset (100-hour clean subset). We train a
Conformer model using BTC/OTC on transcripts that contain synthetic errors. This README describes
the task and the BTC/OTC training process.
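
For illustration, the sketch below shows one way synthetic errors could be injected into a transcript. The function name, error rates, and replacement-word sampling are assumptions for the example, not the recipe's actual implementation.

```python
import random

def corrupt_transcript(words, sub_rate=0.05, ins_rate=0.05, del_rate=0.05,
                       vocab=None, seed=None):
    """Inject synthetic substitution/insertion/deletion errors into a word list.

    Hypothetical helper: the rates and the sampling of replacement words are
    illustrative assumptions, not the recipe's actual implementation.
    """
    rng = random.Random(seed)
    vocab = vocab or words  # fall back to sampling replacements from the utterance itself
    corrupted = []
    for word in words:
        r = rng.random()
        if r < del_rate:
            continue                              # deletion: drop the word
        elif r < del_rate + sub_rate:
            corrupted.append(rng.choice(vocab))   # substitution: swap in another word
        else:
            corrupted.append(word)                # keep the word verbatim
        if rng.random() < ins_rate:
            corrupted.append(rng.choice(vocab))   # insertion: add a spurious word
    return corrupted

print(corrupt_transcript("the quick brown fox jumps over the lazy dog".split(), seed=0))
```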
## Task
We propose BTC/OTC to train an ASR system directly on weak supervision, i.e., speech paired with non-verbatim transcripts.
This is achieved by introducing a special token that models transcript uncertainties (substitution, insertion, and deletion errors)
within the WFST framework during training.
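
To make the idea concrete, the following sketch builds the arcs of a linear transcript acceptor augmented with a hypothetical `<star>` token. This is a conceptual illustration only, not the recipe's graph-building code: the mapping of error types to arcs, the `<eps>`/`<star>` labels, and the function name are assumptions, and the actual recipe constructs such graphs inside a WFST toolkit with tunable penalties on the special-token arcs.

```python
def otc_style_graph(words, star="<star>"):
    """Arcs of a transcript acceptor augmented with a special token.

    Conceptual sketch under stated assumptions. Each arc is
    (src_state, dst_state, label); "<eps>" consumes no speech.
    """
    arcs = []
    for i, word in enumerate(words):
        arcs.append((i, i + 1, word))      # verbatim path: the transcript word
        arcs.append((i, i + 1, star))      # substitution error: a different word was spoken
        arcs.append((i, i + 1, "<eps>"))   # insertion error: skip a transcript word never spoken
        arcs.append((i, i, star))          # deletion error: consume a spoken word missing from the transcript
    return arcs

for arc in otc_style_graph("the quick fox".split()):
    print(*arc)
```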