Introduction

The Multi-Domain Cantonese Corpus (MDCC) consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong. It spans the philosophy, politics, education, culture, lifestyle, and family domains, covering a wide range of topics.

The manuscript can be found at: https://arxiv.org/abs/2201.02419