
Use jsonl for CutSet in the LibriSpeech recipe. (#397)

* Use jsonl for cutsets in the librispeech recipe.

* Use lazy cutset for all recipes.

* More fixes to use lazy CutSet.

* Remove force=True from logging to support Python < 3.8

* Minor fixes.

* Fix style issues.

2022-06-06 10:19:16 +08:00

conformer_ctc

Use jsonl for CutSet in the LibriSpeech recipe. (#397)

2022-06-06 10:19:16 +08:00

local

Use jsonl for CutSet in the LibriSpeech recipe. (#397)

2022-06-06 10:19:16 +08:00

pruned_transducer_stateless2

Use jsonl for CutSet in the LibriSpeech recipe. (#397)

2022-06-06 10:19:16 +08:00

.gitignore

GigaSpeech recipe (#120)

2022-04-14 16:07:22 +08:00

prepare.sh

GigaSpeech recipe (#120)

2022-04-14 16:07:22 +08:00

README.md

Update GigaSpeech results (#364)

2022-05-15 12:57:40 +08:00

RESULTS.md

Update GigaSpeech results (#364)

2022-05-15 12:57:40 +08:00

shared

GigaSpeech recipe (#120)

2022-04-14 16:07:22 +08:00

README.md

GigaSpeech

GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio collected from audiobooks, podcasts, and YouTube. It covers both read and spontaneous speaking styles and a variety of topics, such as arts, science, and sports. More details can be found at https://github.com/SpeechColab/GigaSpeech

Download

Apply for the download credentials and download the dataset by following the instructions at https://github.com/SpeechColab/GigaSpeech#download. Then create a symlink to the downloaded corpus:

ln -sfv /path/to/GigaSpeech download/GigaSpeech
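Before running the recipe, it can help to confirm that the symlink actually resolves to a directory. The following is a minimal sketch, not part of the recipe itself: it uses a throwaway temporary directory in place of the real corpus location, and the variable names (`workdir`, `link`, `link_status`) are made up for illustration.

```shell
# Sketch only: a temp directory stands in for the real corpus and recipe paths.
workdir="$(mktemp -d)"
mkdir -p "$workdir/corpus/GigaSpeech" "$workdir/recipe/download"

# Same flags as the command above:
#   -s  create a symbolic link
#   -f  remove an existing destination file first
#   -v  print what was linked
link="$workdir/recipe/download/GigaSpeech"
ln -sfv "$workdir/corpus/GigaSpeech" "$link"

# The link is usable only if it is a symlink AND its target is a directory.
if [ -L "$link" ] && [ -d "$link" ]; then
  link_status="ok"
else
  link_status="broken"
fi
echo "symlink status: $link_status"
```

For the real setup, the same two checks (`-L` and `-d`) can be applied to `download/GigaSpeech` from the recipe directory.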

Performance Record

|                                | Dev   | Test  |
|--------------------------------|-------|-------|
| `conformer_ctc`                | 10.47 | 10.58 |
| `pruned_transducer_stateless2` | 10.40 | 10.51 |

See RESULTS.md for details.