
streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448)

- Some AudioTransform classes produce audio signals outside the range [-1,+1].
   - Resample produced 1.0079.
   - The range [-10,+10] was chosen so that such signals can still be reliably
     distinguished from a [-32k,+32k] signal...
- This is related to: https://github.com/lhotse-speech/lhotse/issues/1254

2024-01-05 10:21:27 +08:00

conformer_ctc

Compatibility with the latest Lhotse (#1314)

2023-10-17 21:22:32 +08:00

local

Add UniqueLexicon for gigaspeech (#982)

2023-04-03 12:39:34 +08:00

pruned_transducer_stateless2

Compatibility with the latest Lhotse (#1314)

2023-10-17 21:22:32 +08:00

zipformer

streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448)

2024-01-05 10:21:27 +08:00

.gitignore

Add Blankskip to Zipformer+CTC (#730)

2022-12-21 17:41:31 +08:00

prepare.sh

typo fixed (#1334)

2023-10-25 00:03:33 +08:00

README.md

Add Zipformer recipe for GigaSpeech (#1254)

2023-10-21 15:36:59 +08:00

RESULTS.md

Add Zipformer recipe for GigaSpeech (#1254)

2023-10-21 15:36:59 +08:00

shared

GigaSpeech recipe (#120)

2022-04-14 16:07:22 +08:00

README.md

GigaSpeech

GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio collected from audiobooks, podcasts, and YouTube. It covers both read and spontaneous speaking styles and a variety of topics, such as arts, science, and sports. More details can be found at https://github.com/SpeechColab/GigaSpeech

Download

Apply for the download credentials and download the dataset by following https://github.com/SpeechColab/GigaSpeech#download. Then create a symlink from the recipe directory to the downloaded corpus:

ln -sfv /path/to/GigaSpeech download/GigaSpeech
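
The symlink step can be wrapped in a small check so that a mistyped path does not leave a dangling link. The following is only a sketch: the function name `link_gigaspeech` is hypothetical and not part of the recipe, and the extra `-n` flag is an assumption added here so that re-running the command replaces an existing link instead of creating a nested one inside the corpus directory.

```shell
# Hypothetical helper (not part of the recipe): link a downloaded
# GigaSpeech copy into the location used by the command above.
link_gigaspeech() {
  gs_src="$1"                        # where the corpus was downloaded
  gs_dst="${2:-download/GigaSpeech}" # link location, defaults to the README path

  # Refuse to create a dangling link.
  if [ ! -d "$gs_src" ]; then
    echo "error: $gs_src is not a directory" >&2
    return 1
  fi

  mkdir -p "$(dirname "$gs_dst")"
  # -n (assumption, see above): replace an existing link rather than
  # following it and linking inside the target directory.
  ln -sfnv "$gs_src" "$gs_dst"

  # Succeed only if the link now resolves to a directory.
  [ -d "$gs_dst" ]
}
```

For example, `link_gigaspeech /path/to/GigaSpeech` run from the recipe directory would create `download/GigaSpeech`, with `/path/to/GigaSpeech` standing in for the actual download location as in the command above.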

Performance Record

Model	Dev WER (%)	Test WER (%)
`zipformer`	10.25	10.38
`conformer_ctc`	10.47	10.58
`pruned_transducer_stateless2`	10.40	10.51

See RESULTS.md for details.