1244 Commits

Author SHA1 Message Date
Kinan Martin
b167ac7b40 add utility file for creating subsets of mls english. must be fixed to make dev and test splits have matching sizes to reazonspeech 2025-07-28 17:52:36 +09:00
Kinan Martin
eafbd6429b add utility file for updating the storage_path of cutsets for use in the multilingual training recipe directory structure 2025-07-28 17:52:36 +09:00
Kinan Martin
2f1c61124a fix decode script data module usage 2025-07-28 17:52:36 +09:00
Kinan Martin
3307836352 Combined updates. Changed BBPE path structure, changed dataset path structure, added script to update cutset paths. WIP 2025-07-28 17:52:36 +09:00
Kinan Martin
a8ecb16d47 use huggingface_hub library to download mls_english 2025-07-28 17:52:36 +09:00
Kinan Martin
f4b29870a0 switch mls_english clone from https to ssh 2025-07-28 17:52:36 +09:00
Kinan Martin
782e1fb958 fix stage 5 output pathing 2025-07-28 17:52:36 +09:00
Kinan Martin
5417e0926b restore version of mls_english compute_fbank_mls_english.py and prepare.sh from commit 547f5c5 2025-07-28 17:52:36 +09:00
Bailey Hirota
6d71d9cff4 remove bilingual tag from train.py 2025-07-28 17:52:28 +09:00
Kinan Martin
3751441dad deprecate params.bilingual=0, replace ReazonSpeechAsrDataModule for MultiDatasetAsrDataModule, not tested yet 2025-07-28 17:49:35 +09:00
Bailey Hirota
61e81bfc26 Revert "add fbank"
This reverts commit ba603e0a0a514056ec6d32677053c41743a1a5dd.
2025-07-28 17:49:35 +09:00
Bailey Hirota
c83b115b49 add fbank 2025-07-28 17:49:35 +09:00
Kinan Martin
abebb6aaf0 new version of multi_ja_en prepare.sh script which swaps Librispeech for MLS English 2025-07-28 17:49:35 +09:00
Kinan Martin
fa84782b21 optimize with num_jobs on save_audios 2025-07-28 17:49:35 +09:00
Kinan Martin
f2e01712de fix stage 2 and 3 2025-07-28 17:49:35 +09:00
Kinan Martin
59519a41fa fix validation manifest name 2025-07-28 17:49:35 +09:00
Kinan Martin
4ca8ee94f0 adjusted prepare.sh to only calculate fbank and manifest together; adjust datamodule to load from manifest files 2025-07-28 17:49:35 +09:00
Kinan Martin
d6e3c98e58 move compute_fbank_mls_english.py, add validate_manifest.py, add shared symlink to librispeech 2025-07-28 17:49:35 +09:00
Kinan Martin
68e3ceaaac instead of on-the-fly features, precompute fbank and manifests in prepare.sh 2025-07-28 17:49:35 +09:00
Kinan Martin
ce44150e25 readme 2025-07-28 17:49:35 +09:00
Kinan Martin
a34d34a38e pre-commit hooks 2025-07-28 17:49:35 +09:00
Kinan Martin
898525962c separate transcript prep stage from bpe train stage 2025-07-28 17:49:35 +09:00
Kinan Martin
8c1c7100d3 symlink copied files to librispeech recipe dir 2025-07-28 17:49:35 +09:00
Kinan Martin
efe015d568 cleaned-up version of recipe 2025-07-28 17:49:35 +09:00
Kinan Martin
defc71bc6a replace file 2025-07-28 17:49:35 +09:00
Kinan Martin
a1fc6420f9 change default path 2025-07-28 17:49:35 +09:00
Kinan Martin
ac0c0edddb update prepare.sh, fix asr_datamodule.py 2025-07-28 17:49:35 +09:00
Kinan Martin
28f65458b3 WIP v0 MLS English recipe 2025-07-28 17:49:35 +09:00
Fangjun Kuang
e22bc78f98 Export streaming zipformer2 to RKNN (#1977) 2025-07-11 13:24:01 +08:00
Teo Wen Shen
da87e7fc99 add weights_only=False to torch.load (#1984) 2025-07-10 15:27:08 +08:00
Yifan Yang
89728dd4f8 Refactor data preparation for GigaSpeech recipe (#1986) 2025-07-10 11:17:37 +08:00
Mistmoon
9293edc62f Add cr-ctc loss and ctc-decode in aishell (#1980) 2025-07-08 14:47:24 +08:00
Fangjun Kuang
fba5e67d5e Fix CI tests. (#1974)
- Introduce unified AMP helpers (create_grad_scaler, torch_autocast) to handle
  deprecations in PyTorch ≥2.3.0

- Replace direct uses of torch.cuda.amp.GradScaler and torch.cuda.amp.autocast
  with the new utilities across all training and inference scripts

- Update all torch.load calls to include weights_only=False for compatibility with
  newer PyTorch versions
2025-07-01 13:47:55 +08:00
Fangjun Kuang
71377d21cd Export streaming zipformer models with whisper feature to onnx (#1973) 2025-06-30 19:01:15 +08:00
Fangjun Kuang
abd9437e6d Add more wheels for piper-phonemize (#1969) 2025-06-24 14:49:16 +08:00
Wei Kang
e1cf4dbace rm zipvoice (#1967) 2025-06-23 19:22:35 +08:00
Wei Kang
343b8fa2dc Using non strict match in context graph for contextual words (#1952) 2025-06-19 12:27:15 +08:00
Wei Kang
f80a2ee110 Decrease num_buckets & remove shuffle_buffer_size (#1955) 2025-06-19 12:26:37 +08:00
Wei Kang
3587c4b3b7 Fix decoding byte bpes tokens to words. (#1966) 2025-06-19 12:26:01 +08:00
Wei Kang
762f965cf7 [zipvoice] Add requirements.txt and pinyin.txt, remove k2 from pretrained model inference. (#1965)
* Add requirements.txt and pinyin.txt needed by zipvoice

* simplify the requirements for pretrained model inference
2025-06-18 18:38:46 +08:00
Wei Kang
06539d2b9d Add Zipvoice (#1964)
* Add ZipVoice - a flow-matching based zero-shot TTS model.
2025-06-17 20:17:12 +08:00
Zengwei Yao
ffb7d05635 refactor branch exchange in cr-ctc (#1954) 2025-05-27 12:09:59 +08:00
Mahsa Yarmohammadi
021e1a8846 Add acknowledgment to README (#1950) 2025-05-22 22:06:35 +08:00
Tianxiang Zhao
30e7ea4b5a Fix a bug in finetune.py --use-mux (#1949) 2025-05-22 12:05:01 +08:00
Fangjun Kuang
fd8f8780fa Fix logging torch.dtype. (#1947) 2025-05-21 12:04:57 +08:00
Yifan Yang
e79833aad2 ensure SwooshL/SwooshR output dtype matches input dtype (#1940) 2025-05-12 19:28:48 +08:00
Yifan Yang
4627969ccd fix bug: undefined name 'partial' (#1941) 2025-05-12 14:19:53 +08:00
Yifan Yang
cd7caf12df Fix speech_llm recipe (#1936)
* fix training/decoding scripts, cleanup unused code, and ensure compliance with style checks

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2025-04-30 11:41:00 +08:00
Fangjun Kuang
cc2e64a6aa Fix convert_texts_into_ids() in the tedlium3 recipe. (#1929) 2025-04-24 17:04:46 +08:00
Yifan Yang
5ec95e5482 Fix SpeechLLM recipe (#1926) 2025-04-23 16:18:38 +08:00