1352 Commits

Author SHA1 Message Date
Kinan Martin
d136086d6b add utility file for creating subsets of mls english. must be fixed to make dev and test splits have matching sizes to reazonspeech 2025-08-05 18:38:55 +09:00
Kinan Martin
b25254f0c9 add utility file for updating the storage_path of cutsets for use in the multilingual training recipe directory structure 2025-08-05 18:37:41 +09:00
Kinan Martin
68bff93940 fix decode script data module usage 2025-08-05 18:36:27 +09:00
Kinan Martin
1b1a317603 Combined updates. Changed BBPE path structure, changed dataset path structure, added script to update cutset paths. WIP 2025-08-05 18:35:20 +09:00
Kinan Martin
1093e78612 use huggingface_hub library to download mls_english 2025-08-05 18:34:15 +09:00
Kinan Martin
5682978c64 switch mls_english clone from https to ssh 2025-08-05 18:33:03 +09:00
Kinan Martin
2265e1afed fix stage 5 output pathing 2025-08-05 18:31:48 +09:00
Kinan Martin
7bea23e954 restore version of mls_english compute_fbank_mls_english.py and prepare.sh from commit 547f5c5 2025-08-05 18:30:41 +09:00
Bailey Hirota
8b035a0c96 remove bilingual tag from train.py 2025-08-05 18:29:29 +09:00
Kinan Martin
99db0e4643 deprecate params.bilingual=0, replace ReazonSpeechAsrDataModule for MultiDatasetAsrDataModule, not tested yet 2025-08-05 18:16:17 +09:00
Bailey Hirota
31a37c7e44 Revert "add fbank"
This reverts commit ba603e0a0a514056ec6d32677053c41743a1a5dd.
2025-08-05 18:15:04 +09:00
Bailey Hirota
7d462aa8b4 add fbank 2025-08-05 18:13:51 +09:00
Kinan Martin
06e429131b new version of multi_ja_en prepare.sh script which swaps Librispeech for MLS English 2025-08-05 18:12:40 +09:00
Kinan Martin
0e86ef805c optimize with num_jobs on save_audios 2025-08-05 18:11:22 +09:00
Kinan Martin
73dea24fd9 fix stage 2 and 3 2025-08-05 18:10:15 +09:00
Kinan Martin
2504b23861 fix validation manifest name 2025-08-05 18:09:00 +09:00
Kinan Martin
eb2168bc49 adjusted prepare.sh to only calculate fbank and manifest together; adjust datamodule to load from manifest files 2025-08-05 18:07:45 +09:00
Kinan Martin
a8f45bc08b move compute_fbank_mls_english.py, add validate_manifest.py, add shared symlink to librispeech 2025-08-05 18:06:28 +09:00
Kinan Martin
fe88d1db36 instead of on-the-fly features, precompute fbank and manifests in prepare.sh 2025-08-05 18:05:18 +09:00
Kinan Martin
996334f520 readme 2025-08-05 18:04:04 +09:00
Kinan Martin
24db8c11ba pre-commit hooks 2025-08-05 18:02:50 +09:00
Kinan Martin
c532a503e7 separate transcript prep stage from bpe train stage 2025-08-05 18:01:41 +09:00
Kinan Martin
313afea773 symlink copied files to librispeech recipe dir 2025-08-05 18:00:24 +09:00
Kinan Martin
e76b749450 cleaned-up version of recipe 2025-08-05 17:59:15 +09:00
Kinan Martin
1b8a3061b0 replace file 2025-08-05 17:57:59 +09:00
Kinan Martin
0ab027411f change default path 2025-08-05 17:56:51 +09:00
Kinan Martin
ba6d8e8b26 update prepare.sh, fix asr_datamodule.py 2025-08-05 17:55:40 +09:00
Kinan Martin
c92c606c5f WIP v0 MLS English recipe 2025-08-05 17:54:30 +09:00
Fangjun Kuang
1c5d792bf1 Validate generated manifest files. (#338) 2025-08-05 17:46:36 +09:00
Kinan Martin
dbd89773d5 Manually fix merge conflict in multi_ja_en/ASR/zipformer/train.py 2025-07-28 17:59:47 +09:00
Bailey Machiko Hirota
aed139f125 Musan implementation for ReazonSpeech (#1988) 2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
9d93d63cf2 Update RESULTS.md 2025-07-28 17:52:36 +09:00
Bailey Hirota
dc4db379ea PR review suggestions implemented 2025-07-28 17:52:36 +09:00
Bailey Hirota
6012edbc17 black and isort formatting 2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
154ef43206 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
f7fec4a6e7 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
542620c4e3 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
310aaec3cc Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Hirota
aee7b87adb working changes for musan mixing 2025-07-28 17:52:36 +09:00
Bailey Hirota
d5cc0301d4 attempt to fix musan paths 2025-07-28 17:52:36 +09:00
Bailey Hirota
0f700ed0b2 update musan symlinks 2025-07-28 17:52:36 +09:00
Bailey Hirota
093a035935 update musan paths 2025-07-28 17:52:36 +09:00
Bailey Hirota
4e92879751 update musan path 2025-07-28 17:52:36 +09:00
Bailey Hirota
f51621b374 resolve typos and import issues 2025-07-28 17:52:36 +09:00
Bailey Hirota
de35cc2760 remove comment 2025-07-28 17:52:36 +09:00
Bailey Hirota
5ec9389909 commenting 2025-07-28 17:52:36 +09:00
Bailey Hirota
df923f3a16 typos 2025-07-28 17:52:36 +09:00
Bailey Hirota
70a7940c95 changes to asr_datamodule for musan support 2025-07-28 17:52:36 +09:00
Kinan Martin
5f2f6843c9 make prepare.sh symlinks relative 2025-07-28 17:52:36 +09:00
Bailey Hirota
19b62c008d remove unused local scripts 2025-07-28 17:52:36 +09:00