1271 Commits

Author SHA1 Message Date
Bailey Machiko Hirota
9d93d63cf2 Update RESULTS.md 2025-07-28 17:52:36 +09:00
Bailey Hirota
dc4db379ea PR review suggestions implemented 2025-07-28 17:52:36 +09:00
Bailey Hirota
6012edbc17 black and isort formatting 2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
154ef43206 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
f7fec4a6e7 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
542620c4e3 Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Machiko Hirota
310aaec3cc Update egs/multi_ja_en/ASR/local/utils/update_cutset_paths.py
Co-authored-by: Yubo <54519381+yuta0306@users.noreply.github.com>
2025-07-28 17:52:36 +09:00
Bailey Hirota
aee7b87adb working changes for musan mixing 2025-07-28 17:52:36 +09:00
Bailey Hirota
d5cc0301d4 attempt to fix musan paths 2025-07-28 17:52:36 +09:00
Bailey Hirota
0f700ed0b2 update musan symlinks 2025-07-28 17:52:36 +09:00
Bailey Hirota
093a035935 update musan paths 2025-07-28 17:52:36 +09:00
Bailey Hirota
4e92879751 update musan path 2025-07-28 17:52:36 +09:00
Bailey Hirota
f51621b374 resolve typos and import issues 2025-07-28 17:52:36 +09:00
Bailey Hirota
de35cc2760 remove comment 2025-07-28 17:52:36 +09:00
Bailey Hirota
5ec9389909 commenting 2025-07-28 17:52:36 +09:00
Bailey Hirota
df923f3a16 typos 2025-07-28 17:52:36 +09:00
Bailey Hirota
70a7940c95 changes to asr_datamodule for musan support 2025-07-28 17:52:36 +09:00
Kinan Martin
5f2f6843c9 make prepare.sh symlinks relative 2025-07-28 17:52:36 +09:00
Bailey Hirota
19b62c008d remove unused local scripts 2025-07-28 17:52:36 +09:00
Bailey Hirota
f6ad423398 changes to train script - no need for limiting utterance length here 2025-07-28 17:52:36 +09:00
Bailey Hirota
ddc2daaccd remove commented out codels 2025-07-28 17:52:36 +09:00
Bailey Hirota
f3e59dfa4c add stage 6 - update cutset paths to prepare 2025-07-28 17:52:36 +09:00
Bailey Hirota
cdf246ca1c update manifest dir path 2025-07-28 17:52:36 +09:00
Bailey Hirota
c77a8470f5 add step 4: display manifest stats to mls_eng 2025-07-28 17:52:36 +09:00
Kinan Martin
fd3fbe6454 Update README.md to reflect MLS English dataset 2025-07-28 17:52:36 +09:00
Kinan Martin
78ee595b45 Add failsafe for MLS English dev set key alternate name as validation 2025-07-28 17:52:36 +09:00
Kinan Martin
ad1be22919 Parametrize dev and test split sizes. 2025-07-28 17:52:36 +09:00
Kinan Martin
b167ac7b40 add utility file for creating subsets of mls english. must be fixed to make dev and test splits have matching sizes to reazonspeech 2025-07-28 17:52:36 +09:00
Kinan Martin
eafbd6429b add utility file for updating the storage_path of cutsets for use in the multilingual training recipe directory structure 2025-07-28 17:52:36 +09:00
Kinan Martin
2f1c61124a fix decode script data module usage 2025-07-28 17:52:36 +09:00
Kinan Martin
3307836352 Combined updates. Changed BBPE path structure, changed dataset path structure, added script to update cutset paths. WIP 2025-07-28 17:52:36 +09:00
Kinan Martin
a8ecb16d47 use huggingface_hub library to download mls_english 2025-07-28 17:52:36 +09:00
Kinan Martin
f4b29870a0 switch mls_english clone from https to ssh 2025-07-28 17:52:36 +09:00
Kinan Martin
782e1fb958 fix stage 5 output pathing 2025-07-28 17:52:36 +09:00
Kinan Martin
5417e0926b restore version of mls_english compute_fbank_mls_english.py and prepare.sh from commit 547f5c5 2025-07-28 17:52:36 +09:00
Bailey Hirota
6d71d9cff4 remove bilingual tag from train.py 2025-07-28 17:52:28 +09:00
Kinan Martin
3751441dad deprecate params.bilingual=0, replace ReazonSpeechAsrDataModule for MultiDatasetAsrDataModule, not tested yet 2025-07-28 17:49:35 +09:00
Bailey Hirota
61e81bfc26 Revert "add fbank"
This reverts commit ba603e0a0a514056ec6d32677053c41743a1a5dd.
2025-07-28 17:49:35 +09:00
Bailey Hirota
c83b115b49 add fbank 2025-07-28 17:49:35 +09:00
Kinan Martin
abebb6aaf0 new version of multi_ja_en prepare.sh script which swaps Librispeech for MLS English 2025-07-28 17:49:35 +09:00
Kinan Martin
fa84782b21 optimize with num_jobs on save_audios 2025-07-28 17:49:35 +09:00
Kinan Martin
f2e01712de fix stage 2 and 3 2025-07-28 17:49:35 +09:00
Kinan Martin
59519a41fa fix validation manifest name 2025-07-28 17:49:35 +09:00
Kinan Martin
4ca8ee94f0 adjusted prepare.sh to only calculate fbank and manifest together; adjust datamodule to load from manifest files 2025-07-28 17:49:35 +09:00
Kinan Martin
d6e3c98e58 move compute_fbank_mls_english.py, add validate_manifest.py, add shared symlink to librispeech 2025-07-28 17:49:35 +09:00
Kinan Martin
68e3ceaaac instead of on-the-fly features, precompute fbank and manifests in prepare.sh 2025-07-28 17:49:35 +09:00
Kinan Martin
ce44150e25 readme 2025-07-28 17:49:35 +09:00
Kinan Martin
a34d34a38e pre-commit hooks 2025-07-28 17:49:35 +09:00
Kinan Martin
898525962c separate transcript prep stage from bpe train stage 2025-07-28 17:49:35 +09:00
Kinan Martin
8c1c7100d3 symlink copied files to librispeech recipe dir 2025-07-28 17:49:35 +09:00