Fangjun Kuang
34fc1fdf0d
Fix transformer decoder layer ( #1995 )
2025-07-18 20:12:29 +08:00
Bailey Machiko Hirota
5fe13078cc
Musan implementation for ReazonSpeech ( #1988 )
2025-07-18 17:16:19 +08:00
Yifan Yang
9fd0f2dc1d
support left pad for make_pad_mask ( #1990 )
2025-07-16 23:59:04 +08:00
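The commit above adds left-padding support to `make_pad_mask`. A minimal sketch of the semantics, assuming a `pad_left` flag (the parameter name and exact signature are assumptions; see #1990 for the actual implementation):

```python
import torch

def make_pad_mask(lengths: torch.Tensor, max_len: int = 0,
                  pad_left: bool = False) -> torch.Tensor:
    """Return a bool mask of shape (batch, max_len) where True marks padding.

    With right padding, positions >= length are padding; with left padding
    (pad_left=True), the padding occupies the leading positions instead.
    """
    max_len = max(max_len, int(lengths.max()))
    n = lengths.size(0)
    seq = torch.arange(max_len, device=lengths.device).expand(n, max_len)
    if pad_left:
        # position i is padding if i < max_len - length
        return seq < (max_len - lengths).unsqueeze(1)
    # right padding: position i is padding if i >= length
    return seq >= lengths.unsqueeze(1)
```

For `lengths = [2, 3]`, right padding masks the trailing position of the shorter sequence, while left padding masks its leading position.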
Fangjun Kuang
e22bc78f98
Export streaming zipformer2 to RKNN ( #1977 )
2025-07-11 13:24:01 +08:00
Teo Wen Shen
da87e7fc99
add weights_only=False to torch.load ( #1984 )
2025-07-10 15:27:08 +08:00
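For context on the commit above: PyTorch 2.6 flipped the default of `torch.load` to `weights_only=True`, which rejects checkpoints containing arbitrary Python objects (epoch counters, optimizer state, etc.). Passing `weights_only=False` restores the old behaviour; it re-enables pickle execution, so it should only be used on trusted checkpoints. A small self-contained illustration:

```python
import os
import tempfile

import torch

# A checkpoint holding more than bare tensors (a plain int here).
checkpoint = {"model": {"w": torch.zeros(2)}, "epoch": 3}
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(checkpoint, path)

# weights_only=False allows loading the full pickled object graph.
loaded = torch.load(path, map_location="cpu", weights_only=False)
```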
Yifan Yang
89728dd4f8
Refactor data preparation for GigaSpeech recipe ( #1986 )
2025-07-10 11:17:37 +08:00
Mistmoon
9293edc62f
Add cr-ctc loss and ctc-decode in aishell ( #1980 )
2025-07-08 14:47:24 +08:00
Fangjun Kuang
fba5e67d5e
Fix CI tests. ( #1974 )
...
- Introduce unified AMP helpers (create_grad_scaler, torch_autocast) to handle
deprecations in PyTorch ≥2.3.0
- Replace direct uses of torch.cuda.amp.GradScaler and torch.cuda.amp.autocast
with the new utilities across all training and inference scripts
- Update all torch.load calls to include weights_only=False for compatibility with
newer PyTorch versions
2025-07-01 13:47:55 +08:00
Fangjun Kuang
71377d21cd
Export streaming zipformer models with whisper feature to onnx ( #1973 )
2025-06-30 19:01:15 +08:00
Fangjun Kuang
abd9437e6d
Add more wheels for piper-phonemize ( #1969 )
2025-06-24 14:49:16 +08:00
Wei Kang
e1cf4dbace
rm zipvoice ( #1967 )
2025-06-23 19:22:35 +08:00
Wei Kang
343b8fa2dc
Using non strict match in context graph for contextual words ( #1952 )
2025-06-19 12:27:15 +08:00
Wei Kang
f80a2ee110
Decrease num_buckets & remove shuffle_buffer_size ( #1955 )
2025-06-19 12:26:37 +08:00
Wei Kang
3587c4b3b7
Fix decoding byte bpes tokens to words. ( #1966 )
2025-06-19 12:26:01 +08:00
Wei Kang
762f965cf7
[zipvoice] Add requirements.txt and pinyin.txt, remove k2 from pretrained model inference. ( #1965 )
...
* Add requirements.txt and pinyin.txt needed by zipvoice
* simplify the requirements for pretrained model inference
2025-06-18 18:38:46 +08:00
Wei Kang
06539d2b9d
Add Zipvoice ( #1964 )
...
* Add ZipVoice - a flow-matching based zero-shot TTS model.
2025-06-17 20:17:12 +08:00
Zengwei Yao
ffb7d05635
refactor branch exchange in cr-ctc ( #1954 )
2025-05-27 12:09:59 +08:00
Mahsa Yarmohammadi
021e1a8846
Add acknowledgment to README ( #1950 )
2025-05-22 22:06:35 +08:00
Tianxiang Zhao
30e7ea4b5a
Fix a bug in finetune.py --use-mux ( #1949 )
2025-05-22 12:05:01 +08:00
Fangjun Kuang
fd8f8780fa
Fix logging torch.dtype. ( #1947 )
2025-05-21 12:04:57 +08:00
Yifan Yang
e79833aad2
ensure SwooshL/SwooshR output dtype matches input dtype ( #1940 )
2025-05-12 19:28:48 +08:00
Yifan Yang
4627969ccd
fix bug: undefined name 'partial' ( #1941 )
2025-05-12 14:19:53 +08:00
Yifan Yang
cd7caf12df
Fix speech_llm recipe ( #1936 )
...
* fix training/decoding scripts, cleanup unused code, and ensure compliance with style checks
---------
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2025-04-30 11:41:00 +08:00
Fangjun Kuang
cc2e64a6aa
Fix convert_texts_into_ids() in the tedlium3 recipe. ( #1929 )
2025-04-24 17:04:46 +08:00
Yifan Yang
5ec95e5482
Fix SpeechLLM recipe ( #1926 )
2025-04-23 16:18:38 +08:00
math345
64c5364085
Fix bug: When resuming training from a checkpoint, model_avg was not assigned, resulting in a None error. ( #1914 )
2025-04-10 11:37:28 +08:00
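A hedged sketch of the fix described above: in icefall-style training, `model_avg` holds a float64 running average of the model weights, and when resuming from a checkpoint it must be re-initialized rather than left as `None`. The helper name and checkpoint layout below are assumptions for illustration:

```python
import copy

import torch

def init_model_avg(model: torch.nn.Module, checkpoint: dict) -> torch.nn.Module:
    """Return a usable model_avg when resuming training.

    If the checkpoint carries an averaged model, reuse it; otherwise fall
    back to a fresh float64 copy of the current model so later averaging
    steps never see None.
    """
    if checkpoint.get("model_avg") is not None:
        return checkpoint["model_avg"]
    return copy.deepcopy(model).to(torch.float64)
```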
Fangjun Kuang
300a821f58
Fix aishell training ( #1916 )
2025-04-10 10:30:37 +08:00
Fangjun Kuang
171cf8c9fe
Avoid redundant computation in PiecewiseLinear. ( #1915 )
2025-04-09 11:52:37 +08:00
Wei Kang
86bd16d496
[KWS]Remove graph compiler ( #1905 )
2025-04-02 22:10:06 +08:00
Fangjun Kuang
db9fb8ad31
Add scripts to export streaming zipformer(v1) to RKNN ( #1882 )
2025-02-27 17:10:58 +08:00
Yuekai Zhang
2ba665abca
Add F5-TTS with semantic token training results ( #1880 )
...
* add cosy token
* update inference code
* add extract cosy token
* update results
* add requirements.txt
* update readme
---------
Co-authored-by: yuekaiz <yuekaiz@h20-7.cm.cluster>
Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>
2025-02-24 13:58:47 +08:00
Machiko Bailey
da597ad782
Update RESULTS.md ( #1873 )
2025-02-04 09:04:25 +08:00
Machiko Bailey
0855b0338a
Merge japanese-to-english multilingual branch ( #1860 )
...
* add streaming support to reazonresearch
* update README for streaming
* Update RESULTS.md
* add onnx decode
---------
Co-authored-by: root <root@KDA03.cm.cluster>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Co-authored-by: root <root@KDA01.cm.cluster>
Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2025-02-04 01:33:09 +08:00
Yuekai Zhang
dd5d7e358b
F5-TTS Training Recipe for WenetSpeech4TTS ( #1846 )
...
* add f5
* add infer
* add dit
* add README
* update pretrained checkpoint usage
---------
Co-authored-by: yuekaiz <yuekaiz@h20-5.cm.cluster>
Co-authored-by: yuekaiz <yuekaiz@l20-3.cm.cluster>
Co-authored-by: yuekaiz <yuekaiz@h20-6.cm.cluster>
Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2025-01-27 16:33:02 +08:00
zr_jin
39c466e802
Update shared ( #1868 )
2025-01-21 11:04:11 +08:00
zr_jin
79074ef0d4
removed the erroneous "continual" implementation ( #1865 )
2025-01-16 20:51:28 +08:00
zr_jin
8ab0352e60
Update style_check.yml ( #1866 )
2025-01-16 17:36:09 +08:00
Han Zhu
ab91112909
Improve infinity-check ( #1862 )
...
1. Attach the inf-check hooks if the grad scale is getting too small.
2. Add try-catch to avoid OOM in the inf-check hooks.
3. Set warmup_start=0.1 to reduce chances of divergence
2025-01-09 15:05:38 +08:00
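Points 1 and 2 of the commit body above can be sketched as forward hooks that warn on non-finite outputs, attached only once the grad scale drops below a threshold, with the check wrapped in try/except so an OOM inside the check cannot kill training. Names and the threshold value are assumptions, not the actual implementation:

```python
import logging

import torch

GRAD_SCALE_THRESHOLD = 0.01  # assumed value for "too small"

def attach_inf_check_hooks(model: torch.nn.Module) -> None:
    def hook(module, inputs, output):
        try:
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                logging.warning("non-finite output in %s", type(module).__name__)
        except RuntimeError:
            # e.g. CUDA OOM raised while evaluating the check itself.
            pass
    for m in model.modules():
        m.register_forward_hook(hook)

def maybe_attach(model: torch.nn.Module, grad_scale: float) -> bool:
    """Attach the hooks only when the grad scale looks suspicious."""
    if grad_scale < GRAD_SCALE_THRESHOLD:
        attach_inf_check_hooks(model)
        return True
    return False
```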
Seonuk Kim
8d602806c3
Update conformer.py ( #1859 )
...
* Update conformer.py
feedforward dimention -> feedforward dimension
* Update conformer.py
Swich -> Swish
2025-01-06 17:31:13 +08:00
Seonuk Kim
3b6d54007b
Update conformer.py ( #1857 )
...
* Update conformer.py
feedforward dimention -> feedforward dimension
2025-01-06 13:17:02 +08:00
Fangjun Kuang
3b263539cd
Publish MatchaTTS onnx models trained with LJSpeech to huggingface ( #1854 )
2025-01-02 15:54:34 +08:00
Fangjun Kuang
bfffda5afb
Add MatchaTTS for the Chinese dataset Baker ( #1849 )
2024-12-31 17:17:05 +08:00
Han Zhu
df46a3eaf9
Warn instead of raising exceptions in inf-check ( #1852 )
2024-12-31 16:52:06 +08:00
Yifan Yang
a2b0f6057c
Small fix ( #1853 )
2024-12-31 07:41:44 +08:00
Han Zhu
48088cb807
Refactor optimizer ( #1837 )
...
* Print indexes of largest grad
2024-12-30 15:30:02 +08:00
Han Zhu
57e9f2a8db
Add the "rms-sort" diagnostics ( #1851 )
2024-12-30 15:27:05 +08:00
Fangjun Kuang
ad966fb81d
Minor fixes to the onnx inference script for ljspeech matcha-tts. ( #1838 )
2024-12-19 15:19:41 +08:00
Fangjun Kuang
92ed1708c0
Add torch 1.13 and 2.0 to CI tests ( #1840 )
2024-12-18 16:50:14 +08:00
Fangjun Kuang
d4d4f281ec
Revert "Replace deprecated pytorch methods ( #1814 )" ( #1841 )
...
This reverts commit 3e4da5f78160d3dba3bdf97968bd7ceb8c11631f.
2024-12-18 16:49:57 +08:00
Li Peng
3e4da5f781
Replace deprecated pytorch methods ( #1814 )
...
* Replace deprecated pytorch methods
- torch.cuda.amp.GradScaler(...) => torch.amp.GradScaler("cuda", ...)
- torch.cuda.amp.autocast(...) => torch.amp.autocast("cuda", ...)
* Replace `with autocast(...)` with `with autocast("cuda", ...)`
Co-authored-by: Li Peng <lipeng@unisound.ai>
2024-12-16 10:24:16 +08:00