Previously, it was reset after every epoch, which could cause it to
always use only the first part of the GigaSpeech dataset if you choose
a small --giga-prob (see the sketch below).
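A minimal sketch of why the per-epoch reset was problematic (names such as
`make_giga_iter` are illustrative, not the actual recipe code):

```python
import random

def make_giga_iter():
    # stands in for an iterator over the (very large) GigaSpeech cuts
    return iter(range(10_000_000))

giga_prob = 0.05            # --giga-prob
libri_epoch = range(1000)   # stands in for one LibriSpeech epoch

# Old behaviour: the GigaSpeech iterator is re-created every epoch, so only
# roughly len(libri_epoch) * giga_prob items from the *start* of GigaSpeech
# are ever consumed, epoch after epoch.
for epoch in range(3):
    giga_iter = make_giga_iter()  # reset here is the problem
    for _ in libri_epoch:
        if random.random() < giga_prob:
            next(giga_iter)

# New behaviour: create the iterator once and let it persist across epochs,
# so later epochs keep advancing through the rest of GigaSpeech.
giga_iter = make_giga_iter()
for epoch in range(3):
    for _ in libri_epoch:
        if random.random() < giga_prob:
            next(giga_iter)
```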
* zipformer/ctc_align.py
- tool for forced alignment with a CTC model
- provides a timeline and computes per-token and per-utterance acoustic confidences
- based on torchaudio `forced_align()`
- confidences are computed in several ways (see the sketch below)
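A minimal sketch of the alignment core, assuming a CTC model that outputs
per-frame log-probabilities and a tokenized transcript (function and variable
names are illustrative, not the actual ctc_align.py API):

```python
import torch
import torchaudio.functional as F


def align_utterance(log_probs: torch.Tensor, token_ids: list, blank_id: int = 0):
    """
    log_probs: (T, vocab_size) CTC log-probabilities for one utterance.
    token_ids: the transcript as a list of token ids (no blanks).
    """
    targets = torch.tensor([token_ids], dtype=torch.int32)
    # forced_align expects a batch of size 1: (1, T, vocab) and (1, L)
    frame_labels, frame_scores = F.forced_align(
        log_probs.unsqueeze(0), targets, blank=blank_id
    )
    frame_probs = frame_scores[0].exp()  # per-frame posteriors of the aligned path
    # Collapse blanks/repeats into token spans; span.score is the average frame
    # probability of the span, usable as a per-token acoustic confidence.
    spans = F.merge_tokens(frame_labels[0], frame_probs, blank=blank_id)
    per_token_conf = [span.score for span in spans]
    # One simple per-utterance confidence: the mean of per-token confidences
    # (the script offers several variants).
    utt_conf = sum(per_token_conf) / max(len(per_token_conf), 1)
    # span.start / span.end are frame indices; multiply by the model's frame
    # shift (e.g. 0.04 s for a 4x-subsampled 10 ms frontend) to get timestamps.
    return spans, per_token_conf, utt_conf
```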
other modifications:
- LibriSpeechAsrDataModule extended with `load_manifest()` to allow
passing in a cutset from the CLI (see the sketch after this list).
- update @custom_fwd @custom_bwd in scaling.py
- streaming_decode.py: update errs/recogs/log filenames ('-' <-> '_')
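A minimal sketch of the data-module hook (the name `load_manifest` comes from
the change above; the exact signature is an assumption), using lhotse's lazy
manifest loading:

```python
import argparse
from lhotse import CutSet, load_manifest_lazy


def load_manifest(cuts_path: str) -> CutSet:
    """Lazily load a CutSet from a cuts manifest given on the command line."""
    return load_manifest_lazy(cuts_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--cuts",
        type=str,
        required=True,
        help="Path to a cuts manifest, e.g. data/fbank/my_cuts.jsonl.gz",
    )
    args = parser.parse_args()
    cuts = load_manifest(args.cuts)
    print(f"Loaded cuts from {args.cuts}: {cuts}")
```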
* putting back `custom_bwd`, `custom_fwd`
* integrating remarks from the PR review
* update of argparse help strings
* ctc_align.py, avoid shadowing a variable
* Finalizing the code:
- applying some CodeRabbit suggestions.
- removing `word_table`, `decoding_graph` from aligner API (unused)
- improved consistency of variable names (confidences)
- updated docstrings
HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
Paper: https://arxiv.org/abs/2506.02157
Fixes incorrect computation of `encoder_dim` when `encoder_dim` is a comma-separated list of integers by ensuring a numeric (not lexicographic) max is used.
Fixes #2018
- Replace `int(max(params.encoder_dim.split(",")))` (lexicographic max on strings) with `max(_to_int_tuple(params.encoder_dim))` (numeric max); see the illustration below.
- Apply the fix consistently across all affected training scripts.
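A short illustration of the bug and the fix (`_to_int_tuple` here is the simple
split-and-convert helper used by the training scripts):

```python
encoder_dim = "192,256,1024"

# Buggy: max() over strings compares lexicographically, so "256" > "1024".
wrong = int(max(encoder_dim.split(",")))   # -> 256


def _to_int_tuple(s: str):
    """Convert a comma-separated string of ints into a tuple of ints."""
    return tuple(map(int, s.split(",")))


# Fixed: convert to integers first, then take the numeric max.
right = max(_to_int_tuple(encoder_dim))    # -> 1024
```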
- Introduce unified AMP helpers (`create_grad_scaler`, `torch_autocast`) to handle
deprecations in PyTorch ≥2.3.0 (a sketch follows this list)
- Replace direct uses of `torch.cuda.amp.GradScaler` and `torch.cuda.amp.autocast`
with the new utilities across all training and inference scripts
- Update all `torch.load` calls to include `weights_only=False` for compatibility with
newer PyTorch versions
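A minimal sketch of such helpers (the actual icefall implementation may differ
in details), picking the non-deprecated `torch.amp` API when it is available:

```python
import torch


def create_grad_scaler(device: str = "cuda", **kwargs):
    """GradScaler factory that avoids the torch.cuda.amp deprecation warning."""
    if hasattr(torch.amp, "GradScaler"):
        return torch.amp.GradScaler(device, **kwargs)  # PyTorch >= 2.3
    return torch.cuda.amp.GradScaler(**kwargs)         # older PyTorch


def torch_autocast(device_type: str = "cuda", **kwargs):
    """Autocast context manager via the unified torch.amp.autocast API."""
    return torch.amp.autocast(device_type=device_type, **kwargs)


# Typical use together with weights_only=False checkpoint loading
# (paths and variables below are illustrative):
#
#   checkpoint = torch.load("exp/epoch-1.pt", map_location="cpu", weights_only=False)
#   scaler = create_grad_scaler(enabled=True)
#   with torch_autocast(dtype=torch.float16):
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
```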