icefall

Author	SHA1	Message	Date
Yuekai Zhang	a04482d23e	Merge 559f9e2deff33077461428d422d9f03c95988b01 into 693f069de73fd91d7c2009571245d97221cc3a3f	2025-10-08 21:32:01 +05:30
Karel Vesely	693f069de7	zipformer/ctc_align.py (#2020 ) * zipformer/ctc_align.py - tool for forced-alignment with CTC model - provides timeline, computes per-token and per-utterance acoustic confidences - based on torchaudio `forced_align()` - confidences are computed in several ways other modifications: - LibriSpeechAsrDataModel extended with `::load_manifest()` to allow passing-in cutset from CLI. - update @custom_fwd @custom_bwd in scaling.py - streaming_decode.py update errs/recogs/log filenames '-' <-> '_' * putting back `custom_bwd`, `custom_fwd` * integrating remarks from PR * update of argparse help strings * ctc_align.py, avoid shadowing a variable * Finalizing the code: - adding some coderabbit suggestions. - removing `word_table`, `decoding_graph` from aligner API (unused) - improved consistency of variable names (confidences) - updated docstrings	2025-10-06 07:49:37 +08:00
Amir Hussein	729a5ba3ec	IWSLT-Ta ASR/ST (#1362 ) This is a pull request for Dialectal IWSLT-Tunisian 2022 shared task https://iwslt.org/2022/dialect ASR and ST recipes.	2025-09-22 09:58:00 +08:00
Amir Hussein	855536d355	HENT-SRT (#2026 ) HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation Paper: https://arxiv.org/abs/2506.02157	2025-09-20 00:17:53 +08:00
Fangjun Kuang	63563d16d3	Fix setting joiner dim (#2027 ) Fixes incorrect computation of encoder_dim when encoder_dim is a comma-separated list of integers by ensuring numeric (not lexicographic) max is used. Fixes #2018 - Replace int(max(params.encoder_dim.split(","))) (lexicographic max on strings) with max(_to_int_tuple(params.encoder_dim)) (numeric max). - Apply the fix consistently across all affected training scripts.	2025-09-19 09:42:41 +08:00
qweasdzxcvde	0c7ce5256f	add tot_score inf mask to make training stable (#2019 ) I find there are some inf in tot_score, it makes model cannot converge, add inf mask can make training more stable.	2025-09-08 14:36:12 +08:00
Fangjun Kuang	34fc1fdf0d	Fix transformer decoder layer (#1995 )	2025-07-18 20:12:29 +08:00
Bailey Machiko Hirota	5fe13078cc	Musan implementation for ReazonSpeech (#1988 )	2025-07-18 17:16:19 +08:00
Yifan Yang	9fd0f2dc1d	support left pad for make_pad_mask (#1990 )	2025-07-16 23:59:04 +08:00
Fangjun Kuang	e22bc78f98	Export streaming zipformer2 to RKNN (#1977 )	2025-07-11 13:24:01 +08:00
Teo Wen Shen	da87e7fc99	add weights_only=False to torch.load (#1984 )	2025-07-10 15:27:08 +08:00
Yifan Yang	89728dd4f8	Refactor data preparation for GigaSpeech recipe (#1986 )	2025-07-10 11:17:37 +08:00
Mistmoon	9293edc62f	Add cr-ctc loss and ctc-decode in aishell (#1980 )	2025-07-08 14:47:24 +08:00
Fangjun Kuang	fba5e67d5e	Fix CI tests. (#1974 ) - Introduce unified AMP helpers (create_grad_scaler, torch_autocast) to handle deprecations in PyTorch ≥2.3.0 - Replace direct uses of torch.cuda.amp.GradScaler and torch.cuda.amp.autocast with the new utilities across all training and inference scripts - Update all torch.load calls to include weights_only=False for compatibility with newer PyTorch versions	2025-07-01 13:47:55 +08:00
Fangjun Kuang	71377d21cd	Export streaming zipformer models with whisper feature to onnx (#1973 )	2025-06-30 19:01:15 +08:00
Fangjun Kuang	abd9437e6d	Add more wheels for piper-phonemize (#1969 )	2025-06-24 14:49:16 +08:00
Wei Kang	e1cf4dbace	rm zipvoice (#1967 )	2025-06-23 19:22:35 +08:00
Wei Kang	343b8fa2dc	Using non strict match in context graph for contextual words (#1952 )	2025-06-19 12:27:15 +08:00
Wei Kang	f80a2ee110	Decrease num_buckets & remove shuffle_buffer_size (#1955 )	2025-06-19 12:26:37 +08:00
Wei Kang	3587c4b3b7	Fix decoding byte bpes tokens to words. (#1966 )	2025-06-19 12:26:01 +08:00
Wei Kang	762f965cf7	[zipvoice] Add requirements.txt and pinyin.txt, remove k2 from pretrained model inference. (#1965 ) * Add requirements.txt and pinyin.txt needed by zipvoice * simplify the requirements for pretrained model inference	2025-06-18 18:38:46 +08:00
Wei Kang	06539d2b9d	Add Zipvoice (#1964 ) * Add ZipVoice - a flow-matching based zero-shot TTS model.	2025-06-17 20:17:12 +08:00
root	559f9e2def	fix repeat bos and pad id	2025-06-04 10:02:42 +00:00
root	80677a55f8	remove stats	2025-06-03 00:48:39 -07:00
root	5becf6927d	remove concat three items	2025-06-03 00:18:21 -07:00
root	4c0396f8f2	support text2speech ultrachat	2025-06-02 23:16:03 -07:00
root	49256fa917	fix tts stage decode	2025-05-28 02:34:07 +00:00
root	5a7c72cb47	add tts task decode	2025-05-27 02:12:22 -07:00
root	1281d7a515	add tts training	2025-05-27 00:18:23 -07:00
Zengwei Yao	ffb7d05635	refactor branch exchange in cr-ctc (#1954 )	2025-05-27 12:09:59 +08:00
root	39700d5c94	refactor train to reuse code	2025-05-26 19:53:16 -07:00
root	e6e1f3fa4f	add tts stage	2025-05-23 01:53:05 -07:00
root	dd858f0cd1	support instruct s2s	2025-05-22 23:16:33 -07:00
root	9fff18edec	refactor code	2025-05-22 19:14:52 -07:00
Mahsa Yarmohammadi	021e1a8846	Add acknowledgment to README (#1950 )	2025-05-22 22:06:35 +08:00
root	7a12d88d6c	update	2025-05-21 22:18:57 -07:00
root	7aa6c80ddb	add multi gpu processing	2025-05-21 21:54:59 -07:00
Tianxiang Zhao	30e7ea4b5a	Fix a bug in finetune.py --use-mux (#1949 )	2025-05-22 12:05:01 +08:00
Fangjun Kuang	fd8f8780fa	Fix logging torch.dtype. (#1947 )	2025-05-21 12:04:57 +08:00
root	ca84aff5d6	remove cosyvoice lib	2025-05-20 00:52:09 -07:00
root	9cdd393f43	add server url	2025-05-20 07:48:49 +00:00
root	50fc1aba60	add multi-node	2025-05-18 18:47:22 -07:00
root	4a29430349	add loss type	2025-05-19 01:31:21 +00:00
root	e52581e69b	support local_rank for multi-node	2025-05-16 00:02:12 -07:00
root	0e8c1db4d0	fix speed perturb issue	2025-05-15 22:45:04 -07:00
root	bfb4ebeb83	remove triton	2025-05-15 14:32:49 +00:00
root	f81363d324	add speech continuation pretraining	2025-05-15 14:16:51 +00:00
root	e65725810c	fix mmsu	2025-05-13 09:13:12 +00:00
root	cbf3af31fd	add voicebench eval	2025-05-13 05:37:11 +00:00
Yifan Yang	e79833aad2	ensure SwooshL/SwooshR output dtype matches input dtype (#1940 )	2025-05-12 19:28:48 +08:00

1 2 3 4 5 ...

1282 Commits