* zipformer/ctc_align.py
- a tool for forced alignment with a CTC model
- provides a timeline and computes per-token and per-utterance acoustic confidences
- based on torchaudio `forced_align()`
- the confidences are computed in several ways (a sketch follows below)
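For context, a minimal sketch of forced alignment and confidence extraction on top of torchaudio's `forced_align()` and `merge_tokens()`; the helper below is illustrative, and the actual ctc_align.py computes the confidences in more than one way:

```python
import torch
import torchaudio.functional as F


def align_one_utterance(log_probs: torch.Tensor, token_ids: list):
    """log_probs: (1, T, C) CTC log-probs of one utterance (batch size 1)."""
    targets = torch.tensor([token_ids], dtype=torch.int32, device=log_probs.device)
    alignments, scores = F.forced_align(log_probs, targets, blank=0)
    frame_probs = scores[0].exp()  # per-frame posterior of the aligned path

    # timeline + per-token confidence: average posterior over each token span
    token_spans = F.merge_tokens(alignments[0], frame_probs, blank=0)
    token_conf = [span.score for span in token_spans]

    # one possible per-utterance confidence: mean of the token confidences
    utt_conf = sum(token_conf) / max(len(token_conf), 1)
    return token_spans, token_conf, utt_conf
```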
other modifications:
- LibriSpeechAsrDataModule extended with a `load_manifest()` method to allow
  passing in a cutset from the CLI
- update the `@custom_fwd` / `@custom_bwd` decorators in scaling.py (a compat
  shim is sketched after this list)
- streaming_decode.py: swap '-' <-> '_' in the errs/recogs/log filenames
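The decorator update presumably bridges the PyTorch deprecation of `torch.cuda.amp.custom_fwd/custom_bwd` in favor of `torch.amp.custom_fwd/custom_bwd(..., device_type=...)`. A minimal sketch of such a shim, assuming the actual scaling.py code may differ:

```python
import torch

if hasattr(torch.amp, "custom_fwd"):  # newer PyTorch (>= 2.4)
    def custom_fwd(func):
        return torch.amp.custom_fwd(func, device_type="cuda")

    def custom_bwd(func):
        return torch.amp.custom_bwd(func, device_type="cuda")
else:  # older PyTorch
    from torch.cuda.amp import custom_fwd, custom_bwd  # noqa: F401
```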
* putting back `custom_bwd`, `custom_fwd`
* integrating remarks from PR
* update of argparse help strings
* ctc_align.py: avoid shadowing a variable
* Finalizing the code:
- applying some CodeRabbit suggestions
- removing `word_table`, `decoding_graph` from aligner API (unused)
- improved consistency of variable names (confidences)
- updated docstrings
- Introduce unified AMP helpers (create_grad_scaler, torch_autocast) to handle
  deprecations in PyTorch ≥2.3.0 (see the sketch after this list)
- Replace direct uses of torch.cuda.amp.GradScaler and torch.cuda.amp.autocast
  with the new helpers across all training and inference scripts
- Update all torch.load calls to include weights_only=False for compatibility
  with newer PyTorch versions
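A minimal sketch of what such helpers might look like; the actual implementations may differ:

```python
import torch


def create_grad_scaler(device: str = "cuda", **kwargs):
    # torch.cuda.amp.GradScaler is deprecated since PyTorch 2.3.0;
    # torch.amp.GradScaler takes the device as its first argument.
    if hasattr(torch.amp, "GradScaler"):
        return torch.amp.GradScaler(device, **kwargs)
    return torch.cuda.amp.GradScaler(**kwargs)


def torch_autocast(device_type: str = "cuda", **kwargs):
    # torch.autocast is the device-agnostic API (PyTorch >= 1.10) that
    # replaces the deprecated torch.cuda.amp.autocast.
    return torch.autocast(device_type=device_type, **kwargs)


# and for checkpoints that store full Python objects, not just tensors:
# checkpoint = torch.load(path, map_location="cpu", weights_only=False)
```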
1. Attach the inf-check hooks if the grad scale gets too small (sketched below).
2. Add try/except to avoid OOM inside the inf-check hooks.
3. Set warmup_start=0.1 to reduce the chance of divergence.
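A rough sketch of points 1 and 2; the hook body and the grad-scale threshold below are illustrative, not the exact icefall code:

```python
import logging
import torch


def register_inf_check_hooks(model: torch.nn.Module) -> None:
    def make_hook(name):
        def hook(module, _inputs, output):
            if not isinstance(output, torch.Tensor):
                return
            try:
                if not torch.isfinite(output.detach()).all():
                    logging.warning(f"inf/nan in the output of module {name}")
            except torch.cuda.OutOfMemoryError as e:
                # isfinite() allocates a temporary of the same size, which
                # can itself run out of memory; skip the check instead
                logging.warning(f"inf-check skipped (OOM): {e}")
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))


# in the training loop (names are illustrative):
# if scaler.get_scale() < 0.01 and not inf_check_attached:
#     register_inf_check_hooks(model)
#     inf_check_attached = True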
* support consistency-regularized CTC (CR-CTC; a loss sketch follows below)
* update the arguments of CR-CTC
* set the default value of cr_loss_masked_scale to 1.0
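For orientation, a rough sketch of a CR-CTC-style loss: plain CTC on two differently augmented views of the input, plus a consistency term (bidirectional KL between the two frame-level distributions). The `cr_loss_scale` weight here is illustrative, and the `cr_loss_masked_scale` mentioned above, which would additionally weight the consistency term at time-masked frames, is omitted:

```python
import torch
import torch.nn.functional as F


def cr_ctc_loss(
    lp_a: torch.Tensor,  # (T, N, C) log-probs from augmented view A
    lp_b: torch.Tensor,  # (T, N, C) log-probs from augmented view B
    targets: torch.Tensor,
    input_lengths: torch.Tensor,
    target_lengths: torch.Tensor,
    cr_loss_scale: float = 0.2,
    blank: int = 0,
) -> torch.Tensor:
    # CTC loss on both views
    ctc = F.ctc_loss(lp_a, targets, input_lengths, target_lengths, blank=blank) \
        + F.ctc_loss(lp_b, targets, input_lengths, target_lengths, blank=blank)
    # consistency regularization: each view is pulled towards the (detached)
    # frame-level distribution of the other view
    cr = F.kl_div(lp_a, lp_b.detach(), log_target=True, reduction="batchmean") \
       + F.kl_div(lp_b, lp_a.detach(), log_target=True, reduction="batchmean")
    return ctc + cr_loss_scale * cr
```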
* minor fix
* refactor code
* update RESULTS.md
- the idea is to support a `--skip-scoring` argument passed to a decoding
  script
- created for Transducer decoding (non-streaming and streaming)
- it could also be done for CTC decoding (not done yet)
- also added `--label` for an extra label in `streaming_decode.py`
- and also added `set_caching_enabled(True)`, which has no effect on
  LibriSpeech, but leads to a faster runtime on databases with long
  recordings (assuming the `librispeech/zipformer` scripts are the
  example scripts for other setups); see the sketch after this list
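A sketch of how these pieces could fit together in a decoding script; the flag types and the scoring entry point are illustrative (icefall's actual flags may be typed differently), while `set_caching_enabled` is lhotse's:

```python
import argparse

from lhotse import set_caching_enabled

parser = argparse.ArgumentParser()
parser.add_argument(
    "--skip-scoring",
    action="store_true",
    help="Only write recognition results; skip WER scoring.",
)
parser.add_argument(
    "--label",
    type=str,
    default="",
    help="Extra label inserted into the errs/recogs/log filenames.",
)
args = parser.parse_args()

# Caches e.g. repeated audio reads. No effect on LibriSpeech (short
# recordings), but speeds up databases with long recordings whose cuts
# repeatedly open the same file.
set_caching_enabled(True)

# ... run decoding, write recogs ...
if not args.skip_scoring:
    pass  # score_results(...)  # hypothetical scoring call
```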
- some AudioTransform classes produce audio signals outside the range [-1,+1]
- Resample produced 1.0079
- the range [-10,+10] was chosen so that such signals can still be reliably
  distinguished from int16-scaled [-32k,+32k] signals (a sketch of the check
  follows)
- this is related to: https://github.com/lhotse-speech/lhotse/issues/1254
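A sketch of the kind of sanity check this tolerance enables (function name hypothetical):

```python
import numpy as np


def check_audio_range(samples: np.ndarray) -> None:
    # Normalized float audio should sit roughly in [-1, +1], but some lhotse
    # AudioTransform classes overshoot slightly (Resample produced 1.0079).
    # A limit of 10.0 tolerates the overshoot while still catching
    # int16-scaled data, whose peaks are near +/-32768.
    peak = float(np.abs(samples).max())
    if peak > 10.0:
        raise ValueError(
            f"audio peak {peak:.4f} > 10.0: looks like int16-scaled samples"
        )
```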