916 Commits

Author SHA1 Message Date
Fangjun Kuang
60986c3ac1
Fix default value for --context-size in icefall. (#1538) 2024-03-08 20:47:13 +08:00
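
A hedged sketch of how --context-size is typically declared in icefall's transducer scripts, assuming the conventional default of 2 (the stateless decoder then conditions on the previous two tokens); the help text is illustrative, not the exact declaration fixed by this commit:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--context-size",
    type=int,
    default=2,  # 1 gives bigram-like context, 2 gives trigram-like context
    help="Number of previous tokens the stateless decoder conditions on.",
)
args = parser.parse_args([])  # parse defaults, for illustration
```
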
zr_jin
ae61bd4090
Minor fixes for the commonvoice recipe (#1534)
* init commit

* fix for issue https://github.com/k2-fsa/icefall/issues/1531

* minor fixes
2024-03-08 11:01:11 +08:00
Yuekai Zhang
5df24c1685
Whisper large fine-tuning on wenetspeech, multi-hans-zh (#1483)
* add whisper fbank for wenetspeech

* add whisper fbank for other dataset

* add str to bool

* add decode for wenetspeech

* add requirements.txt

* add original model decode with 30s

* test feature extractor speed

* add aishell2 feat

* change compute feature batch

* fix overwrite

* fix executor

* regression

* add kaldifeat whisper fbank

* fix io issue

* parallel jobs

* use multiple machines

* add wenetspeech fine-tune scripts

* add monkey patch code

* remove useless file

* fix subsampling factor

* fix too long audios

* remove too-long and too-short utterances (see the filtering sketch after this commit)

* fix whisper version to support multi-batch beam search

* decode all wav files

* remove utterances longer than 30s in test_net

* only test net

* using soft links

* add kespeech whisper feats

* fix index error

* add manifests for whisper

* change to LilcomChunkyWriter

* add missing option

* decrease CPU usage

* add speed perturb for kespeech

* fix kespeech speed perturb

* add dataset

* load checkpoint from specific path

* add speechio

* add speechio results

---------

Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2024-03-07 19:04:27 +08:00
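
Two recurring pieces of this PR can be sketched under the assumption of a recent lhotse that ships WhisperFbank; the path, duration cutoffs, and filter count below are illustrative:

```python
from lhotse import CutSet, WhisperFbank, WhisperFbankConfig

cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # illustrative path

# Whisper consumes fixed 30-second windows, so over-long utterances are
# dropped before training/decoding (together with degenerate short ones).
cuts = cuts.filter(lambda c: 1.0 <= c.duration <= 30.0)

# 80 mel filters matches whisper large-v2-style checkpoints.
extractor = WhisperFbank(WhisperFbankConfig(num_filters=80))
```
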
zr_jin
cdb3fb5675
add text norm script for pl (#1532) 2024-03-07 18:47:29 +08:00
zr_jin
335a9962de
Fixed formatting issue of PR #1528 (#1530) 2024-03-06 08:43:45 +08:00
Rezakh20
ff430b465f
Add num_features to train.py for training WSASR (#1528) 2024-03-05 16:40:30 +08:00
zr_jin
242002e0bd
Strengthened style constraints (#1527) 2024-03-04 23:28:04 +08:00
Fangjun Kuang
29b195a42e
Update export-onnx.py for vits to support sherpa-onnx. (#1524) 2024-03-01 19:53:58 +08:00
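
sherpa-onnx reads model properties from metadata embedded in the ONNX file. A hedged sketch of attaching such metadata follows; the keys shown are illustrative, not the exact set export-onnx.py writes:

```python
import onnx

model = onnx.load("vits-ljspeech.onnx")  # illustrative filename
for key, value in {
    "model_type": "vits",
    "language": "English",
    "sample_rate": "22050",
}.items():
    entry = model.metadata_props.add()  # a StringStringEntryProto
    entry.key = key
    entry.value = value
onnx.save(model, "vits-ljspeech.onnx")
```
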
zr_jin
58610b1bf6
Provides README.md for TTS recipes (#1491)
* Update README.md
2024-02-29 17:31:28 +08:00
Xiaoyu Yang
7e2b561bbf
Add recipe for fine-tuning Zipformer with adapter (#1512) 2024-02-29 10:57:38 +08:00
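
A minimal bottleneck-adapter sketch in the spirit of this recipe, where only the adapter parameters are trained while the pretrained backbone stays frozen; the dimensions, activation, and where the module attaches inside Zipformer are assumptions, not the recipe's exact code:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, add residual."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.activation = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual keeps the frozen backbone's behavior recoverable
        # when the adapter weights are near zero.
        return x + self.up(self.activation(self.down(x)))
```
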
Zengwei Yao
d89f4ea149
Use piper_phonemize as text tokenizer in ljspeech recipe (#1511)
* use piper_phonemize as text tokenizer in ljspeech recipe

* modify usage of tokenizer in vits/train.py

* update docs
2024-02-29 10:13:22 +08:00
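
A hedged sketch of the tokenizer call, assuming the piper_phonemize Python package; the recipe then maps the returned phoneme symbols to IDs via its tokens.txt:

```python
from piper_phonemize import phonemize_espeak

# Returns one list of phoneme symbols per sentence.
phonemes = phonemize_espeak("How are you?", "en-us")
print(phonemes)  # e.g. [['h', 'ˈa', 'ʊ', ...]]
```
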
Xiaoyu Yang
2483b8b4da
Zipformer recipe for SPGISpeech (#1449) 2024-02-22 15:53:19 +08:00
Wei Kang
aac7df064a
Recipes for open vocabulary keyword spotting (#1428)
* English recipe on gigaspeech; Chinese recipe on wenetspeech
2024-02-22 15:31:20 +08:00
Zengwei Yao
b3e2044068
minor fix of vits/tokenizer.py (#1504)
* minor fix of vits/tokenizer.py
2024-02-19 19:33:32 +08:00
zr_jin
db4d66c0e3
Fixed softlink for ljspeech recipe (#1503) 2024-02-19 16:13:09 +08:00
Wei Kang
711d6bc462
Refactor prepare.sh in librispeech (#1493)
* Refactor prepare.sh in librispeech, breaking it into three parts: prepare.sh (basic, the minimal requirement for transducer training), prepare_lm.sh (n-gram & NNLM stuff), and prepare_mmi.sh (for MMI training).
2024-02-09 10:44:19 +08:00
Tiance Wang
4ed88d9484
Update shared (#1487)
There should be one more ../
2024-02-07 10:16:02 +08:00
Xiaoyu Yang
777074046d
Fine-tune recipe for Zipformer (#1484)
1. Support fine-tuning Zipformer
2. Update the usage; set a very large batch count
2024-02-06 18:25:43 +08:00
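
The "very large batch count" part can be sketched as below, assuming icefall's convention that schedule-dependent modules expose a batch_count attribute; starting fine-tuning with a large count keeps those schedules at their fully annealed values. The value is illustrative:

```python
def set_batch_count(model, batch_count: float) -> None:
    # Modules whose behavior anneals with training progress read this field.
    for module in model.modules():
        if hasattr(module, "batch_count"):
            module.batch_count = batch_count

# e.g. set_batch_count(model, 100000.0) before fine-tuning starts
```
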
zr_jin
a813186f64
minor fix for docstring and default parameter (#1490)
* Update train.py and README.md
2024-02-05 12:47:52 +08:00
Teo Wen Shen
b9e6327adf
Fixing torch.ctc error (#1485)
* fixing torch.ctc error

* Move targets & lengths to CPU
2024-02-03 06:25:27 +08:00
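
A hedged sketch of the workaround: keeping targets and lengths on CPU changes which CTC kernel PyTorch selects, sidestepping the failing GPU path (exact kernel selection depends on the build). Shapes below are illustrative:

```python
import torch

T, N, C = 50, 4, 100  # frames, batch, classes
log_probs = torch.randn(T, N, C).log_softmax(-1)  # on GPU in real training
targets = torch.randint(1, C, (N, 20))
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

loss = torch.nn.functional.ctc_loss(
    log_probs,
    targets.cpu(),          # targets kept on CPU per this fix
    input_lengths.cpu(),
    target_lengths.cpu(),
    blank=0,
    reduction="sum",
)
```
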
Henry Li Xinyuan
b07d5472c5
Implement recipe for Fluent Speech Commands dataset (#1469)
---------

Signed-off-by: Xinyuan Li <xli257@c13.clsp.jhu.edu>
2024-01-31 22:53:36 +08:00
zr_jin
37b975cac9
fixed a CI test for wenetspeech (#1476)
* Comply with issue #1149

https://github.com/k2-fsa/icefall/issues/1149
2024-01-27 06:41:56 +08:00
Yuekai Zhang
1c30847947
Whisper Fine-tuning Recipe on Aishell1 (#1466)
* add decode seamlessm4t

* add requirements

* add decoding with avg model

* add token files

* add custom tokenizer

* support deepspeed to finetune large model

* support large-v3

* add model saving

* using monkey patch to replace models

* add manifest dir option
2024-01-27 00:32:30 +08:00
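
The DeepSpeed hookup for fine-tuning large Whisper models, as a hedged sketch; the stand-in model and config path are illustrative (ZeRO stage, fp16, etc. live in the recipe's JSON config):

```python
import torch.nn as nn
import deepspeed

model = nn.Linear(10, 10)  # stand-in for the Whisper model being fine-tuned
model_engine, optimizer, _, scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # illustrative path to the DeepSpeed config
)
```
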
Fangjun Kuang
8d39f9508b
Fix torchscript export to use tokens.txt instead of lang_dir (#1475) 2024-01-26 19:18:33 +08:00
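
A hedged sketch of the tokens.txt-based approach (the helper name is an assumption): each line is "<symbol> <id>", so the vocabulary size can be derived without passing a full lang_dir at export time:

```python
def num_tokens(tokens_txt: str) -> int:
    """Return the vocabulary size implied by a '<symbol> <id>' token file."""
    max_id = 0
    with open(tokens_txt, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            _, idx = line.rsplit(maxsplit=1)
            max_id = max(max_id, int(idx))
    return max_id + 1
```
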
Zengwei Yao
c401a2646b
minor fix of zipformer/optim.py (#1474) 2024-01-26 15:50:11 +08:00
zr_jin
9c494a3329
typos fixed (#1472) 2024-01-25 18:41:43 +08:00
Triplecq
5d94a19026 prepare for 1000h dataset 2024-01-24 11:33:36 -05:00
Triplecq
d864da4d65 validation scripts 2024-01-25 01:25:28 +09:00
Triplecq
f35fa8aa8f add blank penalty in decoding script 2024-01-23 17:10:10 -05:00
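
Blank penalty is a one-line intervention, sketched here under the common assumption that the blank symbol has ID 0: subtracting a constant from the blank logit before the search step discourages deletions. Values and shapes are illustrative:

```python
import torch

blank_penalty = 1.5                 # typically a small positive constant
logits = torch.randn(8, 500)        # (num_active_hyps, vocab_size)
logits[:, 0] -= blank_penalty       # penalize emitting blank
tokens = logits.argmax(dim=-1)      # greedy step after the penalty
```
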
Triplecq
a8e9dc2488 all combinations of epochs and avgs 2024-01-23 21:12:17 +09:00
Yifan Yang
5dfc3ed7f9
Fix buffer size of DynamicBucketingSampler (#1468)
* Fix buffer size

* Fix for flake8

---------

Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
2024-01-21 02:10:42 +08:00
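
A hedged sketch of where the buffer sizes enter, assuming a recent lhotse; larger buffers improve bucketing and shuffling quality at the cost of memory (the concrete values chosen by this fix are not reproduced here):

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")  # illustrative
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=600.0,
    shuffle=True,
    num_buckets=30,
    buffer_size=30000,          # cuts held in memory to estimate buckets
    shuffle_buffer_size=30000,  # cuts held in memory for shuffling
)
```
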
zr_jin
7bdde9174c
A Zipformer recipe with Byte-level BPE for Aishell-1 (#1464)
* init commit

* Update train.py

* Update decode.py

* Update RESULTS.md

* added `vocab_size`

* removed unused softlinks

* added scripts for testing pretrained models

* set `bpe_model` as required

* re-org the bbpe recipe for aishell
2024-01-16 21:08:35 +08:00
Triplecq
77178c6311 comment out params related to the chunk size 2024-01-14 17:35:20 -05:00
Triplecq
7b6a89749d customize decoding script 2024-01-14 17:29:22 -05:00
Triplecq
04fa9e3e8c training script completed 2024-01-15 07:06:14 +09:00
Triplecq
42c152f5cb decrease learning-rate to solve the error: RuntimeError: grad_scale is too small, exiting: 5.820766091346741e-11 2024-01-14 12:12:15 -05:00
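
For context, a sketch of the icefall-style guard that emits this message, with an illustrative threshold: when AMP's GradScaler keeps shrinking its scale because gradients overflow, training aborts, and lowering the base learning rate is one remedy:

```python
import torch

scaler = torch.cuda.amp.GradScaler()
# ... inside the training loop, after scaler.step()/scaler.update() ...
cur_grad_scale = scaler.get_scale()
if cur_grad_scale < 1.0e-05:  # illustrative threshold
    raise RuntimeError(f"grad_scale is too small, exiting: {cur_grad_scale}")
```
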
Triplecq
ced8a53cdc Merge branch 'master' into rs 2024-01-14 23:05:00 +09:00
Triplecq
819db8fcad Merge branch 'master' of github.com:Triplecq/icefall 2024-01-14 23:00:19 +09:00
Triplecq
dc2d531540 customized recipes for rs 2024-01-14 22:28:53 +09:00
Triplecq
b1de6f266c customized recipes for reazonspeech 2024-01-14 22:28:32 +09:00
Triplecq
1e6fe2eae1 restore 2024-01-14 08:05:49 -05:00
Triplecq
5e9a171b20 customize training script for rs 2024-01-14 07:45:33 -05:00
Triplecq
8eae6ec7d1 Add pruned_transducer_stateless2 from reazonspeech branch 2024-01-14 05:23:26 -05:00
Triplecq
af87726bf2 init zipformer recipe 2024-01-14 19:13:21 +09:00
zr_jin
5445ea6df6
Use shuffled LibriSpeech cuts instead (#1450)
* use shuffled LibriSpeech cuts instead

* leave the old code in comments for reference
2024-01-08 15:09:21 +08:00
zr_jin
b9b56eb879
Minor fixes to the VCTK data prep scripts (#1441)
* Update prepare.sh
2024-01-08 14:28:07 +08:00
Karel Vesely
716b82cc3a
streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448)
- some AudioTransform classes produce audio signals out of the range [-1,+1]
  - Resample produced 1.0079
  - the range [-10,+10] was chosen to still be able to reliably
    distinguish it from the [-32k,+32k] signal...
- this is related to: https://github.com/lhotse-speech/lhotse/issues/1254
2024-01-05 10:21:27 +08:00
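
A sketch of the relaxed sanity check (the exact assertion in streaming_decode.py may differ): values are still expected near [-1,+1], but transforms such as Resample can overshoot slightly, while un-normalized 16-bit audio sits near [-32768,+32767]:

```python
import torch

audio = torch.rand(16000) * 2 - 1  # stand-in for a loaded waveform
assert audio.abs().max() <= 10, (
    f"Audio out of range (max abs {audio.abs().max():.4f}); "
    "this looks like un-normalized 16-bit samples."
)
```
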
Fangjun Kuang
8136ad775b
Use high_freq -400 in computing fbank features. (#1447)
See also https://github.com/k2-fsa/sherpa-onnx/issues/514
2024-01-04 13:59:32 +08:00
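
A hedged sketch of the changed option, assuming lhotse's FbankConfig: Kaldi-style extractors interpret a negative high_freq as an offset from the Nyquist frequency, so -400 at 16 kHz gives an upper mel edge of 8000 - 400 = 7600 Hz:

```python
from lhotse import Fbank, FbankConfig

extractor = Fbank(FbankConfig(sampling_rate=16000, high_freq=-400.0))
```
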
zr_jin
f42258caf8
Update compute_fbank_commonvoice_splits.py (#1437) 2023-12-30 13:03:26 +08:00
Chen
2436597f7f Zipformer recipe 2023-12-28 05:37:40 +09:00