Seung Hyun Lee
c13c7aa30b
Add Streaming Zipformer-Transducer recipe for KsponSpeech ( #1651 )
2024-06-16 16:20:44 +08:00
Yuekai Zhang
890eeec82c
Add qwen-audio style model training: using whisper + qwen2 ( #1652 )
2024-06-16 12:14:44 +08:00
Triplecq
3b40d9bbb1
Zipformer recipe for ReazonSpeech ( #1611 )
...
* Add first cut at ReazonSpeech recipe
This recipe is mostly based on egs/csj, but tweaked to the point that
can be run with ReazonSpeech corpus.
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
---------
Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
Co-authored-by: Fujimoto Seiji <fujimoto@ceptord.net>
Co-authored-by: Chen <qc@KDM00.cm.cluster>
Co-authored-by: root <root@KDA01.cm.cluster>
2024-06-13 14:19:03 +08:00
Yuekai Zhang
d5be739639
add distill whisper results ( #1648 )
2024-06-13 00:20:04 +08:00
Fangjun Kuang
b88062292b
Typo fixes ( #1643 )
2024-06-03 16:49:21 +08:00
zr_jin
1adf1e441d
Removed unused `k2` dependencies from the AT recipe ( #1633 )
2024-05-21 18:22:19 +08:00
Zengwei Yao
0df406c5da
Initialize BiasNorm bias with small random values ( #1630 )
2024-05-20 22:32:02 +08:00
zr_jin
68980c5d0a
Fix an error occurred during mmi preparation ( #1626 )
...
* init commit
* updated
2024-05-17 19:45:15 +08:00
zr_jin
9d570870cf
Update asr_datamodule.py ( #1619 )
2024-05-07 21:37:55 +08:00
Yuekai Zhang
6d7c1d13a5
update speechio whisper ft results ( #1605 )
...
* update speechio whisper ft results
2024-04-30 11:49:20 +08:00
Wei Kang
b49351fc39
Update README.md for conformer-ctc ( #1609 )
2024-04-28 09:56:13 +08:00
Dongji Gao
9a17f4ce41
add OTC related scripts using phone as units instead of BPEs ( #1602 )
...
* add otc related scripts using phone instead of bpe
2024-04-26 00:55:44 +08:00
zzasdf
25cabb7663
fix error in padding computing ( #1607 )
2024-04-25 22:40:07 +08:00
Xiaoyu Yang
df36f93bd8
add small-scaled model for audio tagging ( #1604 )
2024-04-24 17:00:42 +08:00
zr_jin
9f8f0bceb5
Update prepare.sh ( #1601 )
2024-04-20 23:02:02 +09:00
Yifan Yang
ed6bc200e3
Update train.py ( #1590 )
2024-04-11 19:35:25 +08:00
Fangjun Kuang
ba5b2e854b
Return probs in audio tagging onnx models ( #1586 )
2024-04-10 09:03:30 +08:00
Fangjun Kuang
fa5d861af0
Add CI test for the AudioSet recipe. ( #1585 )
2024-04-09 17:45:00 +08:00
yh646492956
f5d7818733
fix run.sh script in wenetspeech KWS ( #1584 )
...
Co-authored-by: Hao You <13182720519@sina.cn>
2024-04-09 15:16:12 +08:00
Xiaoyu Yang
1732dafe24
Add zipformer recipe for audio tagging ( #1421 )
2024-04-09 12:06:14 +08:00
zr_jin
f2e36ec414
Zipformer recipe for CommonVoice ( #1546 )
...
* added scripts for char-based lang prep training scripts
* added `Zipformer` recipe for commonvoice
---------
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-04-09 11:37:08 +08:00
Yifan Yang
87843e9382
k2SSL: a Faster and Better Framework for Self-Supervised Speech Representation Learning ( #1500 )
...
* Add k2SSL
* fix flake8
* fix for black
* fix for black
* fix for black
* Update ssl_datamodule.py
* Fix bugs in HubertDataset
* update comments
* add librilight
* add checkpoint convert script
* format
---------
Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
Co-authored-by: zzasdf <15218404468@163.com>
2024-04-04 23:29:16 +08:00
Wei Kang
9369c2bef9
Add comments to prepare.sh in aidatatang ( #1575 )
2024-04-02 16:08:09 +08:00
Dadoou
6cbddaa8e3
Add base choice to model_name argument for whisper model. ( #1573 )
...
Co-authored-by: dadoou <dadoou@yandex.com>
2024-04-02 09:47:38 +08:00
marcoyang
6b2bd0fb52
support fine-tuning mono-lingual whisper model; add ScaledAdam as an option
2024-03-29 15:29:50 +08:00
marcoyang
f208431f5c
support on-the-fly whisper fbank extraction
2024-03-29 11:03:58 +08:00
marcoyang
4d9f2120b3
update comments; generate train-all-shuf after feature extraction
2024-03-29 11:03:37 +08:00
marcoyang
55a6857df6
add an option to use hdf5 for whisper fbank extraction
2024-03-29 11:02:48 +08:00
marcoyang
5d41deca71
update the decoding script
2024-03-28 18:16:52 +08:00
marcoyang
cfbc829df3
support freezing modules
2024-03-28 18:16:33 +08:00
marcoyang
360f208037
deactivate beam search temporarily for speed
2024-03-28 16:17:05 +08:00
marcoyang
ebc0f3b052
update train.py
2024-03-28 16:16:18 +08:00
marcoyang
711859c21f
fix typo
2024-03-28 16:14:44 +08:00
marcoyang
eb685364df
generate train-all-shuf for whisper fbank
2024-03-28 15:56:04 +08:00
marcoyang
76e0d59267
support decoding
2024-03-28 15:23:19 +08:00
marcoyang
1cf78fd675
fbank for whisper
2024-03-28 12:37:44 +08:00
marcoyang
c2f8c6d232
add files
2024-03-28 12:33:23 +08:00
Wei Kang
42de459110
Fix decoding finetune model ( #1568 )
2024-03-26 10:38:21 +08:00
Wei Kang
b156b6c291
Add use-mux to finetune commands ( #1567 )
2024-03-26 09:42:46 +08:00
Zengwei Yao
353469182c
fix issue in zipformer.py ( #1566 )
2024-03-21 15:59:43 +08:00
Xiaoyu Yang
bddc3fca7a
Fix adapter in streaming_forward ( #1560 )
2024-03-21 15:08:58 +08:00
zr_jin
413220d6a4
Minor fixes for the `multi_zh_en` recipe ( #1526 )
2024-03-18 20:25:57 +08:00
Fangjun Kuang
489263e5bb
Add streaming HLG decoding for zipformer CTC. ( #1557 )
...
Note it supports only CPU.
2024-03-18 20:11:47 +08:00
Karel Vesely
4917ac8bab
allow export of onnx-streaming-models with other than 80dim input features ( #1556 )
2024-03-18 18:43:29 +08:00
zr_jin
eec12f053d
Use piper_phonemize as text tokenizer in vctk TTS recipe ( #1522 )
...
* to align with PR #1524
2024-03-18 17:53:52 +08:00
zr_jin
9b0eae3b4a
fixes for init value of `diagnostics.TensorDiagnosticOptions` ( #1555 )
2024-03-18 17:14:29 +08:00
zr_jin
bf2f94346c
Enabling `char_level` and `compute_CER` for `aishell` recipe ( #1554 )
...
* init fix
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-03-18 11:57:47 +08:00
Xiaoyu Yang
2dfd5dbf8b
Add LoRA for Zipformer ( #1540 )
2024-03-15 17:19:23 +08:00
zr_jin
eb132da00d
additional instruction for the `grad_scale is too small` error ( #1550 )
2024-03-14 11:33:49 +08:00
Fangjun Kuang
15bd9a841e
add CI for ljspeech ( #1548 )
2024-03-13 17:39:01 +08:00