1191 Commits

Author SHA1 Message Date
Fangjun Kuang
130a18cc10
support torch 2.3.1 in docker (#1646) 2024-06-06 22:27:29 +08:00
Fangjun Kuang
b88062292b
Typo fixes (#1643) 2024-06-03 16:49:21 +08:00
zr_jin
42a97f6d7b
Update env.py (#1635) 2024-05-22 22:29:38 +08:00
zr_jin
1adf1e441d
Removed unused `k2` dependencies from the AT recipe (#1633) 2024-05-21 18:22:19 +08:00
Zengwei Yao
0df406c5da
Initialize BiasNorm bias with small random values (#1630) 2024-05-20 22:32:02 +08:00
zr_jin
68980c5d0a
Fix an error occurred during mmi preparation (#1626)
* init commit

* updated
2024-05-17 19:45:15 +08:00
zr_jin
9d570870cf
Update asr_datamodule.py (#1619) 2024-05-07 21:37:55 +08:00
Yifan Yang
4e97b19b63
Remove duplicate logging initialization logic in utils.py (#1617) 2024-05-06 13:00:27 +08:00
Zengwei Yao
c08fe48603
add force=True to logging.basicConfig (#1613) 2024-05-04 11:42:23 +08:00
Yuekai Zhang
6d7c1d13a5
update speechio whisper ft results (#1605)
* update speechio whisper ft results
2024-04-30 11:49:20 +08:00
Wei Kang
b49351fc39
Update README.md for conformer-ctc (#1609) 2024-04-28 09:56:13 +08:00
Dongji Gao
9a17f4ce41
add OTC related scripts using phone as units instead of BPEs (#1602)
* add otc related scripts using phone instead of bpe
2024-04-26 00:55:44 +08:00
zzasdf
25cabb7663
fix error in padding computing (#1607) 2024-04-25 22:40:07 +08:00
Xiaoyu Yang
df36f93bd8
add small-scaled model for audio tagging (#1604) 2024-04-24 17:00:42 +08:00
Yifan Yang
368b7d10a7
clear log handlers before setup (#1603) 2024-04-24 15:31:25 +09:00
zr_jin
9f8f0bceb5
Update prepare.sh (#1601) 2024-04-20 23:02:02 +09:00
Yifan Yang
ed6bc200e3
Update train.py (#1590) 2024-04-11 19:35:25 +08:00
Fangjun Kuang
ba5b2e854b
Return probs in audio tagging onnx models (#1586) 2024-04-10 09:03:30 +08:00
Fangjun Kuang
fa5d861af0
Add CI test for the AudioSet recipe. (#1585) 2024-04-09 17:45:00 +08:00
yh646492956
f5d7818733
fix run.sh script in wenetspeech KWS (#1584)
Co-authored-by: Hao You <13182720519@sina.cn>
2024-04-09 15:16:12 +08:00
Xiaoyu Yang
1732dafe24
Add zipformer recipe for audio tagging (#1421) 2024-04-09 12:06:14 +08:00
zr_jin
f2e36ec414
Zipformer recipe for CommonVoice (#1546)
* added scripts for char-based lang prep training scripts

* added `Zipformer` recipe for commonvoice

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-04-09 11:37:08 +08:00
Yifan Yang
87843e9382
k2SSL: a Faster and Better Framework for Self-Supervised Speech Representation Learning (#1500)
* Add k2SSL

* fix flake8

* fix for black

* fix for black

* fix for black

* Update ssl_datamodule.py

* Fix bugs in HubertDataset

* update comments

* add librilight

* add checkpoint convert script

* format

---------

Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
Co-authored-by: zzasdf <15218404468@163.com>
2024-04-04 23:29:16 +08:00
Fangjun Kuang
c45e9fecfb
support torch 2.2.2 in docker images (#1578) 2024-04-03 11:26:24 +08:00
Wei Kang
9369c2bef9
Add comments to prepare.sh in aidatatang (#1575) 2024-04-02 16:08:09 +08:00
Dadoou
6cbddaa8e3
Add base choice to model_name argument for whisper model. (#1573)
Co-authored-by: dadoou <dadoou@yandex.com>
2024-04-02 09:47:38 +08:00
Wei Kang
42de459110
Fix decoding finetune model (#1568) 2024-03-26 10:38:21 +08:00
Wei Kang
b156b6c291
Add use-mux to finetune commands (#1567) 2024-03-26 09:42:46 +08:00
Fangjun Kuang
bb9ebcfb06
Fix CI (#1563) 2024-03-23 09:27:28 +08:00
Zengwei Yao
353469182c
fix issue in zipformer.py (#1566) 2024-03-21 15:59:43 +08:00
Xiaoyu Yang
bddc3fca7a
Fix adapter in streaming_forward (#1560) 2024-03-21 15:08:58 +08:00
Fangjun Kuang
387833fb7c
Doc: Add huggingface mirror for users from China. (#1565) 2024-03-21 12:05:30 +08:00
zr_jin
d5cd78a637
Update hooks.py (#1564) 2024-03-20 16:43:45 +08:00
zr_jin
9bd30853ae
Update diagnostics.py (#1562) 2024-03-20 15:35:14 +08:00
zr_jin
413220d6a4
Minor fixes for the multi_zh_en recipe (#1526) 2024-03-18 20:25:57 +08:00
Fangjun Kuang
489263e5bb
Add streaming HLG decoding for zipformer CTC. (#1557)
Note it supports only CPU.
2024-03-18 20:11:47 +08:00
Karel Vesely
4917ac8bab
allow export of onnx-streaming-models with other than 80dim input features (#1556) 2024-03-18 18:43:29 +08:00
zr_jin
eec12f053d
Use piper_phonemize as text tokenizer in vctk TTS recipe (#1522)
* to align with PR #1524
2024-03-18 17:53:52 +08:00
zr_jin
9b0eae3b4a
fixes for init value of diagnostics.TensorDiagnosticOptions (#1555) 2024-03-18 17:14:29 +08:00
zr_jin
bf2f94346c
Enabling char_level and compute_CER for aishell recipe (#1554)
* init fix

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-03-18 11:57:47 +08:00
Xiaoyu Yang
2dfd5dbf8b
Add LoRA for Zipformer (#1540) 2024-03-15 17:19:23 +08:00
Xiaoyu Yang
f28c05f4f5
Documentation for adapter fine-tuning (#1545) 2024-03-14 12:18:49 +08:00
zr_jin
eb132da00d
additional instruction for the grad_scale is too small error (#1550) 2024-03-14 11:33:49 +08:00
Fangjun Kuang
15bd9a841e
add CI for ljspeech (#1548) 2024-03-13 17:39:01 +08:00
Fangjun Kuang
d406b41cbd
Doc: Add page for installing piper-phonemize (#1547) 2024-03-13 11:01:18 +08:00
zr_jin
c3f6f28116
Zipformer recipe for Cantonese dataset MDCC (#1537)
* init commit

* Create README.md

* handle code switching cases

* misc. fixes

* added manifest statistics

* init commit for the zipformer recipe

* added scripts for exporting model

* added RESULTS.md

* added scripts for streaming related stuff

* doc str fixed
2024-03-13 10:01:28 +08:00
Fangjun Kuang
81f518ea7c
Support different tts model types. (#1541) 2024-03-12 22:29:21 +08:00
BannerWang
959906e9dc
Correct alimeeting download link (#1544)
Co-authored-by: BannerWang <banner.wang@upblocks.io>
2024-03-12 12:44:09 +08:00
jimmy1984xu
e472fa6840
fix CutMix init parameter (#1543)
Co-authored-by: jimmyxu <jimmyxu@upblocks.io>
2024-03-11 18:37:26 +08:00
Fangjun Kuang
60986c3ac1
Fix default value for --context-size in icefall. (#1538) 2024-03-08 20:47:13 +08:00