1219 Commits

Author SHA1 Message Date
Yifan Yang
87843e9382
k2SSL: a Faster and Better Framework for Self-Supervised Speech Representation Learning (#1500)
* Add k2SSL

* fix flake8

* fix for black

* fix for black

* fix for black

* Update ssl_datamodule.py

* Fix bugs in HubertDataset

* update comments

* add librilight

* add checkpoint convert script

* format

---------

Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
Co-authored-by: zzasdf <15218404468@163.com>
2024-04-04 23:29:16 +08:00
Fangjun Kuang
c45e9fecfb
support torch 2.2.2 in docker images (#1578) 2024-04-03 11:26:24 +08:00
Wei Kang
9369c2bef9
Add comments to prepare.sh in aidatatang (#1575) 2024-04-02 16:08:09 +08:00
Dadoou
6cbddaa8e3
Add base choice to model_name argument for whisper model. (#1573)
Co-authored-by: dadoou <dadoou@yandex.com>
2024-04-02 09:47:38 +08:00
Wei Kang
42de459110
Fix decoding finetune model (#1568) 2024-03-26 10:38:21 +08:00
Wei Kang
b156b6c291
Add use-mux to finetune commands (#1567) 2024-03-26 09:42:46 +08:00
Fangjun Kuang
bb9ebcfb06
Fix CI (#1563) 2024-03-23 09:27:28 +08:00
Zengwei Yao
353469182c
fix issue in zipformer.py (#1566) 2024-03-21 15:59:43 +08:00
Xiaoyu Yang
bddc3fca7a
Fix adapter in streaming_forward (#1560) 2024-03-21 15:08:58 +08:00
Fangjun Kuang
387833fb7c
Doc: Add huggingface mirror for users from China. (#1565) 2024-03-21 12:05:30 +08:00
zr_jin
d5cd78a637
Update hooks.py (#1564) 2024-03-20 16:43:45 +08:00
zr_jin
9bd30853ae
Update diagnostics.py (#1562) 2024-03-20 15:35:14 +08:00
zr_jin
413220d6a4
Minor fixes for the multi_zh_en recipe (#1526) 2024-03-18 20:25:57 +08:00
Fangjun Kuang
489263e5bb
Add streaming HLG decoding for zipformer CTC. (#1557)
Note it supports only CPU.
2024-03-18 20:11:47 +08:00
Karel Vesely
4917ac8bab
allow export of onnx-streaming-models with other than 80dim input features (#1556) 2024-03-18 18:43:29 +08:00
zr_jin
eec12f053d
Use piper_phonemize as text tokenizer in vctk TTS recipe (#1522)
* to align with PR #1524
2024-03-18 17:53:52 +08:00
zr_jin
9b0eae3b4a
fixes for init value of diagnostics.TensorDiagnosticOptions (#1555) 2024-03-18 17:14:29 +08:00
zr_jin
bf2f94346c
Enabling char_level and compute_CER for aishell recipe (#1554)
* init fix

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-03-18 11:57:47 +08:00
Xiaoyu Yang
2dfd5dbf8b
Add LoRA for Zipformer (#1540) 2024-03-15 17:19:23 +08:00
Xiaoyu Yang
f28c05f4f5
Documentation for adapter fine-tuning (#1545) 2024-03-14 12:18:49 +08:00
zr_jin
eb132da00d
additional instruction for the grad_scale is too small error (#1550) 2024-03-14 11:33:49 +08:00
Fangjun Kuang
15bd9a841e
add CI for ljspeech (#1548) 2024-03-13 17:39:01 +08:00
Fangjun Kuang
d406b41cbd
Doc: Add page for installing piper-phonemize (#1547) 2024-03-13 11:01:18 +08:00
zr_jin
c3f6f28116
Zipformer recipe for Cantonese dataset MDCC (#1537)
* init commit

* Create README.md

* handle code switching cases

* misc. fixes

* added manifest statistics

* init commit for the zipformer recipe

* added scripts for exporting model

* added RESULTS.md

* added scripts for streaming related stuff

* doc str fixed
2024-03-13 10:01:28 +08:00
Fangjun Kuang
81f518ea7c
Support different tts model types. (#1541) 2024-03-12 22:29:21 +08:00
BannerWang
959906e9dc
Correct alimeeting download link (#1544)
Co-authored-by: BannerWang <banner.wang@upblocks.io>
2024-03-12 12:44:09 +08:00
jimmy1984xu
e472fa6840
fix CutMix init parameter (#1543)
Co-authored-by: jimmyxu <jimmyxu@upblocks.io>
2024-03-11 18:37:26 +08:00
Fangjun Kuang
60986c3ac1
Fix default value for --context-size in icefall. (#1538) 2024-03-08 20:47:13 +08:00
zr_jin
ae61bd4090
Minor fixes for the commonvoice recipe (#1534)
* init commit

* fix for issue https://github.com/k2-fsa/icefall/issues/1531

* minor fixes
2024-03-08 11:01:11 +08:00
Yuekai Zhang
5df24c1685
Whisper large fine-tuning on wenetspeech, multi-hans-zh (#1483)
* add whisper fbank for wenetspeech

* add whisper fbank for other dataset

* add str to bool

* add decode for wenetspeech

* add requirements.txt

* add original model decode with 30s

* test feature extractor speed

* add aishell2 feat

* change compute feature batch

* fix overwrite

* fix executor

* regression

* add kaldifeatwhisper fbank

* fix io issue

* parallel jobs

* use multi machines

* add wenetspeech fine-tune scripts

* add monkey patch codes

* remove useless file

* fix subsampling factor

* fix too long audios

* add remove long short

* fix whisper version to support multi batch beam

* decode all wav files

* remove utterances longer than 30s in test_net

* only test net

* using soft links

* add kespeech whisper feats

* fix index error

* add manifests for whisper

* change to LilcomChunkyWriter

* add missing option

* decrease cpu usage 

* add speed perturb for kespeech

* fix kespeech speed perturb

* add dataset

* load checkpoint from specific path

* add speechio

* add speechio results

---------

Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2024-03-07 19:04:27 +08:00
zr_jin
cdb3fb5675
add text norm script for pl (#1532) 2024-03-07 18:47:29 +08:00
zr_jin
335a9962de
Fixed formatting issue of PR #1528 (#1530) 2024-03-06 08:43:45 +08:00
Rezakh20
ff430b465f
Add num_features to train.py for training WSASR (#1528) 2024-03-05 16:40:30 +08:00
zr_jin
242002e0bd
Strengthened style constraints (#1527) 2024-03-04 23:28:04 +08:00
Fangjun Kuang
29b195a42e
Update export-onnx.py for vits to support sherpa-onnx. (#1524) 2024-03-01 19:53:58 +08:00
zr_jin
58610b1bf6
Provides README.md for TTS recipes (#1491)
* Update README.md
2024-02-29 17:31:28 +08:00
Fangjun Kuang
2f102eb989
Add CUDA docker image for torch 2.2.1 (#1521) 2024-02-29 11:41:18 +08:00
Xiaoyu Yang
7e2b561bbf
Add recipe for fine-tuning Zipformer with adapter (#1512) 2024-02-29 10:57:38 +08:00
Zengwei Yao
d89f4ea149
Use piper_phonemize as text tokenizer in ljspeech recipe (#1511)
* use piper_phonemize as text tokenizer in ljspeech recipe

* modify usage of tokenizer in vits/train.py

* update docs
2024-02-29 10:13:22 +08:00
Fangjun Kuang
291d06056c
Support torch 2.2.1 for cpu docker. (#1516) 2024-02-23 14:24:13 +08:00
Xiaoyu Yang
2483b8b4da
Zipformer recipe for SPGISpeech (#1449) 2024-02-22 15:53:19 +08:00
Wei Kang
819bb45539
Add pypinyin to requirements (#1515) 2024-02-22 15:50:11 +08:00
Wei Kang
aac7df064a
Recipes for open vocabulary keyword spotting (#1428)
* English recipe on gigaspeech; Chinese recipe on wenetspeech
2024-02-22 15:31:20 +08:00
Xiaoyu Yang
13daf73468
docs for finetune zipformer (#1509) 2024-02-21 18:06:27 +08:00
Wei Kang
c19b414778
Update docker (adding pypinyin) (#1513)
2024-02-21 08:04:16 +08:00
zr_jin
027302c902
minor fix for param. names (#1495) 2024-02-20 14:38:51 +08:00
Karel Vesely
e59fa38e86
docs: minor fixes of LM rescoring texts (#1498) 2024-02-20 10:40:15 +08:00
Zengwei Yao
b3e2044068
minor fix of vits/tokenizer.py (#1504)
* minor fix of vits/tokenizer.py
2024-02-19 19:33:32 +08:00
zr_jin
db4d66c0e3
Fixed softlink for ljspeech recipe (#1503) 2024-02-19 16:13:09 +08:00
Fangjun Kuang
7eb360d0d5
Fix cpu docker images for torch 2.2.0 (#1502) 2024-02-18 20:32:40 +08:00