Wei Kang
8eb5658141
Update README.md for conformer-ctc
2024-04-28 09:55:42 +08:00
Dongji Gao
9a17f4ce41
add OTC related scripts using phone as units instead of BPEs ( #1602 )
* add otc related scripts using phone instead of bpe
2024-04-26 00:55:44 +08:00
zzasdf
25cabb7663
fix error in padding computing ( #1607 )
2024-04-25 22:40:07 +08:00
Xiaoyu Yang
df36f93bd8
add small-scaled model for audio tagging ( #1604 )
2024-04-24 17:00:42 +08:00
Yifan Yang
368b7d10a7
clear log handlers before setup ( #1603 )
2024-04-24 15:31:25 +09:00
zr_jin
9f8f0bceb5
Update prepare.sh ( #1601 )
2024-04-20 23:02:02 +09:00
Yifan Yang
ed6bc200e3
Update train.py ( #1590 )
2024-04-11 19:35:25 +08:00
Fangjun Kuang
ba5b2e854b
Return probs in audio tagging onnx models ( #1586 )
2024-04-10 09:03:30 +08:00
Fangjun Kuang
fa5d861af0
Add CI test for the AudioSet recipe. ( #1585 )
2024-04-09 17:45:00 +08:00
yh646492956
f5d7818733
fix run.sh script in wenetspeech KWS ( #1584 )
Co-authored-by: Hao You <13182720519@sina.cn>
2024-04-09 15:16:12 +08:00
Xiaoyu Yang
1732dafe24
Add zipformer recipe for audio tagging ( #1421 )
2024-04-09 12:06:14 +08:00
zr_jin
f2e36ec414
Zipformer recipe for CommonVoice ( #1546 )
* added scripts for char-based lang prep training scripts
* added `Zipformer` recipe for commonvoice
---------
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-04-09 11:37:08 +08:00
Yifan Yang
87843e9382
k2SSL: a Faster and Better Framework for Self-Supervised Speech Representation Learning ( #1500 )
* Add k2SSL
* fix flake8
* fix for black
* fix for black
* fix for black
* Update ssl_datamodule.py
* Fix bugs in HubertDataset
* update comments
* add librilight
* add checkpoint convert script
* format
---------
Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
Co-authored-by: zzasdf <15218404468@163.com>
2024-04-04 23:29:16 +08:00
Fangjun Kuang
c45e9fecfb
support torch 2.2.2 in docker images ( #1578 )
2024-04-03 11:26:24 +08:00
Wei Kang
9369c2bef9
Add comments to prepare.sh in aidatatang ( #1575 )
2024-04-02 16:08:09 +08:00
Dadoou
6cbddaa8e3
Add base choice to model_name argument for whisper model. ( #1573 )
Co-authored-by: dadoou <dadoou@yandex.com>
2024-04-02 09:47:38 +08:00
Wei Kang
42de459110
Fix decoding finetune model ( #1568 )
2024-03-26 10:38:21 +08:00
Wei Kang
b156b6c291
Add use-mux to finetune commands ( #1567 )
2024-03-26 09:42:46 +08:00
Fangjun Kuang
bb9ebcfb06
Fix CI ( #1563 )
2024-03-23 09:27:28 +08:00
Zengwei Yao
353469182c
fix issue in zipformer.py ( #1566 )
2024-03-21 15:59:43 +08:00
Xiaoyu Yang
bddc3fca7a
Fix adapter in streaming_forward ( #1560 )
2024-03-21 15:08:58 +08:00
Fangjun Kuang
387833fb7c
Doc: Add huggingface mirror for users from China. ( #1565 )
2024-03-21 12:05:30 +08:00
zr_jin
d5cd78a637
Update hooks.py ( #1564 )
2024-03-20 16:43:45 +08:00
zr_jin
9bd30853ae
Update diagnostics.py ( #1562 )
2024-03-20 15:35:14 +08:00
zr_jin
413220d6a4
Minor fixes for the multi_zh_en recipe ( #1526 )
2024-03-18 20:25:57 +08:00
Fangjun Kuang
489263e5bb
Add streaming HLG decoding for zipformer CTC. ( #1557 )
Note it supports only CPU.
2024-03-18 20:11:47 +08:00
Karel Vesely
4917ac8bab
allow export of onnx-streaming-models with other than 80dim input features ( #1556 )
2024-03-18 18:43:29 +08:00
zr_jin
eec12f053d
Use piper_phonemize as text tokenizer in vctk TTS recipe ( #1522 )
* to align with PR #1524
2024-03-18 17:53:52 +08:00
zr_jin
9b0eae3b4a
fixes for init value of diagnostics.TensorDiagnosticOptions ( #1555 )
2024-03-18 17:14:29 +08:00
zr_jin
bf2f94346c
Enabling char_level and compute_CER for aishell recipe ( #1554 )
* init fix
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-03-18 11:57:47 +08:00
Xiaoyu Yang
2dfd5dbf8b
Add LoRA for Zipformer ( #1540 )
2024-03-15 17:19:23 +08:00
Xiaoyu Yang
f28c05f4f5
Documentation for adapter fine-tuning ( #1545 )
2024-03-14 12:18:49 +08:00
zr_jin
eb132da00d
additional instruction for the "grad_scale is too small" error ( #1550 )
2024-03-14 11:33:49 +08:00
Fangjun Kuang
15bd9a841e
add CI for ljspeech ( #1548 )
2024-03-13 17:39:01 +08:00
Fangjun Kuang
d406b41cbd
Doc: Add page for installing piper-phonemize ( #1547 )
2024-03-13 11:01:18 +08:00
zr_jin
c3f6f28116
Zipformer recipe for Cantonese dataset MDCC ( #1537 )
* init commit
* Create README.md
* handle code switching cases
* misc. fixes
* added manifest statistics
* init commit for the zipformer recipe
* added scripts for exporting model
* added RESULTS.md
* added scripts for streaming related stuff
* doc str fixed
2024-03-13 10:01:28 +08:00
Fangjun Kuang
81f518ea7c
Support different tts model types. ( #1541 )
2024-03-12 22:29:21 +08:00
BannerWang
959906e9dc
Correct alimeeting download link ( #1544 )
Co-authored-by: BannerWang <banner.wang@upblocks.io>
2024-03-12 12:44:09 +08:00
jimmy1984xu
e472fa6840
fix CutMix init parameter ( #1543 )
Co-authored-by: jimmyxu <jimmyxu@upblocks.io>
2024-03-11 18:37:26 +08:00
Fangjun Kuang
60986c3ac1
Fix default value for --context-size in icefall. ( #1538 )
2024-03-08 20:47:13 +08:00
zr_jin
ae61bd4090
Minor fixes for the commonvoice recipe ( #1534 )
* init commit
* fix for issue https://github.com/k2-fsa/icefall/issues/1531
* minor fixes
2024-03-08 11:01:11 +08:00
Yuekai Zhang
5df24c1685
Whisper large fine-tuning on wenetspeech, multi-hans-zh ( #1483 )
* add whisper fbank for wenetspeech
* add whisper fbank for other dataset
* add str to bool
* add decode for wenetspeech
* add requirements.txt
* add original model decode with 30s
* test feature extractor speed
* add aishell2 feat
* change compute feature batch
* fix overwrite
* fix executor
* regression
* add kaldifeatwhisper fbank
* fix io issue
* parallel jobs
* use multi machines
* add wenetspeech fine-tune scripts
* add monkey patch codes
* remove useless file
* fix subsampling factor
* fix too long audios
* add remove long short
* fix whisper version to support multi batch beam
* decode all wav files
* remove utterance more than 30s in test_net
* only test net
* using soft links
* add kespeech whisper feats
* fix index error
* add manifests for whisper
* change to lilcom chunky writer
* add missing option
* decrease cpu usage
* add speed perturb for kespeech
* fix kespeech speed perturb
* add dataset
* load checkpoint from specific path
* add speechio
* add speechio results
---------
Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2024-03-07 19:04:27 +08:00
zr_jin
cdb3fb5675
add text norm script for pl ( #1532 )
2024-03-07 18:47:29 +08:00
zr_jin
335a9962de
Fixed formatting issue of PR #1528 ( #1530 )
2024-03-06 08:43:45 +08:00
Rezakh20
ff430b465f
Add num_features to train.py for training WSASR ( #1528 )
2024-03-05 16:40:30 +08:00
zr_jin
242002e0bd
Strengthened style constraints ( #1527 )
2024-03-04 23:28:04 +08:00
Fangjun Kuang
29b195a42e
Update export-onnx.py for vits to support sherpa-onnx. ( #1524 )
2024-03-01 19:53:58 +08:00
zr_jin
58610b1bf6
Provides README.md for TTS recipes ( #1491 )
* Update README.md
2024-02-29 17:31:28 +08:00
Fangjun Kuang
2f102eb989
Add CUDA docker image for torch 2.2.1 ( #1521 )
2024-02-29 11:41:18 +08:00
Xiaoyu Yang
7e2b561bbf
Add recipe for fine-tuning Zipformer with adapter ( #1512 )
2024-02-29 10:57:38 +08:00