1029 Commits

Author SHA1 Message Date
Yuekai Zhang
84e4af93d7 add whisper fine-tuning results 2024-01-17 16:17:32 +08:00
Yuekai Zhang
557b35cefc clean up code 2024-01-15 20:40:44 +08:00
Yuekai Zhang
eea46458c5 revert asr data module 2024-01-15 19:59:48 +08:00
Yuekai Zhang
e883bb60d4 remove seamless for next PR 2024-01-15 19:51:43 +08:00
Yuekai Zhang
ac53222054 add model saving 2024-01-15 19:51:43 +08:00
Yuekai Zhang
2ce09809cd support large-v3 2024-01-15 19:51:41 +08:00
Yuekai Zhang
fa7ad4dc72 update deepspeed model loading 2024-01-15 19:50:57 +08:00
Yuekai Zhang
b6418acda2 support deepspeed to finetune large model 2024-01-15 19:50:57 +08:00
Yuekai Zhang
92895f774f clean up code 2024-01-15 19:50:57 +08:00
Yuekai Zhang
98d11abedb remove padding to 30s, compute validation loss once 2024-01-15 19:50:57 +08:00
Yuekai Zhang
07cefa82a7 change scaleadam to adamw 2024-01-15 19:50:55 +08:00
Yuekai Zhang
8b832f168d update lhotse version 2024-01-15 19:49:50 +08:00
Yuekai Zhang
5bf3a9cfe0 use audio of any length 2024-01-15 19:49:50 +08:00
Yuekai Zhang
6c2cd5b4c3 support whisper ft 2024-01-15 19:49:26 +08:00
Yuekai Zhang
bb1c4466e3 rename train, train2, add support to fine-tune embedding table 2024-01-15 19:49:26 +08:00
Yuekai Zhang
d926585b10 fix loading 2024-01-15 19:49:26 +08:00
Yuekai Zhang
2a288fb9bf add custom tokenizer 2024-01-15 19:49:26 +08:00
Yuekai Zhang
22ee287312 add token files 2024-01-15 19:49:26 +08:00
Yuekai Zhang
7e387dd54b change vocab table 2024-01-15 19:49:26 +08:00
Yuekai Zhang
72e9a436b8 fix typo 2024-01-15 19:49:26 +08:00
Yuekai Zhang
cc6432443d add decoding with avg model 2024-01-15 19:49:26 +08:00
Yuekai Zhang
5f399dc780 load checkpoint to decode 2024-01-15 19:49:26 +08:00
Yuekai Zhang
e81545714a update decoding from checkpoint 2024-01-15 19:49:26 +08:00
Yuekai Zhang
0d6d8f9473 update fine-tuning lr 2024-01-15 19:49:26 +08:00
Yuekai Zhang
cbc3852876 add fairseq2 requirement 2024-01-15 19:49:26 +08:00
Yuekai Zhang
3a7ad277ad add requirements 2024-01-15 19:49:26 +08:00
Yuekai Zhang
363c3f1f82 update finetuning codes 2024-01-15 19:49:26 +08:00
Yuekai Zhang
f99f4d7c92 add decode seamlessm4t 2024-01-15 19:49:26 +08:00
Fangjun Kuang
398401ed27 Update kaldifeat installation doc (#1460) 2024-01-14 14:38:41 +08:00
Xiaoyu Yang
e2fcb42f5f fix typo (#1455) 2024-01-09 15:41:37 +08:00
zr_jin
5445ea6df6 Use shuffled LibriSpeech cuts instead (#1450)
* use shuffled LibriSpeech cuts instead

* leave the old code in comments for reference
2024-01-08 15:09:21 +08:00
zr_jin
b9b56eb879 Minor fixes to the VCTK data prep scripts (#1441)
* Update prepare.sh
2024-01-08 14:28:07 +08:00
Karel Vesely
716b82cc3a streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448)
- some AudioTransform classes produce audio signals out of range [-1,+1]
   - Resample produced 1.0079
   - The range [-10,+10] was chosen to still be able to reliably
     distinguish from the [-32k,+32k] signal...
- this is related to: https://github.com/lhotse-speech/lhotse/issues/1254
2024-01-05 10:21:27 +08:00
Fangjun Kuang
8136ad775b Use high_freq -400 in computing fbank features. (#1447)
See also https://github.com/k2-fsa/sherpa-onnx/issues/514
2024-01-04 13:59:32 +08:00
zr_jin
f42258caf8 Update compute_fbank_commonvoice_splits.py (#1437) 2023-12-30 13:03:26 +08:00
Fangjun Kuang
140e6381ad Refactor CI tests for librispeech (#1436) 2023-12-27 13:21:14 +08:00
Fangjun Kuang
db52fe2349 Refactor CI test for aishell (#1435) 2023-12-26 20:29:43 +08:00
Fangjun Kuang
835a92eba5 Add doc about how to use the CPU-only docker images (#1432) 2023-12-25 20:23:56 +08:00
Ali Haznedaroğlu
ddd7131317 Update TTS export-onnx.py scripts for handling variable token counts (#1430) 2023-12-25 19:44:07 +08:00
Fangjun Kuang
c855a58cfd Generate the dependency matrix by code for GitHub Actions (#1431) 2023-12-25 19:41:09 +08:00
Fangjun Kuang
e5bb1ae86c Use the CPU docker in CI to simplify the test code (#1427) 2023-12-24 13:40:33 +08:00
Fangjun Kuang
79a42148db Add CI test to cover zipformer/train.py (#1424) 2023-12-23 00:38:36 +08:00
TianHao Zhang
702d4f5914 Update prepare.sh (#1422)
fix the bug in line 251:
1. delete the extra blank
2. correct the spelling of "new_vocab_size"
2023-12-21 14:42:33 +08:00
zr_jin
10a234709c bugs fixed (#1416) 2023-12-14 11:26:37 +08:00
Fangjun Kuang
f85f0252a9 Add greedy search for streaming zipformer CTC. (#1415) 2023-12-13 17:34:12 +08:00
zr_jin
d0da509055 Support ONNX export for Streaming CTC Encoder (#1413)
* Create export-onnx-streaming-ctc.py

* doc_str updated

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-12-13 10:33:28 +08:00
Fangjun Kuang
9e9fe7954d Upload gigaspeech zipformer models in CI (#1412) 2023-12-12 18:57:04 +08:00
Fangjun Kuang
20a82c9abf first commit (#1411) 2023-12-12 18:13:26 +08:00
Fangjun Kuang
b0f70c9d04 Fix torch.jit.script() export for pruned_transducer_stateless2 (#1410) 2023-12-10 11:38:39 +08:00
zr_jin
df56aff31e minor fixes to the vits onnx exportation scripts (#1408) 2023-12-08 21:11:31 +08:00