Yuekai Zhang
a5de488304
Merge 559f9e2deff33077461428d422d9f03c95988b01 into 34fc1fdf0d8ff520e2bb18267d046ca207c78ef9
2025-07-24 22:09:54 +05:30
Fangjun Kuang
34fc1fdf0d
Fix transformer decoder layer ( #1995 )
2025-07-18 20:12:29 +08:00
Bailey Machiko Hirota
5fe13078cc
Musan implementation for ReazonSpeech ( #1988 )
2025-07-18 17:16:19 +08:00
Yifan Yang
9fd0f2dc1d
support left pad for make_pad_mask ( #1990 )
2025-07-16 23:59:04 +08:00
Fangjun Kuang
e22bc78f98
Export streaming zipformer2 to RKNN ( #1977 )
2025-07-11 13:24:01 +08:00
Teo Wen Shen
da87e7fc99
add weights_only=False to torch.load ( #1984 )
2025-07-10 15:27:08 +08:00
Yifan Yang
89728dd4f8
Refactor data preparation for GigaSpeech recipe ( #1986 )
2025-07-10 11:17:37 +08:00
Mistmoon
9293edc62f
Add cr-ctc loss and ctc-decode in aishell ( #1980 )
2025-07-08 14:47:24 +08:00
Fangjun Kuang
fba5e67d5e
Fix CI tests. ( #1974 )
...
- Introduce unified AMP helpers (create_grad_scaler, torch_autocast) to handle
deprecations in PyTorch ≥2.3.0
- Replace direct uses of torch.cuda.amp.GradScaler and torch.cuda.amp.autocast
with the new utilities across all training and inference scripts
- Update all torch.load calls to include weights_only=False for compatibility with
newer PyTorch versions
2025-07-01 13:47:55 +08:00
Fangjun Kuang
71377d21cd
Export streaming zipformer models with whisper feature to onnx ( #1973 )
2025-06-30 19:01:15 +08:00
Fangjun Kuang
abd9437e6d
Add more wheels for piper-phonemize ( #1969 )
2025-06-24 14:49:16 +08:00
Wei Kang
e1cf4dbace
rm zipvoice ( #1967 )
2025-06-23 19:22:35 +08:00
Wei Kang
343b8fa2dc
Using non strict match in context graph for contextual words ( #1952 )
2025-06-19 12:27:15 +08:00
Wei Kang
f80a2ee110
Decrease num_buckets & remove shuffle_buffer_size ( #1955 )
2025-06-19 12:26:37 +08:00
Wei Kang
3587c4b3b7
Fix decoding byte bpes tokens to words. ( #1966 )
2025-06-19 12:26:01 +08:00
Wei Kang
762f965cf7
[zipvoice] Add requirements.txt and pinyin.txt, remove k2 from pretrained model inference. ( #1965 )
...
* Add requirements.txt and pinyin.txt needed by zipvoice
* simplify the requirements for pretrained model inference
2025-06-18 18:38:46 +08:00
Wei Kang
06539d2b9d
Add Zipvoice ( #1964 )
...
* Add ZipVoice - a flow-matching based zero-shot TTS model.
2025-06-17 20:17:12 +08:00
root
559f9e2def
fix repeat bos and pad id
2025-06-04 10:02:42 +00:00
root
80677a55f8
remove stats
2025-06-03 00:48:39 -07:00
root
5becf6927d
remove concat three items
2025-06-03 00:18:21 -07:00
root
4c0396f8f2
support text2speech ultrachat
2025-06-02 23:16:03 -07:00
root
49256fa917
fix tts stage decode
2025-05-28 02:34:07 +00:00
root
5a7c72cb47
add tts task decode
2025-05-27 02:12:22 -07:00
root
1281d7a515
add tts training
2025-05-27 00:18:23 -07:00
Zengwei Yao
ffb7d05635
refactor branch exchange in cr-ctc ( #1954 )
2025-05-27 12:09:59 +08:00
root
39700d5c94
refactor train to reuse code
2025-05-26 19:53:16 -07:00
root
e6e1f3fa4f
add tts stage
2025-05-23 01:53:05 -07:00
root
dd858f0cd1
support instruct s2s
2025-05-22 23:16:33 -07:00
root
9fff18edec
refactor code
2025-05-22 19:14:52 -07:00
Mahsa Yarmohammadi
021e1a8846
Add acknowledgment to README ( #1950 )
2025-05-22 22:06:35 +08:00
root
7a12d88d6c
update
2025-05-21 22:18:57 -07:00
root
7aa6c80ddb
add multi gpu processing
2025-05-21 21:54:59 -07:00
Tianxiang Zhao
30e7ea4b5a
Fix a bug in finetune.py --use-mux ( #1949 )
2025-05-22 12:05:01 +08:00
Fangjun Kuang
fd8f8780fa
Fix logging torch.dtype. ( #1947 )
2025-05-21 12:04:57 +08:00
root
ca84aff5d6
remove cosyvoice lib
2025-05-20 00:52:09 -07:00
root
9cdd393f43
add server url
2025-05-20 07:48:49 +00:00
root
50fc1aba60
add multi-node
2025-05-18 18:47:22 -07:00
root
4a29430349
add loss type
2025-05-19 01:31:21 +00:00
root
e52581e69b
support local_rank for multi-node
2025-05-16 00:02:12 -07:00
root
0e8c1db4d0
fix speed perturb issue
2025-05-15 22:45:04 -07:00
root
bfb4ebeb83
remove triton
2025-05-15 14:32:49 +00:00
root
f81363d324
add speech continuation pretraining
2025-05-15 14:16:51 +00:00
root
e65725810c
fix mmsu
2025-05-13 09:13:12 +00:00
root
cbf3af31fd
add voicebench eval
2025-05-13 05:37:11 +00:00
Yifan Yang
e79833aad2
ensure SwooshL/SwooshR output dtype matches input dtype ( #1940 )
2025-05-12 19:28:48 +08:00
root
89781b9bb1
add cosyvoice2 decode
2025-05-12 10:06:59 +00:00
Yifan Yang
4627969ccd
fix bug: undefined name 'partial' ( #1941 )
2025-05-12 14:19:53 +08:00
root
b20a0d0e35
add on the fly feature
2025-05-08 19:21:41 -07:00
root
bd2df570ad
add debug script
2025-05-08 03:37:26 -07:00
root
37db65984c
remove k2 dependency
2025-05-08 03:02:34 -07:00