Yuekai Zhang
341c29e6e2
fix whisper version to support multi batch beam
2024-01-31 14:02:39 +08:00
Yuekai Zhang
c19891ee8e
add remove long short
2024-01-31 14:02:39 +08:00
Yuekai Zhang
bb07b65e45
add remove long short
2024-01-31 14:02:39 +08:00
Yuekai Zhang
1600f7db95
fix too long audios
2024-01-31 14:02:39 +08:00
Yuekai Zhang
b76cd65abf
fix subsampling factor
2024-01-31 14:02:39 +08:00
Yuekai Zhang
ad796d929d
remove useless file
2024-01-31 14:02:39 +08:00
Yuekai Zhang
e49534f2dd
add monkey patch codes
2024-01-31 14:02:39 +08:00
Yuekai Zhang
e1a55b945b
add wenetspeech fine-tune scripts
2024-01-31 14:02:39 +08:00
Yuekai Zhang
baa7c5fb8d
use multi machines
2024-01-31 14:02:39 +08:00
Yuekai Zhang
cf85019290
parallel jobs
2024-01-31 14:02:39 +08:00
Yuekai Zhang
df54121c41
fix io issue
2024-01-31 14:02:39 +08:00
Yuekai Zhang
af29455c3d
add kaldifeatwhisper fbank
2024-01-31 14:02:39 +08:00
Yuekai Zhang
08db3051ad
regression
2024-01-31 14:02:39 +08:00
Yuekai Zhang
f66b266aa4
fix executor
2024-01-31 14:02:39 +08:00
Yuekai Zhang
e46e9b77ee
fix overwrite
2024-01-31 14:02:39 +08:00
Yuekai Zhang
fd77c5758c
change compute feature batch
2024-01-31 14:02:39 +08:00
Yuekai Zhang
f4cf9fb2d3
add aishell2 feat
2024-01-31 14:02:39 +08:00
Yuekai Zhang
aa7b17e410
test feature extractor speed
2024-01-31 14:02:39 +08:00
Yuekai Zhang
d1b010463c
add original model decode with 30s
2024-01-31 14:02:39 +08:00
Yuekai Zhang
38f5f45c67
add requirements.txt
2024-01-31 14:02:39 +08:00
Yuekai Zhang
72c9d01724
add decode for wenetspeech
2024-01-31 14:02:39 +08:00
Yuekai Zhang
046e071ca3
add str to bool
2024-01-31 14:02:39 +08:00
Yuekai Zhang
315175a362
add whisper fbank for other dataset
2024-01-31 14:02:39 +08:00
Yuekai Zhang
e43c4da91d
add whisper fbank for wenetspeech
2024-01-31 14:02:39 +08:00
zr_jin
37b975cac9
fixed a CI test for wenetspeech ( #1476 )
* Comply with issue #1149
https://github.com/k2-fsa/icefall/issues/1149
2024-01-27 06:41:56 +08:00
Yuekai Zhang
1c30847947
Whisper Fine-tuning Recipe on Aishell1 ( #1466 )
* add decode seamlessm4t
* add requirements
* add decoding with avg model
* add token files
* add custom tokenizer
* support deepspeed to finetune large model
* support large-v3
* add model saving
* using monkey patch to replace models
* add manifest dir option
2024-01-27 00:32:30 +08:00
Fangjun Kuang
8d39f9508b
Fix torchscript export to use tokens.txt instead of lang_dir ( #1475 )
2024-01-26 19:18:33 +08:00
Zengwei Yao
c401a2646b
minor fix of zipformer/optim.py ( #1474 )
2024-01-26 15:50:11 +08:00
zr_jin
9c494a3329
typos fixed ( #1472 )
2024-01-25 18:41:43 +08:00
Yifan Yang
559ed150bb
Fix typo ( #1471 )
2024-01-23 22:51:09 +08:00
zr_jin
ebe97a07b0
Reworked README.md ( #1470 )
* Rework README.md
---------
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-01-23 16:26:24 +08:00
Yifan Yang
5dfc3ed7f9
Fix buffer size of DynamicBucketingSampler ( #1468 )
* Fix buffer size
* Fix for flake8
---------
Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
2024-01-21 02:10:42 +08:00
zr_jin
7bdde9174c
A Zipformer recipe with Byte-level BPE for Aishell-1 ( #1464 )
* init commit
* Update train.py
* Update decode.py
* Update RESULTS.md
* added `vocab_size`
* removed unused softlinks
* added scripts for testing pretrained models
* set `bpe_model` as required
* re-org the bbpe recipe for aishell
2024-01-16 21:08:35 +08:00
Fangjun Kuang
398401ed27
Update kaldifeat installation doc ( #1460 )
2024-01-14 14:38:41 +08:00
Xiaoyu Yang
e2fcb42f5f
fix typo ( #1455 )
2024-01-09 15:41:37 +08:00
zr_jin
5445ea6df6
Use shuffled LibriSpeech cuts instead ( #1450 )
* use shuffled LibriSpeech cuts instead
* leave the old code in comments for reference
2024-01-08 15:09:21 +08:00
zr_jin
b9b56eb879
Minor fixes to the VCTK data prep scripts ( #1441 )
* Update prepare.sh
2024-01-08 14:28:07 +08:00
Karel Vesely
716b82cc3a
streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] ( #1448 )
- some AudioTransform classes produce audio signals out of range [-1,+1]
- Resample produced 1.0079
- The range [-10,+10] was chosen to still be able to reliably distinguish from the [-32k,+32k] signal...
- this is related to: https://github.com/lhotse-speech/lhotse/issues/1254
2024-01-05 10:21:27 +08:00
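The reasoning in commit 716b82cc3a above can be illustrated with a minimal sketch. It is not the code from streaming_decode.py; the constant and function names are hypothetical, and only the [-10,+10] bound and the 1.0079 resampling overshoot come from the commit message. The idea is that a loose bound tolerates small overshoots of normalized float audio while still rejecting int16-scaled samples.

```python
from typing import Sequence

# Hypothetical constant: the bound named in the commit message.
# Normalized float audio may slightly exceed [-1, +1] (e.g. 1.0079 after
# resampling), while int16-scaled audio reaches roughly +/-32k.
MAX_ABS_SAMPLE = 10.0


def looks_like_normalized_float_audio(samples: Sequence[float]) -> bool:
    """Return True if every sample lies within [-MAX_ABS_SAMPLE, +MAX_ABS_SAMPLE]."""
    return all(abs(s) <= MAX_ABS_SAMPLE for s in samples)
```

A strict [-1,+1] check would reject the sample 1.0079 mentioned in the commit; the relaxed bound accepts it and still flags integer-scaled input.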
Fangjun Kuang
8136ad775b
Use high_freq -400 in computing fbank features. ( #1447 )
See also https://github.com/k2-fsa/sherpa-onnx/issues/514
2024-01-04 13:59:32 +08:00
zr_jin
f42258caf8
Update compute_fbank_commonvoice_splits.py ( #1437 )
2023-12-30 13:03:26 +08:00
Fangjun Kuang
140e6381ad
Refactor CI tests for librispeech ( #1436 )
2023-12-27 13:21:14 +08:00
Fangjun Kuang
db52fe2349
Refactor CI test for aishell ( #1435 )
2023-12-26 20:29:43 +08:00
Fangjun Kuang
835a92eba5
Add doc about how to use the CPU-only docker images ( #1432 )
2023-12-25 20:23:56 +08:00
Ali Haznedaroğlu
ddd7131317
Update TTS export-onnx.py scripts for handling variable token counts ( #1430 )
2023-12-25 19:44:07 +08:00
Fangjun Kuang
c855a58cfd
Generate the dependency matrix by code for GitHub Actions ( #1431 )
2023-12-25 19:41:09 +08:00
Fangjun Kuang
e5bb1ae86c
Use the CPU docker in CI to simplify the test code ( #1427 )
2023-12-24 13:40:33 +08:00
Fangjun Kuang
79a42148db
Add CI test to cover zipformer/train.py ( #1424 )
2023-12-23 00:38:36 +08:00
TianHao Zhang
702d4f5914
Update prepare.sh ( #1422 )
Fix the bug on line 251:
1. Delete the extra blank.
2. Correct the spelling of "new_vocab_size".
2023-12-21 14:42:33 +08:00
zr_jin
10a234709c
bugs fixed ( #1416 )
2023-12-14 11:26:37 +08:00
Fangjun Kuang
f85f0252a9
Add greedy search for streaming zipformer CTC. ( #1415 )
2023-12-13 17:34:12 +08:00