Wei Kang
711d6bc462
Refactor prepare.sh in librispeech ( #1493 )
...
* Refactor prepare.sh in librispeech: split it into three parts, prepare.sh (basic, minimal requirements for transducer), prepare_lm.sh (n-gram & NNLM stuff), and prepare_mmi.sh (for MMI training).
2024-02-09 10:44:19 +08:00
Tiance Wang
4ed88d9484
Update shared ( #1487 )
...
There should be one more ../
2024-02-07 10:16:02 +08:00
Xiaoyu Yang
777074046d
Fine-tune recipe for Zipformer ( #1484 )
...
1. support fine-tuning zipformer
2. update the usage; set a very large batch count
2024-02-06 18:25:43 +08:00
zr_jin
a813186f64
minor fix for docstr and default param. ( #1490 )
...
* Update train.py and README.md
2024-02-05 12:47:52 +08:00
Teo Wen Shen
b9e6327adf
Fixing torch.ctc err ( #1485 )
...
* fixing torch.ctc err
* Move targets & lengths to CPU
2024-02-03 06:25:27 +08:00
Henry Li Xinyuan
b07d5472c5
Implement recipe for Fluent Speech Commands dataset ( #1469 )
...
---------
Signed-off-by: Xinyuan Li <xli257@c13.clsp.jhu.edu>
2024-01-31 22:53:36 +08:00
zr_jin
37b975cac9
fixed a CI test for wenetspeech ( #1476 )
...
* Comply with issue #1149
https://github.com/k2-fsa/icefall/issues/1149
2024-01-27 06:41:56 +08:00
Yuekai Zhang
1c30847947
Whisper Fine-tuning Recipe on Aishell1 ( #1466 )
...
* add decode seamlessm4t
* add requirements
* add decoding with avg model
* add token files
* add custom tokenizer
* support deepspeed to finetune large model
* support large-v3
* add model saving
* using monkey patch to replace models
* add manifest dir option
2024-01-27 00:32:30 +08:00
Fangjun Kuang
8d39f9508b
Fix torchscript export to use tokens.txt instead of lang_dir ( #1475 )
2024-01-26 19:18:33 +08:00
Zengwei Yao
c401a2646b
minor fix of zipformer/optim.py ( #1474 )
2024-01-26 15:50:11 +08:00
zr_jin
9c494a3329
typos fixed ( #1472 )
2024-01-25 18:41:43 +08:00
Yifan Yang
559ed150bb
Fix typo ( #1471 )
2024-01-23 22:51:09 +08:00
zr_jin
ebe97a07b0
Reworked README.md ( #1470 )
...
* Rework README.md
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
---------
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-01-23 16:26:24 +08:00
Yifan Yang
5dfc3ed7f9
Fix buffer size of DynamicBucketingSampler ( #1468 )
...
* Fix buffer size
* Fix for flake8
---------
Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
2024-01-21 02:10:42 +08:00
zr_jin
7bdde9174c
A Zipformer recipe with Byte-level BPE for Aishell-1 ( #1464 )
...
* init commit
* Update train.py
* Update decode.py
* Update RESULTS.md
* added `vocab_size`
* removed unused softlinks
* added scripts for testing pretrained models
* set `bpe_model` as required
* re-org the bbpe recipe for aishell
2024-01-16 21:08:35 +08:00
Fangjun Kuang
398401ed27
Update kaldifeat installation doc ( #1460 )
2024-01-14 14:38:41 +08:00
Xiaoyu Yang
e2fcb42f5f
fix typo ( #1455 )
2024-01-09 15:41:37 +08:00
zr_jin
5445ea6df6
Use shuffled LibriSpeech cuts instead ( #1450 )
...
* use shuffled LibriSpeech cuts instead
* leave the old code in comments for reference
2024-01-08 15:09:21 +08:00
zr_jin
b9b56eb879
Minor fixes to the VCTK data prep scripts ( #1441 )
...
* Update prepare.sh
2024-01-08 14:28:07 +08:00
Karel Vesely
716b82cc3a
streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] ( #1448 )
...
- some AudioTransform classes produce audio signals out of range [-1,+1]
- Resample produced 1.0079
- The range [-10,+10] was chosen to still be able to reliably distinguish from the [-32k,+32k] signal...
- this is related to: https://github.com/lhotse-speech/lhotse/issues/1254
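A minimal sketch of the kind of range check described above, not the actual code in streaming_decode.py. The function name check_audio_range and its limit parameter are hypothetical, and plain Python floats stand in for the audio tensor. It accepts float audio that slightly overshoots [-1,+1] while still rejecting a signal left in the int16 range:

```python
from typing import Sequence


def check_audio_range(samples: Sequence[float], limit: float = 10.0) -> float:
    """Return the peak absolute amplitude; raise if it exceeds `limit`.

    Hypothetical helper: `limit=10.0` mirrors the relaxed [-10,+10] range.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > limit:
        raise ValueError(
            f"peak amplitude {peak} exceeds {limit}; "
            "is the audio still in the int16 range [-32768, 32767]?"
        )
    return peak


# A resampled float signal may overshoot 1.0 slightly and is still accepted.
check_audio_range([0.5, -0.25, 1.0079])

# An unnormalized int16-style signal is rejected.
try:
    check_audio_range([12000.0, -32768.0])
except ValueError as err:
    print(err)
```

The wide margin between 10 and 32768 is what keeps the two cases easy to tell apart.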
2024-01-05 10:21:27 +08:00
Fangjun Kuang
8136ad775b
Use high_freq -400 in computing fbank features. ( #1447 )
...
See also https://github.com/k2-fsa/sherpa-onnx/issues/514
2024-01-04 13:59:32 +08:00
zr_jin
f42258caf8
Update compute_fbank_commonvoice_splits.py ( #1437 )
2023-12-30 13:03:26 +08:00
Fangjun Kuang
140e6381ad
Refactor CI tests for librispeech ( #1436 )
2023-12-27 13:21:14 +08:00
Fangjun Kuang
db52fe2349
Refactor CI test for aishell ( #1435 )
2023-12-26 20:29:43 +08:00
Fangjun Kuang
835a92eba5
Add doc about how to use the CPU-only docker images ( #1432 )
2023-12-25 20:23:56 +08:00
Ali Haznedaroğlu
ddd7131317
Update TTS export-onnx.py scripts for handling variable token counts ( #1430 )
2023-12-25 19:44:07 +08:00
Fangjun Kuang
c855a58cfd
Generate the dependency matrix by code for GitHub Actions ( #1431 )
2023-12-25 19:41:09 +08:00
Fangjun Kuang
e5bb1ae86c
Use the CPU docker in CI to simplify the test code ( #1427 )
2023-12-24 13:40:33 +08:00
Fangjun Kuang
79a42148db
Add CI test to cover zipformer/train.py ( #1424 )
2023-12-23 00:38:36 +08:00
TianHao Zhang
702d4f5914
Update prepare.sh ( #1422 )
...
Fix the bug in line 251:
1. delete the extra blank
2. correct the misspelling of "new_vocab_size"
2023-12-21 14:42:33 +08:00
zr_jin
10a234709c
bugs fixed ( #1416 )
2023-12-14 11:26:37 +08:00
Fangjun Kuang
f85f0252a9
Add greedy search for streaming zipformer CTC. ( #1415 )
2023-12-13 17:34:12 +08:00
zr_jin
d0da509055
Support ONNX export for Streaming CTC Encoder ( #1413 )
...
* Create export-onnx-streaming-ctc.py
* doc_str updated
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
---------
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-12-13 10:33:28 +08:00
Fangjun Kuang
9e9fe7954d
Upload gigaspeech zipformer models in CI ( #1412 )
2023-12-12 18:57:04 +08:00
Fangjun Kuang
20a82c9abf
first commit ( #1411 )
2023-12-12 18:13:26 +08:00
Fangjun Kuang
b0f70c9d04
Fix torch.jit.script() export for pruned_transducer_stateless2 ( #1410 )
2023-12-10 11:38:39 +08:00
zr_jin
df56aff31e
minor fixes to the vits onnx export scripts ( #1408 )
2023-12-08 21:11:31 +08:00
Fangjun Kuang
e9ec827de7
Rename zipformer2 to zipformer_for_ncnn_export_only to avoid confusion. ( #1407 )
2023-12-08 14:29:24 +08:00
zr_jin
bda72f86ff
minor adjustments to the VITS recipes for onnx runtime ( #1405 )
2023-12-08 06:32:40 +08:00
Yifan Yang
b87ed26c09
Normalize dockerfile ( #1400 )
2023-12-06 14:33:45 +08:00
zr_jin
735fb9a73d
A TTS recipe VITS on VCTK dataset ( #1380 )
...
* init
* isort formatted
* minor updates
* Create shared
* Update prepare_tokens_vctk.py
* Update prepare_tokens_vctk.py
* Update prepare_tokens_vctk.py
* Update prepare.sh
* updated
* Update train.py
* Update train.py
* Update tts_datamodule.py
* Update train.py
* Update train.py
* Update train.py
* Update train.py
* Update train.py
* Update train.py
* fixed formatting issue
* Update infer.py
* removed redundant files
* Create monotonic_align
* removed redundant files
* created symlinks
* Update prepare.sh
* minor adjustments
* Create requirements_tts.txt
* Update requirements_tts.txt
added version constraints
* Update infer.py
* Update infer.py
* Update infer.py
* updated docs
* Update export-onnx.py
* Update export-onnx.py
* Update test_onnx.py
* updated requirements.txt
* Update test_onnx.py
* Update test_onnx.py
* docs updated
* docs fixed
* minor updates
2023-12-06 09:59:19 +08:00
LoganLiu66
f08af2fa22
fix initial states ( #1398 )
...
Co-authored-by: liujiawang02 <liujiawang02@baidu.com>
2023-12-04 22:29:42 +08:00
Zengwei Yao
0622dea30d
Add a TTS recipe VITS on LJSpeech dataset ( #1372 )
...
* first commit
* replace phonemizer with g2p
* use Conformer as text encoder
* modify training script, clean up code
* rename directory
* convert text to tokens in data preparation stage
* fix tts_datamodule.py
* support onnx export and testing the exported onnx model
* add doc
* add README.md
* fix style
2023-11-29 21:28:38 +08:00
zr_jin
ae67f75e9c
a bilingual recipe similar to the multi-zh_hans ( #1265 )
2023-11-26 10:04:15 +08:00
Wei Kang
238b45bea8
Libriheavy recipe (zipformer) ( #1261 )
...
* initial commit for libriheavy
* Data prepare pipeline
* Fix train.py
* Fix decode.py
* Add results
* minor fixes
* black
* black
* Incorporate PR https://github.com/k2-fsa/icefall/pull/1269
---------
Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2023-11-23 01:22:57 +08:00
Wei Kang
11d816d174
Add customized score for hotwords ( #1385 )
...
* add custom score for each hotword
* Add more comments
* Fix decode
* fix style
* minor fixes
2023-11-18 18:47:55 +08:00
Fangjun Kuang
666d69b20d
Rename train2.py to avoid confusion ( #1386 )
2023-11-17 18:12:59 +08:00
Karel Vesely
59c943878f
add the voxpopuli recipe ( #1374 )
...
* add the `voxpopuli` recipe
- this is the data preparation
- there is no ASR training and no results
* update the PR#1374 (feedback from @csukuangfj)
- fixing .py headers and docstrings
- removing BUT specific parts of `prepare.sh`
- adding assert `num_jobs >= num_workers` to `compute_fbank.py`
- narrowing list of languages (let's limit to ASR sets with transcripts for now)
- added links to `README.md`
- extending `text_from_manifest.py`
2023-11-16 14:38:31 +08:00
zr_jin
6d275ddf9f
fixed broken softlinks ( #1381 )
...
* removed broken softlinks
* fixed dependencies
* fixed file permission
2023-11-10 14:45:16 +08:00
lishaojie
1b2e99d374
add the pruned_transducer_stateless7_streaming recipe for commonvoice ( #1018 )
...
* add the pruned_transducer_stateless7_streaming recipe for commonvoice
* fix the symlinks
* Update RESULTS.md
2023-11-09 22:07:28 +08:00