Compare commits

1173 Commits
v1.0 ... master

Author SHA1 Message Date
Fangjun Kuang
34fc1fdf0d
Fix transformer decoder layer (#1995) 2025-07-18 20:12:29 +08:00
Bailey Machiko Hirota
5fe13078cc
Musan implementation for ReazonSpeech (#1988) 2025-07-18 17:16:19 +08:00
Yifan Yang
9fd0f2dc1d
support left pad for make_pad_mask (#1990) 2025-07-16 23:59:04 +08:00
Fangjun Kuang
e22bc78f98
Export streaming zipformer2 to RKNN (#1977) 2025-07-11 13:24:01 +08:00
Teo Wen Shen
da87e7fc99
add weights_only=False to torch.load (#1984) 2025-07-10 15:27:08 +08:00
Yifan Yang
89728dd4f8
Refactor data preparation for GigaSpeech recipe (#1986) 2025-07-10 11:17:37 +08:00
Mistmoon
9293edc62f
Add cr-ctc loss and ctc-decode in aishell (#1980) 2025-07-08 14:47:24 +08:00
Fangjun Kuang
fba5e67d5e
Fix CI tests. (#1974)
- Introduce unified AMP helpers (create_grad_scaler, torch_autocast) to handle 
  deprecations in PyTorch ≥2.3.0

- Replace direct uses of torch.cuda.amp.GradScaler and torch.cuda.amp.autocast 
  with the new utilities across all training and inference scripts

- Update all torch.load calls to include weights_only=False for compatibility with 
  newer PyTorch versions
2025-07-01 13:47:55 +08:00
Fangjun Kuang
71377d21cd
Export streaming zipformer models with whisper feature to onnx (#1973) 2025-06-30 19:01:15 +08:00
Fangjun Kuang
abd9437e6d
Add more wheels for piper-phonemize (#1969) 2025-06-24 14:49:16 +08:00
Wei Kang
e1cf4dbace
rm zipvoice (#1967) 2025-06-23 19:22:35 +08:00
Wei Kang
343b8fa2dc
Using non strict match in context graph for contextual words (#1952) 2025-06-19 12:27:15 +08:00
Wei Kang
f80a2ee110
Decrease num_buckets & remove shuffle_buffer_size (#1955) 2025-06-19 12:26:37 +08:00
Wei Kang
3587c4b3b7
Fix decoding byte bpes tokens to words. (#1966) 2025-06-19 12:26:01 +08:00
Wei Kang
762f965cf7
[zipvoice] Add requirements.txt and pinyin.txt, remove k2 from pretrained model inference. (#1965)
* Add requirements.txt and pinyin.txt needed by zipvoice

* simplify the requirements for pretrained model inference
2025-06-18 18:38:46 +08:00
Wei Kang
06539d2b9d
Add Zipvoice (#1964)
* Add ZipVoice - a flow-matching based zero-shot TTS model.
2025-06-17 20:17:12 +08:00
Zengwei Yao
ffb7d05635
refactor branch exchange in cr-ctc (#1954) 2025-05-27 12:09:59 +08:00
Mahsa Yarmohammadi
021e1a8846
Add acknowledgment to README (#1950) 2025-05-22 22:06:35 +08:00
Tianxiang Zhao
30e7ea4b5a
Fix a bug in finetune.py --use-mux (#1949) 2025-05-22 12:05:01 +08:00
Fangjun Kuang
fd8f8780fa
Fix logging torch.dtype. (#1947) 2025-05-21 12:04:57 +08:00
Yifan Yang
e79833aad2
ensure SwooshL/SwooshR output dtype matches input dtype (#1940) 2025-05-12 19:28:48 +08:00
Yifan Yang
4627969ccd
fix bug: undefined name 'partial' (#1941) 2025-05-12 14:19:53 +08:00
Yifan Yang
cd7caf12df
Fix speech_llm recipe (#1936)
* fix training/decoding scripts, cleanup unused code, and ensure compliance with style checks

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2025-04-30 11:41:00 +08:00
Fangjun Kuang
cc2e64a6aa
Fix convert_texts_into_ids() in the tedlium3 recipe. (#1929) 2025-04-24 17:04:46 +08:00
Yifan Yang
5ec95e5482
Fix SpeechLLM recipe (#1926) 2025-04-23 16:18:38 +08:00
math345
64c5364085
Fix bug: When resuming training from a checkpoint, model_avg was not assigned, resulting in a None error. (#1914) 2025-04-10 11:37:28 +08:00
Fangjun Kuang
300a821f58
Fix aishell training (#1916) 2025-04-10 10:30:37 +08:00
Fangjun Kuang
171cf8c9fe
Avoid redundant computation in PiecewiseLinear. (#1915) 2025-04-09 11:52:37 +08:00
Wei Kang
86bd16d496
[KWS]Remove graph compiler (#1905) 2025-04-02 22:10:06 +08:00
Fangjun Kuang
db9fb8ad31
Add scripts to export streaming zipformer(v1) to RKNN (#1882) 2025-02-27 17:10:58 +08:00
Yuekai Zhang
2ba665abca
Add F5-TTS with semantic token training results (#1880)
* add cosy token

* update inference code

* add extract cosy token

* update results

* add requirements.txt

* update readme

---------

Co-authored-by: yuekaiz <yuekaiz@h20-7.cm.cluster>
Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>
2025-02-24 13:58:47 +08:00
Machiko Bailey
da597ad782
Update RESULTS.md (#1873) 2025-02-04 09:04:25 +08:00
Machiko Bailey
0855b0338a
Merge japanese-to-english multilingual branch (#1860)
* add streaming support to reazonresearch

* update README for streaming

* Update RESULTS.md

* add onnx decode

---------

Co-authored-by: root <root@KDA03.cm.cluster>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
Co-authored-by: root <root@KDA01.cm.cluster>
Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2025-02-04 01:33:09 +08:00
Yuekai Zhang
dd5d7e358b
F5-TTS Training Recipe for WenetSpeech4TTS (#1846)
* add f5

* add infer

* add dit

* add README

* update pretrained checkpoint usage

---------

Co-authored-by: yuekaiz <yuekaiz@h20-5.cm.cluster>
Co-authored-by: yuekaiz <yuekaiz@l20-3.cm.cluster>
Co-authored-by: yuekaiz <yuekaiz@h20-6.cm.cluster>
Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2025-01-27 16:33:02 +08:00
zr_jin
39c466e802
Update shared (#1868) 2025-01-21 11:04:11 +08:00
zr_jin
79074ef0d4
removed the erroneous "continual" implementation (#1865) 2025-01-16 20:51:28 +08:00
zr_jin
8ab0352e60
Update style_check.yml (#1866) 2025-01-16 17:36:09 +08:00
Han Zhu
ab91112909
Improve infinity-check (#1862)
1. Attach the inf-check hooks if the grad scale is getting too small.
2. Add try-catch to avoid OOM in the inf-check hooks.
3. Set warmup_start=0.1 to reduce chances of divergence
2025-01-09 15:05:38 +08:00
Seonuk Kim
8d602806c3
Update conformer.py (#1859)
* Update conformer.py

feedforward dimention -> feedforward dimension

* Update conformer.py

Swich -> Swish
2025-01-06 17:31:13 +08:00
Seonuk Kim
3b6d54007b
Update conformer.py (#1857)
* Update conformer.py

feedforward dimention -> feedforward dimension
2025-01-06 13:17:02 +08:00
Fangjun Kuang
3b263539cd
Publish MatchaTTS onnx models trained with LJSpeech to huggingface (#1854) 2025-01-02 15:54:34 +08:00
Fangjun Kuang
bfffda5afb
Add MatchaTTS for the Chinese dataset Baker (#1849) 2024-12-31 17:17:05 +08:00
Han Zhu
df46a3eaf9
Warn instead of raising exceptions in inf-check (#1852) 2024-12-31 16:52:06 +08:00
Yifan Yang
a2b0f6057c
Small fix (#1853) 2024-12-31 07:41:44 +08:00
Han Zhu
48088cb807
Refactor optimizer (#1837)
* Print indexes of largest grad
2024-12-30 15:30:02 +08:00
Han Zhu
57e9f2a8db
Add the "rms-sort" diagnostics (#1851) 2024-12-30 15:27:05 +08:00
Fangjun Kuang
ad966fb81d
Minor fixes to the onnx inference script for ljspeech matcha-tts. (#1838) 2024-12-19 15:19:41 +08:00
Fangjun Kuang
92ed1708c0
Add torch 1.13 and 2.0 to CI tests (#1840) 2024-12-18 16:50:14 +08:00
Fangjun Kuang
d4d4f281ec
Revert "Replace deprecated pytorch methods (#1814)" (#1841)
This reverts commit 3e4da5f78160d3dba3bdf97968bd7ceb8c11631f.
2024-12-18 16:49:57 +08:00
Li Peng
3e4da5f781
Replace deprecated pytorch methods (#1814)
* Replace deprecated pytorch methods

- torch.cuda.amp.GradScaler(...) => torch.amp.GradScaler("cuda", ...)
- torch.cuda.amp.autocast(...) => torch.amp.autocast("cuda", ...)

* Replace `with autocast(...)` with `with autocast("cuda", ...)`


Co-authored-by: Li Peng <lipeng@unisound.ai>
2024-12-16 10:24:16 +08:00
zr_jin
d475de5600
Merge pull request #1835 from JinZr/fix/matcha-minor 2024-12-11 19:03:19 +08:00
zr_jin
b7acf0f57b
minor fixes 2024-12-11 14:33:47 +08:00
zr_jin
a43480af47
fixed the not found python 3.8 env (#1830) 2024-12-10 11:15:49 +08:00
zr_jin
08caa1e4e5
minor fixes to the matcha recipe 2024-12-09 22:59:29 +08:00
zr_jin
32b7a449e7
removed unnecessary type check (#1827) 2024-12-08 17:36:08 +08:00
zr_jin
d33f678176
fixed the formatting issue of PR#1812 (#1828) 2024-12-08 16:37:24 +08:00
goddamnVincent
5c04f7bfb8
Try to fix "compute_fbank_kespeech_splits.py: error: unrecognized arguments: --speed-perturb true" (#1812) 2024-12-08 11:17:15 +08:00
zr_jin
1c4dd464a0
Performed end to end testing on the matcha recipe (#1797)
* minor fixes to the `ljspeech/matcha` recipe
2024-12-08 03:18:15 +08:00
zr_jin
6e6b022e41
performed end to end testing to the VALL-E recipe (#1818)
* added the missing ``visualize`` function

* minor fixes
2024-12-06 16:14:51 +08:00
Han Zhu
bdd0f85704
Fix the normalized_text in LibriTTS recipe (#1825) 2024-12-05 15:12:06 +08:00
zr_jin
a1ade8ecb7
fixed failed assertion in the xbmu_ambo31 recipe (#1816) 2024-11-29 16:36:02 +08:00
Han Zhu
18fa6a0fec
Fix LibriTTS prepare.sh (#1815) 2024-11-29 11:45:05 +08:00
Yuekai Zhang
cbe012d54c
Valle Recipe for WenetSpeech4TTS, LibriTTS, LibriTTS-R (#1805)
* add valle

* update readme
2024-11-22 11:18:01 +08:00
Yifan Yang
57451b0382
refactor ksponspeech recipe (#1794)
Co-authored-by: Your Name <>
2024-11-01 22:49:19 +08:00
zr_jin
66225fbe33
VITS recipe for LibriTTS corpus (#1776) 2024-11-01 15:33:13 +08:00
Yifan Yang
119e1ce3e8
fix str2bool (#1792) 2024-10-31 09:54:12 +08:00
zr_jin
87cadfcd2e
fixed formatting issue (#1791)
* isort fixed formatting issue
2024-10-30 21:14:12 +08:00
Wei Kang
d513d456b8
Add prefix beam search and corresponding decoding methods (#1786)
* Add prefix beam search / shallow fusion / hotwords in librispeech ctc decode

* Add librispeech cr-ctc prefix beam search results
2024-10-30 10:14:34 +08:00
Fangjun Kuang
6c7863c2f8
Fix CI tests (#1788)
Use numpy<2.0
2024-10-29 22:26:25 +08:00
Fangjun Kuang
f23c8ce9dd
Fix CI test for gigaspeech (#1787) 2024-10-29 15:50:49 +08:00
Fangjun Kuang
516b4869b3
Add Matcha-TTS (#1773) 2024-10-29 15:04:04 +08:00
Fangjun Kuang
7e9eea6dc3
Add pretrained.py for SURT (#1785) 2024-10-28 11:53:11 +08:00
Fangjun Kuang
05f756390c
Avoid using lr from checkpoint. (#1781) 2024-10-28 00:59:04 +08:00
Yifan Yang
37a1420603
remove incomplete recipe (#1778)
Co-authored-by: yifanyeung <v-yifanyang@microsoft.com>
2024-10-24 13:16:18 +08:00
zr_jin
88bacfb9e6
minor fixes for the repo (#1775)
* minor fixes for the repo

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-10-21 13:51:56 +08:00
zr_jin
e8b6b920c0
A LibriTTS recipe on both ASR & Neural Codec Tasks (#1746)
* added ASR & CODEC recipes for LibriTTS corpus
2024-10-21 11:30:14 +08:00
Zengwei Yao
693d84a301
Add Consistency-Regularized CTC (#1766)
* support consistency-regularized CTC

* update arguments of cr-ctc

* set default value of cr_loss_masked_scale to 1.0

* minor fix

* refactor codes

* update RESULTS.md
2024-10-21 10:35:26 +08:00
KIM7AZEN
f84270c935
fix the fixed num_splits (#1772) 2024-10-16 17:19:24 +08:00
zzasdf
2653df5bda
fix the mismatch in batch_idx_train (#1757) 2024-10-12 19:14:28 +08:00
Zengwei Yao
fbba712887
Fix issue with eval mode in ActivationDropoutLinear (#1770)
* Fix issue with eval mode in ActivationDropoutLinear

---------

Co-authored-by: Daniel Povey <dpovey@gmail.com>
2024-10-12 19:09:05 +08:00
zr_jin
d9844d847f
Update prepare.sh (#1768) 2024-10-09 15:50:12 +08:00
Yu Lianjie
5c04c31292
fix open-commands path (#1714) 2024-09-20 12:38:52 +08:00
Fangjun Kuang
6f1abd832d
Fix exporting streaming zipformer models. (#1755) 2024-09-11 21:04:52 +08:00
Fangjun Kuang
329e34ac20
Test export onnx models for multi-zh-hans (#1752) 2024-09-10 19:29:19 +08:00
zr_jin
a394bf7474
fixed gss scripts for alimeeting and ami recipes (#1749) 2024-09-08 20:35:07 +08:00
zr_jin
65b8a6c730
fixed wrong default value for the alimeeting recipe (#1750) 2024-09-08 20:34:49 +08:00
Fangjun Kuang
2ff0bb6a88
fix CI tests (#1748) 2024-09-08 17:42:55 +08:00
zr_jin
559c8a7160
fixed a typo in prepare.sh for alimeeting recipes (#1747) 2024-09-08 17:10:17 +08:00
Fangjun Kuang
d4b4323699
Fix github actions CI tests (#1744) 2024-09-07 19:21:26 +08:00
Fangjun Kuang
f233ffa02a
Add docker images for torch 2.4.1 (#1743) 2024-09-07 18:17:04 +08:00
Yifan Yang
cea0dbe7b1
fix gigaspeech_prepare.sh (#1734) 2024-08-28 12:15:01 +08:00
Xiaoyu Yang
a6c02a4d8c
zipformer BF16 training recipe (#1700)
Support Zipformer AMP + BF16 training
2024-08-23 09:42:22 +08:00
Yuekai Zhang
3b434fe83c
fix triton onnx export (#1730) 2024-08-23 09:33:46 +08:00
Xiaoyu Yang
3fc06cc2b9
Support AudioSet training with weighted sampler (#1727) 2024-08-22 15:27:25 +08:00
Xiaoyu Yang
5952972294
Keep the custom fields in libriheavy manifest (#1719) 2024-08-17 13:24:38 +08:00
Yifan Yang
6ac3343ce5
fix path in README.md (#1722) 2024-08-16 20:13:02 +08:00
Karel Vesely
1730fce688
split save_results() -> save_asr_output() + save_wer_results() (#1712)
- the idea is to support `--skip-scoring` argument passed to a decoding
  script
- created for Transducer decoding (non-streaming, streaming)
- it can be done also for CTC decoding... (not yet)

- also added `--label` for extra label in `streaming_decode.py`
- and also added `set_caching_enabled(True)`, which has no effect on
  librispeech, but it leads to faster runtime on DBs with long
  recordings (assuming `librispeech/zipformer` scripts are the
  example scripts for other setups)
2024-08-13 23:02:14 +08:00
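A minimal sketch of the idea behind the save_results() split in #1712 above: the ASR output is always saved, while scoring runs only when `--skip-scoring` is not given. This is not the code from the PR. The function bodies are stand-ins, `decode_and_report` is a hypothetical name, and declaring `--skip-scoring` as a plain on/off flag is an assumption about how the option is parsed.

```python
import argparse
from typing import Dict, List, Optional, Tuple


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumption: modelled here as a plain on/off flag; the real decoding
    # scripts may declare and parse this option differently.
    parser.add_argument("--skip-scoring", action="store_true")
    return parser


def save_asr_output(hyps: Dict[str, str]) -> List[str]:
    # Stand-in: return one "utt-id text" line per utterance instead of
    # writing a file.
    return [f"{utt} {text}" for utt, text in sorted(hyps.items())]


def save_wer_results(refs: Dict[str, str], hyps: Dict[str, str]) -> float:
    # Stand-in scorer: fraction of utterances whose text differs from the
    # reference. This is NOT a real word error rate.
    wrong = sum(1 for utt, ref in refs.items() if hyps.get(utt) != ref)
    return wrong / len(refs)


def decode_and_report(
    args: argparse.Namespace, refs: Dict[str, str], hyps: Dict[str, str]
) -> Tuple[List[str], Optional[float]]:
    # The recognition output is always produced.
    lines = save_asr_output(hyps)
    # Scoring is a separate step, so it can be skipped independently.
    score = None if args.skip_scoring else save_wer_results(refs, hyps)
    return lines, score
```

Keeping the two steps in separate functions means a decoding run on data without reference transcripts can still save its hypotheses.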
Fangjun Kuang
3b257dd5ae
Add docker images for torch 2.4 (#1704) 2024-07-25 16:46:24 +08:00
Yuekai Zhang
4af81af5a6
Update Zipformer-xl 700M Results on multi-hans-zh (#1694)
* add blank penalty

* update zipformer-xl results

* fix typo
2024-07-18 21:05:59 +08:00
zzasdf
11151415f3
fix error in accum_grad (#1693) 2024-07-17 17:47:43 +08:00
Fangjun Kuang
2e13298717
Refactor ctc greedy search. (#1691)
Use torch.unique_consecutive() to avoid reinventing the wheel.
2024-07-15 12:01:47 +08:00
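As a rough illustration of the refactor in #1691 above (not the code from the PR): CTC greedy search takes the best token per frame, merges consecutive repeats, then drops blanks. torch.unique_consecutive() does the merge step on tensors; the pure-Python sketch below uses itertools.groupby for the same step so that it runs without PyTorch. The function name and the blank ID of 0 are assumptions.

```python
from itertools import groupby
from typing import List


def ctc_greedy_collapse(frame_ids: List[int], blank_id: int = 0) -> List[int]:
    """Collapse a per-frame argmax sequence into a CTC token sequence.

    Hypothetical helper for illustration; blank_id=0 is an assumption.
    """
    # Step 1: merge runs of identical consecutive IDs (what
    # torch.unique_consecutive() does for a 1-D tensor).
    merged = [token for token, _ in groupby(frame_ids)]
    # Step 2: drop the blank symbol.
    return [token for token in merged if token != blank_id]
```

A blank between two identical tokens keeps them separate, so `[0, 1, 1, 0, 1, 2, 2, 0]` collapses to two 1s followed by a 2.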
Zengwei Yao
d47c078286
add decoding method of ctc-greedy-search in zipformer recipe (#1690) 2024-07-14 17:30:13 +08:00
Zengwei Yao
334beed2af
fix usages of returned losses after adding attention-decoder in zipformer (#1689) 2024-07-12 16:50:58 +08:00
Ziwei Li
f6febd658e
"-" replace "_" fix writing error (#1687) 2024-07-12 14:42:00 +08:00
Teo Wen Shen
19048e155b
Cast grad_scale in whiten to float (#1663)
* cast grad_scale in whiten to float

* fix cast in zipformer_lora
2024-07-11 15:12:30 +08:00
Yifan Yang
d65187ec52
Small fix (#1686) 2024-07-11 14:45:35 +08:00
Zengwei Yao
785f3f0bcf
Update RESULTS.md, adding results and model links of zipformer-small/medium CTC/AED models (#1683) 2024-07-09 20:04:47 +08:00
Yuekai Zhang
1c3d992a39
Update results using Zipformer-large on multi-hans-zh (#1679) 2024-07-09 09:57:52 +08:00
zr_jin
2d64228efa
Update attention_decoder.py (#1681) 2024-07-06 09:01:34 +08:00
zr_jin
325a825841
Update requirements-ci.txt (#1682) 2024-07-06 09:01:19 +08:00
Zengwei Yao
f76afff741
Support CTC/AED option for Zipformer recipe (#1389)
* add attention-decoder loss option for zipformer recipe

* add attention-decoder-rescoring

* update export.py and pretrained_ctc.py

* update RESULTS.md
2024-07-05 20:19:18 +08:00
Yifan Yang
cbcac23d26
Fix typos, remove unused packages, normalize comments (#1678) 2024-07-04 14:19:45 +08:00
Yuekai Zhang
ebbd396c2b
update multi-hans-zh whisper-qwen-7b results (#1677)
* update qwen-7b whisper encoder results

* fix typo
2024-07-03 19:55:12 +08:00
Manix
eaab2c819f
Zipformer Onnx FP16 (#1671)
Signed-off-by: manickavela29 <manickavela1998@gmail.com>
2024-06-27 16:08:24 +08:00
Fangjun Kuang
b594a3875b
Add CI for non-streaming zipformer about ksponspeech (#1667) 2024-06-24 16:20:46 +08:00
Seung Hyun Lee
031f892796
Reformat by black non-streaming zipformer recipe for ksponspeech (#1665) 2024-06-24 15:28:09 +08:00
Seung Hyun Lee
6f102d3470
Add non-streaming Zipformer recipe for KsponSpeech (#1664) 2024-06-24 14:07:37 +08:00
Fangjun Kuang
3059eb4511
Fix doc URLs (#1660) 2024-06-21 11:10:14 +08:00
Yuekai Zhang
ff2bef9e50
update multi-hans whisper-qwen-1.5b results (#1657) 2024-06-19 11:10:31 +08:00
Seung Hyun Lee
2e05663fbb
Add prepare.sh for KsponSpeech recipe. (#1656) 2024-06-18 16:54:39 +08:00
Fangjun Kuang
1f5c0a87b9
Add CI for ksponspeech (#1655) 2024-06-16 19:15:09 +08:00
Seung Hyun Lee
c13c7aa30b
Add Streaming Zipformer-Transducer recipe for KsponSpeech (#1651) 2024-06-16 16:20:44 +08:00
Yuekai Zhang
890eeec82c
Add qwen-audio style model training: using whisper + qwen2 (#1652) 2024-06-16 12:14:44 +08:00
Triplecq
3b40d9bbb1
Zipformer recipe for ReazonSpeech (#1611)
* Add first cut at ReazonSpeech recipe

This recipe is mostly based on egs/csj, but tweaked to the point that
can be run with ReazonSpeech corpus.

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>

---------

Signed-off-by: Fujimoto Seiji <fujimoto@ceptord.net>
Co-authored-by: Fujimoto Seiji <fujimoto@ceptord.net>
Co-authored-by: Chen <qc@KDM00.cm.cluster>
Co-authored-by: root <root@KDA01.cm.cluster>
2024-06-13 14:19:03 +08:00
Yuekai Zhang
d5be739639
add distill whisper results (#1648) 2024-06-13 00:20:04 +08:00
Fangjun Kuang
13f55d0735
Add merge_tokens for ctc forced alignment (#1649) 2024-06-12 17:45:13 +08:00
Fangjun Kuang
ec0389a3c1
Add doc about FST-based CTC forced alignment. (#1482) 2024-06-12 17:36:57 +08:00
Daniel Povey
4d5c1f2e60
Remove inf from stored stats (#1647) 2024-06-10 22:41:54 +08:00
Fangjun Kuang
130a18cc10
support torch 2.3.1 in docker (#1646) 2024-06-06 22:27:29 +08:00
Fangjun Kuang
b88062292b
Typo fixes (#1643) 2024-06-03 16:49:21 +08:00
zr_jin
42a97f6d7b
Update env.py (#1635) 2024-05-22 22:29:38 +08:00
zr_jin
1adf1e441d
Removed unused `k2` dependencies from the AT recipe (#1633) 2024-05-21 18:22:19 +08:00
Zengwei Yao
0df406c5da
Initialize BiasNorm bias with small random values (#1630) 2024-05-20 22:32:02 +08:00
zr_jin
68980c5d0a
Fix an error that occurred during MMI preparation (#1626)
* init commit

* updated
2024-05-17 19:45:15 +08:00
zr_jin
9d570870cf
Update asr_datamodule.py (#1619) 2024-05-07 21:37:55 +08:00
Yifan Yang
4e97b19b63
Remove duplicate logging initialization logic in utils.py (#1617) 2024-05-06 13:00:27 +08:00
Zengwei Yao
c08fe48603
add force=True to logging.basicConfig (#1613) 2024-05-04 11:42:23 +08:00
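A small sketch of what `force=True` changes, using only the standard library (Python 3.8 or later). Without it, logging.basicConfig() does nothing when the root logger already has handlers; with it, the existing handlers are removed first. The helper name `setup_logger` and the in-memory streams are hypothetical and are not taken from the PR.

```python
import io
import logging


def setup_logger(stream: io.StringIO) -> None:
    # force=True removes and closes any handlers already attached to the
    # root logger before installing the new one, so a second call takes
    # effect instead of being silently ignored.
    logging.basicConfig(
        stream=stream,
        level=logging.INFO,
        format="%(message)s",
        force=True,
    )
```

Calling `setup_logger` twice with different streams leaves a single handler on the root logger, and later messages reach only the second stream.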
Yuekai Zhang
6d7c1d13a5
update speechio whisper ft results (#1605)
* update speechio whisper ft results
2024-04-30 11:49:20 +08:00
Wei Kang
b49351fc39
Update README.md for conformer-ctc (#1609) 2024-04-28 09:56:13 +08:00
Dongji Gao
9a17f4ce41
add OTC related scripts using phone as units instead of BPEs (#1602)
* add otc related scripts using phone instead of bpe
2024-04-26 00:55:44 +08:00
zzasdf
25cabb7663
fix error in padding computing (#1607) 2024-04-25 22:40:07 +08:00
Xiaoyu Yang
df36f93bd8
add small-scaled model for audio tagging (#1604) 2024-04-24 17:00:42 +08:00
Yifan Yang
368b7d10a7
clear log handlers before setup (#1603) 2024-04-24 15:31:25 +09:00
zr_jin
9f8f0bceb5
Update prepare.sh (#1601) 2024-04-20 23:02:02 +09:00
Yifan Yang
ed6bc200e3
Update train.py (#1590) 2024-04-11 19:35:25 +08:00
Fangjun Kuang
ba5b2e854b
Return probs in audio tagging onnx models (#1586) 2024-04-10 09:03:30 +08:00
Fangjun Kuang
fa5d861af0
Add CI test for the AudioSet recipe. (#1585) 2024-04-09 17:45:00 +08:00
yh646492956
f5d7818733
fix run.sh script in wenetspeech KWS (#1584)
Co-authored-by: Hao You <13182720519@sina.cn>
2024-04-09 15:16:12 +08:00
Xiaoyu Yang
1732dafe24
Add zipformer recipe for audio tagging (#1421) 2024-04-09 12:06:14 +08:00
zr_jin
f2e36ec414
Zipformer recipe for CommonVoice (#1546)
* added scripts for char-based lang prep training scripts

* added `Zipformer` recipe for commonvoice

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-04-09 11:37:08 +08:00
Yifan Yang
87843e9382
k2SSL: a Faster and Better Framework for Self-Supervised Speech Representation Learning (#1500)
* Add k2SSL

* fix flake8

* fix for black

* Update ssl_datamodule.py

* Fix bugs in HubertDataset

* update comments

* add librilight

* add checkpoint convert script

* format

---------

Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
Co-authored-by: zzasdf <15218404468@163.com>
2024-04-04 23:29:16 +08:00
Fangjun Kuang
c45e9fecfb
support torch 2.2.2 in docker images (#1578) 2024-04-03 11:26:24 +08:00
Wei Kang
9369c2bef9
Add comments to prepare.sh in aidatatang (#1575) 2024-04-02 16:08:09 +08:00
Dadoou
6cbddaa8e3
Add base choice to model_name argument for whisper model. (#1573)
Co-authored-by: dadoou <dadoou@yandex.com>
2024-04-02 09:47:38 +08:00
Wei Kang
42de459110
Fix decoding finetune model (#1568) 2024-03-26 10:38:21 +08:00
Wei Kang
b156b6c291
Add use-mux to finetune commands (#1567) 2024-03-26 09:42:46 +08:00
Fangjun Kuang
bb9ebcfb06
Fix CI (#1563) 2024-03-23 09:27:28 +08:00
Zengwei Yao
353469182c
fix issue in zipformer.py (#1566) 2024-03-21 15:59:43 +08:00
Xiaoyu Yang
bddc3fca7a
Fix adapter in streaming_forward (#1560) 2024-03-21 15:08:58 +08:00
Fangjun Kuang
387833fb7c
Doc: Add huggingface mirror for users from China. (#1565) 2024-03-21 12:05:30 +08:00
zr_jin
d5cd78a637
Update hooks.py (#1564) 2024-03-20 16:43:45 +08:00
zr_jin
9bd30853ae
Update diagnostics.py (#1562) 2024-03-20 15:35:14 +08:00
zr_jin
413220d6a4
Minor fixes for the multi_zh_en recipe (#1526) 2024-03-18 20:25:57 +08:00
Fangjun Kuang
489263e5bb
Add streaming HLG decoding for zipformer CTC. (#1557)
Note it supports only CPU.
2024-03-18 20:11:47 +08:00
Karel Vesely
4917ac8bab
allow export of onnx-streaming-models with other than 80dim input features (#1556) 2024-03-18 18:43:29 +08:00
zr_jin
eec12f053d
Use piper_phonemize as text tokenizer in vctk TTS recipe (#1522)
* to align with PR #1524
2024-03-18 17:53:52 +08:00
zr_jin
9b0eae3b4a
fixes for init value of diagnostics.TensorDiagnosticOptions (#1555) 2024-03-18 17:14:29 +08:00
zr_jin
bf2f94346c
Enabling char_level and compute_CER for aishell recipe (#1554)
* init fix

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-03-18 11:57:47 +08:00
Xiaoyu Yang
2dfd5dbf8b
Add LoRA for Zipformer (#1540) 2024-03-15 17:19:23 +08:00
Xiaoyu Yang
f28c05f4f5
Documentation for adapter fine-tuning (#1545) 2024-03-14 12:18:49 +08:00
zr_jin
eb132da00d
additional instruction for the grad_scale is too small error (#1550) 2024-03-14 11:33:49 +08:00
Fangjun Kuang
15bd9a841e
add CI for ljspeech (#1548) 2024-03-13 17:39:01 +08:00
Fangjun Kuang
d406b41cbd
Doc: Add page for installing piper-phonemize (#1547) 2024-03-13 11:01:18 +08:00
zr_jin
c3f6f28116
Zipformer recipe for Cantonese dataset MDCC (#1537)
* init commit

* Create README.md

* handle code switching cases

* misc. fixes

* added manifest statistics

* init commit for the zipformer recipe

* added scripts for exporting model

* added RESULTS.md

* added scripts for streaming related stuff

* doc str fixed
2024-03-13 10:01:28 +08:00
Fangjun Kuang
81f518ea7c
Support different tts model types. (#1541) 2024-03-12 22:29:21 +08:00
BannerWang
959906e9dc
Correct alimeeting download link (#1544)
Co-authored-by: BannerWang <banner.wang@upblocks.io>
2024-03-12 12:44:09 +08:00
jimmy1984xu
e472fa6840
fix CutMix init parameter (#1543)
Co-authored-by: jimmyxu <jimmyxu@upblocks.io>
2024-03-11 18:37:26 +08:00
Fangjun Kuang
60986c3ac1
Fix default value for --context-size in icefall. (#1538) 2024-03-08 20:47:13 +08:00
zr_jin
ae61bd4090
Minor fixes for the commonvoice recipe (#1534)
* init commit

* fix for issue https://github.com/k2-fsa/icefall/issues/1531

* minor fixes
2024-03-08 11:01:11 +08:00
Yuekai Zhang
5df24c1685
Whisper large fine-tuning on wenetspeech, multi-hans-zh (#1483)
* add whisper fbank for wenetspeech

* add whisper fbank for other dataset

* add str to bool

* add decode for wenetspeech

* add requirements.txt

* add original model decode with 30s

* test feature extractor speed

* add aishell2 feat

* change compute feature batch

* fix overwrite

* fix executor

* regression

* add kaldifeatwhisper fbank

* fix io issue

* parallel jobs

* use multi machines

* add wenetspeech fine-tune scripts

* add monkey patch codes

* remove useless file

* fix subsampling factor

* fix too long audios

* add remove long short

* fix whisper version to support multi batch beam

* decode all wav files

* remove utterance more than 30s in test_net

* only test net

* using soft links

* add kespeech whisper feats

* fix index error

* add manifests for whisper

* change to LilcomChunkyWriter

* add missing option

* decrease cpu usage 

* add speed perturb for kespeech

* fix kespeech speed perturb

* add dataset

* load checkpoint from specific path

* add speechio

* add speechio results

---------

Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2024-03-07 19:04:27 +08:00
zr_jin
cdb3fb5675
add text norm script for pl (#1532) 2024-03-07 18:47:29 +08:00
zr_jin
335a9962de
Fixed formatting issue of PR #1528 (#1530) 2024-03-06 08:43:45 +08:00
Rezakh20
ff430b465f
Add num_features to train.py for training WSASR (#1528) 2024-03-05 16:40:30 +08:00
zr_jin
242002e0bd
Strengthened style constraints (#1527) 2024-03-04 23:28:04 +08:00
Fangjun Kuang
29b195a42e
Update export-onnx.py for vits to support sherpa-onnx. (#1524) 2024-03-01 19:53:58 +08:00
zr_jin
58610b1bf6
Provides README.md for TTS recipes (#1491)
* Update README.md
2024-02-29 17:31:28 +08:00
Fangjun Kuang
2f102eb989
Add CUDA docker image for torch 2.2.1 (#1521) 2024-02-29 11:41:18 +08:00
Xiaoyu Yang
7e2b561bbf
Add recipe for fine-tuning Zipformer with adapter (#1512) 2024-02-29 10:57:38 +08:00
Zengwei Yao
d89f4ea149
Use piper_phonemize as text tokenizer in ljspeech recipe (#1511)
* use piper_phonemize as text tokenizer in ljspeech recipe

* modify usage of tokenizer in vits/train.py

* update docs
2024-02-29 10:13:22 +08:00
Fangjun Kuang
291d06056c
Support torch 2.2.1 for cpu docker. (#1516) 2024-02-23 14:24:13 +08:00
Xiaoyu Yang
2483b8b4da
Zipformer recipe for SPGISpeech (#1449) 2024-02-22 15:53:19 +08:00
Wei Kang
819bb45539
Add pypinyin to requirements (#1515) 2024-02-22 15:50:11 +08:00
Wei Kang
aac7df064a
Recipes for open vocabulary keyword spotting (#1428)
* English recipe on gigaspeech; Chinese recipe on wenetspeech
2024-02-22 15:31:20 +08:00
Xiaoyu Yang
13daf73468
docs for finetune zipformer (#1509) 2024-02-21 18:06:27 +08:00
Wei Kang
c19b414778
Update docker (adding pypinyin) (#1513)
Update docker (adding pypinyin)
2024-02-21 08:04:16 +08:00
zr_jin
027302c902
minor fix for param. names (#1495) 2024-02-20 14:38:51 +08:00
Karel Vesely
e59fa38e86
docs: minor fixes of LM rescoring texts (#1498) 2024-02-20 10:40:15 +08:00
Zengwei Yao
b3e2044068
minor fix of vits/tokenizer.py (#1504)
* minor fix of vits/tokenizer.py
2024-02-19 19:33:32 +08:00
zr_jin
db4d66c0e3
Fixed softlink for ljspeech recipe (#1503) 2024-02-19 16:13:09 +08:00
Fangjun Kuang
7eb360d0d5
Fix cpu docker images for torch 2.2.0 (#1502) 2024-02-18 20:32:40 +08:00
Fangjun Kuang
17688476e5
Provide docker images for torch 2.2.0 (#1501) 2024-02-18 14:56:04 +08:00
Fangjun Kuang
06b356a610
Update cpu docker images to support torch 2.2.0 (#1499) 2024-02-18 12:05:38 +08:00
safarisadegh
d9ae8c02a0
Update README.md (#1497) 2024-02-09 15:05:01 +08:00
Wei Kang
711d6bc462
Refactor prepare.sh in librispeech (#1493)
* Refactor prepare.sh in librispeech, breaking it into three parts: prepare.sh (basic, minimal requirement for transducer), prepare_lm.sh (ngram & nnlm stuff), prepare_mmi.sh (for MMI training).
2024-02-09 10:44:19 +08:00
Tiance Wang
4ed88d9484
Update shared (#1487)
There should be one more ../
2024-02-07 10:16:02 +08:00
Xiaoyu Yang
777074046d
Fine-tune recipe for Zipformer (#1484)
1. support finetune zipformer
2. update the usage; set a very large batch count
2024-02-06 18:25:43 +08:00
zr_jin
a813186f64
minor fix for docstr and default param. (#1490)
* Update train.py and README.md
2024-02-05 12:47:52 +08:00
Teo Wen Shen
b9e6327adf
Fixing torch.ctc err (#1485)
* fixing torch.ctc err

* Move targets & lengths to CPU
2024-02-03 06:25:27 +08:00
Henry Li Xinyuan
b07d5472c5
Implement recipe for Fluent Speech Commands dataset (#1469)
---------

Signed-off-by: Xinyuan Li <xli257@c13.clsp.jhu.edu>
2024-01-31 22:53:36 +08:00
zr_jin
37b975cac9
fixed a CI test for wenetspeech (#1476)
* Comply with issue #1149

https://github.com/k2-fsa/icefall/issues/1149
2024-01-27 06:41:56 +08:00
Yuekai Zhang
1c30847947
Whisper Fine-tuning Recipe on Aishell1 (#1466)
* add decode seamlessm4t

* add requirements

* add decoding with avg model

* add token files

* add custom tokenizer

* support deepspeed to finetune large model

* support large-v3

* add model saving

* using monkey patch to replace models

* add manifest dir option
2024-01-27 00:32:30 +08:00
Fangjun Kuang
8d39f9508b
Fix torchscript export to use tokens.txt instead of lang_dir (#1475) 2024-01-26 19:18:33 +08:00
Zengwei Yao
c401a2646b
minor fix of zipformer/optim.py (#1474) 2024-01-26 15:50:11 +08:00
zr_jin
9c494a3329
typos fixed (#1472) 2024-01-25 18:41:43 +08:00
Yifan Yang
559ed150bb
Fix typo (#1471) 2024-01-23 22:51:09 +08:00
zr_jin
ebe97a07b0
Reworked README.md (#1470)
* Rework README.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2024-01-23 16:26:24 +08:00
Yifan Yang
5dfc3ed7f9
Fix buffer size of DynamicBucketingSampler (#1468)
* Fix buffer size

* Fix for flake8

---------

Co-authored-by: yifanyeung <yifanyeung@yifanyeung.local>
2024-01-21 02:10:42 +08:00
zr_jin
7bdde9174c
A Zipformer recipe with Byte-level BPE for Aishell-1 (#1464)
* init commit

* Update train.py

* Update decode.py

* Update RESULTS.md

* added `vocab_size`

* removed unused softlinks

* added scripts for testing pretrained models

* set `bpe_model` as required

* re-org the bbpe recipe for aishell
2024-01-16 21:08:35 +08:00
Fangjun Kuang
398401ed27
Update kaldifeat installation doc (#1460) 2024-01-14 14:38:41 +08:00
Xiaoyu Yang
e2fcb42f5f
fix typo (#1455) 2024-01-09 15:41:37 +08:00
zr_jin
5445ea6df6
Use shuffled LibriSpeech cuts instead (#1450)
* use shuffled LibriSpeech cuts instead

* leave the old code in comments for reference
2024-01-08 15:09:21 +08:00
zr_jin
b9b56eb879
Minor fixes to the VCTK data prep scripts (#1441)
* Update prepare.sh
2024-01-08 14:28:07 +08:00
Karel Vesely
716b82cc3a
streaming_decode.py, relax the audio range from [-1,+1] to [-10,+10] (#1448)
- some AudioTransform classes produce audio signals out of range [-1,+1]
   - Resample produced 1.0079
   - The range [-10,+10] was chosen to still be able to reliably
     distinguish from the [-32k,+32k] signal...
- this is related to : https://github.com/lhotse-speech/lhotse/issues/1254
2024-01-05 10:21:27 +08:00
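The reasoning in #1448 above can be sketched as a simple range check. Float audio normalised to about [-1, +1] may overshoot slightly after resampling (1.0079 in the commit message), while samples on a 16-bit integer scale reach tens of thousands, so a limit of 10 separates the two cases with a wide margin. The function name is hypothetical, and this is an illustration rather than the check used in streaming_decode.py.

```python
from typing import Sequence


def looks_like_float_audio(samples: Sequence[float], limit: float = 10.0) -> bool:
    """Return True if every sample lies within [-limit, +limit].

    Hypothetical helper; limit=10.0 mirrors the relaxed range named in the
    commit message.
    """
    return all(-limit <= s <= limit for s in samples)
```

With the old limit of 1.0, a resampled value such as 1.0079 would be rejected even though the signal is clearly not on the integer scale.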
Fangjun Kuang
8136ad775b
Use high_freq -400 in computing fbank features. (#1447)
See also https://github.com/k2-fsa/sherpa-onnx/issues/514
2024-01-04 13:59:32 +08:00
zr_jin
f42258caf8
Update compute_fbank_commonvoice_splits.py (#1437) 2023-12-30 13:03:26 +08:00
Fangjun Kuang
140e6381ad
Refactor CI tests for librispeech (#1436) 2023-12-27 13:21:14 +08:00
Fangjun Kuang
db52fe2349
Refactor CI test for aishell (#1435) 2023-12-26 20:29:43 +08:00
Fangjun Kuang
835a92eba5
Add doc about how to use the CPU-only docker images (#1432) 2023-12-25 20:23:56 +08:00
Ali Haznedaroğlu
ddd7131317
Update TTS export-onnx.py scripts for handling variable token counts (#1430) 2023-12-25 19:44:07 +08:00
Fangjun Kuang
c855a58cfd
Generate the dependency matrix by code for GitHub Actions (#1431) 2023-12-25 19:41:09 +08:00
Fangjun Kuang
e5bb1ae86c
Use the CPU docker in CI to simplify the test code (#1427) 2023-12-24 13:40:33 +08:00
Fangjun Kuang
79a42148db
Add CI test to cover zipformer/train.py (#1424) 2023-12-23 00:38:36 +08:00
TianHao Zhang
702d4f5914
Update prepare.sh (#1422)
Fix the bug on line 251:
1. Delete the extra blank
2. Correct the misspelling of "new_vocab_size"
2023-12-21 14:42:33 +08:00
zr_jin
10a234709c
bugs fixed (#1416) 2023-12-14 11:26:37 +08:00
Fangjun Kuang
f85f0252a9
Add greedy search for streaming zipformer CTC. (#1415) 2023-12-13 17:34:12 +08:00
zr_jin
d0da509055
Support ONNX export for Streaming CTC Encoder (#1413)
* Create export-onnx-streaming-ctc.py

* doc_str updated

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-12-13 10:33:28 +08:00
Fangjun Kuang
9e9fe7954d
Upload gigaspeech zipformer models in CI (#1412) 2023-12-12 18:57:04 +08:00
Fangjun Kuang
20a82c9abf
first commit (#1411) 2023-12-12 18:13:26 +08:00
Fangjun Kuang
b0f70c9d04
Fix torch.jit.script() export for pruned_transducer_stateless2 (#1410) 2023-12-10 11:38:39 +08:00
zr_jin
df56aff31e
minor fixes to the vits onnx exportation scripts (#1408) 2023-12-08 21:11:31 +08:00
Fangjun Kuang
e9ec827de7
Rename zipformer2 to zipformer_for_ncnn_export_only to avoid confusion. (#1407) 2023-12-08 14:29:24 +08:00
zr_jin
bda72f86ff
minor adjustments to the VITS recipes for onnx runtime (#1405) 2023-12-08 06:32:40 +08:00
Yifan Yang
b87ed26c09
Normalize dockerfile (#1400) 2023-12-06 14:33:45 +08:00
zr_jin
735fb9a73d
A TTS recipe VITS on VCTK dataset (#1380)
* init

* isort formatted

* minor updates

* Create shared

* Update prepare_tokens_vctk.py

* Update prepare_tokens_vctk.py

* Update prepare_tokens_vctk.py

* Update prepare.sh

* updated

* Update train.py

* Update train.py

* Update tts_datamodule.py

* Update train.py

* Update train.py

* Update train.py

* Update train.py

* Update train.py

* Update train.py

* fixed formatting issue

* Update infer.py

* removed redundant files

* Create monotonic_align

* removed redundant files

* created symlinks

* Update prepare.sh

* minor adjustments

* Create requirements_tts.txt

* Update requirements_tts.txt

added version constraints

* Update infer.py

* Update infer.py

* Update infer.py

* updated docs

* Update export-onnx.py

* Update export-onnx.py

* Update test_onnx.py

* updated requirements.txt

* Update test_onnx.py

* Update test_onnx.py

* docs updated

* docs fixed

* minor updates
2023-12-06 09:59:19 +08:00
LoganLiu66
f08af2fa22
fix initial states (#1398)
Co-authored-by: liujiawang02 <liujiawang02@baidu.com>
2023-12-04 22:29:42 +08:00
Zengwei Yao
0622dea30d
Add a TTS recipe VITS on LJSpeech dataset (#1372)
* first commit

* replace phonemizer with g2p

* use Conformer as text encoder

* modify training script, clean codes

* rename directory

* convert text to tokens in data preparation stage

* fix tts_datamodule.py

* support onnx export and testing the exported onnx model

* add doc

* add README.md

* fix style
2023-11-29 21:28:38 +08:00
zr_jin
ae67f75e9c
a bilingual recipe similar to the multi-zh_hans (#1265) 2023-11-26 10:04:15 +08:00
Wei Kang
238b45bea8
Libriheavy recipe (zipformer) (#1261)
* initial commit for libriheavy

* Data prepare pipeline

* Fix train.py

* Fix decode.py

* Add results

* minor fixes

* black

* black

* Incorporate PR https://github.com/k2-fsa/icefall/pull/1269

---------

Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2023-11-23 01:22:57 +08:00
Wei Kang
11d816d174
Add customized score for hotwords (#1385)
* add custom score for each hotword

* Add more comments

* Fix decode

* fix style

* minor fixes
2023-11-18 18:47:55 +08:00
Fangjun Kuang
666d69b20d
Rename train2.py to avoid confusion (#1386) 2023-11-17 18:12:59 +08:00
Karel Vesely
59c943878f
add the voxpopuli recipe (#1374)
* add the `voxpopuli` recipe

- this is the data preparation
- there is no ASR training and no results

* update the PR#1374 (feedback from @csukuangfj)

- fixing .py headers and docstrings
- removing BUT specific parts of `prepare.sh`
- adding assert `num_jobs >= num_workers` to `compute_fbank.py`
- narrowing list of languages
  (let's limit to ASR sets with transcripts for now)
- added links to `README.md`
- extending `text_from_manifest.py`
2023-11-16 14:38:31 +08:00
zr_jin
6d275ddf9f
fixed broken softlinks (#1381)
* removed broken softlinks

* fixed dependencies

* fixed file permission
2023-11-10 14:45:16 +08:00
lishaojie
1b2e99d374
add the pruned_transducer_stateless7_streaming recipe for commonvoice (#1018)
* add the pruned_transducer_stateless7_streaming recipe for commonvoice

* fix the symlinks

* Update RESULTS.md
2023-11-09 22:07:28 +08:00
zr_jin
231bbcd2b6
Update optim.py (#1366) 2023-11-03 12:06:29 +08:00
wnywbyt
c3bbb32f9e
Update the parameter 'vocab-size' (#1364)
Co-authored-by: wdq <dongqin.wan@desaysv.com>
2023-11-02 20:45:30 +08:00
zr_jin
9e5a5d7839
Incorporate some latest changes to optim.py (#1359)
* init commit

* black formatted

* isort formatted
2023-11-02 16:10:08 +08:00
zr_jin
23913f6afd
Minor refinements for some stale but recently merged PRs (#1354)
* incorporate https://github.com/k2-fsa/icefall/pull/1269

* incorporate https://github.com/k2-fsa/icefall/pull/1301

* black formatted

* incorporate https://github.com/k2-fsa/icefall/pull/1162

* black formatted
2023-10-31 10:28:20 +08:00
Tiance Wang
c970df512b
New recipe: tiny_transducer_ctc (#848)
* initial commit

* update readme

* Update README.md

* change bool to str2bool for arg parser

* run validation only at the end of epoch

* black format

* black format
2023-10-30 12:09:39 +08:00
Himanshu Kumar Mahto
161ab90dfb
Enhancing the contributing.md file (#1351) 2023-10-30 09:07:42 +08:00
Desh Raj
7d56685734
[recipe] LibriSpeech zipformer_ctc (#941)
* merge upstream

* initial commit for zipformer_ctc

* remove unwanted changes

* remove changes to other recipe

* fix zipformer softlink

* fix for JIT export

* add missing file

* fix symbolic links

* update results

* Update RESULTS.md

Address comments from @csukuangfj

---------

Co-authored-by: zr_jin <peter.jin.cn@gmail.com>
2023-10-27 13:38:09 +08:00
Shreyas0410
5cebecf2dc
updated broken link in read.me file (#1342) 2023-10-27 13:36:15 +08:00
zr_jin
ea78b32857
minor fixes (#1345) 2023-10-27 13:35:43 +08:00
hairyputtar
800bf4b6a2
fix more typos (#1340)
* fix more typos

* fix typo

* fix typo

* fix typo
2023-10-27 11:46:28 +08:00
Zengwei Yao
c0a53271e2
Update Zipformer-large result on LibriSpeech (#1343)
* update zipformer-large result on librispeech
2023-10-26 17:35:12 +08:00
zr_jin
770c495484
minor fixes in the CTC decoding code (#1338) 2023-10-25 17:14:17 +08:00
zr_jin
dcbc7a63e1
Update train-rnn-lm.sh (#1337) 2023-10-25 12:50:35 +08:00
zr_jin
1814bbb0e7
typo fixed (#1334) 2023-10-25 00:03:33 +08:00
zr_jin
f82bccfd63
Support CTC decoding for multi-zh_hans recipe (#1313) 2023-10-24 19:04:09 +08:00
zr_jin
d76c3fe472
Migrate zipformer model to other Chinese datasets (#1216)
added zipformer recipe for AISHELL-1
2023-10-24 16:24:46 +08:00
hairyputtar
3fb99400cf
fix typos (#1336)
* fix typo

* fix typo

* Update pruned_transducer_stateless.rst
2023-10-24 15:47:25 +08:00
Fangjun Kuang
4b791ced78
Fix CI tests (#1333) 2023-10-24 10:38:56 +08:00
zr_jin
f9980aa606
minor fixes (#1332) 2023-10-24 08:17:17 +08:00
zr_jin
92ef561ff7
Minor fixes for torch.jit.script support (#1329) 2023-10-24 01:10:50 +08:00
Fangjun Kuang
902dc2364a
Update docker for torch 2.1 (#1326) 2023-10-22 23:25:06 +08:00
Yifan Yang
416852e8a1
Add Zipformer recipe for GigaSpeech (#1254)
Co-authored-by: Yifan Yang <yifanyeung@qq.com>
Co-authored-by: yfy62 <yfy62@d3-hpc-sjtu-test-005.cm.cluster>
2023-10-21 15:36:59 +08:00
Rudra
eef47adee9
fix typo (#1324) 2023-10-19 22:54:43 +08:00
Daniel Povey
973dc1026d
Make diagnostics.py more error-tolerant and have wider range of supported torch versions (#1234) 2023-10-19 22:54:00 +08:00
Karel Vesely
543b4cc1ca
small enhancements (#1322)
- add extra check of 'x' and 'x_lens' to earlier point in Transducer model
- specify 'utf' encoding when opening text files for writing (recogs,
  errs)
2023-10-19 21:53:31 +08:00
marcoyang1998
ce372cce33
Update documentation to PromptASR (#1321) 2023-10-19 17:24:31 +08:00
Surav Shrestha
36c60b0cf6
fix typos in icefall/utils.py (#1319) 2023-10-19 11:15:18 +08:00
Ikko Eltociear Ashimine
98c5286404
Fix typo in code-style.rst (#1318) 2023-10-19 00:13:50 +08:00
marcoyang1998
52c24df61d
Fix model avg (#1317)
* fix a bug about the model_avg during finetuning by exchanging the order of loading pre-trained model and initializing avg model

* only match the exact module prefix
2023-10-18 17:36:14 +08:00
Erwan Zerhouni
807816fec0
Fix chunk issue for sherpa (#1316) 2023-10-18 16:07:10 +08:00
zr_jin
d2bd0933b1
Compatibility with the latest Lhotse (#1314) 2023-10-17 21:22:32 +08:00
zr_jin
1ef349d120
[WIP] AISHELL-1 pruned transducer stateless7 streaming recipe (#1300)
* `pruned_transducer_stateless7_streaming` for AISHELL-1

* Update train.py

* Update train2.py

* Update decode.py

* Update RESULTS.md
2023-10-16 16:28:16 +08:00
zr_jin
eeeeef390b
Minor bug fixes and descriptive text for the LibriCSS recipe (#1268) 2023-10-12 10:02:49 -04:00
zr_jin
162ceaf4b3
fixes for data preparation (#1307)
Issue: #1306
2023-10-12 17:05:41 +08:00
zr_jin
855492156a
Update finetune.py (#1304) 2023-10-12 16:48:23 +08:00
Wen Ding
2b3c5d799f
Fix padding issues (#1303) 2023-10-11 16:58:00 +08:00
marcoyang1998
16a2748d6c
PromptASR for contextualized ASR with controllable style (#1250)
* Add PromptASR with BERT as text encoder

* Support using word-list based content prompts for context biasing

* Upload the pretrained models to huggingface

* Add usage example
2023-10-11 14:56:41 +08:00
Fangjun Kuang
cb874e9905
add export-onnx.py for stateless8 (#1302)
* add export-onnx.py for stateless8

* use tokens.txt to replace bpe.model
2023-10-11 12:20:12 +08:00
zr_jin
103d617380
bug fixes (#1301) 2023-10-11 11:04:20 +08:00
zr_jin
0d09a44930
Update train.py (#1299) 2023-10-11 10:06:00 +08:00
Zengwei Yao
9af144c26b
Zipformer update result (#1296)
* update Zipformer results
2023-10-09 23:15:22 +08:00
zr_jin
fefffc02f6
Update optim.py (#1292) 2023-10-09 17:39:23 +08:00
zr_jin
ce08230ade
Update README.md (#1293) 2023-10-07 11:57:30 +08:00
zr_jin
82199b8fe1
Init commit for swbd (#1146) 2023-10-07 11:44:18 +08:00
Fangjun Kuang
109354b6b8
Add CTC HLG decoding for zipformer (#1287) 2023-10-02 14:00:06 +08:00
Fangjun Kuang
f14b673408
Add HLG decoding with OpenFst on CPU for aishell conformer_ctc (#1279) 2023-10-01 13:46:16 +08:00
Fangjun Kuang
48cc41bd83 Fix CI 2023-09-30 22:23:22 +08:00
Dongji Gao
3abc290c11
Add scripts and recipe for BTC/OTC (#1255) 2023-09-29 07:52:46 +08:00
yaguang
8181d19860
check bbpe model exists in advance. (#1277) 2023-09-27 17:35:26 +08:00
yaguang
a5ba1133c4
Compatible with new lhotse versions. (#1278) 2023-09-27 17:33:38 +08:00
Fangjun Kuang
772ee3955b
Support HLG decoding using OpenFst with kaldi decoders (#1275) 2023-09-27 14:49:27 +08:00
Fangjun Kuang
2318c3fbd0
Support CTC decoding on CPU using OpenFst and kaldi decoders. (#1244) 2023-09-26 16:36:19 +08:00
zr_jin
1b565dd251
added softlinks to local dir (#1273) 2023-09-26 15:41:39 +08:00
marcoyang1998
e17f884ace
Fix docs for MVQ (#1272)
* typo fix
2023-09-25 15:36:40 +08:00
marcoyang1998
97f9b9c33b
Add documentation for RNNLM training (#1267)
* add documentation for training an RNNLM
2023-09-25 10:48:50 +08:00
zr_jin
ef5da4824d
formatted the entire LibriSpeech recipe (#1270)
* formatted the entire librispeech recipe

* minor updates
2023-09-24 17:31:01 +08:00
zr_jin
ef658d691e
fixes for init value of diagnostics.TensorDiagnosticOptions (#1269)
* fixes for `diagnostics`

Replace `2 ** 22` with `512` as the default value of `diagnostics.TensorDiagnosticOptions`

also black formatted some scripts

* fixed formatting issues
2023-09-24 17:06:47 +08:00
Fangjun Kuang
34e40a86b3
Fix exporting decoder model to onnx (#1264)
* Use torch.jit.script() to export the decoder model

See also https://github.com/k2-fsa/sherpa-onnx/issues/327
2023-09-22 09:57:15 +08:00
Fangjun Kuang
f5dc957d44
Fix CI tests (#1266) 2023-09-21 21:16:14 +08:00
l2009312042
45d60ef262
Update conformer.py (#1200)
* Update conformer.py
* Update zipformer.py

fix bug in get_dynamic_dropout_rate
2023-09-21 19:41:10 +08:00
zr_jin
bbb03f7962
Update decoder.py (#1262) 2023-09-20 08:15:54 +08:00
Tiance Wang
7e1288af50
fix thchs-30 download command (#1260) 2023-09-19 16:46:36 +08:00
Ikko Eltociear Ashimine
0c564c6c81
Fix typo in README.md (#1257) 2023-09-17 12:25:37 +08:00
zr_jin
565d2c2f5b
Minor fixes to the libricss recipe (#1256) 2023-09-15 02:37:53 +08:00
docterstrange
fba1710622
modify tal_csasr recipe (#1252)
Co-authored-by: zss11 <zss11@d3-hpc-sjtu-test-001.cm.cluster>
2023-09-14 09:58:28 +08:00
zr_jin
7cc2dae940
Fixes to incorporate with the latest Lhotse release (#1249) 2023-09-13 12:39:49 +08:00
zr_jin
0f1bc6f8af
Multi_zh-Hans Recipe (#1238)
* Init commit for recipes trained on multiple zh datasets.

* fbank extraction for thchs30

* added support for aishell1

* added support for aishell-2

* fixes

* fixes

* fixes

* added support for stcmds and primewords

* fixes

* added support for magicdata

script for fbank computation not done yet

* added script for magicdata fbank computation

* file permission fixed

* updated for the wenetspeech recipe

* updated

* Update preprocess_kespeech.py

* updated

* updated

* updated

* updated

* file permission fixed

* updated paths

* fixes

* added support for kespeech dev/test set fbank computation

* fixes for file permission

* refined support for KeSpeech

* added scripts for BPE model training

* updated

* init commit for the multi_zh-cn zipformer recipe

* disable speed perturbation by default

* updated

* updated

* added necessary files for the zipformer recipe

* removed redundant wenetspeech M and S sets

* updates for multi dataset decoding

* refined

* formatting issues fixed

* updated

* minor fixes

* this commit finalizes the recipe (hopefully)

* fixed formatting issues

* minor fixes

* updated

* using soft links to reduce redundancy

* minor updates

* using soft links to reduce redundancy

* minor updates

* minor updates

* using soft links to reduce redundancy

* minor updates

* Update README.md

* minor updates

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_magicdata.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_stcmds.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/compute_fbank_primewords.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* minor updates

* minor fixes

* fixed a formatting issue

* Update preprocess_kespeech.py

* Update prepare.sh

* Update egs/multi_zh-hans/ASR/local/compute_fbank_kespeech_splits.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/multi_zh-hans/ASR/local/preprocess_kespeech.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* removed redundant files

* symlinks added

* minor updates

* added CI tests for `multi_zh-hans`

* minor fixes

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

* Update run-multi-zh_hans-zipformer.sh

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-09-13 11:57:05 +08:00
zr_jin
3199058194
enable sclite_mode for swbd scoring (#1239) 2023-09-09 21:25:26 +08:00
zr_jin
49a4b67288
fixed a CI test issue related to python version (#1243) 2023-09-07 19:48:46 +08:00
zr_jin
c912bd65d0
Update run-gigaspeech-pruned-transducer-stateless2-2022-05-12.sh (#1242) 2023-09-07 18:48:27 +08:00
zr_jin
d50a9ea030
doc str fixes (#1241) 2023-09-07 16:34:53 +08:00
zr_jin
9ef8145fa3
minor fixes (#1240) 2023-09-04 17:56:05 +08:00
Desh Raj
8fcadb68a7
Missing definitions in scaling.py added (#1232) 2023-08-31 10:31:05 +08:00
marcoyang1998
3a1ce5963b
Minor fix for documentation (#1229) 2023-08-29 16:39:48 +08:00
Wei Kang
4d7f73ce65
Add context biasing for zipformer recipe (#1204)
* Add context biasing for zipformer recipe

* support context biasing in modified_beam_search_LODR

* fix context graph

* Minor fixes
2023-08-28 19:37:32 +08:00
Fangjun Kuang
fc2df07841
Add icefall tutorials for dummies. (#1220) 2023-08-16 22:32:41 +08:00
Erwan Zerhouni
9a47c08d08
Update padding modified beam search (#1217) 2023-08-14 16:10:50 +02:00
zr_jin
3b5645f594
doc updated (#1214) 2023-08-13 12:37:08 +08:00
Piotr Żelasko
b0e8a40c89
Speed up yesno training to finish in ~10s on CPU (#1215) 2023-08-13 09:50:59 +08:00
Fangjun Kuang
dfccadc6b6
Fix a typo in export_onnx.py for yesno (#1213) 2023-08-12 16:59:06 +08:00
zr_jin
a81396b482
Use tokens.txt to replace bpe.model (#1162) 2023-08-12 16:53:59 +08:00
Fangjun Kuang
d6b28a11a7
Add export script for the yesno recipe. (#1212) 2023-08-11 23:57:00 +08:00
zr_jin
74806b744b
disable speed perturbation by default (#1176)
* disable speed perturbation by default

* minor fixes

* minor updates

* updated bash scripts to work with the `speed-perturb` arg

* minor fixes

1. changed the naming scheme from `speed-perturb` to `perturb-speed` to align with the librispeech recipe

>> 00256a7669/egs/librispeech/ASR/local/compute_fbank_librispeech.py (L65)

2. changed arg type for `perturb-speed` to str2bool
2023-08-10 20:56:02 +08:00
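The switch of `perturb-speed` to `str2bool` in #1176 above matters because `argparse` with `type=bool` treats any non-empty string, including `"False"`, as true. The sketch below is an assumption-laden illustration: the local `str2bool` is a stand-in written for this example (not copied from the repository), and the parser contains only the one option discussed.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse a command-line string into a bool (stand-in helper)."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("yes", "true", "t", "y", "1"):
        return True
    if lowered in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {value!r}")


def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--perturb-speed",
        type=str2bool,
        default=False,  # speed perturbation is off unless requested
        help="Whether to apply speed perturbation.",
    )
    return parser


args = get_parser().parse_args(["--perturb-speed", "true"])
print(args.perturb_speed)
```

With this converter, `--perturb-speed False` yields `False`, which `type=bool` would not.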
Yifan Yang
00256a7669
Fix decode_stream.py (#1208)
* Fix decode_stream.py

* Update decode_stream.py
2023-08-09 09:40:58 +08:00
marcoyang1998
1ee251c8b3
Decode zipformer with external LMs (#1193)
* update some documentation

* support decoding with LMs in zipformer recipe

* update RESULTS.md
2023-08-03 15:50:35 +08:00
Fangjun Kuang
bcabaf896c
Add doc describing how to run icefall within a docker container (#1194) 2023-08-01 12:28:34 +08:00
Fangjun Kuang
375520d419
Run the yesno recipe with docker in GitHub actions (#1191) 2023-07-28 15:43:08 +08:00
Fangjun Kuang
751bb6ff1a
Add docker image for icefall (#1189) 2023-07-28 10:34:40 +08:00
Fangjun Kuang
19b942c958
Update installation doc. (#1188) 2023-07-27 13:36:46 +08:00
marcoyang1998
3fb0a43170
Fix conflict (#1187)
Resolve conflict
2023-07-27 12:36:05 +08:00
marcoyang1998
625b33e9ad
Update descriptions for different decoding methods with external LMs (#1185)
* add some descriptions

* minor updates
2023-07-27 12:08:20 +08:00
kobenaxie
80d922c158
Update preprocess_commonvoice.py to fix text normalization bug. (#1181) 2023-07-26 16:54:42 +08:00
Fangjun Kuang
1dbbd7759e
Add tests for subsample.py and fix typos (#1180) 2023-07-25 14:46:18 +08:00
zr_jin
4ab7d61008
removed batch_name to fix a KeyError with "uttid" (#1172) 2023-07-15 12:39:32 +08:00
marcoyang1998
5ed6fc0e6d
add sym link (#1170) 2023-07-12 15:37:14 +08:00
Desh Raj
41b16d7838
SURT recipe for AMI and ICSI (#1133)
* merge upstream

* add SURT model and training

* add libricss decoding

* add chunk width randomization

* decode SURT with libricss

* initial commit for zipformer_ctc

* remove unwanted changes

* remove changes to other recipe

* fix zipformer softlink

* fix for JIT export

* add missing file

* fix symbolic links

* update results

* clean commit for SURT recipe

* training libricss surt model

* remove unwanted files

* remove unwanted changes

* remove changes in librispeech

* change some files to symlinks

* remove unwanted changes in utils

* add export script

* add README

* minor fix in README

* add assets for README

* replace some files with symlinks

* remove unused decoding methods

* initial commit for SURT AMI recipe

* fix symlink

* add train + decode scripts

* add missing symlink

* change files to symlink

* change file type
2023-07-08 23:01:51 +08:00
Yifan Yang
ffe816e2a8
Fix blank skip ci test (#1167)
* Fix for ci

* Fix frame_reducer
2023-07-06 23:12:41 +08:00
marcoyang1998
11523c5b89
Shallow fusion & LODR documentation (#1142)
* add shallow fusion documentation

* add documentation for LODR

* upload docs for LM rescoring
2023-07-06 19:11:01 +08:00
Fangjun Kuang
6fd674312c
Fix failed CI tests (#1166) 2023-07-05 10:52:34 +08:00
Fangjun Kuang
130ad0319d
Fix CI test for zipformer CTC (#1165) 2023-07-05 10:38:29 +08:00
Fangjun Kuang
b8a17944e4
Fix zipformer CI test (#1164) 2023-07-05 10:23:35 +08:00
Desh Raj
a4402b88e6
SURT multi-talker ASR recipe (#1126)
* merge upstream

* add SURT model and training

* add libricss decoding

* add chunk width randomization

* decode SURT with libricss

* initial commit for zipformer_ctc

* remove unwanted changes

* remove changes to other recipe

* fix zipformer softlink

* fix for JIT export

* add missing file

* fix symbolic links

* update results

* clean commit for SURT recipe

* training libricss surt model

* remove unwanted files

* remove unwanted changes

* remove changes in librispeech

* change some files to symlinks

* remove unwanted changes in utils

* add export script

* add README

* minor fix in README

* add assets for README

* replace some files with symlinks

* remove unused decoding methods

* fix symlink

* address comments from @csukuangfj
2023-07-04 19:25:58 +08:00
zr_jin
856c0f2a60
fixed default param for an aishell recipe (#1159) 2023-07-04 19:12:39 +08:00
Nickolay V. Shmyrev
eca0202632
Add start-batch option for RNNLM training (#1161)
* Add start-batch option for RNNLM training

* Also set epoch

* Skip batches on load
2023-07-04 10:13:25 +08:00
Fangjun Kuang
9009d028a0
Fix ONNX export for the latest non-streaming zipformer. (#1160) 2023-07-03 23:56:51 +08:00
Fangjun Kuang
c3e23ec8d2
Fix logaddexp for ONNX export (#1158) 2023-07-02 10:30:09 +08:00
MicKot
98d89463f6
zipformer2 logaddexp onnx safe (#1157) 2023-06-30 21:16:40 +08:00
Zengwei Yao
ccd8c624dd
support testing onnx exported model on the test sets (#1150)
* support testing onnx exported model on the test sets

* use token_table instead
2023-06-30 12:05:37 +08:00
Desh Raj
c59c89fc13
Minor fix in tedlium results file (#1153) 2023-06-29 13:09:01 +02:00
Wei Kang
db71b03026
Support int8 quantization in decoder (#1152) 2023-06-29 16:48:59 +08:00
Desh Raj
9c2172c1c4
Zipformer for TedLium (#1125)
* initial commit for zipformer tedlium

* fix unk decoding

* add pretrained model and logs

* update for new AsrModel

* add option for choosing rnnt type

* add results with modified rnnt
2023-06-28 16:43:49 +08:00
Fangjun Kuang
968ebd236b
Fix ONNX export of the latest streaming zipformer model. (#1148) 2023-06-27 14:35:59 +08:00
Wei Kang
219bba1310
zipformer wenetspeech (#1130)
* copy files

* update train.py

* small fixes

* Add decode.py

* Fix dataloader in decode.py

* add blank penalty

* Add blank-penalty to other decoding method

* Minor fixes

* add zipformer2 recipe

* Minor fixes

* Remove pruned7

* export and test models

* Replace bpe with tokens in export.py and pretrain.py

* Minor fixes

* Minor fixes

* Minor fixes

* Fix export

* Update results

* Fix zipformer-ctc

* Fix ci

* Fix ci

* Fix CI

* Fix CI

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-06-26 09:33:18 +08:00
frankyoujian
4d5b8369ae
fix small typo (#1144) 2023-06-21 17:17:19 +08:00
Yifan Yang
d667dc365b
Fix for diagnostic (#1135)
* CTC loss return tensor

* Update model.py
2023-06-16 15:04:41 +08:00
Yifan Yang
0a465794a8
Fix Zipformer (#1132)
* Update model.py

* Update train.py

* Update decoder.py
2023-06-15 17:52:14 +08:00
Fangjun Kuang
947f0614c9
Fix running exported model on GPU. (#1131) 2023-06-15 12:25:15 +08:00
Zengwei Yao
0ad037d076
Add CTC loss option in zipformer recipe (#1111)
* add CTC loss option in zipformer recipe

* add ctc_decode.py

* support CTC model export, add jit_pretrained_ctc.py, pretrained_ctc.py

* update README.md and RESULTS.md

* add CI test
2023-06-14 14:27:29 +08:00
danfu
0cb71ad3bc
add updated zipformer onnx export (#1108)
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-06-12 14:02:23 +08:00
Peter Ross
b4c38d7547
Use symlinks for best epochs (#1123)
* utils: add symlink_or_copyfile

* pruned_transducer_stateless7: use symlinks (when possible) to output best epochs

* Rename function

---------

Co-authored-by: Yifan Yang <64255737+yfyeung@users.noreply.github.com>
2023-06-12 13:51:46 +08:00
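The idea behind #1123 above, linking to the best-epoch checkpoint "when possible" instead of duplicating it, might look roughly like the following. This is a sketch under assumptions: the commit notes that the function was later renamed, so the name `symlink_or_copyfile` is taken from the first bullet only, and the body and file names here are illustrative rather than the repository's code.

```python
import os
import shutil
import tempfile


def symlink_or_copyfile(src: str, dst: str) -> None:
    """Make dst a symlink to src; fall back to copying if that fails."""
    if os.path.lexists(dst):
        os.remove(dst)  # replace a stale link or an older copy
    try:
        # A relative target keeps the link valid if the directory is moved.
        target = os.path.relpath(src, os.path.dirname(dst) or ".")
        os.symlink(target, dst)
    except (OSError, NotImplementedError):
        # For example, a filesystem or platform without symlink support.
        shutil.copyfile(src, dst)


with tempfile.TemporaryDirectory() as exp_dir:
    src = os.path.join(exp_dir, "epoch-3.pt")  # hypothetical checkpoint
    dst = os.path.join(exp_dir, "best.pt")     # hypothetical link name
    with open(src, "wb") as f:
        f.write(b"checkpoint bytes")
    symlink_or_copyfile(src, dst)
    with open(dst, "rb") as f:
        print(f.read() == b"checkpoint bytes")
```

A symlink avoids storing the checkpoint twice, while the copy fallback keeps the output usable where links are unavailable.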
Yifan Yang
dca21c2a17
Fix parameters_names in train.py (#1121) 2023-06-08 16:54:05 +08:00
SarahSmitho
3ae47a4940
Verify that ffmpeg is installed (#1117) 2023-06-07 11:17:38 +08:00
Fangjun Kuang
c0de78d3c0
Add data preparation for the MuST-C speech translation corpus (#1107) 2023-06-05 15:49:41 +08:00
Wei Kang
ba257efbcd
Add Context biasing (#1038)
* Add context biasing for librispeech

* Add context biasing for wenetspeech

* fix bugs

* Implement Aho-Corasick context graph

* fix some bugs

* Fixes to forward_one_step; add draw to context graph

* add output arc; fix black

* Fix wenetspeech tokenizer

* Minor fixes to the decode.py
2023-06-03 21:28:49 +08:00
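The "Aho-Corasick context graph" mentioned in #1038 above refers to a classic multi-pattern matching automaton. The toy version below shows only the matching core (trie, failure links, output sets) over token sequences. It is a generic sketch, not the implementation from that PR: the class and method names are hypothetical, and biasing scores are left out entirely.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


class AhoCorasick:
    """Toy Aho-Corasick automaton over token sequences (illustrative only)."""

    def __init__(self, phrases: Sequence[Sequence[str]]) -> None:
        self.next: List[Dict[str, int]] = [{}]  # trie transitions per state
        self.fail: List[int] = [0]              # failure link per state
        self.out: List[List[int]] = [[]]        # ids of phrases ending here
        self.phrases: List[Tuple[str, ...]] = [tuple(p) for p in phrases]
        for idx, phrase in enumerate(self.phrases):
            self._insert(phrase, idx)
        self._build_fail_links()

    def _insert(self, phrase: Tuple[str, ...], idx: int) -> None:
        state = 0
        for tok in phrase:
            if tok not in self.next[state]:
                self.next.append({})
                self.fail.append(0)
                self.out.append([])
                self.next[state][tok] = len(self.next) - 1
            state = self.next[state][tok]
        self.out[state].append(idx)

    def _build_fail_links(self) -> None:
        # Breadth-first: depth-1 states already fail to the root (state 0).
        queue = deque(self.next[0].values())
        while queue:
            state = queue.popleft()
            for tok, child in self.next[state].items():
                f = self.fail[state]
                while f and tok not in self.next[f]:
                    f = self.fail[f]
                self.fail[child] = self.next[f].get(tok, 0)
                # Inherit phrases that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def step(self, state: int, tok: str) -> int:
        """Advance one token, following failure links on a mismatch."""
        while state and tok not in self.next[state]:
            state = self.fail[state]
        return self.next[state].get(tok, 0)

    def find(self, tokens: Sequence[str]) -> List[Tuple[int, Tuple[str, ...]]]:
        """Return (end_index, phrase) for every phrase occurrence."""
        state = 0
        hits: List[Tuple[int, Tuple[str, ...]]] = []
        for i, tok in enumerate(tokens):
            state = self.step(state, tok)
            hits.extend((i, self.phrases[j]) for j in self.out[state])
        return hits


graph = AhoCorasick([["new", "york"], ["york", "city"], ["city"]])
for end, phrase in graph.find("i love new york city".split()):
    print(end, " ".join(phrase))
```

Failure links let a decoder keep partial matches alive after a mismatch without rescanning earlier tokens, which is why this structure suits matching many hotword phrases in a single pass.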
Yifan Yang
ca60ced213
Fix typo (#1114)
* Fix typo for zipformer

* Fix typo for pruned_transducer_stateless7

* Fix typo for pruned_transducer_stateless7_ctc

* Fix typo for pruned_transducer_stateless7_ctc_bs

* Fix typo for pruned_transducer_stateless7_streaming

* Fix typo for pruned_transducer_stateless7_streaming_multi

* Fix file permissions for pruned_transducer_stateless7_streaming_multi

* Fix typo for pruned_transducer_stateless8

* Fix typo for pruned_transducer_stateless6

* Fix typo for pruned_transducer_stateless5

* Fix typo for pruned_transducer_stateless4

* Fix typo for pruned_transducer_stateless3
2023-06-02 14:12:42 +08:00
Yifan Yang
82f34a2388
Remove multidataset from librispeech/pruned_transducer_stateless7 (#1105)
* Add People's Speech to multidataset

* update

* remove multi from librispeech
2023-06-01 18:45:20 +08:00
Zengwei Yao
7a604057f9
update diagnostics, print limits in Balancer, merge changes from Dan's branch zlm59 (#1109) 2023-06-01 14:24:19 +08:00
Yifan Yang
03853f1ee5
Add peoples_speech (#1101)
* update

* Small fix

* Update egs/peoples_speech/ASR/prepare.sh

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* limit normalize log

* Update egs/peoples_speech/ASR/local/compute_fbank_peoples_speech_valid_test.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update compute_fbank_peoples_speech_splits.py

* Update compute_fbank_peoples_speech_valid_test.py

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-05-31 12:46:17 +08:00
Fangjun Kuang
7b0afbdc16
Remove cur_batch_idx (#1102) 2023-05-30 14:49:54 +08:00
Fangjun Kuang
1aeffa73bc
remove outdated code in train.py (#1096) 2023-05-25 07:47:38 +08:00
Peter Ross
af8907e1ec
Update pre-commit isort package to v5.11.5 (#1095) 2023-05-24 19:57:37 +08:00
Zengwei Yao
6826b076d4
add flops profiler, support for Zipformer encoder and Conformer encoder (#1093)
* add flops profiler, support for Zipformer encoder and Conformer encoder

* support for reworked conformer and old zipformer

* skip black check
2023-05-24 19:10:45 +08:00
Fangjun Kuang
1df71a6b38
add onnx export for stateless2 (#1086) 2023-05-23 16:11:00 +08:00
Fangjun Kuang
ea8b15309f
Add onnx export scripts for wenetspeech recipe. (#1085) 2023-05-23 13:32:14 +08:00
Fangjun Kuang
dbcf0b41db
Fix stateless7 training error (#1082) 2023-05-23 12:52:02 +08:00
marcoyang1998
585e7b224f
Aishell pruned_transducer_stateless7 (#962)
* Add pruned_transducer_stateless7 for Aishell

* update README.md

* update comments and small fixes
2023-05-23 11:04:33 +08:00
Yifan Yang
7c4ff66a3d
Fix yesno CI test (#1078) 2023-05-22 12:46:43 +08:00
Yifan Yang
90c392b7b3
Add docs for Fine-tune with mux (#1074)
* Update RESULTS.md
2023-05-22 12:39:51 +08:00
Fangjun Kuang
3883e362ad
Fix yesno CI test (#1077) 2023-05-22 12:29:51 +08:00
Zengwei Yao
8070258ec5
fix conv_emformer2, when using right_context_length=0 (#1076) 2023-05-21 20:31:54 +08:00
Zengwei Yao
30fcd16c7d
rm zipformer/__init__.py (#1075) 2023-05-20 23:12:11 +08:00
Zengwei Yao
a7e142b7ff
Support long audios recognition (#980)
* support long file transcription

* rename recipe as long_file_recog

* add docs

* support multi-gpu decoding

* style fix
2023-05-19 20:27:55 +08:00
Zengwei Yao
f18b539fbc
Add the upgraded Zipformer model (#1058)
* add the zipformer codes, copied from branch from_dan_scaled_adam_exp1119

* support model export with torch.jit.script

* update RESULTS.md

* support exporting streaming model with torch.jit.script

* add results of streaming models, with some minor changes

* update README.md

* add CI test

* update k2 version in requirements-ci.txt

* update pyproject.toml
2023-05-19 16:47:59 +08:00
Fangjun Kuang
a5bbfc6f7e
Update doc for exporting to ncnn (#1072) 2023-05-19 16:22:08 +08:00
Fangjun Kuang
ae1949ddcc
Support using the latest master from tencent/ncnn (#1070)
* Support using the latest master from tencent/ncnn

* small fixes
2023-05-18 20:56:58 +08:00
Yifan Yang
562bda91e4
Add adaption recipe for pruned_transducer_stateless7 (#1059)
* Add mux for finetune

* Add comments

* Fix for black

* Update finetune.py
2023-05-17 16:02:27 +08:00
Wei Kang
bccd20d978
Training with byte level BPE (TAL_CSASR) (#1033)
* Add byte level bpe tal_csasr recipe

* Minor fixes to decoding and exporting

* Fix prepare.sh

* Update results
2023-05-16 12:44:52 +08:00
tomato18463
7a9f40aac5
Update the yesno recipe logs in doc (#1060) 2023-05-15 11:16:53 +08:00
arbs-gpu
30bde4b788
fix rnn_lm/train.py usage (#1055) 2023-05-11 17:37:47 +08:00
PF Luo
44d016e4a7
export score_token interface for onnx-runtime (#1050) 2023-05-10 22:41:07 +08:00
Fangjun Kuang
6c326427a0
Support exporting streaming conformer to ONNX (#1047) 2023-05-10 14:47:37 +08:00
Fangjun Kuang
86b0db6eb9
update installation doc (#1049) 2023-05-09 16:13:21 +08:00
Fangjun Kuang
5b50ffda54
support using mini librispeech in training (#1048)
* support mini librispeech in training

* update onnx export doc
2023-05-09 15:10:06 +08:00
Fangjun Kuang
ebbab37776
Fix broken code in download_lm.py (#1046) 2023-05-08 20:48:17 +08:00
Peter Ross
62c9dd9703
make egs/timit work according to the documentation (#1044)
* prepare.sh: restore working directory after git lfs pull
* set execute permissions on python scripts called by prepare.sh
2023-05-08 19:07:40 +08:00
Yifan Yang
24b50a5bad
Update README.md (#1043)
* Update README.md
2023-05-08 16:59:05 +08:00
Fangjun Kuang
efbb577b88
fix compiling HLG (#1039) 2023-05-07 16:26:13 +08:00
Yifan Yang
98569b2607
Update RESULTS.md (#1036)
* Update RESULTS.md
2023-05-06 17:51:55 +08:00
Wei Kang
80156dda09
Training with byte level BPE (AIShell) (#986)
* copy files from zipformer librispeech

* Add byte bpe training for aishell

* compile LG graph

* Support LG decoding

* Minor fixes

* black

* Minor fixes

* export & fix pretrain.py

* fix black

* Update RESULTS.md

* Fix export.py
2023-05-04 19:16:17 +08:00
PF Luo
61ec3a7a8f
fix export RNNLM onnx model typo (#1029) 2023-04-28 19:53:06 +08:00
Yuanhang Zhang
b0228c536e
Fix typo in librispeech OpenFST-based HLG preparation script (#1028) 2023-04-28 19:52:32 +08:00
PF Luo
298ed4520f
add meta-data embedding_dim to RNNLM onnx-model (#1026) 2023-04-28 16:33:46 +08:00
Fangjun Kuang
2767b9ff11
Support exporting RNNLM to ONNX. (#1014)
* Support exporting RNNLM to ONNX.

* add int8 models

* fix style issues

* Fix EOS padding

* support exporting for streaming ASR
2023-04-27 14:36:36 +08:00
marcoyang1998
45c13e90e4
RNNLM rescore + Low-order density ratio (#1017)
* add rnnlm rescore + LODR

* add LODR in decode.py

* update RESULTS
2023-04-24 15:00:02 +08:00
Yifan Yang
2096e69bda
Use CutSet.mux for multidataset (#1020)
* Use CutSet.mux

* Remove mischange

* Fix for style check
2023-04-23 18:41:44 +08:00
Yifan Yang
d67a49afe4
Add multidataset (#1010)
* Add Common Voice for multidataset

* Add prepare_multidataset.sh

* Add dataset mixing


* Update prepare_multidataset.sh

* Update prepare_giga_speech.sh

* update comments

* Add split and shuffle mechanism

* Add multi-dataset train

* Fix for deleting

* Fix for modifying

* Add comments

* Change type for perturb_speed

* Fix for style check

* Small fix

* Add filter

* Remove warning
2023-04-21 18:09:41 +08:00
marcoyang1998
57d6482a79
Streaming Zipformer with multi-dataset (#984)
* modify train.py

* add right padding option in decode.py

* update RESULTS.md
2023-04-21 15:43:28 +08:00
Wei Kang
0efed1cec5
Fix path in aishell rnnlm training (#1016) 2023-04-20 23:09:31 +08:00
Wei Kang
5c65516e05
Fix aishell rnnlm training command (#1015) 2023-04-20 16:14:16 +08:00
Yifan Yang
81d386ef3e
Add compute_ppl.py and ngram_entropy_pruning.py (#1013) 2023-04-20 12:27:43 +08:00
Wen Ding
78b9dcc936
Support exporting BS Zipformer models to ONNX, used in Triton Server (#1008)
* Support export BS Zipformer models to ONNX in Triton

* Update copyright

* Update exporting codes for BS zipformer models

* Code format

* Update comments

* Update export_onnx.py

---------

Co-authored-by: Yifan Yang <64255737+yfyeung@users.noreply.github.com>
2023-04-18 17:05:08 +08:00
Yifan Yang
05e7435d0d
Move soft links into proper position (#1007) 2023-04-18 10:11:12 +08:00
Yifan Yang
8838fe0bd2
Zipformer for Common Voice (#997)
* Add soft links in pruned_transducer_stateless7 for CommonVoice

* Add python files

* Update prepare.sh

* Update normalization

* Fix for soft links

* Add some docs

* Add export

* Update egs/commonvoice/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Add export for onnx

---------

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2023-04-17 17:47:25 +08:00
marcoyang1998
34d1b07c3d
Modified beam search with RNNLM rescoring (#1002)
* add RNNLM rescore

* add shallow fusion and lm rescore for streaming zipformer

* minor fix

* update RESULTS.md

* fix yesno workflow, change from ubuntu-18.04 to ubuntu-latest
2023-04-17 16:43:00 +08:00
Fangjun Kuang
e32658e620
Fix torch.jit.script() export for streaming zipformer. (#1005) 2023-04-17 16:13:30 +08:00
Zengwei Yao
7c7d9ab042
add @torch.jit.export for streaming_forward func in Zipformer class (#1004) 2023-04-17 12:03:52 +08:00
Zengwei Yao
5f066d3d53
support decoding and computing RTF on test sets with onnx models (#995)
* support decode and compute RTF on test sets with onnx models

* support onnx export and decode in pruned_transducer_stateless
2023-04-12 19:04:50 +08:00
Yifan Yang
dbf2aa3212
Create preprocess_commonvoice.py (#996) 2023-04-11 21:04:54 +08:00
Yifan Yang
3cb0a0121b
Add Common Voice (#994)
* Add commonvoice

* Add data preparation recipe

* Update

* update prepare.sh

* Fix for black

* Update prefix with cv-

* 20 ->

* Update compute_fbank_commonvoice_dev_test.py

* Update prepare.sh

* Update compute_fbank_commonvoice_dev_test.py
2023-04-11 20:56:40 +08:00
Yifan Yang
33578cca48
Fix filter_cuts in compute_fbank_librispeech.py (#993) 2023-04-11 11:12:05 +08:00
Yifan Yang
6434c8eadc
Add averaged model && change start from 0 to 1 && fix typo for gigaspeech (#990)
* Add averaged model && change start from 0 to 1 && fix typo

* Update train.py

* Set use-averaged-model False for BC

---------

Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2023-04-09 20:53:47 +08:00
Zengwei Yao
136aa94d57
remove duplicated lines (#988) 2023-04-06 17:47:33 +08:00
Yifan Yang
c90f57afdb
Remove simulate streaming from stateless8 (#985) 2023-04-04 11:04:00 +08:00
marcoyang1998
d337398d29
Shallow fusion for Aishell (#954)
* add shallow fusion and LODR for aishell

* update RESULTS

* add save by iterations
2023-04-03 16:20:29 +08:00
Yifan Yang
46bf6df62f
Remove simulate streaming from stateless7 (#983)
* Remove simulate streaming from stateless7
2023-04-03 14:55:45 +08:00
Yifan Yang
180c7c2b7a
Add UniqueLexicon for gigaspeech (#982) 2023-04-03 12:39:34 +08:00
Yifan Yang
12a222aa4b
Fix comments on the usage of train.py (#981) 2023-04-02 16:32:43 +08:00
Fangjun Kuang
a632b24c35
Export int8 quantized models for non-streaming Zipformer. (#977)
* Export int8 quantized models for non-streaming Zipformer.

* Delete export-onnx.py

* Export int8 models for other folders
2023-03-31 22:46:19 +08:00
marcoyang1998
c21b6a208b
Add finetuning script for aishell (#974)
* add aishell finetune scripts

* add an example bash script
2023-03-30 17:08:46 +08:00
Zengwei Yao
2a5a75cb56
add option of using full attention for streaming model decoding (#975) 2023-03-30 14:30:13 +08:00
Zengwei Yao
bcc5923ab9
Support batch-wise forced-alignment (#970)
* support batch-wise forced-alignment based on beam search

* add length_norm to HypothesisList.topk()

* Use Hypothesis and HypothesisList instead
2023-03-28 23:24:24 +08:00
PF Luo
15d48e3a6a
fix rnn_lm && transformer_lm import problem (#971) 2023-03-28 19:14:08 +08:00
Fangjun Kuang
35e21a0d2e
Fix torchscript export for aishell (#969) 2023-03-27 14:08:26 +08:00
Fangjun Kuang
8c3ea93fc8
Save meta data to exported ONNX models (#968) 2023-03-27 11:39:29 +08:00
Zengwei Yao
7155769c19
minor fix, remove numel = p.numel() in optim.py (#967) 2023-03-24 15:30:29 +08:00
Peng He
f260a09ed4
remove if-branch at downsample pad in zipformer for onnx-export compatibility (#965) 2023-03-24 14:30:43 +08:00
Wei Kang
d74822d07b
Fix wenetspeech decoding speed (#953) 2023-03-21 21:35:32 +08:00
marcoyang1998
7948624a22
Support fine-tuning (#944)
* support finetune

* add files for decoding giga

* support initializing modules

* add a fine-tune bash script
2023-03-17 13:44:29 +08:00
Jason's Lab
6196b4a407
Add char-based language model training process for aishell. (#945)
* Add char-based language model training process for aishell.

Add soft link from librispeech/ASR/local/sort_lm_training_data.py to aishell/ASR/local/

---------

Co-authored-by: lichao <www.563042811@qq.com>
2023-03-16 09:52:11 +08:00
Yifan Yang
a48812ddb3
Ban the test_rnn.py in ci-test (#949) 2023-03-15 22:02:20 +08:00
Yifan Yang
cad6735e07
Modify make_pad_mask to support TensorRT (#943)
* Modify make_pad_mask to support TensorRT

* Fix for test
2023-03-10 19:28:59 +08:00
marcoyang1998
9ddd811925
Fix padding_idx (#942)
* fix padding_idx

* update RESULTS.md
2023-03-10 14:37:28 +08:00
Yifan Yang
28af269e5e
Fix for workflow (#934) 2023-03-09 17:38:15 +08:00
Fangjun Kuang
f5de2e90c6
Fix style issues. (#937) 2023-03-08 22:56:04 +08:00
pehonnet
07243d136a
remove key from result filename (#936)
Co-authored-by: pe-honnet <pe.honnet@telepathy.ai>
2023-03-08 21:06:07 +08:00
Fangjun Kuang
8aaa9761e4
Add doc about exporting streaming zipformer to sherpa-ncnn (#927) 2023-02-27 21:23:04 +08:00
Fangjun Kuang
b7c85968ae
Use standard apache 2.0 license (#919) 2023-02-22 11:15:58 +08:00
marcoyang1998
c51e6c5b9c
fix typo (#916) 2023-02-20 19:04:57 +08:00
nihui
4626c60c74
fix typo (#915) 2023-02-17 15:38:08 +08:00
Fangjun Kuang
52d7cdd1a6
Update doc about exporting LSTM models to ncnn (#914) 2023-02-17 12:50:13 +08:00
Fangjun Kuang
c01175679e
Add CI test for exporting csj pretrained zipformer to ncnn (#913) 2023-02-16 21:09:05 +08:00
Fangjun Kuang
6d7a55904c
export script to ncnn for csj (#912) 2023-02-16 19:47:54 +08:00
Zengwei Yao
4e832fa6b0
fix reduction conformer_ctc3/train.py (#908) 2023-02-14 20:45:38 +08:00
Fangjun Kuang
c5e687ddf5
Export streaming zipformer to ncnn (#906) 2023-02-13 23:41:43 +08:00
Teo Wen Shen
e63a8c27f8
CSJ pruned_transducer_stateless7_streaming (#892)
* update manifest stats

* update transcript configs

* lang_char and compute_fbanks

* save cuts in fbank_dir

* add core codes

* update decode.py

* Create local/utils

* tidy up

* parse raw in prepare_lang_char.py

* update manifest stats

* update transcript configs

* lang_char and compute_fbanks

* save cuts in fbank_dir

* add core codes

* update decode.py

* Create local/utils

* tidy up

* parse raw in prepare_lang_char.py

* working train

* Add compare_cer_transcript.py

* fix tokenizer decode, allow d2f only

* comment cleanup

* add export files and READMEs

* reword average column

* fix comments

* Update new results
2023-02-13 22:19:50 +08:00
Zengwei Yao
25ee50e27c
add ctc-greedy-search with timestamps (#905) 2023-02-13 19:45:09 +08:00
Desh Raj
6a8b649e56
Add small streaming Zipformer transducer model (#903) 2023-02-13 15:53:28 +08:00
Yifan Yang
c34ee67691
Update generate_model_from_checkpoint.py (#901) 2023-02-13 14:05:38 +08:00
Fangjun Kuang
c102e7fbf0
more fixes for lstm3 to support exporting to ncnn (#902) 2023-02-13 12:16:43 +08:00
Fangjun Kuang
48c2c22dbe
Fix export to ncnn for lstm3 (#900) 2023-02-13 11:44:25 +08:00
KajiMaCN
57604aac34
fix tal_csasr data pre-processing (#898) 2023-02-10 21:28:19 +08:00
xiabingquan
cba6ecc1d1
Update README.md (#894) 2023-02-09 23:54:45 +08:00
emilyluj
59ac8bfc70
fix mmi graph compiler bug. (#895) 2023-02-09 18:32:03 +08:00
Yifan Yang
5cd1636cb3
Fix a bug in decode.py (#893)
Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2023-02-09 12:12:23 +08:00
Fangjun Kuang
e916027bfe
Fix doc typos for onnx export (#891) 2023-02-09 10:33:40 +08:00
Karel Vesely
35e5a2475c
Librispeech, validate_manifest.py (#890) 2023-02-09 07:57:02 +08:00
Fangjun Kuang
2b995639b7
Add ONNX support for Zipformer and ConvEmformer (#884) 2023-02-09 00:02:38 +08:00
Zengwei Yao
af735eb75b
Get alignments using lhotse workflows align-with-torchaudio (#888)
* add lhotse workflow align-with-torchaudio

* modify related decode.py files
2023-02-08 21:54:35 +08:00
Zengwei Yao
d12e6f098c
Get (start, end) timestamps for CTC models (#876)
* parse timestamps and texts for BPE-based models

* parse timestamps (frame indexes) and texts for other cases

* add test functions

* add parse_fsa_timestamps_and_texts function, test in conformer_ctc3/decode.py

* calculate symbol delay for (start, end) timestamps
2023-02-07 21:43:16 +08:00
Fangjun Kuang
7ae03f6c88
Add onnx export support for pruned_transducer_stateless5 (#883) 2023-02-07 17:47:08 +08:00
Yifan Yang
ffbf6d9199
Add generate_averaged_model.py (#882) 2023-02-07 16:19:08 +08:00
Fangjun Kuang
8d3810e289
Simplify ONNX export (#881)
* Simplify ONNX export

* Fix ONNX CI tests
2023-02-07 15:01:59 +08:00
Fangjun Kuang
52f3a747be
Refactor onnx export for streaming zipformer (#879) 2023-02-07 12:12:26 +08:00
Zengwei Yao
5a05b95730
add params.hlg_scale (#880) 2023-02-06 23:21:46 +08:00
Yifan Yang
caf23546ed
No more T < S after frame_reducer (#875)
* No more T < S after frame_reducer

* Fix for style check

* Adjust the permissions

* Add support for inference to frame_reducer

* Fix for flake8 check

---------

Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2023-02-06 12:17:45 +08:00
Yuekai Zhang
bf5f0342a2
Add streaming onnx export for zipformer (#831)
* add streaming onnx export for zipformer

* update triton support

* add comments

* add ci test

* add onnxmltools for fp16 onnx export
2023-02-06 10:37:07 +08:00
Yifan Yang
029c8566e4
Small fix for frame_reducer.py (#871) 2023-02-03 17:49:54 +08:00
Yifan Yang
bffce413f0
Fix filename ctc_guild_decode_bs.py -> ctc_guide_decode_bs.py (#870)
* fix filename ctc_guild_decode_bs.py -> ctc_guide_decode_bs.py

---------

Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2023-02-03 12:32:06 +08:00
Zengwei Yao
1e6d6f8160
shuffle full Librispeech for zipformer recipes (#869)
* shuffle libri
2023-02-03 11:54:57 +08:00
Yifan Yang
e36ea89112
update result.md for pruned_transducer_stateless7_ctc_bs (#865) 2023-02-01 21:04:56 +08:00
Yifan Yang
d8234e199c
Add export to ONNX for Zipformer+CTC using blank skip (#861)
* Add export to ONNX for Zipformer+CTC using blank skip

---------

Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2023-01-31 15:57:03 +08:00
BuaaAlban
e9019511eb
Fix bug in streaming_conformer_ctc egs (#862)
* Update train.py

Fix transducer lstm egs bug as mentioned in issue 579

* Update train.py

fix dataloader bug
2023-01-31 15:19:50 +08:00
Yifan Yang
e277e31e37
update huggingface link of zipformer_ctc_blankskip.rst (#858)
* update huggingface link

* update link

---------

Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2023-01-29 15:35:36 +08:00
Meng Wei
74a2069f94
fix expired links (#856) 2023-01-28 14:43:47 +08:00
Teo Wen Shen
1ce2bc1ee0
edit comments (#852) 2023-01-28 13:47:21 +08:00
Zengwei Yao
6b1ab71dc9
hardcode --filter-uneven-sized-batch (#854) 2023-01-27 21:24:12 +08:00
Wei Kang
f5ff7a18eb
Fix the unclear description for streaming model (#849) 2023-01-17 11:28:59 +08:00
Fangjun Kuang
0af3e7beda
fix export for stateless4 (#844) 2023-01-16 20:26:36 +08:00
Zengwei Yao
2a463a420d
Filter uneven-sized batch (#843)
* add filter_uneven_sized_batch function

* set --filter-uneven-sized-batch=True as default
2023-01-16 20:15:35 +08:00
Fangjun Kuang
5c8e9628cc
update faq for libpython3.10.so not found (#838) 2023-01-13 15:21:29 +08:00
Fangjun Kuang
958dbb3a1d
add doc for int8 quantization with sherpa-ncnn (#832)
* add doc for int8 quantization with sherpa-ncnn

* typo fixes
2023-01-11 20:29:36 +08:00
marcoyang1998
142420b3af
Add docs for distillation (#812)
* add README to docs

* update documents for distillation

* upload png files
2023-01-11 16:45:24 +08:00
Fangjun Kuang
8582b6e41a
Add doc about converting conv-emformer to sherpa-ncnn (#830) 2023-01-11 15:34:30 +08:00
Fangjun Kuang
c05f5d76df
fix decoding for ncnn (#828) 2023-01-10 20:52:13 +08:00
Fangjun Kuang
fcffa593f0
Add FAQs to doc (#827)
* Add FAQs

* small fixes
2023-01-10 15:38:33 +08:00
marcoyang1998
42cc10117e
Fix ncnn install (#824)
* add README to docs

* fix ncnn installation
2023-01-09 15:08:39 +08:00
Fangjun Kuang
9453eb1c70
Fix doc for building ncnn (#822) 2023-01-06 17:00:27 +08:00
kobenaxie
9a9c5a0f9b
remove unused codes. (#821) 2023-01-06 11:16:22 +08:00
Yifan Yang
b9626f2e06
fix typo for ctc-decode.py (#815)
Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2023-01-05 17:18:43 +08:00
Fangjun Kuang
8642dbc0bd
Fix setup_dist (#806) 2023-01-04 12:21:19 +08:00
Yunusemre
0f26edfde9
Add Zipformer Onnx Support (#778)
* add export script

* add zipformer onnx pretrained script

* add onnx zipformer test

* fix style

* add zipformer onnx to workflow

* replace is_in_onnx_export with is_tracing

* add github.event.label.name == 'onnx'

* add is_tracing to necessary conditions

* fix pooling_mask

* add onnx_check

* add onnx_check to scripts

* add is_tracing to scaling.py
2023-01-03 16:59:44 +08:00
marcoyang1998
80cce141b4
Full libri fix manifest (#804)
* modify the name of the directory of vq manifest

* fix missing manifest in full libri training
2023-01-03 15:40:53 +08:00
Daniil
2fd970b682
not removing result_dir in tedlium conformer ctc2 + add lm stem to compile_hlg_using_openfst.py + add MASTER_ADDR to be provided to setup_dist (#801) 2023-01-02 08:08:32 +08:00
Zengwei Yao
67ae5fdf2b
Doc streaming zipformer (#798)
* add doc for streaming_zipformer

* update README.md
2022-12-30 15:21:18 +08:00
behnamasefi
a54b748a02
check for utterance len (#795)
Co-authored-by: behnam <basefisaray@roku.com>
2022-12-30 11:06:09 +08:00
Zengwei Yao
d167aad4ab
Add streaming zipformer (#787)
* add streaming zipformer codes

* add test_model.py

* add export.py, pretrained.py, jit_pretrained.py

* add cached_len for pooling module

* add jit_trace_export.py and jit_trace_pretrained.py

* fix bug in jit.trace

* update RESULTS.md

* add CI test

* minor fix in pruned_transducer_stateless7/zipformer.py

* update README.md
2022-12-30 10:52:18 +08:00
marcoyang1998
aa0fe4e4ac
Fix typos in RESULTS.md (#797) 2022-12-29 11:54:42 +08:00
marcoyang1998
1f0408b103
Support Transformer LM (#750)
* support transformer LM

* show number of parameters during training

* update docstring

* testing files for ppl calculation

* add lm wrapper for rnn and transformer LM

* apply lm wrapper in lm shallow fusion

* small updates

* update decode.py to support LM fusion and LODR

* add export.py

* update CI and workflow

* update decoding results

* fix CI

* remove transformer LM from CI test
2022-12-29 10:53:36 +08:00
Yuekai Zhang
3c54333b06
fix bug (#796) 2022-12-28 11:20:38 +08:00
marcoyang1998
05dfd5e630
Fix distillation with HuBERT (#790)
* update vq huggingface url

* remove hard lhotse version requirement

* resolve ID mismatch

* small fixes


* Update egs/librispeech/ASR/pruned_transducer_stateless6/vq_utils.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* update version check

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-12-27 15:26:11 +08:00
Yifan Yang
a24a1cbfa9
small fix for zipformer_ctc_blankskip.rst (#792)
Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2022-12-27 15:06:53 +08:00
Fangjun Kuang
88b7895adf
fix librispeech.py in multi-dataset setup (#791) 2022-12-27 13:59:55 +08:00
Fangjun Kuang
dfbcf606e7
small fixes to prepare.sh (#789) 2022-12-27 09:25:42 +08:00
Yifan Yang
4e249da2c4
Add zipformer_ctc_blankskip.rst (#784)
* Add zipformer_ctc_blankskip.rst

* typo fix for zipformer_mmi.rst

* fix warning

* Update docs/source/recipes/Non-streaming-ASR/librispeech/zipformer_ctc_blankskip.rst

Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-12-26 14:30:20 +08:00
Yifan Yang
59eb465b3c
optimize frame_reducer.py (#783)
Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2022-12-23 17:55:36 +08:00
BuaaAlban
7eb2d0edb6
Update train.py (#773)
Fix transducer lstm egs bug as mentioned in issue 579
2022-12-23 11:38:22 +08:00
Yifan Yang
070c77e724
Add Blankskip to Zipformer+CTC (#730)
* init files

* add ctc as auxiliary loss and ctc_decode.py

* tuning the scalar of HLG score for 1best, nbest and nbest-oracle

* rename to pruned_transducer_stateless7_ctc

* fix doc

* fix bug, recover the hlg scores

* modify ctc_decode.py, move out the hlg scale

* fix hlg_scale

* add export.py and pretrained.py, and so on

* upload files, update README.md and RESULTS.md

* add CI test

* update .gitignore

* create symlinks

* Add Blank Skip to Zipformer+CTC

* Add warmup to blank skip

* Add warmup to blank skip

* Add __init__.py

* Add parameters_names to Adam

* Add warmup to blank skip

* Modify frame_reducer

* Modify frame_reducer

* Add Blank Skip to decode.

* Add ctc_decode.py

* Add blank skip to Zipformer+CTC

* process conflict

* process conflict

* modify ctc_guild_decode_bk.py

* modify Lconv

* produce the conflict

* Add export.py

* finish export

* fix for running black

* Add ci test

* Add ci-test

* chmod

* chmod

* fix bug for ci-test

* fix bug for ci-test

* fix bug for ci-test

* rename the dirname

* rename the dirname

* change dirname

* change dirname

* fix notes

* add pretrained.py

* add pretrained.py

* add pretrained.py

* add pretrained.py

* add pretrained.py

* add pretrained.py

* fix

* fix

* fix

* finished

* add the Copyright info and notes

Co-authored-by: Zengwei Yao <yaozengwei@outlook.com>
Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2022-12-21 17:41:31 +08:00
Zengwei Yao
65d7192dca
Fix zipformer attn_output_weights (#774)
* fix attn_output_weights

* remove in-place op
2022-12-19 20:10:39 +08:00
Zengwei Yao
fbc1d3b194
fix src_key_padding_mask in DownsampledZipformerEncoder (#768) 2022-12-17 22:03:13 +08:00
kobenaxie
6d659f423d
delete duplicate line for encoder initial state (#765) 2022-12-15 20:42:07 +08:00
Wei Kang
ad475ec10d
Add documents for pruned_transducer_stateless (#526)
* begin to add documents for pruned_transducer_stateless

* Move lstm docs to Streaming folder

* Add documents for pruned transducer stateless models

* Move zipformer mmi to non-streaming recipe

* Add more docs for streaming decoding

* Fix typo
2022-12-15 19:07:28 +08:00
Fangjun Kuang
fbc8894804
Add comment for compile_hlg_using_openfst.py (#762) 2022-12-14 13:47:23 +08:00
Daniil
b293db4baf
Tedlium3 conformer ctc2 (#696)
* modify preparation

* small refactor

* add tedlium3 conformer_ctc2

* modify decode

* filter unk in decode

* add scaling converter

* address comments

* fix lambda function lhotse

* add implicit manifest shuffle

* refactor ctc_greedy_search

* import model arguments from train.py

* style fix

* fix ci test and last style issues

* update RESULTS

* fix RESULTS numbers

* fix label smoothing loss

* update model parameters number in RESULTS
2022-12-13 16:13:26 +08:00
Zengwei Yao
0470bbae66
minor fix for zipformer recipe (#758)
* minor fix

* add CI test
2022-12-13 15:47:30 +08:00
Zengwei Yao
b25c234c51
Add Zipformer-MMI (#746)
* Minor fix to conformer-mmi

* Minor fixes

* Fix decode.py

* add training files

* train with ctc warmup

* add pruned_transducer_stateless7_mmi

* add zipformer_mmi/mmi_decode.py, using HP as decoding graph

* add mmi_decode.py

* remove pruned_transducer_stateless7_mmi

* rename zipformer_mmi/train_with_ctc.py as zipformer_mmi/train.py

* remove unused method

* rename mmi_decode.py

* add export.py pretrained.py jit_pretrained.py ...

* add RESULTS.md

* add CI test

* add docs

* add README.md

Co-authored-by: pkufool <wkang.pku@gmail.com>
2022-12-11 21:30:39 +08:00
wzy
e83409cbe5
Filter the training data of T < S for Wenet train recipe (#753)
* filter the case of T < S for training data

* fix style issues

* fix style issues

* fix style issues

Co-authored-by: 张云斌 <zhangyunbin@MacBook-Air.local>
2022-12-11 20:16:10 +08:00
Yifan Yang
02c18ba4b2
rm the dup line of Zipformer.py (#755)
Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2022-12-10 19:34:19 +08:00
Desh Raj
c4aaf3ea3b
Add AliMeeting multi-condition training recipe (#751)
* add AliMeeting multi-domain recipe

* convert scripts to symbolic links
2022-12-10 18:15:23 +08:00
Yifan Yang
a0cf85343d
fix for memory usage in pruned_transducer_stateless7/scaling.py (#752)
Co-authored-by: yifanyang <yifanyeung@yifanyangs-MacBook-Pro.local>
2022-12-09 19:23:11 +08:00
Fangjun Kuang
4501821fd9
Support using OpenFst to compile HLG. (#606)
* Support using OpenFst to compile HLG.

* Fix style issues
2022-12-09 16:46:44 +08:00
armusc
d65fe17d27
Update train.py with parameters_names as required by optimizer initialization (#742)
* Update train.py
2022-12-08 20:21:51 +08:00
huangruizhe
0e325c8782
Fixed rnn_lm model.py (#738) 2022-12-07 15:43:26 +08:00
Ali Haznedaroğlu
10472e7ffc
Update prepare.sh (#737) 2022-12-07 08:22:50 +08:00
Fangjun Kuang
f13cf61b05
Convert conv-emformer to ncnn (#717)
* Export conv-emformer via torch.jit.trace()
2022-12-06 16:34:27 +08:00
Cesc
be6e08f69a
fix wenet stateless5 jit export error (#735) 2022-12-05 23:35:10 +08:00
Fangjun Kuang
bd7fa2253d
Update the manifest statistics of the L subset of wenetspeech (#731) 2022-12-04 20:27:45 +08:00
Wei Kang
c25c8c6ad1
Add need_repeat_flag in phone based ctc graph compiler (#727)
* Fix is_repeat_token in icefall

* Fix phone based recipe

* Update egs/librispeech/ASR/conformer_ctc3/train.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Fix black

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-12-04 17:20:17 +08:00
Senyan Li
e6a6727012
Add Tibetan Amdo dialect xbmu_amdo31 in egs (#706)
* add egs/xbmu_amdo31

* fix xbmu_amdo31/ASR/pruned_transducer_stateless5/train.py

* fix xbmu_amdo31/ASR/pruned_transducer_stateless5/asr_datamodule.py

* fix xbmu_amdo31/ASR/prepare.sh

* add RESULTS.md and README.md

* fix pruned_transducer_stateless5 decode.py

* add transducer stateless7

* fix transducer_stateless7

* fix RESULTS.md error

* Add pruned_transducer_stateless7 validation set results
2022-12-03 23:50:49 +08:00
Zengwei Yao
8eb4b9d96d
Combining rnnt loss and k2-ctc loss for Dan's Zipformer (#683)
* init files

* add ctc as auxiliary loss and ctc_decode.py

* tuning the scalar of HLG score for 1best, nbest and nbest-oracle

* rename to pruned_transducer_stateless7_ctc

* fix doc

* fix bug, recover the hlg scores

* modify ctc_decode.py, move out the hlg scale

* fix hlg_scale

* add export.py and pretrained.py, and so on

* upload files, update README.md and RESULTS.md

* add CI test
2022-12-03 19:01:10 +08:00
Weiji Zhuang
7700ddcb38
update multidataset zipformer results (#728) 2022-12-02 17:40:42 +08:00
Amir Hussein
6f71981667
MGB2 (#396)
* mgb2

* mgb2

* adding pruned transducer stateless to mgb2

* update display_manifest_statistics.py

* .

* stateless transducer MGB-2

* Update README.md

* Update RESULTS.md

* Update prepare_lang_bpe.py

* Update asr_datamodule.py

* .nfs removed

* Adding symlink

* .

* resolving conflicts

* Update .gitignore

* black formatting

* Update compile_hlg.py

* Update compute_fbank_musan.py

* Update convert_transcript_words_to_tokens.py

* Update download_lm.py

* Update generate_unique_lexicon.py

* adding symlinks

* fixing symbolic links
2022-12-02 10:58:34 +08:00
Fangjun Kuang
6533f359c9
Fix CI (#726)
* Fix CI

* Disable shuffle for yesno.

See https://github.com/k2-fsa/icefall/issues/197
2022-12-02 10:53:06 +08:00
Fangjun Kuang
04c9fc9c9f
Fix for older versions of k2 (#725) 2022-12-02 09:18:28 +08:00
Fangjun Kuang
2bca7032af
Update RNNLM training scripts (#720)
* Update RNNLM training scripts

* Fix a typo

* Fix CI
2022-12-01 15:57:43 +08:00
Fangjun Kuang
556c63fbb7
Describe how to fix segfault in doc (#719) 2022-12-01 08:58:18 +08:00
marcoyang1998
4b5bc480e8
Add low-order density ratio in RNNLM shallow fusion (#678)
* Support LODR in RNNLM shallow fusion

* fix style

* fix code style

* update workflow and CI

* update results

* propagate changes to stateless3

* add decoding results for stateless3+giga

* fix CI
2022-11-30 17:26:05 +08:00
Daniel Povey
1d5c03f85a
Merge pull request #705 from glynpu/improve_diagnostic
[ready]show dominant parameters
2022-11-29 20:00:52 +08:00
Zengwei Yao
ece728d895
Apply delay penalty on k2 ctc loss (#669)
* add init files

* fix bug, apply delay penalty

* fix decoding code and getting timestamps

* add option applying delay penalty on ctc log-prob

* fix bug of streaming decoding

* minor change for bpe-based case

* add test_model.py

* add README.md

* add CI
2022-11-28 22:34:02 +08:00
Guo Liyong
4fee3e7f1e improve comment 2022-11-28 17:33:52 +08:00
huangruizhe
6693d907d3
shuffle full Librispeech data (#574)
* shuffled full/partial librispeech data

* fixed the code style issue

* Shuffled full librispeech data off-line

* Fixed style, addressed comments, and removed redundant code

* Used the suggested version of black

* Propagated the changes to other folders for librispeech (except
conformer_mmi and streaming_conformer_ctc)
2022-11-27 11:26:09 +08:00
Guo Liyong
9cf79cac3f message formatting 2022-11-26 22:39:03 +08:00
abb128
61032e70e0
Fix exception in find_checkpoints (#668) 2022-11-26 10:10:37 +08:00
Desh Raj
db75627e92
[recipe] AMI Zipformer transducer (#698)
* remove unnecessary changes

* add AMI prepare scripts

* add zipformer scripts for AMI

* added logs and pretrained model

* minor fix

* remove unwanted changes

* fix missing link

* make suggested changes

* update results
2022-11-26 10:00:45 +08:00
Guo Liyong
89c3982a07 show dominant parameters 2022-11-26 00:50:21 +08:00
Senyan Li
4c636c2cff
fix librispeech ASR pruned_transducer_stateless5 export (#704) 2022-11-25 14:39:56 +08:00
marcoyang1998
e5d942696a
Merge pull request #701 from marcoyang1998/fix_segfault
Fix segmentation fault
2022-11-22 11:45:03 +08:00
marcoyang
53454701cb fix segmentation fault 2022-11-22 11:39:21 +08:00
Fangjun Kuang
500792d0f1
Merge pull request #692 from desh2608/style_change_2.0
Style change 2.0
2022-11-20 05:54:19 +08:00
Desh Raj
fbe1e35b74 update code style docs 2022-11-18 09:24:07 -05:00
Desh Raj
349dae3503 add revision commit to git blame ignore 2022-11-17 14:18:50 -05:00
Desh Raj
d31db01037 manual correction of black formatting 2022-11-17 14:18:05 -05:00
Desh Raj
18e3a7a9d5 add git blame ignore file 2022-11-17 09:43:48 -05:00
Desh Raj
107df3b115 apply black on all files 2022-11-17 09:42:17 -05:00
Fangjun Kuang
b3920e5ab5
Merge pull request #691 from k2-fsa/revert-690-style_change
Revert "Apply new Black style changes"
2022-11-17 20:20:16 +08:00
Fangjun Kuang
60317120ca
Revert "Apply new Black style changes" 2022-11-17 20:19:32 +08:00
Fangjun Kuang
a7fbb18bdc
Merge pull request #690 from desh2608/style_change
Apply new Black style changes
2022-11-17 09:29:58 +08:00
Desh Raj
cad8f6aca4 merge upstream 2022-11-16 19:50:43 -05:00
Daniil
fca796cc2c
Small code refactoring (#687) 2022-11-17 06:55:53 +08:00
Desh Raj
7a8e8e735d change click version in pre-commit 2022-11-16 14:43:21 -05:00
Desh Raj
d89766d85d add git blame ignore revs file 2022-11-16 13:10:55 -05:00
Desh Raj
d110b04ad3 apply new black formatting to all files 2022-11-16 13:06:43 -05:00
Fangjun Kuang
aa7bae1ecd
fix decode.py for conformer_ctc in gigaspeech (#688) 2022-11-16 19:58:28 +08:00
Desh Raj
c8ce243255
Zipformer output length (#686)
* add assertion for output length

* add comment in filter_cuts

* add length filter to Zipformer recipes
2022-11-16 11:29:45 +08:00
Fangjun Kuang
855c76655b
Add zipformer from Dan using multi-dataset setup (#675)
* Bug fix

* Change subsampling factor from 1 to 2

* Implement AttentionCombine as replacement for RandomCombine

* Decrease random_prob from 0.5 to 0.333

* Add print statement

* Apply single_prob mask, so sometimes we just get one layer as output.

* Introduce feature mask per frame

* Include changes from Liyong about padding conformer module.

* Reduce single_prob from 0.5 to 0.25

* Reduce feature_mask_dropout_prob from 0.25 to 0.15.

* Remove dropout from inside ConformerEncoderLayer, for adding to residuals

* Increase feature_mask_dropout_prob from 0.15 to 0.2.

* Swap random_prob and single_prob, to reduce prob of being randomized.

* Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change.

* Randomize order of some modules

* Bug fix

* Stop backprop bug

* Introduce a scale dependent on the masking value

* Implement efficient layer dropout

* Simplify the learned scaling factor on the modules

* Compute valid loss on batch 0.

* Make the scaling factors more global and the randomness of dropout more random

* Bug fix

* Introduce offset in layerdrop_scaleS

* Remove final combination; implement layer drop that drops the final layers.

* Bug fixes

* Fix bug RE self.training

* Fix bug setting layerdrop mask

* Fix eigs call

* Add debug info

* Remove warmup

* Remove layer dropout and model-level warmup

* Don't always apply the frame mask

* Slight code cleanup/simplification

* Various fixes, finish implementing frame masking

* Remove debug info

* Don't compute validation if printing diagnostics.

* Apply layer bypass during warmup in a new way, including 2s and 4s of layers.

* Update checkpoint.py to deal with int params

* Revert initial_scale to previous values.

* Remove the feature where it was bypassing groups of layers.

* Implement layer dropout with probability 0.075

* Fix issue with warmup in test time

* Add warmup schedule where dropout disappears from earlier layers first.

* Have warmup that gradually removes dropout from layers; multiply initialization scales by 0.1.

* Do dropout a different way

* Fix bug in warmup

* Remove debug print

* Make the warmup mask per frame.

* Implement layer dropout (in a relatively efficient way)

* Decrease initial keep_prob to 0.25.

* Make it start warming up from the very start, and increase warmup_batches to 6k

* Change warmup schedule and increase warmup_batches from 4k to 6k

* Make the bypass scale trainable.

* Change the initial keep-prob back from 0.25 to 0.5

* Bug fix

* Limit bypass scale to >= 0.1

* Revert "Change warmup schedule and increase warmup_batches from 4k to 6k"

This reverts commit 86845bd5d859ceb6f83cd83f3719c3e6641de987.

* Do warmup by dropping out whole layers.

* Decrease frequency of logging variance_proportion

* Make layerdrop different in different processes.

* For speed, drop the same num layers per job.

* Decrease initial_layerdrop_prob from 0.75 to 0.5

* Revert also the changes in scaled_adam_exp85 regarding warmup schedule

* Remove unused code LearnedScale.

* Reintroduce batching to the optimizer

* Various fixes from debugging with nvtx, but removed the NVTX annotations.

* Only apply ActivationBalancer with prob 0.25.

* Fix s -> scaling for import.

* Increase final layerdrop prob from 0.05 to 0.075

* Fix bug where fewer layers were dropped than should be; remove unnecessary print statement.

* Fix bug in choosing layers to drop

* Refactor RelPosMultiheadAttention to have 2nd forward function and introduce more modules in conformer encoder layer

* Reduce final layerdrop_prob from 0.075 to 0.05.

* Fix issue with diagnostics if stats is None

* Remove persistent attention scores.

* Make ActivationBalancer and MaxEig more efficient.

* Cosmetic improvements

* Change scale_factor_scale from 0.5 to 0.8

* Make the ActivationBalancer regress to the data mean, not zero, when enforcing abs constraint.

* Remove unused config value

* Fix bug when channel_dim < 0

* Fix bug when channel_dim < 0

* Simplify how the positional-embedding scores work in attention (thanks to Zengwei for this concept)

* Revert dropout on attention scores to 0.0.

* This should just be a cosmetic change, regularizing how we get the warmup times from the layers.

* Reduce beta from 0.75 to 0.0.

* Reduce stats period from 10 to 4.

* Reworking of ActivationBalancer code to hopefully balance speed and effectiveness.

* Add debug code for attention weights and eigs

* Remove debug statement

* Add different debug info.

* Penalize attention-weight entropies above a limit.

* Remove debug statements

* use larger delta but only penalize if small grad norm

* Bug fixes; change debug freq

* Change cutoff for small_grad_norm

* Implement whitening of values in conformer.

* Also whiten the keys in conformer.

* Fix an issue with scaling of grad.

* Decrease whitening limit from 2.0 to 1.1.

* Fix debug stats.

* Reorganize Whiten() code; configs are not the same as before.  Also remove MaxEig for self_attn module

* Bug fix RE float16

* Revert whitening_limit from 1.1 to 2.2.

* Replace MaxEig with Whiten with limit=5.0, and move it to end of ConformerEncoderLayer

* Change LR schedule to start off higher

* Simplify the dropout mask, no non-dropped-out sequences

* Make attention dims configurable, not embed_dim//2, trying 256.

* Reduce attention_dim to 192; cherry-pick scaled_adam_exp130 which is linear_pos interacting with query

* Use half the dim for values, vs. keys and queries.

* Increase initial-lr from 0.04 to 0.05, plus changes for diagnostics

* Cosmetic changes

* Changes to avoid bug in backward hooks, affecting diagnostics.

* Random clip attention scores to -5..5.

* Add some random clamping in model.py

* Add reflect=0.1 to invocations of random_clamp()

* Remove in_balancer.

* Revert model.py so there are no constraints on the output.

* Implement randomized backprop for softmax.

* Reduce min_abs from 1e-03 to 1e-04

* Add RandomGrad with min_abs=1.0e-04

* Use full precision to do softmax and store ans.

* Fix bug in backprop of random_clamp()

* Get the randomized backprop for softmax in autocast mode working.

* Remove debug print

* Reduce min_abs from 1.0e-04 to 5.0e-06

* Add hard limit of attention weights to +- 50

* Use normal implementation of softmax.

* Remove use of RandomGrad

* Remove the use of random_clamp in conformer.py.

* Reduce the limit on attention weights from 50 to 25.

* Reduce min_prob of ActivationBalancer from 0.1 to 0.05.

* Penalize too large weights in softmax of AttentionDownsample()

* Also apply limit on logit in SimpleCombiner

* Increase limit on logit for SimpleCombiner to 25.0

* Add more diagnostics to debug gradient scale problems

* Changes to grad scale logging; increase grad scale more frequently if less than one.

* Add logging

* Remove comparison diagnostics, which were not that useful.

* Configuration changes: scores limit 5->10, min_prob 0.05->0.1, cur_grad_scale more aggressive increase

* Reset optimizer state when we change loss function definition.

* Make warmup period decrease scale on simple loss, leaving pruned loss scale constant.

* Cosmetic change

* Increase initial-lr from 0.05 to 0.06.

* Increase initial-lr from 0.06 to 0.075 and decrease lr-epochs from 3.5 to 3.

* Fixes to logging statements.

* Introduce warmup schedule in optimizer

* Increase grad_scale to Whiten module

* Add inf check hooks

* Renaming in optim.py; remove step() from scan_pessimistic_batches_for_oom in train.py

* Change base lr to 0.1, also rename from initial lr in train.py

* Adding activation balancers after simple_am_prob and simple_lm_prob

* Reduce max_abs on am_balancer

* Increase max_factor in final lm_balancer and am_balancer

* Use penalize_abs_values_gt, not ActivationBalancer.

* Trying to reduce grad_scale of Whiten() from 0.02 to 0.01.

* Add hooks.py, had neglected to git add it.

* Don't do penalize_values_gt on simple_lm_proj and simple_am_proj; reduce --base-lr from 0.1 to 0.075

* Increase probs of activation balancer and make it decay slower.

* Don't print out full non-finite tensor

* Increase default max_factor for ActivationBalancer from 0.02 to 0.04; decrease max_abs in ConvolutionModule.deriv_balancer2 from 100.0 to 20.0

* reduce initial scale in GradScaler

* Increase max_abs in ActivationBalancer of conv module from 20 to 50

* --base-lr 0.075->0.5; --lr-epochs 3->3.5

* Revert 179->180 change, i.e. change max_abs for deriv_balancer2 back from 50.0 to 20.0

* Save some memory in the autograd of DoubleSwish.

* Change the discretization of the sigmoid to be expectation preserving.

* Fix randn to rand

* Try a more exact way to round to uint8 that should prevent ever wrapping around to zero

* Make it use float16 if in amp but use clamp to avoid wrapping error

* Store only half precision output for softmax.

* More memory efficient backprop for DoubleSwish.

* Change to warmup schedule.

* Changes to more accurately estimate OOM conditions

* Reduce cutoff from 100 to 5 for estimating OOM with warmup

* Make 20 the limit for warmup_count

* Cast to float16 in DoubleSwish forward

* Hopefully make penalize_abs_values_gt more memory efficient.

* Add logging about memory used.

* Change scalar_max in optim.py from 2.0 to 5.0

* Regularize how we apply the min and max to the eps of BasicNorm

* Fix clamping of bypass scale; remove a couple unused variables.

* Increase floor on bypass_scale from 0.1 to 0.2.

* Increase bypass_scale from 0.2 to 0.4.

* Increase bypass_scale min from 0.4 to 0.5

* Rename conformer.py to zipformer.py

* Rename Conformer to Zipformer

* Update decode.py by copying from pruned_transducer_stateless5 and changing directory name

* Remove some unused variables.

* Fix clamping of epsilon

* Refactor zipformer for more flexibility so we can change number of encoder layers.

* Have a 3rd encoder, at downsampling factor of 8.

* Refactor how the downsampling is done so that it happens later, but the 1st encoder stack still operates after a subsampling of 2.

* Fix bug RE seq lengths

* Have 4 encoder stacks

* Have 6 different encoder stacks, U-shaped network.

* Reduce dim of linear positional encoding in attention layers.

* Reduce min of bypass_scale from 0.5 to 0.3, and make it not applied in test mode.

* Tuning change to num encoder layers, inspired by relative param importance.

* Make decoder group size equal to 4.

* Add skip connections as in normal U-net

* Avoid falling off the loop for weird inputs

* Apply layer-skip dropout prob

* Have warmup schedule for layer-skipping

* Rework how warmup count is produced; should not affect results.

* Add warmup schedule for zipformer encoder layer, from 1.0 -> 0.2.

* Reduce initial clamp_min for bypass_scale from 1.0 to 0.5.

* Restore the changes from scaled_adam_219 and scaled_adam_exp220, accidentally lost, re layer skipping

* Change to schedule of bypass_scale min: make it larger, decrease slower.

* Change schedule after initial loss not promising

* Implement pooling module, add it after initial feedforward.

* Bug fix

* Introduce dropout rate to dynamic submodules of conformer.

* Introduce minimum probs in the SimpleCombiner

* Add bias in weight module

* Remove dynamic weights in SimpleCombiner

* Remove the 5th of 6 encoder stacks

* Fix some typos

* small fixes

* small fixes

* Copy files

* Update decode.py

* Add changes from the master

* Add changes from the master

* update results

* Add CI

* Small fixes

* Small fixes

Co-authored-by: Daniel Povey <dpovey@gmail.com>
2022-11-15 16:56:05 +08:00
Tiance Wang
952a7b3fcc
Fix typo (#681)
* Update add_alignment_librispeech.py

* Update scaling_converter.py
2022-11-15 10:45:48 +08:00
ahmedalbahnasawy
62302259d0
add kaldifeat (#680) 2022-11-15 00:11:42 +08:00
Fangjun Kuang
cedf9aa24f
Fix shallow fusion and add CI tests for it (#676)
* Fix shallow fusion and add CI tests for it

* Fix -1 index in embedding introduced in the zipformer PR
2022-11-13 11:51:00 +08:00
Fangjun Kuang
7e82f87126
Add Zipformer from Dan (#672) 2022-11-12 18:11:19 +08:00
Fangjun Kuang
e334e570d8
Filter utterances with number_tokens > number_feature_frames. (#604) 2022-11-12 07:57:58 +08:00
Yuekai Zhang
2f43e4508b
fix mask errors when padding audios (#670) 2022-11-10 22:28:04 +08:00
Zengwei Yao
32de2766d5
Refactor getting timestamps in fsa-based decoding (#660)
* refactor getting timestamps for fsa-based decoding

* fix doc

* fix bug
2022-11-05 22:36:06 +08:00
Zengwei Yao
3600ce1b5f
Apply delay penalty on transducer (#654)
* add delay penalty

* fix CI

* fix CI
2022-11-04 16:10:09 +08:00
marcoyang1998
65b85b732c
Merge pull request #659 from marcoyang1998/master
Remove testing file
2022-11-04 12:29:55 +08:00
marcoyang1998
35b884bae6
Merge branch 'k2-fsa:master' into master 2022-11-04 12:28:49 +08:00
marcoyang
2271c3d396 remove testing file 2022-11-04 12:26:38 +08:00
marcoyang1998
7c50a019b1
Merge pull request #645 from marcoyang1998/master
Support RNNLM shallow fusion in modified beam search
2022-11-04 11:39:12 +08:00
marcoyang
a2d7095c1c resolve conflicts 2022-11-04 11:37:42 +08:00
marcoyang
b3c61b85e3 minor fixes 2022-11-04 11:32:09 +08:00
marcoyang
bdaeaae1ae resolve conflicts 2022-11-04 11:25:10 +08:00
marcoyang
0df597291f resolve conflict with timestamp feature 2022-11-04 11:17:56 +08:00
Wei Kang
64aed2cdeb
Fix LG log file name (#657) 2022-11-03 23:12:35 +08:00
Wei Kang
163d929601
Add fast_beam_search_LG (#622)
* Add fast_beam_search_LG

* add fast_beam_search_LG to commonly used recipes

* fix ci

* fix ci

* Fix error
2022-11-03 16:29:30 +08:00
marcoyang
f45d9c4383 resolve conflicts 2022-11-03 11:12:49 +08:00
marcoyang
2a52b8c125 update docs 2022-11-03 11:10:21 +08:00
Teo Wen Shen
d2a1c65c5c
fix torchaudio version in dockerfile (#653)
* fix torchaudio version in dockerfile

* remove kaldiio
2022-11-03 10:27:18 +08:00
zr_jin
5d285625cf
Update tdnn_lstm_ctc.rst (#648) 2022-11-02 23:37:01 +08:00
zr_jin
04671b44f8
Update README.md (#649) 2022-11-02 23:36:40 +08:00
zr_jin
8f79f6de00
Update tdnn_lstm_ctc.rst (#647) 2022-11-02 23:36:07 +08:00
marcoyang1998
e3f218b62b
Update egs/librispeech/ASR/lstm_transducer_stateless2/decode.py
Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-11-02 22:10:23 +08:00
marcoyang
b62fd917ae remove redundant test lines 2022-11-02 18:17:05 +08:00
marcoyang
fb45b95c90 minor fixes 2022-11-02 18:11:39 +08:00
marcoyang
9a01b9098d include previous added decoding method 2022-11-02 18:03:56 +08:00
marcoyang
6c8d1f9ef5 update 2022-11-02 17:48:58 +08:00
marcoyang
babcfd4b68 update author info 2022-11-02 17:27:31 +08:00
marcoyang
0a46a39e24 update decoding commands 2022-11-02 17:25:31 +08:00
marcoyang
86662f0b97 update results 2022-11-02 17:24:53 +08:00
marcoyang
63d0a52dbd support RNNLM shallow fusion in stateless5 2022-11-02 16:37:29 +08:00
marcoyang
de2f5e3e6d support RNNLM shallow fusion for LSTM transducer 2022-11-02 16:15:56 +08:00
Wei Kang
d389524d45
remove tail padding for non-streaming models (#625) 2022-11-01 11:09:56 +08:00
Zengwei Yao
03668771d7
Get timestamps during decoding (#598)
* print out timestamps during decoding

* add word-level alignments

* support to compute mean symbol delay with word-level alignments

* print variance of symbol delay

* update doc

* support to compute delay for pruned_transducer_stateless4

* fix bug

* add doc
2022-11-01 10:24:00 +08:00
Fangjun Kuang
ff3f026381
Checkout the LM for aishell explicitly (#642) 2022-10-31 19:47:43 +08:00
Fangjun Kuang
7f1c0e07b6
Remove onnx and onnxruntime from requirements.txt (#640)
* Remove onnx and onnxruntime from requirements.txt
2022-10-31 13:44:40 +08:00
Teo Wen Shen
1abf2863bb
fix typos (#639) 2022-10-30 22:47:21 +08:00
Wei Kang
581d0361cc
Fix type hints for decode.py (#638)
* Fix type hints for decode.py

* Fix flake8
2022-10-30 16:35:30 +08:00
Nagendra Goel
6709bf1e63
Update train.py (#635)
Add the missing step to add the arguments to the parser.
2022-10-28 10:23:32 +08:00
Fangjun Kuang
499ac24ecb
Install kaldifst for GitHub actions (#632)
* Install kaldifst for GitHub actions
2022-10-24 15:07:29 +08:00
Fangjun Kuang
348494888d
Add kaldifst to requirements.txt (#631) 2022-10-22 13:14:44 +08:00
ezerhouni
9b671e1c21
Add Shallow fusion in modified_beam_search (#630)
* Add utility for shallow fusion

* test batch size == 1 without shallow fusion

* Use shallow fusion for modified-beam-search

* Modified beam search with ngram rescoring

* Fix code according to review

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-10-21 16:44:56 +08:00
marcoyang1998
c30b8d3a1c
fix number of parameters in RESULTS.md (#627) 2022-10-19 16:53:29 +08:00
Teo Wen Shen
15c1a4a441
CSJ Data Preparation (#617)
* workspace setup

* csj prepare done

* Change compute_fbank_musan.py to soft link

* add description

* change lhotse prepare csj command

* split train-dev here

* Add header

* remove debug

* save manifest_statistics

* generate transcript in Lhotse

* update comments in config file
2022-10-18 15:56:43 +08:00
Fangjun Kuang
d69bb826ed
Support exporting LSTM with projection to ONNX (#621)
* Support exporting LSTM with projection to ONNX

* Add missing files

* small fixes
2022-10-18 11:25:31 +08:00
Fangjun Kuang
d1f16a04bd
fix type hints for decode.py (#623) 2022-10-18 06:56:12 +08:00
Fangjun Kuang
a66e74b92f
Fix links in the doc (#619) 2022-10-14 12:23:47 +08:00
Fangjun Kuang
11bff57586
Add doc about model export (#618)
* Add doc about model export

* fix typos
2022-10-14 10:16:34 +08:00
Fangjun Kuang
c39cba5191
Support exporting to ONNX for the wenetspeech recipe (#615)
* Support exporting to ONNX for the wenetspeech recipe
2022-10-13 15:17:20 +08:00
Zengwei Yao
aa58c2ee02
Modify ActivationBalancer for speed (#612)
* add a probability to apply ActivationBalancer

* minor fix

* minor fix
2022-10-13 15:14:28 +08:00
Fangjun Kuang
1c07d2fb37
Remove all-in-one for onnx export (#614)
* Remove all-in-one for onnx export

* Exit on error for CI
2022-10-12 10:34:06 +08:00
Yunusemre
f3db4ea871
exporting projection layers of joiner separately for onnx (#584)
* exporting projection layers of joiner separately for onnx
2022-10-11 18:22:28 +08:00
KajiMaCN
0019463c83
update docs (#611)
* update docs

Co-authored-by: unknown <mazhihao@jshcbd.cn>
Co-authored-by: KajiMaCN <moonlightshadowmzh@gmail.com>
2022-10-11 13:24:59 +08:00
Fangjun Kuang
3614d7ff6d
Add dill to requirements.txt (#613)
* Add dill to requirements.txt

* Disable style check for python 3.7
2022-10-10 22:50:25 +08:00
shcxlee
bf2c4a488e
Modified train.py of tedlium3 models (#597) 2022-10-02 13:01:15 +08:00
Zengwei Yao
f3ad32777a
Gradient filter for training lstm model (#564)
* init files

* add gradient filter module

* refactor getting median value

* add cutoff for grad filter

* delete comments

* apply gradient filter in LSTM module, to filter both input and params

* fix typing and refactor

* filter with soft mask

* rename lstm_transducer_stateless2 to lstm_transducer_stateless3

* fix typos, and update RESULTS.md

* minor fix

* fix return typing

* fix typo
2022-09-29 11:15:43 +08:00
LIyong.Guo
923b60a7c6
padding zeros (#591) 2022-09-28 21:20:33 +08:00
Fangjun Kuang
3b5846effa
Update kaldifeat in CI tests (#583) 2022-09-28 20:51:06 +08:00
Fangjun Kuang
9ae2f3a3c5
Small fixes to the transducer training doc (#575) 2022-09-21 14:20:49 +08:00
Fangjun Kuang
099cd3a215
support exporting to ncnn format via PNNX (#571) 2022-09-20 22:52:49 +08:00
Teo Wen Shen
436942211c
Adding Dockerfile for Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8 (#572)
* Changed Dockerfile

* Update Dockerfile

* Dockerfile

* Update README.md

* Add Dockerfiles

* Update README.md

Removed misleading CUDA version, as the Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8 Dockerfile can only support CUDA versions >11.0.
2022-09-20 10:52:24 +08:00
Fangjun Kuang
97b3fc53aa
Add LSTM for the multi-dataset setup. (#558)
* Add LSTM for the multi-dataset setup.

* Add results

* fix style issues

* add missing file
2022-09-16 18:40:25 +08:00
Fangjun Kuang
145c44f710
Use modified ctc topo when vocab size is > 500 (#568) 2022-09-13 10:59:27 +08:00
shcxlee
9e24642faf
Modified prepare_transcripts.py and prepare_lexicon.py of tedlium3 recipe (#567) 2022-09-10 10:32:49 +08:00
Fangjun Kuang
e18fa78c3a
Check that read_manifests_if_cached returns a non-empty dict. (#555) 2022-08-28 11:50:11 +08:00
Fangjun Kuang
d68b8e9120
Disable CUDA_LAUNCH_BLOCKING in wenetspeech recipes. (#554)
* Disable CUDA_LAUNCH_BLOCKING in wenetspeech recipes.

* minor fixes
2022-08-28 11:17:38 +08:00
kobenaxie
235eb0746f
fix scaling converter test for decoder(predictor). (#553) 2022-08-27 17:26:21 +08:00
rickychanhoyin
2636a3dd58
minor changes for correct path names && import module text2segments.py (#552)
* Update asr_datamodule.py

minor file names correction

* minor changes for correct path names && import module text2segments.py
2022-08-27 17:23:45 +08:00
marcoyang1998
1e31fbcd7d
Add clamping operation in Eve optimizer for all scalar weights to avoid (#550)
unstable training in some scenarios. The clamping range is set to (-10, 2).
Note that this change may cause unexpected effects if you resume
training from a model that was trained without clamping.
2022-08-25 12:12:50 +08:00
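The clamping described in the Eve optimizer commit above can be illustrated with a minimal sketch. The function name `clamp_scalar` and its defaults are hypothetical and chosen only to mirror the (-10, 2) range stated in the commit message; the actual change lives inside the optimizer and operates on tensors, so this plain-Python version only shows the idea.

```python
def clamp_scalar(value: float, low: float = -10.0, high: float = 2.0) -> float:
    """Restrict a scalar weight to the closed range [low, high].

    Hypothetical helper for illustration only; the range follows the
    (-10, 2) limits mentioned in the commit message.
    """
    # Values below `low` are raised to `low`; values above `high` are
    # lowered to `high`; everything in between is returned unchanged.
    return max(low, min(high, value))


# Example: a scalar weight that drifted out of range during training.
print(clamp_scalar(3.7))    # clipped to the upper limit
print(clamp_scalar(-12.0))  # clipped to the lower limit
print(clamp_scalar(0.5))    # already inside the range
```

A checkpoint trained without clamping may hold scalar weights outside this range, which is presumably why the commit warns about resuming from such a model.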
Duo Ma
0967cf5b38
fixed no cut_id error in decode_dataset (#549)
* fixed import quantization is none

Signed-off-by: shanguanma <nanr9544@gmail.com>

* fixed no cut_id error in decode_dataset

Signed-off-by: shanguanma <nanr9544@gmail.com>

* fixed more than one "#"

Signed-off-by: shanguanma <nanr9544@gmail.com>

* fixed code style

Signed-off-by: shanguanma <nanr9544@gmail.com>

Signed-off-by: shanguanma <nanr9544@gmail.com>
Co-authored-by: shanguanma <nanr9544@gmail.com>
2022-08-25 10:54:21 +08:00
rxhmdia
626a26fc2a
some small changes for aidatatang_200zh (#542)
* Update prepare.sh

* Update compute_fbank_aidatatang_200zh.py
2022-08-23 17:30:03 +08:00
Yuekai Zhang
f9c3d7f92f
fix typo for export jit script (#544) 2022-08-23 17:29:42 +08:00
Duo Ma
dbd61a9db3
fixed import quantization is none (#541)
Signed-off-by: shanguanma <nanr9544@gmail.com>

Signed-off-by: shanguanma <nanr9544@gmail.com>
Co-authored-by: shanguanma <nanr9544@gmail.com>
2022-08-23 10:19:03 +08:00
Zengwei Yao
c0101185d7
consider case of empty tensor (#540) 2022-08-22 21:42:56 +08:00
Lucky Wong
9277c95bcd
Pruned transducer stateless2 for AISHELL-1 (#536)
* Fix not enough values to unpack error.

* [WIP] Pruned transducer stateless2 for AISHELL-1

* fix the style issue

* code format for black

* add pruned-transducer-stateless2 results for AISHELL-1

* simplify result
2022-08-22 10:17:26 +08:00
Fangjun Kuang
0598291ff1
minor fixes to LSTM streaming model (#537) 2022-08-20 09:50:50 +08:00
rickychanhoyin
cdea2d26d4
Update asr_datamodule.py (#538)
minor file names correction
2022-08-20 00:16:38 +08:00
Zengwei Yao
f2f5baf687
Use ScaledLSTM as streaming encoder (#479)
* add ScaledLSTM

* add RNNEncoderLayer and RNNEncoder classes in lstm.py

* add RNN and Conv2dSubsampling classes in lstm.py

* hardcode bidirectional=False

* link from pruned_transducer_stateless2

* link scaling.py pruned_transducer_stateless2

* copy from pruned_transducer_stateless2

* modify decode.py pretrained.py test_model.py train.py

* copy streaming decoding files from pruned_transducer_stateless2

* modify streaming decoding files

* simplified code in ScaledLSTM

* flat weights after scaling

* pruned2 -> pruned4

* link __init__.py

* fix style

* remove add_model_arguments

* modify .flake8

* fix style

* fix scale value in scaling.py

* add random combiner for training deeper model

* add using proj_size

* add scaling converter for ScaledLSTM

* support jit trace

* add using averaged model in export.py

* modify test_model.py, test if the model can be successfully exported by jit.trace

* modify pretrained.py

* support streaming decoding

* fix model.py

* Add cut_id to recognition results

* Add cut_id to recognition results

* do not pad in Conv subsampling module; add tail padding during decoding.

* update RESULTS.md

* minor fix

* fix doc

* update README.md

* minor change, filter infinite loss

* remove the condition of raise error

* modify type hint for the return value in model.py

* minor change

* modify RESULTS.md

Co-authored-by: pkufool <wkang.pku@gmail.com>
2022-08-19 14:38:45 +08:00
Lucky Wong
31686ac829
Fix not enough values to unpack error. (#533) 2022-08-18 10:45:06 +08:00
marcoyang1998
c74cec59e9
propagate changes from #525 to other librispeech recipes (#531)
* propagate changes from #525 to other librispeech recipes

* refactor display_and_save_batch to utils

* fixed typo

* reformat code style
2022-08-17 17:18:15 +08:00
Fangjun Kuang
669401869d
Filter non-finite losses (#525)
* Filter non-finite losses

* Fixes after review
2022-08-17 12:22:43 +08:00
yangsuxia
951b03f6d7
Add function display_and_save_batch in wenetspeech/pruned_transducer_stateless2/train.py (#528)
* Add function display_and_save_batch in egs/wenetspeech/ASR/pruned_transducer_stateless2/train.py

* Modify function: display_and_save_batch

* Delete empty line in pruned_transducer_stateless2/train.py

* Modify code format
2022-08-13 11:09:54 +08:00
Wei Kang
5c17255eec
Sort results to make it more convenient to compare decoding results (#522)
* Sort result to make it more convenient to compare decoding results

* Add cut_id to recognition results

* add cut_id to results for all recipes

* Fix torch.jit.script

* Fix comments

* Minor fixes

* Fix torch.jit.tracing for PyTorch versions before v1.9.0
2022-08-12 07:12:50 +08:00
Fangjun Kuang
5149788cb2
Fix computing averaged loss in the aishell recipe. (#523)
* Fix computing averaged loss in the aishell recipe.

* Set find_unused_parameters optionally.
2022-08-09 10:53:31 +08:00
FNLPprojects
f24b76e64b
fix torchaudio version (#524)
* fix torchaudio version

* fix torchaudio version
2022-08-06 18:33:43 +08:00
Fangjun Kuang
1f7832b93c
Fix loading sampler state dict. (#421)
* Fix loading sampler state dict.

* skip scan_pessimistic_batches_for_oom if params.start_batch > 0
2022-08-06 10:00:08 +08:00
Yunusemre
7157f62af3
Merging onnx models (#518)
* add export function of onnx-all-in-one to export.py

* add onnx_check script for all-in-one onnx model

* minor fix

* remove unused arguments

* add onnx-all-in-one test

* fix style

* fix style

* fix requirements

* fix input/output names

* fix installing onnx_graphsurgeon

* fix installing onnx_graphsurgeon

* revert to previous requirements.txt

* fix minor
2022-08-04 23:03:41 +08:00
Zengwei Yao
a4dd273776
fix about tensorboard (#516)
* fix metricstracker

* fix style
2022-08-04 19:57:12 +08:00
Mingshuang Luo
e538232485
change for pruned rnnt5 train.py (#519) 2022-08-04 12:29:39 +08:00
Weiji Zhuang
36eacaccb2
Fix preparing char based lang and add multiprocessing for wenetspeech text segmentation (#513)
* add multiprocessing for wenetspeech text segmentation

* Fix preparing char based lang for wenetspeech

* fix style

Co-authored-by: WeijiZhuang <zhuangweiji@xiaomi.com>
2022-08-03 19:19:40 +08:00
Fangjun Kuang
6af5a82d8f
Convert ScaledEmbedding to nn.Embedding for inference. (#517)
* Convert ScaledEmbedding to nn.Embedding for inference.

* Fix CI style issues.
2022-08-03 15:34:55 +08:00
Fangjun Kuang
58a96e5b68
Support exporting to ONNX format (#501)
* WIP: Support exporting to ONNX format

* Minor fixes.

* Combine encoder/decoder/joiner into a single file.

* Revert merging three onnx models into a single one.

It's quite time consuming to extract a sub-graph from the combined
model. For instance, it takes more than one hour to extract
the encoder model.

* Update CI to test ONNX models.

* Decode with exported models.

* Fix typos.

* Add more doc.

* Remove ncnn as it is not fully tested yet.

* Fix as_strided for streaming conformer.
2022-08-03 10:30:28 +08:00
LIyong.Guo
132132f52a
linear_fst_with_self_loops (#512) 2022-08-02 22:28:12 +08:00
Wei Kang
2f75236c05
Support dynamic chunk streaming training in pruned_transducer_stateless5 (#454)
* support dynamic chunk streaming training

* Add simulate streaming decoding

* Support streaming decoding

* fix causal

* Minor fixes

* fix streaming decode; add results
2022-07-29 16:40:06 +08:00
Mingshuang Luo
1b478d3ac3
Add other decoding methods (nbest, nbest oracle, nbest LG) for wenetspeech pruned rnnt2 (#482)
* add other decoding methods for wenetspeech

* changes for RESULTS.md

* add ngram-lm-scale=0.35 results

* set ngram-lm-scale=0.35 as default

* Update README.md

* add nbest-scale for file name
2022-07-29 12:03:08 +08:00
Lucky Wong
34b4356bad
correction for get rank id. (#507)
* Fix no attribute 'data' error.

* minor fixes

* correction for get rank id.
2022-07-29 11:28:52 +08:00
Fangjun Kuang
ec69967584
Set overwrite=True when extracting features in batches. (#487) 2022-07-29 11:17:19 +08:00
Mingshuang Luo
389f9c77e5
correction for prepare.sh (#506) 2022-07-28 17:01:46 +08:00
boji123
3c9e7f733b
[debug] raise remind when git-lfs not available (#504)
* [debug] raise remind when git-lfs not available

* modify comment
2022-07-28 16:17:49 +08:00
Mingshuang Luo
f26b62ac00
[WIP] Pruned-transducer-stateless5-for-WenetSpeech (offline and streaming) (#447)
* pruned-rnnt5-for-wenetspeech

* style check

* style check

* add streaming conformer

* add streaming decode

* changes codes for fast_beam_search and export cpu jit

* add modified-beam-search for streaming decoding

* add modified-beam-search for streaming decoding

* change for streaming_beam_search.py

* add README.md and RESULTS.md

* change for style_check.yml

* do some changes

* do some changes for export.py

* add some decode commands for usage

* add streaming results on README.md
2022-07-28 12:54:27 +08:00
Fangjun Kuang
385645d533
Fix get_transducer_model() for aishell. (#497)
PR #495 introduces an error. This commit fixes it.
2022-07-26 15:42:21 +08:00
Fangjun Kuang
d3fc4b031e
Support using aidatatang_200zh optionally in aishell training (#495)
* Use aidatatang_200zh optionally in aishell training.
2022-07-26 11:25:01 +08:00
Fangjun Kuang
4612b03947
Fix using G before assignment in pruned_transducer_stateless/decode.py (#494) 2022-07-26 10:37:02 +08:00
Wei Kang
b1d0956855
Add modified_beam_search for streaming decode (#489)
* Add modified_beam_search for pruned_transducer_stateless/streaming_decode.py

* refactor

* modified beam search for stateless3,4

* Fix comments

* Add real streaming ci
2022-07-25 16:53:23 +08:00
Zengwei Yao
8203d10be7
Add stats about duration and padding proportion (#485)
* add stats about duration and padding proportion

* add  for utt_duration

* add stats for other recipes

* add stats for other 2 recipes

* modify doc

* minor change
2022-07-25 16:40:43 +08:00
Fangjun Kuang
d99796898c
Update doc to add a link to Nadira Povey's YouTube channel. (#492)
* Update doc to add a link to Nadira Povey's YouTube channel.

* fix a typo
2022-07-25 12:06:40 +08:00
Quandwang
116d0cf26d
CTC attention model with reworked Conformer encoder and reworked Transformer decoder (#462)
* ctc attention model with reworked conformer encoder and reworked transformer decoder

* remove unnecessary func

* resolve flake8 conflicts

* fix typos and modify the expr of ScaledEmbedding

* use original beam size

* minor changes to the scripts

* add rnn lm decoding

* minor changes

* check whether q k v weight is None

* check whether q k v weight is None

* check whether q k v weight is None

* style correction

* update results

* update results

* upload the decoding results of rnn-lm to the RESULTS

* upload the decoding results of rnn-lm to the RESULTS

* Update egs/librispeech/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/librispeech/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/librispeech/ASR/RESULTS.md

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-07-22 15:31:25 +08:00
Mingshuang Luo
3d2986b4c2
Update conformer.py for aishell4 (#484)
* update conformer.py for aishell4

* update conformer.py

* add strict=False when model.load_state_dict
2022-07-20 21:32:53 +08:00
Daniel Povey
a8696b36fc
Merge pull request #483 from yaozengwei/fix_diagnostic
Fix diagnostic
2022-07-18 23:33:45 -07:00
yaozengwei
a35b28cd8d fix for case of None stats 2022-07-19 14:29:23 +08:00
ezerhouni
608473b4eb
Add RNN-LM rescoring in fast beam search (#475) 2022-07-18 16:52:17 +08:00
Mingshuang Luo
aec222e2fe
add compile_lg.py for aishell2 recipe (#481) 2022-07-18 14:36:40 +08:00
ezerhouni
ffca1ae7fb
[WIP] Rnn-T LM nbest rescoring (#471) 2022-07-15 10:32:54 +08:00
Yuekai Zhang
c17233eca7
[Ready] [Recipes] add aishell2 (#465)
* add aishell2

* fix aishell2

* add manifest stats

* update prepare char dict

* fix lint

* setting max duration

* lint

* change context size to 1

* update result

* update hf link

* fix decoding comment

* add more decoding methods

* update result

* change context-size 2 default
2022-07-14 14:46:56 +08:00
LIyong.Guo
f8d28f0998
update multi_quantization installation (#469)
* update multi_quantization installation

* Update egs/librispeech/ASR/pruned_transducer_stateless6/train.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-07-13 21:16:45 +08:00
Zengwei Yao
bc2882ddcc
Simplified memory bank for Emformer (#440)
* init files

* use average value as memory vector for each chunk

* change tail padding length from right_context_length to chunk_length

* correct the files, ln -> cp

* fix bug in conv_emformer_transducer_stateless2/emformer.py

* fix doc in conv_emformer_transducer_stateless/emformer.py

* refactor init states for stream

* modify .flake8

* fix bug about memory mask when memory_size==0

* add @torch.jit.export for init_states function

* update RESULTS.md

* minor change

* update README.md

* modify doc

* replace torch.div() with <<

* fix bug, >> -> <<

* use i&i-1 to judge if it is a power of 2

* minor fix

* fix error in RESULTS.md
2022-07-12 19:19:58 +08:00
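The "use i&i-1 to judge if it is a power of 2" item in the Emformer commit above refers to a standard bit trick. A minimal sketch follows; the function name `is_power_of_two` is an assumption made for illustration and is not taken from the repository.

```python
def is_power_of_two(i: int) -> bool:
    """Return True if i is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set. Subtracting 1 clears that bit
    # and sets every lower bit, so the bitwise AND of i and i - 1 is zero
    # only when i is a power of two. The i > 0 guard excludes zero and
    # negative numbers, for which the trick alone would give wrong answers.
    return i > 0 and (i & (i - 1)) == 0


for n in (1, 2, 6, 8, 12, 16):
    print(n, is_power_of_two(n))
```

In Python, `i & i - 1` parses as `i & (i - 1)` because subtraction binds more tightly than `&`; the parentheses in the sketch are there for readability.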
Zengwei Yao
ce26495238
Rand combine update result (#467)
* update RESULTS.md

* fix test code in pruned_transducer_stateless5/conformer.py

* minor fix

* delete doc

* fix style
2022-07-11 18:13:31 +08:00
Fangjun Kuang
6c69c4e253
Support running icefall outside of a git tracked directory. (#470)
* Support running icefall outside of a git tracked directory.

* Minor fixes.
2022-07-08 15:03:07 +08:00
Fangjun Kuang
e5fdbcd480
Revert changes to setup_logger. (#468) 2022-07-08 09:15:37 +08:00
Fangjun Kuang
8761452a2c
Add multi_quantization to requirements.txt (#464)
* Add multi_quantization to requirements.txt
2022-07-07 14:36:08 +08:00
Mingshuang Luo
8e0b7ea518
mv split cuts before computing feature (#461) 2022-07-04 11:59:37 +08:00
Mingshuang Luo
10e8bc5b56
do a change (#460) 2022-07-03 19:35:01 +08:00
Tiance Wang
ac9fe5342b
Fix TIMIT lexicon generation bug (#456) 2022-06-30 19:13:46 +08:00
Zengwei Yao
d80f29e662
Modification about random combine (#452)
* comment some lines, random combine from 1/3 layers, on linear layers in combiner

* delete commented lines

* minor change
2022-06-30 12:23:49 +08:00
Mingshuang Luo
c10aec5656
load_manifest_lazy for asr_datamodule.py (#453) 2022-06-29 17:45:30 +08:00
Mingshuang Luo
29e407fd04
Code checks for pruned rnnt2 wenetspeech (#451)
* code check

* jq install
2022-06-28 18:57:53 +08:00
Mingshuang Luo
bfa8264697
code check (#450) 2022-06-28 17:32:20 +08:00
Mingshuang Luo
2cb1618c95
[Ready to merge] Pruned transducer stateless5 recipe for tal_csasr (mix Chinese chars and English BPE) (#428)
* add pruned transducer stateless5 recipe for tal_csasr

* do some changes for merging

* change for conformer.py

* add wer and cer for Chinese and English respectively

* fix an error for conformer.py
2022-06-28 11:02:10 +08:00
Wei Kang
6e609c67a2
Using streaming conformer as transducer encoder (#380)
* support streaming in conformer

* Add more documents

* support streaming on pruned_transducer_stateless2; add delay penalty; fixes for decode states

* Minor fixes

* streaming for pruned_transducer_stateless4

* Fix conv cache error, support async streaming decoding

* Fix style

* Fix style

* Fix style

* Add torch.jit.export

* mask the initial cache

* Cutting off invalid frames of encoder_embed output

* fix relative positional encoding in streaming decoding for computation saving

* Minor fixes

* Minor fixes

* Minor fixes

* Minor fixes

* Minor fixes

* Fix jit export for torch 1.6

* Minor fixes for streaming decoding

* Minor fixes on decode stream

* move model parameters to train.py

* make states in forward streaming optional

* update pretrain to support streaming model

* update results.md

* update tensorboard and pre-models

* fix typo

* Fix tests

* remove unused arguments

* add streaming decoding ci

* Minor fix

* Minor fix

* disable right context by default
2022-06-28 00:18:54 +08:00
Jun Wang
d792bdc9bc
fix typo (#445) 2022-06-25 11:00:53 +08:00
Tiance Wang
c0ea334738
fix bug of concatenating list to tuple (#444) 2022-06-24 19:31:09 +08:00
Mingshuang Luo
c391bfd100
fix errors for soft connection (#443) 2022-06-24 10:40:46 +08:00
ezerhouni
0475d75d15
[Ready to be merged] Add RNN-LM to Conformer-CTC decoding (#439) 2022-06-23 19:37:03 +08:00
Fangjun Kuang
dc89b61b80
Add fast_beam_search_nbest. (#420)
* Add fast_beam_search_nbest.

* Fix CI errors.

* Fix CI errors.

* More fixes.

* Small fixes.

* Support using log_add in LG decoding with fast_beam_search.

* Support LG decoding in pruned_transducer_stateless

* Support LG for pruned_transducer_stateless2.

* Support LG for fast beam search.

* Minor fixes.
2022-06-22 00:09:25 +08:00
Fangjun Kuang
7100c33820
Add pruned RNN-T for aishell. (#436)
* Add pruned RNN-T for aishell.

* support torch script.

* Update CI.

* Minor fixes.

* Add links to sherpa.
2022-06-21 21:17:22 +08:00
Zengwei Yao
d3daeaf5cd
Upload extracted codebook indexes (#429)
* save only vq-related info to manifest

* support to join manifest files

* support using extracted codebook indexes

* fix doc

* minor fix

* add enable-distillation argument option, fix minor typos

* fix style

* fix typo
2022-06-21 19:16:59 +08:00
2xwwx2
91b2765cfd
Fix spelling mistake (#438) 2022-06-20 16:41:04 +08:00
Mingshuang Luo
998091ef52
do some changes for export.py (#437) 2022-06-20 14:57:08 +08:00
Zengwei Yao
a42d96dfe0
Fix warmup (#435)
* fix warmup when scan_pessimistic_batches_for_oom

* delete comments
2022-06-20 13:40:01 +08:00
yaozengwei
74c14f5f5d Merge remote-tracking branch 'k2-fsa/master' 2022-06-18 17:48:51 +08:00
Fangjun Kuang
ab788980c9
Fix an error introduced by supporting torchscript for torch 1.6.0 (#434) 2022-06-18 08:57:20 +08:00
Fangjun Kuang
d53f69108f
Support torch 1.6.0 (#433) 2022-06-17 22:24:47 +08:00
Wei Kang
5379c8e9fa
Disable drop_last in testing time (#427) 2022-06-16 15:43:48 +08:00
Mingshuang Luo
5c3ee8bfcd
[Ready to merge] Pruned transducer stateless5 recipe for AISHELL4 (#399)
* pruned-transducer-stateless5 recipe for aishell4

* pruned-transducer-stateless5 recipe for aishell4

* do some changes and text normalize

* do some changes

* add text normalize

* combine the training data and decode without webdataset

* update codes for merging

* Do a change for README.md
2022-06-14 22:19:05 +08:00
yaozengwei
ec8646d0cd Merge remote-tracking branch 'k2-fsa/master' 2022-06-13 20:55:28 +08:00
Zengwei Yao
53f38c01d2
Emformer with conv module and scaling mechanism (#389)
* copy files from existing branch

* add rule in .flake8

* minor style fix

* fix typos

* add tail padding

* refactor, use fixed-length cache for batch decoding

* copy from streaming branch

* copy from streaming branch

* modify emformer states stack and unstack, streaming decoding, to be continued

* refactor Stream class

* rename streaming_feature_extractor.py

* refactor streaming decoding

* test states stack and unstack

* fix bugs, no grad, and num_proccessed_frames

* add modify_beam_search, fast_beam_search

* support torch.jit.export

* use torch.div

* copy from pruned_transducer_stateless4

* modify export.py

* add author info

* delete other test functions

* minor fix

* modify doc

* fix style

* minor fix doc

* minor fix

* minor fix doc

* update RESULTS.md

* fix typo

* add info

* fix typo

* fix doc

* add test function for conv module, and minor fix.

* add copyright info

* minor change of test_emformer.py

* fix doc of stack and unstack, test case with batch_size=1

* update README.md
2022-06-13 15:09:17 +08:00
yaozengwei
2a5a70e03e Merge remote-tracking branch 'k2-fsa/master' 2022-06-13 12:52:28 +08:00
Fangjun Kuang
9f6c748b30
Add links to sherpa. (#417)
* Add links to sherpa.
2022-06-10 12:19:18 +08:00
Fangjun Kuang
bfeab319c9
Fix aishell. (#416) 2022-06-10 11:47:43 +08:00
Fangjun Kuang
dbda1644b5
Replace load_manifest_lazy with load_manifest for MUSAN. (#412) 2022-06-09 11:42:18 +08:00
Fangjun Kuang
ed66877694
Replace ChunkedLilcomHdf5Writer with LilcomChunkyWriter. (#411) 2022-06-09 11:18:52 +08:00
Quandwang
8512aaf585
fix typos (#409) 2022-06-08 20:08:44 +08:00
Mingshuang Luo
5079d99ee2
a correction for text2segmentation.py (#407) 2022-06-08 12:06:57 +08:00
Fangjun Kuang
1094a3cb37
Replace LilcomChunkyWriter with ChunkedLilcomHdf5Writer. (#404) 2022-06-07 18:14:25 +08:00
Fangjun Kuang
80c46f0abd
Fix exporting emformer with torchscript using torch 1.6.0 (#402) 2022-06-07 09:19:37 +08:00
Fangjun Kuang
29fa878fff
Fix Emformer for torchscript using torch 1.6.0 (#401) 2022-06-06 17:08:07 +08:00
Mingshuang Luo
0a21eaae7f
do a change for decode.py (#400) 2022-06-06 15:44:04 +08:00
Fangjun Kuang
f1abce72f8
Use jsonl for CutSet in the LibriSpeech recipe. (#397)
* Use jsonl for cutsets in the librispeech recipe.

* Use lazy cutset for all recipes.

* More fixes to use lazy CutSet.

* Remove force=True from logging to support Python < 3.8

* Minor fixes.

* Fix style issues.
2022-06-06 10:19:16 +08:00
Mingshuang Luo
e5884f82e0
[Ready to merge] Add prefix for compute fbank (#398)
* add prefix

* add prefix
2022-06-05 18:17:52 +08:00
fanlu
8a3068ead8
Update decode.py (#392)
* Update decode.py

fix bug ```TypeError: greedy_search_batch() missing 1 required positional argument: 'encoder_out_lens'```

* fix modified_beam_search

Co-authored-by: fanlu3 <fanlu@jd.com>
2022-06-04 19:08:17 +08:00
Zengwei Yao
148f69d8d9
Update RESULTS.md (#388)
* update RESULT.md about pruned_transducer_stateless4

* Update RESULT.md

This PR is only to update RESULT.md about pruned_transducer_stateless4.

* set default value of --use-averaged-model to True

* update RESULTS.md and add decode command

* minor fix

* update export.py

* add uploaded files links

* update link

* fix typos
2022-06-04 15:52:35 +08:00
Mingshuang Luo
beab229fd7
[Ready to merge] Pruned_transducer_stateless2 for alimeeting dataset (#378)
* add pruned-rnnt2 recipe for alimeeting dataset

* update code for merging

* change LilcomHdf5Writer to ChunkedLilcomHdf5Writer

* change for test.yml

* change for test.yml

* change for test.yml

* change for workflow yml

* change for yml

* change for yml

* change for README.md

* change for yml

* solve the conflicts

* solve the conflicts
2022-06-04 13:47:46 +08:00
Fangjun Kuang
fbfc98f1d3
Add streaming Emformer stateless RNN-T. (#390)
* Add streaming Emformer stateless RNN-T.

* Update results for streaming Emformer.

* Minor fixes.
2022-06-01 14:31:47 +08:00
yaozengwei
bb7ea3141b Merge remote-tracking branch 'k2-fsa/master' 2022-05-31 13:34:23 +08:00
LIyong.Guo
c4ee2bc0af
[Ready to merge] stateless6: stateless4 + hubert distillation. (#387)
* a copy of stateless4 as base

* distillation with hubert

* fix typo

* example usage

* usage

* Update egs/librispeech/ASR/pruned_transducer_stateless6/hubert_xlarge.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* fix comment

* add results of 100hours

* Update egs/librispeech/ASR/pruned_transducer_stateless6/hubert_xlarge.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* Update egs/librispeech/ASR/pruned_transducer_stateless6/hubert_xlarge.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* check fairseq and quantization

* a short intro to distillation framework

* Update egs/librispeech/ASR/pruned_transducer_stateless6/hubert_xlarge.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* add intro of stateless6 in README

* fix type error of dst_manifest_dir

* Update egs/librispeech/ASR/pruned_transducer_stateless6/hubert_xlarge.py

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>

* make export.py call stateless6/train.py instead of stateless2/train.py

* update results by stateless6

* adjust results format

* fix typo

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-05-28 12:37:50 +08:00
yaozengwei
545316636b Merge remote-tracking branch 'origin/master' 2022-05-26 21:55:56 +08:00
yaozengwei
fbbc24f941 Merge remote-tracking branch 'k2-fsa/master' 2022-05-26 21:54:40 +08:00
Mingshuang Luo
c8c8645081
[Ready to merge] Pruned-transducer-stateless2 recipe for aidatatang_200zh (#375)
* add pruned-rnnt2 model for aidatatang_200zh

* do some changes

* change for README.md

* do some changes
2022-05-24 23:07:40 +08:00
Ewald Enzinger
8c5722de8c
[egs] Add prefix when reading manifests due to recent lhotse changes (#382)
* [egs] Add prefix when reading manifests due to recent lhotse changes

* Fix wenetspeech

* Fix style issues
2022-05-23 23:37:35 +08:00
Mingshuang Luo
0e57b30495
[Ready to merge] Pruned Transducer Stateless2 for WenetSpeech (char-based) (#349)
* add char-based pruned-rnnt2 for wenetspeech

* style check

* style check

* change for export.py

* do some changes

* do some changes

* a small change for .flake8

* solve the conflicts
2022-05-23 17:13:01 +08:00
Fangjun Kuang
2f1e23cde1
Narrower and deeper conformer (#330)
* Copy files for editing.

* Add random combine from #229.

* Minor fixes.

* Pass model parameters from the command line.

* Fix warnings.

* Fix warnings.

* Update readme.

* Rename to avoid conflicts.

* Update results.

* Add CI for pruned_transducer_stateless5

* Typo fixes.

* Remove random combiner.

* Update decode.py and train.py to use periodically averaged models.

* Minor fixes.

* Revert to use random combiner.

* Update results.

* Minor fixes.
2022-05-23 14:39:11 +08:00
Mingshuang Luo
ec5a112831
[Ready to merge] Do some coding style checks for the latest files (#379)
* style check

* do changes for .flake8

* a change for compute_fbank_yesno.py
2022-05-20 19:30:38 +08:00
Daniel Povey
2900ed8f8f
Merge pull request #376 from danpovey/diagnostics_fix
Diagnostics fix
2022-05-19 12:51:07 +08:00
Daniel Povey
9e88d0bf31 Merge remote-tracking branch 'upstream/master' 2022-05-19 12:49:12 +08:00
Daniel Povey
5230e73e41 Small fixes 2022-05-19 12:49:00 +08:00
Daniel Povey
4e23fb2252
Improve diagnostics code memory-wise and accumulate more stats. (#373)
* Update diagnostics, hopefully print more stats.

# Conflicts:
#	egs/librispeech/ASR/pruned_transducer_stateless4b/train.py

* Remove memory-limit options arg

* Remove unnecessary option for diagnostics code, collect on more batches
2022-05-19 11:45:59 +08:00
Daniel Povey
c736b39c7d Remove unnecessary option for diagnostics code, collect on more batches 2022-05-19 11:35:54 +08:00
Daniel Povey
c0fdfabaf3 Remove memory-limit options arg 2022-05-19 11:30:56 +08:00
Daniel Povey
c2c46ea023 Update diagnostics, hopefully print more stats.
# Conflicts:
#	egs/librispeech/ASR/pruned_transducer_stateless4b/train.py
2022-05-19 11:29:31 +08:00
Fangjun Kuang
f6ce135608
Various fixes to support torch script. (#371)
* Various fixes to support torch script.

* Add tests to ensure that the model is torch scriptable.

* Update tests.
2022-05-16 21:46:59 +08:00
Desh Raj
5aafbb970e
SPGISpeech recipe (#334)
* initial commit for SPGISpeech recipe

* add decoding

* add spgispeech transducer

* remove conformer ctc; minor fixes in RNN-T

* add results

* add tensorboard

* add pretrained model to HF

* remove unused scripts and soft link common scripts

* remove duplicate files

* pre commit hooks

* remove change in librispeech

* pre commit hook

* add CER numbers
2022-05-16 20:52:14 +08:00
yaozengwei
c9d84aeb5c Merge remote-tracking branch 'k2-fsa/master' 2022-05-15 18:02:27 +08:00
Fangjun Kuang
6f7860a0a6
Fix GitHub CI for decoding GigaSpeech dev/test datasets (#366) 2022-05-15 14:25:35 +08:00
Guanbo Wang
9630f9a3ba
Update GigaSpeech results (#364)
* Update decode.py

* Update export.py

* Update results

* Update README.md
2022-05-15 12:57:40 +08:00
Fangjun Kuang
f23dd43719
Update results for libri+giga multi dataset setup. (#363)
* Update results for libri+giga multi dataset setup.
2022-05-14 21:45:39 +08:00
Fangjun Kuang
2d7096dfc6
Decode gigaspeech in GitHub actions (#362)
* Add CI for gigaspeech.
2022-05-14 08:53:22 +08:00
Fangjun Kuang
0f180b3ce2
Validate that there are no OOV tokens in BPE-based lexicons. (#359)
* Validate that there are no OOV tokens in BPE-based lexicons.

* Typo fixes.
2022-05-13 14:00:35 +08:00
Fangjun Kuang
e30e042c39
Update decoding script for gigaspeech and remove duplicate files. (#361) 2022-05-13 13:03:16 +08:00
Guanbo Wang
48a6a9a549
GigaSpeech RNN-T experiments (#318)
* Copy RNN-T recipe from librispeech

* flake8

* flake8

* Update params

* gigaspeech decode

* black

* Update results

* syntax highlight

* Update RESULTS.md

* typo
2022-05-13 11:03:26 +08:00
Fangjun Kuang
7b7acdf369
Support --iter in export.py (#360) 2022-05-13 10:51:44 +08:00
Fangjun Kuang
aeb8986e35
Ignore padding frames during RNN-T decoding. (#358)
* Ignore padding frames during RNN-T decoding.

* Fix outdated decoding code.

* Minor fixes.
2022-05-13 07:39:14 +08:00
yaozengwei
bcef517a84 Merge remote-tracking branch 'k2-fsa/master' 2022-05-12 17:45:45 +08:00
Fangjun Kuang
bc284e88e6
Run decode.py in GitHub actions. (#356) 2022-05-10 14:51:34 +08:00
Fangjun Kuang
cd460f7bf1
Stringify torch.__version__ before serializing it. (#354) 2022-05-07 17:18:34 +08:00
Zengwei Yao
20f092e709
Support decoding with averaged model when using --iter (#353)
* support decoding with averaged model when using --iter

* minor fix

* minor fix of copyright date
2022-05-07 13:09:11 +08:00
Mingshuang Luo
f783e10dc8
Do some changes for aishell/ASR/transducer stateless/export.py (#347)
* do some changes for aishell/ASR/transducer_stateless/export.py
2022-05-07 11:09:31 +08:00
yaozengwei
ecfb3e9c26 Merge remote-tracking branch 'k2-fsa/master' 2022-05-07 11:07:48 +08:00
Zengwei Yao
c059ef3169
Keep model_avg on cpu (#348)
* keep model_avg on cpu

* explicitly convert model_avg to cpu

* minor fix

* remove device conversion for model_avg

* modify usage of the model device in train.py

* change model.device to next(model.parameters()).device for decoding

* assert params.start_epoch>0

* assert params.start_epoch>0, params.start_epoch
2022-05-07 10:42:34 +08:00
Guanbo Wang
8e3c89076e Bug fix (#352) 2022-05-07 08:10:54 +08:00
Fangjun Kuang
32f05c00e3
Save batch to disk on exception. (#350) 2022-05-06 17:49:40 +08:00
yaozengwei
70634d58a1 Merge remote-tracking branch 'k2-fsa/master' 2022-05-06 11:31:20 +08:00
Zengwei Yao
00c48ec1f3
Model average (#344)
* First upload of model average codes.

* minor fix

* update decode file

* update .flake8

* rename pruned_transducer_stateless3 to pruned_transducer_stateless4

* change epoch number counter starting from 1 instead of 0

* minor fix of pruned_transducer_stateless4/train.py

* refactor the checkpoint.py

* minor fix, update docs, and modify the epoch number to count from 1 in the pruned_transducer_stateless4/decode.py

* update author info

* add docs of the scaling in function average_checkpoints_with_averaged_model
2022-05-05 21:20:04 +08:00
Fangjun Kuang
8635fb4334
Fix decoding for gigaspeech in the libri + giga setup. (#345) 2022-05-05 20:58:46 +08:00
Fangjun Kuang
e1c3e98980
Save batch to disk on OOM. (#343)
* Save batch to disk on OOM.

* minor fixes

* Fixes after review.

* Fix style issues.
2022-05-05 15:09:23 +08:00
Fangjun Kuang
9ddbc681e7
Validate generated manifest files. (#338) 2022-05-03 07:08:33 +08:00
Fangjun Kuang
6af15914fa
Validate generated manifest files. (#338) 2022-05-03 07:02:54 +08:00
Fangjun Kuang
6dc2e04462
Update results. (#340)
* Update results.

* Typo fixes.
2022-04-29 15:49:45 +08:00
Fangjun Kuang
ac84220de9
Modified conformer with multi datasets (#312)
* Copy files for editing.

* Use librispeech + gigaspeech with modified conformer.

* Support specifying number of workers for on-the-fly feature extraction.

* Feature extraction code for GigaSpeech.

* Combine XL splits lazily during training.

* Fix warnings in decoding.

* Add decoding code for GigaSpeech.

* Fix decoding the gigaspeech dataset.

We have to use the decoder/joiner networks for the GigaSpeech dataset.

* Disable speed perturbation for XL subset.

* Compute the Nbest oracle WER for RNN-T decoding.

* Minor fixes.

* Minor fixes.

* Add results.

* Update results.

* Update CI.

* Update results.

* Fix style issues.

* Update results.

* Fix style issues.
2022-04-29 15:40:30 +08:00
yaozengwei
9c39d8b009 Merge remote-tracking branch 'k2-fsa/master' 2022-04-29 10:26:06 +08:00
Fangjun Kuang
caab6cfd92
Support specifying iteration number of checkpoints for decoding. (#336)
See also #289
2022-04-28 14:09:22 +08:00
Fangjun Kuang
9aeea3e1af
Support averaging models with weight tying. (#333) 2022-04-26 13:32:03 +08:00
pehonnet
9a98e6ced6
fix fp16 option in example usage (#332) 2022-04-25 18:51:53 +08:00
whsqkaak
d766dc5aee
Fix some typos. (#329) 2022-04-22 15:54:59 +08:00
Fangjun Kuang
3607c516d6
Update results for torchaudio RNN-T. (#322) 2022-04-20 11:15:10 +08:00
Fangjun Kuang
fce7f3cd9a
Support computing RNN-T loss with torchaudio (#316) 2022-04-19 18:47:13 +08:00
Wei Kang
021c79824e
Add LG decoding (#277)
* Add LG decoding

* Add log weight pushing

* Minor fixes
2022-04-19 17:23:46 +08:00
Wang, Guanbo
5fe58de43c
GigaSpeech recipe (#120)
* initial commit

* support download, data prep, and fbank

* on-the-fly feature extraction by default

* support BPE based lang

* support HLG for BPE

* small fix

* small fix

* chunked feature extraction by default

* Compute features for GigaSpeech by splitting the manifest.

* Fixes after review.

* Split manifests into 2000 pieces.

* set audio duration mismatch tolerance to 0.01

* small fix

* add conformer training recipe

* Add conformer.py without pre-commit checking

* lazy loading and use SingleCutSampler

* DynamicBucketingSampler

* use KaldifeatFbank to compute fbank for musan

* use pretrained language model and lexicon

* use 3gram to decode, 4gram to rescore

* Add decode.py

* Update .flake8

* Delete compute_fbank_gigaspeech.py

* Use BucketingSampler for valid and test dataloader

* Update params in train.py

* Use bpe_500

* update params in decode.py

* Decrease num_paths while CUDA OOM

* Added README

* Update RESULTS

* black

* Decrease num_paths while CUDA OOM

* Decode with post-processing

* Update results

* Remove lazy_load option

* Use default `storage_type`

* Keep the original tolerance

* Use split-lazy

* black

* Update pretrained model

Co-authored-by: Fangjun Kuang <csukuangfj@gmail.com>
2022-04-14 16:07:22 +08:00
Mingshuang Luo
d88e786513
Changes for pretrained.py (tedlium3 pruned RNN-T) (#311) 2022-04-14 09:54:07 +08:00
Daniel Povey
62fbfb52d0
Merge pull request #315 from danpovey/mixprec_md300
Add results for mixed precision with max-duration 300
2022-04-13 20:23:07 +08:00
Daniel Povey
af6ae840ee Add results for mixed precision with max-duration 300 2022-04-13 20:22:11 +08:00
Daniel Povey
c0003483d3
Merge pull request #313 from glynpu/fix_comments
fix comments
2022-04-13 14:03:02 +08:00
Guo Liyong
78418ac37c fix comments 2022-04-13 13:09:24 +08:00
Daniel Povey
2a854f5607
Merge pull request #309 from danpovey/update_results
Update results; will further update this before merge
2022-04-12 12:22:48 +08:00
Daniel Povey
9ed7a169e1 Add one more epoch of full expt 2022-04-12 12:20:10 +08:00
Daniel Povey
d0a53aad48 Fix tensorboard log location 2022-04-12 11:51:15 +08:00
Daniel Povey
65818d16de Add more results 2022-04-12 11:48:16 +08:00
Fangjun Kuang
bdeff338c2
Fix CI errors. (#310) 2022-04-12 09:09:56 +08:00
Mingshuang Luo
118e195004
Update results for tedlium3 pruned RNN-T (#307)
* Update README.md
2022-04-11 22:19:26 +08:00
Mingshuang Luo
93c60a9d30
Code style check for librispeech pruned transducer stateless2 (#308) 2022-04-11 22:15:18 +08:00
Daniel Povey
ead822477c Fix rebase 2022-04-11 21:01:13 +08:00
Daniel Povey
e8eb0b94d9 Updating RESULTS.md; fix in beam_search.py 2022-04-11 21:00:11 +08:00
pkufool
a92133ef96 Minor fixes 2022-04-11 20:58:47 +08:00
pkufool
ddd8f9e15e Minor fixes 2022-04-11 20:58:43 +08:00
pkufool
cc0d4ffa4f Add mix precision support 2022-04-11 20:58:02 +08:00
Mingshuang Luo
8cb727e24a
Tedlium3 pruned transducer stateless (#261)
* update tedlium3-pruned-transducer-stateless-codes

* update README.md

* update README.md

* add fast beam search for decoding

* do a change for RESULTS.md

* do a change for RESULTS.md

* do a fix

* do some changes for pruned RNN-T
2022-04-11 17:08:53 +08:00
Wei Kang
7012fd65b5
Support mix precision training on the reworked model (#305)
* Add mix precision support

* Minor fixes

* Minor fixes

* Minor fixes
2022-04-11 16:49:54 +08:00
Daniel Povey
34aad74a2c
Merge pull request #303 from danpovey/fix_docs
Fix docs in optim.py
2022-04-11 15:14:06 +08:00
Daniel Povey
03c7c2613d Fix docs in optim.py 2022-04-11 15:13:42 +08:00
Daniel Povey
6eb6d9b4cd
Merge pull request #288 from danpovey/reworked_model
Reworked model
2022-04-11 15:03:08 +08:00
Daniel Povey
5078332088 Fix adding learning rate to tensorboard 2022-04-11 14:58:15 +08:00
Daniel Povey
d5f9d49e53 Modify beam search to be efficient with current joiner 2022-04-11 12:35:29 +08:00
Daniel Povey
46d52dda10 Fix dir names 2022-04-11 12:03:41 +08:00
Wei Kang
f721a2fd7a
Minor fixes for logging (#296)
* Minor fixes for logging

* Minor fix
2022-04-10 23:34:18 +08:00
Zengwei Yao
08473a17aa
Modify init (#301)
* update icefall/__init__.py to import more common functions.

* update icefall/__init__.py

* make imports style consistent.

* exclude black check for icefall/__init__.py in pyproject.toml.
2022-04-10 23:29:28 +08:00
Daniel Povey
962cf868c9 Fix import 2022-04-10 15:31:46 +08:00
Daniel Povey
d1e4ae788d Refactor how learning rate is set. 2022-04-10 15:25:27 +08:00
Daniel Povey
82d58629ea Implement 2p version of learning rate schedule. 2022-04-10 13:50:31 +08:00
Daniel Povey
da50525ca5 Make lrate rule more symmetric 2022-04-10 13:25:40 +08:00
Daniel Povey
4d41ee0caa Implement 2o schedule 2022-04-09 18:37:03 +08:00
Daniel Povey
db72aee1f0 Set 2n rule.. 2022-04-09 18:15:56 +08:00
Daniel Povey
0f8ee68af2 Fix bug 2022-04-08 16:53:42 +08:00
Daniel Povey
f587cd527d Change exponential part of lrate to be epoch based 2022-04-08 16:24:21 +08:00
Daniel Povey
6ee32cf7af Set new scheduler 2022-04-08 16:10:06 +08:00
Fangjun Kuang
78b8792d1d
Fix potential bugs in PyTorch that exist in label_smoothing. (#300) 2022-04-08 13:41:33 +08:00
Fangjun Kuang
7c0070e6f6
Display torch version in the training log. (#299) 2022-04-08 11:39:54 +08:00
Daniel Povey
61486a0f76 Remove initial_speed 2022-04-06 13:17:26 +08:00
Daniel Povey
a41e93437c Change some defaults in LR-setting rule. 2022-04-06 12:36:58 +08:00
Zengwei Yao
ceeb95bcb8
update icefall/__init__.py to import more common functions. (#294) 2022-04-06 11:55:29 +08:00
Daniel Povey
2545237eb3 Changing initial_speed from 0.25 to 01 2022-04-05 18:00:54 +08:00
Daniel Povey
25724b5ce9 Bug-fix RE sign of target_rms 2022-04-05 13:49:35 +08:00
Daniel Povey
d1a669162c Fix bug in lambda 2022-04-05 13:31:52 +08:00
Daniel Povey
ed8eba91e1 Reduce model_warm_step from 4k to 3k 2022-04-05 13:24:09 +08:00
Daniel Povey
c3169222ae Simplified optimizer, rework some things.. 2022-04-05 13:23:02 +08:00
Daniel Povey
0f5957394b Fix to reading scheduler from optim 2022-04-05 12:58:43 +08:00
Daniel Povey
1548cc7462 Fix checkpoint-writing 2022-04-05 11:19:40 +08:00
Wei Kang
cb3ba16f2b
Fix aishell prepare.sh when using pre-download data (#291) 2022-04-05 10:22:49 +08:00
Daniel Povey
47d49f29d7 Fix weight decay formula by adding 1/1-beta 2022-04-05 00:31:55 +08:00
Daniel Povey
2b0727a355 Fix weight decay formula by adding 1/1-beta 2022-04-05 00:31:28 +08:00
Daniel Povey
234366e51c Fix type of parameter 2022-04-05 00:18:36 +08:00
Daniel Povey
179d0605ea Change initialization to 0.25 2022-04-04 23:34:39 +08:00
Daniel Povey
d1f2f93460 Some fixes.. 2022-04-04 22:40:18 +08:00
Daniel Povey
72f4a673b1 First draft of new approach to learning rates + init 2022-04-04 20:21:34 +08:00
Daniel Povey
4929e4cf32 Change how warm-step is set 2022-04-04 17:09:25 +08:00
Daniel Povey
a5bbcd7b71 Make training more efficient, avoid redoing some projections. 2022-04-04 14:14:03 +08:00
Daniel Povey
99e9d6c4b8 Some cleanups 2022-04-04 13:37:10 +08:00
Daniel Povey
0fd0828f79 Fix to joiner to allow different dims 2022-04-04 13:34:43 +08:00
Fangjun Kuang
87cf9231ea
Support specifying iteration number of checkpoints for decoding. (#289) 2022-04-03 13:02:08 +08:00
Daniel Povey
9f62a0296c Revert transducer_stateless/ to state in upstream/master 2022-04-02 21:16:39 +08:00
Daniel Povey
807fcada68 Change learning speed of simple_lm_proj 2022-04-02 20:15:11 +08:00
Daniel Povey
34500afc43 Various bug fixes 2022-04-02 20:06:43 +08:00
Daniel Povey
8be10d3d6c First draft of model rework 2022-04-02 20:03:21 +08:00
Daniel Povey
eec597fdd5 Merge changes from master 2022-04-02 18:45:20 +08:00
Daniel Povey
e0ba4ef3ec Make layer dropout rate 0.075, was 0.1. 2022-04-02 17:48:54 +08:00
Zengwei Yao
0b6a2213c3
Modify icefall/__init__.py. (#287)
* Modify icefall/__init__.py to import common functions defined in icefall/utils.py.

* Modify icefall/__init__.py and .flake8.
2022-04-02 15:01:45 +08:00
Daniel Povey
45f872c27d Remove final dropout 2022-04-01 19:33:20 +08:00
Daniel Povey
92ec2e356e Fix test-mode 2022-04-01 12:22:12 +08:00
Fangjun Kuang
e7493ede90
Don't use a lambda for dataloader's worker_init_fn. (#284)
* Don't use a lambda for dataloader's worker_init_fn.
2022-03-31 20:32:00 +08:00
Daniel Povey
8caa18e2fe Bug fix to warmup_scale 2022-03-31 17:30:51 +08:00
Fangjun Kuang
9a11808ed3
Set the seed for dataloader. (#282)
Also, suppress torch warnings about division by truncation.
2022-03-31 16:48:46 +08:00
Daniel Povey
49bc761ba1 Merge branch 'rework2i_restoredrop_scaled_warmup' into rework2i_restoredrop_scaled_warmup_2proj
# Conflicts:
#	egs/librispeech/ASR/pruned_transducer_stateless2/model.py
2022-03-31 14:45:55 +08:00
Daniel Povey
e663713258 Change how warmup is applied. 2022-03-31 14:43:49 +08:00
Daniel Povey
fcb0dba2cf Reduce initial_speed from 0.5 to 0.25 2022-03-31 13:47:28 +08:00
Daniel Povey
025d690995 Reduce initial_speed further from 0.5 to 0.25 2022-03-31 13:39:56 +08:00
Daniel Povey
ec54fa85cc Use initial_speed=0.5 2022-03-31 13:04:09 +08:00
Daniel Povey
e59db01b7c Reduce initial_speed 2022-03-31 13:03:26 +08:00
Daniel Povey
c67ae0f3a1 Make 2 projections.. 2022-03-31 13:02:40 +08:00
Daniel Povey
f75d40c725 Replace nn.Linear with ScaledLinear in simple joiner 2022-03-31 12:18:31 +08:00
Daniel Povey
9a0c2e7fee Merge branch 'rework2i' into rework2i_restoredrop 2022-03-31 12:17:02 +08:00
Daniel Povey
f47fe8337a Remove some un-used code 2022-03-31 12:16:08 +08:00
Daniel Povey
0599f38281 Add final dropout to conformer 2022-03-31 11:53:54 +08:00
LIyong.Guo
fc40bfea82
fix typo of torch.eig (#281)
Co-authored-by: glynpu <glynwpu@qq.com>
2022-03-31 10:43:46 +08:00
Fangjun Kuang
2045125fd9
Fix CI. (#280)
* Fix CI.
2022-03-31 10:43:02 +08:00
Daniel Povey
a2aca9f643 Bug-fix 2022-03-30 21:42:15 +08:00
Daniel Povey
f87811e65c Fix RE identity 2022-03-30 21:41:46 +08:00
Daniel Povey
709c387ce6 Initial refactoring to remove unnecessary vocab_size 2022-03-30 21:40:22 +08:00
Fangjun Kuang
981b064007
Update doc to clarify the installation order of dependencies. (#279) 2022-03-30 18:50:54 +08:00
Mingshuang Luo
f686635b54
Update diagnostics (#260)
* update diagnostics.py
2022-03-30 14:52:55 +08:00
Daniel Povey
74121ac478 Merge branch 'rework2h_randloader_pow0.333_conv_8' into rework2h_randloader_pow0.333_conv_8_lessdrop_speed
# Conflicts:
#	egs/librispeech/ASR/pruned_transducer_stateless2/conformer.py
2022-03-30 12:24:15 +08:00
Daniel Povey
37ab0bcfa5 Reduce speed of some components 2022-03-30 11:46:23 +08:00
Daniel Povey
7c46c3b0d4 Remove dropout in output layer 2022-03-30 11:20:04 +08:00
Daniel Povey
21a099b110 Fix padding bug 2022-03-30 11:18:04 +08:00
Daniel Povey
ca6337b78a Add another convolutional layer 2022-03-30 11:12:35 +08:00
Daniel Povey
1b8d7defd0 Reduce 1st conv channels from 64 to 32 2022-03-30 00:44:18 +08:00
Daniel Povey
4e453a4bf9 Rework conformer, remove some code. 2022-03-29 23:41:13 +08:00
Daniel Povey
11124b03ea Refactoring and simplifying conformer and frontend 2022-03-29 20:32:14 +08:00
Daniel Povey
57f943b25c Merge branch 'rework2h_randloader' into rework2h_pow0.333 2022-03-29 19:05:39 +08:00
Daniel Povey
2cde99509f Change max-keep-prob to 0.95 2022-03-27 23:21:42 +08:00
Daniel Povey
262388134d Increase model_warm_step to 4k 2022-03-27 11:18:16 +08:00
Daniel Povey
8a8134b9e5 Change power of lr-schedule from -0.5 to -0.333 2022-03-27 00:31:08 +08:00
Daniel Povey
953aecf5e3 Reduce layer-drop prob after warmup to 1 in 100 2022-03-27 00:25:32 +08:00
Daniel Povey
b43468bb67 Reduce layer-drop prob 2022-03-26 19:36:33 +08:00
Daniel Povey
8a38d9a855 Fix/patch how fix_random_seed() is imported. 2022-03-26 15:43:47 +08:00
Daniel Povey
26a1730392 Add random-number-setting function in dataloader 2022-03-26 14:53:23 +08:00
Daniel Povey
0e694739f2 Fix test mode with random layer dropout 2022-03-25 23:28:52 +08:00
Daniel Povey
d2ed3dfc90 Fix bug 2022-03-25 20:35:11 +08:00
Daniel Povey
4b650e9f01 Make warmup work by scaling layer contributions; leave residual layer-drop 2022-03-25 20:34:33 +08:00
Daniel Povey
1f548548d2 Simplify the warmup code; max_abs 10->6 2022-03-24 15:06:11 +08:00
Daniel Povey
aab72bc2a5 Add changes from master to decode.py, train.py 2022-03-24 13:10:54 +08:00
Daniel Povey
5d9dae3064 Merge changes from master 2022-03-24 12:59:36 +08:00
Fangjun Kuang
395a3f952b
Batch decoding for models trained with optimized_transducer (#267)
* Add greedy search in batch mode.
* Add modified beam search in batch mode.
2022-03-23 19:11:34 +08:00
Fangjun Kuang
3ae7265737
More fixes to the checkpoint code. (#266) 2022-03-23 14:37:54 +08:00
Fangjun Kuang
6a091da0b0
Minor fixes for saving checkpoints. (#265)
* Minor fixes for saving checkpoints.

* Fix loading checkpoints saved by previous code.
2022-03-23 12:22:05 +08:00
Daniel Povey
9a8aa1f54a Change how warmup works. 2022-03-22 15:36:20 +08:00
Fangjun Kuang
8c7995d493
Support modified beam search in batch mode. (#264)
* Support modified beam search in batch mode.
* Update k2 versions in GitHub CI.
2022-03-22 15:14:04 +08:00
Daniel Povey
cef6348703 Change max-abs from 6 to 10 2022-03-22 13:50:54 +08:00
Daniel Povey
4004ca81d8 Increase warm_step (and valid_interval) 2022-03-22 13:32:24 +08:00
Daniel Povey
b82a505dfc Reduce initial pruned_loss scale from 0.01 to 0.0 2022-03-22 12:30:48 +08:00
Fangjun Kuang
d5c78a2238
Implement greedy search in batch mode for transducer decoding. (#262) 2022-03-22 10:32:22 +08:00
Daniel Povey
b7e84d5d77 Whitespace fix 2022-03-21 23:59:53 +08:00
Daniel Povey
2eef001d39 Fix balancer code 2022-03-21 23:59:26 +08:00
Daniel Povey
11a04c50ae Change 0.025,0.05 to 0.01 in initializations 2022-03-21 21:29:24 +08:00
Daniel Povey
05e30d0c46 Add max-abs=6, debugged version 2022-03-21 21:15:00 +08:00
Daniel Povey
d844655a63 Merge remote-tracking branch 'upstream/master' into rework2e_ws30k_16k_0.025_ma6 2022-03-21 21:13:25 +08:00
Daniel Povey
ccbf8ba086 Incorporate changes from master into pruned_transducer_stateless2. 2022-03-21 21:12:43 +08:00
Wei Kang
b2b4d9e0b6
Add fast beam search decoding (#250)
* Add fast beam search decoding

* Minor fixes

* Minor fixes

* Minor fixes

* Fix comments

* Fix comments
2022-03-21 16:22:25 +08:00
Daniel Povey
05b5e78d8f Add norm+balancer to VggSubsampling 2022-03-21 15:55:11 +08:00
Fangjun Kuang
ae564f91e6
Periodically saving checkpoint after processing given number of batches (#259)
* Periodically saving checkpoint after processing given number of batches.
2022-03-20 23:51:33 +08:00
Fangjun Kuang
910e6c9306
Minor fixes to tedlium3 to make ./prepare.sh work. (#258) 2022-03-20 20:26:03 +08:00
Daniel Povey
0ee2404ff0 Remove logging code that broke with newer Lhotse; fix bug with pruned_loss 2022-03-19 14:01:45 +08:00
Daniel Povey
8cff994cd7 Set also scale for embedding to 0.025. 2022-03-18 21:30:05 +08:00
Daniel Povey
188eada7ac Change initial std from 0.05 to 0.025. 2022-03-18 21:28:34 +08:00
Daniel Povey
c9f1aeb7d1 Fix bug with import 2022-03-18 16:40:24 +08:00
Daniel Povey
2dfcd8f117 Double warm_step 2022-03-18 16:38:36 +08:00
Daniel Povey
ba3611cefd Cosmetic changes to swish 2022-03-18 16:35:48 +08:00
Daniel Povey
6769087d70 Remove scale_speed, make swish deriv more efficient. 2022-03-18 16:31:25 +08:00
Mingshuang Luo
ad28c8c5eb
Tedlium3 transducer stateless (#233)
* add tedlium3 transducer-stateless
2022-03-18 11:39:06 +08:00
Daniel Povey
cbe6b175d1 Reduce warmup scale on pruned loss from 0.1 to 0.01. 2022-03-17 16:46:59 +08:00
Daniel Povey
acc0eda5b0 Scale down pruned loss in warmup mode 2022-03-17 16:09:35 +08:00
Daniel Povey
13db33ffa2 Fix diagnostics-getting code 2022-03-17 15:53:53 +08:00
Daniel Povey
11bea4513e Add remaining files in pruned_transducer_stateless2 2022-03-17 11:17:52 +08:00
Daniel Povey
e3ad8f63e7 update decode.py file type 2022-03-16 22:22:10 +08:00
Daniel Povey
cc8e4412f7 Add more files.. 2022-03-16 22:16:40 +08:00
Daniel Povey
1f3a15f3c4 Start adding some files.. 2022-03-16 22:14:30 +08:00
Daniel Povey
87c92efbfe Changes from upstream/master 2022-03-16 21:49:15 +08:00
Mingshuang Luo
518ec6414a
Update diagnostics.py (#254)
* update diagnostics.py

* do some changes
2022-03-16 20:17:45 +08:00
Daniel Povey
e838c192ef Cosmetic changes/renaming things 2022-03-16 19:27:45 +08:00
Daniel Povey
dfc75752c4 Remove some dead code. 2022-03-16 18:06:01 +08:00
Daniel Povey
c82db4184a Remove xscale from pos_embedding 2022-03-16 15:50:11 +08:00
Daniel Povey
6561743d7b bug fix re sqrt 2022-03-16 14:55:17 +08:00
Daniel Povey
0e9cad3f1f Modifying initialization from normal->uniform; add initial_scale when initializing 2022-03-16 14:42:53 +08:00
Daniel Povey
00be56c7a0 Remove dead code 2022-03-16 12:49:00 +08:00
Daniel Povey
a783b96467 Fix typo 2022-03-16 12:43:44 +08:00
Daniel Povey
633213424d Rework of initialization 2022-03-16 12:42:59 +08:00
Daniel Povey
1331199530 Merge branch 'specaugmod_baseline' into randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp_convderiv2warmup_scale_0mean 2022-03-15 23:47:03 +08:00
Daniel Povey
261d7602a7 Draft of 0mean changes.. 2022-03-15 23:46:53 +08:00
Daniel Povey
fc873cc50d Make epsilon in BasicNorm learnable, optionally. 2022-03-15 17:00:17 +08:00
Daniel Povey
b2abcd721a Add more stats. 2022-03-15 16:38:19 +08:00
Fangjun Kuang
a7643301ec
Cache pip packages for GitHub actions (#253)
* Cache pip packages in GitHub actions.
2022-03-15 15:34:21 +08:00
Daniel Povey
1962fe298b Add deriv-balancer at output of embedding. 2022-03-15 14:35:15 +08:00
Daniel Povey
2e6d170be8 Merge branch 'specaugmod_baseline' into randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp_convderiv3warmup_embed 2022-03-15 14:33:08 +08:00
Daniel Povey
21ebd356e7 Add some extra info to diagnostics 2022-03-15 13:49:15 +08:00
Daniel Povey
86e5dcba11 Remove max-positive constraint in deriv-balancing; add second DerivBalancer in conv module. 2022-03-15 13:10:35 +08:00
Daniel Povey
a23010fc10 Add warmup mode 2022-03-14 23:04:51 +08:00
Daniel Povey
8d17a05dd2 Reduce constraints from deriv-balancer in ConvModule. 2022-03-14 19:23:33 +08:00
Daniel Povey
788963d40a Merge branch 'randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp' into randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp_convderiv 2022-03-14 14:37:40 +08:00
Daniel Povey
ae25688253 Make DoubleSwish more memory efficient 2022-03-14 11:02:32 +08:00
Mingshuang Luo
d0d806560f
Change for asr_datamodule.py (#241)
* change for asr_datamodule.py

* fix style check

* do a fix
2022-03-14 00:30:58 +08:00
Daniel Povey
437e8b2083 Reduce max-abs limit from 1000 to 100; introduce 2 DerivBalancer modules in conv layer. 2022-03-13 23:31:08 +08:00
Daniel Povey
f351777e9c Remove ExpScale in feedforward layers. 2022-03-13 17:29:39 +08:00
Daniel Povey
97c0bb82d3 Change dir name 2022-03-13 13:19:20 +08:00
Daniel Povey
5d69acb25b Add max-abs-value 2022-03-13 13:15:20 +08:00
Daniel Povey
e6a501d3c8 Add max-abs-value constraint in DerivBalancer 2022-03-13 11:52:13 +08:00
Daniel Povey
6042c96db2 Use learnable scales for joiner and decoder 2022-03-12 20:54:46 +08:00
Daniel Povey
2117f46361 DoubleSwish fix 2022-03-12 19:02:14 +08:00
Daniel Povey
be0a79cbca Replace ExpScaleRelu with DoubleSwish() 2022-03-12 19:00:48 +08:00
Daniel Povey
db7a3b6eea Reduce initial_scale. 2022-03-12 18:50:02 +08:00
Daniel Povey
b7b2d8970b Cosmetic change 2022-03-12 17:47:35 +08:00
Daniel Povey
a24572abd1 Bug-fix RE bias 2022-03-12 17:28:43 +08:00
Daniel Povey
a392cb9fbc Reduce initial scaling of modules 2022-03-12 16:53:03 +08:00
Fangjun Kuang
bb7f6ed6b7
Add modified beam search for pruned rnn-t. (#248)
* Add modified beam search for pruned rnn-t.

* Fix style issues.

* Update RESULTS.md.

* Fix typos.

* Minor fixes.

* Test the pre-trained model using GitHub actions.

* Let the user install optimized_transducer on her own.

* Fix errors in GitHub CI.
2022-03-12 16:16:55 +08:00
Fangjun Kuang
2f4e71f433
Add force alignment for stateless transducer. (#239)
* Add force alignment for stateless transducer.

* Add more documentation.

* Compute word starting time from framewise token alignment.

* Update README to include force alignment information.

* Fix typos.

* Fix more typos.

* Fixes after review.
2022-03-12 16:16:15 +08:00
Daniel Povey
d906bc2a4f Change dir name 2022-03-12 15:38:39 +08:00
Daniel Povey
ca8cf2a73b Another rework, use scales on linear/conv 2022-03-12 15:38:13 +08:00
Daniel Povey
0abba9e7a2 Fix self.post-scale-mha 2022-03-12 11:20:44 +08:00
Daniel Povey
76a2b9d362 Add learnable post-scale for mha 2022-03-12 11:19:49 +08:00
Daniel Povey
7eb5a84cbe Add identity pre_norm_final for diagnostics. 2022-03-11 21:00:43 +08:00
Daniel Povey
2d3a76292d Set scaling on SwishExpScale 2022-03-11 20:12:45 +08:00
Daniel Povey
cc558faf26 Fix scale from 0.5 to 2.0 as I really intended.. 2022-03-11 19:11:50 +08:00
Daniel Povey
98156711ef Introduce in_scale=0.5 for SwishExpScale 2022-03-11 19:07:34 +08:00
Daniel Povey
a0d5e2932c Reduce min_abs from 0.5 to 0.2 2022-03-11 18:17:49 +08:00
Daniel Povey
5eafccb369 Change how scales are applied; fix residual bug 2022-03-11 17:46:33 +08:00
Daniel Povey
bec33e6855 init 1st conv module to smaller variance 2022-03-11 16:37:17 +08:00
Daniel Povey
bcf417fce2 Change max_factor in DerivBalancer from 0.025 to 0.01; fix scaling code. 2022-03-11 14:47:46 +08:00
Daniel Povey
2940d3106f Fix q*scaling logic 2022-03-11 14:44:13 +08:00
Daniel Povey
137eae0b95 Reduce max_factor to 0.01 2022-03-11 14:42:17 +08:00
Daniel Povey
ab9a17413a Scale up pos_bias_u and pos_bias_v before use. 2022-03-11 14:37:52 +08:00
Daniel Povey
e3e14cf7a4 Change min-abs threshold from 0.2 to 0.5 2022-03-11 14:16:33 +08:00
Daniel Povey
bfce5f63e4 Fix dirname 2022-03-10 23:49:09 +08:00
Daniel Povey
76560f255c Add min-abs-value 0.2 2022-03-10 23:48:46 +08:00
Daniel Povey
2fa9c636a4 use nonzero threshold in DerivBalancer 2022-03-10 23:24:55 +08:00
Daniel Povey
425e274c82 Replace norm in ConvolutionModule with a scaling factor. 2022-03-10 16:01:53 +08:00
Daniel Povey
87b843f023 Change exp dir 2022-03-10 14:44:55 +08:00
Daniel Povey
b55472bb42 Replace most normalizations with scales (still have norm in conv) 2022-03-10 14:43:54 +08:00
Daniel Povey
059b57ad37 Add BasicNorm module 2022-03-10 14:32:05 +08:00
Daniel Povey
feb20ca84d Merge changes to diagnostics 2022-03-10 10:31:42 +08:00
Daniel Povey
1e5455ba29 Update diagnostics 2022-03-10 10:28:48 +08:00
Daniel Povey
d074cf73c6 Extensions to diagnostics code 2022-03-09 20:37:20 +08:00
Daniel Povey
e2ace9d545 Replace norm on input layer with scale of 0.1. 2022-03-07 11:24:04 +08:00
Daniel Povey
a37d98463a Restore ConvolutionModule to state before changes; change all Swish,Swish(Swish) to SwishOffset. 2022-03-06 11:55:02 +08:00
Daniel Povey
8a8b81cd18 Replace relu with swish-squared. 2022-03-05 22:21:42 +08:00
Fangjun Kuang
1603744469
Refactor conformer. (#237) 2022-03-05 19:26:06 +08:00
Daniel Povey
5f2c0a09b7 Convert swish nonlinearities to ReLU 2022-03-05 16:28:24 +08:00
Daniel Povey
0cd14ae739 Fix exp dir 2022-03-05 12:17:09 +08:00
Daniel Povey
65b09dd5f2 Double the threshold in brelu; slightly increase max_factor. 2022-03-05 00:07:14 +08:00
Daniel Povey
74f2b163de Merge diagnostics improvement 2022-03-04 23:15:47 +08:00
Daniel Povey
6252282fd0 Add deriv-balancing code 2022-03-04 20:19:11 +08:00
Daniel Povey
eb3ed54202 Reduce scale from 50 to 20 2022-03-04 15:56:45 +08:00
Daniel Povey
9cc5999829 Fix duplicate Swish; replace norm+swish with swish+exp-scale in convolution module 2022-03-04 15:50:51 +08:00
yaozengwei
ad62981765
Add diagnostics (#230)
* Adding diagnostics code...

* Move diagnostics code from local dir to the shared icefall dir

* Remove the diagnostics code in the local dir

* Update docs of arguments, and remove stats_types() function in TensorDiagnosticOptions object.

* Update docs of arguments.

* Add copyright information.

* Corrected the time in copyright information.

Co-authored-by: Daniel Povey <dpovey@gmail.com>
2022-03-04 15:38:23 +08:00
Daniel Povey
7e88999641 Increase scale from 20 to 50. 2022-03-04 14:31:29 +08:00
Daniel Povey
3207bd98a9 Increase scale on Scale from 4 to 20 2022-03-04 13:16:40 +08:00
Daniel Povey
503f8d521c Fix bug in diagnostics 2022-03-04 13:08:56 +08:00
Daniel Povey
3d9ddc2016 Fix backprop bug 2022-03-04 12:29:44 +08:00
Fangjun Kuang
2f0fbf430c
Remove duplicate files. (#236) 2022-03-04 11:56:31 +08:00
Daniel Povey
cd216f50b6 Add import 2022-03-04 11:03:01 +08:00
Daniel Povey
bc6c720e25 Combine ExpScale and swish for memory reduction 2022-03-04 10:52:05 +08:00
Daniel Povey
23b3aa233c Double learning rate of exp-scale units 2022-03-04 00:42:37 +08:00
Daniel Povey
5c177fc52b pelu_base->expscale, add 2xExpScale in subsampling, and in feedforward units. 2022-03-03 23:52:03 +08:00
Fangjun Kuang
3ec219dfa0
Add stateless transducer tutorial. (#235)
* WIP: Add stateless transducer tutorial.

* Add more doc.

* Minor fixes.
2022-03-03 22:33:47 +08:00
Daniel Povey
3fb559d2f0 Add baseline for the PeLU expt, keeping only the small normalization-related changes. 2022-03-02 18:27:08 +08:00
Fangjun Kuang
1ff6196c44
Fix joiner (#234)
* Add tests for Joiner

* Remove duplicate files.
2022-03-02 16:41:14 +08:00
Daniel Povey
9ed7d55a84 Small bug fixes/imports 2022-03-02 16:34:55 +08:00
Daniel Povey
9d1b4ae046 Add pelu to this good-performing setup.. 2022-03-02 16:33:27 +08:00
Fangjun Kuang
50d2281524
Add modified transducer loss for AIShell dataset (#219)
* Add modified transducer for aishell.

* Minor fixes.

* Add extra data in transducer training.

The extra data is from http://www.openslr.org/62/

* Update export.py and pretrained.py

* Update CI to install pretrained models with aishell.

* Update results.

* Update results.

* Update README.

* Use symlinks to avoid copies.
2022-03-02 16:02:38 +08:00
Fangjun Kuang
05cb297858
Update result for full libri + GigaSpeech using transducer_stateless. (#231) 2022-03-01 17:01:46 +08:00
Fangjun Kuang
72f838dee1
Update results for transducer_stateless after training for more epochs. (#207) 2022-03-01 16:35:02 +08:00
Daniel Povey
2ff520c800 Improvements to diagnostics (RE those with 1 dim) 2022-02-28 12:22:27 +08:00
Daniel Povey
c1063def95 First version of rand-combine iterated-training-like idea. 2022-02-27 17:34:58 +08:00
Daniel Povey
63d8d935d4 Refactor/simplify ConformerEncoder 2022-02-27 13:56:15 +08:00
Daniel Povey
581786a6d3 Adding diagnostics code... 2022-02-27 13:44:43 +08:00
PF Luo
ac7c2d84bc
minor fix for aishell recipe (#223)
* just remove unnecessary torch.sum

* minor fixes for aishell
2022-02-23 08:33:20 +08:00
Fangjun Kuang
2332ba312d
Begin to use multiple datasets in training (#213)
* Begin to use multiple datasets.

* Finish preparing training datasets.

* Minor fixes

* Copy files.

* Finish training code.

* Display losses for gigaspeech and librispeech separately.

* Fix decode.py

* Make the probability to select a batch from GigaSpeech configurable.

* Update results.

* Minor fixes.
2022-02-21 15:27:27 +08:00
Fangjun Kuang
1c35ae1dba
Reset seed at the beginning of each epoch. (#221)
* Reset seed at the beginning of each epoch.

* Use a different seed for each epoch.
2022-02-21 15:16:39 +08:00
Fangjun Kuang
cbf8c18ebd
Minor fixes for aishell (#218)
* Minor fixes to aishell.

* Minor fixes.
2022-02-19 22:28:19 +08:00
PF Luo
277cc3f9bf
update aishell-1 recipe with k2.rnnt_loss (#215)
* update aishell-1 recipe with k2.rnnt_loss

* fix flake8 style

* typo

* add pretrained model link to result.md
2022-02-19 15:56:39 +08:00
Duo Ma
827b9df51a
Updated Aishell-1 transducer-stateless result (#217)
* Update RESULTS.md

* Update RESULTS.md
2022-02-19 15:56:04 +08:00
Wei Kang
b702281e90
Use k2 pruned transducer loss to train conformer-transducer model (#194)
* Using k2 pruned version transducer loss to train model

* Fix style

* Minor fixes
2022-02-17 13:33:54 +08:00
Wang, Guanbo
e8eb408760
Incremental pruning threshold (#214)
* Incremental pruning threshold

* flake8

* black

* minor fix
2022-02-16 16:59:27 +08:00
Daniel Povey
2af1b3af98 Remove ReLU in attention 2022-02-14 19:39:19 +08:00
Daniel Povey
d187ad8b73 Change max_frames from 0.2 to 0.15 2022-02-11 16:24:17 +08:00
Daniel Povey
4cd2c02fff Fix num_time_masks code; revert 0.8 to 0.9 2022-02-10 15:53:11 +08:00
Daniel Povey
c170c53006 Change p=0.9 to p=0.8 in SpecAug 2022-02-10 14:59:14 +08:00
Daniel Povey
8aa50df4f0 Change p=0.5->0.9, mask_fraction 0.3->0.2 2022-02-09 22:52:53 +08:00
Wang, Guanbo
70a3c56a18
Fix librispeech train.py (#211)
* fix librispeech train.py

* remove note
2022-02-09 16:42:28 +08:00
Daniel Povey
dd19a6a2b1 Fix to num_feature_masks bug I introduced; reduce max_frames_mask_fraction 0.4->0.3 2022-02-09 12:02:19 +08:00
Daniel Povey
bd36216e8c Use much more aggressive SpecAug setup 2022-02-08 21:55:20 +08:00
Daniel Povey
beaf5bfbab Merge specaug change from Mingshuang. 2022-02-08 19:42:23 +08:00
Daniel Povey
395065eb11 Merge branch 'spec-augment-change' of https://github.com/luomingshuang/icefall into attention_relu_specaug 2022-02-08 19:40:33 +08:00
Wang, Guanbo
be1c86b06c
print num_frame as %.2f (#204) 2022-02-08 14:56:58 +08:00
Mingshuang Luo
3323cabf46 Experiments based on SpecAugment change 2022-02-08 14:25:31 +08:00
Fangjun Kuang
27fa5f05d3
Update git SHA-1 in RESULTS.md for transducer_stateless. (#202) 2022-02-07 18:45:45 +08:00
Fangjun Kuang
a8150021e0
Use modified transducer loss in training. (#179)
* Use modified transducer loss in training.

* Minor fix.

* Add modified beam search.

* Add modified beam search.

* Minor fixes.

* Fix typo.

* Update RESULTS.

* Fix a typo.

* Minor fixes.
2022-02-07 18:37:36 +08:00
Daniel Povey
a859dcb205 Remove learnable offset, use relu instead. 2022-02-07 12:14:48 +08:00
Wei Kang
35ecd7e562
Fix torch.nn.Embedding error for torch below 1.8.0 (#198) 2022-02-06 21:59:54 +08:00
Daniel Povey
48a764eccf Add min in q,k,v of attention 2022-02-06 21:19:37 +08:00
Daniel Povey
8f8ec223a7 Changes to fbank computation, use lilcom chunky writer 2022-02-06 21:18:40 +08:00
pkufool
fcd25bdfff Fix torch.nn.Embedding error for torch below 1.8.0 2022-02-06 18:22:56 +08:00
Wei Kang
5ae80dfca7
Minor fixes (#193) 2022-01-27 18:01:17 +08:00
Piotr Żelasko
8e6fd97c6b
Merge pull request #185 from pzelasko/feature/libri-conformer-phone-ctc
Fix using `lang_phone` in conformer CTC training
2022-01-24 18:08:15 -05:00
Piotr Żelasko
1731cc37bb Black 2022-01-24 10:20:22 -05:00
Piotr Żelasko
f92c24a73a
Merge branch 'master' into feature/libri-conformer-phone-ctc 2022-01-24 10:18:56 -05:00
Piotr Żelasko
565c1d8413 Address code review 2022-01-24 10:17:47 -05:00
Piotr Żelasko
1d5fe8afa4 flake8 2022-01-21 17:27:02 -05:00
Piotr Żelasko
f0f35e6671 black 2022-01-21 17:22:41 -05:00
Piotr Żelasko
f28951f2b6 Add an assertion 2022-01-21 17:16:49 -05:00
Piotr Żelasko
3d109b121d Remove train_phones.py and modify train.py instead 2022-01-21 17:08:53 -05:00
Fangjun Kuang
d6050eb02e Fix calling optimized_transducer after new release. (#182) 2022-01-21 08:18:50 +08:00
Fangjun Kuang
f94ff19bfe
Refactor beam search and update results. (#177) 2022-01-18 16:40:19 +08:00
Fangjun Kuang
273e5fb2f3
Update git SHA1 for transducer_stateless model. (#174) 2022-01-10 11:58:17 +08:00
Fangjun Kuang
4c1b3665ee
Use optimized_transducer to compute transducer loss. (#162)
* WIP: Use optimized_transducer to compute transducer loss.

* Minor fixes.

* Fix decoding.

* Fix decoding.

* Add RESULTS.

* Update RESULTS.

* Update CI.

* Fix sampling rate for yesno recipe.
2022-01-10 11:54:58 +08:00
Piotr Żelasko
319e120869
Update feature config (compatible with Lhotse PR #525) (#172)
* Update feature config (compatible with Lhotse PR #525)

* black
2022-01-10 11:39:28 +08:00
Lucky Wong
6caff5fd38
minor fixes (#169)
* Fix no attribute 'data' error.

* minor fixes
2022-01-06 10:24:16 +08:00
Daniel Povey
4314309f1e
Merge pull request #168 from huangruizhe/patch-1
Update make_kn_lm.py
2022-01-03 18:38:03 +08:00
huangruizhe
298faabb90
minor fixes 2022-01-02 23:38:33 -08:00
huangruizhe
7577b08bed
fixed the mistake 2022-01-02 23:32:43 -08:00
huangruizhe
82c8fac6ee
fixed a case where computing BOW could fail (ZeroDivisionError) 2022-01-02 15:29:50 -08:00
huangruizhe
0a67015d63
Update make_kn_lm.py 2022-01-02 00:27:27 -08:00
huangruizhe
49aab7e658
Update make_kn_lm.py
Fixed issue #163
2022-01-02 00:14:27 -08:00
pingfengluo
ea8af0ee9a
add transducer_stateless with char unit to AIShell (#164) 2022-01-01 18:32:08 +08:00
Fangjun Kuang
413b2e8569
Add git sha1 to RESULTS.md for conformer encoder + stateless decoder. (#160) 2021-12-28 12:04:01 +08:00
Fangjun Kuang
14c93add50
Remove batchnorm, weight decay, and SOS from transducer conformer encoder (#155)
* Remove batchnorm, weight decay, and SOS.

* Make --context-size configurable.

* Update results.
2021-12-27 16:01:10 +08:00
Fangjun Kuang
8187d6236c
Minor fix to maximum number of symbols per frame for RNN-T decoding. (#157)
* Minor fix to maximum number of symbols per frame RNN-T decoding.

* Minor fixes.
2021-12-24 21:48:40 +08:00
Fangjun Kuang
5b6699a835
Minor fixes to the RNN-T Conformer model (#152)
* Disable weight decay.

* Remove input feature batchnorm.

* Replace BatchNorm in the Conformer model with LayerNorm.

* Use tanh in the joint network.

* Remove sos ID.

* Reduce the number of decoder layers from 4 to 2.

* Minor fixes.

* Fix typos.
2021-12-23 13:54:25 +08:00
Fangjun Kuang
fb6a57e9e0
Increase the size of the context in the RNN-T decoder. (#153) 2021-12-23 07:55:02 +08:00
Fangjun Kuang
cb04c8a750
Limit the number of symbols per frame in RNN-T decoding. (#151) 2021-12-18 11:00:42 +08:00
Fangjun Kuang
1d44da845b
RNN-T Conformer training for LibriSpeech (#143)
* Begin to add RNN-T training for librispeech.

* Copy files from conformer_ctc.

Will edit it.

* Use conformer/transformer model as encoder.

* Begin to add training script.

* Add training code.

* Remove long utterances to avoid OOM when a large max_duration is used.

* Begin to add decoding script.

* Add decoding script.

* Minor fixes.

* Add beam search.

* Use LSTM layers for the encoder.

Need more tunings.

* Use stateless decoder.

* Minor fixes to make it ready for merge.

* Fix README.

* Update RESULT.md to include RNN-T Conformer.

* Minor fixes.

* Fix tests.

* Minor fixes.

* Minor fixes.

* Fix tests.
2021-12-18 07:42:51 +08:00
Wei Kang
76a51bf037
Fix aishell tdnn_lstm_ctc decoding (#149) 2021-12-14 14:42:58 +08:00
Wei Kang
a183d5bfd7
Remove batchnorm (#147)
* Remove batch normalization

* Minor fixes

* Fix typo

* Fix comments

* Add assertion for use_feat_batchnorm
2021-12-14 08:20:03 +08:00
Fangjun Kuang
95af039733
RNN-T training for yesno. (#141)
* RNN-T training for yesno.

* Rename Jointer to Joiner.
2021-12-07 21:44:37 +08:00
Fangjun Kuang
1aff64b708
Apply layer normalization to the output of each gate in LSTM/GRU. (#139)
* Apply layer normalization to the output of each gate in LSTM.

* Apply layer normalization to the output of each gate in GRU.

* Add projection support to LayerNormLSTMCell.

* Add GPU tests.

* Use typeguard.check_argument_types() to validate type annotations.

* Add typeguard as a requirement.

* Minor fixes.

* Fix CI.

* Fix CI.

* Fix test failures for torch 1.8.0

* Fix errors.
2021-12-07 18:38:03 +08:00
pingfengluo
d1adc25338
Update AIShell recipe result (#140)
* add MMI to AIShell

* fix MMI decode graph

* export model

* typo

* fix code style

* typo

* fix data prepare to just use train text by uid

* use a faster way to get the intersection of train and aishell_transcript_v0.8.txt

* update AIShell result

* update

* typo
2021-12-04 14:43:04 +08:00
pingfengluo
89b84208aa
add phone based LF-MMI training to AIShell recipe (#137)
* add MMI to AIShell

* fix MMI decode graph

* export model

* typo

* fix code style

* typo
2021-12-02 12:32:23 +08:00
Fangjun Kuang
ec591698b0
Associate a cut with token alignment (without repeats) (#125)
* WIP: Associate a cut with token alignment (without repeats)

* Save framewise alignments with/without repeats.

* Minor fixes.
2021-11-29 18:50:54 +08:00
Fangjun Kuang
243fb9723c
Fix an error introduced while supporting torchscript. (#134)
Should be `G.dummy = 1`, not `G["dummy"] = 1`.
2021-11-27 09:07:04 +08:00
Fangjun Kuang
0e541f5b5d
Print hostname and IP address to the log. (#131)
We are using multiple machines to do various experiments. It makes
life easier to know which experiment is running on which machine
if we also log the IP and hostname of the machine.
2021-11-26 11:25:59 +08:00
LIyong.Guo
00e2f0ade8
Draft streaming decoding (#89)
* reusable parts from conformer_ctc

* streaming conformer code

* a trained model
2021-11-24 19:35:18 +08:00
Piotr Żelasko
8eb94fa4a0 CTC-only phone conformer recipe for LibriSpeech 2021-11-23 15:34:46 -05:00
Lucky Wong
769a9791ec
Fix no attribute 'data' error. (#129) 2021-11-22 18:31:04 +08:00
Wei Kang
e2c9c728d9
Update aishell tensorboard log for new LabelSmoothing loss (#128)
* Update aishell tensorboard log for new LabelSmoothing loss

* Minor fixes
2021-11-22 12:26:44 +08:00
Wei Kang
4151cca147
Add torch script support for Aishell and update documents (#124)
* Add aishell recipe

* Remove unnecessary code and update docs

* adapt to k2 v1.7, add docs and results

* Update conformer ctc model

* Update docs, pretrained.py & results

* Fix code style

* Fix code style

* Fix code style

* Minor fix

* Minor fix

* Fix pretrained.py

* Update pretrained model & corresponding docs

* Export torch script model for Aishell

* Add C++ deployment docs

* Minor fixes

* Fix unit test

* Update Readme
2021-11-19 16:37:05 +08:00
Wei Kang
30c43b7f69
Add aishell recipe (#30)
* Add aishell recipe

* Remove unnecessary code and update docs

* adapt to k2 v1.7, add docs and results

* Update conformer ctc model

* Update docs, pretrained.py & results

* Fix code style

* Fix code style

* Fix code style

* Minor fix

* Minor fix

* Fix pretrained.py

* Update pretrained model & corresponding docs
2021-11-18 10:00:47 +08:00
Fangjun Kuang
0660d12e4e
Fix computing WERs for empty hypotheses (#118)
* Fix computing WERs when empty lattices are generated.

* Minor fixes.
2021-11-17 19:25:47 +08:00
Fangjun Kuang
336283f872
New label smoothing (#109)
* Modify label smoothing to match the one implemented in PyTorch.

* Enable CI for torch 1.10

* Fix CI errors.

* Fix CI installation errors.

* Fix CI installation errors.

* Minor fixes.

* Minor fixes.

* Minor fixes.

* Minor fixes.

* Minor fixes.

* Fix CI errors.
2021-11-17 19:24:07 +08:00
Mingshuang Luo
10e46f3e1d
A few changes for the timit recipe (#122)
* Update train.py

* Update train.py

* Update train.py

* Update tdnn_ligru_ctc.rst
2021-11-17 16:13:51 +08:00
Mingshuang Luo
2e0f255ada
Add timit recipe (including the code scripts and the docs) for icefall (#114)
* add timit recipe for icefall

* add shared file

* update the docs for timit recipe

* Delete shared

* update the timit recipe and check style

* Update model.py

* Do some changes

* Update model.py

* Update model.py

* Add README.md and RESULTS.md

* Update RESULTS.md

* Update README.md

* update the docs for timit recipe
2021-11-17 11:23:45 +08:00
Fangjun Kuang
68506609ad
Set fsa.properties to None after changing its labels in-place. (#121) 2021-11-16 23:11:30 +08:00
Daniel Povey
b9452235d5
Merge pull request #117 from csukuangfj/fix-empty-lattice
Handle empty lattices in attention decoder rescoring.
2021-11-11 16:26:02 +08:00
Fangjun Kuang
5b10310bd1 Handle empty lattices in attention decoder rescoring. 2021-11-11 15:42:30 +08:00
Fangjun Kuang
8d679c3e74
Fix typos. (#115) 2021-11-10 14:45:30 +08:00
Fangjun Kuang
21096e99d8
Update result for the librispeech recipe using vocab size 500 and att rate 0.8 (#113)
* Update RESULTS using vocab size 500, att rate 0.8

* Update README.

* Refactoring.

Since FSAs in an Nbest object are linear in structure, we can
add the scores of a path to compute the total scores.

* Update documentation.

* Change default vocab size from 5000 to 500.
2021-11-10 14:32:52 +08:00
Fangjun Kuang
04029871b6
Fix a bug in Nbest.compute_am_scores and Nbest.compute_lm_scores. (#111) 2021-11-09 13:44:51 +08:00
Fangjun Kuang
91cfecebf2
Remove duplicated token seq in rescoring. (#108)
* Remove duplicated token seq in rescoring.

* Use a larger range for ngram_lm_scale and attention_scale
2021-11-06 08:54:45 +08:00
Fangjun Kuang
810b193dcc
Clarify the doc about ctc-decoding. (#104) 2021-11-03 07:16:49 +08:00
Fangjun Kuang
42b437bea6
Use pre-sorted text to generate token ids for attention decoder. (#98)
* Use pre-sorted text to generate token ids for attention decoder.

See https://github.com/k2-fsa/icefall/issues/97
for more details.

* Fix typos.
2021-10-29 13:46:41 +08:00
Fangjun Kuang
12d647d899
Add a note about the CUDA OOM error. (#94)
* Add a note about the CUDA OOM error.

Some users consider this kind of OOM as an error during decoding,
but actually it is not. This pull request clarifies that.

* Fix style issues.
2021-10-29 12:17:56 +08:00
Fangjun Kuang
8cb7f712e4
Use GPU for averaging checkpoints if possible. (#84) 2021-10-26 17:10:04 +08:00
Fangjun Kuang
712ead8207
Fix an error when attention decoder rescoring returns None. (#90) 2021-10-22 19:52:25 +08:00
Piotr Żelasko
902e0b238d
Merge pull request #82 from pzelasko/feature/find-pessimistic-batches
Find CUDA OOM batches before starting training
2021-10-19 11:26:13 -04:00
Piotr Żelasko
3cc99d2af2 make flake8 happy 2021-10-19 11:24:54 -04:00
cdxie
d30244e28f
add a docker file for some users (#87)
* add a docker file for some users

Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8-python3.8

* add a describing file of how to use dockerfile

give some steps to use dockerfile
2021-10-19 13:00:59 +08:00
Piotr Żelasko
86f3e0ef37 Make flake8 happy 2021-10-18 09:54:40 -04:00
Piotr Żelasko
6fbd7a287c Refactor OOM batch scanning into a local function 2021-10-18 09:53:04 -04:00
Piotr Żelasko
d509d58f30 Merge branch 'master' into feature/find-pessimistic-batches 2021-10-18 09:47:21 -04:00
Fangjun Kuang
3effcb4225
Fix typos. (#85) 2021-10-18 16:17:14 +08:00
Fangjun Kuang
53b79fafa7
Add MMI training with word pieces as modelling unit. (#6)
* Fix an error in TDNN-LSTM training.

* WIP: Refactoring

* Refactor transformer.py

* Remove unused code.

* Minor fixes.

* Fix decoder padding mask.

* Add MMI training with word pieces.

* Remove unused files.

* Minor fixes.

* Refactoring.

* Minor fixes.

* Use pre-computed alignments in LF-MMI training.

* Minor fixes.

* Update decoding script.

* Add doc about how to check and use extracted alignments.

* Fix style issues.

* Fix typos.

* Fix style issues.

* Disable macOS tests for now.
2021-10-18 15:20:32 +08:00
Fangjun Kuang
4890e27b45
Extract framewise alignment information using CTC decoding (#39)
* Use new APIs with k2.RaggedTensor

* Fix style issues.

* Update the installation doc, saying it requires at least k2 v1.7

* Extract framewise alignment information using CTC decoding.

* Print environment information.

Print information about k2, lhotse, PyTorch, and icefall.

* Fix CI.

* Fix CI.

* Compute framewise alignment information of the LibriSpeech dataset.

* Update comments for the time to compute alignments of train-960.

* Preserve cut id in mix cut transformer.

* Minor fixes.

* Add doc about how to extract framewise alignments.
2021-10-18 14:24:33 +08:00
Jan "yenda" Trmal
bd7c2f7645
fix conformer typo in docs (#83) 2021-10-16 07:46:17 +08:00
Piotr Żelasko
403d1744ff Introduce backprop in finding OOM batches 2021-10-15 10:05:13 -04:00
Piotr Żelasko
060117a9ff Reformatting 2021-10-14 21:40:14 -04:00
Piotr Żelasko
1c7c79f2fc Find CUDA OOM batches before starting training 2021-10-14 21:28:11 -04:00
Fangjun Kuang
fee1f84b20
Test pre-trained model in CI (#80)
* Add CI to run pre-trained models.

* Minor fixes.

* Install kaldifeat

* Install a CPU version of PyTorch.

* Fix CI errors.

* Disable decoder layers in pretrained.py if it is not used.

* Clone pre-trained model from GitHub.

* Minor fixes.

* Minor fixes.

* Minor fixes.
2021-10-15 00:41:33 +08:00
Mingshuang Luo
5401ce199d
Update ctc-decoding on pretrained.py and conformer_ctc.rst (#78) 2021-10-14 23:29:06 +08:00
Fangjun Kuang
f2387fe523
Fix a bug introduced while supporting torch script. (#79) 2021-10-14 20:09:38 +08:00
Fangjun Kuang
5016ee3c95
Give an informative message when users provide an unsupported decoding method (#77) 2021-10-14 16:20:35 +08:00
Mingshuang Luo
39bc8cae94
Add ctc decoding to pretrained.py on conformer_ctc (#75)
* Add ctc-decoding to pretrained.py

* update pretrained.py and conformer_ctc.rst

* update ctc-decoding for pretrained.py on conformer_ctc

* Update pretrained.py

* fix the style issue

* Update conformer_ctc.rst

* Update the running logs
2021-10-13 12:20:16 +08:00
Mingshuang Luo
391432b356
Update train.py ("10"--->"params.log_interval") (#76)
* Update train.py

* Update train.py

* Update train.py
2021-10-12 21:30:31 +08:00
Mingshuang Luo
597c5efdb1
Use LossRecord to record and print the loss for the training process (#62)
* Update index.rst (AS->ASR)

* Update conformer_ctc.rst (pretraind->pretrained)

* Fix some spelling errors.

* Fix some spelling errors.

* Use LossRecord to record and print loss in the training process

* Change the name "LossRecord" to "MetricsTracker"
2021-10-12 15:58:03 +08:00
Fangjun Kuang
beb54ddb61
Support torch script. (#65)
* WIP: Support torchscript.

* Minor fixes.

* Fix style issues.

* Add documentation about how to deploy a trained model.
2021-10-12 14:55:05 +08:00
Piotr Żelasko
d54828e73a
Merge pull request #73 from pzelasko/feature/bucketing-in-test
Use BucketingSampler for dev and test data
2021-10-09 10:58:29 -04:00
Piotr Żelasko
069ebaf9ba Reformatting 2021-10-09 14:45:46 +00:00
Mingshuang Luo
6e43905d12
Update the documentation to include "ctc-decoding" (#71)
* Update conformer_ctc.rst
2021-10-09 11:56:25 +08:00
Piotr Żelasko
b682467e4d Use BucketingSampler for dev and test data 2021-10-08 22:32:13 -04:00
Piotr Żelasko
adb068eb82
setup.py (#64) 2021-10-01 16:43:08 +08:00
Fangjun Kuang
707d7017a7
Support pure ctc decoding requiring neither a lexicon nor an n-gram LM (#58)
* Rename lattice_score_scale to nbest_scale.

* Support pure CTC decoding requiring neither a lexicon nor an n-gram LM.

* Fix style issues.

* Fix a typo.

* Minor fixes.
2021-09-26 14:21:49 +08:00
Fangjun Kuang
455693aede
Fix hasattr of AttributeDict. (#52) 2021-09-22 16:37:20 +08:00
Fangjun Kuang
a80e58e15d
Refactor decode.py to make it more readable and more modular. (#44)
* Refactor decode.py to make it more readable and more modular.

* Fix an error.

Nbest.fsa should always have token IDs as labels and
word IDs as aux_labels.

* Add nbest decoding.

* Compute edit distance with k2.

* Refactor nbest-oracle.

* Add rescore with nbest lists.

* Add whole-lattice rescoring.

* Add rescoring with attention decoder.

* Refactoring.

* Fixes after refactoring.

* Fix a typo.

* Minor fixes.

* Replace [] with () for shapes.

* Use k2 v1.9

* Use Levenshtein graphs/alignment from k2 v1.9

* [doc] Require k2 >= v1.9

* Minor fixes.
2021-09-20 15:44:54 +08:00
Fangjun Kuang
cc77cb3459
Fix decode.py to remove the correct axis. (#50)
* Fix decode.py to remove the correct axis.

* Run GitHub actions manually.
2021-09-17 16:49:03 +08:00
Wei Kang
9a6e0489c8
update api for RaggedTensor (#45)
* Fix code style

* update k2 version in CI

* fix compile hlg
2021-09-14 16:39:56 +08:00
Fangjun Kuang
a2be2896a9
Fix the link to k2's installation doc. (#46) 2021-09-14 13:39:52 +08:00
Wei Kang
24656e9749
Update docs and remove unnecessary arguments (#42)
* Fix typo in docs

* Update docs and remove unnecessary arguments

* Fix code style
2021-09-13 18:28:57 +08:00
Fangjun Kuang
f792b466bf
Change default value of lattice-score-scale from 1.0 to 0.5 (#41)
* Change the default value of lattice-score-scale from 1.0 to 0.5

* Fix CI.
2021-09-13 10:49:18 +08:00
Fangjun Kuang
7f8e3a673a
Add commands for reproducing. (#40)
* Add commands for reproducing.

* Use --bucketing-sampler by default.
2021-09-09 13:50:31 +08:00
Fangjun Kuang
abadc71415
Use new APIs with k2.RaggedTensor (#38)
* Use new APIs with k2.RaggedTensor

* Fix style issues.

* Update the installation doc, saying it requires at least k2 v1.7

* Use k2 v1.7
2021-09-08 14:55:30 +08:00
Fangjun Kuang
331e5eb7ab
[doc] Fix typos. (#31) 2021-09-02 07:12:37 +08:00
3118 changed files with 630494 additions and 3466 deletions

30
.flake8

@ -1,11 +1,35 @@
[flake8] [flake8]
show-source=true show-source=true
statistics=true statistics=true
max-line-length = 80 max-line-length = 88
per-file-ignores = per-file-ignores =
# line too long # line too long
egs/librispeech/ASR/conformer_ctc/conformer.py: E501, icefall/diagnostics.py: E501,
egs/*/ASR/*/conformer.py: E501,
egs/*/ASR/pruned_transducer_stateless*/*.py: E501,
egs/*/ASR/*/optim.py: E501,
egs/*/ASR/*/scaling.py: E501,
egs/librispeech/ASR/lstm_transducer_stateless*/*.py: E501, E203
egs/librispeech/ASR/conv_emformer_transducer_stateless*/*.py: E501, E203
egs/librispeech/ASR/conformer_ctc*/*py: E501,
egs/librispeech/ASR/zipformer_mmi/*.py: E501, E203
egs/librispeech/ASR/zipformer/*.py: E501, E203
egs/librispeech/ASR/RESULTS.md: E999,
egs/ljspeech/TTS/vits/*.py: E501, E203
# invalid escape sequence (caused by TeX formulas), W605
icefall/utils.py: E501, W605
exclude = exclude =
.git, .git,
**/data/** **/data/**,
icefall/shared/make_kn_lm.py,
icefall/__init__.py
icefall/ctc/__init__.py
ignore =
# E203 white space before ":"
E203,
# W503 line break before binary operator
W503,
# E226 missing whitespace around arithmetic operator
E226,
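
The `per-file-ignores` entries above are glob patterns mapped to error codes. As a rough illustration of how such patterns select files, the sketch below uses Python's standard `fnmatch` module. It only approximates the idea and is not flake8's own matching code; the helper name `ignored_codes` and the small pattern table are hypothetical and copy just three of the entries.

```python
from fnmatch import fnmatch

# Three of the patterns from the per-file-ignores section, for illustration.
PER_FILE_IGNORES = {
    "egs/*/ASR/*/conformer.py": {"E501"},
    "egs/librispeech/ASR/zipformer/*.py": {"E501", "E203"},
    "icefall/utils.py": {"E501", "W605"},
}


def ignored_codes(path: str) -> set:
    """Collect the codes of every pattern that matches ``path`` (hypothetical helper)."""
    codes = set()
    for pattern, pattern_codes in PER_FILE_IGNORES.items():
        # With fnmatch, "*" also matches across "/" separators.
        if fnmatch(path, pattern):
            codes |= pattern_codes
    return codes


if __name__ == "__main__":
    print(sorted(ignored_codes("egs/aishell/ASR/conformer_ctc/conformer.py")))
    print(sorted(ignored_codes("icefall/checkpoint.py")))
```

A path that matches no pattern gets an empty set, so only the global `ignore` list applies to it.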

3
.git-blame-ignore-revs Normal file

@ -0,0 +1,3 @@
# Migrate to 88 characters per line (see: https://github.com/lhotse-speech/lhotse/issues/890)
107df3b115a58f1b68a6458c3f94a130004be34c
d31db010371a4128856480382876acdc0d1739ed

1
.github/scripts/.gitignore vendored Normal file

@ -0,0 +1 @@
piper_phonemize.html

343
.github/scripts/aishell/ASR/run.sh vendored Executable file

@ -0,0 +1,343 @@
#!/usr/bin/env bash
set -ex
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/aishell/ASR
function download_test_dev_manifests() {
git lfs install
fbank_url=https://huggingface.co/csukuangfj/aishell-test-dev-manifests
log "Downloading pre-computed fbank from $fbank_url"
git clone https://huggingface.co/csukuangfj/aishell-test-dev-manifests
ln -s $PWD/aishell-test-dev-manifests/data .
}
function test_transducer_stateless3_2022_06_20() {
repo_url=https://huggingface.co/csukuangfj/icefall-aishell-pruned-transducer-stateless3-2022-06-20
log "Downloading pre-trained model from $repo_url"
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
pushd $repo/exp
ln -s pretrained-epoch-29-avg-5-torch-1.10.0.pt pretrained.pt
popd
log "test greedy_search with pretrained.py"
for sym in 1 2 3; do
log "Greedy search with --max-sym-per-frame $sym"
./pruned_transducer_stateless3/pretrained.py \
--method greedy_search \
--max-sym-per-frame $sym \
--checkpoint $repo/exp/pretrained.pt \
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
log "test beam search with pretrained.py"
for method in modified_beam_search beam_search fast_beam_search; do
log "$method"
./pruned_transducer_stateless3/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_LABEL_NAME}" == x"run-decode" ]]; then
mkdir -p pruned_transducer_stateless3/exp
ln -s $PWD/$repo/exp/pretrained.pt pruned_transducer_stateless3/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_char data/
ls -lh data
ls -lh pruned_transducer_stateless3/exp
log "Decoding test and dev"
# use a small value for decoding with CPU
max_duration=100
for method in greedy_search fast_beam_search modified_beam_search; do
log "Decoding with $method"
./pruned_transducer_stateless3/decode.py \
--decoding-method $method \
--epoch 999 \
--avg 1 \
--max-duration $max_duration \
--exp-dir pruned_transducer_stateless3/exp
done
rm pruned_transducer_stateless3/exp/*.pt
fi
rm -rf $repo
}
function test_zipformer_large_2023_10_24() {
log "CI testing large model"
repo_url=https://huggingface.co/zrjin/icefall-asr-aishell-zipformer-large-2023-10-24/
log "Downloading pre-trained model from $repo_url"
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
for method in modified_beam_search greedy_search fast_beam_search; do
log "$method"
./zipformer/pretrained.py \
--method $method \
--context-size 1 \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_char/tokens.txt \
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 512,768,1536,2048,1536,768 \
--encoder-dim 192,256,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
rm -rf $repo
}
function test_zipformer_2023_10_24() {
repo_url=https://huggingface.co/zrjin/icefall-asr-aishell-zipformer-2023-10-24/
log "Downloading pre-trained model from $repo_url"
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
for method in modified_beam_search greedy_search fast_beam_search; do
log "$method"
./zipformer/pretrained.py \
--method $method \
--context-size 1 \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_char/tokens.txt \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
rm -rf $repo
}
function test_zipformer_small_2023_10_24() {
log "CI testing small model"
repo_url=https://huggingface.co/zrjin/icefall-asr-aishell-zipformer-small-2023-10-24/
log "Downloading pre-trained model from $repo_url"
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
for method in modified_beam_search greedy_search fast_beam_search; do
log "$method"
./zipformer/pretrained.py \
--method $method \
--context-size 1 \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_char/tokens.txt \
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,768,768,768,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
rm -rf $repo
}
function test_transducer_stateless_modified_2022_03_01() {
repo_url=https://huggingface.co/csukuangfj/icefall-aishell-transducer-stateless-modified-2022-03-01
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
for sym in 1 2 3; do
log "Greedy search with --max-sym-per-frame $sym"
./transducer_stateless_modified/pretrained.py \
--method greedy_search \
--max-sym-per-frame $sym \
--checkpoint $repo/exp/pretrained.pt \
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
for method in modified_beam_search beam_search; do
log "$method"
./transducer_stateless_modified/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
rm -rf $repo
}
function test_transducer_stateless_modified_2_2022_03_01() {
repo_url=https://huggingface.co/csukuangfj/icefall-aishell-transducer-stateless-modified-2-2022-03-01
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
for sym in 1 2 3; do
log "Greedy search with --max-sym-per-frame $sym"
./transducer_stateless_modified-2/pretrained.py \
--method greedy_search \
--max-sym-per-frame $sym \
--checkpoint $repo/exp/pretrained.pt \
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
for method in modified_beam_search beam_search; do
log "$method"
./transducer_stateless_modified-2/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--lang-dir $repo/data/lang_char \
$repo/test_wavs/BAC009S0764W0121.wav \
$repo/test_wavs/BAC009S0764W0122.wav \
$repo/test_wavs/BAC009S0764W0123.wav
done
rm -rf $repo
}
function test_conformer_ctc() {
repo_url=https://huggingface.co/csukuangfj/icefall_asr_aishell_conformer_ctc
log "Downloading pre-trained model from $repo_url"
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "exp/pretrained.pt"
git lfs pull --include "data/lang_char/H.fst"
git lfs pull --include "data/lang_char/HL.fst"
git lfs pull --include "data/lang_char/HLG.fst"
popd
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
log "CTC decoding"
log "Exporting model with torchscript"
pushd $repo/exp
ln -s pretrained.pt epoch-99.pt
popd
./conformer_ctc/export.py \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp \
--tokens $repo/data/lang_char/tokens.txt \
--jit 1
ls -lh $repo/exp
ls -lh $repo/data/lang_char
log "Decoding with H on CPU with OpenFst"
./conformer_ctc/jit_pretrained_decode_with_H.py \
--nn-model $repo/exp/cpu_jit.pt \
--H $repo/data/lang_char/H.fst \
--tokens $repo/data/lang_char/tokens.txt \
$repo/test_wavs/0.wav \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav
log "Decoding with HL on CPU with OpenFst"
./conformer_ctc/jit_pretrained_decode_with_HL.py \
--nn-model $repo/exp/cpu_jit.pt \
--HL $repo/data/lang_char/HL.fst \
--words $repo/data/lang_char/words.txt \
$repo/test_wavs/0.wav \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav
log "Decoding with HLG on CPU with OpenFst"
./conformer_ctc/jit_pretrained_decode_with_HLG.py \
--nn-model $repo/exp/cpu_jit.pt \
--HLG $repo/data/lang_char/HLG.fst \
--words $repo/data/lang_char/words.txt \
$repo/test_wavs/0.wav \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav
rm -rf $repo
}
download_test_dev_manifests
test_transducer_stateless3_2022_06_20
test_zipformer_large_2023_10_24
test_zipformer_2023_10_24
test_zipformer_small_2023_10_24
test_transducer_stateless_modified_2022_03_01
test_transducer_stateless_modified_2_2022_03_01
# test_conformer_ctc # fails for torch 1.13.x and torch 2.0.x
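
The script above repeatedly links a released `pretrained.pt` to a name such as `epoch-999.pt` or `epoch-99.pt`, and then passes `--epoch 999 --avg 1` (or `--epoch 99 --avg 1`) so that the decode and export scripts pick up that single checkpoint. The following Python sketch shows the same aliasing step in isolation. The function name `alias_checkpoint` is hypothetical, and the assumption that the recipe scripts look for `epoch-<N>.pt` inside `--exp-dir` is inferred from the commands above rather than from their source.

```python
import os
import tempfile
from pathlib import Path


def alias_checkpoint(exp_dir: Path, source: str, epoch: int) -> Path:
    """Create exp_dir/epoch-<epoch>.pt as a symlink to exp_dir/<source>."""
    link = exp_dir / f"epoch-{epoch}.pt"
    # Remove a stale link or file so the call can be repeated safely.
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target keeps the link valid if exp_dir is moved as a whole.
    link.symlink_to(source)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        exp_dir = Path(tmp)
        (exp_dir / "pretrained.pt").write_bytes(b"fake checkpoint")
        link = alias_checkpoint(exp_dir, "pretrained.pt", 999)
        print(link.name, "->", os.readlink(link))
```

Unlike a bare `ln -s`, this version replaces an existing link, which avoids a failure when a cached working directory is reused.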

94
.github/scripts/audioset/AT/run.sh vendored Executable file

@ -0,0 +1,94 @@
#!/usr/bin/env bash
set -ex
python3 -m pip install onnxoptimizer onnxsim
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/audioset/AT
function test_pretrained() {
repo_url=https://huggingface.co/marcoyang/icefall-audio-tagging-audioset-zipformer-2024-03-12
repo=$(basename $repo_url)
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
pushd $repo/exp
git lfs pull --include pretrained.pt
ln -s pretrained.pt epoch-99.pt
ls -lh
popd
log "test pretrained.pt"
python3 zipformer/pretrained.py \
--checkpoint $repo/exp/pretrained.pt \
--label-dict $repo/data/class_labels_indices.csv \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav \
$repo/test_wavs/3.wav \
$repo/test_wavs/4.wav
log "test jit export"
ls -lh $repo/exp/
python3 zipformer/export.py \
--exp-dir $repo/exp \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--jit 1
ls -lh $repo/exp/
log "test jit models"
python3 zipformer/jit_pretrained.py \
--nn-model-filename $repo/exp/jit_script.pt \
--label-dict $repo/data/class_labels_indices.csv \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav \
$repo/test_wavs/3.wav \
$repo/test_wavs/4.wav
log "test onnx export"
ls -lh $repo/exp/
python3 zipformer/export-onnx.py \
--exp-dir $repo/exp \
--epoch 99 \
--avg 1 \
--use-averaged-model 0
ls -lh $repo/exp/
pushd $repo/exp/
mv model-epoch-99-avg-1.onnx model.onnx
mv model-epoch-99-avg-1.int8.onnx model.int8.onnx
popd
ls -lh $repo/exp/
log "test onnx models"
for m in model.onnx model.int8.onnx; do
log "$m"
python3 zipformer/onnx_pretrained.py \
--model-filename $repo/exp/$m \
--label-dict $repo/data/class_labels_indices.csv \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav \
$repo/test_wavs/3.wav \
$repo/test_wavs/4.wav
done
log "prepare data for uploading to huggingface"
dst=/icefall/model-onnx
mkdir -p $dst
cp -v $repo/exp/*.onnx $dst/
cp -v $repo/data/* $dst/
cp -av $repo/test_wavs $dst
ls -lh $dst
ls -lh $dst/test_wavs
}
test_pretrained

167
.github/scripts/baker_zh/TTS/run-matcha.sh vendored Executable file

@ -0,0 +1,167 @@
#!/usr/bin/env bash
set -ex
apt-get update
apt-get install -y sox
python3 -m pip install numba conformer==0.3.2 diffusers librosa
python3 -m pip install jieba
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/baker_zh/TTS
sed -i.bak s/600/8/g ./prepare.sh
sed -i.bak s/"first 100"/"first 3"/g ./prepare.sh
sed -i.bak s/500/5/g ./prepare.sh
git diff
function prepare_data() {
# We have created a subset of the data for testing
#
mkdir -p download
pushd download
wget -q https://huggingface.co/csukuangfj/tmp-files/resolve/main/BZNSYP-samples.tar.bz2
tar xvf BZNSYP-samples.tar.bz2
mv BZNSYP-samples BZNSYP
rm BZNSYP-samples.tar.bz2
popd
./prepare.sh
tree .
}
function train() {
pushd ./matcha
sed -i.bak s/1500/3/g ./train.py
git diff .
popd
./matcha/train.py \
--exp-dir matcha/exp \
--num-epochs 1 \
--save-every-n 1 \
--num-buckets 2 \
--tokens data/tokens.txt \
--max-duration 20
ls -lh matcha/exp
}
function infer() {
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v2
./matcha/infer.py \
--num-buckets 2 \
--epoch 1 \
--exp-dir ./matcha/exp \
--tokens data/tokens.txt \
--cmvn ./data/fbank/cmvn.json \
--vocoder ./generator_v2 \
--input-text "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔。" \
--output-wav ./generated.wav
ls -lh *.wav
soxi ./generated.wav
rm -v ./generated.wav
rm -v generator_v2
}
function export_onnx() {
pushd matcha/exp
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-baker-matcha-zh-2024-12-27/resolve/main/epoch-2000.pt
popd
pushd data/fbank
rm -v *.json
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-baker-matcha-zh-2024-12-27/resolve/main/cmvn.json
popd
./matcha/export_onnx.py \
--exp-dir ./matcha/exp \
--epoch 2000 \
--tokens ./data/tokens.txt \
--cmvn ./data/fbank/cmvn.json
ls -lh *.onnx
if false; then
# The CI machine does not have enough memory to run it
#
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v1
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v2
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v3
python3 ./matcha/export_onnx_hifigan.py
else
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/exp/hifigan_v1.onnx
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/exp/hifigan_v2.onnx
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/exp/hifigan_v3.onnx
fi
ls -lh *.onnx
python3 ./matcha/generate_lexicon.py
for v in v1 v2 v3; do
python3 ./matcha/onnx_pretrained.py \
--acoustic-model ./model-steps-6.onnx \
--vocoder ./hifigan_$v.onnx \
--tokens ./data/tokens.txt \
--lexicon ./lexicon.txt \
--input-text "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔。" \
--output-wav /icefall/generated-matcha-tts-steps-6-$v.wav
done
ls -lh /icefall/*.wav
soxi /icefall/generated-matcha-tts-steps-6-*.wav
cp ./model-steps-*.onnx /icefall
d=matcha-icefall-zh-baker
mkdir $d
cp -v data/tokens.txt $d
cp -v lexicon.txt $d
cp model-steps-3.onnx $d
pushd $d
curl -SL -O https://github.com/csukuangfj/cppjieba/releases/download/sherpa-onnx-2024-04-19/dict.tar.bz2
tar xvf dict.tar.bz2
rm dict.tar.bz2
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-aishell3-vits-low-2024-04-06/resolve/main/data/date.fst
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-aishell3-vits-low-2024-04-06/resolve/main/data/number.fst
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-aishell3-vits-low-2024-04-06/resolve/main/data/phone.fst
cat >README.md <<EOF
# Introduction
This model is trained using the dataset from
https://en.data-baker.com/datasets/freeDatasets/
The dataset contains 10000 Chinese sentences of a native Chinese female speaker,
which is about 12 hours.
**Note**: The dataset is for non-commercial use only.
You can find the training code at
https://github.com/k2-fsa/icefall/tree/master/egs/baker_zh/TTS
EOF
ls -lh
popd
tar cvjf $d.tar.bz2 $d
mv $d.tar.bz2 /icefall
mv $d /icefall
}
prepare_data
train
infer
export_onnx
rm -rfv generator_v* matcha/exp
git checkout .

View File

@ -0,0 +1,19 @@
#!/usr/bin/env bash
# This script computes fbank features for the test-clean and test-other datasets.
# The computed features are saved to ~/tmp/fbank-libri and are
# cached for later runs
set -e
export PYTHONPATH=$PWD:$PYTHONPATH
echo $PYTHONPATH
mkdir -p ~/tmp/fbank-libri
cd egs/librispeech/ASR
mkdir -p data
cd data
[ ! -e fbank ] && ln -s ~/tmp/fbank-libri fbank
cd ..
./local/compute_fbank_librispeech.py --dataset 'test-clean test-other'
ls -lh data/fbank/

75
.github/scripts/docker/Dockerfile vendored Normal file

@ -0,0 +1,75 @@
ARG PYTHON_VERSION=3.8
FROM python:${PYTHON_VERSION}
ARG TORCHAUDIO_VERSION="0.13.0"
ARG TORCH_VERSION="1.13.0"
ARG K2_VERSION="1.24.4.dev20231220"
ARG KALDIFEAT_VERSION="1.25.3.dev20231221"
ARG _K2_VERSION="${K2_VERSION}+cpu.torch${TORCH_VERSION}"
ARG _KALDIFEAT_VERSION="${KALDIFEAT_VERSION}+cpu.torch${TORCH_VERSION}"
RUN apt-get update -y && \
apt-get install -qq -y \
cmake \
ffmpeg \
git \
git-lfs \
graphviz \
less \
tree \
vim \
&& \
apt-get clean && \
rm -rf /var/cache/apt/archives /var/lib/apt/lists
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${_K2_VERSION}
LABEL kaldifeat_version=${_KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
# Install dependencies
RUN pip install --no-cache-dir \
torch==${TORCH_VERSION}+cpu -f https://download.pytorch.org/whl/torch \
torchaudio==${TORCHAUDIO_VERSION}+cpu -f https://download.pytorch.org/whl/torchaudio \
k2==${_K2_VERSION} -f https://k2-fsa.github.io/k2/cpu.html \
\
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${_KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cpu.html \
conformer==0.3.2 \
cython \
diffusers \
dill \
espnet_tts_frontend \
graphviz \
kaldi-decoder \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
librosa \
"matplotlib<=3.9.4" \
multi_quantization \
numba \
"numpy<2.0" \
onnxoptimizer \
onnxsim \
onnx==1.17.0 \
onnxmltools \
onnxruntime==1.17.1 \
piper_phonemize -f https://k2-fsa.github.io/icefall/piper_phonemize.html \
pypinyin==0.50.0 \
pytest \
"sentencepiece>=0.1.96" \
six \
tensorboard \
typeguard
# RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
# cd /workspace/icefall && \
# pip install --no-cache-dir -r requirements.txt
#
# ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
#
# WORKDIR /workspace/icefall
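
The Dockerfile above derives `_K2_VERSION` and `_KALDIFEAT_VERSION` by appending `+cpu.torch${TORCH_VERSION}` to the base version, which yields the local version label that the CPU wheels are published under. A minimal Python sketch of the same string composition follows; the function name `local_wheel_version` and its `device` parameter are hypothetical and only mirror the `ARG` lines.

```python
def local_wheel_version(base: str, torch_version: str, device: str = "cpu") -> str:
    """Compose '<base>+<device>.torch<torch_version>' as the Dockerfile ARGs do."""
    return f"{base}+{device}.torch{torch_version}"


if __name__ == "__main__":
    k2 = local_wheel_version("1.24.4.dev20231220", "1.13.0")
    kaldifeat = local_wheel_version("1.25.3.dev20231221", "1.13.0")
    # These are the pins that end up on the pip command line.
    print(f"k2=={k2}")
    print(f"kaldifeat=={kaldifeat}")
```

Because the torch version is part of the label, changing `TORCH_VERSION` alone is enough to select a matching k2 and kaldifeat build, provided such a wheel has been published.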

View File

@ -0,0 +1,140 @@
#!/usr/bin/env python3
# Copyright 2023 Xiaomi Corp. (authors: Fangjun Kuang)

import argparse
import json


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--min-torch-version",
        help="minimum torch version",
    )

    parser.add_argument(
        "--torch-version",
        help="torch version",
    )

    parser.add_argument(
        "--python-version",
        help="python version",
    )
    return parser.parse_args()


def version_gt(a, b):
    # Compare only the major and minor components, as integers.
    a_major, a_minor = list(map(int, a.split(".")))[:2]
    b_major, b_minor = list(map(int, b.split(".")))[:2]
    if a_major > b_major:
        return True

    if a_major == b_major and a_minor > b_minor:
        return True

    return False


def version_ge(a, b):
    a_major, a_minor = list(map(int, a.split(".")))[:2]
    b_major, b_minor = list(map(int, b.split(".")))[:2]
    if a_major > b_major:
        return True

    if a_major == b_major and a_minor >= b_minor:
        return True

    return False


def get_torchaudio_version(torch_version):
    if torch_version == "1.13.0":
        return "0.13.0"
    elif torch_version == "1.13.1":
        return "0.13.1"
    elif torch_version == "2.0.0":
        return "2.0.1"
    elif torch_version == "2.0.1":
        return "2.0.2"
    else:
        return torch_version


def get_matrix(min_torch_version, specified_torch_version, specified_python_version):
    k2_version = "1.24.4.dev20250630"
    kaldifeat_version = "1.25.5.dev20250630"
    version = "20250630"

    # torchaudio 2.5.0 does not support python 3.13
    python_version = ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"]
    torch_version = []
    torch_version += ["1.13.0", "1.13.1"]
    torch_version += ["2.0.0", "2.0.1"]
    torch_version += ["2.1.0", "2.1.1", "2.1.2"]
    torch_version += ["2.2.0", "2.2.1", "2.2.2"]
    # Test only torch >= 2.3.0
    torch_version += ["2.3.0", "2.3.1"]
    torch_version += ["2.4.0"]
    torch_version += ["2.4.1"]
    torch_version += ["2.5.0"]
    torch_version += ["2.5.1"]
    torch_version += ["2.6.0", "2.7.0", "2.7.1"]

    if specified_torch_version:
        torch_version = [specified_torch_version]

    if specified_python_version:
        python_version = [specified_python_version]

    matrix = []
    for p in python_version:
        for t in torch_version:
            if min_torch_version and version_gt(min_torch_version, t):
                continue

            # torch <= 2.0.x is paired only with python <= 3.10
            if version_gt(p, "3.10") and not version_gt(t, "2.0"):
                continue

            # only torch>=2.2.0 supports python 3.12
            if version_gt(p, "3.11") and not version_gt(t, "2.1"):
                continue

            # python 3.13 is paired only with torch >= 2.5
            if version_gt(p, "3.12") and not version_gt(t, "2.4"):
                continue

            if version_gt(t, "2.4") and version_gt("3.10", p):
                # torch >= 2.5 requires python >= 3.10
                continue

            k2_version_2 = k2_version
            kaldifeat_version_2 = kaldifeat_version

            matrix.append(
                {
                    "k2-version": k2_version_2,
                    "kaldifeat-version": kaldifeat_version_2,
                    "version": version,
                    "python-version": p,
                    "torch-version": t,
                    "torchaudio-version": get_torchaudio_version(t),
                }
            )
    return matrix


def main():
    args = get_args()
    matrix = get_matrix(
        min_torch_version=args.min_torch_version,
        specified_torch_version=args.torch_version,
        specified_python_version=args.python_version,
    )
    print(json.dumps({"include": matrix}))


if __name__ == "__main__":
    main()
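
The matrix script above depends on comparing versions numerically rather than as strings: as plain strings, `"3.10"` sorts before `"3.9"`. The sketch below restates the comparison with tuples and mirrors the four skip rules in a single predicate. The names `major_minor` and `is_tested` are hypothetical, and the predicate is a simplified restatement of the loop body, not a replacement for it.

```python
def major_minor(version: str) -> tuple:
    """Return (major, minor) as ints, ignoring any patch component."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def version_gt(a: str, b: str) -> bool:
    # Tuple comparison is equivalent to explicit major/minor checks.
    return major_minor(a) > major_minor(b)


def is_tested(python: str, torch: str) -> bool:
    """Mirror the four python/torch skip rules of the matrix script."""
    if version_gt(python, "3.10") and not version_gt(torch, "2.0"):
        return False
    if version_gt(python, "3.11") and not version_gt(torch, "2.1"):
        return False
    if version_gt(python, "3.12") and not version_gt(torch, "2.4"):
        return False
    if version_gt(torch, "2.4") and version_gt("3.10", python):
        return False
    return True


if __name__ == "__main__":
    pairs = [("3.9", "2.5.0"), ("3.11", "2.0.1"), ("3.12", "2.2.0"), ("3.13", "2.5.1")]
    for python, torch in pairs:
        print(python, torch, is_tested(python, torch))
```

Since only the major and minor components are compared, patch releases such as `2.0.0` and `2.0.1` always fall on the same side of every rule.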

View File

@ -0,0 +1,17 @@
#!/usr/bin/env bash
# This script downloads the pre-computed fbank features for
# dev and test datasets of GigaSpeech.
#
# You will find the directory `~/tmp/giga-dev-dataset-fbank` after running
# this script.
set -e
mkdir -p ~/tmp
cd ~/tmp
git lfs install
git clone https://huggingface.co/csukuangfj/giga-dev-dataset-fbank
ls -lh giga-dev-dataset-fbank/data/fbank

View File

@ -0,0 +1,25 @@
#!/usr/bin/env bash
# This script downloads the test-clean and test-other datasets
# of LibriSpeech and unpacks them into the folder ~/tmp/download,
# which is cached by GitHub actions for later runs.
#
# You will find the directory ~/tmp/download/LibriSpeech after running
# this script.
set -e
mkdir -p ~/tmp/download
cd egs/librispeech/ASR
ln -s ~/tmp/download .
cd download
wget -q --no-check-certificate https://www.openslr.org/resources/12/test-clean.tar.gz
tar xf test-clean.tar.gz
rm test-clean.tar.gz
wget -q --no-check-certificate https://www.openslr.org/resources/12/test-other.tar.gz
tar xf test-other.tar.gz
rm test-other.tar.gz
pwd
ls -lh
ls -lh LibriSpeech

View File

@ -0,0 +1,90 @@
#!/usr/bin/env python3


def get_v1_2_0_files():
    prefix = (
        "https://github.com/csukuangfj/piper-phonemize/releases/download/2023.12.5/"
    )
    files = [
        "piper_phonemize-1.2.0-cp310-cp310-macosx_10_14_x86_64.whl",
        "piper_phonemize-1.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.2.0-cp311-cp311-macosx_10_14_x86_64.whl",
        "piper_phonemize-1.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.2.0-cp312-cp312-macosx_10_14_x86_64.whl",
        "piper_phonemize-1.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.2.0-cp37-cp37m-macosx_10_14_x86_64.whl",
        "piper_phonemize-1.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.2.0-cp38-cp38-macosx_10_14_x86_64.whl",
        "piper_phonemize-1.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.2.0-cp39-cp39-macosx_10_14_x86_64.whl",
        "piper_phonemize-1.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    ]
    ans = [prefix + f for f in files]
    ans.sort()
    return ans


def get_v1_3_0_files():
    prefix = (
        "https://github.com/csukuangfj/piper-phonemize/releases/download/2025.06.23/"
    )
    files = [
        "piper_phonemize-1.3.0-cp310-cp310-macosx_10_9_universal2.whl",
        "piper_phonemize-1.3.0-cp310-cp310-macosx_10_9_x86_64.whl",
        "piper_phonemize-1.3.0-cp310-cp310-macosx_11_0_arm64.whl",
        "piper_phonemize-1.3.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "piper_phonemize-1.3.0-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl",
        "piper_phonemize-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.3.0-cp310-cp310-win_amd64.whl",
        "piper_phonemize-1.3.0-cp311-cp311-macosx_10_9_universal2.whl",
        "piper_phonemize-1.3.0-cp311-cp311-macosx_10_9_x86_64.whl",
        "piper_phonemize-1.3.0-cp311-cp311-macosx_11_0_arm64.whl",
        "piper_phonemize-1.3.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "piper_phonemize-1.3.0-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl",
        "piper_phonemize-1.3.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.3.0-cp311-cp311-win_amd64.whl",
        "piper_phonemize-1.3.0-cp312-cp312-macosx_10_13_universal2.whl",
        "piper_phonemize-1.3.0-cp312-cp312-macosx_10_13_x86_64.whl",
        "piper_phonemize-1.3.0-cp312-cp312-macosx_11_0_arm64.whl",
        "piper_phonemize-1.3.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "piper_phonemize-1.3.0-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl",
        "piper_phonemize-1.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.3.0-cp312-cp312-win_amd64.whl",
        "piper_phonemize-1.3.0-cp313-cp313-macosx_10_13_universal2.whl",
        "piper_phonemize-1.3.0-cp313-cp313-macosx_10_13_x86_64.whl",
        "piper_phonemize-1.3.0-cp313-cp313-macosx_11_0_arm64.whl",
        "piper_phonemize-1.3.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "piper_phonemize-1.3.0-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl",
        "piper_phonemize-1.3.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.3.0-cp313-cp313-win_amd64.whl",
        "piper_phonemize-1.3.0-cp38-cp38-macosx_10_9_universal2.whl",
        "piper_phonemize-1.3.0-cp38-cp38-macosx_10_9_x86_64.whl",
        "piper_phonemize-1.3.0-cp38-cp38-macosx_11_0_arm64.whl",
        "piper_phonemize-1.3.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "piper_phonemize-1.3.0-cp38-cp38-manylinux_2_17_i686.manylinux2014_i686.whl",
        "piper_phonemize-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.3.0-cp38-cp38-win_amd64.whl",
        "piper_phonemize-1.3.0-cp39-cp39-macosx_10_9_universal2.whl",
        "piper_phonemize-1.3.0-cp39-cp39-macosx_10_9_x86_64.whl",
        "piper_phonemize-1.3.0-cp39-cp39-macosx_11_0_arm64.whl",
        "piper_phonemize-1.3.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
        "piper_phonemize-1.3.0-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl",
        "piper_phonemize-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "piper_phonemize-1.3.0-cp39-cp39-win_amd64.whl",
    ]
    ans = [prefix + f for f in files]
    ans.sort()
    return ans


def main():
    files = get_v1_3_0_files() + get_v1_2_0_files()

    with open("piper_phonemize.html", "w") as f:
        for url in files:
            file = url.split("/")[-1]
            f.write(f'<a href="{url}">{file}</a><br/>\n')


if __name__ == "__main__":
    main()
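
Each wheel listed above follows the standard wheel naming scheme `{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, which is what lets `pip -f piper_phonemize.html` choose a file for the running interpreter and platform. A small sketch of splitting such a name into its fields follows. The names `WheelTags` and `parse_wheel_filename` are hypothetical, and the parser assumes no optional build tag is present, which holds for every file in the lists above.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename without a build tag into its five fields."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(*parts)


if __name__ == "__main__":
    name = (
        "piper_phonemize-1.3.0-cp310-cp310-"
        "manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    )
    tags = parse_wheel_filename(name)
    print(tags.python_tag, tags.platform_tag)
```

The `ans.sort()` calls in the script order the URLs as plain strings, which is why the `cp310` entries appear before `cp38` and `cp39`.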

15
.github/scripts/install-kaldifeat.sh vendored Executable file

@ -0,0 +1,15 @@
#!/usr/bin/env bash
# This script installs kaldifeat into the directory ~/tmp/kaldifeat
# which is cached by GitHub actions for later runs.
set -e
mkdir -p ~/tmp
cd ~/tmp
git clone https://github.com/csukuangfj/kaldifeat
cd kaldifeat
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j2 _kaldifeat
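
The header comment says `~/tmp/kaldifeat` is cached between runs. If the cache were restored before this script runs, `git clone` and `mkdir build` would fail under `set -e` because the directories already exist. A minimal sketch of an idempotent guard follows; `ensure_dir` is a hypothetical helper, and `mkdir -p` stands in for `git clone` so the example runs offline. The workflow may already skip this script on a cache hit, in which case no guard is needed.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: run the given setup command only when the
# target directory is missing (e.g. it was not restored from a cache).
ensure_dir() {
  local dir=$1
  shift
  if [ -d "$dir" ]; then
    echo "reuse $dir"
  else
    "$@"
    echo "created $dir"
  fi
}

# Demo: mkdir -p stands in for `git clone`, so no network is needed.
demo_root=$(mktemp -d)
first=$(ensure_dir "$demo_root/kaldifeat" mkdir -p "$demo_root/kaldifeat")
second=$(ensure_dir "$demo_root/kaldifeat" mkdir -p "$demo_root/kaldifeat")
echo "$first"
echo "$second"
```

In the real script the call would look like `ensure_dir ~/tmp/kaldifeat git clone https://github.com/csukuangfj/kaldifeat ~/tmp/kaldifeat`, with `mkdir -p build` replacing `mkdir build`.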

132
.github/scripts/ksponspeech/ASR/run.sh vendored Executable file

@ -0,0 +1,132 @@
#!/usr/bin/env bash
set -ex
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/ksponspeech/ASR
function test_pretrained_non_streaming() {
git lfs install
git clone https://huggingface.co/johnBamma/icefall-asr-ksponspeech-zipformer-2024-06-24
repo=icefall-asr-ksponspeech-zipformer-2024-06-24
pushd $repo
mkdir test_wavs
cd test_wavs
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/0.wav
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/1.wav
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/2.wav
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/3.wav
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/trans.txt
cd ../exp
ln -s pretrained.pt epoch-99.pt
ls -lh
popd
log 'test pretrained.py'
./zipformer/pretrained.py \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_bpe_5000/tokens.txt \
--method greedy_search \
$repo/test_wavs/0.wav \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav \
$repo/test_wavs/3.wav
log 'test export-onnx.py'
./zipformer/export-onnx.py \
--tokens $repo/data/lang_bpe_5000/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp/
ls -lh $repo/exp
ls -lh $repo/data/lang_bpe_5000/
log 'test exported onnx models'
./zipformer/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_5000/tokens.txt \
$repo/test_wavs/0.wav
dst=/tmp/model-2024-06-24
mkdir -p $dst
cp -av $repo/test_wavs $dst
cp -v $repo/exp/*.onnx $dst
cp -v $repo/data/lang_bpe_5000/tokens.txt $dst
cp -v $repo/data/lang_bpe_5000/bpe.model $dst
rm -rf $repo
}
function test_pretrained_streaming() {
git lfs install
git clone https://huggingface.co/johnBamma/icefall-asr-ksponspeech-pruned-transducer-stateless7-streaming-2024-06-12
repo=icefall-asr-ksponspeech-pruned-transducer-stateless7-streaming-2024-06-12
pushd $repo
mkdir test_wavs
cd test_wavs
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/0.wav
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/1.wav
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/2.wav
curl -SL -O https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs/3.wav
cd ../exp
ln -s pretrained.pt epoch-99.pt
ls -lh
popd
log 'test pretrained.py'
./pruned_transducer_stateless7_streaming/pretrained.py \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_bpe_5000/tokens.txt \
--method greedy_search \
$repo/test_wavs/0.wav \
$repo/test_wavs/1.wav \
$repo/test_wavs/2.wav \
$repo/test_wavs/3.wav
log 'test export-onnx.py'
./pruned_transducer_stateless7_streaming/export-onnx.py \
--tokens $repo/data/lang_bpe_5000/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--decode-chunk-len 32 \
--exp-dir $repo/exp/
ls -lh $repo/exp
ls -lh $repo/data/lang_bpe_5000/
log 'test exported onnx models'
./pruned_transducer_stateless7_streaming/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_5000/tokens.txt \
$repo/test_wavs/0.wav
dst=/tmp/model-2024-06-16
mkdir -p $dst
cp -v $repo/exp/*.onnx $dst
cp -v $repo/data/lang_bpe_5000/tokens.txt $dst
cp -v $repo/data/lang_bpe_5000/bpe.model $dst
rm -rf $repo
}
test_pretrained_non_streaming
test_pretrained_streaming
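
Both functions above repeat one `curl` line per test file. The sketch below shows how the downloads in `test_pretrained_non_streaming` could be driven by a loop instead. `test_file_urls` is a hypothetical helper that is not part of the script; it only prints the URLs, so the loop can be checked without network access.

```shell
#!/usr/bin/env bash
# Base URL copied from the curl lines in the script above.
base=https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16/resolve/main/test_wavs

# Hypothetical helper: print one URL per test file.
test_file_urls() {
  local f
  for f in 0.wav 1.wav 2.wav 3.wav trans.txt; do
    echo "$base/$f"
  done
}

urls=$(test_file_urls)
echo "$urls"

# To download for real, one option is:
#   test_file_urls | xargs -n 1 curl -SL -O
```

The streaming function fetches the same files except `trans.txt`, so it would use the same loop with a shorter file list.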

1644
.github/scripts/librispeech/ASR/run.sh vendored Executable file

File diff suppressed because it is too large

275
.github/scripts/librispeech/ASR/run_rknn.sh vendored Executable file

@ -0,0 +1,275 @@
#!/usr/bin/env bash
set -ex
python3 -m pip install kaldi-native-fbank soundfile librosa
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/librispeech/ASR
# https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed
# sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20
function export_2023_02_20() {
d=exp_2023_02_20
mkdir $d
pushd $d
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed/resolve/main/exp/pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed/resolve/main/data/lang_char_bpe/tokens.txt
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed/resolve/main/test_wavs/0.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed/resolve/main/test_wavs/1.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed/resolve/main/test_wavs/2.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed/resolve/main/test_wavs/3.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed/resolve/main/test_wavs/4.wav
ls -lh
popd
./pruned_transducer_stateless7_streaming/export-onnx-zh.py \
--dynamic-batch 0 \
--enable-int8-quantization 0 \
--tokens $d/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $d/ \
--decode-chunk-len 64 \
--num-encoder-layers "2,4,3,2,4" \
--feedforward-dims "1024,1024,1536,1536,1024" \
--nhead "8,8,8,8,8" \
--encoder-dims "384,384,384,384,384" \
--attention-dims "192,192,192,192,192" \
--encoder-unmasked-dims "256,256,256,256,256" \
--zipformer-downsampling-factors "1,2,4,8,2" \
--cnn-module-kernels "31,31,31,31,31" \
--decoder-dim 512 \
--joiner-dim 512
ls -lh $d/
./pruned_transducer_stateless7_streaming/onnx_pretrained.py \
--encoder-model-filename $d/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $d/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $d/joiner-epoch-99-avg-1.onnx \
--tokens $d/tokens.txt \
$d/0.wav
./pruned_transducer_stateless7_streaming/onnx_pretrained.py \
--encoder-model-filename $d/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $d/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $d/joiner-epoch-99-avg-1.onnx \
--tokens $d/tokens.txt \
$d/1.wav
for platform in rk3562 rk3566 rk3568 rk3576 rk3588; do
dst=sherpa-onnx-$platform-streaming-zipformer-bilingual-zh-en-2023-02-20
mkdir -p $dst
./pruned_transducer_stateless7_streaming/export_rknn.py \
--in-encoder $d/encoder-epoch-99-avg-1.onnx \
--in-decoder $d/decoder-epoch-99-avg-1.onnx \
--in-joiner $d/joiner-epoch-99-avg-1.onnx \
--out-encoder $dst/encoder.rknn \
--out-decoder $dst/decoder.rknn \
--out-joiner $dst/joiner.rknn \
--target-platform $platform 2>/dev/null
ls -lh $dst/
./pruned_transducer_stateless7_streaming/test_rknn_on_cpu_simulator.py \
--encoder $d/encoder-epoch-99-avg-1.onnx \
--decoder $d/decoder-epoch-99-avg-1.onnx \
--joiner $d/joiner-epoch-99-avg-1.onnx \
--tokens $d/tokens.txt \
--wav $d/0.wav
cp $d/tokens.txt $dst
mkdir $dst/test_wavs
cp $d/*.wav $dst/test_wavs
tar cjvf $dst.tar.bz2 $dst
ls -lh $dst.tar.bz2
mv $dst.tar.bz2 /icefall/
ls -lh $dst/
echo "---"
rm -rf $dst
done
}
# https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t
# sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16
function export_2023_02_16() {
d=exp_2023_02_16
mkdir $d
pushd $d
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t/resolve/main/exp/pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t/resolve/main/data/lang_char_bpe/tokens.txt
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t/resolve/main/test_wavs/0.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t/resolve/main/test_wavs/1.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t/resolve/main/test_wavs/2.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t/resolve/main/test_wavs/3.wav
curl -SL -O https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t/resolve/main/test_wavs/4.wav
ls -lh
popd
./pruned_transducer_stateless7_streaming/export-onnx-zh.py \
--dynamic-batch 0 \
--enable-int8-quantization 0 \
--tokens $d/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $d/ \
--decode-chunk-len 64 \
\
--num-encoder-layers 2,2,2,2,2 \
--feedforward-dims 768,768,768,768,768 \
--nhead 4,4,4,4,4 \
--encoder-dims 256,256,256,256,256 \
--attention-dims 192,192,192,192,192 \
--encoder-unmasked-dims 192,192,192,192,192 \
\
--zipformer-downsampling-factors "1,2,4,8,2" \
--cnn-module-kernels "31,31,31,31,31" \
--decoder-dim 512 \
--joiner-dim 512
ls -lh $d/
./pruned_transducer_stateless7_streaming/onnx_pretrained.py \
--encoder-model-filename $d/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $d/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $d/joiner-epoch-99-avg-1.onnx \
--tokens $d/tokens.txt \
$d/0.wav
./pruned_transducer_stateless7_streaming/onnx_pretrained.py \
--encoder-model-filename $d/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $d/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $d/joiner-epoch-99-avg-1.onnx \
--tokens $d/tokens.txt \
$d/1.wav
for platform in rk3562 rk3566 rk3568 rk3576 rk3588; do
dst=sherpa-onnx-$platform-streaming-zipformer-small-bilingual-zh-en-2023-02-16
mkdir -p $dst
./pruned_transducer_stateless7_streaming/export_rknn.py \
--in-encoder $d/encoder-epoch-99-avg-1.onnx \
--in-decoder $d/decoder-epoch-99-avg-1.onnx \
--in-joiner $d/joiner-epoch-99-avg-1.onnx \
--out-encoder $dst/encoder.rknn \
--out-decoder $dst/decoder.rknn \
--out-joiner $dst/joiner.rknn \
--target-platform $platform 2>/dev/null
ls -lh $dst/
./pruned_transducer_stateless7_streaming/test_rknn_on_cpu_simulator.py \
--encoder $d/encoder-epoch-99-avg-1.onnx \
--decoder $d/decoder-epoch-99-avg-1.onnx \
--joiner $d/joiner-epoch-99-avg-1.onnx \
--tokens $d/tokens.txt \
--wav $d/0.wav
cp $d/tokens.txt $dst
mkdir $dst/test_wavs
cp $d/*.wav $dst/test_wavs
tar cjvf $dst.tar.bz2 $dst
ls -lh $dst.tar.bz2
mv $dst.tar.bz2 /icefall/
ls -lh $dst/
echo "---"
rm -rf $dst
done
}
# https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#csukuangfj-sherpa-onnx-streaming-zipformer-en-2023-06-26-english
function export_2023_06_26() {
d=exp_2023_06_26
mkdir $d
pushd $d
curl -SL -O https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17/resolve/main/exp/pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -O https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17/resolve/main/data/lang_bpe_500/tokens.txt
curl -SL -o 0.wav https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17/resolve/main/test_wavs/1089-134686-0001.wav
curl -SL -o 1.wav https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17/resolve/main/test_wavs/1221-135766-0001.wav
curl -SL -o 2.wav https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17/resolve/main/test_wavs/1221-135766-0002.wav
ls -lh
popd
./zipformer/export-onnx-streaming.py \
--dynamic-batch 0 \
--enable-int8-quantization 0 \
--tokens $d/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $d \
--use-ctc 0 \
--use-transducer 1 \
\
--chunk-size 32 \
--left-context-frames 128 \
--causal 1
ls -lh $d/
for platform in rk3562 rk3566 rk3568 rk3576 rk3588; do
dst=sherpa-onnx-$platform-streaming-zipformer-en-2023-06-26
mkdir -p $dst
./zipformer/export_rknn_transducer_streaming.py \
--in-encoder $d/encoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-decoder $d/decoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-joiner $d/joiner-epoch-99-avg-1-chunk-32-left-128.onnx \
--out-encoder $dst/encoder.rknn \
--out-decoder $dst/decoder.rknn \
--out-joiner $dst/joiner.rknn \
--target-platform $platform
ls -lh $dst/
cp $d/tokens.txt $dst
mkdir $dst/test_wavs
cp $d/*.wav $dst/test_wavs
tar cjvf $dst.tar.bz2 $dst
ls -lh $dst.tar.bz2
mv $dst.tar.bz2 /icefall/
ls -lh $dst/
echo "---"
rm -rf $dst
done
}
if [[ $rknn_toolkit2_version == "2.1.0" ]]; then
export_2023_02_16
export_2023_02_20
else
export_2023_06_26
fi
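
The streaming export in `export_2023_06_26` is followed by hard-coded file names such as `encoder-epoch-99-avg-1-chunk-32-left-128.onnx`, which appear to encode `--epoch`, `--avg`, `--chunk-size` and `--left-context-frames`. The sketch below builds those names from the same values. `onnx_name` is a hypothetical helper, and the pattern is inferred from the names used in this script, not taken from the export script itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an exported model file name from the
# export options. The pattern is an assumption based on the names
# that appear in the script above.
onnx_name() {
  local part=$1 epoch=$2 avg=$3 chunk=$4 left=$5
  echo "${part}-epoch-${epoch}-avg-${avg}-chunk-${chunk}-left-${left}.onnx"
}

# Values matching --epoch 99 --avg 1 --chunk-size 32 --left-context-frames 128.
for part in encoder decoder joiner; do
  onnx_name "$part" 99 1 32 128
done
```

Deriving the names from variables keeps the export options and the later `--in-encoder`, `--in-decoder` and `--in-joiner` arguments in sync when a chunk size or context length changes.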

157
.github/scripts/ljspeech/TTS/run-matcha.sh vendored Executable file

@ -0,0 +1,157 @@
#!/usr/bin/env bash
set -ex
apt-get update
apt-get install -y sox
python3 -m pip install piper_phonemize -f https://k2-fsa.github.io/icefall/piper_phonemize.html
python3 -m pip install espnet_tts_frontend
python3 -m pip install numba conformer==0.3.2 diffusers librosa
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/ljspeech/TTS
sed -i.bak s/600/8/g ./prepare.sh
sed -i.bak s/"first 100"/"first 3"/g ./prepare.sh
sed -i.bak s/500/5/g ./prepare.sh
git diff
function prepare_data() {
# We have created a subset of the data for testing
#
mkdir -p download
pushd download
wget -q https://huggingface.co/csukuangfj/ljspeech-subset-for-ci-test/resolve/main/LJSpeech-1.1.tar.bz2
tar xvf LJSpeech-1.1.tar.bz2
popd
./prepare.sh
tree .
}
function train() {
pushd ./matcha
sed -i.bak s/1500/3/g ./train.py
git diff .
popd
./matcha/train.py \
--exp-dir matcha/exp \
--num-epochs 1 \
--save-every-n 1 \
--num-buckets 2 \
--tokens data/tokens.txt \
--max-duration 20
ls -lh matcha/exp
}
function infer() {
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v1
./matcha/infer.py \
--num-buckets 2 \
--epoch 1 \
--exp-dir ./matcha/exp \
--tokens data/tokens.txt \
--vocoder ./generator_v1 \
--input-text "how are you doing?" \
--output-wav ./generated.wav
ls -lh *.wav
soxi ./generated.wav
rm -v ./generated.wav
rm -v generator_v1
}
function export_onnx() {
pushd matcha/exp
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/exp/epoch-4000.pt
popd
pushd data/fbank
rm -fv *.json
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/data/cmvn.json
popd
./matcha/export_onnx.py \
--exp-dir ./matcha/exp \
--epoch 4000 \
--tokens ./data/tokens.txt \
--cmvn ./data/fbank/cmvn.json
ls -lh *.onnx
if false; then
# The CI machine does not have enough memory to run it
#
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v1
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v2
curl -SL -O https://github.com/csukuangfj/models/raw/refs/heads/master/hifigan/generator_v3
python3 ./matcha/export_onnx_hifigan.py
else
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/exp/hifigan_v1.onnx
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/exp/hifigan_v2.onnx
curl -SL -O https://huggingface.co/csukuangfj/icefall-tts-ljspeech-matcha-en-2024-10-28/resolve/main/exp/hifigan_v3.onnx
fi
ls -lh *.onnx
for v in v1 v2 v3; do
python3 ./matcha/onnx_pretrained.py \
--acoustic-model ./model-steps-6.onnx \
--vocoder ./hifigan_$v.onnx \
--tokens ./data/tokens.txt \
--input-text "how are you doing?" \
--output-wav /icefall/generated-matcha-tts-steps-6-$v.wav
done
ls -lh /icefall/*.wav
soxi /icefall/generated-matcha-tts-steps-6-*.wav
cp ./model-steps-*.onnx /icefall
d=matcha-icefall-en_US-ljspeech
mkdir $d
cp -v data/tokens.txt $d
cp model-steps-3.onnx $d
pushd $d
curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
tar xf espeak-ng-data.tar.bz2
rm espeak-ng-data.tar.bz2
cat >README.md <<EOF
# Introduction
This model is trained using the dataset from
https://keithito.com/LJ-Speech-Dataset/
The dataset contains only 1 female speaker.
You can find the training code at
https://github.com/k2-fsa/icefall/tree/master/egs/ljspeech/TTS#matcha
EOF
ls -lh
popd
tar cvjf $d.tar.bz2 $d
mv $d.tar.bz2 /icefall
mv $d /icefall
}
prepare_data
train
infer
export_onnx
rm -rfv generator_v* matcha/exp
git checkout .
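
The `sed -i.bak s/600/8/g` lines near the top of this script shrink the recipe for CI by rewriting numeric constants in place, and the final `git checkout .` undoes those edits. The pattern is not anchored, so it also rewrites longer numbers that contain the same digits. The sketch below demonstrates this on a throwaway file; the file name and its contents are made up for the demo.

```shell
#!/usr/bin/env bash
# Demo file with made-up settings; it is not taken from prepare.sh.
tmpdir=$(mktemp -d)
cfg="$tmpdir/prepare-demo.sh"
printf 'num_jobs=600\nnum_epochs=1600\n' > "$cfg"

# Same form as in the script: every occurrence of 600 becomes 8,
# including the one inside 1600. The original is kept as $cfg.bak.
sed -i.bak s/600/8/g "$cfg"

cat "$cfg"
```

After the substitution the second line reads `num_epochs=18`, which is why the script prints `git diff` right after the `sed` calls: the diff shows exactly which lines changed.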

157
.github/scripts/ljspeech/TTS/run.sh vendored Executable file

@ -0,0 +1,157 @@
#!/usr/bin/env bash
set -ex
python3 -m pip install piper_phonemize -f https://k2-fsa.github.io/icefall/piper_phonemize.html
python3 -m pip install espnet_tts_frontend
python3 -m pip install numba
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/ljspeech/TTS
sed -i.bak s/600/8/g ./prepare.sh
sed -i.bak s/"first 100"/"first 3"/g ./prepare.sh
sed -i.bak s/500/5/g ./prepare.sh
git diff
function prepare_data() {
# We have created a subset of the data for testing
#
mkdir -p download
pushd download
wget -q https://huggingface.co/csukuangfj/ljspeech-subset-for-ci-test/resolve/main/LJSpeech-1.1.tar.bz2
tar xvf LJSpeech-1.1.tar.bz2
popd
./prepare.sh
tree .
}
function train() {
pushd ./vits
sed -i.bak s/200/3/g ./train.py
git diff .
popd
for t in low medium high; do
./vits/train.py \
--exp-dir vits/exp-$t \
--model-type $t \
--num-epochs 1 \
--save-every-n 1 \
--num-buckets 2 \
--tokens data/tokens.txt \
--max-duration 20
ls -lh vits/exp-$t
done
}
function infer() {
for t in low medium high; do
./vits/infer.py \
--num-buckets 2 \
--model-type $t \
--epoch 1 \
--exp-dir ./vits/exp-$t \
--tokens data/tokens.txt \
--max-duration 20
done
}
function export_onnx() {
for t in low medium high; do
./vits/export-onnx.py \
--model-type $t \
--epoch 1 \
--exp-dir ./vits/exp-$t \
--tokens data/tokens.txt
ls -lh vits/exp-$t/
done
}
function test_medium() {
git clone https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-medium-2024-03-12
./vits/export-onnx.py \
--model-type medium \
--epoch 820 \
--exp-dir ./icefall-tts-ljspeech-vits-medium-2024-03-12/exp \
--tokens ./icefall-tts-ljspeech-vits-medium-2024-03-12/data/tokens.txt
ls -lh ./icefall-tts-ljspeech-vits-medium-2024-03-12/exp
./vits/test_onnx.py \
--model-filename ./icefall-tts-ljspeech-vits-medium-2024-03-12/exp/vits-epoch-820.onnx \
--tokens ./icefall-tts-ljspeech-vits-medium-2024-03-12/data/tokens.txt \
--output-filename /icefall/test-medium.wav
ls -lh /icefall/test-medium.wav
d=/icefall/vits-icefall-en_US-ljspeech-medium
mkdir $d
cp -v ./icefall-tts-ljspeech-vits-medium-2024-03-12/data/tokens.txt $d/
cp -v ./icefall-tts-ljspeech-vits-medium-2024-03-12/exp/vits-epoch-820.onnx $d/model.onnx
rm -rf icefall-tts-ljspeech-vits-medium-2024-03-12
pushd $d
wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
tar xf espeak-ng-data.tar.bz2
rm espeak-ng-data.tar.bz2
cd ..
tar cjf vits-icefall-en_US-ljspeech-medium.tar.bz2 vits-icefall-en_US-ljspeech-medium
rm -rf vits-icefall-en_US-ljspeech-medium
ls -lh *.tar.bz2
popd
}
function test_low() {
git clone https://huggingface.co/csukuangfj/icefall-tts-ljspeech-vits-low-2024-03-12
./vits/export-onnx.py \
--model-type low \
--epoch 1600 \
--exp-dir ./icefall-tts-ljspeech-vits-low-2024-03-12/exp \
--tokens ./icefall-tts-ljspeech-vits-low-2024-03-12/data/tokens.txt
ls -lh ./icefall-tts-ljspeech-vits-low-2024-03-12/exp
./vits/test_onnx.py \
--model-filename ./icefall-tts-ljspeech-vits-low-2024-03-12/exp/vits-epoch-1600.onnx \
--tokens ./icefall-tts-ljspeech-vits-low-2024-03-12/data/tokens.txt \
--output-filename /icefall/test-low.wav
ls -lh /icefall/test-low.wav
d=/icefall/vits-icefall-en_US-ljspeech-low
mkdir $d
cp -v ./icefall-tts-ljspeech-vits-low-2024-03-12/data/tokens.txt $d/
cp -v ./icefall-tts-ljspeech-vits-low-2024-03-12/exp/vits-epoch-1600.onnx $d/model.onnx
rm -rf icefall-tts-ljspeech-vits-low-2024-03-12
pushd $d
wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
tar xf espeak-ng-data.tar.bz2
rm espeak-ng-data.tar.bz2
cd ..
tar cjf vits-icefall-en_US-ljspeech-low.tar.bz2 vits-icefall-en_US-ljspeech-low
rm -rf vits-icefall-en_US-ljspeech-low
ls -lh *.tar.bz2
popd
}
prepare_data
train
infer
export_onnx
rm -rf vits/exp-{low,medium,high}
test_medium
test_low

756
.github/scripts/multi_zh-hans/ASR/run.sh vendored Executable file

@ -0,0 +1,756 @@
#!/usr/bin/env bash
set -ex
git config --global user.name "k2-fsa"
git config --global user.email "csukuangfj@gmail.com"
git config --global lfs.allowincompletepush true
python3 -m pip install onnxmltools==1.13.0 onnx==1.17.0 onnxruntime==1.17.1 sherpa-onnx
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/multi_zh-hans/ASR
log "pwd: $PWD"
function run_2023_9_2() {
repo_url=https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-2023-9-2
log "Downloading pre-trained model from $repo_url"
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
cd exp
git lfs pull --include pretrained.pt
ln -s pretrained.pt epoch-99.pt
cd ../data/lang_bpe_2000
ls -lh
git lfs pull --include L.pt L_disambig.pt Linv.pt bpe.model
git lfs pull --include "*.model"
ls -lh
popd
log "--------------------------------------------"
log "Export non-streaming ONNX transducer models "
log "--------------------------------------------"
./zipformer/export-onnx.py \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp \
--causal False \
--fp16 1
ls -lh $repo/exp
./zipformer/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav \
$repo/test_wavs/TEST_MEETING_T0000000113.wav \
$repo/test_wavs/TEST_MEETING_T0000000219.wav \
$repo/test_wavs/TEST_MEETING_T0000000351.wav
./zipformer/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.int8.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.int8.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav \
$repo/test_wavs/TEST_MEETING_T0000000113.wav \
$repo/test_wavs/TEST_MEETING_T0000000219.wav \
$repo/test_wavs/TEST_MEETING_T0000000351.wav
./zipformer/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.fp16.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.fp16.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.fp16.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav \
$repo/test_wavs/TEST_MEETING_T0000000113.wav \
$repo/test_wavs/TEST_MEETING_T0000000219.wav \
$repo/test_wavs/TEST_MEETING_T0000000351.wav
rm -rf $repo
}
function run_2023_11_05_streaming() {
repo_url=https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05
log "Downloading pre-trained model from $repo_url"
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
cd exp/
git lfs pull --include pretrained.pt
rm -fv epoch-20.pt
rm -fv *.onnx
ln -s pretrained.pt epoch-20.pt
cd ../data/lang_bpe_2000
ls -lh
git lfs pull --include L.pt L_disambig.pt Linv.pt bpe.model
git lfs pull --include "*.model"
ls -lh
popd
log "----------------------------------------"
log "Export streaming ONNX CTC models "
log "----------------------------------------"
./zipformer/export-onnx-streaming-ctc.py \
--exp-dir $repo/exp \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--causal 1 \
--avg 1 \
--epoch 20 \
--use-averaged-model 0 \
--chunk-size 16 \
--left-context-frames 128 \
--use-ctc 1 \
--fp16 1
ls -lh $repo/exp/
log "------------------------------------------------------------"
log "Test exported streaming ONNX CTC models (greedy search) "
log "------------------------------------------------------------"
test_wavs=(
DEV_T0000000000.wav
DEV_T0000000001.wav
DEV_T0000000002.wav
TEST_MEETING_T0000000113.wav
TEST_MEETING_T0000000219.wav
TEST_MEETING_T0000000351.wav
)
for w in ${test_wavs[@]}; do
log "----fp32----"
./zipformer/onnx_pretrained-streaming-ctc.py \
--model-filename $repo/exp/ctc-epoch-20-avg-1-chunk-16-left-128.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/$w
log "----int8----"
./zipformer/onnx_pretrained-streaming-ctc.py \
--model-filename $repo/exp/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/$w
log "----fp16----"
./zipformer/onnx_pretrained-streaming-ctc.py \
--model-filename $repo/exp/ctc-epoch-20-avg-1-chunk-16-left-128.fp16.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/$w
done
log "Upload onnx CTC models to huggingface"
name=(
sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13
sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-int8-2023-12-13
sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-fp16-2023-12-13
)
for n in ${name[@]}; do
url=https://huggingface.co/k2-fsa/$n
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
if [[ $n == sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-2023-12-13 ]]; then
cp -v $repo/exp/ctc-epoch-20-avg-1-chunk-16-left-128.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-int8-2023-12-13 ]]; then
cp -v $repo/exp/ctc-epoch-20-avg-1-chunk-16-left-128.int8.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-ctc-multi-zh-hans-fp16-2023-12-13 ]]; then
cp -v $repo/exp/ctc-epoch-20-avg-1-chunk-16-left-128.fp16.onnx $dst
fi
cp -v $repo/data/lang_bpe_2000/tokens.txt $dst
cp -v $repo/data/lang_bpe_2000/bpe.model $dst
mkdir -p $dst/test_wavs
cp -v $repo/test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx" "bpe.model" "*.wav"
ls -lh
file bpe.model
git status
git add .
git commit -m "upload model" && git push https://k2-fsa:${HF_TOKEN}@huggingface.co/k2-fsa/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
tar cjfv $dst.tar.bz2 $dst
ls -lh *.tar.bz2
mv -v $dst.tar.bz2 ../../../
done
log "----------------------------------------"
log "Export streaming ONNX transducer models "
log "----------------------------------------"
./zipformer/export-onnx-streaming.py \
--exp-dir $repo/exp \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--causal 1 \
--avg 1 \
--epoch 20 \
--use-averaged-model 0 \
--chunk-size 16 \
--left-context-frames 128 \
--use-ctc 0 \
--fp16 1
ls -lh $repo/exp
log "------------------------------------------------------------"
log "Test exported streaming ONNX transducer models (Python code)"
log "------------------------------------------------------------"
log "test fp32"
./zipformer/onnx_pretrained-streaming.py \
--encoder-model-filename $repo/exp/encoder-epoch-20-avg-1-chunk-16-left-128.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-20-avg-1-chunk-16-left-128.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-20-avg-1-chunk-16-left-128.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav
log "test int8"
./zipformer/onnx_pretrained-streaming.py \
--encoder-model-filename $repo/exp/encoder-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-20-avg-1-chunk-16-left-128.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav
log "test fp16"
./zipformer/onnx_pretrained-streaming.py \
--encoder-model-filename $repo/exp/encoder-epoch-20-avg-1-chunk-16-left-128.fp16.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-20-avg-1-chunk-16-left-128.fp16.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-20-avg-1-chunk-16-left-128.fp16.onnx \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav
name=(
sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-13
sherpa-onnx-streaming-zipformer-multi-zh-hans-int8-2023-12-13
sherpa-onnx-streaming-zipformer-multi-zh-hans-fp16-2023-12-13
)
for n in ${name[@]}; do
url=https://huggingface.co/csukuangfj/$n
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
if [[ $n == sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-13 ]]; then
cp -v $repo/exp/encoder-epoch-20-avg-1-chunk-16-left-128.onnx $dst
cp -v $repo/exp/decoder-epoch-20-avg-1-chunk-16-left-128.onnx $dst
cp -v $repo/exp/joiner-epoch-20-avg-1-chunk-16-left-128.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-multi-zh-hans-int8-2023-12-13 ]]; then
cp -v $repo/exp/encoder-epoch-20-avg-1-chunk-16-left-128.int8.onnx $dst
cp -v $repo/exp/decoder-epoch-20-avg-1-chunk-16-left-128.onnx $dst
cp -v $repo/exp/joiner-epoch-20-avg-1-chunk-16-left-128.int8.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-multi-zh-hans-fp16-2023-12-13 ]]; then
cp -v $repo/exp/encoder-epoch-20-avg-1-chunk-16-left-128.fp16.onnx $dst
cp -v $repo/exp/decoder-epoch-20-avg-1-chunk-16-left-128.fp16.onnx $dst
cp -v $repo/exp/joiner-epoch-20-avg-1-chunk-16-left-128.fp16.onnx $dst
fi
cp -v $repo/data/lang_bpe_2000/tokens.txt $dst
cp -v $repo/data/lang_bpe_2000/bpe.model $dst
mkdir -p $dst/test_wavs
cp -v $repo/test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx" "bpe.model" "*.wav"
ls -lh
file bpe.model
git status
git add .
git commit -m "upload model" && git push https://csukuangfj:${HF_TOKEN}@huggingface.co/csukuangfj/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
tar cjfv $dst.tar.bz2 $dst
ls -lh *.tar.bz2
mv -v $dst.tar.bz2 ../../../
done
}
function run_2023_12_12_streaming() {
log "Upload onnx transducer models to huggingface"
url=https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
cp -v $repo/exp/encoder*.onnx $dst
cp -v $repo/exp/decoder*.onnx $dst
cp -v $repo/exp/joiner*.onnx $dst
cp -v $repo/data/lang_bpe_2000/tokens.txt $dst
cp -v $repo/data/lang_bpe_2000/bpe.model $dst
mkdir -p $dst/test_wavs
cp -v $repo/test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx" bpe.model "*.wav"
git add .
git commit -m "upload model" && git push https://k2-fsa:${HF_TOKEN}@huggingface.co/k2-fsa/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
tar cjfv $dst.tar.bz2 $dst
ls -lh *.tar.bz2
mv -v $dst.tar.bz2 ../../../
}
function run_yuekai_large() {
repo_url=https://csukuangfj:${HF_TOKEN}@huggingface.co/yuekai/icefall-asr-multi-zh-hans-zipformer-large
log "Downloading pre-trained model from $repo_url"
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -O https://huggingface.co/pingzxy/icefall-asr-multi-zh-hans-zipformer-large-onnx/resolve/main/tokens.txt
popd
log "----------------------------------------"
log "Export streaming ONNX CTC models "
log "----------------------------------------"
./zipformer/export-onnx-streaming-ctc.py \
--exp-dir $repo/ \
--tokens $repo/tokens.txt \
--causal 1 \
--avg 1 \
--epoch 99 \
--use-averaged-model 0 \
--chunk-size 16 \
--left-context-frames 128 \
--use-ctc 1 \
\
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 768,1024,1536,2048,1536,768 \
--encoder-dim 256,384,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
\
--fp16 1 \
--use-whisper-features 1
ls -lh $repo/
pushd $repo
cat >README.md <<EOF
# Introduction
This model is converted
from
https://huggingface.co/yuekai/icefall-asr-multi-zh-hans-zipformer-large
The training code can be found at
https://github.com/k2-fsa/icefall/blob/master/egs/multi_zh-hans/ASR/RESULTS.md#multi-chinese-datasets-char-based-training-results-streaming-on-zipformer-large-model
EOF
mv -v ctc-epoch-99-avg-1-chunk-16-left-128.fp16.onnx model.fp16.onnx
mv -v ctc-epoch-99-avg-1-chunk-16-left-128.int8.onnx model.int8.onnx
mv -v ctc-epoch-99-avg-1-chunk-16-left-128.onnx model.onnx
ls -lh *.onnx
mkdir test_wavs
cd test_wavs
curl -SL -O https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01/resolve/main/test_wavs/0.wav
curl -SL -O https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01/resolve/main/test_wavs/1.wav
curl -SL -O https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01/resolve/main/test_wavs/8k.wav
popd
for w in 0.wav 1.wav 8k.wav; do
log "---fp32---"
sherpa-onnx \
--zipformer2-ctc-model=$repo/model.onnx \
--tokens=$repo/tokens.txt \
$repo/test_wavs/$w
log "---int8---"
sherpa-onnx \
--zipformer2-ctc-model=$repo/model.int8.onnx \
--tokens=$repo/tokens.txt \
$repo/test_wavs/$w
log "---fp16---"
sherpa-onnx \
--zipformer2-ctc-model=$repo/model.fp16.onnx \
--tokens=$repo/tokens.txt \
$repo/test_wavs/$w
done
name=(
sherpa-onnx-streaming-zipformer-ctc-zh-2025-06-30
sherpa-onnx-streaming-zipformer-ctc-zh-int8-2025-06-30
sherpa-onnx-streaming-zipformer-ctc-zh-fp16-2025-06-30
)
for n in ${name[@]}; do
url=https://huggingface.co/csukuangfj/$n
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
if [[ $n == sherpa-onnx-streaming-zipformer-ctc-zh-2025-06-30 ]]; then
cp -v $repo/model.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-ctc-zh-int8-2025-06-30 ]]; then
cp -v $repo/model.int8.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-ctc-zh-fp16-2025-06-30 ]]; then
cp -v $repo/model.fp16.onnx $dst
fi
cp -v $repo/tokens.txt $dst
cp -v $repo/README.md $dst
mkdir -p $dst/test_wavs
cp -v $repo/test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx" "*.wav"
ls -lh
git status
git add .
git commit -m "upload model" && git push https://csukuangfj:${HF_TOKEN}@huggingface.co/csukuangfj/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
tar cjfv $dst.tar.bz2 $dst
ls -lh *.tar.bz2
mv -v $dst.tar.bz2 ../../../
done
rm $repo/*.onnx
log "----------------------------------------"
log "Export streaming ONNX transducer models "
log "----------------------------------------"
./zipformer/export-onnx-streaming.py \
--exp-dir $repo \
--tokens $repo/tokens.txt \
--causal 1 \
--avg 1 \
--epoch 99 \
--use-averaged-model 0 \
--chunk-size 16 \
--left-context-frames 128 \
--use-ctc 0 \
\
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 768,1024,1536,2048,1536,768 \
--encoder-dim 256,384,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
\
--fp16 1 \
--use-whisper-features 1
ls -lh $repo
pushd $repo
for m in encoder decoder joiner; do
mv -v $m-epoch-99-avg-1-chunk-16-left-128.onnx $m.onnx
mv -v $m-epoch-99-avg-1-chunk-16-left-128.fp16.onnx $m.fp16.onnx
mv -v $m-epoch-99-avg-1-chunk-16-left-128.int8.onnx $m.int8.onnx
done
ls -lh *.onnx
popd
for w in 0.wav 1.wav 8k.wav; do
log "---fp32---"
sherpa-onnx \
--encoder=$repo/encoder.onnx \
--decoder=$repo/decoder.onnx \
--joiner=$repo/joiner.onnx \
--tokens=$repo/tokens.txt \
$repo/test_wavs/$w
log "---int8---"
sherpa-onnx \
--encoder=$repo/encoder.int8.onnx \
--decoder=$repo/decoder.onnx \
--joiner=$repo/joiner.int8.onnx \
--tokens=$repo/tokens.txt \
$repo/test_wavs/$w
log "---fp16---"
sherpa-onnx \
--encoder=$repo/encoder.fp16.onnx \
--decoder=$repo/decoder.fp16.onnx \
--joiner=$repo/joiner.fp16.onnx \
--tokens=$repo/tokens.txt \
$repo/test_wavs/$w
done
name=(
sherpa-onnx-streaming-zipformer-zh-2025-06-30
sherpa-onnx-streaming-zipformer-zh-int8-2025-06-30
sherpa-onnx-streaming-zipformer-zh-fp16-2025-06-30
)
for n in ${name[@]}; do
url=https://huggingface.co/csukuangfj/$n
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
if [[ $n == sherpa-onnx-streaming-zipformer-zh-2025-06-30 ]]; then
cp -v $repo/encoder.onnx $dst
cp -v $repo/decoder.onnx $dst
cp -v $repo/joiner.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-zh-int8-2025-06-30 ]]; then
cp -v $repo/encoder.int8.onnx $dst
cp -v $repo/decoder.onnx $dst
cp -v $repo/joiner.int8.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-zh-fp16-2025-06-30 ]]; then
cp -v $repo/encoder.fp16.onnx $dst
cp -v $repo/decoder.fp16.onnx $dst
cp -v $repo/joiner.fp16.onnx $dst
fi
cp -v $repo/tokens.txt $dst
cp -v $repo/README.md $dst
mkdir -p $dst/test_wavs
cp -v $repo/test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx" "*.wav"
ls -lh
git status
git add .
git commit -m "upload model" && git push https://csukuangfj:${HF_TOKEN}@huggingface.co/csukuangfj/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
tar cjfv $dst.tar.bz2 $dst
ls -lh *.tar.bz2
mv -v $dst.tar.bz2 ../../../
done
}
function run_yuekai_xl() {
repo_url=https://csukuangfj:${HF_TOKEN}@huggingface.co/yuekai/icefall-asr-multi-zh-hans-zipformer-xl
log "Downloading pre-trained model from $repo_url"
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include pretrained.pt
git lfs pull --include data/lang_bpe_2000/bpe.model
mv pretrained.pt epoch-99.pt
ls -lh *.pt
popd
log "----------------------------------------"
log "Export streaming ONNX CTC models "
log "----------------------------------------"
./zipformer/export-onnx-streaming-ctc.py \
--exp-dir $repo/ \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--causal 1 \
--avg 1 \
--epoch 99 \
--use-averaged-model 0 \
--chunk-size 16 \
--left-context-frames 128 \
--use-ctc 1 \
\
--num-encoder-layers 2,3,5,6,5,3 \
--feedforward-dim 1536,2048,3072,4096,3072,1536 \
--encoder-dim 512,768,1024,1536,1024,512 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
--decoder-dim 768 --joiner-dim 768 \
--value-head-dim 18 \
--query-head-dim 48 \
--num-heads 4,4,4,8,4,4 \
\
--fp16 1 \
--use-whisper-features 1 \
--use-external-data 1
mv -v ctc-epoch-99-avg-1-chunk-16-left-128.int8.onnx model.int8.onnx
mv -v ctc-epoch-99-avg-1-chunk-16-left-128.fp16.onnx model.fp16.onnx
ls -lh *.onnx
mkdir test_wavs
pushd test_wavs
curl -SL -O https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01/resolve/main/test_wavs/0.wav
curl -SL -O https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01/resolve/main/test_wavs/1.wav
curl -SL -O https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-small-ctc-zh-int8-2025-04-01/resolve/main/test_wavs/8k.wav
popd
for w in 0.wav 1.wav 8k.wav; do
log "---int8---"
sherpa-onnx \
--zipformer2-ctc-model=./model.int8.onnx \
--tokens=$repo/data/lang_bpe_2000/tokens.txt \
test_wavs/$w
log "---fp16---"
sherpa-onnx \
--zipformer2-ctc-model=./model.fp16.onnx \
--tokens=$repo/data/lang_bpe_2000/tokens.txt \
test_wavs/$w
done
pushd $repo
cat >README.md <<EOF
# Introduction
This model is converted
from
https://huggingface.co/yuekai/icefall-asr-multi-zh-hans-zipformer-xl
The training code can be found at
https://github.com/k2-fsa/icefall/blob/master/egs/multi_zh-hans/ASR/RESULTS.md#multi-chinese-datasets-char-based-training-results-streaming-on-zipformer-xl-model
EOF
popd
name=(
sherpa-onnx-streaming-zipformer-ctc-zh-xlarge-int8-2025-06-30
sherpa-onnx-streaming-zipformer-ctc-zh-xlarge-fp16-2025-06-30
)
for n in ${name[@]}; do
url=https://huggingface.co/csukuangfj/$n
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
if [[ $n == sherpa-onnx-streaming-zipformer-ctc-zh-xlarge-fp16-2025-06-30 ]]; then
cp -v model.fp16.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-ctc-zh-xlarge-int8-2025-06-30 ]]; then
cp -v model.int8.onnx $dst
fi
cp -v $repo/data/lang_bpe_2000/tokens.txt $dst
cp -v $repo/data/lang_bpe_2000/bpe.model $dst
cp -v $repo/README.md $dst
mkdir -p $dst/test_wavs
cp -v ./test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx" "*.wav" "bpe.model"
ls -lh
git status
git add .
git commit -m "upload model" && git push https://csukuangfj:${HF_TOKEN}@huggingface.co/csukuangfj/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
ls -lh $dst
tar cjfv $dst.tar.bz2 $dst
ls -lh *.tar.bz2
mv -v $dst.tar.bz2 ../../../
done
rm -fv *.onnx *.weights
log "----------------------------------------"
log "Export streaming ONNX transducer models "
log "----------------------------------------"
./zipformer/export-onnx-streaming.py \
--exp-dir $repo/ \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--causal 1 \
--avg 1 \
--epoch 99 \
--use-averaged-model 0 \
--chunk-size 16 \
--left-context-frames 128 \
--use-ctc 0 \
\
--num-encoder-layers 2,3,5,6,5,3 \
--feedforward-dim 1536,2048,3072,4096,3072,1536 \
--encoder-dim 512,768,1024,1536,1024,512 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
--decoder-dim 768 --joiner-dim 768 \
--value-head-dim 18 \
--query-head-dim 48 \
--num-heads 4,4,4,8,4,4 \
\
--fp16 1 \
--use-whisper-features 1 \
--use-external-data 1
ls -lh *.onnx
ls -lh *.weights
mv encoder-epoch-99-avg-1-chunk-16-left-128.fp16.onnx encoder.fp16.onnx
mv encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx encoder.int8.onnx
mv $repo/decoder-epoch-99-avg-1-chunk-16-left-128.onnx decoder.onnx
mv $repo/decoder-epoch-99-avg-1-chunk-16-left-128.fp16.onnx decoder.fp16.onnx
mv $repo/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx joiner.int8.onnx
mv $repo/joiner-epoch-99-avg-1-chunk-16-left-128.fp16.onnx joiner.fp16.onnx
name=(
sherpa-onnx-streaming-zipformer-zh-xlarge-int8-2025-06-30
sherpa-onnx-streaming-zipformer-zh-xlarge-fp16-2025-06-30
)
for n in ${name[@]}; do
url=https://huggingface.co/csukuangfj/$n
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
if [[ $n == sherpa-onnx-streaming-zipformer-zh-xlarge-fp16-2025-06-30 ]]; then
cp -v encoder.fp16.onnx $dst
cp -v decoder.fp16.onnx $dst
cp -v joiner.fp16.onnx $dst
elif [[ $n == sherpa-onnx-streaming-zipformer-zh-xlarge-int8-2025-06-30 ]]; then
cp -v encoder.int8.onnx $dst
cp -v decoder.onnx $dst
cp -v joiner.int8.onnx $dst
fi
cp -v $repo/data/lang_bpe_2000/tokens.txt $dst
cp -v $repo/data/lang_bpe_2000/bpe.model $dst
cp -v $repo/README.md $dst
mkdir -p $dst/test_wavs
cp -v ./test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx" "*.wav" "bpe.model"
ls -lh
git status
git add .
git commit -m "upload model" && git push https://csukuangfj:${HF_TOKEN}@huggingface.co/csukuangfj/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
ls -lh $dst
tar cjfv $dst.tar.bz2 $dst
ls -lh *.tar.bz2
mv -v $dst.tar.bz2 ../../../
done
rm -fv *.onnx *.weights
}
# run_yuekai_large
# run_yuekai_xl
# run_2023_9_2
run_2023_11_05_streaming
# run_2023_12_12_streaming
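The upload loops in this script repeat the same name-to-file mapping for each precision variant. Below is a minimal sketch of how that mapping could be factored out. `variant_suffix` is a hypothetical helper that does not exist in the script; the repo names are the ones already listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): map a model repo
# name to the ONNX file suffix that the loops above copy for it.
variant_suffix() {
  case "$1" in
    *-int8-*) echo ".int8.onnx" ;;
    *-fp16-*) echo ".fp16.onnx" ;;
    *)        echo ".onnx" ;;
  esac
}

# Print which CTC model file each repo would receive.
for n in \
  sherpa-onnx-streaming-zipformer-ctc-zh-2025-06-30 \
  sherpa-onnx-streaming-zipformer-ctc-zh-int8-2025-06-30 \
  sherpa-onnx-streaming-zipformer-ctc-zh-fp16-2025-06-30; do
  echo "$n -> model$(variant_suffix "$n")"
done
```

Note that the int8 transducer packages above still ship the fp32 `decoder.onnx`, so a real refactor along these lines would need to special-case the decoder.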

.github/scripts/multi_zh-hans/ASR/run_rknn.sh vendored Executable file

@ -0,0 +1,73 @@
#!/usr/bin/env bash
set -ex
python3 -m pip install kaldi-native-fbank soundfile librosa
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/multi_zh-hans/ASR
# https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12-chinese
function export_2023_11_05() {
d=exp
mkdir -p $d
pushd $d
curl -SL -O https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05/resolve/main/data/lang_bpe_2000/tokens.txt
curl -SL -O https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05/resolve/main/exp/pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -o 0.wav https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05/resolve/main/test_wavs/DEV_T0000000000.wav
curl -SL -o 1.wav https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05/resolve/main/test_wavs/DEV_T0000000001.wav
curl -SL -o 2.wav https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-streaming-2023-11-05/resolve/main/test_wavs/DEV_T0000000002.wav
ls -lh
popd
./zipformer/export-onnx-streaming.py \
--dynamic-batch 0 \
--enable-int8-quantization 0 \
--tokens $d/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $d \
--use-ctc 0 \
--use-transducer 1 \
--chunk-size 32 \
--left-context-frames 128 \
--causal 1
for platform in rk3562 rk3566 rk3568 rk3576 rk3588; do
dst=sherpa-onnx-$platform-streaming-zipformer-multi-zh-hans-2023-12-12
mkdir -p $dst
./zipformer/export_rknn_transducer_streaming.py \
--in-encoder $d/encoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-decoder $d/decoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-joiner $d/joiner-epoch-99-avg-1-chunk-32-left-128.onnx \
--out-encoder $dst/encoder.rknn \
--out-decoder $dst/decoder.rknn \
--out-joiner $dst/joiner.rknn \
--target-platform $platform
cp $d/tokens.txt $dst
mkdir $dst/test_wavs
cp $d/*.wav $dst/test_wavs
tar cjvf $dst.tar.bz2 $dst
ls -lh $dst.tar.bz2
mv $dst.tar.bz2 /icefall/
ls -lh $dst/
echo "---"
rm -rf $dst
done
}
export_2023_11_05
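The RKNN conversion step depends on the file names written by the ONNX export. Here is a small sketch of that naming pattern, inferred only from the paths used in this script. `onnx_name` is a hypothetical helper, and the pattern is an assumption about the export script's output, not its documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild the exported file name from its parts.
# $1: component, $2: epoch, $3: avg, $4: chunk size, $5: left-context frames
onnx_name() {
  echo "$1-epoch-$2-avg-$3-chunk-$4-left-$5.onnx"
}

# The three inputs passed to export_rknn_transducer_streaming.py above.
for m in encoder decoder joiner; do
  onnx_name "$m" 99 1 32 128
done
```

Keeping the name in one place would make it harder for `--chunk-size` and the `--in-encoder` path to drift apart.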


@ -0,0 +1,13 @@
#!/usr/bin/env bash
# This script assumes that test-clean and test-other are downloaded
# to egs/librispeech/ASR/download/LibriSpeech and generates manifest
# files in egs/librispeech/ASR/data/manifests
set -e
cd egs/librispeech/ASR
[ ! -e download ] && ln -s ~/tmp/download .
mkdir -p data/manifests
lhotse prepare librispeech -j 2 -p test-clean -p test-other ./download/LibriSpeech data/manifests
ls -lh data/manifests


@ -0,0 +1,62 @@
#!/usr/bin/env bash
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/gigaspeech/ASR
repo_url=https://huggingface.co/wgb14/icefall-asr-gigaspeech-pruned-transducer-stateless2
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_NAME}" == x"workflow_dispatch" || x"${GITHUB_EVENT_LABEL_NAME}" == x"run-decode" ]]; then
mkdir -p pruned_transducer_stateless2/exp
ln -s $PWD/$repo/exp/pretrained-iter-3488000-avg-20.pt pruned_transducer_stateless2/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_bpe_500 data/
ls -lh data
ls -lh data/lang_bpe_500
ls -lh data/fbank
ls -lh pruned_transducer_stateless2/exp
pushd data/fbank
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/cuts_DEV.jsonl.gz
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/cuts_TEST.jsonl.gz
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/feats_DEV.lca
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/feats_TEST.lca
ln -sf cuts_DEV.jsonl.gz gigaspeech_cuts_DEV.jsonl.gz
ln -sf cuts_TEST.jsonl.gz gigaspeech_cuts_TEST.jsonl.gz
popd
log "Decoding dev and test"
# use a small value for decoding with CPU
max_duration=100
# Test only greedy_search to reduce CI running time
# for method in greedy_search fast_beam_search modified_beam_search; do
for method in greedy_search; do
log "Decoding with $method"
./pruned_transducer_stateless2/decode.py \
--decoding-method $method \
--epoch 999 \
--avg 1 \
--max-duration $max_duration \
--exp-dir pruned_transducer_stateless2/exp
done
rm pruned_transducer_stateless2/exp/*.pt
fi
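The full decoding run above is gated on the GitHub event name or on a `run-decode` label. Here is a minimal sketch of the same check wrapped in a function. `should_run_decode` is a hypothetical name, and the `:-` defaults are an addition so the check also works when the variables are unset.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the event gate used above. Inside [[ ]] the
# x"..." prefix is not needed, because [[ ]] does not word-split its operands.
should_run_decode() {
  [[ "${GITHUB_EVENT_NAME:-}" == "schedule" \
    || "${GITHUB_EVENT_NAME:-}" == "workflow_dispatch" \
    || "${GITHUB_EVENT_LABEL_NAME:-}" == "run-decode" ]]
}

if should_run_decode; then
  echo "full decoding enabled"
else
  echo "full decoding skipped"
fi
```

The same gate appears in several scripts on this page, so a shared function is one way to keep the variants consistent.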


@ -0,0 +1,172 @@
#!/usr/bin/env bash
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/gigaspeech/ASR
repo_url=https://huggingface.co/yfyeung/icefall-asr-gigaspeech-zipformer-2023-10-17
log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
pushd $repo/exp
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "data/lang_bpe_500/tokens.txt"
git lfs pull --include "exp/jit_script.pt"
git lfs pull --include "exp/pretrained.pt"
rm epoch-30.pt
ln -s pretrained.pt epoch-30.pt
rm *.onnx
ls -lh
popd
log "----------------------------------------"
log "Export ONNX transducer models "
log "----------------------------------------"
./zipformer/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 30 \
--avg 1 \
--exp-dir $repo/exp
ls -lh $repo/exp
log "------------------------------------------------------------"
log "Test exported ONNX transducer models (Python code) "
log "------------------------------------------------------------"
log "test fp32"
./zipformer/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-30-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-30-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-30-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
log "test int8"
./zipformer/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-30-avg-1.int8.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-30-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-30-avg-1.int8.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
log "Upload models to huggingface"
git config --global user.name "k2-fsa"
git config --global user.email "xxx@gmail.com"
url=https://huggingface.co/k2-fsa/sherpa-onnx-zipformer-gigaspeech-2023-12-12
GIT_LFS_SKIP_SMUDGE=1 git clone $url
dst=$(basename $url)
cp -v $repo/exp/*.onnx $dst
cp -v $repo/data/lang_bpe_500/tokens.txt $dst
cp -v $repo/data/lang_bpe_500/bpe.model $dst
mkdir -p $dst/test_wavs
cp -v $repo/test_wavs/*.wav $dst/test_wavs
cd $dst
git lfs track "*.onnx"
git add .
git commit -m "upload model" && git push https://k2-fsa:${HF_TOKEN}@huggingface.co/k2-fsa/$dst main || true
log "Upload models to https://github.com/k2-fsa/sherpa-onnx"
rm -rf .git
rm -fv .gitattributes
cd ..
tar cjfv $dst.tar.bz2 $dst
ls -lh
mv -v $dst.tar.bz2 ../../../
log "Export to torchscript model"
./zipformer/export.py \
--exp-dir $repo/exp \
--use-averaged-model false \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 30 \
--avg 1 \
--jit 1
ls -lh $repo/exp/*.pt
log "Decode with models exported by torch.jit.script()"
./zipformer/jit_pretrained.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--nn-model-filename $repo/exp/jit_script.pt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
for method in greedy_search modified_beam_search fast_beam_search; do
log "$method"
./zipformer/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
done
echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_NAME}" == x"workflow_dispatch" || x"${GITHUB_EVENT_LABEL_NAME}" == x"run-decode" ]]; then
mkdir -p zipformer/exp
ln -s $PWD/$repo/exp/pretrained.pt zipformer/exp/epoch-30.pt
mkdir -p data
ln -s $PWD/$repo/data/lang_bpe_500 data/
ls -lh data
ls -lh zipformer/exp
mkdir -p data/fbank
pushd data/fbank
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/cuts_DEV.jsonl.gz
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/cuts_TEST.jsonl.gz
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/feats_DEV.lca
curl -SL -O https://huggingface.co/csukuangfj/giga-dev-dataset-fbank/resolve/main/data/fbank/feats_TEST.lca
ln -sf cuts_DEV.jsonl.gz gigaspeech_cuts_DEV.jsonl.gz
ln -sf cuts_TEST.jsonl.gz gigaspeech_cuts_TEST.jsonl.gz
popd
log "Decoding dev and test"
# use a small value for decoding with CPU
max_duration=100
for method in greedy_search; do
log "Decoding with $method"
./zipformer/decode.py \
--decoding-method $method \
--epoch 30 \
--avg 1 \
--use-averaged-model 0 \
--max-duration $max_duration \
--exp-dir zipformer/exp
done
rm zipformer/exp/*.pt
fi


@ -0,0 +1,191 @@
#!/usr/bin/env bash
#
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/librispeech/ASR
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
abs_repo=$(realpath $repo)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
pushd $repo/exp
ln -s pretrained-iter-468000-avg-16.pt pretrained.pt
ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt
popd
log "Test exporting with torch.jit.trace()"
./lstm_transducer_stateless2/export.py \
--exp-dir $repo/exp \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--jit-trace 1
log "Decode with models exported by torch.jit.trace()"
./lstm_transducer_stateless2/jit_pretrained.py \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
--encoder-model-filename $repo/exp/encoder_jit_trace.pt \
--decoder-model-filename $repo/exp/decoder_jit_trace.pt \
--joiner-model-filename $repo/exp/joiner_jit_trace.pt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
for sym in 1 2 3; do
log "Greedy search with --max-sym-per-frame $sym"
./lstm_transducer_stateless2/pretrained.py \
--method greedy_search \
--max-sym-per-frame $sym \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
done
for method in modified_beam_search beam_search fast_beam_search; do
log "$method"
./lstm_transducer_stateless2/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
done
echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_LABEL_NAME}" == x"shallow-fusion" ]]; then
lm_repo_url=https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
log "Download pre-trained RNN-LM model from ${lm_repo_url}"
GIT_LFS_SKIP_SMUDGE=1 git clone $lm_repo_url
lm_repo=$(basename $lm_repo_url)
pushd $lm_repo
git lfs pull --include "exp/pretrained.pt"
mv exp/pretrained.pt exp/epoch-88.pt
popd
mkdir -p lstm_transducer_stateless2/exp
ln -sf $PWD/$repo/exp/pretrained.pt lstm_transducer_stateless2/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_bpe_500 data/
ls -lh data
ls -lh lstm_transducer_stateless2/exp
log "Decoding test-clean and test-other with RNN LM"
./lstm_transducer_stateless2/decode.py \
--use-averaged-model 0 \
--epoch 999 \
--avg 1 \
--exp-dir lstm_transducer_stateless2/exp \
--max-duration 600 \
--decoding-method modified_beam_search_lm_shallow_fusion \
--beam 4 \
--use-shallow-fusion 1 \
--lm-type rnn \
--lm-exp-dir $lm_repo/exp \
--lm-epoch 88 \
--lm-avg 1 \
--lm-scale 0.3 \
--rnn-lm-num-layers 3 \
--rnn-lm-tie-weights 1
fi
if [[ x"${GITHUB_EVENT_LABEL_NAME}" == x"LODR" ]]; then
bigram_repo_url=https://huggingface.co/marcoyang/librispeech_bigram
log "Download bi-gram LM from ${bigram_repo_url}"
GIT_LFS_SKIP_SMUDGE=1 git clone $bigram_repo_url
bigramlm_repo=$(basename $bigram_repo_url)
pushd $bigramlm_repo
git lfs pull --include "2gram.fst.txt"
cp 2gram.fst.txt $abs_repo/data/lang_bpe_500/.
popd
lm_repo_url=https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
log "Download pre-trained RNN-LM model from ${lm_repo_url}"
GIT_LFS_SKIP_SMUDGE=1 git clone $lm_repo_url
lm_repo=$(basename $lm_repo_url)
pushd $lm_repo
git lfs pull --include "exp/pretrained.pt"
mv exp/pretrained.pt exp/epoch-88.pt
popd
mkdir -p lstm_transducer_stateless2/exp
ln -sf $PWD/$repo/exp/pretrained.pt lstm_transducer_stateless2/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_bpe_500 data/
ls -lh data
ls -lh lstm_transducer_stateless2/exp
log "Decoding test-clean and test-other"
./lstm_transducer_stateless2/decode.py \
--use-averaged-model 0 \
--epoch 999 \
--avg 1 \
--exp-dir lstm_transducer_stateless2/exp \
--max-duration 600 \
--decoding-method modified_beam_search_LODR \
--beam 4 \
--use-shallow-fusion 1 \
--lm-type rnn \
--lm-exp-dir $lm_repo/exp \
--lm-scale 0.4 \
--lm-epoch 88 \
--rnn-lm-avg 1 \
--rnn-lm-num-layers 3 \
--rnn-lm-tie-weights 1 \
--tokens-ngram 2 \
--ngram-lm-scale -0.16
fi
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_NAME}" == x"workflow_dispatch" ]]; then
mkdir -p lstm_transducer_stateless2/exp
ln -s $PWD/$repo/exp/pretrained.pt lstm_transducer_stateless2/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_bpe_500 data/
ls -lh data
ls -lh lstm_transducer_stateless2/exp
log "Decoding test-clean and test-other"
# use a small value for decoding with CPU
max_duration=100
for method in greedy_search fast_beam_search; do
log "Decoding with $method"
./lstm_transducer_stateless2/decode.py \
--decoding-method $method \
--epoch 999 \
--avg 1 \
--use-averaged-model 0 \
--max-duration $max_duration \
--exp-dir lstm_transducer_stateless2/exp
done
rm lstm_transducer_stateless2/exp/*.pt
fi

.github/scripts/run-multi-corpora-zipformer.sh vendored Executable file

@ -0,0 +1,135 @@
#!/usr/bin/env bash
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/multi_zh-hans/ASR
log "==== Test icefall-asr-multi-zh-hans-zipformer-2023-9-2 ===="
repo_url=https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-2023-9-2/
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
pushd $repo/exp
ln -s epoch-20.pt epoch-99.pt
popd
ls -lh $repo/exp/*.pt
./zipformer/pretrained.py \
--checkpoint $repo/exp/epoch-99.pt \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--method greedy_search \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
for method in modified_beam_search fast_beam_search; do
log "$method"
./zipformer/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/epoch-99.pt \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
done
rm -rf $repo
log "==== Test icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24 ===="
repo_url=https://huggingface.co/zrjin/icefall-asr-multi-zh-hans-zipformer-ctc-2023-10-24/
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
pushd $repo/exp
ln -s epoch-20.pt epoch-99.pt
popd
ls -lh $repo/exp/*.pt
./zipformer/pretrained.py \
--checkpoint $repo/exp/epoch-99.pt \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
--use-ctc 1 \
--method greedy_search \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
for method in modified_beam_search fast_beam_search; do
log "$method"
./zipformer/pretrained.py \
--method $method \
--beam-size 4 \
--use-ctc 1 \
--checkpoint $repo/exp/epoch-99.pt \
--tokens $repo/data/lang_bpe_2000/tokens.txt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
done
rm -rf $repo
cd ../../../egs/multi_zh_en/ASR
log "==== Test icefall-asr-zipformer-multi-zh-en-2023-11-22 ===="
repo_url=https://huggingface.co/zrjin/icefall-asr-zipformer-multi-zh-en-2023-11-22/
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
./zipformer/pretrained.py \
--checkpoint $repo/exp/pretrained.pt \
--bpe-model $repo/data/lang_bbpe_2000/bbpe.model \
--method greedy_search \
$repo/test_wavs/_1634_210_2577_1_1525157964032_3712259_29.wav \
$repo/test_wavs/_1634_210_2577_1_1525157964032_3712259_55.wav \
$repo/test_wavs/_1634_210_2577_1_1525157964032_3712259_75.wav
for method in modified_beam_search fast_beam_search; do
log "$method"
./zipformer/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--bpe-model $repo/data/lang_bbpe_2000/bbpe.model \
$repo/test_wavs/_1634_210_2577_1_1525157964032_3712259_29.wav \
$repo/test_wavs/_1634_210_2577_1_1525157964032_3712259_55.wav \
$repo/test_wavs/_1634_210_2577_1_1525157964032_3712259_75.wav
done
rm -rf $repo


@ -0,0 +1,44 @@
#!/usr/bin/env bash
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/swbd/ASR
repo_url=https://huggingface.co/zrjin/icefall-asr-swbd-conformer-ctc-2023-8-26
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
pushd $repo/exp
ln -s epoch-98.pt epoch-99.pt
popd
ls -lh $repo/exp/*.pt
for method in ctc-decoding 1best; do
log "$method"
./conformer_ctc/pretrained.py \
--method $method \
--checkpoint $repo/exp/epoch-99.pt \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--words-file $repo/data/lang_bpe_500/words.txt \
--HLG $repo/data/lang_bpe_500/HLG.pt \
--G $repo/data/lm/G_4_gram.pt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
done
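This script links `epoch-98.pt` to `epoch-99.pt` inside `$repo/exp` and then passes the alias to `--checkpoint`. Here is a minimal sketch of that aliasing step in a scratch directory. The empty file is only a stand-in for the downloaded checkpoint, and the `tmp` variable is introduced for illustration.

```shell
#!/usr/bin/env bash
# Scratch directory standing in for $repo/exp (assumption for illustration).
tmp=$(mktemp -d)

# Stand-in for the downloaded checkpoint.
touch "$tmp/epoch-98.pt"

# Relative link, as in the script: the target is resolved next to the link.
ln -s epoch-98.pt "$tmp/epoch-99.pt"

# Show where the alias points.
readlink "$tmp/epoch-99.pt"

# Remove "$tmp" when done experimenting.
```

Because the link target is relative, the alias keeps working if the whole directory is moved, which is not the case for the absolute `$PWD/$repo/...` links used elsewhere on this page.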


@ -0,0 +1,119 @@
#!/usr/bin/env bash
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/wenetspeech/ASR
repo_url=https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2
log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)
log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav
pushd $repo/exp
ln -s pretrained_epoch_10_avg_2.pt pretrained.pt
ln -s pretrained_epoch_10_avg_2.pt epoch-99.pt
popd
log "Test exporting to ONNX format"
./pruned_transducer_stateless2/export-onnx.py \
--exp-dir $repo/exp \
--tokens $repo/data/lang_char/tokens.txt \
--epoch 99 \
--avg 1
log "Export to torchscript model"
./pruned_transducer_stateless2/export.py \
--exp-dir $repo/exp \
--tokens $repo/data/lang_char/tokens.txt \
--epoch 99 \
--avg 1 \
--jit 1
./pruned_transducer_stateless2/export.py \
--exp-dir $repo/exp \
--tokens $repo/data/lang_char/tokens.txt \
--epoch 99 \
--avg 1 \
--jit-trace 1
ls -lh $repo/exp/*.onnx
ls -lh $repo/exp/*.pt
log "Decode with ONNX models"
./pruned_transducer_stateless2/onnx_check.py \
--jit-filename $repo/exp/cpu_jit.pt \
--onnx-encoder-filename $repo/exp/encoder-epoch-10-avg-2.onnx \
--onnx-decoder-filename $repo/exp/decoder-epoch-10-avg-2.onnx \
--onnx-joiner-filename $repo/exp/joiner-epoch-10-avg-2.onnx \
--onnx-joiner-encoder-proj-filename $repo/exp/joiner_encoder_proj-epoch-10-avg-2.onnx \
--onnx-joiner-decoder-proj-filename $repo/exp/joiner_decoder_proj-epoch-10-avg-2.onnx
./pruned_transducer_stateless2/onnx_pretrained.py \
--tokens $repo/data/lang_char/tokens.txt \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
log "Decode with models exported by torch.jit.trace()"
./pruned_transducer_stateless2/jit_pretrained.py \
--tokens $repo/data/lang_char/tokens.txt \
--encoder-model-filename $repo/exp/encoder_jit_trace.pt \
--decoder-model-filename $repo/exp/decoder_jit_trace.pt \
--joiner-model-filename $repo/exp/joiner_jit_trace.pt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
./pruned_transducer_stateless2/jit_pretrained.py \
--tokens $repo/data/lang_char/tokens.txt \
--encoder-model-filename $repo/exp/encoder_jit_script.pt \
--decoder-model-filename $repo/exp/decoder_jit_script.pt \
--joiner-model-filename $repo/exp/joiner_jit_script.pt \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
for sym in 1 2 3; do
log "Greedy search with --max-sym-per-frame $sym"
./pruned_transducer_stateless2/pretrained.py \
--checkpoint $repo/exp/epoch-99.pt \
--lang-dir $repo/data/lang_char \
--decoding-method greedy_search \
--max-sym-per-frame $sym \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
done
for method in modified_beam_search beam_search fast_beam_search; do
log "$method"
./pruned_transducer_stateless2/pretrained.py \
--decoding-method $method \
--beam-size 4 \
--checkpoint $repo/exp/epoch-99.pt \
--lang-dir $repo/data/lang_char \
$repo/test_wavs/DEV_T0000000000.wav \
$repo/test_wavs/DEV_T0000000001.wav \
$repo/test_wavs/DEV_T0000000002.wav
done

230
.github/scripts/test-ncnn-export.sh vendored Executable file

@ -0,0 +1,230 @@
#!/usr/bin/env bash
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
pushd egs/librispeech/ASR
log "Install ncnn and pnnx"
# We are using a modified ncnn here. Will try to merge it to the official repo
# of ncnn
git clone https://github.com/csukuangfj/ncnn
pushd ncnn
git submodule init
git submodule update python/pybind11
python3 setup.py bdist_wheel
ls -lh dist/
pip install dist/*.whl
cd tools/pnnx
mkdir build
cd build
echo "which python3"
which python3
#/opt/hostedtoolcache/Python/3.8.16/x64/bin/python3
cmake -D Python3_EXECUTABLE=$(which python3) ..
make -j4 pnnx
./src/pnnx || echo "pass"
popd
export PATH=$PWD/ncnn/tools/pnnx/build/src:$PATH
log "=========================================================================="
repo_url=https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "exp/pretrained-epoch-30-avg-10-averaged.pt"
cd exp
ln -s pretrained-epoch-30-avg-10-averaged.pt epoch-99.pt
popd
log "Export via torch.jit.trace()"
./conv_emformer_transducer_stateless2/export-for-ncnn.py \
--exp-dir $repo/exp \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--num-encoder-layers 12 \
--chunk-length 32 \
--cnn-module-kernel 31 \
--left-context-length 32 \
--right-context-length 8 \
--memory-size 32
pnnx $repo/exp/encoder_jit_trace-pnnx.pt
pnnx $repo/exp/decoder_jit_trace-pnnx.pt
pnnx $repo/exp/joiner_jit_trace-pnnx.pt
python3 ./conv_emformer_transducer_stateless2/streaming-ncnn-decode.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--encoder-param-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.bin \
$repo/test_wavs/1089-134686-0001.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "exp/pretrained-iter-468000-avg-16.pt"
cd exp
ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt
popd
log "Export via torch.jit.trace()"
./lstm_transducer_stateless2/export-for-ncnn.py \
--exp-dir $repo/exp \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 99 \
--avg 1 \
--use-averaged-model 0
pnnx $repo/exp/encoder_jit_trace-pnnx.pt
pnnx $repo/exp/decoder_jit_trace-pnnx.pt
pnnx $repo/exp/joiner_jit_trace-pnnx.pt
python3 ./lstm_transducer_stateless2/streaming-ncnn-decode.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--encoder-param-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.bin \
$repo/test_wavs/1089-134686-0001.wav
python3 ./lstm_transducer_stateless2/ncnn-decode.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--encoder-param-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.bin \
$repo/test_wavs/1089-134686-0001.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "exp/pretrained.pt"
cd exp
ln -s pretrained.pt epoch-99.pt
popd
./pruned_transducer_stateless7_streaming/export-for-ncnn.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--exp-dir $repo/exp \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
\
--decode-chunk-len 32 \
--num-encoder-layers "2,4,3,2,4" \
--feedforward-dims "1024,1024,2048,2048,1024" \
--nhead "8,8,8,8,8" \
--encoder-dims "384,384,384,384,384" \
--attention-dims "192,192,192,192,192" \
--encoder-unmasked-dims "256,256,256,256,256" \
--zipformer-downsampling-factors "1,2,4,8,2" \
--cnn-module-kernels "31,31,31,31,31" \
--decoder-dim 512 \
--joiner-dim 512
pnnx $repo/exp/encoder_jit_trace-pnnx.pt
pnnx $repo/exp/decoder_jit_trace-pnnx.pt
pnnx $repo/exp/joiner_jit_trace-pnnx.pt
python3 ./pruned_transducer_stateless7_streaming/streaming-ncnn-decode.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--encoder-param-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.bin \
$repo/test_wavs/1089-134686-0001.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/pfluo/k2fsa-zipformer-chinese-english-mixed
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "data/lang_char_bpe/L.pt"
git lfs pull --include "data/lang_char_bpe/L_disambig.pt"
git lfs pull --include "data/lang_char_bpe/Linv.pt"
git lfs pull --include "exp/pretrained.pt"
cd exp
ln -s pretrained.pt epoch-9999.pt
popd
./pruned_transducer_stateless7_streaming/export-for-ncnn-zh.py \
--tokens $repo/data/lang_char_bpe/tokens.txt \
--exp-dir $repo/exp \
--use-averaged-model 0 \
--epoch 9999 \
--avg 1 \
--decode-chunk-len 32 \
--num-encoder-layers "2,4,3,2,4" \
--feedforward-dims "1024,1024,1536,1536,1024" \
--nhead "8,8,8,8,8" \
--encoder-dims "384,384,384,384,384" \
--attention-dims "192,192,192,192,192" \
--encoder-unmasked-dims "256,256,256,256,256" \
--zipformer-downsampling-factors "1,2,4,8,2" \
--cnn-module-kernels "31,31,31,31,31" \
--decoder-dim 512 \
--joiner-dim 512
pnnx $repo/exp/encoder_jit_trace-pnnx.pt
pnnx $repo/exp/decoder_jit_trace-pnnx.pt
pnnx $repo/exp/joiner_jit_trace-pnnx.pt
python3 ./pruned_transducer_stateless7_streaming/streaming-ncnn-decode.py \
--tokens $repo/data/lang_char_bpe/tokens.txt \
--encoder-param-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.param \
--encoder-bin-filename $repo/exp/encoder_jit_trace-pnnx.ncnn.bin \
--decoder-param-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.param \
--decoder-bin-filename $repo/exp/decoder_jit_trace-pnnx.ncnn.bin \
--joiner-param-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.param \
--joiner-bin-filename $repo/exp/joiner_jit_trace-pnnx.ncnn.bin \
$repo/test_wavs/0.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
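
Judging from the filenames the script above passes on, each `pnnx` run on `<name>-pnnx.pt` is expected to leave a `<name>-pnnx.ncnn.param` and a `<name>-pnnx.ncnn.bin` next to the input. A small pre-flight check before the decode step could make a failed conversion easier to spot. The sketch below is only an illustration: `require_ncnn_files` is a hypothetical helper, and the demo uses stub files in a temporary directory rather than real pnnx output.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper (not part of icefall or ncnn): succeed only if all six
# converted files exist and are non-empty.
require_ncnn_files() {
  dir=$1
  missing=0
  for part in encoder decoder joiner; do
    for ext in param bin; do
      f="$dir/${part}_jit_trace-pnnx.ncnn.$ext"
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        missing=$((missing + 1))
      fi
    done
  done
  [ "$missing" -eq 0 ]
}

# Demo with stub files instead of real pnnx output.
demo_dir=$(mktemp -d)
for part in encoder decoder joiner; do
  echo "stub" > "$demo_dir/${part}_jit_trace-pnnx.ncnn.param"
done

# Only the .param files exist so far, so the check should fail.
if require_ncnn_files "$demo_dir" 2>/dev/null; then before=ok; else before=incomplete; fi

for part in encoder decoder joiner; do
  echo "stub" > "$demo_dir/${part}_jit_trace-pnnx.ncnn.bin"
done

# Now all six files are present.
if require_ncnn_files "$demo_dir"; then after=ok; else after=incomplete; fi
echo "before=$before after=$after"
```

Calling the helper inside `if` keeps `set -e` from aborting the script on the expected failure.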

466
.github/scripts/test-onnx-export.sh vendored Executable file

@ -0,0 +1,466 @@
#!/usr/bin/env bash
set -e
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/librispeech/ASR
log "=========================================================================="
repo_url=https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-2023-05-15
log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "exp/pretrained.pt"
cd exp
ln -s pretrained.pt epoch-99.pt
popd
log "Export via torch.jit.script()"
./zipformer/export.py \
--use-averaged-model 0 \
--exp-dir $repo/exp \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 99 \
--avg 1 \
--jit 1
log "Test export to ONNX format"
./zipformer/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp \
--num-encoder-layers "2,2,3,4,3,2" \
--downsampling-factor "1,2,4,8,4,2" \
--feedforward-dim "512,768,1024,1536,1024,768" \
--num-heads "4,4,4,8,4,4" \
--encoder-dim "192,256,384,512,384,256" \
--query-head-dim 32 \
--value-head-dim 12 \
--pos-head-dim 4 \
--pos-dim 48 \
--encoder-unmasked-dim "192,192,256,256,256,192" \
--cnn-module-kernel "31,31,15,15,15,31" \
--decoder-dim 512 \
--joiner-dim 512 \
--causal False \
--chunk-size "16,32,64,-1" \
--left-context-frames "64,128,256,-1"
ls -lh $repo/exp
log "Run onnx_check.py"
./zipformer/onnx_check.py \
--jit-filename $repo/exp/jit_script.pt \
--onnx-encoder-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--onnx-decoder-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--onnx-joiner-filename $repo/exp/joiner-epoch-99-avg-1.onnx
log "Run onnx_pretrained.py"
./zipformer/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav
rm -rf $repo
repo_url=https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17
log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "exp/pretrained.pt"
cd exp
ln -s pretrained.pt epoch-99.pt
popd
log "Test export streaming model to ONNX format"
./zipformer/export-onnx-streaming.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp \
--num-encoder-layers "2,2,3,4,3,2" \
--downsampling-factor "1,2,4,8,4,2" \
--feedforward-dim "512,768,1024,1536,1024,768" \
--num-heads "4,4,4,8,4,4" \
--encoder-dim "192,256,384,512,384,256" \
--query-head-dim 32 \
--value-head-dim 12 \
--pos-head-dim 4 \
--pos-dim 48 \
--encoder-unmasked-dim "192,192,256,256,256,192" \
--cnn-module-kernel "31,31,15,15,15,31" \
--decoder-dim 512 \
--joiner-dim 512 \
--causal True \
--chunk-size 16 \
--left-context-frames 64
ls -lh $repo/exp
log "Run onnx_pretrained-streaming.py"
./zipformer/onnx_pretrained-streaming.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1-chunk-16-left-64.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1-chunk-16-left-64.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1-chunk-16-left-64.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/pretrained.pt"
cd exp
ln -s pretrained.pt epoch-99.pt
popd
log "Export via torch.jit.trace()"
./pruned_transducer_stateless7_streaming/jit_trace_export.py \
--bpe-model $repo/data/lang_bpe_500/bpe.model \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--decode-chunk-len 32 \
--exp-dir $repo/exp/
log "Test exporting to ONNX format"
./pruned_transducer_stateless7_streaming/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--decode-chunk-len 32 \
--exp-dir $repo/exp/
ls -lh $repo/exp
log "Run onnx_check.py"
./pruned_transducer_stateless7_streaming/onnx_check.py \
--jit-encoder-filename $repo/exp/encoder_jit_trace.pt \
--jit-decoder-filename $repo/exp/decoder_jit_trace.pt \
--jit-joiner-filename $repo/exp/joiner_jit_trace.pt \
--onnx-encoder-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--onnx-decoder-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--onnx-joiner-filename $repo/exp/joiner-epoch-99-avg-1.onnx
log "Run onnx_pretrained.py"
./pruned_transducer_stateless7_streaming/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13
log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/pretrained-iter-1224000-avg-14.pt"
cd exp
ln -s pretrained-iter-1224000-avg-14.pt epoch-9999.pt
popd
log "Export via torch.jit.script()"
./pruned_transducer_stateless3/export.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 9999 \
--avg 1 \
--exp-dir $repo/exp/ \
--jit 1
log "Test exporting to ONNX format"
./pruned_transducer_stateless3/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 9999 \
--avg 1 \
--exp-dir $repo/exp/
ls -lh $repo/exp
log "Run onnx_check.py"
./pruned_transducer_stateless3/onnx_check.py \
--jit-filename $repo/exp/cpu_jit.pt \
--onnx-encoder-filename $repo/exp/encoder-epoch-9999-avg-1.onnx \
--onnx-decoder-filename $repo/exp/decoder-epoch-9999-avg-1.onnx \
--onnx-joiner-filename $repo/exp/joiner-epoch-9999-avg-1.onnx
log "Run onnx_pretrained.py"
./pruned_transducer_stateless3/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-9999-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-9999-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-9999-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless5-2022-05-13
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/pretrained-epoch-39-avg-7.pt"
cd exp
ln -s pretrained-epoch-39-avg-7.pt epoch-99.pt
popd
log "Export via torch.jit.script()"
./pruned_transducer_stateless5/export.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--exp-dir $repo/exp \
--num-encoder-layers 18 \
--dim-feedforward 2048 \
--nhead 8 \
--encoder-dim 512 \
--decoder-dim 512 \
--joiner-dim 512 \
--jit 1
log "Test exporting to ONNX format"
./pruned_transducer_stateless5/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 99 \
--avg 1 \
--use-averaged-model 0 \
--exp-dir $repo/exp \
--num-encoder-layers 18 \
--dim-feedforward 2048 \
--nhead 8 \
--encoder-dim 512 \
--decoder-dim 512 \
--joiner-dim 512
ls -lh $repo/exp
log "Run onnx_check.py"
./pruned_transducer_stateless5/onnx_check.py \
--jit-filename $repo/exp/cpu_jit.pt \
--onnx-encoder-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--onnx-decoder-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--onnx-joiner-filename $repo/exp/joiner-epoch-99-avg-1.onnx
log "Run onnx_pretrained.py"
./pruned_transducer_stateless5/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless7-2022-11-11
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "exp/pretrained.pt"
cd exp
ln -s pretrained.pt epoch-99.pt
popd
log "Export via torch.jit.script()"
./pruned_transducer_stateless7/export.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp \
--feedforward-dims "1024,1024,2048,2048,1024" \
--jit 1
log "Test exporting to ONNX format"
./pruned_transducer_stateless7/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp \
--feedforward-dims "1024,1024,2048,2048,1024"
ls -lh $repo/exp
log "Run onnx_check.py"
./pruned_transducer_stateless7/onnx_check.py \
--jit-filename $repo/exp/cpu_jit.pt \
--onnx-encoder-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--onnx-decoder-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--onnx-joiner-filename $repo/exp/joiner-epoch-99-avg-1.onnx
log "Run onnx_pretrained.py"
./pruned_transducer_stateless7/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/Zengwei/icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/pretrained-epoch-30-avg-10-averaged.pt"
cd exp
ln -s pretrained-epoch-30-avg-10-averaged.pt epoch-99.pt
popd
log "Test exporting to ONNX format"
./conv_emformer_transducer_stateless2/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp \
--num-encoder-layers 12 \
--chunk-length 32 \
--cnn-module-kernel 31 \
--left-context-length 32 \
--right-context-length 8 \
--memory-size 32
log "Run onnx_pretrained.py"
./conv_emformer_transducer_stateless2/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1221-135766-0001.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
log "=========================================================================="
repo_url=https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)
pushd $repo
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "exp/pretrained-iter-468000-avg-16.pt"
cd exp
ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt
popd
log "Export via torch.jit.trace()"
./lstm_transducer_stateless2/export.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp/ \
--jit-trace 1
log "Test exporting to ONNX format"
./lstm_transducer_stateless2/export-onnx.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $repo/exp
ls -lh $repo/exp
log "Run onnx_check.py"
./lstm_transducer_stateless2/onnx_check.py \
--jit-encoder-filename $repo/exp/encoder_jit_trace.pt \
--jit-decoder-filename $repo/exp/decoder_jit_trace.pt \
--jit-joiner-filename $repo/exp/joiner_jit_trace.pt \
--onnx-encoder-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--onnx-decoder-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--onnx-joiner-filename $repo/exp/joiner-epoch-99-avg-1.onnx
log "Run onnx_pretrained.py"
./lstm_transducer_stateless2/onnx_pretrained.py \
--encoder-model-filename $repo/exp/encoder-epoch-99-avg-1.onnx \
--decoder-model-filename $repo/exp/decoder-epoch-99-avg-1.onnx \
--joiner-model-filename $repo/exp/joiner-epoch-99-avg-1.onnx \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1221-135766-0001.wav
rm -rf $repo
log "--------------------------------------------------------------------------"
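
The exported ONNX files referenced above follow a visible naming pattern: `<part>-epoch-<E>-avg-<A>.onnx` for the non-streaming exports and `<part>-epoch-<E>-avg-<A>-chunk-<C>-left-<L>.onnx` for the streaming zipformer export. The sketch below rebuilds those names from the same values that are passed as `--epoch`, `--avg`, `--chunk-size`, and `--left-context-frames`. The pattern is inferred from the filenames in the script, not from the export scripts themselves, and `onnx_suffix` is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical helper: build the suffix shared by the encoder, decoder and
# joiner files. The chunk and left-context arguments are optional.
onnx_suffix() {
  epoch=$1
  avg=$2
  chunk=${3:-}
  left=${4:-}
  suffix="epoch-$epoch-avg-$avg"
  if [ -n "$chunk" ] && [ -n "$left" ]; then
    suffix="$suffix-chunk-$chunk-left-$left"
  fi
  echo "$suffix"
}

offline_name="encoder-$(onnx_suffix 99 1).onnx"
streaming_name="encoder-$(onnx_suffix 99 1 16 64).onnx"

echo "$offline_name"
echo "$streaming_name"
```

Deriving the names in one place would avoid repeating the epoch, avg, and chunk values in every `--*-model-filename` argument, at the cost of relying on a naming convention that the export scripts could change.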

196
.github/scripts/wenetspeech/ASR/run_rknn.sh vendored Executable file

@ -0,0 +1,196 @@
#!/usr/bin/env bash
set -ex
python3 -m pip install kaldi-native-fbank soundfile librosa
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/wenetspeech/ASR
#https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#k2-fsa-icefall-asr-zipformer-wenetspeech-streaming-small-chinese
function export_2025_03_02() {
d=exp_2025_03_02
mkdir $d
pushd $d
curl -SL -O https://huggingface.co/k2-fsa/icefall-asr-zipformer-wenetspeech-streaming-small/resolve/main/data/lang_char/tokens.txt
curl -SL -O https://huggingface.co/k2-fsa/icefall-asr-zipformer-wenetspeech-streaming-small/resolve/main/exp/pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -o 0.wav https://huggingface.co/k2-fsa/icefall-asr-zipformer-wenetspeech-streaming-small/resolve/main/test_wavs/DEV_T0000000000.wav
curl -SL -o 1.wav https://huggingface.co/k2-fsa/icefall-asr-zipformer-wenetspeech-streaming-small/resolve/main/test_wavs/DEV_T0000000001.wav
curl -SL -o 2.wav https://huggingface.co/k2-fsa/icefall-asr-zipformer-wenetspeech-streaming-small/resolve/main/test_wavs/DEV_T0000000002.wav
ls -lh
popd
./zipformer/export-onnx-streaming.py \
--dynamic-batch 0 \
--enable-int8-quantization 0 \
--tokens $d/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $d \
--use-ctc 0 \
--use-transducer 1 \
\
--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,768,768,768,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192 \
\
--chunk-size 32 \
--left-context-frames 128 \
--causal 1
for platform in rk3562 rk3566 rk3568 rk3576 rk3588; do
dst=sherpa-onnx-$platform-streaming-zipformer-small-zh-2025-03-02
mkdir -p $dst
./zipformer/export_rknn_transducer_streaming.py \
--in-encoder $d/encoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-decoder $d/decoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-joiner $d/joiner-epoch-99-avg-1-chunk-32-left-128.onnx \
--out-encoder $dst/encoder.rknn \
--out-decoder $dst/decoder.rknn \
--out-joiner $dst/joiner.rknn \
--target-platform $platform
cp $d/tokens.txt $dst
mkdir $dst/test_wavs
cp $d/*.wav $dst/test_wavs
tar cjvf $dst.tar.bz2 $dst
ls -lh $dst.tar.bz2
mv $dst.tar.bz2 /icefall/
ls -lh $dst/
echo "---"
rm -rf $dst
done
rm -rf $d
}
# https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/zipformer-transducer-models.html#k2-fsa-icefall-asr-zipformer-wenetspeech-streaming-large-chinese
function export_2025_03_03() {
d=exp_2025_03_03
mkdir $d
pushd $d
curl -SL -O https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/data/lang_char/tokens.txt
curl -SL -O https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/exp/pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -o 0.wav https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/test_wavs/DEV_T0000000000.wav
curl -SL -o 1.wav https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/test_wavs/DEV_T0000000001.wav
curl -SL -o 2.wav https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/test_wavs/DEV_T0000000002.wav
ls -lh
popd
./zipformer/export-onnx-streaming.py \
--dynamic-batch 0 \
--enable-int8-quantization 0 \
--tokens $d/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $d \
--use-ctc 0 \
--use-transducer 1 \
\
--chunk-size 32 \
--left-context-frames 128 \
--causal 1
for platform in rk3562 rk3566 rk3568 rk3576 rk3588; do
dst=sherpa-onnx-$platform-streaming-zipformer-zh-2025-03-03
mkdir -p $dst
./zipformer/export_rknn_transducer_streaming.py \
--in-encoder $d/encoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-decoder $d/decoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-joiner $d/joiner-epoch-99-avg-1-chunk-32-left-128.onnx \
--out-encoder $dst/encoder.rknn \
--out-decoder $dst/decoder.rknn \
--out-joiner $dst/joiner.rknn \
--target-platform $platform
cp $d/tokens.txt $dst
mkdir $dst/test_wavs
cp $d/*.wav $dst/test_wavs
tar cjvf $dst.tar.bz2 $dst
ls -lh $dst.tar.bz2
mv $dst.tar.bz2 /icefall/
ls -lh $dst/
echo "---"
ls -lh /icefall/$dst.tar.bz2
rm -rf $dst
done
rm -rf $d
}
function export_2023_06_15() {
d=exp_2023_06_15
mkdir $d
pushd $d
curl -SL -O https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/data/lang_char/tokens.txt
curl -SL -O https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/exp/pretrained.pt
mv pretrained.pt epoch-99.pt
curl -SL -o 0.wav https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/test_wavs/DEV_T0000000000.wav
curl -SL -o 1.wav https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/test_wavs/DEV_T0000000001.wav
curl -SL -o 2.wav https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615/resolve/main/test_wavs/DEV_T0000000002.wav
ls -lh
popd
./zipformer/export-onnx-streaming.py \
--dynamic-batch 0 \
--enable-int8-quantization 0 \
--tokens $d/tokens.txt \
--use-averaged-model 0 \
--epoch 99 \
--avg 1 \
--exp-dir $d \
--use-ctc 0 \
--use-transducer 1 \
\
--chunk-size 32 \
--left-context-frames 128 \
--causal 1
for platform in rk3562 rk3566 rk3568 rk3576 rk3588; do
dst=sherpa-onnx-$platform-streaming-zipformer-zh-2023-06-15
mkdir -p $dst
./zipformer/export_rknn_transducer_streaming.py \
--in-encoder $d/encoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-decoder $d/decoder-epoch-99-avg-1-chunk-32-left-128.onnx \
--in-joiner $d/joiner-epoch-99-avg-1-chunk-32-left-128.onnx \
--out-encoder $dst/encoder.rknn \
--out-decoder $dst/decoder.rknn \
--out-joiner $dst/joiner.rknn \
--target-platform $platform
cp $d/tokens.txt $dst
mkdir $dst/test_wavs
cp $d/*.wav $dst/test_wavs
tar cjvf $dst.tar.bz2 $dst
ls -lh $dst.tar.bz2
mv $dst.tar.bz2 /icefall/
ls -lh $dst/
echo "---"
ls -lh /icefall/$dst.tar.bz2
rm -rf $dst
done
rm -rf $d
}
export_2025_03_02
export_2025_03_03
export_2023_06_15
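
Each export function above packs one directory per target platform, moves the archive to an output location, and then cleans up. Because the script runs under `set -ex`, any command that fails aborts the whole run, so a listing placed after the `mv` has to point at the archive's new location. The sketch below walks through that order with placeholder content. The directory names are made up, `$out` stands in for `/icefall/`, and plain `tar cf` is used instead of `tar cjvf` only to keep the sketch free of a bzip2 dependency.

```shell
#!/usr/bin/env bash
set -e

work=$(mktemp -d)
out="$work/out"   # stands in for /icefall/ in the script above
mkdir -p "$out"
cd "$work"

for platform in rk3562 rk3588; do
  dst="demo-$platform-model"   # made-up name for illustration
  mkdir -p "$dst/test_wavs"
  echo "placeholder" > "$dst/tokens.txt"

  tar cf "$dst.tar" "$dst"
  mv "$dst.tar" "$out/"
  # List the archive where it lives now; the old path no longer exists,
  # and a failing ls would stop the script under set -e.
  ls -lh "$out/$dst.tar"
  rm -rf "$dst"
done

archive_count=$(ls "$out" | wc -l | tr -d ' ')
echo "archives: $archive_count"
```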

86
.github/scripts/yesno/ASR/run.sh vendored Executable file

@ -0,0 +1,86 @@
#!/usr/bin/env bash
set -ex
log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}
cd egs/yesno/ASR
log "data preparation"
./prepare.sh
log "training"
python3 ./tdnn/train.py
log "decoding"
python3 ./tdnn/decode.py
log "export to pretrained.pt"
python3 ./tdnn/export.py --epoch 14 --avg 2
python3 ./tdnn/pretrained.py \
--checkpoint ./tdnn/exp/pretrained.pt \
--HLG ./data/lang_phone/HLG.pt \
--words-file ./data/lang_phone/words.txt \
download/waves_yesno/0_0_0_1_0_0_0_1.wav \
download/waves_yesno/0_0_1_0_0_0_1_0.wav
log "Test exporting to torchscript"
python3 ./tdnn/export.py --epoch 14 --avg 2 --jit 1
python3 ./tdnn/jit_pretrained.py \
--nn-model ./tdnn/exp/cpu_jit.pt \
--HLG ./data/lang_phone/HLG.pt \
--words-file ./data/lang_phone/words.txt \
download/waves_yesno/0_0_0_1_0_0_0_1.wav \
download/waves_yesno/0_0_1_0_0_0_1_0.wav
log "Test exporting to onnx"
python3 ./tdnn/export_onnx.py --epoch 14 --avg 2
log "Test float32 model"
python3 ./tdnn/onnx_pretrained.py \
--nn-model ./tdnn/exp/model-epoch-14-avg-2.onnx \
--HLG ./data/lang_phone/HLG.pt \
--words-file ./data/lang_phone/words.txt \
download/waves_yesno/0_0_0_1_0_0_0_1.wav \
download/waves_yesno/0_0_1_0_0_0_1_0.wav
log "Test int8 model"
python3 ./tdnn/onnx_pretrained.py \
--nn-model ./tdnn/exp/model-epoch-14-avg-2.int8.onnx \
--HLG ./data/lang_phone/HLG.pt \
--words-file ./data/lang_phone/words.txt \
download/waves_yesno/0_0_0_1_0_0_0_1.wav \
download/waves_yesno/0_0_1_0_0_0_1_0.wav
log "Test decoding with H"
python3 ./tdnn/export.py --epoch 14 --avg 2 --jit 1
python3 ./tdnn/jit_pretrained_decode_with_H.py \
--nn-model ./tdnn/exp/cpu_jit.pt \
--H ./data/lang_phone/H.fst \
--tokens ./data/lang_phone/tokens.txt \
./download/waves_yesno/0_0_0_1_0_0_0_1.wav \
./download/waves_yesno/0_0_1_0_0_0_1_0.wav \
./download/waves_yesno/0_0_1_0_0_1_1_1.wav
log "Test decoding with HL"
python3 ./tdnn/export.py --epoch 14 --avg 2 --jit 1
python3 ./tdnn/jit_pretrained_decode_with_HL.py \
--nn-model ./tdnn/exp/cpu_jit.pt \
--HL ./data/lang_phone/HL.fst \
--words ./data/lang_phone/words.txt \
./download/waves_yesno/0_0_0_1_0_0_0_1.wav \
./download/waves_yesno/0_0_1_0_0_0_1_0.wav \
./download/waves_yesno/0_0_1_0_0_1_1_1.wav
log "Show generated files"
ls -lh tdnn/exp
ls -lh data/lang_phone

.github/workflows/aishell.yml vendored Normal file

@ -0,0 +1,72 @@
name: aishell
on:
push:
branches:
- master
pull_request:
branches:
- master
workflow_dispatch:
concurrency:
group: aishell-${{ github.ref }}
cancel-in-progress: true
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10")
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"
aishell:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
df -h
rm -rf /opt/hostedtoolcache
df -h
echo "pwd: $PWD"
echo "github.workspace ${{ github.workspace }}"
- name: Run aishell tests
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
git config --global --add safe.directory /icefall
.github/scripts/aishell/ASR/run.sh
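
The `generate_build_matrix` job hands its JSON matrix to the test job through a step output named `matrix`. A minimal sketch of writing such an output through the `GITHUB_OUTPUT` file, runnable outside of Actions; the JSON value is a hypothetical stand-in for whatever `generate_build_matrix.py` prints, and the temp-file fallback is an assumption for local use only:

```shell
#!/usr/bin/env bash
# Outside GitHub Actions GITHUB_OUTPUT is unset, so fall back to a temp file
# (assumption for local experimentation only).
GITHUB_OUTPUT=${GITHUB_OUTPUT:-$(mktemp)}

# Hypothetical matrix; the real value comes from generate_build_matrix.py.
MATRIX='{"include":[{"python-version":"3.10","torch-version":"2.3.0"}]}'

# Each output is one "name=value" line appended to the file.
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

The downstream job then reads the value with `fromJson(needs.generate_build_matrix.outputs.matrix)`, as in the workflow above.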

.github/workflows/audioset.yml vendored Normal file

@ -0,0 +1,137 @@
name: audioset
on:
push:
branches:
- master
pull_request:
branches:
- master
workflow_dispatch:
concurrency:
group: audioset-${{ github.ref }}
cancel-in-progress: true
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10")
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"
audioset:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
ls -lh
df -h
rm -rf /opt/hostedtoolcache
df -h
echo "pwd: $PWD"
echo "github.workspace ${{ github.workspace }}"
- name: Run tests
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
git config --global --add safe.directory /icefall
.github/scripts/audioset/AT/run.sh
- name: Show model files
shell: bash
run: |
sudo chown -R runner ./model-onnx
ls -lh ./model-onnx
chmod -x ./model-onnx/class_labels_indices.csv
echo "----------"
ls -lh ./model-onnx/*
- name: Upload model to huggingface
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0' && github.event_name == 'push'
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
uses: nick-fields/retry@v3
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
git config --global user.email "csukuangfj@gmail.com"
git config --global user.name "Fangjun Kuang"
rm -rf huggingface
export GIT_LFS_SKIP_SMUDGE=1
git clone https://huggingface.co/k2-fsa/sherpa-onnx-zipformer-audio-tagging-2024-04-09 huggingface
cd huggingface
git fetch
git pull
git merge -m "merge remote" --ff origin main
cp ../model-onnx/*.onnx ./
cp ../model-onnx/*.csv ./
cp -a ../model-onnx/test_wavs ./
ls -lh
git add .
git status
git commit -m "update models"
git status
git push https://csukuangfj:$HF_TOKEN@huggingface.co/k2-fsa/sherpa-onnx-zipformer-audio-tagging-2024-04-09 main || true
rm -rf huggingface
- name: Prepare for release
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0' && github.event_name == 'push'
shell: bash
run: |
d=sherpa-onnx-zipformer-audio-tagging-2024-04-09
mv ./model-onnx $d
tar cjvf ${d}.tar.bz2 $d
ls -lh
- name: Release exported onnx models
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0' && github.event_name == 'push'
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: sherpa-onnx-*.tar.bz2
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: audio-tagging-models

.github/workflows/baker_zh.yml vendored Normal file

@ -0,0 +1,152 @@
name: baker_zh
on:
push:
branches:
- master
- baker-matcha-2
pull_request:
branches:
- master
workflow_dispatch:
concurrency:
group: baker-zh-${{ github.ref }}
cancel-in-progress: true
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10")
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"
baker_zh:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
ls -lh
df -h
rm -rf /opt/hostedtoolcache
df -h
echo "pwd: $PWD"
echo "github.workspace ${{ github.workspace }}"
- name: Run tests
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
pip install onnx==1.17.0
pip list
git config --global --add safe.directory /icefall
.github/scripts/baker_zh/TTS/run-matcha.sh
- name: display files
shell: bash
run: |
ls -lh
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: generated-test-files-${{ matrix.python-version }}-${{ matrix.torch-version }}
path: ./*.wav
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-2
path: ./model-steps-2.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-3
path: ./model-steps-3.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-4
path: ./model-steps-4.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-5
path: ./model-steps-5.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-6
path: ./model-steps-6.onnx
- name: Upload models to huggingface
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0' && github.event_name == 'push'
shell: bash
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
d=matcha-icefall-zh-baker
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/$d hf
cp -av $d/* hf/
pushd hf
git add .
git config --global user.name "csukuangfj"
git config --global user.email "csukuangfj@gmail.com"
git config --global lfs.allowincompletepush true
git commit -m "upload model" && git push https://csukuangfj:${HF_TOKEN}@huggingface.co/csukuangfj/$d main || true
popd
- name: Release exported onnx models
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0' && github.event_name == 'push'
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: matcha-icefall-*.tar.bz2
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: tts-models

.github/workflows/build-cpu-docker.yml vendored Normal file

@ -0,0 +1,81 @@
name: build-cpu-docker
on:
workflow_dispatch:
concurrency:
group: build-cpu-docker-${{ github.ref }}
cancel-in-progress: true
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py)
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"
build-cpu-docker:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
# refer to https://github.com/actions/checkout
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
df -h
rm -rf /opt/hostedtoolcache
df -h
- name: 'Login to GitHub Container Registry'
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build docker Image
shell: bash
run: |
cd .github/scripts/docker
torch_version=${{ matrix.torch-version }}
torchaudio_version=${{ matrix.torchaudio-version }}
echo "torch_version: $torch_version"
echo "torchaudio_version: $torchaudio_version"
version=${{ matrix.version }}
tag=ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v$version
echo "tag: $tag"
docker build \
-t $tag \
--build-arg PYTHON_VERSION=${{ matrix.python-version }} \
--build-arg TORCH_VERSION=$torch_version \
--build-arg TORCHAUDIO_VERSION=$torchaudio_version \
--build-arg K2_VERSION=${{ matrix.k2-version }} \
--build-arg KALDIFEAT_VERSION=${{ matrix.kaldifeat-version }} \
.
docker image ls
docker push $tag
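
The image tag built here is the same string the test workflows use in their `image:` field. A minimal sketch of how the tag is composed from the matrix values; `image_tag` is a hypothetical helper, and the example values are taken from elsewhere in these workflows without any claim that this exact image is published:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the CPU image tag from owner, Python version,
# torch version and image version, following the pattern in the workflow.
image_tag() {
  local owner=$1 py=$2 torch=$3 version=$4
  echo "ghcr.io/${owner}/icefall:cpu-py${py}-torch${torch}-v${version}"
}

# Example values only; whether this particular image exists is an assumption.
image_tag k2-fsa 3.10 2.3.0 20241218
```

Because producer and consumers must agree on this string, any change to the tag format has to be made in the build workflow and in every workflow that pulls the image.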

.github/workflows/build-doc.yml vendored Normal file

@ -0,0 +1,74 @@
# Copyright 2022 Xiaomi Corp. (author: Fangjun Kuang)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# refer to https://github.com/actions/starter-workflows/pull/47/files
# You can access it at https://k2-fsa.github.io/icefall/
name: Generate doc
on:
push:
branches:
- master
- doc
pull_request:
types: [labeled]
workflow_dispatch:
concurrency:
group: build_doc-${{ github.ref }}
cancel-in-progress: true
jobs:
build-doc:
# if: github.event.label.name == 'doc' || github.event_name == 'push'
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
python-version: ["3.8"]
steps:
# refer to https://github.com/actions/checkout
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Display Python version
run: python -c "import sys; print(sys.version)"
- name: Build doc
shell: bash
run: |
.github/scripts/generate-piper-phonemize-page.py
cd docs
python3 -m pip install -r ./requirements.txt
make html
touch build/html/.nojekyll
cp -v ../piper_phonemize.html ./build/html/
- name: Deploy
uses: peaceiris/actions-gh-pages@v3
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./docs/build/html
publish_branch: gh-pages


@ -0,0 +1,84 @@
# see also
# https://docs.github.com/en/actions/publishing-packages/publishing-docker-images#publishing-images-to-github-packages
name: Build docker image
on:
workflow_dispatch:
concurrency:
group: build_docker-${{ github.ref }}
cancel-in-progress: true
jobs:
build-docker-image:
name: ${{ matrix.image }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
image: ["torch2.4.1-cuda12.4", "torch2.4.1-cuda12.1", "torch2.4.1-cuda11.8", "torch2.4.0-cuda12.4", "torch2.4.0-cuda12.1", "torch2.4.0-cuda11.8", "torch2.3.1-cuda12.1", "torch2.3.1-cuda11.8", "torch2.2.2-cuda12.1", "torch2.2.2-cuda11.8", "torch2.2.1-cuda12.1", "torch2.2.1-cuda11.8", "torch2.2.0-cuda12.1", "torch2.2.0-cuda11.8", "torch2.1.0-cuda12.1", "torch2.1.0-cuda11.8", "torch2.0.0-cuda11.7", "torch1.13.0-cuda11.6", "torch1.12.1-cuda11.3", "torch1.9.0-cuda10.2"]
steps:
# refer to https://github.com/actions/checkout
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Rename
shell: bash
run: |
image=${{ matrix.image }}
mv -v ./docker/$image.dockerfile ./Dockerfile
- name: Free space
shell: bash
run: |
df -h
rm -rf /opt/hostedtoolcache
df -h
- name: Free more space
shell: bash
run: |
# https://github.com/orgs/community/discussions/25678
cd /opt
find . -maxdepth 1 -mindepth 1 '!' -path ./containerd '!' -path ./actionarchivecache '!' -path ./runner '!' -path ./runner-cache -exec rm -rf '{}' ';'
sudo rm -rf /usr/share/dotnet
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
# this might remove tools that are actually needed,
# if set to "true" but frees about 6 GB
tool-cache: false
# all of these default to true, but feel free to set to
# "false" if necessary for your workflow
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: false
swap-storage: true
- name: Check space
shell: bash
run: |
df -h
- name: Log in to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Build and push
uses: docker/build-push-action@v4
with:
context: .
file: ./Dockerfile
push: true
tags: k2fsa/icefall:${{ matrix.image }}

.github/workflows/ksponspeech.yml vendored Normal file

@ -0,0 +1,167 @@
name: ksponspeech
on:
push:
branches:
- ksponspeech
workflow_dispatch:
jobs:
ksponspeech:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: ["3.8"]
fail-fast: false
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v4
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Test
shell: bash
run: |
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/ksponspeech/ASR/run.sh
- name: Show model files (2024-06-24)
shell: bash
run: |
src=/tmp/model-2024-06-24
ls -lh $src
- name: Show model files (2024-06-16)
shell: bash
run: |
src=/tmp/model-2024-06-16
ls -lh $src
- name: Upload model to huggingface (2024-06-24)
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
uses: nick-fields/retry@v3
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
src=/tmp/model-2024-06-24
git config --global user.email "csukuangfj@gmail.com"
git config --global user.name "Fangjun Kuang"
rm -rf hf
export GIT_LFS_SKIP_SMUDGE=1
export GIT_CLONE_PROTECTION_ACTIVE=false
git clone https://huggingface.co/k2-fsa/sherpa-onnx-zipformer-korean-2024-06-24 hf
cd hf
git fetch
git pull
git merge -m "merge remote" --ff origin main
cp -av $src/* ./
ls -lh
git lfs track "bpe.model"
git lfs track "*.onnx"
git add .
git status
git commit -m "update models"
git status
git push https://csukuangfj:$HF_TOKEN@huggingface.co/k2-fsa/sherpa-onnx-zipformer-korean-2024-06-24 main || true
rm -rf hf
- name: Upload model to huggingface (2024-06-16)
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
uses: nick-fields/retry@v3
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
src=/tmp/model-2024-06-16
git config --global user.email "csukuangfj@gmail.com"
git config --global user.name "Fangjun Kuang"
rm -rf hf
export GIT_LFS_SKIP_SMUDGE=1
export GIT_CLONE_PROTECTION_ACTIVE=false
git clone https://huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16 hf
cd hf
git fetch
git pull
git merge -m "merge remote" --ff origin main
cp -v $src/* ./
ls -lh
git lfs track "bpe.model"
git lfs track "*.onnx"
cp -av test_wavs $src/
git add .
git status
git commit -m "update models"
git status
git push https://csukuangfj:$HF_TOKEN@huggingface.co/k2-fsa/sherpa-onnx-streaming-zipformer-korean-2024-06-16 main || true
rm -rf hf
- name: Prepare for release (2024-06-16)
shell: bash
run: |
src=/tmp/model-2024-06-16
d=sherpa-onnx-streaming-zipformer-korean-2024-06-16
mv $src ./$d
tar cjvf ${d}.tar.bz2 $d
ls -lh
- name: Prepare for release (2024-06-24)
shell: bash
run: |
src=/tmp/model-2024-06-24
d=sherpa-onnx-zipformer-korean-2024-06-24
mv $src ./$d
tar cjvf ${d}.tar.bz2 $d
ls -lh
- name: Release exported onnx models
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: sherpa-onnx-*.tar.bz2
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: asr-models

.github/workflows/librispeech.yml vendored Normal file

@ -0,0 +1,72 @@
name: librispeech
on:
push:
branches:
- master
pull_request:
branches:
- master
workflow_dispatch:
concurrency:
group: librispeech-${{ github.ref }}
cancel-in-progress: true
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"
# MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10")
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10" --min-torch-version "2.6.0")
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"
librispeech:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
# refer to https://github.com/actions/checkout
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
df -h
rm -rf /opt/hostedtoolcache
df -h
echo "pwd: $PWD"
echo "github.workspace ${{ github.workspace }}"
- name: Test zipformer/train.py with LibriSpeech
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
git config --global --add safe.directory /icefall
.github/scripts/librispeech/ASR/run.sh

.github/workflows/ljspeech.yml vendored Normal file

@ -0,0 +1,166 @@
name: ljspeech
on:
push:
branches:
- master
pull_request:
branches:
- master
workflow_dispatch:
concurrency:
group: ljspeech-${{ github.ref }}
cancel-in-progress: true
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10")
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"
ljspeech:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
ls -lh
df -h
rm -rf /opt/hostedtoolcache
df -h
echo "pwd: $PWD"
echo "github.workspace ${{ github.workspace }}"
- name: Run tests
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
git config --global --add safe.directory /icefall
pip install "matplotlib<=3.9.4"
pip list
.github/scripts/ljspeech/TTS/run-matcha.sh
.github/scripts/ljspeech/TTS/run.sh
- name: display files
shell: bash
run: |
ls -lh
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: generated-test-files-${{ matrix.python-version }}-${{ matrix.torch-version }}
path: ./*.wav
- name: Release exported onnx models
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0' && github.event_name == 'push'
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: vits-icefall-*.tar.bz2
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: tts-models
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-2
path: ./model-steps-2.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-3
path: ./model-steps-3.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-4
path: ./model-steps-4.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-5
path: ./model-steps-5.onnx
- uses: actions/upload-artifact@v4
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
with:
name: step-6
path: ./model-steps-6.onnx
- name: Upload models to huggingface
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
shell: bash
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
d=matcha-icefall-en_US-ljspeech
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/$d hf
cp -av $d/* hf/
pushd hf
git lfs track "cmn_dict"
git lfs track "ru_dict"
git add .
git config --global user.name "csukuangfj"
git config --global user.email "csukuangfj@gmail.com"
git config --global lfs.allowincompletepush true
git commit -m "upload model" && git push https://csukuangfj:${HF_TOKEN}@huggingface.co/csukuangfj/$d main || true
popd
- name: Release exported onnx models
if: matrix.python-version == '3.10' && matrix.torch-version == '2.3.0'
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: matcha-icefall-*.tar.bz2
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: tts-models

.github/workflows/multi-zh-hans.yml vendored Normal file

@ -0,0 +1,86 @@
name: multi-zh-hans
on:
push:
branches:
- master
workflow_dispatch:
concurrency:
group: multi-zh-hans-${{ github.ref }}
cancel-in-progress: true
permissions:
contents: write
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --torch-version "2.7.0" --python-version "3.11"
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --torch-version "2.7.0" --python-version "3.11")
echo "matrix=${MATRIX}" >> "$GITHUB_OUTPUT"
multi-zh-hans:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
df -h
rm -rf /opt/hostedtoolcache
df -h
echo "pwd: $PWD"
echo "github.workspace ${{ github.workspace }}"
- name: Test with multi_zh-hans
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
export HF_TOKEN=${{ secrets.HF_TOKEN }}
cd /icefall
git config --global --add safe.directory /icefall
.github/scripts/multi_zh-hans/ASR/run.sh
- name: Show models
shell: bash
run: |
ls -lh *.tar.bz2
- name: upload model to https://github.com/k2-fsa/sherpa-onnx
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
file: ./*.tar.bz2
overwrite: true
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: asr-models

.github/workflows/rknn.yml vendored Normal file

@ -0,0 +1,134 @@
name: rknn
on:
push:
branches:
- master
- rknn-zipformer2
pull_request:
branches:
- master
workflow_dispatch:
concurrency:
group: rknn-${{ github.ref }}
cancel-in-progress: true
jobs:
rknn:
name: RKNN ${{ matrix.recipe }} ${{ matrix.rknn_toolkit2_version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.10"]
k2-version: ["1.24.4.dev20241029"]
kaldifeat-version: ["1.25.5.dev20241029"]
torch-version: ["2.0.0"]
torchaudio-version: ["2.0.1"]
version: ["20241218"]
# recipe: ["librispeech", "wenetspeech", "multi_zh-hans"]
recipe: ["librispeech"]
rknn_toolkit2_version: ["2.2.0", "2.1.0"]
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Export RKNN model
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
cat /etc/*release
lsb_release -a
uname -a
python3 --version
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
git config --global --add safe.directory /icefall
python3 -m torch.utils.collect_env
python3 -m k2.version
pip list
export rknn_toolkit2_version=${{ matrix.rknn_toolkit2_version }}
if [[ $rknn_toolkit2_version == "2.1.0" ]]; then
# for the folder pruned_transducer_stateless7_streaming
curl -SL -O https://huggingface.co/csukuangfj/rknn-toolkit2/resolve/main/rknn_toolkit2-2.1.0%2B708089d1-cp310-cp310-linux_x86_64.whl
else
# for the folder zipformer/
curl -SL -O https://huggingface.co/csukuangfj/rknn-toolkit2/resolve/main/rknn_toolkit2-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
fi
# Install rknn
pip install ./*.whl "numpy<=1.26.4"
pip list | grep rknn
echo "---"
pip list
echo "---"
recipe=${{ matrix.recipe }}
.github/scripts/$recipe/ASR/run_rknn.sh > log-$recipe.txt 2>&1 || true
- uses: actions/upload-artifact@v4
with:
name: log-${{ matrix.recipe }}-${{ matrix.rknn_toolkit2_version }}
path: ./log-*.txt
- name: Display results
shell: bash
run: |
ls -lh *rk*.tar.bz2 || true
- name: Release to GitHub
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
overwrite: true
file: sherpa-onnx-*.tar.bz2
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: asr-models
- name: Upload model to huggingface
if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
uses: nick-fields/retry@v3
with:
max_attempts: 20
timeout_seconds: 200
shell: bash
command: |
git config --global user.email "csukuangfj@gmail.com"
git config --global user.name "Fangjun Kuang"
rm -rf huggingface
export GIT_LFS_SKIP_SMUDGE=1
git clone https://huggingface.co/csukuangfj/sherpa-onnx-rknn-models huggingface
cd huggingface
git fetch
git pull
git merge -m "merge remote" --ff origin main
dst=streaming-asr
mkdir -p $dst
cp ../*rk*.tar.bz2 $dst/ || true
ls -lh $dst
git add .
git status
git commit -m "update models"
git status
git push https://csukuangfj:$HF_TOKEN@huggingface.co/csukuangfj/sherpa-onnx-rknn-models main || true
rm -rf huggingface

.github/workflows/run-docker-image.yml vendored Normal file

@ -0,0 +1,144 @@
name: Run docker image
on:
workflow_dispatch:
concurrency:
group: run_docker_image-${{ github.ref }}
cancel-in-progress: true
jobs:
run-docker-image:
name: ${{ matrix.image }}
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
image: ["torch2.4.0-cuda12.4", "torch2.4.0-cuda12.1", "torch2.4.0-cuda11.8", "torch2.3.1-cuda12.1", "torch2.3.1-cuda11.8", "torch2.2.2-cuda12.1", "torch2.2.2-cuda11.8", "torch2.2.1-cuda12.1", "torch2.2.1-cuda11.8", "torch2.2.0-cuda12.1", "torch2.2.0-cuda11.8", "torch2.1.0-cuda12.1", "torch2.1.0-cuda11.8", "torch2.0.0-cuda11.7", "torch1.13.0-cuda11.6", "torch1.12.1-cuda11.3", "torch1.9.0-cuda10.2"]
steps:
# refer to https://github.com/actions/checkout
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Free space
shell: bash
run: |
df -h
rm -rf /opt/hostedtoolcache
df -h
- name: Free more space
shell: bash
run: |
# https://github.com/orgs/community/discussions/25678
cd /opt
find . -maxdepth 1 -mindepth 1 '!' -path ./containerd '!' -path ./actionarchivecache '!' -path ./runner '!' -path ./runner-cache -exec rm -rf '{}' ';'
sudo rm -rf /usr/share/dotnet
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Free Disk Space (Ubuntu)
uses: jlumbroso/free-disk-space@main
with:
# this might remove tools that are actually needed,
# if set to "true" but frees about 6 GB
tool-cache: false
# all of these default to true, but feel free to set to
# "false" if necessary for your workflow
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: false
swap-storage: true
- name: Check space
shell: bash
run: |
df -h
- name: Run the build process with Docker
uses: addnab/docker-run-action@v3
with:
image: k2fsa/icefall:${{ matrix.image }}
shell: bash
run: |
uname -a
cat /etc/*release
find / -name libcuda* 2>/dev/null
ls -lh /usr/local/
ls -lh /usr/local/cuda*
nvcc --version
ls -lh /usr/local/cuda-*/compat/*
# For torch1.9.0-cuda10.2
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/compat:$LD_LIBRARY_PATH
# For torch1.12.1-cuda11.3
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/compat:$LD_LIBRARY_PATH
# For torch2.0.0-cuda11.7
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/compat:$LD_LIBRARY_PATH
# For torch2.1.0-cuda11.8
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/compat:$LD_LIBRARY_PATH
# For torch2.1.0-cuda12.1
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/compat:$LD_LIBRARY_PATH
which nvcc
cuda_dir=$(dirname $(which nvcc))
echo "cuda_dir: $cuda_dir"
find $cuda_dir -name libcuda.so*
echo "--------------------"
find / -name libcuda.so* 2>/dev/null
# for torch1.13.0-cuda11.6
if [ -e /opt/conda/lib/stubs/libcuda.so ]; then
cd /opt/conda/lib/stubs && ln -s libcuda.so libcuda.so.1 && cd -
export LD_LIBRARY_PATH=/opt/conda/lib/stubs:$LD_LIBRARY_PATH
fi
find / -name libcuda.so* 2>/dev/null
echo "LD_LIBRARY_PATH: $LD_LIBRARY_PATH"
python3 --version
which python3
python3 -m pip list
echo "----------torch----------"
python3 -m torch.utils.collect_env
echo "----------k2----------"
python3 -c "import k2; print(k2.__file__)"
python3 -c "import k2; print(k2.__dev_version__)"
python3 -m k2.version
echo "----------lhotse----------"
python3 -c "import lhotse; print(lhotse.__file__)"
python3 -c "import lhotse; print(lhotse.__version__)"
echo "----------kaldifeat----------"
python3 -c "import kaldifeat; print(kaldifeat.__file__)"
python3 -c "import kaldifeat; print(kaldifeat.__version__)"
echo "Test yesno recipe"
cd egs/yesno/ASR
./prepare.sh
./tdnn/train.py
./tdnn/decode.py


@ -0,0 +1,128 @@
# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: run-gigaspeech-2022-05-13
# stateless transducer + k2 pruned rnnt-loss + reworked conformer
on:
push:
branches:
- master
pull_request:
types: [labeled]
schedule:
# minute (0-59)
# hour (0-23)
# day of the month (1-31)
# month (1-12)
# day of the week (0-6)
# nightly build at 15:50 UTC time every day
- cron: "50 15 * * *"
workflow_dispatch:
concurrency:
group: run_gigaspeech_2022_05_13-${{ github.ref }}
cancel-in-progress: true
jobs:
run_gigaspeech_2022_05_13:
if: github.event_name == 'workflow_dispatch' || github.event.label.name == 'ready' || github.event.label.name == 'run-decode' || github.event_name == 'push' || github.event_name == 'schedule'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Download GigaSpeech dev/test dataset
shell: bash
run: |
sudo apt-get install -y -q git-lfs
.github/scripts/download-gigaspeech-dev-test-dataset.sh
- name: Inference with pre-trained model
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
ln -s ~/tmp/giga-dev-dataset-fbank/data egs/gigaspeech/ASR/
ls -lh egs/gigaspeech/ASR/data/fbank
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/run-gigaspeech-pruned-transducer-stateless2-2022-05-12.sh
- name: Display decoding results for gigaspeech pruned_transducer_stateless2
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' || github.event.label.name == 'run-decode'
shell: bash
run: |
cd egs/gigaspeech/ASR/
sudo apt-get -qq install tree
tree ./pruned_transducer_stateless2/exp
cd pruned_transducer_stateless2
echo "results for pruned_transducer_stateless2"
echo "===greedy search==="
find exp/greedy_search -name "log-*" -exec grep -n --color "best for dev" {} + | sort -n -k2
find exp/greedy_search -name "log-*" -exec grep -n --color "best for test" {} + | sort -n -k2
- name: Upload decoding results for gigaspeech pruned_transducer_stateless2
uses: actions/upload-artifact@v4
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' || github.event.label.name == 'run-decode'
with:
name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-latest-cpu-gigaspeech-pruned_transducer_stateless2-2022-05-12
path: egs/gigaspeech/ASR/pruned_transducer_stateless2/exp/


@ -0,0 +1,136 @@
# Copyright 2022 Fangjun Kuang (csukuangfj@gmail.com)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: run-gigaspeech-zipformer-2023-10-17
# zipformer
on:
push:
branches:
- master
pull_request:
types: [labeled]
schedule:
# minute (0-59)
# hour (0-23)
# day of the month (1-31)
# month (1-12)
# day of the week (0-6)
# nightly build at 15:50 UTC time every day
- cron: "50 15 * * *"
workflow_dispatch:
concurrency:
group: run_gigaspeech_2023_10_17_zipformer-${{ github.ref }}
cancel-in-progress: true
jobs:
run_gigaspeech_2023_10_17_zipformer:
if: github.event.label.name == 'zipformer' || github.event.label.name == 'ready' || github.event.label.name == 'run-decode' || github.event_name == 'push' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Inference with pre-trained model
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
HF_TOKEN: ${{ secrets.HF_TOKEN }}
run: |
sudo apt-get -qq install git-lfs tree
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/run-gigaspeech-zipformer-2023-10-17.sh
- name: upload model to https://github.com/k2-fsa/sherpa-onnx
uses: svenstaro/upload-release-action@v2
with:
file_glob: true
file: ./*.tar.bz2
overwrite: true
repo_name: k2-fsa/sherpa-onnx
repo_token: ${{ secrets.UPLOAD_GH_SHERPA_ONNX_TOKEN }}
tag: asr-models
- name: Display decoding results for gigaspeech zipformer
if: github.event_name == 'schedule' || github.event.label.name == 'run-decode' || github.event_name == 'workflow_dispatch'
shell: bash
run: |
cd egs/gigaspeech/ASR/
tree ./zipformer/exp
cd zipformer
echo "results for zipformer"
echo "===greedy search==="
find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
# echo "===fast_beam_search==="
# find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
# find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
#
# echo "===modified beam search==="
# find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
# find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
- name: Upload decoding results for gigaspeech zipformer
uses: actions/upload-artifact@v4
if: github.event_name == 'schedule' || github.event.label.name == 'run-decode' || github.event_name == 'workflow_dispatch'
with:
name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-latest-cpu-zipformer-2022-11-11
path: egs/gigaspeech/ASR/zipformer/exp/


@ -0,0 +1,165 @@
name: run-librispeech-lstm-transducer2-2022-09-03
on:
push:
branches:
- master
pull_request:
types: [labeled]
schedule:
# minute (0-59)
# hour (0-23)
# day of the month (1-31)
# month (1-12)
# day of the week (0-6)
# nightly build at 15:50 UTC time every day
- cron: "50 15 * * *"
workflow_dispatch:
concurrency:
group: run_librispeech_lstm_transducer_stateless2_2022_09_03-${{ github.ref }}
cancel-in-progress: true
jobs:
run_librispeech_lstm_transducer_stateless2_2022_09_03:
if: github.event.label.name == 'ready' || github.event.label.name == 'LODR' || github.event.label.name == 'shallow-fusion' || github.event_name == 'push' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Cache LibriSpeech test-clean and test-other datasets
id: libri-test-clean-and-test-other-data
uses: actions/cache@v2
with:
path: |
~/tmp/download
key: cache-libri-test-clean-and-test-other
- name: Download LibriSpeech test-clean and test-other
if: steps.libri-test-clean-and-test-other-data.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/download-librispeech-test-clean-and-test-other-dataset.sh
- name: Prepare manifests for LibriSpeech test-clean and test-other
shell: bash
run: |
.github/scripts/prepare-librispeech-test-clean-and-test-other-manifests.sh
- name: Cache LibriSpeech test-clean and test-other fbank features
id: libri-test-clean-and-test-other-fbank
uses: actions/cache@v2
with:
path: |
~/tmp/fbank-libri
key: cache-libri-fbank-test-clean-and-test-other-v2
- name: Compute fbank for LibriSpeech test-clean and test-other
if: steps.libri-test-clean-and-test-other-fbank.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/compute-fbank-librispeech-test-clean-and-test-other.sh
- name: Inference with pre-trained model
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
mkdir -p egs/librispeech/ASR/data
ln -sfv ~/tmp/fbank-libri egs/librispeech/ASR/data/fbank
ls -lh egs/librispeech/ASR/data/*
sudo apt-get -qq install git-lfs tree
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/run-librispeech-lstm-transducer-stateless2-2022-09-03.sh
- name: Display decoding results for lstm_transducer_stateless2
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
shell: bash
run: |
cd egs/librispeech/ASR
tree lstm_transducer_stateless2/exp
cd lstm_transducer_stateless2/exp
echo "===greedy search==="
find greedy_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find greedy_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
echo "===fast_beam_search==="
find fast_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find fast_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
# echo "===modified beam search==="
# find modified_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
# find modified_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
- name: Display decoding results for lstm_transducer_stateless2
if: github.event.label.name == 'shallow-fusion'
shell: bash
run: |
cd egs/librispeech/ASR
tree lstm_transducer_stateless2/exp
cd lstm_transducer_stateless2/exp
echo "===modified_beam_search_lm_shallow_fusion==="
echo "===Using RNNLM==="
find modified_beam_search_lm_shallow_fusion -name "log-*rnn*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find modified_beam_search_lm_shallow_fusion -name "log-*rnn*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
- name: Display decoding results for lstm_transducer_stateless2
if: github.event.label.name == 'LODR'
shell: bash
run: |
cd egs/librispeech/ASR
tree lstm_transducer_stateless2/exp
cd lstm_transducer_stateless2/exp
echo "===modified_beam_search_rnnlm_LODR==="
find modified_beam_search_LODR -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find modified_beam_search_LODR -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2
- name: Upload decoding results for lstm_transducer_stateless2
uses: actions/upload-artifact@v4
if: github.event_name == 'schedule' || github.event.label.name == 'shallow-fusion' || github.event.label.name == 'LODR' || github.event_name == 'workflow_dispatch'
with:
name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-latest-cpu-lstm_transducer_stateless2-2022-09-03
path: egs/librispeech/ASR/lstm_transducer_stateless2/exp/


@ -0,0 +1,86 @@
# Copyright 2023 Xiaomi Corp. (author: Zengrui Jin)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: run-multi-corpora-zipformer
on:
push:
branches:
- master
pull_request:
types: [labeled]
workflow_dispatch:
concurrency:
group: run_multi-corpora_zipformer-${{ github.ref }}
cancel-in-progress: true
jobs:
run_multi-corpora_zipformer:
if: github.event.label.name == 'onnx' || github.event.label.name == 'ready' || github.event_name == 'push' || github.event.label.name == 'multi-zh_hans' || github.event.label.name == 'zipformer' || github.event.label.name == 'multi-corpora'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Inference with pre-trained model
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
sudo apt-get -qq install git-lfs tree
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/run-multi-corpora-zipformer.sh

.github/workflows/run-ptb-rnn-lm.yml vendored Normal file

@ -0,0 +1,73 @@
name: run-ptb-rnn-lm-training
on:
push:
branches:
- master
pull_request:
types: [labeled]
schedule:
# minute (0-59)
# hour (0-23)
# day of the month (1-31)
# month (1-12)
# day of the week (0-6)
# nightly build at 15:50 UTC time every day
- cron: "50 15 * * *"
workflow_dispatch:
concurrency:
group: run_ptb_rnn_lm_training-${{ github.ref }}
cancel-in-progress: true
jobs:
run_ptb_rnn_lm_training:
if: github.event.label.name == 'ready' || github.event.label.name == 'rnnlm' || github.event_name == 'push' || github.event_name == 'schedule'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: ["3.8"]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | grep -v kaldifst | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Prepare data
shell: bash
run: |
export PYTHONPATH=$PWD:$PYTHONPATH
cd egs/ptb/LM
./prepare.sh
- name: Run training
shell: bash
run: |
export PYTHONPATH=$PWD:$PYTHONPATH
cd egs/ptb/LM
./train-rnn-lm.sh --world-size 1 --num-epochs 5 --use-epoch 4 --use-avg 2
- name: Upload pretrained models
uses: actions/upload-artifact@v4
if: github.event.label.name == 'ready' || github.event.label.name == 'rnnlm' || github.event_name == 'push' || github.event_name == 'schedule'
with:
name: python-${{ matrix.python-version }}-ubuntu-rnn-lm-ptb
path: egs/ptb/LM/my-rnnlm-exp/


@ -0,0 +1,86 @@
# Copyright 2023 Xiaomi Corp. (author: Zengrui Jin)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: run-swbd-conformer_ctc
on:
push:
branches:
- master
pull_request:
types: [labeled]
workflow_dispatch:
concurrency:
group: run-swbd-conformer_ctc-${{ github.ref }}
cancel-in-progress: true
jobs:
run-swbd-conformer_ctc:
if: github.event.label.name == 'onnx' || github.event.label.name == 'ready' || github.event_name == 'push' || github.event.label.name == 'swbd'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Inference with pre-trained model
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
sudo apt-get -qq install git-lfs tree
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/run-swbd-conformer-ctc-2023-08-26.sh


@ -0,0 +1,86 @@
# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: run-wenetspeech-pruned-transducer-stateless2
on:
push:
branches:
- master
pull_request:
types: [labeled]
workflow_dispatch:
concurrency:
group: run_wenetspeech_pruned_transducer_stateless2-${{ github.ref }}
cancel-in-progress: true
jobs:
run_wenetspeech_pruned_transducer_stateless2:
if: github.event.label.name == 'onnx' || github.event.label.name == 'ready' || github.event_name == 'push' || github.event.label.name == 'wenetspeech'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Inference with pre-trained model
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
sudo apt-get -qq install git-lfs tree
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/run-wenetspeech-pruned-transducer-stateless2.sh


@ -1,78 +0,0 @@
# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: run-yesno-recipe
on:
push:
branches:
- master
pull_request:
branches:
- master
jobs:
run-yesno-recipe:
runs-on: ${{ matrix.os }}
strategy:
matrix:
# os: [ubuntu-18.04, macos-10.15]
# TODO: enable macOS for CPU testing
os: [ubuntu-18.04]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v1
with:
python-version: ${{ matrix.python-version }}
- name: Install libsndfile and libsox
if: startsWith(matrix.os, 'ubuntu')
run: |
sudo apt update
sudo apt install -q -y libsndfile1-dev libsndfile1 ffmpeg
sudo apt install -q -y --fix-missing sox libsox-dev libsox-fmt-all
- name: Install Python dependencies
run: |
python3 -m pip install --upgrade pip black flake8
python3 -m pip install -U pip
python3 -m pip install k2==1.4.dev20210822+cpu.torch1.7.1 -f https://k2-fsa.org/nightly/
python3 -m pip install torchaudio==0.7.2
python3 -m pip install git+https://github.com/lhotse-speech/lhotse
# We are in ./icefall and there is a file: requirements.txt in it
python3 -m pip install -r requirements.txt
- name: Run yesno recipe
shell: bash
working-directory: ${{github.workspace}}
run: |
export PYTHONPATH=$PWD:$PYTHONPATH
echo $PYTHONPATH
cd egs/yesno/ASR
./prepare.sh
python3 ./tdnn/train.py
python3 ./tdnn/decode.py
# TODO: Check that the WER is less than some value


@ -24,13 +24,19 @@ on:
  branches:
  - master
+ workflow_dispatch:
+ concurrency:
+ group: style_check-${{ github.ref }}
+ cancel-in-progress: true
  jobs:
  style_check:
  runs-on: ${{ matrix.os }}
  strategy:
  matrix:
- os: [ubuntu-18.04, macos-10.15]
- python-version: [3.7, 3.9]
+ os: [ubuntu-latest]
+ python-version: ["3.10"]
  fail-fast: false
  steps:
@ -45,18 +51,27 @@ jobs:
  - name: Install Python dependencies
  run: |
- python3 -m pip install --upgrade pip black==21.6b0 flake8==3.9.2
+ python3 -m pip install --upgrade pip black==22.3.0 flake8==5.0.4 click==8.1.0 isort==5.10.1
+ # Click issue fixed in https://github.com/psf/black/pull/2966
  - name: Run flake8
  shell: bash
  working-directory: ${{github.workspace}}
  run: |
  # stop the build if there are Python syntax errors or undefined names
- flake8 . --count --show-source --statistics
- flake8 .
+ flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+ # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+ flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 \
+ --statistics --extend-ignore=E203,E266,E501,F401,E402,F403,F841,W503
  - name: Run black
  shell: bash
  working-directory: ${{github.workspace}}
  run: |
  black --check --diff .
+ - name: Run isort
+ shell: bash
+ working-directory: ${{github.workspace}}
+ run: |
+ isort --check --diff .

.github/workflows/test-ncnn-export.yml vendored Normal file

@ -0,0 +1,77 @@
name: test-ncnn-export
on:
push:
branches:
- master
pull_request:
types: [labeled]
schedule:
# minute (0-59)
# hour (0-23)
# day of the month (1-31)
# month (1-12)
# day of the week (0-6)
# nightly build at 15:50 UTC time every day
- cron: "50 15 * * *"
workflow_dispatch:
concurrency:
group: test_ncnn_export-${{ github.ref }}
cancel-in-progress: true
jobs:
test_ncnn_export:
if: github.event.label.name == 'ready' || github.event.label.name == 'ncnn' || github.event_name == 'push' || github.event_name == 'schedule'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Test ncnn export
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/test-ncnn-export.sh

.github/workflows/test-onnx-export.yml vendored Normal file

@ -0,0 +1,77 @@
name: test-onnx-export
on:
push:
branches:
- master
pull_request:
types: [labeled]
schedule:
# minute (0-59)
# hour (0-23)
# day of the month (1-31)
# month (1-12)
# day of the week (0-6)
# nightly build at 15:50 UTC time every day
- cron: "50 15 * * *"
workflow_dispatch:
concurrency:
group: test_onnx_export-${{ github.ref }}
cancel-in-progress: true
jobs:
test_onnx_export:
if: github.event.label.name == 'ready' || github.event.label.name == 'onnx' || github.event_name == 'push' || github.event_name == 'schedule'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]
fail-fast: false
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'
- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*
- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22
- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh
- name: Test ONNX export
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH
.github/scripts/test-onnx-export.sh
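The `if:` expression on the `test_onnx_export` job can be read as a simple predicate over the event name and the pull-request label. The following is a minimal Python sketch of that logic for illustration only; the helper name `should_run` is hypothetical and is not part of the repository. Read this way, the expression does not list `workflow_dispatch`, so a manual run would not satisfy it.

```python
from typing import Optional

# Labels that trigger the job, taken from the workflow's `if:` expression.
TRIGGER_LABELS = {"ready", "onnx"}
# Event names that trigger the job regardless of label.
TRIGGER_EVENTS = {"push", "schedule"}


def should_run(event_name: str, label: Optional[str] = None) -> bool:
    """Mirror the job-level `if:` expression (hypothetical helper)."""
    return label in TRIGGER_LABELS or event_name in TRIGGER_EVENTS


if __name__ == "__main__":
    print(should_run("pull_request", "onnx"))  # labeled pull request
    print(should_run("pull_request", "docs"))  # unrelated label
    print(should_run("workflow_dispatch"))     # manual run, not in the expression
```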


@ -1,71 +1,111 @@
# Copyright 2021 Fangjun Kuang (csukuangfj@gmail.com)
# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: test name: test
on: on:
push: push:
branches: branches:
- master - master
pull_request: pull_request:
branches: branches:
- master - master
workflow_dispatch:
concurrency:
group: test-${{ github.ref }}
cancel-in-progress: true
jobs: jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10")
echo "::set-output name=matrix::${MATRIX}"
test: test:
runs-on: ${{ matrix.os }} needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy: strategy:
matrix:
os: [ubuntu-18.04, macos-10.15]
python-version: [3.6, 3.7, 3.8, 3.9]
torch: ["1.8.1"]
k2-version: ["1.4.dev20210822"]
fail-fast: false fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps: steps:
- uses: actions/checkout@v2 - uses: actions/checkout@v4
with: with:
fetch-depth: 0 fetch-depth: 0
- name: Setup Python ${{ matrix.python-version }} - name: Free space
uses: actions/setup-python@v1 shell: bash
run: |
df -h
rm -rf /opt/hostedtoolcache
df -h
echo "pwd: $PWD"
echo "github.workspace ${{ github.workspace }}"
- name: Run tests
uses: addnab/docker-run-action@v3
with: with:
python-version: ${{ matrix.python-version }} image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
git config --global --add safe.directory /icefall
- name: Install Python dependencies pytest -v -s ./test
run: |
python3 -m pip install --upgrade pip pytest
pip install k2==${{ matrix.k2-version }}+cpu.torch${{ matrix.torch }} -f https://k2-fsa.org/nightly/
# icefall requirements
pip install -r requirements.txt
- name: Run tests # runt tests for conformer ctc
if: startsWith(matrix.os, 'ubuntu') cd egs/librispeech/ASR/conformer_ctc
run: | pytest -v -s
ls -lh
export PYTHONPATH=$PWD:$PWD/lhotse:$PYTHONPATH
echo $PYTHONPATH
pytest ./test
- name: Run tests cd ../pruned_transducer_stateless
if: startsWith(matrix.os, 'macos') pytest -v -s
run: |
ls -lh cd ../pruned_transducer_stateless2
export PYTHONPATH=$PWD:$PWD/lhotse:$PYTHONPATH pytest -v -s
lib_path=$(python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())")
echo "lib_path: $lib_path" cd ../pruned_transducer_stateless3
export DYLD_LIBRARY_PATH=$lib_path:$DYLD_LIBRARY_PATH pytest -v -s
pytest ./test
cd ../pruned_transducer_stateless4
pytest -v -s
echo $PYTHONPATH
cd ../pruned_transducer_stateless7
pytest -v -s
cd ../transducer_stateless
pytest -v -s
# cd ../transducer
# pytest -v -s
cd ../transducer_stateless2
pytest -v -s
cd ../transducer_lstm
pytest -v -s
cd ../zipformer
pytest -v -s
- uses: actions/upload-artifact@v4
with:
path: egs/librispeech/ASR/zipformer/swoosh.pdf
name: swoosh-${{ matrix.python-version }}-${{ matrix.torch-version }}
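Each matrix entry supplies the `python-version`, `torch-version`, and `version` fields that the `Run tests` step interpolates into the Docker image name. The sketch below shows that interpolation, assuming a matrix entry is a flat dictionary with those three keys inside an `include` list. The sample values, the `include` layout, and the helper name `image_tag` are illustrative assumptions, not taken from `generate_build_matrix.py`.

```python
import json


def image_tag(owner: str, entry: dict) -> str:
    """Build the image reference used by the test step (hypothetical helper)."""
    return (
        f"ghcr.io/{owner}/icefall:"
        f"cpu-py{entry['python-version']}"
        f"-torch{entry['torch-version']}"
        f"-v{entry['version']}"
    )


if __name__ == "__main__":
    # Illustrative matrix JSON of the kind `fromJson` could consume; values are made up.
    matrix = json.loads(
        '{"include": [{"python-version": "3.10", '
        '"torch-version": "2.4.0", "version": "20250630"}]}'
    )
    for entry in matrix["include"]:
        print(image_tag("k2-fsa", entry))
```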

67
.github/workflows/yesno.yml vendored Normal file

@ -0,0 +1,67 @@
name: yesno
on:
push:
branches:
- master
pull_request:
branches:
- master
workflow_dispatch:
concurrency:
group: yesno-${{ github.ref }}
cancel-in-progress: true
jobs:
generate_build_matrix:
if: github.repository_owner == 'csukuangfj' || github.repository_owner == 'k2-fsa'
# see https://github.com/pytorch/pytorch/pull/50633
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generating build matrix
id: set-matrix
run: |
# outputting for debugging purposes
python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10"
MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10")
# MATRIX=$(python ./.github/scripts/docker/generate_build_matrix.py --python-version "3.10" --min-torch-version "2.5.0")
echo "::set-output name=matrix::${MATRIX}"
yesno:
needs: generate_build_matrix
name: py${{ matrix.python-version }} torch${{ matrix.torch-version }} v${{ matrix.version }}
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
${{ fromJson(needs.generate_build_matrix.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Run the yesno recipe
uses: addnab/docker-run-action@v3
with:
image: ghcr.io/${{ github.repository_owner }}/icefall:cpu-py${{ matrix.python-version }}-torch${{ matrix.torch-version }}-v${{ matrix.version }}
options: |
--volume ${{ github.workspace }}/:/icefall
shell: bash
run: |
export PYTHONPATH=/icefall:$PYTHONPATH
cd /icefall
git config --global --add safe.directory /icefall
python3 -m torch.utils.collect_env
python3 -m k2.version
pip list
.github/scripts/yesno/ASR/run.sh

33
.gitignore vendored

@ -1,7 +1,38 @@
icefall.egg-info/
data data
__pycache__ __pycache__
path.sh path.sh
exp exp
exp*/ exp*/
*.pt *.pt
download/ download
dask-worker-space
log
*.bak
*-bak
*bak.py
# Ignore Mac system files
.DS_store
# Ignore node_modules folder
node_modules
# ignore .nfs
.nfs*
# Ignore all text files
*.txt
# Ignore files related to API keys
.env
# Ignore SASS config files
.sass-cache
*.param
*.bin
.DS_Store
*.fst
*.arpa


@ -1,24 +1,38 @@
repos: repos:
- repo: https://github.com/psf/black - repo: https://github.com/psf/black
rev: 21.6b0 rev: 22.3.0
hooks: hooks:
- id: black - id: black
args: [--line-length=80] args: ["--line-length=88"]
additional_dependencies: ['click==8.1.0']
exclude: icefall\/__init__\.py
- repo: https://github.com/PyCQA/flake8 - repo: https://github.com/PyCQA/flake8
rev: 3.9.2 rev: 5.0.4
hooks: hooks:
- id: flake8 - id: flake8
args: [--max-line-length=80] args: ["--max-line-length=88", "--extend-ignore=E203,E266,E501,F401,E402,F403,F841,W503"]
# What are we ignoring here?
# E203: whitespace before ':'
# E266: too many leading '#' for block comment
# E501: line too long
# F401: module imported but unused
# E402: module level import not at top of file
# F403: 'from module import *' used; unable to detect undefined names
# F841: local variable is assigned to but never used
# W503: line break before binary operator
# In addition, the default ignore list is:
# E121,E123,E126,E226,E24,E704,W503,W504
- repo: https://github.com/pycqa/isort - repo: https://github.com/pycqa/isort
rev: 5.9.2 rev: 5.12.0
hooks: hooks:
- id: isort - id: isort
args: [--profile=black, --line-length=80] args: ["--profile=black"]
- repo: https://github.com/pre-commit/pre-commit-hooks - repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.0.1 rev: v4.2.0
hooks: hooks:
- id: check-executables-have-shebangs - id: check-executables-have-shebangs
- id: end-of-file-fixer - id: end-of-file-fixer


@ -1,13 +1,4 @@
Legal Notices
NOTE (this is not from the Apache License): The copyright model is that
authors (or their employers, if noted in individual files) own their
individual contributions. The authors' contributions can be discerned
from the git history.
-------------------------------------------------------------------------
Apache License Apache License
Version 2.0, January 2004 Version 2.0, January 2004
http://www.apache.org/licenses/ http://www.apache.org/licenses/

384
README.md

@ -2,22 +2,85 @@
<img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168> <img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168>
</div> </div>
## Installation # Introduction
Please refer to <https://icefall.readthedocs.io/en/latest/installation/index.html> The icefall project contains speech-related recipes for various datasets
using [k2-fsa](https://github.com/k2-fsa/k2) and [lhotse](https://github.com/lhotse-speech/lhotse).
You can use [sherpa](https://github.com/k2-fsa/sherpa), [sherpa-ncnn](https://github.com/k2-fsa/sherpa-ncnn) or [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) to deploy models
trained in icefall. These frameworks also support models not included in icefall; please refer to their respective documentation for more details.
You can try pre-trained models from within your browser without the need
to download or install anything by visiting this [huggingface space](https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition).
Please refer to [document](https://k2-fsa.github.io/icefall/huggingface/spaces.html) for more details.
# Installation
Please refer to [document](https://k2-fsa.github.io/icefall/installation/index.html)
for installation. for installation.
## Recipes # Recipes
Please refer to <https://icefall.readthedocs.io/en/latest/recipes/index.html> Please refer to [document](https://k2-fsa.github.io/icefall/recipes/index.html)
for more information. for more details.
We provide two recipes at present: ## ASR: Automatic Speech Recognition
### Supported Datasets
- [yesno][yesno] - [yesno][yesno]
- [LibriSpeech][librispeech]
### yesno - [Aidatatang_200zh][aidatatang_200zh]
- [Aishell][aishell]
- [Aishell2][aishell2]
- [Aishell4][aishell4]
- [Alimeeting][alimeeting]
- [AMI][ami]
- [CommonVoice][commonvoice]
- [Corpus of Spontaneous Japanese][csj]
- [GigaSpeech][gigaspeech]
- [LibriCSS][libricss]
- [LibriSpeech][librispeech]
- [Libriheavy][libriheavy]
- [Multi-Dialect Broadcast News Arabic Speech Recognition][mgb2]
- [SPGISpeech][spgispeech]
- [Switchboard][swbd]
- [TIMIT][timit]
- [TED-LIUM3][tedlium3]
- [TAL_CSASR][tal_csasr]
- [Voxpopuli][voxpopuli]
- [XBMU-AMDO31][xbmu-amdo31]
- [WenetSpeech][wenetspeech]
More datasets will be added in the future.
### Supported Models
The [LibriSpeech][librispeech] recipe supports the most comprehensive set of models; you are welcome to try them out.
#### CTC
- TDNN LSTM CTC
- Conformer CTC
- Zipformer CTC
#### MMI
- Conformer MMI
- Zipformer MMI
#### Transducer
- Conformer-based Encoder
- LSTM-based Encoder
- Zipformer-based Encoder
- LSTM-based Predictor
- [Stateless Predictor](https://research.google/pubs/rnn-transducer-with-stateless-prediction-network/)
#### Whisper
- [OpenAI Whisper](https://arxiv.org/abs/2212.04356) (We support fine-tuning on Aishell-1.)
If you are willing to contribute to icefall, please refer to [contributing](https://k2-fsa.github.io/icefall/contributing/index.html) for more details.
We would like to highlight the performance of some of the recipes here.
### [yesno][yesno]
This is the simplest ASR recipe in `icefall` and can be run on CPU. This is the simplest ASR recipe in `icefall` and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER: Training takes less than 30 seconds and gives you the following WER:
@ -25,37 +88,302 @@ Training takes less than 30 seconds and gives you the following WER:
``` ```
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ] [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
``` ```
We do provide a Colab notebook for this recipe. We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)
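The WER line above reports errors as insertions, deletions, and substitutions over the number of reference words. A short sketch of the arithmetic, using the counts from that log line (the function name `wer_percent` is illustrative and not part of icefall):

```python
def wer_percent(ins: int, dels: int, subs: int, num_ref_words: int) -> float:
    """Word error rate in percent: (ins + del + sub) / reference words."""
    return 100.0 * (ins + dels + subs) / num_ref_words


if __name__ == "__main__":
    # Counts from the yesno log line: [1 / 240, 0 ins, 1 del, 0 sub]
    print(f"{wer_percent(0, 1, 0, 240):.2f}%")  # prints 0.42%
```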
### LibriSpeech ### [LibriSpeech][librispeech]
We provide two models for this recipe: [conformer CTC model][LibriSpeech_conformer_ctc] Please see [RESULTS.md](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md)
and [TDNN LSTM CTC model][LibriSpeech_tdnn_lstm_ctc]. for the **latest** results.
#### Conformer CTC Model #### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc)
The best WER we currently have is: | | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42 | 5.73 |
||test-clean|test-other|
|--|--|--|
|WER| 2.57% | 5.94% |
We provide a Colab notebook to run a pre-trained conformer CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing) We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)
#### TDNN LSTM CTC Model #### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc)
The WER for this model is: | | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59 | 17.69 |
||test-clean|test-other| We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)
|--|--|--|
|WER| 6.59% | 17.69% |
#### [Transducer (Conformer Encoder + LSTM Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
| | test-clean | test-other |
|---------------|------------|------------|
| greedy_search | 3.07 | 7.51 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)
#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)
| | test-clean | test-other |
|---------------------------------------|------------|------------|
| modified_beam_search (`beam_size=4`) | 2.56 | 6.27 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)
#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer)
WER (modified_beam_search `beam_size=4` unless otherwise stated)
1. LibriSpeech-960hr
| Encoder | Params | test-clean | test-other | epochs | devices |
|-----------------|--------|------------|------------|---------|------------|
| Zipformer | 65.5M | 2.21 | 4.79 | 50 | 4 32G-V100 |
| Zipformer-small | 23.2M | 2.42 | 5.73 | 50 | 2 32G-V100 |
| Zipformer-large | 148.4M | 2.06 | 4.63 | 50 | 4 32G-V100 |
| Zipformer-large | 148.4M | 2.00 | 4.38 | 174 | 8 80G-A100 |
2. LibriSpeech-960hr + GigaSpeech
| Encoder | Params | test-clean | test-other |
|-----------------|--------|------------|------------|
| Zipformer | 65.5M | 1.78 | 4.08 |
3. LibriSpeech-960hr + GigaSpeech + CommonVoice
| Encoder | Params | test-clean | test-other |
|-----------------|--------|------------|------------|
| Zipformer | 65.5M | 1.90 | 3.98 |
### [GigaSpeech][gigaspeech]
#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/conformer_ctc)
| | Dev | Test |
|-----|-------|-------|
| WER | 10.47 | 10.58 |
#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/pruned_transducer_stateless2)
Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss
| | Dev | Test |
|----------------------|-------|-------|
| greedy_search | 10.51 | 10.73 |
| fast_beam_search | 10.50 | 10.69 |
| modified_beam_search | 10.40 | 10.51 |
#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/zipformer)
| | Dev | Test |
|----------------------|-------|-------|
| greedy_search | 10.31 | 10.50 |
| fast_beam_search | 10.26 | 10.48 |
| modified_beam_search | 10.25 | 10.38 |
### [Aishell][aishell]
#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/tdnn_lstm_ctc)
| | test |
|-----|-------|
| CER | 10.16 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)
#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/transducer_stateless)
| | test |
|-----|------|
| CER | 4.38 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)
#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/zipformer)
WER (modified_beam_search `beam_size=4`)
| Encoder | Params | dev | test | epochs |
|-----------------|--------|-----|------|---------|
| Zipformer | 73.4M | 4.13| 4.40 | 55 |
| Zipformer-small | 30.2M | 4.40| 4.67 | 55 |
| Zipformer-large | 157.3M | 4.03| 4.28 | 56 |
### [Aishell4][aishell4]
#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell4/ASR/pruned_transducer_stateless5)
1. Trained with all subsets:
| | test |
|-----|------------|
| CER | 29.08 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
### [TIMIT][timit]
#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_lstm_ctc)
| |TEST|
|---|----|
|PER| 19.71% |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)
#### [TDNN LiGRU CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_ligru_ctc)
| |TEST|
|---|----|
|PER| 17.66% |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)
### [TED-LIUM3][tedlium3]
#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/transducer_stateless)
| | dev | test |
|--------------------------------------|-------|--------|
| modified_beam_search (`beam_size=4`) | 6.91 | 6.33 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)
#### [Transducer (pruned_transducer_stateless)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/pruned_transducer_stateless)
| | dev | test |
|--------------------------------------|-------|--------|
| modified_beam_search (`beam_size=4`) | 6.77 | 6.14 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)
### [Aidatatang_200zh][aidatatang_200zh]
#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/aidatatang_200zh/ASR/pruned_transducer_stateless2)
| | Dev | Test |
|----------------------|-------|-------|
| greedy_search | 5.53 | 6.59 |
| fast_beam_search | 5.30 | 6.34 |
| modified_beam_search | 5.27 | 6.33 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
### [WenetSpeech][wenetspeech]
#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless2)
| | Dev | Test-Net | Test-Meeting |
|----------------------|-------|----------|--------------|
| greedy_search | 7.80 | 8.75 | 13.49 |
| fast_beam_search | 7.94 | 8.74 | 13.80 |
| modified_beam_search | 7.76 | 8.71 | 13.41 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
#### [Transducer **Streaming** (pruned_transducer_stateless5) ](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless5)
| | Dev | Test-Net | Test-Meeting |
|----------------------|-------|----------|--------------|
| greedy_search | 8.78 | 10.12 | 16.16 |
| fast_beam_search| 9.01 | 10.47 | 16.28 |
| modified_beam_search | 8.53| 9.95 | 15.81 |
### [Alimeeting][alimeeting]
#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/alimeeting/ASR/pruned_transducer_stateless2)
| | Eval | Test-Net |
|----------------------|--------|----------|
| greedy_search | 31.77 | 34.66 |
| fast_beam_search | 31.39 | 33.02 |
| modified_beam_search | 30.38 | 34.25 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)
### [TAL_CSASR][tal_csasr]
#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR/pruned_transducer_stateless5)
The best results for Chinese CER(%) and English WER(%) respectively (zh: Chinese, en: English):
|decoding-method | dev | dev_zh | dev_en | test | test_zh | test_en |
|--|--|--|--|--|--|--|
|greedy_search| 7.30 | 6.48 | 19.19 |7.39| 6.66 | 19.13|
|fast_beam_search| 7.18 | 6.39| 18.90 | 7.27| 6.55 | 18.77|
|modified_beam_search| 7.15 | 6.35 | 18.95 | 7.22| 6.50 | 18.70 |
We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)
## TTS: Text-to-Speech
### Supported Datasets
- [LJSpeech][ljspeech]
- [VCTK][vctk]
- [LibriTTS][libritts_tts]
### Supported Models
- [VITS](https://arxiv.org/abs/2106.06103)
# Deployment with C++
Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.
Please refer to
- https://k2-fsa.github.io/icefall/model-export/export-with-torch-jit-script.html
- https://k2-fsa.github.io/icefall/model-export/export-onnx.html
- https://k2-fsa.github.io/icefall/model-export/export-ncnn.html
for how to do this.
We also provide a Colab notebook, showing you how to run a torch scripted model in [k2][k2] with C++.
Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing)
We provide a Colab notebook to run a pre-trained TDNN LSTM CTC model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kNmDXNMwREi0rZGAOIAOJo93REBuOTcd?usp=sharing)
[LibriSpeech_tdnn_lstm_ctc]: egs/librispeech/ASR/tdnn_lstm_ctc
[LibriSpeech_conformer_ctc]: egs/librispeech/ASR/conformer_ctc
[yesno]: egs/yesno/ASR [yesno]: egs/yesno/ASR
[librispeech]: egs/librispeech/ASR [librispeech]: egs/librispeech/ASR
[aishell]: egs/aishell/ASR
[aishell2]: egs/aishell2/ASR
[aishell4]: egs/aishell4/ASR
[timit]: egs/timit/ASR
[tedlium3]: egs/tedlium3/ASR
[gigaspeech]: egs/gigaspeech/ASR
[aidatatang_200zh]: egs/aidatatang_200zh/ASR
[wenetspeech]: egs/wenetspeech/ASR
[alimeeting]: egs/alimeeting/ASR
[tal_csasr]: egs/tal_csasr/ASR
[ami]: egs/ami
[swbd]: egs/swbd/ASR
[k2]: https://github.com/k2-fsa/k2
[commonvoice]: egs/commonvoice/ASR
[csj]: egs/csj/ASR
[libricss]: egs/libricss/SURT
[libritts_asr]: egs/libritts/ASR
[libriheavy]: egs/libriheavy/ASR
[mgb2]: egs/mgb2/ASR
[spgispeech]: egs/spgispeech/ASR
[voxpopuli]: egs/voxpopuli/ASR
[xbmu-amdo31]: egs/xbmu-amdo31/ASR
[vctk]: egs/vctk/TTS
[ljspeech]: egs/ljspeech/TTS
[libritts_tts]: egs/libritts/TTS
## Acknowledgements
Some contributors to this project were supported by Xiaomi Corporation. Others were supported by National Science Foundation CCRI award 2120435. This is not an exhaustive list of sources of support.


@ -1,39 +1,37 @@
# Contributing to Our Project
## Pre-commit hooks Thank you for your interest in contributing to our project! We use Git pre-commit hooks to ensure code quality and consistency. Before contributing, please follow these guidelines to enable and use the pre-commit hooks.
We use [git][git] [pre-commit][pre-commit] [hooks][hooks] to check that files ## Pre-Commit Hooks
going to be committed:
- contain no trailing spaces We have set up pre-commit hooks to check that the files you're committing meet our coding and formatting standards. These checks include:
- are formatted with [black][black]
- are compatible to [PEP8][PEP8] (checked by [flake8][flake8])
- end in a newline and only a newline
- contain sorted `imports` (checked by [isort][isort])
These hooks are disabled by default. Please use the following commands to enable them: - Ensuring there are no trailing spaces.
- Formatting code with [black](https://github.com/psf/black).
- Checking compliance with PEP8 using [flake8](https://flake8.pycqa.org/).
- Verifying that files end with a newline character (and only a newline).
- Sorting imports using [isort](https://pycqa.github.io/isort/).
```bash Please note that these hooks are disabled by default. To enable them, follow these steps:
pip install pre-commit # run it only once
pre-commit install # run it only once, it will install all hooks
# modify some files ### Installation (Run only once)
git add <some files>
git commit # It runs all hooks automatically.
# If all hooks run successfully, you can write the commit message now. Done! 1. Install the `pre-commit` package using pip:
# ```bash
# If any hook failed, your commit was not successful. pip install pre-commit
# Please read the error messages and make changes accordingly. ```
# And rerun 1. Install the Git hooks using:
```bash
pre-commit install
```
### Making a Commit
Once you have enabled the pre-commit hooks, follow these steps when making a commit:
1. Make your changes to the codebase.
2. Stage your changes with `git add` for the files you modified.
3. Commit your changes with `git commit`. The pre-commit hooks run automatically at this point.
4. If all hooks run successfully, you can write your commit message, and your changes will be committed.
5. If any hook fails, the commit is not created. Read the error messages, make the necessary changes, and then re-run `git add` and `git commit`.
git add <some files> ### Your Contribution
git commit Your contributions are valuable to us, and by following these guidelines, you help maintain code consistency and quality in our project. We appreciate your dedication to ensuring high-quality code. If you have questions or need assistance, feel free to reach out to us. Thank you for being part of our open-source community!
```
[git]: https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
[flake8]: https://github.com/PyCQA/flake8
[PEP8]: https://www.python.org/dev/peps/pep-0008/
[black]: https://github.com/psf/black
[hooks]: https://github.com/pre-commit/pre-commit-hooks
[pre-commit]: https://github.com/pre-commit/pre-commit
[isort]: https://github.com/PyCQA/isort

129
docker/README.md Normal file

@ -0,0 +1,129 @@
# icefall dockerfile
## Download from dockerhub
You can find pre-built Docker images for icefall at the following address:
<https://hub.docker.com/r/k2fsa/icefall/tags>
Example usage:
```bash
docker run --gpus all --rm -it k2fsa/icefall:torch1.13.0-cuda11.6 /bin/bash
```
## Build from dockerfile
Two sets of configurations are provided: (a) Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8 and (b) Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8.
If your NVIDIA driver supports CUDA 11.3 or later, choose case (a) Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8.
Otherwise, because the older PyTorch images have not been updated for the [apt-key rotation by NVIDIA](https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key), choose case (b) Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8. Ensure that your NVIDIA driver supports at least CUDA 11.0.
You can check the highest CUDA version your NVIDIA driver supports with the `nvidia-smi` command, as shown below. In this example, the highest CUDA version is 11.0, which corresponds to case (b) Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8.
```bash
$ nvidia-smi
Tue Sep 20 00:26:13 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03 Driver Version: 450.119.03 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 TITAN RTX On | 00000000:03:00.0 Off | N/A |
| 41% 31C P8 4W / 280W | 16MiB / 24219MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 TITAN RTX On | 00000000:04:00.0 Off | N/A |
| 41% 30C P8 11W / 280W | 6MiB / 24220MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2085 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 2240 G /usr/bin/gnome-shell 4MiB |
| 1 N/A N/A 2085 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
```
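The choice between case (a) and case (b) can also be scripted. The following is a minimal sketch and not part of icefall: the function names `cuda_version_from_smi` and `pick_case` are hypothetical, and it assumes that the `CUDA Version: X.Y` field appears in the `nvidia-smi` header as in the sample output above.

```bash
# Extract "X.Y" from the "CUDA Version: X.Y" field of nvidia-smi output read on stdin.
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Map a CUDA version such as "11.0" to the case described in this README.
# Prints "a" for >= 11.3, "b" for 11.0 <= x < 11.3, and "none" for anything older.
pick_case() {
  case "$1" in
    [0-9]*.[0-9]*) ;;
    *) echo "unknown"; return 1 ;;
  esac
  cuda_major="${1%%.*}"
  cuda_minor="${1#*.}"
  if [ "$cuda_major" -gt 11 ] || { [ "$cuda_major" -eq 11 ] && [ "$cuda_minor" -ge 3 ]; }; then
    echo "a"
  elif [ "$cuda_major" -eq 11 ]; then
    echo "b"
  else
    echo "none"
  fi
}

# Usage, on a machine with the NVIDIA driver installed:
#   pick_case "$(nvidia-smi | cuda_version_from_smi)"
```

For the sample output above, the extracted version is `11.0`, which maps to case (b).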
## Building images locally
If your environment requires a proxy to access the Internet, remember to add that information to the Dockerfile directly.
In most cases, you can uncomment these lines in the Dockerfile and fill in your proxy details.
```dockerfile
ENV http_proxy=http://aaa.bb.cc.net:8080 \
https_proxy=http://aaa.bb.cc.net:8080
```
Then, proceed with these commands.
### If you are case (a), i.e. your NVIDIA driver supports CUDA version >= 11.3:
```bash
cd docker/Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8
docker build -t icefall/pytorch1.12.1 .
```
### If you are case (b), i.e. your NVIDIA driver can only support CUDA versions 11.0 <= x < 11.3:
```bash
cd docker/Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8
docker build -t icefall/pytorch1.7.1 .
```
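As an alternative to editing the Dockerfile, the proxy can usually be supplied at build time, because Docker treats `http_proxy` and `https_proxy` as predefined build arguments. The sketch below is an assumption about how you might wire this up, not something the icefall Dockerfiles require; the helper name `proxy_build_args` is hypothetical.

```bash
# Print the --build-arg flags for a proxy URL, or nothing if the URL is empty.
# $1: proxy URL, e.g. http://aaa.bb.cc.net:8080 (placeholder from this README).
proxy_build_args() {
  if [ -n "$1" ]; then
    echo "--build-arg http_proxy=$1 --build-arg https_proxy=$1"
  fi
}

# Usage (the substitution is left unquoted on purpose so it splits into flags):
#   cd docker/Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8
#   docker build $(proxy_build_args "http://aaa.bb.cc.net:8080") -t icefall/pytorch1.12.1 .
```

Build arguments passed this way apply only during the build; to use a proxy inside a running container, pass it with `-e` to `docker run`.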
## Running your built local image
Sample usage of the GPU-based images. These commands are written with case (a) in mind, so change the image name accordingly if you are using case (b).
Note: use [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) to run the GPU images.
```bash
docker run -it --runtime=nvidia --shm-size=2gb --name=icefall --gpus all icefall/pytorch1.12.1
```
### Tips:
1. Since your data and models most likely won't be inside the container, you must use the `-v` flag to mount a directory from the host machine. Do this by specifying `-v {/path/in/host/machine}:{/path/in/docker}`.
2. Also, if your environment requires a proxy, this is a good time to add it: `-e http_proxy=http://aaa.bb.cc.net:8080 -e https_proxy=http://aaa.bb.cc.net:8080`.
Overall, your `docker run` command should look like this.
```bash
docker run -it --runtime=nvidia --shm-size=2gb --name=icefall --gpus all -v {/path/in/host/machine}:{/path/in/docker} -e http_proxy=http://aaa.bb.cc.net:8080 -e https_proxy=http://aaa.bb.cc.net:8080 icefall/pytorch1.12.1
```
You can explore more docker run options [here](https://docs.docker.com/engine/reference/commandline/run/) to suit your environment.
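To avoid retyping the long command, the pieces above can be assembled by a small helper. This is only a sketch: the function name `build_run_cmd` and its argument order are hypothetical, and it assumes paths and URLs contain no spaces. It prints the command rather than running it, so you can inspect it first.

```bash
# Print a docker run command for the GPU image.
# $1: image name, $2: host directory, $3: mount point in the container,
# $4: optional proxy URL (omit or leave empty if no proxy is needed).
build_run_cmd() {
  run_cmd="docker run -it --runtime=nvidia --shm-size=2gb --name=icefall --gpus all -v $2:$3"
  if [ -n "${4:-}" ]; then
    run_cmd="$run_cmd -e http_proxy=$4 -e https_proxy=$4"
  fi
  echo "$run_cmd $1"
}

# Usage:
#   build_run_cmd icefall/pytorch1.12.1 /path/in/host/machine /path/in/docker
```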
### Linking to icefall in your host machine
If you already have icefall downloaded on your host machine, you can use that repository instead, so that changes to your code are visible both inside and outside the container.
Note: Remember to set the `-v` flag above during the first run of the container, as that is the only way for your container to access your host machine.
Warning: Check that the icefall on your host machine is visible from within your container before running the commands below.
Use these commands once you are inside the container.
```bash
rm -r /workspace/icefall
ln -s {/path/in/docker/to/icefall} /workspace/icefall
```
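Because `rm -r /workspace/icefall` is destructive, it may be worth guarding it with the visibility check from the warning above. The following is a minimal sketch under that assumption; the function name `link_icefall` is hypothetical, and it treats "visible" as "the mounted path exists and is a directory".

```bash
# Replace the container's icefall checkout with a symlink to the mounted one.
# $1: path to the mounted icefall inside the container.
# $2: link location (defaults to /workspace/icefall, as used in this README).
link_icefall() {
  link_src="$1"
  link_dest="${2:-/workspace/icefall}"
  # Refuse to delete anything unless the mounted repository is actually visible.
  if [ ! -d "$link_src" ]; then
    echo "error: $link_src is not visible inside the container; check the -v flag" >&2
    return 1
  fi
  rm -rf "$link_dest"
  ln -s "$link_src" "$link_dest"
}

# Usage, inside the container:
#   link_icefall {/path/in/docker/to/icefall}
```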
## Starting another session in the same running container.
```bash
docker exec -it icefall /bin/bash
```
## Restarting a killed container that has been run before.
```bash
docker start -ai icefall
```
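Which of the two commands above applies depends on whether the container is currently running. The sketch below maps a container state to the matching command; the helper name `session_cmd` is hypothetical, and it assumes the state string comes from `docker inspect -f '{{.State.Status}}'`.

```bash
# Print the docker command that matches a container state.
# $1: state string (e.g. "running" or "exited"), or empty if the container does not exist.
# $2: container name (defaults to "icefall", the name used in this README).
session_cmd() {
  container="${2:-icefall}"
  case "$1" in
    running) echo "docker exec -it $container /bin/bash" ;;
    exited|created) echo "docker start -ai $container" ;;
    *) echo "no usable container named $container; create one with docker run first"; return 1 ;;
  esac
}

# Usage:
#   state="$(docker inspect -f '{{.State.Status}}' icefall 2>/dev/null)"
#   session_cmd "$state"
```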
## Sample usage of the CPU-based images:
```bash
docker run -it icefall /bin/bash
```

@ -0,0 +1,74 @@
FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel
# ENV http_proxy=http://aaa.bbb.cc.net:8080 \
# https_proxy=http://aaa.bbb.cc.net:8080
# install normal source
RUN apt-get update && \
apt-get install -y --no-install-recommends \
g++ \
make \
automake \
autoconf \
bzip2 \
unzip \
wget \
sox \
libtool \
git \
subversion \
zlib1g-dev \
gfortran \
ca-certificates \
patch \
ffmpeg \
valgrind \
libssl-dev \
vim \
curl
# cmake
RUN wget -P /opt https://cmake.org/files/v3.18/cmake-3.18.0.tar.gz && \
cd /opt && \
tar -zxvf cmake-3.18.0.tar.gz && \
cd cmake-3.18.0 && \
./bootstrap && \
make && \
make install && \
rm -rf cmake-3.18.0.tar.gz && \
find /opt/cmake-3.18.0 -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && \
cd -
# flac
RUN wget -P /opt https://downloads.xiph.org/releases/flac/flac-1.3.2.tar.xz && \
cd /opt && \
xz -d flac-1.3.2.tar.xz && \
tar -xvf flac-1.3.2.tar && \
cd flac-1.3.2 && \
./configure && \
make && make install && \
rm -rf flac-1.3.2.tar && \
find /opt/flac-1.3.2 -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && \
cd -
RUN conda install -y -c pytorch torchaudio=0.12 && \
pip install graphviz
#install k2 from source
RUN git clone https://github.com/k2-fsa/k2.git /opt/k2 && \
cd /opt/k2 && \
python3 setup.py install && \
cd -
# install lhotse
RUN pip install git+https://github.com/lhotse-speech/lhotse
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install -r requirements.txt
RUN pip install kaldifeat
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,90 @@
FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
# ENV http_proxy=http://aaa.bbb.cc.net:8080 \
# https_proxy=http://aaa.bbb.cc.net:8080
RUN rm /etc/apt/sources.list.d/cuda.list && \
rm /etc/apt/sources.list.d/nvidia-ml.list && \
apt-key del 7fa2af80
# install normal source
RUN apt-get update && \
apt-get install -y --no-install-recommends \
g++ \
make \
automake \
autoconf \
bzip2 \
unzip \
wget \
sox \
libtool \
git \
subversion \
zlib1g-dev \
gfortran \
ca-certificates \
patch \
ffmpeg \
valgrind \
libssl-dev \
vim \
curl
# Add new keys and reupdate
RUN curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub | apt-key add - && \
curl -fsSL https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub | apt-key add - && \
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list && \
rm -rf /var/lib/apt/lists/* && \
mv /opt/conda/lib/libcufft.so.10 /opt/libcufft.so.10.bak && \
mv /opt/conda/lib/libcurand.so.10 /opt/libcurand.so.10.bak && \
mv /opt/conda/lib/libcublas.so.11 /opt/libcublas.so.11.bak && \
mv /opt/conda/lib/libnvrtc.so.11.0 /opt/libnvrtc.so.11.1.bak && \
# mv /opt/conda/lib/libnvToolsExt.so.1 /opt/libnvToolsExt.so.1.bak && \
mv /opt/conda/lib/libcudart.so.11.0 /opt/libcudart.so.11.0.bak && \
apt-get update && apt-get -y upgrade
# cmake
RUN wget -P /opt https://cmake.org/files/v3.18/cmake-3.18.0.tar.gz && \
cd /opt && \
tar -zxvf cmake-3.18.0.tar.gz && \
cd cmake-3.18.0 && \
./bootstrap && \
make && \
make install && \
rm -rf cmake-3.18.0.tar.gz && \
find /opt/cmake-3.18.0 -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && \
cd -
# flac
RUN wget -P /opt https://downloads.xiph.org/releases/flac/flac-1.3.2.tar.xz && \
cd /opt && \
xz -d flac-1.3.2.tar.xz && \
tar -xvf flac-1.3.2.tar && \
cd flac-1.3.2 && \
./configure && \
make && make install && \
rm -rf flac-1.3.2.tar && \
find /opt/flac-1.3.2 -type f \( -name "*.o" -o -name "*.la" -o -name "*.a" \) -exec rm {} \; && \
cd -
RUN conda install -y -c pytorch torchaudio=0.7.1 && \
pip install graphviz
#install k2 from source
RUN git clone https://github.com/k2-fsa/k2.git /opt/k2 && \
cd /opt/k2 && \
python3 setup.py install && \
cd -
# install lhotse
RUN pip install git+https://github.com/lhotse-speech/lhotse
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,72 @@
FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-devel
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.7
ARG K2_VERSION="1.24.4.dev20240223+cuda11.3.torch1.12.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda11.3.torch1.12.1"
ARG TORCHAUDIO_VERSION="0.12.1+cu113"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torch_stable.html \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,74 @@
FROM pytorch/pytorch:1.13.0-cuda11.6-cudnn8-runtime
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.9
ARG K2_VERSION="1.24.4.dev20240223+cuda11.6.torch1.13.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda11.6.torch1.13.0"
ARG TORCHAUDIO_VERSION="0.13.0+cu116"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torch_stable.html \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
ENV LD_LIBRARY_PATH /opt/conda/lib/stubs:$LD_LIBRARY_PATH
WORKDIR /workspace/icefall

@ -0,0 +1,88 @@
FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.7
ARG K2_VERSION="1.24.4.dev20240223+cuda10.2.torch1.9.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda10.2.torch1.9.0"
ARG TORCHAUDIO_VERSION="0.9.0"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
# see https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/
RUN rm /etc/apt/sources.list.d/cuda.list && \
rm /etc/apt/sources.list.d/nvidia-ml.list && \
apt-key del 7fa2af80
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb && \
dpkg -i cuda-keyring_1.0-1_all.deb && \
rm -v cuda-keyring_1.0-1_all.deb && \
apt-get update && \
rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip uninstall -y tqdm && \
pip install -U --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torch_stable.html \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz \
"tqdm>=4.63.0"
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240223+cuda11.7.torch2.0.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda11.7.torch2.0.0"
ARG TORCHAUDIO_VERSION="2.0.0+cu117"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240223+cuda11.8.torch2.1.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda11.8.torch2.1.0"
ARG TORCHAUDIO_VERSION="2.1.0+cu118"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240223+cuda12.1.torch2.1.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda12.1.torch2.1.0"
ARG TORCHAUDIO_VERSION="2.1.0+cu121"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.2.0-cuda11.8-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240223+cuda11.8.torch2.2.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda11.8.torch2.2.0"
ARG TORCHAUDIO_VERSION="2.2.0+cu118"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240223+cuda12.1.torch2.2.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda12.1.torch2.2.0"
ARG TORCHAUDIO_VERSION="2.2.0+cu121"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.2.1-cuda11.8-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240223+cuda11.8.torch2.2.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda11.8.torch2.2.1"
ARG TORCHAUDIO_VERSION="2.2.1+cu118"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240223+cuda12.1.torch2.2.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240223+cuda12.1.torch2.2.1"
ARG TORCHAUDIO_VERSION="2.2.1+cu121"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.2.2-cuda11.8-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240328+cuda11.8.torch2.2.2"
ARG KALDIFEAT_VERSION="1.25.4.dev20240329+cuda11.8.torch2.2.2"
ARG TORCHAUDIO_VERSION="2.2.2+cu118"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.2.2-cuda12.1-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240328+cuda12.1.torch2.2.2"
ARG KALDIFEAT_VERSION="1.25.4.dev20240329+cuda12.1.torch2.2.2"
ARG TORCHAUDIO_VERSION="2.2.2+cu121"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.3.1-cuda11.8-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240606+cuda11.8.torch2.3.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240606+cuda11.8.torch2.3.1"
ARG TORCHAUDIO_VERSION="2.3.1+cu118"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.3.1-cuda12.1-cudnn8-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240606+cuda12.1.torch2.3.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240606+cuda12.1.torch2.3.1"
ARG TORCHAUDIO_VERSION="2.3.1+cu121"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.4.0-cuda11.8-cudnn9-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240725+cuda11.8.torch2.4.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240725+cuda11.8.torch2.4.0"
ARG TORCHAUDIO_VERSION="2.4.0+cu118"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.4.0-cuda12.1-cudnn9-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240725+cuda12.1.torch2.4.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240725+cuda12.1.torch2.4.0"
ARG TORCHAUDIO_VERSION="2.4.0+cu121"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall


@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240725+cuda12.4.torch2.4.0"
ARG KALDIFEAT_VERSION="1.25.4.dev20240725+cuda12.4.torch2.4.0"
ARG TORCHAUDIO_VERSION="2.4.0+cu124"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall


@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.4.1-cuda11.8-cudnn9-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240905+cuda11.8.torch2.4.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240905+cuda11.8.torch2.4.1"
ARG TORCHAUDIO_VERSION="2.4.1+cu118"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall


@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.4.1-cuda12.1-cudnn9-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240905+cuda12.1.torch2.4.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240905+cuda12.1.torch2.4.1"
ARG TORCHAUDIO_VERSION="2.4.1+cu121"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall


@ -0,0 +1,73 @@
FROM pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel
# python 3.10
ENV LC_ALL C.UTF-8
ARG DEBIAN_FRONTEND=noninteractive
# python 3.10
ARG K2_VERSION="1.24.4.dev20240905+cuda12.4.torch2.4.1"
ARG KALDIFEAT_VERSION="1.25.4.dev20240905+cuda12.4.torch2.4.1"
ARG TORCHAUDIO_VERSION="2.4.1+cu124"
LABEL authors="Fangjun Kuang <csukuangfj@gmail.com>"
LABEL k2_version=${K2_VERSION}
LABEL kaldifeat_version=${KALDIFEAT_VERSION}
LABEL github_repo="https://github.com/k2-fsa/icefall"
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
vim \
libssl-dev \
autoconf \
automake \
bzip2 \
ca-certificates \
ffmpeg \
g++ \
gfortran \
git \
libtool \
make \
patch \
sox \
subversion \
unzip \
valgrind \
wget \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
# Install dependencies
RUN pip install --no-cache-dir \
torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torchaudio/ \
k2==${K2_VERSION} -f https://k2-fsa.github.io/k2/cuda.html \
git+https://github.com/lhotse-speech/lhotse \
kaldifeat==${KALDIFEAT_VERSION} -f https://csukuangfj.github.io/kaldifeat/cuda.html \
kaldi_native_io \
kaldialign \
kaldifst \
kaldilm \
"sentencepiece>=0.1.96" \
tensorboard \
typeguard \
dill \
onnx \
onnxruntime \
onnxmltools \
onnxoptimizer \
onnxsim \
multi_quantization \
typeguard \
numpy \
pytest \
graphviz
RUN git clone https://github.com/k2-fsa/icefall /workspace/icefall && \
cd /workspace/icefall && \
pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH /workspace/icefall:$PYTHONPATH
WORKDIR /workspace/icefall

24
docs/README.md Normal file

@ -0,0 +1,24 @@
## Usage
```bash
cd /path/to/icefall/docs
pip install -r requirements.txt
make clean
make html
cd build/html
python3 -m http.server 8000
```
It prints:
```
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
```
Open your browser and go to <http://localhost:8000/> to view the generated
documentation.
Done!
**Hint**: You can change the port number when starting the server.


@ -1,2 +1,3 @@
 sphinx_rtd_theme
 sphinx
+sphinxcontrib-youtube==1.1.0

Binary files not shown (9 files).


@ -16,7 +16,6 @@
 import sphinx_rtd_theme
 # -- Project information -----------------------------------------------------
 project = "icefall"
@ -33,7 +32,9 @@ release = "0.1"
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
+"sphinx.ext.todo",
 "sphinx_rtd_theme",
+"sphinxcontrib.youtube",
 ]
 # Add any paths that contain templates here, relative to this directory.
@ -73,5 +74,30 @@ html_context = {
 "github_user": "k2-fsa",
 "github_repo": "icefall",
 "github_version": "master",
-"conf_py_path": "/icefall/docs/source/",
+"conf_py_path": "/docs/source/",
 }
+todo_include_todos = True
+rst_epilog = """
+.. _sherpa-ncnn: https://github.com/k2-fsa/sherpa-ncnn
+.. _sherpa-onnx: https://github.com/k2-fsa/sherpa-onnx
+.. _icefall: https://github.com/k2-fsa/icefall
+.. _git-lfs: https://git-lfs.com/
+.. _ncnn: https://github.com/tencent/ncnn
+.. _LibriSpeech: https://www.openslr.org/12
+.. _Gigaspeech: https://github.com/SpeechColab/GigaSpeech
+.. _musan: http://www.openslr.org/17/
+.. _ONNX: https://github.com/onnx/onnx
+.. _onnxruntime: https://github.com/microsoft/onnxruntime
+.. _torch: https://github.com/pytorch/pytorch
+.. _torchaudio: https://github.com/pytorch/audio
+.. _k2: https://github.com/k2-fsa/k2
+.. _lhotse: https://github.com/lhotse-speech/lhotse
+.. _yesno: https://www.openslr.org/1/
+.. _Next-gen Kaldi: https://github.com/k2-fsa
+.. _Kaldi: https://github.com/kaldi-asr/kaldi
+.. _lilcom: https://github.com/danpovey/lilcom
+.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
+.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
+"""


@ -11,9 +11,9 @@ We use the following tools to make the code style to be as consistent as possibl
 The following versions of the above tools are used:
-- ``black == 12.6b0``
-- ``flake8 == 3.9.2``
-- ``isort == 5.9.2``
+- ``black == 22.3.0``
+- ``flake8 == 5.0.4``
+- ``isort == 5.10.1``
 After running the following commands:
@ -38,7 +38,7 @@ Please fix any issues reported by the check tools.
 .. HINT::
 Some of the check tools, i.e., ``black`` and ``isort`` will modify
-the files to be commited **in-place**. So please run ``git status``
+the files to be committed **in-place**. So please run ``git status``
 after failure to see which file has been modified by the tools
 before you make any further changes.
@ -54,10 +54,17 @@ it should succeed this time:
 If you want to check the style of your code before ``git commit``, you
 can do the following:
+.. code-block:: bash
+$ pre-commit install
+$ pre-commit run
+Or without installing the pre-commit hooks:
 .. code-block:: bash
 $ cd icefall
-$ pip install black==21.6b0 flake8==3.9.2 isort==5.9.2
+$ pip install black==22.3.0 flake8==5.0.4 isort==5.10.1
 $ black --check your_changed_file.py
 $ black your_changed_file.py # modify it in-place
 $


@ -3,7 +3,7 @@ How to create a recipe
 .. HINT::
-Please read :ref:`follow the code style` to adjust your code sytle.
+Please read :ref:`follow the code style` to adjust your code style.
 .. CAUTION::


@ -0,0 +1,187 @@
.. _LODR:
LODR for RNN Transducer
=======================
As a type of E2E model, neural transducers are usually considered to have an internal
language model (ILM), which learns language-level information from the training corpus.
In real-life scenarios, there is often a mismatch between the training corpus and the target corpus.
This mismatch can be a problem when decoding neural transducer models with an external language model (LM), because the
internal language model can act "against" the external LM. In this tutorial, we show how to use
`Low-order Density Ratio <https://arxiv.org/abs/2203.16776>`_ (LODR) to alleviate this effect and further improve the performance
of language model integration.
.. note::
This tutorial is based on the recipe
`pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
which is a streaming transducer model trained on `LibriSpeech`_.
However, you can easily apply LODR to other recipes.
If you encounter any problems, please open an issue in `icefall <https://github.com/k2-fsa/icefall/issues>`__.
.. note::
For simplicity, the training and testing corpora in this tutorial are the same (`LibriSpeech`_). However,
you can change the testing set to any other domain (e.g., `GigaSpeech`_) and prepare the language models
using that corpus.
First, let's have a look at some background information. As the predecessor of LODR, Density Ratio (DR) was first proposed `here <https://arxiv.org/abs/2002.11268>`_
to address the language information mismatch between the training
corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the target domain
are acoustically similar, DR derives the following formula for decoding with Bayes' theorem:
.. math::

    \text{score}\left(y_u|\mathit{x},y\right) =
    \log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
    \lambda_1 \log p_{\text{Target LM}}\left(y_u|y_{1:u-1}\right) -
    \lambda_2 \log p_{\text{Source LM}}\left(y_u|y_{1:u-1}\right)

where :math:`\lambda_1` and :math:`\lambda_2` are the weights of the LM scores for the target domain and the source domain, respectively.
Here, the source domain LM is trained on the training corpus. The only difference between the above formula and
shallow fusion is the subtraction of the source domain LM score.
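The difference between shallow fusion and DR can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function name ``density_ratio_score`` and all log-probabilities below are made up, and nothing here is taken from the icefall code base. Setting ``lambda_2`` to 0 recovers shallow fusion.

```python
from typing import Dict, Tuple


def density_ratio_score(
    log_p_asr: float,
    log_p_target_lm: float,
    log_p_source_lm: float,
    lambda_1: float,
    lambda_2: float,
) -> float:
    """Per-token DR score; lambda_2 = 0 gives plain shallow fusion."""
    return log_p_asr + lambda_1 * log_p_target_lm - lambda_2 * log_p_source_lm


# Hypothetical log-probabilities for two competing tokens:
# (ASR model, target-domain LM, source-domain LM)
candidates: Dict[str, Tuple[float, float, float]] = {
    "A": (-1.0, -1.5, -0.5),  # favoured by the source-domain LM
    "B": (-1.3, -1.0, -3.0),  # favoured by the target-domain LM
}


def best_token(lambda_1: float, lambda_2: float) -> str:
    """Return the candidate with the highest combined score."""
    return max(
        candidates,
        key=lambda tok: density_ratio_score(*candidates[tok], lambda_1, lambda_2),
    )


shallow_fusion_choice = best_token(lambda_1=0.3, lambda_2=0.0)
density_ratio_choice = best_token(lambda_1=0.3, lambda_2=0.2)
print(shallow_fusion_choice, density_ratio_choice)
```

In this toy case shallow fusion prefers token A, which the source-domain LM also rates highly, whereas DR prefers token B once the source-domain LM score is subtracted.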
Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, this internal LM is
considered to be weak and to capture only low-level language information. Therefore, `LODR <https://arxiv.org/abs/2203.16776>`__ proposes to use
a low-order n-gram LM as an approximation of the ILM of the neural transducer. This leads to the following formula
for decoding with a transducer model:
.. math::

    \text{score}\left(y_u|\mathit{x},y\right) =
    \log p_{\text{rnnt}}\left(y_u|\mathit{x},y_{1:u-1}\right) +
    \lambda_1 \log p_{\text{Target LM}}\left(y_u|y_{1:u-1}\right) -
    \lambda_2 \log p_{\text{bi-gram}}\left(y_u|y_{1:u-1}\right)

In LODR, an additional bi-gram LM estimated on the source domain (e.g., the training corpus) is required. Compared to DR,
the only difference lies in the choice of the source domain LM. According to the original `paper <https://arxiv.org/abs/2203.16776>`_,
LODR achieves performance similar to DR in both intra-domain and cross-domain settings.
As a bi-gram is much faster to evaluate than a neural LM, LODR is usually much faster than DR.
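To illustrate why a bi-gram is cheap to evaluate, here is a minimal sketch of an add-one-smoothed bi-gram LM built from counts and plugged into the LODR formula. It is a toy under several assumptions: the bi-gram used in this tutorial is distributed as an FST file (``2gram.fst.txt``), whereas the word-level corpus, the smoothing scheme, and the names ``count_bigrams``, ``bigram_log_prob`` and ``lodr_score`` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple

Counts = Tuple[Counter, Counter]


def count_bigrams(corpus: List[List[str]]) -> Counts:
    """Count history tokens and (history, token) pairs in tokenised sentences."""
    history_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence  # "<s>" marks the sentence start
        for prev, cur in zip(tokens[:-1], tokens[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    return history_counts, pair_counts


def bigram_log_prob(prev: str, cur: str, counts: Counts, vocab_size: int) -> float:
    """Add-one-smoothed log p(cur | prev); only two dictionary lookups."""
    history_counts, pair_counts = counts
    numerator = pair_counts[(prev, cur)] + 1
    denominator = history_counts[prev] + vocab_size
    return math.log(numerator / denominator)


def lodr_score(
    log_p_rnnt: float,
    log_p_target_lm: float,
    log_p_bigram: float,
    lambda_1: float,
    lambda_2: float,
) -> float:
    """Per-token LODR score: the bi-gram stands in for the source-domain LM."""
    return log_p_rnnt + lambda_1 * log_p_target_lm - lambda_2 * log_p_bigram


# Tiny made-up "source domain" corpus.
source_corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
vocab = ["the", "cat", "dog", "sat", "ran"]
counts = count_bigrams(source_corpus)

frequent = bigram_log_prob("the", "cat", counts, len(vocab))  # seen twice
rare = bigram_log_prob("the", "ran", counts, len(vocab))  # never seen

# Transducer and target-LM log-probabilities are placeholders.
score = lodr_score(-1.0, -2.0, frequent, lambda_1=0.42, lambda_2=0.24)
```

Evaluating the bi-gram needs no neural network forward pass, which is the intuition behind the speed advantage mentioned above.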
Now, we will show you how to use LODR in ``icefall``.
For illustration purposes, we will use a pre-trained ASR model from this `link <https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29>`_.
If you want to train your model from scratch, please have a look at :ref:`non_streaming_librispeech_pruned_transducer_stateless`.
The testing scenario here is intra-domain (the model trained on `LibriSpeech`_ is decoded on the `LibriSpeech`_ test sets).
As the initial step, let's download the pre-trained model.
.. code-block:: bash
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29
$ cd icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
$ git lfs pull --include "pretrained.pt"
$ ln -s pretrained.pt epoch-99.pt # create a symbolic link so that the checkpoint can be loaded
$ cd ../data/lang_bpe_500
$ git lfs pull --include bpe.model
$ cd ../../..
To test the model, let's have a look at the decoding results **without** using an LM. This can be done via the following command:
.. code-block:: bash
$ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/
$ ./pruned_transducer_stateless7_streaming/decode.py \
--epoch 99 \
--avg 1 \
--use-averaged-model False \
--exp-dir $exp_dir \
--bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
--max-duration 600 \
--decode-chunk-len 32 \
--decoding-method modified_beam_search
The following WERs are achieved on test-clean and test-other:
.. code-block:: text
$ For test-clean, WER of different settings are:
$ beam_size_4 3.11 best for test-clean
$ For test-other, WER of different settings are:
$ beam_size_4 7.93 best for test-other
Then, we download the external language model and the bi-gram LM that are necessary for LODR.
Note that the bi-gram is estimated on the text of the 960-hour LibriSpeech training set.
.. code-block:: bash
$ # download the external LM
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
$ # create a symbolic link so that the checkpoint can be loaded
$ pushd icefall-librispeech-rnn-lm/exp
$ git lfs pull --include "pretrained.pt"
$ ln -s pretrained.pt epoch-99.pt
$ popd
$
$ # download the bi-gram
$ git lfs install
$ git clone https://huggingface.co/marcoyang/librispeech_bigram
$ pushd data/lang_bpe_500
$ ln -s ../../librispeech_bigram/2gram.fst.txt .
$ popd
Then, we perform LODR decoding by setting ``--decoding-method`` to ``modified_beam_search_LODR``:
.. code-block:: bash
$ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
$ lm_dir=./icefall-librispeech-rnn-lm/exp
$ lm_scale=0.42
$ LODR_scale=-0.24
$ ./pruned_transducer_stateless7_streaming/decode.py \
--epoch 99 \
--avg 1 \
--use-averaged-model False \
--beam-size 4 \
--exp-dir $exp_dir \
--max-duration 600 \
--decode-chunk-len 32 \
--decoding-method modified_beam_search_LODR \
--bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
--use-shallow-fusion 1 \
--lm-type rnn \
--lm-exp-dir $lm_dir \
--lm-epoch 99 \
--lm-scale $lm_scale \
--lm-avg 1 \
--rnn-lm-embedding-dim 2048 \
--rnn-lm-hidden-dim 2048 \
--rnn-lm-num-layers 3 \
--lm-vocab-size 500 \
--tokens-ngram 2 \
--ngram-lm-scale $LODR_scale
Two extra arguments need to be given when doing LODR. ``--tokens-ngram`` specifies the order of the n-gram. As we
are using a bi-gram, we set it to 2. ``--ngram-lm-scale`` is the scale of the bi-gram. It should be a negative number
as we are subtracting the bi-gram's score during decoding.
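The sign convention can be checked with a few lines of Python. This sketch assumes that the scaled n-gram score is simply added to the other scores, which is what the negative value of ``--ngram-lm-scale`` suggests. The function ``combined_score`` and the log-probabilities are hypothetical and do not reproduce icefall's implementation.

```python
import math


def combined_score(
    log_p_rnnt: float,
    log_p_lm: float,
    log_p_ngram: float,
    lm_scale: float,
    ngram_lm_scale: float,
) -> float:
    """Add every scaled score; a negative ngram_lm_scale subtracts the n-gram."""
    return log_p_rnnt + lm_scale * log_p_lm + ngram_lm_scale * log_p_ngram


# Scales from the decoding command above; the log-probabilities are made up.
lm_scale = 0.42
ngram_lm_scale = -0.24
log_p_rnnt, log_p_lm, log_p_ngram = -1.2, -2.5, -1.8

added = combined_score(log_p_rnnt, log_p_lm, log_p_ngram, lm_scale, ngram_lm_scale)

# The same value written as in the LODR formula, with lambda_2 = 0.24.
lambda_1, lambda_2 = lm_scale, -ngram_lm_scale
subtracted = log_p_rnnt + lambda_1 * log_p_lm - lambda_2 * log_p_ngram

assert math.isclose(added, subtracted)
```

Passing a positive value would add the bi-gram score instead of subtracting it, which is the opposite of what the LODR formula asks for.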
The decoding results obtained with the above command are shown below:
.. code-block:: text
$ For test-clean, WER of different settings are:
$ beam_size_4 2.61 best for test-clean
$ For test-other, WER of different settings are:
$ beam_size_4 6.74 best for test-other
Recall that the lowest WER we obtained in :ref:`shallow_fusion` with a beam size of 4 is ``2.77/7.08``. LODR
indeed **further improves** the WER to ``2.61/6.74``. We can do even better if we increase ``--beam-size``:
.. list-table:: WER of LODR with different beam sizes
   :widths: 25 25 50
   :header-rows: 1

   * - Beam size
     - test-clean
     - test-other
   * - 4
     - 2.61
     - 6.74
   * - 8
     - 2.45
     - 6.38
   * - 12
     - 2.4
     - 6.23
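
To put the table in perspective, the relative WER reduction over the no-LM baseline reported earlier (``3.11/7.93`` with beam size 4) can be computed as sketched below. The helper ``relative_reduction`` is introduced here only for illustration. Note that the baseline was reported for beam size 4 only, so the rows for beam sizes 8 and 12 mix the effect of LODR with that of a larger beam.

```python
from typing import Dict

# WERs copied from this tutorial: no-LM baseline (beam size 4) and the LODR table.
BASELINE: Dict[str, float] = {"test-clean": 3.11, "test-other": 7.93}
LODR: Dict[int, Dict[str, float]] = {
    4: {"test-clean": 2.61, "test-other": 6.74},
    8: {"test-clean": 2.45, "test-other": 6.38},
    12: {"test-clean": 2.4, "test-other": 6.23},
}


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


for beam_size, wers in LODR.items():
    for test_set, wer in wers.items():
        reduction = relative_reduction(BASELINE[test_set], wer)
        print(f"beam {beam_size:>2} {test_set}: {wer:.2f} ({reduction:.1f}% relative)")
```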

Some files were not shown because too many files have changed in this diff.