20 Commits

Author SHA1 Message Date
marcoyang1998
d84631c403
Merge cc168d104128348e9e24835c856c1bd946638e71 into 231bbcd2b638826a94cf019fa31ae8683d3552ee 2023-11-03 17:07:28 +08:00
zr_jin
23913f6afd
Minor refinements for some stale but recently merged PRs (#1354)
* incorporate https://github.com/k2-fsa/icefall/pull/1269

* incorporate https://github.com/k2-fsa/icefall/pull/1301

* black formatted

* incorporate https://github.com/k2-fsa/icefall/pull/1162

* black formatted
2023-10-31 10:28:20 +08:00
zr_jin
f9980aa606
minor fixes (#1332) 2023-10-24 08:17:17 +08:00
zr_jin
92ef561ff7
Minor fixes for torch.jit.script support (#1329) 2023-10-24 01:10:50 +08:00
marcoyang1998
ce372cce33
Update documentation to PromptASR (#1321) 2023-10-19 17:24:31 +08:00
marcoyang1998
16a2748d6c
PromptASR for contextualized ASR with controllable style (#1250)
* Add PromptASR with BERT as text encoder

* Support using word-list based content prompts for context biasing

* Upload the pretrained models to huggingface

* Add usage example
2023-10-11 14:56:41 +08:00
marcoyang1998
cc168d1041 update the pipeline 2023-08-09 12:11:43 +08:00
marcoyang1998
b8540ac3c0 minor fix 2023-07-20 15:51:34 +08:00
marcoyang1998
754ac00509 add more normalizations such as number/year to words; fix a few bugs when feeding input to WER computation 2023-07-20 15:50:50 +08:00
marcoyang1998
5532bb1683 add files for decoding 2023-07-19 22:05:53 +08:00
marcoyang1998
4f3a6606ad add necessary files for training 2023-07-19 22:04:11 +08:00
marcoyang1998
88a311734d add script to prepare validation and test sets 2023-07-19 11:01:07 +08:00
marcoyang1998
0aee07fb4c change the valid/test sets; only do simple normalization in the dataloader, i.e. only replace full-width symbols and replace double hyphens with spaces 2023-07-19 11:00:07 +08:00
marcoyang1998
0d1cd4f595 add char coverage option to avoid having a lot of rarely used tokens in the BPE; add the option to use byte-fallback in training BPE 2023-07-19 10:55:57 +08:00
marcoyang1998
b53c0d1e5f initial commit for zipformer recipe 2023-07-18 11:42:19 +08:00
marcoyang1998
6939b3d6aa minor fixes 2023-07-18 11:14:06 +08:00
marcoyang
0e7df7c5c4 add necessary utility files 2023-07-18 10:06:22 +08:00
marcoyang
189d424b25 only use medium text to train the BPE as the whole corpus is too large 2023-07-18 10:06:01 +08:00
marcoyang
fef229e024 add necessary files to compute features 2023-07-17 10:36:25 +08:00
marcoyang
44d01195c0 initial commit for libriheavy 2023-07-14 23:50:27 +08:00