37 Commits

Author         SHA1        Message  Date
marcoyang      ae3149cb7f  freeze BERT option  2023-09-21 10:24:14 +08:00
marcoyang      21cc1dfff4  fix lhotse compatibility  2023-09-21 10:22:56 +08:00
marcoyang1998  fdff6b3b3a  add shared  2023-09-20 14:56:38 +08:00
marcoyang1998  9485587976  add RESULTS.md, pending model link  2023-09-20 11:45:13 +08:00
marcoyang1998  203cd5cf11  add usage in decoder_bert.py  2023-09-20 11:44:36 +08:00
marcoyang1998  cda6e06a85  updates  2023-09-20 10:35:37 +08:00
marcoyang1998  93461fb77e  add documentation to different text sampling function  2023-09-20 09:57:03 +08:00
marcoyang1998  6579800720  update  2023-09-19 18:38:56 +08:00
marcoyang1998  bea1bd295f  add script for generating context list for each utterance  2023-09-19 17:44:52 +08:00
marcoyang1998  8401f26342  update some documentation for cross-attention zipformer  2023-09-19 14:53:33 +08:00
marcoyang1998  58dc0430be  remove subformer scripts  2023-09-18 17:28:50 +08:00
marcoyang1998  d411ffb4b6  update  2023-09-15 16:08:27 +08:00
marcoyang1998  a0fe6bcd0d  further clean up  2023-09-15 11:13:51 +08:00
marcoyang1998  ae2c7c73f6  remove/rename files  2023-09-15 10:54:58 +08:00
marcoyang1998  1bd6be03c1  minor updates  2023-09-15 09:56:42 +08:00
marcoyang1998  cb85d4c337  remove unused scripts  2023-09-15 09:55:34 +08:00
marcoyang1998  66ac3a4ecc  removed un-used files  2023-09-14 18:38:44 +08:00
marcoyang1998  84ff2ab67c  add text normalization for librispeech test sets  2023-09-14 18:36:09 +08:00
marcoyang1998  81af525de4  update the biasing lists  2023-09-08 10:15:21 +08:00
marcoyang1998  bbf1577818  add long audio transcription scripts  2023-09-08 10:02:41 +08:00
marcoyang1998  07e27348dd  more updates  2023-09-08 10:01:48 +08:00
marcoyang1998  013cafdd6d  updates  2023-09-08 10:00:00 +08:00
marcoyang1998  522273f97e  change the text normalization for upper_case_no_punc  2023-09-08 09:57:24 +08:00
marcoyang1998  77890a6115  add context biasing at different levels  2023-09-08 09:56:45 +08:00
marcoyang1998  d4c5a1c157  updates  2023-09-08 09:55:41 +08:00
marcoyang1998  cad01bfcb6  add subformer model with style embeddings  2023-08-29 16:04:51 +08:00
marcoyang1998  16e8907805  update text normalization for librispeech test sets  2023-08-29 16:03:56 +08:00
marcoyang1998  80c54c05e2  support showing WERs of different books  2023-08-17 23:59:37 +08:00
marcoyang1998  f23882b9f6  also sample from distractors when using separate words in the ref text; increase the max length of substring  2023-08-17 12:11:33 +08:00
marcoyang1998  8a238317a4  support using subformer as text encoder and train with style  2023-08-16 19:08:36 +08:00
marcoyang1998  73fa1651f0  minor updates to utils.py  2023-08-16 16:47:23 +08:00
marcoyang1998  2091bb5f25  add two pass decoding  2023-08-16 16:46:50 +08:00
marcoyang1998  0982db9cde  add a few args to support context list and rare words  2023-08-16 16:44:58 +08:00
marcoyang1998  4420788f66  support using context list and random substring as pre text  2023-08-16 16:44:29 +08:00
marcoyang1998  17d0918969  fix the post normalization bug, avoid multiple words  2023-08-16 09:39:42 +08:00
marcoyang1998  fdc4fcabb9  use a more aggresive sampling_weight  2023-08-16 09:38:40 +08:00
marcoyang1998  ae4d2fbfcc  initial commit  2023-08-14 09:51:20 +08:00