68 Commits

Author SHA1 Message Date
Daniel Povey
ebf8aa129d Apply layer bypass during warmup in a new way, including groups of 2 and 4 layers. 2022-10-07 16:56:40 +08:00
Daniel Povey
bd325e8769 Remove debug info 2022-10-06 20:31:15 +08:00
Daniel Povey
a3179c30e7 Various fixes, finish implementing frame masking 2022-10-06 20:29:45 +08:00
Daniel Povey
e4c9786e4a Merge branch 'scaled_adam_exp27' into scaled_adam_exp69 (conflicts in egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py) 2022-10-06 18:04:48 +08:00
Daniel Povey
e1d741a632 Slight code cleanup/simplification 2022-10-06 14:29:51 +08:00
Daniel Povey
99d17d13cf Merge branch 'scaled_adam_exp58' into scaled_adam_exp67 2022-10-06 14:27:12 +08:00
Daniel Povey
02eb7af824 Don't always apply the frame mask 2022-10-06 13:01:36 +08:00
Daniel Povey
0685ac792d Remove layer dropout and model-level warmup 2022-10-06 12:36:42 +08:00
Daniel Povey
537c3537c0 Remove warmup 2022-10-06 12:33:43 +08:00
Daniel Povey
bb233d3449 Add debug info 2022-10-05 23:18:50 +08:00
Daniel Povey
1cd7e93183 Fix bug setting layerdrop mask 2022-10-05 16:19:45 +08:00
Daniel Povey
61f62837fa Fix bug RE self.training 2022-10-05 15:34:39 +08:00
Daniel Povey
81542832bf Bug fixes 2022-10-04 22:34:24 +08:00
Daniel Povey
5fe8cb134f Remove final combination; implement layer drop that drops the final layers. 2022-10-04 22:19:44 +08:00
Daniel Povey
006fcc18cd Introduce offset in layerdrop_scales 2022-10-04 12:06:35 +08:00
Daniel Povey
33c24e4114 Bug fix 2022-10-03 23:07:30 +08:00
Daniel Povey
a9f950a1f7 Make the scaling factors more global and the randomness of dropout more random 2022-10-03 22:49:32 +08:00
Daniel Povey
88d0da7192 Simplify the learned scaling factor on the modules 2022-10-03 17:54:56 +08:00
Daniel Povey
b3af9f67ae Implement efficient layer dropout 2022-10-03 17:19:16 +08:00
Daniel Povey
93dff29243 Introduce a scale dependent on the masking value 2022-10-03 14:34:37 +08:00
Daniel Povey
5a8995328f Stop backprop bug 2022-10-03 13:33:01 +08:00
Daniel Povey
a0a1874415 Bug fix 2022-10-03 13:23:26 +08:00
Daniel Povey
c20fc3be14 Randomize order of some modules 2022-10-03 13:02:42 +08:00
Daniel Povey
1be455438a Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change. 2022-10-02 14:00:36 +08:00
Daniel Povey
cf5f7e5dfd Swap random_prob and single_prob, to reduce prob of being randomized. 2022-10-01 23:50:38 +08:00
Daniel Povey
8d517a69e4 Increase feature_mask_dropout_prob from 0.15 to 0.2. 2022-10-01 23:32:24 +08:00
Daniel Povey
e9326a7d16 Remove dropout from inside ConformerEncoderLayer, for adding to residuals 2022-10-01 13:13:10 +08:00
Daniel Povey
cc64f2f15c Reduce feature_mask_dropout_prob from 0.25 to 0.15. 2022-10-01 12:24:07 +08:00
Daniel Povey
1eb603f4ad Reduce single_prob from 0.5 to 0.25 2022-09-30 22:14:53 +08:00
Daniel Povey
ab7c940803 Include changes from Liyong about padding conformer module. 2022-09-30 18:37:31 +08:00
Daniel Povey
38f89053bd Introduce feature mask per frame 2022-09-29 17:31:04 +08:00
Daniel Povey
056b9a4f9a Apply single_prob mask, so sometimes we just get one layer as output. 2022-09-29 15:29:37 +08:00
Daniel Povey
d8f7310118 Add print statement 2022-09-29 14:15:29 +08:00
Daniel Povey
d398f0ed70 Decrease random_prob from 0.5 to 0.333 2022-09-29 13:55:33 +08:00
Daniel Povey
461ad3655a Implement AttentionCombine as replacement for RandomCombine 2022-09-29 13:44:03 +08:00
Daniel Povey
14a2603ada Bug fix 2022-09-28 20:59:24 +08:00
Daniel Povey
e5666628bd Bug fix 2022-09-28 20:58:34 +08:00
Daniel Povey
df795912ed Try to reproduce the baseline, but with current code and 2 encoder stacks, to serve as a new baseline 2022-09-28 20:56:40 +08:00
Daniel Povey
1005ff35ba Fix w.r.t. uneven upsampling 2022-09-28 13:57:26 +08:00
Daniel Povey
10a3061025 Simplify downsampling and upsampling 2022-09-28 13:49:11 +08:00
Daniel Povey
01af88c2f6 Various fixes 2022-09-27 16:09:30 +08:00
Daniel Povey
d34eafa623 Closer to working.. 2022-09-27 15:47:58 +08:00
Daniel Povey
e5a0d8929b Remove unused out_balancer member 2022-09-27 13:10:59 +08:00
Daniel Povey
6b12f20995 Remove out_balancer and out_norm from conv modules 2022-09-27 12:25:11 +08:00
Daniel Povey
71b3756ada Use half the dim per head, in self_attn layers. 2022-09-24 15:40:44 +08:00
Daniel Povey
ce3f59d9c7 Use dropout in attention, on attn weights. 2022-09-22 19:18:50 +08:00
Daniel Povey
24aea947d2 Fix issues where grad is None, and unused-grad cases 2022-09-22 19:18:16 +08:00
Daniel Povey
c16f795962 Avoid error in DDP by using the last module's scores 2022-09-22 18:52:16 +08:00
Daniel Povey
0f85a3c2e5 Implement persistent attention scores 2022-09-22 18:47:16 +08:00
Daniel Povey
1d20c12bc0 Increase max_var_per_eig to 0.2 2022-09-22 12:28:35 +08:00
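
Many of the commits above (layer bypass during warmup, layerdrop masks, removing model-level warmup) revolve around stochastically bypassing encoder layers during training. As a rough illustration only, and not the actual conformer.py code from this branch, a minimal PyTorch sketch of the general idea might look like the following; the class and parameter names (LayerBypass, warmup_batches) are hypothetical.

    import torch
    import torch.nn as nn


    class LayerBypass(nn.Module):
        """Wraps an encoder layer and sometimes bypasses it during training.

        Early in training the layer is skipped more often, so gradients
        mostly flow through the residual/identity path; after warmup the
        bypass probability decays to zero.
        """

        def __init__(self, layer: nn.Module, warmup_batches: int = 4000):
            super().__init__()
            self.layer = layer
            self.warmup_batches = warmup_batches
            self.batch_count = 0

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            y = self.layer(x)
            if not self.training:
                return y
            # Anneal the bypass probability from 0.5 down to 0 over warmup.
            warmup = min(self.batch_count / self.warmup_batches, 1.0)
            bypass_prob = 0.5 * (1.0 - warmup)
            self.batch_count += 1
            if torch.rand(()) < bypass_prob:
                return x  # bypass (skip) this layer for the whole batch
            return y

Note that bypassing whole layers can leave some parameters without gradients in a given step, which is presumably why several commits above deal with DDP errors and unused-grad cases.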