941 Commits

Author SHA1 Message Date
Daniel Povey
f8f200e2b2 Make layerdrop different in different processes. 2022-10-09 12:25:12 +08:00
Daniel Povey
3e137dda5b Decrease frequency of logging variance_proportion 2022-10-09 12:05:52 +08:00
Daniel Povey
e6540865f3 Do warmup by dropping out whole layers. 2022-10-09 11:50:24 +08:00
Daniel Povey
5255969544 Revert "Change warmup schedule and increase warmup_batches from 4k to 6k"
This reverts commit 86845bd5d859ceb6f83cd83f3719c3e6641de987.
2022-10-09 11:30:27 +08:00
Daniel Povey
d467338837 Limit bypass scale to >= 0.1 2022-10-08 21:37:21 +08:00
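
The bypass-scale commits in this stretch (making the scale trainable and limiting it to >= 0.1) might look roughly like the sketch below. This is only an illustration in PyTorch: the names `BypassLayer` and `bypass_scale`, and the exact combination formula, are assumptions rather than the repository's actual code.

```python
import torch
import torch.nn as nn


class BypassLayer(nn.Module):
    """Illustrative wrapper: a trainable per-channel bypass scale,
    clamped so it never falls below a floor (here 0.1)."""

    def __init__(self, layer: nn.Module, d_model: int, min_scale: float = 0.1):
        super().__init__()
        self.layer = layer
        # learnable scale, one value per channel (the shape is an assumption)
        self.bypass_scale = nn.Parameter(torch.full((d_model,), 0.5))
        self.min_scale = min_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # clamping at use time enforces scale >= 0.1 while leaving the
        # underlying parameter free to be trained
        scale = self.bypass_scale.clamp(min=self.min_scale)
        y = self.layer(x)
        # scale == 0 would bypass the layer entirely; scale == 1 uses it fully
        return x + scale * (y - x)
```
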
Daniel Povey
bc9fbe2579 Bug fix 2022-10-08 21:06:09 +08:00
Daniel Povey
9023fe7151 Change the initial keep-prob back from 0.25 to 0.5 2022-10-08 20:55:15 +08:00
Daniel Povey
97a0fbe44b Make the bypass scale trainable. 2022-10-08 20:32:49 +08:00
Daniel Povey
86845bd5d8 Change warmup schedule and increase warmup_batches from 4k to 6k 2022-10-08 19:10:26 +08:00
Daniel Povey
2631f05c1f Make it start warming up from the very start, and increase warmup_batches to 6k 2022-10-08 19:09:41 +08:00
Daniel Povey
5c99e97c3b Decrease initial keep_prob to 0.25. 2022-10-08 18:35:59 +08:00
Daniel Povey
b1fa3d50fb Implement layer dropout (in a relatively efficient way) 2022-10-08 16:07:20 +08:00
Daniel Povey
af545e061b Make the warmup mask per frame. 2022-10-08 15:37:02 +08:00
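
A per-frame warmup mask, as in the commit above, might be drawn roughly as in this sketch; the function name and the assumed tensor layout (seq_len, batch, d_model) are illustrative only.

```python
import torch


def per_frame_layerdrop_mask(x: torch.Tensor, keep_prob: float) -> torch.Tensor:
    """Draw an independent keep/bypass decision per frame (time step),
    rather than one decision for the whole layer.  Assumes x has shape
    (seq_len, batch, d_model)."""
    seq_len, batch, _ = x.shape
    # mask of shape (seq_len, batch, 1): 1.0 means "keep the layer's output"
    return (torch.rand(seq_len, batch, 1, device=x.device) < keep_prob).to(x.dtype)


# usage inside an encoder layer (y is the layer's output for input x):
#   mask = per_frame_layerdrop_mask(x, keep_prob)
#   out = mask * y + (1.0 - mask) * x
```
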
Daniel Povey
6dc449da84 Remove debug print 2022-10-08 13:10:07 +08:00
Daniel Povey
71b8bfe212 Fix bug in warmup 2022-10-08 13:04:14 +08:00
Daniel Povey
606d3bd2d3 Do dropout a different way 2022-10-08 12:55:11 +08:00
Daniel Povey
fe4a7e904f Have warmup that gradually removes dropout from layers; multiply initialization scales by 0.1. 2022-10-08 12:45:22 +08:00
Daniel Povey
300da1306d Add warmup schedule where dropout disappears from earlier layers first. 2022-10-08 12:16:53 +08:00
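
One plausible shape for a warmup schedule in which dropout disappears from earlier layers first (using the initial keep-prob of 0.5 and the 6k warmup_batches mentioned in nearby commits) is sketched below; the function name and the exact ramp are assumptions, not the repository's implementation.

```python
def layer_keep_prob(layer_index: int, num_layers: int, batch_count: int,
                    warmup_batches: float = 6000.0,
                    initial_keep_prob: float = 0.5) -> float:
    """Keep-probability that ramps from `initial_keep_prob` to 1.0 over
    warmup, with earlier (lower-index) layers finishing their ramp first."""
    # each layer's warmup window ends later the deeper the layer is,
    # so layer 0 reaches keep_prob == 1.0 first
    end = warmup_batches * (layer_index + 1) / num_layers
    frac = min(1.0, batch_count / end)
    return initial_keep_prob + (1.0 - initial_keep_prob) * frac
```
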
Daniel Povey
9c1a239931 Fix issue with warmup in test time 2022-10-08 11:01:02 +08:00
Daniel Povey
97bc894f62 Implement layer dropout with probability 0.075 2022-10-07 19:01:35 +08:00
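
Layer dropout with a fixed probability of 0.075, as in the commit above, could be sketched as follows; `LayerDropEncoder` and its structure are illustrative assumptions, not the code from conformer.py.

```python
import torch
import torch.nn as nn


class LayerDropEncoder(nn.Module):
    """Stack of encoder layers in which, during training, each layer is
    skipped outright with probability `layerdrop_prob`, so no compute is
    spent on dropped layers."""

    def __init__(self, layers: nn.ModuleList, layerdrop_prob: float = 0.075):
        super().__init__()
        self.layers = layers
        self.layerdrop_prob = layerdrop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            # at test time (self.training == False) every layer runs
            if self.training and torch.rand(()).item() < self.layerdrop_prob:
                continue  # drop the whole layer: its output is just its input
            x = layer(x)
        return x
```
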
Daniel Povey
b9a95af099 Remove the feature where it was bypassing groups of layers. 2022-10-07 18:50:53 +08:00
Daniel Povey
ff4028df8e Revert initial_scale to previous values. 2022-10-07 17:19:23 +08:00
Daniel Povey
28e5f46854 Update checkpoint.py to deal with int params 2022-10-07 17:06:38 +08:00
Daniel Povey
ebf8aa129d Apply layer bypass during warmup in a new way, including groups of 2 and 4 layers. 2022-10-07 16:56:40 +08:00
Daniel Povey
314f2381e2 Don't compute validation if printing diagnostics. 2022-10-07 14:03:17 +08:00
Daniel Povey
bd325e8769 Remove debug info 2022-10-06 20:31:15 +08:00
Daniel Povey
a3179c30e7 Various fixes, finish implementing frame masking 2022-10-06 20:29:45 +08:00
Daniel Povey
e4c9786e4a Merge branch 'scaled_adam_exp27' into scaled_adam_exp69
# Conflicts:
#	egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py
2022-10-06 18:04:48 +08:00
Daniel Povey
e1d741a632 Slight code cleanup/simplification 2022-10-06 14:29:51 +08:00
Daniel Povey
99d17d13cf Merge branch 'scaled_adam_exp58' into scaled_adam_exp67 2022-10-06 14:27:12 +08:00
Daniel Povey
02eb7af824 Don't always apply the frame mask 2022-10-06 13:01:36 +08:00
Daniel Povey
0685ac792d Remove layer dropout and model-level warmup 2022-10-06 12:36:42 +08:00
Daniel Povey
537c3537c0 Remove warmup 2022-10-06 12:33:43 +08:00
Daniel Povey
bb233d3449 Add debug info 2022-10-05 23:18:50 +08:00
Daniel Povey
040592a9e3 Fix eigs call 2022-10-05 16:22:33 +08:00
Daniel Povey
1cd7e93183 Fix bug setting layerdrop mask 2022-10-05 16:19:45 +08:00
Daniel Povey
61f62837fa Fix bug regarding self.training 2022-10-05 15:34:39 +08:00
Daniel Povey
81542832bf Bug fixes 2022-10-04 22:34:24 +08:00
Daniel Povey
5fe8cb134f Remove final combination; implement layer drop that drops the final layers. 2022-10-04 22:19:44 +08:00
Daniel Povey
006fcc18cd Introduce offset in layerdrop_scales 2022-10-04 12:06:35 +08:00
Daniel Povey
33c24e4114 Bug fix 2022-10-03 23:07:30 +08:00
Daniel Povey
a9f950a1f7 Make the scaling factors more global and the randomness of dropout more random 2022-10-03 22:49:32 +08:00
Daniel Povey
96e0d92fb7 Compute valid loss on batch 0. 2022-10-03 18:24:00 +08:00
Daniel Povey
88d0da7192 Simplify the learned scaling factor on the modules 2022-10-03 17:54:56 +08:00
Daniel Povey
b3af9f67ae Implement efficient layer dropout 2022-10-03 17:19:16 +08:00
Daniel Povey
93dff29243 Introduce a scale dependent on the masking value 2022-10-03 14:34:37 +08:00
Daniel Povey
5a8995328f Stop backprop bug 2022-10-03 13:33:01 +08:00
Daniel Povey
a0a1874415 Bug fix 2022-10-03 13:23:26 +08:00
Daniel Povey
c20fc3be14 Randomize order of some modules 2022-10-03 13:02:42 +08:00
Daniel Povey
1be455438a Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change. 2022-10-02 14:00:36 +08:00