Daniel Povey | 00841f0f49 | Remove unused code LearnedScale. | 2022-10-09 16:07:31 +08:00
Daniel Povey | cf450908c6 | Revert also the changes in scaled_adam_exp85 regarding warmup schedule | 2022-10-09 14:26:32 +08:00
Daniel Povey | 40fa33d702 | Decrease initial_layerdrop_prob from 0.75 to 0.5 | 2022-10-09 13:59:56 +08:00
Daniel Povey | 44ad73c44f | For speed, drop the same num layers per job. | 2022-10-09 13:40:24 +08:00
Daniel Povey | f8f200e2b2 | Make layerdrop different in different processes. | 2022-10-09 12:25:12 +08:00
Daniel Povey | e6540865f3 | Do warmup by dropping out whole layers. | 2022-10-09 11:50:24 +08:00
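The commits above (e6540865f3 through 40fa33d702) iterate on a warmup scheme that drops whole layers with a probability that decays as training progresses. A minimal sketch of such a schedule; the log fixes initial_layerdrop_prob at 0.5 and warmup_batches at 6k, but the linear decay shape and the final_layerdrop_prob value are assumptions:

```python
def layerdrop_prob(batch_idx: int,
                   initial_layerdrop_prob: float = 0.5,
                   final_layerdrop_prob: float = 0.05,  # assumed final value
                   warmup_batches: float = 6000.0) -> float:
    # Linearly anneal the probability of dropping a whole layer
    # from its initial value down to its final value over warmup.
    t = min(batch_idx / warmup_batches, 1.0)
    return (1.0 - t) * initial_layerdrop_prob + t * final_layerdrop_prob
```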
Daniel Povey | 5255969544 | Revert "Change warmup schedule and increase warmup_batches from 4k to 6k" (reverts commit 86845bd5d859ceb6f83cd83f3719c3e6641de987) | 2022-10-09 11:30:27 +08:00
Daniel Povey | d467338837 | Limit bypass scale to >= 0.1 | 2022-10-08 21:37:21 +08:00
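Commit d467338837, together with 97a0fbe44b below, makes the residual bypass scale a trainable parameter while flooring it at 0.1 so a layer can never be bypassed entirely. A minimal sketch of that idea; only the 0.1 floor comes from the log, while the interpolation form, the per-channel shape, the 0.5 initialization, and the upper clamp are assumptions:

```python
import torch
import torch.nn as nn

class BypassModule(nn.Module):
    # Hypothetical residual bypass: out = x + scale * (layer_out - x).
    # scale == 1 keeps the layer's full output, scale == 0 would bypass
    # it completely, so the forward pass clamps scale to a minimum.
    def __init__(self, d_model: int, min_scale: float = 0.1):
        super().__init__()
        self.bypass_scale = nn.Parameter(torch.full((d_model,), 0.5))
        self.min_scale = min_scale

    def forward(self, x: torch.Tensor, layer_out: torch.Tensor) -> torch.Tensor:
        scale = self.bypass_scale.clamp(min=self.min_scale, max=1.0)
        return x + scale * (layer_out - x)
```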
Daniel Povey | bc9fbe2579 | Bug fix | 2022-10-08 21:06:09 +08:00
Daniel Povey | 9023fe7151 | Change the initial keep-prob back from 0.25 to 0.5 | 2022-10-08 20:55:15 +08:00
Daniel Povey | 97a0fbe44b | Make the bypass scale trainable. | 2022-10-08 20:32:49 +08:00
Daniel Povey | 86845bd5d8 | Change warmup schedule and increase warmup_batches from 4k to 6k | 2022-10-08 19:10:26 +08:00
Daniel Povey | 2631f05c1f | Make it start warming up from the very start, and increase warmup_batches to 6k | 2022-10-08 19:09:41 +08:00
Daniel Povey | 5c99e97c3b | Decrease initial keep_prob to 0.25. | 2022-10-08 18:35:59 +08:00
Daniel Povey | b1fa3d50fb | Implement layer dropout (in a relatively efficient way) | 2022-10-08 16:07:20 +08:00
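b1fa3d50fb's "relatively efficient" layer dropout presumably skips a dropped layer's computation altogether rather than computing it and zeroing the result; 44ad73c44f's "drop the same num layers per job" points the same way, since a whole-batch skip saves compute while a per-example mask does not. A sketch of the batch-level variant (class and argument names are illustrative; the 0.075 probability appears elsewhere in the log, in 97bc894f62):

```python
import random
import torch.nn as nn

class LayerDropStack(nn.Module):
    # Illustrative encoder stack: during training, each layer is skipped
    # outright with probability layerdrop_prob, so a dropped layer costs
    # nothing; at test time every layer runs.
    def __init__(self, layers: nn.ModuleList, layerdrop_prob: float = 0.075):
        super().__init__()
        self.layers = layers
        self.layerdrop_prob = layerdrop_prob

    def forward(self, x):
        for layer in self.layers:
            if self.training and random.random() < self.layerdrop_prob:
                continue  # whole-batch skip: no forward pass, no backward pass
            x = layer(x)
        return x
```

Because the drop decision is shared across the batch, the skip genuinely saves compute; f8f200e2b2's per-process randomization would then amount to each distributed worker drawing its own mask.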
Daniel Povey | af545e061b | Make the warmup mask per frame. | 2022-10-08 15:37:02 +08:00
Daniel Povey | 6dc449da84 | Remove debug print | 2022-10-08 13:10:07 +08:00
Daniel Povey | 71b8bfe212 | Fix bug in warmup | 2022-10-08 13:04:14 +08:00
Daniel Povey | 606d3bd2d3 | Do dropout a different way | 2022-10-08 12:55:11 +08:00
Daniel Povey | fe4a7e904f | Have warmup that gradually removes dropout from layers; multiply initialization scales by 0.1. | 2022-10-08 12:45:22 +08:00
Daniel Povey | 300da1306d | Add warmup schedule where dropout disappears from earlier layers first. | 2022-10-08 12:16:53 +08:00
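300da1306d's schedule removes dropout from earlier layers first. One way to realize that, assuming each layer's keep-probability ramps linearly to 1.0 and layer i finishes its ramp after (i+1)/num_layers of the warmup; the initial keep_prob values 0.25 and 0.5 appear in the log, while the ramp shape is an assumption:

```python
def layer_keep_prob(layer_idx: int, num_layers: int, batch_idx: int,
                    initial_keep_prob: float = 0.5,
                    warmup_batches: float = 6000.0) -> float:
    # Layer i's keep-prob reaches 1.0 at batch warmup_batches * (i+1)/num_layers,
    # so dropout disappears from the earliest layers first.
    ramp_end = warmup_batches * (layer_idx + 1) / num_layers
    t = min(batch_idx / ramp_end, 1.0)
    return initial_keep_prob + t * (1.0 - initial_keep_prob)
```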
Daniel Povey | 9c1a239931 | Fix issue with warmup at test time | 2022-10-08 11:01:02 +08:00
Daniel Povey | 97bc894f62 | Implement layer dropout with probability 0.075 | 2022-10-07 19:01:35 +08:00
Daniel Povey | b9a95af099 | Remove the feature where it was bypassing groups of layers. | 2022-10-07 18:50:53 +08:00
Daniel Povey | ff4028df8e | Revert initial_scale to previous values. | 2022-10-07 17:19:23 +08:00
Daniel Povey | ebf8aa129d | Apply layer bypass during warmup in a new way, including groups of 2 and 4 layers. | 2022-10-07 16:56:40 +08:00
Daniel Povey | bd325e8769 | Remove debug info | 2022-10-06 20:31:15 +08:00
Daniel Povey | a3179c30e7 | Various fixes; finish implementing frame masking | 2022-10-06 20:29:45 +08:00
Daniel Povey | e4c9786e4a | Merge branch 'scaled_adam_exp27' into scaled_adam_exp69 (conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py) | 2022-10-06 18:04:48 +08:00
Daniel Povey | e1d741a632 | Slight code cleanup/simplification | 2022-10-06 14:29:51 +08:00
Daniel Povey | 99d17d13cf | Merge branch 'scaled_adam_exp58' into scaled_adam_exp67 | 2022-10-06 14:27:12 +08:00
Daniel Povey | 02eb7af824 | Don't always apply the frame mask | 2022-10-06 13:01:36 +08:00
Daniel Povey | 0685ac792d | Remove layer dropout and model-level warmup | 2022-10-06 12:36:42 +08:00
Daniel Povey | 537c3537c0 | Remove warmup | 2022-10-06 12:33:43 +08:00
Daniel Povey | bb233d3449 | Add debug info | 2022-10-05 23:18:50 +08:00
Daniel Povey | 1cd7e93183 | Fix bug setting layerdrop mask | 2022-10-05 16:19:45 +08:00
Daniel Povey | 61f62837fa | Fix bug RE self.training | 2022-10-05 15:34:39 +08:00
Daniel Povey | 81542832bf | Bug fixes | 2022-10-04 22:34:24 +08:00
Daniel Povey | 5fe8cb134f | Remove final combination; implement layer drop that drops the final layers. | 2022-10-04 22:19:44 +08:00
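5fe8cb134f replaces the final combination with a layer drop that removes the last layers of the stack. A plausible sketch: truncate a random suffix of layers during training so the model stays usable at reduced depth (max_drop and the uniform choice are assumptions, not from the log):

```python
import random

def forward_with_final_layer_drop(layers, x, training: bool, max_drop: int = 2):
    # Drop a random number of layers off the end of the stack during
    # training; at test time the full stack runs.
    n_drop = random.randint(0, max_drop) if training else 0
    for layer in layers[:len(layers) - n_drop]:
        x = layer(x)
    return x
```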
Daniel Povey | 006fcc18cd | Introduce offset in layerdrop_scales | 2022-10-04 12:06:35 +08:00
Daniel Povey | 33c24e4114 | Bug fix | 2022-10-03 23:07:30 +08:00
Daniel Povey | a9f950a1f7 | Make the scaling factors more global and the randomness of dropout more random | 2022-10-03 22:49:32 +08:00
Daniel Povey | 88d0da7192 | Simplify the learned scaling factor on the modules | 2022-10-03 17:54:56 +08:00
Daniel Povey | b3af9f67ae | Implement efficient layer dropout | 2022-10-03 17:19:16 +08:00
Daniel Povey | 93dff29243 | Introduce a scale dependent on the masking value | 2022-10-03 14:34:37 +08:00
Daniel Povey | 5a8995328f | Stop backprop bug | 2022-10-03 13:33:01 +08:00
Daniel Povey | a0a1874415 | Bug fix | 2022-10-03 13:23:26 +08:00
Daniel Povey | c20fc3be14 | Randomize order of some modules | 2022-10-03 13:02:42 +08:00
Daniel Povey | 1be455438a | Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change. | 2022-10-02 14:00:36 +08:00
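1be455438a reverts feature_mask_dropout_prob from 0.2 to 0.15. The log does not spell out the masking granularity; a minimal sketch of one channel-level reading, where each feature channel is zeroed across a whole utterance with probability feature_mask_dropout_prob:

```python
import torch

def apply_feature_mask(x: torch.Tensor,
                       feature_mask_dropout_prob: float = 0.15) -> torch.Tensor:
    # x: (batch, time, channels). Zero each channel across all frames of
    # an utterance with the given probability, so the model cannot lean
    # too heavily on any single feature channel.
    keep = (torch.rand(x.size(0), 1, x.size(2), device=x.device)
            > feature_mask_dropout_prob)
    return x * keep.to(x.dtype)
```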
Daniel Povey | cf5f7e5dfd | Swap random_prob and single_prob, to reduce prob of being randomized. | 2022-10-01 23:50:38 +08:00