214 Commits

Author SHA1 Message Date
Daniel Povey  a0507a83a5  Change scalar_max in optim.py from 2.0 to 5.0  2022-10-25 22:58:07 +08:00
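The scalar_max setting caps the magnitude of scalar (0-dim) parameters during optimization; the commit above raises it from 2.0 to 5.0. A minimal sketch of such a clamp, assuming a NumPy stand-in (`clamp_scalar` is a hypothetical helper, not the repository's actual API):

```python
import numpy as np

def clamp_scalar(param: np.ndarray, scalar_max: float = 5.0) -> np.ndarray:
    # Hypothetical sketch: limit a scalar parameter's magnitude to scalar_max,
    # in the spirit of the scalar_max setting changed in this commit.
    return np.clip(param, -scalar_max, scalar_max)
```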
Daniel Povey  146626bb85  Renaming in optim.py; remove step() from scan_pessimistic_batches_for_oom in train.py  2022-10-22 17:44:21 +08:00
Daniel Povey  af0fc31c78  Introduce warmup schedule in optimizer  2022-10-22 15:15:43 +08:00
Daniel Povey  1ec9fe5c98  Make warmup period decrease scale on simple loss, leaving pruned loss scale constant.  2022-10-22 14:48:53 +08:00
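Commit 1ec9fe5c98 describes ramping down the weight on the simple loss over the warmup period while keeping the pruned-loss weight fixed. A hedged sketch of that idea (function name, scale values, and the [0, 1] warmup convention are all illustrative assumptions, not the repository's code):

```python
def combined_loss(simple_loss: float, pruned_loss: float, warmup: float,
                  simple_scale_start: float = 0.5, simple_scale_end: float = 0.1,
                  pruned_scale: float = 1.0) -> float:
    # Hypothetical: interpolate the simple-loss scale downward as warmup
    # progresses from 0.0 to 1.0, holding the pruned-loss scale constant.
    t = min(max(warmup, 0.0), 1.0)
    simple_scale = simple_scale_start + t * (simple_scale_end - simple_scale_start)
    return simple_scale * simple_loss + pruned_scale * pruned_loss
```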
Daniel Povey  efde3757c7  Reset optimizer state when we change loss function definition.  2022-10-22 14:30:18 +08:00
Daniel Povey  857b3735e7  Fix bug where fewer layers were dropped than should be; remove unnecessary print statement.  2022-10-10 13:18:40 +08:00
Daniel Povey  dece8ad204  Various fixes from debugging with nvtx, but removed the NVTX annotations.  2022-10-09 21:14:52 +08:00
Daniel Povey  bd7dce460b  Reintroduce batching to the optimizer  2022-10-09 20:29:23 +08:00
Daniel Povey  24aea947d2  Fix issues where grad is None, and unused-grad cases  2022-09-22 19:18:16 +08:00
Daniel Povey  03a77f8ae5  Merge branch 'scaled_adam_exp7c' into scaled_adam_exp11c  2022-09-22 18:15:44 +08:00
Daniel Povey  e2fdfe990c  Loosen limit on param_max_rms, from 2.0 to 3.0; change how param_min_rms is applied.  2022-09-20 15:20:43 +08:00
Daniel Povey  3d72a65de8  Implement max-eig-proportion.  2022-09-19 10:26:37 +08:00
Daniel Povey  69404f61ef  Use scalar_lr_scale for scalars as well as sizes.  2022-09-18 14:12:27 +08:00
Daniel Povey  bb1bee4a7b  Improve how quartiles are printed  2022-09-16 17:30:03 +08:00
Daniel Povey  8298333bd2  Implement gradient clipping.  2022-09-16 16:52:46 +08:00
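Commit 8298333bd2 adds gradient clipping (the preceding commit 8f876b3f54 removed batching in preparation for gradient-norm clipping specifically). A generic sketch of global gradient-norm clipping, illustrative only and not the repository's implementation:

```python
import numpy as np

def clip_grad_norm(grads, max_norm: float):
    # Sketch: compute the combined L2 norm of all gradients and, if it
    # exceeds max_norm, rescale every gradient by max_norm / total_norm.
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm
```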
Daniel Povey  8f876b3f54  Remove batching from ScaledAdam, in preparation to add gradient norm clipping  2022-09-16 15:42:56 +08:00
Daniel Povey  257c961b66  1st attempt at scaled_adam  2022-09-16 13:59:52 +08:00
Daniel Povey  276928655e  Merge branch 'pradam_exp1m8' into pradam_exp1m7s2  2022-08-24 04:17:30 +08:00
Daniel Povey  64f7166545  Some cleanups  2022-08-18 07:03:50 +08:00
Daniel Povey  5c33899ddc  Increase cov_min[3] from 0.001 to 0.002  2022-08-06 16:28:02 +08:00
Daniel Povey  9bbf8ada57  Scale up diag of grad_cov by 1.0001 prior to diagonalizing it.  2022-08-06 07:06:23 +08:00
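Commit 9bbf8ada57 multiplies the diagonal of the gradient covariance by a factor slightly above 1 before diagonalizing it, a standard conditioning trick that keeps the eigendecomposition well behaved. A sketch under that reading (the helper name is hypothetical; only the 1.0001 factor comes from the commit message):

```python
import numpy as np

def diagonalize_with_jitter(cov: np.ndarray, diag_scale: float = 1.0001):
    # Scale the diagonal by diag_scale (> 1) before the symmetric
    # eigendecomposition, mildly boosting eigenvalues for stability.
    cov = cov.copy()
    idx = np.arange(cov.shape[0])
    cov[idx, idx] *= diag_scale
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals, eigvecs
```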
Daniel Povey  c021b4fec6  Increase cov_min[3] from 0.0001 to 0.001  2022-08-06 07:02:52 +08:00
Daniel Povey  a5b9b7b974  Cosmetic changes  2022-08-05 03:51:00 +08:00
Daniel Povey  dc9133227f  Reworked how inverse is done, fixed bug in _apply_min_max_with_metric, regarding how M is normalized.  2022-08-04 09:46:14 +08:00
Daniel Povey  766bf69a98  Reduce cov_max[2] from 4.0 to 3.5  2022-08-03 04:10:11 +08:00
Daniel Povey  129b28aa9b  Increase cov_min[2] from 0.05 to 0.1; decrease cov_max[2] from 5.0 to 4.0.  2022-08-02 15:17:24 +08:00
Daniel Povey  202752418a  Increase cov_min[2] from 0.02 to 0.05.  2022-08-02 15:15:41 +08:00
Daniel Povey  e44ab25e99  Bug fix  2022-08-02 14:31:37 +08:00
Daniel Povey  e9f4ada1c0  Swap the order of applying min and max in smoothing operations  2022-08-02 11:55:43 +08:00
Daniel Povey  9473c7e23d  Lots of changes to how min and max are applied, use 1-norm for min in smooth_cov but not _apply_min_max_with_metric.  2022-08-02 11:29:54 +08:00
Daniel Povey  6ab4cf615d  1st draft of new method of normalizing covs that uses normalization w.r.t. spectral 2-norm  2022-08-02 09:34:37 +08:00
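Commit 6ab4cf615d introduces normalizing covariance matrices with respect to the spectral 2-norm. For a symmetric PSD matrix the spectral 2-norm is its largest eigenvalue, so one plausible reading is rescaling the covariance so that norm equals 1. A hedged sketch of that interpretation (helper name and eps floor are assumptions):

```python
import numpy as np

def normalize_by_spectral_norm(cov: np.ndarray) -> np.ndarray:
    # For a symmetric PSD covariance, the spectral 2-norm is the largest
    # eigenvalue; divide by it so the normalized matrix has 2-norm 1.
    top_eig = np.linalg.eigvalsh(cov)[-1]  # eigvalsh returns ascending eigenvalues
    return cov / max(top_eig, 1e-20)       # eps floor guards against zero matrices
```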
Daniel Povey  4919134a94  Merge making hidden_dim an arg  2022-08-02 09:09:29 +08:00
Daniel Povey  c64bd5ebcd  Merge making hidden_dim an arg  2022-08-02 09:07:36 +08:00
Daniel Povey  b008340d83  Merge making hidden_dim an arg  2022-08-02 09:01:19 +08:00
Daniel Povey  9f2229edb5  Merge making hidden_dim an arg  2022-08-02 08:58:00 +08:00
Daniel Povey  a45f820e25  Merge making hidden_dim an arg  2022-08-02 08:56:36 +08:00
Daniel Povey  804f264ffd  Merge hidden_dim providing it as arg  2022-08-02 08:40:13 +08:00
Daniel Povey  ee311247ea  Decrease debugging freq  2022-08-01 03:55:18 +08:00
Daniel Povey  4c5d49c448  Some numerical improvements, and a fix to calculation of mean_eig in _apply_min_max_with_metric(), to average over blocks too.  2022-08-01 03:51:39 +08:00
Daniel Povey  e2cc09a8c6  Fix issue with max_eig formula; restore cov_min[1]=0.0025.  2022-07-31 18:29:44 +08:00
Daniel Povey  3590c2fc42  Set cov_min[1] to 0 to stop an invertibility problem  2022-07-31 18:06:01 +08:00
Daniel Povey  7231c610e8  Restore min_cov applied with G.  2022-07-31 02:22:07 -07:00
Daniel Povey  d84a2e22e3  Applying max to G with noinv method with metric.  2022-07-31 02:10:27 -07:00
Daniel Povey  2042c9862c  Merge branch 'pradam_exp1m4_nophase1_noinv' into pradam_exp1m4_nophase1_rework_noinv  2022-07-31 01:32:36 -07:00
Daniel Povey  90fa8a63eb  Use different approach for applying max eig, with matmul, no inverse.  2022-07-31 01:32:11 -07:00
Daniel Povey  ed1a147ef1  Implement no-inverse max-cov  2022-07-31 00:08:02 -07:00
Daniel Povey  0666789cb8  Small numerical improvements; config change of eps and G_diag changed 1.01 to 1.005; decrease an eps from 1e-10 to 1e-20  2022-07-30 21:48:54 -07:00
Daniel Povey  cb67540cdc  this version not working great  2022-07-30 21:14:03 -07:00
Daniel Povey  790e8c4ba9  Changes that should not really affect the results, just cleanup.  2022-07-30 19:20:36 -07:00
Daniel Povey  5184ac570d  Removing phase1, adding regular smoothing with the mean.  2022-07-30 19:15:51 -07:00