Daniel Povey
|
096ebeaf23
|
take a couple files from liyong's branch
|
2023-01-05 12:01:42 +08:00 |
|
Daniel Povey
|
8056e0f9af
|
Make sure param_rms limit is effectively applied; fix tests in optim.py
|
2022-12-29 23:55:16 +08:00 |
|
Daniel Povey
|
28cac1c2dc
|
Merge debugging changes to optimizer.
|
2022-12-20 13:01:50 +08:00 |
|
Daniel Povey
|
bf37c7ca85
|
Regularize how we apply the min and max to the eps of BasicNorm
|
2022-10-26 12:51:20 +08:00 |
|
Daniel Povey
|
a0507a83a5
|
Change scalar_max in optim.py from 2.0 to 5.0
|
2022-10-25 22:58:07 +08:00 |
|
Daniel Povey
|
146626bb85
|
Renaming in optim.py; remove step() from scan_pessimistic_batches_for_oom in train.py
|
2022-10-22 17:44:21 +08:00 |
|
Daniel Povey
|
af0fc31c78
|
Introduce warmup schedule in optimizer
|
2022-10-22 15:15:43 +08:00 |
|
Daniel Povey
|
1ec9fe5c98
|
Make warmup period decrease scale on simple loss, leaving pruned loss scale constant.
|
2022-10-22 14:48:53 +08:00 |
|
Daniel Povey
|
efde3757c7
|
Reset optimizer state when we change loss function definition.
|
2022-10-22 14:30:18 +08:00 |
|
Daniel Povey
|
857b3735e7
|
Fix bug where fewer layers were dropped than should be; remove unnecessary print statement.
|
2022-10-10 13:18:40 +08:00 |
|
Daniel Povey
|
dece8ad204
|
Various fixes from debugging with nvtx, but removed the NVTX annotations.
|
2022-10-09 21:14:52 +08:00 |
|
Daniel Povey
|
bd7dce460b
|
Reintroduce batching to the optimizer
|
2022-10-09 20:29:23 +08:00 |
|
Daniel Povey
|
24aea947d2
|
Fix issues where grad is None, and unused-grad cases
|
2022-09-22 19:18:16 +08:00 |
|
Daniel Povey
|
03a77f8ae5
|
Merge branch 'scaled_adam_exp7c' into scaled_adam_exp11c
|
2022-09-22 18:15:44 +08:00 |
|
Daniel Povey
|
e2fdfe990c
|
Loosen limit on param_max_rms, from 2.0 to 3.0; change how param_min_rms is applied.
|
2022-09-20 15:20:43 +08:00 |
|
Daniel Povey
|
3d72a65de8
|
Implement max-eig-proportion.
|
2022-09-19 10:26:37 +08:00 |
|
Daniel Povey
|
69404f61ef
|
Use scalar_lr_scale for scalars as well as sizes.
|
2022-09-18 14:12:27 +08:00 |
|
Daniel Povey
|
bb1bee4a7b
|
Improve how quartiles are printed
|
2022-09-16 17:30:03 +08:00 |
|
Daniel Povey
|
8298333bd2
|
Implement gradient clipping.
|
2022-09-16 16:52:46 +08:00 |
|
Daniel Povey
|
8f876b3f54
|
Remove batching from ScaledAdam, in preparation to add gradient norm clipping
|
2022-09-16 15:42:56 +08:00 |
|
Daniel Povey
|
257c961b66
|
1st attempt at scaled_adam
|
2022-09-16 13:59:52 +08:00 |
|
Daniel Povey
|
276928655e
|
Merge branch 'pradam_exp1m8' into pradam_exp1m7s2
|
2022-08-24 04:17:30 +08:00 |
|
Daniel Povey
|
64f7166545
|
Some cleanups
|
2022-08-18 07:03:50 +08:00 |
|
Daniel Povey
|
5c33899ddc
|
Increase cov_min[3] from 0.001 to 0.002
|
2022-08-06 16:28:02 +08:00 |
|
Daniel Povey
|
9bbf8ada57
|
Scale up diag of grad_cov by 1.0001 prior to diagonalizing it.
|
2022-08-06 07:06:23 +08:00 |
|
Daniel Povey
|
c021b4fec6
|
Increase cov_min[3] from 0.0001 to 0.001
|
2022-08-06 07:02:52 +08:00 |
|
Daniel Povey
|
a5b9b7b974
|
Cosmetic changes
|
2022-08-05 03:51:00 +08:00 |
|
Daniel Povey
|
dc9133227f
|
Reworked how inverse is done, fixed bug in _apply_min_max_with_metric, regarding how M is normalized.
|
2022-08-04 09:46:14 +08:00 |
|
Daniel Povey
|
766bf69a98
|
Reduce cov_max[2] from 4.0 to 3.5
|
2022-08-03 04:10:11 +08:00 |
|
Daniel Povey
|
129b28aa9b
|
Increase cov_min[2] from 0.05 to 0.1; decrease cov_max[2] from 5.0 to 4.0.
|
2022-08-02 15:17:24 +08:00 |
|
Daniel Povey
|
202752418a
|
Increase cov_min[2] from 0.02 to 0.05.
|
2022-08-02 15:15:41 +08:00 |
|
Daniel Povey
|
e44ab25e99
|
Bug fix
|
2022-08-02 14:31:37 +08:00 |
|
Daniel Povey
|
e9f4ada1c0
|
Swap the order of applying min and max in smoothing operations
|
2022-08-02 11:55:43 +08:00 |
|
Daniel Povey
|
9473c7e23d
|
Lots of changes to how min and max are applied, use 1-norm for min in smooth_cov but not _apply_min_max_with_metric.
|
2022-08-02 11:29:54 +08:00 |
|
Daniel Povey
|
6ab4cf615d
|
1st draft of new method of normalizing covs that uses normalization w.r.t. spectral 2-norm
|
2022-08-02 09:34:37 +08:00 |
|
Daniel Povey
|
4919134a94
|
Merge making hidden_dim an arg
|
2022-08-02 09:09:29 +08:00 |
|
Daniel Povey
|
c64bd5ebcd
|
Merge making hidden_dim an arg
|
2022-08-02 09:07:36 +08:00 |
|
Daniel Povey
|
b008340d83
|
Merge making hidden_dim an arg
|
2022-08-02 09:01:19 +08:00 |
|
Daniel Povey
|
9f2229edb5
|
Merge making hidden_dim an arg
|
2022-08-02 08:58:00 +08:00 |
|
Daniel Povey
|
a45f820e25
|
Merge making hidden_dim an arg
|
2022-08-02 08:56:36 +08:00 |
|
Daniel Povey
|
804f264ffd
|
Merge hidden_dim providing it as arg
|
2022-08-02 08:40:13 +08:00 |
|
Daniel Povey
|
ee311247ea
|
Decrease debugging freq
|
2022-08-01 03:55:18 +08:00 |
|
Daniel Povey
|
4c5d49c448
|
Some numerical improvements, and a fix to calculation of mean_eig in _apply_min_max_with_metric(), to average over blocks too.
|
2022-08-01 03:51:39 +08:00 |
|
Daniel Povey
|
e2cc09a8c6
|
Fix issue with max_eig formula; restore cov_min[1]=0.0025.
|
2022-07-31 18:29:44 +08:00 |
|
Daniel Povey
|
3590c2fc42
|
Set cov_min[1] to 0 to stop an invertibility problem
|
2022-07-31 18:06:01 +08:00 |
|
Daniel Povey
|
7231c610e8
|
Restore min_cov applied with G.
|
2022-07-31 02:22:07 -07:00 |
|
Daniel Povey
|
d84a2e22e3
|
Applying max to G with noinv method with metric.
|
2022-07-31 02:10:27 -07:00 |
|
Daniel Povey
|
2042c9862c
|
Merge branch 'pradam_exp1m4_nophase1_noinv' into pradam_exp1m4_nophase1_rework_noinv
|
2022-07-31 01:32:36 -07:00 |
|
Daniel Povey
|
90fa8a63eb
|
Use different approach for applying max eig, with matmul, no inverse.
|
2022-07-31 01:32:11 -07:00 |
|
Daniel Povey
|
ed1a147ef1
|
Implement no-inverse max-cov
|
2022-07-31 00:08:02 -07:00 |
|