748 Commits

Author SHA1 Message Date
Daniel Povey
de1fd91435 Adding max_abs=3.0 to ActivationBalancer modules inside feedforward modules. 2022-07-16 07:19:26 +08:00
Daniel Povey
23e6d2e6d8 Fix to the fix 2022-07-16 06:53:44 +08:00
Daniel Povey
4c8d77d14a Fix return type 2022-07-15 14:18:07 +08:00
Daniel Povey
68c5935691 Fix bug re param_cov freshness, properly. 2022-07-15 08:33:10 +08:00
Daniel Povey
b6ee698278 Make LR update period less frequent later in training; fix bug with param_cov freshness (it was too fresh) 2022-07-15 07:59:30 +08:00
Daniel Povey
689441b237 Reduce param_pow from 0.75 to 0.5 2022-07-14 06:08:06 +08:00
Daniel Povey
7f6fe02db9 Fix formula for smoothing (was applying more smoothing than intended, and in the opposite sense to intended), also revert max_rms from 2.0 to 4.0 2022-07-14 06:06:02 +08:00
Daniel Povey
4785245e5c Reduce debug freq 2022-07-13 06:51:23 +08:00
Daniel Povey
d48fe0b99c Change max rms from 10.0 to 4.0 2022-07-13 05:53:35 +08:00
Daniel Povey
cedfb5a377 Make max eig ratio 10 2022-07-12 13:59:58 +08:00
Daniel Povey
278358bb9f Remove debug code 2022-07-12 08:39:14 +08:00
Daniel Povey
8c44ff26f7 Fix bug in batching code for scalars 2022-07-12 08:36:45 +08:00
Daniel Povey
25cb8308d5 Add max_block_size=512 to PrAdam 2022-07-12 08:35:14 +08:00
Daniel Povey
41df045773 Simplify formula, getting rid of scalar_exp_avg_sq 2022-07-11 17:14:12 -07:00
Daniel Povey
4f0e219523 Bug fix to reproduce past results with max_block_size unset. 2022-07-11 17:03:32 -07:00
Daniel Povey
075a2e27d8 Replace max_fullcov_size with max_block_size 2022-07-11 16:37:01 -07:00
Daniel Povey
3468c3aa5a Remove ActivationBalancer, unnecessary 2022-07-11 14:12:24 -07:00
Daniel Povey
7993c84cd6 Apparently working version, with changed test-code topology 2022-07-11 13:17:29 -07:00
Daniel Povey
245d39b1bb Still debugging but close to done 2022-07-11 00:33:37 -07:00
Daniel Povey
27da50a1f6 Committing partial work.. 2022-07-10 15:46:32 -07:00
Daniel Povey
d25df4af5e Slight refactoring, preparing for batching. 2022-07-09 22:24:36 -07:00
Daniel Povey
d9a6180ae0 Bug fix 2022-07-10 10:20:39 +08:00
Daniel Povey
b7035844a2 Introduce scalar_max; stop eps from getting too large or too small 2022-07-10 10:13:55 +08:00
Daniel Povey
2f73434541 Reduce debug frequency 2022-07-10 06:44:50 +08:00
Daniel Povey
b3bb2dac6f Iterative, more principled way of estimating param_cov 2022-07-10 06:28:01 +08:00
Daniel Povey
d139c18f22 Max eig of Q limited to 5 times the mean 2022-07-09 14:30:03 +08:00
Daniel Povey
ffeef4ede4 Remove rank-1 dims (i.e., where size == numel()) from processing. 2022-07-09 13:36:48 +08:00
Daniel Povey
2fc9eb9789 Respect param_pow 2022-07-09 12:49:04 +08:00
Daniel Povey
209acaf6e4 Increase lr_update_period to 200. The update takes about 2 minutes for the entire model. 2022-07-09 11:36:54 +08:00
Daniel Povey
61cab3ab65 Introduce grad_cov_period 2022-07-09 10:29:23 +08:00
Daniel Povey
35a51bc153 Reduce debug probs 2022-07-09 10:22:19 +08:00
Daniel Povey
65bc964854 Fix bug for scalar update 2022-07-09 10:14:20 +08:00
Daniel Povey
aa2237a793 Bug fix 2022-07-09 10:11:54 +08:00
Daniel Povey
50ee414486 Fix train.py for new optimizer 2022-07-09 10:09:53 +08:00
Daniel Povey
6810849058 Implement new version of learning method. Does more complete diagonalization of grads than the previous methods. 2022-07-09 10:02:17 +08:00
Daniel Povey
a9edecd32c Confirmed that symmetrizing helps because of interaction with the regular update; meta_lr_scale=0 is still best :-( 2022-07-09 05:20:04 +08:00
Daniel Povey
52bfb2b018 This works better for reasons I don't understand; transposing is enough, same as symmetrizing. 2022-07-08 11:53:59 +08:00
Daniel Povey
e9ab1ddd39 Inconsequential config change 2022-07-08 11:03:16 +08:00
Daniel Povey
be6680e3ba A couple of configuration changes; comment simplification 2022-07-08 09:46:42 +08:00
Daniel Povey
75e872ea57 Fix bug in getting denom in proj update 2022-07-08 09:13:54 +08:00
Daniel Povey
914ac1e621 Works better with meta_lr_scale=0; must be a bug. 2022-07-08 09:07:06 +08:00
Daniel Povey
923468b8af Deal with SVD failure better. 2022-07-08 09:00:12 +08:00
Daniel Povey
97feb8a3ec Reduce meta_lr_scale; reduces loss @140 from 1.4 to 0.39 2022-07-08 06:33:07 +08:00
Daniel Povey
b6199a71e9 Introduce delta_scale to slow down changes to M; significantly better. 2022-07-08 06:05:31 +08:00
Daniel Povey
ceb9815f2b Increase lr_est_period 2022-07-08 05:51:18 +08:00
Daniel Povey
fb36712e6b Another bug fix, regarding Q being transposed. 2022-07-08 05:22:24 +08:00
Daniel Povey
ad2e698fc3 Cleanups 2022-07-08 04:44:21 +08:00
Daniel Povey
04d2e10b4f Version that runs 2022-07-08 04:37:46 +08:00
Daniel Povey
e6d00ee3e4 More drafts of new method, not tested. 2022-07-06 23:05:06 -07:00
Daniel Povey
26815d177f Draft of the new method.. 2022-07-06 22:59:36 -07:00