722 Commits

Author        SHA1        Message                                                                 Date

Daniel Povey  3acdf3b395  Reworking the computation of Z to be numerically better.                2022-07-25 06:37:26 +08:00
Daniel Povey  5513f7fee5  Initial version of fixing numerical issue, will continue though         2022-07-25 06:27:01 +08:00
Daniel Povey  b0f0c6c4ab  Setting lr_update_period=(200,4k) in train.py                           2022-07-25 04:38:12 +08:00
Daniel Povey  06718052ec  Refactoring, putting tunable values in constructor, a little cleanup    2022-07-25 04:31:42 +08:00
Daniel Povey  8efc512823  Remove some debugging code, found the mismatch                          2022-07-24 11:52:10 +08:00
Daniel Povey  ba96439c76  Saving version I am trying to debug                                     2022-07-24 11:00:40 +08:00
Daniel Povey  962e95f119  Using a more flexible test. Moved to simpler update, tuned differently. 2022-07-24 09:20:53 +08:00
Daniel Povey  b8a9485011  Print git version for test output                                       2022-07-24 06:54:29 +08:00
Daniel Povey  48ac7e0bc3  Add max as well as min to G_prime                                       2022-07-24 06:50:05 +08:00
Daniel Povey  6290fcb535  Cleanup and refactoring                                                 2022-07-24 05:48:38 +08:00
Daniel Povey  8a9bbb93bc  Cosmetic fixes                                                          2022-07-24 04:45:57 +08:00
Daniel Povey  966ac36cde  Fixes to comments                                                       2022-07-24 04:36:41 +08:00
Daniel Povey  33ffd17515  Some cleanup                                                            2022-07-24 04:22:11 +08:00
Daniel Povey  ddceb7963b  Interpolate between iterative estimate of scale, and original value.    2022-07-23 15:27:48 +08:00
Daniel Povey  2c4bdd0ad0  Add _update_param_scales_simple(), add documentation                    2022-07-23 14:49:58 +08:00
Daniel Povey  9730352257  Reduce smoothing constant slightly                                      2022-07-23 13:12:31 +08:00
Daniel Povey  e1873fc0bb  Tune phase2 again, from 0.005,5.0 to 0.01,40. Epoch 140 is 0.21/0.149   2022-07-23 10:10:01 +08:00
Daniel Povey  0fc58bac56  More tuning, epoch-140 results are 0.23,0.11                            2022-07-23 09:52:51 +08:00
Daniel Povey  34a2d331bf  Smooth in opposite orientation to G                                     2022-07-23 09:38:16 +08:00
Daniel Povey  a972655a70  Tuning.                                                                 2022-07-23 09:15:49 +08:00
Daniel Povey  b47433b77a  Fix bug in smooth_cov, for power==1.0                                   2022-07-23 09:06:03 +08:00
Daniel Povey  cc388675a9  Bug fix RE rankj                                                        2022-07-23 08:24:59 +08:00
Daniel Povey  dee496145d  This version performs way worse but has bugs fixed, can optimize from here. 2022-07-23 08:11:20 +08:00
Daniel Povey  dd10eb140f  First version after refactoring and changing the math, where optim.py runs  2022-07-23 06:32:56 +08:00
Daniel Povey  4da4e69fba  Draft of new way of smoothing param_rms, diagonalized by grad           2022-07-22 06:37:20 +08:00
Daniel Povey  a63afe348a  Increase max_lr_factor from 3.0 to 4.0                                  2022-07-19 06:56:41 +08:00
Daniel Povey  79a2f09f62  Change how formula for max_lr_factor works, and increase factor from 2.5 to 3. 2022-07-19 06:54:49 +08:00
Daniel Povey  525c097130  Increase power from 0.7 to 0.75                                         2022-07-19 05:44:03 +08:00
Daniel Povey  2dff1161b4  Reduce max_lr_factor from 3.0 to 2.5                                    2022-07-19 05:15:03 +08:00
Daniel Povey  8bb44b2944  Change param_pow from 0.6 to 0.7                                        2022-07-19 05:08:32 +08:00
Daniel Povey  bb1e1e154a  Increasing param_pow to 0.6 and decreasing max_lr_factor from 4 to 3.   2022-07-18 09:06:32 +08:00
Daniel Povey  8db3b48edb  Update parameter-dependent part of cov more slowly, plus bug fix.       2022-07-18 05:26:55 +08:00
Daniel Povey  198cf2635c  Reduce param_pow from 0.5 to 0.4.                                       2022-07-17 15:35:07 +08:00
Daniel Povey  3857a87b47  Merge branch 'merge_refactor_param_cov_norank1_iter_batch_max4.0_pow0.5_fix2r_lrupdate200_2k_ns' into merge2_refactor_max4.0_pow0.5_200_1k_ma3.0  2022-07-17 15:32:43 +08:00
Daniel Povey  a572eb4e33  Reducing final lr_update_period from 2k to 1k                           2022-07-17 12:56:02 +08:00
Daniel Povey  f36ebad618  Remove 2/3 StructuredLinear/StructuredConv1d modules, use linear/conv1d 2022-07-17 06:40:19 +08:00
Daniel Povey  7e88e2a0e9  Increase debug freq; add type to diagnostics and increase precision of mean,rms 2022-07-17 06:40:16 +08:00
Daniel Povey  de1fd91435  Adding max_abs=3.0 to ActivationBalancer modules inside feedforward modules. 2022-07-16 07:19:26 +08:00
Daniel Povey  23e6d2e6d8  Fix to the fix                                                          2022-07-16 06:53:44 +08:00
Daniel Povey  4c8d77d14a  Fix return type                                                         2022-07-15 14:18:07 +08:00
Daniel Povey  68c5935691  Fix bug re param_cov freshness, properly.                               2022-07-15 08:33:10 +08:00
Daniel Povey  b6ee698278  Make LR update period less frequent later in training; fix bug with param_cov freshness, was too fresh 2022-07-15 07:59:30 +08:00
Daniel Povey  689441b237  Reduce param_pow from 0.75 to 0.5                                       2022-07-14 06:08:06 +08:00
Daniel Povey  7f6fe02db9  Fix formula for smoothing (was applying more smoothing than intended, and in the opposite sense to intended); also revert max_rms from 2.0 to 4.0 2022-07-14 06:06:02 +08:00
Daniel Povey  4785245e5c  Reduce debug freq                                                       2022-07-13 06:51:23 +08:00
Daniel Povey  d48fe0b99c  Change max rms from 10.0 to 4.0                                         2022-07-13 05:53:35 +08:00
Daniel Povey  cedfb5a377  Make max eig ratio 10                                                   2022-07-12 13:59:58 +08:00
Daniel Povey  278358bb9f  Remove debug code                                                       2022-07-12 08:39:14 +08:00
Daniel Povey  8c44ff26f7  Fix bug in batching code for scalars                                    2022-07-12 08:36:45 +08:00
Daniel Povey  25cb8308d5  Add max_block_size=512 to PrAdam                                        2022-07-12 08:35:14 +08:00
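
For reference, a handful of tunables recur across these messages (param_pow, max_lr_factor, lr_update_period, max_rms, max_block_size). The sketch below collects them into one config object, with defaults set to the most recent values the commit messages mention; the class name and structure are purely illustrative and are not the actual PrAdam constructor signature in optim.py.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PrAdamTunables:
    """Hypothetical grouping of tunables named in the commit log.

    Each default is the last value a commit message mentions for
    that knob; the field names mirror the identifiers in the log.
    """
    param_pow: float = 0.75                        # "Increase power from 0.7 to 0.75"
    max_lr_factor: float = 4.0                     # "Increase max_lr_factor from 3.0 to 4.0"
    lr_update_period: Tuple[int, int] = (200, 4000)  # "lr_update_period=(200,4k)"
    max_rms: float = 4.0                           # "revert max_rms from 2.0 to 4.0"
    max_block_size: int = 512                      # "Add max_block_size=512 to PrAdam"


# Example: instantiate with defaults and inspect one knob.
cfg = PrAdamTunables()
print(cfg.lr_update_period)  # (200, 4000)
```

Keeping knobs like these in a single constructor-level config (rather than scattered constants) matches the intent of commit 06718052ec, "putting tunable values in constructor".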