Daniel Povey | 1d20c12bc0 | Increase max_var_per_eig to 0.2 | 2022-09-22 12:28:35 +08:00
Daniel Povey | 6eb9a0bc9b | Halve max_var_per_eig to 0.05 | 2022-09-20 14:39:17 +08:00
Daniel Povey | cd5ac76a05 | Add max-var-per-eig in encoder layers | 2022-09-20 14:22:07 +08:00
Daniel Povey | db1f4ccdd1 | 4x scale on max-eig constraint | 2022-09-20 14:20:13 +08:00
Daniel Povey | 3d72a65de8 | Implement max-eig-proportion.. | 2022-09-19 10:26:37 +08:00
Daniel Povey | 5f27cbdb44 | Merge branch 'scaled_adam_exp4_max_var_per_eig' into scaled_adam_exp7 (Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py) | 2022-09-18 21:23:59 +08:00
Daniel Povey | 0f567e27a5 | Add max_var_per_eig in self-attn | 2022-09-18 21:22:01 +08:00
Daniel Povey | eb77fa7aaa | Restore min_positive,max_positive limits on linear_pos projection | 2022-09-18 14:38:30 +08:00
Daniel Povey | 69404f61ef | Use scalar_lr_scale for scalars as well as sizes. | 2022-09-18 14:12:27 +08:00
Daniel Povey | 76031a7c1d | Loosen some limits of activation balancers | 2022-09-18 13:59:44 +08:00
Daniel Povey | 3122637266 | Use ScaledLinear where I previously had StructuredLinear | 2022-09-17 13:18:58 +08:00
Daniel Povey | 4a2b940321 | Remove StructuredLinear,StructuredConv1d | 2022-09-17 13:14:08 +08:00
Daniel Povey | 1a184596b6 | A little code refactoring | 2022-09-16 20:56:21 +08:00
Daniel Povey | bb1bee4a7b | Improve how quartiles are printed | 2022-09-16 17:30:03 +08:00
Daniel Povey | 5f55f80fbb | Configure train.py with clipping_scale=2.0 | 2022-09-16 17:19:52 +08:00
Daniel Povey | 8298333bd2 | Implement gradient clipping. | 2022-09-16 16:52:46 +08:00
Daniel Povey | 8f876b3f54 | Remove batching from ScaledAdam, in preparation to add gradient norm clipping | 2022-09-16 15:42:56 +08:00
Daniel Povey | 3b450c2682 | Bug fix in train.py, fix optimizer name | 2022-09-16 14:10:42 +08:00
Daniel Povey | 257c961b66 | 1st attempt at scaled_adam | 2022-09-16 13:59:52 +08:00
Daniel Povey | 276928655e | Merge branch 'pradam_exp1m8' into pradam_exp1m7s2 | 2022-08-24 04:17:30 +08:00
Daniel Povey | 80beb9c8d7 | Merge branch 'pradam_exp1n2' into pradam_exp1m7s2 | 2022-08-24 04:14:25 +08:00
Daniel Povey | 64f7166545 | Some cleanups | 2022-08-18 07:03:50 +08:00
Daniel Povey | 5c33899ddc | Increase cov_min[3] from 0.001 to 0.002 | 2022-08-06 16:28:02 +08:00
Daniel Povey | 9bbf8ada57 | Scale up diag of grad_cov by 1.0001 prior to diagonalizing it. | 2022-08-06 07:06:23 +08:00
Daniel Povey | c021b4fec6 | Increase cov_min[3] from 0.0001 to 0.001 | 2022-08-06 07:02:52 +08:00
Daniel Povey | a5b9b7b974 | Cosmetic changes | 2022-08-05 03:51:00 +08:00
Daniel Povey | dc9133227f | Reworked how inverse is done, fixed bug in _apply_min_max_with_metric, regarding how M is normalized. | 2022-08-04 09:46:14 +08:00
Daniel Povey | 766bf69a98 | Reduce cov_max[2] from 4.0 to 3.5 | 2022-08-03 04:10:11 +08:00
Daniel Povey | 129b28aa9b | Increase cov_min[2] from 0.05 to 0.1; decrease cov_max[2] from 5.0 to 4.0. | 2022-08-02 15:17:24 +08:00
Daniel Povey | 202752418a | Increase cov_min[2] from 0.02 to 0.05. | 2022-08-02 15:15:41 +08:00
Daniel Povey | e44ab25e99 | Bug fix | 2022-08-02 14:31:37 +08:00
Daniel Povey | e9f4ada1c0 | Swap the order of applying min and max in smoothing operations | 2022-08-02 11:55:43 +08:00
Daniel Povey | 9473c7e23d | Lots of changes to how min and max are applied, use 1-norm for min in smooth_cov but not _apply_min_max_with_metric. | 2022-08-02 11:29:54 +08:00
Daniel Povey | 6ab4cf615d | 1st draft of new method of normalizing covs that uses normalization w.r.t. spectral 2-norm | 2022-08-02 09:34:37 +08:00
Daniel Povey | 4919134a94 | Merge making hidden_dim an arg | 2022-08-02 09:09:29 +08:00
Daniel Povey | c64bd5ebcd | Merge making hidden_dim an arg | 2022-08-02 09:07:36 +08:00
Daniel Povey | b008340d83 | Merge making hidden_dim an arg | 2022-08-02 09:01:19 +08:00
Daniel Povey | 9f2229edb5 | Merge making hidden_dim an arg | 2022-08-02 08:58:00 +08:00
Daniel Povey | a45f820e25 | Merge making hidden_dim an arg | 2022-08-02 08:56:36 +08:00
Daniel Povey | 6714f85cc4 | Merge making hidden_dim an arg | 2022-08-02 08:55:27 +08:00
Daniel Povey | 804f264ffd | Merge hidden_dim providing it as arg | 2022-08-02 08:40:13 +08:00
Daniel Povey | ee311247ea | Decrease debugging freq | 2022-08-01 03:55:18 +08:00
Daniel Povey | 4c5d49c448 | Some numerical improvements, and a fix to calculation of mean_eig in _apply_min_max_with_metric(), to average over blocks too. | 2022-08-01 03:51:39 +08:00
Daniel Povey | e2cc09a8c6 | Fix issue with max_eig formula; restore cov_min[1]=0.0025. | 2022-07-31 18:29:44 +08:00
Daniel Povey | 3590c2fc42 | Set cov_min[1] to 0 to stop an invertibility problem | 2022-07-31 18:06:01 +08:00
Daniel Povey | 7231c610e8 | Restore min_cov applied with G. | 2022-07-31 02:22:07 -07:00
Daniel Povey | d84a2e22e3 | Applying max to G with noinv method with metric. | 2022-07-31 02:10:27 -07:00
Daniel Povey | 2042c9862c | Merge branch 'pradam_exp1m4_nophase1_noinv' into pradam_exp1m4_nophase1_rework_noinv | 2022-07-31 01:32:36 -07:00
Daniel Povey | 90fa8a63eb | Use different approach for applying max eig, with matmul, no inverse. | 2022-07-31 01:32:11 -07:00
Daniel Povey | ed1a147ef1 | Implement no-inverse max-cov | 2022-07-31 00:08:02 -07:00