874 Commits

Author SHA1 Message Date
Daniel Povey
d398f0ed70 Decrease random_prob from 0.5 to 0.333 2022-09-29 13:55:33 +08:00
Daniel Povey
461ad3655a Implement AttentionCombine as replacement for RandomCombine 2022-09-29 13:44:03 +08:00
Daniel Povey
e5a0d8929b Remove unused out_balancer member 2022-09-27 13:10:59 +08:00
Daniel Povey
6b12f20995 Remove out_balancer and out_norm from conv modules 2022-09-27 12:25:11 +08:00
Daniel Povey
76e66408c5 Some cosmetic improvements 2022-09-27 11:08:44 +08:00
Daniel Povey
71b3756ada Use half the dim per head, in self_attn layers. 2022-09-24 15:40:44 +08:00
Daniel Povey
ce3f59d9c7 Use dropout in attention, on attn weights. 2022-09-22 19:18:50 +08:00
Daniel Povey
24aea947d2 Fix issues where grad is None, and unused-grad cases 2022-09-22 19:18:16 +08:00
Daniel Povey
c16f795962 Avoid error in ddp by using last module's scores 2022-09-22 18:52:16 +08:00
Daniel Povey
0f85a3c2e5 Implement persistent attention scores 2022-09-22 18:47:16 +08:00
Daniel Povey
03a77f8ae5 Merge branch 'scaled_adam_exp7c' into scaled_adam_exp11c 2022-09-22 18:15:44 +08:00
Daniel Povey
ceadfad48d Reduce debug freq 2022-09-22 12:30:49 +08:00
Daniel Povey
1d20c12bc0 Increase max_var_per_eig to 0.2 2022-09-22 12:28:35 +08:00
Daniel Povey
e2fdfe990c Loosen limit on param_max_rms, from 2.0 to 3.0; change how param_min_rms is applied. 2022-09-20 15:20:43 +08:00
Daniel Povey
6eb9a0bc9b Halve max_var_per_eig to 0.05 2022-09-20 14:39:17 +08:00
Daniel Povey
cd5ac76a05 Add max-var-per-eig in encoder layers 2022-09-20 14:22:07 +08:00
Daniel Povey
db1f4ccdd1 4x scale on max-eig constraint 2022-09-20 14:20:13 +08:00
Daniel Povey
3d72a65de8 Implement max-eig-proportion. 2022-09-19 10:26:37 +08:00
Daniel Povey
5f27cbdb44 Merge branch 'scaled_adam_exp4_max_var_per_eig' into scaled_adam_exp7
# Conflicts:
#	egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py
2022-09-18 21:23:59 +08:00
Daniel Povey
0f567e27a5 Add max_var_per_eig in self-attn 2022-09-18 21:22:01 +08:00
Daniel Povey
eb77fa7aaa Restore min_positive,max_positive limits on linear_pos projection 2022-09-18 14:38:30 +08:00
Daniel Povey
69404f61ef Use scalar_lr_scale for scalars as well as sizes. 2022-09-18 14:12:27 +08:00
Daniel Povey
76031a7c1d Loosen some limits of activation balancers 2022-09-18 13:59:44 +08:00
Daniel Povey
3122637266 Use ScaledLinear where I previously had StructuredLinear 2022-09-17 13:18:58 +08:00
Daniel Povey
4a2b940321 Remove StructuredLinear, StructuredConv1d 2022-09-17 13:14:08 +08:00
Daniel Povey
1a184596b6 A little code refactoring 2022-09-16 20:56:21 +08:00
Daniel Povey
bb1bee4a7b Improve how quartiles are printed 2022-09-16 17:30:03 +08:00
Daniel Povey
5f55f80fbb Configure train.py with clipping_scale=2.0 2022-09-16 17:19:52 +08:00
Daniel Povey
8298333bd2 Implement gradient clipping. 2022-09-16 16:52:46 +08:00
Daniel Povey
8f876b3f54 Remove batching from ScaledAdam, in preparation to add gradient norm clipping 2022-09-16 15:42:56 +08:00
Daniel Povey
3b450c2682 Bug fix in train.py, fix optimizer name 2022-09-16 14:10:42 +08:00
Daniel Povey
257c961b66 1st attempt at scaled_adam 2022-09-16 13:59:52 +08:00
Daniel Povey
276928655e Merge branch 'pradam_exp1m8' into pradam_exp1m7s2 2022-08-24 04:17:30 +08:00
Daniel Povey
80beb9c8d7 Merge branch 'pradam_exp1n2' into pradam_exp1m7s2 2022-08-24 04:14:25 +08:00
Daniel Povey
64f7166545 Some cleanups 2022-08-18 07:03:50 +08:00
Daniel Povey
5c33899ddc Increase cov_min[3] from 0.001 to 0.002 2022-08-06 16:28:02 +08:00
Daniel Povey
9bbf8ada57 Scale up diag of grad_cov by 1.0001 prior to diagonalizing it. 2022-08-06 07:06:23 +08:00
Daniel Povey
c021b4fec6 Increase cov_min[3] from 0.0001 to 0.001 2022-08-06 07:02:52 +08:00
Daniel Povey
a5b9b7b974 Cosmetic changes 2022-08-05 03:51:00 +08:00
Daniel Povey
dc9133227f Reworked how inverse is done; fixed bug in _apply_min_max_with_metric regarding how M is normalized. 2022-08-04 09:46:14 +08:00
Daniel Povey
766bf69a98 Reduce cov_max[2] from 4.0 to 3.5 2022-08-03 04:10:11 +08:00
Daniel Povey
129b28aa9b Increase cov_min[2] from 0.05 to 0.1; decrease cov_max[2] from 5.0 to 4.0. 2022-08-02 15:17:24 +08:00
Daniel Povey
202752418a Increase cov_min[2] from 0.02 to 0.05. 2022-08-02 15:15:41 +08:00
Daniel Povey
e44ab25e99 Bug fix 2022-08-02 14:31:37 +08:00
Daniel Povey
e9f4ada1c0 Swap the order of applying min and max in smoothing operations 2022-08-02 11:55:43 +08:00
Daniel Povey
9473c7e23d Lots of changes to how min and max are applied; use 1-norm for min in smooth_cov but not in _apply_min_max_with_metric. 2022-08-02 11:29:54 +08:00
Daniel Povey
6ab4cf615d 1st draft of new method of normalizing covs that uses normalization w.r.t. spectral 2-norm 2022-08-02 09:34:37 +08:00
Daniel Povey
4919134a94 Merge making hidden_dim an arg 2022-08-02 09:09:29 +08:00
Daniel Povey
c64bd5ebcd Merge making hidden_dim an arg 2022-08-02 09:07:36 +08:00
Daniel Povey
b008340d83 Merge making hidden_dim an arg 2022-08-02 09:01:19 +08:00