1891 Commits

Author SHA1 Message Date
Daniel Povey
d100aed58b Revert "Increase min of scale in Conv2dSubsampling from 0.01 to 0.2"
This reverts commit 7589e3768975df10c3d022beb4c88f14c2f25d3d.
2022-11-30 13:17:20 +08:00
Daniel Povey
12e8c3f0fa One more layer on input 2022-11-29 16:47:24 +08:00
Daniel Povey
640d48262f Double scale on aux_loss 2022-11-29 16:20:21 +08:00
Daniel Povey
7589e37689 Increase min of scale in Conv2dSubsampling from 0.01 to 0.2 2022-11-29 16:18:41 +08:00
Daniel Povey
441fcf063d Reduce final value of bypass_min from 0.25 to 0.2 2022-11-29 16:15:34 +08:00
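
For context: bypass_min is the lower clamp on the learned scale that interpolates between a block's input and its output, so lowering its final value allows a somewhat stronger bypass. A minimal sketch of that interpolation, with the names and exact formula assumed rather than taken from the icefall code:

```python
import torch
import torch.nn as nn

class BypassSketch(nn.Module):
    """Learned interpolation between a block's input and its output.

    Assumed form: out = src_orig + scale * (src - src_orig), with the
    learned per-channel `scale` clamped to [bypass_min, 1.0] during
    training.  Lowering bypass_min (0.25 -> 0.2 in the commit above)
    lets the module output be mixed in more weakly.
    """
    def __init__(self, num_channels: int, bypass_min: float = 0.2):
        super().__init__()
        self.bypass_min = bypass_min
        self.scale = nn.Parameter(torch.full((num_channels,), 0.5))

    def forward(self, src_orig: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
        s = self.scale.clamp(min=self.bypass_min, max=1.0)
        # s == 1.0 keeps only the module output; s == bypass_min keeps
        # mostly the original input.
        return src_orig + s * (src - src_orig)
```
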
Daniel Povey
73e420865c Revert min_abs in NonlinAttentionModule from 2.0 to 1.5 2022-11-29 15:53:29 +08:00
Daniel Povey
f48786534a Merge branch 'scaled_adam_exp535' into scaled_adam_exp548 2022-11-29 15:43:44 +08:00
Daniel Povey
28c5923986 Remove max_factor=0.02 option in bottleneck balancer of class AttentionSqueeze; change its min_positive/max_positive to 0.2/0.8 2022-11-29 15:43:25 +08:00
Daniel Povey
5632782ee1 Merge branch 'scaled_adam_exp539' into scaled_adam_exp548 2022-11-29 15:40:23 +08:00
Daniel Povey
b90d8aabde Revert the alternate-layers-only change for nonlin_attention and attention_squeeze 2022-11-29 15:38:55 +08:00
Daniel Povey
753269668a Change ratio in NonlinAttentionModule from 8 to 2 2022-11-29 15:38:13 +08:00
Daniel Povey
93942725c4 Increase min_abs of balancer of encoder layer from 0.2 to 0.4. 2022-11-29 13:46:47 +08:00
Daniel Povey
36a2f33a6f Have value dim in NonlinAttentionModule be half of num_channels 2022-11-28 21:55:06 +08:00
Daniel Povey
258d4f1353 Let ratio be 8, not 2, for sigmoid in NonlinAttentionModule 2022-11-28 21:51:29 +08:00
Daniel Povey
7018c722b5 Let ratio of values to sigmoids be 8, not 2 2022-11-28 21:50:11 +08:00
Daniel Povey
643c547eec Double just the value dim in NonlinAttentionLayer. 2022-11-28 20:56:47 +08:00
Daniel Povey
88bc45d596 Halve scale on aux_loss 2022-11-28 16:37:46 +08:00
Daniel Povey
cee62c823d Have final prob of aux_loss for input projections be 0 2022-11-28 16:36:17 +08:00
Daniel Povey
9cf5d92f39 Have nonlin_attention and attention_squeeze operate only on every other layer. 2022-11-28 16:24:24 +08:00
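
A minimal sketch of the every-other-layer gating (the layer internals here are hypothetical stand-ins, not the Zipformer modules): whether the optional submodules exist at all is decided by layer-index parity at construction time.

```python
import torch
import torch.nn as nn

class LayerSketch(nn.Module):
    """Hypothetical encoder layer: the optional submodule (standing in
    for nonlin_attention / attention_squeeze) is only created on every
    other layer; the remaining layers simply skip it."""
    def __init__(self, d_model: int, use_extra: bool):
        super().__init__()
        self.ff = nn.Linear(d_model, d_model)
        self.extra = nn.Linear(d_model, d_model) if use_extra else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.ff(x)
        if self.extra is not None:
            x = x + self.extra(x)
        return x

layers = nn.ModuleList(LayerSketch(256, use_extra=(i % 2 == 0)) for i in range(6))
```
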
Daniel Povey
87ef4078d3 Add two more layers. 2022-11-28 13:56:40 +08:00
Daniel Povey
f483f1e0ef Implement attention weights sharing for successive layers, for Zipformer 2022-11-28 13:41:11 +08:00
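
The idea, sketched below with assumed shapes and names rather than the icefall implementation: compute the softmax attention weights once, then let the next layer reuse those weights with its own value projection, saving the query/key projections and the score computation.

```python
import torch
import torch.nn as nn

class AttnWeights(nn.Module):
    """Compute (batch, heads, T, T) attention weights once."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        self.h, self.d = num_heads, d_model // num_heads
        self.qk = nn.Linear(d_model, 2 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k = self.qk(x).chunk(2, dim=-1)
        q = q.view(b, t, self.h, self.d).transpose(1, 2)
        k = k.view(b, t, self.h, self.d).transpose(1, 2)
        return ((q @ k.transpose(-2, -1)) / self.d ** 0.5).softmax(dim=-1)

class ValueOnlyAttn(nn.Module):
    """Per-layer value/output projections that consume shared weights."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        self.h, self.d = num_heads, d_model // num_heads
        self.v = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
        b, t, dm = x.shape
        v = self.v(x).view(b, t, self.h, self.d).transpose(1, 2)
        y = (weights @ v).transpose(1, 2).reshape(b, t, dm)
        return self.out(y)

# Two successive layers share one set of weights:
# w = attn_weights(x); x = layer1_attn(x, w); x = layer2_attn(x, w)
```
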
Daniel Povey
2a289d38b7 Make max_abs for feedforward module be a constant at 15.0 2022-11-28 13:19:37 +08:00
Daniel Povey
27a12a982b Increase min_abs and max_abs in feedforward module. 2022-11-28 12:52:28 +08:00
Daniel Povey
121f7e2a45 Documentation fix. 2022-11-28 12:10:08 +08:00
Daniel Povey
c6d859dd05 Increase min_abs of balancer in NonlinAttentionModule from 1.5 to 2.0. 2022-11-28 11:35:00 +08:00
Daniel Povey
39ce60bb7c Decrease final value of max_abs in AttentionSqueeze from 5.0 to 1.0 2022-11-28 10:45:53 +08:00
Daniel Povey
0bfd81d721 Fix bug regarding dims_to_mean 2022-11-28 10:42:06 +08:00
Daniel Povey
109825cafb Fix problem with mean offset in LinearWithAuxLoss. 2022-11-28 09:46:01 +08:00
Daniel Povey
a3b07fd098 Double aux_grad scale 2022-11-28 00:19:03 +08:00
Daniel Povey
9752778ee6 Use the same schedule for in_proj as out_proj. Only affects a couple of modules. 2022-11-28 00:09:26 +08:00
Daniel Povey
9e7add6be8 Work out alpha (scale on z) in LinearWithAuxLossFunction 2022-11-27 23:48:26 +08:00
Daniel Povey
0307252832 Bug fix 2022-11-27 21:33:37 +08:00
Daniel Povey
5128ff8797 Changes to balancer min_abs/max_abs limits. 2022-11-27 21:14:41 +08:00
Daniel Povey
a610011c3c Partially revert sign_gain_factor 2022-11-27 17:18:33 +08:00
Daniel Povey
30d0bc6ad7 Make gain factor 4 times larger, for constraining the sign in ActivationBalancer. 2022-11-27 17:17:11 +08:00
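
Background for the gain-factor commits: ActivationBalancer leaves the forward pass untouched and injects a small extra gradient in the backward pass when activation statistics (such as the fraction of positive values) drift out of range; the gain factor scales that injected gradient. A much-simplified sketch of just the sign constraint, not the actual icefall code:

```python
import torch

class SignBalancerSketch(torch.autograd.Function):
    """Simplified sign-constraining gradient injection (sketch only).

    Forward is the identity.  Backward adds a small extra gradient,
    scaled by `gain_factor`, that pushes more activations positive in
    channels whose positive fraction is below `min_positive`.
    """
    @staticmethod
    def forward(ctx, x, min_positive, gain_factor):
        ctx.save_for_backward(x)
        ctx.min_positive = min_positive
        ctx.gain_factor = gain_factor
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Per-channel fraction of positive activations (channels last):
        pos_frac = (x > 0).float().mean(dim=tuple(range(x.dim() - 1)))
        deficit = (ctx.min_positive - pos_frac).clamp(min=0.0)
        # A negative gradient component raises x under gradient descent,
        # nudging upstream parameters toward more positive outputs.
        extra = -ctx.gain_factor * deficit * grad_out.abs()
        return grad_out + extra, None, None

# usage: y = SignBalancerSketch.apply(x, 0.1, 0.04)
```
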
Daniel Povey
785a524341 Increase min_abs of hidden balancer of ff modules from 0.2 to 1.0 2022-11-27 17:06:31 +08:00
Daniel Povey
ff361a7495 Change default prob on limit_param_value from 0.2 to 0.6. 2022-11-27 14:00:59 +08:00
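
The prob here is how often the limit is applied; applying it only on a random subset of steps keeps the cost low. A hedged sketch of such a stochastic clamp (not the icefall signature):

```python
import random
import torch

def limit_param_value_sketch(p: torch.nn.Parameter,
                             min_val: float,
                             max_val: float,
                             prob: float = 0.6) -> None:
    """On a random `prob` fraction of calls, clamp the parameter into
    [min_val, max_val] in place.  Raising the default prob from 0.2 to
    0.6 (per the commit above) enforces the limit more often."""
    if random.random() < prob:
        with torch.no_grad():
            p.clamp_(min=min_val, max=max_val)
```
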
Daniel Povey
2f4df1278d Have aux_grad_scales for input terminate after 1k batches; double the scale on aux_grad. 2022-11-27 13:56:50 +08:00
Daniel Povey
a6fb9772a8 Remove 4 layers. 2022-11-27 13:29:29 +08:00
Daniel Povey
2e0111e6ef Halve aux_grad_scale 2022-11-26 23:36:00 +08:00
Daniel Povey
c91014f104 Changes to balancer schedules: start max_abs from 5.0, not 4.0; start min_positive from 0.1 more consistently; finish at 8k batches, not 12k. 2022-11-26 23:10:18 +08:00
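
The "start ... finish at 8k" wording reads naturally as a piecewise-linear function of the batch count. A sketch of that assumed form, with example endpoint values:

```python
def scheduled_value(batch: int, start: float, end: float,
                    end_batch: int = 8000) -> float:
    """Assumed schedule shape: linear from `start` at batch 0 to `end`
    at `end_batch` (8k after this commit, previously 12k), constant
    afterwards."""
    if batch >= end_batch:
        return end
    return start + (batch / end_batch) * (end - start)

# e.g. a max_abs schedule starting at 5.0 and ending at 1.0:
assert scheduled_value(4000, 5.0, 1.0) == 3.0
```
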
Daniel Povey
633b6785f1 Halve final scale of aux_grad, and make schedule decrease more slowly. 2022-11-26 22:27:20 +08:00
Daniel Povey
f71b1d2c3a Add 4 more layers 2022-11-26 21:18:24 +08:00
Daniel Povey
4874ded2e9 Introduce balancer schedules for the DoubleSwish() in feedforward and conv modules 2022-11-26 20:20:20 +08:00
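
For reference, DoubleSwish itself is defined in icefall's scaling code as x * sigmoid(x - 1); what this commit changes is the schedules on the balancers that sit around it.

```python
import torch

def double_swish(x: torch.Tensor) -> torch.Tensor:
    """DoubleSwish(x) = x * sigmoid(x - 1): shaped like Swish, but the
    shift keeps outputs closer to zero for small inputs."""
    return x * torch.sigmoid(x - 1.0)
```
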
Daniel Povey
320c58401f Increase 2 feedforward dims from 1.5k to 2k. 2022-11-26 19:45:41 +08:00
Daniel Povey
9ce99b150d Remove one attention_squeeze module; halve dimension in NonlinAttention module; put schedule on balancer of ConvolutionModule 2022-11-26 19:42:33 +08:00
Daniel Povey
a96b92fb54 Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, setup more like ConvModule now. 2022-11-26 19:38:29 +08:00
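
Parameterizing a positive scale in log space is a standard trick: store log(alpha) and exponentiate on use, so alpha stays positive by construction and additive gradient steps act multiplicatively on it. A sketch with hypothetical names:

```python
import torch
import torch.nn as nn

class LogSpaceScale(nn.Module):
    """Hypothetical positive scale stored in log space: alpha > 0 is
    guaranteed, and a step of size eps on log_alpha multiplies alpha
    by roughly (1 + eps)."""
    def __init__(self, initial_alpha: float = 1.0):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.tensor(initial_alpha).log())

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z * self.log_alpha.exp()
```
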
Daniel Povey
e19118a966 Merge branch 'scaled_adam_exp503' into scaled_adam_exp505 2022-11-26 19:29:58 +08:00
Daniel Povey
faed28ba6a Changes for debugging/stats. 2022-11-26 18:59:15 +08:00
Daniel Povey
48d699c94b Change for speed/memory 2022-11-26 18:42:03 +08:00