39ce60bb7c | Daniel Povey | 2022-11-28 10:45:53 +08:00
Decrease final value of max_abs in AttentionSqueeze from 5.0 to 1.0

0bfd81d721 | Daniel Povey | 2022-11-28 10:42:06 +08:00
Fix bug re: dims_to_mean

109825cafb | Daniel Povey | 2022-11-28 09:46:01 +08:00
Fix problem with mean offset in LinearWithAuxLoss.

a3b07fd098 | Daniel Povey | 2022-11-28 00:19:03 +08:00
Double aux_grad scale

9752778ee6 | Daniel Povey | 2022-11-28 00:09:26 +08:00
Use the same schedule for in_proj as out_proj. Only affects a couple of modules.

9e7add6be8 | Daniel Povey | 2022-11-27 23:48:26 +08:00
Work out alpha (scale on z) in LinearWithAuxLossFunction

0307252832 | Daniel Povey | 2022-11-27 21:33:37 +08:00
Bug fix

5128ff8797 | Daniel Povey | 2022-11-27 21:14:41 +08:00
Changes to balancer min_abs/max_abs limits.

a610011c3c | Daniel Povey | 2022-11-27 17:18:33 +08:00
Partially revert sign_gain_factor

30d0bc6ad7 | Daniel Povey | 2022-11-27 17:17:11 +08:00
Make gain factor 4 times larger, for constraining the sign in ActivationBalancer.

785a524341 | Daniel Povey | 2022-11-27 17:06:31 +08:00
Increase min_abs of hidden balancer of ff modules from 0.2 to 1.0

ff361a7495 | Daniel Povey | 2022-11-27 14:00:59 +08:00
Change default prob on limit_param_value from 0.2 to 0.6.

2f4df1278d | Daniel Povey | 2022-11-27 13:56:50 +08:00
Have aux_grad_scales for input terminate after 1k batches; double the scale on aux_grad.

a6fb9772a8 | Daniel Povey | 2022-11-27 13:29:29 +08:00
Remove 4 layers.

2e0111e6ef | Daniel Povey | 2022-11-26 23:36:00 +08:00
Halve aux_grad_scale

c91014f104 | Daniel Povey | 2022-11-26 23:10:18 +08:00
Changes to balancer schedules: start max_abs from 5.0 not 4.0, start min_positive from 0.1 more consistently; finish at 8k not 12k.

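Many of the commits here tune piecewise-linear schedules keyed on the batch count (e.g. max_abs starting at 5.0 and finishing by batch 8k). A minimal sketch of how such a schedule can be evaluated; the function name and interpolation details are illustrative assumptions, not the repo's actual implementation:

```python
def scheduled_value(batch_count: int, points) -> float:
    """Piecewise-linear interpolation between sorted (batch, value)
    pairs; constant before the first point and after the last."""
    b0, v0 = points[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in points[1:]:
        if batch_count <= b1:
            return v0 + (batch_count - b0) * (v1 - v0) / (b1 - b0)
        b0, v0 = b1, v1
    return v0

# e.g. a max_abs schedule going from 5.0 at batch 0 to 1.0 at batch 8000:
# scheduled_value(4000, [(0, 5.0), (8000, 1.0)]) == 3.0
```
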
633b6785f1 | Daniel Povey | 2022-11-26 22:27:20 +08:00
Halve final scale of aux_grad, and make schedule decrease more slowly.

f71b1d2c3a | Daniel Povey | 2022-11-26 21:18:24 +08:00
Add 4 more layers

4874ded2e9 | Daniel Povey | 2022-11-26 20:20:20 +08:00
Introduce balancer schedules for the DoubleSwish() in feedforward and conv modules

320c58401f | Daniel Povey | 2022-11-26 19:45:41 +08:00
Increase 2 feedforward dims from 1.5k to 2k.

9ce99b150d | Daniel Povey | 2022-11-26 19:42:33 +08:00
Remove one attention_squeeze module; halve dimension in NonlinAttention module; put schedule on balancer of ConvolutionModule

a96b92fb54 | Daniel Povey | 2022-11-26 19:38:29 +08:00
Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, making its setup more like ConvModule's.

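The commit above moves alpha, the scale on z in LinearWithAuxLossFunction, into log space. A hedged sketch of that parameterization (class and names are illustrative, not the actual icefall code): a positive scale stored as its log can never go negative, and an additive optimizer step on log_alpha acts multiplicatively on alpha.

```python
import torch
import torch.nn as nn

class LogSpaceScale(nn.Module):
    """Illustrative only: keep a positive scale alpha as log_alpha,
    so alpha = exp(log_alpha) stays positive by construction."""
    def __init__(self, initial_alpha: float = 1.0):
        super().__init__()
        self.log_alpha = nn.Parameter(
            torch.tensor(float(initial_alpha)).log())

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Scale the input by alpha = exp(log_alpha).
        return z * self.log_alpha.exp()
```
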
e19118a966 | Daniel Povey | 2022-11-26 19:29:58 +08:00
Merge branch 'scaled_adam_exp503' into scaled_adam_exp505

faed28ba6a | Daniel Povey | 2022-11-26 18:59:15 +08:00
Changes for debugging/stats.

48d699c94b | Daniel Povey | 2022-11-26 18:42:03 +08:00
Change for speed/memory

8858fb38f1 | Daniel Povey | 2022-11-26 14:52:59 +08:00
Halve expected value of aux_grad scale, and implement it more efficiently, via a scale on the prob of using it.

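The commit just above keeps the expected aux-gradient contribution fixed while making it cheaper: rather than an always-on, scaled-down term, it applies a full-strength term on a random fraction of batches. A rough sketch of the idea, with an assumed placeholder penalty standing in for the real aux loss:

```python
import random
import torch

def aux_loss_term(z: torch.Tensor, target_scale: float,
                  full_scale: float = 1.0) -> torch.Tensor:
    """Apply the full-strength penalty with probability
    target_scale / full_scale, so the expected contribution equals
    target_scale * penalty, but most batches skip the computation."""
    if random.random() < target_scale / full_scale:
        return full_scale * (z ** 2).mean()  # placeholder penalty
    return z.new_zeros(())
```
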
110c2601ab | Daniel Povey | 2022-11-26 14:38:16 +08:00
Changes for speed

c653c66413 | Daniel Povey | 2022-11-26 14:29:49 +08:00
Undo cast in autocast mode.

d1ee1f2d98 | Daniel Povey | 2022-11-26 14:25:27 +08:00
Try to save memory in autocast mode.

7b5c0382f9 | Daniel Povey | 2022-11-26 14:16:53 +08:00
Fix to LinearWithAuxLoss for bias=False case

5f80807027 | Daniel Povey | 2022-11-26 14:15:09 +08:00
Add LinearWithAuxLoss in nonlin_attention and AttentionSqueeze modules.

4058d56c0d | Daniel Povey | 2022-11-26 14:04:41 +08:00
Remove squeeze_excite from Conv2dSubsampling.

281b54e7bf | Daniel Povey | 2022-11-26 12:25:22 +08:00
Use LinearWithAuxLoss in more places.

d9c7e4f216 | Daniel Povey | 2022-11-26 12:13:31 +08:00
Make the in_proj of feedforward modules also be a LinearWithAuxLoss.

029f5869c4 | Daniel Povey | 2022-11-25 18:06:13 +08:00
Increase schedule init from 0.1 to 0.2

2368968114 | Daniel Povey | 2022-11-25 18:00:46 +08:00
Make out_proj of feedforward modules be a LinearWithAuxLoss, with nonzero final value of 0.01.

8f1ef60951 | Daniel Povey | 2022-11-25 16:24:28 +08:00
Integrate LinearWithAuxLoss into SqueezeExcite1d

1ebc3dd158 | Daniel Povey | 2022-11-25 16:20:28 +08:00
Bug fixes to LinearWithAuxLoss

0a997d64c4 | Daniel Povey | 2022-11-25 16:07:47 +08:00
Fixes for half precision

6a91f343e9 | Daniel Povey | 2022-11-25 16:04:51 +08:00
Use LinearWithAuxLoss in squeeze-attention module

ba348169bf | Daniel Povey | 2022-11-25 12:39:16 +08:00
Change to sigmoid of NonlinAttention, for diagnostic purposes.

0614f65428 | Daniel Povey | 2022-11-24 17:20:28 +08:00
Bug fix: remove second activation in a row

534eca4bf3 | Daniel Povey | 2022-11-24 16:18:40 +08:00
Add 1d squeeze-and-excite(-like) module in Conv2dSubsampling

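For reference, the generic squeeze-and-excite pattern in 1-D pools over the time axis, passes the pooled vector through a small bottleneck MLP, and rescales the channels. The sketch below uses made-up names and shapes and is not the module this commit added:

```python
import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    """Generic 1-D squeeze-and-excite: global average over time,
    bottleneck MLP, sigmoid gate, channel-wise rescale."""
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(channels, bottleneck)
        self.up = nn.Linear(bottleneck, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        s = x.mean(dim=1)                      # squeeze: average over time
        s = torch.sigmoid(self.up(torch.relu(self.down(s))))  # excite
        return x * s.unsqueeze(1)              # rescale channels
```
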
dd3826104e | Daniel Povey | 2022-11-24 15:25:59 +08:00
Start whitening schedules for the activations in NonlinAttentionModule and AttentionSqueezeModule lower; increase some whitening probs.

0ac26f4234 | Daniel Povey | 2022-11-24 15:18:28 +08:00
Increase initial whitening target for self_attn from 2.0 to 3.0.

45069175d9 | Daniel Povey | 2022-11-24 14:16:13 +08:00
Add a second whitening to the NonlinAttentionModule, after the aggregation.

35f0ea0015 | Daniel Povey | 2022-11-24 13:47:22 +08:00
Changes to whitening modules for memory efficiency, moving them inside; increase their prob.

de73e2e424 | Daniel Povey | 2022-11-24 13:27:32 +08:00
Move whitening of NonlinAttentionModule from the output to the interior; apply it just to the value.

ee61ec63b3 | Daniel Povey | 2022-11-23 19:49:34 +08:00
Introduce schedules for whitening.

a6657e6b40 | Daniel Povey | 2022-11-23 19:08:19 +08:00
Harmonize whitening modules: add them to 3 submodules, change the configuration of 2 others, and change the location in NonlinAttention.