1891 Commits

Author SHA1 Message Date
Daniel Povey
8858fb38f1 Halve expected value of aux_grad scale, and implement it more efficiently, via a scale on the prob of using it. 2022-11-26 14:52:59 +08:00
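A hedged reading of the commit above (the function and argument names below are illustrative, not the repo's actual API): rather than always adding an auxiliary gradient scaled by some factor, add it at full strength with probability equal to that factor. The expected contribution is unchanged, but the extra computation is skipped on most steps.

```python
import torch

# Illustrative sketch only: apply a placeholder auxiliary objective with
# probability `scale` instead of always applying it scaled by `scale`.
# E[contribution] is the same, but most steps do no extra work.
def maybe_aux_loss(y: torch.Tensor, scale: float) -> torch.Tensor:
    if not y.requires_grad or torch.rand(()).item() >= scale:
        return y.new_zeros(())    # skipped this step: no extra compute
    return (y ** 2).mean()        # placeholder aux objective, full strength

y = torch.randn(4, 8, requires_grad=True)
loss = y.sum() + maybe_aux_loss(y, scale=0.05)
loss.backward()
```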
Daniel Povey
110c2601ab Changes for speed 2022-11-26 14:38:16 +08:00
Daniel Povey
c653c66413 Undo cast in autocast mode. 2022-11-26 14:29:49 +08:00
Daniel Povey
d1ee1f2d98 Try to save memory in autocast mode. 2022-11-26 14:25:27 +08:00
Daniel Povey
7b5c0382f9 Fix to LinearWithAuxLoss for bias=False case 2022-11-26 14:16:53 +08:00
Daniel Povey
5f80807027 Add LinearWithAuxLoss in nonlin_attention and AttentionSqueeze modules. 2022-11-26 14:15:09 +08:00
Daniel Povey
4058d56c0d Remove squeeze_excite from Conv2dSubsampling. 2022-11-26 14:04:41 +08:00
Daniel Povey
281b54e7bf Use LinearWithAuxLoss in more places. 2022-11-26 12:25:22 +08:00
Daniel Povey
d9c7e4f216 Make the in_proj of feedforward modules also be a LinearWithAuxLoss. 2022-11-26 12:13:31 +08:00
Daniel Povey
029f5869c4 increase schedule init from 0.1 to 0.2 2022-11-25 18:06:13 +08:00
Daniel Povey
2368968114 Make out_proj of feedforward modules be a LinearWithAuxLoss, with nonzero final value at 0.01. 2022-11-25 18:00:46 +08:00
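LinearWithAuxLoss is a repo-specific layer; as a loosely hedged sketch of the general idea (the auxiliary objective below is a placeholder, not the repo's), a linear layer can stash a small auxiliary term for the training loop to add to the main loss, with a scale that follows a schedule decaying to a small nonzero final value such as the 0.01 mentioned above.

```python
import torch
import torch.nn as nn

# Purely hypothetical sketch of a linear layer that carries an auxiliary loss.
# The real LinearWithAuxLoss lives in the repo; here the aux term is just a
# small penalty on the output's mean square, scaled by `aux_scale`.
class LinearWithAuxLossSketch(nn.Linear):
    def __init__(self, in_features: int, out_features: int, bias: bool = True,
                 aux_scale: float = 0.01):
        super().__init__(in_features, out_features, bias=bias)
        self.aux_scale = aux_scale
        self.aux_loss = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = super().forward(x)
        # stash an auxiliary term for the training loop to add to the main loss
        self.aux_loss = self.aux_scale * (y ** 2).mean() if self.training else None
        return y
```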
Daniel Povey
8f1ef60951 Integrate LinearWithAuxLoss into SqueezeExcite1d 2022-11-25 16:24:28 +08:00
Daniel Povey
1ebc3dd158 Bug fixes to LinearWithAuxLoss 2022-11-25 16:20:28 +08:00
Daniel Povey
0a997d64c4 Fixes for half precision 2022-11-25 16:07:47 +08:00
Daniel Povey
6a91f343e9 Use LinearWithAuxLoss in squeeze-attention module 2022-11-25 16:04:51 +08:00
Daniel Povey
ba348169bf Change for diagnostic purposes, sigmoid of NonlinAttention. 2022-11-25 12:39:16 +08:00
Daniel Povey
0614f65428 Bug fix, remove 2nd activation in a row 2022-11-24 17:20:28 +08:00
Daniel Povey
534eca4bf3 Add 1d squeeze and excite (-like) module in Conv2dSubsampling 2022-11-24 16:18:40 +08:00
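For context, a generic 1-D squeeze-and-excite block (the textbook pattern, not the repo's exact Conv2dSubsampling module) average-pools over time, passes the pooled vector through a small bottleneck MLP, and uses the result to re-scale each channel.

```python
import torch
import torch.nn as nn

# Generic 1-D squeeze-and-excite sketch: squeeze over time, excite per channel.
class SqueezeExcite1dSketch(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(channels, bottleneck)
        self.up = nn.Linear(bottleneck, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        s = x.mean(dim=1)                                      # squeeze over time
        s = torch.sigmoid(self.up(torch.relu(self.down(s))))   # gates in (0, 1)
        return x * s.unsqueeze(1)                              # per-channel re-scaling

x = torch.randn(2, 50, 64)
y = SqueezeExcite1dSketch(64, 16)(x)   # y has the same shape as x
```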
Daniel Povey
dd3826104e Start the whitening schedules for the activations in NonlinAttentionModule and AttentionSqueezeModule lower; increase some whitening probs. 2022-11-24 15:25:59 +08:00
Daniel Povey
0ac26f4234 Increase initial whitening target for self_attn from 2.0 to 3.0. 2022-11-24 15:18:28 +08:00
Daniel Povey
45069175d9 Add a second whitening to the NonlinAttentionModule, after the aggregation. 2022-11-24 14:16:13 +08:00
Daniel Povey
35f0ea0015 Changes to whitening modules for memory efficiency, moving them inside; increase their prob. 2022-11-24 13:47:22 +08:00
Daniel Povey
de73e2e424 Move whitening of NonlinAttentionModule from the output to the interior, so it applies just to the value. 2022-11-24 13:27:32 +08:00
Daniel Povey
ee61ec63b3 Introduce schedules for whitening. 2022-11-23 19:49:34 +08:00
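The schedules referred to here presumably move values such as the whitening prob or limit as training progresses; a minimal hypothetical sketch (not the repo's API) is piecewise-linear interpolation between breakpoints keyed on the batch count.

```python
# Hypothetical sketch of a scheduled value: interpolate linearly between
# (batch_count, value) breakpoints and hold constant outside them.
def scheduled_value(batch_count: float, points) -> float:
    points = sorted(points)
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return points[-1][1]

# e.g. a whitening prob that starts at 0.2 and decays to 0.05 by batch 20000
# (the numbers here are made up for illustration)
prob = scheduled_value(5000, [(0.0, 0.2), (20000.0, 0.05)])
```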
Daniel Povey
a6657e6b40 Harmonize whitening modules: add them to 3 submodules, change the configuration of 2 others, and change the location in NonlinAttention. 2022-11-23 19:08:19 +08:00
Daniel Povey
9ceb41acb4 Remove balancer from SelfAttention module. 2022-11-23 18:41:36 +08:00
Daniel Povey
f2dbf87461 Remove invocation of out_balancer 2022-11-23 18:40:27 +08:00
Daniel Povey
b88f12fe83 Remove out_balancer of NonlinAttentionModule 2022-11-23 18:37:45 +08:00
Daniel Povey
9138695dfe Fix bug RE attn_weights 2022-11-23 17:04:17 +08:00
Daniel Povey
36e49a8d61 Change for mem efficiency 2022-11-23 15:38:34 +08:00
Daniel Povey
1d0252d420 Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of
these may be counted in the last

  Numbers XXX below correspond to branches numbered scaled_adam_expXXX.
    - from 412/413 (cherry-picked): dropout for attention in the attention_squeeze and nonlin_attention modules,
      but simplified a little to use the same dropout schedule and drop them out all together;
      also have all 3 submodules use separate heads.
    - from 460->461 (which is in the history of 464): revert the part about balancing the output of the attention_squeeze module.
    - merge from 462->467: use TanSwish rather than tanh.
    - merge 462->465: remove whitening in the self-attention module.
    - merge the part of 465->466 that was about diagnostics (name in the Whiten module).
2022-11-23 14:41:09 +08:00
Daniel Povey
f89a85aed8 Merge branch 'scaled_adam_exp465' into scaled_adam_exp472 2022-11-23 14:16:17 +08:00
Daniel Povey
edd4bf5312 Merge branch 'scaled_adam_exp467' into scaled_adam_exp472 2022-11-23 14:13:19 +08:00
Daniel Povey
d95571eacf From 460->461, revert change about balancing output of attention_squeeze module. 2022-11-23 14:12:08 +08:00
Daniel Povey
fe51eea397 Implement a form of dropout for squeeze_weights, dropout-to-constant. 2022-11-23 14:06:17 +08:00
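A hedged sketch of "dropout-to-constant" as described above: with some probability, replace the learned squeeze weights wholesale with a constant fallback (here a uniform average, which is an assumption), rather than zeroing individual elements as ordinary dropout would.

```python
import torch

# Hedged sketch: with probability p, swap the squeeze weights for a constant
# (uniform over the last axis) for the whole batch; otherwise pass them through.
def dropout_to_constant(weights: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    if not training or torch.rand(()).item() >= p:
        return weights
    return torch.full_like(weights, 1.0 / weights.shape[-1])

w = torch.softmax(torch.randn(2, 10), dim=-1)
w = dropout_to_constant(w, p=0.1, training=True)
```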
Daniel Povey
066f1e4658 Implement TanSwish(), use it as activation in AttentionSqueeze module. 2022-11-22 23:34:11 +08:00
Daniel Povey
1826648dde Fix formulas and constants 2022-11-22 22:54:05 +08:00
Daniel Povey
6c5763fbb3 Implement subtracted momentum [0.33,0.66], and print name in Whiten module. 2022-11-22 21:57:48 +08:00
Daniel Povey
1a2632d0a2 Remove whitening in SelfAttention module. 2022-11-22 20:01:09 +08:00
Daniel Povey
99cd9f5788 Add more layers. 2022-11-22 19:48:42 +08:00
Daniel Povey
19683aa516 Change activation in bottleneck to Tanh. 2022-11-22 17:32:02 +08:00
Daniel Povey
8dfeaa5f92 Restore whitener that was in the AttentionSqueeze module. 2022-11-22 15:45:53 +08:00
Daniel Povey
7acdaea085 Change balancer to whitener for ff module; tighter min/max-pos limit on NonlinAttentionModule; whitener->balancer for AttentionSqueeze. 2022-11-22 15:42:41 +08:00
Daniel Povey
26916f41e7 Add balancer at output of FeedforwardModule 2022-11-22 14:43:46 +08:00
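The balancer is a repo-specific constraint on activation statistics (the commits above mention min/max-pos limits, and it alternates with the whitener). As a hedged sketch of one way such balancing can act, a custom autograd function can leave the forward pass untouched and add a small corrective gradient when a channel's mean absolute value drifts outside a target range; all constants and names below are illustrative.

```python
import torch

# Hedged sketch of an activation balancer (not the repo's exact module):
# identity in forward, gradient nudge in backward for out-of-range channels.
class BalanceAbsSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, min_abs=0.2, max_abs=10.0, strength=1e-4):
        ctx.save_for_backward(x)
        ctx.cfg = (min_abs, max_abs, strength)
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        min_abs, max_abs, strength = ctx.cfg
        # per-channel magnitude, assuming x is (batch, time, channels)
        mean_abs = x.abs().mean(dim=(0, 1), keepdim=True)
        # +1 where magnitudes are too small (push away from zero),
        # -1 where they are too large (pull toward zero)
        direction = (mean_abs < min_abs).float() - (mean_abs > max_abs).float()
        extra = -strength * direction * x.sign()
        return grad_out + extra, None, None, None

x = torch.randn(2, 10, 4, requires_grad=True)
y = BalanceAbsSketch.apply(x)   # forward is the identity
y.sum().backward()
```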
Daniel Povey
fe1793e288 Add output balancer to NonlinAttentionModule. 2022-11-22 14:29:07 +08:00
Daniel Povey
71f118e725 Use 2 groups in whitening for NonlinAttentionModule; limit 40->20. 2022-11-21 23:23:41 +08:00
Daniel Povey
b3b5e8b9b9 Increase Whiten limit from 10.0 to 40.0. 2022-11-21 22:19:45 +08:00
Daniel Povey
56efdcda49 Reduce whitening limit to 10 and move it to the beginning. 2022-11-21 21:07:32 +08:00
Daniel Povey
584f5bf88c Also add balancer in NonlinAttentionModule 2022-11-21 18:25:24 +08:00
Daniel Povey
0504f705ec Add Whiten module in NonlinAttentionModule 2022-11-21 18:19:52 +08:00
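The Whiten module's "limit" appears in several commits above. As a hedged sketch of what a whitening constraint can measure (not the repo's exact implementation): for roughly zero-mean features, the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the covariance is 1.0 when the covariance is a multiple of the identity ("white" features) and grows with anisotropy, so a penalty can engage whenever it exceeds a chosen limit.

```python
import torch

# Hedged sketch of a whitening metric: 1.0 for perfectly white features,
# larger as the feature covariance becomes more anisotropic.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), assumed roughly zero-mean
    cov = x.t() @ x / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2

x = torch.randn(200, 32)
metric = whitening_metric(x)   # close to 1.0 for white Gaussian features
```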