1816 Commits

Author SHA1 Message Date
Daniel Povey
9ceb41acb4 Remove balancer from SelfAttention module. 2022-11-23 18:41:36 +08:00
Daniel Povey
f2dbf87461 Remove invocation of out_balancer 2022-11-23 18:40:27 +08:00
Daniel Povey
b88f12fe83 Remove out_balancer of NonlinAttentionModule 2022-11-23 18:37:45 +08:00
Daniel Povey
9138695dfe Fix bug RE attn_weights 2022-11-23 17:04:17 +08:00
Daniel Povey
36e49a8d61 Change for mem efficiency 2022-11-23 15:38:34 +08:00
Daniel Povey
1d0252d420 Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of
these may be counted in the last few commits.

  The numbers XXX below correspond to branches numbered scaled_adam_expXXX.
    - from 412/413 (cherry-picked): dropout for attention in the attention_squeeze and nonlin_attention modules,
      but simplified a little to use the same dropout schedule and drop them out all together;
      also have all 3 submodules use separate heads.
    - from 460->461, which is in the history of 464: revert the part about balancing the output of the attention_squeeze module.
    - merge from 462->467, about using TanSwish instead of tanh.
    - merge 462->465: remove whitening in the self-attention module.
    - merge the part of 465->466 that was about diagnostics (name in Whiten module).
2022-11-23 14:41:09 +08:00
Daniel Povey
f89a85aed8 Merge branch 'scaled_adam_exp465' into scaled_adam_exp472 2022-11-23 14:16:17 +08:00
Daniel Povey
edd4bf5312 Merge branch 'scaled_adam_exp467' into scaled_adam_exp472 2022-11-23 14:13:19 +08:00
Daniel Povey
d95571eacf From 460->461, revert change about balancing output of attention_squeeze module. 2022-11-23 14:12:08 +08:00
Daniel Povey
fe51eea397 Implement a form of dropout for squeeze_weights, dropout-to-constant. 2022-11-23 14:06:17 +08:00
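A note on the "dropout-to-constant" idea named above: rather than zeroing the squeeze weights as ordinary dropout would, dropped weights are replaced by a constant, so the module degrades to a fixed, input-independent weighting instead of being switched off. A minimal sketch, assuming per-frame dropping and a placeholder constant of 1.0 (neither value is stated in the commit):

    import torch
    from torch import Tensor, nn

    class DropoutToConstant(nn.Module):
        # Sketch only: with probability p, replace a whole frame of weights
        # with `constant` instead of zero (ordinary dropout), so the squeeze
        # path falls back to an input-independent weighting during training.
        def __init__(self, p: float = 0.1, constant: float = 1.0):
            super().__init__()
            self.p = p
            self.constant = constant

        def forward(self, weights: Tensor) -> Tensor:
            if not self.training or self.p == 0.0:
                return weights
            # One Bernoulli draw per frame; all channels of a frame drop together.
            mask_shape = weights.shape[:-1] + (1,)
            keep = (torch.rand(mask_shape, device=weights.device) > self.p).to(weights.dtype)
            return keep * weights + (1.0 - keep) * self.constant

The constant actually used, and whether dropping is per-frame or per-element, cannot be read off the commit message; a value near the weights' nominal mean would be another natural choice.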
Daniel Povey
066f1e4658 Implement TanSwish(), use it as activation in AttentionSqueeze module. 2022-11-22 23:34:11 +08:00
Daniel Povey
1826648dde Fix formulas and constants 2022-11-22 22:54:05 +08:00
Daniel Povey
6c5763fbb3 Implement subtracted momentum [0.33,0.66], and print name in Whiten module. 2022-11-22 21:57:48 +08:00
Daniel Povey
1a2632d0a2 Remove whitening in SelfAttention module. 2022-11-22 20:01:09 +08:00
Daniel Povey
99cd9f5788 Add more layers. 2022-11-22 19:48:42 +08:00
Daniel Povey
19683aa516 Change activation in bottleneck to Tanh. 2022-11-22 17:32:02 +08:00
Daniel Povey
8dfeaa5f92 Restore whitener that was in the AttentionSqueeze module. 2022-11-22 15:45:53 +08:00
Daniel Povey
7acdaea085 Change balancer to whitener for ff module; tighter min/max-pos limit on NonlinAttentionModule; whitener->balancer for AttentionSqueeze. 2022-11-22 15:42:41 +08:00
Daniel Povey
26916f41e7 Add balancer at output of FeedforwardModule 2022-11-22 14:43:46 +08:00
Daniel Povey
fe1793e288 Add output balancer to NonlinAttentionModule. 2022-11-22 14:29:07 +08:00
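For context on the balancers added and removed throughout this stretch: a balancer monitors per-channel statistics of a module's output (typically the fraction of positive values and the mean absolute value) and, in the real module, nudges gradients in the backward pass when those statistics drift outside configured min/max limits. A sketch of just the statistics it would watch, assuming a channel-last layout (the gradient-correction mechanics are omitted):

    import torch
    from torch import Tensor

    def balancer_stats(x: Tensor):
        # Per-channel quantities a balancer constrains: the proportion of
        # positive activations and the mean absolute value.  Limits such as
        # the "min/max-pos" mentioned in 7acdaea085 apply to the first of these.
        reduce_dims = tuple(range(x.dim() - 1))          # all dims except the channel dim
        proportion_positive = (x > 0).float().mean(dim=reduce_dims)
        mean_abs = x.abs().mean(dim=reduce_dims)
        return proportion_positive, mean_abs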
Daniel Povey
71f118e725 Use 2 groups in whitening for NonlinAttentionModule; limit 40->20. 2022-11-21 23:23:41 +08:00
Daniel Povey
b3b5e8b9b9 Increase Whiten limit from 10.0 to 40.0. 2022-11-21 22:19:45 +08:00
Daniel Povey
56efdcda49 Reduce whitening limit to 10 and move it to the beginning. 2022-11-21 21:07:32 +08:00
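The "limit" tuned in the three whitening commits above is a threshold on how non-white the covariance of the module's activations may become before the Whiten module pushes back. One standard way to measure this, sketched below as an assumption about the formulation rather than a copy of the repository's code, is trace(C·C)·d / trace(C)² per group of d channels: it equals 1.0 when the covariance C is proportional to the identity and grows as the eigenvalues spread, so a limit of 10 is tighter than 40.

    import torch
    from torch import Tensor

    def whitening_metric(x: Tensor, num_groups: int = 1) -> Tensor:
        # Sketch of a whiteness measure: trace(C @ C) * d / trace(C)**2 per
        # group of d channels; 1.0 when the covariance C is a multiple of
        # the identity, larger as the eigenvalues spread out.  A Whiten
        # module would only apply a corrective gradient above its limit.
        x = x.reshape(-1, x.shape[-1])                       # (frames, channels)
        num_channels = x.shape[-1]
        assert num_channels % num_groups == 0
        d = num_channels // num_groups
        x = x.reshape(-1, num_groups, d).transpose(0, 1)     # (groups, frames, d)
        covar = torch.matmul(x.transpose(1, 2), x)           # (groups, d, d)
        trace = covar.diagonal(dim1=1, dim2=2).sum(dim=-1)   # trace(C), per group
        trace_c_sq = (covar ** 2).sum(dim=(1, 2))            # trace(C @ C), per group
        return (trace_c_sq * d / (trace ** 2 + 1e-20)).mean()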
Daniel Povey
584f5bf88c Also add balancer in NonlinAttentionModule 2022-11-21 18:25:24 +08:00
Daniel Povey
0504f705ec Add Whiten module in NonlinAttentionModule 2022-11-21 18:19:52 +08:00
Daniel Povey
211e3af680 Remove changes in previous merge commit that did not relate to length_factor. 2022-11-21 14:32:05 +08:00
Daniel Povey
a6770657c8 Merge branch 'scaled_adam_exp445' into scaled_adam_exp450 2022-11-21 14:29:50 +08:00
Daniel Povey
836c72dd36 Changes and bug-fixes RE balancers; restore activation in AttentionSqueeze, remove in NonlinAttention. 2022-11-21 14:29:36 +08:00
Daniel Povey
9fe6add587 Fix to diagnostics.py (fix for max being doubled), from scaled_adam_exp446; small cosmetic fixes. 2022-11-21 14:00:55 +08:00
Daniel Povey
a10a0bce7d Increase length_factor from 1.5 to 3.0. 2022-11-20 16:36:18 +08:00
Daniel Povey
cdfbbdded2 Refactoring, and change length_factor from 2.0 to 1.5. 2022-11-20 16:34:51 +08:00
Daniel Povey
a52ec3da28 Change feedforward dims: increase 1536->1792 for largest ff dim and move it one step later. 2022-11-20 14:24:41 +08:00
Daniel Povey
31b2a735b8 Move feedforward1 to the beginning, separating it from small_conv_module. 2022-11-20 13:17:39 +08:00
Daniel Povey
40c883343a Merge branch 'scaled_adam_exp439' into scaled_adam_exp440 2022-11-20 13:08:00 +08:00
Daniel Povey
cf16c96edd Merge branch 'scaled_adam_exp433' into scaled_adam_exp440 2022-11-20 13:07:35 +08:00
Daniel Povey
8b3303594c Revert 419->420 change, regarding random shift in pos embedding 2022-11-20 13:07:20 +08:00
Daniel Povey
4e21db07f6 Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers. 2022-11-19 22:05:10 +08:00
Daniel Povey
d23fda7c5f Multiply length_factor by 2.0. 2022-11-19 13:36:16 +08:00
Daniel Povey
b9871cc4f5 Merge branch 'scaled_adam_exp420' into scaled_adam_exp421 2022-11-18 14:54:36 +08:00
Daniel Povey
0601dd72fd Bug-fix RE random shift 2022-11-18 14:53:03 +08:00
Daniel Povey
8a095c1cd1 Add SmallConvModule; decrease feedforward dims to keep about same num params. 2022-11-18 12:46:40 +08:00
Daniel Povey
f7c99ed1d1 Introduce random shift with stddev=1.0 into pos_emb 2022-11-18 12:06:12 +08:00
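The random shift introduced here (and reverted in 8b3303594c above) jitters the position values seen by the positional-embedding module during training. A minimal sketch of the idea, assuming a single Gaussian offset shared across a sequence of relative positions; the exact place the shift is applied is not recoverable from the message:

    import torch
    from torch import Tensor

    def jittered_rel_positions(seq_len: int, stddev: float = 1.0, training: bool = True) -> Tensor:
        # Relative positions -(T-1)..(T-1), plus one Gaussian-distributed
        # shift (stddev=1.0 per the commit message) in training, so the
        # embedding formula sees slightly perturbed position values.
        pos = torch.arange(-(seq_len - 1), seq_len, dtype=torch.float32)
        if training and stddev > 0.0:
            pos = pos + stddev * torch.randn(())
        return pos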
Daniel Povey
e9806950f5 Reduce pos-dim from 96 to 48. 2022-11-17 23:42:39 +08:00
Daniel Povey
8b50932d5a Merge branch 'scaled_adam_exp416' into scaled_adam_exp418 2022-11-17 18:34:07 +08:00
Daniel Povey
e73ced1607 Bug fix in formula for pos embedding 2022-11-17 16:02:57 +08:00
Daniel Povey
48f32971f3 Reduce final pos_emb_skip rate from 0.075 to 0.0, and add dropout=0.15 for pos embedding module 2022-11-17 14:33:54 +08:00
Daniel Povey
27f8497fea Reduce pos_dim from 128 to 96. 2022-11-17 10:39:36 +08:00
Daniel Povey
526b5e59a6 Increase pos-head-dim from 2 to 4. 2022-11-16 11:53:55 +08:00
Daniel Povey
fc74ff63fb Remove one feedforward module and give params to the other 2. 2022-11-16 11:46:05 +08:00
Daniel Povey
3d47335ab6 Double the duration of layer skipping warmup, from 2k to 4k. 2022-11-16 11:41:48 +08:00