Daniel Povey
9ceb41acb4
Remove balancer from SelfAttention module.
2022-11-23 18:41:36 +08:00
Daniel Povey
f2dbf87461
Remove invocation of out_balancer
2022-11-23 18:40:27 +08:00
Daniel Povey
b88f12fe83
Remove out_balancer of NonlinAttentionModule
2022-11-23 18:37:45 +08:00
Daniel Povey
9138695dfe
Fix bug RE attn_weights
2022-11-23 17:04:17 +08:00
Daniel Povey
36e49a8d61
Change for mem efficiency
2022-11-23 15:38:34 +08:00
Daniel Povey
1d0252d420
Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of these
may already be counted in the last merges. Numbers XXX below correspond to
branches numbered scaled_adam_expXXX:
- from 412/413 (cherry-picked): dropout for attention in attention_squeeze and nonlin_attention modules,
  but simplified a little to use the same dropout schedule and drop them out all together;
  also have all 3 submodules use separate heads.
- from 460->461, which is in the history of 464, revert the part about balancing the output of the attention_squeeze module.
- merge 462->467, about using TanSwish instead of tanh.
- merge 462->465, remove whitening in self-attention module
- merge the part of 465->466 that was about diagnostics (name in Whiten module)
2022-11-23 14:41:09 +08:00
Daniel Povey
f89a85aed8
Merge branch 'scaled_adam_exp465' into scaled_adam_exp472
2022-11-23 14:16:17 +08:00
Daniel Povey
edd4bf5312
Merge branch 'scaled_adam_exp467' into scaled_adam_exp472
2022-11-23 14:13:19 +08:00
Daniel Povey
d95571eacf
From 460->461, revert change about balancing output of attention_squeeze module.
2022-11-23 14:12:08 +08:00
Daniel Povey
fe51eea397
Implement a form of dropout for squeeze_weights, dropout-to-constant.
2022-11-23 14:06:17 +08:00
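A minimal sketch of what "dropout-to-constant" for the squeeze weights could look like:
with some probability, the learned weights over the sequence are replaced by a constant
uniform distribution instead of being zeroed out. The helper name, the per-batch-element
granularity and the uniform constant are assumptions for illustration, not the branch's
actual code.

    import torch

    def dropout_to_constant(weights: torch.Tensor, p: float, training: bool) -> torch.Tensor:
        # weights: attention/squeeze weights assumed to sum to 1 over the last dim.
        # With probability p (per batch element), replace them with a constant
        # uniform distribution rather than zeroing them.  Hypothetical helper.
        if not training or p == 0.0:
            return weights
        keep = (torch.rand(weights.shape[:-1], device=weights.device) >= p)
        keep = keep.unsqueeze(-1).to(weights.dtype)
        constant = torch.full_like(weights, 1.0 / weights.shape[-1])
        return keep * weights + (1.0 - keep) * constant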
Daniel Povey
066f1e4658
Implement TanSwish(), use it as activation in AttentionSqueeze module.
2022-11-22 23:34:11 +08:00
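For reference, a sketch of a TanSwish-style activation, assuming the name means
tanh(x) * sigmoid(x); the actual definition in the branch's scaling.py may differ.

    import torch

    class TanSwish(torch.nn.Module):
        # Assumed form: tanh(x) * sigmoid(x) -- bounded like tanh, gated like Swish.
        # This is a guess at the formula, not the branch's exact definition.
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x.tanh() * x.sigmoid()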
Daniel Povey
1826648dde
Fix formulas and constants
2022-11-22 22:54:05 +08:00
Daniel Povey
6c5763fbb3
Implement subtracted momentum [0.33,0.66], and print name in Whiten module.
2022-11-22 21:57:48 +08:00
Daniel Povey
1a2632d0a2
Remove whitening in SelfAttention module.
2022-11-22 20:01:09 +08:00
Daniel Povey
99cd9f5788
Add more layers.
2022-11-22 19:48:42 +08:00
Daniel Povey
19683aa516
Change activation in bottleneck to Tanh.
2022-11-22 17:32:02 +08:00
Daniel Povey
8dfeaa5f92
Restore whitener that was in the AttentionSqueeze module.
2022-11-22 15:45:53 +08:00
Daniel Povey
7acdaea085
Change balancer to whitener for ff module; tighter min/max-pos limit on NonlinAttentionModule; whitener->balancer for AttentionSqueeze.
2022-11-22 15:42:41 +08:00
Daniel Povey
26916f41e7
Add balancer at output of FeedforwardModule
2022-11-22 14:43:46 +08:00
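As a rough illustration of what such a balancer does (a toy sketch of the idea, not the
ActivationBalancer code in scaling.py): the forward pass is the identity, and the backward
pass adds a small extra gradient that pushes per-channel statistics back into a target range.

    import torch

    class _BalanceGrad(torch.autograd.Function):
        # Identity in forward; in backward, adds a small gradient term that pushes
        # channels whose mean activation is outside [min_mean, max_mean] back toward
        # that range.  A toy version of the idea only.
        @staticmethod
        def forward(ctx, x, min_mean, max_mean, scale):
            ctx.save_for_backward(x)
            ctx.min_mean, ctx.max_mean, ctx.scale = min_mean, max_mean, scale
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            mean = x.mean(dim=tuple(range(x.dim() - 1)), keepdim=True)  # per-channel mean
            excess = (mean - ctx.max_mean).clamp(min=0.0) + (mean - ctx.min_mean).clamp(max=0.0)
            extra_grad = ctx.scale * excess.sign() * grad_output.abs().mean()
            return grad_output + extra_grad, None, None, None

    class SimpleBalancer(torch.nn.Module):
        def __init__(self, min_mean: float = -0.5, max_mean: float = 0.5, scale: float = 0.02):
            super().__init__()
            self.min_mean, self.max_mean, self.scale = min_mean, max_mean, scale

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if not self.training:
                return x
            return _BalanceGrad.apply(x, self.min_mean, self.max_mean, self.scale)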
Daniel Povey
fe1793e288
Add output balancer to NonlinAttentionModule.
2022-11-22 14:29:07 +08:00
Daniel Povey
71f118e725
Use 2 groups in whitening for NonlinAttentionModule; limit 40->20.
2022-11-21 23:23:41 +08:00
Daniel Povey
b3b5e8b9b9
Increase Whiten limit from 10.0 to 40.0.
2022-11-21 22:19:45 +08:00
Daniel Povey
56efdcda49
Reduce whitening limit to 10 and move it to the beginning.
2022-11-21 21:07:32 +08:00
Daniel Povey
584f5bf88c
Also add balancer in NonlinAttentionModule
2022-11-21 18:25:24 +08:00
Daniel Povey
0504f705ec
Add Whiten module in NonlinAttentionModule
2022-11-21 18:19:52 +08:00
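Roughly, a whitening constraint discourages the covariance of the activations from being
dominated by a few directions. Below is a generic sketch of such a metric (the grouping and
the exact formula/interface of the Whiten class are assumptions): it equals 1.0 for perfectly
"white" features and grows with the eigenvalue spread of the covariance, and training would
penalize it when it exceeds the configured limit.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (..., num_channels).  Returns mean-of-squared-eigenvalues divided by
        # squared-mean-eigenvalue of the per-group feature covariance, computed via
        # traces: (tr(C^2)/d) / (tr(C)/d)^2.  Equals 1.0 when C is a multiple of I.
        x = x.reshape(-1, x.shape[-1])                 # (frames, channels)
        x = x.reshape(x.shape[0], num_groups, -1)      # (frames, groups, chans/group)
        x = x.transpose(0, 1)                          # (groups, frames, chans/group)
        cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]
        d = cov.shape[-1]
        tr_c = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
        tr_c2 = (cov * cov).sum(dim=(-2, -1))          # == tr(C^2) since C is symmetric
        return ((tr_c2 / d) / ((tr_c / d) ** 2 + 1e-20)).mean()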
Daniel Povey
211e3af680
Remove changes in previous merge commit that did not relate to length_factor.
2022-11-21 14:32:05 +08:00
Daniel Povey
a6770657c8
Merge branch 'scaled_adam_exp445' into scaled_adam_exp450
2022-11-21 14:29:50 +08:00
Daniel Povey
836c72dd36
Changes and bug-fixes RE balancers; restore activation in AttentionSqueeze, remove in NonlinAttention.
2022-11-21 14:29:36 +08:00
Daniel Povey
9fe6add587
Fix to diagnostics.py (fix for max being doubled), from scaled_adam_exp446; small cosmetic fixes.
2022-11-21 14:00:55 +08:00
Daniel Povey
a10a0bce7d
Increase length_factor from 1.5 to 3.0.
2022-11-20 16:36:18 +08:00
Daniel Povey
cdfbbdded2
Refactoring, and change length_factor from 2.0 to 1.5.
2022-11-20 16:34:51 +08:00
Daniel Povey
a52ec3da28
Change feedforward dims: increase 1536->1792 for largest ff dim and move it one step later.
2022-11-20 14:24:41 +08:00
Daniel Povey
31b2a735b8
Move feedforward1 to the beginning, separating it from small_conv_module.
2022-11-20 13:17:39 +08:00
Daniel Povey
40c883343a
Merge branch 'scaled_adam_exp439' into scaled_adam_exp440
2022-11-20 13:08:00 +08:00
Daniel Povey
cf16c96edd
Merge branch 'scaled_adam_exp433' into scaled_adam_exp440
2022-11-20 13:07:35 +08:00
Daniel Povey
8b3303594c
Revert 419->420 change, regarding random shift in pos embedding
2022-11-20 13:07:20 +08:00
Daniel Povey
4e21db07f6
Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers.
2022-11-19 22:05:10 +08:00
Daniel Povey
d23fda7c5f
Multiply length_factor by 2.0.
2022-11-19 13:36:16 +08:00
Daniel Povey
b9871cc4f5
Merge branch 'scaled_adam_exp420' into scaled_adam_exp421
2022-11-18 14:54:36 +08:00
Daniel Povey
0601dd72fd
Bug-fix RE random shift
2022-11-18 14:53:03 +08:00
Daniel Povey
8a095c1cd1
Add SmallConvModule; decrease feedforward dims to keep about same num params.
2022-11-18 12:46:40 +08:00
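A guess at what a "small" convolution module might look like: a pointwise projection plus a
short depthwise convolution over time. The class name comes from the commit; the kernel size,
activation and structure are assumptions for illustration.

    import torch

    class SmallConvModule(torch.nn.Module):
        # Pointwise conv -> activation -> short depthwise conv over the time axis.
        # Structure and kernel size are assumptions, not the branch's actual code.
        def __init__(self, channels: int, kernel_size: int = 3):
            super().__init__()
            self.pointwise = torch.nn.Conv1d(channels, channels, kernel_size=1)
            self.activation = torch.nn.ReLU()
            self.depthwise = torch.nn.Conv1d(channels, channels, kernel_size,
                                             padding=kernel_size // 2, groups=channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, channels)
            x = x.transpose(1, 2)
            x = self.depthwise(self.activation(self.pointwise(x)))
            return x.transpose(1, 2)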
Daniel Povey
f7c99ed1d1
Introduce random shift with stddev=1.0 into pos_emb
2022-11-18 12:06:12 +08:00
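A sketch of the idea (whether the shift is shared across the sequence or drawn per position,
and where exactly it enters the pos_emb computation, are assumptions here): during training,
the position indices get a random Gaussian offset with stddev 1.0 before the positional
embedding is computed.

    import torch

    def shifted_positions(seq_len: int, stddev: float = 1.0,
                          training: bool = True) -> torch.Tensor:
        # Position indices 0..seq_len-1, with one shared random Gaussian offset
        # (stddev ~1.0) added during training.  Hypothetical helper for illustration.
        pos = torch.arange(seq_len, dtype=torch.float32)
        if training and stddev > 0.0:
            pos = pos + stddev * torch.randn(())
        return pos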
Daniel Povey
e9806950f5
Reduce pos-dim from 96 to 48.
2022-11-17 23:42:39 +08:00
Daniel Povey
8b50932d5a
Merge branch 'scaled_adam_exp416' into scaled_adam_exp418
2022-11-17 18:34:07 +08:00
Daniel Povey
e73ced1607
Bug fix in formula for pos embedding
2022-11-17 16:02:57 +08:00
Daniel Povey
48f32971f3
Reduce final pos_emb_skip rate from 0.075 to 0.0, and add dropout=0.15 for pos embedding module
2022-11-17 14:33:54 +08:00
Daniel Povey
27f8497fea
Reduce pos_dim from 128 to 96.
2022-11-17 10:39:36 +08:00
Daniel Povey
526b5e59a6
Increase pos-head-dim from 2 to 4.
2022-11-16 11:53:55 +08:00
Daniel Povey
fc74ff63fb
Remove one feedforward module and give params to the other 2.
2022-11-16 11:46:05 +08:00
Daniel Povey
3d47335ab6
Double the duration of layer skipping warmup, from 2k to 4k.
2022-11-16 11:41:48 +08:00
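For context, "layer skipping warmup" refers to a schedule where whole layers are bypassed
with some probability early in training, with that probability decaying over the first N
batches. The 4k-batch duration comes from the commit; the linear shape and the probability
endpoints below are assumptions.

    def layer_skip_prob(batch_idx: int, warmup_batches: int = 4000,
                        initial_prob: float = 0.5, final_prob: float = 0.025) -> float:
        # Probability of skipping (bypassing) a layer at this batch index: decays
        # linearly from initial_prob to final_prob over warmup_batches, then stays
        # at final_prob.  Endpoint values are illustrative assumptions.
        if batch_idx >= warmup_batches:
            return final_prob
        frac = batch_idx / warmup_batches
        return initial_prob + frac * (final_prob - initial_prob)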