19683aa516  2022-11-22 17:32:02 +08:00  Daniel Povey  Change activation in bottleneck to Tanh.
8dfeaa5f92  2022-11-22 15:45:53 +08:00  Daniel Povey  Restore whitener that was in the AttentionSqueeze module.
7acdaea085  2022-11-22 15:42:41 +08:00  Daniel Povey  Change balancer to whitener for ff module; tighter min/max-pos limit on NonlinAttentionModule; whitener->balancer for AttentionSqueeze.
26916f41e7  2022-11-22 14:43:46 +08:00  Daniel Povey  Add balancer at output of FeedforwardModule
fe1793e288  2022-11-22 14:29:07 +08:00  Daniel Povey  Add output balancer to NonlinAttentionModule.
71f118e725  2022-11-21 23:23:41 +08:00  Daniel Povey  Use 2 groups in whitening for NonlinAttentionModule; limit 40->20.
b3b5e8b9b9  2022-11-21 22:19:45 +08:00  Daniel Povey  Increase Whiten limit from 10.0 to 40.0.
56efdcda49  2022-11-21 21:07:32 +08:00  Daniel Povey  Reduce whitening limit to 10 and move it to the beginning.
584f5bf88c  2022-11-21 18:25:24 +08:00  Daniel Povey  Also add balancer in NonlinAttentionModule
0504f705ec  2022-11-21 18:19:52 +08:00  Daniel Povey  Add Whiten module in NonlinAttentionModule
211e3af680  2022-11-21 14:32:05 +08:00  Daniel Povey  Remove changes in previous merge commit that did not relate to length_factor.
a6770657c8  2022-11-21 14:29:50 +08:00  Daniel Povey  Merge branch 'scaled_adam_exp445' into scaled_adam_exp450
836c72dd36  2022-11-21 14:29:36 +08:00  Daniel Povey  Changes and bug-fixes RE balancers; restore activation in AttentionSqueeze, remove in NonlinAttention.
9fe6add587  2022-11-21 14:00:55 +08:00  Daniel Povey  Fix to diagnostics.py (fix for max being doubled), from scaled_adam_exp446; small cosmetic fixes.
a10a0bce7d  2022-11-20 16:36:18 +08:00  Daniel Povey  Increase length_factor from 1.5 to 3.0.
cdfbbdded2  2022-11-20 16:34:51 +08:00  Daniel Povey  Refactoring, and change length_factor from 2.0 to 1.5.
a52ec3da28  2022-11-20 14:24:41 +08:00  Daniel Povey  Change feedforward dims: increase 1536->1792 for largest ff dim and move it one step later.
31b2a735b8  2022-11-20 13:17:39 +08:00  Daniel Povey  Move feedforward1 to the beginning, separating it from small_conv_module.
40c883343a  2022-11-20 13:08:00 +08:00  Daniel Povey  Merge branch 'scaled_adam_exp439' into scaled_adam_exp440
cf16c96edd  2022-11-20 13:07:35 +08:00  Daniel Povey  Merge branch 'scaled_adam_exp433' into scaled_adam_exp440
8b3303594c  2022-11-20 13:07:20 +08:00  Daniel Povey  Revert 419->420 change, regarding random shift in pos embedding
4e21db07f6  2022-11-19 22:05:10 +08:00  Daniel Povey  Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers.
d23fda7c5f  2022-11-19 13:36:16 +08:00  Daniel Povey  Multiply length_factor by 2.0.
b9871cc4f5  2022-11-18 14:54:36 +08:00  Daniel Povey  Merge branch 'scaled_adam_exp420' into scaled_adam_exp421
0601dd72fd  2022-11-18 14:53:03 +08:00  Daniel Povey  Bug-fix RE random shift
8a095c1cd1  2022-11-18 12:46:40 +08:00  Daniel Povey  Add SmallConvModule; decrease feedforward dims to keep about same num params.
f7c99ed1d1  2022-11-18 12:06:12 +08:00  Daniel Povey  Introduce random shift with stddev=1.0 into pos_emb
e9806950f5  2022-11-17 23:42:39 +08:00  Daniel Povey  Reduce pos-dim from 96 to 48.
8b50932d5a  2022-11-17 18:34:07 +08:00  Daniel Povey  Merge branch 'scaled_adam_exp416' into scaled_adam_exp418
e73ced1607  2022-11-17 16:02:57 +08:00  Daniel Povey  Bug fix in formula for pos embedding
48f32971f3  2022-11-17 14:33:54 +08:00  Daniel Povey  Reduce final pos_emb_skip rate from 0.075 to 0.0, and add dropout=0.15 for pos embedding module
27f8497fea  2022-11-17 10:39:36 +08:00  Daniel Povey  Reduce pos_dim from 128 to 96.
526b5e59a6  2022-11-16 11:53:55 +08:00  Daniel Povey  Increase pos-head-dim from 2 to 4.
fc74ff63fb  2022-11-16 11:46:05 +08:00  Daniel Povey  Remove one feedforward module and give params to the other 2.
3d47335ab6  2022-11-16 11:41:48 +08:00  Daniel Povey  Double the duration of layer skipping warmup, from 2k to 4k.
22a1401f36  2022-11-16 11:37:08 +08:00  Daniel Povey  Remove self_attn1 module
d542fa61ff  2022-11-16 11:35:25 +08:00  Daniel Povey  Double pos_dim from 64 to 128.
000af07a2a  2022-11-16 11:34:26 +08:00  Daniel Povey  Increase final pos_emb_skip rate from 0.05 to 0.075
6668814940  2022-11-15 11:50:14 +08:00  Daniel Povey  Increase pos_emb_skip_rate from 0.05 to 0.075.
f76075fd1a  2022-11-15 11:42:12 +08:00  Daniel Povey  Make pos_emb dropout rate be constant during training; also cosmetic changes
867556200f  2022-11-15 11:39:20 +08:00  Daniel Povey  Have zero dropout in the position embedding, but dropout the entire thing with twice the final prob.
380f773069  2022-11-15 11:35:54 +08:00  Daniel Povey  Merge branch 'scaled_adam_exp387' into scaled_adam_exp390
a1a4b715d9  2022-11-15 11:28:50 +08:00  Daniel Povey  Introduce a dropout schedule for the pos embedding, in training time.
    # Conflicts:
    # egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py
6ea1706e11  2022-11-14 23:31:00 +08:00  Daniel Povey  Fix potential/theoretical issue in backward of LimitParamValue
d1df919547  2022-11-14 23:26:33 +08:00  Daniel Povey  Cosmetic improvements
46bd93b792  2022-11-14 23:17:20 +08:00  Daniel Povey  Cosmetic fix
a680c7de2e  2022-11-14 19:12:16 +08:00  Daniel Povey  Make bypass_scale be a tensor.
ff6431ed0f  2022-11-14 16:02:38 +08:00  Daniel Povey  Implement limits on parameter values a different way.
ce4b50d094  2022-11-14 15:34:39 +08:00  Daniel Povey  Revert making the dropout of pos_emb independent across the batch.
804917837e  2022-11-14 15:32:54 +08:00  Daniel Povey  Remove pos_emb scales