1892 Commits

Author  SHA1  Message  Date
Daniel Povey  0504f705ec  Add Whiten module in NonlinAttentionModule  2022-11-21 18:19:52 +08:00
Daniel Povey  211e3af680  Remove changes in previous merge commit that did not relate to length_factor.  2022-11-21 14:32:05 +08:00
Daniel Povey  a6770657c8  Merge branch 'scaled_adam_exp445' into scaled_adam_exp450  2022-11-21 14:29:50 +08:00
Daniel Povey  836c72dd36  Changes and bug-fixes RE balancers; restore activation in AttentionSqueeze, remove in NonlinAttention.  2022-11-21 14:29:36 +08:00
Daniel Povey  9fe6add587  Fix to diagnostics.py (fix for max being doubled), from scaled_adam_exp446; small cosmetic fixes.  2022-11-21 14:00:55 +08:00
Daniel Povey  a10a0bce7d  Increase length_factor from 1.5 to 3.0.  2022-11-20 16:36:18 +08:00
Daniel Povey  cdfbbdded2  Refactoring, and change length_factor from 2.0 to 1.5.  2022-11-20 16:34:51 +08:00
Daniel Povey  a52ec3da28  Change feedforward dims: increase 1536->1792 for largest ff dim and move it one step later.  2022-11-20 14:24:41 +08:00
Daniel Povey  31b2a735b8  Move feedforward1 to the beginning, separating it from small_conv_module.  2022-11-20 13:17:39 +08:00
Daniel Povey  40c883343a  Merge branch 'scaled_adam_exp439' into scaled_adam_exp440  2022-11-20 13:08:00 +08:00
Daniel Povey  cf16c96edd  Merge branch 'scaled_adam_exp433' into scaled_adam_exp440  2022-11-20 13:07:35 +08:00
Daniel Povey  8b3303594c  Revert 419->420 change, regarding random shift in pos embedding  2022-11-20 13:07:20 +08:00
Daniel Povey  4e21db07f6  Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers.  2022-11-19 22:05:10 +08:00
Daniel Povey  d23fda7c5f  Multiply length_factor by 2.0.  2022-11-19 13:36:16 +08:00
Daniel Povey  b9871cc4f5  Merge branch 'scaled_adam_exp420' into scaled_adam_exp421  2022-11-18 14:54:36 +08:00
Daniel Povey  0601dd72fd  Bug-fix RE random shift  2022-11-18 14:53:03 +08:00
Daniel Povey  8a095c1cd1  Add SmallConvModule; decrease feedforward dims to keep about the same num params.  2022-11-18 12:46:40 +08:00
Daniel Povey  f7c99ed1d1  Introduce random shift with stddev=1.0 into pos_emb  2022-11-18 12:06:12 +08:00
Daniel Povey  e9806950f5  Reduce pos-dim from 96 to 48.  2022-11-17 23:42:39 +08:00
Daniel Povey  8b50932d5a  Merge branch 'scaled_adam_exp416' into scaled_adam_exp418  2022-11-17 18:34:07 +08:00
Daniel Povey  e73ced1607  Bug fix in formula for pos embedding  2022-11-17 16:02:57 +08:00
Daniel Povey  48f32971f3  Reduce final pos_emb_skip rate from 0.075 to 0.0, and add dropout=0.15 for pos embedding module  2022-11-17 14:33:54 +08:00
Daniel Povey  27f8497fea  Reduce pos_dim from 128 to 96.  2022-11-17 10:39:36 +08:00
Daniel Povey  526b5e59a6  Increase pos-head-dim from 2 to 4.  2022-11-16 11:53:55 +08:00
Daniel Povey  fc74ff63fb  Remove one feedforward module and give its params to the other two.  2022-11-16 11:46:05 +08:00
Daniel Povey  3d47335ab6  Double the duration of layer-skipping warmup, from 2k to 4k.  2022-11-16 11:41:48 +08:00
Daniel Povey  22a1401f36  Remove self_attn1 module  2022-11-16 11:37:08 +08:00
Daniel Povey  d542fa61ff  Double pos_dim from 64 to 128.  2022-11-16 11:35:25 +08:00
Daniel Povey  000af07a2a  Increase final pos_emb_skip rate from 0.05 to 0.075  2022-11-16 11:34:26 +08:00
Daniel Povey  6668814940  Increase pos_emb_skip_rate from 0.05 to 0.075.  2022-11-15 11:50:14 +08:00
Daniel Povey  f76075fd1a  Make pos_emb dropout rate constant during training; also cosmetic changes  2022-11-15 11:42:12 +08:00
Daniel Povey  867556200f  Use zero dropout within the position embedding, but drop out the entire thing with twice the final prob.  2022-11-15 11:39:20 +08:00
Daniel Povey  380f773069  Merge branch 'scaled_adam_exp387' into scaled_adam_exp390  2022-11-15 11:35:54 +08:00
Daniel Povey  a1a4b715d9  Introduce a dropout schedule for the pos embedding at training time (merge conflicts resolved in egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py).  2022-11-15 11:28:50 +08:00
Daniel Povey  6ea1706e11  Fix potential/theoretical issue in backward of LimitParamValue  2022-11-14 23:31:00 +08:00
Daniel Povey  d1df919547  Cosmetic improvements  2022-11-14 23:26:33 +08:00
Daniel Povey  46bd93b792  Cosmetic fix  2022-11-14 23:17:20 +08:00
Daniel Povey  a680c7de2e  Make bypass_scale a tensor.  2022-11-14 19:12:16 +08:00
Daniel Povey  ff6431ed0f  Implement limits on parameter values a different way.  2022-11-14 16:02:38 +08:00
Daniel Povey  ce4b50d094  Revert making the dropout of pos_emb independent across the batch.  2022-11-14 15:34:39 +08:00
Daniel Povey  804917837e  Remove pos_emb scales  2022-11-14 15:32:54 +08:00
Daniel Povey  ba69eb48fe  Remove pos_emb schedule  2022-11-14 15:31:56 +08:00
Daniel Povey  54048009db  Fix self.training condition  2022-11-14 15:15:24 +08:00
Daniel Povey  e1fb25262a  Refactor the scheduling code a little  2022-11-14 14:52:27 +08:00
Daniel Povey  b32dec1119  Add printing capability  2022-11-14 14:16:28 +08:00
Daniel Povey  4c8575878a  Bug fix in ScheduledSampler  2022-11-14 13:52:14 +08:00
Daniel Povey  614b5b1a52  Treat batch_idx==0.0 separately to get scan_pessimistic_batches_for_oom() to work; should not affect results.  2022-11-14 13:20:31 +08:00
Daniel Povey  cde4ca27ee  Introduce a dropout schedule for the pos embedding at training time.  2022-11-14 13:00:30 +08:00
Daniel Povey  cd4730b657  Try to refactor the code for scheduling  2022-11-14 12:50:24 +08:00
Daniel Povey  aa0b1a37cd  Change to valid interval for libri-100  2022-11-13 23:29:17 +08:00