84 Commits

Author SHA1 Message Date
Daniel Povey  fc74ff63fb  Remove one feedforward module and give params to the other 2.  2022-11-16 11:46:05 +08:00
Daniel Povey  3d47335ab6  Double the duration of layer skipping warmup, from 2k to 4k.  2022-11-16 11:41:48 +08:00
Daniel Povey  22a1401f36  Remove self_attn1 module  2022-11-16 11:37:08 +08:00
Daniel Povey  000af07a2a  Increase final pos_emb_skip rate from 0.05 to 0.075  2022-11-16 11:34:26 +08:00
Daniel Povey  867556200f  Have zero dropout in the position embedding, but dropout the entire thing with twice the final prob.  2022-11-15 11:39:20 +08:00
Daniel Povey  380f773069  Merge branch 'scaled_adam_exp387' into scaled_adam_exp390  2022-11-15 11:35:54 +08:00
Daniel Povey  a1a4b715d9  Introduce a dropout schedule for the pos embedding, in training time. (Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py)  2022-11-15 11:28:50 +08:00
Daniel Povey  d1df919547  Cosmetic improvements  2022-11-14 23:26:33 +08:00
Daniel Povey  46bd93b792  Cosmetic fix  2022-11-14 23:17:20 +08:00
Daniel Povey  a680c7de2e  Make bypass_scale be a tensor.  2022-11-14 19:12:16 +08:00
Daniel Povey  ff6431ed0f  Implement limits on parameter values a different way.  2022-11-14 16:02:38 +08:00
Daniel Povey  ce4b50d094  Revert making the dropout of pos_emb independent across the batch.  2022-11-14 15:34:39 +08:00
Daniel Povey  804917837e  Remove pos_emb scales  2022-11-14 15:32:54 +08:00
Daniel Povey  ba69eb48fe  Remove pos_emb schedule  2022-11-14 15:31:56 +08:00
Daniel Povey  e1fb25262a  Refactor the scheduling code a little  2022-11-14 14:52:27 +08:00
Daniel Povey  614b5b1a52  Treat batch_idx==0.0 separately to get scan_pessimistic_batches_for_oom() to work. Should not affect results.  2022-11-14 13:20:31 +08:00
Daniel Povey  cde4ca27ee  Introduce a dropout schedule for the pos embedding, in training time.  2022-11-14 13:00:30 +08:00
Daniel Povey  cd4730b657  Try to refactor the code for scheduling  2022-11-14 12:50:24 +08:00
Daniel Povey  a256425b2f  Reduce dropout_rate for RelPositionalEncoding from 0.2 to 0.15.  2022-11-13 23:29:07 +08:00
Daniel Povey  463fed3d6a  Use compression of large x in the formula for pos_emb  2022-11-13 13:23:42 +08:00
Daniel Povey  6c16d08b4f  Add bias in interior of SelfAttn module  2022-11-13 11:58:01 +08:00
Daniel Povey  4a5a13b678  Increase dropout rate for PosEmb from 0.1 to 0.2.  2022-11-12 23:26:58 +08:00
Daniel Povey  70408d22fe  Add trainable scales for pos_emb (Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py)  2022-11-12 23:25:17 +08:00
Daniel Povey  e67d4ca40d  Make pos_emb be dropped out independently across batch  2022-11-12 19:21:29 +08:00
Daniel Povey  f7aff4f507  Revert "Make sub-module dropped out independently." (reverts commit 3ff3f440ee6d2a367cc3cc45e40f8eb69d122861)  2022-11-11 21:36:36 +08:00
Daniel Povey  742bcaa340  Comment  2022-11-10 23:26:36 +08:00
Daniel Povey  2c6f5e82b2  Use atan not tanh  2022-11-10 22:36:39 +08:00
Daniel Povey  60274ea731  New formula for pos emb  2022-11-10 22:06:05 +08:00
Daniel Povey  fd26b890d2  Tweak formula for widths  2022-11-10 13:04:56 +08:00
Daniel Povey  6091146e91  Change the formula for the embedding to be a bit more symmetric.  2022-11-10 11:39:37 +08:00
Daniel Povey  082b93d911  Remove unused variable.  2022-11-10 11:18:10 +08:00
Daniel Povey  125ea04a42  Rework positional encoding  2022-11-09 20:48:27 +08:00
Daniel Povey  e4a3b2da7d  Mostly-cosmetic fixes found via mypy  2022-11-09 17:40:09 +08:00
Daniel Povey  308059edba  Cosmetic fixes  2022-11-09 17:14:18 +08:00
Daniel Povey  3ff3f440ee  Make sub-module dropped out independently.  2022-11-09 14:15:56 +08:00
Daniel Povey  cba194aa26  Bug fix RE masking  2022-11-09 13:12:34 +08:00
Daniel Povey  20e6d2a157  Rework zipformer code for clarity and extensibility  2022-11-09 12:56:07 +08:00
Daniel Povey  797a0e6ce7  Change order of convolution and nonlin-attention modules  2022-11-08 20:00:25 +08:00
Daniel Povey  36bff9b369  Fix to comment  2022-11-07 12:33:12 +08:00
Daniel Povey  47f42ef5db  Replace the 1st of the ConvolutionModules with NonlinAttentionModule  2022-11-05 14:19:43 +08:00
Daniel Povey  eb6e2b5a1d  Have 2 squeeze-excite modules per layer, using different attention heads.  2022-11-04 17:40:51 +08:00
Daniel Povey  efbe20694f  Use the attention weights as input for the ModifiedSEModule  2022-11-04 16:01:07 +08:00
Daniel Povey  0d94783e76  Instead of a pooling operation, use the first bottleneck_dim dimensions of the preceding self_attn.forward2 as the input to the squeeze-excite module.  2022-11-04 15:16:59 +08:00
Daniel Povey  c27ee8cfcf  Merge branch 'scaled_adam_exp277' into scaled_adam_exp281  2022-11-04 15:06:23 +08:00
Daniel Povey  67d470766f  Revert bottleneck_dim from 8 to 16  2022-11-04 15:02:56 +08:00
Daniel Povey  cefcd061bd  Merge branch 'scaled_adam_exp271' into scaled_adam_exp274  2022-11-04 14:50:00 +08:00
Daniel Povey  31d9bbfb3c  Merge branch 'scaled_adam_exp268b' into scaled_adam_exp279  2022-11-04 14:42:00 +08:00
Daniel Povey  70300e34d3  Merge branch 'scaled_adam_exp273' into scaled_adam_exp277 (Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py)  2022-11-04 12:41:02 +08:00
Daniel Povey  f625810de1  Use the balancer; remove the unused sigmoid module.  2022-11-03 19:21:37 +08:00
Daniel Povey  a9c384e69e  Add Whiten module after squeeze_proj.  2022-11-03 19:04:34 +08:00
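
Several of the commits above revolve around one idea: the dropout applied to the relative positional embedding is driven by a schedule on the batch index rather than a fixed constant (a1a4b715d9, cde4ca27ee), later in the series the entire embedding is dropped out as one unit rather than element-wise (867556200f, with the per-batch-element variant reverted in ce4b50d094), and the final skip rate is raised from 0.05 to 0.075 (000af07a2a). The sketch below is only a minimal illustration of that scheme, not the actual zipformer.py code; the helper names (piecewise_linear, PosEmbDropout), the 0.5 starting rate, the 4000-batch ramp, and the tensor shapes are assumptions for illustration, with only the 0.075 end value taken from the log.

    import torch
    import torch.nn as nn


    def piecewise_linear(x: float, points) -> float:
        """Interpolate a schedule given as (x, y) breakpoints, clamping at both ends."""
        x0, y0 = points[0]
        if x <= x0:
            return y0
        for x1, y1 in points[1:]:
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return points[-1][1]


    class PosEmbDropout(nn.Module):
        """Skip the *whole* positional embedding with a probability that follows
        a batch-index schedule (hypothetical sketch, not the icefall class)."""

        def __init__(self, schedule):
            super().__init__()
            self.schedule = schedule
            self.batch_count = 0  # the training loop would update this each step

        def forward(self, pos_emb: torch.Tensor) -> torch.Tensor:
            if not self.training:
                return pos_emb
            p = piecewise_linear(float(self.batch_count), self.schedule)
            # One draw per forward pass: either keep the embedding or zero it out
            # entirely, rather than dropping elements independently.
            if torch.rand(()).item() < p:
                return torch.zeros_like(pos_emb)
            return pos_emb


    # Anneal from an assumed 0.5 down to the final 0.075 skip rate over an
    # assumed 4000 batches.
    drop = PosEmbDropout(schedule=[(0, 0.5), (4000, 0.075)])
    drop.train()
    drop.batch_count = 2000
    pos_emb = torch.randn(1, 61, 256)  # (batch, 2*T-1, dim); shape assumed
    print(drop(pos_emb).shape)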