56 Commits

SHA1        Message  (Author, Date)
2eef001d39  Fix balancer code  (Daniel Povey, 2022-03-21 23:59:26 +08:00)
87c92efbfe  Changes from upstream/master  (Daniel Povey, 2022-03-16 21:49:15 +08:00)
e838c192ef  Cosmetic changes/renaming things  (Daniel Povey, 2022-03-16 19:27:45 +08:00)
dfc75752c4  Remove some dead code.  (Daniel Povey, 2022-03-16 18:06:01 +08:00)
c82db4184a  Remove xscale from pos_embedding  (Daniel Povey, 2022-03-16 15:50:11 +08:00)
0e9cad3f1f  Modifying initialization from normal->uniform; add initial_scale when initializing  (Daniel Povey, 2022-03-16 14:42:53 +08:00)
00be56c7a0  Remove dead code  (Daniel Povey, 2022-03-16 12:49:00 +08:00)
633213424d  Rework of initialization  (Daniel Povey, 2022-03-16 12:42:59 +08:00)
261d7602a7  Draft of 0mean changes..  (Daniel Povey, 2022-03-15 23:46:53 +08:00)
fc873cc50d  Make epsilon in BasicNorm learnable, optionally.  (Daniel Povey, 2022-03-15 17:00:17 +08:00)
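The commit above makes the epsilon in BasicNorm optionally learnable. A minimal sketch of the idea (not the repository's exact code; the class shape and defaults here are illustrative) is to store log(eps) as a parameter, so the learned epsilon always stays positive:

```python
import torch
import torch.nn as nn


class BasicNorm(nn.Module):
    """Normalize by the RMS over the channel dim plus an epsilon.

    When learn_eps is True, log(eps) is a trainable parameter, which
    keeps the effective epsilon positive during training.
    """

    def __init__(self, num_channels: int, eps: float = 0.25, learn_eps: bool = True):
        super().__init__()
        self.num_channels = num_channels  # kept for API parity; unused in this sketch
        log_eps = torch.tensor(float(eps)).log().detach()
        if learn_eps:
            self.eps = nn.Parameter(log_eps)
        else:
            self.register_buffer("eps", log_eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-position scale: 1 / sqrt(mean(x^2) + eps), over the last dim.
        scales = (torch.mean(x ** 2, dim=-1, keepdim=True) + self.eps.exp()) ** -0.5
        return x * scales
```

Unlike LayerNorm there is no mean subtraction and no per-channel affine here; the epsilon is the only (optional) parameter.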
86e5dcba11  Remove max-positive constraint in deriv-balancing; add second DerivBalancer in conv module.  (Daniel Povey, 2022-03-15 13:10:35 +08:00)
a23010fc10  Add warmup mode  (Daniel Povey, 2022-03-14 23:04:51 +08:00)
8d17a05dd2  Reduce constraints from deriv-balancer in ConvModule.  (Daniel Povey, 2022-03-14 19:23:33 +08:00)
437e8b2083  Reduce max-abs limit from 1000 to 100; introduce 2 DerivBalancer modules in conv layer.  (Daniel Povey, 2022-03-13 23:31:08 +08:00)
f351777e9c  Remove ExpScale in feedforward layers.  (Daniel Povey, 2022-03-13 17:29:39 +08:00)
5d69acb25b  Add max-abs-value  (Daniel Povey, 2022-03-13 13:15:20 +08:00)
db7a3b6eea  Reduce initial_scale.  (Daniel Povey, 2022-03-12 18:50:02 +08:00)
b7b2d8970b  Cosmetic change  (Daniel Povey, 2022-03-12 17:47:35 +08:00)
a392cb9fbc  Reduce initial scaling of modules  (Daniel Povey, 2022-03-12 16:53:03 +08:00)
ca8cf2a73b  Another rework, use scales on linear/conv  (Daniel Povey, 2022-03-12 15:38:13 +08:00)
0abba9e7a2  Fix self.post-scale-mha  (Daniel Povey, 2022-03-12 11:20:44 +08:00)
76a2b9d362  Add learnable post-scale for mha  (Daniel Povey, 2022-03-12 11:19:49 +08:00)
7eb5a84cbe  Add identity pre_norm_final for diagnostics.  (Daniel Povey, 2022-03-11 21:00:43 +08:00)
cc558faf26  Fix scale from 0.5 to 2.0 as I really intended..  (Daniel Povey, 2022-03-11 19:11:50 +08:00)
98156711ef  Introduce in_scale=0.5 for SwishExpScale  (Daniel Povey, 2022-03-11 19:07:34 +08:00)
5eafccb369  Change how scales are applied; fix residual bug  (Daniel Povey, 2022-03-11 17:46:33 +08:00)
bcf417fce2  Change max_factor in DerivBalancer from 0.025 to 0.01; fix scaling code.  (Daniel Povey, 2022-03-11 14:47:46 +08:00)
2940d3106f  Fix q*scaling logic  (Daniel Povey, 2022-03-11 14:44:13 +08:00)
ab9a17413a  Scale up pos_bias_u and pos_bias_v before use.  (Daniel Povey, 2022-03-11 14:37:52 +08:00)
2fa9c636a4  Use nonzero threshold in DerivBalancer  (Daniel Povey, 2022-03-10 23:24:55 +08:00)
425e274c82  Replace norm in ConvolutionModule with a scaling factor.  (Daniel Povey, 2022-03-10 16:01:53 +08:00)
b55472bb42  Replace most normalizations with scales (still have norm in conv)  (Daniel Povey, 2022-03-10 14:43:54 +08:00)
a37d98463a  Restore ConvolutionModule to state before changes; change all Swish, Swish(Swish) to SwishOffset.  (Daniel Povey, 2022-03-06 11:55:02 +08:00)
8a8b81cd18  Replace relu with swish-squared.  (Daniel Povey, 2022-03-05 22:21:42 +08:00)
1603744469  Refactor conformer. (#237)  (Fangjun Kuang, 2022-03-05 19:26:06 +08:00)
5f2c0a09b7  Convert swish nonlinearities to ReLU  (Daniel Povey, 2022-03-05 16:28:24 +08:00)
65b09dd5f2  Double the threshold in brelu; slightly increase max_factor.  (Daniel Povey, 2022-03-05 00:07:14 +08:00)
6252282fd0  Add deriv-balancing code  (Daniel Povey, 2022-03-04 20:19:11 +08:00)
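The deriv-balancing commits above concern an operator that is the identity in the forward pass but modifies gradients in the backward pass, nudging channels whose activations are almost never positive so they do not die. A simplified sketch of that idea (the threshold, max_factor, and the exact gradient rule here are illustrative, not the repository's actual DerivBalancer):

```python
import torch


class DerivBalancerFunction(torch.autograd.Function):
    """Identity in forward. In backward, for channels where the proportion
    of positive activations is below `threshold`, shrink the gradient by
    up to `max_factor` * |grad| so gradient descent pushes those channels up.
    """

    @staticmethod
    def forward(ctx, x, channel_dim=-1, threshold=0.05, max_factor=0.01):
        ctx.save_for_backward(x)
        ctx.channel_dim = channel_dim % x.ndim
        ctx.threshold = threshold
        ctx.max_factor = max_factor
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Average over every dim except the channel dim.
        dims = [d for d in range(x.ndim) if d != ctx.channel_dim]
        proportion_pos = (x > 0).to(grad_out.dtype).mean(dim=dims, keepdim=True)
        below = (proportion_pos < ctx.threshold).to(grad_out.dtype)
        # Subtracting a fraction of |grad| biases the update toward larger x
        # in the starved channels; well-behaved channels are untouched.
        grad_x = grad_out - ctx.max_factor * grad_out.abs() * below
        return grad_x, None, None, None
```

Because the forward pass is unchanged, inference is unaffected; the balancing acts purely through training dynamics.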
eb3ed54202  Reduce scale from 50 to 20  (Daniel Povey, 2022-03-04 15:56:45 +08:00)
9cc5999829  Fix duplicate Swish; replace norm+swish with swish+exp-scale in convolution module  (Daniel Povey, 2022-03-04 15:50:51 +08:00)
7e88999641  Increase scale from 20 to 50.  (Daniel Povey, 2022-03-04 14:31:29 +08:00)
3207bd98a9  Increase scale on Scale from 4 to 20  (Daniel Povey, 2022-03-04 13:16:40 +08:00)
cd216f50b6  Add import  (Daniel Povey, 2022-03-04 11:03:01 +08:00)
bc6c720e25  Combine ExpScale and swish for memory reduction  (Daniel Povey, 2022-03-04 10:52:05 +08:00)
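Combining ExpScale with Swish saves memory because the pair can be written as a single autograd function that stores only its inputs and recomputes the sigmoid during backward, instead of keeping the intermediate Swish output alive between a chain of modules. A hedged sketch of that technique (assuming a scalar log-scale parameter; this is the recomputation idea, not the repository's actual implementation):

```python
import torch


class SwishExpScaleFunction(torch.autograd.Function):
    """Fused y = swish(x) * exp(log_scale), where swish(x) = x * sigmoid(x).

    Only x and log_scale are saved for backward; the sigmoid and the
    swish value are recomputed there, trading a little compute for memory.
    """

    @staticmethod
    def forward(ctx, x, log_scale):
        ctx.save_for_backward(x, log_scale)
        return x * torch.sigmoid(x) * log_scale.exp()

    @staticmethod
    def backward(ctx, grad_out):
        x, log_scale = ctx.saved_tensors
        s = torch.sigmoid(x)  # recomputed rather than stored
        scale = log_scale.exp()
        swish = x * s
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        grad_x = grad_out * scale * s * (1.0 + x * (1.0 - s))
        # d/d(log_scale) [swish * exp(log_scale)] = swish * exp(log_scale)
        grad_log_scale = (grad_out * swish * scale).sum()  # scalar log_scale assumed
        return grad_x, grad_log_scale
```

An unfused `ExpScale(Swish(x))` would keep the Swish output tensor alive for backward; the fused version avoids that extra activation-sized buffer.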
23b3aa233c  Double learning rate of exp-scale units  (Daniel Povey, 2022-03-04 00:42:37 +08:00)
5c177fc52b  pelu_base->expscale, add 2xExpScale in subsampling, and in feedforward units.  (Daniel Povey, 2022-03-03 23:52:03 +08:00)
3fb559d2f0  Add baseline for the PeLU expt, keeping only the small normalization-related changes.  (Daniel Povey, 2022-03-02 18:27:08 +08:00)
9d1b4ae046  Add pelu to this good-performing setup..  (Daniel Povey, 2022-03-02 16:33:27 +08:00)
c1063def95  First version of rand-combine iterated-training-like idea.  (Daniel Povey, 2022-02-27 17:34:58 +08:00)
63d8d935d4  Refactor/simplify ConformerEncoder  (Daniel Povey, 2022-02-27 13:56:15 +08:00)