Daniel Povey | c82db4184a | Remove xscale from pos_embedding | 2022-03-16 15:50:11 +08:00
Daniel Povey | 0e9cad3f1f | Modifying initialization from normal->uniform; add initial_scale when initializing | 2022-03-16 14:42:53 +08:00
Daniel Povey | 00be56c7a0 | Remove dead code | 2022-03-16 12:49:00 +08:00
Daniel Povey | a783b96467 | Fix typo | 2022-03-16 12:43:44 +08:00
Daniel Povey | 633213424d | Rework of initialization | 2022-03-16 12:42:59 +08:00
Daniel Povey | 1331199530 | Merge branch 'specaugmod_baseline' into randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp_convderiv2warmup_scale_0mean | 2022-03-15 23:47:03 +08:00
Daniel Povey | 261d7602a7 | Draft of 0mean changes.. | 2022-03-15 23:46:53 +08:00
Daniel Povey | fc873cc50d | Make epsilon in BasicNorm learnable, optionally. | 2022-03-15 17:00:17 +08:00
Daniel Povey | b2abcd721a | Add more stats. | 2022-03-15 16:38:19 +08:00
Daniel Povey | 1962fe298b | Add deriv-balancer at output of embedding. | 2022-03-15 14:35:15 +08:00
Daniel Povey | 2e6d170be8 | Merge branch 'specaugmod_baseline' into randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp_convderiv3warmup_embed | 2022-03-15 14:33:08 +08:00
Daniel Povey | 21ebd356e7 | Add some extra info to diagnostics | 2022-03-15 13:49:15 +08:00
Daniel Povey | 86e5dcba11 | Remove max-positive constraint in deriv-balancing; add second DerivBalancer in conv module. | 2022-03-15 13:10:35 +08:00
Daniel Povey | a23010fc10 | Add warmup mode | 2022-03-14 23:04:51 +08:00
Daniel Povey | 8d17a05dd2 | Reduce constraints from deriv-balancer in ConvModule. | 2022-03-14 19:23:33 +08:00
Daniel Povey | 437e8b2083 | Reduce max-abs limit from 1000 to 100; introduce 2 DerivBalancer modules in conv layer. | 2022-03-13 23:31:08 +08:00
Daniel Povey | f351777e9c | Remove ExpScale in feedforward layers. | 2022-03-13 17:29:39 +08:00
Daniel Povey | 97c0bb82d3 | Change dir name | 2022-03-13 13:19:20 +08:00
Daniel Povey | 5d69acb25b | Add max-abs-value | 2022-03-13 13:15:20 +08:00
Daniel Povey | e6a501d3c8 | Add max-abs-value constraint in DerivBalancer | 2022-03-13 11:52:13 +08:00
Daniel Povey | 6042c96db2 | Use learnable scales for joiner and decoder | 2022-03-12 20:54:46 +08:00
Daniel Povey | db7a3b6eea | Reduce initial_scale. | 2022-03-12 18:50:02 +08:00
Daniel Povey | b7b2d8970b | Cosmetic change | 2022-03-12 17:47:35 +08:00
Daniel Povey | a392cb9fbc | Reduce initial scaling of modules | 2022-03-12 16:53:03 +08:00
Daniel Povey | d906bc2a4f | Change dir name | 2022-03-12 15:38:39 +08:00
Daniel Povey | ca8cf2a73b | Another rework, use scales on linear/conv | 2022-03-12 15:38:13 +08:00
Daniel Povey | 0abba9e7a2 | Fix self.post-scale-mha | 2022-03-12 11:20:44 +08:00
Daniel Povey | 76a2b9d362 | Add learnable post-scale for mha | 2022-03-12 11:19:49 +08:00
Daniel Povey | 7eb5a84cbe | Add identity pre_norm_final for diagnostics. | 2022-03-11 21:00:43 +08:00
Daniel Povey | cc558faf26 | Fix scale from 0.5 to 2.0 as I really intended.. | 2022-03-11 19:11:50 +08:00
Daniel Povey | 98156711ef | Introduce in_scale=0.5 for SwishExpScale | 2022-03-11 19:07:34 +08:00
Daniel Povey | 5eafccb369 | Change how scales are applied; fix residual bug | 2022-03-11 17:46:33 +08:00
Daniel Povey | bec33e6855 | init 1st conv module to smaller variance | 2022-03-11 16:37:17 +08:00
Daniel Povey | bcf417fce2 | Change max_factor in DerivBalancer from 0.025 to 0.01; fix scaling code. | 2022-03-11 14:47:46 +08:00
Daniel Povey | 2940d3106f | Fix q*scaling logic | 2022-03-11 14:44:13 +08:00
Daniel Povey | ab9a17413a | Scale up pos_bias_u and pos_bias_v before use. | 2022-03-11 14:37:52 +08:00
Daniel Povey | e3e14cf7a4 | Change min-abs threshold from 0.2 to 0.5 | 2022-03-11 14:16:33 +08:00
Daniel Povey | bfce5f63e4 | Fix dirname | 2022-03-10 23:49:09 +08:00
Daniel Povey | 76560f255c | Add min-abs-value 0.2 | 2022-03-10 23:48:46 +08:00
Daniel Povey | 2fa9c636a4 | use nonzero threshold in DerivBalancer | 2022-03-10 23:24:55 +08:00
Daniel Povey | 425e274c82 | Replace norm in ConvolutionModule with a scaling factor. | 2022-03-10 16:01:53 +08:00
Daniel Povey | 87b843f023 | Change exp dir | 2022-03-10 14:44:55 +08:00
Daniel Povey | b55472bb42 | Replace most normalizations with scales (still have norm in conv) | 2022-03-10 14:43:54 +08:00
Daniel Povey | feb20ca84d | Merge changes to diagnostics | 2022-03-10 10:31:42 +08:00
Daniel Povey | 1e5455ba29 | Update diagnostics | 2022-03-10 10:28:48 +08:00
Daniel Povey | d074cf73c6 | Extensions to diagnostics code | 2022-03-09 20:37:20 +08:00
Daniel Povey | e2ace9d545 | Replace norm on input layer with scale of 0.1. | 2022-03-07 11:24:04 +08:00
Daniel Povey | a37d98463a | Restore ConvolutionModule to state before changes; change all Swish,Swish(Swish) to SwishOffset. | 2022-03-06 11:55:02 +08:00
Daniel Povey | 8a8b81cd18 | Replace relu with swish-squared. | 2022-03-05 22:21:42 +08:00
Daniel Povey | 5f2c0a09b7 | Convert swish nonlinearities to ReLU | 2022-03-05 16:28:24 +08:00