Daniel Povey
|
87c92efbfe
|
Changes from upstream/master
|
2022-03-16 21:49:15 +08:00 |
|
Daniel Povey
|
e838c192ef
|
Cosmetic changes/renaming things
|
2022-03-16 19:27:45 +08:00 |
|
Daniel Povey
|
dfc75752c4
|
Remove some dead code.
|
2022-03-16 18:06:01 +08:00 |
|
Daniel Povey
|
c82db4184a
|
Remove xscale from pos_embedding
|
2022-03-16 15:50:11 +08:00 |
|
Daniel Povey
|
6561743d7b
|
bug fix re sqrt
|
2022-03-16 14:55:17 +08:00 |
|
Daniel Povey
|
0e9cad3f1f
|
Modifying initialization from normal->uniform; add initial_scale when initializing
|
2022-03-16 14:42:53 +08:00 |
|
Daniel Povey
|
633213424d
|
Rework of initialization
|
2022-03-16 12:42:59 +08:00 |
|
Daniel Povey
|
261d7602a7
|
Draft of 0mean changes..
|
2022-03-15 23:46:53 +08:00 |
|
Daniel Povey
|
fc873cc50d
|
Make epsilon in BasicNorm learnable, optionally.
|
2022-03-15 17:00:17 +08:00 |
|
Daniel Povey
|
1962fe298b
|
Add deriv-balancer at output of embedding.
|
2022-03-15 14:35:15 +08:00 |
|
Daniel Povey
|
86e5dcba11
|
Remove max-positive constraint in deriv-balancing; add second DerivBalancer in conv module.
|
2022-03-15 13:10:35 +08:00 |
|
Daniel Povey
|
788963d40a
|
Merge branch 'randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp' into randcombine1_expscale3_rework2c_maxabs1000_maxp0.95_noexp_convderiv
|
2022-03-14 14:37:40 +08:00 |
|
Daniel Povey
|
ae25688253
|
Make DoubleSwish more memory efficient
|
2022-03-14 11:02:32 +08:00 |
|
Daniel Povey
|
437e8b2083
|
Reduce max-abs limit from 1000 to 100; introduce 2 DerivBalancer modules in conv layer.
|
2022-03-13 23:31:08 +08:00 |
|
Daniel Povey
|
f351777e9c
|
Remove ExpScale in feedforward layers.
|
2022-03-13 17:29:39 +08:00 |
|
Daniel Povey
|
5d69acb25b
|
Add max-abs-value
|
2022-03-13 13:15:20 +08:00 |
|
Daniel Povey
|
e6a501d3c8
|
Add max-abs-value constraint in DerivBalancer
|
2022-03-13 11:52:13 +08:00 |
|
Daniel Povey
|
2117f46361
|
DoubleSwish fix
|
2022-03-12 19:02:14 +08:00 |
|
Daniel Povey
|
be0a79cbca
|
Replace ExpScaleRelu with DoubleSwish()
|
2022-03-12 19:00:48 +08:00 |
|
Daniel Povey
|
a24572abd1
|
Bug-fix RE bias
|
2022-03-12 17:28:43 +08:00 |
|
Daniel Povey
|
a392cb9fbc
|
Reduce initial scaling of modules
|
2022-03-12 16:53:03 +08:00 |
|
Daniel Povey
|
ca8cf2a73b
|
Another rework, use scales on linear/conv
|
2022-03-12 15:38:13 +08:00 |
|
Daniel Povey
|
2d3a76292d
|
Set scaling on SwishExpScale
|
2022-03-11 20:12:45 +08:00 |
|
Daniel Povey
|
98156711ef
|
Introduce in_scale=0.5 for SwishExpScale
|
2022-03-11 19:07:34 +08:00 |
|
Daniel Povey
|
a0d5e2932c
|
Reduce min_abs from 0.5 to 0.2
|
2022-03-11 18:17:49 +08:00 |
|
Daniel Povey
|
bec33e6855
|
init 1st conv module to smaller variance
|
2022-03-11 16:37:17 +08:00 |
|
Daniel Povey
|
bcf417fce2
|
Change max_factor in DerivBalancer from 0.025 to 0.01; fix scaling code.
|
2022-03-11 14:47:46 +08:00 |
|
Daniel Povey
|
137eae0b95
|
Reduce max_factor to 0.01
|
2022-03-11 14:42:17 +08:00 |
|
Daniel Povey
|
e3e14cf7a4
|
Change min-abs threshold from 0.2 to 0.5
|
2022-03-11 14:16:33 +08:00 |
|
Daniel Povey
|
76560f255c
|
Add min-abs-value 0.2
|
2022-03-10 23:48:46 +08:00 |
|
Daniel Povey
|
2fa9c636a4
|
use nonzero threshold in DerivBalancer
|
2022-03-10 23:24:55 +08:00 |
|
Daniel Povey
|
b55472bb42
|
Replace most normalizations with scales (still have norm in conv)
|
2022-03-10 14:43:54 +08:00 |
|
Daniel Povey
|
059b57ad37
|
Add BasicNorm module
|
2022-03-10 14:32:05 +08:00 |
|
Daniel Povey
|
e2ace9d545
|
Replace norm on input layer with scale of 0.1.
|
2022-03-07 11:24:04 +08:00 |
|
Daniel Povey
|
a37d98463a
|
Restore ConvolutionModule to state before changes; change all Swish,Swish(Swish) to SwishOffset.
|
2022-03-06 11:55:02 +08:00 |
|
Daniel Povey
|
8a8b81cd18
|
Replace relu with swish-squared.
|
2022-03-05 22:21:42 +08:00 |
|
Daniel Povey
|
5f2c0a09b7
|
Convert swish nonlinearities to ReLU
|
2022-03-05 16:28:24 +08:00 |
|
Daniel Povey
|
65b09dd5f2
|
Double the threshold in brelu; slightly increase max_factor.
|
2022-03-05 00:07:14 +08:00 |
|
Daniel Povey
|
6252282fd0
|
Add deriv-balancing code
|
2022-03-04 20:19:11 +08:00 |
|
Daniel Povey
|
eb3ed54202
|
Reduce scale from 50 to 20
|
2022-03-04 15:56:45 +08:00 |
|
Daniel Povey
|
7e88999641
|
Increase scale from 20 to 50.
|
2022-03-04 14:31:29 +08:00 |
|
Daniel Povey
|
3207bd98a9
|
Increase scale on Scale from 4 to 20
|
2022-03-04 13:16:40 +08:00 |
|
Daniel Povey
|
3d9ddc2016
|
Fix backprop bug
|
2022-03-04 12:29:44 +08:00 |
|
Daniel Povey
|
bc6c720e25
|
Combine ExpScale and swish for memory reduction
|
2022-03-04 10:52:05 +08:00 |
|
Daniel Povey
|
23b3aa233c
|
Double learning rate of exp-scale units
|
2022-03-04 00:42:37 +08:00 |
|
Daniel Povey
|
5c177fc52b
|
pelu_base->expscale, add 2xExpScale in subsampling, and in feedforward units.
|
2022-03-03 23:52:03 +08:00 |
|
Daniel Povey
|
3fb559d2f0
|
Add baseline for the PeLU expt, keeping only the small normalization-related changes.
|
2022-03-02 18:27:08 +08:00 |
|
Daniel Povey
|
9ed7d55a84
|
Small bug fixes/imports
|
2022-03-02 16:34:55 +08:00 |
|
Daniel Povey
|
9d1b4ae046
|
Add pelu to this good-performing setup..
|
2022-03-02 16:33:27 +08:00 |
|
Fangjun Kuang
|
1c35ae1dba
|
Reset seed at the beginning of each epoch. (#221)
* Reset seed at the beginning of each epoch.
* Use a different seed for each epoch.
|
2022-02-21 15:16:39 +08:00 |
|