Daniel Povey
eec597fdd5
Merge changes from master
2022-04-02 18:45:20 +08:00
Fangjun Kuang
9a11808ed3
Set the seed for dataloader. (#282)
Also, suppress torch warnings about division by truncation.
2022-03-31 16:48:46 +08:00
Daniel Povey
87c92efbfe
Changes from upstream/master
2022-03-16 21:49:15 +08:00
Daniel Povey
a783b96467
Fix typo
2022-03-16 12:43:44 +08:00
Daniel Povey
633213424d
Rework of initialization
2022-03-16 12:42:59 +08:00
Daniel Povey
261d7602a7
Draft of 0mean changes..
2022-03-15 23:46:53 +08:00
Daniel Povey
fc873cc50d
Make epsilon in BasicNorm learnable, optionally.
2022-03-15 17:00:17 +08:00
Daniel Povey
1962fe298b
Add deriv-balancer at output of embedding.
2022-03-15 14:35:15 +08:00
Daniel Povey
86e5dcba11
Remove max-positive constraint in deriv-balancing; add second DerivBalancer in conv module.
2022-03-15 13:10:35 +08:00
Daniel Povey
a23010fc10
Add warmup mode
2022-03-14 23:04:51 +08:00
Daniel Povey
8d17a05dd2
Reduce constraints from deriv-balancer in ConvModule.
2022-03-14 19:23:33 +08:00
Daniel Povey
437e8b2083
Reduce max-abs limit from 1000 to 100; introduce 2 DerivBalancer modules in conv layer.
2022-03-13 23:31:08 +08:00
Daniel Povey
f351777e9c
Remove ExpScale in feedforward layers.
2022-03-13 17:29:39 +08:00
Daniel Povey
97c0bb82d3
Change dir name
2022-03-13 13:19:20 +08:00
Daniel Povey
e6a501d3c8
Add max-abs-value constraint in DerivBalancer
2022-03-13 11:52:13 +08:00
Daniel Povey
6042c96db2
Use learnable scales for joiner and decoder
2022-03-12 20:54:46 +08:00
Daniel Povey
a392cb9fbc
Reduce initial scaling of modules
2022-03-12 16:53:03 +08:00
Daniel Povey
d906bc2a4f
Change dir name
2022-03-12 15:38:39 +08:00
Daniel Povey
76a2b9d362
Add learnable post-scale for mha
2022-03-12 11:19:49 +08:00
Daniel Povey
cc558faf26
Fix scale from 0.5 to 2.0 as I really intended..
2022-03-11 19:11:50 +08:00
Daniel Povey
98156711ef
Introduce in_scale=0.5 for SwishExpScale
2022-03-11 19:07:34 +08:00
Daniel Povey
5eafccb369
Change how scales are applied; fix residual bug
2022-03-11 17:46:33 +08:00
Daniel Povey
bec33e6855
init 1st conv module to smaller variance
2022-03-11 16:37:17 +08:00
Daniel Povey
ab9a17413a
Scale up pos_bias_u and pos_bias_v before use.
2022-03-11 14:37:52 +08:00
Daniel Povey
e3e14cf7a4
Change min-abs threshold from 0.2 to 0.5
2022-03-11 14:16:33 +08:00
Daniel Povey
bfce5f63e4
Fix dirname
2022-03-10 23:49:09 +08:00
Daniel Povey
76560f255c
Add min-abs-value 0.2
2022-03-10 23:48:46 +08:00
Daniel Povey
2fa9c636a4
use nonzero threshold in DerivBalancer
2022-03-10 23:24:55 +08:00
Daniel Povey
425e274c82
Replace norm in ConvolutionModule with a scaling factor.
2022-03-10 16:01:53 +08:00
Daniel Povey
87b843f023
Change exp dir
2022-03-10 14:44:55 +08:00
Daniel Povey
e2ace9d545
Replace norm on input layer with scale of 0.1.
2022-03-07 11:24:04 +08:00
Daniel Povey
a37d98463a
Restore ConvolutionModule to state before changes; change all Swish, Swish(Swish) to SwishOffset.
2022-03-06 11:55:02 +08:00
Daniel Povey
8a8b81cd18
Replace relu with swish-squared.
2022-03-05 22:21:42 +08:00
Daniel Povey
5f2c0a09b7
Convert swish nonlinearities to ReLU
2022-03-05 16:28:24 +08:00
Daniel Povey
0cd14ae739
Fix exp dir
2022-03-05 12:17:09 +08:00
Daniel Povey
6252282fd0
Add deriv-balancing code
2022-03-04 20:19:11 +08:00
Daniel Povey
9cc5999829
Fix duplicate Swish; replace norm+swish with swish+exp-scale in convolution module
2022-03-04 15:50:51 +08:00
yaozengwei
ad62981765
Add diagnostics (#230)
* Adding diagnostics code...
* Move diagnostics code from local dir to the shared icefall dir
* Remove the diagnostics code in the local dir
* Update docs of arguments, and remove stats_types() function in TensorDiagnosticOptions object.
* Update docs of arguments.
* Add copyright information.
* Corrected the time in copyright information.
Co-authored-by: Daniel Povey <dpovey@gmail.com>
2022-03-04 15:38:23 +08:00
Daniel Povey
7e88999641
Increase scale from 20 to 50.
2022-03-04 14:31:29 +08:00
Daniel Povey
3207bd98a9
Increase scale on Scale from 4 to 20
2022-03-04 13:16:40 +08:00
Daniel Povey
23b3aa233c
Double learning rate of exp-scale units
2022-03-04 00:42:37 +08:00
Daniel Povey
5c177fc52b
pelu_base->expscale, add 2xExpScale in subsampling, and in feedforward units.
2022-03-03 23:52:03 +08:00
Daniel Povey
3fb559d2f0
Add baseline for the PeLU expt, keeping only the small normalization-related changes.
2022-03-02 18:27:08 +08:00
Daniel Povey
9d1b4ae046
Add pelu to this good-performing setup..
2022-03-02 16:33:27 +08:00
Daniel Povey
c1063def95
First version of rand-combine iterated-training-like idea.
2022-02-27 17:34:58 +08:00
Daniel Povey
581786a6d3
Adding diagnostics code...
2022-02-27 13:44:43 +08:00
Fangjun Kuang
1c35ae1dba
Reset seed at the beginning of each epoch. (#221)
* Reset seed at the beginning of each epoch.
* Use a different seed for each epoch.
2022-02-21 15:16:39 +08:00
Daniel Povey
2af1b3af98
Remove ReLU in attention
2022-02-14 19:39:19 +08:00
Daniel Povey
d187ad8b73
Change max_frames from 0.2 to 0.15
2022-02-11 16:24:17 +08:00
Daniel Povey
4cd2c02fff
Fix num_time_masks code; revert 0.8 to 0.9
2022-02-10 15:53:11 +08:00