Daniel Povey
|
51d8e64d66
|
Reduce min_abs on NonlinAttentionModule balancer2 from 0.5 to the default.
|
2022-12-06 15:32:56 +08:00 |
|
Daniel Povey
|
8f841e5b2b
|
Add another balancer for NonlinAttentionModule.
|
2022-12-06 11:11:28 +08:00 |
|
Daniel Povey
|
63e881f89b
|
Pass in dropout from train.py
|
2022-12-05 23:49:40 +08:00 |
|
Daniel Povey
|
22617da725
|
Make dropout a schedule starting at 0.3.
|
2022-12-05 23:39:24 +08:00 |
|
Daniel Povey
|
0da228c587
|
Restore the computation of valid stats.
|
2022-12-05 19:50:25 +08:00 |
|
Daniel Povey
|
178eca1c0e
|
Revert scaling, scale only grad.
|
2022-12-05 17:53:23 +08:00 |
|
Daniel Povey
|
b93cf0676a
|
Initialize Conv2dSubsampling with scale.
|
2022-12-05 17:31:56 +08:00 |
|
Daniel Povey
|
7999dd0dbe
|
Introduce scalar multiplication and change rules for updating gradient scale.
|
2022-12-05 16:15:20 +08:00 |
|
Daniel Povey
|
12fb2081b1
|
Fix deriv code
|
2022-12-04 21:22:06 +08:00 |
|
Daniel Povey
|
c57eaf7979
|
Change x coeff from -0.1 to -0.08, as in 608.
|
2022-12-04 21:15:49 +08:00 |
|
Daniel Povey
|
7b1f093077
|
Use Swoosh-R in the Conv and Swoosh-L in the feedforward.
|
2022-12-04 19:18:16 +08:00 |
|
Daniel Povey
|
d214e1c352
|
Change min_abs, max_abs from 2, 10 to 1, 5 for FF module
|
2022-12-04 16:15:38 +08:00 |
|
Daniel Povey
|
67812276ed
|
Change Swoosh formula so left crossing is near zero; change min_positive, max_positive of ActivationBalancer.
|
2022-12-03 15:10:03 +08:00 |
|
Daniel Povey
|
d5bfca4f49
|
Change activation of ConvolutionModule from Swoosh to Tanh, and min_abs from 1.0 to 0.75.
|
2022-12-03 14:50:09 +08:00 |
|
Daniel Povey
|
074f94f256
|
Increase ConvolutionModule min_abs from 0.4 to 1.0
|
2022-12-03 14:47:36 +08:00 |
|
Daniel Povey
|
306fd85bab
|
Reduce initial whitening-schedule limit from 7.5 to 5.0 for NonlinAttentionModule.
|
2022-12-03 14:40:39 +08:00 |
|
Daniel Povey
|
b8e3091e04
|
Increase scale_gain_factor to 0.04.
|
2022-12-03 00:48:19 +08:00 |
|
Daniel Povey
|
183fc7a76d
|
Fix to diagnostics.py
|
2022-12-03 00:22:30 +08:00 |
|
Daniel Povey
|
3faa4ffa3f
|
Revert "Change whitening limit of NonlinAttentionModule from _whitening_schedule(7.5), to 5.0"
This reverts commit dcf6fced40a831892e7bb4e52eddf26e6e070eef.
|
2022-12-03 00:19:48 +08:00 |
|
Daniel Povey
|
bd1b1dd7e3
|
Simplify formula for Swoosh and make it pass through 0; make max_abs of ConvolutionModule a constant.
|
2022-12-03 00:13:09 +08:00 |
|
Daniel Povey
|
862e5828c5
|
Set min_abs in balancer2 of ConvolutionModule to 0.4, and max_abs to 10.0 (this is a dontcare, really).
|
2022-12-02 21:02:06 +08:00 |
|
Daniel Povey
|
509467988f
|
Reduce min_abs in balancer2 of conv_module from 1.0 to 0.2.
|
2022-12-02 20:29:23 +08:00 |
|
Daniel Povey
|
84f51ab1b1
|
Bug fix in scripting mode
|
2022-12-02 20:28:17 +08:00 |
|
Daniel Povey
|
9a2a58e20d
|
Fix bug one versus zero
|
2022-12-02 19:12:18 +08:00 |
|
Daniel Povey
|
2bfc38207c
|
Fix constants in SwooshFunction.
|
2022-12-02 18:37:23 +08:00 |
|
Daniel Povey
|
14267a5194
|
Use Swoosh not DoubleSwish in zipformer; fix constants in Swoosh
|
2022-12-02 16:58:31 +08:00 |
|
Daniel Povey
|
ec10573edc
|
First version of swoosh
|
2022-12-02 16:34:53 +08:00 |
|
Daniel Povey
|
d260b54177
|
Subtract, not add, 0.025.
|
2022-12-02 15:55:48 +08:00 |
|
Daniel Povey
|
9a71406a46
|
Reduce offset from 0.075 to 0.025.
|
2022-12-02 15:40:21 +08:00 |
|
Daniel Povey
|
c71a3c6098
|
Change offset
|
2022-12-02 15:20:37 +08:00 |
|
Daniel Povey
|
f0f204552d
|
Add -0.05 to DoubleSwish.
|
2022-12-02 15:17:41 +08:00 |
|
Daniel Povey
|
4afd95d822
|
Merge branch 'scaled_adam_exp583' into scaled_adam_exp592
|
2022-12-02 15:14:11 +08:00 |
|
Daniel Povey
|
85bd9859e9
|
Use different heads for nonlin/squeeze on alternate layers
|
2022-12-02 13:17:45 +08:00 |
|
Daniel Povey
|
d8185201e9
|
Merge branch 'scaled_adam_exp569' into scaled_adam_exp585
|
2022-12-01 19:14:26 +08:00 |
|
Daniel Povey
|
4f9bb332fb
|
Use fewer hidden channels in NonlinAttentionModule
|
2022-12-01 19:14:12 +08:00 |
|
Daniel Povey
|
983a690c63
|
Change DoubleSwish formulation, add alpha*x only for x.abs() > 0.15.
|
2022-12-01 17:20:56 +08:00 |
|
Daniel Povey
|
a9798b3b75
|
Merge branch 'scaled_adam_exp564' into scaled_adam_exp577
|
2022-12-01 16:26:21 +08:00 |
|
Daniel Povey
|
12fe963ceb
|
Merge branch 'scaled_adam_exp564' into scaled_adam_exp574
|
2022-12-01 16:26:15 +08:00 |
|
Daniel Povey
|
8976e1e43b
|
Merge branch 'scaled_adam_exp564' into scaled_adam_exp573
|
2022-12-01 16:26:12 +08:00 |
|
Daniel Povey
|
691633b049
|
Merge branch 'scaled_adam_exp564' into scaled_adam_exp569
|
2022-12-01 16:25:53 +08:00 |
|
Daniel Povey
|
2102038e0e
|
Fix bug in diagnostics.py
|
2022-12-01 16:23:50 +08:00 |
|
Daniel Povey
|
d294449221
|
Fix typo 0.21->0.2
|
2022-12-01 15:29:46 +08:00 |
|
Daniel Povey
|
f0c46ce564
|
Double nonlin_skip_rate and have it last twice as long.
|
2022-12-01 15:28:44 +08:00 |
|
Daniel Povey
|
cac1a8b860
|
Merge branch 'scaled_adam_exp569' into scaled_adam_exp576
|
2022-12-01 15:20:20 +08:00 |
|
Daniel Povey
|
4621e924ba
|
Introduce dropout schedule for NonlinAttentionModule
|
2022-12-01 15:19:51 +08:00 |
|
Daniel Povey
|
dcf6fced40
|
Change whitening limit of NonlinAttentionModule from _whitening_schedule(7.5), to 5.0
|
2022-12-01 14:28:56 +08:00 |
|
Daniel Povey
|
025bcc155d
|
Change scale_min of Conv2dSubsampling from .01 to .1; some cosmetic changes/unimportant bugfixes.
|
2022-12-01 14:20:15 +08:00 |
|
Daniel Povey
|
ba31272c92
|
Change sigmoid to tanh in NonlinAttentionModule, and adjust abs limits of balancer to compensate.
|
2022-11-30 21:44:45 +08:00 |
|
Daniel Povey
|
d682ecc246
|
Introduce alpha for DoubleSwish, set it to -0.05.
|
2022-11-30 18:58:25 +08:00 |
|
Daniel Povey
|
2969eb5467
|
Fix diagnostics bug
|
2022-11-30 16:52:21 +08:00 |
|