119 Commits

Author SHA1 Message Date
Daniel Povey
56ac7354df Remove LinearWithAuxLoss; simplify schedule of prob in ActivationBalancer. 2022-12-16 15:07:42 +08:00
Daniel Povey
2d0fe7637c Memory fix in WithLoss 2022-12-11 17:20:26 +08:00
Daniel Povey
0fc646f281 Merge branch 'scaled_adam_exp663' into scaled_adam_exp665 2022-12-10 00:07:37 +08:00
Daniel Povey
d35eb7a3a6 Add cosmetic/diagnostics changes from scaled_adam_exp656. 2022-12-09 22:02:42 +08:00
Daniel Povey
5c0957d950 Fix memory issue in ActivationBalancer 2022-12-09 18:11:27 +08:00
Daniel Povey
2ef0228db0 Make the ActivationBalancer relative to the mean, limited to -min_abs..max_abs 2022-12-09 17:59:00 +08:00
Daniel Povey
3f82ee0783 Merge dropout schedule, 0.3 ... 0.1 over 20k batches 2022-12-08 18:18:46 +08:00
Daniel Povey
22617da725 Make dropout a schedule starting at 0.3. 2022-12-05 23:39:24 +08:00
Daniel Povey
178eca1c0e Revert scaling, scale only grad. 2022-12-05 17:53:23 +08:00
Daniel Povey
b93cf0676a Initialize Conv2dSubsampling with scale. 2022-12-05 17:31:56 +08:00
Daniel Povey
12fb2081b1 Fix deriv code 2022-12-04 21:22:06 +08:00
Daniel Povey
c57eaf7979 Change x coeff from -0.1 to -0.08, as in 608. 2022-12-04 21:15:49 +08:00
Daniel Povey
7b1f093077 Use Swoosh-R in the Conv and Swoosh-L in the feedforward. 2022-12-04 19:18:16 +08:00
Daniel Povey
67812276ed Change Swoosh formula so left crossing is near zero; change min_positive, max_positive of ActivationBalancer. 2022-12-03 15:10:03 +08:00
Daniel Povey
b8e3091e04 Increase scale_gain_factor to 0.04. 2022-12-03 00:48:19 +08:00
Daniel Povey
bd1b1dd7e3 Simplify formula for Swoosh and make it pass through 0; make max_abs of ConvolutionModule a constant. 2022-12-03 00:13:09 +08:00
Daniel Povey
84f51ab1b1 Bug fix in scripting mode 2022-12-02 20:28:17 +08:00
Daniel Povey
9a2a58e20d Fix bug one versus zero 2022-12-02 19:12:18 +08:00
Daniel Povey
2bfc38207c Fix constants in SwooshFunction. 2022-12-02 18:37:23 +08:00
Daniel Povey
14267a5194 Use Swoosh not DoubleSwish in zipformer; fix constants in Swoosh 2022-12-02 16:58:31 +08:00
Daniel Povey
ec10573edc First version of swoosh 2022-12-02 16:34:53 +08:00
Daniel Povey
d260b54177 Subtract, not add, 0.025. 2022-12-02 15:55:48 +08:00
Daniel Povey
9a71406a46 Reduce offset from 0.075 to 0.025. 2022-12-02 15:40:21 +08:00
Daniel Povey
c71a3c6098 Change offset 2022-12-02 15:20:37 +08:00
Daniel Povey
f0f204552d Add -0.05 to DoubleSwish. 2022-12-02 15:17:41 +08:00
Daniel Povey
983a690c63 Change DoubleSwish formulation, add alpha*x only for x.abs() > 0.15. 2022-12-01 17:20:56 +08:00
Daniel Povey
d682ecc246 Introduce alpha for DoubleSwish, set it to -0.05. 2022-11-30 18:58:25 +08:00
Daniel Povey
0bfd81d721 fix bug RE dims_to_mean 2022-11-28 10:42:06 +08:00
Daniel Povey
109825cafb Fix problem with mean offset in LinearWithAuxLoss. 2022-11-28 09:46:01 +08:00
Daniel Povey
9e7add6be8 Work out alpha (scale on z) in LinearWithAuxLossFunction 2022-11-27 23:48:26 +08:00
Daniel Povey
a610011c3c Partially revert sign_gain_factor 2022-11-27 17:18:33 +08:00
Daniel Povey
30d0bc6ad7 Make gain factor 4 times larger, for constraining the sign in ActivationBalancer. 2022-11-27 17:17:11 +08:00
Daniel Povey
ff361a7495 Change default prob on limit_param_value from 0.2 to 0.6. 2022-11-27 14:00:59 +08:00
Daniel Povey
a96b92fb54 Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, setup more like ConvModule now. 2022-11-26 19:38:29 +08:00
Daniel Povey
110c2601ab Changes for speed 2022-11-26 14:38:16 +08:00
Daniel Povey
c653c66413 Undo cast in autocast mode. 2022-11-26 14:29:49 +08:00
Daniel Povey
d1ee1f2d98 Try to save memory in autocast mode. 2022-11-26 14:25:27 +08:00
Daniel Povey
7b5c0382f9 Fix to LinearWithAuxLoss for bias=False case 2022-11-26 14:16:53 +08:00
Daniel Povey
1ebc3dd158 Bug fixes to LinearWithAuxLoss 2022-11-25 16:20:28 +08:00
Daniel Povey
0a997d64c4 Fixes for half precision 2022-11-25 16:07:47 +08:00
Daniel Povey
6a91f343e9 Use LinearWithAuxLoss in squeeze-attention module 2022-11-25 16:04:51 +08:00
Daniel Povey
ee61ec63b3 Introduce schedules for whitening. 2022-11-23 19:49:34 +08:00
Daniel Povey
1d0252d420 Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of
these may be counted in the last

  numbers XXX below correspond to branches numbered scaled_adam_expXXX.
    - from 412/413 (cherry-picked): dropout for attention in the attention_squeeze and nonlin_attention modules,
      but simplified a little to use the same dropout schedule and drop them all out together;
      also have all 3 submodules use separate heads.
    - from 460->461, which is in the history of 464: revert the part about balancing the output of the attention_squeeze module.
    - merge from 462->467, about using TanSwish, not tanh.
    - merge 462->465: remove whitening in the self-attention module.
    - merge the part of 465->466 that was about diagnostics (name in Whiten module).
2022-11-23 14:41:09 +08:00
Daniel Povey
066f1e4658 Implement TanSwish(), use it as activation in AttentionSqueeze module. 2022-11-22 23:34:11 +08:00
Daniel Povey
6c5763fbb3 Implement subtracted momentum [0.33,0.66], and print name in Whiten module. 2022-11-22 21:57:48 +08:00
Daniel Povey
4e21db07f6 Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers. 2022-11-19 22:05:10 +08:00
Daniel Povey
6ea1706e11 Fix potential/theoretical issue in backward of LimitParamValue 2022-11-14 23:31:00 +08:00
Daniel Povey
ff6431ed0f Implement limits on parameter values a different way. 2022-11-14 16:02:38 +08:00
Daniel Povey
54048009db Fix self.training condition 2022-11-14 15:15:24 +08:00
Daniel Povey
e1fb25262a Refactor the scheduling code a little 2022-11-14 14:52:27 +08:00
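
Several commits above refer to scheduled values, for example "Merge dropout schedule, 0.3 ... 0.1 over 20k batches". The log does not say how the value moves between the two endpoints, so the following is only a minimal sketch that assumes linear interpolation on the batch count. The function name `dropout_schedule` and its parameters are hypothetical and are not taken from the repository.

```python
def dropout_schedule(batch_count: float,
                     start: float = 0.3,
                     end: float = 0.1,
                     num_batches: float = 20000.0) -> float:
    """Return a dropout probability for the given training batch count.

    Assumption (not confirmed by the commit log): the value is
    interpolated linearly from `start` at batch 0 to `end` at
    `num_batches`, and held constant at `end` afterwards.
    """
    if batch_count <= 0:
        return start
    if batch_count >= num_batches:
        return end
    frac = batch_count / num_batches  # fraction of the schedule completed
    return start + frac * (end - start)


if __name__ == "__main__":
    # Print the scheduled value at a few points in training.
    for step in (0, 5000, 10000, 20000, 40000):
        print(step, round(dropout_schedule(step), 4))
```

A training loop would call such a function once per batch and assign the result to the dropout modules; how the actual code wires this in is not shown in the log.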