Daniel Povey
2bfc38207c
Fix constants in SwooshFunction.
2022-12-02 18:37:23 +08:00
Daniel Povey
14267a5194
Use Swoosh, not DoubleSwish, in zipformer; fix constants in Swoosh
2022-12-02 16:58:31 +08:00
Daniel Povey
ec10573edc
First version of swoosh
2022-12-02 16:34:53 +08:00
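The three Swoosh commits above introduce the activation and then tune its constants, which were still in flux at this point. Below is a minimal sketch of the general shape, swoosh(x) = softplus(x - bias) - slope*x - offset; the constants are placeholders, not values taken from these commits:

    import torch

    def swoosh(x: torch.Tensor,
               bias: float = 1.0,           # placeholder constants; the commits
               slope: float = 0.08,         # above were still adjusting them
               offset: float = 0.313261687) -> torch.Tensor:
        # a shifted softplus minus a small linear term, offset so the
        # function passes (roughly) through the origin
        return torch.nn.functional.softplus(x - bias) - slope * x - offset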
Daniel Povey
d260b54177
Subtract, not add, 0.025.
2022-12-02 15:55:48 +08:00
Daniel Povey
9a71406a46
Reduce offset from 0.075 to 0.025.
2022-12-02 15:40:21 +08:00
Daniel Povey
c71a3c6098
Change offset
2022-12-02 15:20:37 +08:00
Daniel Povey
f0f204552d
Add -0.05 to DoubleSwish.
2022-12-02 15:17:41 +08:00
Daniel Povey
983a690c63
Change DoubleSwish formulation, add alpha*x only for x.abs() > 0.15.
2022-12-01 17:20:56 +08:00
Daniel Povey
d682ecc246
Introduce alpha for DoubleSwish, set it to -0.05.
2022-11-30 18:58:25 +08:00
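The run of commits from "Introduce alpha for DoubleSwish" up through "Subtract, not add, 0.025" experiments with a leaky, shifted variant of DoubleSwish(x) = x * sigmoid(x - 1). The sketch below assembles the final state described by these messages; the exact composition is an assumption:

    import torch

    def double_swish_leaky(x: torch.Tensor,
                           alpha: float = -0.05,  # "Introduce alpha ... set it to -0.05"
                           cutoff: float = 0.15,  # alpha*x only for x.abs() > 0.15
                           offset: float = 0.025) -> torch.Tensor:
        y = x * torch.sigmoid(x - 1.0)             # base DoubleSwish
        y = y + alpha * torch.where(x.abs() > cutoff, x, torch.zeros_like(x))
        return y - offset                          # "Subtract, not add, 0.025"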
Daniel Povey
0bfd81d721
Fix bug re dims_to_mean
2022-11-28 10:42:06 +08:00
Daniel Povey
109825cafb
Fix problem with mean offset in LinearWithAuxLoss.
2022-11-28 09:46:01 +08:00
Daniel Povey
9e7add6be8
Work out alpha (scale on z) in LinearWithAuxLossFunction
2022-11-27 23:48:26 +08:00
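The LinearWithAuxLoss commits here and further down (alpha in log space, the dims_to_mean and bias=False fixes) concern a linear layer that contributes an auxiliary loss through its backward pass. The log does not spell out the loss itself, so the sketch below shows only the general mechanism for injecting an auxiliary gradient; the usage comment is hypothetical:

    import torch

    class AttachAuxGrad(torch.autograd.Function):
        # identity in forward; backward adds a precomputed auxiliary
        # gradient to the incoming one
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_grad: torch.Tensor) -> torch.Tensor:
            ctx.save_for_backward(aux_grad)
            return x

        @staticmethod
        def backward(ctx, y_grad: torch.Tensor):
            (aux_grad,) = ctx.saved_tensors
            return y_grad + aux_grad, None

    # hypothetical use inside such a layer's forward():
    #   y = self.linear(x)
    #   aux_grad = self.log_alpha.exp() * penalty_grad(y)  # alpha kept in log space
    #   y = AttachAuxGrad.apply(y, aux_grad.detach())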
Daniel Povey
a610011c3c
Partially revert sign_gain_factor
2022-11-27 17:18:33 +08:00
Daniel Povey
30d0bc6ad7
Make gain factor 4 times larger, for constraining the sign in ActivationBalancer.
2022-11-27 17:17:11 +08:00
Daniel Povey
ff361a7495
Change default prob on limit_param_value from 0.2 to 0.6.
2022-11-27 14:00:59 +08:00
Daniel Povey
a96b92fb54
Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, whose setup is now more like ConvModule.
2022-11-26 19:38:29 +08:00
Daniel Povey
110c2601ab
Changes for speed
2022-11-26 14:38:16 +08:00
Daniel Povey
c653c66413
Undo cast in autocast mode.
2022-11-26 14:29:49 +08:00
Daniel Povey
d1ee1f2d98
Try to save memory in autocast mode.
2022-11-26 14:25:27 +08:00
Daniel Povey
7b5c0382f9
Fix LinearWithAuxLoss for the bias=False case
2022-11-26 14:16:53 +08:00
Daniel Povey
1ebc3dd158
Bug fixes to LinearWithAuxLoss
2022-11-25 16:20:28 +08:00
Daniel Povey
0a997d64c4
Fixes for half precision
2022-11-25 16:07:47 +08:00
Daniel Povey
6a91f343e9
Use LinearWithAuxLoss in squeeze-attention module
2022-11-25 16:04:51 +08:00
Daniel Povey
ee61ec63b3
Introduce schedules for whitening.
2022-11-23 19:49:34 +08:00
Daniel Povey
1d0252d420
Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
...
Below is a more complete list of the changes I am making, although some of
these may be counted in the last commit. The numbers XXX below correspond to
branches numbered scaled_adam_expXXX.
- From 412/413 (cherry-picked): dropout for attention in the attention_squeeze
and nonlin_attention modules, but simplified a little to use the same dropout
schedule and drop them out all together; also have all 3 submodules use
separate heads.
- From 460->461, which is in the history of 464: revert the part about
balancing the output of the attention_squeeze module.
- Merge from 462->467: use TanSwish rather than tanh.
- Merge 462->465: remove whitening in the self-attention module.
- Merge the part of 465->466 that was about diagnostics (the name in the
Whiten module).
2022-11-23 14:41:09 +08:00
Daniel Povey
066f1e4658
Implement TanSwish(), use it as activation in AttentionSqueeze module.
2022-11-22 23:34:11 +08:00
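The log names TanSwish but not its formula. The sketch below is one plausible bounded, swish-like form for a tanh replacement in the AttentionSqueeze module; the expression itself is an assumption, not recovered from the repo:

    import torch

    def tan_swish(x: torch.Tensor) -> torch.Tensor:
        # assumed form: tanh of swish.  Saturates at 1.0 for large positive
        # x and keeps swish's small negative dip (about -0.27) below zero.
        return torch.tanh(x * torch.sigmoid(x))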
Daniel Povey
6c5763fbb3
Implement subtracted momentum [0.33,0.66], and print name in Whiten module.
2022-11-22 21:57:48 +08:00
Daniel Povey
4e21db07f6
Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers.
2022-11-19 22:05:10 +08:00
Daniel Povey
6ea1706e11
Fix potential/theoretical issue in backward of LimitParamValue
2022-11-14 23:31:00 +08:00
Daniel Povey
ff6431ed0f
Implement limits on parameter values a different way.
2022-11-14 16:02:38 +08:00
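"Implement limits on parameter values a different way", together with the backward fix in the entry above, suggests a function that is the identity in forward but flips any gradient component that would push a parameter further outside its [min, max] range. A sketch consistent with that reading; the prob default echoes the commit further up that changes it from 0.2 to 0.6:

    import random
    import torch

    class LimitParamValue(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x: torch.Tensor, min: float, max: float) -> torch.Tensor:
            ctx.save_for_backward(x)
            ctx.min, ctx.max = min, max
            return x

        @staticmethod
        def backward(ctx, x_grad: torch.Tensor):
            (x,) = ctx.saved_tensors
            # a positive grad lowers x under gradient descent, so where x is
            # already below min, flip it; symmetrically where x is above max
            flip = ((x < ctx.min) & (x_grad > 0)) | ((x > ctx.max) & (x_grad < 0))
            return torch.where(flip, -x_grad, x_grad), None, None

    def limit_param_value(x: torch.Tensor, min: float, max: float,
                          prob: float = 0.6, training: bool = True) -> torch.Tensor:
        # applied stochastically rather than on every step
        if training and random.random() < prob:
            return LimitParamValue.apply(x, min, max)
        return x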
Daniel Povey
54048009db
Fix self.training condition
2022-11-14 15:15:24 +08:00
Daniel Povey
e1fb25262a
Refactor the scheduling code a little
2022-11-14 14:52:27 +08:00
Daniel Povey
b32dec1119
Add printing capability
2022-11-14 14:16:28 +08:00
Daniel Povey
4c8575878a
Bug fix in ScheduledSampler
2022-11-14 13:52:14 +08:00
Daniel Povey
cd4730b657
Try to refactor the code for scheduling
2022-11-14 12:50:24 +08:00
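The scheduling commits above ("Try to refactor the code for scheduling" through "Refactor the scheduling code a little") point at a small utility for values that change over training, which the "Introduce schedules for whitening" commit further up then uses. A minimal sketch of a piecewise-linear, batch-indexed schedule; the class name and interface are illustrative only:

    class PiecewiseLinearSchedule:
        # maps a batch count to a float by linear interpolation
        # between (batch, value) breakpoints
        def __init__(self, *points):        # e.g. (0, 4.0), (8000, 1.0)
            self.points = sorted(points)

        def __call__(self, batch: int) -> float:
            pts = self.points
            if batch <= pts[0][0]:
                return pts[0][1]
            for (b0, v0), (b1, v1) in zip(pts, pts[1:]):
                if batch <= b1:
                    return v0 + (v1 - v0) * (batch - b0) / (b1 - b0)
            return pts[-1][1]

    # e.g. a whitening limit that relaxes as training proceeds:
    # whiten_limit = PiecewiseLinearSchedule((0, 5.0), (20000, 10.0))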
Daniel Povey
e4a3b2da7d
Mostly-cosmetic fixes found via mypy
2022-11-09 17:40:09 +08:00
Daniel Povey
e08f5c1bce
Replace Pooling module with ModifiedSEModule
2022-11-01 14:38:06 +08:00
Daniel Povey
a067fe8026
Fix clamping of epsilon
2022-10-28 12:50:14 +08:00
Daniel Povey
7b8a0108ea
Merge branch 'scaled_adam_exp188' into scaled_adam_exp198b
2022-10-28 12:49:36 +08:00
Daniel Povey
b9f6ba1aa2
Remove some unused variables.
2022-10-28 12:01:45 +08:00
Daniel Povey
bf37c7ca85
Regularize how we apply the min and max to the eps of BasicNorm
2022-10-26 12:51:20 +08:00
Daniel Povey
78f3cba58c
Add logging about memory used.
2022-10-25 19:19:33 +08:00
Daniel Povey
6a6df19bde
Hopefully make penalize_abs_values_gt more memory efficient.
2022-10-25 18:41:33 +08:00
Daniel Povey
dbfbd8016b
Cast to float16 in DoubleSwish forward
2022-10-25 13:16:00 +08:00
Daniel Povey
36cb279318
More memory efficient backprop for DoubleSwish.
2022-10-25 12:21:22 +08:00
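The memory saving here pairs with the uint8-rounding commits below: instead of keeping the float input around for backward, the forward pass can store an 8-bit quantization of the derivative of double_swish(x) = x * sigmoid(x - 1). A sketch under that assumption; the derivative bounds are approximate, not the repo's exact constants:

    import torch

    class DoubleSwishMemEfficient(torch.autograd.Function):
        D_MIN, D_MAX = -0.044, 1.2   # approximate range of d/dx double_swish

        @staticmethod
        def forward(ctx, x: torch.Tensor) -> torch.Tensor:
            s = torch.sigmoid(x - 1.0)
            y = x * s
            deriv = y * (1.0 - s) + s   # d/dx [x * sigmoid(x - 1)]
            # map to [0, 255]; uniform noise plus floor() rounds without
            # bias, and clamp() prevents wrapping past 255
            scale = 255.0 / (DoubleSwishMemEfficient.D_MAX - DoubleSwishMemEfficient.D_MIN)
            q = ((deriv - DoubleSwishMemEfficient.D_MIN) * scale
                 + torch.rand_like(deriv)).floor().clamp(0, 255).to(torch.uint8)
            ctx.save_for_backward(q)
            return y

        @staticmethod
        def backward(ctx, y_grad: torch.Tensor) -> torch.Tensor:
            (q,) = ctx.saved_tensors
            scale = (DoubleSwishMemEfficient.D_MAX - DoubleSwishMemEfficient.D_MIN) / 255.0
            deriv = q.to(y_grad.dtype) * scale + DoubleSwishMemEfficient.D_MIN
            return y_grad * deriv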
Daniel Povey
95aaa4a8d2
Store only half precision output for softmax.
2022-10-23 21:24:46 +08:00
Daniel Povey
d3876e32c4
Make it use float16 if in AMP, but use clamp to avoid a wrapping error
2022-10-23 21:13:23 +08:00
Daniel Povey
85657946bb
Try a more exact way to round to uint8 that should prevent ever wrapping around to zero
2022-10-23 20:56:26 +08:00
Daniel Povey
d6aa386552
Fix randn to rand
2022-10-23 17:19:19 +08:00
Daniel Povey
e586cc319c
Change the discretization of the sigmoid to be expectation preserving.
2022-10-23 17:11:35 +08:00
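These last four commits are the origin of that rounding trick: quantize a sigmoid (or softmax) output to uint8 with uniform noise, i.e. rand rather than randn, so the rounding is correct in expectation, and clamp so an input of exactly 1.0 cannot wrap around to zero. A minimal sketch:

    import torch

    def quantize_sigmoid(s: torch.Tensor) -> torch.Tensor:
        # s in [0, 1]; floor(s*255 + U), U ~ Uniform[0, 1), equals s*255
        # in expectation, and clamp() guards against float rounding
        # pushing 255 + U up to 256, which would wrap to 0 as uint8
        return (s * 255.0 + torch.rand_like(s)).floor().clamp(max=255.0).to(torch.uint8)

    def dequantize_sigmoid(q: torch.Tensor) -> torch.Tensor:
        return q.to(torch.float32) / 255.0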