Daniel Povey | 494139d27a | Replace BasicNorm of encoder layers with ConvNorm1d | 2022-12-20 19:15:14 +08:00
Daniel Povey | 5e1bf8b8ec | Add BasicNorm to ConvNeXt; increase prob given to CutoffEstimator; adjust default probs of ActivationBalancer. | 2022-12-18 14:14:15 +08:00
Daniel Povey | dfeafd6aa8 | Remove print statement in CutoffEstimator | 2022-12-17 16:28:45 +08:00
Daniel Povey | 29df07ba2c | Add memory cutoff on ActivationBalancer and Whiten | 2022-12-17 16:20:15 +08:00
Daniel Povey | 744dca1c9b | Merge branch 'scaled_adam_exp724' into scaled_adam_exp726 | 2022-12-17 15:46:57 +08:00
Daniel Povey | b9326e1ef2 | Fix to print statement | 2022-12-16 18:07:43 +08:00
Daniel Povey | 8e6c7ef3e2 | Adjust default prob of ActivationBalancer. | 2022-12-16 15:08:46 +08:00
Daniel Povey | 56ac7354df | Remove LinearWithAuxLoss; simplify schedule of prob in ActivationBalancer. | 2022-12-16 15:07:42 +08:00
Daniel Povey | 083e5474c4 | Reduce ConvNeXt parameters. | 2022-12-16 00:21:04 +08:00
Daniel Povey | 8d9301e225 | Remove potentially wrong typing info | 2022-12-15 23:47:41 +08:00
Daniel Povey | 6caaa4e9c6 | Bug fix in caching_eval, may make no difference. | 2022-12-15 23:32:29 +08:00
Daniel Povey | f5d4fb092d | Bug fix in caching_eval | 2022-12-15 23:24:36 +08:00
Daniel Povey | d26ee2bf81 | Try to implement caching evaluation for memory efficient training | 2022-12-15 23:06:40 +08:00
Daniel Povey | f66c1600f4 | Bug fix to printing code | 2022-12-15 21:55:23 +08:00
Daniel Povey | 2d0fe7637c | Memory fix in WithLoss | 2022-12-11 17:20:26 +08:00
Daniel Povey | 0fc646f281 | Merge branch 'scaled_adam_exp663' into scaled_adam_exp665 | 2022-12-10 00:07:37 +08:00
Daniel Povey | d35eb7a3a6 | Add cosmetic/diagnostics changes from scaled_adam_exp656. | 2022-12-09 22:02:42 +08:00
Daniel Povey | 5c0957d950 | Fix memory issue in ActivationBalancer | 2022-12-09 18:11:27 +08:00
Daniel Povey | 2ef0228db0 | Make the ActivationBalancer relative to the mean, limited to -min_abs..max_abs | 2022-12-09 17:59:00 +08:00
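Several commits above adjust ActivationBalancer, which in icefall's `scaling.py` keeps per-channel activation statistics in a target range by modifying only the backward pass, not the forward output. A heavily simplified, hypothetical sketch of the sign-balancing idea (the function name, list-based interface, and gain factor are illustrative assumptions, not the actual implementation):

```python
def sign_balance_grad(x, grad, min_positive=0.05, max_positive=0.95,
                      gain_factor=0.01):
    """Illustrative sketch: if the fraction of positive activations falls
    outside [min_positive, max_positive], bias the gradient so that
    gradient descent pushes the activations back toward the range.
    (Hypothetical simplification of icefall's ActivationBalancer, which
    works per channel on tensors inside a custom autograd function.)"""
    frac_positive = sum(1 for v in x if v > 0) / len(x)
    if frac_positive < min_positive:
        # Too few positive values: subtracting from the gradient makes
        # descent increase the activations.
        return [g - gain_factor * abs(g) for g in grad]
    if frac_positive > max_positive:
        # Too many positive values: push the activations down instead.
        return [g + gain_factor * abs(g) for g in grad]
    return list(grad)
```

The key design point recorded in these commits is that the constraint is probabilistic and gradient-only, so inference-time behavior is unchanged.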
Daniel Povey | 3f82ee0783 | Merge dropout schedule, 0.3 ... 0.1 over 20k batches | 2022-12-08 18:18:46 +08:00
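The dropout schedule merged here decays from 0.3 to 0.1 over the first 20k batches. A minimal sketch of such a linear schedule (the function name and clamping behavior are assumptions; icefall expresses schedules like this with helper classes in the training code):

```python
def dropout_at(batch_idx, start=0.3, end=0.1, num_batches=20000):
    """Linearly interpolate the dropout rate from `start` to `end` over
    `num_batches` training batches, holding `end` afterwards.
    (Illustrative sketch of the schedule described in the commit.)"""
    t = min(max(batch_idx / num_batches, 0.0), 1.0)
    return start + t * (end - start)
```

Starting with high dropout and annealing it down regularizes early training while letting the model use its full capacity later.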
Daniel Povey | 22617da725 | Make dropout a schedule starting at 0.3. | 2022-12-05 23:39:24 +08:00
Daniel Povey | 178eca1c0e | Revert scaling, scale only grad. | 2022-12-05 17:53:23 +08:00
Daniel Povey | b93cf0676a | Initialize Conv2dSubsampling with scale. | 2022-12-05 17:31:56 +08:00
Daniel Povey | 12fb2081b1 | Fix deriv code | 2022-12-04 21:22:06 +08:00
Daniel Povey | c57eaf7979 | Change x coeff from -0.1 to -0.08, as in 608. | 2022-12-04 21:15:49 +08:00
Daniel Povey | 7b1f093077 | Use Swoosh-R in the Conv and Swoosh-L in the feedforward. | 2022-12-04 19:18:16 +08:00
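The Swoosh-R / Swoosh-L split above corresponds to the activations later published in the Zipformer paper. A sketch using the published constants (these may differ slightly from this commit's intermediate versions, though the -0.08 x-coefficient matches the "x coeff" commit above):

```python
import math

def swoosh_r(x):
    # SwooshR(x) = log(1 + exp(x - 1)) - 0.08*x - 0.313261687
    # The constant is log(1 + exp(-1)), chosen so SwooshR(0) = 0.
    return math.log1p(math.exp(x - 1)) - 0.08 * x - 0.313261687

def swoosh_l(x):
    # SwooshL(x) = log(1 + exp(x - 4)) - 0.08*x - 0.035
    # The offset puts the left zero-crossing close to the origin,
    # as in the "left crossing is near zero" commit below.
    return math.log1p(math.exp(x - 4)) - 0.08 * x - 0.035
```

Both are shifted softplus-style curves with a small negative linear term, so the gradient never vanishes entirely for negative inputs.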
Daniel Povey | 67812276ed | Change Swoosh formula so left crossing is near zero; change min_positive, max_positive of ActivationBalancer. | 2022-12-03 15:10:03 +08:00
Daniel Povey | b8e3091e04 | Increase scale_gain_factor to 0.04. | 2022-12-03 00:48:19 +08:00
Daniel Povey | bd1b1dd7e3 | Simplify formula for Swoosh and make it pass through 0; make max_abs of ConvolutionModule a constant. | 2022-12-03 00:13:09 +08:00
Daniel Povey | 84f51ab1b1 | Bug fix in scripting mode | 2022-12-02 20:28:17 +08:00
Daniel Povey | 9a2a58e20d | Fix bug one versus zero | 2022-12-02 19:12:18 +08:00
Daniel Povey | 2bfc38207c | Fix constants in SwooshFunction. | 2022-12-02 18:37:23 +08:00
Daniel Povey | 14267a5194 | Use Swoosh not DoubleSwish in zipformer; fix constants in Swoosh | 2022-12-02 16:58:31 +08:00
Daniel Povey | ec10573edc | First version of swoosh | 2022-12-02 16:34:53 +08:00
Daniel Povey | d260b54177 | Subtract, not add, 0.025. | 2022-12-02 15:55:48 +08:00
Daniel Povey | 9a71406a46 | Reduce offset from 0.075 to 0.025. | 2022-12-02 15:40:21 +08:00
Daniel Povey | c71a3c6098 | Change offset | 2022-12-02 15:20:37 +08:00
Daniel Povey | f0f204552d | Add -0.05 to DoubleSwish. | 2022-12-02 15:17:41 +08:00
Daniel Povey | 983a690c63 | Change DoubleSwish formulation, add alpha*x only for x.abs() > 0.15. | 2022-12-01 17:20:56 +08:00
Daniel Povey | d682ecc246 | Introduce alpha for DoubleSwish, set it to -0.05. | 2022-11-30 18:58:25 +08:00
Daniel Povey | 0bfd81d721 | fix bug RE dims_to_mean | 2022-11-28 10:42:06 +08:00
Daniel Povey | 109825cafb | Fix problem with mean offset in LinearWithAuxLoss. | 2022-11-28 09:46:01 +08:00
Daniel Povey | 9e7add6be8 | Work out alpha (scale on z) in LinearWithAuxLossFunction | 2022-11-27 23:48:26 +08:00
Daniel Povey | a610011c3c | Partially revert sign_gain_factor | 2022-11-27 17:18:33 +08:00
Daniel Povey | 30d0bc6ad7 | Make gain factor 4 times larger, for constraining the sign in ActivationBalancer. | 2022-11-27 17:17:11 +08:00
Daniel Povey | ff361a7495 | Change default prob on limit_param_value from 0.2 to 0.6. | 2022-11-27 14:00:59 +08:00
Daniel Povey | a96b92fb54 | Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, setup more like ConvModule now. | 2022-11-26 19:38:29 +08:00
Daniel Povey | 110c2601ab | Changes for speed | 2022-11-26 14:38:16 +08:00
Daniel Povey | c653c66413 | Undo cast in autocast mode. | 2022-11-26 14:29:49 +08:00
Daniel Povey | d1ee1f2d98 | Try to save memory in autocast mode. | 2022-11-26 14:25:27 +08:00