icefall

mirror of https://github.com/k2-fsa/icefall.git synced 2025-12-11 06:55:27 +00:00

Author	SHA1	Message	Date
Daniel Povey	56ac7354df	Remove LinearWithAuxLoss; simplify schedule of prob in ActivationBalancer.	2022-12-16 15:07:42 +08:00
Daniel Povey	2d0fe7637c	Memory fix in WithLoss	2022-12-11 17:20:26 +08:00
Daniel Povey	0fc646f281	Merge branch 'scaled_adam_exp663' into scaled_adam_exp665	2022-12-10 00:07:37 +08:00
Daniel Povey	d35eb7a3a6	Add cosmetic/diagnostics changes from scaled_adam_exp656.	2022-12-09 22:02:42 +08:00
Daniel Povey	5c0957d950	Fix memory issue in ActivationBalancer	2022-12-09 18:11:27 +08:00
Daniel Povey	2ef0228db0	Make the ActivationBalancer relative to the mean, limited to -min_abs..max_abs	2022-12-09 17:59:00 +08:00
Daniel Povey	3f82ee0783	Merge dropout schedule, 0.3 ... 0.1 over 20k batches	2022-12-08 18:18:46 +08:00
Daniel Povey	22617da725	Make dropout a schedule starting at 0.3.	2022-12-05 23:39:24 +08:00
Daniel Povey	178eca1c0e	Revert scaling, scale only grad.	2022-12-05 17:53:23 +08:00
Daniel Povey	b93cf0676a	Initialize Conv2dSubsampling with scale.	2022-12-05 17:31:56 +08:00
Daniel Povey	12fb2081b1	Fix deriv code	2022-12-04 21:22:06 +08:00
Daniel Povey	c57eaf7979	Change x coeff from -0.1 to -0.08, as in 608.	2022-12-04 21:15:49 +08:00
Daniel Povey	7b1f093077	Use Swoosh-R in the Conv and Swoosh-L in the feedforward.	2022-12-04 19:18:16 +08:00
Daniel Povey	67812276ed	Change Swoosh formula so left crossing is near zero; change min_positive, max_positive of ActivationBalancer.	2022-12-03 15:10:03 +08:00
Daniel Povey	b8e3091e04	Increase scale_gain_factor to 0.04.	2022-12-03 00:48:19 +08:00
Daniel Povey	bd1b1dd7e3	Simplify formula for Swoosh and make it pass through 0; make max_abs of ConvolutionModule a constant.	2022-12-03 00:13:09 +08:00
Daniel Povey	84f51ab1b1	Bug fix in scripting mode	2022-12-02 20:28:17 +08:00
Daniel Povey	9a2a58e20d	Fix bug one versus zero	2022-12-02 19:12:18 +08:00
Daniel Povey	2bfc38207c	Fix constants in SwooshFunction.	2022-12-02 18:37:23 +08:00
Daniel Povey	14267a5194	Use Swoosh not DoubleSwish in zipformer; fix constants in Swoosh	2022-12-02 16:58:31 +08:00
Daniel Povey	ec10573edc	First version of swoosh	2022-12-02 16:34:53 +08:00
Daniel Povey	d260b54177	Subtract, not add, 0.025.	2022-12-02 15:55:48 +08:00
Daniel Povey	9a71406a46	Reduce offset from 0.075 to 0.025.	2022-12-02 15:40:21 +08:00
Daniel Povey	c71a3c6098	Change offset	2022-12-02 15:20:37 +08:00
Daniel Povey	f0f204552d	Add -0.05 to DoubleSwish.	2022-12-02 15:17:41 +08:00
Daniel Povey	983a690c63	Change DoubleSwish formulation, add alpha*x only for x.abs() > 0.15.	2022-12-01 17:20:56 +08:00
Daniel Povey	d682ecc246	Introduce alpha for DoubleSwish, set it to -0.05.	2022-11-30 18:58:25 +08:00
Daniel Povey	0bfd81d721	fix bug RE dims_to_mean	2022-11-28 10:42:06 +08:00
Daniel Povey	109825cafb	Fix problem with mean offset in LinearWithAuxLoss.	2022-11-28 09:46:01 +08:00
Daniel Povey	9e7add6be8	Work out alpha (scale on z) in LinearWithAuxLossFunction	2022-11-27 23:48:26 +08:00
Daniel Povey	a610011c3c	Partially revert sign_gain_factor	2022-11-27 17:18:33 +08:00
Daniel Povey	30d0bc6ad7	Make gain factor 4 times larger, for constraining the sign in ActivationBalancer.	2022-11-27 17:17:11 +08:00
Daniel Povey	ff361a7495	Change default prob on limit_param_value from 0.2 to 0.6.	2022-11-27 14:00:59 +08:00
Daniel Povey	a96b92fb54	Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, setup more like ConvModule now.	2022-11-26 19:38:29 +08:00
Daniel Povey	110c2601ab	Changes for speed	2022-11-26 14:38:16 +08:00
Daniel Povey	c653c66413	Undo cast in autocast mode.	2022-11-26 14:29:49 +08:00
Daniel Povey	d1ee1f2d98	Try to save memory in autocast mode.	2022-11-26 14:25:27 +08:00
Daniel Povey	7b5c0382f9	Fix to LinearWithAuxLoss for bias=False case	2022-11-26 14:16:53 +08:00
Daniel Povey	1ebc3dd158	Bug fixes to LinearWithAuxLoss	2022-11-25 16:20:28 +08:00
Daniel Povey	0a997d64c4	Fixes for half precision	2022-11-25 16:07:47 +08:00
Daniel Povey	6a91f343e9	Use LinearWithAuxLoss in squeeze-attention module	2022-11-25 16:04:51 +08:00
Daniel Povey	ee61ec63b3	Introduce schedules for whitening.	2022-11-23 19:49:34 +08:00
Daniel Povey	1d0252d420	Merge branch 'scaled_adam_exp466' into scaled_adam_exp472. Below is a more complete list of the changes I am making, although some of these may be counted in the last. Numbers XXX below correspond to branches numbered scaled_adam_expXXX. - From 412/413 (cherry-picked): dropout for attention in attention_squeeze and nonlin_attention modules, but simplified this a little to use the same dropout schedule and drop them out all together; also have all 3 submodules use separate heads. - From 460->461, which is in the history of 464, revert the part about balancing output out attention_squeeze module. - Merge from 462->467, about using TanSwish not tanh. - Merge 462->465, remove whitening in self-attention module. - Merge the part of 465->466 that was about diagnostics (name in Whiten module).	2022-11-23 14:41:09 +08:00
Daniel Povey	066f1e4658	Implement TanSwish(), use it as activation in AttentionSqueeze module.	2022-11-22 23:34:11 +08:00
Daniel Povey	6c5763fbb3	Implement subtracted momentum [0.33,0.66], and print name in Whiten module.	2022-11-22 21:57:48 +08:00
Daniel Povey	4e21db07f6	Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers.	2022-11-19 22:05:10 +08:00
Daniel Povey	6ea1706e11	Fix potential/theoretical issue in backward of LimitParamValue	2022-11-14 23:31:00 +08:00
Daniel Povey	ff6431ed0f	Implement limits on parameter values a different way.	2022-11-14 16:02:38 +08:00
Daniel Povey	54048009db	Fix self.training condition	2022-11-14 15:15:24 +08:00
Daniel Povey	e1fb25262a	Refactorize the scheduling code a little	2022-11-14 14:52:27 +08:00

119 Commits