39ce60bb7c | Daniel Povey | 2022-11-28 10:45:53 +08:00
Decrease final value of max_abs in AttentionSqueeze from 5.0 to 1.0

0bfd81d721 | Daniel Povey | 2022-11-28 10:42:06 +08:00
Fix bug re: dims_to_mean

109825cafb | Daniel Povey | 2022-11-28 09:46:01 +08:00
Fix problem with mean offset in LinearWithAuxLoss.

a3b07fd098 | Daniel Povey | 2022-11-28 00:19:03 +08:00
Double aux_grad scale

9752778ee6 | Daniel Povey | 2022-11-28 00:09:26 +08:00
Use the same schedule for in_proj as out_proj. Only affects a couple of modules.

9e7add6be8 | Daniel Povey | 2022-11-27 23:48:26 +08:00
Work out alpha (scale on z) in LinearWithAuxLossFunction

0307252832 | Daniel Povey | 2022-11-27 21:33:37 +08:00
Bug fix

5128ff8797 | Daniel Povey | 2022-11-27 21:14:41 +08:00
Changes to balancer min_abs/max_abs limits.

a610011c3c | Daniel Povey | 2022-11-27 17:18:33 +08:00
Partially revert sign_gain_factor

30d0bc6ad7 | Daniel Povey | 2022-11-27 17:17:11 +08:00
Make gain factor 4 times larger, for constraining the sign in ActivationBalancer.

785a524341 | Daniel Povey | 2022-11-27 17:06:31 +08:00
Increase min_abs of hidden balancer of ff modules from 0.2 to 1.0

ff361a7495 | Daniel Povey | 2022-11-27 14:00:59 +08:00
Change default prob on limit_param_value from 0.2 to 0.6.

2f4df1278d | Daniel Povey | 2022-11-27 13:56:50 +08:00
Have aux_grad_scales for input terminate after 1k batches; double the scale on aux_grad.

a6fb9772a8 | Daniel Povey | 2022-11-27 13:29:29 +08:00
Remove 4 layers.

2e0111e6ef | Daniel Povey | 2022-11-26 23:36:00 +08:00
Halve aux_grad_scale

c91014f104 | Daniel Povey | 2022-11-26 23:10:18 +08:00
Changes to balancer schedules: start max_abs from 5.0 not 4.0, start min_positive from 0.1 more consistently; finish at 8k not 12k.

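Many of the commits here tune piecewise-linear schedules keyed on the batch count (e.g. max_abs starting at 5.0 and finishing by batch 8k). A minimal sketch of how such a schedule can be evaluated; the function name and interpolation details are illustrative assumptions, not the repo's actual implementation:

```python
def scheduled_value(batch_count: int, points) -> float:
    """Piecewise-linear interpolation between sorted (batch, value)
    pairs; constant before the first point and after the last."""
    b0, v0 = points[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in points[1:]:
        if batch_count <= b1:
            return v0 + (batch_count - b0) * (v1 - v0) / (b1 - b0)
        b0, v0 = b1, v1
    return v0

# e.g. a max_abs schedule going from 5.0 at batch 0 to 1.0 at batch 8000:
# scheduled_value(4000, [(0, 5.0), (8000, 1.0)]) == 3.0
```
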
633b6785f1 | Daniel Povey | 2022-11-26 22:27:20 +08:00
Halve final scale of aux_grad, and make schedule decrease more slowly.

f71b1d2c3a | Daniel Povey | 2022-11-26 21:18:24 +08:00
Add 4 more layers

4874ded2e9 | Daniel Povey | 2022-11-26 20:20:20 +08:00
Introduce balancer schedules for the DoubleSwish() in feedforward and conv modules

320c58401f | Daniel Povey | 2022-11-26 19:45:41 +08:00
Increase 2 feedforward dims from 1.5k to 2k.

9ce99b150d | Daniel Povey | 2022-11-26 19:42:33 +08:00
Remove one attention_squeeze module; halve dimension in NonlinAttention module; put schedule on balancer of ConvolutionModule

a96b92fb54 | Daniel Povey | 2022-11-26 19:38:29 +08:00
Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, making its setup more like ConvModule's.

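The commit above moves alpha, the scale on z in LinearWithAuxLossFunction, into log space. A hedged sketch of that parameterization (class and names are illustrative, not the actual icefall code): a positive scale stored as its log can never go negative, and an additive optimizer step on log_alpha acts multiplicatively on alpha.

```python
import torch
import torch.nn as nn

class LogSpaceScale(nn.Module):
    """Illustrative only: keep a positive scale alpha as log_alpha,
    so alpha = exp(log_alpha) stays positive by construction."""
    def __init__(self, initial_alpha: float = 1.0):
        super().__init__()
        self.log_alpha = nn.Parameter(
            torch.tensor(float(initial_alpha)).log())

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Scale the input by alpha = exp(log_alpha).
        return z * self.log_alpha.exp()
```
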
e19118a966 | Daniel Povey | 2022-11-26 19:29:58 +08:00
Merge branch 'scaled_adam_exp503' into scaled_adam_exp505

faed28ba6a | Daniel Povey | 2022-11-26 18:59:15 +08:00
Changes for debugging/stats.

48d699c94b | Daniel Povey | 2022-11-26 18:42:03 +08:00
Change for speed/memory

8858fb38f1 | Daniel Povey | 2022-11-26 14:52:59 +08:00
Halve expected value of aux_grad scale, and implement it more efficiently, via a scale on the prob of using it.

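The commit just above keeps the expected aux-gradient contribution fixed while making it cheaper: rather than an always-on, scaled-down term, it applies a full-strength term on a random fraction of batches. A rough sketch of the idea, with an assumed placeholder penalty standing in for the real aux loss:

```python
import random
import torch

def aux_loss_term(z: torch.Tensor, target_scale: float,
                  full_scale: float = 1.0) -> torch.Tensor:
    """Apply the full-strength penalty with probability
    target_scale / full_scale, so the expected contribution equals
    target_scale * penalty, but most batches skip the computation."""
    if random.random() < target_scale / full_scale:
        return full_scale * (z ** 2).mean()  # placeholder penalty
    return z.new_zeros(())
```
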
110c2601ab | Daniel Povey | 2022-11-26 14:38:16 +08:00
Changes for speed

c653c66413 | Daniel Povey | 2022-11-26 14:29:49 +08:00
Undo cast in autocast mode.

d1ee1f2d98 | Daniel Povey | 2022-11-26 14:25:27 +08:00
Try to save memory in autocast mode.

7b5c0382f9 | Daniel Povey | 2022-11-26 14:16:53 +08:00
Fix to LinearWithAuxLoss for bias=False case

5f80807027 | Daniel Povey | 2022-11-26 14:15:09 +08:00
Add LinearWithAuxLoss in nonlin_attention and AttentionSqueeze modules.

4058d56c0d | Daniel Povey | 2022-11-26 14:04:41 +08:00
Remove squeeze_excite from Conv2dSubsampling.

281b54e7bf | Daniel Povey | 2022-11-26 12:25:22 +08:00
Use LinearWithAuxLoss in more places.

d9c7e4f216 | Daniel Povey | 2022-11-26 12:13:31 +08:00
Make the in_proj of feedforward modules also be a LinearWithAuxLoss.

029f5869c4 | Daniel Povey | 2022-11-25 18:06:13 +08:00
Increase schedule init from 0.1 to 0.2

2368968114 | Daniel Povey | 2022-11-25 18:00:46 +08:00
Make out_proj of feedforward modules be a LinearWithAuxLoss, with nonzero final value of 0.01.

8f1ef60951 | Daniel Povey | 2022-11-25 16:24:28 +08:00
Integrate LinearWithAuxLoss into SqueezeExcite1d

1ebc3dd158 | Daniel Povey | 2022-11-25 16:20:28 +08:00
Bug fixes to LinearWithAuxLoss

0a997d64c4 | Daniel Povey | 2022-11-25 16:07:47 +08:00
Fixes for half precision

6a91f343e9 | Daniel Povey | 2022-11-25 16:04:51 +08:00
Use LinearWithAuxLoss in squeeze-attention module

ba348169bf | Daniel Povey | 2022-11-25 12:39:16 +08:00
Change to sigmoid of NonlinAttention, for diagnostic purposes.

0614f65428 | Daniel Povey | 2022-11-24 17:20:28 +08:00
Bug fix: remove second activation in a row

534eca4bf3 | Daniel Povey | 2022-11-24 16:18:40 +08:00
Add 1d squeeze-and-excite(-like) module in Conv2dSubsampling

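For reference, the generic squeeze-and-excite pattern in 1-D pools over the time axis, passes the pooled vector through a small bottleneck MLP, and rescales the channels. The sketch below uses made-up names and shapes and is not the module this commit added:

```python
import torch
import torch.nn as nn

class SqueezeExcite1d(nn.Module):
    """Generic 1-D squeeze-and-excite: global average over time,
    bottleneck MLP, sigmoid gate, channel-wise rescale."""
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.down = nn.Linear(channels, bottleneck)
        self.up = nn.Linear(bottleneck, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        s = x.mean(dim=1)                      # squeeze: average over time
        s = torch.sigmoid(self.up(torch.relu(self.down(s))))  # excite
        return x * s.unsqueeze(1)              # rescale channels
```
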
dd3826104e | Daniel Povey | 2022-11-24 15:25:59 +08:00
Start whitening schedules for the activations in NonlinAttentionModule and AttentionSqueezeModule lower; increase some whitening probs.

0ac26f4234 | Daniel Povey | 2022-11-24 15:18:28 +08:00
Increase initial whitening target for self_attn from 2.0 to 3.0.

45069175d9 | Daniel Povey | 2022-11-24 14:16:13 +08:00
Add a second whitening to the NonlinAttentionModule, after the aggregation.

35f0ea0015 | Daniel Povey | 2022-11-24 13:47:22 +08:00
Changes to whitening modules for memory efficiency, moving them inside; increase their prob.

de73e2e424 | Daniel Povey | 2022-11-24 13:27:32 +08:00
Move whitening of NonlinAttentionModule from the output to the interior; apply it just to the value.

ee61ec63b3 | Daniel Povey | 2022-11-23 19:49:34 +08:00
Introduce schedules for whitening.

a6657e6b40 | Daniel Povey | 2022-11-23 19:08:19 +08:00
Harmonize whitening modules: add them to 3 submodules, change the configuration of 2 others, and change the location in NonlinAttention.