Daniel Povey
8858fb38f1
Halve expected value of aux_grad scale, and implement it more efficiently, via a scale on the prob of using it.
2022-11-26 14:52:59 +08:00
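(The log does not show the implementation, but the idea in this commit can be illustrated with a small sketch: an auxiliary gradient applied with probability p and scale s contributes p*s in expectation, so halving the probability halves the expected scale while letting most batches skip the extra work. The function and argument names below are assumptions for illustration, not taken from the branch.)

    import torch

    # Hypothetical sketch: add an auxiliary loss term with probability `aux_prob`
    # instead of always adding it with a smaller scale.  The expected contribution
    # is aux_prob * aux_scale, so halving `aux_prob` halves the expected value of
    # the auxiliary gradient's scale while most batches do no extra work.
    def maybe_add_aux_loss(main_loss: torch.Tensor,
                           aux_loss: torch.Tensor,
                           aux_scale: float = 0.01,
                           aux_prob: float = 0.25) -> torch.Tensor:
        if torch.rand(()).item() < aux_prob:
            return main_loss + aux_scale * aux_loss
        return main_loss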
Daniel Povey
110c2601ab
Changes for speed
2022-11-26 14:38:16 +08:00
Daniel Povey
c653c66413
Undo cast in autocast mode.
2022-11-26 14:29:49 +08:00
Daniel Povey
d1ee1f2d98
Try to save memory in autocast mode.
2022-11-26 14:25:27 +08:00
Daniel Povey
7b5c0382f9
Fix to LinearWithAuxLoss for bias=False case
2022-11-26 14:16:53 +08:00
Daniel Povey
5f80807027
Add LinearWithAuxLoss in nonlin_attention and AttentionSqueeze modules.
2022-11-26 14:15:09 +08:00
Daniel Povey
4058d56c0d
Remove squeeze_excite from Conv2dSubsampling.
2022-11-26 14:04:41 +08:00
Daniel Povey
281b54e7bf
Use LinearWithAuxLoss in more places.
2022-11-26 12:25:22 +08:00
Daniel Povey
d9c7e4f216
Make the in_proj of feedforward modules also be a LinearWithAuxLoss.
2022-11-26 12:13:31 +08:00
Daniel Povey
029f5869c4
Increase schedule init from 0.1 to 0.2
2022-11-25 18:06:13 +08:00
Daniel Povey
2368968114
Make out_proj of feedforward modules be a LinearWithAuxLoss, with nonzero final value at 0.01.
2022-11-25 18:00:46 +08:00
Daniel Povey
8f1ef60951
Integrate LinearWithAuxLoss into SqueezeExcite1d
2022-11-25 16:24:28 +08:00
Daniel Povey
1ebc3dd158
Bug fixes to LinearWithAuxLoss
2022-11-25 16:20:28 +08:00
Daniel Povey
0a997d64c4
Fixes for half precision
2022-11-25 16:07:47 +08:00
Daniel Povey
6a91f343e9
Use LinearWithAuxLoss in squeeze-attention module
2022-11-25 16:04:51 +08:00
Daniel Povey
ba348169bf
Change for diagnostic purposes, sigmoid of NonlinAttention.
2022-11-25 12:39:16 +08:00
Daniel Povey
0614f65428
Bug fix, remove 2nd activation in a row
2022-11-24 17:20:28 +08:00
Daniel Povey
534eca4bf3
Add 1d squeeze and excite (-like) module in Conv2dSubsampling
2022-11-24 16:18:40 +08:00
Daniel Povey
dd3826104e
Start the whitening schedules for the activations in NonlinAttentionModule and AttentionSqueezeModule lower; increase some whitening probs.
2022-11-24 15:25:59 +08:00
Daniel Povey
0ac26f4234
Increase initial whitening target for self_attn from 2.0 to 3.0.
2022-11-24 15:18:28 +08:00
Daniel Povey
45069175d9
Add a second whitening to the NonlinAttentionModule, after the aggregation.
2022-11-24 14:16:13 +08:00
Daniel Povey
35f0ea0015
Changes to whitening modules for memory efficiency, moving them inside; increase their prob.
2022-11-24 13:47:22 +08:00
Daniel Povey
de73e2e424
Move whitening of NonlinAttentionModule from the output to the interior, applying it just to the value.
2022-11-24 13:27:32 +08:00
Daniel Povey
ee61ec63b3
Introduce schedules for whitening.
2022-11-23 19:49:34 +08:00
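(The schedule mechanism itself is not shown in the log; the sketch below illustrates the general shape of a batch-count-based, piecewise-linear schedule of the kind these whitening commits, and messages such as "schedule init from 0.1 to 0.2" and "nonzero final value at 0.01" elsewhere in this log, refer to. The class name and example breakpoints are illustrative only.)

    # Minimal sketch of a batch-count-based schedule, assuming a whitening limit
    # (or aux-loss scale) is interpolated linearly between (batch_count, value)
    # breakpoints.  Class name and breakpoints are illustrative, not the repo's.
    class PiecewiseLinearSchedule:
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count.
            self.points = list(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # e.g. a limit that relaxes from 2.0 to 4.0 over the first 20000 batches
    # (values chosen only for illustration):
    whitening_limit = PiecewiseLinearSchedule((0.0, 2.0), (20000.0, 4.0))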
Daniel Povey
a6657e6b40
Harmonize whitening modules, adding them to 3 submodules and changing configuration on 2 others and location in NonlinAttention.
2022-11-23 19:08:19 +08:00
Daniel Povey
9ceb41acb4
Remove balancer from SelfAttention module.
2022-11-23 18:41:36 +08:00
Daniel Povey
f2dbf87461
Remove invocation of out_balancer
2022-11-23 18:40:27 +08:00
Daniel Povey
b88f12fe83
Remove out_balancer of NonlinAttentionModule
2022-11-23 18:37:45 +08:00
Daniel Povey
9138695dfe
Fix bug RE attn_weights
2022-11-23 17:04:17 +08:00
Daniel Povey
36e49a8d61
Change for mem efficiency
2022-11-23 15:38:34 +08:00
Daniel Povey
1d0252d420
Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of these may be counted in the last merges. The numbers XXX below correspond to branches numbered scaled_adam_expXXX.
- From 412/413 (cherry-picked): dropout for attention in the attention_squeeze and nonlin_attention modules, but simplified a little to use the same dropout schedule and drop them out all together; also have all 3 submodules use separate heads.
- From 460->461, which is in the history of 464: revert the part about balancing the output of the attention_squeeze module.
- Merge from 462->467, about using TanSwish instead of tanh.
- Merge 462->465: remove whitening in the self-attention module.
- Merge the part of 465->466 that was about diagnostics (name in Whiten module).
2022-11-23 14:41:09 +08:00
Daniel Povey
f89a85aed8
Merge branch 'scaled_adam_exp465' into scaled_adam_exp472
2022-11-23 14:16:17 +08:00
Daniel Povey
edd4bf5312
Merge branch 'scaled_adam_exp467' into scaled_adam_exp472
2022-11-23 14:13:19 +08:00
Daniel Povey
d95571eacf
From 460->461, revert change about balancing output of attention_squeeze module.
2022-11-23 14:12:08 +08:00
Daniel Povey
fe51eea397
Implement a form of dropout for squeeze_weights, dropout-to-constant.
2022-11-23 14:06:17 +08:00
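("Dropout-to-constant" is not a standard torch operation; a rough reading of the commit message is sketched below: with some probability during training, replace an element's normalized squeeze weights with a constant uniform distribution rather than zeroing them as ordinary dropout would. The function name, tensor shapes, and the choice of uniform constant are assumptions, not taken from the branch.)

    import torch

    # Rough sketch of "dropout-to-constant": with probability `p` during training,
    # replace the (softmax-normalized) squeeze weights of a whole batch element
    # with a constant uniform value instead of zeroing them.
    def dropout_to_constant(weights: torch.Tensor, p: float, training: bool) -> torch.Tensor:
        # weights: (batch, seq_len), rows summing to 1.
        if not training or p == 0.0:
            return weights
        batch, seq_len = weights.shape
        mask = torch.rand(batch, 1, device=weights.device) < p
        constant = torch.full_like(weights, 1.0 / seq_len)
        return torch.where(mask, constant, weights)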
Daniel Povey
066f1e4658
Implement TanSwish(), use it as activation in AttentionSqueeze module.
2022-11-22 23:34:11 +08:00
Daniel Povey
1826648dde
Fix formulas and constants
2022-11-22 22:54:05 +08:00
Daniel Povey
6c5763fbb3
Implement subtracted momentum [0.33,0.66], and print name in Whiten module.
2022-11-22 21:57:48 +08:00
Daniel Povey
1a2632d0a2
Remove whitening in SelfAttention module.
2022-11-22 20:01:09 +08:00
Daniel Povey
99cd9f5788
Add more layers.
2022-11-22 19:48:42 +08:00
Daniel Povey
19683aa516
Change activation in bottleneck to Tanh.
2022-11-22 17:32:02 +08:00
Daniel Povey
8dfeaa5f92
Restore whitener that was in the AttentionSqueeze module.
2022-11-22 15:45:53 +08:00
Daniel Povey
7acdaea085
Change balancer to whitener for ff module; tighter min/max-pos limit on NonlinAttentionModule; whitener->balancer for AttentionSqueeze.
2022-11-22 15:42:41 +08:00
Daniel Povey
26916f41e7
Add balancer at output of FeedforwardModule
2022-11-22 14:43:46 +08:00
Daniel Povey
fe1793e288
Add output balancer to NonlinAttentionModule.
2022-11-22 14:29:07 +08:00
Daniel Povey
71f118e725
Use 2 groups in whitening for NonlinAttentionModule; limit 40->20.
2022-11-21 23:23:41 +08:00
Daniel Povey
b3b5e8b9b9
Increase Whiten limit from 10.0 to 40.0.
2022-11-21 22:19:45 +08:00
Daniel Povey
56efdcda49
Reduce whitening limit to 10 and move it to the beginning.
2022-11-21 21:07:32 +08:00
Daniel Povey
584f5bf88c
Also add balancer in NonlinAttentionModule
2022-11-21 18:25:24 +08:00
Daniel Povey
0504f705ec
Add Whiten module in NonlinAttentionModule
2022-11-21 18:19:52 +08:00
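(For context on the Whiten module these commits keep tuning: one common way to quantify "whiteness" is the ratio between the mean squared eigenvalue and the squared mean eigenvalue of the feature covariance, which equals 1 for a white, isotropic covariance and grows as it becomes more anisotropic; limits like the 2.0, 3.0, 10.0 or 40.0 values above would then cap such a ratio. The sketch below is an assumed formulation for illustration, not necessarily the branch's exact code.)

    import torch

    # Illustrative whitening metric: >= 1 always, == 1 exactly when all
    # eigenvalues of the covariance are equal (i.e. the features are "white"),
    # and larger as the covariance becomes more anisotropic.
    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = x.t() @ x / x.shape[0]                     # (C, C) covariance
        # For a symmetric matrix, the Frobenius norm squared equals the sum of
        # squared eigenvalues, so this is the mean squared eigenvalue:
        eig_mean_sq = (cov * cov).sum() / cov.shape[0]
        # The mean eigenvalue is trace(cov) / C:
        mean_eig_sq = torch.diagonal(cov).mean() ** 2
        return eig_mean_sq / mean_eig_sq.clamp(min=1e-20)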