Daniel Povey
9ceb41acb4
Remove balancer from SelfAttention module.
2022-11-23 18:41:36 +08:00
Daniel Povey
f2dbf87461
Remove invocation of out_balancer
2022-11-23 18:40:27 +08:00
Daniel Povey
b88f12fe83
Remove out_balancer of NonlinAttentionModule
2022-11-23 18:37:45 +08:00
Daniel Povey
9138695dfe
Fix bug RE attn_weights
2022-11-23 17:04:17 +08:00
Daniel Povey
36e49a8d61
Change for mem efficiency
2022-11-23 15:38:34 +08:00
Daniel Povey
1d0252d420
Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of these
may already be counted in the last merges. Numbers XXX below correspond to
branches numbered scaled_adam_expXXX:
- from 412/413 (cherry-picked): dropout for attention in attention_squeeze and nonlin_attention modules,
  but simplified a little to use the same dropout schedule and drop them out all together;
  also have all 3 submodules use separate heads.
- from 460->461, which is in the history of 464, revert the part about balancing the output of the attention_squeeze module.
- merge 462->467, about using TanSwish instead of tanh.
- merge 462->465, remove whitening in self-attention module
- merge the part of 465->466 that was about diagnostics (name in Whiten module)
2022-11-23 14:41:09 +08:00
Daniel Povey
f89a85aed8
Merge branch 'scaled_adam_exp465' into scaled_adam_exp472
2022-11-23 14:16:17 +08:00
Daniel Povey
edd4bf5312
Merge branch 'scaled_adam_exp467' into scaled_adam_exp472
2022-11-23 14:13:19 +08:00
Daniel Povey
d95571eacf
From 460->461, revert change about balancing output of attention_squeeze module.
2022-11-23 14:12:08 +08:00
Daniel Povey
fe51eea397
Implement a form of dropout for squeeze_weights, dropout-to-constant.
2022-11-23 14:06:17 +08:00
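A minimal sketch of what "dropout-to-constant" for the squeeze weights could look like:
with some probability, the learned weights over the sequence are replaced by a constant
uniform distribution instead of being zeroed out. The helper name, the per-batch-element
granularity and the uniform constant are assumptions for illustration, not the branch's
actual code.

    import torch

    def dropout_to_constant(weights: torch.Tensor, p: float, training: bool) -> torch.Tensor:
        # weights: attention/squeeze weights assumed to sum to 1 over the last dim.
        # With probability p (per batch element), replace them with a constant
        # uniform distribution rather than zeroing them.  Hypothetical helper.
        if not training or p == 0.0:
            return weights
        keep = (torch.rand(weights.shape[:-1], device=weights.device) >= p)
        keep = keep.unsqueeze(-1).to(weights.dtype)
        constant = torch.full_like(weights, 1.0 / weights.shape[-1])
        return keep * weights + (1.0 - keep) * constant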
Daniel Povey
066f1e4658
Implement TanSwish(), use it as activation in AttentionSqueeze module.
2022-11-22 23:34:11 +08:00
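For reference, a sketch of a TanSwish-style activation, assuming the name means
tanh(x) * sigmoid(x); the actual definition in the branch's scaling.py may differ.

    import torch

    class TanSwish(torch.nn.Module):
        # Assumed form: tanh(x) * sigmoid(x) -- bounded like tanh, gated like Swish.
        # This is a guess at the formula, not the branch's exact definition.
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x.tanh() * x.sigmoid()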
Daniel Povey
1826648dde
Fix formulas and constants
2022-11-22 22:54:05 +08:00
Daniel Povey
6c5763fbb3
Implement subtracted momentum [0.33,0.66], and print name in Whiten module.
2022-11-22 21:57:48 +08:00
Daniel Povey
1a2632d0a2
Remove whitening in SelfAttention module.
2022-11-22 20:01:09 +08:00
Daniel Povey
99cd9f5788
Add more layers.
2022-11-22 19:48:42 +08:00
Daniel Povey
19683aa516
Change activation in bottleneck to Tanh.
2022-11-22 17:32:02 +08:00
Daniel Povey
8dfeaa5f92
Restore whitener that was in the AttentionSqueeze module.
2022-11-22 15:45:53 +08:00
Daniel Povey
7acdaea085
Change balancer to whitener for ff module; tighter min/max-pos limit on NonlinAttentionModule; whitener->balancer for AttentionSqueeze.
2022-11-22 15:42:41 +08:00
Daniel Povey
26916f41e7
Add balancer at output of FeedforwardModule
2022-11-22 14:43:46 +08:00
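As a rough illustration of what such a balancer does (a toy sketch of the idea, not the
ActivationBalancer code in scaling.py): the forward pass is the identity, and the backward
pass adds a small extra gradient that pushes per-channel statistics back into a target range.

    import torch

    class _BalanceGrad(torch.autograd.Function):
        # Identity in forward; in backward, adds a small gradient term that pushes
        # channels whose mean activation is outside [min_mean, max_mean] back toward
        # that range.  A toy version of the idea only.
        @staticmethod
        def forward(ctx, x, min_mean, max_mean, scale):
            ctx.save_for_backward(x)
            ctx.min_mean, ctx.max_mean, ctx.scale = min_mean, max_mean, scale
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            mean = x.mean(dim=tuple(range(x.dim() - 1)), keepdim=True)  # per-channel mean
            excess = (mean - ctx.max_mean).clamp(min=0.0) + (mean - ctx.min_mean).clamp(max=0.0)
            extra_grad = ctx.scale * excess.sign() * grad_output.abs().mean()
            return grad_output + extra_grad, None, None, None

    class SimpleBalancer(torch.nn.Module):
        def __init__(self, min_mean: float = -0.5, max_mean: float = 0.5, scale: float = 0.02):
            super().__init__()
            self.min_mean, self.max_mean, self.scale = min_mean, max_mean, scale

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if not self.training:
                return x
            return _BalanceGrad.apply(x, self.min_mean, self.max_mean, self.scale)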
Daniel Povey
fe1793e288
Add output balancer to NonlinAttentionModule.
2022-11-22 14:29:07 +08:00
Daniel Povey
71f118e725
Use 2 groups in whitening for NonlinAttentionModule; limit 40->20.
2022-11-21 23:23:41 +08:00
Daniel Povey
b3b5e8b9b9
Increase Whiten limit from 10.0 to 40.0.
2022-11-21 22:19:45 +08:00
Daniel Povey
56efdcda49
Reduce whitening limit to 10 and move it to the beginning.
2022-11-21 21:07:32 +08:00
Daniel Povey
584f5bf88c
Also add balancer in NonlinAttentionModule
2022-11-21 18:25:24 +08:00
Daniel Povey
0504f705ec
Add Whiten module in NonlinAttentionModule
2022-11-21 18:19:52 +08:00
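Roughly, a whitening constraint discourages the covariance of the activations from being
dominated by a few directions. Below is a generic sketch of such a metric (the grouping and
the exact formula/interface of the Whiten class are assumptions): it equals 1.0 for perfectly
"white" features and grows with the eigenvalue spread of the covariance, and training would
penalize it when it exceeds the configured limit.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (..., num_channels).  Returns mean-of-squared-eigenvalues divided by
        # squared-mean-eigenvalue of the per-group feature covariance, computed via
        # traces: (tr(C^2)/d) / (tr(C)/d)^2.  Equals 1.0 when C is a multiple of I.
        x = x.reshape(-1, x.shape[-1])                 # (frames, channels)
        x = x.reshape(x.shape[0], num_groups, -1)      # (frames, groups, chans/group)
        x = x.transpose(0, 1)                          # (groups, frames, chans/group)
        cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]
        d = cov.shape[-1]
        tr_c = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
        tr_c2 = (cov * cov).sum(dim=(-2, -1))          # == tr(C^2) since C is symmetric
        return ((tr_c2 / d) / ((tr_c / d) ** 2 + 1e-20)).mean()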
Daniel Povey
211e3af680
Remove changes in previous merge commit that did not relate to length_factor.
2022-11-21 14:32:05 +08:00
Daniel Povey
a6770657c8
Merge branch 'scaled_adam_exp445' into scaled_adam_exp450
2022-11-21 14:29:50 +08:00
Daniel Povey
836c72dd36
Changes and bug-fixes RE balancers; restore activation in AttentionSqueeze, remove in NonlinAttention.
2022-11-21 14:29:36 +08:00
Daniel Povey
9fe6add587
Fix to diagnostics.py (fix for max being doubled), from scaled_adam_exp446; small cosmetic fixes.
2022-11-21 14:00:55 +08:00
Daniel Povey
a10a0bce7d
Increase length_factor from 1.5 to 3.0.
2022-11-20 16:36:18 +08:00
Daniel Povey
cdfbbdded2
Refactoring, and change length_factor from 2.0 to 1.5.
2022-11-20 16:34:51 +08:00
Daniel Povey
a52ec3da28
Change feedforward dims: increase 1536->1792 for largest ff dim and move it one step later.
2022-11-20 14:24:41 +08:00
Daniel Povey
31b2a735b8
Move feedforward1 to the beginning, separating it from small_conv_module.
2022-11-20 13:17:39 +08:00
Daniel Povey
40c883343a
Merge branch 'scaled_adam_exp439' into scaled_adam_exp440
2022-11-20 13:08:00 +08:00
Daniel Povey
cf16c96edd
Merge branch 'scaled_adam_exp433' into scaled_adam_exp440
2022-11-20 13:07:35 +08:00
Daniel Povey
8b3303594c
Revert 419->420 change, regarding random shift in pos embedding
2022-11-20 13:07:20 +08:00
Daniel Povey
4e21db07f6
Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers.
2022-11-19 22:05:10 +08:00
Daniel Povey
d23fda7c5f
Multiply length_factor by 2.0.
2022-11-19 13:36:16 +08:00
Daniel Povey
b9871cc4f5
Merge branch 'scaled_adam_exp420' into scaled_adam_exp421
2022-11-18 14:54:36 +08:00
Daniel Povey
0601dd72fd
Bug-fix RE random shift
2022-11-18 14:53:03 +08:00
Daniel Povey
8a095c1cd1
Add SmallConvModule; decrease feedforward dims to keep about same num params.
2022-11-18 12:46:40 +08:00
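A guess at what a "small" convolution module might look like: a pointwise projection plus a
short depthwise convolution over time. The class name comes from the commit; the kernel size,
activation and structure are assumptions for illustration.

    import torch

    class SmallConvModule(torch.nn.Module):
        # Pointwise conv -> activation -> short depthwise conv over the time axis.
        # Structure and kernel size are assumptions, not the branch's actual code.
        def __init__(self, channels: int, kernel_size: int = 3):
            super().__init__()
            self.pointwise = torch.nn.Conv1d(channels, channels, kernel_size=1)
            self.activation = torch.nn.ReLU()
            self.depthwise = torch.nn.Conv1d(channels, channels, kernel_size,
                                             padding=kernel_size // 2, groups=channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, channels)
            x = x.transpose(1, 2)
            x = self.depthwise(self.activation(self.pointwise(x)))
            return x.transpose(1, 2)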
Daniel Povey
f7c99ed1d1
Introduce random shift with stddev=1.0 into pos_emb
2022-11-18 12:06:12 +08:00
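A sketch of the idea (whether the shift is shared across the sequence or drawn per position,
and where exactly it enters the pos_emb computation, are assumptions here): during training,
the position indices get a random Gaussian offset with stddev 1.0 before the positional
embedding is computed.

    import torch

    def shifted_positions(seq_len: int, stddev: float = 1.0,
                          training: bool = True) -> torch.Tensor:
        # Position indices 0..seq_len-1, with one shared random Gaussian offset
        # (stddev ~1.0) added during training.  Hypothetical helper for illustration.
        pos = torch.arange(seq_len, dtype=torch.float32)
        if training and stddev > 0.0:
            pos = pos + stddev * torch.randn(())
        return pos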
Daniel Povey
e9806950f5
Reduce pos-dim from 96 to 48.
2022-11-17 23:42:39 +08:00
Daniel Povey
8b50932d5a
Merge branch 'scaled_adam_exp416' into scaled_adam_exp418
2022-11-17 18:34:07 +08:00
Daniel Povey
e73ced1607
Bug fix in formula for pos embedding
2022-11-17 16:02:57 +08:00
Daniel Povey
48f32971f3
Reduce final pos_emb_skip rate from 0.075 to 0.0, and add dropout=0.15 for pos embedding module
2022-11-17 14:33:54 +08:00
Daniel Povey
27f8497fea
Reduce pos_dim from 128 to 96.
2022-11-17 10:39:36 +08:00
Daniel Povey
526b5e59a6
Increase pos-head-dim from 2 to 4.
2022-11-16 11:53:55 +08:00
Daniel Povey
fc74ff63fb
Remove one feedforward module and give params to the other 2.
2022-11-16 11:46:05 +08:00
Daniel Povey
3d47335ab6
Double the duration of layer skipping warmup, from 2k to 4k.
2022-11-16 11:41:48 +08:00
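For context, "layer skipping warmup" refers to a schedule where whole layers are bypassed
with some probability early in training, with that probability decaying over the first N
batches. The 4k-batch duration comes from the commit; the linear shape and the probability
endpoints below are assumptions.

    def layer_skip_prob(batch_idx: int, warmup_batches: int = 4000,
                        initial_prob: float = 0.5, final_prob: float = 0.025) -> float:
        # Probability of skipping (bypassing) a layer at this batch index: decays
        # linearly from initial_prob to final_prob over warmup_batches, then stays
        # at final_prob.  Endpoint values are illustrative assumptions.
        if batch_idx >= warmup_batches:
            return final_prob
        frac = batch_idx / warmup_batches
        return initial_prob + frac * (final_prob - initial_prob)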