84 Commits

Author SHA1 Message Date
Daniel Povey  fc74ff63fb  Remove one feedforward module and give params to the other 2.  2022-11-16 11:46:05 +08:00
Daniel Povey  3d47335ab6  Double the duration of layer skipping warmup, from 2k to 4k.  2022-11-16 11:41:48 +08:00
Daniel Povey  22a1401f36  Remove self_attn1 module  2022-11-16 11:37:08 +08:00
Daniel Povey  000af07a2a  Increase final pos_emb_skip rate from 0.05 to 0.075  2022-11-16 11:34:26 +08:00
Daniel Povey  867556200f  Have zero dropout in the position embedding, but dropout the entire thing with twice the final prob.  2022-11-15 11:39:20 +08:00
Daniel Povey  380f773069  Merge branch 'scaled_adam_exp387' into scaled_adam_exp390  2022-11-15 11:35:54 +08:00
Daniel Povey  a1a4b715d9  Introduce a dropout schedule for the pos embedding, in training time. (Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py)  2022-11-15 11:28:50 +08:00
Daniel Povey  d1df919547  Cosmetic improvements  2022-11-14 23:26:33 +08:00
Daniel Povey  46bd93b792  Cosmetic fix  2022-11-14 23:17:20 +08:00
Daniel Povey  a680c7de2e  Make bypass_scale be a tensor.  2022-11-14 19:12:16 +08:00
Daniel Povey  ff6431ed0f  Implement limits on parameter values a different way.  2022-11-14 16:02:38 +08:00
Daniel Povey  ce4b50d094  Revert making the dropout of pos_emb independent across the batch.  2022-11-14 15:34:39 +08:00
Daniel Povey  804917837e  Remove pos_emb scales  2022-11-14 15:32:54 +08:00
Daniel Povey  ba69eb48fe  Remove pos_emb schedule  2022-11-14 15:31:56 +08:00
Daniel Povey  e1fb25262a  Refactor the scheduling code a little  2022-11-14 14:52:27 +08:00
Daniel Povey  614b5b1a52  Treat batch_idx==0.0 separately to get scan_pessimistic_batches_for_oom() to work. Should not affect results.  2022-11-14 13:20:31 +08:00
Daniel Povey  cde4ca27ee  Introduce a dropout schedule for the pos embedding, in training time.  2022-11-14 13:00:30 +08:00
Daniel Povey  cd4730b657  Try to refactor the code for scheduling  2022-11-14 12:50:24 +08:00
Daniel Povey  a256425b2f  Reduce dropout_rate for RelPositionalEncoding from 0.2 to 0.15.  2022-11-13 23:29:07 +08:00
Daniel Povey  463fed3d6a  Use compression of large x in the formula for pos_emb  2022-11-13 13:23:42 +08:00
Daniel Povey  6c16d08b4f  Add bias in interior of SelfAttn module  2022-11-13 11:58:01 +08:00
Daniel Povey  4a5a13b678  Increase dropout rate for PosEmb from 0.1 to 0.2.  2022-11-12 23:26:58 +08:00
Daniel Povey  70408d22fe  Add trainable scales for pos_emb (Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py)  2022-11-12 23:25:17 +08:00
Daniel Povey  e67d4ca40d  Make pos_emb be dropped out independently across batch  2022-11-12 19:21:29 +08:00
Daniel Povey  f7aff4f507  Revert "Make sub-module dropped out independently." (reverts commit 3ff3f440ee6d2a367cc3cc45e40f8eb69d122861)  2022-11-11 21:36:36 +08:00
Daniel Povey  742bcaa340  Comment  2022-11-10 23:26:36 +08:00
Daniel Povey  2c6f5e82b2  Use atan not tanh  2022-11-10 22:36:39 +08:00
Daniel Povey  60274ea731  New formula for pos emb  2022-11-10 22:06:05 +08:00
Daniel Povey  fd26b890d2  Tweak formula for widths  2022-11-10 13:04:56 +08:00
Daniel Povey  6091146e91  Change the formula for the embedding to be a bit more symmetric.  2022-11-10 11:39:37 +08:00
Daniel Povey  082b93d911  Remove unused variable.  2022-11-10 11:18:10 +08:00
Daniel Povey  125ea04a42  Rework positional encoding  2022-11-09 20:48:27 +08:00
Daniel Povey  e4a3b2da7d  Mostly-cosmetic fixes found via mypy  2022-11-09 17:40:09 +08:00
Daniel Povey  308059edba  Cosmetic fixes  2022-11-09 17:14:18 +08:00
Daniel Povey  3ff3f440ee  Make sub-module dropped out independently.  2022-11-09 14:15:56 +08:00
Daniel Povey  cba194aa26  Bug fix RE masking  2022-11-09 13:12:34 +08:00
Daniel Povey  20e6d2a157  Rework zipformer code for clarity and extensibility  2022-11-09 12:56:07 +08:00
Daniel Povey  797a0e6ce7  Change order of convolution and nonlin-attention modules  2022-11-08 20:00:25 +08:00
Daniel Povey  36bff9b369  Fix to comment  2022-11-07 12:33:12 +08:00
Daniel Povey  47f42ef5db  Replace the 1st of the ConvolutionModules with NonlinAttentionModule  2022-11-05 14:19:43 +08:00
Daniel Povey  eb6e2b5a1d  Have 2 squeeze-excite modules per layer, using different attention heads.  2022-11-04 17:40:51 +08:00
Daniel Povey  efbe20694f  Use the attention weights as input for the ModifiedSEModule  2022-11-04 16:01:07 +08:00
Daniel Povey  0d94783e76  Instead of a pooling operation, use the first bottleneck_dim dimensions of the preceding self_attn.forward2 as the input to the squeeze-excite module.  2022-11-04 15:16:59 +08:00
Daniel Povey  c27ee8cfcf  Merge branch 'scaled_adam_exp277' into scaled_adam_exp281  2022-11-04 15:06:23 +08:00
Daniel Povey  67d470766f  Revert bottleneck_dim from 8 to 16  2022-11-04 15:02:56 +08:00
Daniel Povey  cefcd061bd  Merge branch 'scaled_adam_exp271' into scaled_adam_exp274  2022-11-04 14:50:00 +08:00
Daniel Povey  31d9bbfb3c  Merge branch 'scaled_adam_exp268b' into scaled_adam_exp279  2022-11-04 14:42:00 +08:00
Daniel Povey  70300e34d3  Merge branch 'scaled_adam_exp273' into scaled_adam_exp277 (Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py)  2022-11-04 12:41:02 +08:00
Daniel Povey  f625810de1  Use the balancer; remove the unused sigmoid module.  2022-11-03 19:21:37 +08:00
Daniel Povey  a9c384e69e  Add Whiten module after squeeze_proj.  2022-11-03 19:04:34 +08:00
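
Several of the commits above revolve around one idea: the dropout applied to the relative positional embedding is driven by a schedule on the batch index rather than a fixed constant (a1a4b715d9, cde4ca27ee), later in the series the entire embedding is dropped out as one unit rather than element-wise (867556200f, with the per-batch-element variant reverted in ce4b50d094), and the final skip rate is raised from 0.05 to 0.075 (000af07a2a). The sketch below is only a minimal illustration of that scheme, not the actual zipformer.py code; the helper names (piecewise_linear, PosEmbDropout), the 0.5 starting rate, the 4000-batch ramp, and the tensor shapes are assumptions for illustration, with only the 0.075 end value taken from the log.

    import torch
    import torch.nn as nn


    def piecewise_linear(x: float, points) -> float:
        """Interpolate a schedule given as (x, y) breakpoints, clamping at both ends."""
        x0, y0 = points[0]
        if x <= x0:
            return y0
        for x1, y1 in points[1:]:
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return points[-1][1]


    class PosEmbDropout(nn.Module):
        """Skip the *whole* positional embedding with a probability that follows
        a batch-index schedule (hypothetical sketch, not the icefall class)."""

        def __init__(self, schedule):
            super().__init__()
            self.schedule = schedule
            self.batch_count = 0  # the training loop would update this each step

        def forward(self, pos_emb: torch.Tensor) -> torch.Tensor:
            if not self.training:
                return pos_emb
            p = piecewise_linear(float(self.batch_count), self.schedule)
            # One draw per forward pass: either keep the embedding or zero it out
            # entirely, rather than dropping elements independently.
            if torch.rand(()).item() < p:
                return torch.zeros_like(pos_emb)
            return pos_emb


    # Anneal from an assumed 0.5 down to the final 0.075 skip rate over an
    # assumed 4000 batches.
    drop = PosEmbDropout(schedule=[(0, 0.5), (4000, 0.075)])
    drop.train()
    drop.batch_count = 2000
    pos_emb = torch.randn(1, 61, 256)  # (batch, 2*T-1, dim); shape assumed
    print(drop(pos_emb).shape)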