963 Commits

Author | SHA1 | Message | Date
Daniel Povey | 80d51efd15 | Change cutoff for small_grad_norm | 2022-10-14 23:29:55 +08:00
Daniel Povey | 822465f73b | Bug fixes; change debug freq | 2022-10-14 23:25:29 +08:00
Daniel Povey | 0557dbb720 | use larger delta but only penalize if small grad norm | 2022-10-14 23:23:20 +08:00
Daniel Povey | 394d4c95f9 | Remove debug statements | 2022-10-14 23:09:05 +08:00
Daniel Povey | a780984e6b | Penalize attention-weight entropies above a limit. | 2022-10-14 23:01:30 +08:00
Daniel Povey | 1812f6cb28 | Add different debug info. | 2022-10-14 21:16:23 +08:00
Daniel Povey | 90953537ad | Remove debug statement | 2022-10-14 20:59:26 +08:00
Daniel Povey | 18ff1de337 | Add debug code for attention weights and eigs | 2022-10-14 20:57:17 +08:00
Daniel Povey | 1825336841 | Fix issue with diagnostics if stats is None | 2022-10-11 11:05:52 +08:00
Daniel Povey | 569762397f | Reduce final layerdrop_prob from 0.075 to 0.05. | 2022-10-10 19:04:52 +08:00
Daniel Povey | 12323f2fbf | Refactor RelPosMultiheadAttention to have 2nd forward function and introduce more modules in conformer encoder layer | 2022-10-10 15:27:26 +08:00
Daniel Povey | f941991331 | Fix bug in choosing layers to drop | 2022-10-10 13:38:36 +08:00
Daniel Povey | 857b3735e7 | Fix bug where fewer layers were dropped than should be; remove unnecessary print statement. | 2022-10-10 13:18:40 +08:00
Daniel Povey | 09c9b02f6f | Increase final layerdrop prob from 0.05 to 0.075 | 2022-10-10 12:20:13 +08:00
Daniel Povey | 9f059f7115 | Fix s -> scaling for import. | 2022-10-10 11:50:15 +08:00
Daniel Povey | d7f6e8eb51 | Only apply ActivationBalancer with prob 0.25. | 2022-10-10 00:26:31 +08:00
Daniel Povey | dece8ad204 | Various fixes from debugging with nvtx, but removed the NVTX annotations. | 2022-10-09 21:14:52 +08:00
Daniel Povey | bd7dce460b | Reintroduce batching to the optimizer | 2022-10-09 20:29:23 +08:00
Daniel Povey | 00841f0f49 | Remove unused code LearnedScale. | 2022-10-09 16:07:31 +08:00
Daniel Povey | cf450908c6 | Revert also the changes in scaled_adam_exp85 regarding warmup schedule | 2022-10-09 14:26:32 +08:00
Daniel Povey | 40fa33d702 | Decrease initial_layerdrop_prob from 0.75 to 0.5 | 2022-10-09 13:59:56 +08:00
Daniel Povey | 44ad73c44f | For speed, drop the same num layers per job. | 2022-10-09 13:40:24 +08:00
Daniel Povey | f8f200e2b2 | Make layerdrop different in different processes. | 2022-10-09 12:25:12 +08:00
Daniel Povey | 3e137dda5b | Decrease frequency of logging variance_proportion | 2022-10-09 12:05:52 +08:00
Daniel Povey | e6540865f3 | Do warmup by dropping out whole layers. | 2022-10-09 11:50:24 +08:00
Daniel Povey | 5255969544 | Revert "Change warmup schedule and increase warmup_batches from 4k to 6k" | 2022-10-09 11:30:27 +08:00
    This reverts commit 86845bd5d859ceb6f83cd83f3719c3e6641de987.
Daniel Povey | d467338837 | Limit bypass scale to >= 0.1 | 2022-10-08 21:37:21 +08:00
Daniel Povey | bc9fbe2579 | Bug fix | 2022-10-08 21:06:09 +08:00
Daniel Povey | 9023fe7151 | Change the initial keep-prob back from 0.25 to 0.5 | 2022-10-08 20:55:15 +08:00
Daniel Povey | 97a0fbe44b | Make the bypass scale trainable. | 2022-10-08 20:32:49 +08:00
Daniel Povey | 86845bd5d8 | Change warmup schedule and increase warmup_batches from 4k to 6k | 2022-10-08 19:10:26 +08:00
Daniel Povey | 2631f05c1f | Make it start warming up from the very start, and increase warmup_batches to 6k | 2022-10-08 19:09:41 +08:00
Daniel Povey | 5c99e97c3b | Decrease initial keep_prob to 0.25. | 2022-10-08 18:35:59 +08:00
Daniel Povey | b1fa3d50fb | Implement layer dropout (in a relatively efficient way) | 2022-10-08 16:07:20 +08:00
Daniel Povey | af545e061b | Make the warmup mask per frame. | 2022-10-08 15:37:02 +08:00
Daniel Povey | 6dc449da84 | Remove debug print | 2022-10-08 13:10:07 +08:00
Daniel Povey | 71b8bfe212 | Fix bug in warmup | 2022-10-08 13:04:14 +08:00
Daniel Povey | 606d3bd2d3 | Do dropout a different way | 2022-10-08 12:55:11 +08:00
Daniel Povey | fe4a7e904f | Have warmup that gradually removes dropout from layers; multiply initialization scales by 0.1. | 2022-10-08 12:45:22 +08:00
Daniel Povey | 300da1306d | Add warmup schedule where dropout disappears from earlier layers first. | 2022-10-08 12:16:53 +08:00
Daniel Povey | 9c1a239931 | Fix issue with warmup in test time | 2022-10-08 11:01:02 +08:00
Daniel Povey | 97bc894f62 | Implement layer dropout with probability 0.075 | 2022-10-07 19:01:35 +08:00
Daniel Povey | b9a95af099 | Remove the feature where it was bypassing groups of layers. | 2022-10-07 18:50:53 +08:00
Daniel Povey | ff4028df8e | Revert initial_scale to previous values. | 2022-10-07 17:19:23 +08:00
Daniel Povey | 28e5f46854 | Update checkpoint.py to deal with int params | 2022-10-07 17:06:38 +08:00
Daniel Povey | ebf8aa129d | Apply layer bypass during warmup in a new way, including 2s and 4s of layers. | 2022-10-07 16:56:40 +08:00
Daniel Povey | 314f2381e2 | Don't compute validation if printing diagnostics. | 2022-10-07 14:03:17 +08:00
Daniel Povey | bd325e8769 | Remove debug info | 2022-10-06 20:31:15 +08:00
Daniel Povey | a3179c30e7 | Various fixes, finish implementing frame masking | 2022-10-06 20:29:45 +08:00
Daniel Povey | e4c9786e4a | Merge branch 'scaled_adam_exp27' into scaled_adam_exp69 | 2022-10-06 18:04:48 +08:00
    # Conflicts:
    #	egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py