Daniel Povey | 691633b049 | Merge branch 'scaled_adam_exp564' into scaled_adam_exp569 | 2022-12-01 16:25:53 +08:00
Daniel Povey | 2102038e0e | Fix bug in diagnostics.py | 2022-12-01 16:23:50 +08:00
Daniel Povey | d294449221 | Fix typo 0.21->0.2 | 2022-12-01 15:29:46 +08:00
Daniel Povey | f0c46ce564 | Double nonlin_skip_rate and have it last twice longer. | 2022-12-01 15:28:44 +08:00
Daniel Povey | cac1a8b860 | Merge branch 'scaled_adam_exp569' into scaled_adam_exp576 | 2022-12-01 15:20:20 +08:00
Daniel Povey | 4621e924ba | Introduce dropout schedule for NonlinAttentionModule | 2022-12-01 15:19:51 +08:00
Daniel Povey | dcf6fced40 | Change whitening limit of NonlinAttentionModule from _whitening_schedule(7.5) to 5.0 | 2022-12-01 14:28:56 +08:00
Daniel Povey | 025bcc155d | Change scale_min of Conv2dSubsampling from .01 to .1; some cosmetic changes/unimportant bugfixes. | 2022-12-01 14:20:15 +08:00
Daniel Povey | ba31272c92 | Change sigmoid to tanh in NonlinAttentionModule, and adjust abs limits of balancer to compensate. | 2022-11-30 21:44:45 +08:00
Daniel Povey | d682ecc246 | Introduce alpha for DoubleSwish, set it to -0.05. | 2022-11-30 18:58:25 +08:00
Daniel Povey | 2969eb5467 | Fix diagnostics bug | 2022-11-30 16:52:21 +08:00
Daniel Povey | b79a794706 | Fix bug in diagnostics RE gpu | 2022-11-30 16:02:18 +08:00
Daniel Povey | b7cad258bb | Draft of new diagnostics for activations | 2022-11-30 15:57:24 +08:00
Daniel Povey | c75c2dc91d | Reduce min_abs of zipformer-encoder-layer balancer from 0.4 to 0.25. | 2022-11-30 13:40:53 +08:00
Daniel Povey | d100aed58b | Revert "Reduce min of scale in Conv2dSubsampling from 0.01 to 0.2"; this reverts commit 7589e3768975df10c3d022beb4c88f14c2f25d3d. | 2022-11-30 13:17:20 +08:00
Daniel Povey | 12e8c3f0fa | One more layer on input | 2022-11-29 16:47:24 +08:00
Daniel Povey | 640d48262f | Double scale on aux_loss | 2022-11-29 16:20:21 +08:00
Daniel Povey | 7589e37689 | Reduce min of scale in Conv2dSubsampling from 0.01 to 0.2 | 2022-11-29 16:18:41 +08:00
Daniel Povey | 441fcf063d | Reduce final value of bypass_min from 0.25 to 0.2 | 2022-11-29 16:15:34 +08:00
Daniel Povey | 73e420865c | Revert min_abs in NonlinAttentionModule from 2.0 to 1.5 | 2022-11-29 15:53:29 +08:00
Daniel Povey | f48786534a | Merge branch 'scaled_adam_exp535' into scaled_adam_exp548 | 2022-11-29 15:43:44 +08:00
Daniel Povey | 28c5923986 | Remove max_factor=0.02 option in bottleneck balancer of class AttentionSqueeze, change its min,max positive to 0.2,0.8 | 2022-11-29 15:43:25 +08:00
Daniel Povey | 5632782ee1 | Merge branch 'scaled_adam_exp539' into scaled_adam_exp548 | 2022-11-29 15:40:23 +08:00
Daniel Povey | b90d8aabde | Revert the alternate-layers-only thing for nonlin_attention and attention_squeeze | 2022-11-29 15:38:55 +08:00
Daniel Povey | 753269668a | Change ratio in NonlinAttentionModule from 8 to 2 | 2022-11-29 15:38:13 +08:00
Daniel Povey | 93942725c4 | Increase min_abs of balancer of encoder layer from 0.2 to 0.4. | 2022-11-29 13:46:47 +08:00
Daniel Povey | 36a2f33a6f | Have value dim in NonlinAttentionModule be half of num_channels | 2022-11-28 21:55:06 +08:00
Daniel Povey | 258d4f1353 | Let ratio be 8, not 2, for sigmoid in NonlinAttentionModule | 2022-11-28 21:51:29 +08:00
Daniel Povey | 7018c722b5 | Let ratio of values to sigmoids be 8, not 2 | 2022-11-28 21:50:11 +08:00
Daniel Povey | 643c547eec | Double just the value dim in NonlinAttentionLayer. | 2022-11-28 20:56:47 +08:00
Daniel Povey | 88bc45d596 | Halve scale on aux_loss | 2022-11-28 16:37:46 +08:00
Daniel Povey | cee62c823d | Have final prob of aux_loss for input projections be 0 | 2022-11-28 16:36:17 +08:00
Daniel Povey | 9cf5d92f39 | Have nonlin_attention and attention_squeeze operate only on every other layer. | 2022-11-28 16:24:24 +08:00
Daniel Povey | 87ef4078d3 | Add two more layers. | 2022-11-28 13:56:40 +08:00
Daniel Povey | f483f1e0ef | Implement attention weights sharing for successive layers, for Zipformer | 2022-11-28 13:41:11 +08:00
Daniel Povey | 2a289d38b7 | Make max_abs for feedforward module be a constant at 15.0 | 2022-11-28 13:19:37 +08:00
Daniel Povey | 27a12a982b | Increase min_abs and max_abs in feedforward module. | 2022-11-28 12:52:28 +08:00
Daniel Povey | 121f7e2a45 | Documentation fix. | 2022-11-28 12:10:08 +08:00
Daniel Povey | c6d859dd05 | Increase min_abs of balancer in NonlinAttentionModule from 1.5 to 2.0. | 2022-11-28 11:35:00 +08:00
Daniel Povey | 39ce60bb7c | Decrease final value of max_abs in AttentionSqueeze from 5.0 to 1.0 | 2022-11-28 10:45:53 +08:00
Daniel Povey | 0bfd81d721 | Fix bug RE dims_to_mean | 2022-11-28 10:42:06 +08:00
Daniel Povey | 109825cafb | Fix problem with mean offset in LinearWithAuxLoss. | 2022-11-28 09:46:01 +08:00
Daniel Povey | a3b07fd098 | Double aux_grad scale | 2022-11-28 00:19:03 +08:00
Daniel Povey | 9752778ee6 | Use the same schedule for in_proj as out_proj. Only affects a couple of modules. | 2022-11-28 00:09:26 +08:00
Daniel Povey | 9e7add6be8 | Work out alpha (scale on z) in LinearWithAuxLossFunction | 2022-11-27 23:48:26 +08:00
Daniel Povey | 0307252832 | Bug fix | 2022-11-27 21:33:37 +08:00
Daniel Povey | 5128ff8797 | Changes to balancer min_abs/max_abs limits. | 2022-11-27 21:14:41 +08:00
Daniel Povey | a610011c3c | Partially revert sign_gain_factor | 2022-11-27 17:18:33 +08:00
Daniel Povey | 30d0bc6ad7 | Make gain factor 4 times larger, for constraining the sign in ActivationBalancer. | 2022-11-27 17:17:11 +08:00
Daniel Povey | 785a524341 | Increase in_abs of hidden balancer of ff modules from 0.2 to 1.0 | 2022-11-27 17:06:31 +08:00
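Several of the commits above set values according to a schedule over training rather than a fixed constant (the dropout schedule for NonlinAttentionModule, _whitening_schedule(7.5), the "final value" of bypass_min and max_abs, the "final prob" of aux_loss). As a rough sketch of that idea only, not the actual icefall implementation, such a schedule can be expressed as piecewise-linear interpolation over the batch count; the function name, breakpoints, and values below are hypothetical and chosen purely for illustration.

    def piecewise_linear(x, points):
        # Interpolate linearly between (x, y) breakpoints, clamping outside
        # the covered range. `points` must be sorted by increasing x.
        if x <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return points[-1][1]

    # Hypothetical example: a dropout rate that starts at 0.2 and decays
    # linearly to 0.0 over the first 20000 batches, then stays at 0.0.
    dropout_schedule = [(0.0, 0.2), (20000.0, 0.0)]
    print(piecewise_linear(5000.0, dropout_schedule))  # -> 0.15

Reading the schedules this way, a commit such as "Decrease final value of max_abs in AttentionSqueeze from 5.0 to 1.0" changes only the y-value of the last breakpoint, leaving the early-training behaviour unchanged.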