Daniel Povey
cd4730b657
Try to refactor the code for scheduling
2022-11-14 12:50:24 +08:00

Daniel Povey
e4a3b2da7d
Mostly-cosmetic fixes found via mypy
2022-11-09 17:40:09 +08:00

Daniel Povey
e08f5c1bce
Replace Pooling module with ModifiedSEModule
2022-11-01 14:38:06 +08:00

Daniel Povey
a067fe8026
Fix clamping of epsilon
2022-10-28 12:50:14 +08:00

Daniel Povey
7b8a0108ea
Merge branch 'scaled_adam_exp188' into scaled_adam_exp198b
2022-10-28 12:49:36 +08:00

Daniel Povey
b9f6ba1aa2
Remove some unused variables.
2022-10-28 12:01:45 +08:00

Daniel Povey
bf37c7ca85
Regularize how we apply the min and max to the eps of BasicNorm
2022-10-26 12:51:20 +08:00

Daniel Povey
78f3cba58c
Add logging about memory used.
2022-10-25 19:19:33 +08:00

Daniel Povey
6a6df19bde
Hopefully make penalize_abs_values_gt more memory efficient.
2022-10-25 18:41:33 +08:00

Daniel Povey
dbfbd8016b
Cast to float16 in DoubleSwish forward
2022-10-25 13:16:00 +08:00

Daniel Povey
36cb279318
More memory efficient backprop for DoubleSwish.
2022-10-25 12:21:22 +08:00

Daniel Povey
95aaa4a8d2
Store only half precision output for softmax.
2022-10-23 21:24:46 +08:00

Daniel Povey
d3876e32c4
Make it use float16 if in amp but use clamp to avoid wrapping error
2022-10-23 21:13:23 +08:00

Daniel Povey
85657946bb
Try a more exact way to round to uint8 that should prevent ever wrapping around to zero
2022-10-23 20:56:26 +08:00

Daniel Povey
d6aa386552
Fix randn to rand
2022-10-23 17:19:19 +08:00

Daniel Povey
e586cc319c
Change the discretization of the sigmoid to be expectation preserving.
2022-10-23 17:11:35 +08:00

Daniel Povey
09cbc9fdab
Save some memory in the autograd of DoubleSwish.
2022-10-23 16:59:43 +08:00

Daniel Povey
b7083e7aff
Increase default max_factor for ActivationBalancer from 0.02 to 0.04; decrease max_abs in ConvolutionModule.deriv_balancer2 from 100.0 to 20.0
2022-10-23 00:09:21 +08:00

Daniel Povey
e0c1dc66da
Increase probs of activation balancer and make it decay slower.
2022-10-22 22:18:38 +08:00

Daniel Povey
84580ec022
Configuration changes: scores limit 5->10, min_prob 0.05->0.1, cur_grad_scale more aggressive increase
2022-10-22 14:09:53 +08:00

Daniel Povey
9672dffac2
Merge branch 'scaled_adam_exp168' into scaled_adam_exp169
2022-10-22 14:05:07 +08:00

Daniel Povey
bdbd2cfce6
Penalize too large weights in softmax of AttentionDownsample()
2022-10-21 20:12:36 +08:00

Daniel Povey
476fb9e9f3
Reduce min_prob of ActivationBalancer from 0.1 to 0.05.
2022-10-21 15:42:04 +08:00

Daniel Povey
6e6209419c
Merge branch 'scaled_adam_exp150' into scaled_adam_exp155
# Conflicts:
# egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py
2022-10-20 15:04:27 +08:00

Daniel Povey
4565d43d5c
Add hard limit of attention weights to +- 50
2022-10-20 14:28:22 +08:00

Daniel Povey
6601035db1
Reduce min_abs from 1.0e-04 to 5.0e-06
2022-10-20 13:53:10 +08:00

Daniel Povey
5a0914fdcf
Merge branch 'scaled_adam_exp149' into scaled_adam_exp150
2022-10-20 13:31:22 +08:00

Daniel Povey
679ba2ee5e
Remove debug print
2022-10-20 13:30:55 +08:00

Daniel Povey
610281eaa2
Keep just the RandomGrad changes, vs. 149. Git history may not reflect real changes.
2022-10-20 13:28:50 +08:00

Daniel Povey
d137118484
Get the randomized backprop for softmax in autocast mode working.
2022-10-20 13:23:48 +08:00

Daniel Povey
d75d646dc4
Merge branch 'scaled_adam_exp147' into scaled_adam_exp149
2022-10-20 12:59:50 +08:00

Daniel Povey
f6b8f0f631
Fix bug in backprop of random_clamp()
2022-10-20 12:49:29 +08:00

Daniel Povey
f08a869769
Merge branch 'scaled_adam_exp151' into scaled_adam_exp150
2022-10-19 19:59:07 +08:00

Daniel Povey
cc15552510
Use full precision to do softmax and store ans.
2022-10-19 19:53:53 +08:00

Daniel Povey
a4443efa95
Add RandomGrad with min_abs=1.0e-04
2022-10-19 19:46:17 +08:00

Daniel Povey
0ad4462632
Reduce min_abs from 1e-03 to 1e-04
2022-10-19 19:27:28 +08:00

Daniel Povey
ef5a27388f
Merge branch 'scaled_adam_exp146' into scaled_adam_exp149
2022-10-19 19:16:27 +08:00

Daniel Povey
9c54906e63
Implement randomized backprop for softmax.
2022-10-19 19:16:03 +08:00

Daniel Povey
f4442de1c4
Add reflect=0.1 to invocations of random_clamp()
2022-10-19 12:34:26 +08:00

Daniel Povey
c3c655d0bd
Random clip attention scores to -5..5.
2022-10-19 11:59:24 +08:00

Daniel Povey
6b3f9e5036
Changes to avoid bug in backward hooks, affecting diagnostics.
2022-10-19 11:06:17 +08:00

Daniel Povey
1135669e93
Bug fix RE float16
2022-10-16 10:58:22 +08:00

Daniel Povey
fc728f2738
Reorganize Whiten() code; configs are not the same as before. Also remove MaxEig for self_attn module
2022-10-15 23:20:18 +08:00

Daniel Povey
96023419da
Reworking of ActivationBalancer code to hopefully balance speed and effectiveness.
2022-10-14 19:20:32 +08:00

Daniel Povey
5f375be159
Merge branch 'scaled_adam_exp103b2' into scaled_adam_exp103b4
2022-10-14 15:27:10 +08:00

Daniel Povey
15b91c12d6
Reduce stats period from 10 to 4.
2022-10-14 15:14:06 +08:00

Daniel Povey
db8b9919da
Reduce beta from 0.75 to 0.0.
2022-10-14 15:12:59 +08:00

Daniel Povey
23d6bf7765
Fix bug when channel_dim < 0
2022-10-13 13:52:28 +08:00

Daniel Povey
49c6b6943d
Change scale_factor_scale from 0.5 to 0.8
2022-10-12 20:55:52 +08:00

Daniel Povey
b736bb4840
Cosmetic improvements
2022-10-12 19:34:48 +08:00