Daniel Povey
6a91f343e9
Use LinearWithAuxLoss in squeeze-attention module
2022-11-25 16:04:51 +08:00
Daniel Povey
ee61ec63b3
Introduce schedules for whitening.
2022-11-23 19:49:34 +08:00
Daniel Povey
1d0252d420
Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of
these may already be counted. Numbers XXX below correspond to branches
numbered scaled_adam_expXXX.
- from 412/413 (cherry-picked): dropout for attention in the attention_squeeze and nonlin_attention modules,
but simplified this a little to use the same dropout schedule and drop them all out together;
also have all 3 submodules use separate heads.
- from 460->461, which is in the history of 464: revert the part about balancing the output of the attention_squeeze module.
- merge from 462->467, about using TanSwish not tanh.
- merge 462->465, remove whitening in self-attention module
- merge the part of 465->466 that was about diagnostics (name in Whiten module)
2022-11-23 14:41:09 +08:00
Daniel Povey
066f1e4658
Implement TanSwish(), use it as activation in AttentionSqueeze module.
2022-11-22 23:34:11 +08:00
Daniel Povey
6c5763fbb3
Implement subtracted momentum [0.33,0.66], and print name in Whiten module.
2022-11-22 21:57:48 +08:00
Daniel Povey
4e21db07f6
Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers.
2022-11-19 22:05:10 +08:00
Daniel Povey
6ea1706e11
Fix potential/theoretical issue in backward of LimitParamValue
2022-11-14 23:31:00 +08:00
Daniel Povey
ff6431ed0f
Implement limits on parameter values a different way.
2022-11-14 16:02:38 +08:00
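The log does not show what the "different way" is; one common approach to keeping a parameter inside [min, max] without clamping the value itself is to leave the forward pass untouched and, in backward, zero out gradient components that would push the parameter further outside the range. Below is a minimal sketch under that assumption; only the LimitParamValue name comes from the commit messages, everything else is illustrative.

    import torch


    class LimitParamValue(torch.autograd.Function):
        """Identity in forward; in backward, drop gradient components that
        would push a parameter further outside [min_val, max_val].
        Sketch only; the actual implementation may differ."""

        @staticmethod
        def forward(ctx, x: torch.Tensor, min_val: float, max_val: float) -> torch.Tensor:
            ctx.save_for_backward(x)
            ctx.min_val, ctx.max_val = min_val, max_val
            return x.view_as(x)  # pass the value through unchanged

        @staticmethod
        def backward(ctx, grad: torch.Tensor):
            (x,) = ctx.saved_tensors
            # Gradient descent moves x opposite to grad, so a positive grad
            # decreases x: block it when x is already at or below min_val,
            # and block negative grads when x is at or above max_val.
            too_low = (x <= ctx.min_val) & (grad > 0)
            too_high = (x >= ctx.max_val) & (grad < 0)
            return grad.masked_fill(too_low | too_high, 0.0), None, None


    def limit_param_value(x: torch.Tensor, min_val: float, max_val: float) -> torch.Tensor:
        return LimitParamValue.apply(x, min_val, max_val)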
Daniel Povey
54048009db
Fix self.training condition
2022-11-14 15:15:24 +08:00
Daniel Povey
e1fb25262a
Refactor the scheduling code a little
2022-11-14 14:52:27 +08:00
Daniel Povey
b32dec1119
Add printing capability
2022-11-14 14:16:28 +08:00
Daniel Povey
4c8575878a
Bug fix in ScheduledSampler
2022-11-14 13:52:14 +08:00
Daniel Povey
cd4730b657
Try to refactor the code for scheduling
2022-11-14 12:50:24 +08:00
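The scheduling commits above (ScheduledSampler, the refactoring, the printing capability) are about hyperparameters that change as training progresses. A minimal sketch of the general pattern, a value interpolated piecewise-linearly against the batch count; the class name and arguments here are illustrative, not the actual API.

    class PiecewiseLinearSchedule:
        """Return a float that varies with the training batch count,
        interpolating linearly between (batch_count, value) breakpoints.
        Illustrative sketch only."""

        def __init__(self, *points):
            # points: e.g. (0, 0.2), (4000, 0.2), (8000, 0.0)
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]


    # e.g. a dropout rate held at 0.2 and then decayed to 0.0 by batch 8000:
    dropout_schedule = PiecewiseLinearSchedule((0, 0.2), (4000, 0.2), (8000, 0.0))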
Daniel Povey
e4a3b2da7d
Mostly-cosmetic fixes found via mypy
2022-11-09 17:40:09 +08:00
Daniel Povey
e08f5c1bce
Replace Pooling module with ModifiedSEModule
2022-11-01 14:38:06 +08:00
Daniel Povey
a067fe8026
Fix clamping of epsilon
2022-10-28 12:50:14 +08:00
Daniel Povey
7b8a0108ea
Merge branch 'scaled_adam_exp188' into scaled_adam_exp198b
2022-10-28 12:49:36 +08:00
Daniel Povey
b9f6ba1aa2
Remove some unused variables.
2022-10-28 12:01:45 +08:00
Daniel Povey
bf37c7ca85
Regularize how we apply the min and max to the eps of BasicNorm
2022-10-26 12:51:20 +08:00
Daniel Povey
78f3cba58c
Add logging about memory used.
2022-10-25 19:19:33 +08:00
Daniel Povey
6a6df19bde
Hopefully make penalize_abs_values_gt more memory efficient.
2022-10-25 18:41:33 +08:00
Daniel Povey
dbfbd8016b
Cast to float16 in DoubleSwish forward
2022-10-25 13:16:00 +08:00
Daniel Povey
36cb279318
More memory efficient backprop for DoubleSwish.
2022-10-25 12:21:22 +08:00
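DoubleSwish in these recipes is x * sigmoid(x - 1). One way to make its backward cheaper on memory is to avoid keeping the float input around and instead store the derivative quantized to uint8; the sketch below assumes that approach, and the quantization bounds are illustrative rather than the values used in the real code.

    import torch


    class DoubleSwishFunction(torch.autograd.Function):
        """DoubleSwish(x) = x * sigmoid(x - 1), with a memory-saving backward
        that stores the derivative as uint8 instead of keeping x in float32.
        Sketch only; bounds below are assumed."""

        # Assumed bounds on d/dx [x * sigmoid(x - 1)] used for quantization.
        D_MIN, D_MAX = -0.05, 1.25

        @staticmethod
        def forward(ctx, x: torch.Tensor) -> torch.Tensor:
            s = torch.sigmoid(x - 1.0)
            y = x * s
            deriv = s * (1.0 + x * (1.0 - s))  # d/dx of x * sigmoid(x - 1)
            scale = 255.0 / (DoubleSwishFunction.D_MAX - DoubleSwishFunction.D_MIN)
            # Round-to-nearest here; later commits make this rounding random
            # and expectation-preserving.
            d_uint8 = ((deriv - DoubleSwishFunction.D_MIN) * scale).round().clamp(0, 255).to(torch.uint8)
            ctx.save_for_backward(d_uint8)
            return y

        @staticmethod
        def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
            (d_uint8,) = ctx.saved_tensors
            scale = (DoubleSwishFunction.D_MAX - DoubleSwishFunction.D_MIN) / 255.0
            deriv = d_uint8.to(grad_out.dtype) * scale + DoubleSwishFunction.D_MIN
            return grad_out * deriv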
Daniel Povey
95aaa4a8d2
Store only half precision output for softmax.
2022-10-23 21:24:46 +08:00
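The backward of softmax needs only its output y, since grad_x = y * (grad_y - sum(grad_y * y)), so it is enough to save y, and saving it in float16 halves the activation memory. A sketch of that idea; the class name and exact handling are assumptions.

    import torch


    class SoftmaxFunction(torch.autograd.Function):
        """Softmax whose backward uses only the output, saved in float16 to
        reduce memory.  Sketch based on the commit message; details assumed."""

        @staticmethod
        def forward(ctx, x: torch.Tensor, dim: int) -> torch.Tensor:
            y = x.softmax(dim=dim)
            ctx.dim = dim
            ctx.save_for_backward(y.to(torch.float16))  # half-precision copy
            return y

        @staticmethod
        def backward(ctx, grad_out: torch.Tensor):
            (y,) = ctx.saved_tensors
            y = y.to(grad_out.dtype)
            # d/dx softmax: y * (grad_out - sum(grad_out * y) along dim)
            grad_x = y * (grad_out - (grad_out * y).sum(dim=ctx.dim, keepdim=True))
            return grad_x, None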
Daniel Povey
d3876e32c4
Make it use float16 if in amp, but use clamp to avoid the wrapping error
2022-10-23 21:13:23 +08:00
Daniel Povey
85657946bb
Try a more exact way to round to uint8 that should prevent ever wrapping around to zero
2022-10-23 20:56:26 +08:00
Daniel Povey
d6aa386552
Fix randn to rand
2022-10-23 17:19:19 +08:00
Daniel Povey
e586cc319c
Change the discretization of the sigmoid to be expectation preserving.
2022-10-23 17:11:35 +08:00
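When a value in [0, 1] is discretized to uint8 (as in the uint8 derivative storage sketched earlier), plain rounding introduces a systematic bias; adding uniform noise in [0, 1) before flooring makes the expected dequantized value equal the input, which is also why the "Fix randn to rand" commit above matters (uniform noise, not Gaussian). A small sketch of the idea:

    import torch


    def quantize_expectation_preserving(d: torch.Tensor) -> torch.Tensor:
        """Quantize values in [0, 1] to uint8 so the *expected* dequantized
        value equals the input: add uniform noise in [0, 1) before flooring.
        Sketch for illustration."""
        d_scaled = d * 255.0 + torch.rand_like(d)  # uniform, not Gaussian, noise
        return d_scaled.floor().clamp(max=255.0).to(torch.uint8)


    def dequantize(d_uint8: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
        return d_uint8.to(dtype) / 255.0

For any v in [0, 255], E[floor(v + U)] with U uniform on [0, 1) is exactly v, so the quantization is unbiased even though each individual value is rounded up or down at random.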
Daniel Povey
09cbc9fdab
Save some memory in the autograd of DoubleSwish.
2022-10-23 16:59:43 +08:00
Daniel Povey
b7083e7aff
Increase default max_factor for ActivationBalancer from 0.02 to 0.04; decrease max_abs in ConvolutionModule.deriv_balancer2 from 100.0 to 20.0
2022-10-23 00:09:21 +08:00
Daniel Povey
e0c1dc66da
Increase probs of activation balancer and make it decay slower.
2022-10-22 22:18:38 +08:00
Daniel Povey
84580ec022
Configuration changes: scores limit 5->10, min_prob 0.05->0.1, more aggressive increase of cur_grad_scale
2022-10-22 14:09:53 +08:00
Daniel Povey
9672dffac2
Merge branch 'scaled_adam_exp168' into scaled_adam_exp169
2022-10-22 14:05:07 +08:00
Daniel Povey
bdbd2cfce6
Penalize too large weights in softmax of AttentionDownsample()
2022-10-21 20:12:36 +08:00
Daniel Povey
476fb9e9f3
Reduce min_prob of ActivationBalancer from 0.1 to 0.05.
2022-10-21 15:42:04 +08:00
Daniel Povey
6e6209419c
Merge branch 'scaled_adam_exp150' into scaled_adam_exp155
# Conflicts:
# egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py
2022-10-20 15:04:27 +08:00
Daniel Povey
4565d43d5c
Add hard limit of attention weights to +- 50
2022-10-20 14:28:22 +08:00
Daniel Povey
6601035db1
Reduce min_abs from 1.0e-04 to 5.0e-06
2022-10-20 13:53:10 +08:00
Daniel Povey
5a0914fdcf
Merge branch 'scaled_adam_exp149' into scaled_adam_exp150
2022-10-20 13:31:22 +08:00
Daniel Povey
679ba2ee5e
Remove debug print
2022-10-20 13:30:55 +08:00
Daniel Povey
610281eaa2
Keep just the RandomGrad changes, vs. 149. Git history may not reflect real changes.
2022-10-20 13:28:50 +08:00
Daniel Povey
d137118484
Get the randomized backprop for softmax in autocast mode working.
2022-10-20 13:23:48 +08:00
Daniel Povey
d75d646dc4
Merge branch 'scaled_adam_exp147' into scaled_adam_exp149
2022-10-20 12:59:50 +08:00
Daniel Povey
f6b8f0f631
Fix bug in backprop of random_clamp()
2022-10-20 12:49:29 +08:00
Daniel Povey
f08a869769
Merge branch 'scaled_adam_exp151' into scaled_adam_exp150
2022-10-19 19:59:07 +08:00
Daniel Povey
cc15552510
Use full precision to do softmax and store ans.
2022-10-19 19:53:53 +08:00
Daniel Povey
a4443efa95
Add RandomGrad with min_abs=1.0e-04
2022-10-19 19:46:17 +08:00
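The commit gives only the name and min_abs=1.0e-04; the usual motivation for randomizing tiny gradients is that float16 would otherwise flush them to zero. A sketch of one expectation-preserving way to do this (function name and details are assumptions, not the actual RandomGrad code):

    import torch


    def random_round_small_grads(grad: torch.Tensor, min_abs: float = 1.0e-04) -> torch.Tensor:
        """Values with |g| < min_abs are randomly rounded to 0 or +/- min_abs
        with probability |g| / min_abs, so their expectation is preserved and
        they are not simply lost to float16 underflow.  Sketch only."""
        is_small = grad.abs() < min_abs
        # Keep +/- min_abs with probability |g| / min_abs, else 0; E[result] = g.
        keep = torch.rand_like(grad) < (grad.abs() / min_abs)
        rounded = torch.where(keep, min_abs * grad.sign(), torch.zeros_like(grad))
        return torch.where(is_small, rounded, grad)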
Daniel Povey
0ad4462632
Reduce min_abs from 1e-03 to 1e-04
2022-10-19 19:27:28 +08:00
Daniel Povey
ef5a27388f
Merge branch 'scaled_adam_exp146' into scaled_adam_exp149
2022-10-19 19:16:27 +08:00
Daniel Povey
9c54906e63
Implement randomized backprop for softmax.
2022-10-19 19:16:03 +08:00