92 Commits

Author SHA1 Message Date
Daniel Povey
0bfd81d721 fix bug RE dims_to_mean 2022-11-28 10:42:06 +08:00
Daniel Povey
109825cafb Fix problem with mean offset in LinearWithAuxLoss. 2022-11-28 09:46:01 +08:00
Daniel Povey
9e7add6be8 Work out alpha (scale on z) in LinearWithAuxLossFunction 2022-11-27 23:48:26 +08:00
Daniel Povey
a610011c3c Partially revert sign_gain_factor 2022-11-27 17:18:33 +08:00
Daniel Povey
30d0bc6ad7 Make gain factor 4 times larger, for constraining the sign in ActivationBalancer. 2022-11-27 17:17:11 +08:00
Daniel Povey
ff361a7495 Change default prob on limit_param_value from 0.2 to 0.6. 2022-11-27 14:00:59 +08:00
Daniel Povey
a96b92fb54 Make alpha for LinearWithAuxLossFunction be in log space; simplify/rework NonlinAttentionModule, setup more like ConvModule now. 2022-11-26 19:38:29 +08:00
Daniel Povey
110c2601ab Changes for speed 2022-11-26 14:38:16 +08:00
Daniel Povey
c653c66413 Undo cast in autocast mode. 2022-11-26 14:29:49 +08:00
Daniel Povey
d1ee1f2d98 Try to save memory in autocast mode. 2022-11-26 14:25:27 +08:00
Daniel Povey
7b5c0382f9 Fix to LinearWithAuxLoss for bias=False case 2022-11-26 14:16:53 +08:00
Daniel Povey
1ebc3dd158 Bug fixes to LinearWithAuxLoss 2022-11-25 16:20:28 +08:00
Daniel Povey
0a997d64c4 Fixes for half precision 2022-11-25 16:07:47 +08:00
Daniel Povey
6a91f343e9 Use LinearWithAuxLoss in squeeze-attention module 2022-11-25 16:04:51 +08:00
Daniel Povey
ee61ec63b3 Introduce schedules for whitening. 2022-11-23 19:49:34 +08:00
Daniel Povey
1d0252d420 Merge branch 'scaled_adam_exp466' into scaled_adam_exp472.
Below is a more complete list of the changes I am making, although some of
these may already be counted in the last merge.

  Numbers XXX below correspond to branches numbered scaled_adam_expXXX.
    - from 412/413 (cherry-picked): dropout for attention in the attention_squeeze and nonlin_attention modules,
      but simplified a little to use the same dropout schedule and drop them out all together;
      also have all 3 submodules use separate heads.
    - from 460->461, which is in the history of 464: revert the part about balancing the output of the attention_squeeze module.
    - merge from 462->467, about using TanSwish instead of tanh.
    - merge 462->465, removing whitening in the self-attention module.
    - merge the part of 465->466 that was about diagnostics (the name in the Whiten module).
2022-11-23 14:41:09 +08:00
Daniel Povey
066f1e4658 Implement TanSwish(), use it as activation in AttentionSqueeze module. 2022-11-22 23:34:11 +08:00
Daniel Povey
6c5763fbb3 Implement subtracted momentum [0.33,0.66], and print name in Whiten module. 2022-11-22 21:57:48 +08:00
Daniel Povey
4e21db07f6 Remove activation in AttentionSqueeze; add balancers; fix bugs RE balancers. 2022-11-19 22:05:10 +08:00
Daniel Povey
6ea1706e11 Fix potential/theoretical issue in backward of LimitParamValue 2022-11-14 23:31:00 +08:00
Daniel Povey
ff6431ed0f Implement limits on parameter values a different way. 2022-11-14 16:02:38 +08:00
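For reference, one way such a limit can be enforced (a sketch of the general technique, not necessarily this commit's exact code; the names below are hypothetical): an autograd function that is the identity in the forward pass and, in the backward pass, drops gradient components that would push an already out-of-range parameter further outside the allowed range. The later "default prob on limit_param_value" commit suggests the real helper is also applied only with some probability per call, which this sketch omits.

    import torch
    from torch import Tensor

    class LimitParamValueSketch(torch.autograd.Function):
        # Identity forward; backward blocks gradients that would move an
        # out-of-range value even further outside [min_val, max_val].
        @staticmethod
        def forward(ctx, x: Tensor, min_val: float, max_val: float) -> Tensor:
            ctx.save_for_backward(x)
            ctx.min_val, ctx.max_val = min_val, max_val
            return x

        @staticmethod
        def backward(ctx, grad: Tensor):
            (x,) = ctx.saved_tensors
            # Under gradient descent (x <- x - lr * grad), a negative grad
            # increases x, so block negative grads where x is already too high,
            # and positive grads where x is already too low.
            block = ((x > ctx.max_val) & (grad < 0)) | ((x < ctx.min_val) & (grad > 0))
            return grad.masked_fill(block, 0.0), None, None

    def limit_param_value_sketch(x: Tensor, min_val: float, max_val: float) -> Tensor:
        return LimitParamValueSketch.apply(x, min_val, max_val)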
Daniel Povey
54048009db Fix self.training condition 2022-11-14 15:15:24 +08:00
Daniel Povey
e1fb25262a Refactorize the scheduling code a little 2022-11-14 14:52:27 +08:00
Daniel Povey
b32dec1119 Add printing capability 2022-11-14 14:16:28 +08:00
Daniel Povey
4c8575878a Bug fix in ScheduledSampler 2022-11-14 13:52:14 +08:00
Daniel Povey
cd4730b657 Try to refactor the code for scheduling 2022-11-14 12:50:24 +08:00
Daniel Povey
e4a3b2da7d Mostly-cosmetic fixes found via mypy 2022-11-09 17:40:09 +08:00
Daniel Povey
e08f5c1bce Replace Pooling module with ModifiedSEModule 2022-11-01 14:38:06 +08:00
Daniel Povey
a067fe8026 Fix clamping of epsilon 2022-10-28 12:50:14 +08:00
Daniel Povey
7b8a0108ea Merge branch 'scaled_adam_exp188' into scaled_adam_exp198b 2022-10-28 12:49:36 +08:00
Daniel Povey
b9f6ba1aa2 Remove some unused variables. 2022-10-28 12:01:45 +08:00
Daniel Povey
bf37c7ca85 Regularize how we apply the min and max to the eps of BasicNorm 2022-10-26 12:51:20 +08:00
Daniel Povey
78f3cba58c Add logging about memory used. 2022-10-25 19:19:33 +08:00
Daniel Povey
6a6df19bde Hopefully make penalize_abs_values_gt more memory efficient. 2022-10-25 18:41:33 +08:00
Daniel Povey
dbfbd8016b Cast to float16 in DoubleSwish forward 2022-10-25 13:16:00 +08:00
Daniel Povey
36cb279318 More memory efficient backprop for DoubleSwish. 2022-10-25 12:21:22 +08:00
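DoubleSwish here is double_swish(x) = x * sigmoid(x - 1). A minimal illustration of trading compute for memory in its backward pass (a sketch of the idea only; the surrounding commits suggest the actual code goes further, storing a uint8-quantized derivative): save just the input and recompute the sigmoid when the gradient is needed.

    import torch
    from torch import Tensor

    class DoubleSwishSketch(torch.autograd.Function):
        # double_swish(x) = x * sigmoid(x - 1); only x is saved for backward,
        # everything else is recomputed, reducing peak activation memory.
        @staticmethod
        def forward(ctx, x: Tensor) -> Tensor:
            ctx.save_for_backward(x)
            return x * torch.sigmoid(x - 1.0)

        @staticmethod
        def backward(ctx, grad_out: Tensor) -> Tensor:
            (x,) = ctx.saved_tensors
            s = torch.sigmoid(x - 1.0)
            # d/dx [x * s(x)] = s + x * s * (1 - s)
            return grad_out * (s + x * s * (1.0 - s))

    def double_swish_sketch(x: Tensor) -> Tensor:
        return DoubleSwishSketch.apply(x)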
Daniel Povey
95aaa4a8d2 Store only half precision output for softmax. 2022-10-23 21:24:46 +08:00
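The idea in this commit is that softmax's backward pass only needs the softmax output, and that output is well enough conditioned to keep in float16. A hedged sketch of such a function (names hypothetical), using the identity grad_x = y * (grad_y - sum(grad_y * y)):

    import torch
    from torch import Tensor

    class SoftmaxStoreHalfSketch(torch.autograd.Function):
        # Softmax whose backward uses a float16 copy of the output,
        # halving the memory kept alive between forward and backward.
        @staticmethod
        def forward(ctx, x: Tensor, dim: int) -> Tensor:
            y = x.softmax(dim=dim)
            ctx.dim = dim
            ctx.save_for_backward(y.to(torch.float16))
            return y

        @staticmethod
        def backward(ctx, grad_out: Tensor):
            (y16,) = ctx.saved_tensors
            y = y16.to(grad_out.dtype)
            # softmax backward: grad_x = y * (grad_out - sum_j(grad_out_j * y_j))
            grad_in = y * (grad_out - (grad_out * y).sum(dim=ctx.dim, keepdim=True))
            return grad_in, None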
Daniel Povey
d3876e32c4 Make it use float16 if in amp but use clamp to avoid wrapping error 2022-10-23 21:13:23 +08:00
Daniel Povey
85657946bb Try a more exact way to round to uint8 that should prevent ever wrapping around to zero 2022-10-23 20:56:26 +08:00
Daniel Povey
d6aa386552 Fix randn to rand 2022-10-23 17:19:19 +08:00
Daniel Povey
e586cc319c Change the discretization of the sigmoid to be expectation preserving. 2022-10-23 17:11:35 +08:00
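"Expectation preserving" discretization here is, as far as I can tell, stochastic rounding: add uniform noise in [0, 1) before truncating, so the quantized value is unbiased (the "Fix randn to rand" commit above is consistent with uniform rather than Gaussian noise). The clamp below also guards against the floating-point corner case where value + noise rounds up past 255 and wraps to zero when cast to uint8, which the "prevent ever wrapping around to zero" commit addresses. A hypothetical standalone helper, assuming values lie in a known range [lo, hi]:

    import torch
    from torch import Tensor

    def quantize_uint8_stochastic(v: Tensor, lo: float, hi: float) -> Tensor:
        # Map values in [lo, hi] onto 0..255 with stochastic rounding, so that
        # E[dequantize(quantize(v))] == v (up to the clamp at the top of the range).
        scaled = (v - lo) * (255.0 / (hi - lo))                  # now in [0, 255]
        scaled = (scaled + torch.rand_like(scaled)).clamp(max=255.0)
        return scaled.to(torch.uint8)                            # cast truncates, i.e. floor here

    def dequantize_uint8(q: Tensor, lo: float, hi: float) -> Tensor:
        return q.to(torch.float32) * ((hi - lo) / 255.0) + lo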
Daniel Povey
09cbc9fdab Save some memory in the autograd of DoubleSwish. 2022-10-23 16:59:43 +08:00
Daniel Povey
b7083e7aff Increase default max_factor for ActivationBalancer from 0.02 to 0.04; decrease max_abs in ConvolutionModule.deriv_balancer2 from 100.0 to 20.0 2022-10-23 00:09:21 +08:00
Daniel Povey
e0c1dc66da Increase probs of activation balancer and make it decay slower. 2022-10-22 22:18:38 +08:00
Daniel Povey
84580ec022 Configuration changes: scores limit 5->10, min_prob 0.05->0.1, cur_grad_scale more aggressive increase 2022-10-22 14:09:53 +08:00
Daniel Povey
9672dffac2 Merge branch 'scaled_adam_exp168' into scaled_adam_exp169 2022-10-22 14:05:07 +08:00
Daniel Povey
bdbd2cfce6 Penalize too large weights in softmax of AttentionDownsample() 2022-10-21 20:12:36 +08:00
Daniel Povey
476fb9e9f3 Reduce min_prob of ActivationBalancer from 0.1 to 0.05. 2022-10-21 15:42:04 +08:00
Daniel Povey
6e6209419c Merge branch 'scaled_adam_exp150' into scaled_adam_exp155
# Conflicts:
#	egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py
2022-10-20 15:04:27 +08:00
Daniel Povey
4565d43d5c Add hard limit of attention weights to +- 50 2022-10-20 14:28:22 +08:00
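Hard-limiting the pre-softmax attention scores bounds how saturated the softmax can become and avoids overflow when training in float16. A minimal sketch of such a limit (illustrative helper; only the +-50 figure comes from the commit message):

    import torch
    from torch import Tensor

    def limit_attn_scores(scores: Tensor, limit: float = 50.0) -> Tensor:
        # Clamp pre-softmax attention scores to [-limit, +limit].
        return scores.clamp(min=-limit, max=limit)

    # Usage (q, k shaped (batch, heads, time, head_dim)):
    #   scores = torch.matmul(q, k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    #   scores = limit_attn_scores(scores)
    #   attn = scores.softmax(dim=-1)

Note that a plain clamp also zeroes the gradient for scores outside the range, which is one reason softer penalties such as penalize_abs_values_gt appear in later commits above.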