Author | Commit | Message | Date
Daniel Povey | c20fc3be14 | Randomize order of some modules | 2022-10-03 13:02:42 +08:00
Daniel Povey | 1be455438a | Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change. | 2022-10-02 14:00:36 +08:00
Daniel Povey | cf5f7e5dfd | Swap random_prob and single_prob, to reduce prob of being randomized. | 2022-10-01 23:50:38 +08:00
Daniel Povey | 8d517a69e4 | Increase feature_mask_dropout_prob from 0.15 to 0.2. | 2022-10-01 23:32:24 +08:00
Daniel Povey | e9326a7d16 | Remove dropout from inside ConformerEncoderLayer, for adding to residuals | 2022-10-01 13:13:10 +08:00
Daniel Povey | cc64f2f15c | Reduce feature_mask_dropout_prob from 0.25 to 0.15. | 2022-10-01 12:24:07 +08:00
Daniel Povey | 1eb603f4ad | Reduce single_prob from 0.5 to 0.25 | 2022-09-30 22:14:53 +08:00
Daniel Povey | ab7c940803 | Include changes from Liyong about padding the conformer module. | 2022-09-30 18:37:31 +08:00
Daniel Povey | 38f89053bd | Introduce feature mask per frame | 2022-09-29 17:31:04 +08:00
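The feature-mask commits above tune `feature_mask_dropout_prob`; this one makes the mask per frame. A minimal sketch of per-frame feature masking, assuming a (batch, num_frames, num_channels) layout; only the parameter name comes from the commit messages, everything else is illustrative:

```python
import torch

def frame_feature_mask(x: torch.Tensor,
                       feature_mask_dropout_prob: float = 0.15,
                       training: bool = True) -> torch.Tensor:
    """Zero out whole feature frames at random (hypothetical layout:
    x is (batch, num_frames, num_channels))."""
    if not training or feature_mask_dropout_prob == 0.0:
        return x
    # One Bernoulli draw per frame, shared across channels, so a masked
    # frame is dropped entirely.
    keep = (torch.rand(x.shape[0], x.shape[1], 1, device=x.device)
            >= feature_mask_dropout_prob)
    return x * keep

# Example: 80-dim fbank features; roughly 15% of frames are zeroed.
y = frame_feature_mask(torch.randn(4, 100, 80), feature_mask_dropout_prob=0.15)
```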
Daniel Povey | 056b9a4f9a | Apply single_prob mask, so sometimes we just get one layer as output. | 2022-09-29 15:29:37 +08:00
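The `single_prob` mask described here means: with probability `single_prob`, the layer combiner returns a single layer's output rather than a mixture. A toy reading of that behavior (the uniform average is a placeholder, not the repository's combination rule):

```python
import random
import torch

def combine_with_single_prob(layer_outputs, single_prob: float = 0.25):
    if random.random() < single_prob:
        # "Sometimes we just get one layer as output."
        return random.choice(layer_outputs)
    # Placeholder fallback: a uniform average of all layer outputs.
    return torch.stack(layer_outputs).mean(dim=0)
```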
Daniel Povey | d8f7310118 | Add print statement | 2022-09-29 14:15:29 +08:00
Daniel Povey | d398f0ed70 | Decrease random_prob from 0.5 to 0.333 | 2022-09-29 13:55:33 +08:00
Daniel Povey | 461ad3655a | Implement AttentionCombine as replacement for RandomCombine | 2022-09-29 13:44:03 +08:00
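Per the message, AttentionCombine replaces the random mixing of RandomCombine with learned, per-frame combination weights over the stacked layer outputs. A hedged reconstruction of that interface; the shapes, the projection, and the class body are assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn

class AttentionCombineSketch(nn.Module):
    """Combine num_inputs layer outputs with per-frame learned weights."""
    def __init__(self, num_channels: int, num_inputs: int):
        super().__init__()
        # Hypothetical: predict one weight per input from their concatenation.
        self.to_scores = nn.Linear(num_channels * num_inputs, num_inputs)

    def forward(self, inputs):
        # inputs: list of num_inputs tensors, each (seq_len, batch, num_channels)
        stacked = torch.stack(inputs, dim=-1)                         # (S, B, C, N)
        weights = self.to_scores(stacked.flatten(2)).softmax(dim=-1)  # (S, B, N)
        return (stacked * weights.unsqueeze(2)).sum(dim=-1)           # (S, B, C)

combiner = AttentionCombineSketch(num_channels=256, num_inputs=4)
combined = combiner([torch.randn(50, 8, 256) for _ in range(4)])
```

Unlike a random scheme, a module like this needs no special-casing at test time, which is one plausible motivation for the swap.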
Daniel Povey | e5a0d8929b | Remove unused out_balancer member | 2022-09-27 13:10:59 +08:00
Daniel Povey | 6b12f20995 | Remove out_balancer and out_norm from conv modules | 2022-09-27 12:25:11 +08:00
Daniel Povey | 76e66408c5 | Some cosmetic improvements | 2022-09-27 11:08:44 +08:00
Daniel Povey | 71b3756ada | Use half the dim per head, in self_attn layers. | 2022-09-24 15:40:44 +08:00
Daniel Povey | ce3f59d9c7 | Use dropout in attention, on attn weights. | 2022-09-22 19:18:50 +08:00
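The dropout added here sits on the attention weights themselves, i.e. after the softmax. In sketch form (a generic scaled-dot-product helper, not the repository's attention code):

```python
import torch
import torch.nn.functional as F

def attend(q, k, v, dropout_p: float = 0.1, training: bool = True):
    """q, k, v: (..., seq_len, head_dim)."""
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale
    attn = attn.softmax(dim=-1)
    # The placement this commit describes: dropout on the attn weights.
    attn = F.dropout(attn, p=dropout_p, training=training)
    return attn @ v
```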
Daniel Povey | 24aea947d2 | Fix issues where grad is None, and unused-grad cases | 2022-09-22 19:18:16 +08:00
Daniel Povey | c16f795962 | Avoid error in DDP by using last module's scores | 2022-09-22 18:52:16 +08:00
Daniel Povey | 0f85a3c2e5 | Implement persistent attention scores | 2022-09-22 18:47:16 +08:00
Daniel Povey | 03a77f8ae5 | Merge branch 'scaled_adam_exp7c' into scaled_adam_exp11c | 2022-09-22 18:15:44 +08:00
Daniel Povey | ceadfad48d | Reduce debug freq | 2022-09-22 12:30:49 +08:00
Daniel Povey | 1d20c12bc0 | Increase max_var_per_eig to 0.2 | 2022-09-22 12:28:35 +08:00
Daniel Povey | e2fdfe990c | Loosen limit on param_max_rms, from 2.0 to 3.0; change how param_min_rms is applied. | 2022-09-20 15:20:43 +08:00
Daniel Povey | 6eb9a0bc9b | Halve max_var_per_eig to 0.05 | 2022-09-20 14:39:17 +08:00
Daniel Povey | cd5ac76a05 | Add max-var-per-eig in encoder layers | 2022-09-20 14:22:07 +08:00
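The max_var_per_eig commits (limits of 0.2 and 0.05 tried above) bound how much of the total variance of a layer's activations may concentrate in a single eigen-direction of their covariance. A sketch of the quantity being limited; how the constraint is actually enforced during training is not shown and not claimed here:

```python
import torch

def var_per_eig_proportion(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations.  Returns the fraction of
    total variance lying in the largest eigen-direction of cov(x)."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)   # eigenvalues, ascending
    return eigs[-1] / eigs.sum()        # a constraint would cap this ratio
```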
Daniel Povey | db1f4ccdd1 | 4x scale on max-eig constraint | 2022-09-20 14:20:13 +08:00
Daniel Povey | 3d72a65de8 | Implement max-eig-proportion. | 2022-09-19 10:26:37 +08:00
Daniel Povey | 5f27cbdb44 | Merge branch 'scaled_adam_exp4_max_var_per_eig' into scaled_adam_exp7 (conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py) | 2022-09-18 21:23:59 +08:00
Daniel Povey | 0f567e27a5 | Add max_var_per_eig in self-attn | 2022-09-18 21:22:01 +08:00
Daniel Povey | eb77fa7aaa | Restore min_positive, max_positive limits on linear_pos projection | 2022-09-18 14:38:30 +08:00
Daniel Povey | 69404f61ef | Use scalar_lr_scale for scalars as well as sizes. | 2022-09-18 14:12:27 +08:00
Daniel Povey | 76031a7c1d | Loosen some limits of activation balancers | 2022-09-18 13:59:44 +08:00
Daniel Povey | 3122637266 | Use ScaledLinear where I previously had StructuredLinear | 2022-09-17 13:18:58 +08:00
Daniel Povey | 4a2b940321 | Remove StructuredLinear, StructuredConv1d | 2022-09-17 13:14:08 +08:00
Daniel Povey | 1a184596b6 | A little code refactoring | 2022-09-16 20:56:21 +08:00
Daniel Povey | bb1bee4a7b | Improve how quartiles are printed | 2022-09-16 17:30:03 +08:00
Daniel Povey | 5f55f80fbb | Configure train.py with clipping_scale=2.0 | 2022-09-16 17:19:52 +08:00
Daniel Povey | 8298333bd2 | Implement gradient clipping. | 2022-09-16 16:52:46 +08:00
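Together with clipping_scale=2.0 configured in the commit above, the natural reading is that the clipping threshold is a multiple of a typical recent gradient norm rather than a fixed constant. A hedged sketch of that shape of clipping; the exact statistic the optimizer tracks is not reproduced here:

```python
import torch

class RelativeGradClipper:
    """Clip the global grad norm to clipping_scale times a running estimate
    of the typical norm (an EMA here; an assumption, not the actual code)."""
    def __init__(self, clipping_scale: float = 2.0, decay: float = 0.9):
        self.clipping_scale = clipping_scale
        self.decay = decay
        self.avg_norm = None   # running estimate of a "typical" grad norm

    def clip_(self, params) -> None:
        grads = [p.grad for p in params if p.grad is not None]
        if not grads:
            return
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        if self.avg_norm is None:
            self.avg_norm = norm.item()
        limit = self.clipping_scale * self.avg_norm
        if norm > limit:
            for g in grads:
                g.mul_(limit / norm)   # shrink all grads to the limit
        self.avg_norm = (self.decay * self.avg_norm
                         + (1 - self.decay) * min(norm.item(), limit))
```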
Daniel Povey | 8f876b3f54 | Remove batching from ScaledAdam, in preparation to add gradient norm clipping | 2022-09-16 15:42:56 +08:00
Daniel Povey | 3b450c2682 | Bug fix in train.py: fix optimizer name | 2022-09-16 14:10:42 +08:00
Daniel Povey | 257c961b66 | 1st attempt at scaled_adam | 2022-09-16 13:59:52 +08:00
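The idea behind ScaledAdam, as it later appears in icefall, is to make each tensor's update size proportional to that tensor's own RMS, bounded by param_min_rms and param_max_rms (the limits tuned in commits above). A toy single-tensor step under that reading; this is an illustration, not the real optimizer:

```python
import torch

def scaled_adam_step_sketch(p, grad, exp_avg, exp_avg_sq, lr=0.05,
                            beta1=0.9, beta2=0.98, eps=1e-8,
                            param_min_rms=1e-5, param_max_rms=3.0):
    # Ordinary Adam-style moment updates.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    denom = exp_avg_sq.sqrt().add_(eps)
    # The "scaled" part: step size proportional to the tensor's own RMS,
    # clamped to [param_min_rms, param_max_rms].
    rms = float(p.norm() / p.numel() ** 0.5)
    scale = min(max(rms, param_min_rms), param_max_rms)
    p.addcdiv_(exp_avg, denom, value=-lr * scale)

p = torch.randn(512, 512)
scaled_adam_step_sketch(p, torch.randn_like(p),
                        torch.zeros_like(p), torch.zeros_like(p))
```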
Daniel Povey | 276928655e | Merge branch 'pradam_exp1m8' into pradam_exp1m7s2 | 2022-08-24 04:17:30 +08:00
Daniel Povey | 80beb9c8d7 | Merge branch 'pradam_exp1n2' into pradam_exp1m7s2 | 2022-08-24 04:14:25 +08:00
Daniel Povey | 64f7166545 | Some cleanups | 2022-08-18 07:03:50 +08:00
Daniel Povey | 5c33899ddc | Increase cov_min[3] from 0.001 to 0.002 | 2022-08-06 16:28:02 +08:00
Daniel Povey | 9bbf8ada57 | Scale up diag of grad_cov by 1.0001 prior to diagonalizing it. | 2022-08-06 07:06:23 +08:00
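Scaling the diagonal of grad_cov up by 1.0001 before diagonalizing is a standard conditioning nudge: it keeps the matrix safely positive definite so the eigendecomposition stays stable. In sketch form (shapes arbitrary):

```python
import torch

a = torch.randn(64, 32)
grad_cov = a.t() @ a / 64                        # a stand-in covariance matrix
grad_cov.diagonal().mul_(1.0001)                 # scale up the diag slightly
eigvals, eigvecs = torch.linalg.eigh(grad_cov)   # diagonalize
```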
Daniel Povey | c021b4fec6 | Increase cov_min[3] from 0.0001 to 0.001 | 2022-08-06 07:02:52 +08:00
Daniel Povey | a5b9b7b974 | Cosmetic changes | 2022-08-05 03:51:00 +08:00