Daniel Povey
|
d8f7310118
|
Add print statement
|
2022-09-29 14:15:29 +08:00 |
|
Daniel Povey
|
d398f0ed70
|
Decrease random_prob from 0.5 to 0.333
|
2022-09-29 13:55:33 +08:00 |
|
Daniel Povey
|
461ad3655a
|
Implement AttentionCombine as replacement for RandomCombine
|
2022-09-29 13:44:03 +08:00 |
|
Daniel Povey
|
e5a0d8929b
|
Remove unused out_balancer member
|
2022-09-27 13:10:59 +08:00 |
|
Daniel Povey
|
6b12f20995
|
Remove out_balancer and out_norm from conv modules
|
2022-09-27 12:25:11 +08:00 |
|
Daniel Povey
|
71b3756ada
|
Use half the dim per head, in self_attn layers.
|
2022-09-24 15:40:44 +08:00 |
|
Daniel Povey
|
ce3f59d9c7
|
Use dropout in attention, on attn weights.
|
2022-09-22 19:18:50 +08:00 |
|
Daniel Povey
|
24aea947d2
|
Fix issues where grad is None, and unused-grad cases
|
2022-09-22 19:18:16 +08:00 |
|
Daniel Povey
|
c16f795962
|
Avoid error in ddp by using last module's scores
|
2022-09-22 18:52:16 +08:00 |
|
Daniel Povey
|
0f85a3c2e5
|
Implement persistent attention scores
|
2022-09-22 18:47:16 +08:00 |
|
Daniel Povey
|
1d20c12bc0
|
Increase max_var_per_eig to 0.2
|
2022-09-22 12:28:35 +08:00 |
|
Daniel Povey
|
6eb9a0bc9b
|
Halve max_var_per_eig to 0.05
|
2022-09-20 14:39:17 +08:00 |
|
Daniel Povey
|
cd5ac76a05
|
Add max-var-per-eig in encoder layers
|
2022-09-20 14:22:07 +08:00 |
|
Daniel Povey
|
3d72a65de8
|
Implement max-eig-proportion.
|
2022-09-19 10:26:37 +08:00 |
|
Daniel Povey
|
0f567e27a5
|
Add max_var_per_eig in self-attn
|
2022-09-18 21:22:01 +08:00 |
|
Daniel Povey
|
76031a7c1d
|
Loosen some limits of activation balancers
|
2022-09-18 13:59:44 +08:00 |
|
Daniel Povey
|
3122637266
|
Use ScaledLinear where I previously had StructuredLinear
|
2022-09-17 13:18:58 +08:00 |
|
Daniel Povey
|
1a184596b6
|
A little code refactoring
|
2022-09-16 20:56:21 +08:00 |
|
Daniel Povey
|
e1182da6ac
|
Restoring min_abs and max_abs defaults for the linear_pos proj.
|
2022-07-31 05:07:50 +08:00 |
|
Daniel Povey
|
3857a87b47
|
Merge branch 'merge_refactor_param_cov_norank1_iter_batch_max4.0_pow0.5_fix2r_lrupdate200_2k_ns' into merge2_refactor_max4.0_pow0.5_200_1k_ma3.0
|
2022-07-17 15:32:43 +08:00 |
|
Daniel Povey
|
f36ebad618
|
Remove 2/3 StructuredLinear/StructuredConv1d modules, use linear/conv1d
|
2022-07-17 06:40:19 +08:00 |
|
Daniel Povey
|
de1fd91435
|
Adding max_abs=3.0 to ActivationBalancer modules inside feedforward modules.
|
2022-07-16 07:19:26 +08:00 |
|
Daniel Povey
|
7f0756e156
|
Implement structured version of conformer
|
2022-06-17 15:10:21 +08:00 |
|
Daniel Povey
|
7338c60296
|
Remove Decorrelate()
|
2022-06-13 16:07:15 +08:00 |
|
Daniel Povey
|
d301f8ac6c
|
Merge Decorrelate work, and simplification to RandomCombine, into pruned_transducer_stateless7
|
2022-06-11 11:07:07 +08:00 |
|
Daniel Povey
|
bc5c782294
|
Limit magnitude of linear_pos
|
2022-06-01 10:40:54 +08:00 |
|
Daniel Povey
|
61619c031e
|
Add activation balancer to stop activations in self_attn from getting too large
|
2022-06-01 00:40:45 +08:00 |
|
Daniel Povey
|
1651fe0d42
|
Merge changes from pruned_transducer_stateless4->5
|
2022-05-31 13:00:11 +08:00 |
|
Daniel Povey
|
741dcd1d6d
|
Move pruned_transducer_stateless4 to pruned_transducer_stateless7
|
2022-05-31 12:45:28 +08:00 |
|