1863 Commits

Author SHA1 Message Date
Daniel Povey
4eb3e97848 Remove bias from SimpleUpsample, add one to AttentionDownsample 2022-12-16 17:59:15 +08:00
Daniel Povey
bc002a9eda Reduce const_attention_rate 2022-12-16 16:30:49 +08:00
Daniel Povey
66465c8be4 Give attention_skip_rate a longer tail 2022-12-16 15:12:04 +08:00
Daniel Povey
8e6c7ef3e2 Adjust default prob of ActivationBalancer. 2022-12-16 15:08:46 +08:00
Daniel Povey
56ac7354df Remove LinearWithAuxLoss; simplify schedule of prob in ActivationBalancer. 2022-12-16 15:07:42 +08:00
Daniel Povey
3213c18a22 Changes to schedules: _whitening_schedule longer, min_abs schedule on attention_squeeze+nonlin_attention shorter; dip in conv_skip_rate. 2022-12-16 14:58:15 +08:00
Daniel Povey
e84f525840 Fix test condition 2022-12-16 12:24:54 +08:00
Daniel Povey
53ab18a862 Ditch caching_eval; reduce params more. 2022-12-16 00:22:44 +08:00
Daniel Povey
083e5474c4 Reduce ConvNeXt parameters. 2022-12-16 00:21:04 +08:00
Daniel Povey
8d9301e225 Remove potentially wrong typing info 2022-12-15 23:47:41 +08:00
Daniel Povey
6caaa4e9c6 Bug fix in caching_eval, may make no difference. 2022-12-15 23:32:29 +08:00
Daniel Povey
f5d4fb092d Bug fix in caching_eval 2022-12-15 23:24:36 +08:00
Daniel Povey
d26ee2bf81 Try to implement caching evaluation for memory efficient training 2022-12-15 23:06:40 +08:00
Daniel Povey
f66c1600f4 Bug fix to printing code 2022-12-15 21:55:23 +08:00
Daniel Povey
076b18db60 Implement Nextformer-style frontend 2022-12-15 21:48:32 +08:00
Daniel Povey
864ff96322 Remove nonlin_skip_rate, introduce conv_skip_rate. 2022-12-15 19:27:29 +08:00
Daniel Povey
1506b83c7b Change nonlin_skip_rate to be conv_skip_rate. 2022-12-15 19:25:21 +08:00
Daniel Povey
37a8c30136 Merge branch 'scaled_adam_exp699' into scaled_adam_exp711 2022-12-15 00:24:56 +08:00
Daniel Povey
25834453db Merge branch 'scaled_adam_exp698' into scaled_adam_exp710 (conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py) 2022-12-15 00:21:31 +08:00
Daniel Povey
9e79b296f2 Merge branch 'scaled_adam_exp708' into scaled_adam_exp709 2022-12-14 22:56:09 +08:00
Daniel Povey
aac9bebc62 Bug fix 2022-12-14 22:54:59 +08:00
Daniel Povey
9bc326a9b6 Merge branch 'scaled_adam_exp705' into scaled_adam_exp709 2022-12-14 21:41:50 +08:00
Daniel Povey
159f37ddeb Merge branch 'scaled_adam_exp700' into scaled_adam_exp709 2022-12-14 21:41:43 +08:00
Daniel Povey
cec2162a17 Merge branch 'scaled_adam_exp703' into scaled_adam_exp709 2022-12-14 21:41:32 +08:00
Daniel Povey
87df9f3215 Simplify schedules of output balancers for nonlin_attention_module and attention_squeeze. 2022-12-14 21:37:32 +08:00
Daniel Povey
930f1b8948 Reduce conv_module balancer2 min_abs from 0.75 to 0.5. 2022-12-13 23:01:49 +08:00
Daniel Povey
48445f22e4 Increase ratio from 2.0 to 3.0 on 2 whitening schedules 2022-12-13 22:50:21 +08:00
Daniel Povey
157f4074a2 Halve min_positive schedule of ConvolutionModule. 2022-12-13 21:41:15 +08:00
Daniel Povey
57040e382a Set all aux-loss probs to zero. 2022-12-13 19:25:08 +08:00
Daniel Povey
52d18e405e Change to balancer2 schedule of NonlinAttentionModule, remove peak at 8k. 2022-12-13 19:22:43 +08:00
Daniel Povey
117d418e27 Make nonlin_skip_rate nonzero and end after 20k iters; remove peak at 8k iters of NonlinAttentionModule balancer2 min_abs. 2022-12-13 19:17:38 +08:00
Daniel Povey
8231350ac4 Make AttentionSqueeze dim smaller, at embed_dim // 2. 2022-12-13 18:54:46 +08:00
Daniel Povey
22204450db Make min_abs of AttentionSqueeze smaller, the same as nonlin_attention_module 2022-12-13 18:51:22 +08:00
Daniel Povey
8d75006d69 Merge branch 'scaled_adam_exp690' into scaled_adam_exp694 2022-12-13 18:48:05 +08:00
Daniel Povey
d2465492f9 Bug fix 2022-12-12 23:32:08 +08:00
Daniel Povey
b5e0676f14 Invoke the out_balancer of attention_squeeze 2022-12-12 23:31:22 +08:00
Daniel Povey
0522425ea8 Change min and max positive 2022-12-12 23:30:12 +08:00
Daniel Povey
7920fa7726 Add out_balancer for attention_squeeze, similar to nonlin_attention_module. 2022-12-12 23:29:42 +08:00
Daniel Povey
7de7753ea2 Change DoubleSwish to SwooshR in Conv2dSubsampling, double max_abs limits. 2022-12-12 15:58:36 +08:00
Daniel Povey
f4ff6188d9 Set max_abs values on Conv2dSubsampling module. 2022-12-11 19:29:35 +08:00
Daniel Povey
a01fc3b220 Change AttentionSqueeze dim from 128 to 256. 2022-12-11 19:12:03 +08:00
Daniel Povey
05c7cb5c83 Reduce attention_squeeze dim from 512 to 128. 2022-12-11 18:51:01 +08:00
Daniel Povey
634f1a4b82 Hardcode AttentionSqueeze dim at 512. 2022-12-11 17:20:52 +08:00
Daniel Povey
2d0fe7637c Memory fix in WithLoss 2022-12-11 17:20:26 +08:00
Daniel Povey
0edaf4d25c Merge branch 'scaled_adam_exp667' into scaled_adam_exp671 2022-12-10 19:39:02 +08:00
Daniel Povey
d7dd3f6dac Merge branch 'scaled_adam_exp662' into scaled_adam_exp670 2022-12-10 18:04:21 +08:00
Daniel Povey
cb12014c31 Implement dropout for scores in AttentionDownsample 2022-12-10 16:09:51 +08:00
Daniel Povey
2f617fec43 Set nonlin_skip_rate to zero; make final min_abs value smaller in balancer2 of NonlinAttentionModule. 2022-12-10 00:21:51 +08:00
Daniel Povey
30c6e5b929 Make attention_squeeze use full dim. 2022-12-10 00:08:38 +08:00
Daniel Povey
0fc646f281 Merge branch 'scaled_adam_exp663' into scaled_adam_exp665 2022-12-10 00:07:37 +08:00