1894 Commits

Author SHA1 Message Date
yaozengwei
42800f775e remove score sorting in test mode 2023-08-02 19:26:48 +08:00
Daniel Povey
74bf02bba6 Load num_tokens_seen from disk on checkpoint load. 2023-06-20 02:54:47 +08:00
Daniel Povey
b3b3e5daa0 Pad only on the right 2023-06-20 01:58:27 +08:00
Daniel Povey
85b6450a8a Remove old code 2023-06-19 07:45:57 +08:00
Daniel Povey
6c3ab1e706 Fixes 2023-06-19 04:59:57 +08:00
Daniel Povey
03ad0d7910 Remove concept of epochs from training subformer for language modeling; revert dimensions to how they were in zlm53. 2023-06-19 04:45:37 +08:00
Daniel Povey
c7e8a7349d Increase dim of middle stack from 512 to 768 2023-06-19 02:16:59 +08:00
Daniel Povey
01ed3bbcc4 Make encoder dims mostly 512. 2023-06-18 04:02:21 +08:00
Daniel Povey
b656b0df36 Changes that should affect nothing: bug fixes etc. 2023-06-18 04:00:43 +08:00
Daniel Povey
70bd58c648 Fix print_diagnostics break statement 2023-06-18 03:55:13 +08:00
Daniel Povey
e9668a5cfd Fix break in fix_diagnostics mode 2023-06-18 03:36:13 +08:00
Daniel Povey
5a8cabd429 Fix max_eig arg to TensorDiagnosticsOptions 2023-06-18 03:28:57 +08:00
Daniel Povey
7d7fc45ab2 Revert model-size changes 2023-05-30 14:49:42 +08:00
Daniel Povey
d0309c3f3d Increase penalty cutoff in NonlinAttention to 40. 2023-05-29 23:02:59 +08:00
Daniel Povey
09294c0b51 Merge branch 'zlm51' into zlm52 2023-05-29 20:01:27 +08:00
Daniel Povey
265e190946 Penalize large values in NonlinAttentionModule 2023-05-29 19:17:47 +08:00
Daniel Povey
e313674dc7 Reduce batch size to 15 2023-05-29 17:38:11 +08:00
Daniel Povey
5fbbeb1d29 Try batch size of 16 2023-05-29 17:34:00 +08:00
Daniel Povey
cd36d149df Reduce encoder-dim and num-heads of center stack. 2023-05-29 17:32:49 +08:00
Daniel Povey
cdd9cf695f Fix bug regarding --start-batch option 2023-05-29 16:41:54 +08:00
Daniel Povey
cbd59b9c68 Don't skip penalize_abs_values_gt due to memory cutoff; remove grad_scale=0.1 2023-05-29 16:29:48 +08:00
Daniel Povey
7fdd125ba9 Merge branch 'zlm50' into zlm51 2023-05-29 13:54:53 +08:00
Daniel Povey
f05f1a6353 Increase grad_scale and prob in score_balancer 2023-05-29 13:20:07 +08:00
Daniel Povey
0f27b14376 Support unbalanced structures 2023-05-29 13:13:29 +08:00
Daniel Povey
b85012aa0b Merge branch 'zlm49' into zlm51 2023-05-29 12:20:43 +08:00
Daniel Povey
42f3ad0a11 Remove grad_scale=0.1 2023-05-29 11:55:18 +08:00
Daniel Povey
16e51a7deb remove find_unused_parameters=True and use bypass module 2023-05-29 11:54:21 +08:00
Daniel Povey
38246c8690 Revert "find_unused_parameters=True removed". This reverts commit ba337f8554c2b0b7e0ab3462027de59862cb95dc. 2023-05-29 11:51:09 +08:00
Daniel Povey
ba337f8554 find_unused_parameters=True removed 2023-05-29 11:47:03 +08:00
Daniel Povey
d975d59c7d remove bypass_scale 2023-05-29 11:46:18 +08:00
Daniel Povey
d950496d5a Increase grad_scale in score_balancer 2023-05-29 10:56:01 +08:00
Daniel Povey
79f1863a1e Fix SoftmaxFunction bug 2023-05-29 10:55:03 +08:00
Daniel Povey
137ac513bf Some changes to try to reduce mem consumption; decrease batch size 2023-05-28 21:50:34 +08:00
Daniel Povey
625e39fd1a Avoid penalize_abs_values_gt when memory usage high 2023-05-28 20:40:47 +08:00
Daniel Povey
815cc1ba4f Add another middle stack; batch size 18->16. 2023-05-28 20:23:30 +08:00
Daniel Povey
bc55fb96eb Set final skip/bypass rates to zero 2023-05-28 16:30:28 +08:00
Daniel Povey
d045ef7ce7 Change default lr from 0.025 to 0.035 2023-05-28 15:42:54 +08:00
Daniel Povey
da80241179 Use larger valid set; get --print-diagnostics=True to work 2023-05-28 15:17:09 +08:00
Daniel Povey
105fb56db4 Make base-lr default 0.025 2023-05-24 16:30:23 +08:00
Daniel Povey
8483ca2e8f More partial work 2023-05-24 16:04:05 +08:00
Daniel Povey
e51a2c9170 Partial work 2023-05-23 14:01:04 +08:00
Daniel Povey
bcc9971ebe Add clip_grad 2023-05-23 14:00:56 +08:00
Daniel Povey
3351402875 Implement train mode in lm_datamodule 2023-05-23 11:08:05 +08:00
Daniel Povey
3a71a53d8d Set lr_factor on to_scores, max_abs=4.0 on balancer 2023-05-23 10:56:03 +08:00
Daniel Povey
45043e2e21 Merge branch 'zlm25' into zlm26 2023-05-20 22:24:15 +08:00
Daniel Povey
8dc070ce37 Increase all ff dims; decrease batch size. 2023-05-20 13:35:23 +08:00
Daniel Povey
c1de4cc847 Remove factor of 2 in weights_discarded 2023-05-19 20:13:12 +08:00
Daniel Povey
4a425f7eb5 Half the time, flip weights_discarded 2023-05-19 18:04:05 +08:00
Daniel Povey
7d162bf41e Move where srand called 2023-05-19 16:43:21 +08:00
Daniel Povey
f37ec0f0da Include start batch in seed 2023-05-19 16:39:13 +08:00