Daniel Povey
|
b656b0df36
|
Changes that should affect nothing: bug fixes etc.
|
2023-06-18 04:00:43 +08:00 |
|
Daniel Povey
|
70bd58c648
|
Fix print_diagnostics break statement
|
2023-06-18 03:55:13 +08:00 |
|
Daniel Povey
|
e9668a5cfd
|
Fix break in fix_diagnostics mode
|
2023-06-18 03:36:13 +08:00 |
|
Daniel Povey
|
5a8cabd429
|
Fix max_eig arg to TensorDiagnosticsOptions
|
2023-06-18 03:28:57 +08:00 |
|
Daniel Povey
|
7d7fc45ab2
|
Revert model-size changes
|
2023-05-30 14:49:42 +08:00 |
|
Daniel Povey
|
d0309c3f3d
|
Increase penalty cutoff in NonlinAttention to 40.
|
2023-05-29 23:02:59 +08:00 |
|
Daniel Povey
|
09294c0b51
|
Merge branch 'zlm51' into zlm52
|
2023-05-29 20:01:27 +08:00 |
|
Daniel Povey
|
265e190946
|
Penalize large values in NonlinAttentionModule
|
2023-05-29 19:17:47 +08:00 |
|
Daniel Povey
|
e313674dc7
|
Reduce batch size to 15
|
2023-05-29 17:38:11 +08:00 |
|
Daniel Povey
|
5fbbeb1d29
|
Try batch size of 16
|
2023-05-29 17:34:00 +08:00 |
|
Daniel Povey
|
cd36d149df
|
Reduce encoder-dim and num-heads of center stack.
|
2023-05-29 17:32:49 +08:00 |
|
Daniel Povey
|
cdd9cf695f
|
Fix bug regarding --start-batch option
|
2023-05-29 16:41:54 +08:00 |
|
Daniel Povey
|
cbd59b9c68
|
Don't skip penalize_abs_values_gt due to memory cutoff; remove grad_scale=0.1
|
2023-05-29 16:29:48 +08:00 |
|
Daniel Povey
|
7fdd125ba9
|
Merge branch 'zlm50' into zlm51
|
2023-05-29 13:54:53 +08:00 |
|
Daniel Povey
|
f05f1a6353
|
Increase grad_scale and prob in score_balancer
|
2023-05-29 13:20:07 +08:00 |
|
Daniel Povey
|
0f27b14376
|
Support unbalanced structures
|
2023-05-29 13:13:29 +08:00 |
|
Daniel Povey
|
b85012aa0b
|
Merge branch 'zlm49' into zlm51
|
2023-05-29 12:20:43 +08:00 |
|
Daniel Povey
|
42f3ad0a11
|
Remove grad_scale=0.1
|
2023-05-29 11:55:18 +08:00 |
|
Daniel Povey
|
16e51a7deb
|
remove find_unused_parameters=True and use bypass module
|
2023-05-29 11:54:21 +08:00 |
|
Daniel Povey
|
38246c8690
|
Revert "find_unused_parameters=True removed"
This reverts commit ba337f8554c2b0b7e0ab3462027de59862cb95dc.
|
2023-05-29 11:51:09 +08:00 |
|
Daniel Povey
|
ba337f8554
|
find_unused_parameters=True removed
|
2023-05-29 11:47:03 +08:00 |
|
Daniel Povey
|
d975d59c7d
|
remove bypass_scale
|
2023-05-29 11:46:18 +08:00 |
|
Daniel Povey
|
d950496d5a
|
Increase grad_scale in score_balancer
|
2023-05-29 10:56:01 +08:00 |
|
Daniel Povey
|
79f1863a1e
|
Fix SoftmaxFunction bug
|
2023-05-29 10:55:03 +08:00 |
|
Daniel Povey
|
137ac513bf
|
Some changes to try to reduce mem consumption; decrease batch size
|
2023-05-28 21:50:34 +08:00 |
|
Daniel Povey
|
625e39fd1a
|
Avoid penalize_abs_values_gt when memory usage high
|
2023-05-28 20:40:47 +08:00 |
|
Daniel Povey
|
815cc1ba4f
|
Add another middle stack; batch size 18->16.
|
2023-05-28 20:23:30 +08:00 |
|
Daniel Povey
|
bc55fb96eb
|
Set final skip/bypass rates to zero
|
2023-05-28 16:30:28 +08:00 |
|
Daniel Povey
|
d045ef7ce7
|
Change default lr from 0.025 to 0.035
|
2023-05-28 15:42:54 +08:00 |
|
Daniel Povey
|
da80241179
|
Use larger valid set; get --print-diagnostics=True to work
|
2023-05-28 15:17:09 +08:00 |
|
Daniel Povey
|
105fb56db4
|
Make base-lr default 0.025
|
2023-05-24 16:30:23 +08:00 |
|
Daniel Povey
|
8483ca2e8f
|
More partial work
|
2023-05-24 16:04:05 +08:00 |
|
Daniel Povey
|
e51a2c9170
|
Partial work
|
2023-05-23 14:01:04 +08:00 |
|
Daniel Povey
|
bcc9971ebe
|
Add clip_grad
|
2023-05-23 14:00:56 +08:00 |
|
Daniel Povey
|
3351402875
|
Implement train mode in lm_datamodule
|
2023-05-23 11:08:05 +08:00 |
|
Daniel Povey
|
3a71a53d8d
|
Set lr_factor on to_scores, max_abs=4.0 on balancer
|
2023-05-23 10:56:03 +08:00 |
|
Daniel Povey
|
45043e2e21
|
Merge branch 'zlm25' into zlm26
|
2023-05-20 22:24:15 +08:00 |
|
Daniel Povey
|
8dc070ce37
|
Increase all ff dims; decrease batch size.
|
2023-05-20 13:35:23 +08:00 |
|
Daniel Povey
|
c1de4cc847
|
Remove factor of 2 in weights_discarded
|
2023-05-19 20:13:12 +08:00 |
|
Daniel Povey
|
4a425f7eb5
|
Half the time, flip weights_discarded
|
2023-05-19 18:04:05 +08:00 |
|
Daniel Povey
|
7d162bf41e
|
Move where srand called
|
2023-05-19 16:43:21 +08:00 |
|
Daniel Povey
|
f37ec0f0da
|
Include start batch in seed
|
2023-05-19 16:39:13 +08:00 |
|
Daniel Povey
|
5fc0cce553
|
Introduce factor of 2 to more strongly penalize discarded weights.
|
2023-05-19 16:31:45 +08:00 |
|
Daniel Povey
|
824d7b4492
|
Add evaluate.py
|
2023-05-19 11:58:32 +08:00 |
|
Daniel Povey
|
fb758b3540
|
Fix f-string bug
|
2023-05-18 22:29:13 +08:00 |
|
Daniel Povey
|
769033c857
|
Increase eps; make it added not applied as floor.
|
2023-05-18 20:08:19 +08:00 |
|
Daniel Povey
|
57a023902c
|
Remove flipping of weights; reduce eps.
|
2023-05-18 19:50:16 +08:00 |
|
Daniel Povey
|
c487f9a0ef
|
Try removing weight_scale
|
2023-05-18 18:41:39 +08:00 |
|
Daniel Povey
|
d2c198c072
|
Implement weight_scale, set weight_scale=10
|
2023-05-18 15:48:14 +08:00 |
|
Daniel Povey
|
f6c7392430
|
Bug fix
|
2023-05-18 15:37:33 +08:00 |
|