| Author | Commit | Message | Date |
|---|---|---|---|
| Daniel Povey | d0309c3f3d | Increase penalty cutoff in NonlinAttention to 40. | 2023-05-29 23:02:59 +08:00 |
| Daniel Povey | 265e190946 | Penalize large values in NonlinAttentionModule | 2023-05-29 19:17:47 +08:00 |
| Daniel Povey | cbd59b9c68 | Don't skip penalize_abs_values_gt due to memory cutoff; remove grad_scale=0.1 | 2023-05-29 16:29:48 +08:00 |
| Daniel Povey | 7fdd125ba9 | Merge branch 'zlm50' into zlm51 | 2023-05-29 13:54:53 +08:00 |
| Daniel Povey | f05f1a6353 | Increase grad_scale and prob in score_balancer | 2023-05-29 13:20:07 +08:00 |
| Daniel Povey | 0f27b14376 | Support unbalanced structures | 2023-05-29 13:13:29 +08:00 |
| Daniel Povey | b85012aa0b | Merge branch 'zlm49' into zlm51 | 2023-05-29 12:20:43 +08:00 |
| Daniel Povey | 42f3ad0a11 | Remove grad_scale=0.1 | 2023-05-29 11:55:18 +08:00 |
| Daniel Povey | 16e51a7deb | remove find_unused_parameters=True and use bypass module | 2023-05-29 11:54:21 +08:00 |
| Daniel Povey | d975d59c7d | remove bypass_scale | 2023-05-29 11:46:18 +08:00 |
| Daniel Povey | d950496d5a | Increase grad_scale in score_balancer | 2023-05-29 10:56:01 +08:00 |
| Daniel Povey | 137ac513bf | Some changes to try to reduce mem consumption; decrease batch size | 2023-05-28 21:50:34 +08:00 |
| Daniel Povey | 625e39fd1a | Avoid penalize_abs_values_gt when memory usage high | 2023-05-28 20:40:47 +08:00 |
| Daniel Povey | bc55fb96eb | Set final skip/bypass rates to zero | 2023-05-28 16:30:28 +08:00 |
| Daniel Povey | 8483ca2e8f | More partial work | 2023-05-24 16:04:05 +08:00 |
| Daniel Povey | e51a2c9170 | Partial work | 2023-05-23 14:01:04 +08:00 |
| Daniel Povey | 3a71a53d8d | Set lr_factor on to_scores, max_abs=4.0 on balancer | 2023-05-23 10:56:03 +08:00 |
| Daniel Povey | c1de4cc847 | Remove factor of 2 in weights_discarded | 2023-05-19 20:13:12 +08:00 |
| Daniel Povey | 4a425f7eb5 | Half the time, flip weights_discarded | 2023-05-19 18:04:05 +08:00 |
| Daniel Povey | 5fc0cce553 | Introduce factor of 2 to more strongly penalize discarded weights. | 2023-05-19 16:31:45 +08:00 |
| Daniel Povey | fb758b3540 | Fix f-string bug | 2023-05-18 22:29:13 +08:00 |
| Daniel Povey | 769033c857 | Increase eps; make it added not applied as floor. | 2023-05-18 20:08:19 +08:00 |
| Daniel Povey | 57a023902c | Remove flipping of weights; reduce eps. | 2023-05-18 19:50:16 +08:00 |
| Daniel Povey | c487f9a0ef | Try removing weight_scale | 2023-05-18 18:41:39 +08:00 |
| Daniel Povey | d2c198c072 | Implement weight_scale, set weight_scale=10 | 2023-05-18 15:48:14 +08:00 |
| Daniel Povey | f6c7392430 | Bug fix | 2023-05-18 15:37:33 +08:00 |
| Daniel Povey | cdfa388ac0 | Revert optim schedule | 2023-05-18 15:35:23 +08:00 |
| Daniel Povey | 299482d02d | More debug print | 2023-05-18 15:12:57 +08:00 |
| Daniel Povey | 76e6726178 | Implement random rotation of dims | 2023-05-18 14:56:44 +08:00 |
| Daniel Povey | d631ffec5b | indentation change | 2023-05-18 14:49:56 +08:00 |
| Daniel Povey | e976af699e | Remove unused variable | 2023-05-18 14:17:31 +08:00 |
| Daniel Povey | a514d23df7 | Change how we penalize weights | 2023-05-18 14:14:50 +08:00 |
| Daniel Povey | 9367ea3646 | Don't drop last batch | 2023-05-18 12:47:28 +08:00 |
| Daniel Povey | 24e8a7a8fd | Remove pointless assertion | 2023-05-17 14:54:29 +08:00 |
| Daniel Povey | 62c34f15c6 | Remove print statement | 2023-05-17 13:22:02 +08:00 |
| Daniel Povey | 53410608a6 | Try to implement test mode; fix issue where middle stack had not been downsampled. | 2023-05-17 13:03:19 +08:00 |
| Daniel Povey | 399a79ace6 | Change chunk-size setup | 2023-05-16 19:47:23 +08:00 |
| Daniel Povey | e062c71076 | Efficiency, small fix | 2023-05-16 17:34:21 +08:00 |
| Daniel Povey | cf93d1f129 | Bug fix regarding chunk-size reshaping | 2023-05-16 17:30:48 +08:00 |
| Daniel Povey | 5f5df4367d | Fix error in how src was reshaped | 2023-05-16 17:19:47 +08:00 |
| Daniel Povey | 3f72813a96 | Various bug fixes, implementing chunking | 2023-05-16 16:27:09 +08:00 |
| Daniel Povey | 0006a4c4db | Implement chunk sizes, to the extent that the program runs. | 2023-05-16 16:13:20 +08:00 |
| Daniel Povey | 8001a46758 | Fix bugs | 2023-05-15 22:49:43 +08:00 |
| Daniel Povey | cc81ec4f8a | bug fix | 2023-05-15 22:07:27 +08:00 |
| Daniel Povey | 0a76215fd7 | Code cleanup | 2023-05-15 22:01:19 +08:00 |
| Daniel Povey | d2d0ce0335 | Try to get rid of gradient blowup | 2023-05-15 20:26:21 +08:00 |
| Daniel Povey | a397a5973b | Increase num parameters | 2023-05-15 20:11:20 +08:00 |
| Daniel Povey | 047c6ffc58 | First version of subformer that runs. | 2023-05-15 16:03:01 +08:00 |
| Daniel Povey | 1b8be0744f | Fix various bugs | 2023-05-15 15:20:02 +08:00 |
| Daniel Povey | f740282a1a | More progress on subformer | 2023-05-15 10:57:48 +08:00 |