Commit history (all commits authored by Daniel Povey):

80b2c751e3 | 2023-01-16 13:18:42 +08:00 | Merge branch 'scaled_adam_exp896' into scaled_adam_exp904
ed65330261 | 2023-01-16 13:18:29 +08:00 | Remove AttentionSqueeze
fb30d11693 | 2023-01-15 12:52:41 +08:00 | Merge branch 'scaled_adam_exp891' into scaled_adam_exp896
048b6b6259 | 2023-01-15 00:21:01 +08:00 | Make scale in NonlinAttention have GLU nonlinearity
eeadc3b0cc | 2023-01-14 20:41:30 +08:00 | Add a multiplication to NonlinAttentionModule
4fe91ce67c | 2023-01-14 17:19:34 +08:00 | Double hidden_channels in NonlinAttention from embed_dim//4 to embed_dim//2
ec8804283c | 2023-01-14 14:54:46 +08:00 | Try to make SmallConvolutionModule more efficient
167b58baa0 | 2023-01-14 14:29:29 +08:00 | Make output dim of Zipformer be max dim
fb7a967276 | 2023-01-13 17:38:11 +08:00 | Increase unmasked dims
bebc27f274 | 2023-01-13 17:36:45 +08:00 | Increase encoder-dim of some layers, and unmasked-dim
e6af583ee1 | 2023-01-13 14:40:42 +08:00 | Increase encoder-dim of slowest stack from 320 to 384
a88587dc8a | 2023-01-13 00:12:46 +08:00 | Fix comment; have 6, not 4, layers in most-downsampled stack
5958f1ee11 | 2023-01-12 22:14:52 +08:00 | Remove memory-allocated printouts
bac72718f0 | 2023-01-12 22:11:42 +08:00 | Bug fixes, config changes
d3b3592986 | 2023-01-12 21:18:34 +08:00 | Fix bug to allow down- and up-sampling
1e04c3d892 | 2023-01-12 21:15:39 +08:00 | Reduce dimension for speed; have varying dims
9e4b84f374 | 2023-01-12 20:14:51 +08:00 | Simplify Conv2dSubsampling, removing all but one ConvNeXt layer
65f15c9d14 | 2023-01-12 20:00:49 +08:00 | Reduce final_layerdrop_rate coefficient
3fdfec1049 | 2023-01-11 13:18:08 +08:00 | Replace dropout2 on Conv2dSubsampling with Dropout3; share time dim
1774853bdf | 2023-01-11 13:12:25 +08:00 | Remove caching eval
1580c1c1cc | 2023-01-11 12:26:41 +08:00 | Fix MulForDropout3
8bbcd81604 | 2023-01-10 17:46:32 +08:00 | Memory-efficient backprop for dropout3
4033000730 | 2023-01-10 17:12:32 +08:00 | Share dropout masks across time in feedforward modules
3110ed045a | 2023-01-09 23:32:36 +08:00 | Increase base final_layerdrop_rate from 0.035 to 0.05
1d40239d69 | 2023-01-09 14:52:48 +08:00 | Merge branch 'scaled_adam_exp872' into scaled_adam_exp873
e739d8aa38 | 2023-01-09 13:34:32 +08:00 | Fix layer_skip_rate so it's actually used; increase its value
1a0155fcb5 | 2023-01-08 23:36:29 +08:00 | Merge branch 'scaled_adam_exp863' into scaled_adam_exp870 (conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/scaling.py)
326cb75033 | 2023-01-08 15:48:23 +08:00 | Increase layer_skip_rate slightly
62b42887b4 | 2023-01-08 13:17:39 +08:00 | Revert zipformer.py to its state at the previous commit
e952598677 | 2023-01-08 13:16:24 +08:00 | Merge branch 'scaled_adam_exp846' into scaled_adam_exp866
117db124d0 | 2023-01-08 13:16:10 +08:00 | Implement higher layerdrop for central stacks
c7107ead64 | 2023-01-07 17:45:22 +08:00 | Fix bug in get_adjusted_batch_count
b3527fe4ac | 2023-01-07 17:31:20 +08:00 | Implement caching evaluation for ConvNeXt
9242800d42 | 2023-01-07 12:59:57 +08:00 | Remove the 8x-subsampled stack
ef48019d6e | 2023-01-06 22:26:58 +08:00 | Reduce feedforward dims
9b0c0aabb2 | 2023-01-06 22:24:45 +08:00 | Merge branch 'scaled_adam_exp829' into scaled_adam_exp860 (conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py)
6a762914bf | 2023-01-06 13:35:57 +08:00 | Increase base-lr from 0.05 to 0.055
5564a0efb0 | 2023-01-06 13:34:48 +08:00 | Further tune lr scales; increase base-lr
f6f088489d | 2023-01-05 23:49:42 +08:00 | Adjust lr_scales, making them closer to 1
ccc38a97f7 | 2023-01-05 18:50:04 +08:00 | Reduce lr_scales of some submodules
90c02b471c | 2023-01-05 16:27:43 +08:00 | Revert base LR to 0.05
067b861c70 | 2023-01-05 14:46:15 +08:00 | Use largest LR for printing
6c7fd8c046 | 2023-01-05 14:23:59 +08:00 | Increase base-lr to 0.06
95e8296014 | 2023-01-05 14:23:40 +08:00 | Use downsampling_factor ** -0.333 as the scale for stacks
0d7161ebec | 2023-01-05 14:11:33 +08:00 | Use get_parameter_groups_with_lr in train.py; bug fixes
1db509ea31 | 2023-01-05 13:39:22 +08:00 | Attempt to implement slower learning for downsampled modules
b7be18c2f8 | 2023-01-05 12:23:32 +08:00 | Keep only the needed changes from Liyong's branch
096ebeaf23 | 2023-01-05 12:01:42 +08:00 | Take a couple of files from Liyong's branch
22b4a417dd | 2023-01-04 20:59:58 +08:00 | Implement extra_layerdrop
b973929d7c | 2023-01-04 20:54:05 +08:00 | Bug fixes to ScheduledFloat