Daniel Povey | 6ea1706e11 | Fix potential/theoretical issue in backward of LimitParamValue | 2022-11-14 23:31:00 +08:00
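The two LimitParamValue commits in this log (this fix and ff6431ed0f below) suggest a custom autograd function that leaves the forward pass untouched but steers gradients so parameters drift back into an allowed range. A minimal sketch of that idea, with illustrative names and the usual gradient-descent sign convention; this is a guess from the commit titles, not the actual diff:

```python
import torch

class LimitParamValue(torch.autograd.Function):
    # Hypothetical sketch: identity in forward; in backward, flip any gradient
    # component that would push an out-of-range value further out of range.
    @staticmethod
    def forward(ctx, x: torch.Tensor, min_val: float, max_val: float):
        ctx.save_for_backward(x)
        ctx.min_val = min_val
        ctx.max_val = max_val
        return x

    @staticmethod
    def backward(ctx, x_grad: torch.Tensor):
        (x,) = ctx.saved_tensors
        # Gradient descent moves x against the gradient, so where x < min_val
        # a positive gradient would push x even lower: negate it there.
        x_grad = torch.where((x < ctx.min_val) & (x_grad > 0), -x_grad, x_grad)
        # Symmetrically, where x > max_val a negative gradient pushes x higher.
        x_grad = torch.where((x > ctx.max_val) & (x_grad < 0), -x_grad, x_grad)
        return x_grad, None, None
```

Used as, e.g., `w = LimitParamValue.apply(self.weight_scale, -1.0, 1.0)`, the forward computation is unchanged while training nudges out-of-range values back toward the interval.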
Daniel Povey | d1df919547 | Cosmetic improvements | 2022-11-14 23:26:33 +08:00
Daniel Povey | 46bd93b792 | Cosmetic fix | 2022-11-14 23:17:20 +08:00
Daniel Povey | a680c7de2e | Make bypass_scale be a tensor. | 2022-11-14 19:12:16 +08:00
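A plausible reading of the bypass_scale commit is a learnable per-channel interpolation between a module's input and its output; making the scale a tensor rather than a Python float gives each channel its own bypass strength. The sketch below is an assumption from the commit title alone, with illustrative class and argument names:

```python
import torch
import torch.nn as nn

class Bypass(nn.Module):
    # Hypothetical sketch: y = x + bypass_scale * (module_out - x), where
    # bypass_scale is a learnable per-channel tensor.  scale = 0 bypasses the
    # module entirely; scale = 1 passes its output through unchanged.
    def __init__(self, embed_dim: int, initial_scale: float = 0.5):
        super().__init__()
        self.bypass_scale = nn.Parameter(torch.full((embed_dim,), initial_scale))

    def forward(self, x: torch.Tensor, module_out: torch.Tensor) -> torch.Tensor:
        return x + self.bypass_scale * (module_out - x)
```

A tensor-valued scale also combines naturally with a value-limiting function like the LimitParamValue sketch above, keeping each channel's scale inside a sensible range.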
Daniel Povey | ff6431ed0f | Implement limits on parameter values a different way. | 2022-11-14 16:02:38 +08:00
Daniel Povey | ce4b50d094 | Revert making the dropout of pos_emb independent across the batch. | 2022-11-14 15:34:39 +08:00
Daniel Povey | 804917837e | Remove pos_emb scales | 2022-11-14 15:32:54 +08:00
Daniel Povey | ba69eb48fe | Remove pos_emb schedule | 2022-11-14 15:31:56 +08:00
Daniel Povey | 54048009db | Fix self.training condition | 2022-11-14 15:15:24 +08:00
Daniel Povey | e1fb25262a | Refactor the scheduling code a little | 2022-11-14 14:52:27 +08:00
Daniel Povey | b32dec1119 | Add printing capability | 2022-11-14 14:16:28 +08:00
Daniel Povey | 4c8575878a | Bug fix in ScheduledSampler | 2022-11-14 13:52:14 +08:00
Daniel Povey | 614b5b1a52 | Treat batch_idx==0.0 separately to get scan_pessimistic_batches_for_oom() to work; should not affect results. | 2022-11-14 13:20:31 +08:00
Daniel Povey | cde4ca27ee | Introduce a dropout schedule for the pos embedding at training time. | 2022-11-14 13:00:30 +08:00
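Several commits in this stretch (the ScheduledSampler fix, the pos-embedding dropout schedule, and the two scheduling refactors) point at values that change as a function of the training batch count. A minimal sketch of such a piecewise-linear schedule; the class name and the breakpoints are illustrative, not the repo's API:

```python
class PiecewiseLinear:
    # Hypothetical sketch: a value scheduled as a piecewise-linear function of
    # the batch count, interpolated between (batch_count, value) breakpoints.
    def __init__(self, *points: tuple):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

# e.g. a pos-embedding dropout that decays from 0.15 to 0.05 over 20k batches;
# pos_emb_dropout(10000.0) evaluates to 0.1, halfway between the breakpoints.
pos_emb_dropout = PiecewiseLinear((0.0, 0.15), (20000.0, 0.05))
```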
Daniel Povey | cd4730b657 | Try to refactor the code for scheduling | 2022-11-14 12:50:24 +08:00
Daniel Povey | aa0b1a37cd | Change the valid interval for libri-100 | 2022-11-13 23:29:17 +08:00
Daniel Povey | a256425b2f | Reduce dropout_rate for RelPositionalEncoding from 0.2 to 0.15. | 2022-11-13 23:29:07 +08:00
Daniel Povey | a245d39e4c | Reduce pos-dim from 128 to 64. | 2022-11-13 15:29:17 +08:00
Daniel Povey | 463fed3d6a | Use compression of large x in the formula for pos_emb | 2022-11-13 13:23:42 +08:00
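"Compression of large x" plausibly means squashing large position offsets so the positional-encoding formula stays bounded at long range. Read together with the "Use atan not tanh" commit further down, one way to sketch it (the function name and scale constant are assumptions):

```python
import torch

def compress_positions(x: torch.Tensor, scale: float = 100.0) -> torch.Tensor:
    # Approximately the identity for |x| << scale, saturating smoothly for
    # large |x|.  atan saturates far more slowly than tanh, so very distant
    # positions stay distinguishable (cf. "Use atan not tanh" below).
    return scale * torch.atan(x / scale)
```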
Daniel Povey | 6c16d08b4f | Add bias in interior of SelfAttn module | 2022-11-13 11:58:01 +08:00
Daniel Povey | 4a5a13b678 | Increase dropout rate for PosEmb from 0.1 to 0.2. | 2022-11-12 23:26:58 +08:00
Daniel Povey | 70408d22fe | Add trainable scales for pos_emb | 2022-11-12 23:25:17 +08:00
    # Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/zipformer.py
Daniel Povey | 603be9933b | Reduce pos-head-dim from 8 to 2 | 2022-11-12 23:22:55 +08:00
    # Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/train.py
Daniel Povey | e67d4ca40d | Make pos_emb be dropped out independently across batch | 2022-11-12 19:21:29 +08:00
Daniel Povey | 4988c815c9 | Use more attention heads in slowest layer. | 2022-11-11 22:56:14 +08:00
Daniel Povey | f7aff4f507 | Revert "Make sub-module dropped out independently." | 2022-11-11 21:36:36 +08:00
    This reverts commit 3ff3f440ee6d2a367cc3cc45e40f8eb69d122861.
Daniel Povey | 742bcaa340 | Comment | 2022-11-10 23:26:36 +08:00
Daniel Povey | 2c6f5e82b2 | Use atan not tanh | 2022-11-10 22:36:39 +08:00
Daniel Povey | 60274ea731 | New formula for pos emb | 2022-11-10 22:06:05 +08:00
Daniel Povey | fd26b890d2 | Tweak formula for widths | 2022-11-10 13:04:56 +08:00
Daniel Povey | 6091146e91 | Change the formula for the embedding to be a bit more symmetric. | 2022-11-10 11:39:37 +08:00
Daniel Povey | 082b93d911 | Remove unused variable. | 2022-11-10 11:18:10 +08:00
Daniel Povey | 125ea04a42 | Rework positional encoding | 2022-11-09 20:48:27 +08:00
Daniel Povey | e4a3b2da7d | Mostly-cosmetic fixes found via mypy | 2022-11-09 17:40:09 +08:00
Daniel Povey | 308059edba | Cosmetic fixes | 2022-11-09 17:14:18 +08:00
Daniel Povey | f8210e1d80 | Reduce feedforward dim of the 4th and 5th encoder stacks. | 2022-11-09 14:52:44 +08:00
    # Conflicts: egs/librispeech/ASR/pruned_transducer_stateless7/train.py
Daniel Povey | d1d4be8ecc | Remove debug statement | 2022-11-09 14:18:23 +08:00
Daniel Povey | 3ff3f440ee | Make sub-module dropped out independently. | 2022-11-09 14:15:56 +08:00
Daniel Povey | 423f9e3026 | Increase query-head-dim from 24 to 32. | 2022-11-09 13:28:29 +08:00
Daniel Povey | 364a4c3838 | Reduce pos_dim from 384 to 128. | 2022-11-09 13:27:27 +08:00
Daniel Povey | cc260711b8 | Make pos_dim the same as it was in scaled_adam_exp229, although this was probably too high. | 2022-11-09 13:26:18 +08:00
Daniel Povey | cba194aa26 | Bug fix RE masking | 2022-11-09 13:12:34 +08:00
Daniel Povey | 20e6d2a157 | Rework zipformer code for clarity and extensibility | 2022-11-09 12:56:07 +08:00
Daniel Povey | 797a0e6ce7 | Change order of convolution and nonlin-attention modules | 2022-11-08 20:00:25 +08:00
Daniel Povey | 36bff9b369 | Fix to comment | 2022-11-07 12:33:12 +08:00
Daniel Povey | 47f42ef5db | Replace the 1st of the ConvolutionModules with NonlinAttentionModule | 2022-11-05 14:19:43 +08:00
Daniel Povey | eb6e2b5a1d | Have 2 squeeze-excite modules per layer, using different attention heads. | 2022-11-04 17:40:51 +08:00
Daniel Povey | efbe20694f | Use the attention weights as input for the ModifiedSEModule | 2022-11-04 16:01:07 +08:00
Daniel Povey | 0d94783e76 | Instead of a pooling operation, use the first bottleneck_dim dimensions of the preceding self_attn.forward2 as the input to the squeeze-excite module. | 2022-11-04 15:16:59 +08:00
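The three squeeze-excite commits above describe replacing the usual mean-pool "squeeze" with pooling driven by the preceding self-attention. A rough sketch of that idea; the module name, shapes, and single-head simplification are assumptions, not the actual ModifiedSEModule:

```python
import torch
import torch.nn as nn

class AttentionSqueezeExcite(nn.Module):
    # Hypothetical sketch: a squeeze-excite module whose "squeeze" step pools
    # over time with externally supplied attention weights instead of a mean.
    def __init__(self, embed_dim: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(embed_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, embed_dim)

    def forward(self, x: torch.Tensor, attn_weights: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, batch, embed_dim); attn_weights: (batch, seq_len, seq_len)
        # with rows summing to 1, e.g. one head of the preceding self-attention.
        pooled = torch.matmul(attn_weights, x.transpose(0, 1))  # weighted squeeze
        scale = torch.sigmoid(self.up(torch.relu(self.down(pooled))))
        return x * scale.transpose(0, 1)  # per-position, per-channel excite
```

Running two such modules per layer on different attention heads, as eb6e2b5a1d describes, would just mean passing each module a different head's weights.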
Daniel Povey | c27ee8cfcf | Merge branch 'scaled_adam_exp277' into scaled_adam_exp281 | 2022-11-04 15:06:23 +08:00