Author | Commit | Message | Date
Daniel Povey | e1d741a632 | Slight code cleanup/simplification | 2022-10-06 14:29:51 +08:00
Daniel Povey | 99d17d13cf | Merge branch 'scaled_adam_exp58' into scaled_adam_exp67 | 2022-10-06 14:27:12 +08:00
Daniel Povey | 02eb7af824 | Don't always apply the frame mask | 2022-10-06 13:01:36 +08:00
Daniel Povey | 0685ac792d | Remove layer dropout and model-level warmup | 2022-10-06 12:36:42 +08:00
Daniel Povey | 537c3537c0 | Remove warmup | 2022-10-06 12:33:43 +08:00
Daniel Povey | bb233d3449 | Add debug info | 2022-10-05 23:18:50 +08:00
Daniel Povey | 040592a9e3 | Fix eigs call | 2022-10-05 16:22:33 +08:00
Daniel Povey | 1cd7e93183 | Fix bug setting layerdrop mask | 2022-10-05 16:19:45 +08:00
Daniel Povey | 61f62837fa | Fix bug re: self.training | 2022-10-05 15:34:39 +08:00
Daniel Povey | 81542832bf | Bug fixes | 2022-10-04 22:34:24 +08:00
Daniel Povey | 5fe8cb134f | Remove final combination; implement layer drop that drops the final layers. | 2022-10-04 22:19:44 +08:00
Daniel Povey | 006fcc18cd | Introduce offset in layerdrop_scales | 2022-10-04 12:06:35 +08:00
Daniel Povey | 33c24e4114 | Bug fix | 2022-10-03 23:07:30 +08:00
Daniel Povey | a9f950a1f7 | Make the scaling factors more global and the randomness of dropout more random | 2022-10-03 22:49:32 +08:00
Daniel Povey | 96e0d92fb7 | Compute valid loss on batch 0. | 2022-10-03 18:24:00 +08:00
Daniel Povey | 88d0da7192 | Simplify the learned scaling factor on the modules | 2022-10-03 17:54:56 +08:00
Daniel Povey | b3af9f67ae | Implement efficient layer dropout | 2022-10-03 17:19:16 +08:00
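The layer-dropout thread above (b3af9f67ae, 1cd7e93183, 0685ac792d) revolves around randomly skipping encoder layers during training. As a rough illustration only, not the icefall implementation: "efficient" layer dropout decides up front to skip a layer, so a skipped layer costs no forward or backward computation. The class and parameter names below are hypothetical.

```python
import torch
import torch.nn as nn

class LayerDropoutStack(nn.Module):
    """Hypothetical sketch: a stack of layers where each layer is
    skipped with probability `layerdrop_prob` during training.
    Skipping the layer entirely (rather than computing its output
    and then zeroing it) is what makes the dropout "efficient"."""

    def __init__(self, layers, layerdrop_prob: float = 0.075):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.layerdrop_prob = layerdrop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            if self.training and torch.rand(()) < self.layerdrop_prob:
                continue  # skip: no forward pass, no backward pass
            x = layer(x)
        return x

# Usage: wrap any list of residual-style layers.
stack = LayerDropoutStack([nn.Linear(16, 16) for _ in range(6)])
out = stack(torch.randn(4, 16))
```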
Daniel Povey | 93dff29243 | Introduce a scale dependent on the masking value | 2022-10-03 14:34:37 +08:00
Daniel Povey | 5a8995328f | Stop backprop bug | 2022-10-03 13:33:01 +08:00
Daniel Povey | a0a1874415 | Bug fix | 2022-10-03 13:23:26 +08:00
Daniel Povey | c20fc3be14 | Randomize order of some modules | 2022-10-03 13:02:42 +08:00
Daniel Povey | 1be455438a | Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change. | 2022-10-02 14:00:36 +08:00
shcxlee | bf2c4a488e | Modified train.py of tedlium3 models (#597) | 2022-10-02 13:01:15 +08:00
Daniel Povey | cf5f7e5dfd | Swap random_prob and single_prob, to reduce prob of being randomized. | 2022-10-01 23:50:38 +08:00
Daniel Povey | 8d517a69e4 | Increase feature_mask_dropout_prob from 0.15 to 0.2. | 2022-10-01 23:32:24 +08:00
Daniel Povey | e9326a7d16 | Remove dropout from inside ConformerEncoderLayer, for adding to residuals | 2022-10-01 13:13:10 +08:00
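e9326a7d16 moves dropout out of the ConformerEncoderLayer internals so that it is instead applied to each sub-module's output as it joins the residual stream. A minimal sketch of that residual-dropout pattern, with a stand-in sub-module (not the repo's code):

```python
import torch
import torch.nn as nn

class ResidualWithDropout(nn.Module):
    """Sketch: dropout applied to a sub-module's output just before
    it is added back onto the residual stream."""

    def __init__(self, module: nn.Module, dropout_p: float = 0.1):
        super().__init__()
        self.module = module
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dropout on the branch output, not inside the branch.
        return x + self.dropout(self.module(x))

block = ResidualWithDropout(nn.Linear(64, 64))  # stand-in sub-module
y = block(torch.randn(2, 10, 64))
```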
Daniel Povey | cc64f2f15c | Reduce feature_mask_dropout_prob from 0.25 to 0.15. | 2022-10-01 12:24:07 +08:00
Daniel Povey | 1eb603f4ad | Reduce single_prob from 0.5 to 0.25 | 2022-09-30 22:14:53 +08:00
Daniel Povey | ab7c940803 | Include changes from Liyong about padding conformer module. | 2022-09-30 18:37:31 +08:00
Daniel Povey | 38f89053bd | Introduce feature mask per frame | 2022-09-29 17:31:04 +08:00
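38f89053bd makes the feature mask vary per frame. One simple reading of that, sketched here with no claim to match the repo's code, is a Bernoulli mask drawn independently for every frame and feature dimension; the feature_mask_dropout_prob name follows the surrounding commit messages, the rest is illustrative:

```python
import torch

def apply_frame_feature_mask(x: torch.Tensor,
                             feature_mask_dropout_prob: float = 0.15,
                             training: bool = True) -> torch.Tensor:
    """Illustrative sketch: x has shape (batch, time, feature).
    During training, each (frame, feature) element is zeroed with
    probability feature_mask_dropout_prob, so the mask varies per
    frame rather than being shared across the whole utterance."""
    if not training:
        return x
    keep = (torch.rand_like(x) >= feature_mask_dropout_prob).to(x.dtype)
    return x * keep

x = torch.randn(2, 100, 80)  # e.g. 80-dim fbank features
y = apply_frame_feature_mask(x)
```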
Daniel Povey | 056b9a4f9a | Apply single_prob mask, so sometimes we just get one layer as output. | 2022-09-29 15:29:37 +08:00
Daniel Povey | d8f7310118 | Add print statement | 2022-09-29 14:15:29 +08:00
Daniel Povey | d398f0ed70 | Decrease random_prob from 0.5 to 0.333 | 2022-09-29 13:55:33 +08:00
Daniel Povey | 461ad3655a | Implement AttentionCombine as replacement for RandomCombine | 2022-09-29 13:44:03 +08:00
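461ad3655a swaps RandomCombine for AttentionCombine: instead of randomly selecting among layer outputs, a small learned projection scores each candidate output and combines them with softmax weights. A simplified sketch (per the neighboring commits, the real module also involves random_prob/single_prob scheduling during training, which is omitted here):

```python
import torch
import torch.nn as nn

class AttentionCombine(nn.Module):
    """Simplified sketch: combine the outputs of several encoder
    layers with learned, input-dependent softmax weights."""

    def __init__(self, num_channels: int, num_inputs: int):
        super().__init__()
        # Maps the concatenated candidate outputs to one score each.
        self.weight = nn.Parameter(
            torch.zeros(num_channels * num_inputs, num_inputs))

    def forward(self, inputs: list) -> torch.Tensor:
        # inputs: list of num_inputs tensors, each (batch, time, channels)
        stacked = stackedT = torch.stack(inputs, dim=-1)     # (B, T, C, N)
        b, t, c, n = stacked.shape
        scores = stacked.reshape(b, t, c * n) @ self.weight  # (B, T, N)
        weights = scores.softmax(dim=-1).unsqueeze(2)        # (B, T, 1, N)
        return (stacked * weights).sum(dim=-1)               # (B, T, C)

layers_out = [torch.randn(2, 50, 64) for _ in range(4)]
combined = AttentionCombine(num_channels=64, num_inputs=4)(layers_out)
```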
Zengwei Yao | f3ad32777a | Gradient filter for training lstm model (#564) | 2022-09-29 11:15:43 +08:00
    * init files
    * add gradient filter module
    * refactor getting median value
    * add cutoff for grad filter
    * delete comments
    * apply gradient filter in LSTM module, to filter both input and params
    * fix typing and refactor
    * filter with soft mask
    * rename lstm_transducer_stateless2 to lstm_transducer_stateless3
    * fix typos, and update RESULTS.md
    * minor fix
    * fix return typing
    * fix typo
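Per the commit list in #564, the gradient filter compares gradient norms against a median value and applies a soft mask above a cutoff. A hedged sketch of that idea; the function name, the batch-median choice, and the exact soft-scaling formula are illustrative assumptions, not the PR's code:

```python
import torch

def soft_gradient_filter(grad: torch.Tensor, cutoff: float = 10.0) -> torch.Tensor:
    """Illustrative sketch: grad has shape (batch, ...). Take the
    L2 norm of each batch element's gradient, compute the median
    over the batch, and softly scale down elements whose norm
    exceeds cutoff * median instead of hard-clipping them."""
    norms = grad.flatten(1).norm(dim=1)            # (batch,)
    threshold = cutoff * norms.median()
    # Soft mask: 1.0 below the threshold, shrinking smoothly above it.
    scale = threshold / torch.maximum(norms, threshold)
    return grad * scale.view(-1, *([1] * (grad.dim() - 1)))

g = torch.randn(8, 512)
g[0] *= 100.0                                      # one outlier sample
filtered = soft_gradient_filter(g)
```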
LIyong.Guo | 923b60a7c6 | padding zeros (#591) | 2022-09-28 21:20:33 +08:00
Daniel Povey | d6ef1bec5f | Change subsampling factor from 1 to 2 | 2022-09-28 21:10:13 +08:00
Daniel Povey | 14a2603ada | Bug fix | 2022-09-28 20:59:24 +08:00
Daniel Povey | e5666628bd | Bug fix | 2022-09-28 20:58:34 +08:00
Daniel Povey | df795912ed | Try to reproduce the baseline with current code and 2 encoder stacks | 2022-09-28 20:56:40 +08:00
Fangjun Kuang | 3b5846effa | Update kaldifeat in CI tests (#583) | 2022-09-28 20:51:06 +08:00
Daniel Povey | 1005ff35ba | Fix w.r.t. uneven upsampling | 2022-09-28 13:57:26 +08:00
Daniel Povey | 10a3061025 | Simplify downsampling and upsampling | 2022-09-28 13:49:11 +08:00
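1005ff35ba and 10a3061025 concern simplifying temporal downsampling/upsampling and fixing behavior for uneven (odd) sequence lengths. A toy sketch of the general problem they touch, pair-averaging downsampling paired with an upsampling that trims back to the original odd length; this is not the repo's code:

```python
import torch

def downsample2(x: torch.Tensor) -> torch.Tensor:
    """Sketch: halve the time axis of (batch, time, channel) by
    averaging adjacent pairs of frames, padding if time is odd."""
    b, t, c = x.shape
    if t % 2 == 1:                       # uneven length: repeat last frame
        x = torch.cat([x, x[:, -1:]], dim=1)
    return x.reshape(b, -1, 2, c).mean(dim=2)

def upsample2(x: torch.Tensor, target_len: int) -> torch.Tensor:
    """Sketch: undo downsample2 by repeating each frame twice, then
    trimming back to the original (possibly odd) length."""
    return x.repeat_interleave(2, dim=1)[:, :target_len]

x = torch.randn(2, 101, 64)              # odd number of frames
y = upsample2(downsample2(x), target_len=101)
assert y.shape == x.shape
```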
Daniel Povey | 01af88c2f6 | Various fixes | 2022-09-27 16:09:30 +08:00
Daniel Povey | d34eafa623 | Closer to working... | 2022-09-27 15:47:58 +08:00
Daniel Povey | e5a0d8929b | Remove unused out_balancer member | 2022-09-27 13:10:59 +08:00
Daniel Povey | 6b12f20995 | Remove out_balancer and out_norm from conv modules | 2022-09-27 12:25:11 +08:00
Daniel Povey | 76e66408c5 | Some cosmetic improvements | 2022-09-27 11:08:44 +08:00
Daniel Povey | 71b3756ada | Use half the dim per head, in self_attn layers. | 2022-09-24 15:40:44 +08:00
Daniel Povey | ce3f59d9c7 | Use dropout in attention, on attn weights. | 2022-09-22 19:18:50 +08:00
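ce3f59d9c7 applies dropout to the attention weights, i.e. to the post-softmax attention matrix rather than to the attention output. This is a standard technique, sketched generically below (not the commit's code); the adjacent 71b3756ada halves the dimension per head, which in a sketch like this would only change the size of q and k:

```python
import torch
import torch.nn.functional as F

def attention_with_weight_dropout(q, k, v, dropout_p=0.1, training=True):
    """Sketch: scaled dot-product attention where dropout is applied
    to the softmax-normalized attention weights, not to the output."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (B, T_q, T_k)
    weights = scores.softmax(dim=-1)
    weights = F.dropout(weights, p=dropout_p, training=training)
    return weights @ v                               # (B, T_q, d_v)

q = torch.randn(2, 10, 32)
k = torch.randn(2, 12, 32)
v = torch.randn(2, 12, 32)
out = attention_with_weight_dropout(q, k, v)
```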