Daniel Povey
b3af9f67ae
Implement efficient layer dropout
2022-10-03 17:19:16 +08:00
Daniel Povey
93dff29243
Introduce a scale dependent on the masking value
2022-10-03 14:34:37 +08:00
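The two entries above add layer dropout: during training a whole module is sometimes skipped, and when it does run, its output is rescaled as a function of the masking probability so the expected contribution stays constant. A minimal sketch of that idea, assuming a residual wrapper; the class name and structure are illustrative, not the repository's actual code:

```python
import torch
import torch.nn as nn

class LayerDropout(nn.Module):
    """Hypothetical batch-level layer dropout: with probability skip_prob
    the wrapped module is bypassed for the whole batch (saving its compute);
    when it runs, the output is scaled by 1/(1 - skip_prob) so the expected
    contribution matches evaluation mode."""

    def __init__(self, module: nn.Module, skip_prob: float = 0.1):
        super().__init__()
        self.module = module
        self.skip_prob = skip_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_prob:
            return x  # skipped: identity, the module is never evaluated
        scale = 1.0 / (1.0 - self.skip_prob) if self.training else 1.0
        return x + scale * self.module(x)

layer = LayerDropout(nn.Linear(16, 16), skip_prob=0.1)
y = layer(torch.randn(4, 16))
```

Skipping at the batch level (rather than per example) is what makes this "efficient": a skipped layer costs nothing, whereas per-example masking still evaluates the module for the whole batch.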
Daniel Povey
5a8995328f
Stop backprop bug
2022-10-03 13:33:01 +08:00
Daniel Povey
a0a1874415
Bug fix
2022-10-03 13:23:26 +08:00
Daniel Povey
c20fc3be14
Randomize order of some modules
2022-10-03 13:02:42 +08:00
Daniel Povey
1be455438a
Decrease feature_mask_dropout_prob back from 0.2 to 0.15, i.e. revert the 43->48 change.
2022-10-02 14:00:36 +08:00
shcxlee
bf2c4a488e
Modified train.py of tedlium3 models (#597)
2022-10-02 13:01:15 +08:00
Daniel Povey
cf5f7e5dfd
Swap random_prob and single_prob, to reduce prob of being randomized.
2022-10-01 23:50:38 +08:00
Daniel Povey
8d517a69e4
Increase feature_mask_dropout_prob from 0.15 to 0.2.
2022-10-01 23:32:24 +08:00
Daniel Povey
e9326a7d16
Remove the dropout inside ConformerEncoderLayer that was applied when adding to residuals
2022-10-01 13:13:10 +08:00
Daniel Povey
cc64f2f15c
Reduce feature_mask_dropout_prob from 0.25 to 0.15.
2022-10-01 12:24:07 +08:00
Daniel Povey
1eb603f4ad
Reduce single_prob from 0.5 to 0.25
2022-09-30 22:14:53 +08:00
Daniel Povey
ab7c940803
Include changes from Liyong about padding in the conformer module.
2022-09-30 18:37:31 +08:00
Daniel Povey
38f89053bd
Introduce feature mask per frame
2022-09-29 17:31:04 +08:00
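"Feature mask per frame" suggests dropping features with a mask that varies frame by frame rather than one mask shared across the utterance; the feature_mask_dropout_prob tuned in several entries above would be the drop probability. A rough sketch under that assumption (not the repository's implementation):

```python
import torch

def feature_mask_per_frame(x: torch.Tensor, dropout_prob: float = 0.15) -> torch.Tensor:
    """x: (batch, time, feature). Zero each feature of each frame
    independently with probability dropout_prob; intended for training only."""
    keep = (torch.rand_like(x) > dropout_prob).to(x.dtype)
    return x * keep

x = torch.randn(2, 100, 80)
masked = feature_mask_per_frame(x, dropout_prob=0.15)
```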
Daniel Povey
056b9a4f9a
Apply single_prob mask, so sometimes we just get one layer as output.
2022-09-29 15:29:37 +08:00
Daniel Povey
d8f7310118
Add print statement
2022-09-29 14:15:29 +08:00
Daniel Povey
d398f0ed70
Decrease random_prob from 0.5 to 0.333
2022-09-29 13:55:33 +08:00
Daniel Povey
461ad3655a
Implement AttentionCombine as replacement for RandomCombine
2022-09-29 13:44:03 +08:00
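RandomCombine-style modules merge the outputs of several encoder layers into one; replacing random selection with attention means learning per-frame softmax weights over the stacked outputs. A hedged sketch of what an AttentionCombine could look like (the internals are guesses; the single_prob masking that, per the entry further up, sometimes forces a single layer to be the output is omitted):

```python
import torch
import torch.nn as nn
from typing import List

class AttentionCombine(nn.Module):
    """Combine same-shaped layer outputs with learned per-frame weights."""

    def __init__(self, num_channels: int, num_inputs: int):
        super().__init__()
        self.weight = nn.Linear(num_channels * num_inputs, num_inputs)

    def forward(self, inputs: List[torch.Tensor]) -> torch.Tensor:
        # inputs: one (batch, time, channels) tensor per layer.
        stacked = torch.stack(inputs, dim=-1)           # (B, T, C, N)
        flat = stacked.flatten(start_dim=2)             # (B, T, C*N)
        scores = self.weight(flat).softmax(dim=-1)      # (B, T, N)
        return (stacked * scores.unsqueeze(2)).sum(-1)  # (B, T, C)

combine = AttentionCombine(num_channels=256, num_inputs=4)
y = combine([torch.randn(2, 50, 256) for _ in range(4)])
```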
Zengwei Yao
f3ad32777a
Gradient filter for training LSTM model (#564)
* init files
* add gradient filter module
* refactor getting median value
* add cutoff for grad filter
* delete comments
* apply gradient filter in LSTM module, to filter both input and params
* fix typing and refactor
* filter with soft mask
* rename lstm_transducer_stateless2 to lstm_transducer_stateless3
* fix typos, and update RESULTS.md
* minor fix
* fix return typing
* fix typo
2022-09-29 11:15:43 +08:00
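The PR above filters gradients against outliers: per-row gradient norms are compared to their median, and rows far above a cutoff are scaled down with a soft mask rather than zeroed. A minimal sketch of that recipe; the function name and threshold are illustrative, and in the PR the filtering happens inside the LSTM module's backward pass rather than as a standalone call:

```python
import torch

def soft_gradient_filter(grad: torch.Tensor, threshold: float = 10.0) -> torch.Tensor:
    """Scale down gradient rows whose norm exceeds `threshold` times the
    median row norm; rows below the cutoff pass through unchanged."""
    norms = grad.norm(dim=-1, keepdim=True)
    cutoff = threshold * norms.median()
    # Soft mask: 1 below the cutoff, decaying as cutoff / norm above it.
    mask = torch.clamp(cutoff / norms.clamp(min=1e-20), max=1.0)
    return grad * mask

g = torch.randn(8, 512)
g[0] *= 100.0                      # simulate one exploding row
filtered = soft_gradient_filter(g)
```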
LIyong.Guo
923b60a7c6
Padding zeros (#591)
2022-09-28 21:20:33 +08:00
Daniel Povey
d6ef1bec5f
Change subsampling factor from 1 to 2
2022-09-28 21:10:13 +08:00
Daniel Povey
14a2603ada
Bug fix
2022-09-28 20:59:24 +08:00
Daniel Povey
e5666628bd
Bug fix
2022-09-28 20:58:34 +08:00
Daniel Povey
df795912ed
Try to reproduce the baseline, but with current code and 2 encoder stacks
2022-09-28 20:56:40 +08:00
Fangjun Kuang
3b5846effa
Update kaldifeat in CI tests (#583)
2022-09-28 20:51:06 +08:00
Daniel Povey
1005ff35ba
Fix w.r.t. uneven upsampling
2022-09-28 13:57:26 +08:00
Daniel Povey
10a3061025
Simplify downsampling and upsampling
2022-09-28 13:49:11 +08:00
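The two entries above simplify downsampling/upsampling and fix the case where the sequence length is not divisible by the factor. A sketch of one way to make uneven lengths round-trip, assuming average-pool downsampling and repeat upsampling (the repository's scheme may differ):

```python
import torch

def downsample(x: torch.Tensor, factor: int) -> torch.Tensor:
    """x: (batch, time, channels). Average every `factor` frames, padding
    the tail by repeating the last frame when time % factor != 0."""
    b, t, c = x.shape
    pad = (-t) % factor
    if pad > 0:
        x = torch.cat([x, x[:, -1:].expand(b, pad, c)], dim=1)
    return x.reshape(b, -1, factor, c).mean(dim=2)

def upsample(x: torch.Tensor, factor: int, target_len: int) -> torch.Tensor:
    """Repeat each frame `factor` times, then truncate to `target_len`
    so uneven lengths are restored exactly."""
    return x.repeat_interleave(factor, dim=1)[:, :target_len]

x = torch.randn(2, 101, 64)                      # 101 is not divisible by 2
up = upsample(downsample(x, 2), 2, x.shape[1])   # back to (2, 101, 64)
```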
Daniel Povey
01af88c2f6
Various fixes
2022-09-27 16:09:30 +08:00
Daniel Povey
d34eafa623
Closer to working.
2022-09-27 15:47:58 +08:00
Daniel Povey
e5a0d8929b
Remove unused out_balancer member
2022-09-27 13:10:59 +08:00
Daniel Povey
6b12f20995
Remove out_balancer and out_norm from conv modules
2022-09-27 12:25:11 +08:00
Daniel Povey
76e66408c5
Some cosmetic improvements
2022-09-27 11:08:44 +08:00
Daniel Povey
71b3756ada
Use half the dim per head in self_attn layers.
2022-09-24 15:40:44 +08:00
Daniel Povey
ce3f59d9c7
Use dropout in attention, on attn weights.
2022-09-22 19:18:50 +08:00
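Dropout on the attention weights means dropping entries of the post-softmax weight matrix rather than the attention output. A self-contained sketch of scaled dot-product attention with that placement:

```python
import torch
import torch.nn.functional as F

def attention_with_weight_dropout(q, k, v, dropout_p=0.1, training=True):
    """q, k, v: (batch, heads, time, head_dim). Dropout is applied to the
    softmaxed attention weights, as in the entry above."""
    weights = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    weights = F.dropout(weights.softmax(dim=-1), p=dropout_p, training=training)
    return weights @ v

q = k = v = torch.randn(2, 4, 10, 32)
out = attention_with_weight_dropout(q, k, v)
```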
Daniel Povey
24aea947d2
Fix issues where grad is None, and unused-grad cases
2022-09-22 19:18:16 +08:00
Daniel Povey
c16f795962
Avoid error in DDP by using last module's scores
2022-09-22 18:52:16 +08:00
Daniel Povey
0f85a3c2e5
Implement persistent attention scores
2022-09-22 18:47:16 +08:00
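One reading of "persistent attention scores" is that attention weights computed in one layer are kept and reused by later layers, which then only need a value projection; the DDP fix above (using the last module's scores) fits that reading. A speculative sketch under that assumption:

```python
import torch
import torch.nn as nn

class ValueOnlyAttention(nn.Module):
    """Hypothetical layer that reuses attention weights computed elsewhere
    instead of recomputing Q and K; only value/output projections remain."""

    def __init__(self, dim: int):
        super().__init__()
        self.value = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, attn_weights: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); attn_weights: (batch, time, time), rows sum to 1.
        return self.out(attn_weights @ self.value(x))

x = torch.randn(2, 10, 64)
w = torch.softmax(torch.randn(2, 10, 10), dim=-1)
y = ValueOnlyAttention(64)(x, w)
```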
Daniel Povey
03a77f8ae5
Merge branch 'scaled_adam_exp7c' into scaled_adam_exp11c
2022-09-22 18:15:44 +08:00
Daniel Povey
ceadfad48d
Reduce debug freq
2022-09-22 12:30:49 +08:00
Daniel Povey
1d20c12bc0
Increase max_var_per_eig to 0.2
2022-09-22 12:28:35 +08:00
Fangjun Kuang
9ae2f3a3c5
Small fixes to the transducer training doc (#575)
2022-09-21 14:20:49 +08:00
Fangjun Kuang
099cd3a215
Support exporting to ncnn format via PNNX (#571)
2022-09-20 22:52:49 +08:00
Daniel Povey
e2fdfe990c
Loosen limit on param_max_rms, from 2.0 to 3.0; change how param_min_rms is applied.
2022-09-20 15:20:43 +08:00
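param_min_rms and param_max_rms bound the root-mean-square of each parameter tensor inside the optimizer. A sketch of a plain RMS clamp to show what the limits mean; the actual optimizer applies the minimum differently (that is what the entry above changes), so this is illustrative only:

```python
import torch

def clamp_param_rms(p: torch.Tensor,
                    param_min_rms: float = 1e-5,
                    param_max_rms: float = 3.0) -> None:
    """Rescale a parameter tensor in place so its RMS lies in
    [param_min_rms, param_max_rms]."""
    rms = p.detach().pow(2).mean().sqrt().clamp(min=1e-20)
    target = rms.clamp(min=param_min_rms, max=param_max_rms)
    p.detach().mul_(target / rms)

p = torch.nn.Parameter(torch.randn(256, 256) * 10.0)
clamp_param_rms(p)   # RMS is now at most param_max_rms
```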
Daniel Povey
6eb9a0bc9b
Halve max_var_per_eig to 0.05
2022-09-20 14:39:17 +08:00
Daniel Povey
cd5ac76a05
Add max-var-per-eig in encoder layers
2022-09-20 14:22:07 +08:00
Daniel Povey
db1f4ccdd1
4x scale on max-eig constraint
2022-09-20 14:20:13 +08:00
Teo Wen Shen
436942211c
Adding Dockerfile for Ubuntu18.04-pytorch1.12.1-cuda11.3-cudnn8 (#572)
* Changed Dockerfile
* Update Dockerfile
* Dockerfile
* Update README.md
* Add Dockerfiles
* Update README.md
Removed misleading CUDA version, as the Ubuntu18.04-pytorch1.7.1-cuda11.0-cudnn8 Dockerfile can only support CUDA versions >11.0.
2022-09-20 10:52:24 +08:00
Daniel Povey
3d72a65de8
Implement max-eig-proportion.
2022-09-19 10:26:37 +08:00
Daniel Povey
5f27cbdb44
Merge branch 'scaled_adam_exp4_max_var_per_eig' into scaled_adam_exp7
# Conflicts:
# egs/librispeech/ASR/pruned_transducer_stateless7/conformer.py
2022-09-18 21:23:59 +08:00
Daniel Povey
0f567e27a5
Add max_var_per_eig in self-attn
2022-09-18 21:22:01 +08:00
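The max_var_per_eig entries throughout this log constrain how much of the activations' total variance may concentrate in any single eigendirection of their covariance, discouraging representations from collapsing onto one dominant direction. A guess at how that proportion could be measured as a penalty (the repository enforces the constraint inside the model; this formula is an assumption):

```python
import torch

def max_var_per_eig_penalty(x: torch.Tensor, limit: float = 0.2) -> torch.Tensor:
    """x: (frames, channels). Penalize the largest covariance eigenvalue
    exceeding `limit` times the total variance (the trace)."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)              # ascending, real
    proportion = eigs[-1] / eigs.sum().clamp(min=1e-20)
    return (proportion - limit).clamp(min=0.0)

penalty = max_var_per_eig_penalty(torch.randn(1000, 64), limit=0.2)
```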