Daniel Povey
a61e21ac85
Change beta to 0.9
2022-06-09 23:33:05 +08:00
Daniel Povey
2c5ebc065e
Change eps to 1e-20
2022-06-09 23:24:33 +08:00
Daniel Povey
c533f91fa2
Remove one line..
2022-06-09 23:13:16 +08:00
Daniel Povey
0fd2cb141f
Code cleanup and refactoring
2022-06-09 22:54:56 +08:00
Daniel Povey
2621cb7f54
Change beta to 0.8
2022-06-09 20:17:12 +08:00
Daniel Povey
082a890635
Fix apply_prob_decay to 500
2022-06-09 19:20:03 +08:00
Daniel Povey
fca844d80c
Make it really have 2k decay and revert to 0.02 scale
2022-06-09 17:45:11 +08:00
Daniel Povey
e99344f15e
Increase scale to 0.04
2022-06-09 13:24:31 +08:00
Daniel Povey
bfcd288afd
Decrease scale on decorrelate component from 0.02 to 0.01
2022-06-09 12:07:19 +08:00
Daniel Povey
56d6dd55ae
Bug fixes
2022-06-09 12:06:35 +08:00
Daniel Povey
1669e21c0c
Use decorrelation in conformer layers also
2022-06-09 11:31:52 +08:00
Daniel Povey
b9a476c7bb
Remove loss factor from decorr_loss_scale
2022-06-08 20:19:17 +08:00
Daniel Povey
8e56445c70
Try to resolve graph-freed problem
2022-06-08 20:07:35 +08:00
Daniel Povey
46ca1cd4c4
Add Decorrelate module that adds something to gradients in backward pass
2022-06-08 19:44:58 +08:00
Daniel Povey
9fb8645168
Implement JoinDropout
2022-06-08 16:17:42 +08:00
Daniel Povey
e7886d49a9
Bug fix
2022-06-08 11:05:29 +08:00
Daniel Povey
a83bde1372
Simplify implementation as current idea was not working to decorrelate
2022-06-08 10:24:41 +08:00
Daniel Povey
135be1e19c
Change dropout_rate from 0.2 to 0.1; fix logging statement; fix assignment to rand_scales, nonrand_scales to use [:]
2022-06-08 00:42:04 +08:00
Daniel Povey
a6050cb2de
Implement new, more principled but maybe slower version.
2022-06-07 23:38:38 +08:00
Daniel Povey
75c822c7e9
Pre and post-multiply by inv_sqrt_stddev,stddev
2022-06-07 20:32:18 +08:00
Daniel Povey
a270973b69
Add gaussian version of decorrelation
2022-06-07 18:55:48 +08:00
Daniel Povey
5d24489752
Have 2 scales on dropout
2022-06-07 18:31:42 +08:00
Daniel Povey
53ca61db7a
Reduce scale on decorrelation by 5, to 0.01
2022-06-07 17:10:54 +08:00
Daniel Povey
7c6d923d3f
Add decorrelation to joiner
2022-06-07 16:47:54 +08:00
Daniel Povey
cd6b707e2b
Various bug fixes
2022-06-07 16:45:32 +08:00
Daniel Povey
40a0934b4e
Implement GaussProjDrop
2022-06-07 11:51:24 +08:00
Daniel Povey
4352a16f57
Fix bug that relates to modifying U in place
2022-06-06 17:43:15 +08:00
Daniel Povey
31848dcd11
Randomize the projections
2022-06-06 16:07:28 +08:00
Daniel Povey
6fdb356315
Bug fix RE GPU device
2022-06-06 15:40:20 +08:00
Daniel Povey
71e927411a
Implement FixedProjDrop
2022-06-06 15:38:59 +08:00
Daniel Povey
28df3ba43f
Fix bug re half precision
2022-06-05 23:26:59 +08:00
Daniel Povey
d76aedb790
Make it work for half
2022-06-05 23:25:51 +08:00
Daniel Povey
e535887abb
Bug fixes.
2022-06-05 23:24:02 +08:00
Daniel Povey
136ffb0597
Add ProjDrop for axis-independent dropout
2022-06-05 23:00:48 +08:00
Daniel Povey
a1ae2f8fa9
Revert some accidental changes
2022-06-05 11:40:55 +08:00
Zengwei Yao
148f69d8d9
Update RESULTS.md (#388)
* update RESULT.md about pruned_transducer_stateless4
* Update RESULT.md
This PR is only to update RESULT.md about pruned_transducer_stateless4.
* set default value of --use-averaged-model to True
* update RESULTS.md and add decode command
* minor fix
* update export.py
* add uploaded files links
* update link
* fix typos
2022-06-04 15:52:35 +08:00
Daniel Povey
a9a172aa69
Multiply lr by 10; simplify Cain.
2022-06-04 15:48:33 +08:00
Daniel Povey
679972b905
Fix bug; make epsilon work both ways (small+large); increase epsilon to 0.1
2022-06-03 19:37:48 +08:00
Daniel Povey
8085ed6ef9
Turn off natural gradient update for biases.
2022-06-03 18:40:14 +08:00
Daniel Povey
3fff0c75bb
Code cleanup
2022-06-03 11:54:12 +08:00
Daniel Povey
d6e65a0e7f
Remove decompose=True
2022-06-03 11:48:45 +08:00
Daniel Povey
a66a0d84d5
Natural gradient, with power -0.5 (halfway; -1 would be NG)
2022-06-02 14:01:03 +08:00
Daniel Povey
b1f6797af1
Remove some rebalancing code that I am now not going to use.
2022-06-01 22:19:28 +08:00
Daniel Povey
0c73664aef
Reduce threshold to 1024
2022-06-01 14:42:56 +08:00
Fangjun Kuang
fbfc98f1d3
Add streaming Emformer stateless RNN-T. (#390)
* Add streaming Emformer stateless RNN-T.
* Update results for streaming Emformer.
* Minor fixes.
2022-06-01 14:31:47 +08:00
Daniel Povey
ca09b9798f
Remove decomposition code from checkpoint.py; restore double precision model_avg
2022-06-01 14:01:58 +08:00
Daniel Povey
03e07e80ce
More drafts for rebalancing code
2022-06-01 13:58:42 +08:00
Daniel Povey
9c9bf4f1e3
Some drafts of rebalancing code in optim.py
2022-06-01 11:34:19 +08:00
Daniel Povey
bc5c782294
Limit magnitude of linear_pos
2022-06-01 10:40:54 +08:00
Daniel Povey
61619c031e
Add activation balancer to stop activations in self_attn from getting too large
2022-06-01 00:40:45 +08:00