22 Commits

Author SHA1 Message Date
Daniel Povey
44f4aa5f66 Try to resolve merge issues etc 2022-05-13 11:32:23 +08:00
Daniel Povey
4f933f5413 Merge changes from knowledge_base_1bfast; fix nheads 4->8 2022-05-13 11:26:18 +08:00
Daniel Povey
c4c9b8cf80 Change configuration from M to L 2022-05-12 16:10:08 +08:00
Daniel Povey
0bf538a4a3 Add negentropy_penalty, on individual dims. 2022-05-10 13:20:10 +08:00
Daniel Povey
0f7ff7470f Switch sampling to new C++/CUDA backend 2022-05-05 15:44:04 +08:00
Daniel Povey
eba025a6b4 Mess with thresholds for printing 2022-04-26 10:39:35 +08:00
Daniel Povey
3ba081e6d9 Add more custom_fwd, custom_bwd 2022-04-25 23:58:34 +08:00
Daniel Povey
2c4478b6d1 Fix for half precision 2022-04-25 23:03:34 +08:00
Daniel Povey
e718c7ac88 Remove unnecessary copy 2022-04-25 20:41:00 +08:00
Daniel Povey
f6619a0b20 Remove unnecessary check 2022-04-25 20:37:06 +08:00
Daniel Povey
7d457a7781 Add some diagnostics 2022-04-25 19:34:19 +08:00
Daniel Povey
edaaec09cd Update backprop of sampling.py to be slightly more efficient. 2022-04-25 19:32:11 +08:00
Daniel Povey
bbfa484196 Decrease model size; baseline is the one Fangjun is running. 2022-04-25 17:07:20 +08:00
Daniel Povey
aea116ea25 Change printing-prob, initial scales 2022-04-25 14:02:43 +08:00
Daniel Povey
bb7cb82b04 Some fixes/refactoring, make parameters shared 2022-04-25 13:55:27 +08:00
Daniel Povey
0d40b4617a Add knowledge-base lookup to model 2022-04-25 13:40:47 +08:00
Daniel Povey
a359bfe504 Test with CUDA, bug fixes 2022-04-25 13:19:09 +08:00
Daniel Povey
f8c7e6ffb3 Add some training code. Seems to be training successfully... 2022-04-24 23:19:46 +08:00
Daniel Povey
df39fc6783 Fix devices 2022-04-24 22:48:52 +08:00
Daniel Povey
a266922678 First version of sampling.py, tests run. 2022-04-24 22:29:11 +08:00
Daniel Povey
fe5586e847 Change dirname 2022-04-24 19:51:27 +08:00
Daniel Povey
65cd1059f3 Init pruned2_knowledge dir 2022-04-24 19:50:22 +08:00