Han Zhu
ab91112909
Improve infinity-check (#1862)
1. Attach the inf-check hooks if the grad scale is getting too small.
2. Add try-catch to avoid OOM in the inf-check hooks.
3. Set warmup_start=0.1 to reduce the chance of divergence.
2025-01-09 15:05:38 +08:00
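
The three points in the commit message can be illustrated with a minimal sketch. It is not the code from this commit. It assumes that an "inf-check hook" is a PyTorch forward hook that tests a module's output for non-finite values, that the hooks are attached once the AMP grad scale falls below a threshold, and that `warmup_start` is the initial factor of a linear learning-rate warmup. The names `attach_inf_check_hooks`, `maybe_attach_hooks`, `warmup_factor`, the `report` callback, and the threshold of 0.01 are all hypothetical.

```python
import logging

import torch
import torch.nn as nn


def attach_inf_check_hooks(model, report=None):
    """Attach forward hooks that report modules producing non-finite outputs.

    `report` is a hypothetical callback taking the module name; it defaults to logging.
    """
    if report is None:
        report = lambda name: logging.warning("non-finite output in module %r", name)

    def make_hook(name):
        def hook(_module, _inputs, output):
            if not isinstance(output, torch.Tensor):
                return  # this sketch only checks plain tensor outputs
            try:
                # The check allocates a temporary tensor, so it can itself hit OOM.
                finite = bool(torch.isfinite(output).all())
            except RuntimeError as err:  # CUDA OOM errors subclass RuntimeError
                logging.warning("inf-check skipped for %r: %s", name, err)
                return
            if not finite:
                report(name)
        return hook

    return [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]


def maybe_attach_hooks(model, grad_scale, attached, threshold=0.01, report=None):
    """Attach the hooks once, the first time grad_scale drops below threshold."""
    if attached or grad_scale >= threshold:
        return attached
    attach_inf_check_hooks(model, report)
    return True


def warmup_factor(batch, warmup_batches, warmup_start=0.1):
    """Linear ramp from warmup_start to 1.0 (assumed schedule shape)."""
    if batch >= warmup_batches:
        return 1.0
    return warmup_start + (1.0 - warmup_start) * batch / warmup_batches
```

In a training loop, `maybe_attach_hooks` would be called periodically with the current grad scale, so the per-module checks only add overhead after training starts to look unstable. Each handle returned by `register_forward_hook` has a `remove()` method if the hooks need to be detached later.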