Han Zhu
ab91112909
Improve infinity-check (#1862)
1. Attach the inf-check hooks if the grad scale is getting too small.
2. Add try-catch to avoid OOM in the inf-check hooks.
3. Set warmup_start=0.1 to reduce the chance of divergence.
2025-01-09 15:05:38 +08:00
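
The three changes in this commit can be illustrated with a small, framework-free sketch. It is not the code from the commit: the threshold value, the function names (`should_attach_inf_check`, `make_safe_inf_check`, `warmup_factor`), the use of plain float lists instead of tensors, and the linear shape of the warmup are all assumptions made for illustration.

```python
import math
from typing import Callable, Iterable, List

# Hypothetical threshold: the commit message only says "too small".
GRAD_SCALE_THRESHOLD = 0.01


def should_attach_inf_check(grad_scale: float,
                            threshold: float = GRAD_SCALE_THRESHOLD) -> bool:
    """Item 1: enable the inf-check hooks only once the grad scale has collapsed."""
    return grad_scale < threshold


def make_safe_inf_check(name: str,
                        report: List[str]) -> Callable[[Iterable[float]], None]:
    """Item 2: build a hook that records `name` when it sees a non-finite value.

    The check is wrapped in try/except so that running out of memory inside
    the diagnostic hook skips the check instead of stopping training.
    """
    def hook(values: Iterable[float]) -> None:
        try:
            if not all(math.isfinite(v) for v in values):
                report.append(name)
        except MemoryError:
            # Assumption: losing one diagnostic is better than crashing the run.
            pass
    return hook


def warmup_factor(step: int, warmup_steps: int,
                  warmup_start: float = 0.1) -> float:
    """Item 3: scale factor that rises linearly from warmup_start to 1.0.

    The linear shape is an assumption; only warmup_start=0.1 is from the commit.
    """
    if step >= warmup_steps:
        return 1.0
    return warmup_start + (1.0 - warmup_start) * step / warmup_steps


if __name__ == "__main__":
    flagged: List[str] = []
    if should_attach_inf_check(grad_scale=0.001):
        check = make_safe_inf_check("encoder.layer0", flagged)
        check([0.5, float("inf")])
    print(flagged)
    print(warmup_factor(0, 100), warmup_factor(100, 100))
```

In a real training loop the hook would typically be registered on modules or tensors rather than called by hand, and the out-of-memory exception to catch would be the one raised by the framework in use.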