Han Zhu a580f7f794 Improve infinity-check
1. Attach the inf-check hooks if the grad scale is getting too small.
2. Add try-catch to avoid OOM in the inf-check hooks.
3. Set warmup_start=0.1 to reduce the chance of divergence.
2025-01-09 12:33:06 +08:00