- Introduce unified AMP helpers (create_grad_scaler, torch_autocast) to handle
deprecations in PyTorch ≥2.3.0
- Replace direct uses of torch.cuda.amp.GradScaler and torch.cuda.amp.autocast
with the new utilities across all training and inference scripts
- Update all torch.load calls to include weights_only=False for compatibility with
newer PyTorch versions
* add decode seamlessm4t
* add requirements
* add decoding with avg model
* add token files
* add custom tokenizer
* support deepspeed to finetune large model
* support large-v3
* add model saving
* using monkey patch to replace models
* add manifest dir option