Introduction

This recipe trains multi-domain ASR models for AliMeeting. By multi-domain, we mean that a single model is trained on both close-talk and far-field conditions. The recipe optionally uses [GSS]-based enhancement for the far-field array microphones. We pool data from the following four conditions and train a single model on the pooled data:

  (i) Individual headset microphone (IHM)
  (ii) IHM with simulated reverb
  (iii) Single distant microphone (SDM)
  (iv) GSS-enhanced array microphones
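
As a rough illustration of the pooling described above, the sketch below merges per-condition utterance lists into one training pool and tags each entry with its source condition. It is not the recipe's data-preparation code: the domain keys (`ihm`, `ihm_rvb`, `sdm`, `gss`), the `pool_domains` helper, and the utterance ids are all hypothetical names chosen for the example.

```python
from typing import Dict, List

# Hypothetical labels for the four conditions listed above.
DOMAINS = ["ihm", "ihm_rvb", "sdm", "gss"]


def pool_domains(per_domain: Dict[str, List[str]]) -> List[dict]:
    """Merge per-domain utterance lists into a single training pool.

    Each pooled entry keeps a domain tag and a domain-prefixed id, so the
    same utterance recorded under two conditions stays distinguishable.
    """
    pooled = []
    for domain in DOMAINS:
        for utt_id in per_domain.get(domain, []):
            pooled.append({"id": f"{domain}-{utt_id}", "domain": domain})
    return pooled


# Made-up utterance ids, for illustration only.
example = {
    "ihm": ["utt1", "utt2"],
    "ihm_rvb": ["utt1"],
    "sdm": ["utt1"],
    "gss": ["utt1"],
}
pool = pool_domains(example)
print(len(pool))  # 5
```

Prefixing the id with the condition is one simple way to avoid id collisions when the same session appears in several conditions; the actual recipe may handle this differently.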

This differs from alimeeting/ASR, which trains a model only on the far-field audio. Additionally, we apply text normalization similar to that of the original M2MeT challenge, so the results should be more comparable to those in Table 4 of the paper.

The following additional packages need to be installed to run this recipe:

  • pip install jieba
  • pip install paddlepaddle
  • pip install git+https://github.com/desh2608/gss.git
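
Before running the recipe, it may help to confirm that the packages above are importable. The following is a minimal sketch, not part of the recipe. It assumes the import names `jieba`, `paddle` (for the paddlepaddle package), and `gss` (for the gss repository); the `missing_packages` helper is a hypothetical name.

```python
import importlib.util

# Assumed mapping from pip package name to importable module name.
# `paddle` for paddlepaddle and `gss` for the gss repository are assumptions.
REQUIRED = {"jieba": "jieba", "paddlepaddle": "paddle", "gss": "gss"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pkg
        for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All additional packages are importable.")
```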

./RESULTS.md contains the latest results.

Performance Record

pruned_transducer_stateless7

The following results were obtained by decoding with modified_beam_search:

| Evaluation set     | eval WER | test WER |
|--------------------|----------|----------|
| IHM                | 9.58     | 11.53    |
| SDM                | 23.37    | 25.85    |
| MDM (GSS-enhanced) | 11.82    | 14.22    |

See RESULTS for details.
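
For readers unfamiliar with the metric, the error rates above are percentages of token errors (substitutions, deletions, insertions) relative to the number of reference tokens. The sketch below shows one common way to compute such a rate with edit distance. It is not the scoring code used by this recipe; the `edit_distance` and `error_rate` helpers are hypothetical, and per-character tokenization is an assumption made only for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(
                min(
                    prev[j] + 1,             # deletion
                    cur[j - 1] + 1,          # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = cur
    return prev[-1]


def error_rate(refs, hyps):
    """Total edit errors divided by total reference tokens, in percent."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


# One inserted character against a 4-character reference.
ref = list("今天开会")
hyp = list("今天开了会")
print(error_rate([ref], [hyp]))  # 25.0
```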