

README.md

Introduction

This recipe contains several ASR models trained on AISHELL-2.

In AISHELL-2, 1000 hours of clean read-speech data from iOS is published, which is free for academic usage. On top of AISHELL-2 corpus, an improved recipe is developed and released, containing key components for industrial applications, such as Chinese word segmentation, flexible vocabulary expansion and phone set transformation, etc. Pipelines support various state-of-the-art techniques, such as time-delayed neural networks and Lattice-Free MMI objective function. In addition, we also release dev and test data from other channels (Android and Mic).

(From AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale)

./RESULTS.md contains the latest results.

Transducers

Several folders in this directory contain the word transducer in their names. The following table lists the differences among them.

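|                                | Encoder              | Decoder            | Comment                                                     |
|--------------------------------|----------------------|--------------------|-------------------------------------------------------------|
| `pruned_transducer_stateless5` | Conformer (modified) | Embedding + Conv1d | same as `pruned_transducer_stateless5` in librispeech recipe |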

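The decoder in `transducer_stateless` is modified from the paper RNN-Transducer with Stateless Prediction Network. We place an additional Conv1d layer right after the input embedding layer, so each decoder output depends only on a short, fixed window of previous tokens rather than on a recurrent state.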
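
The "Embedding + Conv1d" decoder described above can be sketched in PyTorch as follows. This is a minimal illustration, not the recipe's actual code: the class name `StatelessDecoder`, the parameter names, the depthwise convolution (`groups=embed_dim`), the default `context_size=2`, and the final ReLU are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StatelessDecoder(nn.Module):
    """Hypothetical sketch of a stateless decoder: Embedding followed by Conv1d."""

    def __init__(self, vocab_size: int, embed_dim: int,
                 context_size: int = 2, blank_id: int = 0):
        super().__init__()
        # Assumption: the blank token is used as the padding index,
        # so its embedding is a fixed all-zero vector.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=blank_id)
        self.context_size = context_size
        # Assumption: a depthwise Conv1d whose kernel spans `context_size` tokens.
        self.conv = nn.Conv1d(
            embed_dim, embed_dim,
            kernel_size=context_size,
            groups=embed_dim,
            bias=False,
        )

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (N, U) tensor of token ids
        emb = self.embedding(y)              # (N, U, D)
        emb = emb.permute(0, 2, 1)           # (N, D, U), the layout Conv1d expects
        # Left-pad so position u only sees tokens u - context_size + 1 .. u.
        emb = F.pad(emb, (self.context_size - 1, 0))
        out = self.conv(emb)                 # (N, D, U)
        return F.relu(out.permute(0, 2, 1))  # (N, U, D)


decoder = StatelessDecoder(vocab_size=10, embed_dim=8, context_size=2)
tokens = torch.tensor([[0, 3, 5, 7]])       # one sequence of 4 token ids
output = decoder(tokens)                    # shape: (1, 4, 8)
```

Because the convolution is left-padded, changing a later token does not affect earlier outputs, which is what makes the decoder "stateless" in the sense of having no recurrent hidden state.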