Introduction

This is an audio tagging recipe for AudioSet. It aims to predict the sound events present in an audio clip.
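Because a clip can contain several sound events at once, tagging is a multi-label problem. The following is a minimal sketch of how per-class scores could be turned into a set of predicted events. It is not the recipe's actual decoding code: the function `tag_clip`, the 0.5 threshold, and the example logits are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tag_clip(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose probability reaches the threshold.

    Each class is scored independently (multi-label), so zero, one,
    or many labels may be returned for a single clip.
    Labels are sorted from most to least probable.
    """
    probs = {label: sigmoid(score) for label, score in logits.items()}
    kept = [label for label, p in probs.items() if p >= threshold]
    return sorted(kept, key=lambda label: -probs[label])


# Hypothetical model outputs for one clip (not real results).
example_logits = {"Speech": 2.1, "Music": 0.3, "Dog": -3.0}
predicted = tag_clip(example_logits)
print(predicted)
```

Raising `threshold` trades recall for precision: with the hypothetical scores above, a threshold of 0.8 would keep only the highest-scoring label.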

./RESULTS.md contains the latest results.

Zipformer

Encoder      Feature type
Zipformer    Frame level fbank
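"Frame level" means the encoder receives one fbank feature vector per short analysis frame rather than one vector per clip. As a rough sketch, the number of frames for a clip can be estimated as below. The 16 kHz sample rate, 25 ms window, and 10 ms shift are common defaults assumed here for illustration; the recipe's actual feature configuration is not stated in this section, and `num_fbank_frames` is a hypothetical helper.

```python
def num_fbank_frames(
    num_samples: int,
    sample_rate: int = 16000,        # assumed, not taken from the recipe
    frame_length_ms: float = 25.0,   # assumed window length
    frame_shift_ms: float = 10.0,    # assumed hop between frames
) -> int:
    """Count the full analysis windows that fit in the signal (no padding)."""
    window = int(sample_rate * frame_length_ms / 1000)
    shift = int(sample_rate * frame_shift_ms / 1000)
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


# A 10-second clip at the assumed 16 kHz sample rate.
frames_10s = num_fbank_frames(10 * 16000)
print(frames_10s)
```

Under these assumptions a 10-second clip yields just under 1000 frames, each of which becomes one row of the encoder input.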