diff --git a/egs/audioset/AT/README.md b/egs/audioset/AT/README.md
new file mode 100644
index 000000000..368188325
--- /dev/null
+++ b/egs/audioset/AT/README.md
@@ -0,0 +1,12 @@
+# Introduction
+
+This is an audio tagging recipe. It aims to predict the sound events present in an audio clip.
+
+[./RESULTS.md](./RESULTS.md) contains the latest results.
+
+
+# Zipformer
+
+| Encoder   | Feature type      |
+| --------- | ----------------- |
+| Zipformer | Frame-level fbank |
diff --git a/egs/audioset/AT/RESULTS.md b/egs/audioset/AT/RESULTS.md
new file mode 100644
index 000000000..0c75dfe4e
--- /dev/null
+++ b/egs/audioset/AT/RESULTS.md
@@ -0,0 +1,44 @@
+## Results
+
+### zipformer
+
+See [zipformer](./zipformer) for more details.
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+
+
+The model achieves the following mean average precision (mAP) on AudioSet:
+
+| Model        | mAP  |
+| ------------ | ---- |
+| Zipformer-AT | 45.1 |
+
+The training command is:
+
+```bash
+export CUDA_VISIBLE_DEVICES="4,5,6,7"
+subset=full
+
+python zipformer/train.py \
+  --world-size 4 \
+  --num-epochs 50 \
+  --exp-dir zipformer/exp_at_as_${subset} \
+  --start-epoch 1 \
+  --use-fp16 1 \
+  --num-events 527 \
+  --audioset-subset $subset \
+  --max-duration 1000 \
+  --enable-musan True \
+  --master-port 13455
+```
+
+The evaluation command is:
+
+```bash
+python zipformer/evaluate.py \
+  --epoch 32 \
+  --avg 8 \
+  --exp-dir zipformer/exp_at_as_full \
+  --max-duration 500
+```
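+
+For reference, the mAP above is the mean over the 527 AudioSet event classes (cf. `--num-events 527`) of the per-class average precision, computed from the model's clip-level output probabilities. The recipe computes it via `zipformer/evaluate.py` above; the following is only a minimal sketch of the same idea, assuming scikit-learn is available and using random placeholder arrays:
+
+```python
+import numpy as np
+from sklearn.metrics import average_precision_score
+
+# Placeholder data: N clips, 527 event classes.
+# y_true holds multi-hot ground-truth labels, y_score the predicted probabilities.
+rng = np.random.default_rng(0)
+y_true = rng.integers(0, 2, size=(100, 527))
+y_score = rng.random((100, 527))
+
+# Average precision per class, then the mean over classes that occur
+# in the ground truth (a class with no positives has undefined AP).
+aps = [
+    average_precision_score(y_true[:, c], y_score[:, c])
+    for c in range(y_true.shape[1])
+    if y_true[:, c].any()
+]
+print(f"mAP: {np.mean(aps):.4f}")
+```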
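+
+The README lists the feature type as frame-level fbank, i.e. one log-mel filterbank vector per short frame rather than a single clip-level feature. As an illustration only (this is not the recipe's feature pipeline, and the 80 mel bins are an assumption), such features can be computed with torchaudio's Kaldi-compatible frontend:
+
+```python
+import torchaudio
+import torchaudio.compliance.kaldi as kaldi
+
+# "clip.wav" is a placeholder path.
+waveform, sample_rate = torchaudio.load("clip.wav")
+fbank = kaldi.fbank(
+    waveform,
+    num_mel_bins=80,          # assumed; common in icefall recipes
+    frame_length=25.0,        # window length in ms
+    frame_shift=10.0,         # hop in ms, i.e. one feature vector every 10 ms
+    sample_frequency=sample_rate,
+)
+print(fbank.shape)  # (num_frames, num_mel_bins)
+```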