add readme and results

2025-12-10 22:45:27 +00:00 · 2024-03-29 17:31:33 +08:00 · 2024-03-29 17:31:33 +08:00 · 39e7de47b1
commit 39e7de47b1
parent 9e9bc7593e
2 changed files with 56 additions and 0 deletions
--- a/egs/audioset/AT/README.md
+++ b/egs/audioset/AT/README.md
@@ -0,0 +1,12 @@
+# Introduction
+
+This is an audio tagging recipe: given an audio clip, it predicts which sound events are present in it.
+
+[./RESULTS.md](./RESULTS.md) contains the latest results.
+
+
+# Zipformer
+
+| Encoder | Feature type |
+| --------| -------------|
+| Zipformer | Frame-level fbank |
--- a/egs/audioset/AT/RESULTS.md
+++ b/egs/audioset/AT/RESULTS.md
@@ -0,0 +1,44 @@
+## Results
+
+### zipformer
+See <https://github.com/k2-fsa/icefall/pull/1421> for more details.
+
+[zipformer](./zipformer)
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+<https://huggingface.co/marcoyang/icefall-audio-tagging-audioset-zipformer-2024-03-12#/>
+
+The model achieves the following mean average precision (mAP) on AudioSet:
+
+| Model | mAP (%) |
+| ------ | ------- |
+| Zipformer-AT | 45.1 |
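A note on the metric: on AudioSet, mAP is conventionally the unweighted mean, over all event classes (527 here, matching `--num-events 527`), of each class's average precision (AP), where AP summarises the precision-recall curve of that class's scores across the evaluation clips. The shell sketch below computes a simple non-interpolated AP on toy data with `sort` and `awk`. The helper names `average_precision` and `mean_ap` and all numbers are made up for illustration, and we assume, without having checked `zipformer/evaluate.py`, that the recipe's metric follows this definition up to tie handling.

```bash
#!/usr/bin/env bash
# Sketch: non-interpolated average precision (AP) for ONE class.
# Input on stdin: one "score label" pair per clip (label 1 = event present, 0 = absent).
# Tied scores are not treated specially, so results can differ slightly from
# library implementations (e.g. scikit-learn) when scores repeat.
average_precision() {
  LC_ALL=C sort -k1,1gr | awk '
    { rank++; if ($2 == 1) { tp++; sum += tp / rank } }   # precision at each positive
    END { if (tp > 0) printf "%.4f\n", sum / tp; else print "nan" }'
}

# Sketch: mAP as the unweighted mean of per-class AP values (one per line on stdin).
mean_ap() {
  awk '{ s += $1; n++ } END { if (n > 0) printf "%.4f\n", s / n }'
}

# Toy example: two classes, four clips each (hypothetical scores and labels).
ap1=$(printf '0.9 1\n0.8 0\n0.7 1\n0.1 0\n' | average_precision)
ap2=$(printf '0.6 0\n0.5 1\n0.4 0\n0.2 1\n' | average_precision)
map=$(printf '%s\n%s\n' "$ap1" "$ap2" | mean_ap)
echo "AP1=$ap1 AP2=$ap2 mAP=$map"
```

For class 1 the positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 ≈ 0.8333; for class 2 they sit at ranks 2 and 4, giving AP = (1/2 + 2/4) / 2 = 0.5. A reported mAP of 45.1 corresponds to 0.451 on this 0-to-1 scale.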
+
+The training command is:
+
+```bash
+export CUDA_VISIBLE_DEVICES="4,5,6,7"
+subset=full
+
+python zipformer/train.py \
+    --world-size 4 \
+    --num-epochs 50 \
+    --exp-dir zipformer/exp_at_as_${subset} \
+    --start-epoch 1 \
+    --use-fp16 1 \
+    --num-events 527 \
+    --audioset-subset $subset \
+    --max-duration 1000 \
+    --enable-musan True \
+    --master-port 13455
+```
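In icefall's DDP training, as we understand it, `--world-size` is the number of training processes, one per GPU, so it has to agree with the devices exposed through `CUDA_VISIBLE_DEVICES` (four IDs above, hence `--world-size 4`). A minimal sketch, assuming one process per visible GPU, that derives the value instead of hard-coding it and assembles the same command line as a bash array; the variables `gpus`, `world_size` and `cmd` are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: keep --world-size in sync with CUDA_VISIBLE_DEVICES.
# ASSUMPTION: one training process per visible GPU.
export CUDA_VISIBLE_DEVICES="4,5,6,7"
subset=full

IFS=',' read -r -a gpus <<< "$CUDA_VISIBLE_DEVICES"   # "4,5,6,7" -> (4 5 6 7)
world_size=${#gpus[@]}                                 # number of listed GPUs
exp_dir="zipformer/exp_at_as_${subset}"

# Same flags as the command above, with --world-size computed.
cmd=(python zipformer/train.py
  --world-size "$world_size"
  --num-epochs 50
  --exp-dir "$exp_dir"
  --start-epoch 1
  --use-fp16 1
  --num-events 527
  --audioset-subset "$subset"
  --max-duration 1000
  --enable-musan True
  --master-port 13455)

echo "${cmd[*]}"   # inspect the command first; launch it with: "${cmd[@]}"
```

Building the command as an array keeps each argument intact when it is later executed with `"${cmd[@]}"`, and changing the GPU list then updates `--world-size` automatically.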
+
+The evaluation command is:
+
+```bash
+python zipformer/evaluate.py \
+    --epoch 32 \
+    --avg 8 \
+    --exp-dir zipformer/exp_at_as_full \
+    --max-duration 500
+```
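`--epoch 32 --avg 8` evaluates an average of model weights rather than a single checkpoint. Assuming the usual icefall convention that the averaging window covers the `--avg` epochs ending at `--epoch`, the sketch below lists the eight checkpoints that window spans (epochs 25 to 32). The `epoch-N.pt` file naming is an assumption based on other icefall recipes; `zipformer/evaluate.py` is the authority on how the average is actually formed, and it may read a different but equivalent set of files.

```bash
#!/usr/bin/env bash
# Sketch: which checkpoints does "--epoch 32 --avg 8" cover?
# ASSUMPTION: the window is the last $avg epochs ending at $epoch, and checkpoints
# are saved as "$exp_dir/epoch-N.pt" (naming taken from other icefall recipes).
epoch=32
avg=8
exp_dir=zipformer/exp_at_as_full

start=$((epoch - avg + 1))   # first epoch in the window: 32 - 8 + 1 = 25
ckpts=()
for ((e = start; e <= epoch; e++)); do
  ckpts+=("${exp_dir}/epoch-${e}.pt")
done

printf '%s\n' "${ckpts[@]}"

# Report any checkpoint that is not present in the experiment directory.
for f in "${ckpts[@]}"; do
  [[ -f "$f" ]] || echo "missing: $f" >&2
done
```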