Mirror of https://github.com/k2-fsa/icefall.git, synced 2025-08-26 10:16:14 +00:00
add readme and results
parent 9e9bc7593e
commit 39e7de47b1
12
egs/audioset/AT/README.md
Normal file
@@ -0,0 +1,12 @@
# Introduction

This is an audio tagging recipe. It aims at predicting the sound events of an audio clip.
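
Audio tagging is commonly framed as multi-label classification, since one clip can contain several sound events at once. The sketch below illustrates that framing only; it is not taken from the recipe's code. It assumes each clip's labels arrive as integer event indices, and the helper name `multi_hot` and the example indices are hypothetical. The class count of 527 matches the `--num-events 527` flag in the training command in RESULTS.md.

```python
# Assumption: 527 event classes, as in `--num-events 527` from RESULTS.md.
NUM_EVENTS = 527


def multi_hot(event_ids, num_events=NUM_EVENTS):
    """Return a 0/1 list with a 1.0 at every event index present in the clip."""
    target = [0.0] * num_events
    for idx in event_ids:
        if not 0 <= idx < num_events:
            raise ValueError(f"event index {idx} is out of range")
        target[idx] = 1.0
    return target


# Hypothetical clip tagged with two events at once (indices are made up).
clip_target = multi_hot([0, 137])
```

A model for this task would typically emit one score per event class, which is then compared against a target vector like `clip_target`.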

[./RESULTS.md](./RESULTS.md) contains the latest results.

# Zipformer

| Encoder   | Feature type      |
| --------- | ----------------- |
| Zipformer | Frame-level fbank |
44
egs/audioset/AT/RESULTS.md
Normal file
@@ -0,0 +1,44 @@
## Results

### zipformer

See <https://github.com/k2-fsa/icefall/pull/1421> for more details.

[zipformer](./zipformer)

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/marcoyang/icefall-audio-tagging-audioset-zipformer-2024-03-12#/>

The model achieves the following mean average precision (mAP) on AudioSet:

| Model | mAP (%) |
| ------ | ------- |
| Zipformer-AT | 45.1 |
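
For readers unfamiliar with the metric, the following is a minimal dependency-free sketch of mean average precision for multi-label tagging. It is an illustration under stated assumptions, not necessarily how `zipformer/evaluate.py` computes the number above: labels are 0/1 per clip and class, scores are real-valued, classes without any positive clip are skipped, and the toy data is made up. The function names are hypothetical.

```python
def average_precision(labels, scores):
    """AP for one class: mean of precision@k taken at the rank k of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else float("nan")


def mean_average_precision(label_matrix, score_matrix):
    """Average the per-class AP over classes with at least one positive clip."""
    num_classes = len(label_matrix[0])
    aps = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        if any(labels):  # assumption: classes with no positives are skipped
            aps.append(average_precision(labels, scores))
    return sum(aps) / len(aps)


# Made-up toy data: 4 clips (rows), 2 event classes (columns).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

The result is a fraction between 0 and 1; multiplying by 100 gives a value on the same scale as the table.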

The training command is:

```bash
export CUDA_VISIBLE_DEVICES="4,5,6,7"
subset=full

python zipformer/train.py \
  --world-size 4 \
  --num-epochs 50 \
  --exp-dir zipformer/exp_at_as_${subset} \
  --start-epoch 1 \
  --use-fp16 1 \
  --num-events 527 \
  --audioset-subset $subset \
  --max-duration 1000 \
  --enable-musan True \
  --master-port 13455
```

The evaluation command is:

```bash
python zipformer/evaluate.py \
  --epoch 32 \
  --avg 8 \
  --exp-dir zipformer/exp_at_as_full \
  --max-duration 500
```