---
license: cc-by-nc-sa-4.0
pipeline_tag: audio-classification
tags:
- music
- audio
---

# Model Card: Pre-trained Audio Representation Models on AudioSet

## Overview

This model card describes the pre-trained audio representation models released by ALM. The models are pre-trained on the full AudioSet dataset and are intended for general-purpose Audio Representation Learning (ARL) tasks.

## Models

### 1. [ALM/hubert-base-audioset](https://huggingface.co/ALM/hubert-base-audioset)

- **Architecture**: HuBERT (Hubert-Base) transformer-based model
- **Description**: This model is based on the HuBERT architecture and pre-trained on the full AudioSet dataset.

### 2. [ALM/hubert-large-audioset](https://huggingface.co/ALM/hubert-large-audioset)

- **Architecture**: HuBERT (Hubert-Large) transformer-based model
- **Description**: A larger variant of hubert-base-audioset, with increased capacity for capturing audio representations from the full AudioSet dataset.

### 3. [ALM/wav2vec2-base-audioset](https://huggingface.co/ALM/wav2vec2-base-audioset)

- **Architecture**: Wav2Vec 2.0 (Wav2Vec2-Base) transformer-based model
- **Description**: This model is based on the Wav2Vec 2.0 architecture and trained on the full AudioSet dataset via self-supervised learning with Contrastive Predictive Coding (CPC). It offers a different approach to audio representation learning than the HuBERT models.

### 4. [ALM/wav2vec2-large-audioset](https://huggingface.co/ALM/wav2vec2-large-audioset)

- **Architecture**: Wav2Vec 2.0 (Wav2Vec2-Large) transformer-based model
- **Description**: A larger variant of wav2vec2-base-audioset, with enhanced capacity for learning audio representations from the full AudioSet dataset.
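
The four checkpoints above are hosted on the Hugging Face Hub, so embeddings can be extracted with the standard `transformers` Auto classes. The snippet below is a minimal sketch, assuming the checkpoints load via `AutoModel`/`AutoFeatureExtractor` and expect 16 kHz mono input; the dummy waveform stands in for real audio loaded with, e.g., librosa or torchaudio.

```python
# Minimal sketch: clip-level embeddings from an AudioSet-pretrained checkpoint.
# Assumes the checkpoint loads with the standard transformers Auto classes
# and expects 16 kHz mono audio.
import torch
from transformers import AutoFeatureExtractor, AutoModel

model_id = "ALM/hubert-base-audioset"  # any of the four checkpoints
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

waveform = torch.randn(16000).numpy()  # 1 s of dummy audio; use real samples here

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level features: (batch, frames, hidden_size); mean-pool over time
# for a single clip-level embedding.
clip_embedding = outputs.last_hidden_state.mean(dim=1)
print(clip_embedding.shape)  # e.g. torch.Size([1, 768]) for the base models
```

The same pattern applies to the large variants; only the hidden size changes (1024 instead of 768).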

## Intended Use

These pre-trained models are intended for a wide range of ARL tasks, including but not limited to speech recognition, music classification, and acoustic event detection. They serve as powerful feature extractors and can be fine-tuned on task-specific datasets for downstream applications; a fine-tuning sketch follows below.

Note that while these models offer versatility across various audio domains, their performance on speech-related tasks may be lower than that of speech-specialized models such as the original Wav2Vec 2.0 and HuBERT checkpoints. This is due to the diverse nature of the AudioSet pre-training data, which spans a wide range of audio sources beyond speech.
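
As a concrete example of the fine-tuning route, the sketch below attaches a classification head through `AutoModelForAudioClassification`. This is illustrative only: the label set and the single training step are placeholders, and it assumes the checkpoints are compatible with the standard transformers sequence-classification heads.

```python
# Hedged sketch: fine-tuning a checkpoint for audio classification.
# The label set and single "training step" below are illustrative placeholders.
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_id = "ALM/wav2vec2-base-audioset"
labels = ["music", "speech", "noise"]  # hypothetical label set

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    label2id={name: i for i, name in enumerate(labels)},
    id2label={i: name for i, name in enumerate(labels)},
)

# One dummy optimization step; in practice, loop over a real labeled dataset
# (e.g. with the datasets library and Trainer, or a plain PyTorch loop).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
waveform = torch.randn(16000).numpy()  # 1 s of dummy 16 kHz audio
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
targets = torch.tensor([0])  # index into `labels`

loss = model(**inputs, labels=targets).loss
loss.backward()
optimizer.step()
```

The classification head is randomly initialized on top of the pre-trained encoder, so transformers will warn about newly initialized weights; that is expected before fine-tuning.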

## Limitations and Considerations

- The models are pre-trained on the full AudioSet dataset, which may not cover all possible audio domains comprehensively.
- Fine-tuning on domain-specific data may be necessary to achieve optimal performance on certain tasks.
- Deploying and fine-tuning these models, especially the larger variants, can demand significant computational resources.

## Citation

If you use these pre-trained models in your work, please cite the following:

```bibtex
@inproceedings{ARCH,
  title={Benchmarking Representations for Speech, Music, and Acoustic Events},
  author={La Quatra, Moreno and Koudounas, Alkis and Vaiani, Lorenzo and Baralis, Elena and Garza, Paolo and Cagliero, Luca and Siniscalchi, Sabato Marco},
  year={2024},
  booktitle={2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
}
```