Reference the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: an Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.
Visit the DCASE Automated Audio Captioning task page for the most recent version (v2.1).
💡 If you were looking for the 7-Zip software tool instead of a dataset, download it only from the official site, 7-zip.org, to avoid malware variants hosted on lookalike domains.
If you are writing a technical report or paper using this data, cite the original paper (Drossos et al., 2020) and include these standard sections:
Mention the diversity of the audio (natural sounds, urban environments, etc.) and the linguistic variety of the captions.
The full development set is approximately 6.5 GB.
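Given that size, it can help to confirm there is enough free disk space before starting the download. The following is a minimal sketch, not an official tool: the function name `has_enough_space`, the constant `DEV_SET_BYTES`, and the 2x headroom factor (room for the archive plus its extracted contents) are illustrative assumptions, and the 6.5 GB figure is only the approximate size quoted above.

```python
import shutil

# Approximate development-set size quoted in the text (assumed, not exact).
DEV_SET_BYTES = int(6.5 * 1000**3)


def has_enough_space(target_dir: str,
                     required_bytes: int = DEV_SET_BYTES,
                     headroom: float = 2.0) -> bool:
    """Return True if the filesystem holding target_dir has at least
    required_bytes * headroom bytes free.

    The headroom factor is a hypothetical safety margin covering the
    archive and its extracted contents at the same time.
    """
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_bytes * headroom


if __name__ == "__main__":
    # Check the current directory; replace "." with the download folder.
    if has_enough_space("."):
        print("Enough free space for the development set.")
    else:
        print("Not enough free space; free up disk or pick another folder.")
```

If the check fails, pick a different target directory or lower `headroom` when you plan to delete the archive right after extraction.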