Audio Annotation Standards and Frameworks
compiled by Bethany Radcliff and Kylie Warkentin
The list below includes a selection of citations and resources related to audio annotation. Its goal is to compile current standards and reference frameworks used to structure audio annotation or analysis. We found that audio annotation standards and frameworks are used and studied in linguistics, archives and libraries, music annotation, and machine learning and automation, among other areas. Standards for transcribing recordings, both for accessibility and for oral histories, are also included below as related resources.
Auer, E., Russel, A., Sloetjes, H., Wittenburg, P., Schreer, O., Masnieri, S., Schneider, D., & Tschöpel, S. (2010). ELAN as flexible annotation framework for sound and image processing detectors. In Seventh conference on International Language Resources and Evaluation [LREC 2010] (pp. 890-893). European Language Resources Association (ELRA). http://hdl.handle.net/11858/00-001M-0000-0012-B72E-A
- This article presents the annotation tool ELAN as part of a larger annotation framework for classifying and annotating linguistic elements in recordings.
Bergelson, E. (2020). Annotation Introduction for SEEDLingS Annotations. Retrieved from https://bergelsonlab.gitbook.io/blab/data-pipeline/annotation-introduction
- This resource describes the workflow at the SEEDLingS (Study of Environmental Effects on Developing Linguistic Skills) Lab at Duke University, including its annotation and tagging process using CLAN.
Bird, S., & Liberman, M. (2001). A Formal Framework for Linguistic Annotation. Speech Communication, 33(1-2), 23-60. https://doi.org/10.1016/S0167-6393(00)00068-6
- Surveys existing linguistic annotation formats and focuses on the logical structure underlying linguistic annotations.
De Sutter, R., Notebaert, S., & Van de Walle, R. (2006). Evaluation of Metadata Standards in the Context of Digital Audio-Visual Libraries. In J. Gonzalo, C. Thanos, M. F. Verdejo, & R. C. Carrasco (Eds.), Research and Advanced Technology for Digital Libraries. ECDL 2006. Lecture Notes in Computer Science, vol. 4172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11863878_19
Hedeland, H. (2020). Providing Digital Infrastructure for Audio-Visual Linguistic Research Data with Diverse Usage Scenarios: Lessons Learnt. Publications, 8(2), 33. https://doi.org/10.3390/publications8020033
- This article describes the development of digital infrastructure for audio-visual (AV) linguistic research data at the Hamburg Centre for Language Corpora (HZSK), University of Hamburg, Germany. The author also discusses annotating AV materials and visualizing those annotations.
Meléndez Catalán, B., Molina, E., & Gómez Gutiérrez, E. (2017). BAT: An open-source, web-based audio events annotation tool. http://repositori.upf.edu/handle/10230/43406
- According to its authors, BAT “provides an easy way to annotate the salience of simultaneous sound sources”; it also lets users “define multiple ontologies to adapt to multiple tasks and offers the possibility to cross-annotate audio data.” This web-based tool and framework supports the annotation of audio events, including cross-annotation by multiple users.
Simon, R., Jung, J., & Haslhofer, B. (2011). The YUMA Media Annotation Framework. In S. Gradmann, F. Borri, C. Meghini, & H. Schuldt (Eds.), Research and Advanced Technology for Digital Libraries (pp. 434–437). Springer. https://doi.org/10.1007/978-3-642-24469-8_43
- The goal of the YUMA framework is to advance scholarly annotation and “provide integrated collaborative annotation functionality for digital library portals and online multimedia collections.” Although it is called a framework, the article suggests that it functions more as an application for collecting and documenting annotations.
Library/Archive-Specific Annotation
Egan, P. (2020). Enriching Metadata for Irish Traditional Music at the American Folklife Center. https://hcommons.org/deposits/item/hc:31761/
Kowalczyk, S. T., & Holmes, A. S. (2020). The Studs Terkel Radio Archive: A Journey to Enhanced Usability for Audio. Journal of Archival Organization, 17(1–2), 95–112. https://doi.org/10.1080/15332748.2020.1765634
- This article discusses a project between the Studs Terkel Radio Archive and the Library of Congress to enrich the archive's metadata and improve the usability and accessibility of its recordings. Although it does not address annotation standards directly, the article discusses the metadata standards used for this audio collection. The authors describe how the project developed and maintained its own local descriptive terms, which project workers added as “tags” while listening to the audio. The metadata was then further enriched by identifying “thematic clips” in three categories.
Music Annotation
Fu, Z., Lu, G., Ting, K. M., & Zhang, D. (2011). A Survey of Audio-Based Music Classification and Annotation. IEEE Transactions on Multimedia, 13(2), 303–319. https://doi.org/10.1109/TMM.2010.2098858
- This work falls within music information retrieval (MIR) and surveys existing classification frameworks in the field. The authors discuss and evaluate feature taxonomies for music genre and mood classification, artist identification, instrument recognition, and annotation.
Music Information Retrieval Evaluation eXchange (MIREX). https://www.music-ir.org/mirex/wiki/MIREX_HOME
Music Encoding Initiative (MEI). https://music-encoding.org/
Machine Learning/Automated Annotation
Li, B., Burgoyne, J., & Fujinaga, I. (2006). Extending Audacity for Audio Annotation. (p. 380). https://www.researchgate.net/profile/Ichiro_Fujinaga/publication/220723382_Extending_Audacity_for_Audio_Annotation/links/09e4150a2948f4a93c000000.pdf
Wang, Y., Mendez, A. E. M., Cartwright, M., & Bello, J. P. (2019). Active Learning for Efficient Audio Annotation and Classification with a Large Amount of Unlabeled Data. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 880–884. https://doi.org/10.1109/ICASSP.2019.8683063
Related Resources
- W3C (w3.org) transcription accessibility guidelines
- Archival transcription guidelines for oral histories
Salway, A. (2007). A corpus-based analysis of audio description. In Media for All. Leiden, The Netherlands: Brill Rodopi. https://doi.org/10.1163/9789401209564_012
- Reports the beginning of an investigation into the language used in audio description and proposes audio description as a basis for indexing digital video archives.