Audio Annotation Standards and Frameworks

compiled by Bethany Radcliff and Kylie Warkentin

Summary

The list below includes a selection of citations and resources related to audio annotation. Its goal is to compile current standards and reference frameworks used to structure audio annotation and analysis. We found that audio annotation standards and frameworks are being used and researched in linguistics, archives and libraries, music annotation, and machine learning and automation, among other areas. Transcription standards for the accessibility of recordings and for oral histories are also included as related resources.

Linguistic Annotation

Auer, E., Russel, A., Sloetjes, H., Wittenburg, P., Schreer, O., Masnieri, S., Schneider, D., & Tschöpel, S. (2010). ELAN as flexible annotation framework for sound and image processing detectors. In Seventh conference on International Language Resources and Evaluation [LREC 2010] (pp. 890–893). European Language Resources Association (ELRA). http://hdl.handle.net/11858/00-001M-0000-0012-B72E-A

This article presents the annotation tool ELAN as part of a larger annotation framework for classifying and annotating linguistic elements in recordings.

Bergelson, E. (2020). Annotation Introduction for SEEDLingS Annotations. Retrieved from https://bergelsonlab.gitbook.io/blab/data-pipeline/annotation-introduction

This article describes the workflow at the SEEDLingS (Study of Environmental Effects on Developing Linguistic Skills) Lab at Duke University, including its annotation and tagging process using CLAN.

Bird, S., & Liberman, M. (2001). A Formal Framework for Linguistic Annotation. Speech Communication, 33(1–2), 23–60. https://doi.org/10.1016/S0167-6393(00)00068-6

This article surveys existing linguistic annotation formats and focuses on their logical structure. It proposes annotation graphs as a common formal representation for linguistic annotations.

De Sutter, R., Notebaert, S., & Van de Walle, R. (2006). Evaluation of Metadata Standards in the Context of Digital Audio-Visual Libraries. In J. Gonzalo, C. Thanos, M. F. Verdejo, & R. C. Carrasco (Eds.), Research and Advanced Technology for Digital Libraries. ECDL 2006. Lecture Notes in Computer Science, vol. 4172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11863878_19

Hedeland, H. (2020). Providing Digital Infrastructure for Audio-Visual Linguistic Research Data with Diverse Usage Scenarios: Lessons Learnt. Publications, 8(2), 33. https://doi.org/10.3390/publications8020033

This article describes the development of digital infrastructure for AV linguistic research data at the Hamburg Centre for Language Corpora (HZSK) at the University of Hamburg in Germany. The author also discusses annotation of AV materials and the visualization of those annotations.

Meléndez Catalán, B., Molina, E., & Gómez Gutiérrez, E. (2017). BAT: An open-source, web-based audio events annotation tool. http://repositori.upf.edu/handle/10230/43406

According to its authors, BAT “provides an easy way to annotate the salience of simultaneous sound sources.” It also lets users “define multiple ontologies to adapt to multiple tasks” and “offers the possibility to cross-annotate audio data.” This web-based tool and framework supports annotation of audio events and cross-annotation by multiple users.

Simon, R., Jung, J., & Haslhofer, B. (2011). The YUMA Media Annotation Framework. In S. Gradmann, F. Borri, C. Meghini, & H. Schuldt (Eds.), Research and Advanced Technology for Digital Libraries (pp. 434–437). Springer. https://doi.org/10.1007/978-3-642-24469-8_43

The goal of the YUMA framework is to advance scholarly annotation and “provide integrated collaborative annotation functionality for digital library portals and online multimedia collections.” Though it is called a framework, the article suggests that it is actually an application that provides a way to collect and document annotations.

Library/Archive Specific Annotation

Egan, P. (2020). Enriching Metadata for Irish Traditional Music at the American Folklife Center. https://hcommons.org/deposits/item/hc:31761/

Kowalczyk, S. T., & Holmes, A. S. (2020). The Studs Terkel Radio Archive: A Journey to Enhanced Usability for Audio. Journal of Archival Organization, 17(1–2), 95–112. https://doi.org/10.1080/15332748.2020.1765634

This article discusses a project between the Studs Terkel Radio Archive and the Library of Congress to enrich the collection's metadata and improve its usability and accessibility. The article is not explicitly about annotation standards, but it does discuss the metadata standards used for this audio collection. The authors describe how they developed and maintained their own local descriptive terms, which project workers added as “tags” while listening to the audio. The metadata was then further enriched by identifying “thematic clips” in three categories.

Music Annotation

Fu, Z., Lu, G., Ting, K. M., & Zhang, D. (2011). A Survey of Audio-Based Music Classification and Annotation. IEEE Transactions on Multimedia, 13(2), 303–319. https://doi.org/10.1109/TMM.2010.2098858

This work, in the area of music information retrieval (MIR), surveys existing classification frameworks in MIR. The authors evaluate and discuss feature taxonomies for genre classification, mood classification, artist identification, instrument recognition, and music annotation.

Music Information Retrieval Evaluation eXchange (MIREX). https://www.music-ir.org/mirex/wiki/MIREX_HOME

Music Encoding Initiative (MEI). https://music-encoding.org/

Machine Learning/Automated Annotation

Li, B., Burgoyne, J., & Fujinaga, I. (2006). Extending Audacity for Audio Annotation. (p. 380). https://www.researchgate.net/profile/Ichiro_Fujinaga/publication/220723382_Extending_Audacity_for_Audio_Annotation/links/09e4150a2948f4a93c000000.pdf

Wang, Y., Mendez, A. E. M., Cartwright, M., & Bello, J. P. (2019). Active Learning for Efficient Audio Annotation and Classification with a Large Amount of Unlabeled Data. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 880–884. https://doi.org/10.1109/ICASSP.2019.8683063

Other

Transcription Guidelines

W3C (W3.org) guidelines on transcripts for accessibility
Archival transcription guidelines for oral histories
- The Oral History Association’s Archiving Oral History: Manual of Best Practices
- Baylor University Institute for Oral History’s Style Guide: A Quick Reference for Editing Oral History Transcripts

Salway, A. (2007). A corpus-based analysis of audio description. In Media for All. Leiden, The Netherlands: Brill Rodopi. https://doi.org/10.1163/9789401209564_012

This chapter begins an investigation into the language used in audio description and proposes audio description as a basis for indexing digital video archives.

AudiAnnotate

AudiAnnotate, developed by HiPSTAS and Brumfield Labs, provides workflows for generating AV editions and exhibits using IIIF manifests.