Capstone project AUCT

Audio capture and labelling


South Africa has eleven official languages. Building tools to support small-scale voice recognition across a large number of languages requires a large number of labeled audio files for training and testing. Your task is to create a web-based, or combined web- and mobile-based, tool to support the collection and in situ segmentation and labeling of audio data.

Front-end Development




The use case is: a participant would use your tool to record a set of words from a list. These would then be segmented, with some control to throw out mistakes that may have been made during data capture, resulting in a set of individual, labeled audio files.
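The use case above can be illustrated with a minimal sketch. It assumes a simple energy-threshold splitter, which is not necessarily the segmentation method the project used. The names `find_segments`, `label_segments`, and `LabeledSegment`, along with the frame length and threshold values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Sequence, Tuple


@dataclass
class LabeledSegment:
    label: str  # word taken from the prompt list
    start: int  # first sample index (inclusive)
    end: int    # last sample index (exclusive)


def find_segments(samples: Sequence[float], frame_len: int = 4,
                  threshold: float = 0.1,
                  min_frames: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges whose mean absolute amplitude
    stays above `threshold` for at least `min_frames` frames.

    Assumption: a plain energy threshold, used here only as an example.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first loud frame in the current run
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        loud = sum(abs(s) for s in frame) / frame_len > threshold
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_len, i * frame_len))
            start = None
    # Close a run that is still open at the end of the recording.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments


def label_segments(segments: Sequence[Tuple[int, int]],
                   words: Sequence[str],
                   rejected: AbstractSet[int] = frozenset()
                   ) -> List[LabeledSegment]:
    """Drop the segment indices the participant rejected as mistakes,
    then pair the remaining segments with the word list in order."""
    kept = [seg for i, seg in enumerate(segments) if i not in rejected]
    return [LabeledSegment(w, s, e) for w, (s, e) in zip(words, kept)]
```

In this sketch a participant who misspoke on the second utterance would pass `rejected={1}`, so the first and third detected segments are paired with the first and second words on the list.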

The purpose of the project was to engineer a software system that allows research participants to submit audio data over the web, segments it automatically, and stores it for later labelling and analysis by a researcher. This was achieved with a hierarchical architecture: one pattern for the overall system and a separate architecture for each sub-system.

The overall system followed a central-repository pattern, with a Firebase cloud server at the core. Each of the three sub-systems had its own architecture: the processing node was a data pipeline, and the two web applications were Model-View-Controller oriented. This design choice allowed rapid parallel development of all the core features (wordlists, recording, segmentation, and labelling), produced a robust overall system, and left enough time to add further features such as suggested labels, file export, database statistics, and functional aesthetics.
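As a rough illustration of how a central repository and a pipeline-style processing node can fit together, the sketch below uses an in-memory `Repository` class as a stand-in for the cloud store. It does not use the Firebase API, and every name in it (`Repository`, `PIPELINE`, `process_pending`, the stage functions, the `"pending"` and `"segmented"` status values) is an assumption made for the example rather than the project's actual design. The silence splitter is a crude per-sample placeholder.

```python
from typing import Callable, Dict, List


class Repository:
    """In-memory stand-in for the central store (hypothetical; not Firebase)."""

    def __init__(self) -> None:
        self.recordings: Dict[str, dict] = {}
        self.segments: Dict[str, List[dict]] = {}

    def add_recording(self, rec_id: str, samples: List[float],
                      words: List[str]) -> None:
        # The recording web app would write here.
        self.recordings[rec_id] = {
            "samples": samples, "words": words, "status": "pending",
        }

    def pending(self) -> List[str]:
        return [rid for rid, rec in self.recordings.items()
                if rec["status"] == "pending"]


def normalise(item: dict) -> dict:
    """Scale samples so the peak absolute amplitude is 1."""
    peak = max((abs(s) for s in item["samples"]), default=0.0) or 1.0
    return {**item, "samples": [s / peak for s in item["samples"]]}


def split_on_silence(item: dict) -> dict:
    """Placeholder splitter: cut wherever a sample is near zero."""
    chunks, current = [], []
    for s in item["samples"]:
        if abs(s) > 0.05:
            current.append(s)
        elif current:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return {**item, "chunks": chunks}


def suggest_labels(item: dict) -> dict:
    """Pair chunks with the word list in order as suggested labels."""
    pairs = zip(item["words"], item["chunks"])
    return {**item, "segments": [{"label": w, "samples": c} for w, c in pairs]}


# The processing node as a pipeline: each stage takes and returns one item.
PIPELINE: List[Callable[[dict], dict]] = [normalise, split_on_silence, suggest_labels]


def process_pending(repo: Repository) -> None:
    """Run every pending recording through the pipeline and store the result."""
    for rec_id in repo.pending():
        item = dict(repo.recordings[rec_id])
        for stage in PIPELINE:
            item = stage(item)
        repo.segments[rec_id] = item["segments"]  # the labelling app reads these
        repo.recordings[rec_id]["status"] = "segmented"
```

Because the sub-systems in this sketch only share the repository, each one can be developed and tested against the same stored records without calling the others directly.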

The tests designed at the outset of the project ran successfully, and the results demonstrate that the software system achieved its goals.

See the GitHub Repo