07 May 2018
by Aaron Karp **Winner of Best Research Writing blog post**
For my thesis I am investigating sonic spaces that exist under the coverage of mass surveillance technology. My aim is to identify the properties of sounds that make them “surveillable” and to create sounds that exist outside of surveillable spheres. There has been a moderate amount of theoretical investigation into mass video surveillance, but surprisingly little research has been conducted within the realm of audio surveillance. What makes sound unique in the discussion of surveillance technology? Is there something about listening that people view as less inherently sacred and private than seeing? With what we have learned from recent government whistleblowers and the state-of-the-art capabilities of surveillance systems, it is clear that all citizens in Western society should operate on the assumption that they are always being listened to. Given this reality, the need for a technology that responds to such an invasion of privacy is apparent.
Surveillance technology at its core attempts to distinguish “important” from “unimportant” sounds and to categorize them into further degrees of utility. Speech transcription software is one example of the kind of technology I mean. A speech detection system first performs what is known as source separation: it attempts to pull spoken words out of a recording in which speech and background noise are layered on top of each other. The algorithm then tries to recognize which words were said and flags words and phrases of significance. These algorithms work because of the physical properties of speech (the “important” sound) compared to something like a washing machine whirring (the “unimportant” sound); to the machine, speech simply looks different on a fundamental level. In the early days of machine listening these differences were explored as mathematical truths, but those mathematical properties only extend so far. Most present-day source separation algorithms employ some level of machine learning, and the most cutting-edge research revolves around deep learning techniques trained on massive amounts of data.
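As a rough illustration of what “looks different on a fundamental level” can mean, consider spectral flatness, a classic measure on which a tonal sound (like voiced speech) and a broadband sound (like a washing machine whir) separate cleanly. The sketch below, using NumPy with a harmonic tone as a crude stand-in for speech, is my own toy example, not the pipeline of any particular surveillance system:

```python
import numpy as np

def spectral_flatness(signal, eps=1e-12):
    """Ratio of geometric to arithmetic mean of the power spectrum:
    close to 1.0 for noise-like sounds, close to 0.0 for tonal ones."""
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps
    geometric = np.exp(np.mean(np.log(power)))
    return geometric / np.mean(power)

fs = 16_000                      # sample rate in Hz
t = np.arange(fs) / fs           # one second of audio
# Stand-in for voiced speech: a harmonic complex on a 200 Hz pitch
speech_like = sum(np.sin(2 * np.pi * f * t) for f in (200, 400, 600))
# Stand-in for a washing machine whir: broadband noise
rng = np.random.default_rng(0)
noise_like = rng.standard_normal(fs)

print(spectral_flatness(speech_like))  # near 0: energy concentrated in a few bins
print(spectral_flatness(noise_like))   # much higher: energy spread evenly
```

A real system would compute features like this frame by frame over short windows rather than over a whole recording, but the underlying intuition, that speech concentrates its energy in structured ways while background noise spreads it out, is the same.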
My project attempts to understand why some sounds are easily separated and others are not. I will then create a system that synthesizes sounds which, when played alongside “important” sounds, produce mixtures lacking the properties that would identify them as “important”. This process would convert “important” sounds into “unimportant” ones, marking them as useless and thus making them, for all intents and purposes, undetectable.
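As a toy sketch of that conversion, suppose (purely for illustration) that a detector keys on spectral flatness, where tonal sounds score near 0 and noise-like sounds score near 1. Adding a broadband masking signal pushes the mixture's score toward the “unimportant” end; the actual synthesis system would of course need to target whatever features real detectors use:

```python
import numpy as np

def spectral_flatness(signal, eps=1e-12):
    """Close to 1.0 for noise-like sounds, close to 0.0 for tonal ones."""
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps
    return np.exp(np.mean(np.log(power))) / np.mean(power)

fs = 16_000
t = np.arange(fs) / fs
# The "important" sound: a harmonic complex standing in for voiced speech
important = sum(np.sin(2 * np.pi * f * t) for f in (200, 400, 600))

# A hypothetical masking signal: broadband noise scaled to the target's RMS
rng = np.random.default_rng(0)
mask = rng.standard_normal(fs) * np.sqrt(np.mean(important ** 2))
mixture = important + mask

print(spectral_flatness(important))  # near 0: tonal, easy to flag
print(spectral_flatness(mixture))    # much higher: the mixture reads as noise-like
```

Unshaped white noise is the bluntest possible mask; the interesting research question is how little added sound, and of what kind, is needed to tip a given feature past the detector's threshold.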