Telephony has developed substantially over the years, but its fundamental auditory model, in which the audio from all sources is mixed into a single monaural stream, has not changed since the telephone was invented. A monaural mix is very difficult to follow when several sources are active at once, as in a conference call.
Sound originating from a specific point in space travels along a slightly different path to each ear. Although we are not consciously aware of it, our brain processes the resulting spatial cues to help us locate sounds in space. It is this spatial information that allows us to focus our attention and listen to a single speaker in an environment where many different sources may be active at the same time, a phenomenon known as the “cocktail party effect”. These spatial cues can be reproduced in a sound recording by using head-related transfer functions, allowing a listener to experience localised audio even when sound is reproduced through a headset.
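As a rough illustration of the idea, the sketch below renders a monaural signal to two ear signals by convolving it with a left and a right impulse response. It is not the implementation used in this research. The impulse responses are toy placeholders, not measured head-related transfer functions: they model only an interaural time difference (using Woodworth's spherical-head approximation) and a made-up level difference. The function names, the head radius, the attenuation rule and the 8000 Hz sample rate are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, approximate value in air at room temperature
HEAD_RADIUS = 0.0875     # m, assumed head radius (illustrative value)


def toy_hrir_pair(azimuth_deg, sample_rate=8000):
    """Return (left, right) toy impulse responses for a source direction.

    Positive azimuth means the source is to the listener's right;
    the model is only meant for azimuths between -90 and 90 degrees.
    These are placeholders for measured impulse responses.
    """
    theta = math.radians(abs(azimuth_deg))
    # Woodworth's spherical-head approximation of the time difference.
    itd_seconds = (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))
    delay = round(itd_seconds * sample_rate)   # far-ear delay in samples
    far_gain = 1.0 - 0.5 * math.sin(theta)     # assumed attenuation, not measured
    near = [1.0]
    far = [0.0] * delay + [far_gain]
    return (far, near) if azimuth_deg > 0 else (near, far)


def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def spatialise(mono, azimuth_deg, sample_rate=8000):
    """Render a mono signal as equal-length (left, right) ear signals."""
    left_ir, right_ir = toy_hrir_pair(azimuth_deg, sample_rate)
    left = convolve(mono, left_ir)
    right = convolve(mono, right_ir)
    n = max(len(left), len(right))
    left += [0.0] * (n - len(left))    # pad the shorter channel with silence
    right += [0.0] * (n - len(right))
    return left, right
```

In a multi-party call, each speaker would be rendered at a different azimuth and the per-ear signals summed, so that the mix keeps a distinct direction for each source. A realistic system would replace `toy_hrir_pair` with measured impulse responses and would typically use FFT-based convolution for efficiency.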
In my research, spatial audio was implemented in a telephony application as well as in a virtual world. Experiments demonstrated that spatial audio increases the intelligibility of speech in a multiple-source environment and aids active speaker identification. Resource usage measurements showed, however, that these benefits are not without a cost. In conclusion, spatial audio was shown to be an improvement over the monaural audio model traditionally implemented in telephony.