Phonon and TrueAudio Next: Accelerating Physics-Based Audio for VR

This is a guest post by Lakulish Antani, VP Engineering at Impulsonic, Inc.


Phonon is a set of software tools that let developers and sound designers add physics-based environmental audio to games and VR apps across platforms. Essentially, Phonon uses the layout of the virtual world, and the materials that each object is made of, to work out how each sound source should sound to the listener (the user). For example, Phonon can tell that there’s a wall between the player and an NPC, so the NPC’s speech should sound as if it’s coming through a nearby open door or window. Phonon can also automatically alter sounds so they’re quieter in a carpeted hallway, and more reverberant in a cathedral. Phonon works as a set of plugins for game engines (like Unity or Unreal) and audio middleware (like Wwise or FMOD Studio). A simple analogy with graphics rendering technology may be useful: what physically-based lighting is to graphics, geometric acoustics is to sound.

TrueAudio Next is an API that lets developers use the user’s GPU for audio rendering. Physics-based audio consists of two main parts: audio simulation and audio rendering. Audio simulation involves capturing how each sound is affected by the environment, in what is called an impulse response (IR). One IR is generated for each sound source. An IR can have multiple channels: for example, 2 channels for a stereo IR, or 4, 9, or 16 channels for a first-, second-, or third-order Ambisonics IR. IRs can be generated on the fly as the game runs, or precomputed, depending on performance constraints. Audio rendering involves using IRs to modify the audio emitted by each sound source, using a process called convolution. Simply put, TrueAudio Next is an API that lets you use the GPU for convolution. In this blog post, we’ll focus on how TrueAudio Next speeds up the audio rendering portion of Phonon.
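To make the rendering step concrete, here is a minimal sketch of what convolving a source with a multi-channel IR means. It is plain Python written for illustration, not Phonon or TrueAudio Next code; the function names (`convolve`, `render_source`, `ambisonic_channels`) and the tiny sample values are assumptions made up for this example. It uses direct time-domain convolution for readability, whereas a production engine would typically use an FFT-based method.

```python
def convolve(signal, ir):
    """Direct-form convolution: out[n] = sum over k of signal[k] * ir[n - k]."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def render_source(dry, ir_channels):
    """Convolve one mono source with every channel of its IR.

    Returns one output signal per IR channel (2 for a stereo IR,
    4/9/16 for a 1st/2nd/3rd-order Ambisonics IR).
    """
    return [convolve(dry, channel) for channel in ir_channels]


def ambisonic_channels(order):
    """Channel count of a full-sphere Ambisonics signal of the given order."""
    return (order + 1) ** 2


# Hypothetical toy data: a 2-sample source and a 3-sample stereo IR.
dry = [1.0, 0.5]
stereo_ir = [
    [1.0, 0.0, 0.25],  # left: direct sound plus one quiet echo
    [0.5, 0.0, 0.0],   # right: attenuated direct sound only
]
wet = render_source(dry, stereo_ir)
```

Each output channel is longer than the input by the IR length minus one sample, which is why long IRs (long reverberation tails) make every source more expensive to render.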

Phonon and TrueAudio Next were used to create this Hospital demo in Unity. The demo was showcased at AMD’s Capsaicin event at GDC 2016.

The computational cost of convolution depends on two main factors: the number of sources, and the length of the IRs. Longer IRs describe environments with long echoes and reverberation, like cathedrals or canyons. Large numbers of sources occur in realistic, acoustically complex environments. What makes this challenging is that all of this computation must complete within a strict deadline: the duration of one audio frame processed by the audio engine, typically around 20 milliseconds. This puts a limit on how much we can do with the CPU. We generally can’t use lots of CPU cores, because this may interfere with other game engine and OS tasks, potentially impacting draw call submission. TrueAudio Next lets us use the GPU to get massive speed boosts in audio rendering.
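The deadline and the two cost factors can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: the frame size of 1024 samples and the 48 kHz sampling rate are assumptions chosen because they give a frame of roughly 21 ms (the post does not state the actual values), and the multiply-add count is for naive time-domain convolution, so it shows how the workload scales rather than what any real engine executes.

```python
def frame_budget_ms(frame_size, sample_rate):
    """Time available to process one audio frame, in milliseconds."""
    return 1000.0 * frame_size / sample_rate


def naive_macs_per_frame(num_sources, ir_seconds, frame_size, sample_rate,
                         channels=1):
    """Multiply-adds per frame for naive time-domain convolution.

    Every input sample of every source is multiplied with every IR sample
    of every IR channel, so the count grows linearly with both the number
    of sources and the IR length.
    """
    ir_samples = int(ir_seconds * sample_rate)
    return num_sources * channels * ir_samples * frame_size


# Assumed values, not taken from the post.
FRAME_SIZE = 1024
SAMPLE_RATE = 48000

budget = frame_budget_ms(FRAME_SIZE, SAMPLE_RATE)
small = naive_macs_per_frame(20, 2.0, FRAME_SIZE, SAMPLE_RATE)
large = naive_macs_per_frame(64, 4.0, FRAME_SIZE, SAMPLE_RATE)
print(f"budget per frame: {budget:.1f} ms")
print(f"workload ratio, 64 sources x 4 s vs 20 sources x 2 s: {large / small:.1f}x")
```

FFT-based convolution reduces the absolute cost considerably, but the dependence on source count and IR length remains, and the whole batch still has to fit inside the same per-frame budget.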

Normally, audio emitted by each source is processed on the CPU by Phonon, using convolution. After this, the audio is sent down the audio pipeline for further effects processing, or for final mixing. With TrueAudio Next, Phonon sends the audio data to the GPU as soon as it gets it. Once per audio frame, the GPU runs convolution on a big batch of audio data, and sends the results back to the CPU. Not only does this bring the massive parallel compute power of GPUs to audio rendering, but TrueAudio Next also lets Phonon continuously update IRs as the user moves and interacts with the environment, without affecting the performance of GPU convolution.
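The frame-by-frame pattern described above can be sketched as follows. This is a CPU-only illustration in plain Python and does not use or imitate the TrueAudio Next API; the class `StreamingConvolver` and the function `render_batch` are hypothetical names invented for this example. It shows one common way to convolve a continuous stream in fixed-size frames (overlap-add, where the reverb tail of each frame is carried into later frames) and how a batch of sources is processed once per frame and mixed.

```python
class StreamingConvolver:
    """Convolves a stream with a fixed IR, one frame at a time (overlap-add)."""

    def __init__(self, ir, frame_size):
        self.ir = list(ir)
        self.frame_size = frame_size
        # Reverb tail that spills past the end of the frames processed so far.
        self.tail = [0.0] * (len(self.ir) - 1)

    def process(self, frame):
        if len(frame) != self.frame_size:
            raise ValueError("frame has the wrong length")
        # Convolve just this frame with the whole IR.
        full = [0.0] * (self.frame_size + len(self.ir) - 1)
        for i, s in enumerate(frame):
            for j, h in enumerate(self.ir):
                full[i + j] += s * h
        # Add the tail left over from earlier frames.
        for i, t in enumerate(self.tail):
            full[i] += t
        # Keep what extends beyond this frame for the next call.
        self.tail = full[self.frame_size:]
        return full[:self.frame_size]


def render_batch(convolvers, frames):
    """Process one frame for every source, then mix the results."""
    mix = [0.0] * convolvers[0].frame_size
    for conv, frame in zip(convolvers, frames):
        for i, v in enumerate(conv.process(frame)):
            mix[i] += v
    return mix


# Hypothetical usage: one source, a 3-sample IR, frames of 2 samples.
conv = StreamingConvolver([1.0, 0.5, 0.25], frame_size=2)
first = conv.process([1.0, 0.0])   # frame containing an impulse
second = conv.process([0.0, 0.0])  # silent frame still carries the tail
```

The inner loops over sources and samples are independent of one another, which is the kind of work that maps well onto a GPU when the whole batch is submitted once per audio frame.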

A future update to Phonon will make this transparent to users. If the user has an AMD GPU that supports TrueAudio Next, and the game developer has set up Phonon to use it, Phonon will automatically use TrueAudio Next and will be able to physically model additional sounds. On a system without such a GPU, those additional sounds fall back to conventional sound rendering.

The graph below compares the CPU and the GPU in terms of the time required for running convolution on varying numbers of sources, for varying IR lengths. The black horizontal line is the duration of an audio frame, and is the time limit within which all audio processing must finish. As can be seen, the CPU runs out of time after processing only a few sources. For example, with a 2 second IR, the CPU only manages to render ~20 sources within the 21 ms budget of an audio frame. On the other hand, the GPU can render 64 sources using a 4 second IR, while staying well within the time limit. This indicates that TrueAudio Next allows Phonon to render many more sources and/or longer IRs than would be possible with the CPU.

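[Graph: time required for convolution on the CPU and on the GPU, for varying numbers of sources and varying IR lengths. The black horizontal line marks the duration of an audio frame.]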

System Configuration
CPU: Intel Core i7 4770 (3.4 GHz)
RAM: 16 GB
GPU: AMD Radeon R9 285 (“Tonga”), 2 GB VRAM
OS: Windows 8.1
Driver: Catalyst 15.7.1

With VR audio, designers are building more acoustically complex environments to create a better sense of presence. The use of formats like Ambisonics, spurred on by the advent of VR and 360 video, also increases the computational cost of convolution. TrueAudio Next represents a major step forward for high-performance audio rendering, and is a critical foundation on which the next few years of innovation in VR audio will rest.

In a future blog post, we’ll talk about how other AMD technologies can speed up the audio simulation portion of Phonon. Stay tuned!


DISCLAIMER

Lakulish Antani is VP of Engineering at Impulsonic, Inc. The information contained in this blog represents the view of a third party as of the given date. This content, including but not limited to data, images, system configurations, test results, and analysis, has not been reviewed, approved, verified, or endorsed by AMD, and is provided solely as a convenience to our customers and users. AMD has no obligation to update the information provided in this blog. Under no circumstances will AMD be liable in any way for any errors, omissions or damages of any kind that may result from this third party content and your reliance on its accuracy, completeness, or usefulness.
