A Sound Architect's Guide to Spatial Audio on XR Devices


Audio Engineering Society
Convention Paper

 

Contributors
Dr. Kaushik Sunder - Director of Engineering, Embody
Kevin Boettger - Spatial Audio Mastering Engineer, Embody



ABSTRACT

The spatial computing industry has grown rapidly in scale and complexity, driven by the rise of interactive experiences. Traditional stereo audio formats are insufficient for immersive environments, prompting the adoption of spatial audio formats such as Ambisonics and Dolby Atmos. However, producing such content often requires expensive multi-channel speaker setups, which can be prohibitive for smaller studios. Given that headphones are the primary mode of audio consumption for users, accurately monitoring spatial audio on headphones is crucial. This paper introduces tools and workflows for virtual production on XR devices that enable sound designers to mix and master spatial audio on audiophile headphones. Leveraging personalized head-related transfer functions (HRTFs) and physics-based modeling, these tools capture studio acoustics and deliver immersive experiences, supporting both object-based and multichannel processing.

INTRODUCTION

The Need for Spatial Audio in XR

Spatial audio is a cornerstone of immersive experiences, allowing developers to create realistic and engaging soundscapes. Its importance lies in:

  • Enhanced Realism: Essential for creating immersive soundscapes.
  • Closer Representation at the Design Stage: Approximating the final
    rendered sound during design allows for more accurate sonic
    storytelling.
  • Addressing Discontinuity: Ensures consistency between design
    intentions and implementation in game engines and audio
    middleware.

Challenges in Spatial Audio

  • Audio Architecture Priority: Spatial audio can be deprioritized
    due to other sound design needs or limited monitoring tools.
  • Discontinuity Between Phases: Breaks in workflow from DAW
    to game engine can lead to loss of original sound design intent.
  • Pipeline Design: Critical for maintaining immersive experiences;
    audio pipelines must support dynamic interaction with visual
    components.

High-Level Workflows

  • End-to-End Workflow:
    • Tools: DAW → Wwise → Unreal Engine → Plugins
  • Sound Design Phases:
    • Phase 1: Asset creation in the DAW (stereo format).
    • Phase 2: Implementation in game engines and audio middleware.

Phase 1: Sound Design Assets in DAW

  • Monitoring in Binaural:
    • Use tools like Immerse Virtual Studio, Apple Spatial Rendering, or
      Oculus Spatializer; the underlying operation is sketched in code at
      the end of this phase.
  • Key Considerations:
    • Is the sound from the player or the environment?
    • Is it part of 3D space or 2D space?
    • How do these elements interact?

Example: Adjusting enemy gunfire's low-end frequency to enhance localization
and separation from the main player's gunfire.
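
The core operation behind these binaural monitoring tools is convolution of each mono source with a left/right head-related impulse response (HRIR) pair. The following minimal C++ sketch illustrates the idea; the HRIR buffers are assumed to be loaded elsewhere (e.g., from a measured or personalized dataset), and it stands in for the concept rather than any specific plugin's implementation.

    // Minimal sketch: auditioning a mono asset binaurally by convolving it
    // with a left/right HRIR pair. HRIR loading is assumed to happen
    // elsewhere (e.g., from a measured or personalized dataset).
    #include <cstddef>
    #include <vector>

    struct StereoBuffer {
        std::vector<float> left;
        std::vector<float> right;
    };

    // Direct-form FIR convolution of a mono signal with one HRIR channel.
    // Assumes both buffers are non-empty.
    static std::vector<float> convolve(const std::vector<float>& x,
                                       const std::vector<float>& h) {
        std::vector<float> y(x.size() + h.size() - 1, 0.0f);
        for (std::size_t n = 0; n < x.size(); ++n)
            for (std::size_t k = 0; k < h.size(); ++k)
                y[n + k] += x[n] * h[k];
        return y;
    }

    // Render a mono source to binaural stereo for headphone monitoring.
    StereoBuffer renderBinaural(const std::vector<float>& mono,
                                const std::vector<float>& hrirLeft,
                                const std::vector<float>& hrirRight) {
        return { convolve(mono, hrirLeft), convolve(mono, hrirRight) };
    }

A production renderer would interpolate HRIRs as the source moves and use FFT-based convolution for efficiency; the time-domain form above is for clarity only.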

Phase 2.1: Setting Up in Wwise

  • Audio Architecture Basis: Design the bus structure for both traditional
    stereo and immersive audio.
  • Key Considerations:
    • Multichannel Bus Format: Choose between Channel-Based and Ambisonic.
    • Audio Objects: Are they used? What’s their role?
    • Immersive Experience: Determine which renderer (e.g., Immerse
      Audio/Object Renderer, Meta XR SDK) and where it’s implemented (Plugin
      vs. Endpoint).
    • Flow of Assets:
      • Bus routing.
      • Immersive state switching.
      • Game parameter controls (RTPC); state switching and RTPC calls are
        sketched in code below.
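
As a concrete illustration of the last two items, the following minimal sketch uses the Wwise SDK directly (it assumes the sound engine is initialized and banks are loaded). The state group "ImmersiveMode", its states, and the RTPC name "PlayerHealth" are hypothetical placeholders for whatever the Wwise project actually defines.

    // Minimal Wwise SDK sketch: immersive state switching and RTPC control.
    // Names below are illustrative, not from a real project.
    #include <AK/SoundEngine/Common/AkSoundEngine.h>

    void SwitchToImmersiveRendering()
    {
        // Immersive state switching: reroute buses to the binaural/Ambisonic
        // path instead of the traditional stereo path.
        AK::SoundEngine::SetState("ImmersiveMode", "Binaural");
    }

    void UpdateGameParameters(AkGameObjectID emitterId, float playerHealth)
    {
        // Game parameter controls (RTPC): e.g., duck ambience as player
        // health drops, via a curve authored in Wwise.
        AK::SoundEngine::SetRTPCValue("PlayerHealth", playerHealth, emitterId);
    }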

Phase 2.2: Unreal Engine Implementation

  • Emitter Positioning: Crucial for real-time interaction between the
    player and the sound.
  • Questions to Consider:
    • Is the sound in front of or behind the player?
    • Is it associated with a moving character, e.g., footsteps or enemy
      attacks?
  • Integration with Wwise: Use Blueprints for event triggers, RTPC
    adjustments, state toggling, and distance/attenuation communication
    (sketched below).
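
The sketch below shows the engine-side equivalent of these Blueprint operations, written against the raw Wwise SDK for brevity; in an actual Unreal project the Wwise integration's AkComponent handles most of this. The event name "Play_EnemyFootstep" and the RTPC "DistanceToPlayer" are hypothetical.

    // Minimal sketch: per-frame emitter update and event trigger.
    #include <AK/SoundEngine/Common/AkSoundEngine.h>
    #include <cmath>

    struct Vec3 { float x, y, z; };

    void UpdateEnemyEmitter(AkGameObjectID enemyId,
                            const Vec3& enemyPos, const Vec3& playerPos)
    {
        // Keep the emitter's 3D position in sync with the moving character
        // so front/back and left/right localization stay correct.
        AkSoundPosition pos;
        pos.SetPosition(enemyPos.x, enemyPos.y, enemyPos.z);
        pos.SetOrientation(0.f, 0.f, 1.f,   // front vector
                           0.f, 1.f, 0.f);  // top vector
        AK::SoundEngine::SetPosition(enemyId, pos);

        // Communicate distance so attenuation behavior authored in Wwise
        // (or an explicit RTPC curve) can respond.
        const float dx = enemyPos.x - playerPos.x;
        const float dy = enemyPos.y - playerPos.y;
        const float dz = enemyPos.z - playerPos.z;
        AK::SoundEngine::SetRTPCValue("DistanceToPlayer",
                                      std::sqrt(dx*dx + dy*dy + dz*dz),
                                      enemyId);
    }

    void TriggerFootstep(AkGameObjectID enemyId)
    {
        // Event trigger: the code equivalent of a Blueprint "Post Event" node.
        AK::SoundEngine::PostEvent("Play_EnemyFootstep", enemyId);
    }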

Phase 2.3: Customization and Tuning

  • Game-Specific HRTF Customization: Align binaural rendering with the
    game’s unique style and design.
  • Key Considerations:
    • Smoothness in sound movement
    • Front-to-back frequency spectrum
    • Gain adjustments and height information
    • Clarity and impact of main player sounds
    • Externalization and localization of on-screen vs. off-screen sounds
  • Personalized Headphone EQ: Apply headphone equalization specific to the
    playback device (a single-band sketch follows).
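
As one illustration of playback-device-specific headphone EQ, the sketch below implements a single peaking filter using the well-known RBJ audio-EQ-cookbook biquad. The 4 kHz, -3 dB correction is an arbitrary example value, not a measured headphone profile; a real profile would cascade several such bands per ear.

    // Minimal sketch: one peaking-EQ band (RBJ cookbook biquad) applied as
    // headphone correction before binaural monitoring.
    #include <cmath>
    #include <vector>

    struct Biquad {
        double b0, b1, b2, a1, a2;  // normalized coefficients (a0 == 1)
        double z1 = 0.0, z2 = 0.0;  // state, transposed direct form II

        static Biquad peaking(double fs, double fc, double gainDb, double Q) {
            const double kPi = 3.14159265358979323846;
            const double A = std::pow(10.0, gainDb / 40.0);
            const double w = 2.0 * kPi * fc / fs;
            const double alpha = std::sin(w) / (2.0 * Q);
            const double a0 = 1.0 + alpha / A;
            Biquad bq;
            bq.b0 = (1.0 + alpha * A) / a0;
            bq.b1 = -2.0 * std::cos(w) / a0;
            bq.b2 = (1.0 - alpha * A) / a0;
            bq.a1 = -2.0 * std::cos(w) / a0;
            bq.a2 = (1.0 - alpha / A) / a0;
            return bq;
        }

        float process(float x) {
            const double y = b0 * x + z1;
            z1 = b1 * x - a1 * y + z2;
            z2 = b2 * x - a2 * y;
            return static_cast<float>(y);
        }
    };

    // Example: apply a -3 dB cut at 4 kHz (Q = 1.4) to one channel in place.
    void applyHeadphoneEq(std::vector<float>& samples, double fs) {
        Biquad band = Biquad::peaking(fs, 4000.0, -3.0, 1.4);
        for (float& s : samples) s = band.process(s);
    }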

Conclusion

  • Aligning Audio with Visuals: Integrate sound architecture with visual
    elements to create cohesive and immersive XR experiences.
  • Continual Iteration: Regularly refine and adjust the audio pipeline to
    accommodate the evolving nature of XR technologies and storytelling needs.
