Audio Description in 360° videos

On this page we will explain

  • how we integrated audio description (AD) into a 360° environment. Two aspects have been explored in particular: using different types of script and different sound designs (AD presentation modes)
  • key aspects of the technical realisation / production.

Please follow the links provided (in the text and at the end of this page) to get more detailed information on a particular topic.

We also have a summary of the key findings from user tests.

Test different script types

We explored different ways of scripting the Audio Description in order to gather feedback on the quality of experience in respect of:

  • Understanding
  • Enjoyment
  • Immersion

The following scripting types were tested with users:

Script in first person

This is where the audio description is scripted in first person and the main character of the story becomes the describer. The style of writing and delivery is also similar to how the character sounds and speaks.

Example: “I reversed away from him and he watched me go. I learned to drive with my knees while I played my guitar. There were four of us in the car now, Nat on guitar, Chuck on melodica, Big Sam on dashboard drums.”

Script in second person

This is where the describer is sitting next to the viewer or standing over their shoulder, describing the scene. The style of writing and delivery is casual, informal and friendly.

Example: “She reverses the car. He watches her go. Later she steers with her knees, playing a guitar. There are three friends with her – a girl on guitar, a boy on melodica, while a bulky lad drums on the dashboard.”

Standard audio description

This is the standard audio description style, where the describer objectively sets out what is happening in the scene, the characters etc.

Example: “She puts the car into reverse and pulls away from him. He watches her go. Later she steers the car with her knees as she plays her guitar. Three friends are with her, playing guitar, melodica and one drumming on the dashboard.”

Audio introduction

In addition to the above, an audio introduction could be useful. This is where a viewer listens to the audio introduction to set the scene before the content starts. It refers to details that the describer cannot include in the content because of a lack of time. Audio introductions are quite common in the theatre world and are used to introduce the viewers to the characters, their physical description, their costumes, the set design etc.

AD presentation modes (spatial audio)

The presentation modes make use of spatial sound (often called 3D sound or 3D audio, which are catchier terms) to explore the impact of audio design on the quality of the user experience. The project has tested different spatial positions for the audio description within the 3D audio scene, as described below.

Dynamic mode (previously we called it “AD on action”)

One possibility is to place the audio description utterances in the direction of the action they’re describing. This feels intuitively like it will give the listener more information but could be disorientating if the AD ends up bouncing around the user and coming from all directions.

Static mode (previously we called it “Friend on Sofa”)

Another option is to have the audio description coming from a fixed point within the scene. This is similar to the precursor of audio description: friends and family trying to describe what’s happening. Depending on how the technology is implemented, users could even be given control over where the audio description is placed.

Classic mode (previously we called it “Voice of God”)

Perhaps the easiest presentation mode is not to fix a position for the audio description at all, which makes it appear to come from everywhere and nowhere. In the ImAc project this is dubbed “Classic mode”.
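
As a rough illustration of how the three modes differ, the sketch below maps a presentation mode to the direction an AD segment would be placed at. The function name, the mode strings, the default angle for the static position and the choice of returning None for classic mode are all assumptions made for this example; they are not taken from the ImAc tools.

```python
from typing import Optional, Tuple


def ad_direction(mode: str,
                 action_azimuth_deg: float,
                 static_azimuth_deg: float = 90.0) -> Optional[Tuple[float, float]]:
    """Return (azimuth, elevation) in degrees for an AD segment.

    None means the segment has no fixed position (classic mode).
    All names and defaults here are hypothetical.
    """
    if mode == "dynamic":
        # AD follows the action it describes.
        return (action_azimuth_deg, 0.0)
    if mode == "static":
        # AD always comes from one fixed point, e.g. beside the listener.
        return (static_azimuth_deg, 0.0)
    if mode == "classic":
        # No position is fixed for the AD.
        return None
    raise ValueError(f"unknown presentation mode: {mode!r}")
```

For an action 30° to one side, dynamic mode would return that direction, static mode would ignore it and return the fixed point, and classic mode would return no position at all.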

AD production

Web AD editor

To create the metadata needed for spatial audio, a web-based AD editor was developed in the project.

Key features of the AD Editor:

  • Web (cloud) based
  • Responsive design for universal use
  • Embedded 360° video player
  • Editing of AD scripts directly within the editor
  • Recording and authoring Audio Description segments
  • Author AD spatial position within the 360° scene for different audio processing (classic, static, dynamic)
  • Export Audio Description segments as object-based audio file (proprietary format)

More information can be found on the AD editor factsheet.

If you’re interested, please request a test account for the web editor here: info@anglatecnic.com

Pre-processing 3D audio

To merge the main audio (i.e. the audio track without audio description or spoken subtitles) with audio description, we implemented an audio renderer that pre-renders all audio variations into the delivery format (Ambisonics in our case). This is done on the server side before playout.

The audio description segments produced by the AD web editor are stored and sent to the renderer as a list of audio objects (i.e. audio files plus a set of metadata). The renderer places these audio objects into the main audio scene at their intended spatial position.
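
To make the idea of placing an audio object into an Ambisonics scene more concrete, here is a minimal sketch of first-order encoding. It assumes the common AmbiX convention (channel order W, Y, Z, X with SN3D normalisation, azimuth measured counter-clockwise from the front); the text above does not say which convention the ImAc renderer uses, and the function names are invented for this example.

```python
import math
from typing import List, Sequence


def foa_gains(azimuth_deg: float, elevation_deg: float) -> List[float]:
    """First-order Ambisonics encoding gains in AmbiX order (W, Y, Z, X)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        1.0,                         # W: omnidirectional
        math.sin(az) * math.cos(el),  # Y: left/right
        math.sin(el),                 # Z: up/down
        math.cos(az) * math.cos(el),  # X: front/back
    ]


def mix_object(scene: List[List[float]],
               samples: Sequence[float],
               start: int,
               azimuth_deg: float,
               elevation_deg: float,
               gain: float = 1.0) -> List[List[float]]:
    """Add a mono audio object to a 4-channel scene, in place.

    `scene` holds four equally long channel lists (W, Y, Z, X).
    Samples that would fall past the end of the scene are dropped.
    """
    for channel, g in zip(scene, foa_gains(azimuth_deg, elevation_deg)):
        for i, s in enumerate(samples):
            idx = start + i
            if idx >= len(channel):
                break
            channel[idx] += gain * g * s
    return scene
```

A segment encoded at an azimuth of 90° ends up only in the W and Y channels, which a decoder would reproduce as sound coming from the listener's left under this convention.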

The metadata format that describes the AD audio objects is currently a proprietary one. It contains information regarding the spatial position of the object and its volume. Specifically for AD objects, we added information about the dipping of the main audio, i.e. how much the volume of the main audio track is lowered whenever audio description is active.
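
Since the actual metadata format is proprietary, the following is only a sketch of what such an object and the dipping of the main audio could look like. All field names, the use of decibels and the hard on/off switching (without fades) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ADObject:
    """Hypothetical metadata for one AD segment (not the ImAc format)."""
    start_s: float        # when the segment starts in the programme
    duration_s: float     # how long the segment lasts
    azimuth_deg: float    # spatial position
    elevation_deg: float
    gain_db: float        # volume of the AD segment itself
    main_dip_db: float    # how far the main audio is lowered while it plays


def db_to_linear(db: float) -> float:
    """Convert a level in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def main_audio_gain(t: float, objects: Iterable[ADObject]) -> float:
    """Linear gain to apply to the main audio at time t (seconds).

    If several segments overlap, the deepest dip is used.
    """
    dip_db = max(
        (o.main_dip_db for o in objects
         if o.start_s <= t < o.start_s + o.duration_s),
        default=0.0,
    )
    return db_to_linear(-dip_db)
```

With a dip of 6 dB, the main audio would play at roughly half its amplitude while the segment is active and at full level otherwise. A production renderer would most likely ramp the gain rather than switch it abruptly.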

Audio playout and rendering

To deliver audio to the user device in ImAc we use the Ambisonics format (first order), a widely used format that gives a decent representation of 3D or spatial sound. Within the streaming format MPEG DASH, audio tracks that contain Ambisonics can be sent, but they cannot be signalled correctly. To identify the tracks as Ambisonics to the ImAc player, we added a custom extension to the DASH manifest. A proposal for a standardized solution for this use case has been made.
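
The custom extension itself is not reproduced on this page. Purely as an illustration of the general approach, the sketch below marks an audio AdaptationSet with a DASH SupplementalProperty descriptor and shows how a player could look for it. The scheme URI, the value string and the example manifest are invented for this example and are not the ImAc extension.

```python
import xml.etree.ElementTree as ET
from typing import List

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Invented identifier, used only in this sketch.
HYPOTHETICAL_SCHEME = "urn:example:audio:ambisonics"

EXAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1" contentType="audio" lang="en">
      <SupplementalProperty schemeIdUri="urn:example:audio:ambisonics" value="order=1"/>
      <Representation id="main-plus-ad" bandwidth="256000"/>
    </AdaptationSet>
    <AdaptationSet id="2" contentType="audio" lang="en">
      <Representation id="stereo" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>
""".strip()


def ambisonic_adaptation_sets(mpd_xml: str) -> List[str]:
    """Return the ids of AdaptationSets flagged with the hypothetical scheme."""
    root = ET.fromstring(mpd_xml)
    found = []
    for aset in root.iterfind(".//mpd:AdaptationSet", MPD_NS):
        for prop in aset.iterfind("mpd:SupplementalProperty", MPD_NS):
            if prop.get("schemeIdUri") == HYPOTHETICAL_SCHEME:
                found.append(aset.get("id"))
                break
    return found
```

A player that does not know the descriptor can ignore it and treat the track as ordinary multichannel audio, which is why a supplemental rather than an essential property is used in this sketch.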

Additionally, the playback device has an impact on how the audio is perceived. Headphones typically allow for the best immersive experiences. In the document Deliverable 3.1, section 2.4.4, you can find a description of how different playback systems affect the immersive experience.

You can test some of the developed AD features in the ImAc player yourself.

References & Links

Explanations and demos for object-based audio

ImAc portal including the ImAc player and demo content

Paper “User profiling in audio description reception studies: questionnaires for all”

Paper “Audio description in 360º videos: results from focus groups in Barcelona and Kraków”

JavaScript library for decoding Ambisonics

AD related ImAc news articles: