Audio Description in 360° videos
On this page we explain:
- how we integrated audio description (AD) into a 360° environment. Two aspects have been explored in particular: using different types of script and different sound designs (AD presentation modes)
- key aspects of the technical realisation / production.
Please follow the links provided (in the text and at the end of this page) to get more detailed information on a particular topic.
We also have a summary of the key findings from user tests.
Test different script types
We explored different ways of scripting the audio description in order to gather feedback on the quality of experience.
The following scripting types were tested with users:
Script in first person
This is where the audio description is scripted in first person and the main character of the story becomes the describer. The style of writing and delivery is also similar to how the character sounds and speaks.
Example: “I reversed away from him and he watched me go. I learned to drive with my knees while I played my guitar. There were four of us in the car now, Nat on guitar, Chuck on melodica, Big Sam on dashboard drums.”
Script in second person
This is where the describer is sitting next to the viewer or standing over their shoulder, describing the scene. The style of writing and delivery is casual, informal and friendly.
Example: “She reverses the car. He watches her go. Later she steers with her knees, playing a guitar. There are three friends with her – a girl on guitar, a boy on melodica, while a bulky lad drums on the dashboard.”
Standard audio description
This is the standard audio description style, where the describer objectively sets out what is happening in the scene, who the characters are, and so on.
Example: “She puts the car into reverse and pulls away from him. He watches her go. Later she steers the car with her knees as she plays her guitar. Three friends are with her, playing guitar, melodica and one drumming on the dashboard.”
In addition to the above, an audio introduction could be useful. The viewer listens to it before the content starts in order to set the scene. It covers details that the describer cannot include in the content itself because of a lack of time. Audio introductions are quite common in the theatre world, where they are used to introduce the viewers to the characters, their physical appearance, their costumes, the set design etc.
AD presentation modes (spatial audio)
The presentation modes use spatial sound (often referred to by the catchier terms 3D sound or 3D audio) to explore the impact of audio design on the quality of the user experience. The project tested different spatial positions for the audio description within the 3D audio scene; these are described below.
Dynamic mode (previously we called it “AD on action”)
One possibility is to place the audio description utterances in the direction of the action they’re describing. Intuitively, this should give the listener more information, but it could be disorientating if the AD ends up bouncing around the user and coming from all directions.
Static mode (previously we called it “Friend on Sofa”)
Another option is to have the audio description come from a fixed point within the scene. This is similar to the precursor of audio description: friends and family trying to describe what’s happening. Depending on how the technology is implemented, users could even be given control over where the audio description is placed.
Classic mode (previously we called it “Voice of God”)
Perhaps the simplest presentation mode is not to fix a position for the audio description at all, which makes it appear to come from everywhere and nowhere. In the ImAc project this is dubbed “Classic mode”.
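The three presentation modes can be summarised as a small decision rule for where an AD segment is placed. The following is only an illustrative sketch, not the ImAc implementation: the names ADMode, Direction, STATIC_DEFAULT and ad_direction are hypothetical, as are the angle convention and the default static position.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ADMode(Enum):
    CLASSIC = "classic"  # no fixed position ("Voice of God")
    STATIC = "static"    # fixed point in the scene ("Friend on Sofa")
    DYNAMIC = "dynamic"  # direction of the described action ("AD on action")


@dataclass(frozen=True)
class Direction:
    azimuth_deg: float    # assumed convention: 0 = straight ahead, positive = left
    elevation_deg: float  # assumed convention: 0 = ear level, positive = up


# Hypothetical default for static mode: slightly to the listener's right.
STATIC_DEFAULT = Direction(azimuth_deg=-30.0, elevation_deg=0.0)


def ad_direction(
    mode: ADMode,
    action: Direction,
    static_position: Direction = STATIC_DEFAULT,
) -> Optional[Direction]:
    """Return where an AD segment should be placed, or None if it is not spatialised."""
    if mode is ADMode.CLASSIC:
        return None              # heard "from everywhere and nowhere"
    if mode is ADMode.STATIC:
        return static_position   # same spot for every segment
    return action                # dynamic: follow the described action


if __name__ == "__main__":
    action = Direction(azimuth_deg=90.0, elevation_deg=0.0)
    for mode in ADMode:
        print(mode.value, ad_direction(mode, action))
```

Passing static_position as a parameter reflects the idea that users could be given control over where the description is placed.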
Web AD editor
To create the metadata needed for spatial audio, a web-based AD editor was developed in the project.
Key features of the AD Editor:
- Web (cloud) based
- Responsive design for universal use
- Embedded 360° video player
- Editing of AD scripts directly within the editor
- Recording and authoring Audio Description segments
- Authoring of the AD spatial position within the 360° scene for the different presentation modes (classic, static, dynamic)
- Export of Audio Description segments as an object-based audio file (proprietary format)
More information can be found on the AD editor factsheet.
If you’re interested, please request a test account for the web editor here: firstname.lastname@example.org
Pre-processing 3D audio
To merge the main audio (i.e. the audio track without audio description or spoken subtitles) with audio description, we implemented an audio renderer that pre-renders all audio variations into the delivery format (Ambisonics in our case). This is done on the server side before playout.
The audio description segments produced by the AD web editor are stored and sent to the renderer as a list of audio objects (i.e. audio files plus a set of metadata). The renderer places these audio objects in the main audio scene at their intended spatial position.
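To illustrate what placing an audio object in an Ambisonics scene involves, here is a minimal sketch of first-order encoding. It assumes the common AmbiX convention (channel order W, Y, Z, X with SN3D normalisation, azimuth measured counter-clockwise so that positive values are to the left); the source does not state which convention the ImAc renderer uses, and the function names encode_foa and mix_into_scene are hypothetical.

```python
import math


def encode_foa(samples, azimuth_deg, elevation_deg):
    """Encode mono samples into four first-order Ambisonics channels.

    Assumed convention: AmbiX (ACN order W, Y, Z, X; SN3D normalisation).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                          # W: omnidirectional component
        math.sin(az) * math.cos(el),  # Y: left/right
        math.sin(el),                 # Z: up/down
        math.cos(az) * math.cos(el),  # X: front/back
    )
    return [[g * s for s in samples] for g in gains]


def mix_into_scene(scene, encoded, start):
    """Add an encoded object to a 4-channel scene in place, starting at sample index `start`."""
    for scene_channel, object_channel in zip(scene, encoded):
        for i, value in enumerate(object_channel):
            if 0 <= start + i < len(scene_channel):
                scene_channel[start + i] += value
    return scene


if __name__ == "__main__":
    scene = [[0.0] * 8 for _ in range(4)]          # silent main scene, 8 samples
    ad = encode_foa([0.5, 0.5, 0.5], azimuth_deg=90.0, elevation_deg=0.0)
    mix_into_scene(scene, ad, start=2)
    for name, channel in zip("WYZX", scene):
        print(name, [round(v, 3) for v in channel])
```

A source placed at 90° azimuth ends up in the W and Y channels only, which is what makes it appear to come from the listener's left after decoding.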
The metadata format that describes the AD audio objects is currently a proprietary one. It contains information about the spatial position of the object and its volume. Specifically for AD objects, we added information about the dipping of the main audio, i.e. by how much the volume of the main audio track is lowered whenever audio description is active.
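The actual metadata format is proprietary and not documented here, so the following is only a sketch of how position, volume and dipping information could be represented and applied. All field and function names (ADObject, main_dip_db, main_gain_at and so on) are hypothetical, as is the choice to express levels in decibels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ADObject:
    """Hypothetical metadata for one AD segment (not the ImAc format)."""
    start_s: float        # when the segment starts in the programme
    duration_s: float     # length of the recorded segment
    azimuth_deg: float    # spatial position of the object
    elevation_deg: float
    gain_db: float        # volume of the AD segment itself
    main_dip_db: float    # how far the main audio is lowered while the segment plays


def db_to_linear(db: float) -> float:
    """Convert a level in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def main_gain_at(t: float, objects) -> float:
    """Linear gain for the main audio at time t (seconds).

    While one or more AD segments are active, the main audio is lowered by the
    largest dip requested; otherwise it plays at full level.
    """
    dips = [o.main_dip_db for o in objects
            if o.start_s <= t < o.start_s + o.duration_s]
    return db_to_linear(-max(dips)) if dips else 1.0


if __name__ == "__main__":
    segments = [ADObject(2.0, 3.0, 30.0, 0.0, 0.0, 6.0)]
    for t in (1.0, 3.0, 5.0):
        print(t, round(main_gain_at(t, segments), 3))
```

This sketch switches the gain abruptly at segment boundaries; a production renderer would more likely ramp the level over a short fade to avoid audible jumps.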
Audio playout and rendering
To deliver audio to the user device, ImAc uses first-order Ambisonics, a widely used format that gives a decent representation of 3D (spatial) sound. Within the MPEG-DASH streaming format, audio tracks that contain Ambisonics can be sent but cannot be signalled correctly. To identify these tracks as Ambisonics to the ImAc player, we added a custom extension to the DASH manifest. A standardized solution for this use case has also been proposed.
Additionally, the playback device has an impact on how the audio is perceived. Headphones typically allow for the most immersive experience. Section 2.4.4 of Deliverable 3.1 describes how different playback systems affect the immersive experience.
You can test some of the developed AD features in the ImAc player yourself.
References & Links
Explanations and demos for object-based audio
ImAc portal including the ImAc player and demo content
Paper “User profiling in audio description reception studies: questionnaires for all”
Paper “Audio description in 360º videos: results from focus groups in Barcelona and Kraków”
JavaScript library for decoding Ambisonics
AD related ImAc news articles:
- Audio Description for 360° content
- Immersive Audio Description Workshop at Media4All
- Audio Description for 360° video content
- Audio Description in 3D Audio