ImAc project partner Motion Spell participated in the latest MPEG meeting (#126) in Geneva with the intention of following the newest developments in Immersive Media activities. Many such activities are blossoming around audio, video and subtitles. They are relevant for the ImAc project, which aims to integrate accessibility services with immersive media, mainly 360° video content, by providing a set of tools to produce immersive accessibility, distribute the accessibility-enhanced 360° content, and play out that content with appropriate personalization options for the accessibility services.
MPEG is the Moving Picture Experts Group, a working group of ISO/IEC which created some of the foundations of the multimedia industry: MPEG-2 TS and the MP4 file format, and a series of successful codecs in both video (MPEG-2 Video, AVC/H.264) and audio (MP3, AAC). A new generation (MPEG-H) emerged in 2013 with MPEG-H 3D Audio, HEVC and MMT, followed by other activities such as MPEG-I.
The GPAC team and its commercial arm (GPAC Licensing), led by Motion Spell, are active contributors to MPEG.
MPEG meetings are organized as a set of thematic meeting rooms representing different working groups. Each working group progresses from requirements to a working draft and then to an international standard. Each MPEG meeting gathers around 500 participants from all over the world.
In the Audio coding area, MPEG worked on two standards: MPEG-D and MPEG-I.
- MPEG-D: In Part 5 – Uncompressed Audio in MP4 File Format, MPEG extends MP4 to enable carriage of uncompressed audio (e.g. PCM). Previously, MP4 could only carry compressed audio.
- MPEG-I: Part 4 – Immersive Audio. As MPEG-H 3D Audio already supports a 3DoF user experience, MPEG-I builds upon it to provide a 6DoF immersive audio experience. A Call for Proposals will be issued in October 2019; submissions are expected in October 2021 and the FDIS stage is expected to be reached in April 2022. Even though this standard will be about metadata rather than compression, as for 3DoF+ Visual, we have kept this activity under Audio Coding. The ImAc contribution is to make sure legacy codecs can be used to convey 3D spatialized audio; the proposal was to use CICP (coding-independent code points) to convey this information. Unfortunately the whole process takes time. MPEG-I Audio has also started to investigate test material.
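In practice, the MPEG-D Part 5 extension above amounts to adding new sample entries and a small configuration box to the MP4 (ISOBMFF) file format. As a minimal sketch of what such a box looks like on the wire, the snippet below builds a hypothetical PCM configuration box; the box name and field layout are assumptions for illustration, not the normative syntax of ISO/IEC 23003-5:

```python
import struct

def mp4_box(box_type, payload):
    # Minimal ISOBMFF box: 32-bit big-endian size, 4-byte type, then payload.
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def pcm_config_box(sample_size=16, little_endian=True):
    # Hypothetical PCM configuration box in the spirit of MPEG-D Part 5;
    # the field layout here is an assumption, not the normative syntax.
    version_and_flags = struct.pack(">I", 0)        # FullBox: version=0, flags=0
    format_flags = bytes([1 if little_endian else 0])
    payload = version_and_flags + format_flags + bytes([sample_size])
    return mp4_box(b"pcmC", payload)
```

The point is that the payload only needs to describe the raw samples (size, endianness); unlike compressed codecs, no decoder configuration record is required.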
In the Video coding area, the activities are intense, since MPEG is currently developing specifications for four standards: MPEG-H, MPEG-I, MPEG-5 and MPEG-CICP:
- MPEG-H: In Part 2 – High Efficiency Video Coding, the 4th edition specifies a new HEVC profile which enables encoding of single-color-plane video with some restrictions on bits per sample, and which includes additional Supplemental Enhancement Information (SEI) messages.
- MPEG-I: In Part 3 – Versatile Video Coding, jointly developed with VCEG, MPEG is working on the new video compression standard after HEVC. VVC is expected to reach FDIS stage in July 2020 for the core compression engine.
- MPEG-CICP: Part 4 – Usage of Video Signal Type Code Points, 2nd edition, will document additional combinations of commonly used code points and baseband signaling, as is done for audio.
- MPEG-5: This standard is still awaiting approval, but MPEG has already obtained, from the Calls for Proposals (CfP), all technologies necessary to develop standards with the intended functionalities and performance. Part 1 – Essential Video Coding will specify a video codec with two layers: layer 1 significantly improves over AVC but performs significantly below HEVC, while layer 2 significantly improves over HEVC but performs significantly below VVC. Part 2 – Low Complexity Enhancement Video Coding will specify a data stream structure composed of two component streams: stream 1 is decodable by a hardware decoder, while stream 2 can be decoded in software with sustainable power consumption. Stream 2 provides new features such as a compression capability extension to existing codecs and lower encoding and decoding complexity, for on-demand and live streaming applications. This new Low Complexity Enhancement Video Coding (LCEVC) standard aims at bridging the gap between two successive generations of codecs by providing a codec-agile extension to existing video codecs that improves coding efficiency and can be readily deployed via a software upgrade with sustainable power consumption.
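The LCEVC two-stream structure can be illustrated with a toy reconstruction loop: a low-resolution frame from the base (hardware) codec is upscaled, and the enhancement residuals from the software-decoded stream are added on top. This is an illustrative sketch under simplifying assumptions (nearest-neighbour upscaling, 8-bit samples), not the normative LCEVC process:

```python
def lcevc_style_reconstruct(base, residuals, scale=2):
    # base: low-resolution frame from the base codec (list of rows of 0-255 values)
    # residuals: full-resolution enhancement residuals (signed integers)
    h, w = len(base), len(base[0])
    out = []
    for y in range(h * scale):
        row = []
        for x in range(w * scale):
            up = base[y // scale][x // scale]     # nearest-neighbour upscale
            pixel = up + residuals[y][x]          # add enhancement residual
            row.append(max(0, min(255, pixel)))   # clamp to 8-bit range
        out.append(row)
    return out
```

The design point is that the heavy lifting (the base stream) stays on existing hardware decoders, while the lightweight residual pass runs in software.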
Video: Neural Network Compression for Multimedia Applications
At its 126th meeting, MPEG analyzed nine technologies submitted by industry leaders as responses to the Call for Proposals (CfP) on Neural Network Compression. These technologies address the compression of neural network parameters for networks trained on multimedia data, to reduce their size for transmission and improve their efficiency, while not or only moderately reducing their performance in specific multimedia applications.
Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors, and image and video coding. The trained neural networks for these applications are of considerable size because they contain many parameters, such as weights. It is therefore important to use a compressed representation of neural networks to transfer them to clients such as mobile phones or smart cameras.
After a formal evaluation of submissions, MPEG identified three main technology components in the compression pipeline, which will be further studied in the development of the standard. A key conclusion is that with the proposed technologies, a compression to 10% or less of the original size can be achieved with no or negligible performance loss, where this performance is measured as classification accuracy in image and audio classification, matching rate in visual descriptor matching, and PSNR reduction in image coding. Some of these technologies also result in the reduction of the computational complexity of using the neural network or can benefit from specific capabilities of the target hardware (e.g., support for fixed point operations).
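Parameter quantization is one recurring component technique in such compression pipelines. The sketch below is an illustrative uniform quantizer, not any specific CfP submission: float weights are mapped to small integer codes plus a scale and offset.

```python
def quantize_weights(weights, bits=8):
    # Map float weights onto (2**bits - 1) uniform levels between min and max.
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0   # avoid div-by-zero for constant weights
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min

def dequantize(codes, scale, w_min):
    # Reconstruct approximate float weights from the integer codes.
    return [w_min + c * scale for c in codes]
```

Storing 8-bit codes instead of 32-bit floats alone gives a 4x reduction; combined with pruning and entropy coding of the codes, compression toward the 10% figure mentioned above becomes plausible, at the cost of a small reconstruction error per weight.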
MPEG expects the compression of neural networks for multimedia content description and analysis (ISO/IEC 15938-17) to reach Final Draft International Standard (FDIS) stage in April 2021. In the framework of the ImAc project, scouting these exploratory activities remains relevant to drive future choices and efforts beyond the project scope.
At its 126th meeting, there was also activity on immersive subtitles. In particular, two documents mention the work done on subtitles:
- N18127: Requirements MPEG-I Phase 1b. In particular, requirements #6 and #7 relate to subtitles. This document includes a Liaison Statement with the W3C Timed Text Working Group.
- N18227: WD of ISO/IEC 23090-2 2nd edition (OMAF). Section 7.11 defines storage of timed text for omnidirectional video, and Section 10.4 defines timed text profiles.
Next MPEG meeting
The next meeting (#127) will be held on July 8-12, 2019, in Gothenburg, Sweden.