How ClearVR Drives and Leverages Standards

Tiledmedia believes that good standards help develop markets. Below, we explain how Tiledmedia’s products rely on standards, and how we contribute to better and more efficient standards for immersive media. We also describe why ClearVR is more advanced than some of the currently published standards, and how Tiledmedia’s ClearVR technology is at the heart of the new version of MPEG’s OMAF standard, the Omnidirectional MediA Format. To make our point, we need to go into a bit of technical detail, but this page is useful even to readers who are not intimately familiar with all the technical terms.

Different implementations of “tiled streaming” exist, and ClearVR is one of them. The first version of MPEG’s OMAF (Omnidirectional MediA Format) specification also specified a form of tiled streaming, and it includes a “viewport-dependent media profile” that relies on it. This specification forms the basis of VRIF’s Viewport Dependent Profile. Tiledmedia’s ClearVR technology is more advanced than this profile, and we’ll explain why below.

Let’s first note that ClearVR follows the HEVC standard and the MP4 file format specifications. ClearVR is not compatible with all aspects of the “viewport-dependent” profile in the currently published version of OMAF, and this is a deliberate choice. Tiledmedia believes that ClearVR, the result of more than eight years of tiled streaming R&D, is significantly ahead of the technology that the current version of OMAF specifies. It’s the much tighter integration of the media processing and the networking stack in the ClearVR solution that determines its performance.

At the same time, Tiledmedia is convinced that good-quality standards create markets, and we seek to provide standards-based solutions to those markets.

We have adopted, and will help improve, any standards that help our customers. We are doing this in two ways.

First, we are adopting an increasing number of relevant standards as we evolve our platform. We rely on the MP4 file format and the Common Media Application Format (CMAF). This allows the use of existing packagers in a ClearVR-enabled deployment, and it enables ClearVR to be integrated with most DRM systems. We also rely on HEVC for our video processing, working with unchanged, standard (usually hardware) implementations of this decoder in the devices that we support.

Next, we have contributed significant efficiency improvements to the file format, bringing the standard closer to our solution. These improvements lower overhead in use cases with lots of smaller elements (tiles) and significantly decrease tile switching latency. Interestingly, other applications will also benefit from these updates.

MPEG has recognized the paradox of immersive media distribution: as we move towards ever larger data volumes in immersive media, the elementary chunks of data need to get smaller rather than larger. This is because the entire scene will be too large to deliver and process, and media delivery will increasingly be individualized. It will depend on where we stand in a virtual world, where we look, and how we interact with the media. We call this approach “late binding”, a term that has been adopted in MPEG.

Tiledmedia’s CTO Ray van Brandenburg is editor of one of the file format specs (to be exact, an amendment to Part 15 of the File Format, which contains the HEVC bindings).

And this brings us back to OMAF, MPEG’s specification for VR360 distribution. The new version of OMAF, planned for approval in Fall 2020, will support late binding based on ClearVR. It will be in the form of an “Advanced Tiling Profile” that was included in the draft OMAF v.2 specification in July 2019. We are now working with the MPEG community to further develop this profile.

Working in MPEG doesn’t just mean that we bring ideas – we also learn from other experts that believe in our approach and that have their own, smart ideas on how to improve the specifications. Participating in standardization is a significant commitment and investment, but also a source of inspiration. We believe it is worth it.

How ClearVR is more advanced than current standards

ClearVR implements a philosophy called “late binding”, while the first OMAF version relies on “early binding”. The main difference is as follows. Early binding establishes the configuration of the tiles in the final image when the content is being prepared for distribution, at the server side. Late binding (ClearVR) only configures the tiles when content is being played, in the end-user device.
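To make the difference concrete, here is a minimal sketch. It is not ClearVR or OMAF code: the one-dimensional grid of eight tile columns, the function names and the set of prepared configurations are all assumptions made up for illustration. Early binding can only hand out a configuration that was fixed during content preparation; late binding computes the tile set from whatever viewport the device reports at playback time.

```python
def tiles_for_viewport(yaw_deg, fov_deg, n_cols=8):
    """Return the tile columns overlapping a viewport (hypothetical 1-D tile grid)."""
    col_width = 360 / n_cols
    visible = []
    for col in range(n_cols):
        center = (col + 0.5) * col_width
        # Shortest angular distance between the column centre and the viewing direction.
        dist = abs((center - yaw_deg + 180) % 360 - 180)
        if dist < fov_deg / 2 + col_width / 2:
            visible.append(col)
    return visible


# Early binding (assumed setup): configurations are fixed at preparation time,
# here for a 90-degree viewport at eight viewing directions.
PREPARED = {yaw: tiles_for_viewport(yaw, 90) for yaw in range(0, 360, 45)}


def early_binding_select(yaw_deg, fov_deg):
    """Pick the closest prepared configuration; the device's fov_deg cannot be honoured."""
    nearest = min(PREPARED, key=lambda y: abs((y - yaw_deg + 180) % 360 - 180))
    return PREPARED[nearest]


def late_binding_select(yaw_deg, fov_deg):
    """Configure the tiles in the client, for the actual viewport."""
    return tiles_for_viewport(yaw_deg, fov_deg)


# A device with a 110-degree field of view looking at yaw 100 degrees.
print(early_binding_select(100, 110))
print(late_binding_select(100, 110))
```

With these toy numbers, the prepared configuration covers only columns 1 and 2, while the late-binding client also retrieves column 3, which the wider viewport needs.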

Early binding imposes a slightly lower processing load on the client device. But there are many advantages to late binding that justify the extra bit of processing. First, late binding systems respond much faster to user action. Second, they can cope with the vagaries of unpredictable OTT distribution more smoothly. Third, they provide significantly more flexibility for playing the same encoded content on different types of devices, including devices that didn’t even exist yet at the time that content was being prepared.

And while we are on the subject of content preparation: this is much more straightforward for late binding distribution, as there is no requirement to pre-configure the content for all conceivable devices and playback conditions. We explain this in more detail below.

The main advantages of late binding (and ClearVR) over early binding (including OMAF v.1) are as follows.

Efficiency: ClearVR can reduce the bitrate requirement by a factor of up to 5 compared to full-sphere streaming; current standards reach about a factor of two. In other words, at its best ClearVR uses less than half the bandwidth of MPEG OMAF v.1 (one fifth versus one half of the full-sphere bitrate).
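The arithmetic behind that comparison can be checked in a few lines. The 50 Mbit/s full-sphere bitrate below is a made-up figure for illustration only; the factors of 5 (a best case, “up to”) and 2 are the ones quoted above.

```python
full_sphere_mbps = 50.0  # hypothetical full-sphere bitrate, for illustration only

late_binding_mbps = full_sphere_mbps / 5   # best-case reduction factor quoted above
early_binding_mbps = full_sphere_mbps / 2  # approximate factor for current standards

# Bandwidth of the late-binding stream relative to the early-binding one.
ratio = late_binding_mbps / early_binding_mbps

print(f"late binding:  {late_binding_mbps} Mbit/s")
print(f"early binding: {early_binding_mbps} Mbit/s")
print(f"ratio: {ratio:.2f}")
```

The ratio does not depend on the assumed full-sphere bitrate: it is (1/5) / (1/2) = 0.4, which is where “less than half” comes from.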

Switching Latency: On a good-quality CDN, the ClearVR Client can switch to high-resolution imagery within a few frames – unnoticeable to the user. In a standards-based solution, switching after head motion relies on segment and GOP boundaries, and takes hundreds of milliseconds and sometimes even a few seconds, which is very visible. Some implementations try to alleviate this by creating more encoded versions of the same content, but this adds significant inefficiencies and cost to processing, storage and distribution.

Flexibility: A single ClearVR encoding can cater to all HMDs and various types of flat screens, regardless of their viewport size and aspect ratio. That single representation can cover monoscopic and stereoscopic content, where a flat device just retrieves the tiles for one eye. Contrast that to current standards, where the content distributor needs to provide separate representations for each conceivable viewport size / aspect ratio combination – again a significant cost factor.

Graceful degradation: With ClearVR, each tile forms an independent stream, which the client combines with other tiles to create a single HEVC-compliant bitstream. Such client-side processing allows the ClearVR library to make last-millisecond decisions, and to dynamically replace tiles with their low-resolution equivalent on a frame-by-frame basis. It is also possible to seamlessly switch between stereoscopic and monoscopic content, as bandwidth comes and goes. This allows playback to continue uninterrupted even when not all data is available and prevents buffering – no spinning wheels. Early binding approaches hard-code all tile combinations in the bitstream during content production. When data for a single tile is not (yet) available, the client is helpless and can only resort to buffering.
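The per-frame fallback described above can be sketched as follows. This is an illustration, not the ClearVR library: the function and parameter names are hypothetical, and the sketch assumes that a low-resolution version of every tile is already buffered on the client.

```python
def assemble_frame(all_tiles, viewport_tiles, arrived_high_res):
    """Choose a version of each tile for one frame (hypothetical sketch).

    Assumes a low-resolution version of every tile is always available locally.
    """
    frame = {}
    for tile in all_tiles:
        if tile in viewport_tiles and tile in arrived_high_res:
            frame[tile] = "high"
        else:
            # Outside the viewport, or high-res data not here yet:
            # fall back to low resolution instead of stalling playback.
            frame[tile] = "low"
    return frame


# Tiles 1-3 are in view, but high-res data for tile 2 has not arrived in time.
frame = assemble_frame(range(8), viewport_tiles={1, 2, 3}, arrived_high_res={1, 3})
print(frame)
```

Because the decision is taken per frame, tile 2 can switch to high resolution as soon as its data arrives, without the player ever having to stop and buffer.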

Bitrate variability: Bitrate spikes in ClearVR are limited, even with extensive head motion, because ClearVR doesn’t require clearing the decoding buffer whenever the viewport changes. In contrast, existing specs rely on field-of-view-specific metadata hard-coded in the bitstream, forcing the client to download a hefty new chunk of data whenever the field-of-view changes even slightly. This causes a significant spike in required bandwidth and gives a very noticeable motion-to-high-resolution latency.

User interaction: ClearVR does all processing client-side instead of during content preparation, which allows for complex forms of user interaction without adding any latency. Examples are dynamic zooming, field-of-view adjustments, pause with the ability to look all around while still getting high-quality imagery, and fast seeking. None of this is supported by any available standard.

Recognising these advantages, MPEG has decided to document late binding operation in OMAF v.2, and to include a late binding profile in the specification called Advanced Tiling Profile. It is slated for final approval in July 2020.

Supporting standards to stay ahead of the pack

Good standards allow interoperability and innovation. They lower the barrier to entry and facilitate further quality increases. Take, for example, the tried and tested MPEG-2 Video standard. Although it was set in the mid-nineties, innovation continues even today, and compression efficiency has more than doubled since the standard was frozen. This is because the standard defines only the decoder, not the encoder, so encoder manufacturers keep improving their tools.

The same applies to tiled streaming today, especially with ClearVR’s late binding approach. The Advanced Tiling Profile in OMAF v.2 allows the client device to determine what tiles to retrieve, at which exact moment and at what exact quality. The profile and specification define only how the retrieved tiles should be decoded and rendered.

As we noted, not all tiled streaming systems are equal, and this fully applies to different implementations of this profile. Performance differences will be very significant. ClearVR tightly integrates networking and video coding, and we use the data collected from over 500 different device make/model combinations and millions of streaming minutes to continuously optimize ClearVR. And we do this while relying on standards, and providing our customers the benefits that come with using these standards, including easy integration and re-use of existing elements in the distribution chain.

Tiledmedia’s solutions rely on standards

Tiledmedia’s ClearVR solutions rely on international standards. This makes our technology straightforward to deploy to existing devices over existing content distribution channels. We believe in the interoperability that good standards bring, which benefits consumers (things just work), our ecosystem partners (our technology is easy to integrate) and content providers (being able to rely on standards significantly reduces their cost).

We rely on standardized HEVC decoders in consumer devices and personal computing systems. Obviously, we also rely on HEVC encoders. Encoders simply need to observe a few constraints that are clearly defined in the specification.

Tiledmedia is an enthusiastic member of the VR Industry Forum, VRIF, which seeks to facilitate the widespread adoption of VR services by working on quality and interoperability. Rob Koenen, one of Tiledmedia’s Founders, was also a founder of VRIF and served as its first President.