How ClearVR is more advanced than current standards
ClearVR implements a philosophy called “late binding”, while current viewport-dependent MPEG specs rely on “early binding”. The main difference between the two approaches is as follows. Early binding establishes the configuration of the tiles in the final image when the content is prepared for distribution, at the server side. Late binding (ClearVR) only configures the tiles when the content is played, on the user device.
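To make the late-binding idea concrete, here is a minimal sketch of client-side tile selection at play time: the client re-evaluates, for every frame, which tiles of the sphere overlap the current viewport. All names, the tiling grid, and the field-of-view value are illustrative assumptions, not ClearVR's actual API.

```python
# Hypothetical sketch of late binding: the tile configuration is decided
# on the client at play time, from the current viewport, rather than being
# fixed at packaging time. Names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class Tile:
    index: int          # position in the tiling grid
    yaw_deg: float      # horizontal centre of the tile on the sphere
    pitch_deg: float    # vertical centre of the tile

def select_tiles(tiles, viewport_yaw, viewport_pitch, fov_deg=110.0):
    """Late binding: pick the tiles overlapping the *current* viewport."""
    half = fov_deg / 2.0
    def angular_dist(a, b):
        d = abs(a - b) % 360.0        # yaw wraps around the sphere
        return min(d, 360.0 - d)
    return [t for t in tiles
            if angular_dist(t.yaw_deg, viewport_yaw) <= half
            and abs(t.pitch_deg - viewport_pitch) <= half]

# A coarse 8x4 tiling of the sphere (45-degree tiles):
grid = [Tile(i, yaw, pitch)
        for i, (yaw, pitch) in enumerate(
            (y * 45.0, p * 45.0 - 67.5)
            for p in range(4) for y in range(8))]

# The client repeats this selection as the user's head moves:
visible = select_tiles(grid, viewport_yaw=90.0, viewport_pitch=0.0)
```

Because the selection happens on the device, the same packaged tiles can serve any viewport size or head orientation; an early-binding system would instead have to bake a fixed set of viewport configurations into the content at packaging time.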
The advantage of early binding is that it places a slightly lower processing load on the client device. But there are many advantages to late binding that justify that extra bit of processing. First, late binding systems respond much faster to user action. Second, they cope better with the vagaries of unpredictable OTT distribution. Third, they provide significantly more flexibility for playing the same encoded content on different types of devices – including devices that didn’t even exist yet when the content was prepared.
And while we are on the subject of content preparation: this is much more straightforward for late binding distribution, as there is no need to pre-configure the content for all conceivable devices and playback conditions. We explain this in more detail below.
The main advantages of late binding (and ClearVR) over early binding (including OMAF v.1) are as follows.
Efficiency: ClearVR can reduce the bitrate requirement by up to a factor of five compared to full-sphere streaming; current standards reach about a factor of two. In other words, ClearVR uses less than half the bandwidth of MPEG OMAF v.1.
Switching Latency: On a good-quality CDN, the ClearVR Client can switch to high-resolution imagery within one or two frames – unnoticeable to the user. In a standards-based solution, switching after head motion relies on segment and GOP boundaries, and takes hundreds of milliseconds and sometimes even a few seconds, which is very visible. Some implementations try to alleviate this by creating more encoded versions of the same content, but this adds significant inefficiencies and cost to processing, storage and distribution.
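The gap between the two switching behaviours can be illustrated with a back-of-the-envelope calculation. The frame rate, GOP length, and segment duration below are typical illustrative values, not measurements of any particular deployment.

```python
# Rough comparison of motion-to-high-resolution latency.
# All numbers are illustrative assumptions, not measured figures.

FRAME_RATE = 60.0                  # frames per second
FRAME_MS = 1000.0 / FRAME_RATE     # ~16.7 ms per frame

# Frame-level switching (late binding): worst case one or two frames.
late_binding_ms = 2 * FRAME_MS     # ~33 ms, below perception thresholds

# Boundary-bound switching (early binding): the new viewport only takes
# effect at the next GOP or segment boundary.
GOP_FRAMES = 30                    # a typical half-second GOP at 60 fps
SEGMENT_S = 2.0                    # a common OTT segment duration
worst_case_gop_ms = GOP_FRAMES * FRAME_MS      # up to ~500 ms
worst_case_segment_ms = SEGMENT_S * 1000.0     # up to 2000 ms

print(f"late binding:  ~{late_binding_ms:.0f} ms")
print(f"GOP-bound:     up to ~{worst_case_gop_ms:.0f} ms")
print(f"segment-bound: up to ~{worst_case_segment_ms:.0f} ms")
```

The order-of-magnitude difference is why boundary-bound switching is clearly visible to the user after head motion, while one-or-two-frame switching is not.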
Flexibility: A single ClearVR encoding can cater to all HMDs and various types of flat screens, regardless of their viewport size and aspect ratio. That single representation can cover monoscopic and stereoscopic content, where a flat device just retrieves the tiles for one eye. Contrast that to current standards, where the content distributor needs to provide separate representations for each conceivable viewport size / aspect ratio combination – again a significant cost factor.
Graceful degradation: With ClearVR, each tile forms an independent stream, which the client combines with other tiles to create a single HEVC-compliant bitstream. Such client-side processing allows the ClearVR library to make last-millisecond decisions, and to dynamically replace tiles with their low-resolution equivalent on a frame-by-frame basis. It is also possible to seamlessly switch between stereoscopic and monoscopic content, as bandwidth comes and goes. This allows playback to continue uninterrupted even when not all data is available and prevents buffering – no spinning wheels. Early binding approaches hard-code all tile combinations in the bitstream during content production. When data for a single tile is not (yet) available, the client is helpless and can only resort to buffering.
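The last-millisecond, frame-by-frame fallback described above can be sketched as follows. The function, cache structures, and tile payloads are hypothetical illustrations of the principle, not ClearVR's actual implementation.

```python
# Hypothetical sketch of per-frame graceful degradation: just before a
# frame is assembled, each tile position falls back to its low-resolution
# equivalent if the high-resolution data has not arrived in time.
# Names and data structures are illustrative assumptions.

def assemble_frame(tile_ids, hi_res_cache, lo_res_cache):
    """Return one tile payload per position, preferring high resolution."""
    frame = []
    for tid in tile_ids:
        if tid in hi_res_cache:        # high-res arrived in time: use it
            frame.append(hi_res_cache[tid])
        elif tid in lo_res_cache:      # fall back, playback continues
            frame.append(lo_res_cache[tid])
        else:                          # nothing available: only now stall
            return None
    return frame

hi = {0: "hi-0", 2: "hi-2"}              # tile 1's high-res data is late
lo = {0: "lo-0", 1: "lo-1", 2: "lo-2"}   # low-res fallback for every tile
print(assemble_frame([0, 1, 2], hi, lo)) # ['hi-0', 'lo-1', 'hi-2']
```

Because the fallback decision is made per tile and per frame on the client, one late tile degrades only a small part of the picture for a few frames, instead of stalling the whole stream as in an early-binding pipeline.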
Bitrate variability: Bitrate spikes in ClearVR are limited, even with extensive head motion, because ClearVR doesn’t require clearing the decoding buffer whenever the viewport changes. In contrast, existing specs rely on field-of-view-specific metadata hard-coded in the bitstream, forcing the client to download a hefty chunk of new data whenever the field of view changes even slightly. This causes a significant spike in required bandwidth and a very noticeable motion-to-high-resolution latency.
User interaction: ClearVR does all processing client-side instead of during content preparation, which allows for complex forms of user interaction without adding any latency. Examples are dynamic zooming, field-of-view adjustments, pause with the ability to look all around while still getting high-quality imagery, and fast seeking. None of this is supported by any available standard.
Recognising these advantages, MPEG has decided to document late binding operation in OMAF v.2, and to include a late binding profile in the specification called Advanced Tiling Profile. It is slated for final approval in July 2020.