Differences Between ClearVR and Current Standards
The main advantages of ClearVR over the current generation of standards (MPEG OMAF, VRIF Viewport Dependent Profile) are:
Efficiency: Compared to full-sphere streaming, ClearVR can reduce the required bitrate by a factor of up to five; current standards achieve about a factor of two. In other words, in the best case ClearVR uses less than half the bandwidth of MPEG OMAF.
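The arithmetic behind "less than half" is as follows. A fivefold reduction leaves 1/5 of the full-sphere bitrate and a twofold reduction leaves 1/2, so the ratio between the two is 2/5. The sketch below uses a hypothetical 100 Mbit/s full-sphere bitrate purely for illustration. Only the two reduction factors come from the text above.

```python
from fractions import Fraction

# Reduction factors quoted above, relative to full-sphere streaming.
CLEARVR_FACTOR = 5      # "up to" five, i.e. the best case
STANDARDS_FACTOR = 2    # about two for MPEG OMAF / VRIF Viewport Dependent Profile

def relative_bandwidth(clearvr_factor: int, standards_factor: int) -> Fraction:
    """ClearVR bandwidth expressed as a fraction of the standards-based bandwidth."""
    return Fraction(1, clearvr_factor) / Fraction(1, standards_factor)

# Hypothetical full-sphere bitrate, for illustration only.
full_sphere_mbps = 100
ratio = relative_bandwidth(CLEARVR_FACTOR, STANDARDS_FACTOR)
print(f"standards-based: {full_sphere_mbps / STANDARDS_FACTOR} Mbit/s")
print(f"ClearVR:         {full_sphere_mbps / CLEARVR_FACTOR} Mbit/s")
print(f"ratio:           {ratio} ({float(ratio):.0%})")
```

The "less than half" claim holds only near the best-case factor. At a fourfold reduction, for example, the ratio is exactly 1/2.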
Switching latency: On a good-quality CDN, the ClearVR Client can switch to high-resolution imagery within one or two frames, which is unnoticeable to the user. In a standards-based solution, switching after head motion depends on segment and GOP boundaries. It takes hundreds of milliseconds and sometimes even a few seconds, which is very visible. Some implementations try to alleviate this by creating more encoded versions of the same content, but this adds significant inefficiency and cost to both processing and distribution.
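A rough model shows why the two approaches differ by an order of magnitude or more. Frame-level switching costs a whole number of frame intervals. A client bound to segment or GOP boundaries must instead wait for the next boundary before new data can be decoded. The 60 fps frame rate, the 1-second GOP length and the head-motion time below are hypothetical example values. The model also ignores network fetch time, which would only add to the boundary-bound figure.

```python
def frame_switch_latency_ms(frames: int, fps: float) -> float:
    """Latency when new tiles can be used after a whole number of frames."""
    return frames * 1000.0 / fps

def boundary_wait_ms(head_motion_ms: int, gop_ms: int) -> int:
    """Wait from a head motion until the next GOP/segment boundary strictly after it.

    Assumes boundaries at every multiple of gop_ms; network fetch time is ignored.
    """
    return gop_ms - head_motion_ms % gop_ms

# Example values only: 60 fps playback and 1-second GOPs/segments.
print(round(frame_switch_latency_ms(2, 60), 1))  # two frames
print(boundary_wait_ms(300, 1000))               # head turn 300 ms into a GOP
```

With these assumptions, the frame-level switch takes about 33 ms, while the boundary-bound client waits 700 ms before it even starts decoding new data.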
Flexibility: With a single representation, ClearVR can serve all HMDs and various types of flat screens, regardless of their viewport angle. A single representation can also cover both monoscopic and stereoscopic content; a flat device simply retrieves the tiles for one eye. With a fully standards-based solution, the content distributor needs to provide a separate representation for each viewport angle, which again is a significant cost factor.
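The idea that one tile set serves every device can be sketched as a client-side selection problem. The text above does not describe ClearVR's actual tile layout. The sketch therefore assumes a hypothetical grid of yaw-only columns, each encoded once per eye. The `Tile`, `visible_columns` and `select_tiles` names are illustrative, not part of any ClearVR API.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    """One independently decodable tile in a hypothetical layout."""
    eye: str     # "left" or "right"
    column: int  # yaw column index, 0 .. num_columns - 1

def visible_columns(center_deg: float, fov_deg: float, num_columns: int) -> set[int]:
    """Yaw columns overlapping [center - fov/2, center + fov/2], wrapping at 360 degrees."""
    width = 360.0 / num_columns
    first = math.floor((center_deg - fov_deg / 2) / width)
    last = math.ceil((center_deg + fov_deg / 2) / width) - 1
    return {c % num_columns for c in range(first, last + 1)}

def select_tiles(center_deg: float, fov_deg: float, num_columns: int,
                 stereoscopic: bool) -> set[Tile]:
    """Pick tiles from one shared tile set for any viewport angle.

    A flat (monoscopic) device requests only the left-eye tiles.
    """
    eyes = ("left", "right") if stereoscopic else ("left",)
    cols = visible_columns(center_deg, fov_deg, num_columns)
    return {Tile(eye, c) for eye in eyes for c in cols}

# The same 8-column tile set, three different clients (example values):
hmd_narrow = select_tiles(0, 90, 8, stereoscopic=True)
hmd_wide = select_tiles(0, 110, 8, stereoscopic=True)
flat_screen = select_tiles(0, 60, 8, stereoscopic=False)
print(len(hmd_narrow), len(hmd_wide), len(flat_screen))  # prints: 4 8 2
```

Only the client's selection changes between devices. The encoded content stays the same, which is the point of the single-representation claim.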
Graceful degradation: With ClearVR, each tile forms an independent stream, which the client combines with other tiles to create a single HEVC-compliant bitstream. This client-side processing allows the ClearVR library to make last-millisecond decisions and to dynamically replace tiles with their low-resolution equivalents on a frame-by-frame basis. It can also switch seamlessly between stereoscopic and monoscopic content as available bandwidth fluctuates. This keeps playback going when data is not yet available and prevents buffering. Current standards hard-code all tile combinations in the bitstream during content production, so when data for a single tile is not (yet) available, the client can only resort to buffering.
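The per-frame fallback decision can be sketched as follows. This covers only the choice of which version of each tile slot enters a frame. Merging the chosen tiles into one HEVC-compliant bitstream, and any inter-frame prediction constraints that merge must respect, are not modeled. The names and the "low resolution is always available" assumption are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TileChoice:
    position: int  # tile slot in the output picture
    quality: str   # "high" or "low"

def plan_frame(frame: int, positions: list[int],
               received: dict[int, set[int]]) -> list[TileChoice]:
    """Choose, per tile slot, which version enters this frame's bitstream.

    received[pos] holds the frame numbers for which high-resolution data of
    slot `pos` arrived before the decode deadline. A low-resolution version is
    assumed to be always available as a fallback, so no slot ever blocks.
    """
    plan = []
    for pos in positions:
        has_high = frame in received.get(pos, set())
        plan.append(TileChoice(pos, "high" if has_high else "low"))
    return plan

# Example: slot 1 has high-resolution data only for frame 10; slot 2 has none yet.
received = {0: {10, 11}, 1: {10}}
for frame in (10, 11):
    print(frame, [c.quality for c in plan_frame(frame, [0, 1, 2], received)])
```

Missing data degrades a single slot for a single frame instead of stalling playback. This is the contrast with the buffering behavior described for fixed tile combinations.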
Bitrate variability: Bitrate spikes in ClearVR remain limited, even with extensive head motion, because ClearVR does not need to clear the decoding buffer whenever the viewport changes. In contrast, existing specifications rely on field-of-view-specific metadata hard-coded in the bitstream. This forces the client to download a new batch of data whenever the field of view changes, even slightly. The result is a significant spike in required bandwidth and a very noticeable motion-to-high-resolution latency.
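A deliberately simplified count model illustrates where the spike comes from. It assumes hypothetical tile indices and equal cost per tile. It also assumes that a field-of-view-specific representation must be fetched as a whole whenever the viewport changes. Real systems differ in tile bitrates and may reuse some data, so the numbers only show the shape of the difference.

```python
def fetch_tiled(old_view: set[int], new_view: set[int]) -> set[int]:
    """Independent tiles: fetch only the tiles that just came into view."""
    return new_view - old_view

def fetch_per_fov(old_view: set[int], new_view: set[int]) -> set[int]:
    """FOV-specific representation: any change requires the whole new combination."""
    return set(new_view) if new_view != old_view else set()

# A small head turn shifts the viewport by one tile column (example values).
old_view = {0, 1, 2, 3}
new_view = {1, 2, 3, 4}
print(len(fetch_tiled(old_view, new_view)), len(fetch_per_fov(old_view, new_view)))  # prints: 1 4
```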
User interaction: ClearVR does all processing on the client rather than during content preparation, which makes complex forms of user interaction possible without adding any latency. Examples include dynamic zooming, field-of-view adjustments, pausing while still being able to look around with high-quality imagery, and fast seeking. None of these are supported by any available standard.