Tiledmedia’s Solutions Rely on Standards


Tiledmedia’s ClearVR solutions rely on international standards. This makes our technology straightforward to deploy to existing devices over existing content distribution channels. We believe in the interoperability that good standards bring, which benefits both consumers (things just work) and content providers (it significantly reduces their cost).


We rely on standardized HEVC decoders in consumer devices and personal computing systems. We also rely on HEVC encoders, operated with some restrictions that are themselves defined in the HEVC specification.


Tiledmedia is an enthusiastic member of the VR Industry Forum, VRIF, which seeks to facilitate the widespread adoption of VR services by working on quality and interoperability. Rob Koenen, one of Tiledmedia’s Founders, is the President of VRIF.




Tiled streaming comes in different flavors


It is important to understand that there are different implementations of “tiled streaming”, ClearVR being one of them. The first version of MPEG’s OMAF (Omnidirectional MediA Format) specification also specifies a form of tiled streaming, and a “viewport-dependent media profile” that relies on tiled streaming. This specification also forms the basis of VRIF’s Viewport Dependent Profile.


Tiledmedia’s ClearVR technology follows the HEVC standard and the file format specifications. ClearVR is currently not compatible with all aspects of the “viewport-dependent” OMAF profile, which is a deliberate choice. We believe that ClearVR, the result of more than seven years of tiled streaming R&D, is significantly ahead of the technology that the first, current version of OMAF specifies. A brief summary follows below; what determines the performance of our solution is the much deeper integration of media processing and the networking stack in ClearVR.


At the same time, Tiledmedia is convinced that good-quality standards create markets, and we seek to provide standards-based solutions to those markets.


We have adopted, and will help improve, any standards that help our customers. We are doing this in two ways.

First, we are adopting an increasing number of relevant standards as we evolve our platform. We rely on the MP4 file format (keeping as close to the Common Media Application Format – CMAF – as we can) and we’re adding support for Common Encryption (CENC). This allows the use of existing packagers in a ClearVR-enabled deployment. We also rely on HEVC for our video processing, working with unchanged, standard (often hardware) HEVC decoder implementations in the devices that we support.


Second, we contribute our ideas to the relevant (MPEG) standards, improving those standards and bringing them closer to our solution. By way of example, we are defining additional elements for the ISO Base Media File Format that will significantly decrease tile switching latency. Interestingly, other applications will also benefit from these updates, and MPEG has recognized the paradox of immersive media distribution: as we move towards ever larger data volumes in immersive media, the elementary chunks of data need to get smaller rather than larger. This is because the entire scene will be too large to deliver and process, and media delivery will increasingly be individualized. It will depend on where we look, and how we interact with the media. We call this approach “late binding”, a term that has been adopted in MPEG, and we expect that new releases of OMAF will support late binding.


Working in MPEG doesn’t just mean that we bring ideas – we also learn from other experts who believe in our approach and who have their own smart ideas on how to improve the specifications. Participating in standardization is a significant commitment and investment, but we believe it is worth it.

Differences Between ClearVR and Current Standards


The main advantages of ClearVR over the current generation of standards (MPEG OMAF, VRIF Viewport Dependent Profile) are:


Efficiency: ClearVR can reduce the bitrate requirement by a factor of up to five when compared to full-sphere streaming; current standards reach about a factor of two. In other words, in the best case ClearVR uses less than half the bandwidth of MPEG OMAF.
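
As a worked example of the arithmetic behind these factors, the sketch below applies the two reduction factors quoted above to a full-sphere stream. The 50 Mbit/s full-sphere bitrate is a hypothetical figure chosen only for illustration, and the factor of five is the best case ("up to"), so real results may differ.

```python
# Hypothetical full-sphere bitrate, chosen only for illustration.
FULL_SPHERE_MBPS = 50.0

# Reduction factors quoted in the text. The ClearVR figure is a best case.
CLEARVR_FACTOR = 5.0
STANDARDS_FACTOR = 2.0


def reduced_bitrate(full_sphere_mbps: float, factor: float) -> float:
    """Return the bitrate after dividing the full-sphere bitrate by a factor."""
    if factor <= 0:
        raise ValueError("reduction factor must be positive")
    return full_sphere_mbps / factor


clearvr_mbps = reduced_bitrate(FULL_SPHERE_MBPS, CLEARVR_FACTOR)
standards_mbps = reduced_bitrate(FULL_SPHERE_MBPS, STANDARDS_FACTOR)

# Share of the standards-based bitrate that the best-case ClearVR stream needs.
ratio = clearvr_mbps / standards_mbps

print(f"ClearVR (best case):  {clearvr_mbps:.1f} Mbit/s")
print(f"Current standards:    {standards_mbps:.1f} Mbit/s")
print(f"ClearVR / standards:  {ratio:.0%}")
```

Note that the ratio does not depend on the assumed full-sphere bitrate: dividing by five instead of two always leaves two fifths of the standards-based bitrate, which is where "less than half" comes from.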


Switching Latency: On a good-quality CDN, the ClearVR Client can switch to high-resolution imagery within one or two frames – unnoticeable to the user. In a standards-based solution, switching after head motion relies on segment and GOP boundaries, and takes hundreds of milliseconds, sometimes even a few seconds, which is very visible. Some implementations try to alleviate this by creating more encoded versions of the same content, but this adds significant inefficiencies and cost to both processing and distribution.
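
To illustrate why waiting for a GOP boundary costs hundreds of milliseconds, the sketch below computes how long a client would wait for the next boundary after a head motion. It is a simplified model, not a description of any particular player: the frame rate (30 fps) and GOP length (30 frames) are assumptions, the function names are hypothetical, and network and decoding delays are ignored.

```python
FPS = 30          # assumed frame rate
GOP_FRAMES = 30   # assumed GOP length in frames (one second at 30 fps)


def frames_until_next_boundary(frame_index: int, gop_frames: int) -> int:
    """Frames from frame_index to the next GOP boundary strictly after it."""
    if gop_frames <= 0:
        raise ValueError("gop_frames must be positive")
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    next_boundary = (frame_index // gop_frames + 1) * gop_frames
    return next_boundary - frame_index


def frames_to_ms(frames: float, fps: float) -> float:
    """Convert a number of frames to milliseconds at the given frame rate."""
    return 1000.0 * frames / fps


# Head motion may happen at any frame position within a GOP.
waits = [frames_until_next_boundary(i, GOP_FRAMES) for i in range(GOP_FRAMES)]

worst_ms = frames_to_ms(max(waits), FPS)
mean_ms = frames_to_ms(sum(waits) / len(waits), FPS)
two_frames_ms = frames_to_ms(2, FPS)

print(f"Worst-case wait for a GOP boundary: {worst_ms:.0f} ms")
print(f"Average wait for a GOP boundary:    {mean_ms:.0f} ms")
print(f"Two frames at {FPS} fps:              {two_frames_ms:.0f} ms")
```

Under these assumptions the boundary wait alone averages about half a second, while a switch within two frames takes under 70 ms. Longer GOPs or segment-level switching push the wait higher still.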


Flexibility: With a single representation, ClearVR can cater to all HMDs and various types of flat screens, regardless of their viewport angle. A single representation can cover monoscopic and stereoscopic content, where a flat device just retrieves the tiles for one eye. With a fully standards-based solution, the content distributor needs to provide separate representations for each viewport angle – again a significant cost factor.


Graceful degradation: With ClearVR, each tile forms an independent stream, which the client combines with other tiles to create a single HEVC-compliant bitstream. Such client-side processing allows the ClearVR library to make last-millisecond decisions, and to dynamically replace tiles with their low-resolution equivalent on a frame-by-frame basis. It is also possible to seamlessly switch between stereoscopic and monoscopic content, as bandwidth comes and goes. This helps when data is not yet available and prevents buffering. Current standards hard-code all tile combinations in the bitstream during content production. When data for a single tile is not (yet) available, the client can only resort to buffering.
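
The per-frame fallback described above can be sketched as follows. This is a minimal illustration and not the ClearVR library's API: the function name, the tile identifiers, and the assumption that a low-resolution version of every tile is always on hand are all hypothetical. The sketch covers only the selection decision and does not assemble an HEVC bitstream.

```python
from typing import Dict, Iterable, List, Set


def choose_tiles(viewport_tiles: Iterable[int],
                 arrived_high: Set[int]) -> Dict[int, str]:
    """Pick a quality for every viewport tile for one frame.

    A tile is shown in high resolution if its high-resolution data has
    arrived in time; otherwise it falls back to the low-resolution version,
    which this sketch assumes is always available.
    """
    return {
        tile: ("high" if tile in arrived_high else "low")
        for tile in viewport_tiles
    }


# Hypothetical viewport covering three tiles.
viewport = [1, 2, 3]

# Hypothetical arrivals of high-resolution tile data, one set per frame.
arrivals_per_frame: List[Set[int]] = [
    {1, 2, 3},  # frame 0: everything arrived
    {1, 3},     # frame 1: tile 2 is late
    set(),      # frame 2: bandwidth dropped, nothing arrived
]

for frame, arrived in enumerate(arrivals_per_frame):
    print(frame, choose_tiles(viewport, arrived))
```

Because the decision is taken again for each frame, a late tile lowers the quality of only that tile for only as long as its data is missing, and playback never has to stop and buffer.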


Bitrate variability: Bitrate spikes in ClearVR are limited, even with extensive head motion, because ClearVR doesn’t require clearing the decoding buffer whenever the viewport changes. In contrast, existing specs rely on field-of-view-specific metadata hard-coded in the bitstream, forcing the client to download a new batch of data whenever the field of view changes even slightly. This causes a significant spike in required bandwidth and gives a very noticeable motion-to-high-resolution latency.


User interaction: ClearVR does all processing client-side instead of during content preparation, which makes complex forms of user interaction possible without adding any latency. Examples are dynamic zooming, field-of-view adjustments, pause with the ability to look around and still get high-quality imagery, and fast seeking. None of this is supported by any available standard.