28 Sep How 9 companies distributed IBC ’19 in 8K VR
Together with a team of world-leading companies, we delivered the Intel Visual Cloud Conference at the 2019 IBC conference live to a global audience in 8K 360VR. This visual experience (IBC 360 Live) was available on a variety of mobile and head-mounted devices, showcasing what the ecosystem can deliver over today’s networks.
The demonstration partnership that Intel and Tiledmedia brought together also included Akamai, Google, IBC, Iconic Engine, KPN, Oculus, and Voysys. Together we enabled the first-ever live 8K VR distribution with worldwide availability, stringing together hardware, software and services that are all commercially available today.
Bringing High-Quality VR to a Mass Market
Virtual and Augmented Reality services have a great future ahead of them. A well-designed VR experience can be transformative: the user feels completely transported to a remote environment, away from home to a new and exciting location. While VR adoption has been growing steadily, mainstream adoption has been held back by ecosystem and technology development. This is not a surprise, given the complexities of delivering these very rich live experiences. The end-user Quality of Experience (QoE) is key to accelerating adoption of VR360 services. To enable a great QoE, we need to be able to produce amazing content in high quality, and to distribute it to large audiences while maintaining that high quality. This is exactly what the IBC cooperation shows.
The Challenge: High-Quality VR Distribution
To understand why this project was unique, we will first explain what makes distribution at 8K quality hard. A VR camera is usually a set of conventional video cameras that all record a piece of the environment. Stitching software then combines these video images into a single spherical video in the form of an “equirectangular projection” or “ERP” (a flat map of the earth is also an ERP). A VR user only sees a small part of that sphere at any one point in time in their VR headset: about 1/8th of the complete picture. Since the video plays right in front of the user’s eye, magnified by special lenses, the resolution needs to be very good. The industry standard in VR is to use 4K video for the entire sphere. 4K in VR is 4 096 x 2 048 pixels, but the user will see, approximately, a mere 1K x 1K per eye, right in front of them – not a good experience!
If we go to 8K video (8 192 x 4 096 pixels) the image becomes considerably better. But to get that quality to the user, we need to send a huge number of pixels: the equivalent of about 16 high-definition channels, requiring some 60 to 100 Mbit/s. Even if we could get that bandwidth across, very few devices could decode it.
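The pixel counts above are easy to check. The following is a minimal sketch of that arithmetic, assuming "high definition" means 1 920 x 1 080 and taking the 1/8th viewport share quoted earlier at face value; the function names are ours and purely illustrative.

```python
# Back-of-the-envelope check of the pixel counts quoted in the text.
ERP_4K = (4096, 2048)   # 4K equirectangular sphere
ERP_8K = (8192, 4096)   # 8K equirectangular sphere
HD = (1920, 1080)       # assumption: "high definition" means 1080p


def pixels(size):
    """Total number of pixels in a (width, height) frame."""
    width, height = size
    return width * height


def hd_equivalents(size):
    """How many HD frames hold as many pixels as the given frame."""
    return pixels(size) / pixels(HD)


def viewport_pixels(size, visible_fraction=1 / 8):
    """Pixels in view if the user sees about 1/8th of the sphere."""
    return pixels(size) * visible_fraction


if __name__ == "__main__":
    print(f"8K sphere in HD frames: {hd_equivalents(ERP_8K):.1f}")
    print(f"4K sphere, pixels in view: {viewport_pixels(ERP_4K):.0f}")
    print(f"8K sphere, pixels in view: {viewport_pixels(ERP_8K):.0f}")
```

Under these assumptions an 8K sphere holds roughly 16 HD frames, and one eighth of a 4K sphere is 1 024 x 1 024 pixels, which matches the "1K x 1K" figure above.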
Solution: Smart VR Delivery Through Tiles
The IBC 360 Live distribution system gives consumers an 8K experience while transmitting and decoding much less information. The platform sends only what a user actually sees. To make this possible, we relied on tiled streaming. We cut the image up into some 100 high-quality tiles and send, decode and display only the tiles in the user’s view. The real-time nature of the experience, including the need to instantly respond to a user’s head motion, makes this approach particularly challenging. When the user turns their head, the system retrieves new high-resolution tiles, decodes and displays them, all within a tenth of a second. This happens so fast that users will hardly notice the low-resolution background layer, which is always present to prevent black areas from appearing in their field of view.
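To make the idea of "only the tiles in the user’s view" concrete, here is a simplified sketch of viewport-based tile selection on a cube with 4 x 4 tiles per face. It is not the ClearVR algorithm: the face orientation, the 60-degree half-angle and the test on tile centers only are all assumptions chosen to keep the example short.

```python
import math

GRID = 4  # assumption: 4 x 4 tiles per cube face, 96 tiles in total

# Hypothetical face layout: (face center, u axis, v axis) as unit vectors.
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "down": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
}


def tile_center(face, col, row):
    """Unit vector pointing at the center of one tile on a cube face."""
    center, u_axis, v_axis = FACES[face]
    a = (2 * col + 1) / GRID - 1  # tile center in [-1, 1] along u
    b = (2 * row + 1) / GRID - 1  # tile center in [-1, 1] along v
    vec = [center[i] + a * u_axis[i] + b * v_axis[i] for i in range(3)]
    norm = math.sqrt(sum(x * x for x in vec))
    return tuple(x / norm for x in vec)


def view_direction(yaw_deg, pitch_deg):
    """Unit vector of the viewing direction; yaw 0, pitch 0 looks 'front'."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def visible_tiles(yaw_deg, pitch_deg, half_angle_deg=60.0):
    """Tiles whose center lies within half_angle_deg of the view direction."""
    direction = view_direction(yaw_deg, pitch_deg)
    limit = math.cos(math.radians(half_angle_deg))
    selected = []
    for face in FACES:
        for row in range(GRID):
            for col in range(GRID):
                center = tile_center(face, col, row)
                dot = sum(direction[i] * center[i] for i in range(3))
                if dot >= limit:
                    selected.append((face, col, row))
    return selected


if __name__ == "__main__":
    tiles = visible_tiles(yaw_deg=0, pitch_deg=0)
    print(f"{len(tiles)} of {6 * GRID * GRID} tiles requested in high resolution")
```

A production player would cull against the actual viewport frustum and prefetch tiles around it, but even this coarse test shows that only a fraction of the tiles needs to be fetched at full resolution for any given head pose.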
The following components powered the IBC 360 Live experience:
High-resolution VR360 cameras captured the event. We employed Kandao™ Obsidian R cameras as they are compact and affordable. Each of the VR cameras produces 3 individual feeds, which are stitched in real-time into a single 360 sphere. We recorded Intel’s Visual Cloud Conference at IBC with two such cameras, letting the user freely switch between these in the application.
Voysys VR Producer software performed the stitching of the three individual camera signals into a single “Equirectangular Projection” with a resolution of 8 092 x 4 046 pixels. (An ERP is like a flat image of the globe as used in maps). Using Voysys VR Producer, we also mixed a crisp image of the conference speaker slides as a virtual screen into the scene. Further, we had the option to mix in the “director’s cut” as produced by IBC TV. The ERP was then converted into a “cubemap” (a globe projected onto the six faces of a cube), which is a format more suitable for tiled distribution – see below.
We used two servers with Intel® Xeon® Scalable processors and graphics cards to run the Voysys software and perform an HEVC mezzanine encode for each of the cubemaps, at 150 Mbit/s. The mezzanine encoder lightly compresses the camera feeds for contribution to the cloud data center where Tiledmedia’s ClearVR Cloud processes them for tiled distribution.
KPN supplied a dedicated, 400 Mbit/s fiber link to the Internet, used for transmitting the two mezzanine-compressed ERPs to the Eemshaven Google Cloud Platform (GCP) instance in Groningen, The Netherlands. This was the GCP closest to the RAI in Amsterdam.
Tiledmedia’s ClearVR Cloud Live software running in Google Cloud first decoded the cubemaps and divided the decoded video into 96 tiles, 16 per cube face. For each of the tiles, two versions were encoded using Intel’s open source SVT-HEVC encoder, which supports the encoding constraints that are required for tiled distribution. There are two encoded versions of each tile because we need to enable both fast switching (a short GOP) and high efficiency (a longer GOP). Six larger tiles were also encoded, one for each cube face; these provide a lower-resolution background that is always present at the decoder.
The HEVC-encoded tiles were then packaged into MP4 files for distribution. The processes to create the cubemap, and then tile, encode and package the full 8K content ran on several hundred Intel Xeon cores, in parallel processes managed by the ClearVR Orchestrator. ClearVR Orchestrator ensures all tiles are ready in time for ingest into the CDN and provides for redundancy and fail-over in the encoding and packaging. The total bitrate of the packaged files amounted to about 120 Mbit/s. Note that these numbers all apply to one single 8K VR feed; we had two parallel camera feeds, so the numbers doubled.
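The numbers in this step add up to a sizeable encoding workload. As a rough illustration, the sketch below enumerates the encodes implied by the description: 96 tiles in two GOP variants plus six background faces. The face names, the dictionary layout and the variant labels are hypothetical and are not taken from the ClearVR software.

```python
from itertools import product

FACES = ("front", "back", "left", "right", "up", "down")
GRID = 4                                  # 4 x 4 = 16 tiles per cube face
GOP_VARIANTS = ("short_gop", "long_gop")  # fast switching vs. efficiency


def encode_jobs():
    """List every encode needed for one 8K camera feed."""
    jobs = []
    # Two versions of each high-resolution tile.
    for face, row, col, gop in product(FACES, range(GRID), range(GRID), GOP_VARIANTS):
        jobs.append({"kind": "tile", "face": face, "row": row, "col": col, "gop": gop})
    # One low-resolution background encode per cube face.
    for face in FACES:
        jobs.append({"kind": "background", "face": face})
    return jobs


if __name__ == "__main__":
    jobs = encode_jobs()
    tiles = sum(1 for job in jobs if job["kind"] == "tile")
    print(f"{tiles} tile encodes + {len(jobs) - tiles} background encodes per camera")
```

With two cameras the count doubles, which gives a sense of why the encoding is spread over many cores in parallel.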
Next, the packaged MP4 files were ingested into Akamai’s CDN using 6 parallel Media Services Live (MSL) ingest points per camera, where each of these ingests can handle 45 Mbit/s. We require these parallel ingests to accommodate the exceptionally high 120 Mbit/s bitrate mentioned above, and to allow for some variation in the individual ingests. Note that non-VR streams do not come close to Akamai’s current 45 Mbit/s ingest limit, but that VR streams easily exceed it. Akamai and Tiledmedia developed the multiple-ingest configuration in 2018 to accommodate live 6K and 8K VR events. As the IBC 360 Live production used two parallel 8K cameras, we used twelve (2 x 6) MSL ingests in total. The ingested files were then recombined and uploaded onto Akamai’s origin for its EU region and, from there, distributed across all regions globally.
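The headroom in this setup can be worked out from the figures above. The sketch below assumes the load is spread evenly over the ingests, which is a simplification; the helper names are ours.

```python
import math

TOTAL_MBPS = 120         # packaged bitrate of one 8K camera feed
INGEST_LIMIT_MBPS = 45   # capacity of a single MSL ingest point
INGESTS_PER_CAMERA = 6
CAMERAS = 2


def min_ingests(total_mbps, limit_mbps):
    """Smallest number of ingests that can carry the total bitrate."""
    return math.ceil(total_mbps / limit_mbps)


def per_ingest_load(total_mbps, ingests):
    """Average load per ingest, assuming an even split (a simplification)."""
    return total_mbps / ingests


if __name__ == "__main__":
    load = per_ingest_load(TOTAL_MBPS, INGESTS_PER_CAMERA)
    print(f"minimum ingests per camera: {min_ingests(TOTAL_MBPS, INGEST_LIMIT_MBPS)}")
    print(f"average load per ingest: {load:.0f} Mbit/s "
          f"({load / INGEST_LIMIT_MBPS:.0%} of the limit)")
    print(f"ingests in total: {CAMERAS * INGESTS_PER_CAMERA}")
```

Three ingests would be the bare minimum for one camera; using six keeps each one at well under half of its limit, which leaves room for the variation between individual ingests.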
To distribute VR content, live or on-demand, the Akamai network uses a VR-specific configuration that optimizes the likelihood that requested tiles are available at the edge cache. This is important in giving the optimal response when users move their head (or swipe on a flat device – see below). This configuration, again developed by Akamai and Tiledmedia, has been available since 2017. It relies on the HTTP/2 or QUIC (HTTP/3) protocol with multipart byte-range requests and smart pre-fetching of tiles to the edge cache. With tiled streaming, a short request/response delay is much more important than for regular on-demand services. In our configuration, the “motion-to-high-resolution” latency is typically 3 frames or less at 30 frames per second: after head motion, over 85% of high-resolution tiles are in the user’s viewport within these 3 frames. Given the way the human visual system works, this is virtually instant to the viewer. The final distributed resolution was 8 192 x 4 096, slightly higher than the ingested resolution because tiling requires the use of certain multiples of 16.
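Two details of this step lend themselves to a small illustration: what a multi-range request looks like, and how the 3-frame figure translates into time. The sketch below builds a standard HTTP `Range` header value for several byte spans and converts frames to milliseconds. The byte offsets are made up for the example, and the way ClearVR actually lays out and requests tile data is not described in this article.

```python
def range_header(spans):
    """Build an HTTP Range header value for several (offset, length) spans.

    A server that honours a multi-range request answers with a
    multipart/byteranges body containing one part per span.
    """
    parts = []
    for offset, length in sorted(spans):
        if offset < 0 or length <= 0:
            raise ValueError("spans need a non-negative offset and positive length")
        # HTTP byte ranges are inclusive on both ends.
        parts.append(f"{offset}-{offset + length - 1}")
    return "bytes=" + ", ".join(parts)


def frames_to_ms(frames, fps):
    """Duration of a number of frames in milliseconds."""
    return 1000 * frames / fps


if __name__ == "__main__":
    # Hypothetical byte spans of three tiles inside one packaged file.
    tile_spans = [(0, 500), (1000, 500), (4096, 2048)]
    print("Range:", range_header(tile_spans))
    print("3 frames at 30 fps =", frames_to_ms(3, 30), "ms")
```

Fetching several tiles in one request saves round trips, and 3 frames at 30 frames per second is the tenth of a second mentioned earlier.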
The application handles the final part of the delivery. Iconic Engine, an end-to-end provider of XR solutions, developed a special IBC 360 Live application for the Oculus platform, available on the Oculus Go and Gear VR. Another version of the same app was available in Apple’s App Store and the Google Play Store for iOS and Android tablets and phones. The Android and iOS apps allowed playback of the VR streams on “flat” devices and in so-called cardboard viewers. Iconic Engine’s application platform integrates Tiledmedia’s ClearVR library, which enabled Iconic Engine to develop the special IBC app in a matter of weeks.
In the IBC 360 Live app, the ClearVR SDK determines what the user sees and retrieves the required tiles for that viewport using HTTP retrieval as described above. The SDK then reassembles the bitstream snippets for these individual tiles into one single legal bitstream, and sends it off for decoding by the hardware HEVC decoder in the user device.
The last step is putting all the tiles in the right place on the rendered sphere. All of this happens at the last possible moment, in real time, giving the user the highest possible quality on existing headsets and flat screens. The final end-user bitrate hovered between 12 and 15 Mbit/s; we were able to tune this bitrate in real time by operating the ClearVR Cloud Live platform. The end-to-end latency was about 30 seconds. Using the latest chunk-based streaming techniques, the partners expect to bring this down to single-digit seconds before the end of the year.
From Idea to Production in Two Months
The partners conceived and realized the production of the IBC 360 Live event in two months, which was only possible because no new elements needed to be developed. All partners contributed elements that they have commercially available today. The production was still unique: never before had a 5-day event been broadcast non-stop in 8K VR, and never had such a distribution been truly global. The encoded and distributed video was compliant with the HEVC standard, which all modern mobile devices support in hardware.
Because the tiled streaming relies on regular http requests and does not require any per-user (edge) processing, the distribution system used for this event was as scalable as the CDN itself. This makes the solution suitable for mass distribution of the next major sports or music event. The 8K quality makes users feel more present than ever before.