W3C Chinese Web Interest Group (CWIG) Media Task Force F2F Meeting Summary

23 March 2019, Beijing


The W3C Chinese Web Interest Group (CWIG) Media Task Force face-to-face meeting was held in Beijing on 23 March 2019.

W3C gratefully acknowledges Alibaba for hosting this meeting, Migu's support during the preparation of the meeting, and the support from the participants.

Attendees included W3C Entertainment Champion & Media Specialist François Daoust; W3C members Alibaba, Baidu, 360, Samsung, Tencent, Xiaomi, Intel, and China Mobile; and other companies including iQiyi, bilibili, DeepMap, kwai, KKStream, Bread & Dream, Momo, Agora, Sina, Zhihu, Bytedance, and ZYBang. In total, 77 participants gathered at Alibaba in Beijing and held a fruitful discussion.

The meeting was co-chaired by the Interest Group co-chairs Anqi Li (Alibaba) and Qingqian Tao (Baidu). The technical presentations and discussions covered current media and entertainment standards work in W3C, potential technologies for standardization, and RTC/audio/video practices and problems encountered in China.

Judy Zhu (Alibaba Standardization Department) gave the opening speech, introduced the current standards and future work in W3C, and analyzed the gap between the current standards and actual domestic demand. Alibaba will work with the Chinese Web community to produce better descriptions of the use cases and requirements, and hopes to form an industry consensus on the direction of standardization and to provide feedback to W3C and other SDOs.

W3C Entertainment Champion & Media Specialist François Daoust and Media & Entertainment Interest Group member Song Xu introduced the current standards work in media:

A perspective on Media & Entertainment for the Web [slides]

François Daoust (W3C): W3C's current standardization work in media mainly focuses on pre-recorded, live, and interactive content. The corresponding content consumption mechanisms are on-demand, linear, and immersive viewing. Current standardization activities include reducing device fragmentation, improving content quality, replacing SDI with IP, content personalization, and XR/360° scenarios. Future developments include the convergence of on-demand and linear viewing experiences, more immersive experiences, more natural interaction mechanisms including voice and gestures, and AI. These developments need to be supported by more powerful technologies such as WebXR, WebGPU, WebAssembly, machine learning, and voice control. Web technologies are constantly changing, and the media & entertainment industry continues to face and embrace new challenges. We cannot predict the future, but we keep exploring the direction of future technologies.

W3C Media and Entertainment Interest Group Working Status [slides]

Song Xu (China Mobile - Migu) introduced the current working status of the W3C Media & Entertainment Interest Group, and mentioned that W3C was preparing a new working group to work on relevant standards. He also introduced Media Capabilities, Picture-in-Picture, Autoplay Policy, 5G-based low-latency technologies, as well as cloud-based processing.

Representatives from Chinese companies shared their practices and issues with RTC technologies, and proposed relevant requirements and standardization suggestions:

RTC practice in Alibaba

Jian Lou (Alibaba): Alibaba has more than five years of experience with RTC, and many of its products and business units use RTC technologies. Alibaba continuously improves network and video quality, and uses image processing algorithms to highlight and sharpen the areas of interest in the video. The self-developed ARWNT algorithm analyzes the causes of network congestion and network jitter through scene recognition, and adjusts the corresponding strategies. Alibaba Cloud is also building a large number of real-time audio and video edge computing nodes with partners.

The Practice of Adapting QUIC in WebRTC-based Realtime Streaming System [slides]

Jianjun Zhu (Intel): QUIC establishes the initial connection quickly because it combines the transport and TLS handshakes, which TCP and TLS perform separately. It is multiplexed and has better congestion control compared to TCP. When a TCP packet is lost, all subsequent packets have to wait (head-of-line blocking); in QUIC, a lost packet only delays the stream it belongs to. QUIC can be used in two modes in the WebRTC scenario: P2P and C/S. We would like to transfer media streams over QUIC, which requires support for unreliable transfer. QUIC in C/S mode can speed up the initial connection, remove the dependency on ICE, and let WebRTC and HTTP share a single connection.
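
The packet-loss point can be illustrated with a toy model. The Python sketch below is not a QUIC or TCP implementation and does not come from the presentation; the packet layout and both function names are hypothetical. It only contrasts a single ordered stream, where one lost packet holds back everything behind it, with per-stream ordering, where a loss stalls only its own stream.

```python
from typing import List, Set, Tuple

Packet = Tuple[int, int]  # (stream_id, global sequence number); hypothetical layout


def deliverable_single_ordered(packets: List[Packet], lost: Set[int]) -> List[Packet]:
    """TCP-like model: one ordered stream, delivery stops at the first gap."""
    delivered: List[Packet] = []
    for pkt in packets:
        if pkt[1] in lost:
            break  # everything behind the gap waits for a retransmission
        delivered.append(pkt)
    return delivered


def deliverable_per_stream(packets: List[Packet], lost: Set[int]) -> List[Packet]:
    """QUIC-like model: ordering is per stream, so a gap stalls only its own stream."""
    blocked: Set[int] = set()
    delivered: List[Packet] = []
    for stream_id, seq in packets:
        if seq in lost:
            blocked.add(stream_id)
        elif stream_id not in blocked:
            delivered.append((stream_id, seq))
    return delivered


# Two interleaved streams (0 and 1); the packet with sequence number 3 is lost.
packets = [(0, 1), (1, 2), (0, 3), (1, 4), (0, 5), (1, 6)]
lost = {3}

single = deliverable_single_ordered(packets, lost)
multi = deliverable_per_stream(packets, lost)
print(single)
print(multi)
```

In this model the single ordered stream delivers only the two packets ahead of the gap, while the per-stream model keeps delivering stream 1 and holds back only the remainder of stream 0.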

MPEG-DASH application and practice [slides]

Jianqiang Ding (bilibili): MPEG-DASH is an adaptive bitrate streaming technique that enables high quality streaming of media content delivered from HTTP web servers. We chose it because MPEG-DASH is an open standard and has some technical advantages over the alternatives. When we were optimizing the error rate in production, we encountered a lot of issues because the production environment was very complicated and users often had network issues. To cope with network jitter, bilibili has error handling for timeouts. As suggestions for current standards, we want to add firstFrameTime (first frame time) and bufferTimes (buffering times) to HTMLVideoElement.getVideoPlaybackQuality(), and hope to have a mechanism for getting the total memory limit of MSE.
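
To make the two proposed metrics concrete, here is a minimal Python sketch that derives them from a log of media element events. The definitions are assumptions made for illustration, not part of the proposal: first frame time is taken as the delay from "loadstart" to "loadeddata", and buffering times as the number of "waiting" events after playback has started. The event names are the standard HTML media events; the function names and the sample log are hypothetical.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, media event name)


def first_frame_time(events: List[Event]) -> Optional[float]:
    """Seconds from 'loadstart' to the first 'loadeddata' (assumed definition)."""
    start = next((t for t, name in events if name == "loadstart"), None)
    first = next((t for t, name in events if name == "loadeddata"), None)
    if start is None or first is None:
        return None  # the log does not cover the start of playback
    return first - start


def buffer_times(events: List[Event]) -> int:
    """Number of 'waiting' events after playback first started (assumed definition)."""
    started = False
    count = 0
    for _, name in events:
        if name == "playing":
            started = True
        elif name == "waiting" and started:
            count += 1
    return count


# Hypothetical event log from one playback session.
log = [
    (0.0, "loadstart"),
    (0.8, "loadeddata"),
    (0.9, "playing"),
    (5.0, "waiting"),
    (6.2, "playing"),
    (12.0, "waiting"),
    (12.5, "playing"),
]

print(first_frame_time(log), buffer_times(log))
```

Today a player has to collect such a log itself by listening to events; the proposal would let the browser report the values directly.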

Building audio and video services in Xiaomi [slides]

Zijing Wu (Xiaomi): Xiaomi Live was an early adopter of Lianmai, which combines real-time communication and live streaming technologies. (In Lianmai, a viewer of a live stream sends a connection request to the broadcaster, a low-latency connection is established between the two, and other viewers see the synthesized audio and video content of the "broadcaster + viewer".) Issues encountered include packet loss and echo, which calls for acoustic echo cancellation (AEC). WebRTC should provide a way to manipulate the audio, as well as lip sync on the player side. In addition, the quick seek technology is currently only supported in native apps; we (Xiaomi) are considering porting it to the Web player.

Media Source Extensions™ practice in Zhihu web player [slides]

Tianxiao Wang (Zhihu): Media Source Extensions (MSE) is a W3C specification that allows JavaScript to send byte streams to media codecs. The problems Zhihu encountered when using MSE include buffering, long delays before the first frame is displayed, and the inability to switch between resolutions smoothly and automatically. The Zhihu web player uses MSE as follows: MP4 file -> first request (meta information) -> MP4 Demuxer conversion -> request video information -> convert to FMP4 and play. The main issues are how to convert from MP4 to FMP4, how to implement dynamic resolution switching, and how to deal with audio and video going out of sync.
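
The difference between regular MP4 and FMP4 shows up at the box level. The following Python sketch is not Zhihu's demuxer; it is a minimal walk over top-level ISO BMFF boxes, assuming the usual layout of a 4-byte big-endian size followed by a 4-byte type. It treats a file as fragmented when it finds a top-level "moof" (movie fragment) box. The helper make_box and the sample data are hypothetical and only build tiny synthetic inputs.

```python
import struct
from typing import List, Tuple


def list_top_level_boxes(data: bytes) -> List[Tuple[str, int]]:
    """Return (type, size) for each top-level box in an ISO BMFF byte string."""
    boxes: List[Tuple[str, int]] = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box runs to the end of the data
            size = len(data) - offset
        if size < header:
            break  # malformed box, stop instead of looping forever
        boxes.append((box_type.decode("latin-1"), size))
        offset += size
    return boxes


def is_fragmented(data: bytes) -> bool:
    """Assume a file is FMP4 if it has at least one top-level 'moof' box."""
    return any(box_type == "moof" for box_type, _ in list_top_level_boxes(data))


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Hypothetical helper that builds a minimal box for the examples below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


regular = make_box(b"ftyp", b"isom") + make_box(b"moov") + make_box(b"mdat", b"\x00" * 4)
fragmented = make_box(b"ftyp", b"iso5") + make_box(b"moov") + make_box(b"moof") + make_box(b"mdat")

print(list_top_level_boxes(regular), is_fragmented(regular))
print(list_top_level_boxes(fragmented), is_fragmented(fragmented))
```

A real conversion also has to rewrite the sample tables in "moov" into per-fragment "moof" boxes, which is where most of the work in an MP4-to-FMP4 demuxer lies.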

Exploration of video container analysis in xgplayer [slides]

Guohui Yin (Bytedance): Considering the user experience (seamless resolution switching) and cost (traffic saving), we started to build our own media player. The technical difficulties we met were: 1) the HTML video element does not support HLS and FLV+, and 2) regular MP4 does not support seamless resolution switching. The first of the current solutions is to optimize the front-end player, mainly by controlling the buffering and loading of the video. The second is to write a parser that can parse video in multiple formats; the video quality and the buffering time then depend on the parser's algorithm. Suggestions for standardization are: 1) we hope that the browser can have built-in multimedia objects, such as for MP4, FLV, and TS parsing; 2) we hope that the browser can expose the FFmpeg interface to meet the requirements of video uploading, video editing, and video effects.
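
A parser that handles several formats first has to decide which container it is looking at. The Python sketch below is not xgplayer code; it is a minimal container sniffer based on well-known leading bytes: the "FLV" signature, an "ftyp" box type at byte offset 4 for MP4, and the 0x47 sync byte repeating every 188 bytes for MPEG-TS. The function name and the sample inputs are hypothetical, and real files can deviate from these patterns.

```python
def sniff_container(data: bytes) -> str:
    """Guess the container format from the first bytes of a media file."""
    if data[:3] == b"FLV":
        return "flv"
    if len(data) >= 8 and data[4:8] == b"ftyp":
        return "mp4"
    # MPEG-TS packets are 188 bytes long and each starts with sync byte 0x47.
    if len(data) > 188 and data[0] == 0x47 and data[188] == 0x47:
        return "ts"
    return "unknown"


# Tiny synthetic inputs, just enough to exercise each branch.
flv_head = b"FLV\x01\x05\x00\x00\x00\x09"
mp4_head = b"\x00\x00\x00\x18ftypisom"
ts_head = (b"\x47" + b"\x00" * 187) * 2

for sample in (flv_head, mp4_head, ts_head, b"not a video"):
    print(sniff_container(sample))
```

A player can use the result to pick the matching demuxer before handing the remuxed output to MSE.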

In addition, we discussed several technical proposals:

Proposal to add the <video-image> element [slides]

Vic Yao (Tencent): Currently the video and img elements do not fully meet users' needs. The video element requires users to click (autoplay is not supported) and cannot play in small windows, and some Chinese browsers do not support playing multiple videos simultaneously. GIF files, on the other hand, are too large and have low resolution and frame rate. Existing solutions (such as the Media Fragments API and CDN dynamic interception) also have their own issues. So I propose adding a <video-image> element (or extending the <video> element). I also think it would be good to add a bufferTime attribute to video (to let developers control the video buffer size) and a bytesReceived attribute to get the amount of data currently downloaded. I also hope the media error codes can be more specific.

Commentary subtitles standardization Plan and Proposal [slides]

Song Xu (China Mobile - Migu): Commentary subtitles (弹幕 in Chinese) are streams of moving subtitles overlaid on the video. They are popular among young people in East Asia, and even in parts of Europe and the United States. The value of standardizing commentary subtitles lies in a unified subtitle structure, which reduces development complexity and cost. The technology itself is not complex, but many different implementations currently exist. To standardize commentary subtitles, we can consider the following steps: 1) collect user scenarios (already shared on GitHub, feedback welcome); 2) write an Interest Group Note summarizing the standardization direction; 3) write the technical details and API documentation; 4) submit to WICG to invite browser vendors to discuss.

Development and practice of commentary subtitles in bilibili [slides]

Zhaoxin Tan (bilibili): First we need to standardize the English name of 弹幕 (bullet subtitles, commentary subtitles, danmaku, danmu/tanmu etc.). Commentary subtitles in bilibili have attributes in seven dimensions: font, spacing, shadow, transparency, speed, font size, and color. bilibili supports more than a dozen kinds of commentary subtitles and filtering methods, and has designed a commentary subtitles language called bas. bas is highly interactive and is open sourced on GitHub. Suggested directions for the commentary subtitles standard include: 1) standardize the default layout of commentary subtitles; 2) provide an HTML element for commentary subtitles, such as <danmakulist> or <danmaku>, or add a new layout mode called display: danmaku. We also hope that Picture-in-Picture will support commentary subtitles, and that browsers will implement the CSS mask-image property more interoperably. We hope to work with you to promote the standardization of commentary subtitles.
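
As an illustration of what a default layout might have to specify, the Python sketch below assigns scrolling comments to horizontal lanes so that they do not overlap. It is not bilibili's algorithm and not a proposed standard; it assumes all comments scroll at the same speed, so a lane becomes free again as soon as the previous comment has fully entered the screen, and a comment is dropped when no lane is free. The function name, parameters, and sample data are hypothetical.

```python
from typing import List, Optional, Tuple

Comment = Tuple[float, float]  # (start time in seconds, rendered width in pixels)


def assign_lanes(comments: List[Comment], lane_count: int, speed: float) -> List[Optional[int]]:
    """Greedy lane assignment for comments sorted by start time.

    speed is the shared scroll speed in pixels per second. Returns the lane
    index for each comment, or None when every lane is still occupied.
    """
    lane_free_at = [0.0] * lane_count  # time at which each lane's entry edge is clear
    result: List[Optional[int]] = []
    for start, width in comments:
        lane = next((i for i, free in enumerate(lane_free_at) if free <= start), None)
        if lane is not None:
            # The lane stays busy until the comment's tail has entered the screen.
            lane_free_at[lane] = start + width / speed
        result.append(lane)
    return result


comments = [(0.0, 200.0), (0.5, 100.0), (1.0, 100.0), (1.6, 300.0), (2.0, 50.0)]
lanes = assign_lanes(comments, lane_count=2, speed=100.0)
print(lanes)
```

With two lanes, the third comment finds both lanes occupied and is dropped. A standardized layout would have to define such choices (drop, delay, or overlap), along with how comments of different speeds are handled.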

Lastly, participants discussed media technologies and standards related to improving video quality and streaming speed, media on mobile and PC, browser support, and other topics. Two co-chairs of the Interest Group summarized the meeting and welcomed everyone to provide feedback.

W3C and the chairs of CWIG would like to express their sincere gratitude to all of you for your support! The slides and minutes are open to the public. If you have any suggestions, or would like to learn more about or get involved in the W3C Chinese Web Interest Group, please contact us.

Note: CWIG plans to hold a meeting on Mini Programs in late April or early May, so stay tuned!

About the W3C Chinese Web Interest Group

The W3C Chinese Web Interest Group was established on September 20, 2018, to provide a forum for W3C members to enhance the participation in Web standards work from the Chinese Web community. The group will focus primarily on identifying unique requirements from China, on helping the Chinese members to get familiar with the process of W3C standards activities, on discussion of technical ideas with the potential to be proposed to W3C, on standards testing and implementation, as well as corresponding standardization opportunities for W3C while assisting the participation and contribution from the Chinese Web community.

Anqi Li (Alibaba), Wanming Lin (Intel), Qingqian Tao (Baidu), and Zhiqiang Yu (Huawei) co-chair the group and coordinate the daily work. The W3C team contacts for the group, Fuqiao Xue and Xueyuan Jia, are responsible for the technical and communications work respectively.

We welcome W3C members and the public to follow and participate in the group discussions.

W3C is proud to be an open and inclusive organization, focused on productive discussions and actions. Our Code of Ethics and Professional Conduct ensures that all voices can be heard. For any comment or suggestion about this summary report, please contact Xueyuan Jia or Fuqiao Xue.