From W3C Wiki
Session 8: Making Video a First-Class Citizen of the Web
Philippe Le Hégaret: Okay. Let's look at my video on the web. How many -- I have too many people on stage. What's going on?
< laughing >
Philippe Le Hégaret: < slides > Charlton? Do you want to talk about video on the web? So, let me present to you the 3 people who actually are going to present. And I don't know about the others. I will first introduce to you the video workshop that we will do in California. On my right, I have Jason Gaedtke, from Cable Labs. He is Chief Scientist, focusing on peer-to-peer architecture and metadata management. After that, I have Håkon Wium Lie, the CTO of Opera Software. I think he was involved in CSS and he is also pretending to make video a first-class citizen on the Web lately as well. And I have Jon -- Jon, if you can raise your hand -- from WGBH, Director of emerging technologies. So, starting quickly on Video on the Web. What's happening on that side? < slide 2 > Well, the first thing which is happening on lots of continents is that there is a decline of television. People are spending more time on the internet than on television. The latest television numbers show that there is a -- numbers are going down for television and numbers are going up on the internet. What are you saying, Michael? Bigger fonts. Let me try to do that. I'm trying that. I'm trying that. Oh, just +, not ctrl +. Sorry. So that's one thing. And what's coming up as well is that we are going to have HD television screens in living rooms in the upcoming years. What we mean by that is a computer -- a big computer screen. Gone is the time when you actually have your laptop on your lap in order to watch a video while you are watching television. You really want to watch the video on your television screen right there. Of course, if you're trying to watch the poor quality of video content we have right now, you are going to be pretty disappointed. That will also drive HD content on the Web as well. Camcorder prices are going down and resolution is also increasing. Those nice videos that you're making at home, you will want to put them on the Web, and in HD quality as well. It's also easier to create video content than music.
So we are talking about a lot of content, billions and billions of videos that are going to be rushed onto the Web in upcoming years. There is also the fact that there is a lot of video content which has been published which is completely inaccessible. You will not see it on television and you will not find it in your local video store either. There are a lot of markets out there and people would actually like to see this video content. So we're talking about millions of small markets out there that are waiting for that. That's basically the theory of the long tail. < slide 3 > So some of the challenges that are coming to us in the upcoming future: how can I find all this video content that is waiting for me out there? Can I just find all the videos with George Clooney out there? Right now, the situation doesn't look good. All this content which is completely inaccessible, <missing word> is a desperate area: captioning, video description, internationalization... If I want to go and watch my movie in French instead of English, how do I do that? And, finally, how it is going to change the user experience is another component. < slide 4 > So what we want to do for the upcoming workshop is to do some kind of strategy/long-term thinking about the current impact and future impact of Video on the Web, looking at the user experience in terms of search, accessing the content, finding captioning. Parental control is also another aspect. We would like to look at the entire chain of video production as well, because you have to take into account what is being used to produce the video content. Digital rights is another aspect we'd like to look into. Content adaptation: what if I want to watch this video on my cellphone? So that is going to be discussed at the workshop. And the impact on Web Architecture: how can I actually link into video content from within video content? So I can actually really make video a first-class citizen on the Web.
This is something that we would like to look into at the workshop. < slide 5 > So finally, the workshop is going to be at the beginning of December, the 12th and 13th. It's going to be in two locations. One is in San Jose, California, and the other one is going to be in Brussels, Belgium. We will have a telepresence link between the two locations. The workshop will be on California time. So in Brussels, it's going to be from 6pm to 2am. But you have to balance that against travelling, taking an airplane all the way to California. I certainly encourage attendance at both locations. The Chairs are Ian Blaine, from thePlatform, a media content management company, Paul Bosco, who is sitting in the back, from Cisco Systems, and myself. So, if you want to know more, don't hesitate to come to me.
Jason Gaedtke: < slides > So, before I jump into this, one qualification. < slide 1 > As Philippe mentioned, I'm with Cable Labs, officially, today. That's a new position for me. In fact, I have only been on the job for two and a half days this week so far. Prior to that I was with Comcast and I led a group of architects in our interactive media group, looking at Web technologies and specifically video on the Web. So, I am here representing myself today but drawing upon that experience and also looking forward to some of the work I plan to do at Cable Labs. < slide 2 > So, what I have: I won't be talking about presentation layer technologies, or codecs, or metadata, or DRM, all of which are very interesting topics, and I'm looking forward to the workshop to get into some of those. What I would like to focus on today is content distribution and specifically the peer-to-peer architectural paradigm, and a notion which I'd like to introduce called protocol. This is not my idea, this is something that I am pulling from a cultural theorist actually, but I think it is a very powerful one and it speaks to a lot of the work that has already gone on here in the past, so hopefully that will resonate. The overarching theme is the move from the centralized control and hierarchical tree structure of our traditional CDNs, Akamai, Limelight and so forth, to a more distributed graph-like structure in the form of peer-to-peer networks. As Philippe has already introduced, there are a number of enabling technologies or factors that are allowing for the explosion we're seeing today in IP video. Codec efficiencies have increased. Recording and storage devices have commoditized and have become accessible to the consumer. Broadband access networks are improving and the bandwidth continues to go up with @@? home and DOCSIS 3.0 and so forth.
In the midst of all this, both traditional media companies and emerging internet operators are struggling with new business models, and one of the key factors in those models from the cost perspective is the cost of distribution. What I would like to argue here is that peer-to-peer offers a compelling option to look at, in order to dispel some of those costs, but there are certainly other considerations that have motivated the move to peer-to-peer, and I'll get into some of those. < slide 3 > So, to this term "protocol": it's not the protocol that we think of, it's not protocol in a technical sense. As I said, it's a cultural term or even a philosophical one. There is a theorist at NYU called Alex Galloway. He published a book called Protocol in 2004, and I highly recommend it if you're into the philosophy behind this. But essentially what he is proposing is a name for a movement that has been taking shape for some time, and it is a more distributed, democratic, horizontal control structure on the Internet, facilitating direct peer-to-peer relationships between users and <missing word> rather than a centralized control model, and essentially what we're talking about is a move from control residing within a small number of authorities, be it governments or corporations and so forth, into the protocols themselves. It isn't that those <missing word> are losing control, it's that the protocols are embodying that control structure for them. < slide 4 > I'm going to go through this slide quickly, but essentially a strong metaphor here, or a strong example of this phenomenon, is a comparison of TCP/IP versus DNS. TCP/IP is inherently horizontal and democratic. There is no central routing authority, for example; the routing protocols take care of that work. DNS, by contrast, is hierarchical. There are root servers, and if for whatever reason, a court injunction or what have you, your record is pulled from DNS, you no longer exist in that namespace.
There are obviously alternative models of DNS but the current implementation is hierarchical. And I think there are both pros and cons with those two models. We also see historically a movement from centralization to decentralization to distribution, and I think the peer-to-peer movement is a continuation of that in the particular domain of networking technologies. I am sensitive to the time here, so I am going to jump through some of this quickly. < slide 5 > This is probably the sort of sweet spot in the presentation. This is the slide that the business types in these startups out there working in the peer-to-peer space are really using to secure funding and so forth. The primary appeal of peer-to-peer is the cost avoidance one. That is, you can avoid a large percentage of your storage, processing and bandwidth costs by enticing consumers to contribute those resources for you on the network. Now, historically, the incentive has been free content, either in the form of file sharing of MP3 or video files or whatever, or ad-supported content, as we're seeing in the more emerging business models. My contention is that the model makes a lot of sense from a technical standpoint. However, in order to allow it to translate and make the leap into other domains, such as online virtual storage or gaming or persistent virtual worlds, we're going to need to think about a value proposition for the users beyond free. < slide 6 > This is a technical slide; I assume that most of you are familiar with the virtues of peer-to-peer networks. If not, please grab the deck off the Web and take a look through this, but there are a lot of nice characteristics in these networks. There is also a slide in the appendix of this deck which points to a number of interesting research efforts going on in this space, but I'll defer on that for now.
< slide 7 > Finally, this is my last slide. There has been a substantial amount of academic research taking place around peer-to-peer and around this evolution that I've been talking about over the past five years or so. Quite surprising to me, even being close to this industry. This is sort of a history and, in some cases, sort of the future road map for where peer-to-peer has been and where it's going. On the top, we see file sharing applications like Napster, Gnutella, Kazaa, and so forth, which are primarily concerned with moving that "free content" around. The folks that brought us Kazaa, Niklas Zennström and Janus Friis specifically, are actually thought leaders in this space. And they've proven that you can take the same peer-to-peer paradigm and apply it to a different problem through Skype. They're currently working on a video startup called Joost, which I'm assuming most of you are familiar with. If not, look into that; the business model there is an ad-supported one, so this is sort of an embodiment of the observation earlier. The rest of the list, except for the commercial content distribution networks I should say, is mostly academic research. The commercial exceptions are Move Networks, GridNetworks, Pando Networks, Red Swoosh -- which was recently acquired by Akamai -- and Kontiki, by VeriSign. These are all internet startups that are trying to supplement or replace those hierarchical traditional CDNs with a peer-to-peer based distribution paradigm. None of them that I'm aware of, however, has a strong incentive for the user to participate. It's sort of, if you want the content that the service provider is offering through those mechanisms, you have to opt in. I think there is room for improvement there, and one potential direction is something called virtual economies.
It's essentially this idea that if you're contributing resources into the network, whether it's processing, storage or bandwidth, you should get credits for that, and that should be sustained across session boundaries and essentially create a virtual authority system and so forth. There is some interesting research going on in this space at Harvard and at Rice. If anyone is interested in this work, my email contact address < firstname.lastname@example.org > is in the deck and I'd be very interested in following up. Thanks.
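The virtual-economy idea described above, in which peers earn credits for contributed resources and those credits persist across sessions, could be sketched roughly as follows. This is a minimal illustration only: the `CreditLedger` class, the resource names, and the per-unit rates are all hypothetical and do not come from the talk or from any of the systems named in it.

```python
from collections import defaultdict

# Hypothetical credit rates per unit of contributed resource (illustrative values only).
RATES = {"storage_gb": 1.0, "bandwidth_gb": 2.0, "cpu_hours": 3.0}


class CreditLedger:
    """Toy ledger: peers earn credits for contributed resources and spend them on content."""

    def __init__(self):
        # Balances are keyed by peer id, so they outlive any single session.
        self.balances = defaultdict(float)

    def contribute(self, peer_id, resource, amount):
        """Credit a peer for contributing `amount` units of `resource`."""
        if resource not in RATES:
            raise ValueError(f"unknown resource: {resource}")
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.balances[peer_id] += RATES[resource] * amount
        return self.balances[peer_id]

    def spend(self, peer_id, cost):
        """Debit a peer; return False and change nothing if the balance is too low."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        if self.balances[peer_id] < cost:
            return False
        self.balances[peer_id] -= cost
        return True


ledger = CreditLedger()
ledger.contribute("peer-a", "bandwidth_gb", 5)   # one session: upload 5 GB
ledger.contribute("peer-a", "storage_gb", 10)    # a later session: host 10 GB
can_watch = ledger.spend("peer-a", 15)           # redeem credits for content
```

A real system would also need to verify that the claimed contributions actually happened, which is the hard part and is not addressed in this sketch.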
Philippe Le Hégaret: Thank you, Jason. Håkon? We'll open the microphone and take questions at the end of the presentations, obviously.
Håkon Wium Lie: < slides > Can you hear me? Very good. My name is Håkon Wium Lie. I have with me Erik, to my right here, who's going to help me run some of the demos. I'm going to talk about video in browsers, and we're going to show you some things to do. You all have laptops, and if you want to play with some of this code, you can go to this URL and download Opera with video. We just released a build 30 minutes ago.
< laugh >
Philippe Le Hégaret: You have already used your 10 minutes so if we can --
Håkon Wium Lie: Okay, we're going to be quick.
Philippe Le Hégaret: if we can speed up
Håkon Wium Lie: speed up, speed up
Erik Dahlström: Alright, so I'll quickly show some of the SVG video demos. Here is the first one; it's doing realtime clipping, and that's using the SVG Tiny 1.2 video element. It shows you some of the things you can do with SVG video, and it's using SVG with script and declarative animation. Okay, so the next demo is showing that you can do a reflection effect. You can take video and transform it, and you can put a mask on it, so it is reflected here. It is a common Web 2.0 feature. The next thing is showing that you can do realtime SVG filters on video. And as you can see here, you can filter the video to gray scale or trace the edges or apply any other SVG filter, in realtime, on the video. The last demo I have is a very quick one about canvas 3D, and this is using experimental features in Opera. It's doing 3D and it's using SVG for texturing this cube here. And this is also available in this special Opera release on labs.opera.com.
Håkon Wium Lie: I said there were two problems, and there are actually three. The third one is accessibility. We haven't solved that yet. I know there are a lot of people in the audience here who need it, and who want to work to find solutions there and I think we can find those solutions but, for now, I think we should play a little and also think about the long term things that you listed.
Philippe Le Hégaret: Thank you, Håkon. Jon, yes, you're on number 4.
Jon Alper: Hi, I am Jon Alper, director of technologies and R&D at WGBH Interactive in Boston. And I'm coming from the content producer and user-centric perspective. And I want to take a couple of minutes to go a little higher level on the idea of video being a full peer citizen on the Web and all that comes with that. In order to get where I'm coming from, I need to burn about 45 seconds on who we are and why we think this way. WGBH is a PBS member station and producer of public television in the US. That means we are viewer supported; we are a non-profit institution producing public educational media. WGBHi, the unit I work in, is the major rich media production unit at WGBH, and we work extremely closely with the WGBH Media Access Group and the National Center for Accessible Media at WGBH, doing a lot of demos and testing and R&D and experimentation with what accessibility means in rich media. And, for the purposes of this talk, what the implications are, the sort of moral implications, as we heard earlier, about how we think about accessibility and where we think about it in the process of making video a full peer citizen. So, I want to show you something actually quite old. This is about six years old; this is a demo that we shipped that was actually quite popular. It was an Apple QTV feature. And it's using ideas that we worked on using QuickTime accessibility. It's out of browser, as we used to say. But the point is that there are basic bits of functionality here, and obviously you can't hear it.
< VIDEO PLAYING >
Jon Alper: It's not terrible, it's actually quite good. The point of this piece was, first of all, to showcase this artist who was featured on a television program we produced called the Plaza, but also to use this, sort of, entertainment mode to do some educational and accessibility work. So the lyrics are all closed captioned in both English and Spanish, using QuickTime text tracks. Of course I landed on an instrumental break for my demo. There are bilingual implications to this. And this matters for people who don't necessarily have a deafness or hard-of-hearing concern, extending it to the general audience. Ditto, we have a musicological pop-up video: Alex Alvear, the leader of the band, is mixing a lot of really interesting musical genres, and doing it in a very fluid way, and he is talking about the instrumentation and how all that stuff works together, so you can have sort of two experiences of this video, both watching the video and also using it as a stepping stone into an exploration of musicology. And the fundamental engine behind that is the idea of accessibility. I think of media accessibility as the digital curb cut. When they mandated ramps on the corners of sidewalks in this country, it was to help people who were in wheelchairs navigate the streets. The side effect and the societal benefit extended to mothers with baby carriages, and kids on rollerblades. When you think about your media in terms of accessibility, the user with sensory disabilities or other disabilities, you are in fact making your media more universally accessible. An accessible web site is better indexable by search; it follows a greater semantic structure. It matters on a very basic level, and that is why that consideration needs to factor into video. So, back to slides. Those things, those underlying ideas of accessibility and the practicalities of delivering it, are the underlying drivers behind the two main concerns for us as media producers.
One is the pragmatic issue of workflow, which Philippe alluded to earlier, and two is the notion of equitable ease of use. Can everybody get to it in an equitable and hopefully equivalent way, irrespective of a variety of factors, disabilities being one of them? The reason workflow matters, and why workflow has to enter into your process in arriving at these standards, is that if the metadata that is generated in the course of making video can be preserved from acquisition through assembly, editing, production, etc., that metadata can then be delivered at the end to the end user, either directly as a side effect of transcripts and captions, or indirectly as metadata that search and other indexing mechanisms and purposes can use. And so far, the only architecture that has helped us do this, and it has had more than a few challenges with it, with tools etc., has been QuickTime, and conversely, because of the similarities in the container format, theoretically maybe MPEG-4. There is no endorsement of any particular technological approach, and why I say that will make sense in a minute. Now, the last bullet: this may seem out of the purview of the W3C, but if we don't have standards that we can practically and usefully deliver, that don't veer off from normal production workflow, it will reduce our ability to deliver more content and make that equitability available to all users. What that means is automatic delivery to user need, meaning the users have to get what they need when they expect it. And the content producers need to be able to offer, for all users ideally, irrespective of platform, bandwidth, or disabilities, the most optimized experience that we can give them.
So, what that means to us: whatever the specifications are that evolve over the course of HTML 5 and the implementations of video in browsers, there has to be basic negotiation functionality built into the spec and the way it's implemented for format, whether it's a directly browser-supported format like Ogg Theora or whether it's a set of specifications that plug-in makers have to adhere to in order to allow their plug-ins to work in this universe. Those things need to exist. The next are accessibility options. Can the users make choices in this environment about the features that they need, all the way to the end of the future?
>> Some of the negotiation and degradation remains in support.
>> And the last one: in order for accessible media to have the closed captioning for the deaf and hard of hearing, the user needs to remain in control.
They need to get what they expect when they load them, the hearing person needs to sit next to that person or capture what they -- miss.
That is why accessibility cannot be bolted on and the format choice and the sophistication is needed.
>> That is my sort of moral pitch for the day if you will.
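The negotiation asked for above, where a producer offers several encodings and the user agent picks the best one it can play while honoring the user's accessibility choices, could look something like this sketch. The `Source` type, the `pick_source` helper, the file names, and the codec labels are hypothetical, chosen only for illustration; they are not part of HTML 5 or of any browser's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class Source:
    url: str            # hypothetical location of one encoding of the same program
    codec: str          # illustrative label, e.g. "theora" or "h264"
    bitrate_kbps: int
    has_captions: bool


def pick_source(
    sources: Sequence[Source],
    supported_codecs: Set[str],
    max_kbps: int,
    need_captions: bool,
) -> Optional[Source]:
    """Pick the best playable encoding, or None if nothing can be played."""
    # The user's accessibility requirement is a hard filter, never traded away for quality.
    playable = [
        s for s in sources
        if s.codec in supported_codecs and (s.has_captions or not need_captions)
    ]
    if not playable:
        return None
    # Prefer the highest bitrate that fits the available bandwidth.
    within_budget = [s for s in playable if s.bitrate_kbps <= max_kbps]
    if within_budget:
        return max(within_budget, key=lambda s: s.bitrate_kbps)
    # Graceful degradation: nothing fits, so fall back to the lightest playable encoding.
    return min(playable, key=lambda s: s.bitrate_kbps)


sources = [
    Source("clip-low.ogv", "theora", 500, True),
    Source("clip-high.ogv", "theora", 2000, False),
    Source("clip.mp4", "h264", 1500, True),
]
choice = pick_source(sources, {"theora"}, max_kbps=3000, need_captions=True)
```

Treating captions as a filter rather than a preference is a design choice made for this sketch: it keeps the user in control, at the cost of sometimes selecting a lower-quality encoding.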
>> We are going to now turn the floor to questions.
>> I am very interested in the last speaker, I think when you think about video on the web as if web and TV are kind of like two different things, the difference between the document and the video presentation, the video presentation is the shared experience between the -- the video presentation is the shared experience between the participants.
This is a way to browse online and throw the video on TV and go back and forth.
>> They won't let me on site because I am a British national so I had to podcast it and send it off to them.
>> And then I said why won't you read my paper?
>> And yes, we want people looking at everything at the same time, and discussing everything at the same time.
It is a shared experience.
I think we need that dimension in terms of the video on the web.
I am not just referring to a narrow view of one person looking at the document.
>> I absolutely agree, and the way we talk about it is the 18-inch versus the 18-foot experience.
I will confess that I tend to leave the shared experience of the 18-foot work to those who can do the narrative.
>> And when I think of the web, and this may be my failure of imagination, I think of it in an 18-foot context.
>> I think that is exceedingly cool, but I leave that out of scope and leave it to the editors for the editorial perspective.
>> Does that make sense at all?
>> < Speaker/Audio Faint & Unclear >
>> Please go back to the microphone.
>> It makes great sense for you, but for us, we have to think about the convergence of those two worlds.
>> < Speaker/Audio Faint & Unclear >
>> Do you have a reaction?
>> Yes, next question.
>> I am the chair of the multimedia group, the SMIL people.
>> Please remember that a first class citizen is a peer citizen and not a child citizen, and as soon as something becomes a peer citizen, the synchronization is an activity, when they are displayed, they are not baked in the video, I would hope they are not, but they are shared timing videos.
That is the shared timing and current basis.
We see some very nice demos but we see the reintroduction of the concepts that are specified and thought up and the impression that we get is that we see the reinvention of the wheel with no attention paid to the work that is already done.
I think that is a start up problem, but I think that strikes me as being one of the W3C video.
>> Video is not displayed in isolation.
>> There are many challenges.
>> I was really encouraged from doing that that can go from anywhere of the attribute naming through the entire scope.
>> These are the things that developing and anything that are expired are anything but expired.
>> Maybe we need to think briefly about that.
>> This is the expired and the < Speaker/Audio Not Clear > limitation and we are very aware of the issues that were raised and we want to make sure that go towards the same.
>> One of the things that make --
>> < Overlapping/Multiple Speakers >
>> Go ahead.
>> We have done a lot of experiments with that.
>> If you look at the Nova online today, you will see a lot of the presentations and the full Nova programs at the chapters for the 18-inch experience.
We do what we call the lower third, and when the speaker is identified, you can click on that speaker and look for deeper demographics etc.
>> We have the < Speaker/Audio Not Clear > heights and that is in the classroom and implications of it were.
That is in the case of rich media and maps times and video are all interrelated.
>> The nature of the web is that technology has moved on in the way where the fluids are no longer there, and there are challenges where you can update something that is complex and we have done a lot of work to identify that and implications on how users can use that.
And we find the way to do it, and the temporal units and exchanges.
I would argue not just the temporal but the mediate changes.
>> Those are critical to making the video a full citizen on the web.
If we can have an image map, why can't we have a moving image map?
>> And implicit in that work are the things you need to do in accessibility to make that available for everybody.
>> Just a quick follow up.
>> One of the positive things from working with Comcast is the -- you can select a range of time and mark a hot spot.
There is a fundamental problem that we ran into.
There are plenty of technologies, but there isn't a consistent or de facto one on that issue.
>> < indiscernible > from < indiscernible >
>> I was looking at the presentation and the stuff you are talking about.
The video has certain challenges, but what you are talking about is the all content distribution and that is all of the formats that we are talking about.
>> This is in terms of tying to the multimedia projects.
It looks like the solutions that you are after are the formats that we don't know and haven't been invented yet.
>> So stepping forward it is informative but it seems like it is the tip of the iceberg.
>> Chose initial targets were the images that you expect.
You serve this from one and java from the other.
I agree with you but at the same time I would say if we solved it properly it not only helps us with the video content distribution, but what is coming next.
>> Virtual worlds or games or anything else.
>> This is around the video tag.
I noticed that you mentioned that the controls were UA dependent.
I saw pause, play, stop, and volume.
Is it volume -- it is important to have it.
We all have video cameras in our pockets these days.
>> We realize that we do need to support those things as well.
>> One of the questions, I guess for the panel is -- < Speaker/Audio unclear due to strong accent >
>> So do I have to install one of the players out of the channel?
I am wondering if it is -- are we going to try to change those trends?
>> < Speaker/Audio unclear due to strong accent >
>> You should install the browser, you know, pick your favorite one.
>> We know, of course, it is not going to be quite that easy.
But we think that the -- this is the main platform where the audio is coming from.
>> And imposing on the video makers on that architecture and the standard architectures in the controls and the platform ways of what we call arbitration or a -- context analysis issues.
>> In terms of the proprietary players and producers and the fact that there are new technologies emerging that I am not aware of that would enable them to do it more, I think that causes the problem of the ideal notion of the web of what we have grown up with the web and what that is.
>> It is extremely challenging to get the rights to present the media that you and I may feel that serves the public good to have it available.
As long as there is a societal concern about the acronym of DRM and as long as there are non-interoperable standards to do that.
You are going to get that balkanization into the content of the user.
>> In more successful, those of us that are there are the promising producers of -- we will be able to compete out propriorityship.
>> Thank you for such an excellent presentation.
>> I think we should recycle within this afternoon the < indiscernible > to the web constant and start at least for vision purposes thinking of the world you are in, in an interactive video experience, as a web.
>> Beyond the match up there is the elements of the experience, because beyond captions, two as was suggested, the things that were accepted by the user inputs and the interactive inputs are the on <indiscernible> of the web.
>> This is going to give you the clipping on the IT.
>> How to create the proclamation.
We need to think about that.
>> We need to think about virtual meetings.
>> You have to synchronize the experience.
I would love to have the video to share.
We are not using the MC.
>> That is where this could go.
>> Once we get the -- you know, the peer to peer, to get the content there, we need to understand what we want to do with it, and it is social networking in terms of realtime experience, with hiccup groups, sorry.
>> End of speech.
>> Thank you.
>> When did we start this?
>> < Speaker/Audio Not Clear >
>> Thank you.
>> < Speaker/Audio unclear due to strong accent >
>> Well, I think you are pointing out a real problem.
We are not going to find one format that is going to suit everybody for all of the hardware in the significant future.
>> We are starting a baseline that everybody can support.
>> There are other requirements for legal and other reasons, we can't stop that, but we have to make sure there is a channel for everyone to use.
>> We have to make sure we have to -- I have a 100-dollar laptop with me and it can't do the most modern decoders, but it can show video and real video, and we want those kids to publish the videos.
>> We can't rely on some patented format here.
>> I think W3C has a role to see if it is encumbered by the patent, so to say.
We really need that format.
>> Thank you.
>> We have one more question.
>> Danny White at -- < Speaker/Audio Not Clear >
I am curious what the panel thinks about the -- < Speaker/Audio Not Clear > on the video content.
I think everybody knows that there are elaborate envisioned teams and that is for the description and the rights management.
Those are the enforcement codes.
It seems to me that the contrast and complex technical development, the trends seem to be that the content providers are to let it just slide out of there.
The video are not the same as the audio and other type of contents are on the spectrum or the entry.
But I am wondering in order to do that successfully we have to take that on complexity with the DRM writing description on the format or the simpler model that is emerged a I think that is more of a look on the view.
>> I think that is right in pointing to the music industry.
>> I don't think W3C should do anything with DRM.
We know they are going to be out there and let them do it if they have to.
We have the user name and password method on the web currently.
>> Perhaps not all, but I think for the kind of stuff that the web community that wants to be published we have everything that we need we don't need the letters.
>> < Speaker/Audio unclear due to audio feedback >
>> As to where that is going to go in the future that is anyone's call at this point.
I take the analogy and that is a lot more invested there.
Also there is an arm raised for QuickTime and Windows Media, that has scared them off also because they have been exposed to those paths and architectures.
There may be a separation between the standard policy and management issues.
>> I think it is an open problem right now.
>> I think we should stop here and I would like to thank the panelists for coming today.
>> Hello everybody.
>> So when I was put on the program committee I had to pick the best that had the least amount of work.
>> I picked -- < Speaker/Audio unclear due to strong accent >