17463 – Make video always focusable and interactive content

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: Make video always focusable and interactive content

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	PC All

Importance:	P2 enhancement
Target Milestone:	CR
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:
Keywords:	a11y, media

Depends on:
Blocks:

Reported:	2012-06-11 19:13 UTC by Devarshi Pant
Modified:	2012-06-22 01:43 UTC
CC List:	9 users

See Also:

Attachments

Description Devarshi Pant 2012-06-11 19:13:48 UTC

Can there be attributes like 'sign' and 'caption'? Right now, when a screen reader lists the multimedia on a page, it seems to detect only the alt text or the title of the embedded player, and it expects the user to tab through the controls within the player to consume the content. The problem is amplified when a page contains several media items.

-Devarshi

Comment 1 Silvia Pfeiffer 2012-06-12 03:14:36 UTC

What are you trying to achieve with 'sign' or 'caption' attributes? They won't help you in exposing a video or audio element better to a screenreader.

Right now, indeed, the way to consume video is through its controls.

==

On a related note: I've been thinking about video interaction for a bit, and I think we have some problems in our spec.

Firstly, the "video" element is not regarded as "interactive content" when it has no @controls attribute; see http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#interactive-content

It would be better to assume that video is always interactive content. Then, when there are no controls, we would have click interaction on the poster frame for play/pause. This is the most fundamental interaction that we need for these elements and it should always be possible.

Secondly, as a consequence of making video interactive, we also need it to be tabfocusable, which it currently isn't:
http://www.whatwg.org/specs/web-apps/current-work/multipage/editing.html#focus-management

Thus, the interaction I am after is to reach the video element during normal tabbing and to hit "enter" to toggle between play and pause.

Would that satisfy your use case?
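
The interaction proposed in this comment can be approximated in script today. The following is only a minimal sketch, not behavior the spec defines: `PlayableMedia` and `togglePlaybackOnEnter` are hypothetical names introduced here, and the wiring at the end assumes the author is willing to add the element to the tab order explicitly.

```typescript
// Minimal shape of a media element, so the sketch does not depend on a DOM.
// HTMLVideoElement and HTMLAudioElement provide these members.
interface PlayableMedia {
  paused: boolean;
  play(): unknown; // returns a Promise in current browsers
  pause(): void;
}

// Hypothetical helper: toggle playback if the pressed key is Enter.
// Returns true when the key was handled, false otherwise.
function togglePlaybackOnEnter(media: PlayableMedia, key: string): boolean {
  if (key !== "Enter") {
    return false;
  }
  if (media.paused) {
    media.play();
  } else {
    media.pause();
  }
  return true;
}

// Possible wiring in a page (commented out because it needs a DOM):
// const video = document.querySelector("video")!;
// video.tabIndex = 0; // put the element in the tab order
// video.addEventListener("keydown", (event) => {
//   if (togglePlaybackOnEnter(video, event.key)) event.preventDefault();
// });
```

Keeping the key handling in a plain function makes it testable without a browser. A page that shows native controls would probably skip this wiring, since the controls already handle keyboard input.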

Comment 2 Devarshi Pant 2012-06-12 11:51:39 UTC

This is what I have in mind: users who interrogate pages using commands like the links list (say, on YouTube) would get something like 'Video Name XYZ 1, Signing available, Captions not available, Transcript available'; 'Video Name XYZ 2, Signing not available, Captions not available, Transcript not available'; and so on.
I would like to know whether such attributes can be exposed for video. Could AT vendors use these values?
Couldn't this information be exposed the way the title attribute is? The only difference is that, unlike the title, which a user can suppress, this information would always be passed on. For example, on a control like an edit box, a screen reader always voices 'Edit' regardless of whether a label is present, and on a read-only field it says 'Unavailable.'
Coming to your proposed use case: yes, making the video tabfocusable would work, but isn't that already happening? Wouldn’t it help if assistive technology could announce the information carried by these attributes (discussed above) when the video has keyboard focus?

Thanks,
Devarshi

Comment 3 Silvia Pfeiffer 2012-06-12 12:43:19 UTC

(In reply to comment #2)
> This is what I have in mind: Users who interrogate pages using commands like
> the links list (say, on YouTube), will get something like, 'Video Name XYZ 1,
> Signing available, Captions not available, Transcript available'; 'Video Name
> XYZ 2, Signing Not available, Captions not available, Transcript not available'
> etc.
> I would like to know if such attributes can be exposed for video. Could AT
> vendors use these values?

We already have markup for captions, video descriptions, synchronized video like sign language video, and synchronized audio like audio descriptions. We are currently solving the transcript use case (see bug 12964). All of these will be machine discoverable, so a screen reader does indeed have all functionality available from HTML for making the announcements that you are asking for.
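
As a rough illustration of "machine discoverable", the announcement requested in comment #2 could be derived from track kinds. This is a sketch under assumptions: `MediaTrackKinds`, `availability`, and `describeAccessibility` are hypothetical names; the kind strings "captions" (text tracks) and "sign" (video tracks) are the HTML-defined values; and the kinds are assumed to have been collected already, for example from `video.textTracks` and, where the browser exposes it, `video.videoTracks`. Transcripts are left out because that use case is still open in bug 12964.

```typescript
// Track kinds gathered from one media element (collection not shown).
interface MediaTrackKinds {
  title: string;
  textTrackKinds: string[];  // e.g. "subtitles", "captions", "descriptions"
  videoTrackKinds: string[]; // e.g. "main", "sign", "alternative"
}

function availability(present: boolean): string {
  return present ? "available" : "not available";
}

// Build an announcement string in the style requested in comment #2.
function describeAccessibility(media: MediaTrackKinds): string {
  const hasCaptions = media.textTrackKinds.indexOf("captions") !== -1;
  const hasSigning = media.videoTrackKinds.indexOf("sign") !== -1;
  return (
    media.title +
    ", Signing " + availability(hasSigning) +
    ", Captions " + availability(hasCaptions)
  );
}

const example = describeAccessibility({
  title: "Video Name XYZ 1",
  textTrackKinds: ["subtitles"],
  videoTrackKinds: ["main", "sign"],
});
console.log(example);
// prints "Video Name XYZ 1, Signing available, Captions not available"
```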

 
> Coming to your proposed use case, yes, making the video tabfocusable would
> work, but isn't that already happening? Wouldn’t it help when assistive
> technology could announce / convey information regarding these attributes
> (discussed above) when the video has keyboard focus?

It is possible to focus on the video element with AT, i.e. your screen reader may choose to focus on video and audio elements. However, my last test showed that neither VoiceOver nor ChromeVox moves focus to audio and video elements, and neither do browsers in their default tab order. I had to put explicit @tabindex attributes on the media elements to allow them to receive focus.

As the video element is currently specified, it is not tabfocusable and is interactive only when it has controls. Thus the interaction that you are after, announcing information when the video has keyboard focus, can't be achieved.

I would like to refocus this bug on this problem if it's ok, seeing as the information that you are after is already available through text tracks and synchronized media resources.
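
The explicit @tabindex workaround mentioned in this comment might look roughly like the following. It is a sketch only: `AttributeHost` and `ensureInTabOrder` are hypothetical names, and the choice of `tabindex="0"` (document order, no custom ordering) is an assumption rather than something stated in the thread.

```typescript
// Minimal shape needed from an element; DOM Element objects provide both methods.
interface AttributeHost {
  hasAttribute(name: string): boolean;
  setAttribute(name: string, value: string): void;
}

// Add tabindex="0" to every element that has no author-specified tabindex.
// Returns how many elements were changed.
function ensureInTabOrder(elements: AttributeHost[]): number {
  let changed = 0;
  for (const el of elements) {
    if (!el.hasAttribute("tabindex")) {
      el.setAttribute("tabindex", "0");
      changed++;
    }
  }
  return changed;
}

// Possible use in a page (commented out because it needs a DOM):
// ensureInTabOrder(Array.from(document.querySelectorAll("video, audio")));
```

Elements that already carry a tabindex are left alone, so an author's deliberate choice, including a negative value that removes an element from the tab order, is not overridden.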

Comment 4 Devarshi Pant 2012-06-12 12:58:42 UTC

> I would like to refocus this bug on this problem if it's ok, seeing as the
> information that you are after is already available through text tracks and
> synchronized media resources.

Yes, it is ok with me. 

Comment 5 Silvia Pfeiffer 2012-06-12 13:03:40 UTC

Reassigning to HTML WG.

Comment 6 Maciej Stachowiak 2012-06-21 16:55:56 UTC

In Safari on Mac OS X, we choose what is tab focusable based on OS conventions, where not all controls are in the tab focus cycle by default. For Mac UI, video should not be tab-focusable unless the user has enabled the special tab-to-everything mode. I think the spec should continue to allow this behavior. I think it's fine to also allow the other behavior, where video is focusable by default. I think the spec actually already allows both behaviors when it says: "An element is focusable if the user agent's default behavior allows it to be focusable". As with any other control, the default focusability should be up to the UA.
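
The rule described here can be pictured as a small decision function. This is a toy model under stated assumptions, not Safari's or any other browser's implementation: `UserAgentDefaults` and `isVideoInTabCycle` are hypothetical names, and the treatment of an author-supplied tabindex (negative values stay out of the tab cycle, zero and positive values are in it) follows the general tabindex rules in HTML.

```typescript
// What a given user agent does when the author says nothing about focus.
interface UserAgentDefaults {
  videoInTabCycleByDefault: boolean;
}

// authorTabindex is the parsed tabindex attribute, or null when it is absent.
function isVideoInTabCycle(
  authorTabindex: number | null,
  ua: UserAgentDefaults
): boolean {
  if (authorTabindex !== null) {
    // An explicit author choice takes precedence over the UA default.
    return authorTabindex >= 0;
  }
  return ua.videoInTabCycleByDefault;
}

// A UA following the Mac convention described above, in its default mode:
const macDefaultMode: UserAgentDefaults = { videoInTabCycleByDefault: false };
// The same UA with the tab-to-everything mode enabled:
const tabToEverything: UserAgentDefaults = { videoInTabCycleByDefault: true };

console.log(isVideoInTabCycle(null, macDefaultMode));  // false
console.log(isVideoInTabCycle(null, tabToEverything)); // true
console.log(isVideoInTabCycle(0, macDefaultMode));     // true
```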

Comment 7 Silvia Pfeiffer 2012-06-22 01:43:18 UTC

I'll close this bug, since it seems that most of the issues I am having are with implementations of browsers and screen readers.