This is a DRAFT resource that supports Working Drafts of WCAG 3. Content in this resource is not mature and should not be considered authoritative. It may be changed, replaced or removed at any time.






Captioning in Mixed Reality relies on two related concepts:

  1. The ability to understand the purpose of a sound within an environment
  2. The ability to locate that sound within 3D space

The combination of these concepts softens the boundary between captioning and audio description, because XR captioning requires elements of both. Understanding the purpose of a sound is most likely to be achieved through captioning. Locating a sound within space is most likely to be achieved through audio description.
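As an illustration of how the two concepts might be combined, the following is a minimal sketch, not a defined caption format. It assumes a hypothetical `CaptionCue` record that holds both the text of a sound and the position of its source, a flat ground plane, and a user heading (yaw) in degrees where 0 faces the +z axis. The names, the 45 degree half field of view, and the 135 degree "behind" threshold are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class CaptionCue:
    """Hypothetical caption cue carrying both purpose and location."""
    text: str        # what is said or heard (purpose of the sound)
    speaker: str     # meta-information, e.g. who is speaking
    source_x: float  # sound source position on the ground plane
    source_z: float


def relative_bearing(cue, user_x, user_z, user_yaw_deg):
    """Angle of the sound source relative to the direction the user faces.

    0 is straight ahead, positive is to the right, range is [-180, 180).
    Assumes yaw 0 faces +z and yaw increases clockwise seen from above.
    """
    dx = cue.source_x - user_x
    dz = cue.source_z - user_z
    world_bearing = math.degrees(math.atan2(dx, dz))
    return (world_bearing - user_yaw_deg + 180.0) % 360.0 - 180.0


def direction_hint(bearing_deg, half_fov_deg=45.0):
    """Describe in words where the sound is, for use as a caption prefix."""
    if abs(bearing_deg) <= half_fov_deg:
        return "in view"
    if abs(bearing_deg) >= 135.0:
        return "behind"
    return "right" if bearing_deg > 0 else "left"


def render_caption(cue, user_x, user_z, user_yaw_deg):
    """Combine the direction hint, the speaker name and the caption text."""
    bearing = relative_bearing(cue, user_x, user_z, user_yaw_deg)
    return f"[{direction_hint(bearing)}] {cue.speaker}: {cue.text}"


if __name__ == "__main__":
    cue = CaptionCue("Watch out", "Guide", source_x=3.0, source_z=0.0)
    print(render_caption(cue, 0.0, 0.0, 0.0))  # [right] Guide: Watch out
```

The same bearing could instead drive a directional arrow or radar plot, or be exposed as text to assistive technology. A textual hint is used here only because it is the simplest form to show.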


Captioning provides an alternative method for users to understand auditory content. There are many reasons why people may prefer to use captions as an alternative to traditional audio or as a method to supplement audio content.

Who it helps

  • Usage without Vision - People who do not have access to visual information need that information presented in an alternative form. XR captioning should take into account both the text that is output and the direction of the sound source. In addition, any meta-information that is attached to textual content (e.g. the name of the person speaking) should be made available.
  • Usage with Limited Vision - People with limited vision may have similar requirements to those without vision, and all of the items mentioned previously should be considered. In addition, screen magnification users may need customization options for the size of textual content and meta-information.
  • Usage without Perception of Color - People who have atypical perception of color may need to customize the presentation of captions within an XR environment. The presentation of captions in XR should take into account the real or virtual world that the user is interacting with and should ensure that text remains legible in contrast with the background. Any meta-information relating to the directionality of sound (e.g. radar plots, directional arrows) should also take color contrast into consideration.
  • Usage without Hearing - People who use XR without hearing require auditory information to be translated into an alternative format. Auditory information can include, but is not limited to, speech and key sound effects. The directionality of any sound also has to be communicated to the user, including sound that originates outside the current field of view. The format that auditory information is translated into is not confined to captions. People may have a preference for signing of text alternatives or equivalents.
  • Usage with Limited Hearing - People with limited hearing may have some of the needs described for using XR captions without hearing. In addition, alternative customization options relating to sound direction may be required.
  • Usage with Limited Manipulation or Strength - People with limited manipulation or strength may want to interact with content in an immersive environment in ways that do not require particular bodily movement. These interactions can include captioning services, where the timings for interactions may need to be modified or extended. In addition, users of assistive technology may want to identify locations and objects and to interact with immersive environments.
  • Usage with Limited Reach - People with limited reach may have similar user needs to people with limited manipulation or strength, so those needs should also be considered.
  • Minimize Photosensitive Seizure Triggers - In order to minimize photosensitive seizure triggers, people may need to personalize the immersive environment in various ways. This can include personalization of XR captions, which should take into consideration methods that reduce the risk of photosensitive seizures.
  • Usage with Limited Cognition - People with limited cognition may need to change the speed at which they travel through an immersive environment. The timing of captioned content should take this into consideration. Personalization of caption services may be required in order to create an accessible immersive environment.
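The color perception item above asks that caption text remain legible against whatever real or virtual background sits behind it. One way this might be checked is sketched below. It is only an illustration: it assumes the relative luminance and contrast ratio formulas from WCAG 2.x (WCAG 3 may measure contrast differently), and it assumes the engine can supply a sampled background color as an 8-bit sRGB triple. The function names and the black or white candidate colors are hypothetical.

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) triple with 0-255 channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG 2.x contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def pick_caption_color(background,
                       candidates=((255, 255, 255), (0, 0, 0))):
    """Return the candidate text color with the highest contrast.

    `background` is assumed to be a color sampled behind the caption;
    how it is sampled in an XR scene is outside this sketch.
    """
    return max(candidates, key=lambda c: contrast_ratio(c, background))


if __name__ == "__main__":
    dark_scene = (20, 20, 20)
    print(pick_caption_color(dark_scene))
```

In practice an XR background changes as the user moves, so a caption renderer would likely need to repeat such a check over time or place text on an opaque panel. The same ratio could be applied to directional arrows or radar plots.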


To be done:

  • Different ways that the research has said to implement it.
  • Explain the difference between what is done in the engine and what is done by the content developer.
  • Research needed on how to mark up a caption file so that it takes into account the location of the sound source.