This extended abstract is a contribution to the Online Symposium on Mobile Accessibility. The contents of this paper have not been developed by the W3C Web Accessibility Initiative (WAI) and do not necessarily represent the consensus view of its membership.
The use of video and audio on smart phones and other mobile devices has been made possible by faster connections and the availability of streaming media. However, media players tend to exclude interactions other than those directly related to playing the media itself. The user can download, play, pause, stop and go back to listen again, but cannot make notes, mark sections to return to at a later date, or search the transcript and replay the media from that point. These features could be extremely useful as memory aids for many users in education and the workplace, including those with hearing impairments, dyslexia and other learning difficulties, as well as those for whom English is not their first language.
Research in both educational and workplace settings has illustrated the increase in the use of mobile technology to access data as well as for general communication. A survey by the Open University's Institute of Educational Technology showed that 56% of those interviewed were happy to read the screen, and Gartner predicted that by 2013 web access through smart phones would exceed web access through laptops. Mobile phones were identified in the Horizon Reports as the technologies with the highest likelihood of entry into the mainstream of learning-focused institutions within the next year. The fastest-growing sales segment belongs to smart phones. A small survey of 220 disabled students carried out in Japan, the UK and the USA in 2010 showed that across the nations smart phones were widely used (83%), with the iPhone (20%), Samsung (14%) and Nokia (13%) being most popular. Although these were considered accessible in many ways, students still highlighted difficulties such as learning new features (29.20%), understanding the menus (18.23%), reading text and spelling (21.33%), seeing things on a screen (17.00%), seeing any part of the phone (5.37%), using buttons (30.73%), hearing sounds (15.30%), having a conversation (20.47%) and remembering items over time (22.73%). Considering the difficulties encountered by users, it was the memory aid aspect that seemed to cross disability boundaries. However, there is the problem of working in one application or area on a mobile phone and having to switch to another in order to remember an item or to make additional notes. Students mentioned their use of the camera for capturing ideas, and of audio and video to aid learning. When using a computer it is possible to watch a video and take notes, or to collaborate with others to caption or annotate the video.
This has been found to be particularly helpful not only for deaf students but also for students who attend lectures where a discussion has been recorded or videoed and can be reviewed for revision purposes. This type of system is also used extensively in training situations and could even be considered useful for home videos where an alternative format may be required.
The use of mobile technologies as a memory aid with the capture of video and audio could provide an enhanced interactive teaching and learning experience if it were possible to interact with these learning resources in a flexible way. Online video services, such as YouTube, are used extensively for teaching and learning, but the ability to make synchronised annotations is either limited to the creator of the video (as on YouTube) or not widely available on mobile devices. Such a service would be further enhanced by the use of note taking as well as annotation in mobile collaborative settings. Individuals could not only transcribe content but could also make time-stamped annotations that would offer a swift return to the data at a later date. It could also provide users with easy search and tagging features for a personal store of multimedia clips, where precise sections within an audio file or video can be reached within seconds.
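The time-stamped annotations and the search and tagging features described above can be illustrated with a minimal Python sketch. It is not the system discussed in this paper: the `Annotation` class, its fields and the `search` function are hypothetical names chosen for illustration, and a real store would also need persistence and per-user ownership.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    """A note attached to one point in a media file (hypothetical model)."""
    start: float                  # offset from the start of the media, in seconds
    text: str                     # the note or transcript fragment
    tags: Tuple[str, ...] = ()    # optional user-defined tags


def search(annotations: List[Annotation], query: str) -> List[Annotation]:
    """Return annotations whose text or tags contain the query.

    Matching is case-insensitive; results are ordered by start time so
    the caller can offer them as jump points into the media.
    """
    q = query.lower()
    hits = [
        a for a in annotations
        if q in a.text.lower() or any(q in t.lower() for t in a.tags)
    ]
    return sorted(hits, key=lambda a: a.start)
```

Each hit carries a `start` offset, so a player could seek directly to that point in the audio or video rather than asking the user to scrub through the file.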
The idea of synchronised annotations and transcriptions alongside a video or audio file on a mobile device, as an integrated application that would work within a web browser and media player, has proved to be almost impossible to achieve. The user who surfs the web to find a video or audio file soon discovers that once the file is downloaded the mobile device's media player is activated, and no further interaction with the web page or any other application can take place. The media can be played, paused, stopped and even replayed, but it cannot be annotated or transcribed whilst these actions are taking place. There are occasions when the media can be exported or linked to another program such as EverNote or an e-mail client, but in both cases there is no way to review or edit any notes whilst watching or listening to the file. It appears that on mobile devices the built-in player applications are required to present media files even when HTML 5 code is used to embed the files within web pages. Mobile access to the Web is usually limited by bandwidth, so it is very important for the application to download only the specific fragment of the video or audio file to which the synchronised annotations directly relate. To make this happen, both the client side (the media player) and the server side (the multimedia delivery server) must cooperate with each other under the Web architecture. The term 'media fragment' refers to a part of the content inside a multimedia resource. W3C and other standards bodies already had relevant specifications, such as SMIL, SVG and MPEG-7, and have since produced a media fragments specification and draft implementation recommendations. Currently, however, support for media fragments is still very limited on both the client and the server side.
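As an illustration of how an annotation could point at one fragment of a media file, the sketch below builds a URL using the temporal dimension of the W3C Media Fragments URI syntax (`#t=start,end`, with times in seconds). The function names and the example URL are assumptions made for illustration and are not taken from the system described here. The fragment part of a URL is not sent to the server, so fetching only the relevant bytes still depends on the client and server cooperating as described above.

```python
from typing import Optional


def format_npt(seconds: float) -> str:
    """Format a time offset as plain seconds, dropping trailing zeros."""
    if seconds < 0:
        raise ValueError("time offsets must not be negative")
    return f"{seconds:.3f}".rstrip("0").rstrip(".")


def media_fragment_url(base_url: str, start: float,
                       end: Optional[float] = None) -> str:
    """Append a temporal media fragment (#t=start[,end]) to a media URL.

    Any fragment already present on base_url is replaced.
    """
    if end is not None and end <= start:
        raise ValueError("end must be later than start")
    base = base_url.split("#", 1)[0]
    fragment = f"t={format_npt(start)}"
    if end is not None:
        fragment += f",{format_npt(end)}"
    return f"{base}#{fragment}"


# Hypothetical example: a note made 95 seconds into a lecture recording,
# covering the following 25.5 seconds.
url = media_fragment_url("http://example.org/lecture.mp4", 95, 120.5)
```

Storing such a URL with each annotation would, in principle, let any player that understands media fragments jump straight to the annotated section.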
Having researched the situation thoroughly and looked at the options, it was decided that, although it would have been easier to build a device-dependent app that allowed video and audio files to be played with editing features, an HTML 5 solution would work on all mobile browsers and platforms, make full use of the mature Web architecture and save the cost of repeated development. Solutions lie in the mode of the user's interaction with mobile devices, with tabs or swiping between pages used to display different annotations.
There are still many future challenges in terms of responsive design for audio and video annotation systems on mobiles, manipulation of multimedia resources through APIs in mobile browsers, and HCI and accessibility for making annotations on mobile devices.