Transcoding Server Functions

Gregg Vanderheiden 3/28/2003

Recognition of any text images (and conversion to Alt text or to text)
Ability to recognize structure from formatting
Ability to take an image of a document and change it into marked-up HTML
Ability to recognize pie charts and other stereotypic graphics and re-render them
The ability to take text laid over the top of a picture and separate it into the text and the picture, so that the graphic could be knocked down eight levels of transparency and the text would stand out boldly against what appears to be a very light watermark (on request)
Side services which are not actually transcoding, but are assistance services. For example, a side service might allow you to send in an audio-video program and have it come back with a synchronized file in which the speech had been "recognized" and changed into captions, etc. The output would not be directly useful to users, but could be something that might be easily edited into a captioned video. Note that a large part of the problem in captioning is getting all the synchronization, etc., not just capturing the words. In fact, the tool might actually require that the text be transcribed first, and it would simply synchronize the text with the audio. That, however, leaves the problem of identifying speakers, which would then have to be done afterward (or there would be too many words in the synchronization file).
Language transcoding
- Adding vowels or other marks necessary for screen readers to be able to handle the text easily
- The identification of language and the adding of language tags
- The identification of common foreign phrases which could then be marked
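The marking of common foreign phrases could be sketched roughly as follows. This is only an illustration: the phrase table, the function name, and the choice of an HTML span with a lang attribute are all assumptions, and the input is assumed to be plain text rather than markup.

```python
import re

# Hypothetical phrase table: lowercase phrase -> language code (assumed, illustrative).
FOREIGN_PHRASES = {
    "raison d'être": "fr",
    "ad hoc": "la",
    "zeitgeist": "de",
}

def mark_foreign_phrases(text, phrases=FOREIGN_PHRASES):
    """Wrap each known foreign phrase in a span carrying a language tag."""
    # Longest phrases first so a longer phrase wins over a shorter one it contains.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        lang = phrases[match.group(0).lower()]
        return f'<span lang="{lang}">{match.group(0)}</span>'

    return pattern.sub(wrap, text)

print(mark_foreign_phrases("An ad hoc fix"))
```

The word-boundary check keeps "ad hoc" from matching inside unrelated words such as "bad hockey".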
The ability to perform language level shifting so that complex text could be re-presented in simpler forms
Possibly the addition of "forced pronunciation" of foreign phrases in the natural language of the page for users who don't have a synthesizer that could handle the foreign phrases even if marked
The addition of topic icons which are meaningful to the user if the server had both the user's icon set and was able to determine the concepts from the content. However, I suspect that this one will yield false clues as much as useful information.
The ability to create visualizations of data (e.g. charts, graphs, or tables – on request).
The ability to detect and fix seizure-inducing content. This could include both the ability to remove material which would flash rapidly and the ability to detect and break up visual patterns known to cause seizures.
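The detection of rapid flashing might be approached along these lines. This is a minimal sketch, assuming the server has already reduced each video frame to a single average luminance between 0 and 1. The function names, the 0.1 luminance threshold, and the limit of three flashes per second are illustrative assumptions, not values from this document.

```python
def count_flashes(luminance, threshold=0.1):
    """Count flashes in a sequence of per-frame luminance values.

    A flash is treated as a pair of opposing frame-to-frame changes,
    each at least `threshold` in size. Gradual changes are ignored.
    """
    transitions = 0
    last_direction = 0
    for prev, cur in zip(luminance, luminance[1:]):
        delta = cur - prev
        if abs(delta) < threshold:
            continue
        direction = 1 if delta > 0 else -1
        if direction != last_direction:
            transitions += 1
            last_direction = direction
    return transitions // 2

def has_rapid_flashing(luminance, fps, max_flashes=3, threshold=0.1):
    """Return True if any one-second window contains more than max_flashes."""
    window = fps + 1  # number of frames spanning one second
    last_start = max(1, len(luminance) - window + 1)
    for start in range(last_start):
        if count_flashes(luminance[start:start + window], threshold) > max_flashes:
            return True
    return False

strobe = [0.0, 1.0] * 6          # alternates every frame
print(has_rapid_flashing(strobe, fps=10))
```

A real detector would also need to consider how much of the screen is flashing and the colors involved; this sketch only looks at overall brightness.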
The creation of a logical, linear order for complex pages.
The re-layout of a page in a more consistent fashion for users (e.g. putting title bars in the same location or marking them with the same jump commands or repositioning submit buttons or labeling submit buttons, etc., although none of these changes should create new problems)
In all cases, if the page doesn't make sense or doesn't seem to work, there should be a way to tell the machine to be less smart and less helpful and present the page increasingly more like it was originally presented (using modules submitted by manufacturers).
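One way to picture the "less smart and less helpful" control is as a helpfulness level that decides how many transformations are applied. The sketch below is only an assumption about how that might be arranged: the function name, the ordering of transforms from least to most aggressive, and the stand-in transforms are all hypothetical.

```python
def transcode(page, transforms, level):
    """Apply only the first `level` transforms, in order.

    Level 0 returns the page exactly as originally presented; each
    higher level adds one more (assumed more aggressive) transform.
    """
    for transform in transforms[:max(0, level)]:
        page = transform(page)
    return page

# Stand-in transforms for illustration; real ones would rewrite markup.
transforms = [str.strip, str.lower]

for level in range(len(transforms), -1, -1):
    # The user steps the level down until the page makes sense again.
    print(level, repr(transcode("  Submit ORDER  ", transforms, level)))
```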
Translate proprietary formats into more standard accessible formats. This could include document formats, text formats, graphic formats, etc. (This would allow companies to introduce some new format that had particular advantages while still having that format be instantly accessible).
Relay a page, removing interactive material and substituting static material for it.
Translating between accessible multimedia formats
- This would make it easier for people who didn't have all of the different players
- It would also allow vendors to introduce new multimedia formats, even if the users didn't all yet have accessible players, or didn't have accessible players for their particular agents or operating systems
Event handler conversion (e.g. converts mouse in and mouse out to focus in and focus out) (Note: for this to work, there can't be other focus in/focus outs which would collide)
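The event handler conversion, including the collision check, could look something like the sketch below. It assumes each element's attributes have already been parsed into a dictionary, and it assumes the HTML attributes onfocus and onblur as the "focus in" and "focus out" targets; the function name and the mapping table are hypothetical.

```python
# Assumed mapping from mouse-only handlers to keyboard-reachable equivalents.
MOUSE_TO_FOCUS = {"onmouseover": "onfocus", "onmouseout": "onblur"}

def convert_handlers(attrs):
    """Return (new_attrs, collisions) for one element's attribute dict.

    A mouse handler is copied to its focus equivalent unless a different
    focus handler is already present, in which case the element is left
    as-is for that attribute and the collision is reported.
    """
    new_attrs = dict(attrs)
    collisions = []
    for mouse_attr, focus_attr in MOUSE_TO_FOCUS.items():
        if mouse_attr not in attrs:
            continue
        existing = attrs.get(focus_attr)
        if existing is not None and existing != attrs[mouse_attr]:
            collisions.append(focus_attr)
            continue
        new_attrs[focus_attr] = attrs[mouse_attr]
    return new_attrs, collisions

converted, collisions = convert_handlers(
    {"onmouseover": "showMenu()", "onmouseout": "hideMenu()"}
)
print(converted, collisions)
```

Reporting collisions rather than overwriting existing handlers keeps the conversion from breaking pages that already handle focus.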
To allow companies to submit pages to see whether or not there were collisions or whether or not the servers could handle them
To allow archival and legacy materials to be made accessible without having to have each document called up and converted (especially since it may be illegal to actually make changes to legacy documents).
Automatic identification, tagging, and linking of abbreviations, acronyms, and ambiguous, unusual, or invented words
Using "cascading" glossaries – the system could automatically take pages and identify words which were not in the standard dictionary, tag them, and then attach the proper expansion or definition of the terms based upon the set of cascading glossaries that could be attached to the page, to the site, or to the domain of discussion, etc. by either the source of the page, by others, or by an intelligent algorithm. Such an approach would need to specify the uncertainty for the definition if it was not specified by the source.
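The cascading lookup might be sketched as follows. This is an illustration under assumptions: glossaries are taken to be ordered from most specific (page) to least specific (domain), each carrying a flag for whether the page's source supplied it, and all names and sample entries here are hypothetical.

```python
def lookup(term, glossaries):
    """Search glossaries in order and return the first matching entry.

    `glossaries` is a list of (scope, entries, from_source) tuples,
    ordered most specific first. `certain` is True only when the
    glossary came from the source of the page.
    """
    for scope, entries, from_source in glossaries:
        if term in entries:
            return {
                "term": term,
                "definition": entries[term],
                "scope": scope,
                "certain": from_source,
            }
    return None

def tag_unknown_words(words, dictionary, glossaries):
    """Tag every word missing from the standard dictionary."""
    tagged = []
    for word in words:
        if word.lower() in dictionary:
            continue
        hit = lookup(word, glossaries)
        if hit is None:
            hit = {"term": word, "definition": None, "scope": None, "certain": False}
        tagged.append(hit)
    return tagged

glossaries = [
    ("page", {"AT": "assistive technology"}, True),
    ("site", {"AT": "advanced technology", "UA": "user agent"}, True),
    ("domain", {"SR": "screen reader"}, False),
]
for entry in tag_unknown_words(["the", "AT", "SR"], {"the"}, glossaries):
    print(entry)
```

Because the page glossary is searched first, its definition of a term overrides the site's, and a definition found only in a glossary not supplied by the source is flagged as uncertain.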
Providing a non-secure, but accessible view of a digitally secure document. For example, a digitally signed secure document could be sent into the server and what would come out would be the same digitally signed secure document attached to an accessible, readable version of it. The individual who is blind would still have the visible, signed, secure version, but they would also have a version which "was read to them" or was "transcribed for them" by the public transcoding server (which generally has no reason to lie to the user, but if it did, it would be caught by anyone sighted when looking at the original document). Since the public transcoding server could actually be running in any place, including an association for individuals who are blind or even the person's own computer if it were powerful enough, the transcoding server could be as reliable as, or more reliable than, having the individual who is blind rely on somebody reading something to them.
Another possible function would be the re-codification of material. For example, a form in a proprietary format might be translated into HTML and forwarded on to the user. The user may fill it out and send it back to the server, which, if the original manufacturer had created such a module, might translate it back into the original proprietary format for submission to the source.