Transcoding Server Functions
Gregg Vanderheiden 3/28/2003
- Recognition of any text in images (and conversion to alt text or to text)
- Ability to recognize structure from formatting
- Ability to take an image of a document and change it into marked-up HTML
- Ability to recognize pie charts and other stereotypic graphics and re-render
them
- The ability to take text laid over the top of a picture and separate it into
the text and the picture, so that the graphic could be knocked down eight levels
of transparency and the text would stand out boldly against what appears to
be a very light watermark (on request)
- Side services which are not actually transcoding, but are assistance services.
For example, a side service might allow you to send in an audio-video program
and have it come back with a synchronized file in which the speech had been
"recognized" and changed into captions, etc. The output would not be useful to
users as is, but could be easily edited into a captioned video. Note that a
large part of the problem in captioning is getting all the synchronization,
etc., right, not just capturing the words. In fact, the tool might actually
require that the text be transcribed first, and it would simply synchronize
the text with the audio. That, however, leaves the problem of identifying
speakers, which would then have to be done afterward (or there would be too
many words in the synchronization file).
- Language transcoding
  - Adding vowels or other marks that screen readers need in order to
    handle the text easily
  - Identifying the language and adding language tags
  - Identifying common foreign phrases, which could then be marked
  - The ability to perform language level shifting so that complex text could
    be re-presented in simpler forms
  - Possibly adding "forced pronunciation" of foreign phrases in the
    natural language of the page, for users who don't have a synthesizer that
    could handle the foreign phrases even if marked
- The addition of topic icons which are meaningful to the user, if the server
both had the user's icon set and was able to determine the concepts from the
content. However, I suspect that this one will yield false clues as much as
useful information.
- The ability to create visualizations of data (e.g. charts, graphs, or tables
– on request).
- The ability to detect and fix seizure-inducing content. This could include
both the ability to remove material which would flash rapidly and the ability
to detect and break up visual patterns known to cause seizures.
- The creation of a logical, linear order for complex pages.
- The re-laying out of a page in a more consistent fashion for users (e.g. putting
title bars in the same location, marking them with the same jump commands,
repositioning submit buttons, labeling submit buttons, etc.), although
none of these changes should create new problems
- In all cases, if the page doesn't make sense or doesn't seem to work, there
should be a way to tell the machine to be less smart and less helpful and
present the page increasingly more like it was originally presented (using
modules submitted by manufacturers).
- Translate proprietary formats into more standard accessible formats. This
could include document formats, text formats, graphic formats, etc. (This
would allow companies to introduce some new format that had particular advantages
while still having that format be instantly accessible).
- Re-lay out a page, removing interactive material and substituting static
material.
- Translating between accessible multimedia formats
  - This would make it easier for people who didn't have all of the different
    players
  - It would also allow vendors to introduce new multimedia formats, even
    if the users didn't all yet have accessible players, or didn't have accessible
    players for their particular agents or operating systems
- Event handler conversion (e.g. converts mouse in and mouse out to focus
in and focus out) (Note: for this to work, there can't be other focus in/focus
outs which would collide)
- To allow companies to submit pages to see whether or not there were collisions
or whether or not the servers could handle them
- To allow archival and legacy materials to be made accessible without having
to have each document called up and converted (especially since it may be
illegal to actually make changes to legacy documents).
- Automatic identification, tagging, and linking of abbreviations, acronyms,
and ambiguous, unusual, or invented words
  - Using "cascading" glossaries – the system could automatically take pages
    and identify words which were not in the standard dictionary, tag them, and
    then attach the proper expansion or definition of the terms based upon the
    set of cascading glossaries that could be attached to the page, to the site,
    or to the domain of discussion, etc., whether by the source of the page, by
    others, or by an intelligent algorithm. Such an approach would need to state
    the uncertainty of a definition if it was not specified by the source.
- Providing a non-secure, but accessible view of a digitally secured document.
For example, a digitally signed secure document could be sent into the server
and what would come out would be the same digitally signed secure document
attached to an accessible, readable version of it. The individual who is blind
would still have the visible, signed, secure version, but they would also
have a version which "was read to them" or was "transcribed for them" by the
public transcoding server (which generally has no reason to lie to the user,
but if it did, it would be caught by anyone sighted when looking at the original
document). Since the public transcoding server could actually be running in
any place, including an association for individuals who are blind or even the
person's own computer if it were powerful enough, the transcoding server
could be as reliable as, or more reliable than, having the individual
who is blind rely on somebody reading something to them.
- Another possible function would be the re-codification of material. For
example, a form in a proprietary format might be translated into HTML and
forwarded on to the user. The user may fill it out and send it back to the
server, which, if the original manufacturer created such a module, might
translate it back into the original proprietary format for submission back
to the source.
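
As a rough illustration of the rapid-flash detection item above, the following sketch scans per-frame brightness values and reports the worst one-second window. Everything in it is an assumption rather than part of this note: the function names are hypothetical, a "flash" is simplified to a pair of opposing jumps in average frame brightness, and the limit of three flashes per second is borrowed from the W3C general flash threshold. A real detector would also need to consider the area of the screen involved and red flashes.

```python
def worst_flash_count(luminance, fps, min_change=0.10):
    """Return the largest number of flashes found in any one-second window.

    luminance: per-frame average brightness, each value in [0, 1].
    A "flash" here is a pair of opposing brightness jumps (a simplification).
    """
    transitions = []  # (frame index, +1 for brighter / -1 for darker)
    for i in range(1, len(luminance)):
        delta = luminance[i] - luminance[i - 1]
        if abs(delta) < min_change:
            continue
        direction = 1 if delta > 0 else -1
        # Consecutive jumps in the same direction count as one transition.
        if not transitions or transitions[-1][1] != direction:
            transitions.append((i, direction))

    worst = 0
    for start, (first_frame, _) in enumerate(transitions):
        # Transitions that fall within one second (fps frames) of this one.
        in_window = sum(
            1 for frame, _ in transitions[start:] if frame - first_frame < fps
        )
        worst = max(worst, in_window // 2)  # two transitions make one flash
    return worst


def is_flash_hazard(luminance, fps, max_flashes=3):
    """Assumed policy: more than max_flashes in any second is a hazard."""
    return worst_flash_count(luminance, fps) > max_flashes


strobe = [0.0, 1.0] * 15                      # alternates every frame at 30 fps
slow_blink = ([0.0] * 15 + [1.0] * 15) * 3    # one flash per second
print(is_flash_hazard(strobe, fps=30), is_flash_hazard(slow_blink, fps=30))
```

A server could use a check like this to decide which segments to freeze, slow down, or drop; how to "fix" a flagged segment is left open here, as it is in the list.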
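
The "less smart and less helpful" control described above could be sketched as an ordered list of transformations with a user-selected level, where level 0 returns the page exactly as it was received. The two transformations below are trivial stand-ins invented for illustration; they are not the manufacturer-submitted modules the item mentions, and the names are hypothetical.

```python
import re


def collapse_whitespace(page):
    """Stand-in transformation: squeeze runs of whitespace into one space."""
    return re.sub(r"\s+", " ", page).strip()


def label_submit_buttons(page):
    """Stand-in transformation: give unlabeled submit buttons a label."""
    return page.replace(
        '<input type="submit">', '<input type="submit" value="Submit">'
    )


# Ordered from least to most intrusive (the ordering is an assumption).
TRANSFORMS = [collapse_whitespace, label_submit_buttons]


def transcode(page, helpfulness, transforms=TRANSFORMS):
    """Apply only the first `helpfulness` transformations.

    helpfulness=0 returns the page untouched; each step down from the
    maximum presents the page more like the original.
    """
    if not 0 <= helpfulness <= len(transforms):
        raise ValueError("helpfulness out of range")
    for transform in transforms[:helpfulness]:
        page = transform(page)
    return page


original = '<form>\n  <input type="submit">\n</form>'
for level in range(len(TRANSFORMS) + 1):
    print(level, repr(transcode(original, level)))
```

Keeping the transformations independent and ordered is one way to let a user back off a single step at a time when a page stops making sense.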
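
The event handler conversion item above, including its note about collisions, might look roughly like this for a single element. This is a sketch under assumptions: it takes "mouse in / mouse out" to mean the HTML attributes onmouseover and onmouseout, maps them to onfocus and onblur, and copies the handler rather than replacing it. The function name is hypothetical, and parsing and re-serializing the page is left out.

```python
# Assumed mapping from mouse event attributes to focus event attributes.
MOUSE_TO_FOCUS = {"onmouseover": "onfocus", "onmouseout": "onblur"}


def add_focus_handlers(attrs):
    """Return (new_attrs, collisions) for one element's attribute dict.

    The mouse handler is copied to the matching focus attribute unless the
    element already has a handler there, in which case the existing handler
    is kept and the attribute name is reported as a collision.
    """
    new_attrs = dict(attrs)
    collisions = []
    for mouse_attr, focus_attr in MOUSE_TO_FOCUS.items():
        if mouse_attr not in attrs:
            continue
        if focus_attr in attrs:
            collisions.append(focus_attr)
        else:
            new_attrs[focus_attr] = attrs[mouse_attr]
    return new_attrs, collisions


link = {"href": "#menu", "onmouseover": "show()", "onmouseout": "hide()"}
print(add_focus_handlers(link))

clash = {"onmouseover": "show()", "onfocus": "track()"}
print(add_focus_handlers(clash))
```

The list of collisions is the kind of report a company could get back when submitting pages to see whether the server can handle them.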
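
The cascading glossary lookup described above can be sketched as a search through glossaries ordered from most specific (page) to least specific (domain), flagging any definition that did not come from the page's source as uncertain. The class, field names, sample dictionary, and sample entries are all made up for illustration, and a plain boolean stands in for whatever richer measure of uncertainty a real system would carry.

```python
import re
from dataclasses import dataclass


@dataclass
class Glossary:
    scope: str      # "page", "site", or "domain"
    provider: str   # "source", "third_party", or "algorithm"
    entries: dict   # term -> expansion or definition


def expand(term, glossaries):
    """Return the first match, searching the most specific glossary first."""
    for glossary in glossaries:
        if term in glossary.entries:
            return {
                "term": term,
                "expansion": glossary.entries[term],
                "scope": glossary.scope,
                # Only definitions supplied by the page's source are certain.
                "uncertain": glossary.provider != "source",
            }
    return None


def tag_unknown_words(text, glossaries, dictionary):
    """Tag every word missing from the standard dictionary."""
    tagged = []
    for word in re.findall(r"[A-Za-z][A-Za-z0-9]*", text):
        if word.lower() in dictionary:
            continue
        hit = expand(word, glossaries)
        tagged.append(hit or {"term": word, "expansion": None,
                              "scope": None, "uncertain": True})
    return tagged


dictionary = {"the", "uses", "for", "pages"}   # tiny stand-in dictionary
glossaries = [
    Glossary("page", "source", {"WAI": "Web Accessibility Initiative"}),
    Glossary("domain", "algorithm", {"WAI": "a guess", "AT": "assistive technology"}),
]
for entry in tag_unknown_words("The WAI uses AT", glossaries, dictionary):
    print(entry)
```

Here "WAI" resolves from the page-level glossary and is treated as certain, while "AT" falls through to the algorithm-built domain glossary and is marked uncertain.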