SVG Linearizer tools

Internship report and software download

Guillaume Lovet - July to August 2000 training period
Please send comments to Daniel Dardailler (dd@w3.org) and Guillaume Lovet (guillaume.lovet@supelec.fr)
Last updated 08/25/2000

Quick tour: Convert graphics into structured text using SVG and Metadata in RDF.

Introduction:

The goal of this summer 2000 project was to write an SVG-to-text converter. More precisely, the work was done in three steps:

The development of an RDF vocabulary allowing the description of an SVG document (pictures, schemas, graphics), in order to make the information carried by such a document accessible regardless of the medium available to render it (computer screen, speaker, tactile screen).
The development of a tool (written in Java) able to exploit such an RDF description, and thus the elements of the previous vocabulary. The results of the processing have to be presented in textual form, ready to be used by various accessibility tools (for example, a voice module or a Braille display for people with visual impairment).
Finally, the development of another tool (again in Java) for editing an SVG document in order to attach an accessibility-oriented description to it. The user of such a tool will have few or no notions of RDF, so the editing process has to be as transparent, simple and graphical as possible.

I. RDF Vocabulary

The review of various pictures (see the example pages in this dependency statement) that could be SVG documents "to be described" helped me to build a set of properties that should correctly meet the stated requirements.

The RDF vocabulary is made of these 29 properties (or words). Such a vocabulary forms the namespace axsvg (for Accessibility SVG), and has an associated RDF Schema. Most of these properties are 'by reference', i.e. in the RDF statement for which they are the predicate, the subject and the object are 'entities' of the SVG document, identified by their 'id' attribute. For example, if the SVG document contains somewhere the following piece of code:

<g id="Roof">
...svg code that draws a roof...
</g>

Then somewhere else:

<g id="House">

...svg code that draws a house...

</g>

In the RDF description, we could see the following piece of code:

<rdf:Description about="#Roof">

<axsvg:SitsOnTop resource="#House" />

</rdf:Description>

Which means, in English: The entity identified by 'Roof' sits on top of the entity identified by 'House'.
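
As an illustration only, the reading of such a statement can be sketched in Java with the standard DOM API. The class name StatementReader and its small phrase table are hypothetical: the actual tool fetches the meaning of each property from the RDF Schema, and its exact output wording may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StatementReader {
    // Hypothetical phrase table: the real tool reads meanings from axsvg-schema.rdf.
    private static final Map<String, String> PHRASES = new HashMap<>();
    static {
        PHRASES.put("axsvg:SitsOnTop", "sits on top of");
        PHRASES.put("axsvg:InFrontOf", "is in front of");
    }

    /** Returns one English sentence per 'by reference' statement in the RDF fragment. */
    static List<String> describe(String rdfXml) {
        Document doc;
        try {
            // The default factory is not namespace-aware, so prefixed names are plain strings.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rdfXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse RDF fragment", e);
        }
        List<String> sentences = new ArrayList<>();
        NodeList descriptions = doc.getElementsByTagName("rdf:Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            String subject = stripHash(description.getAttribute("about"));
            for (Node n = description.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and other non-element nodes
                }
                Element property = (Element) n;
                String phrase = PHRASES.get(property.getNodeName());
                if (phrase != null && property.hasAttribute("resource")) {
                    String object = stripHash(property.getAttribute("resource"));
                    sentences.add("The entity identified by '" + subject + "' " + phrase
                            + " the entity identified by '" + object + "'.");
                }
            }
        }
        return sentences;
    }

    private static String stripHash(String ref) {
        return ref.startsWith("#") ? ref.substring(1) : ref;
    }

    public static void main(String[] args) {
        String rdf = "<rdf:RDF><rdf:Description about=\"#Roof\">"
                + "<axsvg:SitsOnTop resource=\"#House\" />"
                + "</rdf:Description></rdf:RDF>";
        for (String sentence : describe(rdf)) {
            System.out.println(sentence);
        }
    }
}
```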

The vocabulary covers several categories of properties:

Structural properties
These properties are dedicated to the description of the document structure, in a conceptual way:
- Regroups
- IsConvergencePoint
- IsConnected
- IsPartOf
- PointsTo
- Links
- Contents
- IsFatherOf
- Has
- SitsOnTop
- HasOnTop
- IsGoingThrough
- IsLayeredOn
- HasForValue
- Associated

Geographic properties
Used to situate the elements of the SVG document relative to one another, so that the user (not the computer) can build a better mental picture of the document.
- AtRight
- AtLeft
- IsBehind
- IsOver
- InFrontOf
- MaskedBy
- On
- Under

Graphic properties
Dedicated to the description of charts made of dots or curves, or of pie charts ('cheese-like' graphics):
- HasDots
- HasCategories
- XLegend: by value, i.e. the object of the statement for which the property is the predicate is not an SVG entity of the document pointed to by the 'resource' attribute of the property, but rather the value of another attribute (such attributes will be listed here).
- YLegend: by value.
- Curve: by value.

Special
- Importance: by value.
  Used to set a degree of importance for an SVG entity relative to the rest of the document. A tactile screen user can thus choose to have only the most important entities rendered, ignoring the less significant details that could blur the low-resolution rendering.

Moreover, some attributes can in theory be added to each of these properties. The user who creates the description is responsible for the meaning of his statements: for example, a 'by value' property should logically have at least one attribute, which cannot be 'resource'. The attributes are:

value
range
coord
shape
max
min
start
end

Notes:

- The names of properties and attributes have been chosen to be as close as possible to their semantic content. Studying the Java tool that presents the RDF code in text mode will shed more light on the semantics attached to each of these attributes and properties (see the RDF Schema and the class Sentence, which performs the translation property/attribute -> English sentence).

- For the sake of elegance, readability and easier processing, we set a syntactic rule for writing the RDF code: in a statement, the object (when it is an SVG entity, i.e. the 'by reference' case) must be referenced through the resource attribute of the property. Thus, one must write:

<axsvg:InFrontOf resource="#ObjectId" />

And not:

<axsvg:InFrontOf parseType="resource">#ObjectId</axsvg:InFrontOf>

As a consequence, in the DOM tree of the RDF document, the nodes should not have any children of type TEXT_NODE (these are ignored when processing the DOM). They can, however, have children of type ELEMENT_NODE named rdf:Bag or rdf:li, when a property concerns several objects (multiple statement).

Example:

<rdf:Description about="#Pool">

<axsvg:Regroups>

<rdf:Bag>

<rdf:li resource="#BigComputer" />

<rdf:li resource="#MidComputer" />

<rdf:li resource="#SmallComputer" />

</rdf:Bag>

</axsvg:Regroups>

...other properties...

</rdf:Description>
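
A minimal Java sketch of how the objects of such a property node could be collected from the DOM, under the rule stated above: TEXT_NODE children are skipped, and only the resource attribute and the rdf:Bag / rdf:li children are read. The class and method names (BagReader, objectsOf) are hypothetical and are not taken from the axsvg package.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BagReader {
    /** Returns the references of all objects of a property node, single or multiple. */
    static List<String> objectsOf(Element property) {
        List<String> refs = new ArrayList<>();
        if (property.hasAttribute("resource")) {
            refs.add(property.getAttribute("resource")); // single 'by reference' object
        }
        collect(property, refs);
        return refs;
    }

    private static void collect(Element parent, List<String> refs) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // TEXT_NODE children are ignored, as the rule requires
            }
            Element child = (Element) n;
            if ("rdf:Bag".equals(child.getNodeName())) {
                collect(child, refs); // descend into the container
            } else if ("rdf:li".equals(child.getNodeName()) && child.hasAttribute("resource")) {
                refs.add(child.getAttribute("resource"));
            }
        }
    }

    /** Convenience overload: parses a fragment and reads the first property with that name. */
    static List<String> objectsOf(String xml, String propertyName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element property = (Element) doc.getElementsByTagName(propertyName).item(0);
            return property == null ? new ArrayList<String>() : objectsOf(property);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse RDF fragment", e);
        }
    }

    public static void main(String[] args) {
        String rdf = "<rdf:Description about=\"#Pool\"><axsvg:Regroups><rdf:Bag>"
                + "<rdf:li resource=\"#BigComputer\" />"
                + "<rdf:li resource=\"#MidComputer\" />"
                + "<rdf:li resource=\"#SmallComputer\" />"
                + "</rdf:Bag></axsvg:Regroups></rdf:Description>";
        System.out.println(objectsOf(rdf, "axsvg:Regroups"));
    }
}
```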

II. Java translator

As announced, this tool takes an SVG file as a parameter, with its own RDF description within a <metadata> tag. Here is a typical example of an SVG document that presents some structural concepts which can be described with the axsvg namespace properties: lan.svg (browsable version of the SVG markup, pixmap version as screen dump of Jackaroo)

Such a document can be rendered graphically with an SVG browser (for example, Jackaroo). It is a pale copy of a jpg picture taken from the example page mentioned earlier (we only tried to reproduce the structure, which is what matters for our study, regardless of pictorial aesthetics).

The translator will proceed in two steps:

First, it has to parse the XML document (SVG is indeed an XML application) and build the associated tree: the DOM, which then contains elements of both the svg and rdf namespaces.
Then, it works on this tree to extract the RDF description in textual form (combined with the description already contained in the svg part of the document, in the <desc> elements). In order to get smart text with consistent sentences, we again had to impose a syntactic rule for writing the SVG description of an entity (i.e. the content of a <desc> node): such a description should begin with a verb, as if the desc came right after the entity id. Example:

<g id='computer'>

<desc>is the symbol of a laptop computer</desc>

...svg code that draws the computer...

</g>
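
The reason for this rule can be sketched in a few lines of Java: if every <desc> begins with a verb, a sentence is obtained simply by putting the entity id in front of it. The class DescSentences below is a hypothetical illustration and not the code of the axsvg package; the exact wording printed by the real tool may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DescSentences {
    /** Builds "id + desc" sentences for every element that has an id and a <desc> child. */
    static List<String> sentences(String svgXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(svgXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse SVG fragment", e);
        }
        List<String> result = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element entity = (Element) all.item(i);
            if (!entity.hasAttribute("id")) {
                continue; // only identified entities can be described
            }
            String desc = directDesc(entity);
            if (desc != null) {
                result.add(entity.getAttribute("id") + " " + desc + ".");
            }
        }
        return result;
    }

    /** Returns the trimmed text of the first <desc> child, or null if there is none. */
    private static String directDesc(Element entity) {
        for (Node n = entity.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && "desc".equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String svg = "<svg><g id='computer'>"
                + "<desc>is the symbol of a laptop computer</desc>"
                + "</g></svg>";
        for (String sentence : sentences(svg)) {
            System.out.println(sentence);
        }
    }
}
```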

The first task is practically done by a set of classes from Jackaroo, the SVG browser developed by the Koala team.

Eight classes are in charge of the second task. They make up the package 'axsvg':

Descriptor
The "boot" class which implements the main method of the program.
DocumentMixte
Implements the type of document the program works on: a DOM tree of an SVG document containing metadata coded in RDF.

Has methods for extracting information about such a document (find the RDF root, find the SVG root, etc.) and for building objects that are useful when exploiting the DOM (for example, the hashtables that map ids to the RDF or SVG nodes, so that a node can be reached quickly when only its id is known).
AElementSVG ("A" for "Accessibility")
An instance of this class points to an Element node of the DOM representing a full entity in the SVG code, and thus having an id attribute. Most of the time, it is an element of the form <g id="entityId">, since an entity is usually made of a set of several elements.

It implements methods dedicated to extracting information from the SVG code: the content of the <desc> or <title> element.
AElementRDF
An instance of this class points to an RDF node whose name is 'rdf:Description'. Such an instance gives access to the full RDF description of the SVG entity whose id is equal to the about attribute of that RDF node.

This class therefore implements methods for interpreting such a description in text mode. To print the interpretation, it calls methods of the Property class.
Property
An instance of this class points to a property node, whose name should begin with 'axsvg:', within an RDF description. This class manages the printing of the elements of a given RDF statement (whose predicate is the property node pointed to), through specific methods called by the algorithms of the AElementRDF class. To print the English sentence associated with the predicate (or with the object, if the latter is an attribute other than 'resource', i.e. if it is not another SVG entity), it calls some simple methods of the Sentence class.
Sentence
Implements the methods that perform the last step of the process: those that translate property/attribute -> English sentences and print them (the possible properties and their semantic meaning are fetched from the RDF Schema of the axsvg namespace). The algorithm that manages the print order of these sentences is the responsibility of the AElementRDF class.
ListOfAttributeNames
The class holding the attributes that can be used in the axsvg namespace and their meaning in that namespace. It is somewhat temporary, given that this information should soon be included in the RDF Schema, as is already the case for properties. However, it is still easy to add new attributes directly in the code of this class.
HtmlPrintStream
A custom PrintStream that adds HTML markup to the printed description, so that it can be viewed with a browser. Only used when the -h option is activated.
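
To make the role of the id hashtables mentioned for DocumentMixte more concrete, here is a minimal sketch of such an index built with the standard DOM API. The class name IdIndex is hypothetical, and a LinkedHashMap is used instead of a Hashtable only to keep document order; the real class may be organized quite differently.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class IdIndex {
    /** Maps every id attribute found in the document to the element that carries it. */
    static Map<String, Element> build(Document doc) {
        Map<String, Element> index = new LinkedHashMap<>(); // keeps document order
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (element.hasAttribute("id")) {
                index.put(element.getAttribute("id"), element);
            }
        }
        return index;
    }

    /** Convenience overload that parses the document from a string. */
    static Map<String, Element> build(String xml) {
        try {
            return build(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Element> index =
                build("<svg><g id=\"Roof\"/><g id=\"House\"><rect/></g></svg>");
        // A node is now reached directly from its id, without walking the tree again.
        System.out.println(index.keySet());
        System.out.println(index.get("House").getNodeName());
    }
}
```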

The program requires jdk1.2.2 (or higher) to run correctly. Here is an executable jar file gathering all the required classes : axsvg.jar. You will also need to have the RDF Schema in the same folder: axsvg-schema.rdf.

To run the program, just type:

java -jar axsvg.jar file.svg

Where file.svg is the SVG file (with an embedded RDF description) to be described in textual mode. For testing purposes you can use lan.svg, which can also be viewed with Jackaroo.

You may also use the -h option. In this case, the description is printed in HTML rather than in plain text:

java -jar axsvg.jar -h file.svg

Here's the output in HTML for lan.svg: lan.htm.

Remarks:

All the textual outputs are directed to the standard output channel. Therefore, one can redirect the description to a file on *nix systems with a command like:
java -jar axsvg.jar file.svg > text

Such a feature is quite useful when used along with the -h option:

java -jar axsvg.jar -h file.svg > text.html

Now you can open the textual description with your favourite browser. I have written a small bash script (so for *nix users only) that does everything by itself (I tried to make it a little smart: it does not launch the browser if an error occurred, etc.). Actually, I think it is the best way to use the axsvg package: download the script called hdesc, and place it in a directory that is in your PATH environment variable (usually /usr/bin/ or /usr/local/bin/), or simply add the directory where you placed it to your PATH. Then make a directory in which to place the two required files axsvg.jar and axsvg-schema.rdf (they must be in the same directory; by default, use ~/axsvg-files/). A few settings remain to be done:

Edit hdesc and set the dir where you placed the two required files (default: axsvg-files/ in your home dir), as well as the browser you want to use to view the description (default: amaya. One may want to set it to netscape, which is more common, but a lot slower).

Now you can type hdesc file.svg from wherever you want, and the description will be shown by the browser you have chosen.
You may experience some problems due to the SAX parser I used to build the DOM. For example, a 'org.xml.sax.SAXParseException: File "SVG-20000202.dtd" not found' error may occur when trying to process some svg files (typically, those from the jackaroo samples folder). It means that the file has a <!DOCTYPE svg SYSTEM "SVG-20000202.dtd"> declaration, which forces the parser to look for this dtd file in the local directory. You can either remove this declaration (but jackaroo may no longer be able to render the file), or set it to:
<!DOCTYPE svg SYSTEM "http://www.w3.org/TR/2000/03/WD-SVG-20000303/DTD/svg-20000303-stylable.dtd" >
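
For readers who build the DOM themselves, a third possibility, which is not what the axsvg tool does, is to tell the parser not to fetch the DTD at all. The sketch below uses the standard JAXP DocumentBuilder.setEntityResolver call with a resolver that returns an empty source for any external entity. The class name DtdSkippingParser is hypothetical, and the trade-off is that default attribute values or entities declared in the DTD are then unavailable.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DtdSkippingParser {
    /** Parses an XML string without trying to load the DTD named in its DOCTYPE. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer every external entity request (including the DTD) with empty content,
            // so that a missing local file such as SVG-20000202.dtd is never looked up.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        String svg = "<!DOCTYPE svg SYSTEM \"SVG-20000202.dtd\"><svg id=\"picture\"/>";
        Document doc = parse(svg);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```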

III. Java Editor

Such a tool should allow the editing of metadata within SVG files, using the properties of the axsvg namespace defined in the RDF Schema axsvg-schema.rdf. It has a Graphical User Interface (using the Java Swing package), which is meant to be as convenient as possible. Finally, it should run on any platform that implements a Java virtual machine (reported to work on Solaris, Linux, Windows 98, NT 5).

Starting the editor
The editor requires only two files to run correctly:
- The executable jar file itself, called edition.jar
- The RDF Schema of the axsvg namespace: axsvg-schema.rdf
Of course, since it is a Java program, a Java virtual machine including the Swing package (such as jdk1.2.2 or higher) is also required.

There is no need to worry about your classpath: just leave the two files (edition.jar and axsvg-schema.rdf) in the same folder and type:

java -jar edition.jar

Or:

java -jar edition.jar file.svg

if you want an svg file to be opened directly.

Remark: Windows users can also double click on the edition.jar icon.
The Graphic User Interface
Here is a screenshot of the GUI.

As one may see on the screenshot, the interface is basically made of 4 areas and a File menu.
- The File menu
  Allows the user to open an SVG file to be edited or to save the currently processed SVG file.
- The text area
  Any output is directed to this area. It can be an error message, or more likely the result of a user command, such as 'check statement' or 'check whole desc'.
- The statement boxes
  An area made of three comboboxes, each of these representing one of the three elements of an RDF statement: The subject, the predicate, and the object.
  
  When an svg file is opened (with the file menu), the subject and object comboboxes are loaded with all the svg entities of the file, that is to say with all the existing id attributes in the document. The predicate combobox is loaded with the namespace axsvg properties, as read from the axsvg-schema.rdf file.
  
  The user can select the entities and a property in the boxes in order to form the RDF statement he wants to add to the processed file.
- The control buttons
  Of course, a user who has few notions of RDF should be able to add a description to his svg file with the editor. Thus, the properties to be selected in the statement are semantically rich, and can be tested with the 'check statement' button: a click on this button outputs in the text area the meaning, in English, of the selected statement.
  
  The 'validate statement' button actually adds the selected statement to the description of the processed file (i.e. it adds the corresponding node to this file's DOM), provided it does not exist already.
  
  On the other hand, if the selected statement already exists in the description, it can be removed with a click on the 'remove statement' button.
  
  Finally, the whole description being edited can be checked with the 'Check whole desc' button: click on it and the description is output in the text area exactly as the program axsvg.jar would print it.
- The attribute table
  Such a table allows the user to add attributes to the selected statement (or to change them).
  
  Remark for 'RDF-aware' people: strictly speaking, in the RDF philosophy, saying that a statement has attributes is nonsense. This abuse of language comes from the DOM specification itself (as implied by the XML one), where a node, which represents a statement when applied to RDF, may have attributes: such a node then no longer represents one statement, but as many statements as it has attributes (plus one if it has a text node child), each attribute being the 'object' of one of these statements.
  
  Therefore, I should have said 'such a table allows the user to add attributes to the DOM Node which represents the selected statement, among other statements induced by its attributes'.
  
  Warning: When 'check statement' is clicked, the actual attributes of the selected statement (which are blank if the statement does not exist already) are loaded into the attribute table before the corresponding meaning is output. If you want to test the effect of attributes you have added or changed, you have to 'validate statement' first, then 'check' it. Of course you can still remove it afterwards, or only reset the attributes you no longer want (do not forget to revalidate it then). It may sound strange, but it is the most convenient way I found to edit attributes.
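
The behaviour of the 'validate statement' and 'remove statement' buttons described above amounts to two small DOM operations, sketched here in Java. This is only an assumption about how such operations could be written: the class StatementEditor and its method names are hypothetical, attributes other than resource are left out, and the real editor code lives in the StatementCommandButtons class.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class StatementEditor {
    /** Creates an empty document whose root is an rdf:RDF element (helper for the demo). */
    static Element newRdfRoot() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("rdf:RDF");
            doc.appendChild(root);
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** 'validate statement': adds the statement unless it already exists. */
    static boolean validate(Element rdfRoot, String subjectId, String property, String objectId) {
        Element description = findChild(rdfRoot, "rdf:Description", "about", "#" + subjectId);
        if (description == null) {
            description = rdfRoot.getOwnerDocument().createElement("rdf:Description");
            description.setAttribute("about", "#" + subjectId);
            rdfRoot.appendChild(description);
        }
        if (findChild(description, property, "resource", "#" + objectId) != null) {
            return false; // already in the description, nothing to add
        }
        Element statement = rdfRoot.getOwnerDocument().createElement(property);
        statement.setAttribute("resource", "#" + objectId);
        description.appendChild(statement);
        return true;
    }

    /** 'remove statement': removes the statement if it exists. */
    static boolean remove(Element rdfRoot, String subjectId, String property, String objectId) {
        Element description = findChild(rdfRoot, "rdf:Description", "about", "#" + subjectId);
        if (description == null) {
            return false;
        }
        Element statement = findChild(description, property, "resource", "#" + objectId);
        if (statement == null) {
            return false;
        }
        description.removeChild(statement);
        return true;
    }

    /** Finds the first child element with the given name and attribute value, or null. */
    private static Element findChild(Element parent, String name, String attr, String value) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())
                    && value.equals(((Element) n).getAttribute(attr))) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element root = newRdfRoot();
        System.out.println(validate(root, "Roof", "axsvg:SitsOnTop", "House")); // added
        System.out.println(validate(root, "Roof", "axsvg:SitsOnTop", "House")); // duplicate
        System.out.println(remove(root, "Roof", "axsvg:SitsOnTop", "House"));   // removed
    }
}
```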
The source code
The 5 classes that actually constitute the editor are gathered in a package called 'edition', child of the 'axsvg' package. Incidentally, they use all the classes of the axsvg package. Here are the sources for these 5 classes:
- Editor
  The boot class of the editor. Sets up the main frame and adds components to it, such as the statement area, the command buttons area (which includes the attribute table), the text area, and the File menu. It also implements the event listeners attached to the File menu.
- Xstatement
  Implements methods that set up the comboboxes of the statement and fill (or re-fill) them. Has the boxes and the processed document as protected attributes, which can be accessed from other classes of the package, allowing them to reach some critical data about the DOM and the selected statement.
- StatementCommandButtons
  The methods which set up the control buttons and the attribute table (through calls to the AttributeJTable class methods) are implemented here, as well as those which manage the events fired by clicks on the buttons.
- AttributeJTable
  Extends the JTable class, and implements the methods that set up the table, as well as a few others used to copy attributes to, or from, a given Node of the DOM.
- TextAreaPrintStream
  Extends the PrintStream class. Basically, it overrides the println(String) and print(String) methods, so that when System.out is set to an instance of this class (with the System.setOut(PrintStream) method), all the output resulting from calls to those two methods is redirected to the given JTextArea. In our case it is useful to output the whole description in the editor's text area, since a click on the 'check whole desc' button calls the axsvg.Descriptor.printWholeDesc() static method, which uses System.out.print(String) and System.out.println(String) for all its output.
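
The redirection technique used by TextAreaPrintStream can be sketched as follows. To keep the example independent of Swing, the hypothetical class SinkPrintStream sends the text to any Consumer<String> rather than directly to a JTextArea; it is an illustration of the idea, not the source of the editor class.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;

class SinkPrintStream extends PrintStream {
    private final Consumer<String> sink;

    SinkPrintStream(Consumer<String> sink) {
        super(new ByteArrayOutputStream()); // the underlying stream is not used by the overrides
        this.sink = sink;
    }

    // Only the two String methods are overridden, as described for TextAreaPrintStream.
    @Override
    public void print(String s) {
        sink.accept(String.valueOf(s));
    }

    @Override
    public void println(String s) {
        sink.accept(s + "\n");
    }

    /** Runs a task with System.out redirected to a buffer and returns what was printed. */
    static String capture(Runnable task) {
        StringBuilder buffer = new StringBuilder();
        PrintStream previous = System.out;
        System.setOut(new SinkPrintStream(buffer::append));
        try {
            task.run();
        } finally {
            System.setOut(previous); // always restore the original stream
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = capture(() -> {
            System.out.print("Roof ");
            System.out.println("sits on top of House");
        });
        System.out.print(text);
    }
}
```

In a Swing program, the sink could be the append method of a JTextArea (for example new SinkPrintStream(textArea::append), where textArea is an assumed JTextArea variable), which gives the behaviour described for the editor's text area.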
Comments and future extensions
As one may have already noticed, the editor only deals with the RDF code embedded in the edited file, within a <metadata> markup as specified in the SVG spec (NB: it creates such a markup if it does not exist yet). Unfortunately, the structure of the svg code is critical, because it basically determines which RDF statements can be formed.

For example, let's say that we have an svg file representing a house. Thinking in terms of accessibility, we would like to state that the roof sits on top of the walls, using the axsvg property 'SitsOnTop'. If the svg code is not structured so that the roof is separated from the walls, with the appropriate id attributes (such as a <g id='roof'> markup enclosing the roof svg code, and <g id='house'> enclosing the house code), we will not be able to add such a simple description with our editor, unless we rework the svg code, manually or with another appropriate editor that would let us deal with the svg structure in a convenient way. Yet, at the time of writing, I have not heard of such an editor.

Of course, we could state that in a 'brave new world' all SVG files should be well structured and commented, but until then, a good extension of the RDF editor would be to add functions for reworking the svg structure directly on the rendered image. This would be done by adapting the code of svg renderers that efficiently implement event listeners attached to SVG nodes on the rendered image.

Until then, the editor tool presented here may be useful to:
- People who create their own svg files, provided they remember to give them a logical structure with appropriate id attributes and <desc> markups (recall that for the description output by axsvg.jar to make sense, all the <desc> markups of the SVG code must begin with a verb).
- People who are not afraid of reworking the svg code of a document in order to adapt it (they may not be afraid of RDF code either, but using the editor is obviously faster than typing the markup by hand).
- People who have read little (or even nothing) about RDF and just want a practical, hands-on approach to this XML application.
Finally, I will point out that the RDF Schema (axsvg-schema.rdf), from which the properties and their meanings are fetched by the descriptor/translator as well as by the editor, is an external file (i.e. external to the runnable jar file). It can thus be updated easily, with new properties added or existing ones modified.

Guillaume Lovet

Copyright © 2000 W3C (MIT, INRIA, Keio ), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.