This page collects some thoughts on XML and links to some software. It dates from 1997 and is not currently maintained.
This variant of XML is based on the XML draft, and documents written in it look very similar to those written in the language of the draft. But there are a few important differences. The goals are similar to those of XML, but I want to stress the following:
I'm thinking of adding another goal: it must have an associated machine-readable format for expressing restrictions on the format. This set of restrictions (similar to the `DTD' of SGML) allows generic tools to be written that can check the suitability of an XML file for a particular application. Maybe this format should itself be an application of XML.
Some examples of XML files are available on a separate page. The program packages below also include a few test files. The data model of XML is described in `the XML data model.' There are also some thoughts on transporting the contents of databases with XML.
Here are some examples of programs that process (simple) XML. All Java software is in xmllink.zip. The documentation is made with javadoc. The software is in three packages: parser, tree and xptr. Included are a few test programs:
The zip file contains both the source and the class files (compiled with JDK 1.1; you'll need to recompile for JDK 1.0). If you have a CLASSPATH variable, the zip file can be added to it directly. For example, under Unix with the Bourne shell:
    CLASSPATH=$CLASSPATH:xmltest.zip
    java xmltest <some-XML-file>
    java xmlpipe <some-XML-file>
(If you don't have a CLASSPATH variable or the above doesn't work, you might try unzipping the file, or ask a local guru.)
A Bison/Lex parser in C is also available. See the separate description. It shows an XML parser (core syntax only, no linking, no validation) in just 13 productions and 12 tokens.
xmlbyhand (with documentation) is a (non-validating) XML parser written in Java. It stores the parse tree in memory. The current main program just dumps the parse tree again, in XML format. (The program can read its own output.) The program may be useful as a `normalizer', but the intention is really to provide some Java code that can be used in other programs. [This program is `old', but still useful if you want to see a parser that is not machine-generated.]
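To make the idea of an in-memory parse tree that can be dumped again as XML concrete, here is a minimal sketch in Java. It is not the code of xmlbyhand: the class TreeNode and its methods elem, chars, add and toXml are hypothetical names chosen for illustration, attributes are left out, and the `/>' form for empty elements is an assumption about the output syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parse tree node; NOT the class used by xmlbyhand.
class TreeNode {
    final String name;   // element name, or null for a text node
    final String text;   // character data, or null for an element
    final List<TreeNode> children = new ArrayList<>();

    private TreeNode(String name, String text) {
        this.name = name;
        this.text = text;
    }

    static TreeNode elem(String name) { return new TreeNode(name, null); }
    static TreeNode chars(String text) { return new TreeNode(null, text); }

    // Append a child and return this node, so calls can be chained.
    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }

    // Serialize the subtree back to XML, escaping markup characters in text.
    String toXml() {
        if (name == null) {
            return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
        if (children.isEmpty()) {
            return "<" + name + "/>";   // assumed empty-element syntax
        }
        StringBuilder sb = new StringBuilder("<" + name + ">");
        for (TreeNode c : children) {
            sb.append(c.toXml());
        }
        return sb.append("</").append(name).append(">").toString();
    }

    public static void main(String[] args) {
        TreeNode doc = elem("doc")
            .add(elem("title").add(chars("A & B")))
            .add(elem("br"));
        System.out.println(doc.toXml());
    }
}
```

Because the dump escapes `&', `<' and `>' in character data, a parser that resolves those three references could read the output back in, which is the property the paragraph above describes.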
unix2coll is a small AWK script that takes a Unix-style database (one record per line, fields separated by a separator character) and outputs a "Web-collection". Web-collections will probably use XML syntax, but the precise form is not yet decided. This is just one of the possibilities, and probably not the best.
coll2unix is an AWK script that does the opposite. It is meant to be used in a pipe after xmlpipe, and it converts a Web-collection back into a table. Its arguments are the table to extract (called `profile') and the field names to put into that table. An example shows how xmlpipe, unix2coll and coll2unix work together.
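The first direction (table to XML) can be sketched as follows. This is a Java illustration of the idea only, not a translation of the AWK script: the class TableToXml and the element names collection, record and field are invented here, since the actual form of a Web-collection is not fixed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical Java illustration of the unix2coll idea. The element names
// (collection, record, field) are assumptions, not the Web-collection format.
class TableToXml {
    // Escape the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // One record per line; fields are split on a single separator character.
    static String convert(List<String> lines, char sep, String[] fieldNames) {
        String sepRegex = Pattern.quote(String.valueOf(sep));
        StringBuilder sb = new StringBuilder("<collection>\n");
        for (String line : lines) {
            String[] fields = line.split(sepRegex, -1);   // -1 keeps empty trailing fields
            sb.append("  <record>\n");
            // Fields beyond the supplied names are silently dropped in this sketch.
            for (int i = 0; i < fields.length && i < fieldNames.length; i++) {
                sb.append("    <field name=\"").append(escape(fieldNames[i])).append("\">")
                  .append(escape(fields[i]))
                  .append("</field>\n");
            }
            sb.append("  </record>\n");
        }
        return sb.append("</collection>\n").toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("root:0:/root", "guest:405:/home/guest");
        System.out.print(convert(lines, ':', new String[] {"login", "uid", "home"}));
    }
}
```

The reverse direction would walk the record elements and print the selected fields joined by the separator, one record per line.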
The XML parsers above are very simple. They don't validate the input, and they don't try to resolve a reference to a DTD. They rely on the well-formedness of the input.
This is a variant of the Java-based parser above which may be more suitable for certain kinds of XML data. It accepts the subset of XML 1.0 defined below, and interprets certain constructs before passing the data on. The sources are in a zip file.
This is the grammar (compare the file Parser.ll1 in the zip file):
    document       : [ NEWLINE | misc ]*
                     [ doctypedecl [ NEWLINE | misc ]* ]?
                     [ element [ NEWLINE | misc ]* ]+ ;

    misc           : COMMENT | PI | xmlinstruction ;

    xmlinstruction : XML [ NAME [ %if (key.equals("version")) EQ LITERAL
                                | %if (key.equals("encoding")) EQ qencoding
                                | %if (key.equals("default")) defaultinfo ] ]* ENDPI
                   | NAMESPACE attribute* ENDPI ;

    doctypedecl    : DOCTYPE NAME extid GT ;

    attribute      : NAME [ EQ LITERAL ]? ;

    etag           : [ ETAGO NAME? GT | ETAG ] ;

    content        : [ element | PCDATA | NEWLINE | ms | misc ]* ;

    element        : LT NAME attribute* [ GT content etag | EMPTY ] ;

    extid          : LITERAL ;

    ms             : MSSTART MSDATA MSEND ;

    qencoding      : LITERAL ;

    quotedpairs    : LITERAL ;

    defaultinfo    : NAME [ NAME EQ LITERAL ]* ;
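As an illustration of how the productions attribute, etag, content and element map onto recursive-descent code, here is a small hand-written recognizer in Java. It is a sketch under several assumptions and is not the code generated from Parser.ll1: the class ElementRecognizer and the enum Tok are hypothetical, the input is taken to be already split into tokens by a lexer, misc is reduced to COMMENT and PI, and the ms, xmlinstruction, doctypedecl and document productions are left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer for a subset of the grammar; token kinds only,
// no lexer and no tree building.
class ElementRecognizer {
    enum Tok { LT, NAME, EQ, LITERAL, GT, EMPTY, ETAGO, ETAG, PCDATA, NEWLINE, COMMENT, PI }

    private final List<Tok> toks;
    private int pos = 0;

    ElementRecognizer(List<Tok> toks) { this.toks = toks; }

    private boolean at(Tok t) { return pos < toks.size() && toks.get(pos) == t; }

    private void expect(Tok t) {
        if (!at(t)) throw new IllegalStateException("expected " + t + " at token " + pos);
        pos++;
    }

    // attribute : NAME [ EQ LITERAL ]? ;
    private void attribute() {
        expect(Tok.NAME);
        if (at(Tok.EQ)) { pos++; expect(Tok.LITERAL); }
    }

    // etag : [ ETAGO NAME? GT | ETAG ] ;
    private void etag() {
        if (at(Tok.ETAG)) { pos++; return; }
        expect(Tok.ETAGO);
        if (at(Tok.NAME)) pos++;
        expect(Tok.GT);
    }

    // content : [ element | PCDATA | NEWLINE | misc ]* ;   (ms omitted here)
    private void content() {
        while (true) {
            if (at(Tok.LT)) element();
            else if (at(Tok.PCDATA) || at(Tok.NEWLINE) || at(Tok.COMMENT) || at(Tok.PI)) pos++;
            else return;
        }
    }

    // element : LT NAME attribute* [ GT content etag | EMPTY ] ;
    private void element() {
        expect(Tok.LT);
        expect(Tok.NAME);
        while (at(Tok.NAME)) attribute();
        if (at(Tok.EMPTY)) { pos++; return; }
        expect(Tok.GT);
        content();
        etag();
    }

    // True if the whole token list forms exactly one element.
    static boolean isElement(List<Tok> toks) {
        ElementRecognizer r = new ElementRecognizer(toks);
        try {
            r.element();
            return r.pos == toks.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // LT NAME attribute GT PCDATA ETAGO NAME GT: a start tag with one
        // attribute, some character data, and an end tag.
        List<Tok> ok = Arrays.asList(Tok.LT, Tok.NAME, Tok.NAME, Tok.EQ, Tok.LITERAL,
                                     Tok.GT, Tok.PCDATA, Tok.ETAGO, Tok.NAME, Tok.GT);
        System.out.println(isElement(ok));
    }
}
```

Each production becomes one method, and one token of lookahead is enough to choose between alternatives. Note that the grammar lets an attribute appear without a value and lets an end tag omit its name, and the sketch accepts both.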