Class XMLParser

java.lang.Object
   |
   +----XMLTokenizer
           |
           +----XMLParser

public class XMLParser
extends XMLTokenizer

Parse an XML file and construct a tree. At certain points, callback functions are called. An application needs to implement the XMLParserUser interface, which declares the functions the XMLParser object will call. At the moment, the only callbacks are for errors.

Grammar differences

The grammar used in this parser is different from the one in the first XML draft. This is the grammar (lowercase = nonterminal, uppercase = terminal):

 document: prolog element misc*
 prolog: encodingdecl? misc* [doctypedecl misc*]? [dtdsummary misc*]?
 misc: COMMENT | PI
 doctypedecl: DOCTYPE NAME extid? GT
 attribute: NAME EQ LITERAL
 etag: ETAGO NAME GT
 content: [element | PCDATA | ms | PI | COMMENT]*
 element: LT NAME attribute* [GT content etag | EMPTY]
 dtdsummary: [idinfo | defaultinfo]+
 encodingdecl: ENCODING EQ qencoding ENDPI
 extid: LITERAL
 ms: MSSTART MSDATA MSEND
 qencoding: LITERAL
 idinfo: IDINFO NAME EQ quotedpairs NAME EQ quotedpairs ENDPI
 quotedpairs: LITERAL
 defaultinfo: DEFAULT NAME [NAME EQ LITERAL]* ENDPI
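
As an illustration of how productions like these map onto a hand-written parser, the sketch below turns the element, attribute, content and etag rules into recursive-descent methods over an already tokenized input. It is a minimal sketch, not the XMLParser source: the names ElementGrammarSketch, Kind, Token and Node are hypothetical, EMPTY is assumed to stand for the empty-element close, the ms, PI and COMMENT alternatives of content are left out, and the check that the end tag name matches the start tag is an addition that the grammar itself does not express.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical and are not
// part of the XMLParser API.
class ElementGrammarSketch {
    // Terminals taken from the grammar (EMPTY is assumed to mean "/>").
    enum Kind { LT, NAME, EQ, LITERAL, GT, EMPTY, ETAGO, PCDATA, EOF }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static final class Node {
        final String name;
        final List<String> attributes = new ArrayList<>();
        final List<Object> children = new ArrayList<>(); // Node or String (PCDATA)
        Node(String name) { this.name = name; }
    }

    private final List<Token> tokens;
    private int pos = 0;

    ElementGrammarSketch(List<Token> tokens) { this.tokens = tokens; }

    private Kind peek() {
        return pos < tokens.size() ? tokens.get(pos).kind : Kind.EOF;
    }

    private Token expect(Kind kind) {
        if (peek() != kind) {
            throw new IllegalStateException("expected " + kind + " but found " + peek());
        }
        return tokens.get(pos++);
    }

    // element: LT NAME attribute* [GT content etag | EMPTY]
    Node element() {
        expect(Kind.LT);
        Node node = new Node(expect(Kind.NAME).text);
        while (peek() == Kind.NAME) {
            node.attributes.add(attribute());
        }
        if (peek() == Kind.EMPTY) {
            pos++;
        } else {
            expect(Kind.GT);
            content(node);
            etag(node.name);
        }
        return node;
    }

    // attribute: NAME EQ LITERAL
    private String attribute() {
        String name = expect(Kind.NAME).text;
        expect(Kind.EQ);
        return name + "=" + expect(Kind.LITERAL).text;
    }

    // content: [element | PCDATA]*   (ms, PI and COMMENT omitted in this sketch)
    private void content(Node parent) {
        while (peek() == Kind.LT || peek() == Kind.PCDATA) {
            if (peek() == Kind.LT) {
                parent.children.add(element());
            } else {
                parent.children.add(expect(Kind.PCDATA).text);
            }
        }
    }

    // etag: ETAGO NAME GT   (the name comparison is an extra check, not in the grammar)
    private void etag(String openName) {
        expect(Kind.ETAGO);
        String name = expect(Kind.NAME).text;
        if (!name.equals(openName)) {
            throw new IllegalStateException("end tag " + name + " does not match " + openName);
        }
        expect(Kind.GT);
    }

    // Tokens for the input: <a x="1"><b/>hi</a>
    static List<Token> sampleTokens() {
        return Arrays.asList(
            new Token(Kind.LT, "<"), new Token(Kind.NAME, "a"),
            new Token(Kind.NAME, "x"), new Token(Kind.EQ, "="), new Token(Kind.LITERAL, "1"),
            new Token(Kind.GT, ">"),
            new Token(Kind.LT, "<"), new Token(Kind.NAME, "b"), new Token(Kind.EMPTY, "/>"),
            new Token(Kind.PCDATA, "hi"),
            new Token(Kind.ETAGO, "</"), new Token(Kind.NAME, "a"), new Token(Kind.GT, ">"));
    }

    public static void main(String[] args) {
        Node root = new ElementGrammarSketch(sampleTokens()).element();
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}
```

Each nonterminal becomes one method and each terminal one call to expect(), so the code can be checked against the grammar line by line.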

Some of the differences are:

White space is handled at the lexical level (see XMLTokenizer)
No internal DTD subset
No `required markup declaration'; parsers should never need the DTD for parsing, only for generating.
The DTD summary is reduced to ID info and default info (ID info is not used by this parser, but it would be easy to add).

Entities other than character entities are not permitted. Character entities are handled invisibly by the tokenizer and are not reported to the parser.

Character entities are permitted in element content.
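
To make "handled invisibly by the tokenizer" concrete, the sketch below replaces character references with the characters they denote before the text would reach a parser. It is not the XMLTokenizer implementation. The class name CharRefSketch and the method decode are hypothetical, and the reference syntax is an assumption: the decimal form &#NNN; and the hexadecimal form &#xHHHH; as defined in XML 1.0, which may differ from the draft this parser follows.

```java
// Illustrative sketch only: CharRefSketch and decode() are hypothetical names.
// Assumes XML 1.0 style references: &#NNN; (decimal) and &#xHHHH; (hex).
class CharRefSketch {
    static String decode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Look for "&#" at the current position and its closing ';'.
            int end = text.startsWith("&#", i) ? text.indexOf(';', i + 2) : -1;
            if (end < 0) {
                // Not a (complete) character reference: copy the character as-is.
                out.append(text.charAt(i++));
                continue;
            }
            String body = text.substring(i + 2, end);
            boolean hex = body.startsWith("x");
            // A malformed number throws NumberFormatException; a real tokenizer
            // would report an error instead.
            int codePoint = Integer.parseInt(hex ? body.substring(1) : body, hex ? 16 : 10);
            out.appendCodePoint(codePoint);
            i = end + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a &#60; b &#x26; c"));
    }
}
```

Because the substitution happens before tokens are handed over, the parser only ever sees ordinary character data.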

Validation

The parser doesn't validate.

Also, error messages are not the most helpful. This is a hand-generated parser, so it was easier to only use insert() and not delete(). Some tool should be used to generate the director sets for resynchronizing after a syntax error.
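
One reading of the insert-only approach is sketched below: when an expected token is missing, the parser reports it and carries on as though the token had been present, and it never skips input. This is an assumption about what insert() does, since its signature is not shown here. The class InsertRecoverySketch and its methods are hypothetical, and tokens are plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: hypothetical names, not the XMLParser source.
class InsertRecoverySketch {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    InsertRecoverySketch(List<String> tokens) { this.tokens = tokens; }

    // Hypothetical insert(): record the error and behave as if the missing
    // token had been read. No input is consumed.
    private void insert(String kind) {
        errors.add("missing " + kind + " before token " + pos);
    }

    private void expect(String kind) {
        if (pos < tokens.size() && tokens.get(pos).equals(kind)) {
            pos++;
        } else {
            insert(kind);
        }
    }

    // etag: ETAGO NAME GT
    void etag() {
        expect("ETAGO");
        expect("NAME");
        expect("GT");
    }

    int position() { return pos; }

    public static void main(String[] args) {
        InsertRecoverySketch p = new InsertRecoverySketch(Arrays.asList("ETAGO", "NAME"));
        p.etag();
        System.out.println(p.errors);
    }
}
```

Since nothing is ever deleted, an unexpected extra token stays in the input and can trigger further errors, which is the kind of case that resynchronizing on computed sets would address.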

See Also: XMLTokenizer

XMLParser(InputStream, XMLParserUser, int[], XMLNode[]): Construct a new XMLParser object, giving a byte stream to read from and an object that implements XMLParserUser.
XMLParser(InputStream, XMLParserUser, String, int[], XMLNode[]): Construct a new XMLParser object, giving a byte stream to read from and an object that implements XMLParserUser. Also set the default encoding of the input stream.

XMLParser

 public XMLParser(InputStream aStream,
                  XMLParserUser aUser,
                  int nrerrors[],
                  XMLNode tree[]) throws IOException, UnknownEncoding

Construct a new XMLParser object, giving a byte stream to read from and an object that implements XMLParserUser.

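Parameters: aStream - a byte stream; aUser - an object that implements the callbacks; nrerrors - an output parameter for the number of errors; tree - an output parameter for the XML tree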
Throws: UnknownEncoding: if the encoding isn't either UTF8 or ISO8859-1

XMLParser

 public XMLParser(InputStream aStream,
                  XMLParserUser aUser,
                  String encoding,
                  int nrerrors[],
                  XMLNode tree[]) throws IOException, UnknownEncoding

Construct a new XMLParser object, giving a byte stream to read from and an object that implements XMLParserUser. Also set the default encoding of the input stream.

Parameters: aStream - a byte stream; aUser - an object that implements the callbacks; encoding - a string such as "UTF8", "ISO8859-1", etc.; nrerrors - an output parameter for the number of errors; tree - an output parameter for the XML tree
Throws: UnknownEncoding: if the encoding isn't either UTF8 or ISO8859-1
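
The nrerrors and tree arguments are arrays because Java passes arguments by value, so a caller-allocated array is a common way for a constructor to hand results back. The sketch below shows that idiom with stand-in types only; it does not use the real XMLParser. StandInParser, Node and OutputParamSketch are hypothetical, the use of element 0 of each array is an assumption, and the "empty input is an error" rule exists only to give the example something to report.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: stand-in types, not the XMLParser API.
class OutputParamSketch {
    // Hypothetical stand-in for XMLNode.
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a constructor with output parameters.
    static final class StandInParser {
        StandInParser(InputStream in, int[] nrerrors, Node[] tree) throws IOException {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            // Assumption: results are written to element 0 of each array.
            nrerrors[0] = (count == 0) ? 1 : 0;
            tree[0] = (count == 0) ? null : new Node("root");
        }
    }

    static String describe(String text) {
        // The caller allocates one-element arrays to receive the results.
        int[] nrerrors = new int[1];
        Node[] tree = new Node[1];
        try {
            InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
            new StandInParser(in, nrerrors, tree);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String root = (tree[0] == null) ? "null" : tree[0].name;
        return "errors=" + nrerrors[0] + " tree=" + root;
    }

    public static void main(String[] args) {
        System.out.println(describe("<a/>"));
    }
}
```

With the real class, the caller would presumably allocate the two arrays in the same way, pass them to the constructor, and read the results once it returns.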