Note that this section is not intended to be a complete
translation of XML Schema into DCTG, but a sample small
enough to follow and large enough to make a persuasive
case that all of XML Schema can be translated.
3.1. Introduction to DCTG notation
DCTG notation was invented to make it easier to calculate
values for grammatical attributes and to refer to the values
of grammatical attributes on other rules.
Here is a simple example. Consider the following
grammar for binary strings:
bit ::= '0'.
bit ::= '1'.
bitstring ::= '' /* nothing */.
bitstring ::= bit, bitstring.
number ::= bitstring, fraction.
fraction ::= '.', bitstring.
fraction ::= ''.
We might wish to calculate the length and the (unsigned base-two)
value of the bitstring as attributes. Using a yacc-like notation, that
might look like this. Notice that scale is a top-down (inherited)
attribute, while bitvalue, length, value, and fractionalvalue
are bottom-up (synthesized) attributes.
bit ::= '0' { $0.bitvalue = 0; }.
bit ::= '1' { $0.bitvalue = power(2,$0.scale); }.
bitstring ::= '' {
$0.value = 0;
$0.length = 0;
/* scale doesn't matter here */
}.
bitstring ::= bit, bitstring {
$0.length = $2.length + 1;
$1.scale = $0.scale;
$2.scale = $0.scale - 1;
$0.value = $1.bitvalue + $2.value;
}.
number ::= bitstring, fraction {
$1.scale = $1.length - 1;
$0.value = $1.value + $2.fractionalvalue;
}.
fraction ::= '.', bitstring {
$2.scale = -1;
$0.fractionalvalue = $2.value;
}.
fraction ::= '' {
$0.fractionalvalue = 0;
}.
In DCTG notation, this grammar looks like this:[9]
bit ::= [0]
<:> bitval(0,_).
bit ::= [1]
<:> bitval(V,Scale) ::- V is **(2,Scale).
bitstring ::= []
<:> length(0)
&& value(0,_).
bitstring ::= bit^^B, bitstring^^B1
<:> length(Length) ::-
B1 ^^ length(Length1),
Length is Length1 + 1
&& value(Value,ScaleB) ::-
B ^^ bitval(VB,ScaleB),
S1 is ScaleB - 1,
B1 ^^ value(V1,S1),
Value is VB + V1.
number ::= bitstring ^^ B, fraction ^^ F
<:> value(V) ::-
B ^^ length(Length),
S is Length-1,
B ^^ value(VB,S),
F ^^ fractional_value(VF),
V is VB + VF.
fraction ::= ['.'], bitstring ^^ B
<:> fractional_value(V) ::-
S is -1,
B ^^ value(V,S).
fraction ::= []
<:> fractional_value(0).
As may be seen, each rule in DCTG consists of a left-hand side,
a right-hand side, and optionally a list of attributes.
The right-hand side is separated from the left-hand side
by the operator ::= and from the following
list of attributes (if any) by the
operator <:>.
The expression on the right-hand side is a sequence of
terminals (written as Prolog lists, e.g. [0]) and
non-terminal symbols, each non-terminal optionally suffixed with the
operator ^^ and a variable name
(e.g. bitstring^^B).
Each attribute is identified by a Prolog structure,
e.g. value(V),
whose functor is the name of the attribute and whose arguments
are its values. The attribute structure may be followed by the operator
::- (designed to look a lot like the standard Prolog
operator :- or ‘neck’) and a series
of goals which will be satisfied when the attribute value is to
be instantiated. In practice, these goals help to calculate the
attribute values. Values of attributes attached to the
items on the right-hand side may be referred to by the variable
name associated with the item and the name of the attribute,
joined by the operator ^^,
as for example in B ^^ bitval(VB,ScaleB).
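To make the on-demand evaluation concrete, here is a minimal sketch, in plain SWI-Prolog, of what the binary-number grammar might look like after translation. It assumes a translation in which each non-terminal receives a node(Name, Children, Attributes) argument plus two difference-list arguments. The operator priorities, the lookup predicate ^^/2, the helper attr_value/2, and the name binary_number/3 (chosen to avoid confusion with the built-in number/1) are our own assumptions, not the code of any DCTG library.

```prolog
% Assumed operator priorities for this sketch only.
:- op(700, xfx, ^^).
:- op(1100, xfx, ::-).

% Node ^^ Attr: Attr unifies with an attribute stored in Node. A stored
% attribute of the form (Head ::- Goals) is evaluated on demand by
% unifying Head with Attr and then calling Goals.
node(_Name, _Children, Atts) ^^ Attr :-
    member(Def, Atts),
    attr_value(Def, Attr).

attr_value((Head ::- Goals), Attr) :- !, Head = Attr, call(Goals).
attr_value(Attr, Attr).

% bit ::= [0] <:> bitval(0,_).
bit(node(bit, [[0]], [bitval(0, _)]), [0|S], S).
% bit ::= [1] <:> bitval(V,Scale) ::- ...  (2.0 forces a float result)
bit(node(bit, [[1]], [(bitval(V, Scale) ::- V is 2.0 ** Scale)]), [1|S], S).

bitstring(node(bitstring, [], [length(0), value(0, _)]), S, S).
bitstring(node(bitstring, [B, B1],
               [ (length(Len) ::- B1 ^^ length(Len1), Len is Len1 + 1),
                 (value(Value, ScaleB) ::-
                      B ^^ bitval(VB, ScaleB),
                      S1 is ScaleB - 1,
                      B1 ^^ value(V1, S1),
                      Value is VB + V1) ]),
          L0, L) :-
    bit(B, L0, L1),
    bitstring(B1, L1, L).

% number ::= bitstring^^B, fraction^^F <:> value(V) ::- ...
binary_number(node(number, [B, F],
                   [ (value(V) ::-
                          B ^^ length(Len),
                          S is Len - 1,
                          B ^^ value(VB, S),
                          F ^^ fractional_value(VF),
                          V is VB + VF) ]),
              L0, L) :-
    bitstring(B, L0, L1),
    fraction(F, L1, L).

fraction(node(fraction, [['.'], B],
              [(fractional_value(V) ::- B ^^ value(V, -1))]),
         ['.'|L0], L) :-
    bitstring(B, L0, L).
fraction(node(fraction, [], [fractional_value(0)]), L, L).
```

With this loaded, binary_number(N, [1,0,1,'.',1], []), N ^^ value(V) yields V = 5.5. The parse only builds the tree; the goals after ::- run when ^^ asks for an attribute, which is why the number rule can first obtain the length of its bitstring and only then pass the scale down. Because the sketch does not copy stored goals, evaluating an attribute binds variables inside the tree itself.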
A partial EBNF grammar for DCTG notation[10] is:
grammar ::= rule*
rule ::= lhs '::=' rhs ('<:>' att-spec ('&&' att-spec)*)?
lhs ::= term
rhs ::= term (',' term)*
att-spec ::= compound-term ('::-' goal (',' goal)*)?
compound-term ::= ATOM '(' term (',' term)* ')'
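Because DCTG rules are read as ordinary Prolog terms, the EBNF above corresponds to a nesting of operators. The following sketch splits a rule term into its left-hand side, its right-hand-side items, and its attribute specifications. The operator priorities are assumptions chosen so that ::= binds most loosely, then <:>, &&, ::-, and finally ','; an actual DCTG implementation may declare different values. The predicates dctg_rule_parts/4, conj_to_list/2 and amp_to_list/2 are hypothetical.

```prolog
% Assumed priorities: ::= (1200) > <:> (1175) > && (1150) > ::- (1100) > ',' (1000)
:- op(1200, xfx, ::=).
:- op(1175, xfx, <:>).
:- op(1150, xfy, &&).
:- op(1100, xfx, ::-).
:- op(700, xfx, ^^).

% dctg_rule_parts(+Rule, -Lhs, -RhsItems, -AttSpecs)
dctg_rule_parts((Lhs ::= Body), Lhs, Items, Specs) :-
    (   Body = (Rhs <:> Atts)
    ->  conj_to_list(Rhs, Items),       % rule has an attribute list
        amp_to_list(Atts, Specs)
    ;   conj_to_list(Body, Items),      % plain context-free rule
        Specs = []
    ).

% rhs ::= term (',' term)*
conj_to_list((A, B), [A|Rest]) :- !, conj_to_list(B, Rest).
conj_to_list(A, [A]).

% att-spec ('&&' att-spec)*
amp_to_list((A && B), [A|Rest]) :- !, amp_to_list(B, Rest).
amp_to_list(A, [A]).
```

For example, the bitstring rule with an empty right-hand side, (bitstring ::= [] <:> length(0) && value(0,_)), splits into the single item [] and the two attribute specifications length(0) and value(0,_).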
3.2. Another example: the English grammar fragment E1
Here is another simple example: the trivial fragment of English
grammar given above, translated directly into DCTG notation, looks
like the following. The only difference from the DCG form is that the
separator between the left- and right-hand side of each production is
“::=” instead of “-->”.
%%% E1 (trivial context-free grammar for a fragment of English)
%%% in DCTG notation.
s ::= np, vp.
np ::= det, n.
np ::= n.
vp ::= v, np.
vp ::= v.
n ::= [mary].
n ::= [john].
n ::= [woman].
n ::= [man].
n ::= [apple].
det ::= [the].
v ::= [loves].
v ::= [eats].
v ::= [sings].
The translation into standard Prolog is similar to that
used for DCGs, but instead of adding two arguments to each
non-terminal, the DCTG translation adds three. The first additional
argument is a Prolog structure with the functor node,
described further below. The second and third are (like the
two arguments added in DCG translations) difference lists.
The node structure has three arguments:
- the non-terminal of the grammar rule (here s,
np, vp, etc.)
- a list of the node structures associated with the items
on the right-hand side of the grammar rule
- a list of grammatical attributes (in this grammar, this list will
be empty)
The translation of our trivial grammar into standard Prolog is
thus:
?- dctg_reconsult('ks81dctg.pl').
Yes
?- listing([s,np,vp,n,det,v]).
:- dynamic s/3.
s(node(s, [A, B], []), C, D) :-
np(A, C, E),
vp(B, E, D).
:- dynamic np/3.
np(node(np, [A, B], []), C, D) :-
det(A, C, E),
n(B, E, D).
np(node(np, [A], []), B, C) :-
n(A, B, C).
:- dynamic vp/3.
vp(node(vp, [A, B], []), C, D) :-
v(A, C, E),
np(B, E, D).
vp(node(vp, [A], []), B, C) :-
v(A, B, C).
:- dynamic n/3.
n(node(n, [[mary]], []), A, B) :-
c(A, mary, B).
n(node(n, [[john]], []), A, B) :-
c(A, john, B).
n(node(n, [[woman]], []), A, B) :-
c(A, woman, B).
n(node(n, [[man]], []), A, B) :-
c(A, man, B).
n(node(n, [[apple]], []), A, B) :-
c(A, apple, B).
:- dynamic det/3.
det(node(det, [[the]], []), A, B) :-
c(A, the, B).
:- dynamic v/3.
v(node(v, [[loves]], []), A, B) :-
c(A, loves, B).
v(node(v, [[eats]], []), A, B) :-
c(A, eats, B).
v(node(v, [[sings]], []), A, B) :-
c(A, sings, B).
Yes
?-
The predicate dctg_reconsult(File) is used to
translate a DCTG grammar into Prolog clauses and load them;
it is provided by [Abramson/Dahl/Paine 1990] and
is available from a variety of sources on the net.[11]
A short terminal session should make the nature of the results
a bit clearer:[12]
?- s(S,[john,loves,mary],[]), write(S).
node(s,
[node(np,
[node(n, [[john]], [])],
[]),
node(vp,
[node(v, [[loves]], []),
node(np,
[node(n, [[mary]], [])],
[])],
[])],
[])
S = node(s, [node(np, [node(n, [[john]], [])], []),
node(vp, [node(v, [[loves]], []), node(np, [node(n,
[...], [])], [])], [])], [])
Yes
?- s(S,[the,woman,eats,the,apple],[]), write(S).
node(s,
[node(np,
[node(det, [[the]], []),
node(n, [[woman]], [])],
[]),
node(vp,
[node(v, [[eats]], []),
node(np,
[node(det, [[the]], []),
node(n, [[apple]], [])],
[])],
[])],
[])
S = node(s, [node(np, [node(det, [[the]], []), node(n,
[[woman]], [])], []), node(vp, [node(v, [[eats]], []),
node(np, [node(det, [...], []), node(..., ..., ...)],
[])], [])], [])
Yes
?- s(S,[the,man,sings],[]), write(S).
node(s,
[node(np,
[node(det, [[the]], []),
node(n, [[man]], [])],
[]),
node(vp,
[node(v, [[sings]], [])],
[])],
[])
S = node(s, [node(np, [node(det, [[the]], []),
node(n, [[man]], [])], []), node(vp, [node(v,
[[sings]], [])], [])], [])
Yes
?-
The node structure constructed in the first added argument
resembles and serves much the same purpose as the structure attribute
used in E2, the attribute-grammar version of the English grammar
fragment.
3.3. English grammar with attributes (E2) in DCTG notation
The fragment of English grammar E2, which was presented earlier
to illustrate the use of DCGs for attribute grammars, may also be
used to illustrate DCTGs.
Let's walk through the grammar.
A sentence is an NP followed by a VP; we will call the NP
S (‘subject’), and the VP we will call P (‘predicate’).
The goals enclosed in braces (S^^number(Num), P^^number(Num))
together express the constraint that the NP and VP must agree in number:
the number attribute of the NP S and the number attribute of the VP P
must unify with each other.
The non-terminal s has only one grammatical attribute;
let us call it structure.
When s is made up (as here) of an NP and a VP,
we represent its structure by a Prolog term with
s as its functor and the structures of the
NP and VP as its two arguments:
A noun phrase (NP) is made up of a determiner and a noun;
they must agree in number. This covers phrases like
“the apple”, “the apples”, “one apple”, “some apples”.
The agreement rule excludes the phrases
“one apples” and “some apple”.[13]
The non-terminal np has two attributes, named structure and number.
The structure of the NP is a Prolog term with the name
of the non-terminal (np) as its functor and
the constituents as the arguments. This is the pattern
of the structure attribute on all non-terminals,
and I won't comment on it again.
The number attribute of the NP illustrates an
important idiom: the guard in the syntactic part of the rule
has already checked the number attributes of
the determiner and the noun, to make sure they unify with
each other; the variable Num has the value sg or pl
already, and we don't need to do any more computation.
We just say that whatever Num is, that's the grammatical
number of the NP.
Noun phrases can also take the form of a plural noun by
itself, as in “Men love apples”.
The final form of NP recognized by this grammar is
a singular proper noun by itself, as in “John loves
Mary”.
A verb phrase (VP) can include a direct object in the form
of a noun phrase:
or it can be just a verb with no direct object.[14]
Although both the verb and the direct object have a
number attribute, only that of the verb
counts in determining the value of the number
attribute for the VP as a whole.
Nouns, proper nouns, verbs, and determiners
(the ‘pre-terminal’ categories of the
grammar) all have rules of the same structure: a
token in the string counts as one of these if the lexicon
says it's one, and the
number attribute
has whatever value the lexicon gives.
< 29 Pre-terminal rules [continues 19 English grammar fragment with attributes] > ≡
n ::= [L], { lex(L,n,Num) }
<:> structure(n(L))
&& number(Num).
pn ::= [L], { lex(L,pn,Num) }
<:> structure(pn(L))
&& number(Num).
v ::= [L], { lex(L,v,Num) }
<:> structure(v(L))
&& number(Num).
det ::= [L], { lex(L,det,Num) }
<:> structure(det(L))
&& number(Num).
Finally, the lexicon. To keep things simple, the lexicon here
is just a set of facts, with literal values.[15] The entry for
the word the is the exception: instead of a literal value it has
the anonymous variable _, indicating that the can be
either sg or pl.
< 30 Lexicon [continues 19 English grammar fragment with attributes] > ≡
lex(mary,pn,sg).
lex(john,pn,sg).
lex(woman,n,sg).
lex(women,n,pl).
lex(man,n,sg).
lex(men,n,pl).
lex(apple,n,sg).
lex(apples,n,pl).
lex(the,det,_).
lex(some,det,pl).
lex(one,det,sg).
lex(loves,v,sg).
lex(love,v,pl).
lex(eats,v,sg).
lex(eat,v,pl).
lex(sings,v,sg).
lex(sing,v,pl).
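As a check on the agreement idiom described above, here is a minimal hand translation, in plain SWI-Prolog, of the n and det pre-terminal rules and of a determiner-plus-noun NP rule, using a subset of the lexicon. The NP rule is reconstructed from the prose description, so its exact form (in particular its structure attribute) is an assumption; the ^^ lookup, its helper attr_value/2, and the operator priorities are likewise assumptions of this sketch rather than the DCTG library's code.

```prolog
% Assumed operator priorities and a hypothetical attribute lookup.
:- op(700, xfx, ^^).
:- op(1100, xfx, ::-).

node(_Name, _Children, Atts) ^^ Attr :-
    member(Def, Atts),
    attr_value(Def, Attr).

attr_value((Head ::- Goals), Attr) :- !, Head = Attr, call(Goals).
attr_value(Attr, Attr).

% A subset of the lexicon above.
lex(apple, n, sg).
lex(apples, n, pl).
lex(the, det, _).
lex(some, det, pl).
lex(one, det, sg).

% n ::= [L], { lex(L,n,Num) } <:> structure(n(L)) && number(Num).
n(node(n, [[L]], [structure(n(L)), number(Num)]), [L|S], S) :-
    lex(L, n, Num).

% det ::= [L], { lex(L,det,Num) } <:> structure(det(L)) && number(Num).
det(node(det, [[L]], [structure(det(L)), number(Num)]), [L|S], S) :-
    lex(L, det, Num).

% Reconstructed NP rule: the guard unifies the number attributes of the
% determiner and the noun, so Num is already known when the NP is built.
np(node(np, [D, N],
        [ (structure(np(DS, NS)) ::- D ^^ structure(DS), N ^^ structure(NS)),
          number(Num) ]),
   S0, S) :-
    det(D, S0, S1),
    n(N, S1, S),
    D ^^ number(Num),
    N ^^ number(Num).
```

Here np(T, [one, apples], []) fails because one is sg and apples is pl, while for [the, apples] the unbound number of the simply takes the value pl from the noun.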
A session log illustrates the structure built by the
DCTG rules:
?- s(S,[john,loves,mary],[]), write(S).
node(s,
[node(np,
[node(pn,
[[john]],
[structure(pn(john)),
(number(sg)::-lex(john, pn, sg))])],
[ (structure(np(_G292))::-
node(pn,
[[john]],
[structure(pn(john)),
(number(sg)::-lex(john, pn, sg))])^^structure(_G292)),
number(sg)]),
node(vp,
[node(v,
[[loves]],
[structure(v(loves)),
(number(sg)::-lex(loves, v, sg))]),
node(np,
[node(pn,
[[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])],
[(structure(np(_G424))::-
node(pn, [[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])
^^structure(_G424)),
number(sg)])],
[ (structure(vp(_G351, _G352))::-
node(v, [[loves]],
[structure(v(loves)),
(number(sg)::-lex(loves, v, sg))])^^structure(_G351),
node(np,
[node(pn, [[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])],
[(structure(np(_G424))::-
node(pn, [[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])
^^structure(_G424)),
number(sg)])^^structure(_G352)),
(number(_G373)::-
node(v, [[loves]],
[structure(v(loves)),
(number(sg)::-lex(loves, v, sg))])^^number(_G373))])],
[(structure(s(_G261, _G262))::-
node(np,
[node(pn, [[john]],
[structure(pn(john)),
(number(sg)::-lex(john, pn, sg))])],
[(structure(np(_G292))::-
node(pn, [[john]],
[structure(pn(john)),
(number(sg)::-lex(john, pn, sg))])^^structure(_G292)),
number(sg)])^^structure(_G261),
node(vp, [node(v, [[loves]],
[structure(v(loves)),
(number(sg)::-lex(loves, v, sg))]),
node(np,
[node(pn, [[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])],
[(structure(np(_G424))::-
node(pn, [[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])
^^structure(_G424)),
number(sg)])],
[(structure(vp(_G351, _G352))::-
node(v, [[loves]],
[structure(v(loves)),
(number(sg)::-lex(loves, v, sg))])
^^structure(_G351),
node(np, [node(pn, [[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])],
[(structure(np(_G424))::-
node(pn, [[mary]],
[structure(pn(mary)),
(number(sg)::-lex(mary, pn, sg))])
^^structure(_G424)),
number(sg)])^^structure(_G352)),
(number(_G373)::-
node(v, [[loves]],
[structure(v(loves)),
(number(sg)::-lex(loves, v, sg))])
^^number(_G373))])
^^structure(_G262))])
In examining the structure just shown, the reader will note a great
deal of apparent repetition; this results from the high incidence of
structure sharing, which is not shown explicitly: in the grammar
itself, the variables which have child nodes as values are used
quite freely and appear both in the list of children and in the
rules for calculating synthesized attributes.
Note also that the values of the attributes are not always
pre-calculated: instead, the structure has the rules necessary
to perform the evaluation on demand. In the example, the
structure attribute for the pre-terminal
categories is already completely grounded: it has no variables,
but only literal values, while the structure attributes
for the higher-level parts of the linguistic structure still
have unbound variables.
3.4. Layer 3: Translation into DCTG notation
As a first step toward providing grammatical attributes with PSVI
information, we will translate the existing purchase-order schema into
DCTG notation, adding grammatical attributes corresponding to some
basic information-set properties which are required to be in the
input infoset:
- for Attribute Information Items:
- [local name]
- [namespace name]
- [normalized value]
- for Element Information Items:
- [local name]
- [namespace name]
- [children]
- [attributes]
- [in-scope namespaces] or [namespace attributes]
- for Namespace Information Items:
- [prefix]
- [namespace name]
In principle, we ought perhaps also to provide properties for
character information items:
- for Character Information Items:
but it seems extraordinarily inconvenient to use grammatical
attributes for this purpose. The information involved is obviously
present, and can be isolated by using the standard Prolog predicate
atom_codes(Atom,Codes)
(or atom_chars(Atom,Chars) and char_code(Char,Code)).
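For instance, the following sketch lists, for each character of an atom, the character itself and its code point, which is the core of the [character code] property of a character information item. The predicate char_items/2 and the char/2 terms it returns are hypothetical names used only for illustration.

```prolog
% char_items(+Atom, -Items): one char(Char, Code) term per character of
% Atom, in order; Code is the character's code point.
char_items(Atom, Items) :-
    atom_chars(Atom, Chars),
    maplist(char_item, Chars, Items).

char_item(Char, char(Char, Code)) :-
    char_code(Char, Code).
```

Calling char_items('Hi', I) gives I = [char('H',72), char(i,105)].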
Additionally, we will add some more interesting properties
of the PSVI:
- type definition name, namespace, anonymous, and type
- schema specified (schema or infoset)
- validation attempted (always full)
- validity (always valid, because when the document
is not valid, we fail)
Like the DCG version above, the DCTG
version of the schema has several distinct kinds of rules:
- element rules
- attribute-list rules (for checking the attributes of a complex
type)
- content-model rules (for checking the content of a complex
type)
- simple-type checking rules
The following sections give these in DCTG notation.
3.4.1. Top-level rules for element types
An element rule will serve to match the start-tag and
get the attributes and contents of each element; from it, we
will call routines to check the attributes and content against the
complex type. These differ from the DCG rules in two ways:
when we call them, we must specify three arguments, not two,
and we provide explicit grammatical attributes for infoset
properties. The basic pattern is simple: for any element
in namespace n with local name gi and complex type ct,
we will construct an appropriate non-terminal symbol nt,
and the element rule will look like this:
nt ::= [element(n:gi,Attributes,Content)],
{
ct_atts(A,NA,Attributes),
ct_cont(C,Content,[])
}
<:> attributes(A)
&& namespaceAttributes(NA)
&& children(C)
&& localname(gi)
&& namespacename(n)
&& type_definition_anonymous(Boolean)
&& type_definition_namespace(URI)
&& type_definition_name(NCName)
&& type_definition_type(complex)
&& validation_attempted(full)
&& validity(valid)
.
Later, we will add further grammatical attributes, and
use values other than full and valid for invalid elements.
Note that the rule for XML attributes is not a simple call to the
parser, but a call to a wrapper predicate. Since the SWI parser returns
namespace attributes in the same list as other attributes, while the
infoset spec requires that they be listed in different properties, the
ct_atts predicate will need to filter the attribute information items
into two lists: one to become the value of the attributes infoset
property, and one to become the value of namespaceAttributes.
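A minimal sketch of the filtering step, assuming (as the avs_nsd rules in section 3.4.2.2 do) that namespace declarations arrive as xmlns=URI and xmlns:Prefix=URI. It works on raw Name=Value pairs rather than on parsed node structures, and uses partition/4 from SWI-Prolog's library(lists); the names split_attributes/3 and is_ns_attribute/1 are hypothetical.

```prolog
:- use_module(library(lists)).   % partition/4

% is_ns_attribute(+AVS): AVS is a namespace declaration.
is_ns_attribute(xmlns=_).        % default namespace declaration
is_ns_attribute(xmlns:_=_).      % prefixed namespace declaration

% split_attributes(+AVSs, -NsAtts, -OtherAtts): NsAtts are the namespace
% declarations in AVSs, OtherAtts everything else, both in input order.
split_attributes(AVSs, NsAtts, OtherAtts) :-
    partition(is_ns_attribute, AVSs, NsAtts, OtherAtts).
```

Attributes in the XSI namespace are not namespace declarations, so they end up in the second list, alongside ordinary attributes.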
We also should become a little more systematic about naming
conventions. If we continue to use generic identifiers (element
type names) directly as names of Prolog predicates, we risk name
collisions between elements and predicates defined as part of the
parser, or built in to Prolog. To eliminate this risk, we will
prefix names taken over from the schema with e_
(for elements), t_ (for types), etc.,
and we will avoid those prefixes otherwise.
If a schema has any names beginning with
e_ or t_,
this rule may become slightly confusing. But there won't be
collisions between schema-based names and other names in the parser.
The purchase order schema
po.xsd defines the
following fifteen element types: the list
gives the simple names which will be used to refer to them in the
grammar below, as well as
their schema-component designator as defined in
Holstege/Vedamuthu 2002.
Since their local names are all unique, the grammar below simply
uses e_ plus their local names to refer to them. In other schemas,
it will be necessary to mangle the names, or generate arbitrary
identifiers, in order to distinguish different element types
which have the same local names.
- e_purchaseOrder = /element(purchaseOrder)
- e_comment = /element(comment)
- e_shipTo = /complexType(po:PurchaseOrderType)/sequence()/element(shipTo)
- e_billTo = /complexType(po:PurchaseOrderType)/sequence()/element(billTo)
- e_items = /complexType(po:PurchaseOrderType)/sequence()/element(items)
- e_name = /complexType(po:USAddress)/sequence()/element(name)
- e_street = /complexType(po:USAddress)/sequence()/element(street)
- e_city = /complexType(po:USAddress)/sequence()/element(city)
- e_state = /complexType(po:USAddress)/sequence()/element(state)
- e_zip = /complexType(po:USAddress)/sequence()/element(zip)
- e_item = /complexType(po:Items)/sequence()/element(item)
- e_productName = /complexType(po:Items)/sequence()/element(item)/complexType()/sequence()/element(productName)
- e_quantity = /complexType(po:Items)/sequence()/element(item)/complexType()/sequence()/element(quantity)
- e_USPrice = /complexType(po:Items)/sequence()/element(item)/complexType()/sequence()/element(USPrice)
- e_shipDate = /complexType(po:Items)/sequence()/element(item)/complexType()/sequence()/element(shipDate)
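For schemas in which local names do repeat, one simple way to generate distinct predicate names is to number repeated local names in schema order. The sketch below is hypothetical: element_names/2 and its Local-SCD input pairs are our own conventions, not part of the grammar that follows.

```prolog
% element_names(+Decls, -Named): Decls is a list of Local-SCD pairs in
% schema order; each becomes Name-SCD, where Name is e_Local for the
% first element with that local name and e_Local_2, e_Local_3, ... after.
element_names(Decls, Named) :-
    element_names(Decls, [], Named).

element_names([], _Seen, []).
element_names([Local-SCD|Decls], Seen, [Name-SCD|Named]) :-
    include(==(Local), Seen, Earlier),   % earlier uses of this local name
    length(Earlier, K),
    (   K =:= 0
    ->  atom_concat(e_, Local, Name)
    ;   N is K + 1,
        format(atom(Name), 'e_~w_~d', [Local, N])
    ),
    element_names(Decls, [Local|Seen], Named).
```

A full implementation would also have to guard against a generated name such as e_name_2 colliding with an element whose local name really is name_2.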
The simple purchase-order schema defines four complex types, one of
which is anonymous; for the anonymous type we use the local name of its
host element after the t_ prefix:
- t_PurchaseOrderType = /complexType(po:PurchaseOrderType)
- t_USAddress = /complexType(po:USAddress)
- t_Items = /complexType(po:Items)
- t_item = /complexType(po:Items)/sequence()/element(item)/complexType()
The elements with complex types get these rules:
< 31 Rules for elements with complex types > ≡
/* e_purchaseOrder: grammatical rule for purchaseOrder element.
e_purchaseOrder(ParsedNode,L1,L2): holds if the difference
between L1 and L2 (difference lists) is a purchase order
element in SWI Prolog notation.
And so on for the other element types.
*/
e_purchaseOrder ::= [element('http://www.example.com/PO1':purchaseOrder,
Attributes,Content)],
{
t_PurchaseOrderType_atts(A,NA,Attributes),
t_PurchaseOrderType_cont(C,Content,[])
}
<:> localname(purchaseOrder)
&& type_definition_anonymous('false')
&& type_definition_namespace('http://www.example.com/PO1')
&& type_definition_name('PurchaseOrderType')
&& type_definition_type(complex)
{Common infoset properties for elements in po namespace 32}
.
e_shipTo ::= [element(shipTo,Attributes,Content)],
{
t_USAddress_atts(A,NA,Attributes),
t_USAddress_cont(C,Content,[])
}
<:> localname(shipTo)
&& type_definition_anonymous('false')
&& type_definition_namespace('http://www.example.com/PO1')
&& type_definition_name('USAddress')
&& type_definition_type(complex)
{Common infoset properties for elements in po namespace 32}
.
e_billTo ::= [element(billTo,Attributes,Content)],
{
t_USAddress_atts(A,NA,Attributes),
t_USAddress_cont(C,Content,[])
}
<:> localname(billTo)
&& type_definition_anonymous('false')
&& type_definition_namespace('http://www.example.com/PO1')
&& type_definition_name('USAddress')
&& type_definition_type(complex)
{Common infoset properties for elements in po namespace 32}
.
e_items ::= [element(items,Attributes,Content)],
{
t_Items_atts(A,NA,Attributes),
t_Items_cont(C,Content,[])
}
<:> localname(items)
&& type_definition_anonymous('false')
&& type_definition_namespace('http://www.example.com/PO1')
&& type_definition_name('Items')
&& type_definition_type(complex)
{Common infoset properties for elements in po namespace 32}
.
e_item ::= [element(item,Attributes,Content)],
{
t_item_atts(A,NA,Attributes),
t_item_cont(C,Content,[])
}
<:> localname(item)
&& type_definition_anonymous('true')
&& type_definition_namespace('http://www.example.com/PO1')
&& type_definition_name('t_item')
&& type_definition_type(complex)
{Common infoset properties for elements in po namespace 32}
.
This code is used in < Predicates for purchase-order material 84 > < Predicates for purchase-order material 154 >
Note that the type_definition_name property for the item element
provides the generated name we use for the type. That this name is
not assigned by the schema is made clear by
type_definition_anonymous('true').
Since the attributes, children, and namespacename properties
have identical definitions for all element types in the purchase-order
namespace, we can factor them out into a single code fragment:
The rules for elements with simple types are slightly simpler
than those for elements with complex types, but
follow the same basic pattern.
In the DCG version of this schema given above, we wrote
the comment element and others associated with simple types
using a hard-coded requirement for an empty list of attributes.
That is too simple, since such elements may legitimately carry
xsi:type, xsi:nil, xsi:schemaLocation and
xsi:noNamespaceSchemaLocation attributes.
So we write these element rules with the same basic structure
as was used for complex types, except that we use a standard
predicate for checking that no attributes outside the xsi
namespace were used.
The schema
po.xsd defines two simple types: SKU
and the anonymous simple type used for quantities:
- t_quantity = /complexType(po:Items)/sequence()/element(item)/complexType()/sequence()/element(quantity)/simpleType()
- t_SKU = /simpleType(po:SKU)
In addition, several built-in simple types are used:
- t_string = xsd:string
- t_integer = xsd:integer
- t_decimal = xsd:decimal
- t_date = xsd:date
In a full implementation, we'll need to do some more serious name
mangling to ensure uniqueness of relatively short, handy names for
all types. For now, we just choose the names manually.
The rules for simple types are:
< 33 Rules for elements with simple types > ≡
e_comment ::= [element('http://www.example.com/PO1':comment,Attributes,Content)],
{Guard to check attributes and content of strings 34}
<:> localname(comment)
{Common infoset properties for elements in po namespace 32}
{PSVI properties for strings 35}
.
e_name ::= [element(name,Attributes,Content)],
{Guard to check attributes and content of strings 34}
<:> localname(name)
{Common infoset properties for elements in po namespace 32}
{PSVI properties for strings 35}
.
e_street ::= [element(street,Attributes,Content)],
{Guard to check attributes and content of strings 34}
<:> localname(street)
{Common infoset properties for elements in po namespace 32}
{PSVI properties for strings 35}
.
e_city ::= [element(city,Attributes,Content)],
{Guard to check attributes and content of strings 34}
<:> localname(city)
{Common infoset properties for elements in po namespace 32}
{PSVI properties for strings 35}
.
e_state ::= [element(state,Attributes,Content)],
{Guard to check attributes and content of strings 34}
<:> localname(state)
{Common infoset properties for elements in po namespace 32}
{PSVI properties for strings 35}
.
e_zip ::= [element(zip,Attributes,Content)],
{
sT_atts(A,Attributes,[]),
xsd_decimal_cont(C,Content,[])
}
<:> localname(zip)
{Common infoset properties for elements in po namespace 32}
{PSVI properties for decimals 36}
.
e_productName ::= [element(productName,
Attributes,Content)],
{Guard to check attributes and content of strings 34}
<:> localname(productName)
{Common infoset properties for elements in po namespace 32}
{PSVI properties for strings 35}
.
e_quantity ::= [element(quantity,
Attributes,Content)],
{
sT_atts(A,Attributes,[]),
t_quantity_cont(C,Content,[])
}
<:> localname(quantity)
{Common infoset properties for elements in po namespace 32}
&& type_definition_anonymous('true')
&& type_definition_namespace('http://www.example.com/PO1')
&& type_definition_name('t_quantity')
&& type_definition_type(simple)
.
e_USPrice ::= [element('USPrice',Attributes,Content)],
{
sT_atts(A,Attributes,[]),
xsd_decimal_cont(C,Content,[])
}
<:> localname('USPrice')
{Common infoset properties for elements in po namespace 32}
{PSVI properties for decimals 36}
.
e_shipDate ::= [element(shipDate,Attributes,Content)],
{
sT_atts(A,Attributes,[]),
xsd_date_cont(C,Content,[])
}
<:> localname(shipDate)
{Common infoset properties for elements in po namespace 32}
&& type_definition_anonymous('false')
&& type_definition_namespace('http://www.w3.org/2001/XMLSchema')
&& type_definition_name('date')
&& type_definition_type(simple)
.
This code is used in < Predicates for purchase-order material 84 > < Predicates for purchase-order material 154 >
Just as we factor out the common infoset properties, we
can also factor out the checking against frequently used built-in
simple types, notably string:
Similarly, the type identifications for string and
decimal are used more than once:
3.4.2. Rules for attributes
For each complex type, we need to do several things in order to
validate all the attributes on each element of that type and provide
appropriate nodes and infoset properties:
- The input structure has namespace attributes and other
attributes in the same list, while we need them in separate lists so
we can assign them to two different infoset properties. So we need to
partition the list of attributes. We can perform the partition either
before all other processing, or after; doing it afterwards leads to
more compact code, so we choose that.
- For each non-namespace attribute found, we need to validate it:
if it is declared, we need to check it against its declared type. If
the attribute is declared with a fixed value, we should check that the
value given matches the prescribed value. If it is not declared, we
should raise an error, but we'll save that for a later layer.
- We need to ensure that attributes required by the complex type
are present and that attributes forbidden by the complex type are not
present. For any attributes declared with default values, we need to
supply an attribute information item with the default value, if the
document didn't supply a value. Rather than trying to interleave this
with other tasks, we will perform a separate check on attribute
occurrences.
And we want to provide basic infoset properties for the XML
attributes, in the form of grammatical attributes in the
attribute-grammar sense.
For each complex or simple type dt, the basic pattern of the
attribute-checking rule will be:
dt_atts(Lpa,Lpna,Lavs) :-
lavs_dt(LpaAll,Lavs,[]), /* parse against grammar of attributes */
partition(LpaAll,LpaPresent,Lpna), /* partition the result */
attocc_dt(LpaPresent,Lpa). /* check min, max occurrence rules */
The logical variables have the following meanings:
- Lpa
- List of parsed attributes (i.e. of node() structures
of the kind returned by any DCTG rule) for this complex type,
including defaulted attributes
- Lpna
- List of parsed namespace attributes
- Lavs
- The list of attribute-value specifications provided by the
input structure returned by the SWI Prolog parser.
- LpaAll
- Combined list of parsed-attribute node()
structures for all attributes, both namespace attributes and others
- LpaPresent
- List of parsed-attribute nodes for attributes explicitly assigned
values in the document instance (without defaulted attributes)
For each type, a grammar defining the legal attributes will
be constructed; if type dt has attributes an1 and an2, of types
st1 and st2 respectively, then the core context-free grammar will
have a form like this:
lavs_dt ::= [].
lavs_dt ::= avs_dt, lavs_dt. /* declared attributes */
lavs_dt ::= avs_nsd, lavs_dt. /* namespace declarations */
lavs_dt ::= avs_xsi, lavs_dt. /* XSI attributes */
avs_dt ::= [an1=Av], { st1_value(Av) }.
avs_dt ::= [an2=Av], { st2_value(Av) }.
Simple types will, of course, have no declared attributes, and
the rules for declared attributes and occurrence-checking
(together with the rules for individual attributes) will be omitted.
Wildcard support can also be added here when needed.
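To make the pattern concrete, here is the core context-free grammar instantiated for t_item, written as a plain DCG without grammatical attributes. It assumes that t_item's one attribute is partNum of type SKU, with the SKU pattern \d{3}-[A-Z]{2} from the XML Schema Primer's po.xsd; sku_value/1 is a hypothetical hand-written check standing in for generated simple-type code.

```prolog
% lavs_t_item: lists of attribute-value specifications legal on t_item.
lavs_t_item --> [].
lavs_t_item --> avs_t_item, lavs_t_item.   % declared attributes
lavs_t_item --> avs_nsd, lavs_t_item.      % namespace declarations
lavs_t_item --> avs_xsi, lavs_t_item.      % XSI attributes

avs_t_item --> [partNum=Av], { sku_value(Av) }.

avs_nsd --> [xmlns=_].
avs_nsd --> [xmlns:_=_].

avs_xsi --> ['http://www.w3.org/2001/XMLSchema-instance':_=_].

% sku_value(+Atom): Atom has three ASCII digits, a hyphen, and two
% uppercase ASCII letters, i.e. matches \d{3}-[A-Z]{2}.
sku_value(Av) :-
    atom(Av),
    atom_codes(Av, [D1, D2, D3, 0'-, U1, U2]),
    forall(member(D, [D1, D2, D3]), between(0'0, 0'9, D)),
    forall(member(U, [U1, U2]), between(0'A, 0'Z, U)).
```

With this grammar, phrase(lavs_t_item, [partNum='872-AA']) succeeds, while an undeclared attribute such as color=red makes the whole list fail, which is the behaviour the later layer will refine into an error report.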
3.4.2.2. Namespace attributes and XSI attributes
One set of rules for namespace attributes and XSI attributes
will suffice:
< 37 Grammar rules for namespace and XSI attributes > ≡
/* avs_nsd: grammatical rule for namespace-attribute specifications */
avs_nsd ::= [xmlns=DefaultNS]
<:> localname(xmlns)
&& namespacename('http://www.w3.org/2000/xmlns/')
&& normalizedvalue(DefaultNS).
avs_nsd ::= [xmlns:Prefix=NSName]
<:> localname(Prefix)
&& namespacename('http://www.w3.org/2000/xmlns/')
&& normalizedvalue(NSName).
/* avs_xsi: grammatical rule for XSI attribute specifications */
avs_xsi ::= ['http://www.w3.org/2001/XMLSchema-instance':Localname=Value]
<:> localname(Localname)
&& namespacename('http://www.w3.org/2001/XMLSchema-instance')
&& normalizedvalue(Value).
This code is used in < Generic utilities for DCTG-encoded schemas 85 > < Generic utilities for DCTG-encoded schemas 158 >
Note that default namespace declarations do have a [namespace name]
property, despite not having a prefixed name; this is in accord with
Section 2.2 of the Infoset spec, which says
“By definition, all namespace attributes (including those named
xmlns, whose [prefix] property has no value) have a
namespace URI of http://www.w3.org/2000/xmlns/.”
3.4.2.3. Occurrence checking
Each complex type will also have a rule for occurrence-checking,
which will take something like the following form (assuming that
Lreq,
Ldft, and
Lnot
are lists of required, defaulted, and forbidden attributes:
attocc_dt(LpaPres,LpaAll) :-
atts_present(LpaPres,Lreq),
atts_absent(LpaPres,Lnot),
atts_defaulted(LpaPres,Ldft,LpaAll).
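The same three checks can be sketched on a simplified representation in which attributes are plain Name=Value pairs rather than node structures. The predicate names attocc/5 and add_defaults/3 are hypothetical, and namespaces are ignored for brevity.

```prolog
% attocc(+Present, +Required, +Forbidden, +Defaults, -All)
%   Present:   Name=Value pairs given in the document
%   Required:  names that must be present
%   Forbidden: names that must be absent
%   Defaults:  Name=Value pairs supplied when Name is not present
attocc(Present, Required, Forbidden, Defaults, All) :-
    forall(member(R, Required), memberchk(R=_, Present)),
    forall(member(F, Forbidden), \+ memberchk(F=_, Present)),
    add_defaults(Defaults, Present, All).

% add_defaults(+Defaults, +Present, -All): append each default whose
% name is not already present; explicit values always win.
add_defaults([], All, All).
add_defaults([Name=Value|Defaults], Present, All) :-
    (   memberchk(Name=_, Present)
    ->  Present1 = Present
    ;   append(Present, [Name=Value], Present1)
    ),
    add_defaults(Defaults, Present1, All).
```

Doing the occurrence check as a separate pass, rather than interleaving it with parsing, keeps the attribute grammar itself purely context-free, which is the same design choice made in the text.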
Since the form of the attribute lists has changed (we are now
dealing with lists of node structures), we need
new forms of atts_present, etc. for this:
< 38 Utilities for checking attribute occurrences > ≡
/* atts_present(Lpa,Lreq): true if a parsed attribute node
is present in Lpa for each attribute name in Lreq */
atts_present(_Lpa,[]).
atts_present(Lpa,[HRA|RequiredTail]) :-
    att_present(Lpa,HRA),
    atts_present(Lpa,RequiredTail).
/* An attribute name matches if namespace and local part match */
/* att_present(Lpa,Attname): true if a parsed attribute node
   is present in Lpa which has name Attname */
att_present([Pa|_Lpa],NS:Attname) :-
    Pa^^localname(Attname),
    Pa^^namespacename(NS).
att_present([_Pa|Lpa],Attname) :-
    att_present(Lpa,Attname).
/* no base step: if we reach att_present([],Attname) we want to fail. */
This code is used in < Generic utilities for DCTG-encoded schemas 85 > < Generic utilities for DCTG-encoded schemas 158 >
The rule for checking forbidden attributes is very similar:
The rule for providing defaults must go through all of the
attributes with defaults; this happens in the atts_defaulted
predicate in the usual way, by recursion on the list.
< 40 Utility for providing defaulted attributes > ≡
/* atts_defaulted(L1,L2,L3): true if L3 has all the attributes in L1,
plus all of the attributes in L2 which are not also in L1 */
atts_defaulted(Lpa,[],Lpa).
atts_defaulted(Lpa,[Padft|Ldft],LpaAll) :-
atts_defaulted(Lpa,Ldft,Lpa2),
att_merge(Lpa2,Padft,LpaAll).
Continued in < Utility for providing defaulted attributes 41 >
This code is used in < Generic utilities for DCTG-encoded schemas 85 > < Generic utilities for DCTG-encoded schemas 158 >
For each of these attributes individually, the default value
must be added to the list if a value is not already there;
this involves recursion on the list of attributes already
present. We expect only ever to call this predicate when the
first and third arguments (the defaulted attribute and the list into
which it is to be merged) are inst