Sevastopol

An XSD schema represented as a definite-clause translation grammar

A working paper prepared for the W3C XML Schema Working Group

C. M. Sperberg-McQueen

21 October 2005

$Id: podctg.html,v 1.7 2005/10/22 02:42:57 cmsmcq Exp $



This document describes Sevastopol, a conforming implementation of XML Schema 1.0, which uses definite-clause translation grammars (DCTGs) to perform schema-validity assessment on instances of the sample purchase-order schema defined in [W3C 2001a]. In the process, it illustrates a more general application of logic grammars to schema processing as described in the XML Schema specification and shows how schemas can be represented using DCTGs. This paper assumes a working knowledge of DCTG notation, which is perhaps most simply thought of as an adaptation for Prolog of attribute grammars as described by [Knuth 1968] and later writers. For a brief introduction and pointers to further reading, see [Sperberg-McQueen 2004a].

1. Introduction

1.1. Context

This is one of a series of papers on the application of logic grammars to XML Schema processing.
The first ([Sperberg-McQueen 2004a]) provides a brief introduction to definite-clause grammar (DCG) and definite-clause translation grammar (DCTG) notation; it may be skipped by readers already familiar with the notation.
The second ([Sperberg-McQueen 2004b]) illustrates the application of logic grammars to schema processing by showing the translation (by hand) of a relatively simple schema (the purchase-order schema described by [W3C 2001a]) into DCG form. The resulting grammar can be used (with a Prolog interpreter) to validate XML documents against the schema. The grammar is only an incomplete representation of the schema, however; there are some schema features it does not illustrate or support (xsi:type and xsi:nil attributes, mixed content, substitution groups), and it produces no post-schema-validation information set (PSVI).
This paper is the third in the series. It continues the development of a logic grammar representation of the purchase-order schema, using hand translation to DCTG notation. It shows how to use attributes (in the attribute-grammar sense) to provide a PSVI, and it supports various additional features of XML Schema 1.0 (hereinafter XSD). Full source code is shown; this paper is not a report on the DCTG translation of the schema, but the source code for that translation.
Future papers may develop a more systematic account of DCTGs as attribute grammars; attempt to prove, or at least to argue informally, that the DCTG representation shown fulfils all of XSD's constraints on schemas and that parsing using the DCTG fulfils all the validation rules of XSD; and compile the schema for schemas itself into a DCTG so as to provide a schema processor which can read schema documents, compile them into DCTGs or equivalent Prolog data structures, and assess the schema-validity of XML documents.

1.2. How to read this paper

This paper contains some relatively high-level discussion of issues, intermixed with a large mass of detail. Since the source code for Sevastopol is generated from this document, every line of source code in the processor must be given here; there is no opportunity to give one example of a pattern and then say “... and similarly for all the other types (or components, or elements, or ...) in question” and pass over the rest in silence: every single one must be written out in full. (And what's worse, much of the code is given three times, in different versions of the program.) Readers interested in the details of the implementation will, I hope, find the exposition reasonably useful, although at times the ratio of expository prose to source code is very low.
Readers uninterested in implementation details, however, will wish to skip part, or most, of the source code; a good rule is probably to skip to the next section heading, or at least the next sizeable block of prose, whenever the code in a particular section begins to lose your interest. Some attention to the discussions of naming conventions will help make the code fragments easier to dip into without excessive disorientation.
Readers in a hurry may find that they can get the gist of the paper by reading or skimming the introductory section (1), the beginning and ending of the sections on the Core, PV, and 2L grammars (sections 2, 4, 5), and the concluding sections (6, 7, 8).

1.3. Layering

In the interests of clarity, I will work through the example grammar here in several layers, starting with some core features of XML Schema and gradually adding others.
Note that this paper is not intended to be a complete translation of XML Schema into DCTG, but a sample small enough to follow and large enough to make a persuasive case that all of XML Schema can be translated. A fuller translation may be given in a follow-on paper.
Some features of XSD won't be covered here, simply because the purchase-order schema doesn't illustrate them. These characteristics of the purchase-order schema are probably worth mentioning, since they simplify our task:
  • No types have mixed content.
  • No elements are in any substitution groups.
  • The type hierarchy is very shallow, and there is little scope for non-vacuous use of xsi:type in the document instance.
  • No elements are nillable, so there is little use for the xsi:nil attribute in document instances.
  • There are no wildcards.
  • All content-model particles have minimum and maximum occurrence indicators of zero, one, or unbounded; there are no arbitrary numeric exponents.
  • The schema is designed for single-namespace documents and no schema composition operations (import, include, redefine) are needed.
  • The schema document has no undischarged references to types or elements, so it provides no examples of missing components.
  • The schema imposes no identity constraints and uses no IDs or IDREFs.
  • The schema document provides no annotations.
The DCTG representation of the schema will be developed in layers:
  • The core of the grammar will provide some, but not all, of the infoset properties defined for the PSVI and the input infoset; it will provide a PSVI only for valid input documents; it will fail on invalid input. The first layer illustrates the representation of content models and attribute declarations in DCTG form.
    The purchase-order schema does not contain any mixed-content types or substitution groups, but after building the first layer it will be reasonably clear how to support those.
  • The partial-validity layer (PV) returns a PSVI for all documents, not just schema-valid ones.
  • The reification or second-level (2L) layer represents content models not as Prolog rules, but as Prolog data structures; this makes possible a more concise representation of the schema components, at the cost of having a more abstract validation process. At this point, it becomes possible to offer the user control over the starting point of schema-validity assessment: it need not start at the root of the document, and it need not begin in lax validation mode.
Some simplifying assumptions are made, at least for the first layers of the DCTG. Some are later replaced with more realistic assumptions:
  • When the document is invalid, the schema processor may (or should) exit with an error code. This restriction is lifted in the partial-validation layer.
  • Schema-validity assessment always begins at the root element, with a known element declaration. Consequently, there is no need to provide the validation root property in the PSVI. This restriction is lifted in the reification layer (at least in the sense that the validation root property is provided, and that it would in principle be possible to start somewhere other than the root — no use is actually made of that possibility).
  • The schema we are working with obeys the constraints on schemas; no checking of these constraints is necessary. Since the purchase-order schema does in fact obey all applicable constraints, this is perhaps more of an observation than an assumption. But in extending the patterns of DCTG construction shown here to other cases, it will be necessary to enforce the constraints on schema components and on XML representation of schemas.
  • The schema we are working with has no missing components, again an observation more than an assumption.
  • The xsi:type and xsi:nil attributes are not used in the document instance (or are used only vacuously).
  • When an element is invalid, the schema processor should skip its children and move on: there is no fallback processing.

1.4. Naming conventions and terminology

The DCTG version of the schema has several distinct kinds of rules, some with subgroups. Not all kinds of rules appear in every version of the grammar:
  • element rules match a single element in the input document and check it against a given declaration
  • attribute-list rules check the attributes on an element against the relevant type declaration
  • content-model rules check sequences of child elements against the content model of a given complex type
  • simple-type checking rules check a character sequence in the input infoset against the definition of a given simple type
  • type-sva rules check sets of attributes and sequences of child nodes against the definition of a given type; these serve as wrappers for the attribute-list, content-model, and simple-type checking rules
Some of these rules are schema-specific, while others are generic and can be supplied by a general-purpose library.
Some of the rules are expressed by DCTG grammar rules, others by native Prolog predicates; in each case, there is a fairly clear naming rule:
  • ELEMID: element rules show up as grammar rules with names of the form ELEMID (e.g. e_purchaseOrder); a semantic action calls attribute-list and content rules to validate the element
  • sva_atts_TYPEID: attribute-list rules, with names of the form sva_atts_ + TYPEID, check the attributes of an element against a type
  • attocc_TYPEID: subsidiary rules named attocc_ + TYPEID check attribute occurrences for a given type
  • ras_TYPEID and lras_TYPEID: grammar rules for attribute specifications of a given type, in single and list form
  • sva_content_TYPEID: content-model rules carry names of the form sva_content_ + TYPEID and check the content of an element against the type of the element, whether simple or complex; these predicates are wrappers around lower-level predicates
  • content_TYPEID: for complex types, this is the content model itself, in a grammar rule
  • sva_plf_TYPEID: sva_plf_ + TYPEID rules check pre-lexical forms against simple types
  • lexform_TYPEID: a grammar rule for checking the lexical representation of a simple type value; there are various auxiliaries which vary in the different levels of the grammar
Ignoring the various auxiliary predicates and clumping classes of similar predicates together, the call graph for the core validator will look like this:
Figure 1: Abstract call graph for the core layer
  • The top-level routines load_go_file and load_file (at the top) call an ELEMID rule (specifically e_purchaseOrder).
  • The oval labeled ELEMID represents the element rules.
  • The type-sva rules sva_content_TYPEID and sva_atts_TYPEID check sets of attributes and sequences of child nodes against the definition of a given type; they call the attribute rules and content-model rules to do the core work.
  • The attribute rules (at the right) include the grammar rules lras_TYPEID and ras_TYPEID, which define the attributes legal for the type.
  • The content-model rules have names of the form content_TYPEID; they typically contain references to element rules (hence the cycle).
  • The simple-type checking rules (sva_plf_TYPEID) check a character sequence against a given simple type; they are called both by individual attribute rules and by sva_content_TYPEID.

1.4.1. Name mangling rules

The development of the DCTG will be easier to follow if we are systematic about naming conventions for the various types of rules and the objects they work upon. If we simply use generic identifiers (element type names) directly as names of Prolog predicates, we risk name collisions between elements and predicates defined as part of the parser, or built in to Prolog. To eliminate this risk, we will perform a fairly simple form of name mangling to produce distinct identifiers for elements, attributes, and types, and to generate Prolog identifiers from them.[1]
  • e_ + name: top-level elements
  • t_ + name: top-level types
  • a_ + name: top-level attributes
  • e_ + name + _ + TYPEID: elements local to a complex type (the TYPEID is the type identifier for the enclosing type)
  • t_ + ELEMID: types local to an element; ELEMID is the element identifier for the enclosing element
  • a_ + name + _ + TYPEID: attributes local to a complex type; the TYPEID is the type identifier for the enclosing type
Since the purchase-order schema does not import any other namespaces, we do not need to associate the elements, attributes, or types with a particular namespace; a system which supports schema-composition will need to pair element-, attribute-, and type-identifiers with namespace names.
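The mangling rules above can be sketched as a few Prolog predicates. This is only an illustration: the predicate names kind_prefix/2, top_level_id/3, local_id/4, and local_type_id/2 are hypothetical and are not part of Sevastopol, where the identifiers were assigned by hand.

```prolog
% Sketch of the name-mangling rules (hypothetical predicate names).

% kind_prefix(?Kind, ?Prefix): the prefix used for each kind of component.
kind_prefix(element,   e_).
kind_prefix(type,      t_).
kind_prefix(attribute, a_).

% top_level_id(+Kind, +Name, -Id): e_ + name, t_ + name, or a_ + name.
top_level_id(Kind, Name, Id) :-
    kind_prefix(Kind, Prefix),
    atom_concat(Prefix, Name, Id).

% local_id(+Kind, +Name, +TypeId, -Id): elements and attributes local
% to a complex type get prefix + name + _ + TYPEID.
local_id(Kind, Name, TypeId, Id) :-
    memberchk(Kind, [element, attribute]),
    kind_prefix(Kind, Prefix),
    atomic_list_concat([Prefix, Name, '_', TypeId], Id).

% local_type_id(+ElemId, -Id): a type local to an element is t_ + ELEMID.
local_type_id(ElemId, Id) :-
    atom_concat(t_, ElemId, Id).
```

For example, the query local_id(element, item, t_Items, Id) binds Id to e_item_t_Items, and applying local_type_id/2 to that result yields t_e_item_t_Items, the identifier used below for the anonymous type of the item element.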

1.4.2. Element types in the purchase-order schema

The purchase-order schema po.xsd defines the following fifteen element types. The list gives the simple names which will be used to refer to them in the grammar below, as well as their schema-component designators as defined in [Holstege/Vedamuthu 2002].[2]
  • e_purchaseOrder = /element(purchaseOrder)
  • e_comment = /element(comment)
  • e_shipTo_t_PurchaseOrderType = /complexType(po:PurchaseOrderType) /sequence() /element(shipTo)
  • e_billTo_t_PurchaseOrderType = /complexType(po:PurchaseOrderType) /sequence() /element(billTo)
  • e_items_t_PurchaseOrderType = /complexType(po:PurchaseOrderType) /sequence() /element(items)
  • e_name_t_USAddress = /complexType(po:USAddress) /sequence() /element(name)
  • e_street_t_USAddress = /complexType(po:USAddress) /sequence() /element(street)
  • e_city_t_USAddress = /complexType(po:USAddress) /sequence() /element(city)
  • e_state_t_USAddress = /complexType(po:USAddress) /sequence() /element(state)
  • e_zip_t_USAddress = /complexType(po:USAddress) /sequence() /element(zip)
  • e_item_t_Items = /complexType(po:Items) /sequence() /element(item)
  • e_productName_t_e_item_t_Items = /complexType(po:Items) /sequence() /element(item) /complexType() /sequence() /element(productName)
  • e_quantity_t_e_item_t_Items = /complexType(po:Items) /sequence() /element(item) /complexType() /sequence() /element(quantity)
  • e_USPrice_t_e_item_t_Items = /complexType(po:Items) /sequence() /element(item) /complexType() /sequence() /element(USPrice)
  • e_shipDate_t_e_item_t_Items = /complexType(po:Items) /sequence() /element(item) /complexType() /sequence() /element(shipDate)

1.4.3. Complex types

The simple purchase-order schema defines four complex types; one is anonymous.
  • t_PurchaseOrderType = /complexType(po:PurchaseOrderType)
  • t_USAddress = /complexType(po:USAddress)
  • t_Items = /complexType(po:Items)
  • t_e_item_t_Items = /complexType(po:Items)/sequence()/element(item)/complexType()

1.4.4. Simple types

The schema po.xsd defines two simple types: SKU and the anonymous simple type used for quantities:
  • t_e_quantity_t_e_item_t_Items = /complexType(po:Items) /sequence() /element(item) /complexType() /sequence() /element(quantity) /simpleType()
  • t_SKU = /simpleType(SKU)
In addition, several built-in simple types are used:
  • t_xsd_string = xsd:string
  • t_xsd_integer = xsd:integer
  • t_xsd_decimal = xsd:decimal
  • t_xsd_date = xsd:date

1.4.5. Terminology and variable names

Some terminology used in the prose and in the construction of variable names may be usefully defined here.
  • Anjewierden/Wielemaker form: the Prolog representation of XML used by the XML parser in SWI Prolog, originally designed by Anjo Anjewierden and documented by Jan Wielemaker in [Wielemaker 2001]
  • ATTID (in pseudo-code): a meta-syntactic variable indicating a place where, in actual code, an attribute identifier will occur
  • AWF: Anjewierden/Wielemaker form, a representation of XML in Prolog data structures
  • attribute specification: the name-value pair given in an XML document (or information set) to specify the value for the attribute of that name; may be referred to as raw to distinguish it from a parsed attribute node; in variable names, often as or ras; a variable bound to a set or list of attribute specifications is often named Las or Lras
  • DCTG properties: grammatical attributes provided by a DCTG
  • grammatical attributes: the named values associated with nodes in the parse tree of a DCTG; in attribute grammars, these are normally referred to as attributes; the terms grammatical attributes and DCTG properties or just properties are sometimes used here to avoid confusion with XML attributes
  • ELEMID (in pseudo-code): a meta-syntactic variable indicating a place where, in actual code, an element identifier will occur
  • Las (in variable names): a list of attribute specifications
  • Lf or LF (in predicate or variable names): lexical form
  • Lpa (in variable names): a list of parsed attribute nodes (with DCTG properties)
  • Lpe (in variable names): a list of parsed element nodes
  • Lpna (in variable names): a list of parsed namespace-attribute nodes (with DCTG properties)
  • Lras (in variable names): a list of raw attribute specifications
  • Plf or PLF (in predicate or variable names): pre-lexical form
  • PN (in predicate or variable names): parsed node with DCTG properties (as returned by grammar predicates)
  • pre-lexical form: the sequence of characters presented in the input information set as an attribute value or the content of a simply typed element; the application of the whitespace processing rules associated with a given simple type will transform the pre-lexical form into a lexical form which may or may not be legal for that type
  • property: a grammatical attribute, a DCTG property
  • raw: not yet provided with DCTG properties
  • simply typed (of elements): being declared as having a simple (rather than a complex) type
  • sva (in predicate names): schema-validity assessment
  • TYPEID (in pseudo-code): a meta-syntactic variable indicating a place where, in actual code, a type identifier will occur (may occasionally appear as TID)
  • XML attributes: the named values associated with elements in an XML document; the qualification XML is used to avoid confusion with grammatical attributes
See also section 2.6.3.

2. The core: Providing PSVI properties

Another paper ([Sperberg-McQueen 2004b]) has already illustrated the translation of the purchase-order schema ([W3C 2001a]) into definite-clause form. To model schema-validity assessment properly, however, we need to provide more output than the DCG provides: specifically, we need to provide information about the input document together with some additional properties (the schema infoset contributions). It's possible to do that in DCG notation, but it rapidly becomes cumbersome. We'll use DCTG notation instead; it was devised to handle grammatical attributes more conveniently than DCG, and to separate the semantics more effectively from the syntax [Abramson 1984].
As a first step toward providing grammatical attributes with PSVI information, we will translate the purchase-order schema into DCTG notation, adding grammatical attributes corresponding to some basic information-set properties which are required to be in the input infoset:[3]
  • for Attribute Information Items:
    • [local name]
    • [namespace name]
    • [normalized value]
  • for Element Information Items:
    • [local name]
    • [namespace name]
    • [children]
    • [attributes]
    • [in-scope namespaces] or [namespace attributes]
  • for Namespace Information Items:
    • [prefix]
    • [namespace name]
Additionally, we will add some more interesting properties of the PSVI:
  • type definition name, namespace, anonymous, and type
  • schema specified (schema or infoset)
  • validation attempted (always full)
  • validity (always valid, because when the document is not valid, we fail)
Some further information will also prove convenient for following what's going on (and when we have references to types, we specify both the SCD and the short name of the type):
  • info_item: on elements and attributes, specifies what kind of information item it is (i.e. element or attribute)
As in [Sperberg-McQueen 2004b], the input will be in Anjewierden/Wielemaker form ([Wielemaker 2001]).
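To make the input form concrete, here is a small hand-written term in Anjewierden/Wielemaker form, in which each element is a term element(Name, Attributes, Content). The fragment is abbreviated (so it is not a schema-valid purchase order), whitespace-only text nodes are assumed to have been stripped, and the names sample_po/1 and child_element_names/2 are hypothetical helpers introduced only for this sketch.

```prolog
:- use_module(library(lists)).

% sample_po(-Element): an abbreviated purchase order in AWF.
% Attributes are Name=Value pairs; content is a list of atoms
% (character data) and nested element/3 terms.
sample_po(element('http://www.example.com/PO1':purchaseOrder,
                  [orderDate = '1999-10-20'],
                  [ element(shipTo, [country = 'US'],
                            [ element(name,   [], ['Alice Smith']),
                              element(street, [], ['123 Maple Street'])
                            ]),
                    element('http://www.example.com/PO1':comment, [],
                            ['Hurry, my lawn is going wild!'])
                  ])).

% child_element_names(+Element, -Names): the names of the child
% elements of Element, in document order (text nodes are skipped).
child_element_names(element(_, _, Content), Names) :-
    findall(Name, member(element(Name, _, _), Content), Names).
```

Note that the namespace-qualified names appear as terms of the form URI:LocalName, while unqualified local elements such as shipTo appear as plain atoms; this is the distinction the element rules below match on.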

2.1. Top-level rules for element types

2.1.1. Basic pattern

An element rule will serve to match the element node in the input and get the attributes and contents of each element; from it, we will call routines to check the attributes and content against the complex type. These element rules differ from the earlier DCG rules in two ways: when we call them, we must specify three arguments, not two (the additional argument carries the parsed node), and we provide explicit grammatical attributes for infoset properties. The basic pattern is simple: for any element in namespace N with local name GI and complex type TYPEID, we will construct an appropriate element identifier ELEMID, and the element rule will look like this:
ELEMID ::= [element(N:GI,Lras,Lre)],
  {
    sva_atts_TYPEID(Lras,Lpa,Lpna),
    sva_content_TYPEID(Lre,Lpe)
  }
  <:> info_item(element)
  && attributes(Lpa) 
  && namespace_attributes(Lpna)
  && children(Lpe) 
  && local_name(GI) 
  && namespace_name(N)
  && type_definition_anonymous(Boolean)
  && type_definition_namespace(URI)
  && type_definition_name(NCName)
  && type_definition_type(complex)
  && validation_attempted(full)
  && validity(valid)
.
Later, we will add further grammatical attributes, and use values other than full and valid for invalid elements.
Note that predicates sva_atts_TYPEID and sva_content_TYPEID are not simple calls to the parser but to wrapper predicates which handle some routine bookkeeping. Since the SWI parser returns namespace attributes in the same list as other attributes, while the infoset spec requires that they be listed in different properties, the sva_atts_TYPEID predicate will need to filter the attribute information items into two different lists, one to become the value of the attributes infoset property, and one to become the value of namespace_attributes.
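The filtering step can be sketched in plain Prolog as follows. This is an illustration under stated assumptions, not the code used in Sevastopol: it works on raw Name=Value attribute specifications rather than on parsed nodes, it assumes that namespace declarations appear in the attribute list as xmlns=URI or xmlns:Prefix=URI, and the names is_ns_att/1 and split_ns_atts/3 are hypothetical.

```prolog
:- use_module(library(apply)).   % partition/4

% is_ns_att(+AttSpec): AttSpec is a namespace declaration.
% Assumption: declarations look like xmlns=URI or xmlns:Prefix=URI.
is_ns_att(xmlns = _).
is_ns_att(xmlns:_ = _).

% split_ns_atts(+Lras, -Las, -Lnsas): split the raw attribute
% specifications Lras into ordinary attributes (Las) and namespace
% declarations (Lnsas), preserving the original order in each list.
split_ns_atts(Lras, Las, Lnsas) :-
    partition(is_ns_att, Lras, Lnsas, Las).
```

Given the list [xmlns:po='http://www.example.com/PO1', orderDate='1999-10-20'], split_ns_atts/3 puts the orderDate specification in the first output list and the namespace declaration in the second.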

2.1.2. Elements with complex types

The elements with complex types get these rules:
< 1 Rules for elements with complex types > ≡
/* e_purchaseOrder: grammatical rule for purchaseOrder element.
   e_purchaseOrder(ParsedNode,L1,L2): holds if the difference
      between L1 and L2 (difference lists) is a purchase order
      element in SWI Prolog notation. 
   And so on for the other element types.
*/
e_purchaseOrder ::= [
  element('http://www.example.com/PO1':purchaseOrder,
          Lras,Lre)],
  {
    sva_atts_t_PurchaseOrderType(Lras,Lpa,Lpna),
    sva_content_t_PurchaseOrderType(Lre,Lpe)
  } 
  <:> local_name(purchaseOrder)
  && namespace_name('http://www.example.com/PO1')
  && type_definition_anonymous('false')
  && type_definition_namespace('http://www.example.com/PO1')
  && type_definition_name('PurchaseOrderType')
  && type_definition_type(complex)
  {Common infoset properties for elements in po namespace 2}
  .
e_shipTo_t_PurchaseOrderType ::= [element(shipTo,Lras,Lre)],
  {
    sva_atts_t_USAddress(Lras,Lpa,Lpna),
    sva_content_t_USAddress(Lre,Lpe)
  } 
  <:> local_name(shipTo)
  && namespace_name('')
  && type_definition_anonymous('false')
  && type_definition_namespace('http://www.example.com/PO1')
  && type_definition_name('USAddress')
  && type_definition_type(complex)
  {Common infoset properties for elements in po namespace 2}

  .
e_billTo_t_PurchaseOrderType ::= [element(billTo,Lras,Lre)],
  {
    sva_atts_t_USAddress(Lras,Lpa,Lpna),
    sva_content_t_USAddress(Lre,Lpe)
  } 
  <:> local_name(billTo)
  && namespace_name('')
  && type_definition_anonymous('false')
  && type_definition_namespace('http://www.example.com/PO1')
  && type_definition_name('USAddress')
  && type_definition_type(complex)
  {Common infoset properties for elements in po namespace 2}

  .
e_items_t_PurchaseOrderType ::= [element(items,Lras,Lre)],
  {
    sva_atts_t_Items(Lras,Lpa,Lpna),
    sva_content_t_Items(Lre,Lpe)
  } 
  <:> local_name(items)
  && namespace_name('')
  && type_definition_anonymous('false')
  && type_definition_namespace('http://www.example.com/PO1')
  && type_definition_name('Items')
  && type_definition_type(complex)
  {Common infoset properties for elements in po namespace 2}

  .
e_item_t_Items ::= [element(item,Lras,Lre)],
  {
    sva_atts_t_e_item_t_Items(Lras,Lpa,Lpna),
    sva_content_t_e_item_t_Items(Lre,Lpe)
  } 
  <:> local_name(item)
  && namespace_name('')
  && type_definition_anonymous('true')
  && type_definition_namespace('http://www.example.com/PO1')
  && type_definition_name('t_e_item_t_Items')
  && type_definition_type(complex)
  {Common infoset properties for elements in po namespace 2}

  .

This code is used in < DCTG core version of the purchase order schema 85 >

Note that the type_definition_name property for the item element provides the generated name we use for the type. That this name is not assigned by the schema is clarified by type_definition_anonymous('true'). Some of the elements have namespace_name('http://www.example.com/PO1') and some namespace_name('') because the schema document specifies that local elements should be unqualified (or rather it omits to override the default).
Since the info_item, attributes, namespace_attributes, children, validation_attempted, and validity properties have identical definitions for all element types in the purchase-order namespace, we can factor them out into a single code fragment:
< 2 Common infoset properties for elements in po namespace > ≡
  && info_item(element)
  && attributes(Lpa)
  && namespace_attributes(Lpna)
  && children(Lpe)
  && validation_attempted(full)
  && validity(valid)

This code is used in < Rules for elements with complex types 1 > < Rules for elements with simple types 3 >

2.1.3. Elements with simple types

The rules for elements with simple types are slightly simpler than those for elements with complex types, but follow the same basic pattern.
Since they have simple types, we might be tempted to assume these elements cannot have any attributes, but in fact they can have xsi:type, xsi:nil, xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes, as well as namespace attributes. So we write these element rules with the same basic structure as was used for complex types, except that we use a standard predicate (sva_atts_simpletype) for checking that no attributes outside the xsi namespace were used.
The rules for elements with simple types are:
< 3 Rules for elements with simple types > ≡
e_comment ::= 
  [element('http://www.example.com/PO1':comment,Lras,Lre)],
  {Guard to check attributes and content of strings 4}
  <:> local_name(comment) 
  && namespace_name('http://www.example.com/PO1')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for strings 5}
  .

e_name_t_USAddress ::= [element(name,Lras,Lre)],
  {Guard to check attributes and content of strings 4}
  <:> local_name(name) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for strings 5}
  .

e_street_t_USAddress ::= [element(street,Lras,Lre)],
  {Guard to check attributes and content of strings 4}
  <:> local_name(street) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for strings 5}
  .

e_city_t_USAddress ::= [element(city,Lras,Lre)],
  {Guard to check attributes and content of strings 4}
  <:> local_name(city) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for strings 5}
  .

e_state_t_USAddress ::= [element(state,Lras,Lre)],
  {Guard to check attributes and content of strings 4}
  <:> local_name(state) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for strings 5}
  .

e_zip_t_USAddress ::= [element(zip,Lras,Lre)],
  {
    sva_atts_simpletype(Lras,Lpa,Lpna),
    sva_content_t_xsd_decimal(Lre,Lpe)
  }
  <:> local_name(zip) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for decimals 6}
  .

e_productName_t_e_item_t_Items ::= [element(productName,
    Lras,Lre)],
  {Guard to check attributes and content of strings 4}
  <:> local_name(productName) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for strings 5}
  .

e_quantity_t_e_item_t_Items ::= [element(quantity,
    Lras,Lre)],
  {
    sva_atts_simpletype(Lras,Lpa,Lpna),
    sva_content_t_e_quantity_t_e_item_t_Items(Lre,Lpe)
  }
  <:> local_name(quantity) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  && type_definition_anonymous('true')
  && type_definition_namespace('http://www.example.com/PO1')
  && type_definition_name('t_e_quantity_t_e_item_t_Items')
  && type_definition_type(simple)
  .

e_USPrice_t_e_item_t_Items ::= [element('USPrice',Lras,Lre)],
  {
    sva_atts_simpletype(Lras,Lpa,Lpna),
    sva_content_t_xsd_decimal(Lre,Lpe)
  }
  <:> local_name('USPrice') 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  {PSVI properties for decimals 6}
  .

e_shipDate_t_e_item_t_Items ::= [element(shipDate,Lras,Lre)],
  {
    sva_atts_simpletype(Lras,Lpa,Lpna),
    sva_content_t_xsd_date(Lre,Lpe)
  }
  <:> local_name(shipDate) 
  && namespace_name('')
  {Common infoset properties for elements in po namespace 2}
  && type_definition_anonymous('false')
  && type_definition_namespace(
       'http://www.w3.org/2001/XMLSchema')
  && type_definition_name('date')
  && type_definition_type(simple)
  .

This code is used in < DCTG core version of the purchase order schema 85 >

Just as we factor out the common infoset properties, we can also factor out the checking against frequently used built-in simple types, notably string:
< 4 Guard to check attributes and content of strings > ≡
  {
    sva_atts_simpletype(Lras,Lpa,Lpna),
    sva_content_t_xsd_string(Lre,Lpe)
  }

This code is used in < Rules for elements with simple types 3 >

Similarly, the type identifications for string and decimal are used more than once:
< 5 PSVI properties for strings > ≡
  && type_definition_anonymous('false')
  && type_definition_namespace(
       'http://www.w3.org/2001/XMLSchema')
  && type_definition_name('string')
  && type_definition_type(simple)

This code is used in < Rules for elements with simple types 3 > < Rules for elements with simple types (PV) 183 >

< 6 PSVI properties for decimals > ≡
  && type_definition_anonymous('false')
  && type_definition_namespace(
       'http://www.w3.org/2001/XMLSchema')
  && type_definition_name('decimal')
  && type_definition_type(simple)

This code is used in < Rules for elements with simple types 3 > < Rules for elements with simple types (PV) 183 >

2.2. Rules for attributes

For each complex type, we need to do several things in order to validate all the attributes on each occurrence of that type and provide appropriate nodes and infoset properties:
  • The input structure has namespace attributes and other attributes in the same list, while we need them in separate lists so we can assign them to two different infoset properties. So we need to partition the list of attributes. We can perform the partition either before all other processing, or after; doing it afterwards leads to more compact code in this version of the grammar, so we choose that.
  • For each non-namespace attribute found, we need to validate it: if it is declared, we need to check it against its declared type. If the attribute is declared with a fixed value, we should check that the value given matches the prescribed value. If the attribute is not declared, we should raise an error, but we'll save that for a later layer. For now, we simply fail instead.
  • We need to ensure that attributes required by the complex type are present and that attributes forbidden by the complex type are not present. For any attributes declared with default values, we need to supply an attribute information item with the default value, if the document didn't supply a value. Rather than trying to interleave this with other tasks, we will perform a separate check on attribute occurrences.
  • We need to write the predicate sva_atts_TYPEID to wrap all attribute processing for the complex type TYPEID.
And we want to provide basic infoset properties for the XML attributes, in the form of grammatical attributes in the attribute-grammar sense.

2.2.1. Basic pattern

For each complex or simple type TYPEID, the basic pattern of the attribute-checking rule will be:
sva_atts_TYPEID(Lras,Lpa,Lpna) :-
  lras_TYPEID(LpaAll,Lras,[]),         /* parse w/ grammar */
  partition(LpaAll,LpaPresent,Lpna),   /* partition result */
  attocc_TYPEID(LpaPresent,Lpa).   /* check min, max rules */
The logical variables have the following meanings:
Lpa
List of parsed attributes (i.e. of node() structures of the kind returned by any DCTG rule) for this complex type, including defaulted attributes
Lpna
List of parsed namespace attributes
Lras
List of attribute-value specifications provided by the input structure returned by the SWI-Prolog parser
LpaAll
Combined list of parsed-attribute node() structures for all attributes, both namespace attributes and others
LpaPresent
List of parsed-attribute nodes for attributes explicitly assigned values in the document instance (without defaulted attributes)
For each type, a grammar defining the legal attributes will be constructed; if type dt has attributes an1 and an2, of types st1 and st2 respectively, then the core context-free grammar will have a form like this:
lras_dt ::= [].
lras_dt ::= ras_dt, lras_dt.       /* declared attributes */
lras_dt ::= ras_nsd, lras_dt.   /* namespace declarations */
lras_dt ::= ras_xsi, lras_dt.           /* XSI attributes */

ras_dt ::= [an1=Av], { sva_plf_st1(Av) }.
ras_dt ::= [an2=Av], { sva_plf_st2(Av) }.
Simple types will, of course, have no declared attributes, and the rules for declared attributes and occurrence-checking (together with the rules for individual attributes) will be omitted. Wildcard support can also be added here when needed.
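As a rough illustration of this pattern, the sketch below writes the core grammar as an ordinary DCG, which SWI-Prolog can run without a DCTG translator. The type dt, its attributes an1 and an2, and the two value checks (an integer for an1, any atom for an2) are hypothetical stand-ins, as is the wrapper legal_atts_dt; the XSI rule is omitted for brevity, and the grammatical attributes of the real rules are left out.

```prolog
% Plain-DCG sketch of the attribute-list pattern for a hypothetical type dt.
% The input is a list of Name=Value terms, as in the pattern above.

lras_dt --> [].
lras_dt --> ras_dt, lras_dt.       % declared attributes
lras_dt --> ras_nsd, lras_dt.      % namespace declarations

% Hypothetical checks: an1 must look like an integer, an2 may be any atom.
ras_dt --> [an1=Av], { atom(Av), atom_number(Av, N), integer(N) }.
ras_dt --> [an2=Av], { atom(Av) }.

ras_nsd --> [xmlns=_DefaultNS].
ras_nsd --> [xmlns:_Prefix=_NSName].

% legal_atts_dt(+Lras): true if Lras is a legal attribute list for dt.
legal_atts_dt(Lras) :- phrase(lras_dt, Lras).
```

With these definitions, the query ?- legal_atts_dt([an1='42', xmlns:po='http://www.example.com/PO1']). succeeds, while a list containing an undeclared attribute such as [an3=x] simply fails, which matches the "fail instead of raising an error" policy described above.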

2.2.2. Namespace attributes and XSI attributes

One set of rules for namespace attributes and XSI attributes will suffice:
< 7 Grammar rules for namespace and XSI attributes > ≡
/* ras_nsd: grammatical rule for namespace-attribute 
 * specifications */
ras_nsd ::= [xmlns=DefaultNS]
  <:> info_item(attribute)
  && local_name(xmlns)
  && namespace_name('http://www.w3.org/2000/xmlns/')
  && normalized_value(DefaultNS)
  && prefix('##NONE')
  && namespace(DefaultNS).
ras_nsd ::= [xmlns:Prefix=NSName]
  <:> info_item(attribute)
  && local_name(Prefix)
  && namespace_name('http://www.w3.org/2000/xmlns/')
  && normalized_value(NSName)
  && prefix(Prefix)
  && namespace(NSName).
Continued in <Grammar rules for XSI attributes 8>
This code is used in < Generic DCTG rules for DCTG-encoded schemas 89 >

Note that default namespace declarations do have a namespace_name property, despite not having a prefixed name; this is in accord with Section 2.2 of the Infoset spec, which says “By definition, all namespace attributes (including those named xmlns, whose [prefix] property has no value) have a namespace URI of http://www.w3.org/2000/xmlns/.”
We calculate the properties prefix and namespace for use in maintaining the set of namespace bindings we'll need when serializing the PSVI as XML.
Four attributes are defined in the XSI namespace: type, nil, schemaLocation, and noNamespaceSchemaLocation:
< 8 Grammar rules for XSI attributes [continues 7 Grammar rules for namespace and XSI attributes] > ≡
/* ras_xsi: grammar rule for XSI attribute specifications */
ras_xsi ::= 
  ['http://www.w3.org/2001/XMLSchema-instance':type=Value],
  { sva_plf_t_xsd_qname(Value) }
  <:> local_name(type)
  && type_definition_name('QName')
  && type_definition_anonymous('false')
  {Common properties for xsi attributes 9}
ras_xsi ::= 
  ['http://www.w3.org/2001/XMLSchema-instance':nil=Value],
  { sva_plf_t_xsd_boolean(Value) }
  <:> local_name(nil)
  && type_definition_name('boolean')
  && type_definition_anonymous('false')
  {Common properties for xsi attributes 9}
ras_xsi ::= 
  ['http://www.w3.org/2001/XMLSchema-instance':schemaLocation=Value],
  { sva_plf_t_xsd_list_of_qname(Value) }
  <:> local_name(schemaLocation)
  && type_definition_name('t_a_schemaLocation')
  && type_definition_anonymous('true')
  {Common properties for xsi attributes 9}
ras_xsi ::= 
  ['http://www.w3.org/2001/XMLSchema-instance':noNamespaceSchemaLocation=Value],
  { sva_plf_t_xsd_qname(Value) }
  <:> local_name(noNamespaceSchemaLocation)
  && type_definition_name('QName')
  && type_definition_anonymous('false')
  {Common properties for xsi attributes 9}




These are all in the same namespace, and many of their properties are common:
< 9 Common properties for xsi attributes > ≡
  && info_item(attribute)
  && namespace_name('http://www.w3.org/2001/XMLSchema-instance')
  && normalized_value(Value)
  && type_definition_namespace(
       'http://www.w3.org/2001/XMLSchema')
  && type_definition_type(simple)
  && schema_specified(infoset)
  && validation_attempted(full)
  && validity(valid).

This code is used in < Grammar rules for XSI attributes 8 >

We need predicates to check pre-lexical forms for these types:
< 10 sva_plf rules for built-in types [continues 46 sva_plf rules for built-in types] > ≡
/* QName has no meaningful restrictions on lexical form, so we 
 * don't check anything.  Even the whitespace normalization is
 * pointless in the core grammar. */
sva_plf_t_xsd_qname(PLF) :- 
  ws_normalize(collapse,PLF,_LF),
  atom(PLF).
sva_plf_t_xsd_list_of_qname(PLF) :- 
  ws_normalize(collapse,PLF,_LF),
  atom(PLF).

sva_plf_t_xsd_boolean(PLF) :- 
  ws_normalize(collapse,PLF,LF),
  atom_chars(LF,L),
  lexform_boolean(_,L,[]).



The grammar for Booleans is straightforward:
< 11 Lexical form for boolean > ≡
lexform_boolean ::= bool_true.
lexform_boolean ::= bool_false.
bool_true ::= ['1'].
bool_true ::= [t], [r], [u], [e].
bool_false ::= ['0'].
bool_false ::= [f], [a], [l], [s], [e].

This code is used in < Generic DCTG rules for DCTG-encoded schemas 89 >
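For readers who want to try the boolean grammar without a DCTG translator, here is a minimal plain-DCG sketch of the same four lexical forms. The extra argument carrying the denoted value and the wrapper boolean_value/2 are additions made for this illustration; they are not part of the grammar above.

```prolog
% Plain-DCG counterpart of the boolean lexical-form grammar.
% The argument (true or false) is an addition for illustration.
lexform_boolean(true)  --> [t,r,u,e].
lexform_boolean(true)  --> ['1'].
lexform_boolean(false) --> [f,a,l,s,e].
lexform_boolean(false) --> ['0'].

% boolean_value(+Atom, -Value): Atom is a legal boolean lexical form
% denoting Value.
boolean_value(Atom, Value) :-
    atom(Atom),
    atom_chars(Atom, Chars),
    phrase(lexform_boolean(Value), Chars).
```

The forms are case-sensitive, so boolean_value('TRUE', V) fails just as the DCTG version would.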

2.2.3. Occurrence checking

Each complex type will also have a rule for occurrence-checking, which will take something like the following form (assuming that Lreq, Ldft, and Lnot are lists of required, defaulted, and forbidden attributes):
attocc_dt(LpaPres,LpaAll) :-
  atts_present(LpaPres,Lreq),
  atts_absent(LpaPres,Lnot),
  atts_defaulted(LpaPres,Ldft,LpaAll).
A list of parsed attributes Lpa contains all the attributes in a list Lreq of required attributes if (a) Lreq is empty, or (b1) Lpa contains the head of Lreq and (b2) Lpa contains everything in the tail of Lreq:
< 12 Utilities for checking attribute occurrences > ≡
/* atts_present(Lpa,Lreq):  true if a parsed attribute node
   is present in Lpa for each attribute name in Lreq */
atts_present(_Lpa,[]).
atts_present(Lpa,[HRA|RequiredTail]) :-
  att_present(Lpa,HRA),
  atts_present(Lpa,RequiredTail).

/* An attribute name matches if namespace name and local 
 * name part match */
/* att_present(Lpa,Attname):  true if a parsed attribute node
 * is present in Lpa which has name Attname */
att_present([Pa|_Lpa],NS:Attname) :- 
  Pa^^local_name(Attname), 
  Pa^^namespace_name(NS).
att_present([_Pa|Lpa],Attname) :-
  att_present(Lpa,Attname).
/* no base step: if we reach att_present([],Attname) we want 
 * to fail. */
Continued in <Utility for checking absent attributes 13>, <Utility for providing defaulted attributes 14>
This code is used in < Generic utilities for DCTG-encoded schemas 88 >

The rule for checking forbidden attributes is very similar:
< 13 Utility for checking absent attributes [continues 12 Utilities for checking attribute occurrences] > ≡
/* atts_absent(Lpa,Ltabu): true if no attribute named in 
 * Ltabu is present in Lpa */
atts_absent(_Lpa,[]).
atts_absent(Lpa,[H|T]) :-
  not(att_present(Lpa,H)),
  atts_absent(Lpa,T).
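The two occurrence checks can be tried out in isolation with a simplified stand-in for parsed-attribute nodes. In the sketch below, the term att(NS, Local, Value) is a hypothetical replacement for the node() structures and their ^^ lookups; the predicate names follow the scraps above, but the bodies are plain Prolog.

```prolog
% Occurrence checks over a simplified attribute representation.
% att(NS, Local, Value) is a hypothetical stand-in for a parsed node.

% att_present(+Atts, +NS:Local): some attribute in Atts has that name.
att_present([att(NS, Local, _)|_], NS:Local).
att_present([_|Rest], Name) :-
    att_present(Rest, Name).

% atts_present(+Atts, +Required): every required name occurs in Atts.
atts_present(_, []).
atts_present(Atts, [Name|Names]) :-
    att_present(Atts, Name),
    atts_present(Atts, Names).

% atts_absent(+Atts, +Forbidden): no forbidden name occurs in Atts.
atts_absent(_, []).
atts_absent(Atts, [Name|Names]) :-
    \+ att_present(Atts, Name),
    atts_absent(Atts, Names).
```

For example, with Atts = [att('', partNum, '872-AA')], the goal atts_present(Atts, ['':partNum]) succeeds and atts_absent(Atts, ['':partNum]) fails.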



The rule for providing defaults must go through all of the attributes with defaults; this happens in the atts_defaulted predicate in the usual way of recursion on the list.
< 14 Utility for providing defaulted attributes [continues 12 Utilities for checking attribute occurrences] > ≡
/* atts_defaulted(L1,L2,L3): true if L3 has all the 
 * attributes in L1, plus all of the attributes in L2 which 
 * are not also in L1 */
atts_defaulted(Lpa,[],Lpa).
atts_defaulted(Lpa,[Padft|Ldft],LpaAll) :-
  atts_defaulted(Lpa,Ldft,Lpa2),
  att_merge(Lpa2,Padft,LpaAll).
Continued in <Utility for providing defaulted attributes 15>
This code is used in < Utility for providing defaulted attributes (PV) 239 >

For each of these attributes individually, the default value must be added to the list if a value is not already there; this involves recursion on the list of attributes already present. We expect only ever to call the att_merge predicate when the first two arguments (the list into which the defaulted attribute is to be merged, and the defaulted attribute itself) are instantiated, but experience shows that we run into problems when Prolog backtracks into this predicate (e.g. after it finds an error further along in the XML document and is retrying everything it has done before). When backtracking, Prolog does call this predicate with uninstantiated arguments and then falls into an infinite loop trying to find the namespace_name attribute of an uninstantiated variable. To prevent this loop, we check to ensure that the first two arguments are instantiated, using the standard Prolog predicate nonvar. Strictly speaking, this test has nothing whatever to do with the declarative meaning of the predicate, and it would be preferable to do without it, but it is essential for practical purposes.
< 15 Utility for providing defaulted attributes [continues 14 Utility for providing defaulted attributes] > ≡
/* att_merge(L1,Pa,L2): if Pa is present in L1, then L2 = L1,
   otherwise L2 = L1 + Pa. */
att_merge([],Padft,[Padft]).
att_merge([Pa|Lpa],Padft,[Pa|Lpa]) :-
  nonvar(Pa), nonvar(Lpa), nonvar(Padft),
  Pa^^namespace_name(NS),
  Padft^^namespace_name(NS),
  Pa^^local_name(Lnm),
  Padft^^local_name(Lnm).
/* Pa does not match the default: keep Pa and merge the
 * default into the rest of the list */
att_merge([Pa|Lpa],Padft,[Pa|Lpa2]) :-
  nonvar(Pa), nonvar(Lpa), nonvar(Padft),
  not( (Pa^^namespace_name(NS),
    Padft^^namespace_name(NS),
    Pa^^local_name(Lnm),
    Padft^^local_name(Lnm) ) ),
  att_merge(Lpa,Padft,Lpa2).



The explicit not() in the third rule is similarly intended to prevent the third rule from firing inappropriately during backtracking.[4]
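To see the intended behaviour of defaulting on a small case, the following sketch uses a simplified representation: att(NS, Local, Value) is a hypothetical stand-in for parsed-attribute nodes, and same_name/2 is a helper introduced here for the name comparison. The nonvar guards are left out because the sketch is only meant to be called with ground lists.

```prolog
% Defaulting over a simplified attribute representation.
% att(NS, Local, Value) is a hypothetical stand-in for a parsed node.

% same_name(+A, +B): A and B agree in namespace name and local name.
same_name(att(NS, Local, _), att(NS, Local, _)).

% att_merge(+Atts, +Default, -Merged): Merged is Atts if an attribute
% with the name of Default is already there, else Atts plus Default.
att_merge([], Dft, [Dft]).
att_merge([Pa|Rest], Dft, [Pa|Rest]) :-
    same_name(Pa, Dft).
att_merge([Pa|Rest], Dft, [Pa|Merged]) :-
    \+ same_name(Pa, Dft),
    att_merge(Rest, Dft, Merged).

% atts_defaulted(+Atts, +Defaults, -All): merge every default in turn.
atts_defaulted(Atts, [], Atts).
atts_defaulted(Atts, [Dft|Dfts], All) :-
    atts_defaulted(Atts, Dfts, Atts1),
    att_merge(Atts1, Dft, All).
```

Merging the default att('', country, 'US') into a list that already holds a country attribute leaves the list unchanged; merging it into a list that holds only other attributes appends it at the end.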

2.2.4. Rules for the Purchase-order type

The PurchaseOrderType defines only one attribute, orderDate, of type xsd:date. In addition, we need to accept xsi attributes. No attributes here are required, forbidden, or defaulted, so we don't need any calls to atts_present, atts_absent, or atts_defaulted. Following the patterns described above, this gives us the following definitions for the relevant predicates:
< 16 Attribute handling for PurchaseOrderType > ≡
/* sva_atts_TYPENAME(Lras,Lpa,Lpna): true if Lras contains 
 * an input-form list of attribute specifications which 
 * is legal for complex type TYPENAME, and which 
 * corresponds to the list of parsed attributes Lpa plus
 * the list of parsed namespace attributes Lpna. */

sva_atts_t_PurchaseOrderType(Lras,Lpa,Lpna) :-
  lras_t_PurchaseOrderType(LpaAll,Lras,[]),
  partition(LpaAll,LpaPres,Lpna),
  attocc_t_PurchaseOrderType(LpaPres,Lpa).

lras_t_PurchaseOrderType ::= []
  {Grammatical attributes for empty attribute list 22}.
lras_t_PurchaseOrderType ::= ras_t_PurchaseOrderType^^Pa, 
                             lras_t_PurchaseOrderType^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_t_PurchaseOrderType ::= ras_nsd^^Pa, 
                             lras_t_PurchaseOrderType^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_t_PurchaseOrderType ::= ras_xsi^^Pa, 
                             lras_t_PurchaseOrderType^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.

ras_t_PurchaseOrderType ::= [orderDate=Value],
  { sva_plf_t_xsd_date(Value) }
  {Properties for orderDate attribute 24}.

/* Literally copying the pattern would give us this:

attocc_t_PurchaseOrderType(LpaPres,LpaAll) :-
  atts_present(LpaPres,[]),
  atts_absent(LpaPres,[]),
  atts_defaulted(LpaPres,[],LpaAll).

but that's pointless.  Instead, we'll do the equivalent: */
attocc_t_PurchaseOrderType(L,L).

This code is used in < DCTG core version of the purchase order schema 85 >

2.2.5. White-space normalization of simple types

The rule for the orderDate attribute specifies that whitespace handling (with the keyword collapse) should be done before the attribute value is validated. We haven't done whitespace-normalization yet, so we should stop to define it. We specify a predicate ws_normalize(+kw,+Atom,-Atom), which takes three arguments: a keyword to say what kind of normalization to perform, an atom representing the character string to be normalized, and an atom representing the same string after normalization. (The arguments marked + are expected to be used as input, i.e. the arguments will be instantiated at the time the relation is called; the argument marked - will normally be uninstantiated when the predicate is called and will be bound to an appropriate value. Readers used to other programming languages may think of it, without too much distortion, as a VAR parameter called by reference and used to return the result of a computation.)
There are three values for the keyword, described in the XML Schema 1.0 specification as follows:
  • preserve No normalization is done, the value is not changed (this is the behavior required by [XML 1.0 (Second Edition)] for element content)
This one is easy to implement: just make the third argument (the output argument) identical to the second.
< 17 Utility for whitespace normalization > ≡
/* ws_normalize(Keyword,Input,Output): true if Output is
 * an atom identical to the whitespace-normalized form of 
 * Input, with the whitespace mode indicated by Keyword. */
ws_normalize(preserve,Atom,Atom).
Continued in <Utility for whitespace normalization 18>, <Utility for whitespace normalization 20>
This code is used in < Generic utilities for DCTG-encoded schemas 88 >

The second method of normalization is used in XML 1.0 for CDATA attributes:
  • replace All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
< 18 Utility for whitespace normalization [continues 17 Utility for whitespace normalization] > ≡
ws_normalize(replace,In,Out) :-
  atom_codes(In,Lcin),
  ws_blanks(Lcin,Lcout),
  atom_codes(Out,Lcout).



This one requires an auxiliary predicate to replace all whitespace characters in an atom with blanks; ws_blanks walks through a list, changing each tab, linefeed, or carriage return (characters 9, 10, or 13) to blanks (character 32), and leaving all other characters alone.[5]
< 19 Utility to change whitespace characters to blanks [continues 20 Utility for whitespace normalization] > ≡
/* ws_blanks(A,B): where A has any whitespace, B has a blank */
ws_blanks([],[]).
ws_blanks([9|T1],[32|T2]) :- ws_blanks(T1,T2).
ws_blanks([10|T1],[32|T2]) :- ws_blanks(T1,T2).
ws_blanks([13|T1],[32|T2]) :- ws_blanks(T1,T2).
ws_blanks([H|T1],[H|T2]) :- 
  not(member(H,[9,10,13])), 
  ws_blanks(T1,T2).



The third method of normalization is used in XML 1.0 for non-CDATA attributes:
  • collapse After the processing implied by replace, contiguous sequences of #x20's are collapsed to a single #x20, and leading and trailing #x20's are removed.
< 20 Utility for whitespace normalization [continues 17 Utility for whitespace normalization] > ≡
ws_normalize(collapse,In,Out) :-
  ws_normalize(replace,In,Temp),
  atom_codes(Temp,Lctemp),
  ws_collapse(Lctemp,Lcout),
  atom_codes(Out,Lcout).
Continued in <Utility to change whitespace characters to blanks 19>, <Utility for collapsing whitespace 21>


This method, too, requires an auxiliary predicate, ws_collapse:
< 21 Utility for collapsing whitespace [continues 20 Utility for whitespace normalization] > ≡
/* ws_collapse(A,B): B is like A, with all strings of blanks 
 * collapsed to single blanks, and leading and trailing 
 * blanks stripped. */
/* ws_collapse/2 strips leading blanks, then calls 
 * ws_collapse/3 */
ws_collapse([],[]).
ws_collapse([32|T1],T2) :- 
  ws_collapse(T1,T2).
ws_collapse([H|T1],[H|T2]) :- 
  not(H=32), 
  ws_collapse(internal,T1,T2).

/* ws_collapse/3 walks past non-blanks, and when it hits a 
 * string of blanks, it drops all but the last one before 
 * a non-blank. */
ws_collapse(internal,[],[]).
ws_collapse(internal,[32],[]).
ws_collapse(internal,[H|T1],[H|T2]) :- 
  not(H=32), 
  ws_collapse(internal,T1,T2).
ws_collapse(internal,[32,32|T1],T2) :- 
  ws_collapse(internal,[32|T1],T2).
ws_collapse(internal,[32,H|T1],[32,H|T2]) :- 
  not(H=32), 
  ws_collapse(internal,T1,T2).



2.2.6. Attributes for PurchaseOrderType, continued

We need to provide grammatical attributes for each of the non-terminals in the grammar for parsing XML attributes.
The non-terminal lras_t_PurchaseOrderType carries one grammatical attribute, whose value is the list of parsed-attribute nodes which was matched. In the case of the empty list, the attribute is simple. In the recursion steps, we need to flatten the list; otherwise we end up with a lopsided binary tree, rather than a simple list.
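The shape of these two cases can be guessed from the attributes(L) lookup in the partition predicate and from the analogous children attribute of star_e_item_t_Items in section 2.3: presumably the empty rule yields an empty list and the recursive rule puts Pa in front of the list carried by the inner non-terminal. The plain-DCG sketch below is an assumption along those lines rather than the paper's own scraps; lras_flat, ras_any and the att/2 term are hypothetical names.

```prolog
% Building a flat list of parsed attributes during the recursion.
% In DCTG terms the two cases would presumably read:
%   <:> attributes([])
%   <:> attributes([Pa|L]) ::- Lpa^^attributes(L)

lras_flat([])       --> [].
lras_flat([Pa|Lpa]) --> ras_any(Pa), lras_flat(Lpa).

% ras_any(-Node): accept any Name=Value pair; att/2 stands in for node().
ras_any(att(Name, Value)) --> [Name=Value].
```

Parsing a two-attribute list with phrase(lras_flat(L), ...) binds L to a two-element list, not to a nested structure.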
The orderDate attribute has the usual infoset properties:
< 24 Properties for orderDate attribute > ≡
  <:> info_item(attribute)
  && local_name('orderDate')
  && namespace_name('')
  && normalized_value(Value)
  && type_definition_anonymous('false')
  && type_definition_namespace(
       'http://www.w3.org/2001/XMLSchema')
  && type_definition_name('date')
  && type_definition_type(simple)
  && schema_specified(infoset)
  && validation_attempted(full)
  && validity(valid)

This code is used in < Attribute handling for PurchaseOrderType 16 >

2.2.7. Rules for attributes of other complex types

2.2.7.1. US Address
The USAddress type defines one attribute (country), of type NMTOKEN; it has a fixed value (US).
< 25 Attribute handling for USAddress > ≡
sva_atts_t_USAddress(Lras,Lpa,Lpna) :-
  lras_t_USAddress(LpaAll,Lras,[]),
  partition(LpaAll,LpaPres,Lpna),
  attocc_t_USAddress(LpaPres,Lpa).

lras_t_USAddress ::= []
  {Grammatical attributes for empty attribute list 22}.
lras_t_USAddress ::= ras_t_USAddress^^Pa, 
                     lras_t_USAddress^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_t_USAddress ::= ras_nsd^^Pa, lras_t_USAddress^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_t_USAddress ::= ras_xsi^^Pa, lras_t_USAddress^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.

ras_t_USAddress ::= [country='US']
  <:> info_item(attribute)
  && local_name('country')
  && namespace_name('')
  && normalized_value('US')
  && type_definition_anonymous('false')
  && type_definition_namespace(
       'http://www.w3.org/2001/XMLSchema')
  && type_definition_name('NMTOKEN')
  && type_definition_type(simple)
  && schema_specified(infoset)
  && validation_attempted(full)
  && validity(valid)
.
Continued in <Attribute occurrence checking for USAddress 26>
This code is used in < DCTG core version of the purchase order schema 85 >

Since the country attribute has a fixed value, we need to supply a complete parsed-attribute node for use in case the document instance doesn't supply one. We do this as part of the definition of attocc_t_USAddress.
< 26 Attribute occurrence checking for USAddress [continues 25 Attribute handling for USAddress] > ≡
attocc_t_USAddress(LpaPres,LpaAll) :-
  CountryAtt = node(
    attribute(country),
    [],
    [ (info_item(attribute)),
      (namespace_name('')),
      (local_name('country')),
      (normalized_value('US')),
      (type_definition_anonymous('false')),
      (type_definition_namespace(
        'http://www.w3.org/2001/XMLSchema')),
      (type_definition_name('NMTOKEN')),
      (type_definition_type(simple)),
      (schema_specified(schema)),
      (validation_attempted(full)),
      (validity(valid))
    ]),
  atts_defaulted(LpaPres,[CountryAtt],LpaAll).



2.2.7.2. Items
The complex type t_Items defines no attributes, so its grammar for attributes only has rules for namespace declarations and attributes in the XSI namespace. Since there are no attributes, there are no required, defaulted, or forbidden attributes, so we don't need the usual call to attocc_Type.
< 27 Attribute handling for Items type > ≡
sva_atts_t_Items(Lras,Lpa,Lpna) :-
  lras_t_Items(LpaAll,Lras,[]),
  partition(LpaAll,Lpa,Lpna).

lras_t_Items ::= []
  {Grammatical attributes for empty attribute list 22}.
lras_t_Items ::= ras_nsd^^Pa, lras_t_Items^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_t_Items ::= ras_xsi^^Pa, lras_t_Items^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.

This code is used in < DCTG core version of the purchase order schema 85 >

A similar simplification can be used for simple types.
2.2.7.3. Type t_e_item_t_Items
The complex type t_e_item_t_Items defines the partNum attribute:
< 28 Attribute handling for t_e_item_t_Items > ≡
sva_atts_t_e_item_t_Items(Lras,Lpa,Lpna) :-
  lras_t_e_item_t_Items(LpaAll,Lras,[]),
  partition(LpaAll,LpaPres,Lpna),
  attocc_t_e_item_t_Items(LpaPres,Lpa).

lras_t_e_item_t_Items ::= []
  {Grammatical attributes for empty attribute list 22}.
lras_t_e_item_t_Items ::= ras_t_e_item_t_Items^^Pa, 
                          lras_t_e_item_t_Items^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_t_e_item_t_Items ::= ras_nsd^^Pa, 
                          lras_t_e_item_t_Items^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_t_e_item_t_Items ::= ras_xsi^^Pa, 
                          lras_t_e_item_t_Items^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
Continued in <PartNum attribute 29>
This code is used in < DCTG core version of the purchase order schema 85 >

The grammatical attributes for the partNum attribute illustrate PSVI properties for user-defined types.
< 29 PartNum attribute [continues 28 Attribute handling for t_e_item_t_Items] > ≡
ras_t_e_item_t_Items ::= [partNum=Value],
  { sva_plf_t_SKU(Value) }
  <:> info_item(attribute)
  && local_name('partNum')
  && namespace_name('')
  && normalized_value(Value)
  && type_definition_anonymous('false')
  && type_definition_namespace(
       'http://www.example.com/PO1')
  && type_definition_name('SKU')
  && type_definition_type(simple)
  && schema_specified(infoset)
  && validation_attempted(full)
  && validity(valid)
.

/* one required attribute: partNum */
attocc_t_e_item_t_Items(LpaPres,LpaAll) :-
  atts_present(LpaPres,['':partNum]),
  atts_absent(LpaPres,[]),
  atts_defaulted(LpaPres,[],LpaAll).



2.2.8. Simple types (namespace and XSI attributes)

A single set of rules will suffice for all simple types (string, decimal, integer, date), because by definition simple types have no attributes; any attributes which occur in the instance must be namespace declarations or XSI attributes.
< 30 Attribute handling for simple types > ≡
sva_atts_simpletype(Lras,Lpa,Lpna) :-
  lras_sT(LpaAll,Lras,[]),
  partition(LpaAll,Lpa,Lpna).

lras_sT ::= []
  {Grammatical attributes for empty attribute list 22}.
lras_sT ::= ras_nsd^^Pa, lras_sT^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.
lras_sT ::= ras_xsi^^Pa, lras_sT^^Lpa
  {Grammatical attributes for attribute-list recursion 23}.

This code is used in < DCTG core version of the purchase order schema 85 >

2.2.9. Partitioning the list of attributes

The rule for partitioning the list of parsed attribute nodes must extract the actual list from the node passed as the first argument, and then the partition is easy:
< 31 partition predicate > ≡
partition(LpaAll,LpaPresent,Lpna) :-
  LpaAll^^attributes(L),
  partition2(L,LpaPresent,Lpna).
partition2([],[],[]).
partition2([Pa|Lpa],LpaPres,[Pa|Lpna]) :-
  Pa^^local_name(xmlns), 
  partition2(Lpa,LpaPres,Lpna).
partition2([Pa|Lpa],LpaPres,[Pa|Lpna]) :-
  Pa^^namespace_name('http://www.w3.org/2000/xmlns/'), 
  partition2(Lpa,LpaPres,Lpna).
partition2([Pa|Lpa],[Pa|LpaPres],Lpna) :-
  not(Pa^^local_name(xmlns)),
  not(Pa^^namespace_name('http://www.w3.org/2000/xmlns/')),
  partition2(Lpa,LpaPres,Lpna).

This code is used in < Generic utilities for DCTG-encoded schemas 88 > < Generic utilities for DCTG-encoded schemas (PV) 95 > < Utilities for checking attribute occurrences (2L) 409 >

It might be desirable to add the line
  Pa^^namespace_name('http://www.w3.org/2000/xmlns/'),
to the rule for namespace attributes declaring default namespaces, to avoid problems if xmlns were to appear as a local name in some other namespace. Since all names beginning with xml are reserved, though, it would be illegal for xmlns to appear in an application namespace (other than one defined in the future by W3C), so I have not added this test.
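If the test is made on the namespace name alone, the partition can be sketched very compactly with partition/4 from SWI-Prolog's library(apply). The att(NS, Local, Value) term is again a hypothetical stand-in for parsed-attribute nodes, and split_atts/3 and ns_att/1 are names invented for this illustration.

```prolog
:- use_module(library(apply)).   % partition/4

% ns_att(+Att): Att is a namespace declaration, judged by namespace name.
ns_att(att('http://www.w3.org/2000/xmlns/', _Local, _Value)).

% split_atts(+LpaAll, -LpaPresent, -Lpna): separate namespace
% declarations (Lpna) from all other attributes (LpaPresent).
split_atts(LpaAll, LpaPresent, Lpna) :-
    partition(ns_att, LpaAll, Lpna, LpaPresent).
```

Because each attribute is tested exactly once, this version leaves no choice points behind, at the price of depending on a library predicate where the rules above use plain recursion.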

2.3. Rules for content of complex types

The most conventional-looking part of our DCTG grammar is the representation of the content models. The base context-free grammar in DCTG notation is given below. We add names to the various items on the right-hand side, for use in flattening the lists of children (repeating and optional items otherwise would cause nesting of nodes).
< 32 Rules for purchase-order content models > ≡
content_t_PurchaseOrderType ::= 
  e_shipTo_t_PurchaseOrderType^^S, 
  e_billTo_t_PurchaseOrderType^^B, 
  opt_e_comment^^C, 
  e_items_t_PurchaseOrderType^^I
{Children attribute of t_PurchaseOrder 36}
.
opt_e_comment ::= []
{Empty list of children for opt_e_comment nonterminal 34}
.
opt_e_comment ::= e_comment^^Comm
{Children for opt_e_comment nonterminal 35}
.

content_t_USAddress ::= 
  e_name_t_USAddress^^N, 
  e_street_t_USAddress^^S, 
  e_city_t_USAddress^^C, 
  e_state_t_USAddress^^ST, 
  e_zip_t_USAddress^^Z
{Children attribute of t_USAddress 33}
.

content_t_Items ::= star_e_item_t_Items^^L
{Children attribute of content_t_Items 40}
.
star_e_item_t_Items    ::= []
{Empty list of children for star_e_item_t_Items nonterminal 41}
.
star_e_item_t_Items    ::= 
  e_item_t_Items^^I, 
  star_e_item_t_Items^^L
{Children for star_e_item_t_Items nonterminal 42}
.

content_t_e_item_t_Items ::= 
  e_productName_t_e_item_t_Items^^PN, 
  e_quantity_t_e_item_t_Items^^Q, 
  e_USPrice_t_e_item_t_Items^^USP, 
  opt_e_comment^^C, 
  opt_e_shipDate_t_e_item_t_Items^^S
{Children attribute of t_e_item_t_Items 37}
.

opt_e_shipDate_t_e_item_t_Items ::= []
{Empty list of children for opt_e_shipdate nonterminal 38}
.
opt_e_shipDate_t_e_item_t_Items ::= 
  e_shipDate_t_e_item_t_Items^^S
{Children for opt_e_shipdate nonterminal 39}
.

This code is used in < DCTG core version of the purchase order schema 85 >

The only grammatical attribute we need to calculate for these non-terminals right now is children, which will be used to supply the children property of the parent element. Since we wish to supply a flat list, rather than an arbitrarily deep one-sided binary tree, we can't simply take the node returned by each rule.
Perhaps the simplest to calculate is the children attribute of the content_t_USAddress non-terminal: it's just a list of the children. Since no child is optional, there is no variation.
< 33 Children attribute of t_USAddress > ≡
  <:> children([N,S,C,ST,Z])

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 > < Rules for purchase-order content models (2L) 389 >

More complex, because the comment element is optional, is the children attribute of the content_t_PurchaseOrderType non-terminal. When a comment is present, we want it listed among the children; when it is not present, however, we don't want any dummy node. The opt_e_comment non-terminal, that is, should have a children attribute which is either the empty list
< 34 Empty list of children for opt_e_comment nonterminal > ≡
  <:> children([])

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 >

or a list containing the comment node.
< 35 Children for opt_e_comment nonterminal > ≡
  <:> children([Comm])

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 >

The standard list-concatenation methods can now be used to yield either [S,B,C,I] or [S,B,I]. The simplest is probably to use flatten, which generates, for a list possibly containing lists as elements, a flat list with no nested lists, by replacing each list with its elements.
< 36 Children attribute of t_PurchaseOrder > ≡
  <:> children(Lpe) ::- 
    C^^children(CC), 
    flatten([S,B,CC,I],Lpe)

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 > < Rules for purchase-order content models (2L) 389 >
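The effect of flatten on the two cases is easy to check at the Prolog prompt. In the sketch below, children_po/5 is a hypothetical helper that mirrors the ::- clause above, and the node(...) terms are simplified placeholders for the real parsed-element nodes.

```prolog
:- use_module(library(lists)).   % flatten/2

% children_po(+S, +B, +CC, +I, -Lpe): CC is the children list of the
% optional comment, i.e. either [] or a one-element list.
children_po(S, B, CC, I, Lpe) :-
    flatten([S, B, CC, I], Lpe).
```

With CC = [] the result has three members, and with CC = [node(comment)] it has four; the compound node terms themselves are left intact, since flatten only splices in elements that are lists.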

A similar method is used for the item element, which also has optional children.
< 37 Children attribute of t_e_item_t_Items > ≡
  <:> children(Lpe) ::- 
    C^^children(CC), 
    S^^children(SC), 
    flatten([PN,Q,USP,CC,SC],Lpe)

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 > < Rules for purchase-order content models (2L) 389 >

This requires that the opt_e_shipDate_t_e_item_t_Items non-terminal produce (like opt_e_comment) its own children property:
< 38 Empty list of children for opt_e_shipdate nonterminal > ≡
  <:> children([])

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 >

< 39 Children for opt_e_shipdate nonterminal > ≡
  <:> children([S])

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 >

The items element is just a simple list; its children property can be done using the same methods we used to generate a flat list of attributes, above.
< 40 Children attribute of content_t_Items > ≡
  <:> children(List) ::- L^^children(List)

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 > < Rules for purchase-order content models (2L) 389 >

< 41 Empty list of children for star_e_item_t_Items nonterminal > ≡
  <:> children([])

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 >

< 42 Children for star_e_item_t_Items nonterminal > ≡
  <:> children([I|T]) ::- L^^children(T)

This code is used in < Rules for purchase-order content models 32 > < Rules for purchase-order content models (PV) 241 >

For each complex type, we also need to write the sva_content_TYPEID wrapper which calls the grammar. In the rules which follow, the content_TYPEID predicate parses the content of the element against the grammar for the element's complex type; the Topnode ^^ children(Lpe) clause unifies the parsed children of the element with the variable Lpe, so that it can be used as the value of the element's PSVI children attribute.
< 43 Wrapper predicates (sva_content_TYPE) for complex content > ≡
sva_content_t_PurchaseOrderType(Lre,Lpe) :-
  content_t_PurchaseOrderType(Topnode,Lre,[]),
  Topnode ^^ children(Lpe).
sva_content_t_USAddress(Lre,Lpe) :-
  content_t_USAddress(Topnode,Lre,[]),
  Topnode ^^ children(Lpe).
sva_content_t_Items(Lre,Lpe) :-
  content_t_Items(Topnode,Lre,[]),
  Topnode ^^ children(Lpe).
sva_content_t_e_item_t_Items(Lre,Lpe) :-
  content_t_e_item_t_Items(Topnode,Lre,[]),
  Topnode ^^ children(Lpe).

This code is used in < DCTG core version of the purchase order schema 85 >

2.4. Rules for checking values of simple types

The top-level rules in section 2.1 call rules with names of the form sva_content + TYPEID. These rules are responsible for checking that the character content of the element is a legal pre-lexical form for the type in question.

2.4.1. Rules called from top-level element predicates

There are two kinds of rules to provide (this may be an unnecessary distinction, but it's what the rest of the program is expecting): sva_content_TYPEID (called from element rules) and sva_plf_TYPEID (called from elsewhere).
The rules of the first kind are all similar in structure:
< 44 sva_content rules for built-in Types > ≡
sva_content_t_xsd_string([PLF],[PLF]) :-
  sva_plf_t_xsd_string(PLF).
sva_content_t_xsd_decimal([PLF],[PLF]) :-
  sva_plf_t_xsd_decimal(PLF).
sva_content_t_xsd_integer([PLF],[PLF]) :- 
  sva_plf_t_xsd_integer(PLF).
sva_content_t_xsd_date([PLF],[PLF]) :- 
  sva_plf_t_xsd_date(PLF).

This code is used in < Generic utilities for DCTG-encoded schemas 88 >

The content rules for the user-defined types are simple and follow the same pattern as those of the built-in types.
< 45 Simple-type content rules for purchase-order types > ≡
sva_content_t_SKU([PLF],[PLF]) :- 
  sva_plf_t_SKU(PLF).
sva_content_t_e_quantity_t_e_item_t_Items([PLF],[PLF]) :- 
  sva_plf_t_e_quantity_t_e_item_t_Items(PLF).

This code is used in < DCTG core version of the purchase order schema 85 >

2.4.2. Checking strings

Strings are trivial to check.
< 46 sva_plf rules for built-in types > ≡
/* In our representation of XML, character data is 
 * represented as atoms.  Handling of non-ASCII characters is 
 * OK if they are in UTF8, but the SWI parser currently has 
 * trouble with some named entity references to non-ASCII 
 * characters */
sva_plf_t_xsd_string(LF) :- atom(LF).
Continued in <sva_plf rules for built-in types 10>, <Checking decimal and integer values 47>, <Checking date values 49>, <Checking date values 56>, <Checking date values 57>
This code is used in < Generic utilities for DCTG-encoded schemas 88 >

2.4.3. Checking decimals

Decimals match the pattern [+-]? [0-9]+ ('.' [0-9]*)?, and integers match [+-]? [0-9]+; we'll use DCTG notation to check the lexical form against these patterns:
< 47 Checking decimal and integer values [continues 46 sva_plf rules for built-in types] > ≡
sva_plf_t_xsd_decimal(PLF) :- 
  ws_normalize(collapse,PLF,LF),
  atom_chars(LF,L),
  lexform_decimal(_,L,[]).
sva_plf_t_xsd_integer(PLF) :- 
  ws_normalize(collapse,PLF,LF),
  atom_chars(LF,L),
  lexform_integer(_,L,[]).



< 48 Lexical form for decimal and integer > ≡
lexform_decimal ::= lexform_integer, fractionalpart.
lexform_integer ::= opt_sign, digits.
fractionalpart ::= [].
fractionalpart ::= decimalpoint.
fractionalpart ::= decimalpoint, opt_digits.
opt_sign ::= [].
opt_sign ::= ['+'].
opt_sign ::= ['-'].
decimalpoint ::= ['.'].
opt_digits ::= [].
opt_digits ::= digits.
/* We supply a 'lexval' property on digits, for use in 
 * date checking */
digits ::= digit^^D
  <:> lexval([Dv]) ::- D^^lexval(Dv).
digits ::= digit^^D1, digits^^Dd
  <:> lexval([D1val|Ddval]) ::- 
          D1^^lexval(D1val), 
          Dd^^lexval(Ddval).
digit ::= [Ch], { char_type(Ch,digit) }
  <:> lexval(Ch).

This code is used in < Generic DCTG rules for DCTG-encoded schemas 89 >
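
For readers without a DCTG translator at hand, the same two patterns can be sketched in plain DCG notation, which most Prolog systems support directly. This is only an illustration, not part of Sevastopol: the predicate names (decimal_lex, is_decimal_lex, and so on) are invented here, and the digit test is spelled out with memberchk/2 rather than char_type/2 to keep the sketch independent of any one Prolog system.

```prolog
% Plain-DCG sketch (not DCTG) of the decimal and integer lexical forms.
% All names below are illustrative assumptions, not Sevastopol predicates.
decimal_lex --> integer_lex, fraction_opt.
integer_lex --> sign_opt, digit_seq.

sign_opt --> [].
sign_opt --> [C], { memberchk(C, ['+', '-']) }.

% The fractional part is optional; a bare trailing point is allowed.
fraction_opt --> [].
fraction_opt --> ['.'], digit_seq_opt.

digit_seq_opt --> [].
digit_seq_opt --> digit_seq.

% One or more digits.
digit_seq --> digit_char, digit_seq_opt.

digit_char --> [C],
    { memberchk(C, ['0','1','2','3','4','5','6','7','8','9']) }.

% is_decimal_lex(+Atom): Atom matches [+-]? [0-9]+ ('.' [0-9]*)?
is_decimal_lex(Atom) :-
    atom_chars(Atom, Chars),
    once(phrase(decimal_lex, Chars)).

% is_integer_lex(+Atom): Atom matches [+-]? [0-9]+
is_integer_lex(Atom) :-
    atom_chars(Atom, Chars),
    once(phrase(integer_lex, Chars)).
```

Under these definitions the goals is_decimal_lex('12.50'), is_decimal_lex('-3.') and is_integer_lex('+42') should succeed, while is_decimal_lex('.5') and is_integer_lex('4.2') should fail, since both patterns require at least one digit before any decimal point.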

2.4.4. Checking dates

Date values can be checked fully using an appropriate grammar; the grammatical attributes of the DCTG notation make it easy to express the leap-year constraints as guards. A lexical form for date is OK if the date is OK; the predicate dateok takes the integer values of year, month, and day as arguments.
< 49 Checking date values [continues 46 sva_plf rules for built-in types] > ≡
sva_plf_t_xsd_date(PLF) :- 
  ws_normalize(collapse,PLF,LF),
  atom_chars(LF,Lc),
  lexform_date(_,Lc,[]).



Years may take an optional leading minus sign; their value (the val property) is composed by reading their lexical form as a number (using the standard number_chars predicate).
< 50 Lexical form for year > ≡
lexform_date ::= year^^Y, hyphen, month^^M, hyphen, day^^D,
  { Y^^val(Yv), M^^val(Mv), D^^val(Dv), dateok(Yv,Mv,Dv) }.
Continued in <Lexical form for year 51>, <Lexical form for month 54>, <Lexical form for day of month 55>
This code is used in < Generic DCTG rules for DCTG-encoded schemas 89 >

< 51 Lexical form for year [continues 50 Lexical form for year] > ≡
/* Years must have at least four digits */
yearnum ::= digit^^D1, digit^^D2, digit^^D3, digits^^Dd
  <:> val(Num) ::- D1^^lexval(Dv1),
          D2^^lexval(Dv2),
          D3^^lexval(Dv3),
          Dd^^lexval(Dv4),
          flatten([Dv1,Dv2,Dv3,Dv4],LF),
          number_chars(Num,LF).
year ::= yearnum^^Y
  <:> val(Num) ::- Y^^val(Num).
year ::= ['-'], yearnum^^Y
  <:> val(Num) ::- Y^^val(N), Num is 0 - N.
hyphen ::= ['-'].



We can, in principle, constrain month values in purely grammatical terms:
< 52 Purely grammatical rule for month > ≡
month ::= ['0'], ['1'].
month ::= ['0'], ['2'].
...
month ::= ['0'], ['9'].
month ::= ['1'], ['0'].
month ::= ['1'], ['1'].
month ::= ['1'], ['2'].

This code is not used elsewhere.

It's a little more compact if we use the number_chars predicate and test the number arithmetically.
< 53 Semi-grammatical rule for month > ≡
month ::= ['0'], digit^^D,
  { D^^lexval(Dv), number_chars(V,[Dv]), V > 0 }
  <:> val(V).
month ::= ['1'], digit^^D,
  { D^^lexval(Dv), number_chars(V,[Dv]), V < 3 }
  <:> val(Val) ::- Val is 10 + V.

This code is not used elsewhere.

And it's easiest to follow, probably, if the context-free part of the grammar allows any two-digit number and we have a guard do the range check, arithmetically. So that's what we'll do:
< 54 Lexical form for month [continues 50 Lexical form for year] > ≡
month ::= digit^^D1, digit^^D2,
  { D1^^lexval(Dv1),
    D2^^lexval(Dv2),
    number_chars(Num,[Dv1,Dv2]),
    Num > 0,
    Num < 13 }
  <:> val(Num).



We'll do the same for day of the month:
< 55 Lexical form for day of month [continues 50 Lexical form for year] > ≡
day ::= digit^^D1, digit^^D2,
  { D1^^lexval(Dv1),
    D2^^lexval(Dv2),
    number_chars(Num,[Dv1,Dv2]),
    Num > 0,
    Num < 32 }
  <:> val(Num).
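
The approach of the last two scraps (a permissive context-free rule plus an arithmetic guard) can also be tried out in plain DCG notation, with the values passed as ordinary arguments instead of DCTG properties. The following is a rough sketch under that assumption; every name in it (date_lex, year_lex, two_digits, parse_date, and so on) is made up for the illustration and does not occur in Sevastopol.

```prolog
% Plain-DCG sketch: parse YYYY-MM-DD (with optional leading '-')
% into integers; guards perform the month and day range checks.
% All names are illustrative assumptions.
date_lex(Y, M, D) -->
    year_lex(Y), ['-'], two_digits(M), ['-'], two_digits(D),
    { M > 0, M < 13, D > 0, D < 32 }.

year_lex(Y) --> ['-'], year_digits(Ds), { number_chars(N, Ds), Y is -N }.
year_lex(Y) --> year_digits(Ds), { number_chars(Y, Ds) }.

% Years must have at least four digits.
year_digits([A,B,C|Rest]) --> dig(A), dig(B), dig(C), digs(Rest).

% One or more digits, longest match tried first.
digs([D|Ds]) --> dig(D), digs(Ds).
digs([D])    --> dig(D).

two_digits(N) --> dig(A), dig(B), { number_chars(N, [A,B]) }.

dig(C) --> [C],
    { memberchk(C, ['0','1','2','3','4','5','6','7','8','9']) }.

% parse_date(+Atom, -Y, -M, -D)
parse_date(Atom, Y, M, D) :-
    atom_chars(Atom, Cs),
    once(phrase(date_lex(Y, M, D), Cs)).
```

With these definitions, parse_date('2005-10-21', Y, M, D) should bind Y, M and D to 2005, 10 and 21, while '2005-13-01' and '205-01-01' should be rejected. Like the grammar rules above, the sketch does not yet check the day against the length of the month; that is the job of dateok.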



2.4.5. Checking leap years

No one is ever happy, of course, unless a date field is also checked for correct handling of leap years. The rules below are one way to do this; they are not necessarily the most elegant, but they are relatively easy to understand. A date is OK if the day of the month is between 1 and 28, inclusive (we can rely on the range checks already performed in the grammar rules), or if the day is 29 or 30 and the month is not 2, or if the day is 31 and the month is one of those which have 31 days. Finally, 29 February is OK if the year is divisible by four and its divisibility by 100 and 400 is OK. The latter check is just complex enough to be worth putting into a separate predicate.
< 56 Checking date values [continues 46 sva_plf rules for built-in types] > ≡
dateok(_Y,_M,D) :- D < 29.
dateok(_Y,M,29) :- M =\= 2.
dateok(_Y,M,30) :- M =\= 2.
dateok(_Y,M,31) :- member(M,[1,3,5,7,8,10,12]).
dateok(Y,2,29) :- 
  (Y >= 0 -> Yx = Y ; Yx is Y + 1),   /* adjust for BC */
  0 is Yx mod 4,
  Lc is Yx mod 100,
  L4c is Yx mod 400,
  leapyearcheck(Lc,L4c).



Given that the year is divisible by 4 (which dateok has already checked), it is a leap year if it is not divisible by 100, or else if it is divisible by 400.
< 57 Checking date values [continues 46 sva_plf rules for built-in types] > ≡
/* if C is nonzero, it's not a century year, 
 * so it's a leapyear */
leapyearcheck(C,_Q) :- C =\= 0. 
/* If both numbers are 0, it's a quad-century year, 
 * so it's a leapyear */
leapyearcheck(0,0).            
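
As a cross-check on the rules above, the same constraint can be phrased in terms of the number of days in each month. The sketch below is an alternative formulation for comparison only; the names leap_year, days_in_month and date_ok_alt are hypothetical, and the adjustment for negative years simply copies the one made in dateok.

```prolog
% Sketch: a days-in-month formulation of the date check.
% All names are illustrative assumptions, not Sevastopol predicates.

% leap_year(+Y): Y is a leap year, with the same BC adjustment as dateok.
leap_year(Y) :-
    ( Y >= 0 -> Yx = Y ; Yx is Y + 1 ),
    Yx mod 4 =:= 0,
    ( Yx mod 100 =\= 0 ; Yx mod 400 =:= 0 ),
    !.

% days_in_month(+Y, +M, -Max): month M of year Y has Max days.
days_in_month(Y, 2, 29) :- leap_year(Y), !.
days_in_month(_, 2, 28) :- !.
days_in_month(_, M, 30) :- memberchk(M, [4,6,9,11]), !.
days_in_month(_, M, 31) :- memberchk(M, [1,3,5,7,8,10,12]).

% date_ok_alt(+Y, +M, +D): D is a legal day of month M in year Y.
date_ok_alt(Y, M, D) :-
    days_in_month(Y, M, Max),
    D >= 1,
    D =< Max.
```

For month and day values that have already passed the range checks in the grammar, this formulation is intended to accept the same dates as dateok: 29 February is accepted for 2004 and 2000, for example, but not for 1900.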



2.4.6. Checking SKUs

We also need to have rules for checking the two simple types declared in the purchase-order schema: SKUs and quantities.
The SKU value checking has been seen before (in [Sperberg-McQueen 2004b]); all we do here is translate it from DCG into DCTG notation.
< 58 Value-checking rules for SKU > ≡
sva_plf_t_SKU(PLF) :- 
  ws_normalize(preserve,PLF,LF),
  atom_chars(LF,Charseq),
  lexform_t_SKU(_Structure,Charseq,[]).

lexform_t_SKU ::= sku_decimal_part, hyphen, sku_alpha_part.
sku_decimal_part ::= digit, digit, digit.
sku_alpha_part ::= cap_a_z, cap_a_z.
cap_a_z ::= [Char], { char_type(Char,upper) }.
Continued in <Value-checking rules for quantities 59>
This code is used in < DCTG core version of the purchase order schema 85 >

2.4.7. Checking quantities

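The quantity value checking can rely in part on the rule for integers; the range constraints (the base type is positiveInteger, restricted by a maxExclusive of 100) are then checked arithmetically:
< 59 Value-checking rules for quantities [continues 58 Value-checking rules for SKU] > ≡
sva_plf_t_e_quantity_t_e_item_t_Items(PLF) :- 
  ws_normalize(collapse,PLF,LF),
  atom_chars(LF,Lchars),
  lexform_integer(_,Lchars,[]),
  number_chars(Num,Lchars),
  Num > 0,      /* base type is positiveInteger */
  Num < 100.    /* maxExclusive = 100 */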



2.5. Exposing the PSVI

The PSVI created in the DCTG just defined is available, of course, to Prolog code in the way that DCTG properties are designed to be. This section defines some predicates which exploit that availability by writing out a serial form of the PSVI to allow inspection and processing by XML software. Like all I/O operations in Prolog, they have side effects and have no interesting declarative interpretation.
There is no standard XML form for reflecting the PSVI. The form generated here is based on the suggestions in [Sperberg-McQueen 2002]: it has the same basic information set as the input document, except that extra attributes are added to each element to record the properties added to the infoset by schema-validity assessment. Some of the added attributes record PSVI properties of the element itself, others the PSVI properties of its attributes.

2.5.1. Top-level call

The top-level predicate is write_psvi; it takes a single argument, which is a parsed element node, by convention the one with which schema-validity assessment started. It
  • calls a lower-level predicate to find all the namespaces needed in the document, and return a list of namespace bindings
  • writes out the element's generic identifier using an appropriate namespace prefix
  • calls lower-level predicates to write out the element's attributes, its DCTG properties, and its children
  • writes out an end-tag
< 60 Top-level predicate for writing PSVI > ≡
/* write_psvi(ParsedNode): write top-level element. */
write_psvi(Pn) :-
  XPSVI = 'http://www.w3.org/People/cmsmcq/ns/xpsvi',
  nsbindings(Pn,[ns('##NONE',''),ns(xpsvi,XPSVI)],Nsbs),
  Pn ^^ local_name(Gi),
  Pn ^^ namespace_name(NS),
  Pn ^^ attributes(LPa),
  Pn ^^ namespace_attributes(LPna),
  Pn ^^ children(LCh),
  uname_qname_context(NS,Gi,Nsbs,QN),
  write('<'),
  write(QN),
  psvi_atts(LPa,Nsbs),
  write('  xmlns:xpsvi="'), write(XPSVI), write('"'), nl,
  psvi_nsatts(LPna,Nsbs),
  psvi_props(Pn,Nsbs),
  psvi_attprops(LPa,Nsbs),
  write('>'),
  psvi_children(LCh,Nsbs),
  write('</'),
  write(QN), 
  write('>'),
  nl.
Continued in <Calculating list of active namespace bindings 61>, <Generating a QName from a namespace name and local name, given a list of namespace bindings 62>, <Writing out attributes in PSVI 65>, <Writing out namespace attributes in PSVI 70>, <Writing out PSVI properties for element 71>, <Writing out PSVI properties for attributes 76>, <Writing out children in PSVI 81>
This code is used in < Generic utilities for DCTG-encoded schemas 88 > < Generic utilities for DCTG-encoded schemas (PV) 95 > < Generic utilities for DCTG-encoded schemas (2L) 269 >

2.5.2. Generating current set of namespace bindings

To avoid cluttering the output document with more namespace declarations than necessary, we will reuse the namespace bindings in the document, adding to them only an explicit entry for the default binding of the empty prefix to the unnamed namespace and a binding for xpsvi to “http://www.w3.org/People/cmsmcq/ns/xpsvi”. And (as seen above) we will pass the inherited namespace bindings down when processing the children.
To calculate the namespace bindings to use on any given element, we run through the list of namespace attributes attached to the element, prepending each in turn to the bindings inherited from the environment.
< 61 Calculating list of active namespace bindings [continues 60 Top-level predicate for writing PSVI] > ≡
/* nsbindings(Pn,Inherited,Total): true if Total is a list of 
 * namespace bindings, those attached to Pn first, then
 * the inherited ones. */
nsbindings(Pn,Inherited,Nsbs) :-
  Pn ^^ namespace_attributes(LPna),
  nsbind(Inherited,LPna,Nsbs).

nsbind(Bindings,[],Bindings).
nsbind(Inherited,[Pna | LPna],[ns(Pre,NS) | Nsbs]) :-
  Pna ^^ prefix(Pre),
  Pna ^^ namespace(NS),
  nsbind(Inherited,LPna,Nsbs).
Continued in <Finding one binding for a namespace 64>
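
The shape of the resulting list is perhaps easiest to see without the parsed nodes. In the sketch below, each namespace attribute is replaced by a plain Prefix-NS pair; this stand-in representation, the predicate name local_bindings, and the namespace name 'urn:example:po' are all assumptions made for the illustration.

```prolog
% Sketch of nsbind/3 over plain terms instead of DCTG nodes.
% local_bindings(+Decls, +Inherited, -Total): Decls is a list of
% Prefix-NS pairs standing in for the namespace-attribute nodes of
% one element; Total lists those bindings first, then the inherited ones.
local_bindings([], Inherited, Inherited).
local_bindings([Pre-NS | Decls], Inherited, [ns(Pre,NS) | Rest]) :-
    local_bindings(Decls, Inherited, Rest).
```

Given the declaration po-'urn:example:po' and the inherited list [ns('##NONE','')], the result should be [ns(po,'urn:example:po'), ns('##NONE','')]: the bindings declared on the element itself come first, so that a search from the front of the list finds the nearest declaration.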


2.5.3. Generating QName given namespace bindings

Given a set of bindings, we can calculate a QName for any namespace name + local name pair. If there is a non-null prefix, we concatenate the prefix, a colon, and the local name. If the prefix is given as the keyword “##NONE”, then we calculate an unprefixed name. If there is no binding for the namespace in question (this shouldn't happen, but just in case!), we emit the conventional “{NSName}Localname” form of name.
< 62 Generating a QName from a namespace name and local name, given a list of namespace bindings [continues 60 Top-level predicate for writing PSVI] > ≡
/* uname_qname_context(NS,Localname,Nsbs,QName) */
uname_qname_context(NS,Localname,Nsbs,QName) :-
  binding(Nsbs,NS,Prefix),
  Prefix \= '##NONE',
  Prefix \= '',
  concat_atom([Prefix,':',Localname],QName).
uname_qname_context(NS,Localname,Nsbs,Localname) :-
  binding(Nsbs,NS,'##NONE').
uname_qname_context(NS,Localname,Nsbs,Localname) :-
  binding(Nsbs,NS,'').
/* emergency: spit out a Uname if you have to */
uname_qname_context(NS,Localname,Nsbs,Uname) :-
  not(binding(Nsbs,NS,_Prefix)),
  concat_atom(['{',NS,'}',Localname],Uname).
Continued in <QName generation for attributes 63>
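
The three cases can be seen side by side in the simplified sketch below. It is not a substitute for the scrap above: the name qname_sketch is invented, the lookup uses memberchk/2 and so ignores the question of occluded prefixes, and the atoms are joined with atomic_list_concat/2, which current versions of SWI-Prolog offer in place of the older concat_atom/2.

```prolog
% Simplified sketch of QName generation; ignores occluded prefixes.
% qname_sketch(+NS, +Local, +Bindings, -Name)
% Bindings is a list of ns(Prefix,NS) terms, nearest binding first.
qname_sketch(NS, Local, Bindings, Name) :-
    (   memberchk(ns(Prefix, NS), Bindings)
    ->  (   memberchk(Prefix, ['##NONE', ''])
        ->  Name = Local                         % unprefixed name
        ;   atomic_list_concat([Prefix, ':', Local], Name)
        )
    ;   % no binding at all: fall back to {NSName}Localname
        atomic_list_concat(['{', NS, '}', Local], Name)
    ).
```

With the bindings [ns(po,'urn:example:po'), ns('##NONE','')], the local name item should come out as po:item in the namespace 'urn:example:po', as plain item in the unnamed namespace, and as {urn:example:other}item in the unbound namespace 'urn:example:other'. The namespace names are, again, made up for the example.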


Attributes use slightly different rules. We could simply stick “ns('##NONE','')” onto the beginning of the list of namespace bindings when we call uname_qname_context to generate an attribute name, but it seems likely to be easier to see what's going on if we define a different predicate:
< 63 QName generation for attributes [continues 62 Generating a QName from a namespace name and local name, given a list of namespace bindings] > ≡
/* Attributes use special rules. */
uname_attname_context('',Localname,_Nsbs,Localname).
uname_attname_context('##NONE',Localname,_Nsbs,Localname).
uname_attname_context(NS,Localname,Nsbs,Qname) :-
  NS \= '',
  NS \= '##NONE',
  uname_qname_context(NS,Localname,Nsbs,Qname).



We need a predicate to find a binding for a given namespace name. To make things deterministic, we return only the first binding found.
< 64 Finding one binding for a namespace [continues 61 Calculating list of active namespace bindings] > ≡
/* binding(Nsbs,NS,Prefix) : true iff Prefix is bound
 * to NS in Nsbs. */
binding(Nsbs,NS,Prefix) :-
  binding(Nsbs,NS,[],Prefix).

/* binding/4: return the first binding found for the namespace
 */

/* If the head of the list of bindings is for our NS, and the 
 * prefix is not occluded, then return the prefix. */
binding([ns(Prefix,NS) | _Nsbs],NS,Occluded,Prefix) :-
  not(member(Prefix,Occluded)).

/* If the head of the list of bindings is for our NS, but the 
 * prefix is occluded, then recur. */
binding([ns(BadPrefix,NS) | Nsbs],NS,Occluded,Prefix) :-
  member(BadPrefix,Occluded),
  binding(Nsbs,NS,Occluded,Prefix).

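/* If the head of the list of bindings is not for our NS,
 * then recur, adding its prefix to the list of occluded
 * prefixes: a nearer binding of a prefix hides any later
 * binding of the same prefix. */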
binding([ns(Prefix0,NS0) | Nsbs],NS,Occluded,Prefix) :-
  NS0 \= NS,
  binding(Nsbs,NS,[Prefix0 | Occluded], Prefix).
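
The effect of the occlusion list may be clearer from a small example. The sketch below restates the search as a single clause with an if-then-else; it is meant to behave like binding/3 on the cases shown, but the names first_visible_prefix and visible, and the namespace names, are hypothetical.

```prolog
% Sketch: find the first binding for NS whose prefix has not been
% redeclared by a nearer binding. Names are illustrative assumptions.
% Bindings is a list of ns(Prefix,NS) terms, nearest binding first.
first_visible_prefix(Bindings, NS, Prefix) :-
    visible(Bindings, NS, [], Prefix).

% Seen accumulates the prefixes already passed over; any later
% binding of one of those prefixes is occluded.
visible([ns(P, N) | Rest], NS, Seen, Prefix) :-
    (   N == NS, \+ memberchk(P, Seen)
    ->  Prefix = P
    ;   visible(Rest, NS, [P | Seen], Prefix)
    ).
```

Suppose the bindings are [ns(a,'urn:x'), ns(a,'urn:y'), ns(b,'urn:y')], nearest first. The prefix a is bound to 'urn:y' further out, but the nearer declaration rebinds a to 'urn:x', so a cannot be used for 'urn:y'; the search should skip that binding and return b.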



2.5.4. Writing out attributes

To write out the attributes belonging to an element, we use psvi_atts/2,