Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines a streamable profile of XPath 1.0 suitable for use with XML Signature 2.0.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a W3C Last Call Working Draft of "XML Signature Streaming Profile of XPath 1.0".
A diff-marked version of this specification that highlights changes against the previous version is available. Major changes in this version include:
This document was originally derived from material in an earlier publication of XML Signature 2.0, see http://www.w3.org/TR/2010/WD-xmldsig-core2-20100304/#sec-XPath-2.0.
This document was published by the XML Security Working Group as a Last Call Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-xmlsec@w3.org (subscribe, archives). The Last Call period ends 31 May 2011. All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This is a Last Call Working Draft and thus the Working Group has determined that this document has satisfied the relevant technical requirements and is sufficiently stable to advance through the Technical Recommendation process.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
This document specifies a streamable profile of XPath 1.0 [ XPATH ] for use in XML Signature 2.0 [ XMLDSIG-CORE2 ]. It is a proper subset of XPath 1.0, i.e. any XPath expression that is part of this subset is also a valid XPath 1.0 expression, and when evaluated using the streaming algorithm mentioned here, produces exactly the same results as those produced by a regular DOM based XPath engine. Although this XPath subset has been designed in the context of XML Signature 2.0, it is a general purpose subset and can be used in other contexts too.
The motivation for introducing this profile is outlined here.
XML Signature lets one sign parts of an XML document. In the 1.x versions of XML Signature [ XMLDSIG-CORE1 ], the part to be signed can be identified in one of the following ways:
ID based references This is the simplest and most popular mechanism. An ID attribute is added to the element to be signed, and the signature refers to this element by this ID. However, this approach has certain problems:
XPath Filter Transform This is the original XPath mechanism in XML Signature 1.0. It solves the three problems mentioned above, but it introduces new problems:
An expression such as //chapter (i.e. all chapter descendants) has to be expressed as ancestor-or-self::chapter (i.e. a boolean expression which evaluates to true for a node if that node is, or has an ancestor called, chapter). Not only is this very hard to understand, but some XPaths cannot be expressed like this at all. For example, it is not possible to express /book/chapter[3] in this model.
XPath Filter 2 Transform This was introduced to solve the above problems [ XMLDSIG-XPATH-FILTER2 ]. However, it has a few problems of its own:
XPointer At the time XML Signature 1.0 was written, XPointer supported a full XPath model, but it was still under development. Later on it was split up into multiple specs; the full XPath support remained a Working Draft [ XPTR-XPOINTER ], and only the XPointer element scheme [ XPTR-ELEMENT ] became a W3C Recommendation. The XPointer element scheme does not support generic XPaths; it only allows basic addressing of XML elements, e.g. element(/1/2) identifies the 2nd child of the root element.
XML Signature 2.0 retains all 1.x mechanisms for backwards compatibility, but it introduces a new mechanism, <ds2:Selection>, which can be viewed as a very simplified form of the XPath Filter 2 Transform. It consists of a URI, which can be used for ID based references, and an IncludedXPath and ExcludedXPath for inclusions and exclusions. The result of the selection is the subtrees identified by the included XPath, minus the subtrees identified by the excluded XPath. IncludedXPath and ExcludedXPath take a profile of XPath, and it is this profile that this document describes.
This 2.0 selection mechanism has all the advantages of the XPath Filter 2 Transform, and additionally it is designed to support streaming. However, it does restrict the kind of selection: only subtrees with one round of subtree exclusion can be selected. This restriction is required for high performance.
The XPath profile defined in this document is streamable with single-pass pre-order XPath recognition. This means:
All the XPaths to be evaluated are known at the beginning of the one-pass.
It should be possible to evaluate these XPaths on large XML documents without having to load the entire document into memory. The implementation should read the XML one chunk at a time, and do a single forward-only pass over the document.
More specifically, the XPath engine should work off a streaming XML parser. A streaming parser is a software module that reads the XML document and constructs a stream of XML events like "beginElement", "text", "endElement" etc. [ XML-PARSER-STAX ] is an example of a streaming parser. At any point the streaming parser only has the current event in memory, so the XPath engine can only work off the current event and some limited state, e.g. the ancestors of the current element. The Algorithm section describes one possible algorithm to evaluate this XPath subset with a streaming parser. In this algorithm each of the XPaths is converted into a state machine during initialization; then, as the XML input document is parsed with a streaming parser, the resultant XML events are fed as inputs into these state machines. The state machines determine whether the current XML event is accepted by the XPath or not.
The XPath itself might select a large portion of the document, or even the whole document. The result of the XPath evaluation should not be loaded into memory all at once if it is large. Instead it should be pipelined to the processing stage, which in the case of XML Signature 2.0 is canonicalization; i.e. an XML Signature 2.0 implementation should be able to select portions of the XML document using XPath, canonicalize the selected sections and then compute a digest, all in a pipeline with a limited amount of memory.
The XML document may have very large text nodes. The XPath engine should not be required to load such large nodes in their entirety. Instead they should be split up into multiple text nodes, and processed one by one.
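The event-stream model in the list above can be illustrated with a small sketch. It is not part of this specification: it uses Python's `xml.parsers.expat` as a stand-in for a streaming parser such as StAX, and the function name `stream_events` and the event tuples are hypothetical names chosen for illustration. The document is fed in chunks, and the consumer only ever sees the current event; a long text node may arrive as several "text" events.

```python
import xml.parsers.expat


def stream_events(chunks):
    """Yield ('start', name, attrs), ('text', data) and ('end', name) events
    from an iterable of byte chunks, without building a document tree."""
    pending = []  # events produced by the most recent chunk only
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: pending.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: pending.append(("end", name))
    parser.CharacterDataHandler = lambda data: pending.append(("text", data))

    for chunk in chunks:
        parser.Parse(chunk, False)  # more input may follow
        yield from pending
        pending.clear()
    parser.Parse(b"", True)         # signal the end of the document
    yield from pending
```

A consumer would typically iterate over `stream_events(...)` and keep only the state it needs (for example the names of the open ancestors), discarding each event once it has been handled.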
In the context of XML Signature, streaming is absolutely essential for XML gateways which need to perform XML Signature and Encryption operations for messages on the wire. It is also important for performance sensitive applications; streaming improves performance by conserving memory, which greatly reduces memory allocation, deallocation and garbage collection.
For XML Signatures it is not only the XPath expressions that need to be evaluated in streaming mode, but the rest of the signature processing as well (e.g. canonicalization and digesting also need to be performed in streaming mode). For example, a streaming Signature processor could compute Reference digests for 2.0 Signatures as follows:
Initialize XPath engines for each of the <IncludedXPath> and <ExcludedXPath> for each <Reference>.
Initialize a "Selector", "Canonicalizer" and a "Digestor" for each Reference and put them into a pipeline.
The "Selector" takes as input XML events generated by the Parser. It checks if the XML event is either a descendant of the subtree identified by the <Selection>'s URI attribute or accepted by the IncludedXPath engine, and is not accepted by the ExcludedXPath engine. Exclusions always trump Inclusions, and Exclusions also apply to ID references. If the check passes, it passes on the XML event to the "Canonicalizer". Note that a <Selection> can contain either the URI attribute or an <IncludedXPath> subelement, but not both. In either case it can optionally contain the <ExcludedXPath> subelement.
The "Canonicalizer" inputs XML events passed on by the Selector, and emits byte arrays for each event, and sends them to the "Digestor".
The "Digestor" takes byte arrays and computes a running digest.
Start parsing the current XML document using a streaming XML parser, and feed the XML events to each of the Reference processing pipelines.
At the end of parsing the pipelines will contain the computed digest for each Reference. Note that the same XML event may be accepted by more than one Reference, and hence be included in multiple digests.
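The steps above can be sketched as a small pipeline. This is only an illustration under several assumptions, not the processing model of XML Signature 2.0: the class and function names are hypothetical, the two XPath engines of each Reference are replaced by plain callables that receive the path of open element names and the attributes of the current element, the "canonicalizer" is a toy serializer rather than Canonical XML 2.0, and SHA-256 is picked arbitrarily as the digest.

```python
import hashlib
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class Digestor:
    """Computes a running digest over the byte arrays it receives."""
    def __init__(self):
        self._hash = hashlib.sha256()  # algorithm chosen for illustration only

    def write(self, data):
        self._hash.update(data)

    def hexdigest(self):
        return self._hash.hexdigest()


class ToyCanonicalizer:
    """Turns each event into bytes. NOT Canonical XML 2.0, only a stand-in."""
    def __init__(self, sink):
        self.sink = sink

    def start(self, name, attrs):
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in sorted(attrs.items()))
        self.sink.write(f"<{name}{attr_text}>".encode("utf-8"))

    def text(self, data):
        self.sink.write(escape(data).encode("utf-8"))

    def end(self, name):
        self.sink.write(f"</{name}>".encode("utf-8"))


class Selector:
    """Passes on events inside an included subtree unless an exclusion is active."""
    def __init__(self, included, excluded, sink):
        self.included, self.excluded, self.sink = included, excluded, sink
        self.path = []        # names of the currently open elements
        self.in_depth = None  # depth at which the active included subtree starts
        self.ex_depth = None  # depth at which the active excluded subtree starts

    def _selected(self):
        return self.in_depth is not None and self.ex_depth is None

    def start(self, name, attrs):
        self.path.append(name)
        if self.in_depth is None and self.included(self.path, attrs):
            self.in_depth = len(self.path)
        if self.ex_depth is None and self.excluded(self.path, attrs):
            self.ex_depth = len(self.path)
        if self._selected():
            self.sink.start(name, attrs)

    def text(self, data):
        if self._selected():
            self.sink.text(data)

    def end(self, name):
        if self._selected():
            self.sink.end(name)
        if self.ex_depth == len(self.path):
            self.ex_depth = None
        if self.in_depth == len(self.path):
            self.in_depth = None
        self.path.pop()


class FanOut(xml.sax.ContentHandler):
    """Feeds every parser event to each Reference pipeline."""
    def __init__(self, pipelines):
        super().__init__()
        self.pipelines = pipelines

    def startElement(self, name, attrs):
        plain = dict(attrs.items())
        for p in self.pipelines:
            p.start(name, plain)

    def characters(self, content):
        for p in self.pipelines:
            p.text(content)

    def endElement(self, name):
        for p in self.pipelines:
            p.end(name)


def digest_references(xml_bytes, references):
    """references: list of (included, excluded) callables, one pair per Reference."""
    digestors = [Digestor() for _ in references]
    pipelines = [Selector(inc, exc, ToyCanonicalizer(d))
                 for (inc, exc), d in zip(references, digestors)]
    xml.sax.parseString(xml_bytes, FanOut(pipelines))
    return [d.hexdigest() for d in digestors]
```

Each pipeline keeps only a few integers and the list of open element names, so memory use does not grow with the size of the selected subtree. Because every event is offered to every pipeline, the same event can contribute to more than one digest.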
For simplicity, the above steps assume that all the <Reference>s have a <Selection> of Type="http://www.w3.org/2010/xmldsig2#xml", and that each of the <Selection> URIs is a same-document reference. If a <Selection> URI refers to an external resource, the URI should be dereferenced to fetch the external XML document, and the XPaths should be evaluated on this external document instead of the current document.
Note that it is not always possible to apply or verify XML Signatures in a one-pass streaming fashion. For instance, the verification of an enveloped signature requires an XPath for selection that matches an element that is located prior to the XML Signature itself (in document order). Hence, on signature verification, the selected elements are processed by the streaming parser before the selection XPath itself gets parsed. Example:
    <Document>
      <DataBlock1 />
      <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
        <SignedInfo>
          [...]
          <Reference>
            <Transforms>
              <Transform Algorithm="http://www.w3.org/2010/xmldsig2#transform">
                <dsig2:Selection xmlns:dsig2="http://www.w3.org/2010/xmldsig2#"
                                 type="http://www.w3.org/2010/xmldsig2#xml"
                                 URI="" />
                [...]
              </Transform>
            </Transforms>
            [...]
          </Reference>
        </SignedInfo>
        [...]
      </Signature>
      <DataBlock2 />
    </Document>
As can be seen, the XML Signature selects the whole document, hence all XML elements therein must be processed on signature verification. However, when parsing this document using a streaming approach, the verifying application might not know in advance which parts of the document are protected by the XML Signature. Hence, it will start parsing the document to extract the XPath expressions used for selection, but once it encounters that information, all the elements processed before have already been processed and dismissed (such as the <Document> and <DataBlock1> elements). Thus, these elements have not been digested, and hence there is no way to verify such an XML Signature in a one-pass streaming fashion.
Note that this impossibility of one-pass streaming does not only affect enveloped signatures. For instance, an XML Signature verification with a selection of

    <dsig2:Selection type="http://www.w3.org/2010/xmldsig2#xml"
                     xmlns:dsig2="http://www.w3.org/2010/xmldsig2#"
                     URI="" >
      <dsig2:IncludedXPath>
        //DataBlock1
      </dsig2:IncludedXPath>
    </dsig2:Selection>

would also fail due to the same issue, even though it is not an enveloped signature. The same holds for ID-based selection if the selected elements occur prior to the XML Signature in document order.
These problems can be alleviated by doing the verification in two passes, the first pass merely scanning the document for the <Signature>, and the second pass actually evaluating the XPath, canonicalizing and computing the digest. Applications that require pure one-pass processing should avoid backward references of any kind.
Apart from streaming, the XPath profile also needs to satisfy the following requirements:
The profile should produce results that are compatible with the C14N 2.0 data model, i.e. it should only result in element nodes or attribute nodes (but not xml: attributes and namespace attributes).
This XPath profile should include some of the known usages of XPath in XML Signatures.
@SOAP:actor attribute matches a certain value. Refer to section 4.1.3 of [ EBXML-MSG ].
GovTalkMessage/Body subtree, but exclude the GovTalkMessage/Body/IRevenvelope/IRHeader/IRmark
The following tables define this XPath profile. It is expressed as a restricted version of the XPath 1.0 grammar, and the rule numbers here match the rule numbers in the XPath 1.0 grammar in [ XPATH ]. Although this grammar appears to deviate from the [ XPATH ] grammar, it is in fact a proper subset of XPath 1.0, i.e. any XPath expression defined by this grammar is also a valid XPath 1.0 expression and has exactly the same meaning as that defined by the XPath 1.0 specification.
Grammar | Explanation |
---|---|
XPath ::= (AbsoluteLocationPath '|' )* AbsoluteLocationPath | e.g. /a/b | //a[@c] . Note: [ XPATH ] allows a generic Expr at the top level, e.g. Relative Location Paths. |
Grammar | Explanation |
---|---|
AbsoluteLocationPath ::= '/' RelativeLocationPath? | AbbreviatedAbsoluteLocationPath RelativeLocationPath ::= Step | RelativeLocationPath '/' Step | AbbreviatedRelativeLocationPath AbbreviatedAbsoluteLocationPath ::= '//' RelativeLocationPath AbbreviatedRelativeLocationPath ::= RelativeLocationPath '//' Step |
.
Double
slashes
are
.
|
Step ::= AxisSpecifier NameTest RestrictedPredicate* | AbbreviatedStep AxisSpecifier ::= AxisName '::' | AbbreviatedAxisSpecifier AbbreviatedAxisSpecifier ::= '@'? AxisName ::= 'attribute' | 'child' | 'descendant' | 'descendant-or-self' | 'following' | 'following-sibling' | 'self' NameTest ::= '*' | NCName ':' '*' | QName |
The ancestor, ancestor-or-self, preceding and preceding-sibling axes are not streamable, so they are disallowed. namespace is also not streamable, so it is disallowed too. This Step is a restricted form of the Step in XPath 1.0 in two aspects. First, it only allows a restricted set of Predicate expressions that use attributes only (this restriction is described in the next section); second, it doesn't allow NodeTests. For example, /a/b[c/d] is not allowed. These kinds of XPath expressions cannot be streamed in general: in /a/b[c/d], 'b' may have a lot of children, with 'c' being the last one. To determine whether 'b' is included or not, the XPath processor needs to traverse all the children of 'b', searching for the existence of 'c'. By the time it finds 'c', all the previous children of 'b' have already been passed, and there is no way to rewind back to the beginning of 'b'. |
Grammar | Explanation |
---|---|
RestrictedPredicate ::= '[' AttributeExpr ']' AttributeExpr ::= OrExpr OrExpr ::= AndExpr | OrExpr 'or' AndExpr AndExpr ::= EqualityExpr | AndExpr 'and' EqualityExpr EqualityExpr ::= RelationalExpr | EqualityExpr '=' RelationalExpr | EqualityExpr '!=' RelationalExpr RelationalExpr ::= AdditiveExpr | RelationalExpr '<' AdditiveExpr | RelationalExpr '>' AdditiveExpr | RelationalExpr '<=' AdditiveExpr | RelationalExpr '>=' AdditiveExpr AdditiveExpr ::= MultiplicativeExpr | AdditiveExpr '+' MultiplicativeExpr | AdditiveExpr '-' MultiplicativeExpr MultiplicativeExpr ::= UnaryExpr | MultiplicativeExpr MultiplyOperator UnaryExpr | MultiplicativeExpr 'div' UnaryExpr | MultiplicativeExpr 'mod' UnaryExpr UnaryExpr ::= PrimaryExpr | AttributeReference | '-' UnaryExpr AttributeReference ::= 'attribute' '::' NameTest | '@' NameTest |
An AttributeReference is a reference to an attribute of the current element. |
PrimaryExpr ::= VariableReference | '(' AttributeExpr ')' | Literal | Number | FunctionCall Literal ::= '"' [^"]* '"' | "'" [^']* "'" Number ::= Digits ('.' Digits?)? | '.' Digits Digits ::= [0-9]+ FunctionCall ::= FunctionName '(' ( Argument ( ',' Argument )* )? ')' Argument ::= AttributeExpr FunctionName ::= QName - NodeType VariableReference ::= '$' QName |
A PrimaryExpr can be a literal e.g. "foo" or a number e.g. 23 or a variable reference e.g. $var1 or a function call e.g. sum(23, $price). It can also be a complete AttributeReference. node(), comment(), text(), processing-instruction() are reserved names, and cannot be used for function names. |
Node set functions |
Note: All of these functions are only allowed inside a predicate. A predicate's expression can only involve attribute nodes of the current element. Functions can also be used inside this expression, but a function's arguments also have to be expressions involving attribute nodes. There is no way to use elements, text nodes, comments and processing instructions in predicate expressions.
The "string-value" of an attribute
The string, number and boolean functions are all supported. However the no argument forms of |
Note: The descendant and related axes can be exploited by denial of service attacks. See section "XPath selection that causes denial of service in streaming mode" in [ XMLDSIG-BESTPRACTICES ].
This section explains the profile with some XPath expressions that are part of this profile and some that aren't.
All the XPath examples below are based on the following XML document.
    <book>
      <foreword>
      </foreword>
      <chapter type="preface">
      </chapter>
      <chapter>
        <title>Hybridism</title>
      </chapter>
      <chapter>
      </chapter>
    </book>
Examples of XPath expressions that are included in the profile.
# | Example | Description | XML Dsig 2.0 |
---|---|---|---|
1 | /book/chapter | all chapter children of book | Y |
2 | /book/chapter[3] | third chapter child of book | Y |
3 | /book/chapter[@type="preface"] | all chapter children of book that have a type attribute with value preface | Y |
4 | /book/chapter[@type="preface"][1] | the first chapter child of book that has a type attribute with value preface | Y |
5 | /book/chapter[2]/title[1] | the first title child of the second chapter child of book | Y |
6 | /book/chapter[contains(@type,"pre")] | all chapter children of book that have an attribute type whose value contains the string "pre" | Y |
7 | /child::book/child::chapter[contains(attribute::type,"pre")] | non abbreviated form of the above | Y |
8 | | chapter children of book whose position is an odd number, i.e. the odd numbered chapters of the book | Y |
9 | /book/chapter[position() mod 2 != 0][@type="preface"] | all the odd numbered chapters whose type is "preface" | Y |
10 | //chapter | all chapter descendants | Y |
11 | /book/chapter | /book/foreword | all the chapter children of book and all the foreword children of book | Y |
12 | //* | all the element nodes in the document | Y. Note: when this is used to identify a selection in XML Dsig 2.0, it is exactly equivalent to "/*" which selects only the document root element. |
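To make the streamability of rows such as 1, 2 and 5 concrete, the following sketch matches a very small part of the profile in one forward pass. It is an illustration only, not the algorithm of this specification: it handles absolute paths made of child steps with an optional numeric position predicate, the names `parse_path`, `stream_match` and `BOOK` are hypothetical, and `xml.etree.ElementTree.iterparse` stands in for a streaming parser. The only state kept is one flag and one set of sibling counters per open element.

```python
import io
import re
import xml.etree.ElementTree as ET

# The sample document from this section, without insignificant whitespace.
BOOK = (b"<book><foreword/><chapter type='preface'/>"
        b"<chapter><title>Hybridism</title></chapter><chapter/></book>")

STEP = re.compile(r"^([\w.-]+|\*)(?:\[(\d+)\])?$")


def parse_path(xpath):
    """Parse '/a/b[2]' into [('a', None), ('b', 2)]; anything else is rejected."""
    if not xpath.startswith("/") or xpath.startswith("//"):
        raise ValueError("only absolute child-axis paths are handled in this sketch")
    steps = []
    for text in xpath[1:].split("/"):
        m = STEP.match(text)
        if m is None:
            raise ValueError(f"unsupported step: {text!r}")
        name, pos = m.groups()
        steps.append((name, int(pos) if pos else None))
    return steps


def stream_match(xml_bytes, xpath):
    """Return (tag, position among same-named siblings) for each matching element."""
    steps = parse_path(xpath)
    matched = [True]  # matched[d]: the open element at depth d matches steps[:d]
    counters = [{}]   # counters[d]: child counts seen so far under that element
    hits = []
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            depth = len(matched)  # 1 for the document element
            siblings = counters[-1]
            siblings[elem.tag] = siblings.get(elem.tag, 0) + 1
            siblings["*"] = siblings.get("*", 0) + 1
            ok = False
            if matched[-1] and depth <= len(steps):
                name, pos = steps[depth - 1]
                ok = name in ("*", elem.tag) and (pos is None or pos == siblings[name])
            if ok and depth == len(steps):
                hits.append((elem.tag, siblings[elem.tag]))
            matched.append(ok)
            counters.append({})
        else:
            matched.pop()
            counters.pop()
            elem.clear()  # drop content that has already been passed
    return hits
```

The decision for each element is taken at its start event, using only the element name and counters gathered from earlier siblings. This is why position predicates stay streamable while a predicate such as [title="Hybridism"] does not: the latter depends on content that has not been read yet.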
These are examples of XPath expressions that are NOT included in the profile.
# | Example | Description | XML Dsig 2.0 |
---|---|---|---|
1 | /book/chapter[title="Hybridism"] | all chapter children of book that have a title sub element with value Hybridism | N: expressions can only involve attributes. |
2 | (/book)/chapter | Evaluate the (/book) expression and set that to the context node, and get the chapter child of that context node. | N: the top level expression cannot have parentheses or any other operators except the union operator "|". |
3 | count(/book/chapter) | count the number of chapter children of book | N |
4 | chapter | all chapter element children of the context node | N: relative location paths are not allowed. |
5 | . | The context node. | N: relative location paths are not allowed. |
6 | /book/chapter/title/ancestor-or-self::chapter | the chapter ancestor of /book/chapter/title | N: Only child, descendant and self axes are allowed. |
7 | /book/chapter/title/text() | the text child of /book/chapter/title | N: text, comment and processing-instruction nodes cannot be selected. |
8 | id("i1") | elements that have an ID whose value is "i1" | N: the id() function is not allowed. |
9 | /book[chapter/title] | the book element if it has a chapter/title grandchild | N: only attributes are allowed in predicates. |
10 | /book/*[local-name(self::node()) = "chapter"] | the children of the book element whose local name is "chapter" | N: only attributes are allowed in predicates. |
11 | /book/chapter[2]/node() | all the child nodes of the second chapter of the book | N: node() is not allowed. |
12 | /book/chapter or /book/foreword | boolean result is true, i.e. either of the location paths evaluates to non empty. | N: the or operator is not allowed at top level. The top level expression can only be a union of location paths. |
This section outlines an algorithm for a Streaming XPath engine that can execute this XPath subset. It is NOT NORMATIVE.
For parsing: A streaming XML Parser, e.g. [ XML-PARSER-STAX ], which will produce events like StartElement, EndElement, TextNode etc. This event stream will be the input for the XPath engine. The StartElement event includes all the attributes in that element. A streaming XML Parser may break up a large text node into multiple TextNode events.
Split the XPath expression at the union operator "|", i.e. break up the locationPath | locationPath | .. into individual location paths.
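Breaking an expression into its individual location paths can be sketched as below. The function name `split_union` is hypothetical. The sketch assumes the expression already conforms to this profile, where "|" can only occur at the top level or inside a string literal, and it relies on XPath 1.0 literals having no escape sequences.

```python
def split_union(xpath):
    """Split an XPath of this profile at each top-level '|' operator."""
    paths, current = [], []
    quote = None  # quote character of the string literal we are inside, if any
    for ch in xpath:
        if quote is not None:
            if ch == quote:
                quote = None      # the literal ends here
            current.append(ch)
        elif ch in "\"'":
            quote = ch            # a literal starts here
            current.append(ch)
        elif ch == "|":
            paths.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    paths.append("".join(current).strip())
    return paths
```

Each resulting location path can then be handed to its own state machine, and a node is accepted by the whole expression if any one of them accepts it.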
Dated references below are to the latest known or appropriate edition of the referenced work. The referenced works may be subject to revision, and conformant implementations may follow, and are encouraged to investigate the appropriateness of following, some or all more recent editions or replacements of the works cited. It is in each case implementation-defined which editions are supported.