Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is an updated Public Working Draft of "XML Security 2.0 Requirements and Design Considerations". Changes from the previous published version include:
This document includes material that was published previously for early feedback in the document titled "XML Signature Transform Simplification: Requirements and Design", see http://www.w3.org/TR/2009/WD-xmldsig-simplify-20090730/ .
This document was published by the XML Security Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-xmlsec@w3.org (subscribe, archives). All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document presents requirements and design options for XML Security 2.0, including Canonical XML 2.0 and XML Signature 2.0.
The Reference processing model and associated transforms currently defined by XML Signature [XMLDSIG-CORE] are very general and open-ended. This complicates implementation and allows for misuse, leading to performance and security difficulties. Support for arbitrary canonicalization algorithms, and the complexity of the existing algorithms in order to meet various generic requirements is also a source of problems. Current experience with the use of XML Signature suggests that a simplified reference, transform, and canonicalization processing model would address the most common use cases while improving performance and reducing complexity and security risks [XMLSEC-NEXTSTEPS-2007], [XMLDSIG-COMPLEXITY]. This document outlines a proposed change to the XML Signature processing model to achieve these goals. It also outlines use cases and the new requirements associated with the suggested changes.
Rather than adding an additional constrained processing model, the goal is to provide an actual replacement for the generically extensible model that exists now. The general approach is to define a new transformation model while allowing use of the previous model where warranted. This allows a more constrained model going forward, while enabling existing use cases to continue to be supported.
The following design principles will be used to guide further development of XML Security, including XML Signature, XML Encryption and Canonical XML. These principles are intended to encourage consistent design decisions, to provide insight into design rationale and to anchor discussions on requirements and design. This list includes items from the original requirements for XML Signature [XMLDSIG-REQUIREMENTS] as well as general principles from EXI [EXI]. Listed in alphabetical order:
Backward compatibility should not be broken unnecessarily. Versioning should be clearly considered. Consideration must be given, for example, for interoperability with the First and Second Editions of XML Signature [XMLDSIG-CORE].
XML Security must be consistent with the Web Architecture [WEBARCH].
XML Security should enable efficient implementations, in order to remove barriers to adoption and use.
One of the primary objectives of XML Signature is to support a wide variety of use cases requiring digital signatures, including situations requiring multiple signatures, counter-signatures, and signatures covering multiple items. Extensibility should be possible, but by default options should be constrained when the flexibility is not needed.
To reach the broadest set of applications, reduce the security threat footprint and improve efficiency, simple, elegant approaches are preferred to large, analytical or complex ones.
Recognize pragmatic issues, including recognizing that software might be implemented in layers, with a security layer independent of an application layer.
Existing open standards should be reused where possible, as long as other principles can be met.
XML Security should adhere to security best practices, and minimize the opportunities for threats based on XML Security mechanisms.
XML Security must integrate well with existing XML technologies and be compatible with the XML Information Set [XML-INFOSET], in order to maintain interoperability with existing and prospective XML specifications.
XML Signatures should themselves be self-describing first class XML objects [XMLDSIG-REQUIREMENTS]. This means that XML Signatures can be referenced via URI and used in other operations. For example, an XML Signature may be signed or encrypted, or referred to in a statement (such as an RDF statement).
Message content will be provided and processed by multiple software components acting autonomously. The XML will make use of multiple namespaces, potentially with duplicate element names.
Messages may pass through multiple intermediary nodes which may add, subtract or alter content in either the SOAP header or body.
Generally, the ability to provide ephemeral authentication, integrity protection and confidentiality of message content, including attachments, using a variety of technologies is required. In some cases, messages with signatures may be stored for purposes of dispute resolution.
Any or all parts of a message may be signed and/or encrypted zero or more times in any order. Signatures and encryptions may overlap. A receiver must be able to properly verify signatures and decrypt data in the proper order (assuming access to the necessary secrets or trust points) based on nothing but the message.
It must be possible to determine whether the correct portions of the message have been signed and encrypted with the correct keys according to policy.
To the extent allowed by the ordering of data and cryptographic operations, it should be possible for a sender or a receiver to perform processing in a single pass over the message.
A digital image file contains the raw image data and optional metadata. This metadata contains information like the date the photo was taken, exposure information, search info, general description, etc. Now a photographer wants to use an XML signature to digitally sign their photo to ensure it isn't modified by someone, but still wants to allow other users to add new metadata to their photo. This can only be done if the photographer signs only the raw image data and excludes the metadata.
The XML Signature 1.0 specification allows authors of XML Signatures to sign a subset of an XML document, but doesn't define any grammar that allows a subset of a non XML resource to be signed. The requirement for the next version of the XML Signature specification is to define a mechanism that allows a subset of a non XML resource to be signed.
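As one hedged illustration of what such a mechanism might look like, the sketch below digests only selected byte ranges of a binary resource, so that the raw image data is covered while a metadata block stays outside the digest. The byte-range approach, the name `digest_byte_ranges`, and the toy file layout (a fixed 16-byte metadata block followed by image data) are assumptions made for illustration only; this document does not define the mechanism.

```python
import hashlib


def digest_byte_ranges(data, ranges):
    """Hash only the given (start, end) byte ranges of a binary resource."""
    digest = hashlib.sha256()
    for start, end in ranges:
        digest.update(data[start:end])  # bytes outside the ranges are ignored
    return digest.hexdigest()


# Hypothetical layout: 16 bytes of metadata, then the raw image data.
METADATA_SIZE = 16
raw_image = bytes(range(256))
photo = b"taken=2010-01-01" + raw_image
retagged = b"taken=2011-06-30" + raw_image        # another user edits the metadata
tampered = b"taken=2010-01-01" + raw_image[::-1]  # the image data itself is modified

image_range = [(METADATA_SIZE, len(photo))]
metadata_edit_is_tolerated = (
    digest_byte_ranges(photo, image_range) == digest_byte_ranges(retagged, image_range)
)
image_edit_is_detected = (
    digest_byte_ranges(photo, image_range) != digest_byte_ranges(tampered, image_range)
)
```

In this sketch a metadata edit leaves the digest unchanged, while any change to the image bytes alters it, which is the behaviour the photographer in the use case asks for.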
Besides the explicit design principles and requirements in [XML-CANONICAL-REQ], the Canonical XML and Exclusive Canonicalization specifications are guided by a number of design decisions that we present and discuss in this section.
The basic idea of a canonical XML is to have a representation of an XML document (the output being a concrete string of bytes) that captures some kind of "essence" of the document, while disregarding certain properties that are considered artifacts of the input document (thought of, again, as an octet stream), and deemed to be safely ignorable.
The historic Canonical XML Requirements [XML-CANONICAL-REQ] include:
The specification for Canonical XML shall describe how to derive the canonical form of any XML document. Every XML document shall have a unique canonical form.
The canonical form of an XML document shall be a well formed XML document with the following invariant property:
Any XML document, say X, processed by a canonicalizer, will produce an XML Document X'.
X' passed through the same canonicalizer must produce X'.
X' passed through any other conforming canonicalizer should produce X', or else one of them is not conformant.
In other words, Canonicalization is historically thought of as a well-defined, idempotent mapping from the set of XML documents into itself.
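The two properties above (a unique canonical form, and idempotence) can be checked with a small sketch. It uses `xml.etree.ElementTree.canonicalize` from the Python standard library (Python 3.8 or later), which implements C14N 2.0 rather than the historic Canonical XML 1.0 discussed here, so it is only a stand-in for a conforming canonicalizer; the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same document: attribute order, quoting style
# and the empty-element form differ.
doc_a = '<order id="7" status="new"><item/></order>'
doc_b = "<order status='new' id='7'><item></item></order>"

canonical_a = ET.canonicalize(doc_a)
canonical_b = ET.canonicalize(doc_b)

# Both inputs map to one canonical form ...
same_canonical_form = canonical_a == canonical_b
# ... and canonicalizing the canonical form again changes nothing.
is_idempotent = ET.canonicalize(canonical_a) == canonical_a
```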
In its main use case, XML Signature, Canonical XML [XML-C14N] (and its cousin, Exclusive Canonicalization [XML-EXC-C14N]) is actually used to fulfill a number of distinct functions:
Canonical XML is used as the canonical mapping from a node-set to an octet stream whenever such a mapping is required to connect distinct transforms to each other.
Canonical XML is used to serialize the ds:SignedInfo element before it is hashed as part of the signing process; note that this element does not necessarily exist as a serialization.
Canonical XML is used to discard artifacts of a specific representation before that representation is hashed in the course of either signature generation or validation.
This section summarizes a number of design options that arise when some of the requirements listed above are relaxed.
It is not required to have canonicalization as a general-purpose transform to be used anywhere in a transform chain. Its only use would be to produce an octet stream that will be hashed.
Currently canonicalization is used whenever there is an impedance mismatch, with one transform emitting binary and the next transform requiring a nodeset. This is not required of a 2.0 version. Also, XML Canonicalization is used in some other specs, e.g. DSS, to do some cleanup of the XML. This is not required of a 2.0 version either.
Assuming that a canonicalization step is necessary to be performed as the last step of reference processing before hashing of the resulting octet-stream, the requirement that XML canonicalization produce valid XML could be relaxed. Some interesting things can be done with this relaxation - namespace prefixes can be expanded out, tag names in closing tags can be omitted, and EXI serialization format can be used. A possible design is described in [XMLDSIG-THOMPSON].
ds:SignedInfo
For every application of XML Signature, a ds:SignedInfo element needs to be hashed and signed. This step always involves canonicalization of a document subset. While some parts of ds:SignedInfo include an open content model (ds:Object, in particular), there is a large class of signatures for which the content model of ds:SignedInfo is well-understood. A special-purpose canonicalization algorithm might be cost-effective if it can reduce the computational cost for canonicalizing ds:SignedInfo in a suitably large portion of use cases.
This design option could manifest itself in several ways.
Constrain the classes of node-sets that are acceptable.
There is no need to be able to canonicalize a fully generic nodeset. Nodeset is an XPath concept and a generic nodeset can have many strange things - like attribute nodes without the containing element, removal of namespace nodes without removal of the corresponding namespace declarations - these kinds of things only increase the complexity of the Canonicalization algorithm without adding any value.
Instead of a generic nodeset, canonicalization needs to work on a different data model:
Start with a subtree or a set of subtrees. These subtrees must be rooted at element nodes. For example, these subtrees can't be a single text node or a single attribute node.
Optionally from this set, exclude some subtrees (of element nodes) or exclude some attribute nodes. Only regular attributes can be excluded, not attributes that are namespace declarations or in the xml namespace.
Optionally to this set, reinclude some subtrees (of element nodes). (Note: this is not supported in Canonical XML 2.0, in order to support goals related to simplicity.)
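The data model above (an element-rooted subtree, minus excluded subtrees and regular attributes) can be sketched with the Python standard library. The function name `canonicalize_selection` and the sample document are hypothetical; `exclude_tags` and `exclude_attrs` are options of Python's `xml.etree.ElementTree.canonicalize` and are used here only as a convenient stand-in for the selection step, not as the design this document proposes. Reinclusion is not modelled.

```python
import xml.etree.ElementTree as ET

XML_NS_PREFIX = "{http://www.w3.org/XML/1998/namespace}"


def canonicalize_selection(xml_text, subtree_path, exclude_tags=(), exclude_attrs=()):
    """Canonicalize one element-rooted subtree minus excluded subtrees/attributes."""
    for name in exclude_attrs:
        # Namespace declarations and attributes in the xml namespace stay in.
        if name == "xmlns" or name.startswith("xmlns:") or name.startswith(XML_NS_PREFIX):
            raise ValueError("cannot exclude attribute: " + name)
    # ElementTree paths only ever select elements, never text or attribute nodes.
    subtree = ET.fromstring(xml_text).find(subtree_path)
    if subtree is None:
        raise ValueError("no element matches: " + subtree_path)
    subtree.tail = None  # drop any text that follows the selected element
    return ET.canonicalize(
        ET.tostring(subtree, encoding="unicode"),
        exclude_tags=list(exclude_tags),
        exclude_attrs=list(exclude_attrs),
    )


doc = ('<doc><order id="7" note="draft"><item>pen</item>'
       '<memo>internal</memo></order><log>x</log></doc>')
selected = canonicalize_selection(doc, "order", exclude_tags=["memo"], exclude_attrs=["note"])
```

Here `selected` contains the `order` element with its `id` attribute and `item` child, but neither the `memo` subtree, the `note` attribute, nor the sibling `log` element.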
Constrain the classes of XML documents that are acceptable.
Canonical XML currently expends much complexity on merging relative URI references appearing in xml:base attributes. A revised version of Canonical XML could be defined to fail on documents in which the xml:base URI reference cannot be successfully absolutized.
Handling of namespaces is a known major source of complexity in Canonical XML (and, to a lesser extent, in Exclusive Canonicalization). At least part of this complexity is due to a design decision to preserve namespace prefixes, which in turn is necessary to protect the meaning of QNames.
Canonical XML should support the option of namespace prefix rewriting, optionally including rewriting prefixes that are embedded in the content as QNames. This can include, for example, QNames inside an xsi:type attribute. QNames embedded in xsi:type are easy to detect, but some other instances of QNames in content may be hard to detect, so prefix rewriting may break the meaning of QNames. The advantage of using prefix rewriting is to avoid attaching significance to the prefix name, since two different prefix names are considered to be semantically equivalent if the prefixes map to the same namespace URI. In this case they should canonicalize to the same value, as will happen with prefix rewriting. Prefixes may be rewritten using unique string values, URIs or other mechanisms, depending on the specification design.
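The effect of prefix rewriting can be seen in a short sketch. It relies on the `rewrite_prefixes` option of Python's `xml.etree.ElementTree.canonicalize` (Python 3.8 or later, an implementation of C14N 2.0); the two sample documents and the `urn:example` namespace are made up for illustration, and QNames in content are not exercised here.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, different prefix names.
doc_a = '<a:doc xmlns:a="urn:example"><a:item>1</a:item></a:doc>'
doc_b = '<b:doc xmlns:b="urn:example"><b:item>1</b:item></b:doc>'

# Without rewriting, the prefix name is significant and the outputs differ.
differ_by_default = ET.canonicalize(doc_a) != ET.canonicalize(doc_b)

# With rewriting, both documents canonicalize to the same value.
equal_when_rewritten = (
    ET.canonicalize(doc_a, rewrite_prefixes=True)
    == ET.canonicalize(doc_b, rewrite_prefixes=True)
)
```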
One use of an XML Signature is for integrity protection, to determine if content has been changed. Content is identified by one or more ds:Reference elements, causing that content to be located and hashed. In the current XML Signature Second Edition processing model each ds:Reference may include a transform chain to apply one or more transforms before hashing the content for inclusion in a signature. Obviously a signature operation may occur in a workflow after various transformations have been performed on content, as long as the content can be identified by a ds:Reference at the appropriate point in that workflow. In this sense, XML Signature could be viewed as a step in a processing model, for example in XProc [XPROC].
What is referred to here is not such application processing steps, but only the limited case of transforms defined and processed as part of the XML Signature processing.
There are cases however where transformations must occur as part of signature processing itself. The reasons for these are more limited, however, so we propose in this document to simplify such processing. Reasons include the following:
Signing only pertains to a portion of the content, but the entire content has meaning outside of signing. Thus the signing operation should be able to sign a selected portion of content (and this may be also specified by signing all apart from a portion to be excluded).
A signature XML element may be included with the content, yet upon verification the signature element itself is excluded from the content that is verified.
Some content within a signature element might be included in signing and verification (e.g. signature properties) even though the signature element itself is not.
Sometimes it may be necessary to sign, not the raw data, but the data that a user actually sees. This is called the "sign what you see" requirement in Section 8.1.2 of the XML Signature specification. This might require, for example, using XSLT to transform the raw data into an HTML form, and signing this HTML data.
Well-defined signature processing is necessary to handle needs specific to signing, but should not be expected to handle arbitrary processing that could be handled just as well as part of a workflow outside of signing.
As an example of the need to sign or verify a portion of the content, suppose you have a document with the familiar "office use only" section. When a user signs the document, the document subset should be the entire document less the "office use only" section. This way, any change made to the document in any place except the "office use only" section would invalidate the signature. The purpose of a digital signature is to become invalid when any change is made, except those anticipated by the signer. Thus, subtraction filtering is the best fit for a document subset signature.
By comparison, if a document subset signature merely selects the portion of the document to be signed, then additions can be made not only to the "office use only" section but also to any other location in the document that is outside of the selected portions of the document. It is entirely too easy to exploit the document semantics and inject unintended side effects. That is why exclusion is necessary. Everything is signed apart from the excluded portion, thus eliminating the possibility of unwanted, undetected additions.
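The difference between exclusion and selection can be made concrete with a small sketch. The form documents, the element names (`applicant`, `amount`, `office`) and the helper names are hypothetical, and a SHA-256 digest over Python's `xml.etree.ElementTree.canonicalize` output stands in for full signature processing.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest_excluding(xml_text, excluded_tags):
    """Digest the whole document except the named subtrees (subtraction)."""
    canonical = ET.canonicalize(xml_text, exclude_tags=list(excluded_tags))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def digest_selecting(xml_text, selected_path):
    """Digest only the selected subtree (selection)."""
    subtree = ET.fromstring(xml_text).find(selected_path)
    subtree.tail = None
    canonical = ET.canonicalize(ET.tostring(subtree, encoding="unicode"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = "<form><applicant>Ann</applicant><office>pending</office></form>"
office_edit = "<form><applicant>Ann</applicant><office>approved</office></form>"
injected = ("<form><applicant>Ann</applicant><amount>1000</amount>"
            "<office>pending</office></form>")

# Subtraction: edits to the "office use only" section are anticipated ...
office_edit_allowed = (
    digest_excluding(original, ["office"]) == digest_excluding(office_edit, ["office"])
)
# ... but an element injected anywhere else changes the digest.
injection_detected = (
    digest_excluding(original, ["office"]) != digest_excluding(injected, ["office"])
)
# Selection: the same injection goes unnoticed.
injection_missed_by_selection = (
    digest_selecting(original, "applicant") == digest_selecting(injected, "applicant")
)
```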
There are specific requirements associated with Signature transform processing:
Enable applications to determine what is signed.
Support "see what you sign" by allowing applications to determine what was included for signing and possibly confirm that with users. The current unrestricted transform model makes it very difficult to inspect the signature to determine what was really signed, without actually executing all the transforms.
Enable higher performance and streamability
Signing XML data should be almost as fast as serializing the XML to bytes (using an identity transformer) and then signing the bytes. Currently transforms are defined in terms of a "nodeset" and a nodeset implies using a DOM parser, which is very slow. It should be possible to sign documents using a streaming XML parser, in which the whole document is never loaded in memory at once.
Avoid performance penalties and security risks associated with arbitrary transformations by restricting the possible transformation technologies.
Such generality may still be applied in a workflow outside of signature processing with this restriction.
Define a more robust canonicalization
There are many problems with the current canonicalization algorithms. For example, people are really taken aback when they are told that canonicalization does not remove whitespace in between tags. Whitespace in base64 encoded content causes problems as well. Prefix names being significant is yet another source of issues. Schema aware canonicalization is another possibility, but this may have issues related to requiring a schema.
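The whitespace surprise mentioned above is easy to reproduce. The sketch uses Python's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8 or later) as a stand-in canonicalizer; `strip_text` is that function's opt-in option for trimming text, and the sample documents are invented.

```python
import xml.etree.ElementTree as ET

compact = "<a><b>x</b></a>"
indented = "<a>\n  <b>x</b>\n</a>"

# By default the whitespace between tags survives canonicalization,
# so pretty-printing a document changes what gets hashed.
whitespace_is_significant = ET.canonicalize(compact) != ET.canonicalize(indented)

# Only when text trimming is requested do the two forms converge.
equal_after_stripping = (
    ET.canonicalize(compact, strip_text=True)
    == ET.canonicalize(indented, strip_text=True)
)
```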
The current Transform chain model is very procedural; it can have XPath, C14N, EnvelopedSign, Base64, XSLT etc. transforms any number of times in any order. While this gives a lot of flexibility to the signer, it makes it extremely hard for the verifier to determine what was actually signed.
Applications usually follow one of these mechanisms to determine what is signed:
Trust the signer completely
Some applications do not inspect the transform chain at all. They expect that signer has sent a meaningful and safe transform chain, and since the transform chain is also signed it assures that the chain has not changed in transit.
This does not work for scenarios where the verifier has little trust in the signer. As an example, suppose there is an application that expects requests to be signed with the user's password, and there are tens of thousands of users. This application will of course not trust all of its users, and given the possibility of DoS attacks, and that some transforms can change what is really signed, it will not want to run a chain of transforms that it doesn't understand.
Check predigested data
Some XML signature libraries have a provision to return the predigested data back to the application, i.e. the octet stream that results from running all the transforms, including an implicit canonicalization at the end.
The predigested data however cannot be easily compared with the expected data. Suppose the application expects XML elements A, B and C to be signed, it cannot just convert A, B, C to octet streams and search for them inside the predigested data octet stream. The predigested data is canonicalized, and so the search might fail. Also this mechanism is subject to wrapping attacks, as there is no information as to which part of the original document produced this predigested data.
Check nodeset just before canonicalization
If the transform chain only has nodeset->nodeset transforms (i.e. XPath or EnvelopedSig) in the beginning, followed by one final nodeset->binary transform (i.e. a C14n transform), then an implementation can return the nodeset just before the canonicalization. Unlike the predigested data, this is much easier to compare - DOM specifically has a method to compare nodes for equality, so this method could be used to compare expected nodeset with nodeset just before canonicalization.
Unfortunately this mechanism does not work if there is any transform that causes an internal conversion from nodeset->binary->nodeset, because in such case the nodes cannot be compared any more. An XSLT transform does this kind of conversion as does the DecryptTransform.
Put restrictions on transforms
Many higher level protocols put restrictions on the transforms. For example, ebXML specifies that there should be exactly two transforms, namely XPath and then the EnvelopedSig transform. SAML specifies there should be only one transform, the EnvelopedSig transform. This is not a generic solution, but it works well for these specific cases.
The XPath transform is a very useful transform to specify what is to be signed. Id based mechanisms are simpler, but they have many problems:
An Id identifies a complete subtree, if some parts of the subtree have to be excluded an XPath has to be used.
An Id attribute has to be of type ID. If there is no schema/DTD information it is not possible to determine the type. Some implementations get around this by having certain reserved names, e.g. xml:id or wsu:id. These attributes are allowed everywhere and assumed to be of type ID even if there is no schema available.
Ids usually require schema changes, i.e. the schema has to identify which elements can have ID attributes.
Ids can also lead to wrapping attacks.
A regular XPath Filter specifies XPaths "inside out". Anything more difficult than the simplest XPath requires using the "count" and other special functions. The XPath is often so complex that it is almost impossible to determine what is being signed by looking at the XPath expression.
An XPath 2.0 filter solves this problem and lets people write regular XPath, but it hasn't gained wide acceptance because it is optional. Also it offers too much unneeded flexibility allowing any number of union, intersect and subtract operations in any order. This flexibility again makes it harder for the verifier.
Unlike an ID, which can only appear once per reference, an XPath transform can be anywhere in the transform chain. For example, a transform chain can have XPath->C14N->XPath. A verifier getting this kind of transform chain would be clueless about the intent of the transform.
It would be preferable if, instead of using transforms, the signature were more declarative and clearly separated selection from canonicalization. For example, it could list all the URIs, ids, included XPaths and excluded XPaths of the elements that are signed. Then it could apply canonicalization. This would make it easier for the verifier to first inspect the signature to determine what is signed and compare it against a policy. To give one example, there might be a WS-SecurityPolicy with an expected list of XPaths. Only if this matches will the verifier do the canonicalization to compute the digests.
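A minimal sketch of this "inspect first, canonicalize later" idea follows. The `DeclaredSelection` structure, the function name and the sample XPaths are all hypothetical, and matching is by exact string comparison, which is a simplification: a real policy check would need to reason about XPath containment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeclaredSelection:
    """What a declarative reference states is signed (hypothetical structure)."""
    uri: str
    included_xpaths: frozenset
    excluded_xpaths: frozenset = frozenset()


def selection_matches_policy(selection, required_included, permitted_excluded):
    """Check the declared selection against a policy before any digest is computed."""
    covers_required = set(required_included) <= selection.included_xpaths
    only_permitted_gaps = selection.excluded_xpaths <= set(permitted_excluded)
    return covers_required and only_permitted_gaps


policy_required = {"/Envelope/Body", "/Envelope/Header/Timestamp"}
policy_permitted_exclusions = {"/Envelope/Body/Annotations"}

good = DeclaredSelection(
    uri="",
    included_xpaths=frozenset({"/Envelope/Body", "/Envelope/Header/Timestamp"}),
    excluded_xpaths=frozenset({"/Envelope/Body/Annotations"}),
)
# This signer quietly carves the payload out of the signed content.
suspicious = DeclaredSelection(
    uri="",
    included_xpaths=frozenset({"/Envelope/Body", "/Envelope/Header/Timestamp"}),
    excluded_xpaths=frozenset({"/Envelope/Body/Payment"}),
)

accept_good = selection_matches_policy(good, policy_required, policy_permitted_exclusions)
accept_suspicious = selection_matches_policy(
    suspicious, policy_required, policy_permitted_exclusions
)
```

The verifier would go on to canonicalize and digest only for `good`; `suspicious` is rejected by inspection alone, without running any transform.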
The XML Signature Best Practices document [XMLDSIG-BESTPRACTICES] points out many potential security risks in XML Signatures.
Order of operations
Reference validation before signature validation is extremely susceptible to denial of service attacks in some scenarios.
Insecurities in XSLT transforms
XSLT is a complete programming language. An untrusted XSLT can use deeply nested loops to launch DoS attacks, or use "user defined extensions" like "os.exec" to execute system commands.
Full expansion of Nodesets
A full expansion of an XPath nodeset results in a huge amount of memory usage, and this can be exploited for DoS attacks.
Complex XPaths
XPath Filter 1.0 requires very complex looking XPaths. These are very hard to understand, and an application can potentially be fooled into believing something is signed when it actually is not. Also, complex XPaths can use too many resources.
Wrapping attacks
ID based references and lack of a mechanism to determine what was really signed can enable wrapping attacks [MCINTOSH-WRAP].
Problems with RetrievalMethod
RetrievalMethod can lead to infinite loops. Also, transforms in RetrievalMethod can lead to many attacks, and these cannot be solved by changing the order of operations.
XML Signature should not require DOM. There are existing streaming XML Signature implementations but they make various assumptions. It would be better to formalize these assumptions and requirements at the standardization level, rather than leave it up to each implementation.
DOM parsers have a large overhead. Suppose there is a 1MB XML document. If this is loaded into memory as a byte array it remains as a 1MB byte array. But if it is parsed into a DOM it explodes to 5-10x in size. This is because in DOM, each XML node has to become an object. Objects have overheads of memory bookkeeping, virtual function tables etc. Also each XML node needs parent, next sibling, previous sibling pointers, and it also needs prefix, namespaceURI etc., which could be objects themselves. All these eat up memory and it is a popular misconception that memory is very cheap. Even if this memory were temporary allocation only it would still be expensive - in garbage collected languages allocating and freeing too much memory triggers the garbage collector too often, which drastically slows down the system. Also this 10x DOM explosion can result in physical memory getting exhausted and requiring more pages to be swapped from disk. That is why web services often use streaming XML parsers on the server side. DOM parsers will croak and groan if asked to process multiple large XML documents simultaneously, whereas streaming XML parsers will happily chug along because of their low memory consumption.
It is important to distinguish between one-pass processing and streamability. Streamability means not requiring the whole document to be available in a parsed form for random access, i.e. not requiring a DOM. While one pass is desirable, two passes do not take away all the merits of streaming. Suppose the signature value is before the data to be signed. This means that the signature value cannot be updated in the first pass, but only in the second pass - this is not really bad from the performance point of view. Let us say the document is being streamed out into a 1MB byte array; then in the first pass write some dummy bytes for the signature value and remember the location, and in the second pass just update this location with the actual signature bytes, so the second pass is very quick.
Also, streamability does not require any ordering between the subelements of the Signature element. It can be assumed that the entire Signature element (assuming it is a detached or enveloped signature) will be loaded up into a Java/C++ object, so the order of the elements inside the Signature element does not affect streamability.
Verification in particular cannot be 1 pass - let us say you have a signed 1GB incoming message, which you need to verify first and then upload to a database. So you have to make two passes on this data - a first pass to verify and second pass to upload to the database. One cannot combine these two into 1 pass because verification result is determined only after reading the last byte.
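The placeholder technique described above for signature generation can be sketched as follows. The element name, the chunk size and the use of a hex SHA-256 digest in place of a real signature value are all assumptions chosen to keep the example short; the point is only that the second pass is a fixed-size in-place update.

```python
import hashlib

PLACEHOLDER = b"0" * 64  # same length as a hex-encoded SHA-256 value


def write_with_leading_signature(payload, chunk_size=4096):
    """Stream a message whose signature value precedes the signed data."""
    out = bytearray()
    out += b"<SignatureValue>"
    value_offset = len(out)  # remember where the dummy bytes go
    out += PLACEHOLDER
    out += b"</SignatureValue>"

    # First pass: stream the payload out while feeding it to the digest.
    digest = hashlib.sha256()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        out += chunk
        digest.update(chunk)

    # Second pass: overwrite the dummy bytes in place, nothing else moves.
    value = digest.hexdigest().encode("ascii")
    out[value_offset:value_offset + len(value)] = value
    return bytes(out)


payload = b"<data>" + b"x" * 10000 + b"</data>"
message = write_with_leading_signature(payload)
```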
The main impediment to streamability is the transform chain, because many of the transforms are defined on nodesets and nodeset requires a DOM. An XPath transform is the biggest culprit as there are many XPath expressions which cannot be streamed. It is necessary to define a streamable subset of XPath (which has been done for XPath 1.0, see [XMLDSIG-XPATH]).
Nodesets have another big problem. This nodeset concept was borrowed from XPath 1.0, and an XPath nodeset introduces a new kind of XML node - the namespace node. Namespace nodes are different from namespace declarations in an important way - they are not inherited. This means they need to be repeated for every node for which they are applicable. To give an example, if there is a document with 100 namespace declarations at the top element and with 99 child elements of the top element, a regular DOM will only have 200 nodes (1 top element node + 99 child element nodes + 100 attribute nodes), whereas a nodeset will have 10,100 nodes (1 top element + 99 child elements + 100*100 namespace nodes).
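The arithmetic in this example generalizes directly. The two helper functions below are illustrative names, not part of any specification; they simply restate the counts given above so that other document shapes can be tried.

```python
def dom_node_count(namespace_declarations, child_elements):
    """Top element + children + one attribute node per declaration."""
    return 1 + child_elements + namespace_declarations


def nodeset_node_count(namespace_declarations, child_elements):
    """Namespace nodes are not inherited: every element repeats all of them."""
    elements = 1 + child_elements
    return elements + namespace_declarations * elements


dom_nodes = dom_node_count(100, 99)          # 1 + 99 + 100
nodeset_nodes = nodeset_node_count(100, 99)  # 1 + 99 + 100 * 100
```

With the figures from the text, `dom_nodes` is 200 and `nodeset_nodes` is 10,100, about a fifty-fold increase for the same document.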
A naive implementation which uses the nodeset as defined will therefore be very slow, and also be subject to various denial of service attacks. A smart implementation can try to not expand the nodeset fully and use inheritance, but it won't be fully compliant with the XML Signature spec. This is because an XPath filter can address each of the namespace nodes individually and filter them out, even though it is meaningless in XML. The Y4 test vector in the Exclusive Canonicalization Implementation and Interoperability Report has an example of this. Because of these performance problems some implementations do not support this Y4 test vector or only support it conditionally.
Order
XML
Signature
requires
a
profile
of
operations
XPath
to
enable
streaming.
Reference
validation
before
signature
validation
Signature
verification
can
be
done
in
two
passes.
The
first
pass
is
extremely
susceptible
a
very
cursory
pass
to
denial
of
service
attacks
collect
the
signature
element
and
signing
keys
from
the
document.
Signatures
are
often
present
in
some
scenarios.
the
beginning
of
the
document,
so
this
usually
a
very
short
pass.
At
the
end
of
the
first
pass,
the
IncludedXPath
and
ExcludedPath
are
taken
from
each
reference
and
used
to
construct
"state
machines"
from
these
XPaths.
After the first pass, the second pass is performed. In this pass the document is parsed using a streaming XML parser to generate XML events. These events are fed into the state machines. If an event is accepted by an IncludedXPath but not by an ExcludedXPath, it is included: the event is passed on to a streaming canonicalizer, and then to a streaming digestor. At the end of the second pass the result is a digest for each reference.
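The second pass can be sketched roughly as follows, using Python's standard SAX parser as the streaming event source. Everything here is an illustrative assumption rather than the profile itself: the class and function names are hypothetical, the "state machine" is reduced to prefix matching on plain absolute element paths such as `/doc/body`, attributes are ignored, and the serialization fed to the digest is a stand-in for a real streaming canonicalizer.

```python
import hashlib
import xml.sax


class ReferenceDigester(xml.sax.ContentHandler):
    """Digests the events selected by included paths and not by excluded ones."""

    def __init__(self, included, excluded):
        super().__init__()
        self.included = included      # e.g. ["/doc/body"] (assumed path form)
        self.excluded = excluded      # e.g. ["/doc/body/sig"]
        self.path = []                # names of the currently open elements
        self.digest = hashlib.sha256()

    def _selected(self):
        current = "/" + "/".join(self.path)

        def covers(prefix):
            # A path selects the element itself and its whole subtree.
            return current == prefix or current.startswith(prefix + "/")

        return (any(covers(p) for p in self.included)
                and not any(covers(p) for p in self.excluded))

    def startElement(self, name, attrs):
        self.path.append(name)
        if self._selected():
            # Stand-in serialization; a real canonicalizer would go here.
            self.digest.update(f"<{name}>".encode("utf-8"))

    def characters(self, content):
        if self._selected():
            self.digest.update(content.encode("utf-8"))

    def endElement(self, name):
        if self._selected():
            self.digest.update(f"</{name}>".encode("utf-8"))
        self.path.pop()


def digest_reference(xml_bytes, included, excluded):
    """Run one streaming pass and return the hex digest for one reference."""
    handler = ReferenceDigester(included, excluded)
    xml.sax.parseString(xml_bytes, handler)
    return handler.digest.hexdigest()
```

The point of the sketch is that the handler keeps only the stack of open element names, never the whole document, so memory use stays proportional to document depth.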
The operation and requirements of this XPath profile are different from the requirements of other XPath profiles, such as that for XSLT template processing [XSLT21]. For this reason, XML Security requires its own XPath profile, although it might be suitable for other uses as well.
The reason the XSLT profile is not suitable is that the assumptions and requirements are very different. In XSLT processing the XPaths are not known in advance. The XSLT processor has to be ready to process any XPath that it comes across, so it maintains a context. This context consists of all the ancestors of the current element and some histograms so that it can process the position() function. The XPath needs to be evaluated with only this context and nothing else. This is a fundamental difference from the XML Signature model. In XML Signature, the XPaths are known in advance, and are continuously evaluated for every node. But in XSLT, they are evaluated only once.
The XPath subset is defined as the kind of subset that can be evaluated with the XPath context. In the XSLT profile, for example, all sideways axes are disallowed by the subset, i.e. following, preceding, following-sibling, and preceding-sibling. But the Signature subset allows following and following-sibling.
Another big difference is the way this subset is defined. XML Signature defines the subset by syntax. Although this kind of definition is simpler to define and understand, it results in XPaths that are allowed in one syntax but not allowed in another, e.g. /a/b is allowed, but (/a)/b is not allowed in XML Signature. XSLT defines the subset by a "data flow graph". This has restrictions like: once you start going up, you can't go down. (See the seven such rules in http://www.w3.org/TR/xslt-21/#streamability-conditions .) While XML Signature is very strict in allowing only attributes in predicates, XSLT is much more lax, e.g. /a[b] is not allowed in XML Signature, but is allowed in XSLT, because rule 4 says that it is OK to go downwards as long as you don't revisit a node more than once.
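To illustrate what "defined by syntax" means, here is a toy checker. The grammar is a hypothetical simplification invented for this sketch, not the actual XML Signature subset: it accepts only absolute paths of child steps, each optionally followed by one attribute predicate. It does reproduce the three verdicts mentioned above.

```python
import re

# Hypothetical toy grammar (an assumption for illustration only):
# one step is "/name", optionally followed by an attribute predicate
# of the form [@attr="value"].
_STEP = r'/[A-Za-z_][\w.-]*(?:\[@[A-Za-z_][\w.-]*="[^"]*"\])?'
_ALLOWED = re.compile(rf'(?:{_STEP})+')


def allowed_by_syntax(xpath: str) -> bool:
    """Return True if the expression matches the toy grammar exactly."""
    return _ALLOWED.fullmatch(xpath) is not None


print(allowed_by_syntax("/a/b"))            # True
print(allowed_by_syntax("(/a)/b"))          # False
print(allowed_by_syntax("/a[b]"))           # False
print(allowed_by_syntax('/a[@id="x"]/b'))   # True
```

Note that /a/b and (/a)/b select the same nodes, yet a purely syntactic definition accepts one and rejects the other, which is the trade-off described in the text.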
Another difference arising from this evaluation model is that XSLT allows relative XPaths; in fact, that is a very important part of XSLT. There is always a current context node when evaluating an XSLT XPath, so it allows the parent and ancestor axes. In summary, the two subsets have completely different purposes and there is no benefit in making them similar; that would only cripple both use cases.
There are subsets whose use cases are similar to XML Signature, where XPath expressions are known in advance and are used for selection. An example is the WS-Transfer use case.
Thanks to John Boyer for his suggestions on this topic.
Contributions received from the members of the XML Security Working Group: Scott Cantor, Juan Carlos Cruellas, Pratik Datta, Gerald Edgar, Ken Graf, Phillip Hallam-Baker, Brad Hill, Frederick Hirsch, Brian LaMacchia, Konrad Lanz, Hal Lockhart, Cynthia Martin, Rob Miller, Sean Mullan, Shivaram Mysore, Magnus Nyström, Bruce Rich, Thomas Roessler, Ed Simon, Chris Solc, John Wray, Kelvin Yiu.
Dated references below are to the latest known or appropriate edition of the referenced work. The referenced works may be subject to revision, and conformant implementations may follow, and are encouraged to investigate the appropriateness of following, some or all more recent editions or replacements of the works cited. It is in each case implementation-defined which editions are supported.
No normative references.