Re: VC-JWT perma-thread (was: Re: RDF Dataset Canonicalization - Formal Proof)

Sorry, I meant for this reply to David to go to the whole list.

In case I come across as some JOSE hater, I am not... I actually worked
very hard with Mike Jones and others to add JWK support to the DID Core
Spec, and have also worked to make it easy to produce JWS/JWT and LD Proofs
from the same verification material:


> On Tue, Mar 30, 2021 at 11:49 PM David Waite <>
> wrote:
>> David Waite
>> Principal Technical Architect, CTO Office
>> On Tue, Mar 30, 2021 at 9:43 AM Orie Steele <>
>> wrote:
>>> Overall I agree with a lot of David's comments.
>> <snip>
>>> A couple of observations...
>>> base64 in JOSE is a form of canonicalizing... because header and payload
>>> objects might have different key orderings, but base64url encoding makes
>>> those orderings opaque... by inflating them 33%.
>> Canonicalization means to convert multiple potential representations of
>> equivalent data into a single representation. I would define what JOSE does
>> as straight-up processing transforms. The url-safe base64 encoding protects
>> the data from modification in transport.
> agreed, inflating data 33% is clearly not canonicalization.
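A minimal Python sketch of both points, using a made-up two-field header: JOSE's unpadded base64url encoding (RFC 7515) keeps two key orderings of the same object as two different strings, so it does not canonicalize anything, and it grows the input to ceil(4n/3) characters for n bytes, i.e. roughly a third larger.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used for JOSE header and payload segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Two serializations of the same (illustrative) header, differing only in key order.
a = json.dumps({"alg": "ES256", "kid": "key-1"}, separators=(",", ":")).encode()
b = json.dumps({"kid": "key-1", "alg": "ES256"}, separators=(",", ":")).encode()

enc_a, enc_b = b64url(a), b64url(b)
print(enc_a != enc_b)      # True: the encoding preserves the ordering difference
print(len(enc_a), len(a))  # 39 29
```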
>> You can even turn the b64 encoding step off (RFC 7797) if your payload is
>> already URL safe, or if you are doing detached signatures.
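For reference, a minimal sketch of what turning the encoding step off looks like under RFC 7797: with "b64": false (which must also be listed in "crit"), the signing input is the base64url-encoded protected header, a period, and the raw payload bytes; in detached form the payload segment is simply left empty. HMAC-SHA256 and the key are stand-ins chosen to keep the example dependency-free, and sign_unencoded_detached / verify_unencoded_detached are hypothetical helper names, not a library API.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_unencoded_detached(payload: bytes, key: bytes) -> str:
    # RFC 7797: "b64": false, and "b64" must appear in "crit".
    header = {"alg": "HS256", "b64": False, "crit": ["b64"]}
    protected = b64url(json.dumps(header, separators=(",", ":")).encode())
    # Signing input: BASE64URL(header) || "." || raw payload (not base64url-encoded).
    signing_input = protected.encode("ascii") + b"." + payload
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    # Detached compact form: the payload segment between the dots is empty.
    return f"{protected}..{b64url(signature)}"


def verify_unencoded_detached(jws: str, payload: bytes, key: bytes) -> bool:
    protected, detached, signature = jws.split(".")
    header = json.loads(b64url_decode(protected))
    if detached != "" or header.get("b64") is not False or "b64" not in header.get("crit", []):
        return False
    signing_input = protected.encode("ascii") + b"." + payload
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    return hmac.compare_digest(expected, b64url_decode(signature))


key = b"demo-only-secret"  # hypothetical key, for illustration only
payload = b'{"hello":"world"}'
jws = sign_unencoded_detached(payload, key)
print(verify_unencoded_detached(jws, payload, key))         # True
print(verify_unencoded_detached(jws, payload + b" ", key))  # False
```

The verifier must be handed the payload out of band, which is exactly why the bytes it receives have to match the signer's bytes exactly.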
>>> The canonicalize step in the LD Proof could be JCS, or simple sorting of
>>> JSON keys... or RDF Dataset Normalization... each would yield a different
>>> signature...
>> Not just that - each would cover a different interpretation of data. Your
>> signature does not prevent abuse from equivalent forms.
> I suppose the same problem exists with JOSE; there is just no standard for
> how to interpret fields that are not registered.
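A small Python sketch of the "each would yield a different signature" point. Both functions are illustrative stand-ins, not JCS or RDF Dataset Canonicalization: each is deterministic, so key order stops mattering within one scheme, but two equally reasonable schemes produce different bytes, and therefore different digests, for the same data.

```python
import hashlib
import json


def canon_compact(obj) -> bytes:
    # Sorted keys, no insignificant whitespace (JCS-like, but NOT full RFC 8785).
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def canon_spaced(obj) -> bytes:
    # Sorted keys with Python's default ", " and ": " separators: another deterministic scheme.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False).encode("utf-8")


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


doc1 = {"name": "Alice", "age": 30}
doc2 = {"age": 30, "name": "Alice"}  # same data, different key order

# Within one scheme, both orderings hash identically...
print(digest(canon_compact(doc1)) == digest(canon_compact(doc2)))  # True
print(digest(canon_spaced(doc1)) == digest(canon_spaced(doc2)))    # True
# ...but the two schemes disagree with each other.
print(digest(canon_compact(doc1)) == digest(canon_spaced(doc1)))   # False
```

So the signer and verifier have to agree on the canonicalization up front; it is part of the signature scheme, not something the verifier can infer from the data.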
>> If you are using LD-Proofs, you either need to process the resulting data
>> _as RDF_ or have additional rules for processing to further lock down any
>> abuses that might come from misinterpreting the RDF because you are looking
>> at it through a manipulated set of JSON-LD lenses.
> Here you are asserting that canonicalization somehow destroys information;
> if that were true, it would be a problem. If you can't tell whether some
> JSON is equivalent to some canonical form, that would also be a problem.
> Luckily, both problems are avoidable with either JCS or RDF Dataset
> Canonicalization.
> I do agree that it's more work to think about canonical information
> representations than it is to inflate a payload 33% and make it URL-safe...
> it's also more useful for very large datasets.
>>> Mechanically, the fact that JCS exists hints at the problem with JOSE...
>>> if you want to sign things, you want stable hashes, and therefore need SOME
>>> form of canonicalization for complex data structures.
>>> JOSE works very well for small ID tokens, like the ones that are used in
>>> OIDC / OAuth... JOSE totally doesn't scale to signatures over large data
>>> sets without another tool.
>> Sure, you are talking about reducing arbitrary subsets of a potentially
>> modified document back to some chosen canonical form and then seeing if
>> there was a pertinent modification. This is what XML DSig was made for :-)
> I don't think I am old enough to know what XML DSig is... sounds like it
> was traumatizing : )
> If your general point is that schema-based languages or types are bad, I
> would say that they increase friction and burden, and that investment pays
> off when the code base or problem space gets very large... again, consider
> a generic solution to strongly typed data in an open world model.
>> Turns out in a lot of use-cases, that subset is usually "a well defined
>> block of data" and pertinent modifications are usually "any modification
>> whatsoever". Crypto in that case is being used for send-and-receive, or
>> archive-and-restore, and not for doing a verification as part of a larger
>> dataset.
>> When that isn't the case, you have a significantly harder task, such as
>> what is currently in progress as HTTP Message Signatures.
> Agreed, HTTP Signatures require canonicalization of the HTTP Request Data
> Structures... because they are complex, and you want to make sure everyone
> is signing things the same way.
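As a rough illustration of why HTTP Message Signatures need this, here is a heavily simplified Python sketch of building a signature base from covered request components, loosely modeled on that draft (since published as RFC 9421). It only lowercases header names, trims values, and joins repeated headers with ", "; the function name, the custom x-example header, and the created value are illustrative, and this is not a conformant implementation.

```python
from __future__ import annotations


def signature_base(method: str, headers: list[tuple[str, str]], covered: list[str], params: str) -> str:
    # Canonicalize header fields: case-insensitive names, trimmed values,
    # repeated fields combined in order with ", ".
    fields: dict[str, str] = {}
    for name, value in headers:
        key, val = name.lower(), value.strip()
        fields[key] = f"{fields[key]}, {val}" if key in fields else val

    # One line per covered component, in the order the signer listed them.
    lines = []
    for component in covered:
        value = method if component == "@method" else fields[component]
        lines.append(f'"{component}": {value}')
    inner = " ".join(f'"{c}"' for c in covered)
    lines.append(f'"@signature-params": ({inner}){params}')
    return "\n".join(lines)


covered = ["@method", "content-type", "x-example"]
params = ";created=1618884473"
req_a = signature_base(
    "POST", [("Content-Type", "application/json"), ("X-Example", "a"), ("X-Example", "b")], covered, params)
req_b = signature_base(
    "POST", [("content-type", "  application/json "), ("x-example", "a"), ("x-example", "b ")], covered, params)
print(req_a == req_b)  # True: header case and stray whitespace no longer matter
print(req_a)
```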
>>> "Detached JWS with Unencoded Payload":
>>> This is how the JWS for LD Proofs are generated, and the "Unencoded
>>> payload part" is the result of the canonicalization algorithm....
>>> What would happen if we just decided to use "Unencoded Payload" without
>>> canonicalization?... maybe we just use JSON.stringify?
>> Intermediaries may do things like convert from LF to CRLF and back, so
>> you would want to keep people treating the data as binary, and make the
>> data behave as binary in transit. Exchange, IIRC, used to change the line
>> encoding of *.txt files _inside ZIP archives_. CRLF is also now considered
>> a single grapheme cluster, and will canonicalize down in some Unicode tools
>> as well.
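A minimal Python sketch of the line-ending hazard, assuming an intermediary that rewrites LF as CRLF (simulated with bytes.replace). An unencoded payload with line breaks no longer verifies after the rewrite even though it parses to the same JSON, while its base64url form contains no CR or LF for such a tool to touch. HMAC and the key are stand-ins for a real signature algorithm.

```python
import base64
import hashlib
import hmac
import json

key = b"demo-only-secret"  # hypothetical key
payload = b'{\n  "name": "Alice"\n}\n'  # pretty-printed JSON with LF line endings


def mac(data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


signature = mac(payload)

# Simulated text-mode intermediary converting LF to CRLF.
mangled = payload.replace(b"\n", b"\r\n")
print(json.loads(mangled) == json.loads(payload))    # True: same JSON data
print(hmac.compare_digest(signature, mac(mangled)))  # False: broken signature

# The base64url alphabet has no CR or LF, so a line-ending rewrite leaves it unchanged.
encoded = base64.urlsafe_b64encode(payload).rstrip(b"=")
print(encoded.replace(b"\n", b"\r\n") == encoded)    # True
```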
> I'm not sure I follow fully, but if you are suggesting a binary format
> would be better, I agree. However, having worked with COSE a little, I can
> say that binary formats require a significant amount of up-front tooling to
> offer the same level of developer experience that JOSE has... despite its
> limitations, JOSE is fairly trivial to implement and to debug.
>>> it still works!... sorta... now I can generate a new message and
>>> signature for every ordering of data in the payload... for a really complex
>>> and very large payload, that's going to be a LOT of deeply equal objects...
>>> that each yield a different signature... this can lead to storing a massive
>>> amount of redundant but indistinguishable data... which can lead to
>>> resource exhaustion attacks.
>>> In fact, the Sidetree protocol uses JCS for this exact reason...
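The "new message and signature for every ordering" problem can be counted directly. In this Python sketch, json.dumps without sort_keys plays the role of JSON.stringify (both keep insertion order), and sorted compact JSON stands in for a canonicalization step; the four-field object and the HMAC key are made up for illustration.

```python
import hashlib
import hmac
import itertools
import json

key = b"demo-only-secret"  # hypothetical key
fields = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]


def sign(data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def naive(obj) -> bytes:
    # Insertion-order serialization, like JSON.stringify.
    return json.dumps(obj, separators=(",", ":")).encode()


def canonical(obj) -> bytes:
    # Deterministic serialization: sorted keys, no whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


naive_sigs, canonical_sigs = set(), set()
for ordering in itertools.permutations(fields):
    obj = dict(ordering)  # every permutation is deeply equal to every other
    naive_sigs.add(sign(naive(obj)))
    canonical_sigs.add(sign(canonical(obj)))

print(len(naive_sigs), len(canonical_sigs))  # 24 1
```

With n top-level keys the naive count grows as n!, before nested objects are even considered.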
>> The attacker still has to send all of that redundant data - and they
>> could always make it non-redundant by making any canonical change
>> (including changing the string "José" to "José".)
>> So I would consider this more a cache optimization (still important) than
>> an attack solution.
> Yes, defense in depth requires validating untrusted user input... IMO
> part of that is asking for canonical representations from users... here is
> another thread on the subject:
>>> So in summary, in any JOSE library you can replace JSON with JCS and get
>>> better signatures, and developers will thank you because they won't be
>>> tracking down bugs related to duplicate content... and canonicalization can
>>> also lead to security issues if not handled properly... regardless of how
>>> you canonicalize things.
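One caveat relevant to the "José" example earlier in the thread, which presumably contrasts a precomposed é (U+00E9) with e followed by a combining acute accent (U+0301): RFC 8785 (JCS) does not apply Unicode normalization, so the two spellings survive canonicalization as different bytes. A Python sketch using the standard unicodedata module, with sorted compact json.dumps standing in for JCS:

```python
import hashlib
import json
import unicodedata

nfc = unicodedata.normalize("NFC", "Jose\u0301")  # "José" with precomposed U+00E9
nfd = unicodedata.normalize("NFD", "Jos\u00e9")   # "José" as "e" + U+0301


def canonical(obj) -> bytes:
    # JCS-like stand-in: sorted keys, no whitespace, raw UTF-8, no Unicode normalization.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


print(nfc == nfd)  # False, although both render as "José"
print(hashlib.sha256(canonical({"name": nfc})).hexdigest() ==
      hashlib.sha256(canonical({"name": nfd})).hexdigest())  # False
# Normalizing first (an application-level choice, not part of JCS) collapses them.
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```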
>> I'm not quite sure of the scenario behind "bugs related to duplicate
>> content" - if you are allowing repeated changes of data, filtering out
>> non-canonical changes is an optimization. Your policy is still apparently
>> to allow a ton of changes to data.
> Canonicalization helps detect content that can lead to bugs... similar to
> how types and schemas help with that... obviously use case matters here,
> but from a tooling perspective you can use schemas and canonicalization or
> you can decide not to... for some use cases, that decision will yield a
> lot of cost for your engineering team, for others it won't.
>> Since you would be using detached signatures, you would necessarily break
>> the semantics of existing deployments and tools. You would have to define
>> the semantics for how to transfer that new data since there are no JWS+JCS
>> formats or best practices. And this would save no data over another
>> JWS+detached JSON transmission format.
> Regarding JSON over the wire, I agree the only thing that would make JSON
> over the wire worse would be base64url encoding it... assuming it was
> large JSON.
>> I particularly think developers in languages such as Rust, Go, and C
>> would be less than excited about the opportunity to be the first to
>> contribute a JCS implementation to their respective platforms. Even less so
>> if they find out they need to build new JSON tooling for strict ECMAScript
>> and I-JSON serialization and conformance.
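To make the "strict ECMAScript and I-JSON serialization" point concrete: a JCS implementation cannot just reuse a language's default JSON encoder. Two differences visible with Python's standard json module, as a sketch: RFC 8785 sorts property names by UTF-16 code units (Python sorts strings by code point), and it serializes numbers the way ECMAScript does (for example 1 rather than 1.0, and 0.000001 rather than 1e-06). utf16_sort_key is a hypothetical helper name.

```python
import json


def utf16_sort_key(name: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return name.encode("utf-16-be")


keys = ["\ue000", "\U0001f600"]  # a private-use BMP character and an emoji

print(sorted(keys) == keys)                            # True: code-point order puts U+E000 first
print(sorted(keys, key=utf16_sort_key) == keys[::-1])  # True: the emoji's high surrogate 0xD83D sorts first

# Python's float formatting is not ECMAScript's: JCS would emit [1,0.000001] here.
print(json.dumps([1.0, 1e-06], separators=(",", ":")))  # [1.0,1e-06]
```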
> Looks like there is support in those languages and more... I suppose those
> languages are already used to being forced to support JSON in order to use
> --
> Chief Technical Officer



Received on Wednesday, 31 March 2021 15:04:05 UTC