[webauthn] String Handling section nits (#1527) from Addison Phillips via GitHub on 2020-11-21 (public-webauthn@w3.org from November 2020)

From: Addison Phillips via GitHub <sysbot+gh@w3.org>
Date: Sat, 21 Nov 2020 00:55:13 +0000
To: public-webauthn@w3.org
Message-ID: <issues.opened-747885782-1605920112-sysbot+gh@w3.org>
aphillips has just created a new issue for https://github.com/w3c/webauthn:

== String Handling section nits ==
6.4. String Handling
https://w3c.github.io/webauthn/#sctn-strings

> (too long to quote)

This section describes byte-count truncation, including considerations for both code point-based and grapheme cluster-based truncation. This is nicely written and the illustration is very helpful. (Thank you for making this addition to your spec since our last review.)

There are some potential infelicities in this chunk:

> Conforming User Agents are responsible for ensuring that the authenticator behaviour observed by Relying Parties conforms to this specification with respect to string handling. For example, if an authenticator is known to behave incorrectly when asked to store large strings, the user agent SHOULD perform the truncation for it in order to maintain the model from the point of view of the Relying Party. User-agents that do this SHOULD truncate at grapheme clusters.

* Consider changing "truncate at grapheme clusters" to "truncate at grapheme cluster boundaries" or "truncate on grapheme cluster boundaries"

> Truncation based on UTF-8 sequences alone may cause a grapheme cluster to be truncated, but still valid [UTR29]. This could make the grapheme cluster render as a different valid glyph instead of removing the glyph entirely.

* The first sentence is a little unclear, since the term "valid" (or is it "valid [UTR29]"?) doesn't have a defined meaning here. A few things worth noting are:

  * Some sequences, such as those that use the zero-width joiner character (ZWJ), might end up with a dangling joiner which interacts strangely with surrounding text.
  * While the example is nicely done, the visible effect is more pronounced in some writing systems, such as the Indic scripts (where truncating a conjunct can change the appearance and meaning much more profoundly).
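
To make the dangling-joiner case concrete, here is a minimal Python sketch. The helper name `truncate_utf8_code_points` and the 8-byte limit are made up for illustration and are not taken from the spec. The helper truncates on a code point boundary, which keeps the UTF-8 well formed but can still stop in the middle of a ZWJ sequence:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def truncate_utf8_code_points(text: str, max_bytes: int) -> str:
    """Hypothetical helper: keep at most max_bytes of UTF-8, never splitting a code point."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # If the first dropped byte is a continuation byte (10xxxxxx), the cut
    # falls inside a multi-byte sequence, so back up to its lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")


# MAN + ZWJ + WOMAN + ZWJ + GIRL: one grapheme cluster, 18 bytes in UTF-8.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

truncated = truncate_utf8_code_points(family, 8)
print([f"U+{ord(c):04X}" for c in truncated])  # ['U+1F468', 'U+200D']
print(truncated.endswith(ZWJ))                 # True: a dangling joiner remains
```

The result is valid UTF-8, yet it ends in a joiner with nothing to join, which is the situation that truncating on grapheme cluster boundaries would avoid.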

*Note: I18N is considering some revisions to our text about truncation in [SPECDEV](https://www.w3.org/TR/international-specs/#char_truncation), including providing more details not germane to WebAuthn so that the material can be referenced in this section (and in other specs with similar issues in the future).*

> In addition to that, truncating on byte boundaries alone causes a known issue that user agents should be aware of: if the authenticator is using [FIDO-CTAP] then future messages from the authenticator may contain invalid CBOR since the value is typed as a CBOR string and thus is required to be valid UTF-8. User agents are tasked with handling this to avoid burdening authenticators with understanding character encodings and Unicode character properties. Thus, when dealing with authenticators, user agents SHOULD:
> 1.    Ensure that any strings sent to authenticators are validly encoded.
> 2.    Handle the case where strings have been truncated resulting in an invalid encoding. For example, any partial code point at the end may be dropped or replaced with U+FFFD.

* It's a small point, but replacing a "partial code point" with U+FFFD means replacing a byte sequence that is 1, 2, or 3 bytes long with a 3-byte sequence in UTF-8 (`0xEF.BF.BD`). That is, this operation may produce a DOMString whose UTF-8 representation is longer than the limit originally imposed. As long as this isn't a problem, that's fine, but it may be worth calling out.
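
The arithmetic can be checked with a short Python sketch. The function names and the 4-byte limit are illustrative assumptions, and the code assumes the string was valid UTF-8 before it was truncated:

```python
def truncate_bytes(text: str, max_bytes: int) -> bytes:
    """Naive byte-count truncation, which may split a code point."""
    return text.encode("utf-8")[:max_bytes]


def repair_with_replacement(truncated: bytes) -> str:
    """Replace an undecodable trailing fragment with U+FFFD."""
    return truncated.decode("utf-8", errors="replace")


def repair_by_dropping(truncated: bytes) -> str:
    """Drop an undecodable trailing fragment."""
    return truncated.decode("utf-8", errors="ignore")


limit = 4
raw = truncate_bytes("abc\u00e9", limit)  # b"abc\xc3": lone lead byte of U+00E9

replaced = repair_with_replacement(raw)   # "abc\ufffd"
dropped = repair_by_dropping(raw)         # "abc"

print(len(replaced.encode("utf-8")))  # 6, which exceeds the 4-byte limit
print(len(dropped.encode("utf-8")))   # 3, which stays within the limit
```

Here a 1-byte fragment becomes the 3-byte U+FFFD, so the repaired string is 2 bytes over the limit. Dropping the fragment never has this effect.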

Please view or discuss this issue at https://github.com/w3c/webauthn/issues/1527 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Saturday, 21 November 2020 00:55:17 UTC