[web-annotation] Normalisation of Text Quote Selector from r12a via GitHub on 2016-05-17 (public-annotation@w3.org from May 2016)

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Tue, 17 May 2016 13:00:14 +0000
To: public-annotation@w3.org
Message-ID: <issues.opened-155253248-1463490013-sysbot+gh@w3.org>

r12a has just created a new issue for 
https://github.com/w3c/web-annotation:

== Normalisation of Text Quote Selector ==
[raised by r12a, not yet discussed by the i18n WG]

4.2.4 Text Quote Selector
https://www.w3.org/TR/2016/WD-annotation-model-20160331/#text-quote-selector

> The text MUST be normalized before recording. Thus HTML/XML tags 
should be removed, character entities should be replaced with the 
character that they encode, unnecessary whitespace should be 
normalized, character encoding should be turned into UTF-8, and so 
forth. The normalization routine may be performed automatically by a 
browser, and other applications should implement the DOM String 
Comparisons method. This allows the Selector to be used with different
encodings and user agents and still have the same semantics and
utility.
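As an illustration only, the steps listed in the quoted text (removing tags, replacing character entities, collapsing whitespace) might look roughly like the following Python sketch. The helper name `normalize_for_quote` is hypothetical and not part of the draft. The sketch assumes HTML input read with the standard library's `html.parser`. It does not implement the DOM String Comparisons method or any Unicode normalization form.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects character data and drops tags. With convert_charrefs=True
    (the default), entities such as &amp; arrive already decoded."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def normalize_for_quote(markup: str) -> str:
    """Hypothetical helper: strip tags, decode entities, collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered character data
    text = "".join(parser.parts)
    # Collapse each run of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_for_quote("<p>Caf&eacute;   &amp;\n <b>bar</b></p>"))  # Café & bar
```

Python strings are already decoded Unicode text, so the "turned into UTF-8" step reduces to calling `.encode("utf-8")` when the normalized string is stored or transmitted.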

I think we agreed during the teleconference that normalization is not
appropriate before establishing a range using the Text Position
Selector (counting characters), but it **is** appropriate for the Text
Quote Selector (which selects a string with prefix and suffix), since
that location is identified by matching strings.
I just want to be sure that that's correct, and if so, that the
reference to [DOM String Comparisons](https://www.w3.org/TR/2016/WD-annotation-model-20160331/#bib-DOM-Level-3-Core)
serves the expected purpose.
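The distinction above can be illustrated with a small Python sketch. It assumes Unicode NFC as the normalization form, which the draft text does not name. It also uses a hypothetical helper, `find_quote`, that is not part of the model. Character offsets of the kind a Text Position Selector records change when the same text is stored in decomposed form. An exact/prefix/suffix match still succeeds once both sides are normalized the same way.

```python
import unicodedata
from typing import Optional


def nfc(s: str) -> str:
    """Normalize a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


def find_quote(text: str, exact: str, prefix: str = "", suffix: str = "") -> Optional[int]:
    """Hypothetical matcher (not from the spec): return the start offset, in the
    NFC-normalized text, of the first occurrence of `exact` that is preceded by
    `prefix` and followed by `suffix`; return None if there is no such match."""
    text, exact, prefix, suffix = nfc(text), nfc(exact), nfc(prefix), nfc(suffix)
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start
        start = text.find(exact, start + 1)
    return None


composed = "caf\u00e9 au lait"                        # é as one code point
decomposed = unicodedata.normalize("NFD", composed)  # e + U+0301 combining acute

# Position-style selection: the same offsets pick out different text.
print(repr(composed[5:7]))    # 'au'
print(repr(decomposed[5:7]))  # ' a'

# Quote-style selection: matching after normalization finds the same target.
print(find_quote(decomposed, "au", prefix="caf\u00e9 ", suffix=" lait"))  # 5
```

The returned offset refers to the normalized text. A real implementation would still need to map it back onto the document's own characters, and that mapping is outside this sketch.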

Please view or discuss this issue at
https://github.com/w3c/web-annotation/issues/222 using your GitHub
account.

Received on Tuesday, 17 May 2016 13:00:21 UTC