This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 22258 - Provide function for easy matching of HTML and DITA class attributes
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Working drafts
Hardware: PC Windows NT
Importance: P2 enhancement
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2013-06-04 08:32 UTC by Jirka Kosek
Modified: 2014-09-15 09:26 UTC

Description Jirka Kosek 2013-06-04 08:32:45 UTC
Several markup languages rely heavily on the values of a particular attribute (usually named class) to specify the nature of an element. This is typical in HTML, where you can have markup such as <div class="main"> and need to match on @class. But a class attribute can hold several values, some of them needed only for additional CSS styling, e.g. <div class="main ui-scrollable-pane ui-border">.

In order to match such values you need to resort to patterns like

*[contains(concat(' ', @class, ' '), ' main ')]

Moreover, matching should be case-insensitive. A similar approach is used in DITA, where specialized elements are by default processed by the template for a more generic element whose name is stored in the class attribute. DITA stylesheets are full of patterns like:

*[contains(@class,' topic/related-links ')]

It would be very useful to have a standard function that tests whether a given token is present in a list of whitespace-separated values; ideally this match should be case-insensitive.

It's very easy to write a custom function for this, but this is such a common operation that a native function, which could be further optimized, would be a real plus.

Something like:

class-matches(value as xs:string, text as xs:string) as xs:boolean

Jirka
Comment 1 Jirka Kosek 2014-02-13 14:13:30 UTC
Such a function should also have an optional argument to allow case-insensitive matching, as is used for the HTML class attribute.
Comment 2 Michael Kay 2014-04-28 11:00:48 UTC
It's not too difficult to write such a function yourself:

contains-token($A, $B) ::=
  tokenize($A, '\s+')!upper-case(.) = upper-case($B)

We've generally turned down requests for convenience functions that can be expressed as a one-liner unless there's a very strong case.
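
For readers less used to the `!` operator and general comparison, the one-liner above can be read step by step. The following is only a rough Python analogue, not part of the proposal: the name `contains_token_upper` is hypothetical, and the whitespace set is assumed to be the XML one (space, tab, line feed, carriage return).

```python
import re

# XML whitespace: space, tab, line feed, carriage return.
_XML_WS = re.compile(r"[ \t\n\r]+")


def contains_token_upper(value: str, token: str) -> bool:
    """Hypothetical analogue of tokenize($A, '\\s+')!upper-case(.) = upper-case($B)."""
    # tokenize($A, '\s+'): split the attribute value on runs of whitespace.
    tokens = _XML_WS.split(value)
    # upper-case($B): fold the token we are looking for.
    wanted = token.upper()
    # "=" is a general comparison: true if ANY upper-cased token equals it.
    return any(t.upper() == wanted for t in tokens)


print(contains_token_upper("main ui-scrollable-pane ui-border", "UI-Border"))  # True
print(contains_token_upper("main ui-scrollable-pane", "ui"))                   # False
```

The second call shows the point of tokenizing first: "ui" is a substring of a class name but not a token, so it does not match.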
Comment 3 Jirka Kosek 2014-04-28 17:02:54 UTC
> We've generally turned down requests for convenience functions that can be
> expressed as a one-liner unless there's a very strong case.

Yes, a user-defined function is an option and that approach is completely reasonable, but only in the context of XSLT or XQuery. XPath is often used standalone for querying documents, for example in simple queries run over a DOM API in an application, or in the XPath search box of an XML editor. This is probably the only thing for which CSS Selectors are more convenient than XPath, and I would like XPath to remain the best query language for HTML documents. That makes this function somewhat special, and for me it is a good enough reason to have it directly available in the language.
Comment 4 Liam R E Quin 2014-04-28 22:06:56 UTC
The usual ways I have seen people match class attributes for HTML are (expanding on the original request)

(1) //div[@class = 'etymology']

and, for people smart enough to realise they might have more than one class name in the attribute,

(2) //div[contains(@class, 'etymology')]

Only very rarely do you see,

(3) //div[contains(concat(' ', @class, ' '), concat(' ', 'etymology', ' '))]

It's also a common request on places like Stack Overflow.

It's a common need, something that many people get wrong when they try to write it, and something that would improve XPath's usage for the Open WebPlatform. I've raised it before and still support the idea, even though XPath 3 is already pretty big.
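
To make the difference between forms (1) to (3) concrete, here is a small Python sketch that mimics each predicate on a few sample class values. The function names and sample values are made up for illustration; form (3) is modelled with the class name as a string literal.

```python
def exact(cls: str, name: str) -> bool:
    # (1) @class = 'name'
    return cls == name


def substring(cls: str, name: str) -> bool:
    # (2) contains(@class, 'name')
    return name in cls


def padded(cls: str, name: str) -> bool:
    # (3) contains(concat(' ', @class, ' '), concat(' ', 'name', ' '))
    return f" {name} " in f" {cls} "


for cls in ["etymology", "etymology note", "pre-etymology"]:
    print(repr(cls), exact(cls, "etymology"),
          substring(cls, "etymology"), padded(cls, "etymology"))
# 'etymology' True True True
# 'etymology note' False True True
# 'pre-etymology' False True False
```

Form (1) misses elements with more than one class, and form (2) wrongly accepts 'pre-etymology'. Form (3) handles both cases, but it still assumes the names are separated by space characters: a value such as "etymology" followed by a tab would not match.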
Comment 5 Michael Kay 2014-04-29 17:48:07 UTC
The WG decided today it's worth taking this as a use case and seeing what we can come up with.

Just to capture one idea I suggested during the meeting: find some way of modelling an HTML DOM as a typed XDM instance, without having to put it through schema validation. This view could also be "normalized", e.g. where attributes are case-insensitive. The value of the @class attribute would then (hopefully) appear with type attribute(*, xs:string*), with the string value automatically tokenized and normalized to lower case. The expression

@class = "ui-border"

would then "do the right thing".
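
A rough way to picture this idea, sketched in Python rather than XDM: the attribute's typed value becomes a list of lower-cased tokens, and "=" behaves existentially over that list. The helper names `typed_class_value` and `general_eq` are hypothetical, and the normalization rules shown are an assumption about what such a typed view might do.

```python
import re
from typing import List


def typed_class_value(raw: str) -> List[str]:
    """Hypothetical typed value: whitespace-tokenized, lower-cased class names."""
    return [t.lower() for t in re.split(r"[ \t\n\r]+", raw) if t]


def general_eq(items: List[str], literal: str) -> bool:
    # An XPath general comparison is true if any item in the sequence
    # equals the literal.
    return any(item == literal for item in items)


cls = typed_class_value("Main  UI-Scrollable-Pane ui-border")
print(cls)                           # ['main', 'ui-scrollable-pane', 'ui-border']
print(general_eq(cls, "ui-border"))  # True
```

With the value already tokenized and normalized, the plain comparison needs no string manipulation in the query itself.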
Comment 6 Liam R E Quin 2014-07-28 16:54:58 UTC
The joint F2F meeting considered this and agreed to consider a proposal

[[
    add contains-token($string, $token, $collation)
    $collation could default to an HTML ASCIILower URI
    define an HTML ASCIILower collation
    single-argument tokenize() that tokenizes on whitespace
    not proposing to reconcile whitespace differences - use the existing XML whitespace rules
    to resolve https://www.w3.org/Bugs/Public/show_bug.cgi?id=22258

contains-token could work like id() and search a list if given one.
That way implementations that built typed XDM instances from HTML would
continue to work.

ACTION A-579-06: liam to write proposal for contains-token($string, $token, $collation) ETA 30th July as per meeting 579

]]
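
The shape of the proposal above can be sketched in Python as follows. This is an illustration under stated assumptions, not the specified function: the collation is modelled as a plain key function, `ascii_lower` stands in for the proposed HTML ASCIILower collation by folding only A-Z, and a list argument is handled by searching the tokens of every string in it.

```python
import re
import string
from typing import Callable, Iterable, Union

# Assumption: "ASCII lower" folds only A-Z and leaves all other characters alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    return s.translate(_ASCII_LOWER)


def contains_token(value: Union[str, Iterable[str]], token: str,
                   collation: Callable[[str], str] = ascii_lower) -> bool:
    """Hypothetical sketch of contains-token($string, $token, $collation)."""
    # Accept either one string or a list of strings, much as id() does.
    strings = [value] if isinstance(value, str) else list(value)
    wanted = collation(token)
    # Tokenize each string on XML whitespace and compare under the collation.
    return any(collation(t) == wanted
               for s in strings
               for t in re.split(r"[ \t\n\r]+", s) if t)


print(contains_token("main UI-Border", "ui-border"))       # True
print(contains_token(["main", "ui-border"], "UI-BORDER"))  # True
print(contains_token("mainframe", "main"))                 # False
```

Passing a different key function, for example `str.lower`, changes how non-ASCII letters compare, which is the kind of choice the $collation argument is meant to expose.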
Comment 7 Liam R E Quin 2014-07-29 12:42:47 UTC
The joint WGs accepted Liam's proposal to add contains-token(), resolving this issue.

Jirka, if you are not satisfied, please feel free to reopen this issue - thank you for reporting it.