W3C TAG F2F Feb 8-10, 2011

Agenda review

<scribe> scribe: Ashok

<DKA> First draft of "product" page for privacy drafts: http://www.w3.org/2001/tag/products/PrivacyFriendlyWeb.html

HTML-XML-Divergence-67: HTML / XML Unification

<ht> http://norman.walsh.name/2011/02/08/html-xml

<ht> http://www.w3.org/wiki/HTML_XML_Use_Cases

<johnk> http://appliedlife.blogspot.com/2009/08/markup-languages-family-tree.html - I did this when I was reading the HTML5 spec last year

Noah: Norm is chairing the XML/HTML unification taskforce
... Issue-120 on HTML is on distributed extensibility
... there is also an issue on RDFa prefixes

Norm: We constituted the taskforce with a mixture of XML and HTML folks

<noah> Norm's blog entry on the state of play in the HTML/XML Unification subgroup: http://norman.walsh.name/2011/02/08/html-xml

Norm: started to figure out what the problem was ... didn't get very far
... then started on usecases

<ht> http://www.w3.org/wiki/HTML_XML_Use_Cases

<noah> Use cases wiki: http://www.w3.org/wiki/HTML_XML_Use_Cases

Norm: We can discuss the usecases

LM: They are usecase categories
... you say XML Toolchain but there are many flavors of toolchains with different requirements
... I don't see roundtripping

Norm: Roundtripping was something we talked about but did not make it as a usecase

LM: Some may not think some of these use cases are important ... relating them to successful use may be very helpful

ht: Kai Scheppe from Deutsche Telekom AG talked about how XHTML had been very helpful
... discusses another commercial usecase

<masinter> I think our feedback is that getting more concrete examples would increase credibility

ht: Such commercial usecases would be useful

<masinter> HTML is not good for data scraping....

ht: Many colleagues scrape data and waste lots of time with HTML ... XHTML is much better for them

LM: (discusses use case details) -- for example analysis and extraction, looking for keywords, summarization

Noah: Norm, could you talk about the mindset of the group and where it is going

<masinter> different detailed use cases have different requirements...e.g., "scraping" might have performance requirements, while those of "processing" care about fidelity

<masinter> round-tripping has even higher requirement for fidelity beyond import + export

Noah: Says group members ready to leave
... if we refine usecases that may convince some people to stay and work on the issue

LM: We need to solicit additional requirements from more real (esp commercial) users

Norm: Roundtripping may be a new usecase

LM: usecase is starting with HTML, doing some XML processing and then emitting HTML

Tim: The common DOM does not work because you don't add new TBody elements

<Zakim> timbl, you wanted to wonder about scripts

LM: Using an XML Toolchain to produce HTML -- new usecase

<noah> TBL: If the task force just nourishes and maintains the concept of polyglot, that would be very useful

Norm: The HTML folks were quick to reject the Polyglot spec as too brittle ... too strict about angle brackets etc.

<noah> Norm: Polyglot is perceived as fragile for the same reasons as any XML, I.e. too strict about perfect syntax

<masinter> (note "race to the bottom" from ht)

<noah> Noah: I don't buy that, because I think the #1 use case for polyglot is for people who are using XML tool chains or are happy to produce "perfect" syntax, but whose users require content served text/html...

<noah> ...so, they want a spec that tells them just what they can and can't put into that perfect syntax and have it work right when served text/html

ht: Argument is that producing polyglot is hard, so once someone starts using a single language everyone goes to that -- race to the bottom

<masinter> I think "use XML toolchain to produce HTML" is the most common use case in the industry, and that polyglot is likely the most appropriate direction for them

LM: Task force might recommend changes to HTML spec, e.g., options for API to the DOM ... e.g. not failing in some way
... or include some guidance about what not to use

ht: That's the polyglot document

LM: No, it can have unbalanced brackets but does not use some features

Norm: I think there is a single DOM

<noah> New use case wiki page (very rough): http://www.w3.org/wiki/HTML_XML_Use_Case_08

<masinter> document.write is a leading example

Tim: For many people the DOM is an API ... supports the same methods

Noah: XML and HTML processors working on the DOM

<Zakim> timbl, you wanted to talk about race to the top

ht: There is html in conversation and the html out conversation

Tim: It is easy to produce polyglot documents ... avoids document.write
... run it thru tidy ... if you produce polyglot you get 2 sets of people using it ... html folks and xml folks
... so there will be a 'race to the top'

Noah: Sympathetic to polyglot

<masinter> polyglot is useful for use cases that weren't in the set of use cases written up

Noah: useful for simple cases ... what about using external libraries, etc.
... these may use document.write
... so does Polyglot apply in these cases

<masinter> "document.write" isn't the entire set of things that are "HTML specific DOM operations", but it's a good poster child for it

Norm: The vast majority of Web docs are produced using string concatenation and their authors don't want to run tidy

LM: People may be discounting Polyglot because they are not looking at right usecases

Noah: Added usecase 8

<masinter> the task force should be looking at creating a document that is acceptable to the W3C and web community... their local agreement is ok

Noah: Should we invest in improving the Polyglot document

Norm: I thought the Polyglot document went as far as it could

LM: There is a large community of people with toolchains who need to be satisfied

Norm: The taskforce will produce a report and that will be reviewed
... I was unable to persuade people to make technical changes

Noah: Talks about the taskforce and peoples motivations

<masinter> A good faith participation in a task force would be to agree on a problem statement for the task force.

larry: What is the task?

Norm: It proved to be difficult to state the problem
... so people moved on to usecases

LM: Now that you have usecases, are you going to try to define the problem again?

Noah: The tone of the taskforce has been constructive

LM: My experience is that when you are at loggerheads, bring in more people
... bring in people who need the solution

Noah: Will the real users come to the taskforce and explain their usecases?

LM: Document in the report where there is not consensus and why

Norm: Usecase number 4 is the most bizarre

<masinter> the XML -> (XML/HTML polyglot ) -> XML or HTML tool chain

<masinter> and the use case of "scraping" as a kind of consuming

Noah: Some folks claim no changes are needed ... HTML is the answer and XML is not helpful

Norm: I think taskforce has gone as well as it could
... no usecase has convinced the HTML folks that they need to change

Peter: What changes are you thinking of

<noah> NW: Even the script hack can be useful.

<noah> TBL: What's the script hack?

<noah> NW: <script type="application/xml"> plus a shim that finds that stuff in the DOM and parses the XML

<noah> NW: The XQuery folks are actually doing this.

<noah> NW: On good days, you can almost imagine this is acceptable.
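
[Illustrative sketch, not from the meeting: the "script hack" described above puts XML inside <script type="application/xml"> and relies on a shim to find and parse it. In a browser the shim would be JavaScript walking the DOM; the version below is only an offline analogue using Python's standard library. The class name XmlIslandFinder, the function extract_xml_islands and the sample page are all hypothetical.]

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class XmlIslandFinder(HTMLParser):
    """Collects the raw text of every <script type="application/xml"> element."""

    def __init__(self):
        super().__init__()
        self._in_island = False
        self._buffer = []
        self.islands = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self._in_island = True
            self._buffer = []

    def handle_data(self, data):
        # Script content is delivered as plain data, possibly in several chunks.
        if self._in_island:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_island:
            self.islands.append("".join(self._buffer))
            self._in_island = False


def extract_xml_islands(html_text):
    """Return one parsed XML element per embedded island (hypothetical helper)."""
    finder = XmlIslandFinder()
    finder.feed(html_text)
    finder.close()
    return [ET.fromstring(text.strip()) for text in finder.islands]


# Hypothetical page: the HTML parser never interprets the <order> markup itself.
PAGE = """<!DOCTYPE html>
<html><body>
<script type="application/xml">
  <order id="42"><item>widget</item></order>
</script>
<p>Ordinary HTML content.</p>
</body></html>"""

orders = extract_xml_islands(PAGE)
```

[The point of the design is that the HTML side treats the island as opaque script text, so only the shim needs an XML parser.]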

<noah> http://www.w3.org/wiki/Talk:HTML_XML_Use_Case_04

Noah: For running XQuery in the browser

LM: The thing that will cause change is serious users

Norm: Now that many browsers ship with XHTML support you can just use XHTML

Noah: People have different perspectives ... worried about different users

<Zakim> ht, you wanted to make the XSLT-in-the-browser point

ht: I'm concerned that people say that the XML to HTML problem is the same as anything to HTML

<masinter> xml & xslt use case is important

ht: so why do we have XSLT in the browser
... Use script tag to put not HTML stuff in HTML

<masinter> XML as constituted part

<Zakim> timbl, you wanted to ask about FBML

ht: what is the real substantive value of XML as how data gets on the web

Tim: Asks about FBML ... adds tags to HTML

JohnK: Facebook says they are deprecating it in favor of CSS, Javascript

<johnk> FBML: http://developers.facebook.com/docs/reference/fbml/

Tim: Talk about lack of modularity in CSS

<masinter> many IETF specs use XML for interchange, and need presentation... would like to make sure those use cases are represented

Dan: Activity streams and other social network specs are XML-based

<masinter> XML + XSLT might be more important than XHTML?

Norm: XML has failed only in the client otherwise very useful and widely used
... some pressure to move to JSON

<DKA> Ostatus specification I mentioned: http://ostatus.org/sites/default/files/ostatus-1.0-draft-2-specification.html

<DKA> To be brought in as an input into http://www.w3.org/2005/Incubator/federatedsocialweb/

<masinter> (1) task force should agree to "change proposals" to HTML spec that encompass the proposed solutions as "best practice", perhaps by making reference to task force report.

<DKA> Leveraging (XML) activity streams spec: http://activitystrea.ms/

<masinter> (2) question about XML + XSLT vs. XHTML in priority

<Zakim> noah, you wanted to answer Henry

Noah: I don't think XSL will come and go because of the taskforce
... many apps would break if clients dropped support for it, at least in the immediate future.

<Zakim> masinter, you wanted to note that perspective, "best practice" recommendations are important

<noah> Noah: to be clearer, what I said is that XSLT won't go away in the browsers, and that's for the right reasons, I.e., it would break lots of existing deployed software if XSLT were removed.

<noah> Noah: maybe or maybe not there would be enough future value to motivate keeping it if there weren't such compatibility issues, but I believe it will stay if only for compatibility, at least for a while. Just my opinion.

LM: You will come up with best practices. These should be pointed to by the HTML spec

Norm: Do you think there is stuff in HTML spec that contradicts what the taskforce says? That would be interesting.
... and much tougher area

LM: Perhaps your charter should be: look at usecases and recommend best practices

Norm: I think I can get the taskforce to agree to that

<timbl> (Suppose you parse XML to a JS object, not a DOM ... how close is XML to JSON anyway? You have to decide whether element contents are going to be null, or a string, or a list (mixed content). Certainly the problem of mapping to RDF is a common problem, and a common mapping language would probably work.)
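
[Illustrative sketch, not from the meeting: the decision described above (null, string, or list for element content) can be made concrete with a small converter. The function name content_of and the sample document are hypothetical, and the mapping is only one of many possible choices: it ignores attributes and strips surrounding whitespace, so it is lossy.]

```python
import json
import xml.etree.ElementTree as ET


def content_of(elem):
    """Map element content to None (empty), str (text only) or list (children/mixed)."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children:
        # Empty element -> None; text-only element -> the string itself.
        return text or None
    # Child elements or mixed content -> an ordered list of strings and objects.
    items = [text] if text else []
    for child in children:
        items.append({child.tag: content_of(child)})
        tail = (child.tail or "").strip()
        if tail:
            items.append(tail)
    return items


doc = ET.fromstring(
    "<r><empty/><name>Ada</name><note>Hello <b>big</b> world</note></r>"
)
as_json = json.dumps({doc.tag: content_of(doc)})
```

[Each of the three branches corresponds to one of the three shapes named in the comment; a different converter could just as defensibly wrap every value in a list.]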

<ht> The XMLHttpRequest CR draft http://www.w3.org/TR/XMLHttpRequest/#document-response-entity-body does still 'privilege' XML, as parsed per the XML specs

Break for 20 minutes

HTML Prefixes, Namespaces and Extensibility

Noah: Describes background of issue -- decentralized extensibility in HTML
... the HTML WG did a survey that was officially of the WG membership, but the TAG also sent a note.

<noah> HTML WG held a survey, TAG input at http://lists.w3.org/Archives/Public/www-tag/2010Oct/0033.html

<noah> HTML WG Chairs' decision: http://lists.w3.org/Archives/Public/public-html/2011Feb/0085.html

Noah: The chairs chose the "no changes" option

The note says they looked for evidence that decentralized extensibility was important and did not find enough

scribe: they will look at new evidence

<noah> The main decentralized extensibility issue is http://www.w3.org/html/wg/tracker/issues/41

<noah> There is also http://www.w3.org/html/wg/tracker/issues/120 on prefixing, especially for RDFa

scribe: they say use RDFa without prefix mechanism

Noah: Back to HTML WG issue 41

Working thru mail from HTML WG re. the decision

Noah: The TAG discussed all the proposals and decided to back the "like SVG" proposal

ht: It is a qualified version of the Microsoft proposal

Tim: Re. Uncontested Observations. We did not argue for removal of existing extensibility points
... existing extensibility points have serious architectural limitations
... <object> is horrible ... would not use this to add a new form of bold

LM: Users often do not understand the relation between prefixes and namespaces ... some may find this confusing

Dan: Maybe we should pick our battles with HTML WG
... put our energies into the taskforce

JohnK: Not useful to go thru the email point by point
... we want ability to add attributes with prefixes without any approval

Tim: Some people argue that if you add a namespace that is bad
... they don't have a model of special user communities of browser users

JohnK: Asks whether architectural arguments are not self-evident

<Zakim> masinter, you wanted to talk about process

Could we just list these arguments

LM: I see no point in TAG responding to HTML WG at this point
... we can advise the Director how to respond to the appeal
... better to let the HTML document get to Last Call

<johnk> johnk's specific potential architectural issues "What we mean when we say distributed extensibility"

<johnk> arguments for:

<johnk> * that it should be possible for anyone to define their own markup

<johnk> extensions (and the syntactic/semantic "meaning" of said extensions)

<johnk> without permission from anyone else

<johnk> * that we should encourage these extensions to be publicly (not

<johnk> "proprietarily") available without the permission of the HTML WG

<johnk> counter-argument: encourages proprietary extensions to HTML?

<Zakim> noah, you wanted to talk about possible response

LM: It is in their charter "encouraged to find extensibility mechanisms"

<masinter> "The HTML WG is encouraged to provide a mechanism to permit independently developed vocabularies such as Internationalization Tag Set (ITS), Ruby, and RDFa to be mixed into HTML documents. Whether this occurs through the extensibility mechanism of XML, whether it is also allowed in the classic HTML serialization, and whether it uses the DTD and Schema modularization techniques, is for the HTML WG to determine."

Noah: Worth looking at how much decentralized extensibility is already in the spec...

<noah> http://dev.w3.org/html5/spec/Overview.html#conformance-requirements

Noah: I think it allows decentralized extensibility
... what it does not have is a mechanism to avoid name collisions.
... If I come up with a new element I cannot put it in a namespace if I'm using the text/html serialization. I can write a document that is an "applicable specification" describing the element; if used, the element will appear in the DOM.

scribe: I can use Javascript on this DOM node

Noah: So, you do have distributed extensibility ... what you don't have is a mechanism for preventing collisions

<Zakim> ht, you wanted to support the pick our fights proposition

Noah: So, if we can agree that's the state of play, then we can decide whether we wish to raise any additional concerns.

<masinter> and also http://tools.ietf.org/html/bcp125

ht: I agree with Dan and Larry in saying that there is no point in pursuing the opportunity for pushback that is in this note

<noah> Noah: you also are, and I can see the arguments on both sides of this, losing the ability to "follow your nose" to find the pertinent specs when some random document is encountered, and that document uses applicable specs. You can't in general find the specs from the document.

<noah> Noah: with namespaces, whatever their other problems, you can.
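
[Illustrative sketch, not from the meeting: two points made above are that namespaces give a way to avoid name collisions and let a reader find the governing vocabulary from the document itself. The snippet below shows both with two independently defined "rating" elements. The example.org namespace URIs, the element names and the function vocabularies are hypothetical.]

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define an element called "rating".
DOC = """<page xmlns:film="http://example.org/film-reviews#"
      xmlns:bond="http://example.org/credit-ratings#">
  <film:rating>4 stars</film:rating>
  <bond:rating>AA+</bond:rating>
</page>"""


def vocabularies(xml_text):
    """Map each local element name to the namespace URIs it appears under."""
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree reports qualified names as "{namespace-uri}localname".
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
            found.setdefault(local, []).append(uri)
    return found


by_name = vocabularies(DOC)
```

[The same local name maps to two distinct URIs, and each URI is a starting point for locating the relevant specification; an un-namespaced text/html document carries no such pointer.]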

<Zakim> masinter, you wanted to talk about http://tools.ietf.org/html/draft-iab-extension-recs-05

LM: We could respond to IETF document on extensibility ... brings in a broader perspective

<noah> Hmm, Larry says HTML is a protocol "sort of". Well, yes sort of, but I'm more familiar with the "protocols & formats formulation". HTML is more a format, and I don't think the versioning considerations for formats are in general the same as for protocols.

LM: we could look at their arguments and see if they apply to HTML
... some new evidence to bear on the process
... Another related document

RFC 4775: Procedures and Processes for Protocols Extensibility Mechanisms

Noah: Looks like 4775 is recommending Registries

Discussion about registries

scribe: and whether they help or hinder distributed extensibility

<noah> From: http://tools.ietf.org/html/rfc4775

<noah> " An extension is often likely to make use of additional values added

<noah> to an existing IANA registry (in many cases, simply by adding a new

<noah> "TLV" (type-length-value) field). It is essential that such new

<noah> values are properly registered by the applicable procedures,"

<masinter> the power struggle is part of it "who has control"

<masinter> but the power struggle is confounded by the technical issues

Discussion of how extensibility really works

LM: HTML decision narrow ... there were no acceptable proposals

Tim: We are trying to provide a solution for the little guy ... URLs are easy to mint

<Zakim> noah, you wanted to do a logistics & time check

<masinter> action-120?

<trackbot> ACTION-120 -- Dan Connolly to review of "Usage Patterns For Client-Side URL parameters" , preferably this week -- due 2008-03-20 -- CLOSED

<trackbot> http://www.w3.org/2001/tag/group/track/actions/120

Tim: create little community of browser users

<Zakim> ht, you wanted to mention the Accessibility parallel

<masinter> issue-120?

<trackbot> ISSUE-120 does not exist

ht: Since HTML WG have resolved Issue 41 this can wait
... you can send mail asking if we can wait on 120

<ht> In terms of thinking about advising the Director as we come up to a Process milestone at which objections wrt DistrExtens may be on the agenda, Tim's point about standing up for the little guy reminded me of a possible parallel with I18N and Accessibility -- Director's Review is the point at which unrepresented constituencies are considered

<ht> Candidate small languages for use in distr. exten. : XForms, XMP, FBML (Facebook Markup Language, now deprecated), CML (Chemical Markup Language), [Music?]

<Norm> There is a music markup language, Michael Kay brought it up as an example

<ht> I think the plugin support is already there

<masinter> scribe: masinter

<scribe> scribenick: masinter

LUNCH

XML HTML task force

ht: What is the goal of this activity?

noah: goal is to help this task force be successful

norm: want to go through use case in more detail
... if there are specific use cases that aren't satisfied, especially interesting

showing http://www.w3.org/wiki/HTML_XML_Use_Case_01

ht: how many such parsers are there?

norm: I believe there are 2 or 3. Henri in Java, Sam in Ruby, someone else....

ht: when I looked a few months ago, there was no tool that did what I needed, which were 'error recovery'
... this "Solution" is at least misleading. "Truth in advertising"

larry: Henry said he found NONE. If there is NONE, it might mean that it is impossible. A solution that requires something 'impossible' isn't a solution.

noah: if parsers are needed, then ones that are needed will get built.

johnk: there isn't enough need from stand-alone parsers, such as they are extractable from browsers.

tim: I rewrote problem statement, and edited it into the "Discussion" tag

(looking at http://www.w3.org/wiki/Talk:HTML_XML_Use_Case_01)

<johnk> johnk: it hasn't yet been determined that there is enough need for a standalone HTML5 parser such that there is a clear need to separate it from other software (such as browser)

tim: I took out some of the derogatory comments that were garbage ("race to the top" vs. "race to the bottom")
... I would like a ringing endorsement of polyglot to come out of this task force.

norm: that isn't polyglot... the mapping of HTML into XML because there is an XML document that has the same DOM as the HTML

tim: the requirement to accept polyglot on the priority

larry: there are really at least three very different sub-categories here (HTML -> XML tool chain)
... (1) extract, analyze (2) round-trip (3) ...

norm: Use case 2: (looking at http://www.w3.org/wiki/Talk:HTML_XML_Use_Case_02)

tim: you need to put something in the examples to make it clear that this is not "XHTML" but XML in general, e.g., docbook

norm: not sure that this is a real use case, not a lot of enthusiasm for this

(looking at http://www.w3.org/wiki/Talk:HTML_XML_Use_Case_03 now)

larry: in #2, separate 'browser' from 'non-browser'

Examples are things like documentation

larry: copy/paste and clipboard thing is a separate use case

tim: I'm impressed that copy/paste from web to email works
... table from web page into mail message and it works

norm: I expect the techniques that it will let that work
... oxygen does a whole bunch of work to make that work

tim: thinking about the RDF case... you get a piece of HTML in the middle of RDF so that works
... if you do any form of escaping, in general there is no expectation that if you put some escaped CDATA in the XML that it has any meaning, and no expectation... this happens in RSS

norm: of the two, the escaped text is far less effective
... I noticed in the Twitter API that the identity of the submitter is escaped HTML

tim: Microsoft's odata ("almost linked data") when you get a feed it's an RSSFeed

ht: ((missed example))

<Norm> In Atom, HTML markup is sometimes escaped and sometimes not, using a type attribute to distinguish between them.
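
[Illustrative sketch, not from the meeting: the Atom convention mentioned above uses a type attribute on the content element to say whether markup is escaped (type="html") or carried inline as XML (type="xhtml"). The snippet shows what an XML parser sees in each case. The feed text and the function describe_content are hypothetical.]

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical feed: the same paragraph carried escaped and unescaped.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <content type="html">&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml"><p>Hello <b>world</b></p></div>
    </content>
  </entry>
</feed>"""


def describe_content(content):
    """Report how the markup is visible to an XML toolchain."""
    kind = content.get("type", "text")
    if kind == "html":
        # Escaped markup: the XML parser yields only an opaque string,
        # which would need a separate HTML parse to be useful.
        return ("string", content.text)
    if kind == "xhtml":
        # Inline markup: the elements are part of the XML tree.
        div = content.find(XHTML + "div")
        return ("tree", [child.tag for child in div.iter()])
    return ("text", content.text)


results = [
    describe_content(c) for c in ET.fromstring(FEED).iter(ATOM + "content")
]
```

[In the escaped case the markup survives only as text; in the inline case it is addressable with ordinary XML tools.]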

<ht> Is it expected that this will work: <object type="application/xml" data="data:,<hello xml:lang='en'>world</hello>" /> ?

noah: couldn't introduce a new tag other than 'script'

henry: in polyglot, need CDATA in script, if you need polyglot and use <> in script

or use data:application/xml,<hello ....
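
[Illustrative sketch, not from the meeting: the data: URL form suggested above can carry a small XML document together with its media type. The snippet only shows that the URL round-trips through a generic URL library; it says nothing about whether a browser would render or expose the result, which was the open question. The payload is taken from the example above; percent-encoding it with quote is an assumption made here for safety.]

```python
from urllib.parse import quote
from urllib.request import urlopen
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

payload = "<hello xml:lang='en'>world</hello>"
# Percent-encode the markup so the URL contains no raw angle brackets or quotes.
data_url = "data:application/xml," + quote(payload)

# urllib handles the data: scheme locally; no network access is involved.
with urlopen(data_url) as response:
    media_type = response.headers.get_content_type()
    root = ET.fromstring(response.read())
```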

now looking at http://www.w3.org/wiki/Talk:HTML_XML_Use_Case_04

(discussion of XML5 document)

norm: XML community could take this up....

noah: discussion of robustness principle
... you should have the same burden to be conservative in what you said

dana: observation: people use string concatenation to produce HTML because to do otherwise wouldn't be satisfactory for performance reason... that's the implicit reason, and they are prone to error

tim: related use case: jQuery. jQuery allows you to parse .navigate + something that looks like xquery (it isn't xquery but looks like it, or css selectors) + insert things (looks like HTML), there is no reason that it actually could use implicit tags on close tags, they could do all kinds of things, the critical thing is to get the code to all fit on one line or one page
... in cases where people are stuffing strings in... for things that stuff in little bits of syntax (Turtle example), in those cases, it is a nice situation where xml tools could give people an ability in their scripting

(have been looking at http://www.w3.org/wiki/HTML_XML_Use_Case_05)

now looking at http://www.w3.org/wiki/HTML_XML_Use_Case_06

"dead use case", a lot like use case 1

no one was prepared to stand up to do this

larry: separation between situations where things render, vs. things are auxiliary data

http://www.w3.org/wiki/HTML_XML_Use_Case_07

noah: what some subgroups don't like is "stop on first error"
... the XML Recommendation doesn't provide any interpretation or mappings for such documents, other than to say that they are indeed not well formed. One could imagine revisions to the XML Rec., or other specs, that would provide such interpretations, and that would support error recovery. I'm told that XML5 is an attempt in that direction.
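
[Illustrative sketch, not from the meeting: the contrast above between "stop on first error" and error recovery can be seen by feeding the same sloppy input to an XML parser and to a tolerant HTML tokenizer. Python's html.parser is used only as a stand-in for a forgiving consumer; it is not an implementation of the HTML5 parsing algorithm. The names SLOPPY, EventLog, try_xml and try_html are hypothetical.]

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Unquoted attribute value and no closing tag: not well-formed XML.
SLOPPY = "<p class=note>unquoted attribute, unclosed paragraph"


class EventLog(HTMLParser):
    """Records the events a tolerant tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_data(self, data):
        self.events.append(("data", data))


def try_xml(text):
    """XML behaviour: the first well-formedness error aborts the parse."""
    try:
        ET.fromstring(text)
        return "well-formed"
    except ET.ParseError:
        return "rejected"


def try_html(text):
    """Tolerant behaviour: the tokenizer reports what it can recognise."""
    log = EventLog()
    log.feed(text)
    log.close()
    return log.events


xml_result = try_xml(SLOPPY)
html_events = try_html(SLOPPY)
```

[The XML side yields no content at all, while the tolerant side silently yields a start tag and text; which of these is preferable depends on the use case, as the discussion that follows makes clear.]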

larry: this is a kind of social engineering through spec writing that is difficult to accomplish without consensus on the goal and agreement to abide by it. Social engineering is to get senders to be conservative in what they send by having some conservative receivers that they are likely to test against.

larry: have to get agreement to do social engineering in the first place, and that the goal of having conservative senders is an important goal

noah: is it really doing the fixup you want or not?
... have the specs enable you to turn off when you want to
... how often or with how much noise or smoke would be a debate you'd have to have
... Keep in mind that a main goal for XML was for exchanging mission critical data — for that, silent recovery is not the right approach.

ht: in the first two years, the idea that we were building XML for machine-to-machine communication was not at the forefront. It was about getting information in front of humans, and the 'error handling' was there because the arms race of forgiving viewers was harmful
... the motivation was to end the "arms race" of fixup by saying "no one will do fixup"
... that's opposite of what we're doing now, which is to say "everyone will do the same kind of fixup"

noah: could go to the community to see if there are some XML fixups that would be useful

ashok: ask the user, flag it, how aggressive a fixup, mash HTML5 fixup

peter: I have no problem with relaxing some of the rules of XML, but I wouldn't like to go all the way of tag fixup, such as happens in HTML. Leave XHTML being an XML application with all of the XML rules.
... all you're doing is allowing people to write bad XML

noah: will more people use this if we do this?

tim: too much of a pain typing the quotes around the attributes... some of those things where there is absolutely no ambiguity, perhaps we could relax the rules.

noah: we should go only as far as necessary to get widespread adoption, vs. abandonment.

larry: 7 isn't really a use case, it's a proposed solution looking for use cases. my claim is that the proposal doesn't actually seem to solve any known problem

looking at about http://www.w3.org/wiki/HTML_XML_Use_Case_08

norm: this wasn't there earlier, should have been, because task force talked about it. "Right" answer is that XML tools should grow an HTML output method

(Larry points out again that 'round trip' is more than 'consume and produce' because round trip may have more requirements for preservation )

<DKA> Scribe: Dan

<DKA> ScribeNick: DKA

Norm: You're not likely to get CDATA in script elements.
... it doesn't work if you use script elements...

Henry: A normal xml serializer would never use cdata sections...
... In all the use that many of us make of xmlspec dtd - you must use output-mode=html - because this produces <p></p> when you have empty paragraphs. Because if you produce <p/> this [messes up most browsers.]
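
[Illustrative sketch, not from the meeting: the difference described above between an XML output mode and an HTML output mode for empty elements can be reproduced with Python's ElementTree serializer, used here only as a convenient stand-in for the XSLT output-mode setting under discussion.]

```python
import xml.etree.ElementTree as ET

body = ET.Element("body")
ET.SubElement(body, "p")   # an empty paragraph
ET.SubElement(body, "br")  # a void element in HTML

# XML output: empty elements are written in the self-closing form.
as_xml = ET.tostring(body, encoding="unicode", method="xml")

# HTML output: the empty paragraph gets an explicit end tag,
# and the void element gets no end tag at all.
as_html = ET.tostring(body, encoding="unicode", method="html")
```

[With the HTML method the empty paragraph is written as a start tag followed by an end tag, which is the form text/html consumers expect.]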

<scribe> Scribe: masinter

<scribe> ScribeNick: masinter

noah: Norm, have you gotten useful feedback from us?

norm: I got useful feedback. I'll go back into the minutes, lots of cases for making use cases more detailed. No one has said I've gone off in all the wrong direction....
... the trajectory the task force is going to land, I have no idea what to do next....

<noah> LM: I think our role here is to figure out what the TAG should do given where the taskforce stands.

<noah> LM: I think part of our role is to help those who have a stake in XML to be more easily heard in this process. A lot don't feel they've been heard. These use cases are the vehicle.

<noah> LM: I can see that doing more can be frustrating, but I believe that someone has to do a lot more.

<noah> NW: I'm not at all unwilling to do more work, I do keep asking >what< you want me to do.

<noah> LM: I would ask Roy... (discussion tails off)

<noah> LM: Roy has an XML toolchain, and his review might be interesting.

<noah> NW: I'll break out the use cases and try to figure good candidates to provide feedback on each.

<noah> NM: You could somewhat publicly ask people for review.

<noah> NW: Prefer to do it after the report's a bit cleaner -- I don't want to be responsible for people misunderstanding the wiki in its current form

(discussion of process)

dka: in spirit of providing feedback, worth saying "kudos for doing this", amazing you've managed to make the progress you have

<noah> DKA: Major kudos to Norm for doing what is in many ways a thankless job. There's a lot of good progress here. I support publishing as a TAG note or something like that, once baked.

dka: Not only a browser group, to consider 'what changes should be considered for XML as well', we need to really believe that, to think about how this stuff could be put into place

norm: James did microXML and John Cowan has picked this up and is producing this group. Liam did agree to put something in XML Core so that they might add something about this in their charter revision.
... XML5 is an attempt to say how XML as it exists might work better, while MicroXML might be 'how to make XML smaller'; things like "namespaces aren't special"
... maybe James was thinking there might be some movement from the HTML side.

noah: how relevant will this be practically?

norm: microXML might be interesting, would like to know more what problems it solves

<Zakim> ht, you wanted to say something more about templating

ht: in terms of looking for concrete use cases, the phrase "templating" does describe some tooling that I've observed ... (XForms is a partial example of this), a successive refinement approach to producing web pages.
... there are some architectures out there that work that way... it's a mixture of HTML and proprietary markup, that push it through (not a pipeline, an iterate-to-fixed-point processing step) until it gets to the point where there is nothing left but XHTML....
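
[Illustrative sketch, not from the meeting: the iterate-to-fixed-point templating style described above can be modelled as repeated expansion passes that stop once no proprietary element remains. Everything here is hypothetical: the element names t-greeting and t-name, the MACROS table and the function names are invented for illustration and do not describe any particular product.]

```python
import xml.etree.ElementTree as ET


def expand_greeting(elem):
    """<t-greeting to="X"/> expands to a paragraph that contains another macro."""
    p = ET.Element("p")
    p.text = "Hello, "
    ET.SubElement(p, "t-name", {"value": elem.get("to", "")})
    return p


def expand_name(elem):
    """<t-name value="X"/> expands to plain XHTML emphasis."""
    em = ET.Element("em")
    em.text = elem.get("value", "")
    return em


MACROS = {"t-greeting": expand_greeting, "t-name": expand_name}


def expand_once(root):
    """Replace every macro element currently in the tree; report whether any was found."""
    changed = False
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            macro = MACROS.get(child.tag)
            if macro is not None:
                replacement = macro(child)
                replacement.tail = child.tail
                parent[index] = replacement
                changed = True
    return changed


def expand_to_fixed_point(root, max_passes=10):
    """Run passes until one changes nothing; return the number of productive passes."""
    passes = 0
    while expand_once(root):
        passes += 1
        if passes >= max_passes:
            raise RuntimeError("macro expansion did not converge")
    return passes


page = ET.fromstring('<div><t-greeting to="TAG"/></div>')
passes = expand_to_fixed_point(page)
result = ET.tostring(page, encoding="unicode")
```

[Because one macro may expand into another, a single pass is not enough; the loop ends only when a pass finds nothing left to expand, which is what distinguishes this from a fixed-length pipeline.]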

there is a requirement that HTML5 make it not any harder to produce (polyglot) HTML output that way than it is today

there are a lot of systems that now support IE6....

ht: maybe it is already the case that polyglot HTML5 is not harder than producing XHTML 1.1 polyglot

<ht> One example of this is the Factonomy (www.factonomy.com) Framework

<jar> on break now.

<jar> http://www.w3.org/2001/tag/2011/02/metadata-arch.html

Metadata Architecture

Slides: http://www.w3.org/2001/tag/2011/02/metadata-arch.html

issue-63?

<trackbot> ISSUE-63 -- Metadata Architecture for the Web -- open

<trackbot> http://www.w3.org/2001/tag/group/track/issues/63

action-282?

<trackbot> ACTION-282 -- Jonathan Rees to draft a finding on metadata architecture. -- due 2011-04-01 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/282

jar: slide 6.... not getting consensus
... RDFa, tooling might be different, all the deployed stuff will be called into question
... slide 7 interoperability issue: same name used for two different things
... another example, 'wants'

ht: Facebook's 'likes'... one person likes the page, one person likes the screwdriver

jar: creative commons 'licenses' is clearly a problem, 'likes' or 'wants' are less
... slide 9.... new uri scheme, foaf...
... slide 9 second line shows 6 alternatives for notation

<ht> (Discussion about RDF about="" and the status of Same Document Reference)

<ht> http://www.apps.ietf.org/rfc/rfc3986.html#sec-4.4

<noah> Hmm, from http://www.apps.ietf.org/rfc/rfc3986.html#sec-4.4

<noah> "When a same-document reference is dereferenced for a retrieval action, the target of that reference is defined to be within the same entity (representation, document, or message) as the reference; therefore, a dereference should not result in a new retrieval action. "

<noah> That doesn't quite say: "The null reference identifies the same resource as the URI used to retrieve the document." Sort of an odd construction. Why? Does this matter?
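
[Illustrative sketch, not from the meeting: the RFC 3986 notion of a same-document reference quoted above covers more than the empty reference. One way to test for it, assumed here, is to resolve the reference against the base URI and compare the results with fragments removed. The function name is_same_document and the example.org URIs are hypothetical.]

```python
from urllib.parse import urldefrag, urljoin


def is_same_document(base_uri, reference):
    """True if the reference, once resolved, differs from the base only by fragment."""
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


BASE = "http://example.org/doc"

checks = {
    "": is_same_document(BASE, ""),          # the null reference, as in about=""
    "#foo": is_same_document(BASE, "#foo"),  # fragment-only reference
    "doc": is_same_document(BASE, "doc"),    # relative path resolving to the base
    "other": is_same_document(BASE, "other"),
}
```

[Note that "#foo" qualifies as a same-document reference even though it is not null, and that the check speaks only about retrieval, not about what resource each URI identifies.]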

JAR: I think the best way to get consensus around this is to take it to REC track.... is this a task force thing? is it an objective?

<ht> Because not all s-d-rs are null references

tim: this broke out on the linked open data list

<noah> I'm not hung up on the null part, I'm hung up on the "target is defined to be within"

<noah> That doesn't say what the URI(s) identify.

<ht> Right -- the 'within' is there because the target of "#foo" is not the target of the base URI

<noah> Yes, but it doesn't mention the resource, it mentions the representation, which is very odd.

tim: linked open data list has many people who have joined recently. Looking at that, there was some real pain expressed ... when you are producing linked data for a bunch of abstract things, it's a pain to have to do 303 all the time, and using hash wasn't satisfactory
... two things to do, "Hash is beautiful", or "add a 208"
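(Editorial illustration, not part of the discussion: a rough Python sketch of why the 303 pattern costs an extra round trip compared with hash URIs. The URIs, the `see_other` table standing in for a server's 303 redirects, and the function name are all hypothetical; no real HTTP requests are made.)

```python
from urllib.parse import urldefrag

def retrieval_plan(thing_uri: str, see_other: dict) -> list:
    """List the URIs a client would request to learn about `thing_uri`."""
    doc_uri, fragment = urldefrag(thing_uri)
    if fragment:
        # Hash pattern: the client strips the fragment and makes one request.
        return [doc_uri]
    if thing_uri in see_other:
        # 303 pattern: one request for the thing, a second for the document
        # that the (simulated) 303 See Other response points to.
        return [thing_uri, see_other[thing_uri]]
    return [thing_uri]

# Hypothetical redirect table for a server using the 303 pattern.
redirects = {"http://example.org/id/alice": "http://example.org/doc/alice"}

hash_plan = retrieval_plan("http://example.org/people#alice", redirects)
slash_plan = retrieval_plan("http://example.org/id/alice", redirects)
```

With hash URIs, every thing named in one document shares a single retrieval, which is also why a very large document becomes awkward.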

<noah> We don't usually say that a URI identifies something within the representation, except in very unusual edge cases.

<noah> (We do in particular cases where the media type spec says it does.)

<ht> Yes, that reference/resource distinction is not well-respected here

jar: the TAG should engage on the linked open data list, or invite them to discuss it on the TAG list

<Norm> Hashes are problematic if the number of items in the document is very large.

<ht> Let's look at HTTP-bis

<noah> But if it's not well respected, then what does the above mean?

<noah> More to the point, does it matter that we straighten this out in the context of the discussion that JAR is leading?

<ht> No

<ht> I don't think

<noah> Hmm. OK.

jar: is the TAG willing to engage in a good-faith process intended to get to an editor's draft?

<ht> This is the answer, noah: "When a same-document reference is dereferenced for a retrieval action"

<ht> retrieval actions _are_ about representations

ashok: there are other stakeholders
... I would like "those guys" part of the discussion

noah: I think Jonathan means "Recommendation"

<ht> I agree that "is within" is bad -- it should have used wording that said "is related to in the same way that a full use of the baseURI plus #... if any is related"

<noah> JAR: right Noah, I'm proposing a formal W3C Recommendation produced using the full W3C process

noah: we had agreed to push this forward as a Rec, and then dropped the ball?

(scribe uncertain what the topic is)

ht: we have precedent for issuing documents on the rec track. We should do that with the content Jonathan is presenting to us.

tim: question is, are there alternatives for solving the problem?

jar: there are three alternatives: engage on LOD, do an architectural rec, form a new working group

<noah> ACTION: Noah to figure out where we stand with http://www.w3.org/TR/2006/WD-namespaceState-20060329/ on the rec track [recorded in http://www.w3.org/2011/02/09-tagmem-irc]

<trackbot> Created ACTION-521 - Figure out where we stand with http://www.w3.org/TR/2006/WD-namespaceState-20060329/ on the rec track [on Noah Mendelsohn - due 2011-02-16].

<noah> ACTION-521 Due 2011-03-01

<trackbot> ACTION-521 Figure out where we stand with http://www.w3.org/TR/2006/WD-namespaceState-20060329/ on the rec track due date now 2011-03-01

<noah> HT: We should do an architectural rec.

larry: if the topic is as broad as JAR's presentation, i would favor a new working group

tim: the TAG could do a focused 'nut' of the core element of httpRange-14

noah: the right thing to do would be to set off on the road of doing that in the TAG
... if this is worth the effort at all, set off down the road to engage the right community, have to watch IP issues
... that's the place where they or we would go on

ashok: should this be a separate mailing list?

noah: at some point we should put out an announcement, hey we're working on this

<noah> Noah: Jonathan, are you willing to actually play the leadership role in taking this down a REC track?

<noah> JAR: Yes, if the group is willing to provide reviews, or at least stay out of the way.

JAR is showing draft which might become a rec

larry: I would be more comfortable with a working group with a charter around metadata architecture, partly because I know people I would like to get to participate, who would not follow a www-tag discussion

tim: (re jar slide 15) WebArch covers this

jar: someone else holding Nadia responsible for someone else using Dirk's URI referentially
... slide 16, (why these questions are useless)
... slide 17: segue to persistence

<noah> ACTION-201?

<trackbot> ACTION-201 -- Jonathan Rees to report on status of AWWSW discussions -- due 2011-01-25 -- PENDINGREVIEW

<trackbot> http://www.w3.org/2001/tag/group/track/actions/201

<noah> ACTION-201?

<trackbot> ACTION-201 -- Jonathan Rees to report on status of AWWSW discussions -- due 2011-01-25 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/201

<noah> ACTION-201 Due 2011-03-07

<trackbot> ACTION-201 Report on status of AWWSW discussions due date now 2011-03-07

Persistence of references

<noah> ACTION-478?

<trackbot> ACTION-478 -- Jonathan Rees to prepare a first draft of a finding on persistence of references, to be based on decision tree from Oct. F2F Due: 2010-01-31 -- due 2011-01-31 -- PENDINGREVIEW

<trackbot> http://www.w3.org/2001/tag/group/track/actions/478

<noah> http://www.w3.org/2001/tag/2010/10/20-minutes#item01

jar: if you take the problem as a reference to a document, that reliably refers to some document, and you want it to work 100 years into the future....
... and you want a computational agent to be able to resolve it

ht: ... and the tree was an analysis of the failures?

jar: several functions: publisher producing the document; one who assigns identifier; one who archives the document for a long time; one who looks up a reference
... the 19th century view is that the description is written out in natural language (publisher, title, author, date), but "not machine friendly"
... if they're actionable, then someone can track these down

ht: the reliability of the citeseer parser for database is 70%
... datapoint... that's just correctly identifying what the parts are

<timbl> ... just parsing a reference

jar: Hybrid approach... is the hybrid approach good enough?

<Ashok> LM: don't like the term 'human-friendly' here

larry: (2) Hybrid is between (1) and all the rest

LM: "Not a URI" means a structured reference
... note there was early IETF work on "URC" which was attribute/value pairs for identifying

jar: if you write a URI, you have to have some faith that the scheme registrations are reliable

larry: date + URI (not embedded in a duri)

jar: (going through steps)
... "update all web clients" is a miracle

tim: you could install plugins in your client

lm: "not actionable" is "not actionable today"

tim: people will provide ways of resolving

ht: i own a couple of the domain names necessary for 'info' to be dereferenced

larry: note there were urn resolution protocols

jar: lsid was another example, it was never maintained

larry: xmp.iid and xmp.did in http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/DynamicMediaXMPPartnerGuide.pdf#page=19

jar: whether the http: scheme as specified is suitable for this purpose
... in the case where persistence matters, you can trust the domain owner

topic?

larry points to http://larry.masinter.net/9909-twist.pdf

<noah> Jonathan is discussing: http://www.w3.org/2001/tag/2011/02/intervention.html

jar: was on the phone two weeks ago with Dan Connolly on "ownership"

larry: Jefferson's Moose book has an interesting history about top level domain ownership
... see http://jeffersonsmoose.org/

noah: (discussion about security, DNS cache poisoning, etc.)

larry: you've identified several different roles, and each node in the tree needs to be evaluated around impact to those roles... may need to also add 'bad guys' and other players

<ht> Re the earlier aside about info:, when I explored this and its proposed (partial?) resolution mechanism, I discovered a) a dependence on certain sub-domains of the info TLD and b) the fact that several of these were either un-'owned' or in non-appropriate hands. Since then I have 'owned' lccn.info and oclcnum.info, having unsuccessfully tried to get Stu Weibel to take them on

<ht> My registration of them expires again in a few months. . .

jar: what matters is the person who writes a URI, and the person who wants to read the document, and everything else is infrastructure

larry: archivist is necessary and sufficient.... that is, if there are no archives, having long-term identifiers isn't very useful; if there is an archive, then whatever they are doing can be used for long-term identifiers

ht: this might turn into a requirement for infrastructure

jar: hypothesis: it would make a difference to get the DNS root manager to admit that some part of the DNS space had some kind of persistence characteristic, or was contractually held to one

tim: one way to abandon DNS is to set up an alternative root

jar: then you have to convince the entire world to use that alternate root. There is no communication between Alice and Bob to indicate that they use that alternative root, unless you use another URI scheme

tim: if it's just insurance, you could make a file, and distribute it by BitTorrent...

jar: what if ICANN agrees that '.arc' is agreed to be (something)
... what else do i need to add to this story for the next draft

ht: I need to take the old document to see if the risks it identifies and the goals are all covered here

jar: there are lots of ways of bailing out of this?

ht: information science communities have different attitudes to DOI

tim: what you want is security in the long term; having more than one solution in parallel is interesting

jar: i imagine some kind of metadata lo

http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/DynamicMediaXMPPartnerGuide.pdf#page=19

<ht> LM: Put a GUID in the document, and let search be the retrieval mechanism

<ht> JAR: Vulnerable to spoofing

<ht> HST: Use a checksum

<ht> LM: Right, use MD5 as the GUID

<ht> HST: What does the URI look like

lm: every administrative system ends

jar: the binomial system has had, in 250 years, only 10 disputes

(discussion of conflicts over defining documents for species)

noah: (banking systems -- there's a method of correcting anything that is wrong)

jar: my point is that there are systems that are relatively free of authority, that are outside of any system of authority

<ht> I note that the problem with using a checksum is that it violates a fundamental principle of archiving, which is to keep your content usable by rolling it forward

<ht> In the old days, that meant from paper to microfilm to microfiche

<ht> now it means electronic format evolution
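(Editorial illustration, not part of the discussion: a minimal Python sketch of the "use MD5 as the GUID" idea and of the roll-forward problem noted above. The sample byte strings and function names are assumptions; the sketch does not propose any URI syntax for such identifiers.)

```python
import hashlib

def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (hex MD5 digest)."""
    return hashlib.md5(data).hexdigest()

def matches(data: bytes, guid: str) -> bool:
    """Check a retrieved copy against the identifier, to detect spoofing."""
    return content_id(data) == guid

# Hypothetical document, and the "same" document after a format migration.
original = b"<p>Persistence of references</p>"
migrated = b"<P>Persistence of references</P>"

guid = content_id(original)
original_ok = matches(original, guid)   # untouched copy verifies
migrated_ok = matches(migrated, guid)   # rolled-forward copy no longer does
```

The checksum guards against a spoofed copy, but any change to the bytes, including a well-meant format migration by the archive, yields a different identifier.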

<noah> ACTION-478?

<trackbot> http://www.w3.org/2001/tag/group/track/actions/478

<jar> masinter said " I don't think a system can be simultaneously X, Y, and scalable"

lm: administrative, scalable, and stable
... the bigger it is, the more likely it is it will fail sooner

<noah> ACTION-478?

<trackbot> ACTION-478 -- Jonathan Rees to prepare a second draft of a finding on persistence of references, to be based on decision tree from Oct. F2F Due: 2010-01-31 -- due 2011-01-31 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/478

<jar> masinter, you have just restated zooko's triangle http://en.wikipedia.org/wiki/Zooko's_triangle

<lm> jar, no, zooko's triangle is 'secure, memorable, global' and that's a different set of things

<lm> jar, mine is: "requires administration" and "scalable" => "not reliable"

<jar> bitcoin might show a way to escape it (I'm told... need to research this)

<noah> ACTION-478 Due 2011-03-22

<trackbot> ACTION-478 Prepare a second draft of a finding on persistence of references, to be based on decision tree from Oct. F2F Due: 2010-01-31 due date now 2011-03-22

<noah> ACTION-477?

<trackbot> ACTION-477 -- Henry S. Thompson to organize meeting on persistence of domains -- due 2011-03-15 -- OPEN

<trackbot> http://www.w3.org/2001/tag/group/track/actions/477

<noah> HT: Leave it, still working on it.

TAG Meeting in June

Noah: RESOLUTION: The June F2F will be in Cambridge 6-8 June 2011

Noah: So, the meeting that was to have been 7-9 June 2011 is now scheduled 1 day earlier.

ADJOURNED

Wednesday 09 Feb 2011

Attendees

Contents

Agenda review

HTML-XML-Divergence-67: HTML / XML Unification

HTML Prefixes, Namespaces and Extensibility

XML HTML task force

Metadata Architecture

Persistence of references

TAG Meeting in June

Summary of Action Items