W3C Technical Architecture Group Status Report (January - April, 2011)

This is a report from the W3C Technical Architecture Group to the W3C membership on TAG activities from January through April, 2011.

Administration and participation

Meetings

During the period covered by this report, the TAG held one face-to-face (F2F) meeting: 8-10 February 2011, MIT, Cambridge, MA.

The next F2F meeting of the TAG will be held 6-8 June 2011 at the offices of the W3C in Cambridge, MA. The TAG expects to meet again 13-15 September 2011 in either London or Edinburgh, UK, and will also participate in the W3C Technical Plenary planned for Santa Clara, 31 October through 4 November 2011.

The TAG holds weekly teleconferences, typically on Thursdays.

Membership changes

We are delighted that independent consultant Jeni Tennison has accepted an appointed position on the TAG. Jeni is widely known for her expertise in XML, for her important contributions to the United Kingdom's linked open government data initiative, and for her deep knowledge of computer science and Web architecture. Jeni's appointment was announced in January of 2011, and she will serve through January of 2013.

As noted in our previous report, Peter Linss was recently elected to the TAG, and his two-year term began on 1 February 2011. On that same date, Ashok Malhotra moved from an appointed to an elected position, and the terms of John Kemp and T.V. Raman came to an end. We want to thank both John and Raman for their many important contributions to the TAG.

HTML

The TAG continues to work with the HTML Working Group and other concerned members of the Web community to help refine technical details of the proposed HTML5 Recommendation, and to ensure that HTML aligns well with the principles of Web Architecture and with other specifications.

HTML / XML Unification

In our previous report, we announced that the TAG had opened a new issue (ISSUE-67: HTML and XML Divergence), and that we were creating an informal group to explore improved architectural synergy between XML and HTML5. Several experts from the HTML and XML communities are participating, and we are particularly grateful for the contributions of former TAG member Norm Walsh, who is volunteering the considerable time and effort required to chair the group.

The group held regular teleconferences during the winter months and produced a Wiki with proposed use cases for HTML/XML integration. Based on the group's analysis, Norm has prepared a draft report. Norm was kind enough to join the TAG in person at our February meeting, and we expect to review the group's report in detail at our June meeting.

HTML Language Reference

As previously reported, the TAG and HTML WG chairs discovered at the technical plenary in November 2010 that an administrative error had caused confusion as to whether the document http://dev.w3.org/html5/spec-author-view/ would be carried forward on the Recommendation track. This was of concern to the TAG, because that document is central to the agreed resolution of an earlier TAG request for publication of a language reference for HTML5. We are pleased to note that an HTML WG chair has now confirmed that the authoring view of the HTML5 Recommendation was published in the most recent set of HTML5 working drafts, and that the working group is indeed committed to carrying it forward as a normative document on the Recommendation track. Accordingly, the TAG has closed its own ACTION-379, and we have thanked the HTML WG for its careful attention to our concerns.

Core Mechanisms of the Web

During this period, the TAG worked on the following areas relating to core mechanisms of the Web. Much of this work was done in cooperation with the IETF:

MIME and the Web

We previously reported on work that the TAG is doing to deal with inconsistencies between the use of MIME for the Web and its original application to email, etc. In late 2010, Larry Masinter prepared the Internet Draft "MIME and the Web" (draft-masinter-mime-web-info-02), which explores some of the issues and suggests development of a new Best Current Practice for registration of Internet Media Types and Charsets. We continue to work on refining this document, but we are now focusing especially on issues relating to the use of registries (see next section).

Registries

There are many points in Web architecture where working groups, standards organizations, or other users of the Web have the opportunity to introduce new keywords, vocabularies, data formats, URI schemes, etc. Typically, machine readable identifiers are required for each one, some means is usually required to ensure that such identifiers are not unintentionally assigned for multiple uses, and usually it's desirable to have some means of finding pertinent specifications or software based on the identifier.
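The three requirements just listed (a machine-readable identifier, protection against conflicting assignments, and a way to find the pertinent specification) can be illustrated with a toy in-memory registry. This is only a sketch under stated assumptions: the class name, its methods, and the case-insensitive matching rule are hypothetical and do not describe how any real registry operates.

```python
class IdentifierRegistry:
    """Toy registry: each identifier maps to exactly one specification."""

    def __init__(self):
        self._entries = {}

    def register(self, identifier, specification):
        # Assumption: identifiers are compared case-insensitively.
        key = identifier.lower()
        existing = self._entries.get(key)
        if existing is not None and existing != specification:
            # Refuse to assign one identifier to two different uses.
            raise ValueError(
                f"{identifier!r} is already assigned to {existing!r}"
            )
        self._entries[key] = specification

    def lookup(self, identifier):
        # Return the registered specification, or None if unknown.
        return self._entries.get(identifier.lower())


registry = IdentifierRegistry()
registry.register("text/html", "RFC 2854")
print(registry.lookup("text/html"))
```

A second attempt to register "text/html" against a different specification raises an error, which is the collision-avoidance property the paragraph describes.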

The TAG has recently increased its focus on questions relating to the use of centralized registries to manage the allocation and documentation of such identifiers. There are at least several reasons why this appears to be an important issue at this time, including:

TAG members Larry Masinter and Henry Thompson represented the TAG at the recent IETF meeting in Prague, and Larry reports that significant progress was made in organizing work relating to registries. We expect that the TAG will remain closely involved in this work.

Fragment identifiers in RDFa

The TAG has been working during the past few months to resolve concerns relating to the use of fragment identifiers in RDFa and in other Semantic Web applications. RFC 3986 states: "The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation". In the case of RDFa in text/html, the pertinent media type registration is RFC 2854, which specifies: "For documents labeled as text/html, the fragment identifier designates the correspondingly named element". For XML documents, and thus for application/xhtml+xml, RFC 3023 suggests XPointer as a likely fragment syntax. The usage in RDFa appears to be inconsistent with both of these specifications.

The TAG is therefore actively exploring a constellation of issues relating to the interpretation of fragment identifiers, the specification of such interpretations in media type registrations (which relates to the registry question described above), content negotiation in HTTP, and common practice in the Semantic Web community.
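As a rough illustration of the dependency described above, the sketch below splits a URI into its base and fragment and then reports which rule governs the fragment for a given media type. The rule table merely paraphrases the two registrations quoted in this section; the function name and the example URI are hypothetical.

```python
from urllib.parse import urldefrag

# Paraphrase of the registrations quoted in the text (not normative wording).
FRAGMENT_RULES = {
    "text/html": "designates the correspondingly named element (RFC 2854)",
    "application/xhtml+xml": "XPointer is the suggested syntax (RFC 3023)",
}


def describe_fragment(uri, media_type):
    """Return (base URI, fragment, rule that governs the fragment)."""
    base, fragment = urldefrag(uri)
    if not fragment:
        return base, None, "no fragment present"
    rule = FRAGMENT_RULES.get(
        media_type, "defined by that media type's registration"
    )
    return base, fragment, rule


print(describe_fragment("http://example.org/people#alice", "text/html"))
```

The same fragment can therefore mean different things depending on which representation is retrieved, which is where content negotiation enters the picture.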

Domain name persistence

The TAG continues to explore steps that the W3C might take to facilitate the use of URIs, and in particular http-scheme URIs, as stable identifiers that can be relied upon to remain associated with the intended resource over a very long period of time. Such stable identifiers are required, for example, for references to scholarly publications, for use in libraries, etc. For a variety of reasons, URIs are perceived not to have the required characteristics today. Well-known issues relate to the way in which DNS names are assigned and reassigned, the lack of a means of assigning such a DNS name for more than a few years at a time, etc. Jonathan Rees has been leading the TAG's efforts to understand the requirements and concerns, and to propose steps that might make URIs more appropriate for persistent references.

IETF Liaison

Henry Thompson presented an introduction to the TAG and its work to a plenary panel at the March IETF meeting in Prague. In addition to introducing the TAG itself, Henry described several of the technical issues that the TAG is working on that may be of interest to the IETF. One example relates to the scalability of access to resources: how does the Web implement flow control or otherwise gracefully degrade when very large numbers of requests are directed to one particular resource? (We've seen this, e.g., when various tools repeatedly re-request copies of the HTML DTD from the W3C Web site.)

Plans are being made to increase direct contact between the TAG and the Internet Architecture Board (the IAB is the closest counterpart to the TAG at the IETF). We expect to have a joint IAB/TAG teleconference in conjunction with the TAG's June F2F.

At the IETF meeting in Prague, Jim Gettys of Alcatel-Lucent Bell Labs made a significant presentation about so-called Buffer Bloat. In short, the provision of excess buffering in network switches, access points, Web software, etc. is contrary to specifications, and undermines the ability of TCP to adapt appropriately to congested links. Jim suggests that Web user agents are aggravating the problem by opening too many HTTP streams in parallel. As buffers fill, network latencies rise, sometimes to significant fractions of a second or more; DNS access is delayed or times out, and other high-priority traffic is delayed.
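To give a feel for the latencies involved, the worst-case delay added by a full buffer is simply the buffer size divided by the rate at which the link drains it. The sketch below works through one case; the buffer size and link rate are illustrative assumptions, not figures from Jim's presentation.

```python
def queueing_delay_seconds(buffer_bytes, link_bits_per_second):
    """Time for a packet to cross a completely full FIFO buffer."""
    return buffer_bytes * 8 / link_bits_per_second


# Assumed example: a 256 KiB buffer in front of a 1 Mbit/s uplink.
delay = queueing_delay_seconds(256 * 1024, 1_000_000)
print(f"{delay:.3f} s")
```

Under these assumptions every packet, including a small DNS query, waits roughly 2.1 seconds behind the queued data, which is consistent with the symptoms described above.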

With the possible exception of the user agent issue mentioned above, it's not yet clear whether the TAG needs to be involved in the resolution of these problems, but there is a concern that buffer bloat is a significant threat to the robustness of the Web. Accordingly, we will be having Jim join us for a discussion at our June F2F.

Web Application Architecture

The Web was initially a system for sharing documents, typically in HTML. Languages like JavaScript were then introduced, in part to provide somewhat more dynamic rendering or aids to navigation. More recently, a new class of Web applications has emerged: the browser is now used as a container for applications that may execute for extended periods, that integrate information from diverse sources, and that provide users with the ability to navigate among states while remaining in the same application. Some of these applications also store information for offline use. HTML is used not as a representation of an individual document, but as a framework for hosting complex program logic, which is typically coded in JavaScript.

The TAG continues to focus on the many architectural issues raised by these new Web applications. The following sections describe several of the application-related areas in which the TAG has been active:

Client-side state

TAG member Ashok Malhotra is revising a draft TAG Finding on Client-side State in Web applications, adapting earlier work that was done by outgoing TAG member T.V. Raman. The traditional "Web of Documents" tends to involve little if any manipulation of state at the client, except for very localized control of things like scrolling and history lists. With AJAX-style Web applications, the use of URIs and other state-related mechanisms like cookies becomes more complicated. The relationship between the URI that the user initially accesses, the data that's actually displayed, and the URIs that are shown in address bars and history lists, is not as clear as with traditional uses of the Web. The draft finding explores these issues, with the intention of eventually suggesting best practices for the construction of such applications.

As part of its work on client-side state, the TAG has explored in depth the recent proliferation of so-called #! (pronounced "hash bang") URIs, which have been introduced by widely used services such as Twitter. A full explanation of #! and the concerns surrounding it is beyond the scope of this report, but an excellent introduction is provided in a blog posting by TAG member Jeni Tennison. Indeed, we expect that some of the material from Jeni's posting will be adapted for use in the client-side state finding.
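One reason #! URIs raise architectural questions is that the fragment part of a URI is not sent to the server in an HTTP request; it stays with the client, where script interprets it. The following sketch separates the part of a URI that the server sees from the part held as client-side state. The function name and the example URI are hypothetical.

```python
from urllib.parse import urlsplit


def split_hash_bang(uri):
    """Return (host, request target seen by the server, client-side state)."""
    parts = urlsplit(uri)
    request_target = parts.path or "/"
    if parts.query:
        request_target += "?" + parts.query
    # Only a fragment that starts with "!" is treated as hash-bang state.
    if parts.fragment.startswith("!"):
        client_state = parts.fragment[1:]
    else:
        client_state = None
    return parts.netloc, request_target, client_state


print(split_hash_bang("http://example.org/#!/user/status/42"))
```

For the example URI the server is asked only for "/", so the content that the user actually sees is determined by script running in the browser rather than by the resource that was requested.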

Client-side storage

In addition to the relatively short-lived "state" discussed above, Web applications also make use of facilities such as SQL stores or HTML5 Web Storage to preserve data between browser sessions. The TAG is exploring best practices for the use of such storage and for using URIs, when appropriate, to identify the information stored. We are also considering the pros and cons of maintaining local/remote transparency. Consider an e-mail application: should the e-mail be accessed using the same URI, regardless of whether the access is to a server or to a copy of the e-mail in local Web storage? Relatively little progress was made on this during the period covered by this report, but we expect to refocus on it once the client-side state finding is closer to final publication.
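The local/remote transparency option can be pictured as a lookup in which the caller always uses the same URI and does not need to know where the data came from. The sketch below is one possible shape for such a lookup, not a TAG recommendation; the function, the dictionary standing in for local Web storage, and the example URI are all hypothetical.

```python
def get_resource(uri, local_store, fetch_remote):
    """Return (body, source) for `uri`, preferring the local copy."""
    if uri in local_store:
        return local_store[uri], "local"
    body = fetch_remote(uri)
    # Keep a copy under the same URI so later access works offline.
    local_store[uri] = body
    return body, "remote"


def fake_fetch(uri):
    # Stand-in for a network request in this sketch.
    return f"message body for {uri}"


store = {}
print(get_resource("http://mail.example.org/inbox/17", store, fake_fetch))
print(get_resource("http://mail.example.org/inbox/17", store, fake_fetch))
```

The first call reports "remote" and the second "local", while the identifier used by the application never changes. The open questions concern the costs of this design, such as stale copies, which the sketch does not address.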

API Design for Web Applications

As discussed below in the section on privacy, we are exploring the benefits of API "minimization" as an approach to reducing unnecessary exposure of user data.

Metadata access and representations

The TAG continues to explore the use of the Web for storing and retrieving metadata, and for associating that metadata with the data to which it applies. During the period covered by this report, the TAG has focused especially on the use of HTTP to access Web documents, metadata about those documents, and other Semantic Web information. There are at least two significant aspects to this work.

The TAG decided some time ago that having a more formal and rigorous exposition of the semantics of HTTP interactions might be helpful as a framework for resolving concerns such as the one described below. For several years, an informal group of Semantic Web experts has been working under the general auspices of the TAG, and significant progress has been made in recent months. Jonathan Rees is the main TAG representative to this effort.

Those who have followed the TAG for long enough will remember the long and difficult discussions that the TAG held between 2002 and 2005, examining the use of HTTP status codes in the case where an HTTP GET is to return not a representation of the identified resource, but information (e.g. RDF) about that resource. In 2005, the TAG adopted a resolution recommending the use of status code 303 redirections in this case. The TAG has within the past few months been contacted by members of the user community who express dissatisfaction with this resolution, based in part on concerns about performance. There are also concerns about the rate at which the 303 approach has been deployed in practice.
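The 303 pattern can be sketched as a small request handler. The URI layout used here ("/id/..." for the thing being described, "/doc/..." for the document that describes it) is a hypothetical convention chosen for the example, as are the function name and the media type returned.

```python
def respond(path, documents):
    """Return (status, headers) for an HTTP GET on `path`.

    `documents` is the set of paths that have a representation to serve.
    """
    if path.startswith("/id/"):
        # No representation of the thing itself is returned; redirect
        # to a document that carries information about it.
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path in documents:
        return 200, {"Content-Type": "application/rdf+xml"}
    return 404, {}


docs = {"/doc/alice"}
print(respond("/id/alice", docs))
print(respond("/doc/alice", docs))
```

A client that starts from "/id/alice" must make two requests before it receives any data, which illustrates the kind of performance concern mentioned above.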

The TAG continues to discuss all of these concerns, with the intent of producing a finding that will clarify the semantics of the pertinent specifications, and recommend good practice for deployment of and access to metadata resources.

Policy-related work

The TAG's remit is to do technical and architectural work, but often that work is motivated by policy issues that have been raised by others. This section discusses two such areas in which the TAG has been active.

Privacy

The TAG continues to focus on architectural challenges relating to maintaining appropriate levels of privacy for users of the Web.

TAG member Dan Appelquist is drafting a TAG finding that explores the design of Web APIs (i.e., typically JavaScript APIs) that minimize the transfer of information beyond that specifically required for successful execution of an application. For example, an application that requires the postal code for some person should be able to get it without also being presented with her age, her userid, etc. We call this approach to API design "API minimization", and we are working in cooperation with the W3C Web Applications working group to produce useful, practical guidance regarding the design of such APIs.

TAG member Ashok Malhotra represented the TAG at the just-concluded W3C Workshop on Web Tracking and User Privacy.
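The postal code example can be sketched as follows. The APIs under discussion are typically JavaScript APIs; Python is used here only to keep the illustration short, and the function name, field names, and profile data are all hypothetical.

```python
def minimal_view(profile, requested_fields):
    """Return only the fields the application asked for."""
    return {
        name: profile[name]
        for name in requested_fields
        if name in profile
    }


profile = {"postal_code": "02139", "age": 34, "userid": "u-1234"}

# The application needs the postal code, so that is all it receives.
print(minimal_view(profile, ["postal_code"]))
```

The contrast is with an API that hands back the whole profile object and leaves it to the application to ignore what it does not need.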

The TAG is working closely with W3C Technology and Society Domain leader Thomas Roessler so that our efforts will be well coordinated with the W3C's growing investments in privacy-related work.

Copyright and deep linking

In September of 2003 the TAG published a finding "Deep Linking" in the World Wide Web. Deep linking is described there as "the practice of publishing a hyperlink from a page on one site to a page "inside" another site, bypassing the home or portal page." That finding concluded that: "any attempt to forbid the practice of deep linking is based on a misunderstanding of the technology, and threatens to undermine the functioning of the Web as a whole."

Nonetheless, questions continue to arise as to whether certain sorts of linking should be prohibited or discouraged, and in particular whether the act of publishing a link to copyrighted material might in any circumstance cause a violation of the copyright.

The TAG recognizes that it has neither the legal expertise to address such questions, nor the responsibility by charter to do so, but we do feel it's important that we explain to the community as clearly as possible what the specifications say regarding the implications of operations performed on the Web, that we help people to understand how the pertinent mechanisms of the Web work in practice, and that we facilitate discussion that is clear and technically accurate.

As a particular example, it's useful to clarify the distinction between simple linking and transclusion. Typically, a simple link must be clicked in order for referenced content to appear, and there is at least an opportunity to warn users of copyright issues before the link is traversed; when mechanisms such as HTML <img src="xxx"> are used, the referenced resource (xxx) is displayed without any intervention by the human reader, and indeed that person may have no easy way of noticing that content from multiple sources has been combined. Web caches, proxies, crawlers, and prefetch agents are other Web capabilities that must be understood and carefully considered by anyone who is creating laws or rules that relate to making copies of information using the Web.
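The distinction can be made concrete with a short script that scans a page and separates references a reader must click (a href) from resources that are fetched and displayed automatically (img src). This is a minimal sketch that looks at only those two elements; real pages transclude content through other mechanisms as well, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Collect simple links and transcluded images from HTML text."""

    def __init__(self):
        super().__init__()
        self.links = []          # followed only when the reader clicks
        self.transclusions = []  # fetched and shown without intervention

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.transclusions.append(attributes["src"])


def classify(html_text):
    parser = LinkClassifier()
    parser.feed(html_text)
    parser.close()
    return parser.links, parser.transclusions


page = (
    '<p><a href="http://example.org/article">Read more</a>'
    '<img src="http://example.com/photo.png"></p>'
)
print(classify(page))
```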

As an initial step, the TAG is working on a document that attempts to explain concepts such as Web caching and transclusion. Once that is in place, we will consider whether to go further in making recommendations to those who set policy or write copyright legislation.

About the Technical Architecture Group

The Technical Architecture Group (TAG) was created in February 2001. Three TAG participants are appointed by the Director and five TAG participants are elected by the Advisory Committee. The mission of the TAG is stewardship of the Web architecture. Included in this mission is building consensus around principles of Web architecture, resolving issues involving Web architecture, and helping to coordinate cross-technology architecture developments inside and outside W3C.

Details on TAG activities can be found on the TAG home page. The TAG meets weekly via teleconference and several times each year in person. Summaries (such as this one) of the TAG's activity are provided periodically to the W3C Advisory Committee, W3C working group chairs, and the public TAG mailing list (www-tag archive). The TAG welcomes public discussion of open issues, as well as proposals for new issues, on that same list. The TAG's previous status report was published in January, 2011, covering the period through December, 2010.

¹ Two-year term begins 1 February 2010 (see election results).
² Two-year term begins 1 February 2011 (see election results).
³ Re-appointed on 7 April 2010 to serve remainder of two-year term (due to change in affiliation).
⁴ Two-year term begins 1 February 2011 (see appointment announcement).
Yves Lafon (W3C) is staff contact for the TAG.
