Abstract

The web has long had formats and mechanisms whereby content which canonically exists at one location is also available in a different form in a different location. Some of the oldest examples include RSS and other machine-readable syndication formats, and the newest include content platforms such as Blendle, Facebook's Instant Articles and Google's AMP Top Stories carousel.

This raises important issues concerning the primacy of URLs and origins on the web, and the ability for users to make judgments about the trustworthiness and provenance of information they encounter while using it.

Distributed content has compelling use cases and is well supported by fundamental web technologies such as hyperlinks and iframes, but some newer approaches can present security and privacy challenges.

Status of This Document

This document has been produced by the W3C Technical Architecture Group (TAG).

The TAG approved this finding at its July 2017 face-to-face meeting in London. Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

1. Definitions

While the word 'syndicated' has long been used to describe the republishing of content by a third party, the emergence and growth of a new generation of mostly proprietary platforms has prompted the use of a new term within the media industry. We note that the term 'distributed content' has gained acceptance in that community (cf. the annual Reuters Institute Digital News Report), and that while the general mechanism can be applied to almost any kind of content, the most popular use case to date has been news.

In recognition of this, we use the term 'distributed content' in this finding, but do not intend to restrict the scope of the finding to news.

3. Potential concerns

Distributed content presents the greatest challenge to web architecture when it is high fidelity, has unclear attribution, is complete, lacks a reference to the canonical source, is distributed at scale, and is discovered serendipitously rather than via a conscious-choice opt-in mechanism. These challenges present as:

The web platform has, over time, developed defenses against malicious content which lean on the ability for URLs and Origins to be a primary indicator of identity and trust, such as:

Many of these defenses, which are designed to protect users, can be undermined by mechanisms for serving distributed content that conflate the content of many origins within a single origin or remove content from its source.
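The effect of conflating many origins within one can be illustrated with a minimal sketch. It assumes the usual definition of an origin as a (scheme, host, port) triple and handles only http and https URLs; the helper names and the `.example` hostnames are hypothetical, and a browser's own origin computation is more involved than this.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this sketch handles.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple used here to stand in for an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError("this sketch only handles http and https URLs")
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port or DEFAULT_PORTS[scheme]
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True when both URLs share scheme, host and port."""
    return origin(url_a) == origin(url_b)


# Two hypothetical publishers at their canonical locations.
canonical_a = "https://news-a.example/story/1"
canonical_b = "https://news-b.example/story/9"

# The same two stories re-hosted by a hypothetical distribution platform.
rehosted_a = "https://platform.example/c/news-a.example/story/1"
rehosted_b = "https://platform.example/c/news-b.example/story/9"

publishers_isolated = not same_origin(canonical_a, canonical_b)  # distinct origins
publishers_conflated = same_origin(rehosted_a, rehosted_b)       # one shared origin
```

At their canonical locations the two publishers are separate origins, so origin-keyed defenses treat them independently. Once re-hosted, both stories share the platform's origin, and any defense keyed on origin can no longer tell them apart.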

4. Recommendations

Sites which facilitate the consumption of distributed content should make efforts to avoid the concerns outlined above. The TAG believes in and hopes to strengthen the origin model, and has encouraged the Web Application Security Working Group (WebAppSec WG) in its work on Secure Contexts.

The TAG finds that it is essential to emphasize the value of the browser-level origin authenticity built into the Web platform as opposed to mechanisms that an untrusted content platform may choose to provide. Browsers are literally the "user's agent", and the model of the browser as a trusted gateway protecting the user from untrusted content is fundamental to balancing the needs of the user with the motivations of website owners, and therefore fundamental to the architecture of the Web.

The anchor element is designed to allow one website to refer visitors to content on another website, whilst retaining all the features of the web platform. We encourage distribution platforms to use this mechanism where appropriate. We encourage the loading of pages from their original source origins, rather than from re-hosted, non-canonical locations. We further discourage rewriting links within content with the purpose of keeping a user within a distribution platform.
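As a rough illustration of the link rewriting discouraged above, the following sketch compares the anchors in a canonical page with those in its distributed copy and reports the ones that were redirected to the platform's own host. It is not a tool described by this finding: the function names, the hostnames, and the assumption that links appear in the same order in both copies are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> element, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return the list of anchor targets found in an HTML fragment."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


def rewritten_links(canonical_html, distributed_html, platform_host):
    """Return (original, served) pairs where the distributed copy points at
    platform_host although the canonical copy did not.

    Assumption: both copies contain the same links in the same order, so
    they can be paired by position. platform_host should be lowercase.
    """
    pairs = zip(extract_hrefs(canonical_html), extract_hrefs(distributed_html))
    return [
        (original, served)
        for original, served in pairs
        if urlsplit(original).hostname != platform_host
        and urlsplit(served).hostname == platform_host
    ]


# Hypothetical canonical fragment and its distributed copy.
canonical = (
    '<p><a href="https://source.example/related">Related</a> '
    '<a href="https://other.example/x">Elsewhere</a></p>'
)
distributed = (
    '<p><a href="https://platform.example/r?u=https%3A%2F%2Fsource.example%2Frelated">Related</a> '
    '<a href="https://other.example/x">Elsewhere</a></p>'
)

flagged = rewritten_links(canonical, distributed, "platform.example")
```

Here the first link is flagged because the distributed copy routes it through the platform, while the second link is left alone because it still points at the target the original author chose.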

Other potential solutions may include:

Discussion of new feature development is beyond the scope of this TAG finding, but we stress that we are open to the architecture of the web evolving and changing to accommodate user needs in a way that is compatible with its core purpose and principles.