HTTPS and the Semantic Web/Linked Data


Author(s) and publish date

Published: 20 May 2016

In short, keep writing “http:” and trust that the infrastructure will quietly switch over to TLS (https) whenever both client and server can handle it. Meanwhile, let’s try to get SemWeb software to be doing TLS+UIR+HSTS and be as secure as modern browsers.
Sandro Hawke

As we hope you've noticed, W3C is increasing the security of its own Web site and is strongly encouraging everyone to do the same. I've included some details from our Systems Team below by way of explanation, but the key technologies to look into if you're interested are HTTP Strict Transport Security (HSTS) and Upgrade-Insecure-Requests (UIR).

Bottom line: we want everyone to use HTTPS and there are smarts in place on our servers and in many browsers to take care of the upgrade automatically.

So what of Semantic Web URIs, particularly namespaces like http://www.w3.org/1999/02/22-rdf-syntax-ns#?

Visit that URI in a modern, secure browser and you'll be redirected to https://www.w3.org/1999/02/22-rdf-syntax-ns#. Older browsers and, in this context more importantly, other user agents that do not recognize HSTS and/or UIR will not be redirected. So you can go on using http://www.w3.org namespaces without disruption.

This raises a number of questions.

Firstly, is the community agreed that if two URIs differ only in the scheme (http://, https:// and perhaps whatever comes in future) then they identify the same resource? We believe that this can only be asserted by the domain owner. In the specific case of http://www.w3.org/* we do make that assertion. Note that this does not necessarily apply to any current or future subdomains of w3.org.
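To make the first question concrete, here is a minimal Python sketch of the comparison involved. The function name `differ_only_in_scheme` and the `WEB_SCHEMES` set are hypothetical, introduced only for illustration. The check is purely syntactic: it says nothing about whether the domain owner actually asserts that the two URIs identify the same resource.

```python
from urllib.parse import urlsplit

# Schemes treated as interchangeable for this comparison (an assumption of this sketch).
WEB_SCHEMES = {"http", "https"}


def differ_only_in_scheme(uri_a: str, uri_b: str) -> bool:
    """Return True if the two URIs are identical apart from http vs https."""
    scheme_a = urlsplit(uri_a).scheme  # urlsplit lowercases the scheme
    scheme_b = urlsplit(uri_b).scheme
    if scheme_a not in WEB_SCHEMES or scheme_b not in WEB_SCHEMES:
        return False
    if scheme_a == scheme_b:
        return False
    # Compare everything after the scheme character for character, so that
    # a trailing "#" on a namespace URI is significant.
    return uri_a[len(scheme_a):] == uri_b[len(scheme_b):]
```

Software could use a check like this to spot candidate pairs, then consult a list of domains whose owners have made the equivalence assertion before treating the pair as one identifier.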

Secondly, some members of the Semantic Web community have already moved to HTTPS (it was a key motivator for w3id.org). How steep is the path from where we are today to a more secure Semantic Web, i.e. one that habitually uses HTTPS rather than HTTP? Have you upgraded your own software, or are you considering doing so?

Unless and until the Semantic Web operates on more secure connections, we will need to be careful to pass around http URIs - which is likely to mean remembering to knock off the s when pasting a URI from your browser.
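Knocking off the s by hand is easy to forget, so a tool could do it. The following is a rough sketch, not a recommendation: `to_http_identifier` and `SCHEME_EQUIVALENT_HOSTS` are hypothetical names, and the host list contains only www.w3.org because that is the only host for which this post makes the equivalence assertion.

```python
from urllib.parse import urlsplit

# Hosts whose owner has asserted that http and https URIs identify the same
# resource. Hypothetical list for illustration; subdomains are deliberately
# not included.
SCHEME_EQUIVALENT_HOSTS = {"www.w3.org"}


def to_http_identifier(uri: str) -> str:
    """Rewrite an https URI pasted from a browser back to its http form."""
    parts = urlsplit(uri)
    if parts.scheme == "https" and parts.hostname in SCHEME_EQUIVALENT_HOSTS:
        # Replace only the scheme and keep the rest of the string untouched,
        # so a trailing "#" on a namespace URI survives.
        return "http" + uri[len(parts.scheme):]
    return uri
```

For example, `to_http_identifier("https://www.w3.org/ns/dcat#Dataset")` gives back the http form, while a URI on any other host is returned unchanged.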

That's a royal pain but we've looked at various workarounds and they're all horrible. For example, we could deliberately redirect requests to things like our vocabulary namespaces away from the secure w3.org site to a deliberately less secure sub-domain - gah! No thanks.

Thirdly, a key feature of the HSTS/UIR landscape is that there is no need to go back and edit old resources - communication is carried out using HTTPS without further intervention. Can this be true for Semantic Web/Linked Data too, or should we be considering more drastic action? For example, editing definitions in Turtle files such as the one at http://www.w3.org/ns/dcat# to make it explicit that http://www.w3.org/ns/dcat#Dataset is owl:equivalentClass to https://www.w3.org/ns/dcat#Dataset (or even worse, having to go through and actually duplicate all the definitions with the different subject).
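To show what that drastic action would look like in practice, here is a small Python sketch that generates the extra Turtle statements for a list of class URIs. The function name `equivalence_triples` is hypothetical, and the sketch only covers classes; properties would presumably need owl:equivalentProperty instead. It illustrates the burden rather than proposing that anyone take it on.

```python
def equivalence_triples(http_class_uris):
    """Build Turtle stating that each http class URI is equivalent to its https twin."""
    lines = ["@prefix owl: <http://www.w3.org/2002/07/owl#> .", ""]
    for term in http_class_uris:
        if not term.startswith("http://"):
            raise ValueError(f"expected an http URI, got {term!r}")
        # Swap only the scheme; the rest of the URI is copied as-is.
        https_term = "https://" + term[len("http://"):]
        lines.append(f"<{term}> owl:equivalentClass <{https_term}> .")
    return "\n".join(lines)


print(equivalence_triples(["http://www.w3.org/ns/dcat#Dataset"]))
```

Every class in every vocabulary would need a line like this, which is why it would be good to be sure it is unnecessary.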

I really hope point 3 is unnecessary - but I'd like to be sure it is.

Background

Jose Kahan from W3C's Systems Team adds:

HSTS does the client-side upgrade from HTTP to HTTPS for a given domain. However, the HSTS header is only sent over an HTTPS connection. UIR defines a header that, if sent by the browser, tells the server that it prefers HTTPS; the server then redirects to HTTPS, and HSTS (through the header in the response) kicks in. HSTS doesn't handle the case of mixed content. That is the other part that UIR does to complement HSTS: it tells the browser to upgrade the URLs of all content associated with a resource to HTTPS before requesting it.
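The server side of that exchange can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of how W3C's servers are configured: the function `respond`, the `HSTS_MAX_AGE` value and the exact-case header lookup are all choices made for this illustration. The header names and the `Upgrade-Insecure-Requests: 1` request value come from the HSTS and UIR specifications.

```python
# One year in seconds; the actual value a site chooses is a policy decision.
HSTS_MAX_AGE = 31536000


def respond(scheme, host, path, request_headers):
    """Return (status, response_headers) for a request, modelling HSTS and UIR."""
    # Assumes header names arrive with this exact capitalisation.
    wants_upgrade = request_headers.get("Upgrade-Insecure-Requests") == "1"

    if scheme == "http":
        if wants_upgrade:
            # The client signalled a preference for HTTPS, so redirect it.
            return 307, {
                "Location": f"https://{host}{path}",
                "Vary": "Upgrade-Insecure-Requests",
            }
        # Clients that do not send the header keep getting plain HTTP.
        return 200, {}

    # Over HTTPS: send the HSTS header, and ask the client to upgrade the
    # URLs of subresources (the mixed-content part of UIR).
    return 200, {
        "Strict-Transport-Security": f"max-age={HSTS_MAX_AGE}",
        "Content-Security-Policy": "upgrade-insecure-requests",
    }
```

A user agent that sends no `Upgrade-Insecure-Requests` header falls through to the plain HTTP branch, which is why existing http namespace URIs keep working for software that knows nothing about either mechanism.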

For browser UAs, if HSTS is enabled for a domain and you browse a document by typing its URL in the navigation bar or by following a link to a new document, the request will be sent as HTTPS, regardless of the URL saying HTTP. If the document includes a CSS file, JavaScript, or an image, for example, and that URL is HTTP, the request for those resources will only be sent as HTTPS if the UA supports UIR.
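For non-browser user agents such as Semantic Web software, the client half of HSTS amounts to remembering which hosts have sent the header and upgrading later requests to them. Here is a minimal Python sketch of that bookkeeping. The class `HstsStore` and its method names are hypothetical, and the sketch leaves out parts of the HSTS specification, such as the includeSubDomains directive and port handling.

```python
import time
from urllib.parse import urlsplit


class HstsStore:
    """Remember HSTS hosts and upgrade http URLs that point at them."""

    def __init__(self):
        self._expiry = {}  # host -> timestamp after which the policy lapses

    def note_response(self, url, headers, now=None):
        """Record the HSTS policy from a response, if it arrived over HTTPS."""
        now = time.time() if now is None else now
        parts = urlsplit(url)
        if parts.scheme != "https":
            return  # the header is only honoured on secure connections
        value = headers.get("Strict-Transport-Security")
        if not value:
            return
        for directive in value.split(";"):
            name, _, arg = directive.strip().partition("=")
            if name.lower() == "max-age":
                max_age = int(arg.strip('"'))
                if max_age == 0:
                    self._expiry.pop(parts.hostname, None)  # policy withdrawn
                else:
                    self._expiry[parts.hostname] = now + max_age

    def upgrade(self, url, now=None):
        """Return the URL to actually request; the identifier itself is unchanged."""
        now = time.time() if now is None else now
        parts = urlsplit(url)
        if parts.scheme == "http" and self._expiry.get(parts.hostname, 0) > now:
            return "https" + url[len("http"):]
        return url
```

The point of the design is that only the request URL is upgraded. The http URI stored in the data stays exactly as written.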

Comments (3)

David Booth - 20 May 2016 at 18:40:26 UTC

> is the community agreed that if two URIs differ only in the scheme (http://, https:// and perhaps whatever comes in future) then they identify the same resource?

Yes, though it is a "SHOULD" rather than a "MUST". If a URI owner treats them as identifying different resources, then that URI owner is being unhelpful to the community.

Sandro Hawke - 20 May 2016 at 18:58:53 UTC

In short, keep writing "http:" and trust that the infrastructure will quietly switch over to TLS (https) whenever both client and server can handle it. Meanwhile, let's try to get SemWeb software to be doing TLS+UIR+HSTS and be as secure as modern browsers.

Makx Dekkers - 23 May 2016 at 10:30:00 UTC

I am with Sandro here. An identifier should not have to worry about secure or insecure resolution. The infrastructure should take care of that in the background, which I understand HSTS and UIR are designed to do.
As far as I see it, declaring http equivalent to https (and to what might come in the future) is not in line with https://tools.ietf.org/html/rfc3986#section-6.

Comments for this post are closed.