We recently announced that we are planning to start redirecting all of www.w3.org to https, as is commonly done elsewhere.
Here are some notes on what we have learned so far, and answers to some questions we have received.
We have received a lot of automated requests for machine-readable resources on our site for many years; see for example this blog post from 2008. Due to the huge amount of traffic (hundreds of millions of requests per day) and the generic user-agent headers that are commonly in use (for example Java/xx), it’s hard to identify the source of most of this traffic. The generic user-agent strings also make it difficult to do targeted outreach to the developers of the software making these requests.
Therefore we decided to do some limited tests of redirecting our entire site to https so any issues could be discovered and understood. We weren’t sure if this would only impact a handful of people who could easily adapt with some simple configuration changes, or if it had the possibility of being more disruptive.
During our initial tests we heard from a few people that this was causing issues with their systems that make automated requests to our site, for example when doing XML Schema validation. We are hoping these systems can be reworked to either follow the redirects to https, or use an XML catalog to keep local copies of any files needed to avoid making unnecessary requests to our site.
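For software written in Python, a fetch that follows the redirect could be sketched roughly as below. It relies on urllib.request from the standard library, which follows common redirect status codes by default and can follow a redirect from http to https when Python is built with SSL support. The function name fetch_following_redirects and the default timeout are illustrative assumptions, not part of any W3C tooling.

```python
import urllib.request


def fetch_following_redirects(url, timeout=10):
    """Fetch url and return (final_url, body).

    Hypothetical helper for illustration: urlopen() follows redirects
    on its own, so final_url may differ from the url that was requested.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # geturl() reports the URL that was actually retrieved,
        # after any redirects were followed.
        return response.geturl(), response.read()
```

Comparing the returned final URL with the requested one is a simple way to see whether a redirect took place and whether the client coped with it.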
Questions we have received include: what action are we expecting from Web developers? Is it necessary to update all references starting with http://www.w3.org/ to use https instead?
In general that is not necessary, and in fact in many cases references to our site starting with http://www.w3.org/ must be preserved exactly as is, for example in a reference to an XML namespace, which must match a given string exactly.
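To illustrate why namespace references must stay unchanged, here is a small sketch using Python's standard xml.etree.ElementTree module. A namespace name is compared as a plain string and is never fetched, so changing its scheme to https makes it a different namespace. The helper name count_rects and the sample document are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample document using the SVG namespace, spelled with http as defined.
SAMPLE = '<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>'


def count_rects(xml_text, namespace):
    """Count direct child 'rect' elements that are in the given namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses qualified names as "{namespace}localname".
    return len(root.findall(f"{{{namespace}}}rect"))


# The namespace is matched as an exact string; nothing is retrieved.
matches_http = count_rects(SAMPLE, "http://www.w3.org/2000/svg")
matches_https = count_rects(SAMPLE, "https://www.w3.org/2000/svg")
```

Here the lookup with the original http string finds the element, while the lookup with the https variant finds nothing, because the two strings name different namespaces.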
If you maintain a software system that retrieves resources from www.w3.org, please check whether it has the ability to handle redirects and https and update the software if needed. Also consider carefully whether you want to keep this dependency on our site or if it would be worthwhile to rework your systems to remove it, for example using an XML catalog. We do our best to keep our systems available and performant but we have occasional service outages like any other site. We expect most people would not want their production systems to be impacted by issues with our site.
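One way to remove the dependency is to resolve well-known URLs to local copies before any network request is made. The sketch below shows the idea in Python; it is a simplified stand-in for what an XML catalog does, not an implementation of the XML Catalogs format. The name load_resource and the local paths under schemas/ are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from remote URLs to local copies kept alongside
# the application; the local paths are assumptions for illustration.
LOCAL_COPIES = {
    "http://www.w3.org/2001/xml.xsd": Path("schemas/xml.xsd"),
    "http://www.w3.org/2001/XMLSchema.xsd": Path("schemas/XMLSchema.xsd"),
}


def load_resource(url, catalog=LOCAL_COPIES):
    """Return the bytes of the local copy registered for url.

    Raises LookupError instead of silently falling back to the network,
    so that a missing local copy is noticed during development.
    """
    local_path = catalog.get(url)
    if local_path is None:
        raise LookupError(f"no local copy registered for {url}")
    return Path(local_path).read_bytes()
```

Note that the keys keep the original http:// form, since they act as identifiers to look up rather than addresses to fetch.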
We plan to continue limited tests of this change to our site over the coming weeks and months to gather more feedback and understand its impact before deploying it more permanently. Depending on the results of these tests, we may decide to defer this change until more software can be updated, or deploy it with specific exceptions, for example continuing to serve .xsd files via HTTP while redirecting the rest of the site.
To stay informed of future tests and other updates to our systems, please check our systems status page.