Four simple tips for web page providers

Sure, the web is full of "silos", but let's not over-dramatize. We can fix this. The web is perfectly capable of supporting social use cases. On the social web there is no clear distinction between information publisher and information consumer: users are generally doing both simultaneously.

This is easy to implement if all users are publishing (and consuming) at the same domain name. This essay will highlight what I think are the four main topics to address when we want users to communicate with each other seamlessly, even if they happen to be at different domain names. And they are surprisingly simple tricks. Sure, it requires some technology and standards, but much of that already existed a decade ago. The important lessons are in how you design your application for your users.

The following are four tips for web page providers. If you run a website where people can create a web page (you probably call them "accounts" or "logins"), then this is what you should do to make your website a better participant in the social web:

Don't login wall me

The first trick is to make public information public, and not put it behind a login wall. Twitter currently allows you to view my web page there, but if you want to see who I follow (click 'following' while not logged in), then you need to log in. Why?

Quora is even worse in this respect, prompting people to log in before they can see anything. Why? This means that conversations are drawn into the website where they start, rather than staying decentralized. When I consume public information on Quora or any other website, I should not be forced to log in.

Consuming limited-audience content can also be implemented without login walls: for instance with unguessable URLs. Picasa used to do this - around a decade ago it was possible to share a Picasa album simply by sending its URL to someone. Now, the consumer of my photo album is prompted to create a Google account instead. Again: why? All web page providers should switch back to displaying content without login walls.
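
The unguessable-URL approach can be sketched in a few lines. This is a minimal illustration, not any provider's actual scheme; the function name and base URL are hypothetical, and the idea is simply that a long random token in the URL acts as the access credential, so no login is needed:

```python
import secrets


def create_share_url(album_id, base="https://photos.example.com"):
    # Hypothetical helper: 24 random bytes become 32 URL-safe characters,
    # which is infeasible to guess. Whoever holds the URL holds access.
    token = secrets.token_urlsafe(24)
    return f"{base}/album/{album_id}/{token}"
```

The server then only has to check that the token in an incoming request matches the one it stored for that album.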

Of course, once a web page provider offers its users the ability to actually publish content (i.e., without a login wall in front of it), that is big progress compared to what most users get now from their web page provider. And it's an easy win. A next concern is whether the format of this web page follows standards like semantic markup and web feed support. But once web page providers commit to breaking down their login walls, that becomes a small additional step.

The problem is not that a web page provider like Facebook is unable to engineer a product where you can actually publish stuff, or that there are not enough standards describing how to actually publish something on the web. The power to redecentralize the social web does not lie with engineers, it lies with product managers, who sometimes decide that the ability to actually publish something on the web is not what users need.

Don't name space me

So removing login walls enables decentralized consumption of information. Once that works, we only need a few more tricks so that I can also produce content on the web, even if this content is in reaction to existing content elsewhere (for instance when replying to a Quora question). I should be able to do so on my own web page, regardless of who my web page provider is.

A key concept for this is namespace. Web page providers don't just provide hosting space, they also provide a user interface that allows users to interact with other users. This application is usually namespaced: it will only allow you to interact with users on the same domain name, not with users on other domain names. The buttons, options, and controls of this user interface usually omit the domain name part when identifying other users. For instance, Twitter's @-mention syntax only works when mentioning a user whose web page is also on Twitter.

StatusNet introduced an important improvement to the @-mention by changing it to an '@user@host' format, which defaults to a local @-mention when the '@host' part is omitted. Granted, the syntax is not quite as simple and clean, but in return it is more powerful.
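
Parsing this format is straightforward. Here is a minimal sketch (the regular expression and function name are my own, not StatusNet's actual implementation) showing how an omitted '@host' part falls back to the local domain:

```python
import re

# Matches both '@user' (local) and '@user@host' (cross-domain) mentions.
MENTION = re.compile(r'@([\w.-]+)(?:@([\w.-]+))?')


def parse_mentions(text, local_host):
    """Return (user, host) pairs; an omitted host defaults to local_host."""
    return [(user, host or local_host)
            for user, host in MENTION.findall(text)]
```

With this, '@bob' on twitter.com and '@bob@twitter.com' written elsewhere resolve to the same user.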

An important field of research is how such user interactions can be made name space independent. Once web page providers commit to improving their user interfaces in this way, then its technical implementation is not so difficult.

Of course, there is also the underlying technical question of how a cross-domain @-mention is "delivered": the person being @-mentioned should somehow become aware of this. In fact, it's an engineering task that was already solved for blogs, a decade ago. That is what my third tip is about.

Polyglot rel-aware linkback

It's not only mentioning or messaging another person which should trigger a cross-domain notification. The same is true for mentioning content. That way, conversations can be threaded together across domains. And hyperlinks are also used in machine-readable data, so we need something similar there too. When my data document represents a move in a chess game, it will reference the URI of that game with a hyperlink. This idea is the essence of linked data.

The data document representing the game will also want to link to all the moves that make up the game. How can it do that? My publication of a chess move somehow needs to result in it being linked to from the main data document of the game in which the move occurs.

Notifying the other document of your link, so that it can link back to you, is called Linkback. The Wikipedia page there mentions three protocols. WebMention, a simplification of Pingback, has recently gained some real-world traction.
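
To give a feel for how WebMention works: the sender discovers the target page's endpoint (advertised in an HTTP Link header or in the page's HTML with rel="webmention") and then POSTs a form-encoded body with 'source' and 'target' URLs to it. The sketch below only covers the HTML discovery case and builds the request body without actually sending it; the class and function names are my own:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class EndpointFinder(HTMLParser):
    """Finds the first <link> or <a> advertising rel="webmention"."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is None and tag in ("link", "a"):
            a = dict(attrs)
            if "webmention" in (a.get("rel") or "").split():
                self.endpoint = a.get("href")


def discover_endpoint(html, page_url):
    # Resolve a relative endpoint href against the page's own URL.
    finder = EndpointFinder()
    finder.feed(html)
    return urljoin(page_url, finder.endpoint) if finder.endpoint else None


def build_webmention(source, target, endpoint):
    # The notification itself is a form-encoded POST to the endpoint.
    return endpoint, urlencode({"source": source, "target": target})
```

A real sender would also check the Link header first and then perform the POST; the receiver verifies that the source page really links to the target before accepting the mention.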

We also have the Salmon protocol, which does the same thing but using cryptographic signatures. And then there is the rdf trackers proposal, which solves the same problem while adding a slightly more powerful query engine than the other five protocols mentioned.

OK, so that's a problem. Six protocols for linkback? And it will only work if both hosts offer the same one? If each host picked just one of the six, as many as 83% (5 out of 6) of all linkback requests might fail due to incompatibility of the emitting host with the receiving host. How can we solve that?

We can try to invent a sort of universal standard, in the same way Esperanto tries to allow people from all nations to communicate with each other. But this would be unlikely to succeed for organizational reasons. And there is a simpler option: make each server support all six linkback protocols, plus any new ones that might pop up.

If your server speaks all linkback protocols, then even if the other server supports only one of them, the communication will be successful. This works both if you are the emitter and if you are the receiver.
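The emitting side of this polyglot strategy can be sketched as a simple fallback loop. Everything here is hypothetical scaffolding: each protocol-specific sender would be its own implementation, and the loop just tries them until the receiver understands one:

```python
def send_linkback(source, target, senders):
    """Try each protocol-specific sender until one succeeds.

    `senders` maps a protocol name to a callable(source, target) -> bool.
    Returns the name of the first protocol the receiver understood,
    or None if none of them worked.
    """
    for name, send in senders.items():
        try:
            if send(source, target):
                return name
        except Exception:
            # The receiver doesn't speak this protocol; keep trying.
            continue
    return None
```

The receiving side is symmetric: expose an endpoint for every protocol, and normalize whatever arrives into one internal notification format.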

To make hyperlinks more meaningful, we can use the 'rel' attribute. For instance, my blog post may have a link with rel="author", indicating who wrote it. A useful link relation for blog posts that are a reply to another blog post is "in-reply-to".

In RDF, the predicate fulfills this role, for instance 'foaf:knows'. In an ActivityStreams document, each field indicates a relationship, for instance 'actor'. When your server receives a linkback notification for a URI, it can look up in which relation the URI was linked, and display the hyperlink back to it in a corresponding way.
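
That lookup step can be sketched with the standard-library HTML parser. This is an illustrative helper of my own, assuming HTML source pages: given the source document and the target URI, it collects the rel values of the links pointing at the target, so the receiver can render the backlink as a reply, an author credit, or a plain mention:

```python
from html.parser import HTMLParser


class RelFinder(HTMLParser):
    """Collects the rel value of each <a> that links to `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.rels = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            if a.get("href") == self.target:
                self.rels.append(a.get("rel", ""))


def classify_linkback(source_html, target_url):
    # e.g. 'in-reply-to' -> thread it as a reply;
    # ''   (no rel)      -> show it as a plain mention.
    finder = RelFinder(target_url)
    finder.feed(source_html)
    return finder.rels
```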

Using polyglot rel-aware linkback, adding pages into a specific conversation, chess game, or photo tag list becomes possible on the web.

Decentralized search

There is, however, a fourth thing that's needed before the cross-domain experience on the federated social web is as good as the same-origin one: (friend) discovery. Sometimes you want to search a content-addressable index, for instance when searching for a user by name.

We need a decentralized search index which is publicly available, like a phone book, which allows search by name, by location, or by other criteria. This search index cannot be hosted at one single point of failure: several mirrors need to offer an interface to it, and exchange index data with each other. To get started with this, a few existing web page providers could pool their search indexes into one, and publish that index over BitTorrent in one or more well-known formats (SQL, CSV, XML, JSON, ...). Anybody could then take this data and instantiate a search engine.

If we only index rows of "full name, nickname, location, avatar, URL", and create a full-text index on at least the "full name" and "nickname" fields, then typing the first few letters of a name could yield a list of full names, with avatars and locations, with each result row linked to a URL. It is then up to the web page owner what they publish at that URL, but at least they will be findable.
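
A toy version of such a prefix search makes the data model concrete. This is only a sketch with made-up sample rows; a real mirror would use a proper full-text index (SQLite FTS, for instance) instead of a linear scan:

```python
def build_index(rows):
    """rows: (full_name, nickname, location, avatar, url) tuples,
    sorted by full name so results come back in a stable order."""
    return sorted(rows, key=lambda r: r[0].lower())


def search(index, prefix):
    """Case-insensitive prefix match on the two name fields."""
    p = prefix.lower()
    return [r for r in index
            if r[0].lower().startswith(p) or r[1].lower().startswith(p)]
```

Each result row carries everything needed to render "avatar + name + location", linked to the person's own URL.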

It will probably not be feasible to make the whole web searchable for topics this way, in the way Google Search does, but for the specific domain of friend discovery, this would definitely be doable.

There are of course important privacy concerns that need to be looked at carefully, and one should only index pages which are intended to be public. Only people who want to be findable on the web should ever end up in this database. And they should have the option of being removed from it; it is up to the ethics of the search provider to only display results from the latest version of the database, and "erase" historic results whenever this is requested. Also, we should make sure that surfacing information that is currently "public but buried" doesn't lead to unintended side-effects.

But the core question of decentralizing (social) search stands: if I'm findable on Twitter, then what you see there is already "publicly" findable within a specific domain name silo. I then have no problem with also being findable in this decentralized search database.