ActivityPub/Primer/proxyUrl endpoint

From W3C Wiki

The proxyUrl endpoint lets ActivityPub API clients request ActivityPub objects from remote servers. It provides a bridge between the client-to-server authentication mechanism and the server-to-server authentication mechanism.

For example, suppose an ActivityPub server uses OAuth 2.0 for client-to-server authentication and HTTP Signature for server-to-server authentication. The client software can POST the id of the ActivityPub object it wants to retrieve to the proxyUrl on its own server, authenticating with an OAuth 2.0 bearer token. The server then retrieves the remote object using HTTP Signature authentication and returns it to the client.

The ActivityPub API client can't use its OAuth 2.0 token with the remote server, because that server didn't issue the token and has no way of validating it. The API client also can't use HTTP Signature directly, since the user's private key is kept on the ActivityPub server.

The proxyUrl endpoint therefore allows ActivityPub API clients to retrieve remote objects with the authorization of their current user. This is important when the remote object was not addressed to the Public.

Discovery

The proxyUrl for an ActivityPub actor is found in the endpoints property of the actor profile. For example:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://social.example/users/evanp",
  "type": "Person",
  "inbox": "https://social.example/users/evanp/inbox",
  "outbox": "https://social.example/users/evanp/outbox",
  "following": "https://social.example/users/evanp/following",
  "followers": "https://social.example/users/evanp/followers",
  "liked": "https://social.example/users/evanp/liked",
  "endpoints": {
    "proxyUrl": "https://social.example/application/proxy"
  }
}

The proxyUrl endpoint doesn't have to be unique to the actor; all actors using the same account server might have the same proxyUrl value. However, because it may differ between actors, ActivityPub API clients should check it for each user.
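As a minimal sketch of the lookup described above, the following Python function reads the proxyUrl from an actor profile that has already been fetched and parsed as JSON. The function name get_proxy_url is hypothetical. The sketch only handles an endpoints value embedded in the actor document, not one given as a link to a separate document.

```python
from typing import Optional


def get_proxy_url(actor: dict) -> Optional[str]:
    """Return the actor's proxyUrl, or None if none is advertised."""
    endpoints = actor.get("endpoints")
    # "endpoints" is normally an embedded JSON object. This sketch does not
    # dereference it when it is given as a URL string instead.
    if not isinstance(endpoints, dict):
        return None
    proxy_url = endpoints.get("proxyUrl")
    return proxy_url if isinstance(proxy_url, str) else None


actor = {
    "id": "https://social.example/users/evanp",
    "type": "Person",
    "endpoints": {"proxyUrl": "https://social.example/application/proxy"},
}
print(get_proxy_url(actor))  # https://social.example/application/proxy
```

Calling this once per actor, rather than once per server, follows the advice above that the value may differ between actors.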

There is no way to specify a separate authentication mechanism for the proxyUrl; the proxyUrl server should use the same authentication mechanism as the rest of the ActivityPub API. For example, the same OAuth 2.0 token should be usable for reading the actor's inbox and for posting to the proxyUrl.

Calling

The ActivityPub API client can retrieve a remote object by sending a POST request to the ActivityPub API server hosting the proxyUrl.

POST /application/proxy HTTP/1.1
Host: social.example
Authorization: Bearer OAUTH_TOKEN_HERE
Content-Type: application/x-www-form-urlencoded

id=https://remote.example/users/dustyweb/notes/334

The result can be an Activity Streams 2.0 representation of the object:

HTTP/1.1 200 OK
Content-Type: application/ld+json; profile="https://www.w3.org/ns/activitystreams"

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://remote.example/users/dustyweb/notes/334",
  "type": "Note",
  "attributedTo": "https://remote.example/users/dustyweb",
  "to": "https://remote.example/users/dustyweb/followers",
  "contentMap": {
    "en": "Hello, followers and only followers!"
  }
}
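The request above could be built on the client side roughly as follows. This is a sketch using only the Python standard library. The helper name build_proxy_request is hypothetical, and OAUTH_TOKEN_HERE stands in for a real bearer token. The sketch percent-encodes the id value, as the form content type expects.

```python
import urllib.parse
import urllib.request


def build_proxy_request(
    proxy_url: str, object_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a proxyUrl POST request for object_id."""
    # urlencode percent-encodes the id so it is a valid form field value.
    body = urllib.parse.urlencode({"id": object_id}).encode("ascii")
    return urllib.request.Request(
        proxy_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_proxy_request(
    "https://social.example/application/proxy",
    "https://remote.example/users/dustyweb/notes/334",
    "OAUTH_TOKEN_HERE",
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Sending the request is then a matter of passing it to urllib.request.urlopen with a timeout and checking the Content-Type of the response before parsing the body as JSON.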

The proxyUrl can also be used to fetch remote binary data, like images, video, or audio files:

POST /application/proxy HTTP/1.1
Host: social.example
Authorization: Bearer OAUTH_TOKEN_HERE
Content-Type: application/x-www-form-urlencoded

id=https://remote.example/uploads/dustyweb/selfie10.jpg

The result will be a binary representation of the object:

HTTP/1.1 200 OK
Content-Type: image/jpeg

<Image data here>

Implementation

An ActivityPub server that implements proxyUrl needs to address a number of issues.

  • Authentication. The incoming request must be authenticated with the local client API authentication method. Otherwise, the server is acting as an open proxy, which can be a major security concern.
  • Caching. It may make sense to cache the results of a proxyUrl request so that the same values can be returned for future requests. Care should be taken to avoid returning cached data to local users who are not authorized to see it. For example, if user1@domainA requests https://domainB/data through the proxyUrl, and the data is cached, the server at domainA should only return the cached data to user2@domainA later if it knows that user2 has access to the data (for example, the data is public, or it is addressed specifically to user2, or it is addressed to a collection, like a followers collection, that the server knows user2 is a part of). Even this check does not account for user-level blocks against user2 on domainB, which domainA may not know about. Given the complexity of this kind of authorization, it may be better to serve cached data only to the original requester.
  • Streaming. One method of implementation is to read the entire remote response into memory, and then write that response to the client. Although this can be useful for Activity Streams 2.0 content, it can be a problem for binary data. Reading the remote response headers only, validating, and then streaming the remote response body directly to the client, can keep the memory footprint much lower.
  • Attacks. As with any client-submitted URLs, the server should take care in making the remote request. Some possible issues include:
    • Improper URL schemes. All ActivityPub IDs should be https: URLs; any other URL schemes can be rejected.
    • Certificate validation. The HTTPS certificate used should be valid.
    • Timeout. One form of attack is sending many requests that hang indefinitely. The HTTP request should time out after a reasonable number of seconds (say, 30).
    • Response size. Another attack is sending many requests that flood the server with excess data. Setting an upper limit for data response size, especially based on the response type, may be worthwhile.
    • Rate limiting. The number of proxyUrl requests allowed in a certain period should be limited; for example, 6000/hour.
    • Following redirects. A limited number of redirects should be followed; 3 is a good limit.
    • Local URLs. Reject requests for hosts like 'localhost' or for addresses on the local network like '192.168.0.12'; otherwise the proxy can be used to reach internal services.
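Some of the checks in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete defense. The names is_allowed_target, copy_limited, and MAX_BYTES are hypothetical, and the 10 MiB cap is an arbitrary example value. The URL check only inspects the URL text: a real server would also need to resolve host names and apply the same address checks to the results, and to repeat the checks for every redirect it follows.

```python
import io
import ipaddress
from typing import BinaryIO
from urllib.parse import urlsplit

MAX_BYTES = 10 * 1024 * 1024  # hypothetical cap on proxied response size


def is_allowed_target(url: str) -> bool:
    """Reject non-https URLs and hosts that are obviously local."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme != "https" or not parts.hostname:
        return False
    host = parts.hostname.lower()
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. DNS resolution checks are omitted in this sketch.
        return True
    # Loopback, private, and link-local addresses are not global.
    return address.is_global


def copy_limited(
    source: BinaryIO, sink: BinaryIO, limit: int = MAX_BYTES, chunk_size: int = 65536
) -> int:
    """Stream source to sink in chunks; raise ValueError past limit bytes."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > limit:
            # Earlier chunks have already been written to sink at this point.
            raise ValueError("response body exceeds the size limit")
        sink.write(chunk)


print(is_allowed_target("https://remote.example/users/dustyweb/notes/334"))  # True
print(is_allowed_target("http://remote.example/users/dustyweb/notes/334"))  # False
print(is_allowed_target("https://192.168.0.12/data"))  # False
print(copy_limited(io.BytesIO(b"x" * 100), io.BytesIO(), limit=1000))  # 100
```

Copying in fixed-size chunks keeps memory use bounded regardless of the size of the remote response, which is the point of the streaming advice above; the size limit is enforced as the data arrives rather than trusting a Content-Length header.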