WebAccessControl

From W3C Wiki


This wiki page was actively edited mainly until 2016; since then, WAC work has continued more actively within the Solid project. See the explanation and the intention to merge them back into one unambiguous specification.

WebAccessControl is a decentralized system for allowing different users and groups various forms of access to resources where users and groups are identified by HTTP URIs.

The system is similar to the access control system used within many file systems except that the documents controlled, the users and the groups are all identified by URIs. Users are identified by WebIDs. Groups of users are identified by the URI of a class of users which, if you look it up, returns a list of those in the class. This means a person hosted by any site can be a member of a group hosted by any other site.

You can give access to a document on one site to users and groups hosted by other sites. Users do not need to have a profile on the site to have access to documents on it.

In order to be able to share common code a common ontology is proposed which provides the terms necessary for access control lists to be stored.

Each request for a Web Resource returns an HTTP response containing a Link header pointing to an ACL resource, which describes access to the given resource and potentially others, as shown by this diagram.

The diagram gives world readable access to the WebID profile document /2013/card but only gives limited read access to the /2013/protected resource, to the members of a group that went to a particular conference.

Vocabulary

The ontology is http://www.w3.org/ns/auth/acl

See also the discussion page about the WAC vocabulary (incl. a schema visualization).

If you look it up with Tabulator (or equivalent) you should get the definitive information. It will not be completely repeated in this wiki.

Note that there are properties of an Authorization which allow you to specify who gets access either by giving a relation to a specific user or a different relation to a class of user. The same technique is used for expressing to which resource or class of resources access is being granted.

@@TBD In order to make published authorizations say things that are true as far as possible, an Authorization should not be understood as making a claim about who has access to a specified set of resources, but instead as the slightly more sophisticated statement: an Authorization specifies access control modes for a set of resources, and is applicable to a resource only if that resource points, in its HTTP headers, to an ACL file that contains or includes the given rule. This allows an access control rule to specify that it gives access to all resources in a domain, even for resources that don't yet exist, and not be falsified if a resource does not itself include the ACL. This needs to be covered in the ontology.

There is a relationship between a resource and its access control list resource. This lets a user agent offer user interaction with the ACL, for example through an ACL editor.

Agents and classes

An Authorization is an abstract thing whose properties are defined in an Access Control List. The ACL does NOT have to explicitly state that it is of rdf:type Authorization.

agent
agentClass

Two classes are recognized specifically.

foaf:Agent This allows anyone and anything to access.
acl:AuthenticatedAgent This allows anyone with an authenticated ID to access. (2017/8 on)

Resources covered by the ACL

Any number of agents and/or agentClasses may be specified in an ACL file using the following relations

accessTo
accessToClass

Servers are required to recognize the class foaf:Agent as the class of all agents. This indicates that the given access is public. In some cases this will mean that authentication is therefore not required, and may be skipped. When a resource is being written, however, it may be necessary to associate the change with some kind of ID for accountability purposes.

@@ TBD: The class of agents with WebIDs would be a useful addition, which would require authentication but place no further constraints on the agent.

Modes of Access

In the ontology, modes are classes: think of them as classes of operation. The Read mode is the class of operations which includes all those operations that reveal information about the content of the resource.

Mode      Those allowed may:
Read      read the contents (including querying it, etc.)
Write     overwrite the contents (including deleting it, or modifying part of it)
Append    add information to [the end of] it but not remove information
Control   set the Access Control List for this themselves

@@TBD: The Control mode is very confusing and by itself does not allow the ACL to be read-only; it requires read-write. Read-only ACLs are useful because they let agents find out what they need to do if they want to participate. It is much easier to have ACLs just be their own ACLs, for example by having the Link header point to the same file with "Link: <>; rel=acl". This is discussed in detail in issue 112 of rww-play (wac:Control). Andrei Sambra says his rww.io server does the same.

Examples

@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

[acl:accessTo <card>; acl:mode acl:Read; acl:agentClass foaf:Agent].
[acl:accessTo <card>; acl:mode acl:Read, acl:Write;  acl:agent <card#i>].

This means that anyone may read card, and <card#i> can write it. (We won't repeat the prefixes from here on.)

[acl:accessTo <card>; acl:mode acl:Read; acl:agentClass foaf:Agent].
[acl:accessTo <card>; acl:mode acl:Write;  acl:agent <card#i>].

Because acl:agent has domain foaf:Agent the last line implies that <card#i> is a foaf:Agent.

Protocol

  1. The user authenticates to the server using WebID authentication over TLS
  2. The server looks in its local ACL information for the resource, according to the local convention (e.g. for access to /foo/bar.baz, in /foo/.meta/bar.baz.meta). This is parsed (typically as N3) into an RDF graph G.
  3. If the WebID is mentioned explicitly in the file as being allowed access then access is granted.
  4. If not, and if there are classes of agent which would be granted access, then for each class C the class URI is checked in a cache and, if necessary, looked up on the web, returning a document parsed to a graph Gc. If Gc contains { ?WebID rdf:type ?C } then the user is granted access.

Note that the protocol is very strict about where information comes from: the user's WebID profile is only checked for being cross-linked with the certificate during the WebID Authentication process, but is not believed for any information on class membership and so on.
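
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not any server's actual implementation: triples are plain tuples, the ACL graph G is assumed to be already parsed with all URIs made absolute, and fetch_graph is a hypothetical callback standing in for the cached web lookup of step 4. Authentication (step 1) is assumed to have happened already.

```python
ACL = "http://www.w3.org/ns/auth/acl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_AGENT = "http://xmlns.com/foaf/0.1/Agent"


def authorizations(graph, resource, mode):
    """Yield the subjects of rules in `graph` that grant `mode` on `resource`."""
    for s, p, o in graph:
        if p == ACL + "accessTo" and o == resource and (s, ACL + "mode", mode) in graph:
            yield s


def is_allowed(webid, resource, mode, graph, fetch_graph):
    """Return True if the authenticated `webid` may use `mode` on `resource`.

    `graph` is the ACL graph G as a set of (subject, predicate, object) tuples.
    `fetch_graph(uri)` is a hypothetical helper returning the graph Gc for a class URI.
    """
    rules = list(authorizations(graph, resource, mode))
    # Step 3: the WebID is named explicitly in an applicable rule.
    if any((rule, ACL + "agent", webid) in graph for rule in rules):
        return True
    # Step 4: check each agent class that would be granted access.
    for rule in rules:
        for s, p, cls in graph:
            if s != rule or p != ACL + "agentClass":
                continue
            if cls == FOAF_AGENT:  # public access, no lookup needed
                return True
            # Membership is taken only from the class document, never from
            # the user's own profile.
            if (webid, RDF_TYPE, cls) in fetch_graph(cls):
                return True
    return False
```

Note that the membership test only consults the graph returned for the class URI, which mirrors the strictness about information sources described above.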

Examples

The access control list is:


[acl:accessTo <card>; acl:mode acl:Read; acl:agentClass <http://my.example.net/groups/friends#group>].
[acl:accessTo <card>; acl:mode acl:Read, acl:Write;  acl:agentClass <groups/family#group>].

This means that anyone in the group <http://my.example.net/groups/friends#group> may read card, and <groups/family#group> can write it.

The resource <http://my.example.net/groups/friends> says:


<#group> foaf:member <../user/alice#me>, <../user/bob#me>, <../user/charlie#me>. 


The resource <groups/family> says:


<#group> foaf:member  <../people/don#me>, <../people/eloise#me>. 


The server reads the ACL file. When a request comes in it checks whether the ACL file information is sufficient for allowing access as is, and as it is not, it looks up the group files. If it were very well optimized, if a write request came in, it would only look up the family file, as that is the only one which could give write access.
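
The optimization described above, dereferencing only the groups that could grant the requested mode, might look like the following sketch. The function name and the tuple-based graph representation are assumptions made for illustration.

```python
ACL = "http://www.w3.org/ns/auth/acl#"


def groups_to_fetch(graph, resource, mode):
    """Return the agent-class URIs worth dereferencing for `mode` on `resource`.

    `graph` is the parsed ACL as a set of (subject, predicate, object) tuples.
    """
    wanted = set()
    for s, p, o in graph:
        if p != ACL + "agentClass":
            continue
        grants_mode = (s, ACL + "mode", mode) in graph
        covers_resource = (s, ACL + "accessTo", resource) in graph
        if grants_mode and covers_resource:
            wanted.add(o)
    return wanted
```

With the ACL above, a write request would select only the family group, while a read request would select both groups.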

Modifying Access Control information

The design goal is that the WebAccessControl storage should be a creative medium in which

  • new WAC'd information resources can be set up by the user
  • new WAC'd information resources can be set up by a distributed client-side application on behalf of a user,

without the intervention of administrative humans running the storage server.

The server of the resource in question gives a link acl:acl from the resource to the URI of an associated ACL resource.

   <profile> acl:acl <.meta/profile.meta>.


or say

   <profile> acl:acl <profile,meta>.


(Note that the system must make up some URI which suits itself. It may on a file-system-based service correspond to a file, in which case the URI may simply map to a file name. However it can also be any form of storage and the URI can be made up so that it will connect to the access control list system. Note that there are some advantages in having the two URIs close in the tree so that relative URIs can be used in the link and in the ACL information itself.)

(This link may be given in profile itself or via the HTTP link header which is used to link to associated metadata. A common agreed form must be used -- if in the future site metadata files and standard ways of defining them are defined then these would be an alternative.)

The client follows, for example, an HTTP header field:

Link: <meta/profile.meta>; rel=meta

and in meta/profile.meta it finds amongst other things

   <../profile> acl:acl <>.

i.e. "This is the ACL for profile".
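
A client could locate the linked resource with something like the following sketch. It is deliberately simplified: it only handles Link values where rel is the first parameter, and the function name and the base argument (the URI of the resource that was requested) are assumptions, not part of any specification.

```python
import re
from urllib.parse import urljoin

# Matches `<target>; rel=value` or `<target>; rel="value1 value2"`.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def find_link(header_value, rel, base):
    """Return the absolute URI of the first link with relation `rel`, or None."""
    for target, rels in LINK_RE.findall(header_value):
        if rel in rels.split():
            # Link targets may be relative to the requested resource.
            return urljoin(base, target)
    return None
```

For example, find_link("<meta/profile.meta>; rel=meta", "meta", "https://joe.example/profile") resolves the relative target against the profile URI.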

The server indicates that the user can edit the ACL in the normal way for editable linked data. The client puts up an optional window, panel, etc., containing an ACL editing user interface. The changes are written back in the normal way for editable linked data. SPARQL/Update is the preferred method as it allows small changes to be sent, but WebDAV will work.

See also how meta was used at W3C in the early days

rel=acl or rel=meta?

Neither the acl nor the meta link relations have been registered as required by RFC5988 on Web Linking in the IANA link relations registry. Both probably should be registered. The meta relation suggests that the document linked to could contain any metadata about the resource. For pictures metadata could be information such as who appears in the picture, when the picture was taken, etc... Acl relations are going to be very important for the infrastructure of the web, so it would be very helpful if the data in the linked to document was very precisely about access control.

Just to help gather consensus, please add which of these your implementation uses, with pointers to the code or service.

Implementations using rel="meta" to point to acls

  • N/A

Implementations using rel="acl" to point to acls

Sanitation of the request

In prototype systems one can use a generic SPARQL/Update system to control the ACL resource. However when write access is given to parties who are not totally trusted then the system MUST check that the ACL modification request does contain only valid ACL information.
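
One possible shape for such a check is sketched below. The allow-list of predicates is an assumption chosen for illustration; a real server would need to decide what counts as valid ACL information and might also validate the objects (for example, that modes are known).

```python
ACL = "http://www.w3.org/ns/auth/acl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative allow-list only; not taken from any specification.
ALLOWED_PREDICATES = {
    RDF_TYPE,
    ACL + "accessTo",
    ACL + "accessToClass",
    ACL + "agent",
    ACL + "agentClass",
    ACL + "mode",
}


def invalid_triples(triples):
    """Return the (subject, predicate, object) triples whose predicate is not allowed."""
    return [t for t in triples if t[1] not in ALLOWED_PREDICATES]
```

A server could reject the modification request whenever invalid_triples returns a non-empty list for the triples the request would insert.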


WAC And LDP

The Linked Data Platform WG is developing a protocol for interacting with a server using HTTP verbs (see latest spec). As a WG note they may also propose an Access Control use cases and requirements document. Here are some initial ideas of how the two can interact.

Editing ACLs

ACLs can be themselves editable using the same verbs as those used by LDP: PUT and PATCH.

For example, if one takes a small subset of SPARQL Update (the part that does not use subgraphs), then PATCH can be used as follows. Imagine a profile document https://joe.example/2013/card that is readable only by the user whose profile it is. Its ACL, card.acl, would look like this:

$ curl -i --cert keys.pem:test https://joe.example/2013/card.acl
HTTP/1.1 200 OK
Content-Type: text/turtle
Content-Length: 181

@prefix acl: <http://www.w3.org/ns/auth/acl#> . 
@prefix foaf: <http://xmlns.com/foaf/0.1/> . 

[] acl:accessTo <card>;  
   acl:mode acl:Read, acl:Write; 
   acl:agent <card#me> .

Of course, a WebID profile should be visible to everyone. This could be done easily by PATCHing the above ACL as follows:

$ cat card.acl.update 
PREFIX acl: <http://www.w3.org/ns/auth/acl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
[] acl:accessTo <https://joe.example/2013/card> ;
   acl:mode acl:Read;
   acl:agentClass foaf:Agent .
}

$ curl -X PATCH -k -i --data-binary @../eg/card.acl.update -H "Content-Type: application/sparql-update; charset=utf-8"  --cert ../eg/test-localhost.pem:test  https://joe.example:8443/2013/card.acl
HTTP/1.1 200 OK
Link: </2013/card.acl>; rel=acl
...

So even though the acl file is neither an LDPR nor an LDPC it should be editable using the same methods. It is arguable that an ACL file should not be DELETEable, as it is created by the server on creation of the LDPR. ( DELETEing the LDPR should delete the associated acl on the other hand ).

WAC relation to HTTP Verbs

In LDP an OPTIONS request MUST return an Allow header stating what HTTP methods can be used on a resource. Those headers can be returned ( whether they MUST be is an open question [1] )

Here is an initial mapping between the HTTP verbs used by LDP and the wac ontology.

HTTP VERBS wac:Read wac:Write wac:Control wac:Append
GET x x
PUT x x
POST x x? √ ?
DELETE x x
PATCH x √ with INSERT only

( Note wac:Control, is just an indirect Read, Write, but since it is on an LDPR that is not an LDPC, POST has not been defined. See also issue TBD above with wac:Control )

It would be nice if the Allow header could give mostly the right hints to the client about what he is allowed to do without needing to parse the acl links, as parsing the ACL link and following the redirects could require quite a number of hops (and may not be available to the client). This still would leave a non-authenticated user with the task of having to follow those links, if he wants to discover what rights authentication would bring.

Looking at the above table, it is clear that wac:Read maps nicely to GET and that wac:Write maps nicely to all the other methods.

But this still leaves the following two problems:

  • does wac:Append apply to POST, ie is POST an append operation? (It feels like it.)
  • a client that knows it has PATCH access might not yet know whether it can only use INSERT (i.e. only add triples) or whether it can also DELETE triples from a graph.
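
The mapping discussed above could be expressed as a small function. This sketch takes a position on both open questions purely for illustration: it assumes POST is treated as an append operation and that an INSERT-only PATCH needs only Append. The function name and the insert_only flag are hypothetical.

```python
def required_mode(method, insert_only=False):
    """Map an HTTP method to the name of the WAC mode it would need.

    `insert_only` says whether a PATCH body contains only INSERT operations.
    """
    if method == "GET":
        return "Read"
    if method == "POST":
        return "Append"  # assumption: POST is treated as an append operation
    if method == "PATCH" and insert_only:
        return "Append"  # assumption: INSERT-only updates need only Append
    if method in ("PUT", "DELETE", "PATCH"):
        return "Write"
    raise ValueError(f"no mapping sketched for {method}")
```

A server holding such a mapping could compute the Allow header for an authenticated agent by listing the methods whose required mode the agent has been granted.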

Extension

It is reasonable in the future to allow different forms of information to be stored in the same writable metadata space as the ACL. There is a class of new protocols which do not require any functionality on the server side. These include the storage of information about licensing, appropriate use, and provenance of the original resource. It is not clear whether these are better implemented by extending the WAC metadata or by a system built on top of the WAC storage layer, using a separate WAC'd resource for the new metadata. The decision is in general a little analogous to the decision on a Mac OS file system.

Including other acls

To avoid duplication of access control rules it is often more interesting to include another acl, rather than duplicating the acl content. This allows edits to one acl to change all the acls that include it. For this we suggest an acl:include relation:

<> acl:include <.acl> .

The above relation would require the server to include the data from the <.acl> resource ( which is the acl for the directory )
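
A server could resolve such inclusions roughly as follows. This is a sketch only: fetch_graph is a hypothetical helper that returns the triples of an ACL document as tuples with absolute URIs, and the guard against inclusion cycles is an added assumption rather than something the proposal specifies.

```python
ACL = "http://www.w3.org/ns/auth/acl#"


def expand_includes(uri, fetch_graph, seen=None):
    """Return the triples of the ACL at `uri` merged with everything it includes."""
    seen = set() if seen is None else seen
    if uri in seen:  # already merged, or an inclusion cycle
        return set()
    seen.add(uri)
    graph = set(fetch_graph(uri))
    merged = set(graph)
    for s, p, o in graph:
        if p == ACL + "include":
            merged |= expand_includes(o, fetch_graph, seen)
    return merged
```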

Implemented by:


Regular Expressions

Being able to define groups of resources via regular expressions is very useful. It allows one to create a root ldp:Container that gives rights to its children in one rule.

rww-play implements an acl:regex relation, which currently uses Java regular expressions to specify a constraint on a class of resources:

[] acl:accessToClass [ acl:regex "https://joe.solid.example/.*" ];  
   acl:mode acl:Read; 
   acl:agentClass foaf:Agent .

One could use POWDER, or invent some simpler notation.
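
To illustrate how a regex-defined resource class could be evaluated, here is a sketch using Python's re module. rww-play uses Java regular expressions, whose syntax differs from Python's in places; simple patterns like the one above behave the same. Matching against the whole URI, and the function name, are assumptions.

```python
import re


def matches_resource_class(pattern, resource_uri):
    """True if the whole of `resource_uri` matches the acl:regex `pattern`."""
    return re.fullmatch(pattern, resource_uri) is not None
```

Note that an unescaped dot matches any character, so a stricter pattern would escape the dots in the host name, for example "https://joe\\.solid\\.example/.*".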

CORS User Agents

Giving a specific resource access to an Origin

It is often useful to allow JavaScript user agents to access a resource in r/w mode. This would be a way to extend trust to JavaScript served by a given domain. It is suggested that we add an acl:origin relation from the Agent Class to a string, so that one could write the following ACL:

[] acl:accessToClass [ acl:regex "https://bblfish.solid.example/.*" ];  
   acl:mode acl:Write; 
   acl:origin <https://apps.rww.io>  .

This would then add the header

Access-Control-Allow-Origin: https://apps.rww.io

to the response to a request on that resource. Note that in both the ACL file and the header, as in the Origin: header, the origin URI does NOT have a trailing slash.
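
A server might derive that header along these lines. The sketch assumes the origins granted by acl:origin for the requested resource and mode have already been collected into a set of strings without trailing slashes; the function name is hypothetical.

```python
def cors_headers(request_origin, trusted_origins):
    """Return CORS response headers if `request_origin` is trusted, else none.

    `trusted_origins` holds the acl:origin values that apply to the request.
    """
    if request_origin in trusted_origins:
        return {"Access-Control-Allow-Origin": request_origin}
    return {}
```

Because the comparison is a plain string match, "https://apps.rww.io/" with a trailing slash would not be recognized, which is consistent with the note above.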

Setting "Access-Control-Allow-Methods" for a particular Agent

A Linked Data publisher may want to make a whole set of resources available over CORS. For completely publicly accessible resources that is reasonably easy: one can just add (please check)

Access-Control-Allow-Origin: *
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept

In read mode that should work fine. (In write mode, one may need to be careful to log the user and the Origin that made the change.)

But what should the server do for any resource that is protected? It cannot in a blanket manner state that the resource is accessible to every Origin. That would make it much too easy for a piece of JavaScript to use the authentication state in a browser to do whatever the designer of the JS wanted rather than what the browser user wanted. But if the server selects a particular Origin that it trusts, then that would limit the growth of JavaScript applications very severely to those known and trusted to the data publisher.

It should really be up to the browser user to specify which JavaScript they trust ( sadly this can only be done with the extremely coarse Origin tool ). The suggestion is therefore that the user's WebID profile contain a list of trusted origins, and that the server use those to decide what Origin to add to the header:

<#i> acl:trustedOrigin <http://apps.w3c.org/>, <http://apps.timbl.name>, </> .

The server, after authenticating the user, would then add those origins to the header. If we want to allow some origins to be trusted for all read operations, but only some of them for write operations, then something more complex would be needed, such as

<#i> acl:trustedOrigin [ acl:mode acl:Read;
                         acl:accessToClass foaf:Document;  # give access to all documents ( that allow one access )
                         acl:agentClass foaf:Agent;
                        ],
                        [ acl:mode acl:Write;
                         acl:accessToClass foaf:Document;  # give access to all documents ( that allow access of course )
                         acl:agent [ acl:origin <https://apps.w3.org/> ], [ acl:origin <> ]  # but only to JS agents that come from these two origins
                        ] .

The server, after authenticating the user, could then use that information to write out which Origin is allowed which action.

This won't work because the pre-flight request does not allow authentication. See the thread on the WebID mailing list.

Implementations

The following systems implement or plan to implement WAC:

  • gold uses WAC with WebID-based users and groups. It also exposes a Link header with rel="acl" for its resources.
  • data.fm offers WAC-authorized, WebID-authenticated linked data storage, with code available via github
  • Drew Perttula's photo viewer includes WAC
  • OpenLink Data Spaces (ODS) uses WAC in all application areas, including Addressbook, Bookmarks, Briefcase (file-sharing), Calendar, FeedManager, Gallery, Mail, Polls, Weblog, and Wiki.
  • Toby Inkster's RDF-ACL library implements WAC in Perl, abstracting away the SPARQL so that Perl programmers just need to call $acl->check($agent_uri, $document_uri, 'read').
  • rwwPlay is an LDP implementation in Scala that supports WAC with WebID Auth. The ACLs are themselves editable using LDP.
  • rww.io supports WAC and it offers a minimal UI for ACL management.
  • Fedora 4 is a digital object repository supporting WAC.

Are interested

Broken links

  • WACup, a WAC explorer/viewer for RDFa implemented with jQuery and rdfquery (client-side)

Related

Mailing List