Linked Data Shapes, Forms and Footprints

In a world of linked data, in which anyone can say anything about anything, how do we build systems in which users and apps are easily allowed to express useful, helpful things? What tools can we use which allow new systems to grow easily and work well together?

Ontology languages

The RDF schema languages, RDF Schema and OWL, tell you what implications one can draw from RDF data. They also tell you what does not make logical sense. So, in a sense, they indirectly constrain what RDF data one can write, though only by telling you what would be nonsense (false). They can therefore be used, in a rather weak way, to guide a user interface. But that won't do what we need.

Other schema systems, like that of schema.org, give suggestions as to what predicates can be used to talk about objects of a given class. That is useful, but still is not enough.

In this document, we will discuss three kinds of technologies to help with building apps on top of data:

Shapes explain to machines what data should look like, independently of how that data is displayed to a user.
Forms are a user interface allowing people to read and write data in a specific shape.
Footprints explain to machines where new data should be stored.

Shape Languages

Shapes languages, such as SHACL, specifically address the need to constrain the data in a graph to a certain shape. For every child, you must specify a parent, and for every parent you must specify a phone number, and so on. Shapes languages can be used to validate existing data to make sure it will work before processing, to validate submitted data from an external source, and so on. They can be used in effect to define an Application Program Interface: The application program using this interface must provide data in this shape, and will get data in this shape.
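
As a rough illustration of the child and parent constraint described above, here is a minimal Python sketch of validating a graph against a shape. It is not SHACL: triples are plain tuples, a shape is just a table of minimum counts per predicate, and the ex: names and the validate() function are assumptions made up for this example.

```python
# Minimal sketch of shape validation over plain (subject, predicate, object) triples.
# The ex: names and the shape layout are illustrative assumptions, not SHACL itself.

def validate(triples, shape, focus_class):
    """Return a list of violation messages for every node typed as focus_class."""
    violations = []
    nodes = sorted(s for (s, p, o) in triples if p == "rdf:type" and o == focus_class)
    for node in nodes:
        for predicate, min_count in shape.items():
            count = sum(1 for (s, p, o) in triples if s == node and p == predicate)
            if count < min_count:
                violations.append(
                    f"{node} needs at least {min_count} {predicate}, found {count}"
                )
    return violations

CHILD_SHAPE = {"ex:parent": 1}   # every child must have a parent
PARENT_SHAPE = {"ex:phone": 1}   # every parent must have a phone number

data = [
    ("ex:alice", "rdf:type", "ex:Child"),
    ("ex:alice", "ex:parent", "ex:bob"),
    ("ex:bob", "rdf:type", "ex:Parent"),
]

child_problems = validate(data, CHILD_SHAPE, "ex:Child")
parent_problems = validate(data, PARENT_SHAPE, "ex:Parent")
print(child_problems)
print(parent_problems)
```

The same function serves both uses mentioned above: checking existing data before processing it, and checking data submitted by an external source before accepting it.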

Let us take an example of a user's contact data laid out in an AddressBook, with Groups and Individuals. (These are roughly the shapes of the standard VCARD conversion to RDF). First of all here are the prefixes we will be using in all these examples.

Shape Constraints

contact-shapes.ttl

The last part of the above shape is a "property shape" which says that the book can have any number of links to groups.
Shapes can point to other shapes. Next, here is a (subset of a) shape for the actual contact data for a person:

Shape Expressions

The Shape Expressions language, ShEx, is in fact quite different from the shape language above. It does more than constrain the shape of the graph: it also defines a canonical ordered traversal of the graph. It is in a way a kind of query language. If applied simply, it (like SPARQL) returns an array of bindings: "Yes, it matches a contact shape, and here are all the names and phone numbers". If it is applied recursively, it returns a tree of bindings: "Yes, it matches the shape, and here are all the contact points; and for each contact point here are the address and phone number; and for each address the number and street".
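
To make the idea of a tree of bindings concrete, here is a small Python sketch. It is not ShEx and makes no claim about how a ShEx processor works internally: a shape is assumed to be an ordered list of (predicate, nested shape) pairs, every arc is treated as required, and the ex: names and the match() function are invented for the illustration.

```python
# Sketch: match a node against an ordered, possibly nested shape and
# return a tree of bindings, or None when a required arc is missing.
# Assumes the data is acyclic along the arcs the shape follows.

def match(node, shape, triples):
    bindings = {}
    for predicate, nested in shape:          # the shape's order is preserved
        values = [o for (s, p, o) in triples if s == node and p == predicate]
        if not values:
            return None                      # required arc missing: no match
        if nested is None:
            bindings[predicate] = values     # leaf values
        else:
            children = [match(v, nested, triples) for v in values]
            if any(child is None for child in children):
                return None
            bindings[predicate] = children   # one subtree per linked node
    return bindings

ADDRESS_SHAPE = [("ex:street", None)]
CONTACT_POINT_SHAPE = [("ex:phone", None), ("ex:address", ADDRESS_SHAPE)]
CONTACT_SHAPE = [("ex:name", None), ("ex:contactPoint", CONTACT_POINT_SHAPE)]

triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:contactPoint", "ex:home"),
    ("ex:home", "ex:phone", "+1-555-0100"),
    ("ex:home", "ex:address", "ex:addr1"),
    ("ex:addr1", "ex:street", "1 Main Street"),
]

tree = match("ex:alice", CONTACT_SHAPE, triples)
print(tree)
```

A node which lacks a required arc, such as ex:home when matched against the contact shape, simply fails to match and yields None.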

Because it outputs tree structures, ShEx can be used as a step in UI generation, generating for example HTML tables (simple case) or nested HTML tables (recursive case). Generate them using vanilla JS or your favorite UI framework.
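
The step from a tree of bindings to nested HTML tables can be sketched in a few lines. The sketch below uses Python rather than JS, and assumes the tree is a dictionary mapping each label to a list of values, where a value is either a string or a nested dictionary; that layout and the render() function are assumptions of this example, not something a ShEx tool is known to emit.

```python
from html import escape

def render(bindings):
    """Render a tree of bindings as an HTML table, nesting a table per subtree."""
    rows = []
    for label, values in bindings.items():
        cells = []
        for value in values:
            if isinstance(value, dict):
                cells.append(render(value))          # recursive case: nested table
            else:
                cells.append(escape(str(value)))     # simple case: escaped text
        rows.append(f"<tr><th>{escape(label)}</th><td>{' '.join(cells)}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"

sample_tree = {"name": ["Alice"], "contactPoint": [{"phone": ["+1-555-0100"]}]}
html_table = render(sample_tree)
print(html_table)
```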

Another use for this form of query is to map between ontologies, [@@ref] as potentially a shape can be used backwards -- instead of being used to parse the input it can be used to generate the output. (This is a little like the "Return" clause in an XQuery.)

Shape Conformance

If a folder or document is known to conform to a shape, it is useful to explicitly indicate this. That way, clients know exactly what kind of data to expect. Furthermore, the UI can then suggest relevant forms to view and edit the data. @@TODO: predicate to indicate conformance

In summary, the two shape languages compare as follows:

SHACL	ShEx
Typically, just validates a graph.	Validates, and generates an array or tree of bindings.
SHACL is unordered.	ShEx is ordered.
SHACL is in RDF. SHACL files are typically .ttl files.	ShEx is a new language, but has an RDF syntax.
Can be used to generate forms, but the designer has to add order.	Can be used to generate forms directly.

Form Languages

Form languages are used to define user interfaces for reading and writing Linked Data with a specific shape. A form can be thought of as a simple form (like Google forms etc.) which asks a series of questions, but a form language can define a user interface using all kinds of different types of controls -- fields, widgets and so on. The result of using the OSX Interface Builder, for example, can be considered as a form. Do not think boring bureaucratic forms, think exciting live user experience.

There exists a one-to-many relationship between shapes and forms: a specific shape can be viewed and edited with many different forms. For instance, forms may contain more or less information at different levels. The principle of separation of form and content suggests that a high-level description is important. This allows the system and the user to customize the form at run time, and it is the most accessible approach for people with disabilities. But the system also needs to be fed good design from the designer: grouping things well, arranging the flow through the form, adding tips, icons, and so on. It is also reasonable for a form to link to a shape. A form can make use of type information (telephone number, date, email address, etc.) and also range information (max length of string, max value of integer, etc.), which is typically part of the arc of a shape. The convention for extending a form field, or an arc in a shape, could be the same when the shape is in SHACL, as both are in RDF.

The User Interface ontology gives hints as to how data should be presented, and, more substantially, defines forms. A form is a sequence of related user interface questions or (abstract) widgets. The form language defines the direct relationship between the form fields and the shape elements, and, through the shape, connects user input to the triples of the RDF graph which is being constructed or edited.
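
The relationship described above, from form field through shape arc to triple, can be sketched as follows. This is not the User Interface ontology itself: the Field class, its attribute names, and the to_triples() function are assumptions for illustration, and values are stored as plain strings for brevity. The vcard:fn and vcard:hasTelephone predicates come from the vCard ontology.

```python
from dataclasses import dataclass

@dataclass
class Field:
    sequence: int    # position of the field in the form (forms are ordered)
    label: str       # text shown to the user
    predicate: str   # the shape arc this field reads and writes

def ordered(form):
    """Return the form's fields in display order."""
    return sorted(form, key=lambda field: field.sequence)

def to_triples(subject, form, answers):
    """Turn the user's answers, keyed by field label, into triples in form order."""
    return [
        (subject, field.predicate, answers[field.label])
        for field in ordered(form)
        if field.label in answers
    ]

CONTACT_FORM = [
    Field(2, "Phone", "vcard:hasTelephone"),
    Field(1, "Name", "vcard:fn"),
]

new_triples = to_triples(
    "ex:alice", CONTACT_FORM, {"Phone": "+1-555-0100", "Name": "Alice"}
)
print(new_triples)
```

Note that the order lives entirely in the form; the triples themselves, once written to the graph, carry no order.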

Recursive Forms

Like shapes, forms can refer to other forms. So a form for an event can include a nested form for each person involved and a form for a person can have a nested form for each of the events they are involved in. So forms can be recursive, possibly mutually recursive, really just as a function of what users need.
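
One practical consequence of mutual recursion is that a renderer cannot expand nested forms eagerly, or it would never finish. A minimal sketch, assuming forms are held in a dictionary keyed by hypothetical names (EventForm, PersonForm) and expanded only to a chosen depth:

```python
# Each form is an ordered list of (label, nested form name or None).
FORMS = {
    "EventForm": [("Title", None), ("Participants", "PersonForm")],
    "PersonForm": [("Name", None), ("Events", "EventForm")],
}

def outline(form_name, depth):
    """List the labels of a form, expanding nested forms at most depth levels."""
    lines = []
    for label, nested in FORMS[form_name]:
        lines.append(label)
        if nested is not None and depth > 0:
            lines.extend("  " + line for line in outline(nested, depth - 1))
    return lines

event_outline = outline("EventForm", 1)
print("\n".join(event_outline))
```

In a live user interface the depth would more likely be driven by the user opening a nested item than by a fixed number.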

Differences between forms and shapes

While shapes languages and form languages look similar, they are fundamentally different.

Despite these differences, rumor has it that some folks who use SHACL have in fact used it as a form language. One could imagine adding the ordering by adding a sequence number to each field, for example. However, keeping shapes and forms apart separates the concern of machine interoperability from the graphical interface, and allows different (simple or complex) forms to be used with the same shape, as illustrated below.

Shapes	Forms
Completely unordered: RDF is unordered and so shapes of RDF do not express order	Completely ordered: order is crucial in the UI
Constrain the data stored: can be used for validation, and to define an API	Define a user interface control template
One or a small number of shapes defined	Many forms can be defined over the same data.
Shapes do not contain UI information	Forms contain UI data on field type, control size, titles, grouping, colors, headings, labels, tips, accessibility information such as alternative renderings
Shapes are edited by developers and API designers	Forms are edited by designers and power users

Example of an extended form: contact-form.ttl

Here is an example of a form for filling in details of a new contact. The vcard class is Individual. The full form is linked. Below are some highlights. Note how the form refers back to arcs of the shape, such that the shape and its constraints do not have to be duplicated. @@TODO bring full form online

Example of a simple form: person-form.ttl

The same shape can also be used by a different form. For instance, to simply view or edit a contact's name and their profile picture, we could use the following form:

These forms essentially provide different views of the same data (shape). Data created in one form can be edited in the other, and vice-versa.

Diagram relating shapes to forms

Here is a diagram attempting to lay out the way the pieces, some existing and some to be made (2019-04), fit together.

We imagine here that there are shared and project-wide repositories of shapes and of forms. Shapes, remember, are used by developers, and forms may be made not only by developers and designers but also by power users. In addition to a form being created from scratch by a designer, a basic version of it could be autogenerated from a shape, after which it can be refined manually.
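
Autogenerating a basic form from a shape might look something like the sketch below. The shape is assumed to be a simple mapping from predicate to datatype, and form_from_shape() and the field layout are invented for this example. Because the shape is unordered, the generator has to pick some order (here, alphabetical by predicate), which is exactly the kind of thing a designer would then refine.

```python
def form_from_shape(shape):
    """Derive a basic, ordered list of form fields from an unordered shape."""
    fields = []
    for sequence, (predicate, datatype) in enumerate(sorted(shape.items()), start=1):
        local_name = predicate.split(":")[-1]
        fields.append({
            "sequence": sequence,               # arbitrary default order
            "label": local_name.capitalize(),   # placeholder label for the designer to improve
            "predicate": predicate,             # link back to the shape arc
            "datatype": datatype,               # lets the UI pick a suitable control
        })
    return fields

PERSON_SHAPE = {"vcard:fn": "xsd:string", "vcard:bday": "xsd:date"}
basic_form = form_from_shape(PERSON_SHAPE)
for field in basic_form:
    print(field)
```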

Footprint Languages

The problem of where to store new data

For all these languages above, to get real interoperability in a read-write linked data world, we need something more. In a read-only world, the Linked Data rules give us the possibility of picking some data and following the links to find related data. When, for example, we find an address book, we can follow links to find the parts of it, and hence the groups and the people it describes. For this we may need to know a certain protocol about how to find things, such as following a foaf:publications link to find a graph which contains data about what someone has foaf:made, or the very general rdfs:seeAlso relation, which takes us to other graphs about the same thing. But when we want (and we should always want!) to write stuff as well as read it, our generic data editors don't currently have a way of agreeing on where to store the new information.

In a linked data world, triples are stored at the document of the subject, or of the object, or both. In this world, the critical things are then

(Many of the generic apps I have written and seen so far tend to store all the data just in the same place as the data they already found, or in a single file pointed to from a configuration file which is set up by hand, specifically by the application code.)

The RDF form language in the UI ontology defines not just the UI elements in a vacuum, but defines them in relation to the graph of linked data. For every UI widget displayed, there is a link to the part of the data shape it corresponds to. In addition, it can also specify where it should store its data by linking to a footprint. In a linked data world, that means having information with which to construct the URIs of new objects, and also as to whether the link is traditionally stored with the home document of the subject, of the object, or both.

Because URIs in the web of linked data are primary, footprints should only be used to create a new link where old data does not exist. Once a linked data link exists, systems must follow that and never use the footprint information to guess that an object exists at the URI given by the footprint function. (They might, however, use shape information about how information is linked, if conformance to a shape is explicitly indicated.)
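
The rule that existing links always win can be sketched as below. The locate() function, the ex:groupIndex predicate, and the groups_footprint() rule are all hypothetical; the only point is the order of the two steps: follow a link if there is one, and consult the footprint only when there is none.

```python
def locate(subject, predicate, triples, footprint):
    """Return (uri, is_new): follow an existing link, else mint a URI from the footprint."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o, False            # an existing link is authoritative
    return footprint(subject), True    # no link yet: the footprint says where to create one

def groups_footprint(book_uri):
    # Hypothetical rule: keep the group index next to the address book document.
    return book_uri.rsplit("/", 1)[0] + "/groups.ttl"

book = "https://example.org/contacts/index.ttl"
existing = [(book, "ex:groupIndex", "https://example.org/other/g.ttl")]

found = locate(book, "ex:groupIndex", existing, groups_footprint)
created = locate(book, "ex:groupIndex", [], groups_footprint)
print(found)
print(created)
```

In the first call the data lives somewhere the footprint would never have guessed, and the link is followed anyway.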

As between shapes and forms, a many-to-many relation exists between shapes and footprints: data conforming to a shape can be stored using different kinds of footprints. Again, this does not make a difference for reading data (for which we only rely on the shape), but only for creating new data (which requires a footprint to tell where things should go).

Examples

Here are some examples (These may not be correct!) imagining footprints for a few data browser things.

Here we imagine that dir() is a function which trims off a path segment and munge() a function which sanitizes a string to remove or encode strange characters.
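
For concreteness, here is one way the two helper functions could behave, sketched in Python. The function is called dir_of() here only to avoid shadowing Python's built-in dir(); replacing strange characters with underscores is just one possible reading of "remove or encode", and the Person/.../index.ttl#this layout is a made-up footprint, not one taken from a real application.

```python
import re

def dir_of(uri):
    """Trim the last path segment off a URI, keeping the trailing slash."""
    return uri[: uri.rstrip("/").rfind("/") + 1]

def munge(text):
    """Replace each run of characters other than letters and digits with an underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")

book = "https://example.org/contacts/index.ttl"

# Hypothetical footprint: each new person gets a folder beside the address book.
new_person_uri = dir_of(book) + "Person/" + munge("Alice O'Neil!") + "/index.ttl#this"
print(new_person_uri)
```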

How can we add that information in? As several different footprints of the same shape can still interoperate at the read level, it makes sense for a footprint then to link to the shape. You might have contact information from different systems using different footprints but the same shape. When you validate data using a footprint, you can validate the URIs too, as well as validating the shape.

Just as folders and documents can indicate conformance with a shape, they could also indicate their preference for a certain footprint. A suggested footprint can be linked to by a form, as the form has to create the nodes as the user generates more data. If no footprint is indicated for the resource to which a new resource is to be appended, then the form's suggested footprint can be used instead.
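
The fallback just described is simple enough to state in code. In this sketch the footprints are represented only by hypothetical names, and the declared preferences by a dictionary keyed by container URI; how such a preference would actually be expressed in RDF is not settled here.

```python
def choose_footprint(container, declared, form_suggestion):
    """Prefer the footprint the container declares; else use the form's suggestion."""
    preferred = declared.get(container)
    return preferred if preferred is not None else form_suggestion

# Hypothetical declarations: container URI -> preferred footprint name.
declared = {"https://example.org/contacts/": "per-person-folders"}

chosen_known = choose_footprint("https://example.org/contacts/", declared, "single-file")
chosen_unknown = choose_footprint("https://example.org/notes/", declared, "single-file")
print(chosen_known, chosen_unknown)
```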

(There is a design choice here as to whether to use some microsyntax like the JS expressions in the example, or to express their structure in RDF.)

Example: contact-footprint.ttl

Discovery

This is the process, for a given user, of finding where they keep their stuff. In this context, discovery is about the system finding shapes and forms, and hence where new stuff should be kept. This is not yet discussed in this article. Places to look:

The last path, the path through a project or group, is an important one for the vision of a read-write web and a collaborative world. Much collaboration will be within projects which are not public, and even if they are public in the sense of being publicly readable, there will be a limited number of people actively involved. W3C working groups and GitHub organizations are examples, as are committees and boards and project teams all over the world.

Within a limited group, things scale better technically and socially. You can have annotation servers which allow people to share their comments without being flooded by input from the general public, and without those comments being wasted on people who are not really interested. The systems which end up maximizing the effectiveness of communication when people take social actions, like liking, objecting, endorsing, and so on, are important. The read-write web gives us the ability to create all kinds of architectures here, and we must pick ones which are simple but deliver great value. We must allow users to construct social and data structures as they need, and support a web which becomes more and more fractal.

Whenever there is a path to find public things related to a user, and a similar path to find private things related to a user, then the same path should be made for groups in which the user is active.