Tim Berners-Lee
Date: Original note 2010-07-06, Extended and turned into more of a history 2019-07-05, last change: $Date: 2019/07/05 19:59:55 $
Status: personal view only. Editing status: in progress.

Up to Design Issues


Goals for a Human-Data Interface

There was, in 2010, an unfulfilled need for a powerful, friendly program to interact with data on the web. The document web spread by the simple method of people copying the source of each other's web pages, and getting the instant gratification of being able to see one's work in a web browser, and indeed to pass on the URL to friends and family. However, most of the initial Semantic Web projects (around 2000) used back-end machines with no generic user interface. Researchers, to explain data to each other, tended to use circles-and-arrows diagrams of the graphs. These, though, are hopeless -- very inefficient uses of space -- when it comes to displaying data to a normal user. Normal users expect things like the Mac OS X AddressBook application, a spreadsheet, or the iTunes tune-finding interface. These use fairly straightforward lists of properties, tables, forms, faceted query views, and so on, for the data they manage. The different apps are islands, with no links between them, but they do give a user experience which users seem happy to use.

History of the project

In the fall and at the end of 2007, I hacked together a simple data browser which allowed you to look at the properties of something in a compact outline form. This would become the focus of a lot of effort from MIT undergraduates (UROPs) over the next few years. Each property or relation was a line, but where it was a relation to something else, an "open me" button (the triangle from the Mac icon style) allowed you to see further parts of the graph, displaying the graph as a tree. Users are used to trees. This was also motivated by a person's need to be able to explore arbitrary linked data on the web. When I used it to discuss some data with Philippe Le Hégaret, he complained that he wanted a table. I added a query-by-example function, allowing one to find any graph out there in the web of data, select specific nodes in it, and then press a button to search for all matching sub-graphs, producing a table of the results. The table is another thing users are very familiar with. This software, which I then called Tabulator, as it created tables from the graph, now allowed one to do a "query by example". The user could be in one of two modes. In outline mode, the user explores the graph of linked data. Having found an interesting shape in the data, the user could then query to find all other places in the data which had that shape.

The following January, and in following summers, various undergraduate students at MIT added functionality. Some contributed to the overall project, some worked on specific aspects. Some used it as a base for building systems in their research. (See References.)

Target audience

A target audience for the Tabulator project was the set of engaged and intelligent people who were comfortable using a computer, but who had no specific training in RDF and the semantic web. A spreadsheet user is an interesting example: they spend a lot of effort in some cases getting the computer to do what they want, and work at quite a high level of abstraction. In fact they tend to pour their energy into a spreadsheet creating tables and relations about which the computer itself does not understand the semantics. These users surely would be those in a position to be able to capitalize on these talents to use a powerful interface to explore and create arbitrary RDF data.

A major goal is that the software should allow users to be as powerful as possible without having to write their own code.

Design principles

Never show URIs

While RDF buffs may revel in the URIs of nodes, users do not. User experience should not include naked URIs.

Use the ontologies

A goal was to use the ontology as the bridge between a mathematical graph and something which can be shown to a person. The project uses the labels of predicates and classes to construct the controls in the user interface. There was also code to look at the user's language preferences, and in the (rare but wonderful) case where people had added labels in many languages to the ontology, to automatically pick the right one. So the project's localization was all leveraged off the good work of ontology writers.
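As an illustration only, the label-picking idea can be sketched as below. This is not the Tabulator code: the label data, the function name pick_label and the fallback to the local name of the URI are all assumptions made for the sketch.

```python
# Hypothetical label data: (term URI, label text, language tag).
LABELS = [
    ("http://example.org/ns#knows", "knows", "en"),
    ("http://example.org/ns#knows", "connaît", "fr"),
    ("http://example.org/ns#name", "name", "en"),
]


def pick_label(uri, preferred_languages, labels=LABELS):
    """Return the best label for uri given an ordered list of language tags."""
    candidates = {lang: text for (u, text, lang) in labels if u == uri}
    for lang in preferred_languages:
        if lang in candidates:
            return candidates[lang]
    if candidates:
        # None of the preferred languages is available: use any label we have.
        return next(iter(candidates.values()))
    # Assumed last resort: the local name, so a naked URI is never shown.
    return uri.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
```

A user who prefers French and then English would get "connaît" for the first term and "name" for the second, with no localization work in the application itself.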

Use existing user interface language

Users of operating systems like OS X, Linux and Windows, not to mention the web, had come across a lot of user interfaces, and tended to have learned the UI language of those platforms. Use those languages to communicate naturally with those users.

Represent the graph as a tree

Users are typically happy navigating trees such as file system (folder) hierarchies. If you use a tree-oriented UI for what is actually a graph, it works fine in practice. If you expose parts of the graph, a set of arcs at a time, as though they were arcs in a tree, the user will be able to use tree-oriented thinking. They will be able to navigate in a loop, but they won't bother to.
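A minimal sketch of this, assuming a toy triple list and made-up function names, shows why loops in the graph do no harm: only branches the user has explicitly opened are followed, so the rendering always terminates.

```python
# Toy graph as (subject, predicate, object) triples; all names are made up.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "alice"),  # a loop in the graph
    ("alice", "name", '"Alice"'),
]


def children(node, triples=TRIPLES):
    """One outline level: the arcs leaving node, as (predicate, object) rows."""
    return [(p, o) for (s, p, o) in triples if s == node]


def outline(node, expanded_paths, path=(), triples=TRIPLES):
    """Render only the branches the user has opened, as indented lines."""
    lines = []
    for p, o in children(node, triples):
        child_path = path + (p, o)
        lines.append("  " * (len(path) // 2) + f"{p}: {o}")
        if child_path in expanded_paths:
            # Follow the arc only because the user opened this exact branch.
            lines.extend(outline(o, expanded_paths, child_path, triples))
    return lines
```

Expansion is keyed by the path taken rather than by the node, so the same node can appear open in one place and closed in another, as it would in an outliner.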

When possible, use existing common programs as models.

Read-Write Web

Allow a user to edit data which they have the right to

Don't tempt the user to think they can edit something if they in fact can't

A little semantics goes a long way

The project used a little inference to good effect. Specifically it used a little RDF-Schema inference around ranges and domains, subproperties and subclasses, and, from OWL, just sameAs and the Functional Property and Inverse Functional Property features. While much of the semantic web community was arguing about various competing logics, at the practical level just this level of inference was useful and simple. It was built into the quad store the client uses (rdflib.js).

So if in the code you wanted to show every Person, you could ask the store to give you the list, automatically including people who were not explicitly labeled in the data with an explicit type, but which had properties which implied they were a person. If you wanted to know what classes something was in, again you could get a list of all of them, some deduced from their properties. So the convention (largely unwritten) was that you don't bother writing out the class of something when it is obvious from the properties. Anything with a transaction date is a transaction.
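The kind of inference described here can be sketched in a few lines. This is an illustration under assumptions, not the rdflib.js API: the schema tables DOMAIN and SUBCLASS_OF, the data and the function names are all hypothetical.

```python
# Hypothetical schema: each property's rdfs:domain, and rdfs:subClassOf links.
DOMAIN = {"transactionDate": "Transaction", "givenName": "Person"}
SUBCLASS_OF = {"Person": "Agent"}

# Data with no explicit rdf:type statements at all.
DATA = [
    ("tx1", "transactionDate", "2019-07-05"),
    ("p1", "givenName", "Ada"),
]


def classes_of(node, data=DATA):
    """All classes of node implied by the domains of its properties."""
    found = set()
    for s, p, _ in data:
        if s == node and p in DOMAIN:
            cls = DOMAIN[p]
            # Walk up the subclass chain so superclasses are included too.
            while cls is not None and cls not in found:
                found.add(cls)
                cls = SUBCLASS_OF.get(cls)
    return found


def members_of(cls, data=DATA):
    """Every subject that can be inferred to belong to cls."""
    return sorted({s for s, _, _ in data if cls in classes_of(s, data)})
```

Asking for the members of "Transaction" finds tx1 even though nothing in the data states its type.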

Smushing

When browsing the net of real data from real people, a constant challenge has always been understanding when two RDF nodes refer to the same thing. When countless lists and address books would have the same people in them, representing all the data about them to the user as just one person was crucial.

The Friend of a Friend (FOAF) project did this in an interesting way. You could, in your profile, list your friends by email address only, with no name. You could even use a hash of the email address. Then someone else, looking at your profile, if they had the same person in their contacts, would see the right name and contact info in your list of friends. The nodes from your profile and the node in their contacts would be smushed together. Similarly, any two people in your dataset with the same Social Security Number would be smushed together. The quadstore would pick a URI as the canonical one, and redirect any queries on the other URIs to that one.
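One way to picture smushing is as a union-find over nodes which share a value for an inverse functional property. The sketch below is an assumption about how this could be done, not a description of the quadstore's internals; the property name "mbox", the rule for choosing the canonical name and the function names are illustrative.

```python
# Properties assumed to be owl:InverseFunctionalProperty in this toy example.
INVERSE_FUNCTIONAL = {"mbox"}


def smush(triples):
    """Merge subjects sharing an inverse-functional value.

    Returns (canonical name for each node, triples rewritten to canonical names).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Arbitrary but deterministic choice of the canonical name.
            keep, drop = sorted((ra, rb))
            parent[drop] = keep

    seen = {}  # (predicate, value) -> first subject seen with that value
    for s, p, o in triples:
        find(s)
        if p in INVERSE_FUNCTIONAL:
            if (p, o) in seen:
                union(s, seen[(p, o)])
            else:
                seen[(p, o)] = s

    canonical = {node: find(node) for node in parent}
    merged = sorted({(canonical[s], p, canonical.get(o, o)) for s, p, o in triples})
    return canonical, merged
```

Given one node from a profile with only a mailbox, and another from a contacts list with the same mailbox and a name, both collapse to one node which carries the name.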

The tabulator worked so that the quadstore was retained throughout a user's session. So as one's wanderings through the web revealed more data, one could go back to an earlier object and find it now augmented with all kinds of new data. This worked because of the mantra that on the web, anything can say anything about anything -- and in practice because smushing worked, and was built into the quad store. The conclusion was that this low level of inference was a sweet spot, allowing data to be managed easily without a lot of explicit developer coding at the application level.

Specs of the User Interface

Spec of the mouse-based navigation

Models to refer to include outliners, and outline-mode file system browsers.

  1. Clicking on the "unexpanded" icon opens up a subtree of properties where one exists, as an outline, replacing the icon with the "expanded" icon
  2. Clicking on the "expanded" icon closes up a subtree of properties, replacing the icon with the "unexpanded" icon

Spec of the keyboard navigation

Real users will expect to be able to use arrows to navigate their data just as they do in a spreadsheet, or a file manager for that matter.

  1. Up and down arrows move up and down a list of property values
  2. Return/Enter opens up a subtree of properties where one exists, as an outline. If there is no subtree, but a data value, then, Enter opens up the data value field for editing.
  3. Left arrow allows one to retreat to a higher level of the outline.
  4. Right arrow allows one to advance to a lower level of the outline where it has already been expanded. Or optionally, immediately expand the outline.
  5. Escape allows one to close up a subtree of the outline, opposite of Enter.
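The key bindings above can be modelled as a small state machine over outline rows. This is a sketch under assumptions: the Row class, the press function and the action names are invented for illustration, and the optional "expand on Right arrow" behaviour is left out.

```python
class Row:
    """One line of the outline; children are the rows of its subtree."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.expanded = False
        self.children = list(children)
        for child in self.children:
            child.parent = self


def press(cursor, key):
    """Return (new cursor row, action) for one key press."""
    siblings = cursor.parent.children if cursor.parent else [cursor]
    i = siblings.index(cursor)
    if key == "Up":
        return siblings[max(i - 1, 0)], "move"
    if key == "Down":
        return siblings[min(i + 1, len(siblings) - 1)], "move"
    if key == "Enter":
        if cursor.children:
            cursor.expanded = True
            return cursor, "expand"
        return cursor, "edit"  # a plain data value: open it for editing
    if key == "Left" and cursor.parent:
        return cursor.parent, "move"
    if key == "Right" and cursor.expanded and cursor.children:
        return cursor.children[0], "move"
    if key == "Escape" and cursor.expanded:
        cursor.expanded = False
        return cursor, "collapse"
    return cursor, "none"
```

Keeping the key handling separate from rendering like this makes it easy to check the bindings against the list without a browser.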

Keyboard input of new data

There is a blue plus sign (or two) at the bottom of a property list, to allow the user to add more data.

If you click on the blue plus sign at the bottom of the Object column, the right hand column, then you get a form in which to add the data value of the RDF thing using the same predicate as the line above. The user experience takes into account the amount we know from the ontology about the range of the property.

If you click on the blue plus sign at the bottom of the predicate column, then you are adding a new triple with the same subject, but a new predicate and object. You get a place to type the predicate, which will auto-complete using the labels of all the predicates the system has learned about to date. Basic RDF Schema logic can be used to make sure that the properties proposed are appropriate -- that you can specify the VCARD name of a person, and the Dublin Core Title of an essay, but not the other way around. The user interface at this point gives a pop-up list of suitable properties whose labels match the characters the user is typing. To distinguish two with the same label, the vocabulary nickname ("foaf", etc.) would be given too.
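The predicate completion can be sketched as a filter over known predicates. The catalogue below, the use of prefix matching and the function name suggest are assumptions for illustration; the real system drew its labels and domains from the ontologies it had loaded.

```python
# Hypothetical predicate catalogue: (vocabulary nickname, label, rdfs:domain).
PREDICATES = [
    ("vcard", "name", "Person"),
    ("foaf", "name", "Person"),
    ("dc", "title", "Document"),
    ("foaf", "nick", "Person"),
]


def suggest(typed, subject_classes, predicates=PREDICATES):
    """Labels starting with the typed text whose domain fits the subject."""
    typed = typed.lower()
    hits = [
        (nick, label)
        for nick, label, domain in predicates
        if label.lower().startswith(typed) and domain in subject_classes
    ]
    labels = [label for _, label in hits]
    # Show the vocabulary nickname only when a label is ambiguous.
    return [
        f"{label} ({nick})" if labels.count(label) > 1 else label
        for nick, label in hits
    ]
```

Typing "t" while editing a person offers nothing, since in this toy schema a title applies only to documents.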

Drag and drop

  1. Any cell representing a named node is draggable as a link with that URI
  2. The space in a property table for a subject S predicate P is a drop target for a new object O, creating the new triple S P O.
  3. (Many other parts of the interface are drop targets)

The Tabulator project developers put a lot of effort into the goal of making a good user experience for both reading and writing data. They showed that this can be done better with a good ontology, and they used that technique to good effect.

Query by example

This is a classic form of query building in user interface design. You let the user find an example of something, and say "give me more like that". Here the user is exploring a graph masquerading as a tree. They open up various different parts of the tree to different levels until they find what they want. For example, they can select the name and chair of a group, and the name of the chair, and then for each document edited by the group they can look at its review status and its deadlines. With those fields selected, they press the "find all" button and the interface transforms to a table.
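Behind the "find all" button, the selected fields amount to a set of triple patterns with variables, and the table is the set of bindings that satisfy all of them. The following is a naive sketch of that matching, assuming terms are plain strings and variables start with "?"; it is not the Tabulator query engine.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant does not match
        else:
            # All three positions matched: go on to the remaining patterns.
            yield from match(rest, triples, new)


def table(patterns, triples, columns):
    """Run the query and return one row per match, restricted to columns."""
    return sorted({tuple(b[c] for c in columns) for b in match(patterns, triples)})
```

Selecting a group's name, its chair and the chair's name in one example gives three patterns, and the resulting table has one row for every group in the store with that shape.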

If the table columns included dates or times, then they could be viewed as a timeline; if they included latitude and longitude then they could be viewed on a map. (If one were doing it in 2019, one would include D3 and other visualization systems.) (Back in the day these were Yahoo maps rather than Google Maps or OpenStreetMap.) So one could explore the geographical distribution of things, pick an interesting example, and then click on it to go back into graph-exploring mode. So the two modes of enquiry, query and graph traversal, would work alternately.

While the goal was to allow users to do a query without writing any code, in fact there was a window in which one could see the query in SPARQL form, and edit it and re-run it. But there was no way of saving the views and queries the user had discovered as first-class objects themselves.

Custom views: Panes

The tabulator system described to date used the same outline-mode view of whatever type of thing the user was looking at.

It was clear, though, that to make a functional world of linked data user experience, commonly used things needed views and controls which had been developer-written to be as efficient as possible. The specific view is invoked by clicking on its icon, in a tray of icons for those views which thought they would be useful for the type (RDFS Class) of object in question.
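The idea of views which declare what they are useful for can be sketched as a small registry. Everything here is hypothetical: the decorator, the "*" wildcard for the generic outline view and the pane names are assumptions, not the project's actual pane interface.

```python
PANES = []


def pane(icon, applies_to):
    """Register a view function together with the classes it is useful for."""
    def register(render):
        PANES.append({"icon": icon, "classes": set(applies_to), "render": render})
        return render
    return register


@pane(icon="outline", applies_to={"*"})  # generic fallback view for anything
def outline_pane(subject):
    return f"outline of {subject}"


@pane(icon="contact", applies_to={"Person"})
def contact_pane(subject):
    return f"contact card for {subject}"


def icon_tray(subject_classes):
    """Icons of the panes that consider themselves relevant to these classes."""
    return [
        p["icon"]
        for p in PANES
        if "*" in p["classes"] or p["classes"] & set(subject_classes)
    ]
```

With inference supplying the classes of the object, a person would get both icons in the tray while a transaction would get only the generic outline.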

Forms

Forms represented a way for a power user to customize the interface, without having to write any code. This is how they worked.

The social arrangement of the shared form ontology is not ideal, but is no more centralized than things like NPM and GitHub in 2019.

With shared user-written forms, the project explored code-free user-written programs. The odd useful form written with the system survives, as does the form form. Forms were a user-interface-oriented language related to shapes languages like SHACL and ShEx. They were simple, as the number of form fields was limited, but powerful, as they could link to each other in a network of connected forms which will then lead the users to create a growing web of connected data. Forms have also been used as a shortcut to building a hand-written view. Often a lot of the control is just a set of form fields and so it is easier and more future-proof to slap in a form.

The relationship of forms with shapes and footprints is explored elsewhere (2019).

Modes: Online vs Extension vs Databrowser

During most of the project, the tabulator code was available in "online mode" or as an extension to Firefox (and for a while, Chrome). "Online mode" was as a web app, where it functioned as a browser within a browser. It had to have its own URI bar at the top, under the browser's URI bar, as a web app did not have enough control of the URI bar. The constant issues with browser distrust of web code, the Same Origin Policy, Mixed Content restrictions, and later CORS, made the online mode a constant battle, and often left users unable to load all the data they needed to.

As a Firefox extension, the system was more powerful. It could combine smoothly with the existing document-oriented features of Firefox to provide data-oriented features. The browser's own URL bar could be used, whether it was showing a web page, an anchor within an HTML page, or a data document, or an object referred to by the HTML document. With time, the Firefox conventions and APIs for creating extensions have changed.

The Tabulator was used with a variety of servers, such as ldphp. The LDP standard and later the solid project saw common specs and more compatible servers. As solid servers became available, the original Tabulator framework was repurposed as a general user interface for solid users. In this case, it is done using a trick, called the data browser. If a user with a normal document browser asks for something which is a data object, the server returns in the first instance a small HTML file which loads a large JavaScript file, which adds all the data functionality, and then re-requests the same thing, this time, through content negotiation, getting the data itself. Code loaded that way can more or less (with 2019 browsers) keep the browser URL bar doing the right thing and track history, although CORS problems can be an ever-changing nightmare. Maybe in the future the tabulator code project and its derivatives will be installed as native apps on laptops (possibly using frameworks like Electron) and mobile. Maybe the functionality will be adopted by web browsers, giving them innate ability to read and write data, as well as to read documents, as they currently do.
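The server side of the data browser trick can be pictured as a decision on the Accept header. The sketch below is a simplification under assumptions: the script path is hypothetical, Turtle stands in for whatever data format is stored, and q-values are ignored, which a real server would not do.

```python
DATABROWSER_HTML = (
    "<!DOCTYPE html><html><head>"
    '<script src="/databrowser.js"></script>'  # hypothetical script path
    "</head><body></body></html>"
)


def respond(accept_header, turtle_body):
    """Return (content type, body) for a request for a data resource."""
    offered = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in offered:  # simplification: header order, q-values ignored
        if media_type == "text/turtle":
            return "text/turtle", turtle_body
        if media_type == "text/html":
            # A document browser: send the small page that loads the data code.
            return "text/html", DATABROWSER_HTML
    return "text/turtle", turtle_body
```

The first request from an ordinary browser asks for HTML and gets the loader page; the script then asks again for the same URL with a data media type and receives the data itself.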

Conclusions

The Tabulator project and its offshoots over the years explored the idea of users being powerfully equipped in a world of linked data. Making the URI of the object the basis of everything, but hiding URIs wherever possible from users, were valuable lessons. The spectrum, from hand-coded interfaces, through user-created forms to generic adaptive default interfaces was an important optimization. There were quite a lot of custom panes created, from contact information, to financial transactions and trips, to visualizations of rules and user-targeted explanations of decisions made by intelligent systems. A spectrum which the project did not explore was the data spectrum between private, shared, and public data. The server used did not have a consistent user-accessible access control system. The solid project (in 2019) now does, and systems built using this sort of technology become more exciting when the user can control who has access to the new data as it is created. The current (2019-06) implementations of this system run in web apps, which are still limited by browsers, so porting this to mobile native and laptop-native platforms would be interesting.


References

Students who worked on the Tabulator at some point include: Yuhsin Chen, Lydia Chilton, Ruth Dhanaraj, Jim Hollenbach, Adam Lerer, Ilaria Liccardi, Kanghao Lu, J. Presbrey, Oshani Seneviratne, and David Sheets. Dan Connolly, m c schraefel and Ralph Swick were also involved.

Tim Berners-Lee, Yuhsin Chen, Lydia Chilton, Dan Connolly, Ruth Dhanaraj, James Hollenbach, Adam Lerer, and David Sheets, MIT DIG, "Tabulator: Exploring and Analyzing linked data on the Semantic Web", Proceedings of the 3rd International Semantic Web User Interaction Workshop. Vol. 2006, 2006.

Tim Berners-Lee, J. Hollenbach, Kanghao Lu, J. Presbrey, m c schraefel, "Tabulator Redux: Browsing and Writing Linked Data", 2007.

Tim Berners-Lee, Richard Cyganiak, Michael Hausenblas, Joe Presbrey, Oshani Seneviratne and Oana-Elena Ureche, "Realising A Read-Write Web of Data".

Ching-man Au Yeung, Ilaria Liccardi, Kanghao Lu, Oshani Seneviratne, Tim Berners-Lee, "Decentralization: The Future of Online Social Networking", W3C Workshop on the Future of Social Networking Position Papers. Vol. 2. 2009.



Tim BL