Abstract: The Quatro Project has developed a scalable system for adding machine-processable quality labels/trustmarks to online content. The validity of the labels is derived from direct communication through web services operated by labelling authorities. The labels typically encode the outcome of human review of online materials.

Introduction

A recent study carried out at Carleton University in Ottawa [1] shows that Web users make up their minds about the quality of a website in as little as a twentieth of a second. First impressions, based solely on aesthetic appearance, remain persistent as the user consumes the actual content.

Users who are aware that it is necessary to distinguish between reliable and false information may use a variety of techniques in their evaluation, but these tend to be simple rules of thumb. For example, students in at least one Norwegian school are taught that a website that displays a copyright notice can be considered more reliable than one that doesn't [2].

Against this background, the Quatro Project took up the challenge laid down initially by the European Parliament to create a system of interoperable quality labels. Now in its second year, the project is able to demonstrate a scalable system based on RDF metadata backed up by third-party data sources and real-time content analysis. The presence of the label and its trustworthiness can be displayed to the user through the browser and/or as annotations on search results.

The Quatro Vocabulary and Labelling Authorities

Interoperability in this context means that a label issued by one trustmark scheme should be recognisable by those familiar with another. This was one reason for creating a set of descriptors that, research showed, were commonly found in a wide variety of trustmark schemes. Another was to make it easier to train content analysers to recognise more trustworthy sites.

Elements of the vocabulary most relevant to the workshop include:

Providing suitable identification information (i.e. is it clear who the creator of the information is)
Transparent domain name
Content provider credentials (this is particularly relevant to medical and other 'expert advice' websites)

The full vocabulary is available on the project website [6], and a persistent URL of http://purl.org/quatro/elements/1.0/ has been reserved that resolves to the vocabulary as an RDF schema.

The Quatro project partners include three labelling authorities (LAs). The Internet Quality Agency (IQUA) and Web Mèdica Acreditada (WMA) operate traditional trustmark schemes. Websites are reviewed according to a set of published criteria and, if successful, are entitled to display a Seal of Approval. If the user clicks on the seal, a window opens displaying data pulled from the LA's database. IQUA is a generalised scheme dealing with issues such as eCommerce, whereas WMA is an expert medical website review system. Both organisations now offer machine-processable labels alongside their traditional logo-based system.

The Internet Content Rating Association (ICRA) has long offered machine-processable labels for the purposes of child protection. During 2005 it moved from a system based on PICS [7] to RDF Content Labels.

Labelling Authorities using the Quatro system are not required to use only the Quatro descriptors. They can, of course, add their own, but where their review criteria match, it is in their interests to use the common vocabulary, at least for machine-processing purposes.

Trusting the Label

The critical aspect of any metadata-based system like this, of course, is adding trust to it. A label may say that the website being visited is legal, decent and honest, but we need the famous "oh yeah?" button. The project partners have developed two complementary tools that offer exactly that. One is a browser extension, known as ViQ, that recognises the presence of labels on a website that is being visited and adds suitable icons to the browser. The other, LADI, is a service that passes requests to a search engine and annotates the returned results with icons adjacent to links to sites on which labels are found.

Both tools connect to a back-end system known as the Quatro Proxy or QUAPRO that carries out the actual validation process.

Screenshot 1: The ViQ Browser Extension (Visualise Quality)

Screenshot 1 shows that the site being visited has a label with assertions from three LAs. The different icons in the bottom right-hand side of the browser exemplify the three possible levels of trust reported by QUAPRO: valid, invalid and don't know.

LADI can be reached through the ViQ browser extension, through a search box within the browser itself, or from any website with an appropriate input form. In each case it provides annotated search results in a similar fashion (screenshot 2).

Screenshot 2: Annotated search results from LADI. Only the second result returned has a label

In both ViQ and LADI the user can click on the icon to get full details of the label itself, as shown in screenshot 3.

Simple icons are used in both tools to show that labels are present. In the case of ViQ, these icons are modified in real time in response to signals from QUAPRO to give a quick visual indication of the label's validity. LADI does not do this, as it would impose an unacceptable delay on the results being returned. However, in both ViQ and LADI, the user can choose to see full details of the label, including where and when it was issued, and whether or not it is valid, by clicking on the appropriate icon (see screenshot 3).

Screenshot 3: Detail of label as displayed by ViQ

The QUAPRO Architecture

Although tools such as ViQ and LADI are what the end user will see, it is the job of QUAPRO to locate labels and assess their validity. This is achieved through web services operated by the labelling authorities and, where relevant, a content analyser.

Figure 1: The QUAPRO-Architecture

The process step by step:

QUAPRO receives an array of URLs (perhaps an array of size 1) from ViQ, LADI or some other client.

It then visits the URLs one by one and looks for a link to an RDF file. Such a link may be found either as an HTTP Response Header or as an (X)HTML link tag. If an RDF Content Label is found, QUAPRO identifies the LA(s) in the label(s) and reports back to the original client.
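The discovery step can be sketched in a few lines of Python. This is only an illustration, not QUAPRO's actual code: the function and class names are hypothetical, and it assumes that label links are identified by the media type application/rdf+xml, which the text above does not state. The header parsing is deliberately simplified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RDF_TYPE = "application/rdf+xml"  # assumption: media type used to spot label links


class LabelLinkParser(HTMLParser):
    """Collects href values of (X)HTML <link> tags that point at RDF files."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        media_type = (attributes.get("type") or "").lower()
        if media_type == RDF_TYPE and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def links_from_header(link_header):
    """Extract RDF targets from an HTTP Link header value.

    Simplified: splits on commas, so targets containing commas are not handled.
    """
    targets = []
    for part in link_header.split(","):
        segments = [s.strip() for s in part.split(";")]
        target = segments[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue
        params = {}
        for segment in segments[1:]:
            name, _, value = segment.partition("=")
            params[name.strip().lower()] = value.strip().strip('"').lower()
        if params.get("type") == RDF_TYPE:
            targets.append(target[1:-1])
    return targets


def find_label_links(page_url, headers, html):
    """Return absolute URLs of candidate label files for one page."""
    found = links_from_header(headers.get("Link", ""))
    parser = LabelLinkParser()
    parser.feed(html)
    found.extend(parser.hrefs)
    return [urljoin(page_url, href) for href in found]
```

A client would call find_label_links once per URL in the incoming array, passing the response headers and body it has already fetched, and then retrieve each returned RDF file to identify the LA(s) named in it.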

When requested to do so, QUAPRO then contacts each LA's database via a Data Access Interface which implements SOAP messaging. This request is sent automatically by ViQ but only in response to a user action in the case of LADI. Different LAs will operate their labelling schemes in different ways, and this needs to be taken into account when assessing validity. There are three possible scenarios:

A. The RDF-CL file is stored on the labelled site AND the LA allows the content label to be edited by the site owner,
B. The RDF-CL file is stored on the labelled site BUT the LA does not allow the content label to be edited by the site owner,
C. The RDF-CL file is stored on the LA database, which means that the content label cannot be edited by the site owner.

Details about whether labels may be edited, and whether they should be found on the labelled site or in the LA's database, are encoded in the relevant labelling schema.

In cases A and C above, QUAPRO decides on a label's validity based on whether the time of the request is within the valid period defined by the LA. In case B, QUAPRO makes an additional check by taking a hash of the label. This is sent to the LA, where it can be compared with the hash stored in the central database.
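The date and hash checks might look roughly like the sketch below. It rests on several assumptions: the function names are hypothetical, SHA-256 stands in for a hash function the text does not name, and the comparison is done locally against a digest supplied by the caller, whereas in the system described above the hash is sent to the LA for comparison.

```python
import hashlib
from datetime import datetime, timezone


def within_valid_period(valid_from, valid_until, now=None):
    """True if the request time falls inside the period defined by the LA."""
    now = now or datetime.now(timezone.utc)
    return valid_from <= now <= valid_until


def label_digest(label_bytes):
    """Digest of the label file as found on the labelled site.

    SHA-256 is an assumption; the hash function is not specified in the text.
    """
    return hashlib.sha256(label_bytes).hexdigest()


def check_label(case, valid_from, valid_until,
                label_bytes=None, la_digest=None, now=None):
    """Run the checks that apply to scenario 'A', 'B' or 'C'.

    Returns a dict with 'date_ok' (always set) and 'hash_ok', which is
    None when the hash check does not apply (cases A and C).
    """
    result = {
        "date_ok": within_valid_period(valid_from, valid_until, now),
        "hash_ok": None,
    }
    if case == "B":
        # la_digest stands in for the value held in the LA's central database.
        result["hash_ok"] = label_digest(label_bytes) == la_digest
    return result
```

The hash check is skipped in case A because a label the site owner may edit cannot be expected to match a stored digest, and in case C because the label is served from the LA's own database.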

For some labelling authorities, a further check is possible. A schema may specify a content analyser able to generate a label automatically that can be compared with the label found. The FilterX analyser, for example, can analyse content and return an ICRA label describing whether a resource contains pornography, non-pornographic glamour, or neither. This can be compared with the content provider's own label, as well as ICRA's database of labels that have been checked by a team of reviewers.

A synthesis of the results of the date, hash, and content checking (if all of them are available) gives the final validity value of a label that is returned to the client.
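The text does not spell out how the individual results are combined, so the following is one possible rule rather than QUAPRO's actual logic. It assumes each check reports True, False, or None (not available), and maps the outcome onto the three trust levels mentioned earlier: valid, invalid and don't know. The function name is hypothetical.

```python
def synthesise(date_ok=None, hash_ok=None, content_ok=None):
    """Combine the available check results into a single validity value.

    Assumed rule (not taken from the source): any failed check makes the
    label invalid; if every available check passed it is valid; if no
    check could be carried out the answer is "don't know".
    """
    available = [c for c in (date_ok, hash_ok, content_ok) if c is not None]
    if not available:
        return "don't know"
    return "valid" if all(available) else "invalid"
```

A stricter scheme could weight the content analyser differently from the date and hash checks, since an automated classifier is more likely to be wrong than a date comparison.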

QUAPRO clearly plays a key role in the trust model. It is therefore not a fully open platform. Although anyone can set up a labelling scheme using RDF-CL, QUAPRO will only work with recognised schemes, which are added by hand. As long as it is QUAPRO performing the validation, the results cannot easily be falsified. Digital certificates could be added to the communication channels to enhance this aspect.

Summary

The Quatro project offers a highly flexible means by which trustmarks can be made machine processable alongside content labels, conformance certificates and the like. For example, Segala [8] is using the system to encode its certification scheme for web accessibility. The Incubator Activity [5] is feeding directly into the Mobile Web Initiative's development of a mobileOK trustmark [9].

The technology comes down to providing end users with a visual cue that a real person who knows what they're talking about has actually visited the labelled resource and agrees with the description offered.

Add that to related ideas like shared bookmarks and other RDF-based trust models and maybe a twentieth of a second will in fact prove enough to decide whether a website is good, bad or indifferent.