Work on Metadata is part of W3C's Technology and Society Domain.
Note: The W3C Metadata Activity was replaced with the W3C Semantic Web Activity when the latter was chartered in February, 2001. The "future" work described herein (last updated in November 2000) is obsoleted by the Semantic Web Activity Statement.
There is now a wealth of information on every subject available on the Net. For many, however, the true excitement of the Web is in the services that can be accessed from home or office. Today's Web gives people access to news, to the weather and to financial services. Via the Web, users can purchase books, computers, clothes, and any number of other items; they can book seats on planes and rooms in hotels. The possible uses of the Web seem endless, but the technology is missing a crucial piece. Missing is a part of the Web which contains information about information: labeling, cataloging and descriptive information structured in such a way that Web pages can be properly searched and processed, in particular by computer. In other words, what is now very much needed on the Web is metadata. W3C's Metadata Activity is concerned with ways to model and encode metadata. A particular priority of W3C is to use the Web to document the meaning of the metadata. Our strong interest in metadata has prompted development of the Resource Description Framework (RDF™) and its relative PICS™ (Platform for Internet Content Selection). PICS is now complete; work on RDF continues.
PICS consists of a suite of specifications which enable people to distribute metadata about the content of digital material in the form of "labels". These contain information about the content in simple, computer-readable form. Information can be given a label, which computers can then process in the background according to settings previously specified by the user, filtering out undesirable material or directing users to sites that may be of special interest to them. While PICS has general applicability to labeling pages for a variety of metadata purposes, the PICS specification was originally designed to allow parents and teachers to screen out materials unsuitable for children using the Internet. Rather than simply censoring the information itself, as various legislative bodies have suggested, PICS gives users the responsibility to control personally, or to delegate control of, what they receive on their browsers.
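The label-based filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the PICS label syntax or protocol: the URLs, the rating categories, the numeric scale, and the policy of blocking unlabeled resources are all assumptions made for the example.

```python
# Hypothetical labels: each URL is rated per category on a made-up
# scale from 0 (none) to 4 (extreme). Real PICS labels use their own syntax.
labels = {
    "http://example.org/news": {"violence": 1, "language": 0},
    "http://example.org/game": {"violence": 3, "language": 2},
}

# Settings specified in advance by the user (for example, a parent):
# the highest level accepted in each category.
user_limits = {"violence": 2, "language": 2}


def is_allowed(url, labels, limits):
    """Return True if every rated category is within the user's limits.

    Unlabeled resources are blocked; that policy is an assumption of
    this sketch, not something PICS itself mandates.
    """
    ratings = labels.get(url)
    if ratings is None:
        return False
    return all(ratings.get(cat, 0) <= limit for cat, limit in limits.items())


allowed = [url for url in labels if is_allowed(url, labels, user_limits)]
print(allowed)
```

The point of the design is that the decision is taken by software acting on the user's own settings, rather than by the publisher or a third party.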
PICS work led to the development of the Resource Description Framework (RDF), which provides a more general treatment of metadata. RDF is a declarative language and provides a standard way of using XML to represent metadata in the form of statements about properties and relationships of items on the Web. Such items, known as resources, can be almost anything, provided they have a Web address. This means that you can associate metadata with a Web page, a graphic, an audio file, a movie clip, and so on.
This is a very simple example of RDF to give a feel of the way it works, and to demonstrate in a basic way the related concepts of RDF schemas and XML namespaces.
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <Description about="http://www.w3.org/Press/99Folio.pdf">
    <dc:title>The W3C Folio 1999</dc:title>
    <dc:creator>W3C Communications Team</dc:creator>
    <dc:date>1999-03-10</dc:date>
    <dc:subject>Web development, World Wide Web Consortium, Interoperability of the Web</dc:subject>
  </Description>
</RDF>
In this example, RDF has been used to express data about the W3C Folio, the Consortium's Prospectus, available on-line on the W3C site. The basic concept is that metadata about this item on the Web is described through a collection of properties called an RDF Description. Notice that RDF uses the familiar arrangement of angle brackets, slashes, tag names, attributes, and other elements of syntax which are part and parcel of XML. RDF is, indeed, written in XML.
Line 1. This line declares that the whole XML chunk is an RDF expression and that it uses the format defined by the RDF Model and Syntax specification.
Line 2. This line indicates where on the Web the RDF "application", or rather its vocabulary and how this should be used, may be found. The location is given as http://purl.org/dc/elements/1.1/, which identifies the Dublin Core, a vocabulary associated with bibliographic information. The xmlns:dc attribute (an XML namespace declaration) associates the prefix "dc" with the Dublin Core Web address. Subsequent RDF statements may use this prefix to indicate that they relate to this particular set of RDF properties and grammar. More about this idea is explained in the later discussion of schemas.
Line 3. In the third line, the Description tag is used to indicate exactly to which Web resource the ensuing metadata will relate. This line says that the metadata descriptions will be about the Web resource http://www.w3.org/Press/99Folio.pdf, which is the W3C Prospectus in on-line form on the Web.
Lines 4, 5, 6, and 7. These lines are the RDF statements themselves; they constitute the metadata. Four properties are used: title, creator, date, and subject. These refer directly to properties defined as part of the Dublin Core RDF vocabulary. When the metadata is processed, software will recognize these property names and deal with the metadata accordingly.
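To make the walkthrough concrete, the following Python sketch reads the example with the standard library's XML parser and lists each statement as a (resource, property, value) triple. It is a minimal illustration that handles only the simple shape used here (a Description with an about attribute and literal-valued child elements), not a general RDF parser; the function name extract_statements is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """\
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <Description about="http://www.w3.org/Press/99Folio.pdf">
    <dc:title>The W3C Folio 1999</dc:title>
    <dc:creator>W3C Communications Team</dc:creator>
    <dc:date>1999-03-10</dc:date>
    <dc:subject>Web development, World Wide Web Consortium, Interoperability of the Web</dc:subject>
  </Description>
</RDF>
"""


def extract_statements(xml_text):
    """Return (resource, property URI, value) triples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    statements = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.get("about")  # the resource being described
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; join the two
            # parts to obtain the full property URI.
            prop_uri = prop.tag[1:].replace("}", "")
            statements.append((resource, prop_uri, prop.text))
    return statements


for statement in extract_statements(rdf_xml):
    print(statement)
```

Each printed triple pairs the Folio's Web address with a Dublin Core property name expanded to its full address, which is how software can recognize the property regardless of the prefix chosen in the document.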
RDF provides a framework in which independent communities can develop vocabularies that suit their specific needs and share vocabularies with other communities. In order to share vocabularies, the meaning of the terms must be spelled out in detail. The descriptions of these vocabulary sets are called RDF Schemas. A schema defines the meaning, characteristics, and relationships of a set of properties, and this may include constraints on potential values and the inheritance of properties from other schemas. The RDF language allows each document containing metadata to clarify which vocabulary is being used by assigning each vocabulary a Web address. The schema specification language is a declarative representation language influenced by ideas from knowledge representation (e.g. semantic nets, frames, predicate logic) as well as database schema specification languages and graph data models.
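As a rough illustration of what a schema contributes, the Python sketch below models a tiny vocabulary as a dictionary, with one property that specializes another and a constraint on allowed values. It does not use the RDF Schema syntax itself; the schema address, the property names, and the field names (sub_property_of, allowed) are hypothetical and chosen only for this example.

```python
# Hypothetical address identifying the vocabulary (illustrative only).
SCHEMA_URI = "http://example.org/schemas/guide#"

# Each property records a human-readable meaning, the property it
# specializes (if any), and an optional set of permitted values.
schema = {
    "rating": {
        "comment": "Any assessment of a resource",
        "sub_property_of": None,
        "allowed": None,
    },
    "starRating": {
        "comment": "An assessment on a scale of one to five stars",
        "sub_property_of": "rating",
        "allowed": {1, 2, 3, 4, 5},
    },
}


def ancestors(prop, schema):
    """Return the properties that `prop` specializes, nearest first."""
    chain = []
    parent = schema[prop]["sub_property_of"]
    while parent is not None:
        chain.append(parent)
        parent = schema[parent]["sub_property_of"]
    return chain


def is_valid(prop, value, schema):
    """Check a value against the property's constraint, if it has one."""
    allowed = schema[prop]["allowed"]
    return allowed is None or value in allowed


print(ancestors("starRating", schema))
print(is_valid("starRating", 5, schema), is_valid("starRating", 7, schema))
```

Software that understands only the more general property could still make some use of a starRating statement by following the inheritance chain.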
One of the best-known schemas is the Dublin Core, invented by the library community (their first meeting was in Dublin, Ohio, USA). Other schemas might deal with quite different domains. For an application on "English Pubs", properties might relate to "location", "food quality", "star rating", "nearby attractions" and so on. But what happens if two applications use the same tag names? This is where XML namespaces become important.
RDF uses the idea of the XML namespace to allow RDF statements to reference a particular RDF vocabulary or "schema". Bear in mind that two applications might adopt the same headings and categories when it comes to organizing material. Perhaps the property address is used to mean a company location in one application, and a company's Web address in another. Potential conflicts are resolved because a tag for a property name can carry a short prefix which signals to which specific application vocabulary that tag "belongs"; the prefix is bound to the Web address of the vocabulary. The "Namespaces in XML" specification describes this mechanism in detail and is useful not only in the context of RDF but for many other XML applications also.
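The address conflict just described can be shown with a short Python sketch. The two vocabulary addresses and the prefixes loc and web are invented for the illustration; the parser is the one in the Python standard library, which expands each prefixed tag name to the full namespace address.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a property named "address".
doc = """\
<listing xmlns:loc="http://example.org/vocab/location#"
         xmlns:web="http://example.org/vocab/web#">
  <loc:address>1 Example Street</loc:address>
  <web:address>http://www.example.org/</web:address>
</listing>
"""

root = ET.fromstring(doc)

# ElementTree reports each tag as "{namespace}local-name", so the two
# "address" properties remain distinct keys in the dictionary.
expanded = {child.tag: child.text for child in root}

for tag, text in expanded.items():
    print(tag, "->", text)
```

Because the expanded names differ, software reading the document never confuses a street address with a Web address, whatever prefixes the author happened to choose.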
There are many practical uses of RDF. Here is a sampling of what is likely to be in the pipeline.
Thesauri and library classification schemes. These are well known examples of hierarchical systems for representing subject taxonomies in terms of the relationships between named concepts. The RDF Schema specification has exactly the features for creating RDF models that represent the logical structure of thesauri and other library classification systems.
Web sitemaps. A sitemap can be seen "internally" as a description of a Web site. The RDF Schema specification provides a mechanism for defining the vocabulary needed for this kind of application. With RDF you can describe how one item is related to another, how one page is "a descendant" of another, and so on.
Description of the contents of Web pages. This is one of the basic functions of the Dublin Core initiative. The Dublin Core is a set of 15 properties associated with bibliographic information. These can be used to describe items on the Web sufficiently well that search engines and other software can work much more efficiently. The Dublin Core Workshop series has been a major influence on the development of RDF.
Describing the formal structure of privacy practice descriptions. How does a site manage personal information? Will it disclose any of this information to others? What will the user get in return? W3C's Platform for Privacy Preferences Project (P3P) is working on a platform that allows users to be informed of a site's practices. Users, or software operating on their behalf, can then negotiate for a different privacy policy and come to an agreement with the site which will be the basis for any subsequent release of information. RDF may be used to describe the formal structure of privacy practice descriptions (see the Privacy Activity Statement for more information).
Descriptions of device capabilities. The Mobile Access Activity has an interest in a framework for describing the display and processing capabilities of mobile Web devices. RDF provides a way to describe the capabilities and preferences associated with users and the hardware and software they are using to access the Web. This will permit Web content to be tailored to the specific needs of the user.
Rating systems. These offer a way of labeling resources so that people (or computers) can filter information. RDF enables programmers to devise rating systems for any number of domains.
Expressing metadata about metadata. Suppose you have constructed a rating system for restaurants using RDF. You can also then use RDF to describe metadata about a given rating - the date it was given, by which organization, and so on. You can say things like: "The Wonderful English Pubs Guide says that The Tollgate at Holt has a five-star rating."
Digital Signatures. As the Web matures and more everyday tasks are performed online, digital signatures will become increasingly important (this is explained in the Digital Signatures Activity Statement). RDF may be used to express information concerning what you are signing, what the significance of the signature is, the dates that the signature is valid, and so on.
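The "metadata about metadata" use mentioned in the list above can be sketched as follows. The Python below is an informal model rather than RDF syntax: a rating is held as a (resource, property, value) triple, and a second record describes that triple. The pub's Web address, the property names, and the date are hypothetical values supplied only for illustration.

```python
# A base statement as (resource, property, value).
# The address and property name are hypothetical.
rating = ("http://example.org/pubs/tollgate-holt", "starRating", 5)

# A second description whose subject is the statement itself:
# who made the rating and when (the date is illustrative only).
rating_metadata = {
    "statement": rating,
    "assertedBy": "The Wonderful English Pubs Guide",
    "dateGiven": "2000-11-01",
}


def describe(meta):
    """Render the statement about a statement as a sentence."""
    resource, prop, value = meta["statement"]
    return f'{meta["assertedBy"]} says that {resource} has {prop} = {value}.'


print(describe(rating_metadata))
```

Keeping the rating and the facts about the rating separate lets a reader decide how much to trust a statement according to who made it.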
W3C has created the RDF specifications as a framework for application-specific vocabularies. We have links with the Dublin Core Workshop series. The Dublin Core is an attempt to define bibliographic categories for Web pages. W3C expects to coordinate work on specific vocabularies only when they cross several application domains.
The RDF Model and Syntax Working Group defined the RDF data model and selected the RDF/XML syntax. The RDF Schema Working Group developed a vocabulary to specify the sets of vocabularies specific to each application. The RDF Model and Syntax Working Group has completed its deliverable and has adjourned. The RDF Schema Working Group has adjourned pending implementation feedback on its Candidate Recommendation for the RDF Schema description language. An RDF Interest Group has been established to provide a forum for developers who are using the RDF specifications and to provide a locus for establishing critical mass for further work in the metadata area. The Metadata Coordination Group is the forum in which dependencies on and from other activities such as XML and P3P are managed.
The Resource Description Framework Model and Syntax Specification became a W3C Recommendation on 22 February 1999. Meanwhile the Resource Description Framework (RDF) Schema Specification 1.0 was released as a W3C Candidate Recommendation on 27 March 2000.
Member review of the RDF Schema Proposed Recommendation, while supporting the release of RDF Schema as a Recommendation, brought some feedback that a closer connection with, and possibly a merger into, the XML Schema work should be considered. A technical meeting was held in August, 1999 to consider this question. The results of that meeting were published as a W3C Note in October, 1999.
The PICS technical specifications remain stable W3C Recommendations. These specifications include the PICS Label Distribution Label Syntax and Communication Protocols Recommendation, the PICS Rating Services and Rating Systems Recommendation, the PICSRules Recommendation for writing filtering profiles, and the PICS Signed Labels (DSig) Recommendation.
The RDF Interest Group is the primary forum in which W3C Members and the public will discuss specific ideas for future work and gauge when there is sufficient interest to make a formal proposal for a new Working Group. Possible work items include: a rule language for augmenting client-side scripting facilities that acts on RDF metadata, a search (resource discovery) protocol based on RDF, and a metadata query protocol based on RDF. Specific application metadata vocabularies are generally not within the scope of a W3C activity; however, W3C is open to work proposals for applications such as Privacy and Mobile Access which may be characterized as necessary for the infrastructure of the Web.
The contact for the Metadata Activity is Ralph Swick.