XML syntax is a little tedious, but lots of people are evidently willing and able to edit it by hand. RDF adds another layer of tedium, but there are still a few folks willing to write it by hand. I make heavy use of reification/quoting in my representation of logical formulas in RDF. This adds another layer of tedium that I find unmanageable, and I have been writing XML/SGML/HTML by hand for 10 years.
I have had a lot of success lately using XSLT to screen-scrape RDF out of XHTML pages, and I'm quite happy to use a hypertext editor (e.g. Amaya) to record my knowledge. I make use of the occasional class or rel attribute to distinguish the information that a particular XSLT transformation is looking for from stuff that just happens to be there for other reasons. For example, I can write a typed link:
<a rel="interest" href="http://www.w3.org/XML/">XML</a>
on my home page, and convert it to RDF ala:
<rdf:Description about="">
  <interest>
    <rdf:Description rdf:about="http://www.w3.org/XML/">
      <rdfs:label>XML</rdfs:label>
    </rdf:Description>
  </interest>
</rdf:Description>
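The typed-link conversion above can be sketched outside XSLT as well. The following is a minimal Python stand-in, not the stylesheet I actually use: it walks the page, keeps only a elements that carry a rel attribute, and emits plain tuples instead of RDF/XML. The function name typed_links, the page address http://example.org/me, and the vocabulary namespace http://example.org/vocab# are all assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def typed_links(xhtml_text, page_uri, vocab):
    """Return (subject, predicate, object) tuples for every a element with a rel.

    page_uri and vocab are illustrative parameters, not part of the conventions.
    """
    triples = []
    root = ET.fromstring(xhtml_text)
    for a in root.iter("{%s}a" % XHTML):
        rel, href = a.get("rel"), a.get("href")
        if rel is None or href is None:
            continue  # an ordinary link that no transformation is looking for
        # The page is the subject; the rel name, appended to vocab, is the property.
        triples.append((page_uri, vocab + rel, href))
        label = "".join(a.itertext()).strip()
        if label:
            triples.append((href, RDFS_LABEL, label))
    return triples


# Hypothetical home page with one typed link and one ordinary link.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>home</title></head>
  <body>
    <p>I work on <a rel="interest" href="http://www.w3.org/XML/">XML</a>
    and read <a href="http://example.org/news">the news</a>.</p>
  </body>
</html>"""

triples = typed_links(PAGE, "http://example.org/me", "http://example.org/vocab#")
for t in triples:
    print(t)
```

The untyped link to the news page produces nothing, which is the point of using rel as a marker. The sketch does not distinguish literal objects from URI objects; a real transformation would.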
But I want to go beyond the post-hoc/third-party style of screen-scraping and make it clear that I, the author of the web pages, am making the very RDF assertions that the XSLT transformation generates, when I write my web pages. And I'm starting to think that this technique is sufficiently useful that it will be deployed beyond the single-use transformations I have been doing, to a scale where managing collisions among link relationship names and class names is essential.
NOTE: This section is being reconsidered in light of GRDDL
The HTML 4.0 specification, in section 6.12 Link types, enumerates a few useful link relationships, and then adds:
Authors may wish to define additional link types not described in this specification. If they do so, they should use a profile to cite the conventions used to define the link types. Please see the profile attribute of the HEAD element for more details.
We hereby establish the following conventions used to define some link types:
First, a mechanism somewhat analogous to the binding of element and attribute name prefixes to URIs in Namespaces in XML: a link relationship name whose prefix matches the id attribute of the head element denotes the URI resulting from the concatenation of the profile URI (in absolute form) and the local part of the link relationship name. For example:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    <title>example</title>
    <link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  </head>
  ...
</html>
A relationship name containing no colon (':') character has an empty ("") prefix. The empty prefix should be declared explicitly ala <head id='' profile='...'> rather than by omitting the id attribute.
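The prefix rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the helper names split_name and resolve_rel are invented here, the profile URI is assumed to be absolute already, and a name whose prefix is not the one declared on the head element is simply reported as unresolved (None).

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def split_name(name):
    """Split 'prefix:local' at the first colon; no colon means the empty prefix."""
    prefix, sep, local = name.partition(":")
    return (prefix, local) if sep else ("", name)


def resolve_rel(root, rel_name):
    """Return the URI a link relationship name denotes, or None if unresolved."""
    head = root.find("{%s}head" % XHTML)
    if head is None:
        return None
    declared, profile = head.get("id"), head.get("profile")
    if declared is None or profile is None:
        return None  # an omitted id declares nothing, not even the empty prefix
    prefix, local = split_name(rel_name)
    if prefix != declared:
        return None
    return profile + local  # concatenate profile URI and local part


DOC = ET.fromstring("""<html xmlns="http://www.w3.org/1999/xhtml">
  <head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    <title>example</title>
    <link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  </head>
  <body/>
</html>""")

# Hypothetical page that declares the empty prefix explicitly.
EMPTY = ET.fromstring("""<html xmlns="http://www.w3.org/1999/xhtml">
  <head id='' profile="http://example.org/profile#"><title>t</title></head>
  <body/>
</html>""")

print(resolve_rel(DOC, "rel:classes"))
print(resolve_rel(DOC, "classes"))
print(resolve_rel(EMPTY, "author"))
```

In the first document the unprefixed name classes stays unresolved because only the rel prefix is declared; in the second, the explicit id='' makes unprefixed names resolve against the (made-up) profile URI.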
Second, we define a link relationship called classes that allows class names to denote URIs. A link element that uses this link relationship binds the prefix in its id attribute to the URI denoted by its href attribute. In the following example, the rel attribute refers to this classes link relationship, and the class attribute refers to the Rule class, described below.
<html xmlns="http://www.w3.org/1999/xhtml">
  <head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    <title>example</title>
    <link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  </head>
  <body>
    <dl class="c:Rule"> ... </dl>
  </body>
</html>
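A rough Python sketch of how class names might be resolved under this convention follows. It assumes, by analogy with the link relationship mechanism, that a class name denotes the bound URI concatenated with the local part of the name; the text above does not spell that out. The helper names class_bindings and resolve_class are invented for the illustration, and link elements without an id are skipped rather than treated as binding the empty prefix.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
CLASSES_REL = "http://www.w3.org/2000/07/hs78#classes"


def split_name(name):
    """Split 'prefix:local' at the first colon; no colon means the empty prefix."""
    prefix, sep, local = name.partition(":")
    return (prefix, local) if sep else ("", name)


def class_bindings(root):
    """Map class-name prefixes to URIs, using link elements whose rel is classes."""
    head = root.find("{%s}head" % XHTML)
    if head is None or head.get("id") is None or head.get("profile") is None:
        return {}
    bindings = {}
    for link in head.findall("{%s}link" % XHTML):
        prefix, local = split_name(link.get("rel", ""))
        # The rel name must resolve, via the head's id and profile, to classes.
        if prefix != head.get("id") or head.get("profile") + local != CLASSES_REL:
            continue
        if link.get("id") is not None and link.get("href") is not None:
            bindings[link.get("id")] = link.get("href")
    return bindings


def resolve_class(bindings, class_name):
    """Return the URI a class name denotes (assumed: bound URI + local part)."""
    prefix, local = split_name(class_name)
    base = bindings.get(prefix)
    return None if base is None else base + local


DOC = ET.fromstring("""<html xmlns="http://www.w3.org/1999/xhtml">
  <head id='rel' profile="http://www.w3.org/2000/07/hs78#">
    <title>example</title>
    <link id='c' rel='rel:classes' href='http://www.w3.org/2000/07/hs78#' />
  </head>
  <body>
    <dl class="c:Rule"></dl>
  </body>
</html>""")

bindings = class_bindings(DOC)
print(bindings)
print(resolve_class(bindings, "c:Rule"))
```

With the example document, the prefix c is bound to the hs78 namespace, so c:Rule resolves against it, while an unprefixed Rule stays unresolved because nothing binds the empty prefix.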
@@hmm... I'm using the same URI for three mechanisms here: (a) link relationship namespace mechanism, (b) a namespace for a link relationship, (c) a namespace for three classes. I should probably provide separate URIs for each of those, and define this one as implying all three.
target-namespace (obsolete)
A div element bearing the global class name ClassTree declares a hierarchy of classes, one for each li element in the div element.
Here's an example from lists; note that we refer to the Seq class, but we declare the List class:
<div class="ClassTree">
  <h2>Class hierarchy</h2>
  <ul>
    <li><a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq">Seq</a>
      <ul>
        <li><b id="List">List</b> e.g. <em id="empty">empty</em></li>
      </ul>
    </li>
  </ul>
</div>
Note that you can declare instances ala <em id="empty">empty</em> or <a href="..ref...">thatThing</a>. This markup is translated to the following RDF (see the whole file for details such as namespace declarations):
<s:Class r:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq" s:label="Seq" />
<s:Class r:ID="List" s:label="List">
  <s:subClassOf r:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq" />
</s:Class>
<r:Description r:ID="empty">
  <r:type r:resource="#List" />
</r:Description>
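The ClassTree translation can be sketched as a recursive walk. The Python below is an approximation of the idea, not the stylesheet itself, and it rests on a few assumptions: the first child element of each li names the class, later em or a children of the same li are instances, nested ul lists hold subclasses, and the s: prefix stands for the RDF Schema namespace http://www.w3.org/2000/01/rdf-schema#. The function names are invented for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"  # assumed binding of the s: prefix


def local(el):
    """Element name without any {namespace} part."""
    return el.tag.rsplit("}", 1)[-1]


def ref(el):
    """'#id' for a declaring element, otherwise the href of a linking one."""
    if el.get("id") is not None:
        return "#" + el.get("id")
    return el.get("href")


def class_tree(li, parent, out):
    """Append triples for the class named by li, its instances, and subclasses."""
    children = list(li)
    if not children:
        return
    name = children[0]  # assumed: the first element in the li names the class
    cls = ref(name)
    out.append((cls, RDF + "type", RDFS + "Class"))
    out.append((cls, RDFS + "label", "".join(name.itertext()).strip()))
    if parent is not None:
        out.append((cls, RDFS + "subClassOf", parent))
    for child in children[1:]:
        if local(child) in ("em", "a"):
            out.append((ref(child), RDF + "type", cls))
        elif local(child) == "ul":
            for sub in child:
                if local(sub) == "li":
                    class_tree(sub, cls, out)


def class_trees(root):
    """Collect triples from every div whose class attribute is ClassTree."""
    out = []
    for div in root.iter():
        if local(div) == "div" and div.get("class") == "ClassTree":
            for ul in div:
                if local(ul) == "ul":
                    for li in ul:
                        if local(li) == "li":
                            class_tree(li, None, out)
    return out


SNIPPET = """<div class="ClassTree">
  <h2>Class hierarchy</h2>
  <ul>
    <li><a href="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq">Seq</a>
      <ul>
        <li><b id="List">List</b> e.g. <em id="empty">empty</em></li>
      </ul>
    </li>
  </ul>
</div>"""

triples = class_trees(ET.fromstring(SNIPPET))
for t in triples:
    print(t)
```

For the lists example this yields six tuples that line up with the RDF shown above: two for Seq, three for List (including subClassOf), and one typing empty as a List.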
@@TODO: unions, enumerated sets.
This direct translation of id attributes in HTML to id attributes in RDF relies on an assumption that the RDF will be made available at the same address as the HTML is available; i.e. they are variants of the same generic resource (in the sense of section 14.44 Vary in the HTTP specification; see also: Generic Resources).
@@TODO: model this generic/variant relationship in RDF.
An li element bearing the global class name Property declares a property whose URI and label are taken from the id attribute and content of the first element in the property. The domain and range of the property are taken from the first and second a elements in the li element, respectively, if present. A p element in the li is taken as a comment. For example:
<li class="Property"><b id="first">first</b>: <a href="#List">List</a> -> anything <p>first(l, x) = x is the first item in l</p> </li>
is transformed to:
<r:Property r:ID="first" s:label="first" s:domain="#List"> <s:comment>first(l, x) = x is the first item in l</s:comment> </r:Property>
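Here is a small Python sketch of the Property rule, offered as an illustration rather than as the actual transformation. It assumes the declaring element is the first child of the li, that domain and range come from the a elements that follow it, and that s: denotes the RDF Schema namespace http://www.w3.org/2000/01/rdf-schema#; the function name property_triples is made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"  # assumed binding of the s: prefix


def local(el):
    """Element name without any {namespace} part."""
    return el.tag.rsplit("}", 1)[-1]


def property_triples(li):
    """Return triples for one li element whose class attribute is Property."""
    children = list(li)
    name = children[0]  # first element: supplies the id and the label
    prop = "#" + name.get("id")
    out = [
        (prop, RDF + "type", RDF + "Property"),
        (prop, RDFS + "label", "".join(name.itertext()).strip()),
    ]
    # First and second a elements after the name give domain and range, if present.
    anchors = [c for c in children[1:] if local(c) == "a"]
    for predicate, a in zip((RDFS + "domain", RDFS + "range"), anchors):
        out.append((prop, predicate, a.get("href")))
    # A p element is taken as a comment.
    for c in children[1:]:
        if local(c) == "p":
            out.append((prop, RDFS + "comment", "".join(c.itertext()).strip()))
    return out


SNIPPET = """<li class="Property"><b id="first">first</b>:
  <a href="#List">List</a> -> anything
  <p>first(l, x) = x is the first item in l</p>
</li>"""

triples = property_triples(ET.fromstring(SNIPPET))
for t in triples:
    print(t)
```

Because the example has only one a element, the sketch emits a domain and no range, matching the RDF above where "anything" is left unstated.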
You can link to the property (using <a href="...xyz">xyz</a>) as well as declaring it (using <b id="xyz">xyz</b>).
@@TODO: syntax for "facets", i.e. properties of properties; stuff like inverse, transitive, subproperty, etc.
A dl element bearing the global class name Rule declares an inference rule. It should have just one dt/dd pair: the dt is the conclusion of the rule, and the dd contains a list (ul) of premises (li elements). Each statement (i.e. premise or conclusion) is written as an element for the predicate, an element for the subject, and an element for the object. Each of the predicate, subject, and object elements is either an a element or a var element. var elements represent variables, and a elements refer (by the URI reference in the href attribute) to constants. @@TODO: support for RDF literals, i.e. strings, using tt.
See the lists schema for examples.
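To make the Rule structure concrete, here is a hedged Python sketch that reads one dl into a conclusion and a list of premises. The rule itself is invented for the illustration (an ancestor/parent rule in a made-up http://example.org/family# namespace) and is not taken from the lists schema; the names term, statement, and parse_rule are likewise hypothetical. Each statement is read as exactly three a or var child elements, in predicate, subject, object order.

```python
import xml.etree.ElementTree as ET


def local(el):
    """Element name without any {namespace} part."""
    return el.tag.rsplit("}", 1)[-1]


def term(el):
    """A var element is a variable; an a element is a constant named by its href."""
    if local(el) == "var":
        return ("var", "".join(el.itertext()).strip())
    return ("uri", el.get("href"))


def statement(el):
    """Read (predicate, subject, object) from the a/var children of el."""
    terms = [term(c) for c in el if local(c) in ("a", "var")]
    if len(terms) != 3:
        raise ValueError("a statement needs a predicate, a subject and an object")
    return tuple(terms)


def parse_rule(dl):
    """Return the conclusion (from dt) and the premises (li items in dd/ul)."""
    dt = next(c for c in dl if local(c) == "dt")
    dd = next(c for c in dl if local(c) == "dd")
    premises = []
    for ul in dd:
        if local(ul) == "ul":
            premises.extend(statement(li) for li in ul if local(li) == "li")
    return {"conclusion": statement(dt), "premises": premises}


# Hypothetical rule: ancestor(x, z) if parent(x, y) and ancestor(y, z).
RULE = """<dl class="Rule">
  <dt><a href="http://example.org/family#ancestor">ancestor</a>(<var>x</var>, <var>z</var>)</dt>
  <dd>
    <ul>
      <li><a href="http://example.org/family#parent">parent</a>(<var>x</var>, <var>y</var>)</li>
      <li><a href="http://example.org/family#ancestor">ancestor</a>(<var>y</var>, <var>z</var>)</li>
    </ul>
  </dd>
</dl>"""

rule = parse_rule(ET.fromstring(RULE))
print(rule["conclusion"])
for premise in rule["premises"]:
    print(premise)
```

The punctuation around the elements (parentheses, commas) is free text for human readers and is ignored; only the a and var elements carry meaning in this reading.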
@@TODO: support for n-ary relations.
Currently, the only HTML structure interpreted as a knowledge representation by WebKB is the definition list. Its use is similar to the frame-oriented CG notation with strings as type names
2.1.6.1 HTML structures in The WebKB set of tools by Philippe MARTIN
I developed a style sheet for use with these markup conventions. It only works with the empty prefix (i.e. class="ClassTree", not class="my:ClassTree").
A general convention in this stylesheet is: underlined stuff is significant to the transformation to RDF (there are some exceptions: links in free text don't get transformed to RDF).
aka stuff to revisit when I upgrade this transformation...