It's important to link data on the Semantic Web, but how is this achieved? There are several ways for a client to find new data, and each of them can be used deliberately to create links:
- Dereference the fragmentless root of a URI (the URI with any #fragment removed). Find information about
- Follow well-known generic link properties such as rdfs:seeAlso
- Follow application-specific link properties (like what?)
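As a rough illustration, the three ways above can be combined into one Python function that collects the documents a client might GET next. This is a sketch under assumptions: triples are plain (subject, predicate, object) string tuples with prefixed property names, the fragmentless root is computed with urllib.parse.urldefrag, and APP_LINKS is a hypothetical set of application-specific link properties, since which ones to follow is still an open question here.

```python
from urllib.parse import urldefrag

# Well-known generic link property.
GENERIC_LINKS = {"rdfs:seeAlso"}
# Hypothetical application-specific link properties (an assumption, not a standard list).
APP_LINKS = {"foaf:knows"}


def is_http(term):
    """Crude check: treat terms starting with http(s):// as dereferenceable URIs."""
    return term.startswith(("http://", "https://"))


def documents_to_fetch(triples):
    """Return the set of document URIs a client could GET next."""
    docs = set()
    for s, p, o in triples:
        # Way 1: dereference the fragmentless root of the subject URI.
        if is_http(s):
            docs.add(urldefrag(s).url)
        # Ways 2 and 3: follow generic and application-specific link properties.
        if (p in GENERIC_LINKS or p in APP_LINKS) and is_http(o):
            docs.add(urldefrag(o).url)
    return docs
```

For a subject such as http://example.org/people#alice, the first way yields the document http://example.org/people; the other two only fire for the listed properties.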
So which properties are well known for linking to new RDF data? This is something a large search engine for the Semantic Web could answer descriptively, by observing actual usage: it could publish a list of properties whose objects tend to be HTTP URIs that resolve with 200 OK to application/rdf+xml or one of the other RDF media types.
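A minimal sketch of how such a search engine might score properties, assuming it already holds crawled triples as (subject, predicate, object) string tuples. The probe function, which maps a URI to an (HTTP status, Content-Type) pair, is injected so the scoring logic stays independent of networking; in practice it would wrap an HTTP client and cache its results. The set of RDF media types below is illustrative, not exhaustive.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

# Illustrative set of RDF media types; a real engine might accept more.
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle", "text/n3", "application/n-triples"}

# uri -> (HTTP status code, Content-Type header value)
Probe = Callable[[str], Tuple[int, str]]


def link_property_scores(triples: Iterable[Tuple[str, str, str]],
                         probe: Probe) -> List[Tuple[str, float]]:
    """Rank properties by the fraction of their HTTP objects that resolve to RDF."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for _s, p, o in triples:
        if not o.startswith(("http://", "https://")):
            continue  # only HTTP URIs can be followed
        totals[p] += 1
        status, ctype = probe(o)
        # Drop media type parameters such as "; charset=utf-8" before comparing.
        media_type = ctype.split(";")[0].strip().lower()
        if status == 200 and media_type in RDF_MEDIA_TYPES:
            hits[p] += 1
    scores = {p: hits[p] / totals[p] for p in totals}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Properties near 1.0 are good candidates for the "well known for linking" list; the threshold and minimum sample size are left open.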
What about application-specific link properties? What do FOAF crawlers and Tabulator follow?
Known Properties for Linking
- rdfs:seeAlso - for generic links, a little like @href in HTML.
- skos:exactMatch and other subproperties of skos:mappingRelation - to link concepts across SKOS vocabularies
What about discovering new generic and application-specific link properties? TimBL suggests triples like
foaf:made link:listDocumentProperty foaf:pubs to say that foaf:pubs is a property whose range is a class of RDF documents containing foaf:made statements, so that following foaf:pubs leads to documents in which foaf:made statements can be found. This touches on GraphSchemata.
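A small sketch of how a client might use such declarations, under one reading of TimBL's suggestion: to learn the foaf:made statements about some subject, look up which link properties are declared (via link:listDocumentProperty) to lead to documents listing foaf:made, then follow those properties from the subject. Triples are assumed to be plain string tuples, and the function name documents_listing is hypothetical.

```python
def documents_listing(prop, subject, schema, data):
    """Return URIs of documents expected to contain `prop` statements, reached from `subject`.

    `schema` holds declarations like (prop, "link:listDocumentProperty", link_prop);
    `data` holds ordinary triples about `subject`.
    """
    # Which link properties lead to documents that list `prop`?
    link_props = {o for s, p, o in schema
                  if s == prop and p == "link:listDocumentProperty"}
    # Follow those link properties from the subject.
    return {o for s, p, o in data if s == subject and p in link_props}
```

With foaf:made declared this way, a client wanting to know what http://example.org/tim#i made would fetch the documents reached through that subject's foaf:pubs links.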
Although graphs produced by GRDDL may be indistinguishable from any other RDF graph, some of the source formats (XHTML, Atom) include explicitly GETable URIs (e.g. in href attributes). The linked documents may themselves be GRDDL-able. Perhaps there is a use for a subproperty of rdfs:seeAlso that says "the linked document might be RDF, but if you only get text/html you will have to apply GRDDL"?
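A client that followed such a (hypothetical) subproperty would need to decide, per response, whether to parse the body as RDF or try GRDDL. A minimal sketch of that dispatch on the Content-Type header follows; GRDDL_CANDIDATE_TYPES is an assumption based on the source formats named above, and running the GRDDL transformations themselves is not shown.

```python
# Media types that can be parsed directly as RDF (illustrative set).
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle", "text/n3", "application/n-triples"}
# Assumed media types worth handing to a GRDDL processor.
GRDDL_CANDIDATE_TYPES = {"text/html", "application/xhtml+xml", "application/atom+xml"}


def handling_for(content_type):
    """Return "parse", "grddl" or "skip" for a response's Content-Type value."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in RDF_MEDIA_TYPES:
        return "parse"
    if media_type in GRDDL_CANDIDATE_TYPES:
        return "grddl"
    return "skip"
```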
A more radical viewpoint is that all URIs should be dereferenceable, which would mean that every RDF triple becomes a hyperlink and there is no need for specific linking properties.
This approach is advocated by people such as Chris Bizer, Richard Cyganiak and (I guess) also Tim Berners-Lee (see Linked Data), and is implemented by tools such as Tabulator, Disco and the Semantic Web Client Library.
For a Semantic Web crawler, this means it should try to dereference every URI it finds in any RDF document and add all retrieved data to its cache.
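The crawling policy just described can be sketched as a breadth-first loop in Python. Assumptions: fetch is an injected function that returns the triples found at a document URI (an empty list if the document is unreachable or not RDF), terms are plain strings, and any term starting with http:// or https:// is treated as a URI. A real crawler would use an RDF library that distinguishes URIs from literals, and would add politeness and content negotiation. The max_docs limit is a hypothetical safeguard, not part of the approach itself.

```python
from collections import deque
from urllib.parse import urldefrag


def crawl(seed, fetch, max_docs=100):
    """Follow-your-nose crawl: dereference every URI seen, caching each document's triples."""
    cache = {}  # document URI -> triples retrieved from it
    queue = deque([urldefrag(seed).url])
    while queue and len(cache) < max_docs:
        doc = queue.popleft()
        if doc in cache:
            continue  # already fetched (possibly queued more than once)
        triples = fetch(doc)
        cache[doc] = triples
        for triple in triples:
            # Subject, predicate and object are all candidates for dereferencing.
            for term in triple:
                if term.startswith(("http://", "https://")):
                    target = urldefrag(term).url  # fragmentless root
                    if target not in cache:
                        queue.append(target)
    return cache
```

Because predicates are dereferenced too, the crawler also picks up vocabulary documents (such as the FOAF namespace), which is exactly the "every triple is a hyperlink" view.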