Literals as Subjects

From Semantic Web Standards

This page summarises and synthesises the long and extensive discussion on the Semantic Web public mailing list about allowing literals in the subject position of an RDF triple. RDF users requested this feature early on, but the restriction now has its defenders, and the debate is still open on whether the next version of RDF should allow literals as subjects. The following tries to present the arguments on both sides objectively.

Links: (probably) the first mail in the thread in June; thread #1 and thread #2 in July. Most of the mails were also sent to the LOD mailing list, but not all mails were on both lists, so here are the LOD links for July: thread #1 and thread #2 from there...

Why do this work now?

  • It would be possible to express some "truth" about literals, e.g., { "Sub" ex:subStringOf "Subject" } or { "3" ex:divides "9" } or { "Literal" rdf:type ex:EnglishWrittenWord } or { 7 ex:monthName "July" }
    • Counter argument: This can already be done using existing workarounds, e.g., { _:Sub owl:sameAs "Sub" . _:Sub ex:subStringOf "Subject" }
  • It would be possible to write down statements that are true anyway in the RDFS or OWL semantics, e.g., { "abc" rdf:type rdfs:Literal } or { "abc" owl:sameAs "abc" };
  • It would make "generalised RDF" a standard. Note that the upper levels of the Semantic Web layer cake (SPARQL, OWL 2, RIF) are all using "generalised RDF" rather than RDF in their specifications;
  • It would simplify the inference rules of RDF/RDFS;
  • It would allow reasoners that produce triples with literals as subjects to be able to publish their inferences instead of having to filter them out;
  • The changes to the specification would be trivial and the spec would even be simplified.
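The difference between standard RDF and "generalised RDF" discussed above can be illustrated with a minimal sketch. The term classes, the two checking functions, and the http://example.org/ namespace standing in for the ex: prefix are all assumptions made for this illustration; they are not taken from any specification or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


ANY_TERM = (IRI, BNode, Literal)


def is_standard_rdf(triple):
    """Standard RDF: subject is an IRI or bnode, predicate is an IRI."""
    s, p, o = triple
    return (
        isinstance(s, (IRI, BNode))
        and isinstance(p, IRI)
        and isinstance(o, ANY_TERM)
    )


def is_generalised_rdf(triple):
    """Generalised RDF: any kind of term may appear in any position."""
    return len(triple) == 3 and all(isinstance(t, ANY_TERM) for t in triple)


# Hypothetical namespace standing in for the ex: prefix used on this page.
EX = "http://example.org/"

# { "Sub" ex:subStringOf "Subject" }
literal_subject = (Literal("Sub"), IRI(EX + "subStringOf"), Literal("Subject"))

# { ex:city rdf:type "London" } has a literal only in object position.
literal_object = (
    IRI(EX + "city"),
    IRI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    Literal("London"),
)

print(is_standard_rdf(literal_subject), is_generalised_rdf(literal_subject))
print(is_standard_rdf(literal_object), is_generalised_rdf(literal_object))
```

Under these assumptions the only difference between the two checks is the constraint on subject and predicate; the object position is unrestricted in both.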

Why not do this work?

  • Standard-compliant implementations would have to be updated, which leads to the following issues:
    • the sheer cost of re-implementing things (Jeremy Carroll and others have suggested that this would outweigh the benefits);
    • the restriction on subjects can be used to optimise algorithms, especially when it comes to indexing and querying; lifting the restriction may therefore lead to reduced efficiency;
  • The syntax of RDF/XML would have to be changed (see Proposals below);
  • It has been suggested that allowing literals as subjects would encourage people to misuse literals, e.g., using "London" instead of http://...London
    • Counter argument: the misuse of literals is an issue independent of the position in the triple, e.g., { ex:city rdf:type "London" } is allowed in RDF.

Several people have raised more philosophical issues about the intuitive meaning of putting a literal in subject position, and about the relevance of such a practice as a representation of knowledge about the world. However, it is unclear whether such arguments can be held against the development of a technical device.

The RDF Core Work Items page also mentions that:

  • The inference rules would change
    • Counter argument: This is in fact an advantage (they would be simplified), and it would thereby favour optimisations in reasoners;
  • This would make RDF and OWL diverge even further
    • Counter argument: This is simply not true and quite the opposite in fact. OWL DL would have exactly the same restrictions as before.
  • Changes to the OWL documents might be "required"
    • Counter argument: These changes would be quite simple and would only be for the best of the specifications.

Proposals

  • Leave this alone. It's too late, too expensive, or too disruptive to do anything about it now. (cf Jeremy Carroll, Ian Davis)
  • Promote the use of owl:sameAs for this: { _:bnode owl:sameAs "xyz" } then use _:bnode whenever there is a need to place a literal in subject position; ...
  • Update the model to just allow it. Let syntaxes be modified as needed.
  • Provide a new predicate, like sameAs, but just for this purpose.
  • Standardise generalised RDF(S), which even allows bnodes and literals in predicate position, and define a subset (or profile), RDF-@, that forbids literals in subject and predicate position as well as bnodes in predicate position; ...AZ: RDF-@ would be used to optimise data indexing and querying (more or less as OWL 2 QL defines a subset of OWL 2 DL for efficient query answering)...
  • Use data: URIs (Graham Klyne, 9 July 2010). Perhaps allow this only for transitional use.
  • Standardize N3 (or N4) and make RIF compatible as 'generalized RDF', define RDF/XML etc as a subset of
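Two of the proposals above, the owl:sameAs workaround and the use of data: URIs, amount to mechanical rewrites, which the following sketch illustrates. Everything in it is an assumption made for illustration: triples are plain tuples of strings, the Literal marker class, the function names, and the "_:lit" bnode labels are hypothetical, and the data: URI form shown is only one possible encoding, not necessarily the one proposed in the thread.

```python
from itertools import count
from urllib.parse import quote

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class Literal(str):
    """Marks a string as an RDF literal rather than an IRI or bnode label."""


def lift_literal_subjects(triples):
    """Replace each literal subject with a bnode tied to it by owl:sameAs.

    Assumes the input contains no bnode already labelled "_:litN".
    """
    fresh = count()
    bnode_for = {}
    out = []
    for s, p, o in triples:
        if isinstance(s, Literal):
            if s not in bnode_for:
                bnode_for[s] = f"_:lit{next(fresh)}"
                # { _:bnode owl:sameAs "xyz" }
                out.append((bnode_for[s], OWL_SAME_AS, s))
            s = bnode_for[s]
        out.append((s, p, o))
    return out


def literal_to_data_uri(lit):
    """Encode a plain literal as a data: URI with percent-encoded UTF-8."""
    return "data:text/plain;charset=utf-8," + quote(lit, safe="")


# Hypothetical namespace standing in for the ex: prefix used on this page.
EX = "http://example.org/"

generalised = [(Literal("Sub"), EX + "subStringOf", Literal("Subject"))]
for triple in lift_literal_subjects(generalised):
    print(triple)
print(literal_to_data_uri(Literal("Sub ject")))
```

Both rewrites keep the output within standard RDF. The price is an extra triple per distinct literal in the first case, and an opaque IRI in place of a readable literal in the second.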

Likely technical issues

  • It has been suggested that determining the correct set of inference rules may be problematic, but the set of inferences would in fact be easier to define.
  • Optimising triple stores with generalised RDF: ...AZ: however, allowing general text at any position is no different from databases where any datatype can appear in any column. Moreover, there are still a lot of queries that may have a join on objects, e.g., SELECT ?comm WHERE { ?a rdfs:comment ?comm . ?b rdfs:comment ?comm . }. So the gain in efficiency is not due to the limitation but to a heuristic on the form of the queries....
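The point about joins on objects can be made concrete with a small sketch. It evaluates the query quoted above over an in-memory list of triples, using an index keyed on (predicate, object). The index layout, the function names, and the sample data are assumptions for illustration and do not describe how any particular triple store works.

```python
from collections import defaultdict

RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def build_po_index(triples):
    """Map each (predicate, object) pair to the set of subjects that have it."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, o)].add(s)
    return index


def shared_comments(triples):
    """Evaluate: SELECT ?comm WHERE { ?a rdfs:comment ?comm . ?b rdfs:comment ?comm . }

    Returns one ?comm binding per (?a, ?b) pair, including pairs where ?a = ?b.
    """
    index = build_po_index(triples)
    results = []
    for (p, comm), subjects in index.items():
        if p != RDFS_COMMENT:
            continue
        # The join variable ?comm is a literal in object position.
        for _a in subjects:
            for _b in subjects:
                results.append(comm)
    return results


# Hypothetical sample data.
data = [
    ("http://example.org/a", RDFS_COMMENT, "hello"),
    ("http://example.org/b", RDFS_COMMENT, "hello"),
    ("http://example.org/c", RDFS_COMMENT, "bye"),
]
print(sorted(shared_comments(data)))
```

In this sketch the join key is already a literal, so the store has to index literals as lookup keys whether or not they are also allowed in subject position.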

People interested in doing the work

?