Difference between revisions of "WebSchemas/sameAs"

From W3C Wiki
m (moved WebSchemas/sameThingAs to WebSchemas/sameAs: Shorter name is better)
(revised to use sameAs)
Line 1: Line 1:
{{Template:SchemaDotOrgProposal|name=sameThingAs|status=Proposal}}
+
{{Template:SchemaDotOrgProposal|name=sameAs|status=Proposal}}
  
 
This is a proposal to improve and clarify schema.org's handling of identity issues, in particular for the common case where diverse sites provide information about the same real world entity.
 
This is a proposal to improve and clarify schema.org's handling of identity issues, in particular for the common case where diverse sites provide information about the same real world entity.
  
It adds a property to schema.org, 'sameThingAs' that can be used to indicate when a single real-world entity is being described.
+
It adds a property to schema.org, 'sameAs' that can be used to indicate when a single real-world entity is being described.  
 +
 
 +
This property is inspired by owl:sameAs and has essentially the same semantics, although schema.org HTML data tends to blur distinctions which are important to OWL users, such as URIs for entities versus the pages that describe them. It is also similar in intent to the microformats community's use of  ''rel='me' ''.
 +
 
 +
An earlier version of this proposal suggested sameThingAs. This version is revised to use sameAs since (a) the meaning is very close to owl:sameAs (b) the markup is expected on millions/billions of pages; why not save 5 chars?
  
 
== Background ==
 
== Background ==
Line 42: Line 46:
 
The central challenge here is to allow simplicity for authors and publishers, while making it possible to reconstitute a useful entity-relationship data graph from markup. It is also important to be able to indicate when two different pages are talking about the same underlying real-world entity.  
 
The central challenge here is to allow simplicity for authors and publishers, while making it possible to reconstitute a useful entity-relationship data graph from markup. It is also important to be able to indicate when two different pages are talking about the same underlying real-world entity.  
  
No single solution will work for all parties. The goal of this proposal is to add a simple construct that works alongside 'url' property. While 'url' points from something to a page/record that's mostly about it and is in some sense 'its' page, sameThingAs can be used more freely wherever we have useful identifiers (direct or via-some-page) for an entity of interest.
+
No single solution will work for all parties. The goal of this proposal is to add a simple construct that works alongside 'url' property. While 'url' points from something to a page/record that's mostly about it and is in some sense 'its' page, sameAs can be used more freely wherever we have useful identifiers (direct or via-some-page) for an entity of interest.
  
 
=== Related work ===
 
=== Related work ===
Line 55: Line 59:
 
For example, in Microdata, the 'itemid' attribute is available; in RDFa Lite, a comparable 'resource' attribute is available.  We confirm explicitly that such identifiers are welcome and encouraged in schema.org markup, although we cannot advise at this stage on exactly which identifiers to use.
 
For example, in Microdata, the 'itemid' attribute is available; in RDFa Lite, a comparable 'resource' attribute is available.  We confirm explicitly that such identifiers are welcome and encouraged in schema.org markup, although we cannot advise at this stage on exactly which identifiers to use.
  
2. '''We add a property to the [http://schema.org/Thing Thing] type, called 'sameThingAs'. '''
+
2. '''We add a property to the [http://schema.org/Thing Thing] type, called 'sameAs'. '''
  
 
(alternative: call it sameAs, since it means owl:sameAs)
 
(alternative: call it sameAs, since it means owl:sameAs)
  
The value of 'sameThingAs' can be another Thing (really, the same thing; there's only one underlying entity). This is used with the kinds of direct entity identifiers we see in (Microdata) 'itemid' and (RDFa Lite) 'resource' attributes. It can also be a document. For example, we might link from a description of Tom Baker to the page on Wikipedia about him.
+
The value of 'sameAs' can be another Thing (really, the same thing; there's only one underlying entity). This is used with the kinds of direct entity identifiers we see in (Microdata) 'itemid' and (RDFa Lite) 'resource' attributes. It can also be a document. For example, we might link from a description of Tom Baker to the page on Wikipedia about him.
  
 
3. We clarify that the schema.org 'url' property isn't directly applicable in this case, since there is no strong association between Tom Baker and the Wikipedia page, beyond the relationship by topic. We keep 'url' for the stronger case where the page is in some sense 'his'; roughly the notion of a 'homepage'.
 
3. We clarify that the schema.org 'url' property isn't directly applicable in this case, since there is no strong association between Tom Baker and the Wikipedia page, beyond the relationship by topic. We keep 'url' for the stronger case where the page is in some sense 'his'; roughly the notion of a 'homepage'.
Line 74: Line 78:
 
* Freebase, http://www.freebase.com/view/en/douglas_adams
 
* Freebase, http://www.freebase.com/view/en/douglas_adams
  
Clearly enough, we have 4 of something (pages), and 1 of something (the person). The schema:sameThingAs relationship holds between any pairs here (or any of these and Douglas Adams himself).
+
Clearly enough, we have 4 of something (pages), and 1 of something (the person). The schema:sameAs relationship holds between any pairs here (or any of these and Douglas Adams himself).
  
 
The existing W3C 'owl:sameAs' property asserts strong, absolute identity.  
 
The existing W3C 'owl:sameAs' property asserts strong, absolute identity.  
Line 80: Line 84:
 
If we said  'http://en.wikipedia.org/wiki/Douglas_Adams owl:sameAs http://www.rottentomatoes.com/celebrity/douglas_adams/' we are saying that what we have here are two identifiers for the same thing.
 
If we said  'http://en.wikipedia.org/wiki/Douglas_Adams owl:sameAs http://www.rottentomatoes.com/celebrity/douglas_adams/' we are saying that what we have here are two identifiers for the same thing.
  
What we want to say with schema:sameThingAs is a little different. We're saying that there is one underlying real world entity, but allowing the relationship type to be used also between documents that indirectly indicate that entity.  
+
What we want to say with schema:sameAs is a little different. We're saying that there is one underlying real world entity, but allowing the relationship type to be used also between documents that indirectly indicate that entity.  
  
 
=== Fictional profile page about someone... ===
 
=== Fictional profile page about someone... ===
  
So, imagine an IMDB-like site with profiles of writers/actors/directors etc. It is quite likely such a site would already have a hyperlink from a profile page for someone, over to another page on -say- Wikipedia or Twitter, for that same person. By using ''sameThingAs'' we can indicate that the remote page has the same 'primary topic'.  
+
So, imagine an IMDB-like site with profiles of writers/actors/directors etc. It is quite likely such a site would already have a hyperlink from a profile page for someone, over to another page on -say- Wikipedia or Twitter, for that same person. By using ''sameAs'' we can indicate that the remote page has the same 'primary topic'.  
  
 
Q: ''Do we agree that this is definitively better than stretching the 'url' property to such scenarios''?
 
Q: ''Do we agree that this is definitively better than stretching the 'url' property to such scenarios''?
Line 94: Line 98:
 
<p itemscope itemtype="http://schema.org/Person">
 
<p itemscope itemtype="http://schema.org/Person">
 
This is a page about <span itemprop="name">Douglas Adams.</a>
 
This is a page about <span itemprop="name">Douglas Adams.</a>
See <a itemprop="sameThingAs" href="http://en.wikipedia.org/wiki/Douglas_Adams">wikipedia entry</a> for more details.
+
See <a itemprop="sameAs" href="http://en.wikipedia.org/wiki/Douglas_Adams">wikipedia entry</a> for more details.
 
</p>
 
</p>
 
</body>
 
</body>
Line 100: Line 104:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
Q: ''What if the value of the sameThingAs property was a URI designed to identify the thing itself directly. Perhaps even a URN or UUID identifier?''
+
Q: ''What if the value of the sameAs property was a URI designed to identify the thing itself directly. Perhaps even a URN or UUID identifier?''
A: This would be the equivalent representation: <link itemprop="sameThingAs" href="http://dbpedia/resource/Douglas_Adams"/>
+
A: This would be the equivalent representation: <link itemprop="sameAs" href="http://dbpedia/resource/Douglas_Adams"/>
  
 
== FAQs ==
 
== FAQs ==
  
* Q: Why didn't you use owl:sameAs A: we would have (rightly) been accused of over-using it.  
+
* Q: <strike>Why didn't you use owl:sameAs A: we would have (rightly) been accused of over-using it. </strike> (earlier version was for a schema:sameThingAs)
 
* Q: Why not just use 'url' from schema.org? A: This was a possibility, but the current design keeps 'url's meaning more restricted.
 
* Q: Why not just use 'url' from schema.org? A: This was a possibility, but the current design keeps 'url's meaning more restricted.
* Q: When we get two URIs related via sameThingAs, how do we know if each link is 'the thing' or 'a page about the thing'? A: This is somewhat heuristic, but point is that existing data will already be mixing these together...
+
* Q: When we get two URIs related via sameAs, how do we know if each link is 'the thing' or 'a page about the thing'? A: This is somewhat heuristic, but point is that existing data will already be mixing these together...
  
 
== Discussion elsewhere ==
 
== Discussion elsewhere ==

Revision as of 00:59, 15 May 2013


This is a WebSchemas proposal, 'sameAs', for schema.org. See the Proposals listing for more. Status: Proposal



This is a proposal to improve and clarify schema.org's handling of identity issues, in particular for the common case where diverse sites provide information about the same real world entity.

It adds a property to schema.org, 'sameAs', that can be used to indicate when a single real-world entity is being described.

This property is inspired by owl:sameAs and has essentially the same semantics, although schema.org HTML data tends to blur distinctions which are important to OWL users, such as URIs for entities versus the pages that describe them. It is also similar in intent to the microformats community's use of rel='me'.

An earlier version of this proposal suggested sameThingAs. This version is revised to use sameAs since (a) the meaning is very close to owl:sameAs (b) the markup is expected on millions/billions of pages; why not save 5 chars?

Background

Schema.org's data model is of linked entities and relationships, with an emphasis on their description using structured data within ordinary HTML Web pages.

Both HTML5 Microdata and RDFa Lite provide attributes ('itemid' and 'resource', respectively) whose values are identifiers for 'the thing itself'. In Microdata terms, 'itemid' gives us a 'global identifier', whose meaning is contextual, and based on the vocabulary being used. For example, a vocabulary defining a type 'Book' might use an itemid like 'urn:isbn:0-330-34032-8'. Similarly, in RDF, the word 'resource' is effectively a synonym for 'thing', and RDFa Lite's 'resource' attribute allows URI identifiers to be given for each thing being described.
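
As a rough illustration of the two attributes, the fragment below describes one book twice: once with Microdata 'itemid' and once with RDFa Lite 'resource'. The ISBN URN is the one mentioned above. The title is a placeholder, and the use of schema.org's Book type is an assumption made for this sketch, not something the proposal prescribes.

```html
<!-- Microdata: itemid names the book itself, not the page describing it -->
<div itemscope itemtype="http://schema.org/Book" itemid="urn:isbn:0-330-34032-8">
  <span itemprop="name">Example title</span>
</div>

<!-- RDFa Lite: the resource attribute plays the comparable role -->
<div vocab="http://schema.org/" typeof="Book" resource="urn:isbn:0-330-34032-8">
  <span property="name">Example title</span>
</div>
```

In both cases the identifier is attached to the element that opens the description, so every property inside it is read as a statement about the book rather than about the page.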

When structured data is deployed within linked HTML pages, property values may also be URLs/URIs. For simplicity and usability, it is common for "identifiers for a page" and "identifiers for the main thing described by a page" to be conflated.

For example, here is what we see currently on the IMDB site, when looking at a page for a particular work (markup fixed for readability):

  <h4>Stars:</h4>
  <a href="/name/nm0010930/" itemprop="actor">Douglas Adams</a>,
  <a href="/name/nm0048982/" itemprop="actor">Tom Baker</a>
  and <a href="/name/nm3035100/" itemprop="actor">Hans Peter Brondmo</a>
  </div>

Here, our markup is talking about a CreativeWork, the documentary Hyperland from 1990. The cast list includes a link (typed 'actor') to a page about the actor Tom Baker. There is also a Wikipedia entry about the same documentary, about the actor Tom Baker, and about the writer and co-star Douglas Adams.

While some linked data sources try to carefully maintain the distinction between 'things' and 'pages that stand for those things', this is not always easy for many of the environments where schema.org markup (whether Microdata or RDFa Lite) is deployed.

One schema.org strategy for dealing with this is the 'url' property. From the getting started guide:

Using the url property. Some web pages are about a specific item. For example, you may have a web page about a single person, which you could mark up using the Person item type. Other pages have a collection of items described on them. For example, your company site could have a page listing employees, with a link to a profile page for each person. For pages like this with a collection of items, you should mark up each item separately (in this case as a series of Persons) and add the url property to the link to the corresponding page for each item, like this:

  <div itemscope itemtype="http://schema.org/Person">
    <a href="alice.html" itemprop="url">Alice Jones</a>
  </div>
  <div itemscope itemtype="http://schema.org/Person">
    <a href="bob.html" itemprop="url">Bob Smith</a>
  </div>

The central challenge here is to allow simplicity for authors and publishers, while making it possible to reconstitute a useful entity-relationship data graph from markup. It is also important to be able to indicate when two different pages are talking about the same underlying real-world entity.

No single solution will work for all parties. The goal of this proposal is to add a simple construct that works alongside the 'url' property. While 'url' points from something to a page/record that's mostly about it and is in some sense 'its' page, sameAs can be used more freely wherever we have useful identifiers (direct or via-some-page) for an entity of interest.

Related work

Proposal details

Schema.org Identity Clarifications

1. In various notations, it is possible to distinguish identifiers for the underlying real-world entity, from the record or page identifiers used for publications about that entity. For example, in Microdata, the 'itemid' attribute is available; in RDFa Lite, a comparable 'resource' attribute is available. We confirm explicitly that such identifiers are welcome and encouraged in schema.org markup, although we cannot advise at this stage on exactly which identifiers to use.

2. We add a property to the Thing type, called 'sameAs'.

(An earlier version of this proposal called the property 'sameThingAs'; 'sameAs' was chosen because the meaning is very close to owl:sameAs.)

The value of 'sameAs' can be another Thing (really, the same thing; there's only one underlying entity). This is used with the kinds of direct entity identifiers we see in (Microdata) 'itemid' and (RDFa Lite) 'resource' attributes. It can also be a document. For example, we might link from a description of Tom Baker to the page on Wikipedia about him.

3. We clarify that the schema.org 'url' property isn't directly applicable in this case, since there is no strong association between Tom Baker and the Wikipedia page, beyond the relationship by topic. We keep 'url' for the stronger case where the page is in some sense 'his'; roughly the notion of a 'homepage'.
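
The distinction drawn in points 2 and 3 might look like this in Microdata. This is only a sketch: the 'official site' address is a hypothetical placeholder standing in for a page that is in some sense Tom Baker's own, while the Wikipedia link is a third-party page that merely has him as its topic.

```html
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Tom Baker</span>
  <!-- 'url': a page that is in some sense 'his' (hypothetical address) -->
  <a itemprop="url" href="http://example.org/tom-baker/">official site</a>
  <!-- 'sameAs': a page elsewhere about the same real-world person -->
  <a itemprop="sameAs" href="http://en.wikipedia.org/wiki/Tom_Baker">Wikipedia entry</a>
</div>
```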

Example

Take the case of the actor, director and writer Douglas Adams.

There are pages about him,

Clearly enough, we have four of one kind of thing (pages) and one of another (the person). The schema:sameAs relationship holds between any pair of these pages (or between any of them and Douglas Adams himself).

The existing W3C 'owl:sameAs' property asserts strong, absolute identity.

If we said 'http://en.wikipedia.org/wiki/Douglas_Adams owl:sameAs http://www.rottentomatoes.com/celebrity/douglas_adams/', we would be saying that these are two identifiers for the same thing.

What we want to say with schema:sameAs is a little different. We're saying that there is one underlying real world entity, but allowing the relationship type to be used also between documents that indirectly indicate that entity.

Fictional profile page about someone...

Imagine an IMDB-like site with profiles of writers, actors, directors and so on. Such a site quite likely already has a hyperlink from a person's profile page to another page about that same person on, say, Wikipedia or Twitter. By using sameAs we can indicate that the remote page has the same 'primary topic'.

Q: Do we agree that this is definitively better than stretching the 'url' property to such scenarios?

  <html>
  <body>
  <h1>Author Profile: Douglas Adams</h1>
  <p itemscope itemtype="http://schema.org/Person">
  This is a page about <span itemprop="name">Douglas Adams</span>.
  See <a itemprop="sameAs" href="http://en.wikipedia.org/wiki/Douglas_Adams">wikipedia entry</a> for more details.
  </p>
  </body>
  </html>
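
Since the proposal is meant to cover RDFa Lite as well as Microdata, the same profile paragraph could plausibly be written as follows. This is a sketch of an equivalent, not markup taken from the proposal. The name element is closed with a matching span tag and the full stop is kept outside it, so that the extracted name is just 'Douglas Adams'.

```html
<p vocab="http://schema.org/" typeof="Person">
  This is a page about <span property="name">Douglas Adams</span>.
  See <a property="sameAs" href="http://en.wikipedia.org/wiki/Douglas_Adams">wikipedia entry</a> for more details.
</p>
```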

Q: What if the value of the sameAs property were a URI designed to identify the thing itself directly, perhaps even a URN or UUID identifier?

A: This would be the equivalent representation: <link itemprop="sameAs" href="http://dbpedia.org/resource/Douglas_Adams"/>
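
Placed in context, a link element of this kind would sit inside the Person item, where it adds the identifier without producing any visible text. The sketch below assumes the DBpedia identifier in its full form, http://dbpedia.org/resource/Douglas_Adams, and keeps the page-level sameAs link alongside it to show that the two kinds of value can coexist.

```html
<p itemscope itemtype="http://schema.org/Person">
  This is a page about <span itemprop="name">Douglas Adams</span>.
  <!-- direct identifier for the person; not rendered on the page -->
  <link itemprop="sameAs" href="http://dbpedia.org/resource/Douglas_Adams"/>
  <!-- page about the person; rendered as an ordinary hyperlink -->
  See <a itemprop="sameAs" href="http://en.wikipedia.org/wiki/Douglas_Adams">wikipedia entry</a> for more details.
</p>
```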

FAQs

  • Q: Why didn't you use owl:sameAs? A: We would have (rightly) been accused of over-using it. (Struck out: this answer applied to the earlier version, which proposed schema:sameThingAs.)
  • Q: Why not just use 'url' from schema.org? A: This was a possibility, but the current design keeps 'url's meaning more restricted.
  • Q: When we get two URIs related via sameAs, how do we know if each link is 'the thing' or 'a page about the thing'? A: This is somewhat heuristic, but the point is that existing data will already be mixing these together...

Discussion elsewhere

See also...

  • IRC chat with Ed Summers, Dan Brickley, Mo McRoberts.