Re: [all] Call for consensus on disambiguation - feedback integrated [ACTION-181]

Hi, all,

disambigSource*, entityTypeSource*: I agree they are redundant, since
entityType* and disambigIdent* already implicitly contain this
information by virtue of being URIs. I included them for two reasons:
1) pointing to a particular knowledge base could answer 'where can I
retrieve all possible values for this property?';
2) they allow non-URI identifiers in cases where no URIs are defined
(non-English wordnets, etc.). However, the last compromise was to go
with URIs, which makes disambigSource less relevant.

I wasn't aware of the isDefinedBy property, and haven't seen it used,
but if this actually is the convention, we can avoid introducing these
new attributes and suggest that people look up the resource's
isDefinedBy property, as long as we're dealing with URIs. Pointing to
both an identifier and a knowledge base feels like we're trying to
solve a data management problem, which would probably be better
delegated entirely to the URI mechanism.
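
To illustrate what "delegating to the URI mechanism" could look like
for a consumer, here is a minimal sketch. The function name
implied_source is hypothetical, and treating scheme plus host as "the
knowledge base" is only a heuristic assumption; a real consumer would
more likely dereference the URI and read its isDefinedBy statement.

```python
from urllib.parse import urlsplit


def implied_source(identifier_uri: str) -> str:
    """Return scheme and host of an absolute URI.

    Assumption: the host is a rough stand-in for the knowledge base
    that an explicit disambigSource attribute would have named.
    """
    parts = urlsplit(identifier_uri)
    if not parts.scheme or not parts.netloc:
        # Relative or malformed identifiers carry no usable source.
        raise ValueError(f"not an absolute URI: {identifier_uri!r}")
    return f"{parts.scheme}://{parts.netloc}"


print(implied_source("http://dbpedia.org/resource/Dublin"))
# prints: http://dbpedia.org
```

A non-URI identifier (the case disambigSource was originally meant
for) raises ValueError here, which is exactly where this approach stops
working.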

The question is whether the consumers of this data can work with this
setup.

About disambigType: I agree with defining its values as URIs instead of
literal constants; that makes sense. Merging it with entityTypeIdent
feels unnatural, though: disambigType specifies the relationship type
of the disambigIdent, telling at what level we are disambiguating,
whereas entityType specifies the type of the underlying entity, which
is equivalent to asserting an rdf:type about it. I'd prefer redefining
the entityType attribute as something that can generalize over all
disambiguation levels.

Suggestion:
* entityType (current) -> generalize to disambigType, cover all levels;
* disambigType (current) -> rename to disambigLevel, change constants
from literals to URIs;
* disambigSource* (current) -> drop, suggest that people use isDefinedBy;
* entityTypeSource* (current) -> drop, suggest that people use
isDefinedBy;
* every disambiguationRule must specify exactly one of disambigRef,
-RefPointer or -Pointer.
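
The last point could be checked mechanically. A minimal sketch,
assuming the shorthand above expands to the attribute names
disambigRef, disambigRefPointer and disambigPointer (these expanded
names and the function name are assumptions for illustration, not
settled spec text):

```python
# Assumed expansion of "disambigRef, -RefPointer or -Pointer".
EXCLUSIVE_ATTRS = ("disambigRef", "disambigRefPointer", "disambigPointer")


def check_disambiguation_rule(attrs: dict) -> str:
    """Return the single identifying attribute present on a rule.

    Raises ValueError when none or more than one of the mutually
    exclusive attributes is given.
    """
    present = [name for name in EXCLUSIVE_ATTRS if name in attrs]
    if len(present) != 1:
        found = ", ".join(present) if present else "none"
        raise ValueError(
            "expected exactly one of "
            + ", ".join(EXCLUSIVE_ATTRS)
            + "; found: "
            + found
        )
    return present[0]


rule = {
    "selector": "/text/body/p[@id='dublin']",
    "disambigRef": "http://dbpedia.org/resource/Dublin",
}
print(check_disambiguation_rule(rule))
# prints: disambigRef
```

With such a check, a rule carrying only a selector and no identifying
attribute would be rejected instead of being silently valid.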

Another way to remove disambigType/disambigLevel altogether would be to
explicitly enumerate all possible disambiguation levels as distinct
attributes: disambigOntoConceptIdent, disambigLexConceptIdent,
disambigEntityIdent. In any case, I wouldn't want to lose this
information, as the distinction is important.
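
Both encodings carry the same information, which a short sketch can
make concrete. It assumes the three level URIs from the quoted
its:disambigType definition further down the thread, and it uses
disambigIdent as the generic identifier attribute; the function name
and the dictionary layout are hypothetical.

```python
ITS_NS = "http://www.w3.org/2005/11/its/"

# Assumed mapping from per-level attributes to level URIs.
LEVEL_BY_ATTR = {
    "disambigLexConceptIdent": ITS_NS + "lexicalConcept",
    "disambigOntoConceptIdent": ITS_NS + "ontologyConcept",
    "disambigEntityIdent": ITS_NS + "entity",
}


def normalize(attrs: dict) -> list:
    """Return (level URI, identifier) pairs from either encoding."""
    pairs = []
    # Encoding A: one generic identifier plus an explicit level.
    if "disambigIdent" in attrs:
        pairs.append((attrs.get("disambigLevel"), attrs["disambigIdent"]))
    # Encoding B: the level is implied by the attribute name.
    for name, level in LEVEL_BY_ATTR.items():
        if name in attrs:
            pairs.append((level, attrs[name]))
    return pairs


dublin = "http://dbpedia.org/resource/Dublin"
a = normalize({"disambigLevel": ITS_NS + "entity", "disambigIdent": dublin})
b = normalize({"disambigEntityIdent": dublin})
print(a == b)
# prints: True
```

The trade-off is that the enumerated form needs a new attribute for
every future level, while the level-plus-identifier form only needs a
new URI.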

-- Tadej

On 20. 08. 2012 11:13, Felix Sasaki wrote:
> Hi Sebastian,
>
> 2012/8/20 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>
>     Hi Felix,
>     your proposal is based on the assumption that more data is
>     available at these three URLs:
>
>     http://nerd.eurecom.fr/ontology#Place
>     http://dbpedia.org/resource/Dublin
>     http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3
>
>     While this assumption is ok for the Semantic Web, I am not sure
>     about the ITS world.
>
>
>
> You are right that in the "ITS world" one cannot be sure that more
> data is available. But I would argue that implementors who process
> links in the ITS world also very likely need to know (not
> automatically, but as a prerequisite for implementation) what the URL
> is about. So I'd rather encourage implementors towards that "Semantic
> Web like" approach than define so many attributes.
>
> Feedback from the people who want to process "disambiguation" without 
> Semantic Web processing is of course very important here.
>
>
>     Furthermore, if you are attempting to minimize it, I would suggest
>      to merge
>     "its-entity-type-ident-ref" into "its-disambig-type-ref". You
>     wouldn't be limited to entity types and could use any of:
>
>
>
> Makes sense to me, thanks for the proposal - let's see what Tadej and 
> others say.
>
> Best,
>
> Felix
>
>
>     - http://nerd.eurecom.fr/ontology#Place
>     - http://dbpedia.org/ontology/Place
>     - http://www.monnet-project.eu/lemon#LexicalSense
>     - http://www.monnet-project.eu/lemon#LexicalEntry
>     - http://wordnet.princeton.edu/wndatamodel#NounWordSense
>     - http://wordnet.princeton.edu/wndatamodel#Synset
>
>     All the best,
>     Sebastian
>
>     On 20.08.2012 09:44, Felix Sasaki wrote:
>
>         Hi Sebastian, all,
>
>         thanks, Sebastian. From what you say in the wiki and in the
>         previous mail,
>         I think one could simplify things a lot.
>
>         The HTML example from Tadej *could* look like this:
>
>         <html lang="en">
>             <head>
>                <meta charset="utf-8" />
>                <title>Entity: Local Test</title>
>             </head>
>             <body>
>                 <p><span
>                   its-entity-type-ident-ref="http://nerd.eurecom.fr/ontology#Place"
>                   its-disambig-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
>                 is the <span
>                   its-disambig-ident-ref="http://www.w3.org/2006/03/wn/wn20/instances/worsense-capital-noun-3">capital</span>
>                 of Ireland.</p>
>             </body>
>         </html>
>
>         That is, no explicit "resource" references for entity type and
>         disambiguation source, and no disambig-type.
>
>         Also, I think one could get rid of adding this kind of
>         information via
>         global rules - I really don't see a use case for that.
>
>         Tadej, others, thoughts? Maybe Yves, as one of the implementors
>         processing the output, and others have some thoughts too?
>
>         Best,
>
>         Felix
>
>         2012/8/17 Sebastian Hellmann
>         <hellmann@informatik.uni-leipzig.de>
>
>             Dear Felix,
>             to solve this issue I prepared a page:
>             http://wiki.nlp2rdf.org/wiki/DBpedia_Spotlight
>
>
>             It is a rough draft, so there are still many mistakes.
>             Once it is mature, I will send it to the DBpedia Spotlight
>             and Apache Stanbol lists to get their feedback.
>             Note that I don't have a problem with these properties as
>             XML attributes, where they can naturally occur only once
>             and encoding an implicit dependency (one attribute
>             referring to another attribute) is unproblematic. They
>             are, however, difficult to handle in RDF, even when
>             declared functional.
>             I will report back if there is any news.
>
>             All the best,
>             Sebastian
>
>
>
>
>             On 14.08.2012 21:34, Felix Sasaki wrote:
>
>                 Hi Sebastian, all,
>
>                 August is taking its toll ... I am wondering if
>                 there are any thoughts on Sebastian's mail below. It
>                 seems that some of the proposed ITS attributes are not
>                 needed, but I don't have the competence to evaluate
>                 this. Thoughts from others?  Sebastian, could you
>                 confirm that the output mentioned in this other thread
>
>                 http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0168.html
>
>
>
>                 is correct for NIF? I then would create a test case
>                 for our test suite,
>                 see
>
>                 http://lists.w3.org/Archives/Public/public-multilingualweb-lt-tests/2012Aug/0003.html
>
>
>
>                 Thanks,
>
>                 Felix
>
>                 On Thursday, 9 August 2012, Sebastian Hellmann wrote:
>
>                   Hi Felix,
>
>                     below is mostly my opinion on this. There is nothing
>                     wrong with including these properties, but they might
>                     not make sense in RDF. If you think that there are
>                     people who would really use these properties in RDF,
>                     then go ahead and include them. Personally, *I*
>                     wouldn't know what *I* could use them for.
>                     More comments inline.
>
>                     On 09.08.2012 15:20, Felix Sasaki wrote:
>
>                       its:entityTypeSourceRef
>
>                           I really do not find this property helpful.
>
>                     Do you see any sense in saying that
>                     http://dbpedia.org/resource/Dublin is from
>                     http://dbpedia.org ? In the linked data world
>                     http://dbpedia.org/resource/Dublin comes from
>                     http://dbpedia.org/resource/Dublin.
>                     So you might specify a way to convert that to ITS,
>                     but we might not need an RDF property for this.
>
>                        its:disambigType
>
>                         "(http://www.w3.org/2005/11/its/lexicalConcept|
>                         http://www.w3.org/2005/11/its/ontologyConcept|
>                         http://www.w3.org/2005/11/its/entity)"
>
>                           I am unsure about this one.
>
>                        its:entityTypeRef
>                     is already rdf:type, so it would be a duplicate to
>                     have its:entityTypeRef in RDF. For
>                     http://dbpedia.org/resource/Dublin
>                     its:entityTypeRef would be one of:
>
>                     http://dbpedia.org/ontology/PopulatedPlace
>                     http://dbpedia.org/ontology/Settlement
>                     http://umbel.org/umbel/rc/PopulatedPlace
>                     http://dbpedia.org/ontology/Place
>                     http://umbel.org/umbel/rc/Village
>                     http://umbel.org/umbel/rc/Location_Underspecified
>                     http://schema.org/Place
>                     http://www.w3.org/2002/07/owl#Thing
>                     http://www.opengis.net/gml/_Feature
>                     +
>                     http://nerd.eurecom.fr/ontology#Place
>
>
>
>                     If you have a problem with this plurality, then it
>                     might be good to include an annotation property
>                      its:preferedEntityTypeRef
>                     So the data is already there in RDF; the problem
>                     is rather to find a way to convert it back to ITS.
>
>                     All the best,
>                     Sebastian
>
>
>
>                     Thanks,
>
>
>                     Felix
>
>                     2012/8/9 Felix Sasaki <fsasaki@w3.org>
>
>                        Thanks for this, Tadej, looks good. There is
>                     just one comment I don't
>                     see
>                     reflected:
>
>                     7) A question on the data category in general and
>                     the "rules" element: does it make sense to make
>                     some attributes mandatory? Currently, this would
>                     be valid:
>                     <its:disambiguation
>                     selector="/text/body/p[@id='dublin']"/>
>
>
>
>
>                     It seems that still all metadata items /
>                     attributes are optional. Is
>                     there
>                     a way to be more specific about what must or must
>                     not appear together,
>                     what
>                     is optional etc?
>
>                     Best,
>
>                     Felix
>
>                     2012/8/9 Tadej Stajner <tadej.stajner@ijs.si>
>
>                          Hi,
>                         thanks for the tips. I covered them, and I
>                     agree with removing the local XPath, since it has
>                     very limited use. Here is another version
>                     incorporating all these comments.
>                     -- Tadej
>
>                     On 8/3/2012 1:07 PM, Felix Sasaki wrote:
>
>                     Hi Tadej, all,
>
>                         thanks a lot for this. Just a few comments /
>                     questions:
>
>                         1) About "The information applies to the
>                     textual content of the element, including child
>                     elements and attributes.": wouldn't it make more
>                     sense to say that this applies only to the content
>                     of the element? E.g. if you annotate the "span"
>                     element in
>
>                         <p>I have seen <span id="timbl"><span
>                     class="firstname">Tim</span> <span
>                     class="lastname">Berners-Lee</span></span> in
>                     the olympics opening ceremony</p>
>
>                         You want to express disambiguation information
>                     about the "span" element with the "id" attribute,
>                     but not about the "id" attribute or the nested
>                     span elements. So inheritance probably should be:
>                     "There is no inheritance". What do you think?
>
>
>                         2) About "The Entity data category can be
>                     expressed with global rules,
>                     or locally on an individual element.": This should
>                     probably be "The
>                     Disambiguation data category can be expressed with
>                     global rules, or
>                     locally
>                     on an individual element."
>
>                         3) About local markup: for other data
>                     categories, we don't have the
>                     "pointer" attributes as local markup, since
>                     processing of XPath in local
>                     markup can be very expensive. So I would propose
>                     to drop the local
>                     pointer
>                     attributes here too.
>
>                         4) In the table at the end, "Global pointing
>                     to existing information"
>                     should be "yes" I think.
>
>                         5) This selector
>                     <its:disambiguation
>                     selector="/text/body/p/#dublin" ...
>                     In XPath should be
>                     <its:disambiguation
>                     selector="/text/body/p[@id='dublin']" ...
>
>
>
>                         6) To follow the conventions from other data
>                     categories, the
>                     "its:disambiguation" element should probably be called
>                     "its:disambiguationRule".
>
>                         7) A question on the data category in general
>                     and the "rules" element: does it make sense to make
>                     some attributes mandatory? Currently, this would
>                     be valid:
>                     <its:disambiguation
>                     selector="/text/body/p[@id='dublin']"/>
>
>
>
>                         8) A question to the others in this thread
>                     (Guiseppe, Pablo, Raphael,
>                     Sebastian): is this a representation that makes
>                     sense to you and that
>                     your
>                     tools could produce?
>
>                         9) A question to the MT guys: is the way
>                     "entity and disambiguation"
>                     information is represented here useful for you?
>
>                         Best,
>
>                         Felix
>
>                     2012/8/3 Tadej Štajner <tadej.stajner@ijs.si>
>
>                        Hi,
>                     I incorporated some comments that 'entity' was
>                     still conflated from
>                     several distinct things in the data category
>                     proposal. Now, we
>                     distinguish
>                     between disambiguation of word sense, ontology
>                     concept and entity
>                     instance.
>                     Following that, it seems that 'Disambiguation' was
>                     the better name for
>                     the
>                     data category.
>
>                     Thanks for everyone's input!
>
>                     -- Tadej
>
>                     On 02. 08. 2012 17:26, Tadej Štajner wrote:
>
>                        Apologies -- wrong link on the previous mail.
>                     This is the relevant one:
>                     http://www.w3.org/International/multilingualweb/lt/track/actions/181
>                     -- Tadej
>
>                     On 02. 08. 2012 17:22, Tadej Štajner wrote:
>
>                     Dipl. Inf. Sebastian Hellmann
>                     Department of Computer Science, University of Leipzig
>                     Events:
>                         * http://sabre2012.infai.org/mlode (Leipzig,
>                     Sept. 23-24-25, 2012)
>                         * http://wole2012.eurecom.fr (*Deadline: July
>                     31st 2012*)
>                     Projects: http://nlp2rdf.org , http://dbpedia.org
>                     Homepage:
>                     http://bis.informatik.uni-leipzig.de/SebastianHellmann
>                     Research Group: http://aksw.org
>
>
>
>
>
>
>
>
>
>
>
>
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>

Received on Monday, 20 August 2012 09:46:16 UTC