Re: [ISSUE-29][ACTION-164] ITS2NIF2ITS - RDF roundtrip

Hi Sebastian, all,

I tried to create the NIF output (since we need two implementations) for

<html xmlns:its="http://www.w3.org/2005/11/its">
    <body>
        <h2 its:translate="yes">Welcome to <span its:translate="no"
                >Dublin</span> in <b its:translate="no">Ireland</b>! </h2>
    </body>
</html>

(I used an XML input here, but otherwise this is the same as your example
in the wiki.)

Does the output below make sense? I am sure that the uuid is wrong, but I
don't know how to generate one.
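
For reference, a minimal sketch of one way to generate such an identifier,
assuming Python and its standard uuid module. The names document_urn and
context_uri are hypothetical, and the "#begin_end" suffix simply mirrors the
pattern used in the output below; whether a urn:uuid with a fragment is the
right identifier for the context is a separate question.

```python
import uuid

# Assumption: one random (version 4) UUID is generated per document and
# reused for every context identifier derived from that document.
document_urn = uuid.uuid4().urn  # a string starting with "urn:uuid:"


def context_uri(document_urn: str, begin: int, end: int) -> str:
    """Append a begin_end offset fragment to the document's UUID URN."""
    return f"{document_urn}#{begin}_{end}"


if __name__ == "__main__":
    print(context_uri(document_urn, 0, 50))
```

The UUID is created once and passed in, so all identifiers for one document
share the same prefix.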


[

@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#>.
@prefix str: <http://nlp2rdf.lod2.eu/schema/string/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
<http://example.com/exampledoc.html#offset_0_50> str:referenceContext
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50>;
	a <str:String>;
	itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_14_44> str:referenceContext
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44>;
	a <str:String>;
	itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_25_31> str:referenceContext
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31>;
	a <str:String>;
	itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_25_32> str:referenceContext
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32>;
	a <str:String>;
	itsrdf:translate "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<http://example.com/exampledoc.html#offset_5_49> str:referenceContext
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49>;
	a <str:String>;
	itsrdf:translate "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#0_50> str:isString
"\r\n    \r\n        Welcome to Dublin in Ireland! \r\n    \r\n";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#14_44> str:isString
"Welcome to Dublin in Ireland! ";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_31> str:isString "Dublin";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#25_32> str:isString "Ireland";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.
<urn:uuid:CEB9FD94-6779-4257-B992-C853617CB791#5_49> str:isString
"\r\n        Welcome to Dublin in Ireland! \r\n    ";
	str:occursIn <http://example.com/exampledoc.html>;
	a <str:Context>.

]
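
As a cross-check on the offsets above, here is a minimal sketch of how they
can be derived, assuming Python's standard xml.etree.ElementTree. The
function name element_offsets is hypothetical. Offsets are counted over the
concatenated text nodes of the document, and each line break counts as one
character (an XML parser normalizes "\r\n" to "\n"). Under these assumptions
the sketch gives 0_50 for html, 5_49 for body, 14_44 for h2 and 25_31 for
span.

```python
import xml.etree.ElementTree as ET

# The sample input from this mail, with "\n" line endings.
DOC = (
    '<html xmlns:its="http://www.w3.org/2005/11/its">\n'
    '    <body>\n'
    '        <h2 its:translate="yes">Welcome to <span its:translate="no"\n'
    '                >Dublin</span> in <b its:translate="no">Ireland</b>! </h2>\n'
    '    </body>\n'
    '</html>\n'
)


def element_offsets(root):
    """Return (tag, begin, end) for each element, in document text offsets."""
    result = []

    def walk(elem, pos):
        begin = pos
        pos += len(elem.text or "")       # text before the first child
        for child in elem:
            pos = walk(child, pos)        # the child's own text content
            pos += len(child.tail or "")  # text between child and next sibling
        result.append((elem.tag, begin, pos))
        return pos

    walk(root, 0)
    return result


if __name__ == "__main__":
    for tag, begin, end in element_offsets(ET.fromstring(DOC)):
        print(f"{tag}: offset_{begin}_{end}")
```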

Thanks,

Felix

2012/8/9 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>

> Hi Jirka,
> thanks for your feedback. I thought it was a requirement that the DOM
> should not be touched. I had never run into whitespace problems in any
> RDF serialization format, so this was new to me. By the way, I now
> understand what your problem with the bloated mapping is. We really
> don't need to serialize it; it can be kept in memory, which is
> more efficient. I made serialization optional. I also made an XML
> version, because XML is much better suited for transferring this kind
> of data. (Is the XML alright?) I made all the changes you suggested; the
> new version is online here:
> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=622#Example
>
> all the best,
> Sebastian
>
>
> On 09.08.2012 11:59, Jirka Kosek wrote:
>
>  On 9.8.2012 11:47, Sebastian Hellmann wrote:
>>
>>  you found an interesting point.
>>>
>>> I wrote some notes on the optimization:
>>> http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS#Notes_on_optional_optimizations
>>> http://wiki.nlp2rdf.org/index.php?title=ITS2NIF2ITS&oldid=614#Notes_on_optional_optimizations
>>>
>>> I think it generally depends on the use case whether you would
>>> optimize. Do you think we should specify/limit what optimizations are
>>> possible?
>>> It might be easier to explain implications to help developers,
>>> but leave the implementation under-specified.
>>> Do you think I should remove them from the algorithm description and
>>> move them to a completely different section? Would this help the
>>> structure of the document?
>>>
>> I think the NIF mapping is already so unnatural that optimization could
>> make it really messy. If the goal of the optimization is a less
>> complex RDF representation, without blank text nodes and with text
>> nodes trimmed of excessive whitespace, I think an easier and more
>> workable approach would be to:
>>
>> - remove all whitespace optimization from the mapping algorithm
>>
>> - state that the algorithm can produce a lot of "phantom" predicates from
>> excessive whitespace
>>
>> - recommend normalizing whitespace in the input XML/HTML/DOM in
>> order to minimize such phantom predicates
>>
>> This way each user/application can create its own whitespace
>> normalization based on the nature of its input data, and we don't have
>> to care about it.
>>
>> For example for your sample document it is safe (knowing HTML whitespace
>> handling rules) to normalize it to
>>
>> <html><body><h2 translate="yes">Welcome to <span
>> its-disambig-ident-ref="http://dbpedia.org/resource/Dublin"
>> translate="no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>
>>
>> (Actually one line with no excessive whitespace.)
>>
>> Does this sound reasonable to my SemWeb-educated friends?
>>
>>                         Jirka
>>
>>
>
> --
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Events:
>   * http://sabre2012.infai.org/mlode (Leipzig, Sept. 23-24-25, 2012)
>   * http://wole2012.eurecom.fr (*Deadline: July 31st 2012*)
> Projects: http://nlp2rdf.org , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Thursday, 9 August 2012 11:31:18 UTC