Re: ISSUE-137: Proposal to add sh:langShape

As resolved today, I have integrated sh:languageIn into the spec:

http://w3c.github.io/data-shapes/shacl/#LanguageInConstraintComponent

Any glitches in there?

Holger



On 13/09/2016 14:34, Holger Knublauch wrote:
>
>
> On 12/09/2016 20:07, Andy Seaborne wrote:
>>
>>
>> On 12/09/16 00:30, Holger Knublauch wrote:
>>> Taking this and Andy's input into consideration, maybe sh:langShape is
>>> overkill and all we really need is a new parameter such as
>>> sh:languageIn which takes a node and, if it has a language tag,
>>> verifies that it matches one of the provided languages following the
>>> SPARQL langMatches semantics. For example:
>>>
>>> ex:MyShape
>>>     a sh:Shape ;
>>>     sh:property [
>>>         sh:predicate skos:prefLabel ;
>>>         sh:or ( [ sh:datatype xsd:string ] [ sh:datatype rdf:langString ] ) ;
>>>         sh:langMatches ( "en" "fr" "de" ) ;
>>
>> A note: this is a slightly different operation to sparql:langMatches 
>> which takes a language tag and a language match, not a literal and 
>> language match.  Some people prefer that local names are not reused 
>> to mean slightly different things where possible.
>
> Oops, yes. I intended to use sh:languageIn but forgot to update the 
> example. So here it is again:
>
> ex:MyShape
>     a sh:Shape ;
>     sh:property [
>         sh:predicate skos:prefLabel ;
>         sh:or ( [ sh:datatype xsd:string ] [ sh:datatype rdf:langString ] ) ;
>         sh:languageIn ( "en" "fr" "de" ) ;
>     ] .
>
>
>>
>>>     ] .
>>>
>>> langMatches could be for just a single language, but having a list is
>>> shorter for this (apparently) common case in multi-lingual countries
>>> such as Belgium. I didn't know the RFC supports wildcards - this should
>>> hopefully be flexible enough to cover all given use cases, but others may
>>> need to confirm.
>>>
>>> Regards,
>>> Holger
>>>
>>> PS: Andy, I prefer sh:datatype rdf:langString because it would be one
>>> thing less to check (by form builders etc), and furthermore I believe
>>> the semantics of sh:langMatches needs to be that it only does something
>>> if the literal really has a language tag. Otherwise it would be harder
>>> to express mixed cases of either string or langString (which I believe
>>> is quite common).
>>
>> Consider
>>
>>      sh:property [
>>          sh:predicate skos:prefLabel ;
>>          sh:langMatches ( "en" "fr" "de" ) ;
>>      ] .
>>
>> with data:
>>
>>     <uri> skos:prefLabel 123 .
>>
>> which is a violation when sh:langMatches requires the language tag 
>> but passes if sh:langMatches only triggers if there is a language tag 
>> at all.  I find the latter a strange interpretation of the
>> shape.
>>
>> String or language match would be:
>>
>>   sh:or ( [ sh:datatype xsd:string ]
>>           [ sh:langMatches ( "en" "fr" "de" ) ] ) ;
>>
>> There is no need to test for [ sh:datatype rdf:langString ] as well 
>> as it is implicit in having any language tag so it happens when 
>> sh:langMatches requires the language tag.
>>
>> For error checking:
>>
>> This data:
>>
>>    "abcde"^^rdf:langString
>>
>> is malformed and not in the value-space of rdf:langString; it is like 
>> writing
>>
>>    "abcde"^^xsd:integer
>>
>> It does have the datatype - it does not represent a legal value.
>>
>>
>> Another way: make language match "" mean xsd:string (cf. XML, where
>> xml:lang="" means no language tag, although with slightly different
>> implications).
>>
>>   sh:property [
>>      sh:or ( [ sh:datatype xsd:string ]
>>              [ sh:langMatches ( "en" "fr" "de" ) ] ) ;
>>    ] .
>>
>> vs
>>
>>    sh:property [
>>      sh:predicate skos:prefLabel ;
>>      sh:langMatches ( "" "en" "fr" "de" ) ;
>>    ] .
>
> So the change you seem to be advocating is to make sh:languageIn 
> produce violations if the value node is not a literal, or a literal 
> that does not have any language tag. As you point out, this would lead 
> to situations in which the sh:datatype rdf:langString can be omitted 
> in an sh:or. The meaning of sh:datatype would not change, and people 
> can still state sh:datatype rdf:langString for the (common) case in 
> which any language is permitted. I believe I would be OK with that 
> interpretation.
>
> Here is a SPARQL ASK validator query that is passing the English and 
> Francais cases below:
>
>
>         ASK {
>             BIND (lang($value) AS ?valueLang) .
>             FILTER (bound(?valueLang) && EXISTS {
>                 GRAPH $shapesGraph {
>                     $languageIn (rdf:rest*)/rdf:first ?lang .
>                     FILTER (langMatches(?valueLang, ?lang))
>                 } } )
>         }
>
>
> ex:TestShape
>   rdf:type sh:Shape ;
>   rdfs:label "Test shape" ;
>   sh:languageIn (
>       "en"
>       "fr"
>     ) ;
>   sh:targetNode "English"@en ;
>   sh:targetNode "Francais"@fr ;
>   sh:targetNode rdfs:Resource ;     # Fails
>   sh:targetNode "Deutsch"@de ;    # Fails
>   sh:targetNode "Plain String" ;      # Fails
> .
>
> Holger
>
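
For anyone who wants to try the stricter interpretation outside a SHACL
engine, here is a minimal Python sketch. It assumes that langMatches follows
the basic filtering scheme of RFC 4647 (as SPARQL does), and that a value node
is represented only by its language tag: None for IRIs and blank nodes, ""
for literals without a tag. The function names lang_matches and
conforms_language_in are hypothetical and not part of SHACL or any library.

```python
from typing import Iterable, Optional


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering in the style of SPARQL langMatches (RFC 4647)."""
    if not tag:
        # A literal without a language tag matches nothing, not even "*".
        return False
    if lang_range == "*":
        return True
    tag, lang_range = tag.lower(), lang_range.lower()
    # "en" matches "en" and "en-GB", but not "eng".
    return tag == lang_range or tag.startswith(lang_range + "-")


def conforms_language_in(value_lang: Optional[str],
                         allowed: Iterable[str]) -> bool:
    """Strict reading: non-literals and untagged literals are violations.

    value_lang is None for IRIs and blank nodes (assumption of this sketch),
    "" for literals without a language tag, otherwise the tag itself.
    """
    if value_lang is None:
        return False
    return any(lang_matches(value_lang, r) for r in allowed)


if __name__ == "__main__":
    allowed = ["en", "fr"]
    cases = [("English", "en"), ("Francais", "fr"), ("rdfs:Resource", None),
             ("Deutsch", "de"), ("Plain String", "")]
    for label, lang in cases:
        print(label, conforms_language_in(lang, allowed))
```

With the allowed list ("en", "fr") this accepts "English"@en and
"Francais"@fr and rejects the IRI, "Deutsch"@de and the plain string, which
is the same outcome as the ex:TestShape targets quoted above.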

Received on Thursday, 15 September 2016 00:26:04 UTC