Avoid that collections break relationships

From Hydra Community Group

Disclaimer: The various proposals used both the term List and Collection. For consistency, the summary in this document has been harmonized to use the term Collection. It is still undecided whether Collection, List, or another term will be used in the final design.

Problem description

Let's assume we want to build a Web API that exposes information about persons and their friends. Using schema.org, the data would look somewhat like this:

</alice> a schema:Person ;
          schema:knows </bob>, ... </zorro> .

respectively

{
  "@id": "/alice",
  "@type": "Person",
  "knows": [ "/bob", ... "/zorro" ]
}

All this information would be available in the document at /alice. Depending on the number of friends, however, the document may grow too large. Web APIs typically solve this by introducing an intermediary (paged) resource such as /alice/friends/. In Hydra, collections facilitate that:

</alice> a schema:Person ;
    schema:knows </alice/friends/> .
</alice/friends/> a hydra:Collection ;
    hydra:member </bob>, ... </zorro> .

respectively

{
  "@id": "/alice",
  "@type": "Person",
  "knows": "/alice/friends/"
}
{
  "@id": "/alice/friends/",
  "@type": "Collection",
  "member": [ "/bob", ... "/zorro" ]
}


This works, but has two problems:

  1. it breaks the /alice --[knows]--> /bob relationship
  2. it states that /alice --[knows]--> /alice/friends/

While 1) can easily be fixed, 2) is much trickier, especially if we consider cases that don't use schema.org with its "weak semantics" but a vocabulary that uses rdfs:range, such as FOAF. In that case, the statement

</alice> foaf:knows </alice/friends/> .

and the fact that

foaf:knows rdfs:range foaf:Person .

would lead to the wrong inference that /alice/friends/ is a foaf:Person.
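
The faulty inference can be reproduced with a minimal sketch of the RDFS range rule (rdfs3). Triples are modelled as plain Python tuples, and the helper name apply_range_rule is an assumption made for illustration; it is not part of Hydra or of any RDF library.

```python
# Minimal sketch of RDFS range entailment (rule rdfs3) over plain tuples.
# The triple representation and the helper name are illustrative assumptions.

RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def apply_range_rule(triples):
    """Return the rdf:type triples entailed by rdfs:range declarations."""
    # Collect (property, class) pairs; a property may declare several ranges.
    ranges = [(s, o) for (s, p, o) in triples if p == RDFS_RANGE]
    inferred = set()
    for prop, cls in ranges:
        for s, p, o in triples:
            if p == prop:
                # The object of any statement using `prop` is typed as `cls`.
                inferred.add((o, RDF_TYPE, cls))
    return inferred


data = {
    ("/alice", "foaf:knows", "/alice/friends/"),
    ("foaf:knows", RDFS_RANGE, "foaf:Person"),
}

inferred = apply_range_rule(data)
print(inferred)  # {('/alice/friends/', 'rdf:type', 'foaf:Person')}
```

The only entailed triple types the collection itself as a foaf:Person, which is exactly the inference the proposals below try to avoid.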


Proposed solutions

There has been a lot of discussion both on public-hydra and on wider lists such as public-vocabs (schema.org's mailing list). Below is a summary of the proposed solutions.


Link to the collection via a generic property

</alice> a schema:Person ;
    rdfs:seeAlso </alice/friends/> .
</alice/friends/> a hydra:Collection ;
    foaf:topic schema:knows .

respectively

{
  "@id": "/alice",
  "@type": "Person",
  "seeAlso": {
    "@id": "/alice/friends/",
    "@type": "Collection",
    "foaf:topic": "schema:knows"
  }
}

or with different terms such as :describedBy, schema:about or VoID:

</alice/friends/> a void:Linkset;    # or e.g. :LinkPage
    void:linkPredicate :knows .

(VoID linksets typically describe links between different datasets, whereas the use cases at hand typically deal with a single dataset)

It would of course also be possible to introduce a more explicit property for this "indirection". Something like:

</alice> hydra:hasCollection </alice/friends/> .
</alice/friends/> a hydra:Collection ;
    foaf:topic schema:knows .

very similar to

{
  "@id": "/alice",
  "hasCollection": {
    "@id": "/alice/friends/",
    "@type": "Collection",
    "manages": {
      "property": "schema:knows",
      "subject": "/alice"
    }
  }
}

or

{
  "@id": "/alice",
  "hasRelationshipIndirection": {
    "property": "schema:knows",
    "resource": "/alice/friends/"
  }
}

or (same model, different terms) hasList, hasMany, relatesTo, relatesToMany:

{
  "@id": "/alice",
  "hasList": {
    "property": "schema:knows",
    "object": "/alice/friends/"
  }
}


Pros:

  • clean modeling, explicit semantics (especially with more explicit properties than seeAlso)

Cons:

  • bigger payloads, because the collection must not only be referenced but also partly described so that clients have enough information to decide whether to dereference it
  • decreased performance: instead of a direct lookup, a query is needed to find the right collection
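
The lookup cost mentioned in the cons can be illustrated with a minimal sketch. It assumes the "hasCollection"/"manages" shape from the JSON example above; the function name find_collection and the handling of a single object versus a list are assumptions made for illustration, not part of any Hydra client.

```python
# Sketch: find the collection that manages a given property of a resource.
# The "hasCollection"/"manages" shape follows the JSON example above; the
# function name and the list handling are illustrative assumptions.

def find_collection(resource, prop):
    """Return the @id of the collection managing `prop` for `resource`, or None."""
    collections = resource.get("hasCollection", [])
    if isinstance(collections, dict):
        # A single collection may appear as an object rather than a list.
        collections = [collections]
    for collection in collections:
        manages = collection.get("manages", {})
        if (manages.get("property") == prop
                and manages.get("subject") == resource.get("@id")):
            return collection.get("@id")
    return None


alice = {
    "@id": "/alice",
    "hasCollection": {
        "@id": "/alice/friends/",
        "@type": "Collection",
        "manages": {"property": "schema:knows", "subject": "/alice"},
    },
}

friends_url = find_collection(alice, "schema:knows")
print(friends_url)  # /alice/friends/
```

Instead of reading the value of schema:knows directly, the client has to scan every attached collection description until it finds a match.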


Furthermore, to tell a client that it should look into rdfs:seeAlso etc., the API documentation (SupportedProperty or Constraint) could be augmented as follows:

property: schema:knows
managedByCollection: true   -- or --   managedIndirectly: true

respectively

{
  "property": "schema:knows",
  "managedByCollection": true
}


Pros:

  • increased efficiency

Cons:

  • overhead / added complexity due to a larger vocabulary
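
A client could use such a flag to decide up front whether a collection lookup is needed at all. The following is a minimal sketch under the assumption that the supported properties are available as a list of objects shaped like the JSON example above; the function name is_managed_by_collection is hypothetical.

```python
# Sketch: consult the API documentation before searching for a collection.
# "managedByCollection" is the flag proposed above; the function name and
# the shape of `supported_properties` are assumptions for illustration.

def is_managed_by_collection(supported_properties, prop):
    """Return True if the documentation flags `prop` as managed by a collection."""
    for entry in supported_properties:
        if entry.get("property") == prop:
            return bool(entry.get("managedByCollection", False))
    # Properties missing from the documentation are treated as direct ones.
    return False


supported_properties = [
    {"property": "schema:knows", "managedByCollection": True},
    {"property": "schema:name"},
]

print(is_managed_by_collection(supported_properties, "schema:knows"))  # True
print(is_managed_by_collection(supported_properties, "schema:name"))   # False
```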


To "reduce the cost of the indirection", the following solution was also discussed. However, it has the same problem as using the property directly on /alice, namely the wrong inference that /alice/friends/ is a person:

</alice> hydra:hasRelationshipIndirector [
    foaf:knows </alice/friends/>
] .

respectively

{
  "@id": "/alice",
  "hasRelationshipIndirector": {
    "foaf:knows": "/alice/friends/"
  }
}

Use of a blank node collection member to indirectly point to the collection

</alice> a foaf:Person;
    schema:knows [ hydra:isMemberOf </alice/friends/> ] .

respectively

{
  "@id": "/alice",
  "schema:knows": {
    "isMemberOf": "/alice/friends/"
  }
}


Pros:

  • nothing to add to the vocabulary

Cons:

  • difficult to understand
  • introduces an undesired triple that has to be filtered out when ingesting the data
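
The filtering step mentioned in the cons could look like the following minimal sketch. Triples are plain tuples and blank nodes are strings starting with "_:"; both conventions and the function name strip_placeholders are assumptions made for illustration.

```python
# Sketch: drop the placeholder triples introduced by the blank node pattern.
# Blank nodes are modelled as strings starting with "_:"; this convention and
# the function name are assumptions for illustration.

IS_MEMBER_OF = "hydra:isMemberOf"


def strip_placeholders(triples):
    """Remove triples that point to, or describe, a blank node placeholder."""
    placeholders = {
        s for (s, p, o) in triples if s.startswith("_:") and p == IS_MEMBER_OF
    }
    return {
        (s, p, o)
        for (s, p, o) in triples
        if s not in placeholders and o not in placeholders
    }


ingested = {
    ("/alice", "rdf:type", "foaf:Person"),
    ("/alice", "schema:knows", "_:b0"),
    ("_:b0", IS_MEMBER_OF, "/alice/friends/"),
}

cleaned = strip_placeholders(ingested)
print(cleaned)  # {('/alice', 'rdf:type', 'foaf:Person')}
```

Note that this sketch simply discards the pointer to /alice/friends/; a real client would presumably record it before filtering so that the collection can still be dereferenced.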


Use of a separate property to reference collections

:knowsCollection :collectionPropertyOf schema:knows .

where :collectionPropertyOf has the semantic condition

?collectionProperty :collectionPropertyOf ?property
?subject ?collectionProperty ?collection
?collection hydra:member ?member
imply
?subject ?property ?member

The direction could of course also be turned around:

schema:knows :collectionProperty :knowsCollection .
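
The semantic condition for :collectionPropertyOf given above can be sketched as a small inference rule. Triples are plain tuples and the function name infer_member_relations is an assumption made for illustration; a real implementation would more likely use a rule engine or a SPARQL CONSTRUCT query.

```python
# Sketch of the semantic condition for :collectionPropertyOf.
# Triples are plain tuples; all names here are illustrative assumptions.

COLLECTION_PROPERTY_OF = ":collectionPropertyOf"
MEMBER = "hydra:member"


def infer_member_relations(triples):
    """Derive `?subject ?property ?member` triples from collection links."""
    inferred = set()
    for coll_prop, p1, prop in triples:
        if p1 != COLLECTION_PROPERTY_OF:
            continue
        # ?subject ?collectionProperty ?collection
        for subject, p2, collection in triples:
            if p2 != coll_prop:
                continue
            # ?collection hydra:member ?member
            for c, p3, member in triples:
                if c == collection and p3 == MEMBER:
                    inferred.add((subject, prop, member))
    return inferred


data = {
    (":knowsCollection", COLLECTION_PROPERTY_OF, "schema:knows"),
    ("/alice", ":knowsCollection", "/alice/friends/"),
    ("/alice/friends/", MEMBER, "/bob"),
    ("/alice/friends/", MEMBER, "/zorro"),
}

relations = infer_member_relations(data)
print(sorted(relations))
# [('/alice', 'schema:knows', '/bob'), ('/alice', 'schema:knows', '/zorro')]
```

The direct /alice --[knows]--> /bob relationship is restored without ever stating that /alice knows the collection.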


Pros:

  • explicit semantics

Cons:

  • doubles the size of the vocabulary
  • perhaps difficult to find the "collection property"/interpret it as such


There have also been suggestions to use either plural property names (colleagues vs. colleague) or specific URL templates (/{property}/collection) to reference the collection instead of its members.

</alice> a schema:Person;
    schema:colleagues </alice/friends/> ;
    schema:colleague </bob>, ... .
</alice/friends/> a hydra:Collection;
    hydra:member </bob>, ... .

or

  </alice> schema:knows/collection </alice/friends/> .


Pros:

  • simple

Cons:

  • error prone (plural vs. singular)
  • violates URI opacity principle (template)
  • effectively doubles the size of the vocabulary


Use of an operation with an explicitly defined target

</alice> a foaf:Person ;
 hydra:supportedOperation [
   a GetRelatedCollectionOperation;
   hydra:title "Get known relations";
   hydra:method "GET";
   hydra:uri </alice/friends/>;
   hydra:property schema:knows;
 ] .


Pros:

  • more or less explicit semantics

Cons:

  • operations are generally used for state-changing (unsafe) interactions, and navigation is not one of them
  • the introduction of hydra:uri might motivate people to hardcode the information into their clients
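
To avoid the hardcoding risk named in the last point, a client would read the target from the operation description at runtime. The following is a minimal sketch; the compacted JSON keys ("supportedOperation", "method", "uri", "property") and the function name find_collection_target are assumptions derived from the Turtle example above, not an established Hydra serialization.

```python
# Sketch: discover the collection URL from an operation description at runtime.
# The JSON keys mirror the Turtle example above and are assumptions, as is
# the function name.

def find_collection_target(resource, prop):
    """Return the target of the GET operation related to `prop`, or None."""
    for op in resource.get("supportedOperation", []):
        if (op.get("@type") == "GetRelatedCollectionOperation"
                and op.get("method") == "GET"
                and op.get("property") == prop):
            return op.get("uri")
    return None


alice = {
    "@id": "/alice",
    "@type": "foaf:Person",
    "supportedOperation": [
        {
            "@type": "GetRelatedCollectionOperation",
            "title": "Get known relations",
            "method": "GET",
            "uri": "/alice/friends/",
            "property": "schema:knows",
        }
    ],
}

target = find_collection_target(alice, "schema:knows")
print(target)  # /alice/friends/
```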