Community & Business Groups

Minutes, April 06, 2021

Participants: Giulia, Claudio, Jean-Baptiste, Brendan, Rama  

Agenda: discuss the “file at the origin” and “html meta” proposals. Discuss whether the three current proposals can be seen as complementary.

Laurent, regarding issue #10 opened by Claudio: splitting html pages into identified sections is technically doable (via id attributes); associating different sections with different TDM rules is also technically doable (via XML idref attributes). But such a solution would not be html-friendly and would be complex to handle for both publishers and TDM Agents. We may think about it for a version 2 of the specification, but introducing this now would be really dangerous imho.

Claudio: I understand and agree with the complexity for a version 1. 

Giulia : should http header properties be registered by a standards body? 

Brendan: if we choose to name these properties with a “X-” as a prefix, we don’t need to have any registration. If not we should have our properties registered by IANA to make the solution more “standard”. 

Laurent: regarding the “file” proposal, we have to choose between regexps and something simplistic like the robots solution. 

Jean-Baptiste: the solution should be as simple as possible. Even JS-compatible regexps are so broad that they can easily be implemented badly. Implementing the robots.txt solution (using * and $ only) is more robust.

Rama and Claudio agree. 

Laurent: ok I’ll change the “file” proposal accordingly. We’ll have to study the use of * and $ in detail.

Laurent: note that in the “file” proposal, if a file does not match any rule, it is governed by default by the exception allowed by Article 4.

Giulia, Jean-Baptiste, Rama: please add an example with 2 files treated specifically in a directory, an example with a $, and an example of the difference in processing between matching files and matching directories.
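
As an illustration only, the robots.txt-style matching discussed above (with * and $ as the only special characters) could be sketched as follows in Python. The rule values ("reserved", "not-reserved"), the first-match-wins order, the DEFAULT label and all paths are assumptions made for this example; none of them comes from the “file” proposal itself.

```python
import re

# Hypothetical label for "no rule matched": per the discussion, the file is
# then governed by the Article 4 exception by default.
DEFAULT = "article-4-exception"


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt-style path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the pattern
    to the end of the path. Everything else is literal, matched as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, which gives prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


def decide(rules, path):
    """Return the value of the first matching rule (assumed ordering)."""
    for pattern, value in rules:
        if matches(pattern, path):
            return value
    return DEFAULT


# Assumed example rules: two files treated specifically inside a directory,
# a rule for the rest of that directory, and a '$' rule for PDF files.
rules = [
    ("/articles/a.html$", "reserved"),
    ("/articles/b.html$", "reserved"),
    ("/articles/", "not-reserved"),
    ("/*.pdf$", "reserved"),
]

print(decide(rules, "/articles/a.html"))   # one of the two specific files
print(decide(rules, "/articles/c.html"))   # any other file in the directory
print(decide(rules, "/docs/report.pdf"))   # path ending in .pdf
print(decide(rules, "/about.html"))        # no rule matches

# File vs directory: without a trailing slash the pattern is a plain prefix.
print(matches("/articles", "/articles.html"))    # prefix also matches a file
print(matches("/articles/", "/articles.html"))   # directory pattern does not
```

In this sketch a pattern without a trailing slash behaves as a plain prefix, so it matches both a directory and any file whose name starts with the same characters; whether the proposal should behave this way is exactly the kind of detail still to be studied.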

Giulia: in robots meta (which specifies no index, no follow), does the rule apply only to the html content or also to the resources referenced in the html document?

Brendan, Laurent: not sure there is a definite robots specification for that. We’ll have to check. 

Brendan: See noimageindex in Google documentation at

Laurent: warning, this is Google specific information, aimed at avoiding ambiguities. Still not sure that the robots spec gives a definitive answer. 

Laurent: Therefore, do you think that the 3 solutions proposed so far are complementary and all 3 can be implemented by every TDM Agent in the world, or should we already make choices? 

Giulia: the first two solutions are generic (any media): their pros and cons are tied to the preferences of technical providers. The third one is vertical (html only) and therefore a bit different; its interest is that a publisher can embed the information without any involvement of a technical provider.

Laurent: as we have a vertical solution now (for html), other vertical solutions may be proposed. A warning: each time we add an alternative solution, we force EVERY TDM Agent to implement it.

Participants: all agree that the 3 current solutions are useful and complementary, and that they are so simple that every TDM Agent can implement the 3 solutions. We’ll need an order of priority in case of conflict. 

Giulia : should we also express the rightsholder name? the date of the declaration? 

Claudio : if I get a “don’t mine” instruction, I’d like to know who said that and why. Therefore such metadata would be useful.

Laurent: we have a charter to provide instructions to machines for TDM purposes; semantic information is out of scope here. If publishers want to provide metadata about their content, there are standard solutions for that (json-ld + on html content,  IPTC for images, XML for PDF and media content).

Next steps: 

a/ We’ll see if other proposals come to the table. 

b/ We will now start discussing the licence format: what do publishers need to declare, and which existing initiatives can be reused? Could we define standard licences ready for use? The experience of the IPTC with RightsML (based on ODRL2), EPC / Copyright Hub (also using ODRL2), STM Association, Crossref TDM service, ARDITO will be really useful for that.
