Minutes of the TPAC 2021 CG meeting
This meeting was held on Thursday 26 October 2021.
After a description of the work done so far (PDF available at http://www.edrlab.org/public/slides/tdmrep/TDMRep%20presentation.pdf), participants in the CG demonstrated the prototypes they had developed.
Prototypes:
- Jean-Baptiste de Vathaire (Cairn.info) described the integration of TDM properties on the Cairn website, using both HTTP headers and HTML metadata (both as a precaution).
- Robin Le Marois (Seraphin.legal) showed the results of TDMRep-compliant scraping of the Cairn.info website by a scraper they developed.
- Claudio Tubertini (Almalibri) introduced the Python utility he developed as a base for scrapers. You can find it at https://github.com/claudiotubertini/TDMproposal.
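To illustrate what a scraper has to do with a site that exposes TDM properties in both HTTP headers and HTML metadata, here is a minimal sketch. It is not the code shown by any participant. It assumes the property names `tdm-reservation` and `tdm-policy` as we understand them from the TDMRep specification; the function and class names are hypothetical, and letting headers override HTML metadata is an arbitrary choice made for this sketch, not a rule taken from the specification.

```python
from html.parser import HTMLParser

# Property names assumed from the TDMRep specification.
TDM_NAMES = ("tdm-reservation", "tdm-policy")


class TdmMetaParser(HTMLParser):
    """Collects <meta name="tdm-..." content="..."> values (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content")
        if name in TDM_NAMES and content is not None:
            # Keep the first occurrence of each property.
            self.found.setdefault(name, content.strip())


def read_tdm_properties(headers, html):
    """Return the TDM properties found in a response.

    headers: mapping of HTTP header names to values.
    html: the response body as a string.
    Headers override HTML metadata here (an assumption of this sketch).
    """
    parser = TdmMetaParser()
    parser.feed(html)
    props = dict(parser.found)
    for key, value in headers.items():
        if key.lower() in TDM_NAMES:
            props[key.lower()] = value.strip()
    return props
```

A scraper could call `read_tdm_properties(response_headers, response_body)` before mining a page and skip the page, or look up the policy, when `tdm-reservation` is `"1"`.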
The consensus is that the specification meets the requirements and is easy to implement, which are its main goals.
One remaining issue is the de-duplication of license requests from TDM Actors. If many resources are linked to the same policy, on what grounds should a TDM Actor send one request rather than multiple requests? Not every Publisher will set a `target` property in its TDM Policy. It seems to the participants that the URL of the TDM Policy should be used as an identifier, and therefore as a way to send a generic email stating a “request for license to mine a set of resources pointing at the TDM Policy with URL xxxx”, optionally with the list of resource URLs the TDM Actor is willing to mine. This needs to be clarified in a best practices guide.
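The de-duplication idea discussed above can be sketched as follows: group the resources by the URL of their TDM Policy and draft one request per policy. This is only an illustration of the approach suggested in the meeting, not an agreed best practice; the function names and the example URLs are hypothetical.

```python
from collections import defaultdict


def group_by_policy(resources):
    """Group resource URLs by TDM Policy URL.

    resources: iterable of (resource_url, policy_url) pairs, where
    policy_url may be None when no policy is declared.
    The policy URL is used as the identifier, as suggested in the meeting.
    """
    groups = defaultdict(list)
    for resource_url, policy_url in resources:
        if not policy_url:
            continue  # nothing to request a license against
        if resource_url not in groups[policy_url]:
            groups[policy_url].append(resource_url)
    return dict(groups)


def draft_request(policy_url, resource_urls):
    """Draft the text of one generic license request for one policy."""
    lines = [
        "Request for license to mine a set of resources pointing at "
        f"the TDM Policy with URL {policy_url}",
    ]
    if resource_urls:
        lines.append("Resources:")
        lines.extend(f"- {url}" for url in resource_urls)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example data.
    crawled = [
        ("https://example.org/a", "https://example.org/policy.json"),
        ("https://example.org/b", "https://example.org/policy.json"),
        ("https://example.org/c", None),
    ]
    for policy, urls in group_by_policy(crawled).items():
        print(draft_request(policy, urls))
```

With this grouping, a TDM Actor sends one message per distinct policy URL regardless of how many resources point to it.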
Progress:
The CG will stay open for feedback on the specification until the end of 2021, will update it as needed, and will then release the specification as a Final Report.
Communication:
The co-chairs and the FEP (Federation of European Publishers) will turn to the European Commission to plan demos or some other presentation of the work done so far, and thereby raise awareness of the project.
The CCC (Copyright Clearance Center) reports that WIPO is holding webinars every 2 weeks on copyright infrastructure and new developments. It is a good avenue for promoting our work.
We hope that Press Publishers will seize on the project, as they are among the organizations which will benefit most from the success of the solution.
The group feels that we should create a landing page introducing the project (and therefore certainly acquire a specific domain name), from which multimedia communication can be developed, such as an overview of the project and how-to guides for the 3 techniques we propose.
An incentive for TDM Actors could be to create a label (“Clean scraping”?) and a logo for those who adopt the solution.
We’ll have to discuss the governance of the project after the Final Report of the CG is released. Should it be created ex nihilo, or based on an existing organization?