17:05:00 <RRSAgent> RRSAgent has joined #publishingcg
17:05:04 <RRSAgent> logging to https://www.w3.org/2025/05/15-publishingcg-irc
17:05:45 <wolfgang> scribe: wolfgang
17:06:21 <wolfgang> gautier: Talk by Senthil (Ailaysa, Chennai) - we are taking notes
17:06:51 <gautierchomel> present+ miia
17:07:06 <wolfgang> senthil: will speak about the concept, then provide a demo, then Q&A
17:09:07 <wolfgang> ... Senthil Nathan from Ailaysa - AI company - content translation based on AI - taking content in different languages - international book fair in Chennai - introduced products into publishing - before mainly translation/localization - automatic translations using AI
17:13:06 <wolfgang> ... concepts: how to develop responsible content in an AI context - we cannot have walled days - great data rush for training AI systems without knowledge and permission of owners - awareness that quality content is very important for AI - quality data should come from publishers, media companies, research institutes - shifting to being active negotiators
17:15:45 <wolfgang> ... exclusion of content as training data - in case of use, responsible usage + permission needed - in 2024 people are actively discussing - should be a fair deal with proper compensation - illegal scraping was a big problem - is coming to an end - much more reduced now
17:17:55 <wolfgang> ... terms of permission are set by both parties - technical barriers can now be easily implemented - clear legal terms prohibiting use without limits - content watermarking and provenance tracking tools
17:20:31 <wolfgang> ... to include: fair licensing terms - mandatory source citations in AI output - quality control: selective participation with responsible AI companies - usage tracking: monitoring how content influences AI responses - consent frameworks: granular control over AI uses
17:21:16 <wolfgang> ... factors: technical, business, regulatory and market dynamics
17:23:03 <Michalis> Michalis has joined #publishingcg
17:23:19 <wolfgang> ... AI-specific exclusion protocols (better than robots.txt) - rise of new AI crawlers (require new blocking mechanisms) - dynamic paywalls and anti-scraping tech - emergence of content-tracking tools
17:24:55 <wolfgang> ... blockers (NYT, Guardian) vs. partners (Axel Springer with OpenAI) vs. open access (but seeking attribution) vs. wait-and-see
17:26:07 <wolfgang> ... EU: AI Act - US: considering legal framework - courses of copyright offices
17:28:23 <wolfgang> ... market: growing need for high-quality content - AI is not thinking, algorithmic, not creative - publishers see new revenue streams via partnerships - data brokers like literary agencies - syndication rights
17:29:22 <wolfgang> ... principle of fair monetization - important to track extent of usage and kinds of usage
17:31:45 <wolfgang> ... from authoring to reading: AI environment is set - book discovery enhanced through LLM recommendation and search systems - going beyond metadata and keywords: asking questions on the contents of the book (e.g. ChaiReader)
17:33:37 <wolfgang> ... future options: read book in another language such as Tamil thanks to automatic translation or as audiobook - in libraries, bookstores, schools use of books may be changed
17:34:58 <wolfgang> ... HarperCollins works with MS, also Sage, CUP
17:34:58 <wolfgang> ... have to find common ground between publishers and AI companies
17:46:50 <wolfgang> Demo ChaiReader: Reading, Chatting and Buying in one portal - multilingual Q&A - buy routine integrated - in future: book recommendations based on search terms - translation of a book into a target language
17:47:10 <Michalis> q+
17:48:39 <wolfgang> gautier: when I'm chatting with a book, answers come only from book content - LLM only used to prepare a nice answer - not training the LLM on each book
17:48:59 <wolfgang> senthil: completely separated
17:49:24 <wolfgang> ack Michalis
17:50:36 <wolfgang> michalis: concerned that access to content should be fair use - esp. in the US - next months will be critical for legal aspects
17:51:18 <wolfgang> senthil: big publishers have great interest - different for small publishers or even authors
17:51:47 <wolfgang> michalis: in education or academia this would be quite useful
17:54:57 <wolfgang> senthil: exactly - useful to explore several books in parallel to formulate an answer - we work with EDRLabs to improve on it - ChaiReader still in beta - working with publishers - can chat with a collection of books, not only one at a time - impact of "AI on economics" - reasoning capacity - more important than just referring back - great thing for book discovery
17:55:47 <wolfgang> ivan: aren't you forced to make some sort of ranking between the books consumed? - need a local ranking for the books you have
17:56:08 <wolfgang> senthil: possible to rank or categorize dependent on prompting
17:57:37 <wolfgang> vishal: the more correct the prompt, the more precise the answer will be - if 3 books have an answer - semantic ranking combined with keyword level ranking - still experimental feature - as Google and Amazon do
17:58:56 <wolfgang> ivan: in some cases this is not the best answer - in scholarly usage - ranking by systems outside your bookshop - based on reputation of answers - you use LLM only for niceties of input and output
18:00:23 <wolfgang> vishal: reinforcement learning - librarian knows the authors - DeepSeek uses this feature - integrate human expertise into machine
18:00:42 <wolfgang> senthil: good question
18:02:42 <wolfgang> rrsagent, draft minutes
18:02:43 <RRSAgent> I have made the request to generate https://www.w3.org/2025/05/15-publishingcg-minutes.html wolfgang
18:04:41 <wolfgang> rrsagent, make logs public
18:05:58 <gautierchomel> rssagent, set meeting title to publishing community group
18:06:46 <wolfgang> rssagent, set meeting title to "publishing community group"