Meeting minutes
gautier: Talk by Senthil (Ailaysa, Chennai) - we are taking notes
<gautierchomel> present+ miia
senthil: speak about the concept, then provide a demo, then Q & A
… Senthil Nathan from Ailaysa - AI company - content translation based on AI - taking content in different languages - international book fair in Chennai - introduced products into publishing - before mainly translation/localization - automatic translations using AI
… concepts: how to develop responsible content in an AI context - we cannot have walled days - great data rush for training AI systems without knowledge and permission of owners - awareness that quality content is very important for AI - quality data should come from publishers, media companies, research institutes - shifting to being active negotiators
… exclusion of content as training data - in case of use: responsible usage + permission needed - in 2024 people are actively discussing this - should be a fair deal with proper compensation - illegal scraping was a big problem - is coming to an end - much reduced now
… terms of permission are set by both parties - technical barriers can now be easily implemented - clear legal terms prohibiting use without limits - content watermarking and provenance tracking tools
… to include: fair licensing terms - mandatory source citations in AI output - quality control: selective participation with responsible AI companies - usage tracking: monitoring how content influences AI responses - consent frameworks: granular control over AI uses
… factors: technical, business, regulatory and market dynamics
… AI-specific exclusion protocols (better than robots.txt) - rise of new AI crawlers (require new blocking mechanisms) - dynamic paywalls and anti-scraping tech - emergence of content-tracking tools
… blockers (NYT, Guardian) vs. partners (Axel Springer with OpenAI) vs. open access (but seeking attribution) vs. wait-and-see
… EU: AI Act - US: considering legal framework - courses of copyright offices
… market: growing need for high-quality content - AI is not thinking, it is algorithmic, not creative - publishers see new revenue streams via partnerships - data brokers like literary agencies - syndication rights
… principle of fair monetization - important to track extent of usage and kinds of usage
… from authoring to reading: AI environment is set - book discovery enhanced through LLM recommendation and search systems - going beyond metadata and keywords: asking questions on the contents of the book (e.g. ChaiReader)
… future options: read a book in another language such as Tamil thanks to automatic translation, or as an audiobook - in libraries, bookstores, schools the use of books may change
… HarperCollins works with MS, also Sage, CUP
… have to find common ground between publishers and AI companies
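The talk named robots.txt as the baseline that AI-specific exclusion protocols aim to improve on. As a minimal sketch of that baseline only, the snippet below checks whether a crawler may fetch a page under a given robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content, the example URL and the helper name `is_allowed` are assumptions made for illustration; nothing here was shown in the talk.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher site (an assumption for this sketch):
# two AI crawlers are blocked, every other agent is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/books/1"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "SomeBrowser"):
        print(agent, is_allowed(agent, page))
```

robots.txt is advisory: it only works when the crawler chooses to honour it, which is one reason the talk also listed technical barriers, legal terms and tracking tools.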
Demo ChaiReader: reading, chatting and buying in one portal - multilingual Q&A - buy routine integrated - in future: book recommendations based on search terms - translation of a book into a target language
gautier: when I'm chatting with a book, answers come only from the book content - LLM only used to prepare a nice answer - the LLM is not trained on each book
Senthil: completely separated
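The separation described in this exchange (answers drawn only from the book, with the LLM used just to phrase them) is commonly done by retrieving passages first and passing only those to the model. The sketch below illustrates that general pattern under stated assumptions: the word-overlap retrieval, the prompt wording and the names `retrieve` and `build_prompt` are hypothetical and do not describe how ChaiReader is implemented.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case word set; a deliberately crude stand-in for real indexing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k passages sharing the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), i) for i, p in enumerate(passages)]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [passages[i] for _, i in hits[:top_k]]


def build_prompt(question: str, passages: list[str]):
    """Build a prompt restricted to book passages, or None if nothing matches."""
    context = retrieve(question, passages)
    if not context:
        return None  # nothing relevant in the book: do not let the model guess
    joined = "\n".join(f"- {p}" for p in context)
    return (
        "Answer using only the passages below.\n"
        f"Passages:\n{joined}\n"
        f"Question: {question}"
    )
```

In this arrangement the book text only ever appears in the prompt at question time, so no model weights are updated with the book's content.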
michalis: concerned that access to content should be fair use - esp. in the US - next months will be critical in legal aspects
senthil: big publishers have great interest - different for small publishers or even authors
michalis: in education or academia this would be quite useful
senthil: exactly, useful to explore several books in parallel to formulate an answer - we work with EDRLabs to improve on it - ChaiReader still in Beta - working with publishers - can chat with a collection of books, not only one at a time - impact of "AI on economics" - reasoning capacity - more important than just referring back - great thing for book discovery
ivan: aren't you forced to make some sort of ranking between the books consumed? - need a local ranking for the books you have
senthil: possible to rank or categorize depending on the prompt
vishal: the more correct the prompt, the more precise the answer will be - if 3 books have an answer - semantic ranking combined with keyword level ranking - still experimental feature - as Google and Amazon do
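One common way to combine semantic and keyword-level ranking is a weighted sum of the two scores. The sketch below assumes that approach; the weight `alpha`, the function names and the precomputed embedding vectors are all hypothetical, and the minutes do not say how the experimental feature actually combines the two signals.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that also occur in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_rank(query, query_vec, books, alpha=0.5):
    """Rank (title, text, vector) books by alpha*semantic + (1-alpha)*keyword.

    Vectors are assumed to come from some embedding model not shown here.
    """
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), title)
        for title, text, vec in books
    ]
    # sorted() is stable, so ties keep their input order
    return [title for _, title in sorted(scored, key=lambda s: -s[0])]
```

Raising `alpha` favours semantic closeness over exact word matches, and lowering it does the opposite.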
ivan: in some cases this is not the best answer - in scholarly usage - ranking by systems outside your bookshop - based on reputation of answers - you use LLM only for niceties of input and output
vishal: reinforcement learning - librarian knows the authors - DeepSeek uses this feature - integrate human expertise into the machine
senthil: good question