17:03:22 RRSAgent has joined #publishingcg
17:03:27 logging to https://www.w3.org/2025/07/17-publishingcg-irc
17:03:37 Gautier: part of a series on AI in reading systems (RS)
17:04:13 ... today Daniel Weck, lead developer of Thorium (reading system from EDRLab)
17:04:20 George has joined #publishingcg
17:04:26 present+ wolfgang
17:04:36 present+
17:05:07 DanielWeck has joined #publishingcg
17:05:14 present+
17:06:33 George has joined #publishingcg
17:10:20 DanielWeck: AI-generated content descriptions in Thorium - unreleased experiment, thus work in progress - lays foundations for user experience we want to follow - demo: two-page spread with image of Jules Verne with description - inspect the HTML
17:13:33 ... image has empty @alt and empty title - link takes me to an appendix where the image is displayed - but here the @alt is not empty - it says "linked image"
17:16:56 ... would be great if we had some help from AI - Thorium has a zoom feature - leads to room for textual description - can choose an LLM to generate description
17:23:24 ... decide between "short" or "extended" description - I can edit the system prompt, but there is a default system prompt - Gemini very good at recognizing people in images - extended description would have two paragraphs - advanced view of system prompt in JSON format - additional information in prompt in this format
17:26:44 ... select text from answer - run a search on the Internet to get more information
17:29:20 ... new work from W3C WG - complex image (bar chart) - link to extended description - rich text that is not part of a short description - plan to create a modal interface where you might consult AI
17:30:09 present+
17:30:22 present+ vladimir
17:30:40 present+ james
17:31:24 present+ ori
17:31:45 (1) user sees descriptions (2) chat with AI (3) do further research on the web - familiar chat UI - modal overlay - default system prompt which sets useful boundaries - we also feed in metadata
17:32:48 ... request short or extended descriptions easily - just "one shot" - we need to inform the user that an AI will hallucinate
17:35:02 ... MCP (Model Context Protocol) for tool calls is out of scope - RAG also not implemented - beyond basic embedding - also not local LLMs - response times OK, but not the quality - Gemini better for image descriptions
17:36:57 ... you may give metadata as embedded context for the prompt - advanced user may edit the system prompt and might remove blatantly irrelevant metadata
17:37:15 q+
17:38:08 George: publishers are not happy with AI getting trained on their copyrighted materials. Any protections?
17:40:53 Daniel: All conversations in the chat with AI are used for training if I don't pay for using the LLM - If I were to pay for the service, the data remain private - always depends on the terms and conditions of a particular model - for publishers, the TDM Reservation Protocol allows them to opt in or out - Thorium would respect this
17:41:21 ... any ideas how that could be solved?
17:41:52 George: if image is not used for training, publishers are OK with that.
17:42:43 Daniel: Thorium would have to police the use of data by an LLM - Would Thorium have to blacklist some models?
17:45:03 James: Publishers are very twitchy about copyrighted material - with an EPUB you can mark the TDM or place a couple of meta tags - 6 or 7 different ways to signal that training is not accepted - training is an issue
17:45:16 ... on-device LLMs would be helpful
17:47:38 Daniel: Publishers don't want RS to create friction - with image copying and text scanning, it's so easy to do (e.g. on a Mac) - we have to send the image to the AI, but can't control what the LLM will be doing with it
17:48:05 James: could a publisher embed a token?
17:50:14 Daniel: agreement with Mistral - access token for EDRLab - could run on a Thorium server - but Thorium doesn't transport the key itself - but uses it in accessing the LLM to answer users' requests
17:52:12 Ori: if you are using the user's API key, you can't know what the AI does with it - Gemini say they don't use it for training, no idea what OpenAI does - using another key is problematic
17:52:59 ... Gemini doesn't feed requests for image descriptions to humans
17:54:42 Daniel: main stumbling block: potential legal issues - we could enable it in nightly builds, but not in production builds
17:55:07 George: JPEG has metadata in it - is that transmitted?
17:56:32 Daniel: in FB Messenger or Signal I check that GPS data is erased before I share pictures - with AI, once the image payload is transmitted, it will be readable by the AI
17:56:48 Ori: guess it will not ingest geographical data
17:58:07 Daniel: most LLMs have restrictions - in Thorium we don't create requests for LLMs manually - we feed image data into an abstraction interface
17:59:05 ... abstraction layer is fully client-side - it allows us to speak JavaScript
17:59:35 Ori: had to reduce size of image - don't send EXIF or geographical data
18:00:31 Daniel: images processed before sending them on the wire - reduction in size before sending
18:03:39 Gautier: WCAG criteria: description must offer same service as the image - a way to fulfil this - focus on authored description (if available) - real success for WCAG requirement
18:03:53 rrsagent, draft minutes
18:03:54 I have made the request to generate https://www.w3.org/2025/07/17-publishingcg-minutes.html wolfgang
18:04:10 rrsagent, make logs public