17:03:22 <RRSAgent> RRSAgent has joined #publishingcg
17:03:27 <RRSAgent> logging to https://www.w3.org/2025/07/17-publishingcg-irc
17:03:37 <wolfgang> Gautier: part of a series on AI in reading systems (RS)
17:04:13 <wolfgang> ... today Daniel Weck, lead developer of Thorium (reading system from EDRLab)
17:04:20 <George> George has joined #publishingcg
17:04:26 <wolfgang> present+ wolfgang
17:04:36 <George> present+
17:05:07 <DanielWeck> DanielWeck has joined #publishingcg
17:05:14 <DanielWeck> present+
17:06:33 <George> George has joined #publishingcg
17:10:20 <wolfgang> DanielWeck: AI-generated content descriptions in Thorium - unreleased experiment, thus work in progress - lays foundations for the user experience we want to follow - demo: two-page spread with image of Jules Verne with description - inspect the HTML
17:13:33 <wolfgang> ... image has empty @alt and empty title - link takes me to an appendix where the image is displayed - but here the @alt is not empty - it says "linked image"
17:16:56 <wolfgang> ... would be great if we had some help from AI - Thorium has a zoom feature - leads to room for textual description - can choose an LLM to generate description
17:23:24 <wolfgang> ... decide between "short" or "extended" description - I can edit the system prompt, but there is a default system prompt - Gemini very good at recognizing people in images - extended description would have two paragraphs - advanced view of system prompt in JSON format - additional information in prompt in this format
17:26:44 <wolfgang> ... select text from answer - run a search on the Internet to get more information
17:29:20 <wolfgang> ... new work from W3C WG - complex image (bar chart) - link to extended description - rich text that is not part of a short description - plan to create a modal interface where you might consult AI
17:30:09 <gautierchomel> present+
17:30:22 <gautierchomel> present+ vladimir
17:30:40 <gautierchomel> present+ james
17:31:24 <gautierchomel> present+ ori
17:31:45 <wolfgang> (1) user sees descriptions (2) chat with AI (3) do further research on the web - familiar chat UI - modal overlay - default system prompt which sets useful boundaries - we also feed in metadata
17:32:48 <wolfgang> ... request short or extended descriptions easily - just "one shot" - we need to inform the user that an AI will hallucinate
17:35:02 <wolfgang> ... MCP (Model Context Protocol) for tool calls is out of scope - RAG also not implemented - beyond basic embedding - also not local LLMs - response times OK, but not the quality - Gemini better for image descriptions
17:36:57 <wolfgang> ... you may give metadata as embedded context for the prompt - advanced user may edit the system prompt and might remove blatantly irrelevant metadata
17:37:15 <George> q+
17:38:08 <wolfgang> George: publishers are not happy with AI getting trained with their copyrighted materials. Any protections?
17:40:53 <wolfgang> Daniel: All conversations in the chat with AI are used for training if I don't pay for using the LLM - If I were to pay for the service, the data remain private - always depends on the terms and conditions of a particular model - for publishers, the TDM Reservation Protocol allows opting in or out - Thorium would respect this
17:41:21 <wolfgang> ... any ideas how that could be solved?
17:41:52 <wolfgang> George: if image is not used for training, publishers are OK with that.
17:42:43 <wolfgang> Daniel: Thorium would have to police the use of data by an LLM - Would Thorium have to blacklist some models?
17:45:03 <wolfgang> James: Publishers are very twitchy about copyrighted material - with an EPUB you can mark the TDM or place a couple of meta tags - 6 or 7 different ways to signal that training is not accepted - training is an issue
17:45:16 <wolfgang> ... on-device LLMs would be helpful
17:47:38 <wolfgang> Daniel: Publishers don't want RS to create friction - with image copies and text scanning, it's so easy to do (e.g. on a Mac) - we have to send the image to the AI, but can't control what the LLM will be doing with it
17:48:05 <wolfgang> James: could a publisher embed a token?
17:50:14 <wolfgang> Daniel: agreement with Mistral - access token for EDRLab - could run on a Thorium server - but Thorium doesn't transport the key itself - but uses it in accessing the LLM to answer users' requests
17:52:12 <wolfgang> Ori: if you are using the user's API key, you can't know what the AI does with it - Gemini says they don't use it for training, no idea what OpenAI does - using another key is problematic
17:52:59 <wolfgang> ... Gemini doesn't feed requests for image descriptions to humans
17:54:42 <wolfgang> Daniel: main stumbling block: potential for legal issues - we could enable it in nightly builds, but not in production builds
17:55:07 <wolfgang> George: JPEG has metadata in it - is that transmitted?
17:56:32 <wolfgang> Daniel: in FB Messenger or Signal I check that GPS data is erased before I share pictures - with AI, once the image payload is transmitted, it will be readable by the AI
17:56:48 <wolfgang> Ori: guess it will not ingest geographical data
17:58:07 <wolfgang> Daniel: most LLMs have restrictions - in Thorium we don't create requests for LLMs manually - we feed image data into an abstraction interface
17:59:05 <wolfgang> ... abstraction layer is fully client-side - it allows us to speak JavaScript
17:59:35 <wolfgang> Ori: had to reduce size of image - don't send EXIF or geographical data
18:00:31 <wolfgang> Daniel: images processed before sending them on the wire - reduction in size before sending
18:03:39 <wolfgang> Gautier: WCAG criteria: description must offer same service as the image - a way to fulfil this - focus on authored description (if available) - real success for WCAG requirement
18:03:53 <wolfgang> rrsagent, draft minutes
18:03:54 <RRSAgent> I have made the request to generate https://www.w3.org/2025/07/17-publishingcg-minutes.html wolfgang
18:04:10 <wolfgang> rrsagent, make logs public