12:57:36 RRSAgent has joined #pmwg
12:57:41 logging to https://www.w3.org/2026/05/14-pmwg-irc
12:57:41 RRSAgent, make logs Public
12:57:42 Meeting: Publishing Maintenance Working Group
12:57:43 present+
12:58:03 ivan has changed the topic to: Meeting Details 2026-05-14: https://lists.w3.org/Archives/Public/public-pm-wg/2026May/0012.html
12:58:04 Chair: wendy
12:58:04 Meeting: Publishing Maintenance Working Group Telco
12:58:04 Agenda: https://lists.w3.org/Archives/Public/public-pm-wg/2026May/0012.html
12:58:05 regrets+ brady
12:58:48 present+
12:58:57 present+ sueneu
12:59:23 present+ avneeshsingh
12:59:31 MasakazuKitahara has joined #pmwg
12:59:54 present+
13:00:02 AvneeshSingh has joined #pmwg
13:00:06 present+
13:00:16 present+
13:00:17 kimberg has joined #pmwg
13:00:21 present+
13:00:32 present+ toshiakikoike
13:00:41 sueneu has joined #pmwg
13:00:48 present+
13:01:01 DaleRogers has joined #pmwg
13:01:25 present+ DaleRogers
13:01:29 chair: sueneu
13:01:45 scribe+
13:03:23 Topic: AI and its impact on our work
13:03:50 https://github.com/w3c/webai-roadmap/issues/30
13:03:51 sueneu: Today our discussion is about AI and its impact on our work
13:04:20 ... to sum it up, there is a cross-W3C effort to address AI, they are looking for places to collaborate and work together
13:04:35 ... we've been asked how current AI technologies may impact our work
13:04:40 ... [reading from issue]
13:05:01 ... given that we only have an hour, we need to stay focused
13:05:09 ... I thought we could start by throwing out ideas
13:05:17 ... by typing them directly into IRC
13:05:37 ... people can contribute in different ways, type your own comment, then we can make sure everyone is heard
13:05:52 CharlesL has joined #pmwg
13:05:55 ... then we can look at them, pick some of our favourites, and discuss those to share in the issue
13:05:58 present+
13:06:28 sueneu: I'll start, mention the trend and a short description of the impact
13:07:13 ivan: automatic on-line translations
13:07:25 @sueneu: improved artificial voices, may need more markup
13:07:46 ivan: read-aloud may need additional markup
13:08:06 DaleRogers: AI assisted EPUB coding that creates validation errors
13:08:11 wendyreid: More metadata properties or values to communicate use of AI within publications so consumers/retailers can make informed decisions
13:08:54 toshiakikoike: content for AI training, prevent unauthorized use
13:09:06 CharlesL: AI generated alt text for images and long descriptions
13:09:27 ivan: metadata to assess whether the author is AI
13:09:42 1. Challenges for the publishing industry (copyright and content protection)
13:09:42 A major ongoing concern in the publishing industry is how to handle or prevent unauthorized use of content for AI training. Legal and ethical concerns around AI-generated content, including potential infringement, remain significant and continue to affect the overall ecosystem.
13:09:42 2. Opportunities for tools and reading systems (accessibility and efficiency)
13:09:42 From the standpoint of authoring tool developers and Reading System (RS) vendors, AI also offers clear benefits that may influence technical priorities:
13:09:42 Authoring tools: Assisting creators by generating or suggesting alternative text for images.
13:09:44 Reading systems: Improving accessibility by enabling text-to-speech or descriptive output even when alt text or metadata is missing.
13:09:44 Development: Accelerating implementation of publishing tools and standards through AI-assisted development.
13:10:10 wendyreid: Facilitating dynamic or personalized layouts in EPUB reading systems through generative UI
13:10:49 sueneu: Anyone else? Let's sum these up
13:12:20 ... we have improved artificial voices in read aloud, validation errors caught with AI-assisted coding, metadata to identify AI, AI generated alt text, copyright and content protection, AI training
13:12:38 ... I'm not sure what "opportunities for tools and reading systems" means, do we mean new capabilities within reading systems?
13:12:56 q+
13:13:01 ack ivan
13:13:31 ivan: I think the interesting question for the working group is how these different things affect, or potentially affect, the EPUB specification per se
13:14:03 ... there are some things that happen that we have no influence over, but there are a number of entries here where we may want to put some effort into features that will help improve what AI gives us
13:14:04 One main thing is metadata to indicate that parts of the content are AI generated. I think this is clearly in the EPUB domain
13:14:30 ... for example, in translations, a future where the RS automatically translates the text, an English book translated into French
13:14:45 ... are there some additional things we need to put in EPUB that would improve translations?
13:15:13 ... there is already a specification called ITS we allow in EPUB 3.4 to help translation, but that may not be good enough, it may make the input unwieldy
13:15:31 ... do you need to flag what should and should not be translated, or acronyms or idioms
13:15:49 sueneu: What form do you think would be most helpful to the IG?
13:16:11 ivan: The question is for each WG, if all the AI tools were already widespread, what would our specification look like?
13:16:36 ... there are things we may not care about because it's solved, or where we need to offer assistance to improve the experience
13:16:38 q+
13:17:44 scribe+ sueneu
13:17:27 wendyreid: an example might be how do we structure documents for AI content?
13:17:54 …for instance in FXL text and images have to be properly specified and structured
13:18:11 …to support on the fly generative content for accessibility
13:18:27 q?
13:18:47 ack wendyreid
13:18:57 sueneu: Can I ask another question? You know W3C at large
13:19:14 q+
13:19:16 ... specifically for AI generated voices and artificial speech, not just EPUB but others
13:19:28 ... we should include it in our comments even if it isn't our problem
13:19:33 ivan: Exactly
13:19:41 ack AvneeshSingh
13:20:08 AvneeshSingh: APA already has a taskforce for TTS and markup for pronunciations, there are others conscious of the increasing use of TTS engines
13:20:22 ... of course in DAISY we're using these engines because the quality is so much better
13:20:37 q+
13:20:40 ... at TPAC we'll be having a discussion with them, especially for the publishing industry and markup
13:20:44 ack ivan
13:21:12 ivan: That gives me a question, we know we have the PLS, is it in line with such an approach to create a lexicon for translations?
13:21:26 ... well known acronyms or terms that should not be translated for a specific publication
13:21:52 AvneeshSingh: I think so, it would help, like "read" (reed) or "read" (red)
13:22:07 ... any technology to reduce ambiguity, AI needing to do less sense-making
13:22:13 ... anything to assist
13:22:22 sueneu: That sounds amazing, let's put that aside for other issues
13:22:40 ... we talked about read aloud, document structure, metadata
13:22:43 ... AI coding errors
13:23:24 ... we'll vote on the most popular topics, discuss those, and if we go over time we can find additional ways to continue the discussion
13:23:24 - AI coding errors
13:23:58 - Metadata to identify AI content
13:24:06 +1
13:24:06 +1
13:24:06 +1
13:24:08 +1
13:24:12 +1
13:24:18 +1
13:24:25 +1
13:24:30 - AI generated alt text for images
13:24:33 +1
13:24:35 +1
13:24:37 +1
13:24:51 AvneeshSingh: Is that separate?
13:24:56 +1
13:24:57 ivan: I believe so
13:25:00 +1
13:25:00 +1
13:25:29 - copyright infringement/ai training
13:25:34 +1
13:25:42 +1
13:25:50 +1
13:25:59 +1
13:26:04 +1
13:26:05 +1
13:26:13 - realtime translations
13:27:06 DaleRogers: Metadata is 7 votes, AI generated alt text is 6, Copyright is 6 votes
13:27:35 subtopic: Metadata for AI content
13:28:49 See also this mail sent to the CG recently: https://lists.w3.org/Archives/Group/group-pm-wg-chairs/2026May/0001.html
13:27:35 wendyreid: we already have an issue in our repo about this, but it's important to
13:28:10 …crosswalk EPUB metadata with vendors to identify AI content
13:28:41 …because there are a lot of skeptical views and hostility toward AI
13:29:01 …so people can make informed decisions about how they engage with AI content
13:29:04 q+
13:29:27 …right now it is opaque, you can't tell what AI might have generated in a book
13:29:34 ack ivan
13:29:49 ivan: Great points, an additional area where this becomes more critical is scientific publishing
13:30:02 ... where submissions to publications may come entirely from AI
13:30:39 ... that contradicts the ethics and practices of scientific publishing, but on the other hand, using AI to improve your publication might be acceptable, especially when facing linguistic challenges
13:30:47 ... most publications are in English for example
13:30:59 ... making it clear to reviewers what parts use AI
13:31:05 ... absolutely necessary
13:31:14 ... that needs metadata added in one way or another
13:31:41 ... one more thing, this is mostly really a publishing problem, not a web problem
13:31:45 q+
13:31:49 ... the web might be much less sensitive to these issues
13:31:51 q+
13:31:55 ack DaleRogers
13:32:44 DaleRogers: That can get really complicated, for example I can write 8 pages of content that is my work, I can ask NotebookLM to only look at my resources, I can ask if it can find the pattern of my work, that is AI-assisted writing
13:32:56 ... but it's only on my content, or is it just a tool like a grammar checker?
13:32:58 q+
13:33:02 ack wendyreid
13:33:07 ack wendyreid
13:34:10 q+
13:35:02 wendyreid: this could be important, for example, social media platforms are using AI to find AI content, and people are saying they are being accused of using AI when it was their own content.
13:35:40 …adding metadata for AI so we can add more nuance to how it is used
13:35:43 ack ivan
13:36:10 ivan: Coming back to what Dale said, the example of "write my summary" is a perfect example, something scientific writers may need to do, and might do
13:36:28 ... we are not in a position to judge the merit of using these tools, eventually the scientific community will need to do that
13:36:36 ... they are having those conversations now.
13:36:50 ... our job is to provide the mechanism for this information being provided to the community
13:37:08 ... right now there is no way for an author to make clear what tools they used and how
13:37:31 ... the community can decide what to do with that, we can provide a means to help them disclose that
13:37:38 ack AvneeshSingh
13:37:56 AvneeshSingh: One thing there, the Business Group is an important entity to be involved, not only technical people can decide on this
13:38:22 q+
13:38:24 ... we see how Copilot is integrated in every Microsoft product, people need to decide what the threshold is to being "ai-generated"
13:38:35 ... 70-80% of content will be assisted in some way
13:41:15 DaleRogers: I initially wanted to respond to what Avneesh said, how AI is just in everything, like in Photoshop where it's now integrated, and what if Photoshop created something, or an illustrator made something with AI in it
13:41:43 ... now, if they were directing AI and human-made elements, and AI made elements, the nuance needed to determine things is very complicated
13:38:35 subtopic: copyright infringement/ai training
13:38:43 sueneu: Let's discuss copyright next
13:38:53 ... people in publishing seem to be very interested in this
13:39:29 toshiakikoike: I shared my thoughts in IRC:
13:39:42 ... A major ongoing concern in the publishing industry is how to handle or prevent unauthorized use of content for AI training. Legal and ethical concerns around AI-generated content, including potential infringement, remain significant and continue to affect the overall ecosystem.
13:40:11 q+
13:40:26 sueneu: Do we need to solve this or provide tools for it?
13:42:00 ack ivan
13:42:29 ivan: It's a pity that Laurent is not here, the TDM work is evolving towards marking up things that are a signal to LLMs whether content should be used for training or not
13:42:43 ... there was work on search engines and crawlers, and TDM started from that approach
13:42:51 ... in the meantime we have LLMs now too
13:43:16 ... for our work, we need clear ways to include metadata about whether a publication or parts of a publication can be exposed to LLM training.
13:43:39 ... adding global metadata is relatively easy via the package.opf document, but more granular metadata is more complex, do we want to add that granularity?
13:43:41 q+
13:43:46 ... per chapter or image or video
13:44:02 ... we need a granularity our current metadata structure is not perfectly aligned with
13:44:04 ack sueneu
13:44:39 sueneu: Publishing hat, this could be particularly important for publishers, sometimes books are made up of content from different places, photographs, stock photos, we'd need that granularity on a book by book basis
13:44:44 ivan: Or resource by resource
13:44:46 q+
13:44:52 ack wendyreid
13:44:52 ack wendyreid
13:47:06 q+
13:47:39 wendyreid: This is an important issue for us to highlight, because there are other groups working on the twin problems of AI content and fair use, and we need to be in alignment with them
13:47:58 q-
13:47:39 subtopic: AI generated alt text for images
13:48:05 sueneu: AI alt text
13:48:13 q+
13:48:17 ack ivan
13:48:50 ivan: I think Charles raised this, there are 2 things here, one is a requirement to create alt text for an image in my publication and I can use AI to do this
13:49:09 q+
13:49:25 ... the other possible scenario is where I publish a book or webpage and I don't include alt text, and AI tools can come in to create the alt text for the user on the fly
13:49:26 q+
13:49:31 ack AvneeshSingh
13:49:32 ... which scenario are we looking at?
13:49:43 ack CharlesL
13:50:07 CharlesL: On that point specifically, this is something I brought up a long time ago, someone made a tool to make alt text descriptions to add them to webpages
13:50:22 ... I was concerned because how do you know whether the image is informative or decorative
13:50:48 ... went into a rabbit hole of ARIA roles and `alt=""`, the standards for decorative
13:51:04 ... having `role="presentation"` and `alt=""` is a clear signal
13:51:22 ... though you only need `alt=""`, adding the role is a sure sign
13:51:26 ... that was my main point
13:51:43 ... knowing when images get described by AI is important too, for clarity
13:52:09 ... FYI, we're creating an ISO standard for remediated content, adding accessibility metadata, about to publish something for review, we talk about quality and AI there too
13:52:50 AvneeshSingh: There are two parts, the content side, image descriptions created by AI and added to a publication, we have a document on DAISY on it
13:53:00 ... if a human is in the generation process, they take responsibility
13:53:33 ... a difference between the front and backlist as well, but the use case Charles mentions is broader than publishing, but applies to us too
13:53:49 ... Readium has this as a feature, as does Narrator, it's a very broad discussion we should have
13:53:54 q+
13:53:57 ack sue
13:54:01 ack sueneu
13:54:18 q+
13:54:37 sueneu: Avneesh you bring up an interesting point, the copyright issue of on-the-fly image descriptions, it could be inaccurate enough to not be representative of what the author intended
13:54:45 ack ivan
13:55:26 ivan: Let me be a bit intentionally provocative, let's say the problem Susan mentioned doesn't exist, on the fly alt text is perfect, and metadata added to the image directs the AI properly, something like that
13:56:07 ... on the other hand, at the moment we've had many discussions on images, we are very strong on requiring accessibility data for books; we had discussions whether we want images in the spine for example (which creates accessibility problems); if the AI tools improve, does that make this discussion moot?
13:56:29 ... i.e., an image in the spine is automatically accessible because there are tools that do that automatically
13:56:35 q+
13:56:41 ... does that change what we say, or the emphasis, in the EPUB specification?
13:56:43 q+
13:56:49 ack DaleRogers
13:57:00 DaleRogers: I create comics, one image per page, lots going on
13:57:26 ... to decrease the amount of time to make it accessible, I will use AI to help me describe it, and sometimes I tweak it
13:57:51 ... but if I have gone through the work to make sure it is adequate and the intent is expressed, I don't want AI to ignore what is already there because I've done that work
13:58:01 ... if we could have something to prioritize existing content
13:58:20 ack wendyreid
13:58:25 ... I don't know how we can fight with AI on this, this always gets back to author intent and reading system behaviour with AI on top
13:59:03 wendyreid: A big thing to consider is the potential power of AI to make things more accessible, there might be a future where this works
13:59:41 …but now, this isn't a consistent experience. One person gets well written alt text, but the next person doesn't, and there is no control over it
14:00:07 …the only control we have is if publishers supply the alt text, even if it is written by AI, as long as a human has reviewed it.
14:00:32 …automated tools may someday be good enough but right now we have to push for human involvement
14:00:47 sueneu: What is the next step?
14:01:08 q+
14:01:11 ivan: I will go through the minutes and add that to the issue, I'll clean it up a bit
14:01:32 ... question for me, do we want to pick this up another time with more people present to get more perspectives
14:01:44 ... I will contact Dom about this once the minutes are on the issue
14:01:57 ... I definitely think we should talk about this again
14:02:04 sueneu: Thanks everyone!