Publishing Business Group Conference Days: Generative AI Influences on Publishing

First day

Presentation by Peter Brantley

liisamk: We'll start with Peter Brantley

peter: I have learned a lot from preparing for this
… my diverse background working for libraries and other areas of publishing
… my talk is Aiding and Abetting: The AI That Came to Dinner
… foundation points or assertions - AIs are more engines than factories
… better to think of generative AIs as services, bundles of ways of interacting
… conversational AIs like ChatGPT are one of the first ways that the public became aware
… can be invoked via APIs or plug-ins
… AIs create stress on the form of media
… today's art forms are isolates, but AIs produce synthetic forms of media
… format plasticity - art from text, video from text, text from images
… format injection - breaks down silos between media
… AI interwoven - example of how these things can work together
… university seminar in French revolutionary literature with resources
… students can study and annotate across all these titles
… AI can help with translation, marginalia, can seek video for contextualization, may flag insightful annotations
… consider literature as it is read, literature as it is published, then the marketplace
… think first about reading and how AI affects published work
… AI-ready bookstore - reading is an all-hands activity
… authors can suggest integration points for AI in their content
… publishers can embed AI services into platforms
… readers can select AI as they read
… can help in the book as a media object or in the platform as a data service
… in book examples - language translation, contextualization, enhancements, dynamic format shifting
… many have thought about reader-involved story elements, make this story more about this rather than that
… in-platform examples - for the reader, the benefit of bookshelves as a database
… for a publisher, competitive advantage for subscriptions and aggregation of content
… reader can peruse across in innovative ways
… platform-based AI provides refinable, conversational recommending services
… collaborative multi-lingual social interactions between readers, authors, reviewers
… publishing as a series of workflows
… consider these as a taxonomy:
… extract - build indexes, extract key terms and phrases, build smart indexes
… non-fiction, document citations as notes
… validate - check grammar, language, dialect, look up citations, ensure rights attribution, check facts
… generate- marketing and promotion, front and back matter, long form conversations like film scripts
… analyse - supports acquisitions by assessing fit within the frontlist or backlist, solicitations to augment the catalog
… editorial functions to determine readability or cultural sensitivity and cohesive stories
… reformat - produce adaptive content for various forms
… discover - suggest comps across a broader literature
… translate - languages, audiobooks for different global regions, express equations in text
… beware introduced bias!
… industry impacts - will be impacts beyond readers and workflows
… questions: who engineers AI integrations? who gets access to user data that is generated?
… will functions of distributors expand or alter?
… what services do they provide?
… what happens to W3C web publication standards? what does EPUB mean?
… how can we optimize for integrations? metadata changes? triggers in books? what complex IP rights assertions?
… "intelligence" is not wholly human intelligence - how unique are we as contributors to the world
… we have introduced AIs and are just on the verge of a variety of kinds of outcomes

catherine: should human creativity be treated differently from computer generated?

peter: it is hard to draw a distinction between the two and whatever guidelines we create we need to be flexible

wendyreid: my research shows there is a context problem, particularly with translation. how do you factor in failure of context
… and where it is important to understanding?

peter: your question comes minutes after Meta announces translation for thousands of languages
… there are 7000 languages on the planet
… there is difficulty with translation and context, but it is increasingly within reach with dialogues and models
… in medicine, demonstrated empathy
… language itself is stochastic and not governed by a hard set of rules, and grammars always break down
… grammars fail in practice, and this is an area where AI is strong

<Zakim> tzviya, you wanted to ask about amplifying bias

tzviya: one concern is that the training data can be biased and gets amplified. don't know how it is trained
… if we are working on the assumption that the data is good, we are flawed
… how are our materials becoming part of that bias

peter: it is good data, but it is human data and it can create bias
… OpenAI and others are trying to constrain the bias and injections
… people are increasingly aware
… there is a race between current models and locally trained models for specialized areas and focused training
… will become more readily available and more useful

IP, Rights and Privacy related to Generative AI - Catherine Stihler, CEO of Creative Commons

See slides.

cs: IP, rights and privacy related to generative AI
… something really changing
… our conversation with AI
… AI brings significant opportunities
… and challenges
… over the past 20 years CC licenses have become the global standard
… for sharing content for creators
… 2.5 billion works shared via CC license
… we've advocated for people's ability to build on copyrighted works
… we have seen ways in which CC-licensed content was incorporated into facial recognition, etc.
… we see examples in privacy regulations, e.g., GDPR
… AI is a global phenomenon
… what to do is not entirely clear
… something new for AI license
… share the desire to enable creators to see how their works are used
… as regulation taking time
… we could possibly decide some of the issues
… creative ecosystem that works well for everyone
… exact form for the solution
… really important to see how public interest in this debate
… solve the challenge that we see with the inputs
… and what's next?
… global summit in October
… allow more equitable conversations
… support better sharing of creative content

lmk: any questions?

ts: how to join the mailinglist?

cs: please send an email to me
… I can help with answers
… feel free to contact me

ds: tx

cs: pleasure

ds: would like you to speak Japanese too

cs: could do that using AI :)

TDM Protocol and Machine Learning for Training Generative AI - Laurent LeMeur, Director and CTO of EDRLab

See slides.

ll: will send the slides later
… [slide 1]
… What is TDM?
… text and data mining
… [slide 2]
… Directive on Copyright in the Digital Single Market -> DSM
… article 3 / article 4
… [slide 3]
… Articles 3 and 4 in brief, for publishers
… [slide 4]
… Article 4 in brief, for TDM actors
… the situation is not yes/no
… [slide 5]
… TDM vs AI
… most AI solutions analyze large datasets as a first step
… in digital form
… [slide 6]
… The Commission recognises...
… [slide 7]
… Different ways to opt-out
… there are different ways
… one is blocking bot access
… research bot can be blocked
… another is embedded metadata
… C2PA, IPTC, etc.
… different types of metadata
… third is what we are talking about today
… indicating the decision to opt-out
… [slide 8]
… Illegal scraping -> legal action
… 2nd/3rd solutions don't block the crawlers
… [slide 9]
… How can I know that my content is used?
… [slide 10]
… TDM Reservation Protocol / Our Goals
… [slide 11]
… W3C CG
… created in Feb 2021
… [slide 12]
… What is a CG?
… [slide 13]
… Expressing if TDM rights are reserved
… keep it simple stupid
… one property, 2 values
… optional web link
… [slide 14]
… One protocol, Three techniques
… preferred one is including HTTP headers
… but not always possible
… [slide 15]
… Example, centralized JSON file
… (example code)
… you get a policy to find information
… how to contact the publishers, etc.
… third part is Web site
… [slide 16]
… Example, http header
… (sample info of HTTP header)
… [slide 17]
… Example, html header
… [slide 18]
… Licensing policy
… optional feature
… [slide 18]
… Example, licensing policy
… simple profile of the ODRL2 format
… (example code)
… [slide 19]
… And now?
… what are we doing?
… finalizing the CG Note
… tdm-reservation-protocol
… tdmrep
… there will be new round of discussion
… do we need to add any AI use cases?
… specific use of the content
… questions?
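
The reservation signal described on slides 13 to 16 (one property, two values, an optional policy link, preferably carried in HTTP headers) could be read by a crawler roughly as follows. This is a minimal sketch under stated assumptions, not the CG's reference code: the header names `tdm-reservation` and `tdm-policy` are assumed from the CG's draft and are not recorded in these minutes, treating a missing header as "not reserved" is a simplification made for this sketch, and `read_tdm_reservation` and the example URL are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def read_tdm_reservation(headers: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (rights_reserved, policy_url) from HTTP response headers.

    Assumed header names (not taken from the minutes):
    "tdm-reservation" and "tdm-policy".
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # One property, two values: "1" is read as "TDM rights reserved".
    # Anything else, including an absent header, is read as "not reserved"
    # in this sketch.
    reserved = lowered.get("tdm-reservation") == "1"

    # The optional web link points at a policy; it is only reported
    # when rights are actually reserved.
    policy = lowered.get("tdm-policy") if reserved else None
    return reserved, policy


if __name__ == "__main__":
    example_headers = {
        "TDM-Reservation": "1",
        "TDM-Policy": "https://publisher.example/policies/tdm.json",  # hypothetical URL
    }
    print(read_tdm_reservation(example_headers))
```

Working on a plain mapping keeps the sketch independent of any particular HTTP client; a crawler would pass in the response headers it already has before deciding whether to mine the resource.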

lmk: kinds of opt-outs
… different models?

ll: add different values
… there are various opinions
… e.g., AI training is not part of TDM
… not embedding or specialization
… TDM is handled by different initiative

lmk: should have a standard that has implementations
… are there ready-made AI solutions?

ll: there are some initiatives
… but segmentations so far
… so many initiatives there

ts: there are generative AI vendors
… I would not recommend just going to them and saying you want to work with them
… without doing a lot of homework
… helping train the AI or just training it for your internal use
… there is no such thing as of today

lr: a lot of things going on
… all very new
… technology as well
… good things to say
… one of the areas very complicated
… the rights to cover images could be very different from one to another
… need to clearly understand the work entirely
… a lot of stuff more so than others
… a lot of organizations have started to work

pb: partly in comment to
… fairly rich ecosystem these days
… privacy of their own homes as it were
… if you have technical staff, it's a path towards achieving deeper understanding

llm: very good question
… there is no other practical solution

ih: always possible to know
… whoever tries to take your data
… how to stop getting that?

llm: future registration
… one initiative about big lion
… wraps may alliances
… allows you to check if your data is included
… people cannot deny
… the repository is open for your questions

lmk: any other questions?

(none)

lmk: thank you, all the speakers!

[adjourned]

Second day

IP, Rights and Privacy related to Generative AI - Catherine Stihler, CEO of Creative Commons

See slides.

<jyoshii> First, let's hear from Catherine.

<jyoshii> I am Catherine, from Creative Commons.

<jyoshii> I have served as CEO for three years. Before that I worked on copyright matters in the EU.

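<jyoshii> I will talk about the influence of generative AI, intellectual property, and privacy. Since last November it has rapidly begun to have an impact,

<jyoshii> and it has come to have a major effect on company management and in schools as well.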

<jyoshii> Creative Commons is working to issue guidance on this impact.

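<jyoshii> On copyright, Creative Commons takes the position that anyone, anywhere should have access,

<jyoshii> but we are trying to take a position in which authors can choose whether or not to share.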

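<jyoshii> Creative Commons works to make the use of rights-protected works possible for all creators.

<jyoshii> We are reaching out to all stakeholders, including libraries and governments.

<jyoshii> Creative Commons (hereafter CC) is trying to create an environment that authors can use in multiple languages, in the world of generative AI as well.

<jyoshii> For generative AI, CC licensing is being considered for usable material beyond text as well, and we are taking care that it does not become harmful to rights holders.

<jyoshii> What we need to think about is going beyond conventional thinking. That said, we must keep compliance with personal data protection law in mind.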

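<jyoshii> Looking at the current situation, more than 700 laws of various kinds have already been enacted in 60 or more countries; rather than siding with those who regulate,

<jyoshii> we aim to make use possible in a balanced way, but I think that is a hard direction to pursue.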

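<jyoshii> Restriction currently outweighs use, and amid the legal battles we are working to make use possible.

<jyoshii> We are trying to create an environment in which many people, especially creators, can benefit and share, and can also assert their work as property.

<jyoshii> CC's role, as it has been for the past 20 years, is to take the mechanisms built for the community and seek forms in which rights can be used.

<jyoshii> We are currently discussing new mechanisms with various stakeholders.

<jyoshii> As a venue for stakeholders to gather, we plan to hold a summit on generative AI in Mexico in October. We will announce how to participate later.

<jyoshii> It runs from October 3 to 6.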

catherine@creativecommons.org

<jyoshii> If you wish to participate, please contact catherine@creativecommons.org.

<jyoshii> Questions are also welcome at this address.

AI Influences on Publishing and Creative Content - Peter Brantley, Director, Online Strategy, UC Davis Library

See slides.

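<jyoshii> Next is Mr. Brantley.

<jyoshii> The slides are the ones sent to everyone. I am in charge of online strategy at the UC Davis Library, and I have worked on Books in Browsers.

<jyoshii> I would like to exchange views with you all, and to think through the points that should be considered about AI.

<jyoshii> How about thinking of AI not as a black box but as something to be used as services, such as plug-ins and apps?

<jyoshii> I believe AI can be used to cross media formats and interact with other media.

<jyoshii> The world that AI weaves can cross barriers of language and era and bring students a deeper understanding.

<jyoshii> Let me talk about what role AI plays in its influence on literature and publishing.

<jyoshii> I believe AI services will enrich the experience of reading.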

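<jyoshii> One direction AI brings is working inside the book; the other is its role as a booster inside the platform.

<jyoshii> Examples inside the book (in-book) include translation, of course, as well as organizing context, expression in other media, and changing how content is presented, such as reading aloud,

<jyoshii> and new experiences such as choosing a character to read with may become possible.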

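<jyoshii> Examples inside the platform include use as a database for readers; for publishers, describing works in detail to compete with subscriptions; conversational, multilingual recommendations also become possible.

<jyoshii> In this environment, business will likely change for the publishing industry.

<jyoshii> As a taxonomy of these, extraction, validation, and so on become possible.

<jyoshii> Excerpting, building indexes, and so on also become possible.

<jyoshii> AI can also generate promotional content for social media.

<jyoshii> Checking grammar, accuracy, rights, and so on also becomes possible.

<jyoshii> On the editorial side, it can offer idea generation and various other functions.

<jyoshii> Translation and conversion to audio will also be possible in various ways.

<jyoshii> The impact on the industry has not yet been sorted out: who does what, and so on.

<jyoshii> Technically, various issues remain, such as embedding metadata in EPUB and the use of rights.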

<jyoshii> Finally, we will hear from Mr. Ashimura.

<jyoshii> Ashimura: Technical standards going forward

<jyoshii> I tried asking, "What is generative AI?"

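<jyoshii> Speech synthesis, speech recognition: it can do all sorts of things, and it knows what it knows, but

<jyoshii> we do not know what knowledge it learned or how. Systems differ from vendor to vendor and the terminology is not standardized.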

<jyoshii_> (I was disconnected for a while. Sorry.)

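<jyoshii_> A personal advertising CG has been created alongside the advertising WG. There are also proposals to unify distributed IDs into one.

<jyoshii_> There is also the self-sovereign approach.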

<jyoshii> In WoT, sharing and integrating each company's interfaces can also be done by layering AI on top. Speech recognition standards also differ by vendor,

<jyoshii> and connecting and coordinating them using AI is being discussed at W3C.

<jyoshii> There is also a move for W3C to act as a hub and bring together the situation in which many standards bodies and industry standards have already taken shape.

TDM Protocol and Machine Learning for Training Generative AI - Laurent LeMeur, Director and CTO of EDRLab

See slides.

Personal views around AI and its influence - Kazuyuki Ashimura, W3C

See slides.

23 May 2023