Digital Publishing Interest Group Teleconference -- 28 Apr 2014

<trackbot> Date: 28 April 2014

<liza> Bill's notes for today: https://www.w3.org/dpub/IG/wiki/Task_Forces/Metadata#NEAR_TERM_GOALS

<ivan> Guest: Laura Dawson, Bowker

<liza> Minutes: http://www.w3.org/2014/04/14-dpub-minutes.html

minutes approved

liza: guest intros

Laura Dawson: product manager for identifiers at Bowker, involved in digital publishing, spoke to Bill K about metadata

Tom Cole, of Univ of Illinois, working with metadata for about 20 years

liza: Bill provided reading material based on interviews with metadata experts in industry
... there are some issues that Bill raised that are out of scope of W3C. e.g. identifiers - known problem, but we will not try to solve it
... we will talk about metadata as a method for transporting data, as in ONIX
... that may not be as relevant to the web
... section-level metadata is extremely relevant to web
... the network of metadata, that is the relationship of pieces of metadata to each other, is extremely relevant to web
... rights metadata (not DRM) is also a high-level topic

Bill_Kasdorf: more interviews to come. Madi's interviews not yet posted
... Bill interviewed people across industry. Madi interviewed experts within Pearson

<liza> https://www.w3.org/dpub/IG/wiki/Task_Forces/Metadata#NEAR_TERM_GOALS

Bill_Kasdorf: summary is available on wiki
... complexity and inconsistency are major issues, driving a desire for simplicity
... some have suggested putting ONIX into schema.org

Bill_Kasdorf: leaning toward focusing on embedded metadata that aids in discoverability and sales
... this led to exploring Thema
... Different regions have different specs for subject codes: BISAC, BIC, etc. Thema universalizes them

philm: Thema is very high-level, a classification that travels
... the question is how granular do we want to get?
... example is content about specific battle in World War II. Thema classification does not go beyond WWII

Bill_Kasdorf: Within a given domain, that level of detail is required
... this gets to a separate point - I don't think we should get into vocabularies, other than providing the mechanism for creating them
... we can provide keywording
... some people wish for OWP to do things it can already do
... Some people want ONIX-lite, but what parts are of interest?

ivan: we have terminology problems among communities
... schema.org contains main classes and properties
... others contain classifications

<ivan> http://blog.schema.org/2012/05/schemaorg-markup-for-external-lists.html

ivan: the approach of schema.org is to accept agreed upon but external terms. They do not want to standardize that
... this link points to a description of what schema.org will accept in a specific circumstance, in this case, wikipedia URIs
... what we need to agree upon is what schema.org will accept - what classes and properties, allowing for volatility within the specification to which they point
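For illustration only, the external-terms approach ivan describes could be sketched as below. This is a minimal sketch and not something shown in the meeting: it assumes JSON-LD serialization, a hypothetical book title, a placeholder ISBN, and a Wikipedia URI picked to echo philm's World War II battle example. The schema.org terms used (Book, name, isbn, about, keywords) are existing schema.org classes and properties; the subject value itself comes from an external list.

```python
import json


def book_jsonld(title, isbn, subject_uri, keywords):
    """Build a schema.org Book description as a JSON-LD-ready dict."""
    return {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": title,
        "isbn": isbn,
        # External term: schema.org supplies only the "about" property.
        # The value is a URI maintained elsewhere (assumed: a Wikipedia page).
        "about": {"@id": subject_uri},
        # Free-text keywords, comma-delimited.
        "keywords": ", ".join(keywords),
    }


# All values below are hypothetical placeholders.
record = book_jsonld(
    title="A Hypothetical History of the Battle of Midway",
    isbn="0000000000000",
    subject_uri="http://en.wikipedia.org/wiki/Battle_of_Midway",
    keywords=["World War II", "naval battles"],
)
print(json.dumps(record, indent=2))
```

The point of the sketch is the split of responsibilities: the property comes from schema.org, while the finer-grained subject lives in a list that can change without touching schema.org.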

Bill_Kasdorf: there are so many different sides to publishing, and not all needs are aligned

Tim: another issue with things like ONIX, in addition to volatility, is the way that they are used
... every publisher uses it differently - not a standardization success
... the library world has attempted to add missing tags, such as imprint

Bill_Kasdorf: Recipients of metadata would like to see consistency, and publishers would like to see recipients standardize implementation

liza: is standardizing on Thema within scope? is a recommendation for schema.org within scope of this group?

philm: agree that using vocabularies from different areas might be a good idea. ONIX sends a message with a lot of useful data, but some of it is not useful to web
... GS1 has useful info, but it doesn't have contributor element, so they used BISG best practices for ONIX
... We also need to look at different audiences. STM audience and general audience are going to be very different.
... Trade content is not often online, but STM content is often online.

ljdn: Flexibility of allowing different vocabularies, but I would argue that trade books are not online YET

<ivan> +1 Laura

<liza> +1000 Laura

ljdn: there may be a case for search engines to index the structure vocab of book for search

<TimCole> from the EPub blog, a comment about the sufficiency of ONIX contributor codes and Library of Congress MARC relator codes for comic books: https://code.google.com/p/epub-revision/issues/detail?id=23

<liza> Well, it's a real thing we do at Safari

ljdn: we probably want to web-enable books, even if we do not sell them as online books
... amazon has categories, B&N has categories, we should make categories

liza: i get nervous when i hear ONIX-lite, because it sounds like we are creating a new standard and stepping on ONIX toes

<ljnd> Tzviya, not make categories, just be aware that the publisher is not solely in control of describing their own books

<ljnd> Vendors can as well

liza: but it makes sense to take a subset of currently available specs

Bill_Kasdorf: There is a medical vocab called UMLS with 1 mil concepts, maintained by NIH - one slice of STM
... issues arising, like subject, contributor, rights. Being able to base this on existing standards - point to an existing code list
... keywords is also important - author may not know what BISAC is, but can list keywords

brady-duga: a little nervous about putting just metadata and no data on internet, people want to find information, not just metadata when they search

liza: agree, but I think it's reasonable for a search to lead to where one can find the information, not the actual info

<ivan> +1 to Julie, this is the start

julie: re: onix-lite concerns: BISG committee looking at schema.org and first identifying where connections already exist
... there are probably more holes than there are connections
... we may be able to identify where current schema.org elements mimic ONIX. i.e. we may not have to create ONIX-lite.

Bill_Kasdorf: it may be a recommendation of how to include ONIX info in schema.org.
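As a rough sketch of the exercise Julie describes (find where connections already exist, then list the holes), something like the following could be used. Everything here is an assumption for illustration: the field labels are simplified ONIX-style names, the mapping table is hypothetical and deliberately incomplete, and it is not the BISG committee's actual mapping. A field reported as a hole is only missing from this table; that is not a claim about what schema.org covers.

```python
# Hypothetical, incomplete mapping from simplified ONIX-style field labels
# to schema.org property names. Not an official or agreed mapping.
ONIX_TO_SCHEMA = {
    "TitleText": "name",
    "PersonName": "author",
    "ISBN": "isbn",
    "PublisherName": "publisher",
}


def split_connections(onix_record):
    """Split a flat ONIX-style record into mapped properties and holes.

    Returns (mapped, holes): mapped is a dict keyed by schema.org property,
    holes is a sorted list of field labels with no entry in the table.
    """
    mapped = {}
    holes = []
    for field, value in onix_record.items():
        prop = ONIX_TO_SCHEMA.get(field)
        if prop is None:
            holes.append(field)
        else:
            mapped[prop] = value
    return mapped, sorted(holes)


# Sample record with placeholder values.
sample = {
    "TitleText": "Example Title",
    "PersonName": "Example Author",
    "ISBN": "0000000000000",
    "ImprintName": "Example Imprint",
    "SalesRights": "Example rights statement",
}
mapped, holes = split_connections(sample)
print(mapped)
print(holes)
```

Running the same split over real records would give a first list of deltas to discuss before proposing anything new.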

liza: this group could then write rec of HOW to include the info and how to process

Bill_Kasdorf: we have been focusing on schema.org. Is there anything else from OWP that we should look at?

ivan: reinforce what Julie said: the first thing to do is look at existing vocabularies in schema.org to see if they exist, then add deltas
... to answer Bill's question, in core OWP specs, there are no major tech changes or improvements to implement
... we can act as go-between

gcapiel: process of getting new properties into schema.org is tricky and not completely transparent. Still a good idea, but there are other ways to do it
... there are other vocabularies like microdata and RDFa
... the challenge is what search engines will pick it up

Ivan: the first thing to do is talk to the people involved and sit down with people who represent this industry (BISG or others)
... and make our intentions clear

liza: is this an action item for ivan?

Ivan: it might not be for this group, might be appropriate for BISG
... would need closer rep from EDItEUR if we invite schema.org reps to a meeting

Bill_Kasdorf: this group has a strong book bias, but this is far-reaching

tzviya: I thought we had agreed to narrow scope to books because that is who the representation is and we had to narrow it down

Ivan: if we take an exercise like ONIX, it helps to start with a smaller unit like books
... I am happy to invite reps from schema.org, but I don't want to step on BISG's toes

Julie: BISG would be interested in making sure that our work is checked over by schema.org and we have a stamp of approval

TimCole: there is also a CG looking at bibliographic extensions (BibEx). Is that represented in this group?

<TimCole> http://www.w3.org/community/schemabibex/

ljdn: that group is not very active now and is focused on library work

Liza: out of time. continue on list
