Task Forces/Metadata/Len Vlahos & Julie Morris Interview

From Digital Publishing Interest Group

Len Vlahos, Executive Director, and Julie Morris, Project Manager, Standards and Best Practices, Book Industry Study Group (BISG)

Len and Julie provide a very interesting perspective on our metadata questions because the BISG has a number of committees that focus specifically on metadata issues. BISG is a very broad-based book industry organization comprising a wide range of publishers, retailers, distributors, and service providers across the entire book supply chain (including almost all of the biggest ones). They are the US representative to EDItEUR regarding updates to ONIX; they are responsible for BISAC, the US subject vocabulary for the book supply chain; they were a major participant in creating Thema, the new international subject vocabulary; and their Identifiers Committee works closely with the groups responsible for key identifiers like ISBN and ISNI (currently participating in the revision to ISBN). Plus they are involved with rights and manufacturing-oriented metadata (EDI, RFID, etc.). They also do a lot of education on best practices. All of this is precisely Julie’s area of responsibility, as you’ll see from her title; and Len, as Executive Director, is very actively engaged in all these things. They are an ideal “hub” for insight into the broad book publishing industry, both the creators and the recipients of metadata. I interviewed them jointly. (Julie, of course, is a member of the DPIG; and BISG, bless their hearts, is a W3C member, thanks to Len’s realizing how important that is.)

Len began by saying that in his view there are two major categories of metadata issues for book publishers: (1) communication, and (2) systems and processes.

Regarding communication of metadata, he said that there is a belief from downstream partners [by which he meant the recipients of publishers’ metadata] that publishers are confused by and inconsistent in their use of metadata, whereas upstream, the publishers think the downstream partners are making changes to their metadata, which is unwelcome. [IMO, there is a kernel of truth to both perceptions, although both are exaggerated.]

Regarding systems and processes, he pointed out that there are “no clear lines of responsibility for metadata.” “It’s a giant game of telephone that’s gone awry.”

Julie specifically addressed issues with ONIX. [In this context, we are talking about ONIX for Books, which is by far the main context in which ONIX is used. There are other versions of ONIX.] Here are some of the key things she pointed out:

  • There are problems with using metadata to indicate RELATIONSHIPS between things, for example a print book from which a digital version was derived.
  • There’s a huge issue in the IDENTIFIERS sphere [this comes up in almost every group I’m involved in, btw]: the lack of a “Work” identifier as opposed to identifiers for products of that work. [One reason for this is the difference between how publishers view “the work”—they are focused on what they are trying to sell—vs. librarians’ view of “the work” (e.g., “Huckleberry Finn” or “Hamlet,” not one particular publisher’s).]
  • VERSIONING is a big issue in digital publications: what’s a new edition vs. a version of an edition.
  • Expressing SERIES: relating products that belong to a series, or to a group of titles that should be tied together.
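
To make the four issues in the list above concrete, here is a minimal sketch of a product record that carries a work-level identifier, a derivation link, edition and version numbers, and a series name. Everything in it is an assumption made for illustration: the field names, the identifier values, and the rule for what counts as a new edition are hypothetical, and none of it is taken from ONIX or from anything Len or Julie said.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    product_id: str                      # placeholder value, not a real ISBN
    form: str                            # e.g. "print" or "ebook"
    work_id: str                         # hypothetical work-level identifier
    edition: int = 1                     # changes when the content is revised
    version: str = "1.0"                 # changes for fixes within an edition
    series: Optional[str] = None         # series or group this product belongs to
    derived_from: Optional[str] = None   # product_id this product was derived from


def products_of_work(catalog: List[Product], work_id: str) -> List[str]:
    """Return the ids of every product that manifests the given work."""
    return [p.product_id for p in catalog if p.work_id == work_id]


def is_new_edition(old: Product, new: Product) -> bool:
    """Assumed rule: same work and a higher edition number means a new edition."""
    return old.work_id == new.work_id and new.edition > old.edition


catalog = [
    Product("PRINT-1", "print", "WORK-A", series="Series X"),
    Product("EBOOK-1", "ebook", "WORK-A", series="Series X", derived_from="PRINT-1"),
    Product("EBOOK-1b", "ebook", "WORK-A", version="1.1", series="Series X",
            derived_from="PRINT-1"),
    Product("PRINT-2", "print", "WORK-A", edition=2, series="Series X"),
]
```

In this sketch, `products_of_work(catalog, "WORK-A")` gathers all four products under one work, while `is_new_edition` separates `PRINT-2` (a new edition) from `EBOOK-1b` (a new version of the same edition). Real practice is messier, which is exactly the problem Julie described.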

Another big topic discussed by both Len and Julie was the volatile nature of the situation, because of changes in the industry and the types of products being produced.

How does “product metadata” relate to metadata embedded in a digital product? These are handled quite separately in most publishing organizations, and there is a lack of awareness and communication between departments (digital, production, marketing, etc.). Fundamentally there is a lack of clarity and consistency regarding “what are we saying about this thing?”

Plus, there is a need for metadata to describe changing products, which results in “mushrooming metadata.” Publishers like Pearson wind up thinking of themselves increasingly as technology companies.

Len pointed out another key problem: the book industry is no longer “all one thing” as it was in the past. There are so many bidirectional relationships between types of publishers and the entities they deal with (trade ↔ retailer, trade ↔ library, educational publisher ↔ school system, etc.), and there is “no overriding standard that accommodates all of these.”

Len also pointed out that there have been discussions with GS1 [the organization responsible for global standards for the supply chain, best known for barcodes] and that book industry metadata is actually “in relatively good shape” compared to that in other sectors like apparel, beverages, etc. The book industry is “more similar to music: so many different SKUs.”

He also mentioned [surprisingly to me] that some have begun to question the long-term value of ONIX to the industry. He pointed out that lots of work has been done on the library side regarding linked data. He suggested that there was an evolution toward looking at metadata not at a “record” level [e.g., an ONIX record] but at a more distributed level.

Julie pointed out the inherent conflict (or at least the coexistence without, at present, very good alignment) between ONIX, MARC, and schema.org.

Len pointed out that eventually the _data_ about the book [not sure if he meant just metadata or the content itself] will reside in the cloud.

He gave the example of author vs. contributor metadata. There is important granular information connecting contacts, contracts, rights, etc. to authors and contributors and their products, and this is not best accomplished by “boxing it into either an ONIX or MARC record” [his implication being, if I can speculate, that this both makes that metadata inaccessible to updating and unnecessarily duplicates information that is really the same in a bunch of those different “boxes”]. A more “networked” approach [my word, not his, at least not in my notes] would lend itself to greater conformity, which would be all to the good.
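
One way to picture the contrast between “boxed” records and a more networked arrangement is the short sketch below, in which contributor details are stored once and product records only point to them by identifier. This is my own illustration, not something Len described: the identifiers, field names, and the `resolve` helper are all hypothetical.

```python
from typing import Dict, List

# Contributor details live in one place (hypothetical ids and fields).
contributors: Dict[str, dict] = {
    "contrib-1": {"name": "Example Author", "agent": "Agency A"},
}

# Product records reference contributors by id instead of copying their details.
product_records: List[dict] = [
    {"product_id": "PRINT-1", "contributor_ids": ["contrib-1"]},
    {"product_id": "EBOOK-1", "contributor_ids": ["contrib-1"]},
]


def resolve(record: dict) -> dict:
    """Expand contributor ids into the current contributor details."""
    people = [contributors[c] for c in record["contributor_ids"]]
    return {"product_id": record["product_id"], "contributors": people}


# A single update to the shared contributor entry...
contributors["contrib-1"]["agent"] = "Agency B"

# ...is seen by every product record that references it.
agents = [resolve(r)["contributors"][0]["agent"] for r in product_records]
```

After the update, `agents` holds the new agency for both the print and the ebook record, with no per-record copies to chase down. That is the conformity benefit alluded to above, under the assumption that all parties can reach the shared contributor data.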

Finally, he also pointed out that there is an inherent conflict between the needs of archivists/cataloguers and marketers/publishers [alluded to above in discussing the library vs. publisher pov on things like what “a work” is].