DraftReportReviews

From Library Linked Data

LLD XG Final Report: Summary of reviews

General document

Kim: review, ack'd
Juha Hakala, National Library of Finland: review, ack'd
Jonathan Rees: review, ack'd by Antoine
James Weinheimer [1] fwd by Karen Coyle
Romain Wenz, Bibliothèque nationale de France [2]

Scope of this report

René van der Ark, National Library of the Netherlands: review, ack'd

Dickson: review
- partial handling

Blog comments
- 2011-07-07 Emma: need to add definitions for “cultural organization” stating that “they could serve as memory institutions”
- 2011-07-27 Adrian: The distinction between “information produced or curated by libraries that describes resources or aids their discovery” and “data used primarily for library-management purposes” isn’t clear. It’s clear that e.g. holdings information and user data are omitted. But what about circulation data for a resource, the number and frequency of lendings and co-occurrence with lendings of other books? This can aid the discovery of resources through suggestions like “People who borrowed resource A also borrowed resource B, C and D” and thus might be covered by the report (but I think it isn’t).

From Kim's review
- an example would be good with picture (datasets, element sets, value vocabularies)

Benefits of the Linked Data approach

Oreste: review: comments incorporated by Emma
Dickson: on 2011-06-19: suggestion that the report touch on the benefits of publishing experimental data as LOD so that experiments can be replicated. Beyond the scope of LLD XG? (handled)
Felix Sasaki on 2011-06-20: suggestions for specific edits in the form of a PDF (File:Lld-report-review-benefits-section.pdf), incorporated in the draft by Emma, Antoine, and Tom
- Advises on correct use of "internationalization" as opposed to "multilinguality" (change made)
- Proposes that report refer consistently to "memory institutions" (e.g., not "cultural organizations")
Blog comments:
- 2011-07-27 Jennifer Bowen: This whole section needs to make an even stronger case for why sharing library data as linked data is vitally important for the future of libraries, as well as how library data can contribute valuable data to the semantic web. I would suggest expanding this section and making it as persuasive as possible.
- 2011-07-27 Adrian: The distinction between “information produced or curated by libraries that describes resources or aids their discovery” and “data used primarily for library-management purposes” isn’t clear. It’s clear that e.g. holdings information and user data are omitted. But what about circulation data for a resource, the number and frequency of lendings and co-occurrence with lendings of other books? This can aid the discovery of resources through suggestions like “People who borrowed resource A also borrowed resource B, C and D” and thus might be covered by the report (but I think it isn’t).
- 2011-07-22 Alan Danskin: It would be more accurate to characterize these as potential benefits, as they are still to be demonstrated on a meaningful scale. The analogy of “stone soup” is inappropriate. It implies that libraries have nothing of value to contribute. Whereas library metadata is valuable information that is rich in relationships. Expressing this data as linked data is not trivial as it was not designed for this purpose.
- 2011-07-22 Alan Danskin: Exposure of library authority data could be of much wider benefit than to libraries and their users
- 2011-07-22 Alan Danskin: “Links can be used to expand indexes much easier than required for todays federated searching, and can offer users a nearly unlimited number of pathways for browsing” Not sure what this sentence means.
- 2011-07-20 Jennifer Bowen, paragraph "Library authority data for names...": This seems to be lacking something here – add an explanation about how libraries have valuable skills and experience in creating and maintaining authority data and how those activities are important for linked data.
- 2011-07-20 Jennifer Bowen: Not exactly sure who the audience for this report is likely to be. If it is general library administrators, they are not likely to understand the term “global graph” unless it is defined here.
- 2011-07-29 Ed Chamberlain: This section makes some fairly high level assumptions about user understanding of HTTP. Such interactions with the data would depend largely upon the nature of the applications built around it.
- 2011-07-27 Adrian: “Citation management can be made as simple as cutting and pasting URLs.” I’d use the broader term “URIs” here. ->DONE
- 2011-07-22 Alan Danskin on Benefits to Organizations: The top down model has been driven by economics: we can’t afford to provide the granularity that would serve the customers best. The potential to invert this is one of the exciting possibilities offered by Linked Data.
- 2011-07-27 Adrian on paragraph "The technology itself": The term open is used here for the first time in the report. Its meaning should be made clear, e.g. by means of linking to a section where it is defined.
- 2011-07-22 Alan Danskin: BL’s experience with opening BNB data is that any restrictions on use will prevent re-use or aggregation of linked data.
- 2011-07-27 Adrian on paragraph "open data context": Again, it is talked about openness but it hasn’t been made clear, what it actually means. In general, the paragraph seems a bit out of context to me. -and- “The openness of data is more an opportunity than a threat. One benefit may be a clarification of the licensing of descriptive metadata towards openness, thus facilitating the reusing and sharing of data and improving institutional visibility.” Open licenses aren’t a benefit but an aspect of open data.
- 2011-07-20 Jennifer Bowen on "sea of triples": Last line here is great! Possible to highlight this in a text box or something?

From Kim's review:
- what about URNs? Libraries tend to use them quite a lot...
- "Benefits to Researchers, Students and Patrons" heading => "Benefits for the library users such as ..."
- "Benefits to Researchers..." text:
  - 1st paragraph: the text can be shortened; start from the benefits (not from the fact that Linked Data may go unnoticed...) ;)
  - 2nd paragraph: "Library users should be comfortable" ... sounds like "must" to my ear, when the point here is that everybody (?) knows the WWW and thus they are comfortable with linked data...
  - 3rd paragraph: "RDFa" ... what about the recently published schema.org and the recommendations by Google etc. to use microformats and not RDFa?
- "Benefits for Organizations" heading => "Benefits for Libraries and other Memory Organizations"
- "Benefits to Librarians, Archivists and Curators"
  - 1st paragraph: "By using Linked Data, memory institutions will create an open global pool of shared data that can be used and re-used to describe resources, with a limited amount of redundant effort compared with current cataloguing processes." => why is OPENNESS a benefit?

From Jonathan's review:
- The 'scope' section is not related to benefits so should be put separate
- Every benefit of *any* technology comes with some cost. So the 'benefits' section IMO ought to be a 'value proposition' section - what benefits do you get, at what cost. I know this is covered later, so maybe 'benefits' should be shortened and turned into an introduction, with a promise of more depth later on.
- The benefits section is too gung-ho for my taste, e.g. the use of the word 'significant' with no justification. It also does not provide a comparison or null model. If you do LD you're not doing something else. Are the alternatives (including doing nothing, using XML or JSON or SOAP, etc.) just as good? Why not?
- You could do some of the cheerleading by reference to other documents that promote LD, thus saving space here

From Romain's review:
- Benefits of the Linked Data approach

Comment: Libraries produce reliable data, especially vocabularies and authority data. If they open them as linked data, as soon as they use a shared ontology, they can help structure the Web of Data with data that can be trusted, with vocabularies that anyone can link to.

Suggestion: The Web needs to be structured with reliable and clean data, and libraries can provide them.

- Benefits to Librarians, archivists and curators

Comment: Among the very positive aspects of “linked data” for libraries, there is the possibility to act at different levels, with various benefits.

Suggestion: Every approach can offer specific benefits, from internal re-use of data and identifiers to links or services to the end-user.

- Benefits to Developers

Comment: The general benefit is to get rid of specific library formats, which are not really interoperable (e.g. various MARCs). This is very important, so as to break barriers between libraries and between library data and other types of data. But the transition from library-specific data to LD won't be straightforward.

Suggestion: It will be possible to work step by step, with Web protocols.

Suggestion: A section that could be added as “3.2.5.”: “Benefits to service providers, software vendors and external developers: These developers will work with other important players: service providers, software vendors and external developers. The consequences are:

  - Research and development could be enhanced through these players. They could also work with research laboratories.
  - Libraries will still work with external vendors.
  - A new market emerges for industrials, developers and service providers, which can increase their financial benefits. For instance, using interoperable RDF formats enables other actors to re-use structured data provided by libraries.”

Draft_Vocabularies_Datasets_Section

Alex: review, ack'd, [3], discussion ongoing Antoine
Comments on Available Vocabularies and Datasets
- Patrick Danowski: An analysis of which datasets and which vocabularies are using which ontology (metadata fields) would be helpful to get a better idea of which ontologies are widely used and which are not very common.
- Alan Danskin on "side deliverable": Thank you. These are useful resources; one of the difficulties with starting the BNB linked data project was knowing what is available and useful.
- Alan Danskin: A minor point, but LCSH is not limited to the topics of books.
- Teague Allen on definition of "dataset": I infer from two sentences within this paragraph that datasets are the implementation of value vocabulary terms, structured by a metadata element set, or sets. If true, explicitly saying this would be a helpful picture of linked data architecture. If not true, I’m still in the dark as to how the three resource groups relate.
  - Antoine: Datasets are indeed concrete data (e.g. the British national bibliography in RDF) that re-use elements from value vocabularies (e.g., LCSH), and are structured according to the specifications of metadata element sets (e.g., Dublin Core). We got other comments in the same line, and agree that the current wording can be improved. We’ll try to make our explanations clearer.
- Adrian on first mention of CKAN: CKAN is mentioned here for the first time. Thus, it should be shortly explained or referred to a section where it is explained.
- Alan Danskin on mention of use cases and CKAN...: Maintenance and development of these deliverables is highly desirable and we welcome the steps that have been taken.
- Catherine Jones on "low availability": I appreciate that the focus of this report is based on “library-held” datasets, however I don’t think the opportunities for journal articles are addressed. Whilst it is unlikely that most academic libraries will be cataloguing each journal article within their own catalogue – especially when there is Web of Science or SCOPUS available – end-users don’t necessarily see the distinction – or perhaps shouldn’t have to see the distinction between material types, or library’s decisions on ownership vs. access within their collection which will be reflected in their catalogue but not in the service provided. Sorry if this isn’t clear, but the service provided by a particular (academic) library is provided through a portfolio of electronic resource discovery tools, of which only one, the library catalogue, is the library responsible for creating the content, limiting linked data potential to “library-held” information may only be a small part of the information landscape for a particular user and isn’t building bridges to the work going on in the area of linking research data & publications (citing data etc)
  - Antoine: This is a really useful comment, thanks. I think when we used the expression “library-related resources” we were in fact thinking of data such as scientific publisher’s bases. But being more explicit would be useful. In fact the availability problem may be even more acute, for these specific datasets.
  - Jennifer: This paragraph is unclear. Is it saying that bibliographic datasets for what would commonly be referred to as “library catalog data” have low availability (and if so, can you speculate as to why that is?) or that those datasets ARE available, but that they aren’t that important? This would be an appropriate place to mention the need for software tools that help libraries to convert their bibliographic datasets to linked data.
  - AlanD: The British Library has recently made available a preview of the British National Bibliography Dataset http://www.bl.uk/bibliographic/datafree.html . The difficulties involved in this undertaking were considerable and go a long way to explaining the lack of published datasets. The BL chose to make BNB the focus of its linked data work because it is a large data set for which the BL, as the national library of the United Kingdom, is responsible and its scope can be reasonably clearly defined.
  - EdC: Two JISC-funded projects based around Cambridge, Open Bibliography and COMET, have made large library-related datasets available, but it only goes so far. I second Catherine’s point about article / citation level data; there is serious value here. Furthermore, libraries could consider exposing operational data, holdings and anonymised circulation information to facilitate a richer range of interactive and recommendation-based services.
- AlanD on "quality and support": This is partly a reflection of the [im]maturity of the technologies and the complexity of applying linked data to MARC data, which as is acknowledged elsewhere in the report, libraries are currently locked into. What is needed are models and tools to enable the conversion of MARC data. BL is looking at what would be necessary in order to release the tools we have used/created.
- Jennifer on deduplication: Would like to see some expansion of this topic. This is a very important consideration for the migration strategies recommendation later in the document.
  - EdC: Seconded. Matching against identifiers takes time and is prone to error. Some recommendation over which ones to focus on would be great
From Jonathan's mail:
- what does 'a.o.' stand for?
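
The relationship Antoine describes in reply to Teague Allen (a dataset re-uses terms from a value vocabulary and is structured by a metadata element set) can be pictured with a few triples. The following is only a minimal sketch in plain Python, not code from the report: the book URI, the title string and the subject identifier `sh0000000` are hypothetical placeholders, and only the two namespace prefixes are real. The helper `properties_used` also hints at the kind of analysis Patrick Danowski asks for (which element-set properties a dataset actually uses).

```python
# Namespace of a metadata element set (Dublin Core terms).
DCTERMS = "http://purl.org/dc/terms/"
# Namespace of a value vocabulary (LCSH subject headings).
LCSH = "http://id.loc.gov/authorities/subjects/"

# Hypothetical resource URI and subject identifier, for illustration only.
BOOK = "http://example.org/bnb/book/1"
SUBJECT = LCSH + "sh0000000"  # placeholder, not a real heading

# A tiny "dataset": concrete statements (subject, property, value).
dataset = [
    (BOOK, DCTERMS + "title", "An Example Title"),
    (BOOK, DCTERMS + "subject", SUBJECT),
]


def properties_used(triples):
    """Return the element-set properties that structure the dataset."""
    return sorted({prop for _, prop, _ in triples})


def vocabulary_values(triples, namespace):
    """Return the values that are re-used from a given value vocabulary."""
    return sorted(
        {value for _, _, value in triples if value.startswith(namespace)}
    )


if __name__ == "__main__":
    print(properties_used(dataset))
    print(vocabulary_values(dataset, LCSH))
```

Here the properties come from the element set, the subject value comes from the value vocabulary, and the list of statements as a whole is the dataset.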

Draft_Relevant_Technologies

Fumihiro: review
- handled
Jon: on 2011-06-22
Marcel: review - no action needed
Nicolas Chauvat: suggestion/add
WW: ack and suggested wording
Gordon: review, ack'd
Blog comments on technologies section
- 2011-07-22 Alan Danskin re: "Linked Data is an emerging technology...": This may be true, but it is also true that investment is needed in new skills, tools and kit.
- 2011-07-20 Jennifer Bowen: A very important, and persuasive, point about generating RDF serializations on the fly, rather than “throwing away” current applications. I would suggest adding something general about this in the first section of the report, about Benefits of linked data. This is an important benefit, that libraries do not completely have to retool in order to implement linked data.
- 2011-07-20 Jennifer Bowen on "by focusing on request...": Same comment as for paragraph 18 – this is a compelling point that should be brought out earlier in the report. Anything related to “lowering the barrier for entry” for libraries for linked data should be emphasized as much as possible.
From Jonathan's mail:
- Section on bulk access should mention the value of doing joins, which only work well when you've done a bulk load into some kind of query engine (triple store, etc).
- In 'front ends' where XSLT is mentioned you might want to mention GRDDL (although I don't know whether it's used)
- I'm not sure the microdata discussion is in scope, esp. given that microdata is nowhere close to Rec status... there's nothing here specific to libraries
- need reference for 'resource oriented architecture'
- Is Drupal your only example of a CMS? Would be nice to at least mention a 2nd one.
- Re 'web services for LLD', why would you *want* to refactor API capabilities using the LD stack? You're assuming too much knowledge on the part of the reader
From Romain's mail:
- Comment: We are talking about building structure in Web content, so that data from the Web can be used by machines, the way it would be in databases.
- Suggestion: Building a “Linked Data” infrastructure does not imply creating yet another silo.
- Microformats, Microdata and RDFa
  - Comment: Linked Data can go one step further from the work that has been done, for instance for OAI sets.
  - Suggestion: RDFa can be a step for using existing information by distilling it into a Web structure.
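
Jonathan's point about bulk access and joins can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not a description of any tool named in the report: an in-memory dictionary stands in for a triple store, and all URIs, property names and values are hypothetical. Two separately published sets of triples are bulk-loaded into one index, after which a join across them (books to the birthplace of their creators) is a simple lookup.

```python
from collections import defaultdict

# Hypothetical triples from a library dataset.
library_triples = [
    ("http://example.org/book/1", "creator", "http://example.org/person/7"),
    ("http://example.org/book/2", "creator", "http://example.org/person/8"),
]

# Hypothetical triples from an external dataset about the same persons.
external_triples = [
    ("http://example.org/person/7", "birthPlace", "Helsinki"),
    ("http://example.org/person/8", "birthPlace", "Paris"),
]


def bulk_load(*sources):
    """Load every source into one index keyed by (subject, property)."""
    index = defaultdict(list)
    for source in sources:
        for subject, prop, value in source:
            index[(subject, prop)].append(value)
    return index


def join_creator_birthplace(index):
    """Join book -> creator with creator -> birthPlace on the person URI."""
    results = []
    for (subject, prop), values in index.items():
        if prop != "creator":
            continue
        for person in values:
            # .get() avoids adding keys to the defaultdict while iterating.
            for place in index.get((person, "birthPlace"), []):
                results.append((subject, place))
    return sorted(results)


if __name__ == "__main__":
    store = bulk_load(library_triples, external_triples)
    for book, place in join_creator_birthplace(store):
        print(book, place)
```

Without the bulk load, the same join would require one remote request per person URI, which is the cost the comment alludes to.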

Implementation challenges

Ray: on 2011-06-20: detailed in-line suggestions at http://www.loc.gov/standards/sru/w3clld/
- summary by Jodi, with many of the suggestions reflected in the draft - see diff -- unclear if all points in Jodi's summary have been addressed
Lars: review
comments
- 2011-07-20 Jennifer Bowen: My general impression of this section is that it fairly accurately describes the difficulties that libraries may face in adopting linked data (although some statements may be a bit too sweeping at times – I agree with some of the other comments below). However, my concern is that these challenges and barriers are presented without any attempt to suggest possible solutions to them. Since there are indeed many challenges to the library community, the recommendations presented later in the report do not seem to be adequately justified by the content of the report. In other words, the way the report reads right now, the challenges may outweigh the benefits. I do not believe that is what the report intends to convey, and that is not what it SHOULD convey. I recommend that some of the sections below include at least a brief discussion of possible steps to mediate these challenges and barriers, otherwise the whole situation just begins to seem pretty hopeless. Making the benefits section at the beginning more compelling will also help considerably. I will add other suggestions below where additions could be made.
- 2011-07-22 Alan Danskin: This whole section has a rather negative tone. Libraries are aware of the need for change. Linked data is one of the directions that change might take, if the benefits can be demonstrated, but as the section makes clear the challenges are considerable.
- 2011-06-28 Karen on "decreased budgets": I actually think that the picture is brighter than this. Although libraries haven’t been technology leaders, they have embraced new technologies to the benefit of their communities, providing free Internet access, lending ebooks (even before they were popular). This is separate from the struggle to manage the flood of digital content. There is another issue, which is that managing digital content might be better done on a scale that is larger than any one library, while managing physical items is suitable to local institutions. There are tens or hundreds of thousands of libraries, many very small. Digital materials need to be managed globally, not locally, and there is no global library organization to do this.
- 2011-07-20 Jennifer Bowen: One of the conclusions that I would make from this is that libraries can derive great benefit from linked data to try to address this situation of decreased budgets and inability to extend their missions to include digital information. Libraries have a great NEED for linked data, and this paragraph explains why.
- 2011-07-20 Jennifer Bowen on cooperative metadata creation: More should be made here (or somewhere in the report) about how the strong cooperative culture that is now present in the library community can be an asset for implementing linked data: use of common vocabularies and standards, consistency of metadata, structures in place to mobilize community action toward a shared goal…
- 2011-07-20 Jennifer Bowen on huge cultural change: Are there any signs that this is beginning to change, where libraries are beginning to interact with these other communities? Cite some examples here? How about the mere existence of this incubator group?
- 2011-07-18 Jody DeRidder on "understaffed": There appears to be a cultural prejudice against software developers in at least some segments of the library culture. Such work is seen as tasks for underlings, and hence the pay scale for programmers in libraries cannot compete with the commercial market. Those in higher ranks in the library are often expected to set aside programming work and not “get their hands dirty.” Programmers may not get respect from librarians, particularly if the programmers do not also have librarian degrees. Continued cutbacks to library funding also reduce the ability to hire decent programmers. These issues combine to keep many libraries in the backwaters of technological development.
- ...to which Jennifer Bowen replies: An important point. We have discovered this first-hand working on the eXtensible Catalog – programmer salaries in libraries cannot compete within the marketplace. This deserves mention in the report.
- 2011-07-20 Jennifer Bowen on "understaffed in key areas...": The sentences on library workers do not follow logically one from another. I would like to see this paragraph suggest possible ways to change the way that library workers are educated, or provide continuing education in linked data. Much has been happening in that arena over the past year or so.
- ...and: This section deals with libraries being understaffed technologically, so the section on library leaders should probably address the problems that library leaders have in employing technology staff. The points that are made here about libraries taking leadership in LLD should perhaps go in a separate section. I suggest that this also include discussion of how some library organizations are now exploring what actions to take regarding LLD (ALA and the Program for Cooperative Cataloging are two examples) and that what is needed is advice and leadership from outside the library community, to enable library leaders to know what specific steps need to be taken and to make informed decisions. That process is already beginning.
- 2011-06-29 Karen on "libraries do not adapt well": This has been criticized as being not only negative but not really true. Is there another way to say this? I think it is mainly about libraries having trouble being on the leading edge and generally having a hard time changing.
  - 2011-06-28 Catherine Jones: While I appreciate the fact that this section is in a larger one about Barriers to adoption, I do feel that the heading is overly critical. I think it would be fairer to say that Libraries are no longer early adopters of new technology for the parts of their service which they consider to be business critical – partially because of the issues of retro-conversion of the collections that they already hold and partially because they are service providers who need to ensure that the service continues to run.
  - Karen: The paragraph doesn’t really match the heading. The paragraph talks about the lack of library-related LD tools. Maybe that is an issue on its own?
  - 2011-07-17 Roy Tennant: That libraries “do not adapt well to technological change” is debatable, and largely orthogonal in any event to whether libraries are using linked data. The crux of the problem to me is the old “chicken and egg” problem. Libraries won’t use linked data until/unless it solves a need. Right now it doesn’t, or at least we lack the tools to make linked data effective in a library environment. Frankly, I don’t see any killer apps out there in any industry, which inhibits adoption in any industry, and even more so in libraries which are organizations of limited resources.
  - 2011-07-18 Laura K: I think linked data solves a fairly large need which tends to be overlooked: Making library metadata interoperable with the rest of the web, and with other networked information. I don’t necessarily think this is a situation where an application is going to pop up that makes people see its usefulness, but one where these ideas need to be taken into consideration when we think about how to reconstruct bibliographic metadata. How do we implement these ideas as we re-work (or get rid of) MARC?
  - Jennifer Bowen: If the statement about the library community only engaging with established technologies is allowed to stand, then there needs to be some explanation of WHY that is the case.
  - Jennifer: The statement that there are “no tools that specifically address library data” is a bit strong. I suggest at least a mention of emerging tools, such as the eXtensible Catalog, which will help to make library data “linked data ready”, if not (currently) to create true linked data yet.
  - Karen: Jennifer, I would gladly add XC but (and I just checked) there is no documentation that demonstrates that it produces LD. The only documentation that I can find talks about MARC and FRBR, but there is nothing on the record format or serialization. That information has to be public and open before a service can be included in the report. Your ontology needs to be open access on the Web in RDF format. If it is, please give a pointer.
  - 2011-07-18 Laura K on "long-term view": I don’t think this heading is entirely clear. Meaning that standards should also be considered as something that should last a long time? That standards take a long time to be developed? That we need to start thinking about how to preserve digital objects and web-based objects with new standards?
  - 2011-06-29 Karen: I think it would be worthwhile to talk separately about the issue of having iterative standards with proof-of-concept development, and the issue of the time lag imposed by the meeting cycles. I also think somewhere we should mention the variety of standards fora — IFLA, national fora, NISO, and the recent awareness of non-library fora, like W3C.
- 2011-06-29 Karen on "bottom-up": I think it would be worthwhile to talk separately about the issue of having iterative standards with proof-of-concept development, and the issue of the time lag imposed by the meeting cycles. I also think somewhere we should mention the variety of standards fora — IFLA, national fora, NISO, and the recent awareness of non-library fora, like W3C.
- 2011-06-28 Catherine Jones on "standards are limited": While I agree that cataloguing standards were designed to exchange data between libraries, I’m not sure that I would agree that bib. exchange with publishers is new and not accepted. While libraries may not use individual publishers – it is common practice to get bib records from your book supplier. This also doesn’t address journals – our institutional repository uses Cross-Ref to look up the DOIs of journal articles to enhance the information recorded about the publication in the IR.
  - Laura K: Perhaps if this was broadened to talk about the fact that we share metadata within our supply chain, for lack of a better phrase (publishers, indexing and abstracting services, etc), but not frequently with organizations outside of the traditional information world.
  - Jennifer Bowen: This paragraph could use some clarification. Who are the “few” in the last sentence? People within the library community? I assume the bibliographic data that needs “smarting up” is meant to be data from outside the library community that would be enriched with data from the library community? This is not clear.
  - Adrian: +1
  - Alan Danskin: “While the Web values global interchange between all parties, library cataloguing standards in the past have aimed to address only the exchange of data within the library community where the need to think of broader bibliographic data exchange (e.g. with publishers) is new and not universally accepted.” This is not a new issue. Libraries and publishers have different business models, which are reflected in their development of different standards for exchange. Publishers think of publications as products; libraries are concerned with inventory of their collections and the content of publications. The granularity of open linked data may provide an opportunity for a fresh look at what could be shared for mutual benefit. However publishers, as well as librarians, may regard metadata as a commodity to be restricted.
  - Adrian: “the need to think of broader bibliographic data exchange (e.g. with publishers) is new and not universally accepted” I suggest adding scholars to the brackets as an example of communities with which data exchange and interlinking would be very fruitful for academic libraries.
- 2011-07-18 Jody DeRidder on "ROI": Difficult, but perhaps not impossible. A test implementation could be used as a basis for feedback and user testing. Measurement should be made of experienced researchers as well as undergraduate students, in comparison with the same site unmodified. If indeed we can measure improved research capabilities, speed, and discovery, we will have built a case for expanding this effort.
  - Jennifer Bowen: I would like to see an acknowledgment of other measures of success for linked data other than those that can be calculated, in particular the ability of libraries to meet the needs of their users. The success of this can best be studied using other methods, such as participatory design, as described in the recent book, “Scholarly Practice, Participatory Design and the eXtensible Catalog” http://www.alastore.ala.org/detail.aspx?ID=3408.
  - Karen: For non-profits and other service organizations, ROI includes intangible benefits like “making society better.” The non-profit management literature addresses this. So we should assume ROI to include those “less tangibles.”
- 2011-06-28 Catherine Jones on "niche": This is true – but one could say this of most disciplines, for example scientific instruments producing data in a certain format needs specialist, niche systems solutions. What is the special issue about Library systems in particular?
  - Laura Smart: Catherine is correct in that it’s true in other disciplines. I question, however, to what extent this is true in library systems. Database work is database work no matter how the element sets are structured. I think most commercial systems probably use ER modeling/diagramming when creating their systems, those systems are often built on commercial DB (ex. Innovative ILS implementation can be Oracle-based) or Open Source DB (MySQL), and who really knows what type of programming tools and paradigms are being used behind the proprietary wall. I think vendors like VTLS and Ex Libris are probably using agile development techniques.
  - Jennifer Bowen: But there are ways to address this issue, by providing tools that enable a smooth migration process for libraries to begin using linked data while continuing to use these niche systems. What is needed are cost-effective strategies for moving libraries forward.
- 2011-06-28 Laura Smart on "vocabulary changes": “the library metadata record, being designed primarily as a communication format, requires a full record replace for updates to any of its fields.” Not true. It’s possible to overlay specific fields while doing global updates in an ILS. It is true that it is costly, however.

The process of propagating the need to do updates is what’s expensive. LC changes a subject term or the NAF changes an authority heading, they have to spread the news that the change is made, and then local databases have to do the global updates. It suffers a time delay in addition to the monetary cost. This process can be automated and/or out-sourced but it still has its price.

  - Laura K: This is costly when metadata needs to be changed in many local records across thousands of libraries; if metadata were in a centralized database and linked to by library records, however, the vocabulary changes would only need to occur in one place, thus saving costs in the long run.
- 2011-07-21 Jennifer Bowen on "changes are costly": On the one hand this can be seen as a barrier for libraries to participate in linked data. On the other hand it represents an area where linked data could be a huge improvement for libraries in terms of managing such changes using a different infrastructure (registries, etc.)
- 2011-08-02 Adrian on "rights": The heading seems to assume that linked data necessarily means open data. This isn’t the case as you can publish data as RDF without an open license or without any license at all (as several organisations do) or as you can even do linked data in an intranet. Also, you can publish linkable data and let it be linked to and then establish a paywall around the data. In general, the report lacks a clarification regarding the terms “Linked Data” vs. “Open Data”. I suggest adding a paragraph or section to the report which clarifies the two terms, “open data” being about open access, open standards and open licenses in the first place and “linked data” being about a specific set of standards or best practices for publishing data on the web recommended by the W3C. An important aspect of open data is legal compatibility of data while linked data deals with technical compatibility of data.
- Karen on "published openly": It may be useful, however, to say that there is some valuable information to be gleaned from these private areas, like overall circulation statistics for individual titles. Scrubbing the data of any personally identifiable information adds cost to these projects. Privacy is essential and should not be compromised, but it has an impact on projects.
- Jennifer on "rights ownership": It seems to me that more detail is needed here about the issues with data sharing and the history of cooperative cataloging using centralized databases. This is so brief that it seems to be skirting around the issue. Perhaps just an additional sentence or two.
  - Adrian: I think this paragraph has to be fundamentally changed or even omitted. It implicitly argues that individual records are copyrighted. There is much to suggest that individual records aren’t copyrighted at all and that, thus, nobody owns any rights on them. At least in Europe you only have the related database right on collections of records. I believe the legal status of records is quite clear (not copyrighted), at most this is a grey area. The report shouldn’t speak in favour of the view that individual records are copyrightable.
- Jennifer on text strings: Although some work has been done to try to change this situation and some progress has been made. (e.g. MARC21 subfield zero) It seems misleading to me to not include some mention of efforts to get around these limitations.
  - Karen: Jennifer, can you give examples? I’m not sure what you’re referring to.
- Laura Smart on "shared terminology": I think the web community does have a concept which is equivalent (or at least quasi-equivalent) to libraries headings or authority control. It’s the concept of “unique identifiers.” It is true that the communities don’t share a common language or vocabulary but I don’t think it’s true that they don’t have concepts in common.
  - Karen: I agree with Laura that, if we investigate fully, we may find that we have more in common than it appears on the surface. An advantage to that investigation would be that it would require us to clarify our data goals in new terms; we might learn something from the exercise.
- Jennifer on "fluency in language of RDF": Discussed where below? This paragraph is very intriguing and deserves more attention. It seems related to the migration strategies in the Recommendations. Coming up with a plan for making these two paradigms coexist will be extremely important for the success of LLD.
From Jonathan's review, about costs:
- The whole social issue is ignored in the benefits section. Using someone else's linked data is a liability (cost): they might go offline, or they might decide to change the format. Unless you have a contract of some kind there is no protection against these. Similarly, if you make a promise to the community around uptime, updates, or stability, that is a liability to your organization. So LD in the long run, as part of infrastructure design, can have a huge cost due to its social fragility. To recover this cost requires fame or thank-you notes or citations that you can bring to your trustees, or fee for service.
- Cost of coordination is also ignored. Sure, one can unilaterally do RDF design and publish, but this can lead to lost opportunities if it's not exactly what a partner needs, or if it's incompatible with information coming from a similar but independent source. Without coordination, RDF really gives little benefit over XML. These are things that library administrators need to know. By being skeptical and transparent yourself, you'll gain credibility.
From Jonathan's mail, again:
- The whole section 'implementation challenges and barriers to adoption' comes off as a criticism of library culture: "resists change"... "out of step"... "understaffed"... "do not adapt". This is unnecessary and seems like biting the hand that feeds you. It's possibly even alienating and counterproductive. Yes, we know that libraries are endangered, and that this is partly because they haven't found their place on the web, partly because they don't innovate, and so on. But libraries have good reasons to be conservative and these have to be respected. You're proposing an innovation that *has a cost* that you don't acknowledge. You have to show that compared to the null hypothesis (status quo) the benefit will outweigh the cost.
  - So please rephrase in positive terms. LD "can reduce the cost of innovation"... "create new economies"... "help make better use of scarce staff"... "help libraries take advantage of new technological opportunities"... and so on. This section should again be about costs and benefits: what will libraries need to do, *if* they choose to go down this path. They would need to broaden the set of vendors they work with, train staff, interact with other communities, etc. Costs + benefits in each case. Overall, the goal is to help to enable informed decisions.
  - If LLD is worthwhile, then ways will be found to use it. If not, then it *shouldn't* be adopted.
  - How this is written depends of course on the audience. If the audience is people who have already decided they want to do LD, and the idea is to help them push it through their organizations, that's a different pitch. But as I said what I think you want is a document aimed at the skeptical but open-minded reader.
- 'web communities' - I don't think there's any such thing; please specify
- The discussion of bottom-up standards needs some explanation. Definition and examples. I don't really understand why you'd say HTML5 isn't bottom-up; most of what's in it has originated with some single browser ("bottom") and then gone "up". I think you're trying to say that bottom-up is the norm for RDF (with a few exceptions such as RDFS), for some definition of bottom-up.
- "Library standards are limited" - examples would help people like me who are not immersed.
- ROI - you're trying to talk about costs of the status quo separately from its benefit and from the costs and benefits of LD. I think this will in the end mean you are talking about the same issues in multiple separated sections of the document. I would prefer a structure more like the following: for each issue, explain what problem is to be solved, how it is solved by the status quo (how successfully and at what cost), and how it is solved using linked data (how successfully and at what cost).
- Re data rights issues, these are important enough that I think they should be summarized in your report, even if you have a citation to a more comprehensive document. (Factual information not protected in US, sui generis database rights in Europe, etc.)
- "Cultivate an ethos of innovation" - again you're somehow assuming that an organization's scarce innovation dollars should go to LD instead of to something else. That this argument has to be made, should be admitted.
- "Assign unique identifiers" - you're glossing the issue of cost and responsibility here. We tried to address this in life sciences with the shared names project, which has yet to kick in due to lack of attention and funding. This is really hard because maintenance of URIs in perpetuity is both difficult to understand and a hot potato.
- The problem of domain name loss and/or loss of service for a domain name is touched on, and that's good. Backup copies are great, but then tools need to be able to get at them after the primary is lost. It's probably too early to specify just how this should happen (XML catalogs? Memento?) but the problem needs to be acknowledged as something we'll have to face (and pay for) in the future.
From Romain's mail:
- Implementation challenges and barriers to adoption
  - The whole section is clumsy because it makes no distinction between various situations. We can find more or less advanced projects: as the “use case” section shows, libraries can be very innovative. http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport
- Designed for stability, the library ecosystem resists change
  - Comment: The library ecosystem has been changing since Zenodotus. Semantic Web techniques are different from traditional computer services, and budgets are not on a comparable basis. Furthermore, today libraries' data are digital data and it's not necessary to program retrospective conversion of printed catalogues. Data are already digital data, structured with digital formats. The historical depth of the libraries and librarian data is a very important asset in the frame of the semantic web, for which the notion of trust is essential. Libraries improve the quality of their data by constant revisions.
  - Suggestion Even if designed for stability, the library ecosystem moved early to computer systems and keeps adapting to technological changes.
- Library Data is shareable among libraries, but not yet with the wider world
  - Comment: Librarians often work, for instance, with the archival community. The EAD (Encoded Archival Description) XML DTD, for example, was jointly created by librarians and archivists in order to encode descriptions of archival collections.
  - Suggestion Through cooperation with Archives and Museums, libraries already share data and standards with a “wider world”. Moving to Linked Data is a natural continuation.
- Libraries are understaffed in the technology area
  - This part is overstrong and rude to libraries who actually recruit and work in the technology area.
  - Suggestion It is not just a matter of recruiting “IT people”, but of training librarians so that they are aware and efficient in Web technologies, and making sure Computing departments and librarians work together. This is what libraries do.
- Libraries do not adapt well to technological change
  - Comment: Libraries will need to manage the legacy of MARC format-based data for a long period of time even if they manage to shift to LD strategies and tools for their current practices. This means that before enjoying all the benefits of LD (listed in the scope document), libraries will need to maintain parallel systems, which means an increase of costs and efforts in software and format development and in data management. In the short term library developers will still have to deal with these formats, which are renewed.
  - Suggestion When convincing examples are shown, Libraries adapt very well to technological changes.
- Library standardization process is cumbersome
  - Comment: But possible! Libraries are used to transforming their formats, mapping them to other formats, and making them evolve when they work on new projects, new technologies, and new types of documents.
  - Suggestion It takes time to make the formats fit the need, but it is part of the libraries' culture.
- Library standards are limited to the library data
  - Comment: Library data are not only bibliographic data. Library catalogues also contain authority records with many pieces of information about persons, families, corporate bodies, works, and subjects. Authority data provide named entities and may provide permanent identifiers for these entities (such as ARK identifiers in BnF catalogues).
  - Suggestion With reliable identifiers, Authority data are also key elements for the semantic web.
- ROI is difficult to calculate
  - Comment: Benefits are as difficult as costs to estimate precisely, but some can and must be underlined. Pooling the creation of data reduces redundancies, increases staff efficiency, and allows librarians to focus on other tasks like research on collections or conservation. Linking the data of a library to cooperative metadata produced by reliable institutions adds value to its data. Opening library linked data may create economic value for a country, by allowing commercial reuses of that data (Open data). Opening library data increases user traffic and the visibility of collections (through reuse, SEO, etc.), and thus the possibilities of their ROI. Using richer, more flexible, more relevant data improves the accessibility and the services to users: in public institutions, public utility is a ROI by itself. Helping researchers is another one.
  - Suggestion It is difficult to calculate ROI precisely, but it is easy to see financial benefits (re-use, links, cuts of redundant tasks).
- Vocabulary changes in library data are costly
  - Comment: With an Authority File providing permanent identifiers and links, it is relatively easy to update any field linked with it. All changes in authority records can be automatically transferred into related bibliographic records.
  - Suggestion Moving to linked data implies relying on authority files and identifiers.
- Some data cannot be published openly
  - Comment: In some countries, there is a distinction between “public information” and “information that can be processed by machines”. In that case, information that is available for individuals needs to be justified and declared for massive use in computer programs.
  - Suggestion There can be national specificities. They have to be clearly stated by the publishers.
- Rights ownership can be unmanageably complex
  - Comment: Copied and extracted records are one thing. There is also a question about the “linked data itself”. The need to quote also means, for the provider, being able to report about the use. In some countries (including France) the use of the tax-payer's money has to be justified. You have to account for the money: the only way to do it is to have metrics. This implies knowing who is using the data, even for free.
  - Suggestion Thanks for feedback and quoting if you use our data!
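
Illustration for the comment on "Vocabulary changes in library data are costly" above: a minimal sketch, in Python, of the difference between bibliographic records that copy a heading as a text string and records that link to an authority record by identifier. The identifier `auth:001`, the headings, and the record layout are all hypothetical and are not taken from any real catalogue or from the reviews.

```python
# Hypothetical authority file: one record, keyed by a made-up permanent identifier.
authority = {
    "auth:001": {"preferred_label": "Cookery"},
}

# Bibliographic records that copy the heading as a text string.
string_records = [
    {"title": "Book A", "subject": "Cookery"},
    {"title": "Book B", "subject": "Cookery"},
]

# Bibliographic records that link to the authority record by identifier.
linked_records = [
    {"title": "Book A", "subject_id": "auth:001"},
    {"title": "Book B", "subject_id": "auth:001"},
]


def rename_in_strings(records, old, new):
    """Global update: every record holding the old string is rewritten."""
    touched = 0
    for rec in records:
        if rec["subject"] == old:
            rec["subject"] = new
            touched += 1
    return touched


def rename_in_authority(auth, identifier, new):
    """Single update: only the authority record changes."""
    auth[identifier]["preferred_label"] = new
    return 1


def display_subject(rec, auth):
    """Resolve the heading to display by following the identifier."""
    return auth[rec["subject_id"]]["preferred_label"]


string_updates = rename_in_strings(string_records, "Cookery", "Cooking")
linked_updates = rename_in_authority(authority, "auth:001", "Cooking")
labels = [display_subject(r, authority) for r in linked_records]
```

In this toy setup the string-based records need one write per record, while the linked records pick up the new heading from a single change to the authority record. Real systems add costs this sketch ignores, such as propagating the change between institutions.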

Draft_recommendations_page

Kai: review, ack'd
Kim: sent as part of his general review
Jeff: review
Lukas Koster comments: comments
- ack'd Karen Coyle [4]
comments on Recommendations
- Patrick Danowski: lengthy comments reposted to list (check)
- Alan Danskin: The British Library welcomes the work of the W3C incubator group on library linked data. The British Library has been experimenting with the practicalities of expressing the British National Bibliography as linked data and our comments draw on this experience. The report should substantiate its assertions regarding the value of linked data more explicitly. It would be instructive to include examples of the benefits derived by other communities.
- J. McRee Elrod: use "web" or "Web" consistently
- Alan Danskin on "identify sets": Agree value of authority files as LOD. Release of national bibliographies as LOD also has potential to generate a lot of data without excessive duplication and could provide the hooks for holdings of individual libraries.
- J. McRee Elrod: Would it not be helpful to spell out acronyms at first use in each section, e.g, “ROI”, as is done for “URIS” at 38?
- Alan Danskin: Statistical analyses of redundancy in current metadata processes may identify a lot of waste, but comparing well established standards with emerging standards is complex. Measuring linked open data against current processes will be difficult. A further complexity is that there is no agreement within the LOD community on significant issues, such as the types of persistent identifier to be preferred and the application of the RDF model, where there are differences of opinion concerning class/property, use of literals, and use of blank nodes. This makes engagement with LOD complex, confusing and costly.
- Alan Danskin on migration plans: BL will publish experience of converting BNB from MARC 21 to LOD. In principle, BL is also open to publishing information on the tools employed and where possible, the tools themselves. The tools are only part of the equation; the expertise necessary for their effective deployment should not be underestimated. BL’s experience certainly confirms the expectation that this is an iterative process. http://www.bl.uk/bibliographic/datafree.html
- Alan Danskin on "foster a discussion": BL endorses the finding that any restrictions on reuse of metadata inhibit value as linked data
  - Ed Chamberlain: There are also issues of concern around attribution licenses and linked data.

Attribution only works and thus has practical value at a dataset level. Given the composite nature of RDF, any single triple could be referenced or reused by another record or service. Attribution does not work practically in this context.
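
Illustration for the comment above on attribution: a minimal sketch, in Python, of why per-statement attribution is awkward once triples from different datasets are combined. The URIs and dataset names are hypothetical. Carrying the source as a fourth element (a "quad", in the spirit of named graphs) is shown as one possible workaround; it is an assumption of this sketch, not something the reviewer proposed.

```python
# Hypothetical triples (subject, predicate, object) from two made-up datasets.
dataset_a = {
    ("ex:book1", "dc:title", "Example Title"),
    ("ex:book1", "dc:creator", "ex:person1"),
}
dataset_b = {
    ("ex:person1", "foaf:name", "Example Author"),
}

# Merging plain triples: the result no longer records where each statement came from.
merged_triples = dataset_a | dataset_b

# Keeping the source as a fourth element preserves provenance per statement.
merged_quads = {(s, p, o, "ex:datasetA") for (s, p, o) in dataset_a} | {
    (s, p, o, "ex:datasetB") for (s, p, o) in dataset_b
}


def sources_of(subject, quads):
    """Return the sorted dataset names that make statements about a subject."""
    return sorted({g for (s, p, o, g) in quads if s == subject})
```

With plain triples, a consumer reusing a single statement has nothing in the data to attribute it with. With quads the source travels with each statement, at the cost of every tool in the chain having to preserve that fourth element.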

- Jody DeRidder on "participation": I honestly think this is the best way to get the library culture to buy in to linked data implementations and the semantic web. If librarians become part owners of the process, claiming their role as information professionals, staking their interests in assisting controlled vocabulary mapping and development and improving access to patrons… then we’ll make progress. You may need to frame this in terms of the Ranganathan 5 laws, modernized:
  - 1) Information is for use.
  - 2) Every user his/her information.
  - 3) Every bit of information, its user.
  - 4) Save the time of the user.
  - 5) Information access is a growing organism. Assisting in getting the right information in the right hands at the point of need: isn’t that what librarianship is all about? Returning to the basic focus of our profession may help to sell the change.
- Alan Danskin on "translate library data": BL experience illustrates that issues such as identification of the real object distinct from the concept of an object are still very much alive. It seems prudent while such fundamental debates remain unresolved to err on the side of caution and identify both separately. Real work is needed on use cases to illustrate that identification of the real object is sufficient for all needs, not just library requirements.
- Alan Danskin on "create URIs" and "create explicit links": agree
- Alan Danskin on "develop best practice": BL model is previewed at: http://www.bl.uk/bibliographic/datafree.html
From Kim's review
- To make the report more of a call to action, perhaps each task heading should begin with the word: "Task [number]", e.g. "Task 1: Identify sets of data ...", "Task 2: For each set of data..."
- "Consider migration strategies": "A full migration to Linked Data for library ..." -- Is Linked Data the future (for this Century)?
- "A plan must be drawn up that stages activities" ... could such a plan be sketched in this document?
- "Increase library participation in Semantic Web standardization", two comments:
  - should "Semantic Web" be replaced with "Linked Data" to avoid confusion?
  - vice versa also: Increase Semantic Web (or Linked Data) participation in Library standardization?
- "Translate library data, and data standards, into forms appropriate for Linked Data": "translators of library standards should involve Semantic Web experts" => does this work? In my opinion, it sounds slightly bad if you must ask for help from some specific technology -- why not just learn yourself what is needed?
- subsection "Assign unique identifiers (URIs) for all significant things in library data" -- the paragraph could be longer with some more details and argumentation
- "Create URIs for the items in library dataset" -- this subsection could be shortened + how about URNs?
- minor comment: "Prepare" and "Design" sections both contain the idea of "design patterns" with a quite similar description. Perhaps the idea of design patterns should be moved to a "General" section?
- typo: "Commit to best-practice policies...", 1st paragraph: "and efficiency. quality assurance..." (replace the dot with a comma?)
- "Identify tools that support the creation and use of LLD", 1st paragraph: "URI generator" -- what about URNs?
- This sounds very bad from "sales point of view": "Much the content in today's Linked Data cloud is of questionable quality" -- why should our library put our beloved content to this Linked Data dumpster? :)
- in "Preserve Linked Data vocabularies": "Linked Data will remain usable twenty years from now only if its URIs ..." -- how about the whole Century?
From Romain's review
- Identify sets of data as possible candidates for early exposure as LD
  - Comment: Structured data rely on the use of identifiers. Publishing authority files and controlled vocabularies early as linked data will make further publication of bibliographic records as linked data easier, by allowing links to them as a backbone for bibliographical information.
  - Suggestion Authority files can be a basket for the "low hanging fruits" from other libraries.
- For each set of data, determine ROI of current practices, and costs and ROI of exposing as LD
  - Comment: Determining costs and ROI of exposing sets of data will help in choosing which value vocabularies and datasets could have priority. Therefore, determining ROI has to be done globally.
  - Suggestion Not necessarily “for each set of data”.
- Consider migration strategies
  - Comment: Using Semantic Web technologies inside the library “catalogue” seems very promising, because it will allow a much more flexible and interoperable use of data: modelling, linking, merging, querying, removing redundancies, integrating external data from various formats and publishing as various formats, etc. This is obviously a great aim for libraries, but it is much more difficult than only publishing data as linked data. It must not be an obstacle: it may be better for a library to publish first some sets of data as linked data than trying from the beginning to migrate its entire catalogue. Therefore, the migration of data does not need to cover all possible data. It can be only the useful part. This is obviously the case when commercial services use RDFa for SEO, with the subset of products which people will be looking for. In fact, when we are just putting data into RDF, it is not useful if there are no links.
  - Suggestion Libraries can “pick and choose” what is relevant and migrate it.
  - Suggestion Using RDF inside the systems themselves is another question that has to be advocated.
- Identify Linked Data literacy needed for different staff roles in the library
  - Comment: In fact, when converting current datasets for use in RDF, we see that cataloguing still has to address the creation of links, mainly for reconciliation and alignments of concepts (for instance: “do those two books tell the same story?”). There, the data obviously still needs to be curated by humans. But by re-using links and data produced by others, we can expect the cataloguing work to be:
    - more centralized;
    - more about creating links (less about writing dates, names or page numbers…).
  - Suggestion These evolutions have to be clear on the business side.
- Identify and Link.
- Create URIs for the items in library datasets?
  - Comment: Providing identifiers is the only way to make links. For big libraries permanent identifiers are already being used (e.g. ARK identifiers for all resources at the BnF).