Difference between revisions of "Vocabulary and Dataset"

From Library Linked Data

Revision as of 08:53, 7 July 2011

LLD Vocabularies and Datasets

Editors: Antoine Isaac, William Waites, Jeff Young, Marcia Zeng.

This page is a draft for the side deliverable on "LLD Vocabularies and Datasets" as penciled in this plan.

@@TODO: for general TODOs see the Discussion page


Introduction: Scope and Definitions

This document is an attempt to identify a set of useful resources for creating or consuming linked data in the library domain. It is intended both for novices seeking an overview of the library linked data domain, and for experts in search of a quick look-up or refresher. As our incubator group notes in its recommendations, the success of linked data in any domain relies on the ability of its practitioners to identify, re-use or connect to already available datasets and data models. Library Linked Data is no exception. Far from it: the complexity and variety of library data resources, many of them already available as linked data at the time of writing this report, make such an identification effort crucial.

In previous library terminology explanation efforts, we have identified the following types of resources of interest, which, as shown later, are not mutually exclusive:

  • Metadata element sets or element sets: A metadata element set defines classes of entities and attributes (elements) of entities. In linked data terminology, such element sets are generally materialized through (RDF) schemas or (OWL) ontologies, the term "RDF vocabulary" sometimes being used as an umbrella for these. Usually a metadata element set does not define bibliographic entities; rather, it provides elements to be used by others to describe such entities.

    • Dublin Core defines elements such as Creator and Date (but DC does not define bibliographic records that use those elements).
    • FRBR defines entities such as Work and Manifestation and elements that link and describe them.
    • MARC21 defines elements (fields) to describe bibliographic records and authorities.
    • FOAF and ORG define elements to describe people and organisations, as might be used for describing authors and publishers.

  • Value vocabularies: A value vocabulary defines resources (topics, art styles, authors) that are used as values of elements in metadata records. Typically a value vocabulary does not define bibliographic resources such as books but concepts related to bibliographic resources (persons, languages, countries, etc.). They are "building blocks" with which metadata records can be built. Many libraries require specific value vocabularies as mandatory for selecting values for a particular metadata element. A value vocabulary thus represents a "controlled list" of allowed values for an element. Resources that can be considered as value vocabularies include thesauri, code lists, term lists, classification schemes, subject heading lists, taxonomies, authority files, digital gazetteers, concept schemes, and other types of knowledge organisation system. Note, however, that value vocabularies often have HTTP URIs assigned to their values, which would appear in a metadata record instead of or in addition to the literal value.

    • LCSH defines topics of books.
    • Art and Architecture Thesaurus defines, among other things, art styles.
    • VIAF defines authorities.
    • GeoNames defines geographical locations (e.g. cities).

  • Datasets: A dataset is a collection of structured metadata -- descriptions of things, such as books in a library. Library records consist of statements about things, where each statement consists of an element ("attribute" or "relationship") of the entity, and a "value" for that element. The elements that are used are usually selected from a set of standard elements, such as Dublin Core. The values for the elements are either taken from value vocabularies such as LCSH, or are free text values. Similar notions to "dataset" include "collection" or "metadata record set". Note that in the Linked Data context, datasets do not necessarily consist of clearly identifiable "records".

    • A record from a dataset for a given book could have a Subject element drawn from Dublin Core, and a value for Subject drawn from LCSH.
    • The same dataset may contain records for authors as first-class entities that are linked from their books, described with elements like "name" from FOAF.
    • A dataset may be self-describing in that it contains information about itself as a distinct entity, for example with a modified date and maintainer/curator elements drawn from Dublin Core.
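The three dataset examples above can be sketched as a handful of subject-predicate-object statements. This is only an illustration in plain Python, not a recommended serialisation: the two namespace URIs are those of DCMI Metadata Terms and FOAF, while every http://example.org/ identifier, the author name and the date are hypothetical placeholders (in particular, the subject URI merely stands in for a real LCSH heading URI).

```python
# Namespaces of the element sets used below.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical identifiers, invented for this illustration only.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/1"
DATASET = "http://example.org/dataset"
SUBJECT = "http://example.org/lcsh/placeholder-heading"  # stands in for an LCSH URI

# Each statement is (entity, element, value).
triples = [
    (BOOK, DCTERMS + "subject", SUBJECT),           # element from Dublin Core, value from a value vocabulary
    (BOOK, DCTERMS + "creator", AUTHOR),            # the author is a first-class entity
    (AUTHOR, FOAF + "name", "Jane Example"),        # free-text value for a FOAF element
    (DATASET, DCTERMS + "modified", "2011-07-07"),  # the dataset describes itself
]


def describe(resource, graph):
    """Collect the element/value pairs stated about one resource."""
    return {p: o for s, p, o in graph if s == resource}


book_description = describe(BOOK, triples)
```

Note that nothing in the list groups statements into a "record"; `describe` reconstructs a record-like view on demand, which mirrors the remark that linked datasets need not consist of clearly identifiable records.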

We do not aim here to draw up a complete list of the various resources related to the (library) linked data "cloud". As stated above, this report is rather intended as an entry point for practitioners to find, understand and explore some exemplar resources. It is especially grounded in the cases our incubator group has gathered. We hope it will prove an inspirational complement to more complete listing tools such as Semantic Web search engines, like Sindice or Falcons, or registries such as the Metadata Registry or CKAN -- we of course encourage our readers to also use these, just as we did ourselves for the CKAN dataset registry.

Datasets and Value Vocabularies


CKAN is a metadata registry for datasets. It is a tool for people to share information about datasets of all types and collaboratively describe them. The CKAN registry is not itself a linked-data service; however, there is a linked data version of the information it contains. Many of the datasets described in CKAN are in linked-data form.

CKAN has the concept of curated groups of datasets and is used to maintain information about membership of the wider LOD Cloud as well as the subset that pertains to Library Linked Data. The curators of these groups have arrived at a set of conventions for using the tagging facilities in CKAN to describe datasets that are to be included. This includes information about dataset size, example resources and access methods (e.g. SPARQL endpoints) and, crucially, links to other datasets.


When publishing a new dataset, adding it to CKAN means that it is included in a frequently consulted list of datasets. Following the conventions of the LOD Cloud and LLD groups means that its relationships to other datasets are documented and that it will be counted amongst the growing number of linked data corpora and appear in diagrams and visualisations that are produced as part of the study of this type of data. Having such datasets documented in this way means that we can build tools to gain a greater understanding of their nature and how they fit together. Whilst interesting in itself, this process is important in that this kind of understanding makes it easier to determine if a particular dataset is suitable or appropriate for a given task and thus makes it easier to use.

As an example of the results of this process, consider the diagram below.


Original at: http://semantic.ckan.net/group/?group=http://ckan.net/group/lld

The brightly coloured circles represent the datasets that are part of the LLD group. The grey circles represent datasets that they are connected to but that are not members of this group (they typically are members of the LOD Cloud group). The size of the circles and the thickness of the lines are related to the size of the dataset and the number of outward links (logarithmic) respectively. It is immediately apparent that, though there are some densely connected clusters of datasets in LLD, the majority are actually connected through datasets that are not necessarily library data in themselves -- DBpedia and GeoNames figuring prominently. It is also apparent that linking to other datasets that do not have this central character is quite common.
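How such a logarithmic mapping might look is sketched below. The actual formula used for the diagram is not given here, so the function names, the scale factors and the choice of base 10 are all assumptions made for illustration, as is applying the logarithm to both dataset size and link count.

```python
import math


def circle_radius(dataset_size, scale=4.0):
    """Hypothetical mapping: radius grows with log10 of the dataset size."""
    return scale * math.log10(max(dataset_size, 1))


def line_width(outward_links, scale=1.0):
    """Hypothetical mapping: width grows with log10 of the number of outward links."""
    return 1.0 + scale * math.log10(max(outward_links, 1))


# A dataset a thousand times larger is drawn only a few times bigger.
small = circle_radius(1_000)
large = circle_radius(1_000_000)
```

The point of a logarithmic scale is that datasets differing in size by several orders of magnitude can still be shown side by side in one diagram.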

Published Datasets

@@TODO: Just before the final delivery of this document, we will add here a snapshot of the CKAN LLD group. I.e., a simple bullet list that sums up the packages available there, with direct pointers to these.

Published value vocabularies

This section describes value vocabularies that have been made available as linked data and/or mentioned as being relevant by one of the LLD XG cases.

Every entry features a brief introduction to the vocabulary, as well as links to its location. Cases collected by the LLD XG are also listed under each entry, when they refer to the value vocabulary.

Library of Congress Subject Headings (LCSH)

LCSH is a comprehensive list of subject headings published in print and as linked data. Subject authority headings can be accessed through Library of Congress Authorities.

MARC Code List of Relators (MARC Relators) (also in element sets)

The MARC Relators provide a list of properties for describing the relationship between a name and a bibliographic resource.

VIAF (Virtual International Authority File)

VIAF is a joint project of several national libraries that virtually combines the name authority files of participating institutions into a single name authority service. As of the winter of 2011, there are 21 authority files of personal, corporate, and conference names from 18 organizations participating in VIAF. [1]

Union List of Artist Names (ULAN)

ULAN is a structured vocabulary containing more than 225,000 names and biographical and bibliographic information about artists and architects, including a wealth of variant names, pseudonyms, and language variants.

It is not yet published as linked data per se, but appears in [2].


GeoNames

The GeoNames geographical database contains over 10 million geographical names and consists of 7.5 million unique features, of which 2.8 million are populated places, and 5.5 million alternate names. [3]

Dewey Decimal Classification (DDC) summaries

Dewey Summaries is a dataset containing the top classes of Dewey Decimal Classification (DDC) 22. It provides access to the top three levels of the DDC in eleven languages and access to Abridged Edition 14 (assignable numbers and captions) in three languages.

Universal Decimal Classification (UDC) summary

The Universal Decimal Classification (UDC) is a multilingual classification scheme for all fields of knowledge. The UDC Summary represents a selection of around 2,000 classes extracted from the UDC scheme. [4]


DBpedia

DBpedia extracts structured information from Wikipedia. The DBpedia dataset features labels and abstracts for over three million things, with half of them classified in an ontology, and contains millions of links to images, external web pages, and other RDF datasets. [5]


RAMEAU

RAMEAU is a subject heading vocabulary used by the French National Library. It was developed starting from the subject heading repository of the Quebec University, which is itself derived from the Library of Congress Subject Headings (LCSH). RAMEAU has been published as linked data by the TELplus project.

SWD (Schlagwortnormdatei)

A controlled vocabulary system managed by the German National Library (DNB) in cooperation with various library networks. The inclusion of keywords in the SWD is defined by "Rules for the Keyword Catalogue" (RSWK). [6]

STW Thesaurus for Economics

The thesaurus provides vocabulary on any economic subject. It also covers technical terms used in law, sociology, or politics, and geographic names. [7]

AGROVOC multilingual thesaurus

AGROVOC is a multilingual structured and controlled vocabulary designed to cover the terminology of all subject fields in agriculture, forestry, fisheries, food and related domains (e.g. environment). [8]


WordNet

WordNet is a lexical database of English where nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called "synsets"). Each synset expresses a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. [9] WordNet has been published as linked data by the Vrije Universiteit Amsterdam.

EuroVoc - Multilingual Thesaurus of the European Union

EuroVoc is a multilingual, multidisciplinary thesaurus covering the activities of the EU, the European Parliament in particular. It contains terms in 24 languages (as of May 2011).[10]


PRONOM

PRONOM is the online registry of technical information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value. [11]

Freebase (also in datasets)

Freebase is an open, Creative Commons licensed collection of structured data, and a platform for accessing and manipulating that data via the Freebase API. Freebase imports data from a wide variety of open data sources, such as Wikipedia, MusicBrainz, and others. [12] Note that Freebase is essentially a dataset, but because it includes many reference resources, some parts of it can be used as value vocabularies in certain cases.

National Diet Library List of Subject Headings (NDLSH)

The National Diet Library List of Subject Headings (NDLSH) is a list of subject headings applied to the catalog of the National Diet Library, including mainly the topical headings and some proper name headings. [13]

Creative Commons (CC) License set

The Creative Commons provides an infrastructure which consists of a set of copyright licenses and tools that create a balance inside the traditional “all rights reserved” setting that copyright law creates. [14]

Preservation vocabularies from LoC

Preservation Events

A concept scheme for the preservation events, i.e., actions performed on digital objects within a preservation repository.

Preservation Level Role

A concept scheme for the preservation level roles, i.e., values that specify in what context a set of preservation options is applicable.

Work in progress, or relevant for cases but not in progress officially

Aquatic Sciences and Fisheries Abstracts (ASFA) Thesaurus

The Thesaurus is used for the subject indexing of the Aquatic Sciences and Fisheries Abstracts (ASFA), an abstracting and indexing service that covers the world's literature on the science, technology, management, and conservation of marine, brackish water, and freshwater resources and environments, including their socio-economic and legal aspects.[15]

Fisheries Reference Metadata

The Fisheries Reference Metadata system stores all the classification systems (for species, countries, water areas, commodities, fishing vessels, fishing gears, etc.) used by FAO to describe fisheries observations such as time-series data on fisheries capture and production and species fact sheets.

Agriculture Thesaurus and Glossary

The Agricultural Thesaurus and Glossary are online vocabulary tools of agricultural terms in English and Spanish provided by the USDA National Agricultural Library. The subject scope of agriculture is broadly defined in the NAL Agricultural Thesaurus, and includes terminology in the supporting biological, physical and social sciences. The definitions of terms in the thesaurus were separately published as the Glossary of Agricultural Terms.[16]

Art and Architecture Thesaurus (AAT)

A multilingual controlled vocabulary for fine art, architecture, decorative arts, archival materials, and material culture, intended for indexing, cataloging, and searching, and as a research tool.

Medical Subject Headings (MeSH)

A comprehensive controlled vocabulary produced by the National Library of Medicine (NLM) for biomedical and health-related information and documents.


A classification system for describing and classifying the subject of images represented in various media such as paintings, drawings and photographs.

The Getty Thesaurus of Geographic Names (TGN)

A structured, world-coverage vocabulary of over 1.3 million names, including vernacular and historical names, coordinates, place types, and descriptive notes, focusing on places important for the study of art and architecture.

Other value vocabularies relevant to the LLD field, not mentioned in the use cases

New York Times subject headings

The New York Times uses approximately 30,000 tags to power its Times Topics Pages. These tags (categorized into 'people', 'organization', 'place', and 'descriptor') are published as linked open data and are mapped to Freebase, DBpedia, and GeoNames.

MARC Countries list

MARC Countries list identifies current national entities, states of the United States, provinces and territories of Canada and Australia, divisions of the United Kingdom, and internationally recognized dependencies. The entries include references to their equivalent ISO 3166 codes.

MARC List for Languages

The MARC List for Languages provides three-character lowercase alphabetic strings that serve as the identifiers of languages and language groups. It has been cross-referenced with ISO 639-1, 639-2, and 639-5, where appropriate.

Relevant LLD Metadata Element Sets - anno 2011

This section lists RDF vocabularies used as metadata element sets in the use cases gathered by the Library Linked Data group in 2010-2011. These include some of the most relevant vocabularies for practitioners who want to re-use available Semantic Web technology for creating or converting data from the library domain.

These vocabularies are represented using the constructs offered by the RDF Schema (RDFS) and Web Ontology Language (OWL) ontology modeling languages. In addition to the documentation made available on their own websites, the reader can view their content using generic ontology creation and visualization tools such as Protégé, the Manchester ontology browser, OWL Sight or the Live OWL Documentation Environment (see for example the DOAP ontology rendered in LODE).

For each element set, we give a pointer to a human-readable website and indicate the corresponding RDF namespace, as well as a common abbreviation used for it. We also provide or re-use a short description, focused on the main scope or usage domain for the element set. We have sometimes emphasized important design decisions that characterize the element set, including indications of whether the element set is connected to another one, or of its relation to traditional library usages. Finally, cases collected by the LLD XG are also listed under each entry as relevant usage examples.

Metadata element sets published as Semantic Web ontologies

This sub-section lists the relevant ontologies (OWL or RDFS) available at the time of writing this report, as identified in the cases gathered by the LLD Incubator Group.

@@TODO: Similar to what LOV and the UMBEL doc have done, we could include a graph that shows all our metadata element sets, a bit like for DC in LOV. The links would indicate that a metadata element set "re-uses" another. And the size of the circles would depend on the number of times the vocabulary appears in our use cases (which can be --even manually-- extracted from our LLD vocabulary wiki page, as Paul as done here). PLEASE DON'T FORGET THAT THE MOCK-UP BELOW IS JUST TO GIVE AN IDEA!!!


Other relevant pointers (as a reminder):

Dublin Core

Dublin Core 1.1 is the legacy Dublin Core element set containing 15 basic property elements capable of describing anything. A critical aspect of these properties is the lack of an rdfs:range setting, which allows one to use them both with literal values and fully-fledged RDF resources.

The DCMI Metadata Terms /terms/ namespace refines the legacy /elements/1.1/ namespace with some rdfs:range restrictions and a variety of new properties. Note that interoperability with the /elements/1.1/ set is preserved via rdfs:subPropertyOf.
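The effect of these rdfs:subPropertyOf links can be sketched in a few lines of plain Python. This is a minimal illustration rather than a full RDFS reasoner: it applies a single inference step over a hand-written table listing only two of the sub-property relations, and the book and person URIs are hypothetical.

```python
DC11 = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Two of the rdfs:subPropertyOf relations between the namespaces
# (table written by hand for this sketch; it is not exhaustive).
SUBPROPERTY_OF = {
    DCTERMS + "creator": DC11 + "creator",
    DCTERMS + "subject": DC11 + "subject",
}


def add_legacy_statements(triples):
    """For each statement using a /terms/ property, also state its /elements/1.1/ super-property."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUBPROPERTY_OF:
            inferred.add((s, SUBPROPERTY_OF[p], o))
    return inferred


# Hypothetical data published with the newer /terms/ namespace only.
data = {("http://example.org/book/1", DCTERMS + "creator", "http://example.org/person/1")}
expanded = add_legacy_statements(data)
```

A consumer that only understands the legacy namespace can therefore still find the creator statement in `expanded`, which is the interoperability the sub-property links are meant to preserve.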

Friend of a Friend (FOAF)

FOAF is a basic common-sense and widely used ontology for describing persons and other closely-related entities on the web.

Vocabulary of Interlinked Datasets (VoID)

VoID (from "Vocabulary of Interlinked Datasets") is an RDF based schema to describe linked datasets. With VoID the discovery and usage of linked datasets can be performed both effectively and efficiently. A VoID dataset is a collection of data, published and maintained by a single provider, available as RDF, and accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint.


Open Archives Initiative Object Reuse and Exchange

The Open Archives Initiative Object Reuse and Exchange model defines elements to describe aggregations of web resources, which together form complex digital objects, such as a journal article and its different digital variations and accompanying material. It also proposes a "resource map" mechanism to indicate and describe provenance of metadata on these aggregations, as well as "proxies" to describe any given resource from the perspective of a specific aggregation, when resources are included in different aggregations.


SKOS

"SKOS provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabulary."[17] SKOS deliberately avoids specifying an rdfs:domain for some of its properties (esp. labelling and note properties), enabling one to re-use them for any kind of resource.


SKOS-XL

SKOS-XL is a SKOS extension that provides support for describing lexical entities attached to concepts. It "reifies" the labels of skos:Concepts, treating them as fully-fledged RDF resources. This allows them to be annotated further, or linked to one another using, say, an "isTranslationOf" property.
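A small sketch of what this reification looks like, again using plain Python tuples as statements. The SKOS and SKOS-XL namespace URIs and the skosxl:prefLabel / skosxl:literalForm properties come from the respective specifications; the http://example.org/ identifiers and the isTranslationOf property are hypothetical, and literals are modelled here simply as (text, language) pairs.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
EX = "http://example.org/"  # hypothetical namespace for this illustration

CONCEPT = EX + "concept/cats"
LABEL_EN = EX + "label/cats-en"
LABEL_FR = EX + "label/cats-fr"

triples = [
    # Labels are resources in their own right...
    (CONCEPT, SKOSXL + "prefLabel", LABEL_EN),
    (CONCEPT, SKOSXL + "prefLabel", LABEL_FR),
    (LABEL_EN, SKOSXL + "literalForm", ("cats", "en")),
    (LABEL_FR, SKOSXL + "literalForm", ("chats", "fr")),
    # ...so one label can be linked to another (hypothetical property).
    (LABEL_FR, EX + "isTranslationOf", LABEL_EN),
]


def plain_pref_labels(graph):
    """Derive ordinary skos:prefLabel statements from the reified SKOS-XL labels."""
    forms = {s: o for s, p, o in graph if p == SKOSXL + "literalForm"}
    return [
        (s, SKOS + "prefLabel", forms[o])
        for s, p, o in graph
        if p == SKOSXL + "prefLabel" and o in forms
    ]


simple_labels = plain_pref_labels(triples)
```

The derivation in `plain_pref_labels` shows that the reified form loses nothing: plain SKOS labels can always be recovered from it, while the translation link has no equivalent in plain SKOS.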


Bibliographic Ontology (BIBO)

BIBO (Bibliographic Ontology) can be used as a citation ontology or document classification ontology, or as a way to describe any kind of bibliographic thing in RDF.


EXIF

This is an RDF Schema for EXIF, a standard for images that mainly supports technical metadata, usually embedded in an image file (e.g., a JPEG file). Each key of the EXIF specification has been directly mapped to a corresponding property. Other efforts have been reported that aim to preserve the groupings of metadata keys provided in the original EXIF specification (e.g., pixel composition and geo location), such as an EXIF OWL ontology [18].

UMBEL Vocabulary

The UMBEL (Upper Mapping and Binding Exchange Layer) Reference Concepts dataset is derived from the OpenCyc ontology. It includes thousands of coherently structured and linked concepts, and is broadly applicable as orienting nodes to any knowledge domain. The UMBEL vocabulary provides the classes and properties to describe this conceptual knowledge. It is also intended to function as the basis for constructing domain ontologies. [19] It re-uses external vocabularies whenever possible.


vCard

The vCard ontology enables representing business card profiles defined by vCard (RFC2426).


Lexvo

OWL ontology: http://lexvo.org/ontology

The name Lexvo is derived from the Ancient Greek λεξικόν (lexicon) and the Latin vocabularium (vocabulary).[20] The ontology provides a vocabulary for defining global URIs for languages, words, characters, and other human language-related objects.

MARC Code List of Relators

The MARC relators vocabulary provides a list of properties for describing the relationship between a name and a bibliographic resource.


Open Provenance Model

The Open Provenance Model is a generic model to express and share provenance information. It consists of a lightweight Open Provenance Model Vocabulary which enables basic representation of provenance data, and a more expressive Open Provenance Model OWL Specification geared towards inference.


CIDOC Conceptual Reference Model (CRM)

The CIDOC object-oriented Conceptual Reference Model (CRM) is developed by the International Council of Museums (ICOM) to represent descriptions of objects from the cultural sector and make them interoperable. It makes intensive use of events to link objects, persons, places and more conceptual notions together.

Music Ontology

"The Music Ontology Specification provides main concepts and properties for describing music (i.e. artists, albums and tracks) on the Semantic Web". It applies the FRBR distinctions to the music domain.

Creative Commons Rights Expression Language (CC REL)

CC REL enables describing copyright licenses in RDF.

CiTO: A Citation Type Ontology

CiTO, one of the SPAR ontologies, is a minimal ontology for describing reference citations in research articles.


Description of a Project (DOAP)

Description of a Project (DOAP) is a vocabulary for describing software projects, especially open-source projects.

W3C Basic Geo vocabulary

This small ontology is aimed at representing Geo Positioning (latitude, longitude and altitude) for spatial objects, according to the WGS84 standard.

DCMI Type Vocabulary

A general, cross-domain list of Dublin Core Metadata Initiative (DCMI) approved terms that may be used as values for the resource type element to identify the genre of a resource.

Dublin Core Collection Description vocabularies

The DCMI Collection Description Application Profile Task Group developed a Dublin Core collections application profile and several vocabularies. Its work was based on the RSLP Collection description schema.

Functional Requirements for Bibliographic Records (FRBR) and related ontologies

FRBR (Functional Requirements for Bibliographic Records) is a conceptual reference model developed by the International Federation of Library Associations and Institutions (IFLA) "to provide a (...) framework for relating the data that are recorded in bibliographic records to the needs of users of those records" (FRBR Final Report, sec. 2.1) and for assessing their actual relevance.

The IFLA "FRBR family" consists of three conceptual models each covering an aspect of the data recorded in bibliographic and authority records. The entities, attributes, and relationships defined by each of the models are included in the Metadata Registry:

The FRBR Final Report describes an entity-relationship model that has been the source of a number of other ontology implementations:

[AI] William, I can't find FRBR mentioned in your case. Can we remove it?

Work in progress to create ontologies

@@TODO: Some of these should be moved to the previous section at the time of final publishing.


MADS/RDF

MADS/RDF is designed for use with controlled values for names (personal, corporate, geographic, etc.), thesauri, taxonomies, subject heading systems, and other controlled value lists. The MADS/RDF ontology is mapped to SKOS.


ISAD(G)

ISAD(G) stands for General International Standard Archival Description. It defines the elements that should be included in an archival finding aid.

W3C Ontology for Media Resources

Defines a core set of metadata properties for media resources, along with their mappings to elements from a set of existing metadata formats. It mainly targets media resources available on the Web, as opposed to media resources that are only accessible in local archives or museums.

ISBD (International Standard Bibliographic Description)

This is a preliminary registration of classes and properties from International Standard Bibliographic Description (ISBD) consolidated edition. The ISBD is useful and applicable for descriptions of bibliographic resources in any type of catalogue.


EAC-CPF

EAC-CPF is aimed at representing authoritative information about the context of archival materials, including "the identification and characteristics of the persons, organizations, and families (agents) who have been the creators, users, or subjects of records, as well as the relationships amongst them" [21]. It is a parallel effort to the Encoded Archival Description (EAD) standard for representation of archival finding aids.

A core concept in EAC-CPF is the distinction between agents and identities: the same agent can have different identities, and one identity can correspond to several agents.
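This many-to-many relationship can be sketched with two lookup tables built from (agent, identity) pairs. The sketch below does not use any actual EAC-CPF element names; the identifiers are hypothetical placeholders chosen only to show both directions of the relationship.

```python
from collections import defaultdict

# (agent, identity) pairs; all identifiers are hypothetical placeholders.
LINKS = [
    ("agent:person-a", "identity:legal-name-a"),
    ("agent:person-a", "identity:pen-name-x"),          # one agent, two identities
    ("agent:person-b", "identity:shared-pseudonym"),    # one identity...
    ("agent:person-c", "identity:shared-pseudonym"),    # ...used by two agents
]


def build_indexes(links):
    """Index the pairs in both directions so neither side is treated as primary."""
    by_agent = defaultdict(set)
    by_identity = defaultdict(set)
    for agent, identity in links:
        by_agent[agent].add(identity)
        by_identity[identity].add(agent)
    return by_agent, by_identity


identities_of, agents_behind = build_indexes(LINKS)
```

Keeping the pairs as the primary data, rather than nesting identities under agents or the reverse, avoids privileging either side of the distinction.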

Documentation, RDF/XML file


MarcOnt

MARC has played a crucial role in the creation and exchange of library metadata. The MarcOnt initiative has created an OWL ontology that includes a small sub-set of MARC elements, connected to other ontologies.


PREMIS

Preservation Metadata: Implementation Strategies (PREMIS) defines a core set of preservation metadata elements, with a supporting data dictionary, applicable to a broad range of digital preservation activities.

EAD and other archive-oriented element sets

EAD is a standard for encoding archival finding aids using Extensible Markup Language (XML).

  • Usage examples: Cluster Archives
  • Work relevant for EAD in RDF has been done in the LOCAH project (linked data and documentation available) and in EuropeanaConnect (schema available).

Note that the LOCAH element set only handles a part of EAD, and introduces other elements that the LOCAH participants found useful to publish archival collection data as linked data. Readers may also be interested in the lightweight Archival vocabulary maintained by Aaron Rubinstein for describing archives and the named entities associated to them.

Metadata element sets from cases for which no RDF version is available

Categories for the Description of Works of Art (CDWA)

Categories for the Description of Works of Art (CDWA) includes 532 categories and subcategories for describing and accessing information about works of art, architecture, other material culture, groups and collections of works, and related images.


A subset of elements based on the Categories for the Description of Works of Art (CDWA) and Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO). It is an XML schema to describe core records for works of art and material culture.

EBU P/Meta Semantic Metadata Schema (P/META)

A standard vocabulary for programme information in the professional broadcasting industry.


SPECTRUM

SPECTRUM is a UK-originated standard for managing museum collections, from descriptive metadata for objects to loan information. [22]


Metadata Object Description Schema (MODS)

Metadata Object Description Schema (MODS) includes a subset of MARC fields and uses language-based tags rather than numeric ones, in some cases regrouping elements from the MARC 21 bibliographic format. MODS is expressed using XML.

Other metadata element sets (no RDF version) relevant to the LLD field, not mentioned in the use cases

VRA Core

Visual Resources Association (VRA) Core Categories (VRA Core) specifies a set of core categories for creating records to describe works of visual culture as well as the images that document them.

  • An OWL ontology for VRA core 3.0 has been created by Mark van Assem for the W3C Semantic Web Best Practices and Deployment working group.

Text Encoding Initiative (TEI) Guidelines

The "Guidelines for Electronic Text Encoding and Interchange" is a standard for representing all kinds of literary and linguistic texts for online research and teaching.


PBCore

PBCore (PB = Public Broadcasting) is a metadata standard designed to describe media, both digital and analog. The PBCore XML Schema Definition (XSD) defines the structure and content of PBCore. The element set and related value vocabularies are available at the Metadata Registry.