Section 0. Contact and confidentiality
Contact e-mail:
John McCarthy <JLMcCarthy AT lbl DOT gov>
Bruce Bargmeyer <BEBargmeyer AT lbl DOT gov>
Do you mind your use case being made public on the working group website and documents?
No, you are welcome to use it.
Section 1. Application
In this section we ask you to provide some information about the application for which the vocabulary(ies) and/or vocabulary mappings are being used. Please note:
- If your use case does not involve any specific application, but consists rather in the description of a specific vocabulary, skip straight to Section 2.
- If your application makes use of links between different vocabularies, do not forget to fill in Section 3!
1.1. What is the title of the application?
Extended Metadata Registry (XMDR) Prototype. This is a prototype implementation of metadata design specifications proposed for edition 3 of ISO/IEC 11179 part 3; see http://xmdr.org/, http://xmdr.org/software/ and http://xmdr.lbl.gov/xmdr
1.2. What is the general purpose of the application?
Extensions to ISO/IEC 11179 Metadata Registry Standard
- What services does it provide to the end-user?
• Registration of metadata, including concept systems such as terminologies and ontologies, as well as data elements and value domains (codesets).
• Registering and managing any semantic information that is useful in data management, data administration, data analysis and linkage of concepts to data.
• Provide semantics services for semantic computing such as the Semantic Web, semantics service oriented architectures, and semantic grids.
• Interrelate concept systems with other concept systems.
• Interrelate concept systems with data held in databases, terminologies, and metadata deriving from natural language text understanding systems.
• Enable use of new services for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, semantics based workflows, Semantic Web, etc.
• Capture semantics with more formal techniques (in addition to natural language): First Order Logic, Description Logic, Common Logic, OWL.
• Encourage and enable the sharing of concept systems and traditional metadata through means that reduce the cost of accessing, obtaining and interacting with the broadest range of content.
• Provide semantic services needed to support semantic computing, such as dereferencing the URIs used in creating RDF statements, by providing relevant information describing the referenced concept and its authoritative standing within some community of interest.
1.3. Provide some examples of the functionality of the application. Try to illustrate all of the functionalities in which the vocabulary(ies) and/or vocabulary mappings are involved.
See http://hpcrd.lbl.gov/SDM/XMDR/use-cases.html
1.4. What is the architecture of the application?
See http://hpcrd.lbl.gov/SDM/XMDR/arch.html
- What are the main components?
REST (Representational State Transfer) architecture, Metamodel (in OWL) and data format (XML), RegistryStore (Persistence and Versioning), Metadata Content Validation, Indexing (text, asserted, and logical inference), Mapping, Authentication, and (Human) User Interface
- Are the components and/or the data distributed across a network, or across the Web?
Not at present, but in the future we hope that content data, as well as the extended metadata registry software, might be so distributed.
1.5. Briefly describe any special strategy involved in the processing of user actions, e.g. query expansion using the vocabulary structure.
Users may choose to expand queries to include inferred as well as asserted information. Users may draw inferences based on the XMDR metamodel (ISO/IEC 11179) as well as on the specific content and relationships of individual sets of metadata.
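As a minimal sketch of this kind of query expansion (an illustration only, not the XMDR implementation), the Python below treats asserted "broader" links as a small graph and infers narrower concepts by following those links transitively. The concept names, the ASSERTED_BROADER table, and the helper functions are all hypothetical.

```python
from collections import deque

# Hypothetical asserted "narrower concept -> broader concept" links.
ASSERTED_BROADER = {
    "sedan": "car",
    "car": "vehicle",
    "truck": "vehicle",
}


def narrower_concepts(concept, include_inferred=True):
    """Return concepts narrower than `concept`.

    With include_inferred=False only directly asserted links are used;
    otherwise the links are followed transitively (a simple inference).
    """
    # Invert the asserted links: broader -> set of directly narrower concepts.
    children = {}
    for narrow, broad in ASSERTED_BROADER.items():
        children.setdefault(broad, set()).add(narrow)

    direct = children.get(concept, set())
    if not include_inferred:
        return set(direct)

    # Breadth-first walk down the hierarchy to collect inferred results.
    found = set()
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        if current not in found:
            found.add(current)
            queue.extend(children.get(current, ()))
    return found


def expand_query(concept, include_inferred=True):
    """Expand a one-concept query to the concept plus its narrower concepts."""
    return {concept} | narrower_concepts(concept, include_inferred)
```

Calling expand_query("vehicle", include_inferred=False) uses only asserted links, while the default call also returns concepts reached by inference, which mirrors the user's choice between asserted and inferred results.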
1.6. Are the functionalities associated with the controlled vocabulary(ies) integrated in any way with functionalities provided by other means? (For example, search and browse using a structured vocabulary might be integrated with free-text searching and/or some sort of social bookmarking or recommender system.)
One of the main purposes of the XMDR Prototype is to demonstrate how concept systems (including vocabularies, terminologies, thesauri, and ontologies) can be used to help integrate, search, and harmonize more traditional metadata registry information about data elements, valid value sets, etc. The functionality for humans and computers is to enable linkage of concept systems and data. This can be utilized in many ways, including finding data elements that are related to particular concepts or sets of concepts. The application combines text searching and inference searching.
1.7. Any additional information, references and/or hyperlinks.
See http://xmdr.org/. As noted elsewhere, we have been working closely with Harold Solbrig, and using LexGrid to facilitate import of concept systems into XMDR whenever possible. As SKOS gains wider acceptance and software tools that work with it become available, we can envision eventually using SKOS and related tools for many of the same purposes for which we are currently using LexGrid. We thus hope that SKOS will be able to incorporate many of the current features of LexGrid so that we can easily use concept systems that use SKOS. LexGrid and XMDR may prove to be useful tools for working with content expressed in SKOS. In the meantime, it might be very useful to have software that could translate from SKOS to LexGrid and vice versa.
Section 2. Vocabulary(ies)
In this section we ask you to provide some information about the vocabulary or vocabularies you would like to be able to represent using SKOS. Please note:
- If you have multiple vocabularies to describe, you may repeat this section for each one individually or you may provide a single description that encompasses all of your vocabularies.
- If your use case describes a generic application of one or more vocabularies and/or vocabulary mappings, you may skip this section.
- If your vocabulary case contains cross-vocabulary links (between the vocabularies you presented or to external vocabularies), please fill in section 3!
2.1. What is the title of the vocabulary? If you're describing multiple vocabularies, please provide as many titles as you can.
XMDR has loaded a number of different concept systems in order to demonstrate different kinds of capabilities, particularly for large, complex concept systems. For the list of the current concept systems included and proposed for the XMDR Prototype, and a summary of their respective characteristics, see the table at http://hpcrd.lbl.gov/SDM/XMDR/contentlist.html
2.2. Briefly describe the general characteristics of the vocabulary, e.g. scope, size...
See http://hpcrd.lbl.gov/SDM/XMDR/contentlist.html, a portion of which is included below.
| Dataset Name | XMDR Contact | Graph Structure | Priority | Licensing Issues | Status Survey Form | Status LexGrid Loading | Status XMDR Loading | References and Comments |
|---|---|---|---|---|---|---|---|---|
| DTIC Thesaurus (Defense Technology Info. Center Thesaurus) | Gail Hodge (IIA for USGS) | Directed graph (tree + related terms) | 1 | No | | Yes | Yes | This is an outdated version. |
| NCI Thesaurus (National Cancer Institute Thesaurus) | Sherri De Coronado (NCI) | Directed graph (tree + related terms) | 1 | No | | Yes | Yes | |
| NCI caDSR (National Cancer Institute Data Standards Repository) | Sherri De Coronado (NCI) | Directed graph | 1 | No | | NA | In progress | |
| ISO 3166 Country Codes | Frank Olken | List | 1 | No | | Yes, may need a language reload | Yes | ISO 3166 Country Codes Download Page (English and French) or extract from EPA EDR |
| GEMET (GEneral Multilingual Environmental Thesaurus) | Gail Hodge (IIA for USGS) and Linda Spencer (EPA) | Directed graph (trees + related terms) | 1 | No | missing | Yes | Yes | Bruce Bargmeyer has a new set of GEMET files from 2006/04. Nothing has been done with the new GEMET files. Multilingual. |
XMDR terminology and concept system content varies greatly in size, from small to hundreds of thousands of concepts, millions of terms, and millions of relations between concepts.
2.3. In which language(s) is the vocabulary provided?
- In the case of partial translations, how complete are these?
XMDR is intended to load concept systems in their entirety, from any format. Wherever possible, we have used LexGrid as an intermediate step in loading content. Content expressed in SKOS would make it easier for XMDR to load additional content from diverse fields and sources.
2.4. Please provide below some extracts from the vocabulary. Use the layout or presentation format that you would normally provide for the users of the vocabulary. Please ensure that the extracts you provide illustrate all of the features of the vocabulary.
2.5. Describe the structure of the vocabulary.
- What are the main building blocks?
- What types of relationship are used? If you can, provide examples by referring to the extracts given in paragraph 2.4.
2.6. Is a machine-readable representation of the vocabulary already available (e.g. as an XML document)? If so, we would be grateful if you could provide some example data or point us to a hyperlink.
There is substantial content available in two prototype implementation instances on our web site at xmdr.org. We have permission to make the content accessible on the web but may not be able to re-distribute content in bulk.
2.7. Are any software applications used to create and/or maintain the vocabulary?
- Are there any features which these software applications currently lack which are required by your use case?
Some parts of the XMDR architecture are not yet implemented (e.g., mapping). We are working actively to add such capabilities and welcome collaboration that might expedite that work.
2.8. If a database application is used to store and/or manage the vocabulary, how is the database structured? Illustration by means of some table sample is welcome.
Concept systems are translated into XML files that conform to the XMDR metamodel, as described at https://xmdr.lbl.gov/mediawiki/index.php/11179_Diagrams
2.9. Were any published standards, textbooks or written guidelines followed during the design and construction of the vocabulary?
- Did you decide to diverge from their recommendations in any way, and if so, how and why?
We are trying to coordinate our work with the development of ISO/IEC 11179 edition 3 and with other standards efforts, particularly ISO TC37 and the W3C Semantic Web Working Groups (XML, RDF, OWL and SKOS). Other related standards include ISO 639, 704, 3166, 11179 and 12620, and UML (Unified Modeling Language). We have also used the LexGrid specification (http://LexGrid.org/) because it bridges the SKOS/OWL boundary.
2.10. How are changes to the vocabulary managed?
Changes to vocabularies are the responsibility of the different organizations from which we obtain them. How to keep the experimental XMDR Prototype updated with respect to changing external sources is an active research and development topic.
2.11. Any additional information, references and/or hyperlinks.
Section 3. Vocabulary Mappings
In this section we ask you to provide some information about the mappings or links between vocabularies you would like to be able to represent using SKOS. Please note:
- If your use case does not involve vocabulary mappings or links, you may skip this section!
Although the XMDR Prototype does not yet include facilities for mapping between different concept systems, that is one of our important goals. See http://hpcrd.lbl.gov/SDM/XMDR/arch.html, section C.4, XMDR MappingEngine.
The part of the MDR specification that requires the most work for XMDR is the support for registration and use of mappings between pairs of classification systems, ontologies, schemas, and value domains. Thus far, we have identified four general approaches to mapping being used today:
Translation Tables
The simplest approach is to build a table of pairs (an unlabeled bipartite graph) between the classification scheme items, concepts, or values which have corresponding or overlapping meaning. Such tables are also sometimes called "correspondence tables"; see for example the mappings provided with NAICS 2002. A one-to-many matching indicates ambiguity. Translation tables are sometimes qualified with a confidence and/or completeness scale or measure, which is necessarily direction-dependent.
DL-based Translation
A more powerful approach is to use a description logic (such as OWL) to express mappings, which provides more precision. However, some non-trivial tool is still needed to "apply" a DL mapping as a transformation.
FOL-based Translation
First-order logic (FOL) is more powerful than description logics, and so it supports the definition of more complicated mappings. The trade-off is that DLs are typically decidable and tractable, while full FOL is neither. Still, some communities have been using full FOL for some time and have found that these theoretical problems rarely (if ever) materialize in practice. Additionally, there is a great variety of DLs, many of which cannot be combined without breaking the decidability and tractability conditions which motivate the use of a DL in the first place, and so any system attempting to leverage knowledge which is expressed in two different DLs will generally be forced to use a substantial subset of FOL anyway. An emerging ISO standard for exchange of FOL axiom sets is called Simple Common Logic (SCL).
Rule- or Query-based Translation
In the relational database community, translations are commonly described as views. Euzenat observes that an equivalent level of expressivity is provided by SWRL for OWL/RDF [Euzenat, 2004]. It is not immediately apparent whether rule-based translation is (in theory) any less powerful (or more tractable) than FOL-based translation.
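To make the translation-table approach described above more concrete, here is a minimal Python sketch under stated assumptions. It is not part of the XMDR MappingEngine: the codes, confidence values, and function names are invented for illustration and are not real SIC or NAICS entries. Each pair carries a confidence value, and a source code with more than one target is reported as ambiguous.

```python
from collections import defaultdict

# Hypothetical correspondence pairs: (source code, target code, confidence).
# The codes and confidence values are invented for illustration; they are
# not real SIC or NAICS entries.
PAIRS = [
    ("S-100", "N-11", 1.0),
    ("S-200", "N-21", 0.7),
    ("S-200", "N-22", 0.3),
]


def build_table(pairs):
    """Index pairs by source code: source -> list of (target, confidence)."""
    table = defaultdict(list)
    for source, target, confidence in pairs:
        table[source].append((target, confidence))
    return dict(table)


def translate(table, code):
    """Return candidate targets for `code`, most confident first."""
    return sorted(table.get(code, []), key=lambda item: item[1], reverse=True)


def is_ambiguous(table, code):
    """A one-to-many match means the translation is ambiguous."""
    return len(table.get(code, [])) > 1


# Forward direction only (source -> target).
FORWARD = build_table(PAIRS)
```

Because the confidence measure is direction-dependent, a reverse (target to source) table would need its own confidence values rather than reusing the forward ones.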
3.1. Which vocabularies are you linking/mapping from/to?
We have just begun this part of our research and development efforts, starting with mappings between the old Standard Industrial Classification (SIC) codes and their successor, the North American Industry Classification System (NAICS) codes.
3.2. Please provide below some extracts from the mappings or links between the vocabularies. Use the layout or presentation format that you would normally provide for the users of the mappings. Please ensure that the examples you provide illustrate all of the different types of mapping or link.
See forthcoming work by Fred Gey, who is a member of our XMDR team at LBNL, a preliminary copy of which is attached.
3.3. Describe the different types of mapping used, with reference to the examples given in paragraph 3.2.
3.4. Any additional information, references and/or hyperlinks.