Extended Metadata Registry (XMDR) Prototype

Contact e-mail:

Application

General purpose and services to the end user

This is a prototype implementation of metadata design specifications proposed for edition 3 of ISO/IEC 11179 part 3.

A description of use cases is available at http://hpcrd.lbl.gov/SDM/XMDR/use-cases.html

Functionality examples

Application architecture

The application follows a REST (Representational State Transfer) architecture. Its components are a Metamodel (in OWL) and data format (XML), a RegistryStore (persistence and versioning), Metadata Content Validation, Indexing (text, asserted, and logical inference), Mapping, Authentication, and a (human) User Interface. More details are available at http://hpcrd.lbl.gov/SDM/XMDR/arch.html

The application is not distributed at present, but in the future both the content data and the extended metadata registry software might be.

Special strategies involved in the processing of user actions

Users may expand queries to include inferred as well as asserted information. Inferences may be drawn from the XMDR metamodel (ISO/IEC 11179) as well as from the specific content and relationships of individual sets of metadata.
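The difference between an asserted-only query and one expanded with inference can be illustrated with a minimal sketch. It assumes a single transitive "broader than" relationship; the concept names, the `ASSERTED_BROADER` table, and the `broader_concepts` function are hypothetical and are not part of the XMDR software.

```python
# Hypothetical asserted "broader" links (narrower concept -> broader concept).
ASSERTED_BROADER = {
    "gold ore mining": "metal mining",
    "metal mining": "mining",
}


def broader_concepts(concept, include_inferred=False):
    """Return the broader concepts of `concept`.

    With include_inferred=False only the directly asserted link is returned.
    With include_inferred=True the asserted links are followed transitively,
    which stands in here for logical inference over the metamodel.
    """
    results = []
    current = ASSERTED_BROADER.get(concept)
    # The membership check guards against cycles in the asserted data.
    while current is not None and current not in results:
        results.append(current)
        if not include_inferred:
            break
        current = ASSERTED_BROADER.get(current)
    return results


print(broader_concepts("gold ore mining"))
print(broader_concepts("gold ore mining", include_inferred=True))
```

A real registry would likely delegate this step to a reasoner or a precomputed inference index rather than walking links at query time.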

Integration between vocabulary-linked functions and other application functions

One of the main purposes of the XMDR Prototype is to demonstrate how concept systems can be used to help integrate, search, and harmonize more traditional metadata registry information about data elements, valid value sets, etc. For both human and machine users, the goal is to enable linkage between concept systems and data. The system combines text search with inference (semantic) search.

Additional references

Project website: http://xmdr.org/

Vocabulary

Titles

XMDR has loaded a number of different concept systems in order to demonstrate different kinds of capabilities, particularly for large, complex concept systems. For the list of the current concept systems included and proposed for the XMDR Prototype, and a summary of their respective characteristics, see the table at http://hpcrd.lbl.gov/SDM/XMDR/contentlist.html

General characteristics (size, coverage) of the vocabulary

A portion of http://hpcrd.lbl.gov/SDM/XMDR/contentlist.html is included below:

Structure explanation

XMDR is intended to import concept systems in their entirety, whatever their source format.

Wherever possible, XMDR uses LexGrid as an intermediate step in loading content. As SKOS gains wider acceptance and tool support, XMDR will consider using SKOS for many of the purposes for which it currently uses LexGrid.

Should SKOS be able to incorporate many of the current features of LexGrid, XMDR could easily use concept systems that use SKOS. LexGrid and XMDR may prove to be useful tools for working with content expressed in SKOS. In the meantime, it might be very useful to have software that could translate from SKOS to LexGrid and vice-versa.

Machine-readable representation of the vocabulary

Substantial content is available in two prototype implementation instances on the XMDR web site at http://xmdr.org/.

Structure of the database used to currently manage the vocabulary

Concept systems are translated into XML files that conform to the XMDR metamodel, as described at https://xmdr.lbl.gov/mediawiki/index.php/11179_Diagrams

Standards and guidelines considered during the design and construction of the vocabulary

XMDR is trying to coordinate its work with the development of ISO/IEC 11179 edition 3 and with other standards efforts, particularly ISO TC37 and the W3C Semantic Web Working Groups (XML, RDF, OWL and SKOS). Other related standards include ISO 639, 704, 3166, 11179, 12620 and UML (Unified Modeling Language).

XMDR has also used the LexGrid specification (http://LexGrid.org/) because it bridges the SKOS/OWL boundary.

Management of changes

Changes to vocabularies are the responsibility of the different organizations from which XMDR obtains them. How to keep the experimental XMDR Prototype updated with respect to changing external sources is an active research and development topic.

Vocabulary mappings

Although the XMDR Prototype does not yet include facilities for mapping between different concept systems, that is one of XMDR's important goals.

Mapped vocabularies

This part of XMDR's research and development efforts has just begun, starting with mappings between the old Standard Industrial Classification (SIC) codes and their successor, the North American Industry Classification System (NAICS) codes.

Extracts of Mappings

See for example the mappings provided with NAICS 2002, http://www.census.gov/epcd/naics02/N02TOS87.HTM, where a one-to-many matching indicates ambiguity. Translation tables are sometimes qualified with a confidence and/or completeness scale or measure, which is necessarily direction-dependent.

An example (in LexGrid format):

<lgRel:association association="mapsTo" forwardName="mapsTo" reverseName="mappedFrom" targetCodingScheme="NAICS">
        <lgRel:sourceConcept sourceConcept="10">
                <lgRel:targetConcept targetConcept="21">
                        <lgRel:associationQualification associationQualifier="approx" /> 
                </lgRel:targetConcept>
        </lgRel:sourceConcept>
</lgRel:association>
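As a rough illustration of how such a fragment could be consumed, the sketch below reads the association with Python's standard `xml.etree.ElementTree` module. The fragment above does not declare the `lgRel` prefix, so the namespace URI used here (`urn:example:lgRel`) is a placeholder assumption, and `read_mappings` is a hypothetical helper, not part of LexGrid or XMDR.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the fragment does not declare one.
LGREL_NS = "urn:example:lgRel"

FRAGMENT = f"""
<lgRel:association xmlns:lgRel="{LGREL_NS}" association="mapsTo"
    forwardName="mapsTo" reverseName="mappedFrom" targetCodingScheme="NAICS">
  <lgRel:sourceConcept sourceConcept="10">
    <lgRel:targetConcept targetConcept="21">
      <lgRel:associationQualification associationQualifier="approx" />
    </lgRel:targetConcept>
  </lgRel:sourceConcept>
</lgRel:association>
""".strip()


def read_mappings(xml_text):
    """Return (source, target, target scheme, qualifiers) for each mapping."""
    ns = {"lgRel": LGREL_NS}
    root = ET.fromstring(xml_text)
    scheme = root.get("targetCodingScheme")
    rows = []
    for source in root.findall("lgRel:sourceConcept", ns):
        for target in source.findall("lgRel:targetConcept", ns):
            # A target may carry zero or more qualifiers such as "approx".
            qualifiers = [
                q.get("associationQualifier")
                for q in target.findall("lgRel:associationQualification", ns)
            ]
            rows.append(
                (source.get("sourceConcept"), target.get("targetConcept"),
                 scheme, qualifiers)
            )
    return rows


print(read_mappings(FRAGMENT))
```

Flattening the nested elements into tuples makes it straightforward to load the mappings into a table or index, at the cost of losing the forward and reverse association names.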

Types of mapping used

The Census Bureau mapping provides three levels of statistical comparability:

XMDR might expand the target mapping attribute to include “almost exact”, giving a tri-level mapping between NAICS and SIC. The mappings can be 1-to-1 (i.e. exact), 1-to-many, or many-to-many. Since exact mappings are straightforward, what XMDR should capture is the inexact mappings, describing the dispersion of a single NAICS code into multiple SIC codes and vice versa. If a single SIC code maps wholly (with no leftover) to a list of NAICS codes, that is a 1-to-many mapping. However, if an SIC code maps to only part of particular NAICS codes, the mapping is many-to-many. One way to represent this would be to create a list of mapping targets for each source code and to identify, for each target, whether it is an exact match or an approximate match.
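The list-of-targets representation described above might be sketched as follows. This is only one possible reading: it assumes that "approx" marks a target that the source code covers only in part, and the `Target` class, the `classify` function, and the codes `N1` to `N3` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    code: str
    match: str  # "exact" or "approx" (assumed: "approx" = partial coverage)


def classify(targets):
    """Classify one source code's mapping from its list of targets."""
    if not targets:
        raise ValueError("a mapping needs at least one target")
    if all(t.match == "exact" for t in targets):
        # The source maps wholly onto its targets, with no leftover.
        return "1-to-1" if len(targets) == 1 else "1-to-many"
    # At least one target is only partly covered by this source, so other
    # sources must share it: treated as many-to-many, following the text.
    return "many-to-many"


print(classify([Target("N1", "exact")]))
print(classify([Target("N1", "exact"), Target("N2", "exact")]))
print(classify([Target("N1", "exact"), Target("N3", "approx")]))
```

Keeping the flag on each target, rather than on the mapping as a whole, preserves which particular targets are the inexact ones.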

The mapping can be statistical (i.e. which portion of the economic activity for an SIC code is apportioned to one or more NAICS codes, or vice versa) or semantic (i.e. the meanings of the codes are identical or overlapping).
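A statistical mapping of this kind can be pictured as a set of shares that split a source code's value across target codes. The sketch below is illustrative only: the `apportion` function, the codes, and the share values are invented for the example and do not come from the Census Bureau tables.

```python
def apportion(value, shares):
    """Split `value` for one source code across target codes.

    `shares` maps each target code to the fraction of the source code's
    activity assigned to it; the fractions are expected to sum to 1.
    """
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares sum to {total}, expected 1.0")
    return {code: value * share for code, share in shares.items()}


# Hypothetical shares for one source code split across two target codes.
print(apportion(200.0, {"N1": 0.75, "N2": 0.25}))
```

Because the shares describe how one source code disperses, the reverse direction needs its own table of shares; it cannot be obtained by simply inverting these fractions.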

Different general approaches to mapping representation will be considered: