IANA Language Subtag Registry in SKOS
This page contains a proposal to encode language codes of the IANA Language Subtag Registry in RDF. It is an implementation of Languages as RDF Resources.
Converting with awk and perl
```perl
#!/usr/bin/perl

=head1 NAME

registry2skos.pl - Convert IANA Language Subtag Registry to SKOS

=head1 SYNOPSIS

1. download the registry

  wget http://www.iana.org/assignments/language-subtag-registry

2. clean lines

  awk 'NR<3{next} /^[^ ]/{print L; L=$0} /^ /{L=L$0} END {print L}' language-subtag-registry > registry

3. convert by running this script

=head1 AUTHOR

Jakob Voss

=head1 VERSION

0.1a - first draft and proof of concept

=cut

use strict;
use warnings;

# General settings
my $VERSION = "0.1";

# Print header (the rdf namespace must be declared for rdf:RDF and rdf:about)
print <<END;
<?xml version='1.0' encoding='ISO-8859-1'?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
END

# Parse registry
my %fields = ();
open(my $reg, "<", "registry") or die("no registry found!");
while (<$reg>) {
    chomp;
    if ($_ =~ /^([A-Za-z-]+): (.*)$/) {
        my ($key, $value) = ($1, $2);
        if (defined $fields{$key}) {
            # multiple values
            if (ref($fields{$key}) eq "ARRAY") {
                push @{ $fields{$key} }, $value;
            } else {
                $fields{$key} = [ $fields{$key}, $value ];
            }
        } else {
            # single value
            $fields{$key} = $value;
        }
    } else {
        # record separator ("%%") or empty line
        transform_code();
        %fields = ();
    }
}
close($reg);
transform_code();    # the last record has no trailing separator

# Print footer
print "</rdf:RDF>\n";

# actual transforming
sub transform_code {
    return unless %fields;    # nothing collected, e.g. the leading empty line

    my $type = defined $fields{"Type"} ? $fields{"Type"} : "";
    my $xml  = "";

    my $description = $fields{"Description"};
    if (ref($description) eq "ARRAY") {
        foreach my $d (@{$description}) {
            $xml .= "  <skos:altLabel>$d</skos:altLabel>\n";
        }
    } elsif (defined $description) {
        $xml .= "  <skos:altLabel>$description</skos:altLabel>\n";
    }
    # the registry field is named "Comments"
    if ($fields{"Comments"}) {
        $xml .= "  <skos:publicNote>" . $fields{"Comments"} . "</skos:publicNote>\n";
    }
    if ($fields{"Added"}) {
        $xml .= "  <dc:date>" . $fields{"Added"} . "</dc:date>\n";
    }

    if ($type eq "language") {
        if ($fields{"Preferred-Value"} or $fields{"Deprecated"}) {
            # TODO: deprecated codes not implemented yet
        } else {
            my $subtag = $fields{"Subtag"};
            $xml = "<skos:Concept rdf:about='#$subtag'>\n"
                 . "  <skos:prefLabel>$subtag</skos:prefLabel>\n"
                 # TODO: skos:notation
                 . $xml;
            if ($fields{"Suppress-Script"}) {
                # TODO: add an additional code and create an skos:exactMatch
            }
            $xml .= "</skos:Concept>\n";
            print $xml;
        }
    } elsif ($type eq "redundant") {
        my $tag = $fields{"Tag"};
        $xml = "<skos:Concept rdf:about='#$tag'>\n"
             . "  <skos:prefLabel>$tag</skos:prefLabel>\n"
             # TODO: skos:notation
             . $xml;
        my $lang = $tag;
        $lang =~ s/-.*$//;
        $xml .= "  <skos:broader rdf:resource='#$lang'/>\n";
        # TODO: link to script if region specified
        $xml .= "</skos:Concept>\n";
        print $xml;
    } else {
        # TODO: Type grandfathered,region,script,variant
        foreach my $k (keys %fields) {
            #print "$k: " . $fields{$k} . "\n";
        }
    }
}

__END__
```
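The "clean lines" step from the synopsis is easier to follow on a small input. The sketch below wraps the same awk program in a shell function and runs it on a short sample. The function name `unfold_registry` and the sample record are invented for illustration; they are not part of the script or taken from the real registry.

```shell
#!/bin/sh
# Sketch of the "clean lines" step. unfold_registry is a hypothetical
# helper name; the awk program is the one from the synopsis.
unfold_registry() {
    # NR<3    : skip the File-Date line and the first "%%" separator
    # /^[^ ]/ : line starts a new field (or is "%%"): print the buffered
    #           line, then start a new buffer with the current line
    # /^ /    : continuation line: append it to the buffer
    # END     : flush the last buffered line
    awk 'NR<3{next} /^[^ ]/{print L; L=$0} /^ /{L=L$0} END {print L}' "$@"
}

# Invented sample input (not real registry data) with one folded field.
unfold_registry <<'EOF'
File-Date: 2006-01-01
%%
Type: language
Subtag: xx
Description: Example language
Comments: a comment that is long enough
  to be folded onto a second line
EOF
```

Two details are worth knowing when reading the result. The output starts with an empty line, because the buffer is still empty when the first field is seen; the Perl script treats that line like a record separator. The leading spaces of a continuation line are kept, so the joined field contains a double space at the fold.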
Open tasks
- add concepts for grandfathered, region, script, variant
- add skos:broader and skos:narrower between redundant codes and language/region/script/variant
- add concepts for suppressed scripts, mapped with skos:exactMatch or owl:sameAs
- Finish SKOS Mapping so changes can be explicitly modeled
- Clarify modelling of deprecated concepts in SKOS and add deprecated language codes
- Define official URIs
- Model other language codes (for instance MARC Language Codes) and create mapping
- Testing
Resources
- Language tags in HTML and XML (W3C i18n)
- The IANA Language Subtag Registry
- Phillips, A., Davis, M., "Tags for Identifying Languages", RFC 4646, September 2006.
- Ewell, D., Ed., "Initial Language Subtag Registry", RFC 4645, September 2006.
- Alvestrand, H., "Tags for the Identification of Languages", BCP 47, RFC 3066, January 2001.
- Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.
- International Organization for Standardization, "ISO 3166-1:1997. Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes", 1997.
- International Organization for Standardization, "ISO 639-1:2002. Codes for the representation of names of languages -- Part 1: Alpha-2 code", 2002.
- International Organization for Standardization, "ISO 639-2:1998. Codes for the representation of names of languages -- Part 2: Alpha-3 code, first edition", 1998.
- Library of Congress, ISO 639-2 Registration Authority
- SIL International, ISO 639-3 Registration Authority
- International Organization for Standardization, "ISO 15924:2004. Information and documentation -- Codes for the representation of names of scripts", January 2004.
- Statistics Division, United Nations, "Standard Country or Area Codes for Statistical Use", UN Standard Country or Area Codes for Statistical Use, Revision 4 (United Nations publication, Sales No. 98.XVII.9), June 1999.
- ANSI/NISO Z39.53, "Codes for the Representation of Languages for Information Interchange", 2001.
- Library of Congress, "MARC Code list for languages", 2003.