WebTech (Redaction)

From W3C eGovernment Wiki

DRAFT

Will link from main document "Preparing information for publication?"


The Principle of Transparency of the Commons

The publication of information to the public is the most common use of e-Government in the world today, and has been the primary means of Government to Citizen interaction long before the advent of the Internet. The purpose of transparency in e-Government is to reinforce trust in a Government and enhance the base of common knowledge available to the citizenry in their everyday lives. In the day-to-day operation of the Government, this may include a claim of reasonable security expectations for the Government itself on behalf of the citizens, the grant of Copyrights and other Licenses, and also respect for the personal privacy expectations of the citizens, without favoritism toward one subgroup of them or another.
[Image: Totc-small.png]

The data held by Governments is held in common, and only disclosed for the common good. But it is not a monolithic chunk. While other types of e-institutions might have only one firewall, a Government has at least two firewalls for operational purposes, plus, of course, a free transit (transparent) portal to the commonwealth. The Government Holdings are not discoverable, except in the sense of the intellectual exploration of the Commons. In fact, resources in the Public Domain are well mapped and the ownership is not in dispute. The data can be farmed, but not mined, unless by the legitimate owners. Under these circumstances, the access may be extended far beyond the immediate jurisdiction under the very reasonable assumption that all users of the commons are curious too.

Like any publication, written information that reaches the public online has often been through a process of drafting, change, editing and approval by a number of people. Redaction is the process by which Government Holdings (the data to which Governments claim access for whatever reason) are separated from the data held for the common good. This is an age-old practice. In a Library setting, it is the natural separation into layers between the available Media Collections and the Librarian's Lunch.

  • Summary - Notes on redaction as part of the editing process and in the definition of sanitisation.


  • Semantics - documents may contain information that reduces the usefulness of the publication to the audience. Cover semweb/LD application as well.
  • Accessibility - redaction as an accessibility tool, e.g. replacing images with a text equivalent.

Redaction basics

In the absence of specific Project Requirements, Web Developers tend to view formats in binary terms, as inclusive or exclusive of the data they have the capability to represent. In everyday life, including employment life, certain unwritten common sense rules apply, but the manifestation of the rules is often swayed by other motivations. This almost always leads to mischief, and sometimes to fraud and criminal activity. For example, all parenthood involves some routine data base administration:

SELECT * FROM Children WHERE play_noise IS NULL;

For Government work, the Common Sense Project Requirements that deal with disclosure of information are normally reinforced by criminal and civil law. While different cultures may have different historical precedents (English Common Law, Sharia Law, Canon Law, etc.), it is more helpful to classify those precedents into two types: 1) those that provide swift and certain sanctions and 2) those that may delay sanction. Government Authority over Government Entities (including employees) clearly fits into the former type, regardless of culture.

United States Title 5 U.S.C. § 552a (Privacy Act of 1974) http://www.justice.gov/opcl/privacyact1974.htm
United States Bureau of the Census http://www.census.gov/privacy/

A project which involves the sanitisation of data may involve extensions to the mark-up languages used, constraints on Content and constraints on the File Formats used. The following use cases have been identified.

  • Mark-Up
A distinction between document resources and documentation for the purpose of identifying people.
It is expected that very few datasets will need redaction simply because most Government Agencies have been releasing redacted activity reports to the Public for many years, centuries in some cases. However, the prospect of automated access to reports makes this distinction necessary.
A distinction between people and groups of people.
When two or more Governments get together they call it "a Treaty". When two or more people get together, most of the time they call it "Gossip". In both cases, the individual laws of the Governments are observed, and the privacy of the individual person is observed by the group.
A distinction between groups of people and formal groups, Institutions, Organizations, etc.
Oftentimes Working Groups and Task Forces with limited scope are created to solve a narrowly defined problem.
  • Content
Redaction may be necessary where information must be retained in the private domain of an Agency of Government
  1. Classified material - draft / internal documents may contain information of a classified nature that needs to be removed for security reasons.
  2. PII - documents may contain personal information that needs to be removed for privacy reasons.
  • Formats
Data Formats which may by their nature carry data content styled as hidden or visible in common use.
  1. Redaction should always be by deletion, never by "hidden" style or without visible result in the primary display.
  2. HTML meta tags should not be used to convey Personally Identifiable Information because the metadata is hidden by the customary display style of web browsers. While the information is "Public" to the technically inclined, it is secret to the Data Consumer.
  3. "Fine Print" in documents has no place on the Web. The reasoning is the same as for HTML meta tags. The intent of "Fine Print" is obfuscation, not transparency.
  4. Redaction Strategies http://www.fas.org/sgp/othergov/dod/nsa-redact.pdf
Data Formats which may disclose data content in such a way that manipulation can reveal prohibited disclosures.
  1. United States Statistical Data http://factfinder.census.gov/jsp/saff/SAFFInfo.jsp?_pageId=su5_confidentiality
  • Redaction in the perfect world - XML to other formats etc.
  • Redaction automation - cover common tools, e.g. Microsoft/Adobe.
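
The format rules above (redaction by deletion, no hidden styles, no PII in meta tags) can be checked mechanically before publication. The following is a minimal sketch in Python, assuming HTML input and using only the standard library. The class and function names are hypothetical, and the checks cover only inline styles, the hidden attribute and named meta tags, not external style sheets.

```python
from html.parser import HTMLParser


class HiddenContentFinder(HTMLParser):
    """Collects tags whose content is present in the file but not in the primary display."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (tag, reason) tuples

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Normalise the inline style so "display: none" and "display:none" both match.
        style = (attr.get("style") or "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            self.findings.append((tag, "hidden by inline style"))
        if "hidden" in attr:
            self.findings.append((tag, "hidden attribute"))
        # Named meta tags carry content that browsers do not customarily display.
        if tag == "meta" and attr.get("name") and attr.get("content"):
            self.findings.append((tag, "meta content: " + attr["name"]))


def find_hidden_content(html_text):
    """Return every (tag, reason) pair that a reviewer should inspect before release."""
    finder = HiddenContentFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings
```

A reviewer would run find_hidden_content over each page and either delete the flagged content or move it into the visible display. An empty list means only that these particular checks found nothing.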

Purging Resource Description Framework (RDF) Documents of PII

The redaction of impermissible disclosures from Resource Descriptions is substantially easier than with many other formats. Like the generation of RDF/XML itself (GRDDL or RDFa), the process can be automated with style sheet transforms (XSL).


Hiding triples with style in RDF/XML or Turtle is not useful. More formally, since the redaction of the source should be by deletion, CSS3 methods of display modification should never be used.
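
Redaction by deletion can be sketched as follows. The text names XSL as the automation tool; this sketch substitutes Python's standard xml.etree.ElementTree to show the same idea. It assumes that any rdf:type in the http://purl.org/pii/terms/ name space other than misc marks an impermissible disclosure, and it only inspects top-level rdf:Description elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PII_PREFIX = "http://purl.org/pii/terms/"
PII_MISC = PII_PREFIX + "misc"  # token meaning "no impermissible disclosure"


def is_impermissible(description):
    """Assumption: a pii token other than misc flags the description as PII."""
    for rdf_type in description.findall(f"{{{RDF}}}type"):
        resource = rdf_type.get(f"{{{RDF}}}resource", "")
        if resource.startswith(PII_PREFIX) and resource != PII_MISC:
            return True
    return False


def purge_pii(rdf_xml):
    """Delete flagged top-level descriptions and return the remaining RDF/XML."""
    root = ET.fromstring(rdf_xml)
    for description in list(root.findall(f"{{{RDF}}}Description")):
        if is_impermissible(description):
            # Deletion, not hiding: the element is absent from the output.
            root.remove(description)
    return ET.tostring(root, encoding="unicode")
```

Because the flagged element is removed from the tree, no style sheet or viewer setting can bring it back in the published copy.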


Personal Information of Citizens and Figures without Official Agency Roles

A Government Agency most likely has existing rules for the redaction of Personal Information of Citizens and Public Figures without an official Agency role. These rules can be implemented with the definition of two (new) name spaces and a third name space of tokens, or general categories (example: http://purl.org/pii/terms/ ).


skos:Concept   owl:Thing
               Agency Organization NS   Token NS   Agency Public NS   Token NS
Location       nso:Place                pii:misc   nsp:Location       pii:location
Name           nso:Group                pii:misc   nsp:Person         pii:aka
Surname        nso:Group                pii:misc   nsp:LastName       pii:group
...

Notes:

  • The http://purl.org/pii/terms/misc Property means that the resource does not contain any impermissible disclosures.
  • The Agency Organization and Agency Public name spaces should have descriptive element equivalent names, but not necessarily be a one-to-one mapping.

The GLPDRD Transform

A "Gleaning Permissible Disclosures from Resource Descriptions" (GLPDRD, rhymes with LEOPARD) Transform is defined in this way:


PII, From:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:nso="http://Agency.nso.gov#"
xmlns:nsp="http://Agency.nsp.gov#">
<rdf:Description rdf:nodeID="C2a">
   <rdf:value>123 Main St. Sheboygan WI US</rdf:value>
   <rdf:type rdf:resource="http://Agency.nsp.gov#Location" />
   <rdf:type rdf:resource="http://purl.org/pii/terms/location" />
</rdf:Description>
</rdf:RDF>

PII, To:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:nso="http://Agency.nso.gov#">
<rdf:Description rdf:nodeID="C2b">
   <rdf:value>http://purl.org/pii/terms/location</rdf:value>
   <rdf:type rdf:resource="http://Agency.nso.gov#Place" />
   <rdf:type rdf:resource="http://purl.org/pii/terms/misc" />
</rdf:Description>
</rdf:RDF>

No PII, From:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:nso="http://Agency.nso.gov#"
xmlns:nsp="http://Agency.nsp.gov#">
<rdf:Description rdf:nodeID="C1">
   <rdf:value>Washington, DC US</rdf:value>
   <rdf:type rdf:resource="http://Agency.nso.gov#Place" />
   <rdf:type rdf:resource="http://purl.org/pii/terms/misc" />
</rdf:Description>
</rdf:RDF>

No PII, To:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:nso="http://Agency.nso.gov#">
<rdf:Description rdf:nodeID="C1">
   <rdf:value>Washington, DC US</rdf:value>
   <rdf:type rdf:resource="http://Agency.nso.gov#Place" />
   <rdf:type rdf:resource="http://purl.org/pii/terms/misc" />
</rdf:Description>
</rdf:RDF>
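
The From/To pairs can be read as a small rewriting procedure: a description that carries a PII token has its value replaced by that token, its Agency Public type mapped to the Agency Organization type, and its token reset to misc, while a description already typed misc passes through unchanged. The following is a minimal sketch of that reading in Python, not the XSL transform the text has in mind. The mapping rows come from the table in this section; the function name glpdrd and the helper q are hypothetical, and the sketch does not rename node IDs (C2a to C2b) or handle nsp types that have no mapping row.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NSO = "http://Agency.nso.gov#"
NSP = "http://Agency.nsp.gov#"
PII = "http://purl.org/pii/terms/"

# Agency Public type -> Agency Organization type (many-to-one is allowed).
NSP_TO_NSO = {
    NSP + "Location": NSO + "Place",
    NSP + "Person": NSO + "Group",
    NSP + "LastName": NSO + "Group",
}


def q(local):
    """Qualified name in the RDF name space, in ElementTree notation."""
    return f"{{{RDF}}}{local}"


def glpdrd(rdf_xml):
    """Rewrite PII descriptions into their permissible form; leave others untouched."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(q("Description")):
        types = description.findall(q("type"))
        resources = [t.get(q("resource")) for t in types]
        tokens = [r for r in resources
                  if r and r.startswith(PII) and r != PII + "misc"]
        if not tokens:
            continue  # already permissible: copy through unchanged
        for value in description.findall(q("value")):
            value.clear()            # drop the disclosed text and any children
            value.text = tokens[0]   # publish only the category of what was withheld
        for rdf_type, resource in zip(types, resources):
            if resource in NSP_TO_NSO:
                rdf_type.set(q("resource"), NSP_TO_NSO[resource])
            elif resource in tokens:
                rdf_type.set(q("resource"), PII + "misc")
    return ET.tostring(root, encoding="unicode")
```

Applied to the "PII, From" document, this sketch removes the street address and leaves the pii:location token as the value, typed nso:Place and pii:misc. Applied to the "No PII, From" document, it leaves the Washington, DC description as it is.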

Personal Information of Figures with Agency Roles (Office Holders)

In the example above, a location was suppressed for Internal Agency reasons, for example the Home Address of an Agency Employee versus an Office Address for public contact. Especially with the advent of the mobile web it is to the advantage of Governments to have transparent procedures since the supposed secrecy of a location can breed mistrust in the Agency. Many Professionals have had “unlisted phone numbers” for decades. This is just an indication that their phone number is not a Public contact point. That models for Social Networking should choose to ignore this is a failure in the models and not a newly discovered need to know.

With the Personal Information of Public Figures the problem is the same, with much more flexible tolerance limits. If a publicly accessible URL exists for a person, then that person alone has the right to authorize the continued availability. The mere right to challenge the veracity of the content is insufficient to authorize continued availability.

A Public Office description may have three different representations, depending on the wishes of the Office Holder.

Example 1 - <db:Governor> with authorization to use a biographic link:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:db="http://dbpedia.org/resource/"
         xmlns:pii="http://purl.org/pii/terms/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts">
    <db:Governor>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Deval_Patrick">
        <rdf:type rdf:resource="http://purl.org/pii/terms/misc" />
      </rdf:Description>
    </db:Governor>
    <db:Nickname>Bay State</db:Nickname>
    <db:Capital>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Boston">
        <db:Nickname>Beantown</db:Nickname>
      </rdf:Description>
    </db:Capital>
  </rdf:Description>
</rdf:RDF>

Example 2 - <db:Governor> without authorization to use a biographic link, but with authority to use the name of a Public Office Holder:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:db="http://dbpedia.org/resource/"
         xmlns:pii="http://purl.org/pii/terms/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts">
    <db:Governor>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Governor">
        <rdf:value>Deval Patrick</rdf:value>
        <rdf:type rdf:resource="http://purl.org/pii/terms/misc" />
      </rdf:Description>
    </db:Governor>
    <db:Nickname>Bay State</db:Nickname>
    <db:Capital>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Boston">
        <db:Nickname>Beantown</db:Nickname>
      </rdf:Description>
    </db:Capital>
  </rdf:Description>
</rdf:RDF>

Example 3 - <db:Governor> without a biographic link, with authority to use the name of a Public Office Holder, described with the pii:aka token:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:db="http://dbpedia.org/resource/"
         xmlns:pii="http://purl.org/pii/terms/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts">
    <db:Governor>
      <rdf:Description rdf:about="http://purl.org/pii/terms/aka">
        <rdf:value>Deval Patrick</rdf:value>
        <rdf:type rdf:resource="http://purl.org/pii/terms/misc" />
      </rdf:Description>
    </db:Governor>
    <db:Nickname>Bay State</db:Nickname>
    <db:Capital>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Boston">
        <db:Nickname>Beantown</db:Nickname>
      </rdf:Description>
    </db:Capital>
  </rdf:Description>
</rdf:RDF>
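
The three examples differ only in the inner rdf:Description under <db:Governor>: which resource it is about, and whether a name is given as an rdf:value. The following is a minimal sketch in Python that builds that inner element for each case. The function name and the representation labels "link", "office" and "token" are hypothetical names chosen for illustration, and the sketch assumes the Governor office as in the examples.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DB = "http://dbpedia.org/resource/"
PII = "http://purl.org/pii/terms/"


def q(namespace, local):
    """Qualified name in ElementTree notation."""
    return f"{{{namespace}}}{local}"


def office_holder_description(representation, name=None, link=None):
    """Build the inner rdf:Description for one of the three representations.

    "link"   - Example 1: the holder authorized a biographic link.
    "office" - Example 2: no link; the name is attached to the office resource.
    "token"  - Example 3: no link; the name is attached to the pii:aka token.
    """
    if representation == "link":
        about = link
    elif representation == "office":
        about = DB + "Governor"
    elif representation == "token":
        about = PII + "aka"
    else:
        raise ValueError("unknown representation: " + repr(representation))
    if about is None:
        raise ValueError("a link is required for the 'link' representation")
    if representation != "link" and name is None:
        raise ValueError("a name is required when no link is used")

    description = ET.Element(q(RDF, "Description"), {q(RDF, "about"): about})
    if representation != "link":
        ET.SubElement(description, q(RDF, "value")).text = name
    # Every representation is marked as free of impermissible disclosures.
    ET.SubElement(description, q(RDF, "type"), {q(RDF, "resource"): PII + "misc"})
    return description
```

Keeping the choice in one function makes the Office Holder's authorization the only input that changes the published description.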