Toward More Transparent Government
Workshop on eGovernment and the Web

Summary Report

Executive Summary

On 18-19 June 2007, the World Wide Web Consortium (W3C) and the Web Science Research Initiative held a workshop entitled "Toward More Transparent Government" at the US National Academy of Sciences in Washington, DC.

The goal of the workshop was to find ways to facilitate the deployment of Web standards across eGovernment sites and help shape the ongoing research agenda in the development of Web technology and public policy in order to realize the potential of the Web for access to and use of government information.

The Call for Participation had required participants to submit position papers. 22 position papers were received.

The workshop was chaired by Daniel J. Weitzner (MIT/W3C), Ari Schwartz (Center for Democracy and Technology) and Nigel Shadbolt (University of Southampton, UK).

Content

Workshop Topics

Part 1 - Social and Economic dynamics of public sector information flow on the Web: Keynote and Panel Discussion
Part 2 - Data Integration and Semantic Web Technology: Keynote and Case Studies
Part 3 - Location Information: Keynote
Part 4 - Third party intermediary services for citizens: Panel Discussion
Part 5 - Challenges in interoperability and standards compliance: Panel Discussion
Part 6 - Position papers
- Improving the delivery of eGovernment services through the use of Web technologies
- Policy Approaches
- Semantic Interoperability of Information on the Web
- Web Services and Architectures
Next steps and wrap-up
Thanks and Acknowledgements

Part 1 - Social and Economic dynamics of public sector information flow on the Web

Chair: Nigel Shadbolt (University of Southampton)

Keynote address: Carol Tullo (Office of Public Sector Information, United Kingdom) [slides]
Ari Schwartz (Center for Democracy & Technology) [panelist]
Mills Davis (Project10X/SICoP) [panelist]

How can we use the World Wide Web to make government more open, efficient, accountable and accessible? We have seen the Internet and the Web transform large sectors of society in many parts of the world's economies and in many aspects of individual lives. Are governments around the world learning to take advantage of the Web to the extent that other sectors do? Keynote speaker Carol Tullo, Director of the Office of Public Sector Information in the British government, identified three key considerations in promoting greater openness and use of public sector information: creating incentives and removing disincentives to sharing information, developing strategies for 'sharing the risk' of information openness with the private sector, and re-thinking the traditional risk averse approach to personal statements by government employees. In addition to widely shared public policy goals that ought to motivate governments to open up their information stores for greater re-use, any discussion of public sector information must begin with a realization that information-related goods and services make up roughly 40% of the gross domestic product in most industrialized countries. Thus government information policy must be seen as a matter of the greatest importance.

Greater sharing of public sector information (PSI) will begin by creating active incentives for government agencies to open their data along with removing disincentives and barriers to data sharing. While many countries' laws and official policy statements declare public access to government information to be a priority and even define government information as owned by the citizenry, the gap between these broad policies and practical reality may be great. Legislatures and government agencies will have to look to creating specific incentives to encourage greater effort toward information sharing. Additionally, real and imagined concerns about privacy, security, and copyright act as barriers to creating public accessibility of government data. In the UK, much government information is covered under Crown Copyright with the expectation that the agencies that produce this information will support their own budgets through a revenue stream based on selling licenses to the information. While the copyright and revenue requirements do pose real barriers to dissemination of this information, they also allow the relevant government bodies to invest more in the quality and innovative uses of this public sector data. In contrast, the United States Government is specifically precluded from asserting copyright interest in almost all government information. Yet even without the barriers that government copyright imposes, there is still wide variation in the successful dissemination of and access to government information in the US. Some agencies put real effort into making data open and accessible. Others treat data accessibility as a low priority at best.

Going forward, governments are encouraged to re-think the traditional risk-averse approach to access and dissemination and look for ways to "share the risk," in Carol Tullo's words, of government data access and use with organizations outside the government. Risk-sharing means recognizing that data created for one use can be valuable for other, unanticipated uses. In this risk-sharing model, government agencies accept that they may lose some control of the data that they think of as 'theirs', but realize the added social benefit of enabling new uses that the agency itself could not bring about on its own.

Along with risk sharing, governments have begun to recognize the need to re-think traditionally risk-averse behavior with respect to information dissemination. While statements and information release by individual government employees have been tightly controlled in the past, the more open style of information exchange on the Web, including blogs and wikis, suggests a need to re-examine these historic restrictions. New protocols to encourage responsible participation in such online fora can yield considerable benefit but must be developed with due attention to needs for accuracy and fairness from the government. The importance of more flexible information access models was clearly demonstrated by the unexpected benefits brought by rapidly-assembled, ad-hoc disaster relief efforts mounted around the world, from Tsunami relief in the Indian Ocean basin to hurricane relief in the southern United States.

Part 2 - Data Integration and Semantic Web Technology

Chair: Daniel Weitzner (W3C/MIT)

Keynote address: Tim Berners-Lee (W3C/MIT) [slides]
Nigel Shadbolt (University of Southampton) [slides] [panelist]
Stuart Shulman (University of Pittsburgh), Collaborative Research: Language Processing Technology for Electronic Rulemaking [slides] [panelist]
Michael Lang (Revelytix/SICoP), Creating and using OWL vocabularies in a wiki: Knoodl, Semantic Wiki [slides] [panelist]

Digital information about nearly every government function is beginning to be created at an astonishing rate. Hidden in plain sight amidst all of this data is the key to greater transparency and accountability of government activities. Citizens need the ability to re-use and integrate this data, yet the technical tools and government policies that shape the way we manage, share, integrate, and analyze this underutilized trove of data are sorely out of date. Use of XML is an important step toward data availability, but is not enough on its own to enable integration of data described in different schemas. Progress toward better data integration will happen through use of the key piece of technology that made the World Wide Web so successful: the link. The power of the Web today, including the ability to find the pages we are looking for, derives from the fact that documents are put on the Web in a standard form and then linked together. The Semantic Web extends this use of the link to linking among data files. The Semantic Web will enable better data integration by allowing everyone who puts individual items of data on the Web to link them with other pieces of data using standard formats.

Governments looking toward the benefits of Semantic Web technologies may find encouragement in recent pilot projects conducted by the UK Office of Public Sector Information and the University of Southampton's Electronics and Computer Science school. Working with the Camden City Council, the pilot project combines local government data on hygiene inspections of restaurants, Ordnance Survey (Great Britain's national mapping agency) master maps for the Boroughs, and PointX Points of Interest data for Great Britain. Integrating these data sets can lead to information such as a map with locations of food premises in Camden, coloured according to their hygiene score. While all of this data has been previously available, it was not feasible to integrate it into a single useful service without Semantic Web technology. The ability to reuse and remix enables a third party to take various sets of data from the public sector, combine them with data from the private sector, and generate new value.
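The kind of integration described above can be sketched in a few lines of Python. This is only an illustration of the idea of linking data sets through shared identifiers, not the pilot project's implementation: it uses plain tuples rather than an RDF toolkit, and every URI, property name, score, coordinate, and colour threshold below is invented.

```python
# Hypothetical data: all URIs, property names, and values are invented for illustration.
HYGIENE = [
    ("http://example.org/premises/1", "hygieneScore", 4),
    ("http://example.org/premises/2", "hygieneScore", 2),
]
LOCATIONS = [
    ("http://example.org/premises/1", "lat", 51.539),
    ("http://example.org/premises/1", "long", -0.142),
    ("http://example.org/premises/2", "lat", 51.545),
    ("http://example.org/premises/2", "long", -0.160),
]


def merge_triples(*sources):
    """Group (subject, property, value) triples from any number of sources by subject."""
    merged = {}
    for source in sources:
        for subject, prop, value in source:
            merged.setdefault(subject, {})[prop] = value
    return merged


def colour_for(score):
    """Map a hygiene score to a marker colour (thresholds are arbitrary assumptions)."""
    if score >= 4:
        return "green"
    if score >= 3:
        return "amber"
    return "red"


def map_markers(merged):
    """Build one marker per premises that has both a score and a location."""
    markers = []
    for subject, props in sorted(merged.items()):
        if all(key in props for key in ("hygieneScore", "lat", "long")):
            markers.append({
                "id": subject,
                "lat": props["lat"],
                "long": props["long"],
                "colour": colour_for(props["hygieneScore"]),
            })
    return markers


markers = map_markers(merge_triples(HYGIENE, LOCATIONS))
```

Because both sources name the same premises with the same identifier, no schema mapping is needed before merging; that shared link is what the two data sets have to agree on.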

Enabling this type of data integration can be done on a step-by-step basis. Start by taking an inventory of the data and then develop ontologies that expose the underlying structure of that data to those who would link and re-use it. Semantic Web technologies will allow developers and users to map ontologies and mix them with others later. There is no need to change how data is managed: "don't upset existing systems; build on top", as Tim Berners-Lee said. However, some issues arise about who can use the data and for what: licensing and ensuring accuracy, as well as ensuring adequate secondary data distribution.

More flexible government use of email can increase citizen access as well. For example, Stuart Shulman of the eRulemaking Research Group recounted a situation in which new tools for the management of citizen email were desperately needed by a government agency. In response to the US Fish and Wildlife Service's proposed listing of the polar bear as threatened under the Endangered Species Act, several environmental groups launched "Action Alerts" that generated over 500,000 emails during the 90-day comment period. Agency personnel can use the new tools to review, bookmark, and annotate the full set of e-mails received. The system will create accurate counts of the exact duplicates and automatically highlight the unique text comments generated by the public. As a result of these "Tools for Rules," less time will be spent finding the unique comments and more time will be spent reviewing the passages of text inserted by interest group members.
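The two functions described above, counting exact duplicates and surfacing the text that members of the public added to a form letter, can be sketched as follows. This is a minimal illustration and not the eRulemaking tools themselves: it assumes the most frequent text is the form letter, treats copies that differ only in whitespace as duplicates, and uses invented sample emails.

```python
import re
from collections import Counter


def normalise(text):
    """Collapse runs of whitespace so copies that differ only in spacing compare equal."""
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", normalise(text))


def added_passages(email, form_letter):
    """Return the sentences of `email` that do not occur in `form_letter`."""
    known = set(split_sentences(form_letter))
    return [s for s in split_sentences(email) if s not in known]


def triage(emails):
    """Count duplicates and list the added passages of every non-form-letter email."""
    counts = Counter(normalise(e) for e in emails)
    # Assumption: the most frequent text is the campaign's form letter.
    form_letter, _ = counts.most_common(1)[0]
    to_review = {
        text: added_passages(text, form_letter)
        for text in counts
        if text != form_letter
    }
    return counts, to_review


# Invented sample comments.
emails = [
    "Please protect the polar bear. Its habitat is melting.",
    "Please protect the polar bear.  Its habitat is melting.",
    "Please protect the polar bear. I saw one in Churchill in 1998. Its habitat is melting.",
]
counts, to_review = triage(emails)
```

Here `counts` gives the number of copies of each distinct text, and `to_review` maps each modified email to the sentences a reviewer still needs to read.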

Finally, it is worthwhile to highlight some concern about current huge investments in XML-based technology, the need for training on the Semantic Web, and the need for better information on migration paths. We were introduced to a tool called Knoodl, a Semantic Wiki that eases development and lets domain experts participate in ontology development even if they are not experts on core Semantic Web technologies.

Part 3 - Location Information

Keynote Address: Vanessa Lawrence (Ordnance Survey, Great Britain) [slides]

The UK Ordnance Survey creates and manages national digital location data for Great Britain down to building level detail, maintaining a database of 440 million features with approximately 5,000 changes made daily. In 2006/07, 99.86% of Great Britain's real world features were represented in the database within six months of completion on the ground. At most 27% of Great Britain is profitable to map; however, Ordnance Survey maps the whole nation. From the database, Ordnance Survey produces a variety of digital data and paper maps for business, leisure, educational and administrative use. Revenues from licensing have to exceed the costs, and profits are put back into the Ordnance Survey. Less than 50% of the trading revenue is sourced from the public sector.

Ordnance Survey was created to map the south coast of Great Britain to try to prevent the invasion of southern England by Napoleon and is now a civilian organization. Since 1999 Ordnance Survey has had government 'Trading Fund' status, giving it more responsibility for its own finances and planning and more freedom to develop new initiatives as an independent government department. It does not operate outside Great Britain but joined EuroGeographics, a group that represents most of the European National Mapping and Cadastral Agencies, to build the European Spatial Data Infrastructure (ESDI) to achieve interoperability of European Geospatial Information. Ordnance Survey is also working on the OpenSpace project, which aims to encourage people to innovate with their own map-based ideas using Ordnance Survey data made available for free for non-commercial purposes.

Location plays a critical strategic role in many aspects of business decision-making. For example, insurance firms regularly include geographical analysis as part of their underwriting process. There is massive potential in assets of local government. One problem in England is buried utility services. The cost is extremely high for not knowing precisely where pipelines and cables are buried. There are 4 million road excavations every year. The Institution of Civil Engineers has recommended that all buried services should from now on be captured under the Digital National Framework standards, the standard way of sharing information across utilities.

Change is familiar to this community, with maps moving from paper to digital data files to an object-oriented spatial database to Web services. The transition has involved not only topography but transport network layers as well as addresses and image data. Every part of the database has a unique 16-digit code, which allows people to link their own information to the Ordnance Survey information. Ordnance Survey has adopted the Open Geospatial Consortium standards and has started to use Semantic Web technologies and to publish ontologies; when the Ordnance Survey started to use international standards, help build them, and share data in those formats, interoperability started to be seen as something achievable. Instead of people just collecting the data on their own, open, transparent data formats and ontologies enable people to link their data with others. Everyone looks at the same object in different ways, and so it is important that people are able to link information.

Part 4 - Third party intermediary services for citizens

Chair: Ari Schwartz (Center for Democracy and Technology)

Tom Steinberg (MySociety, UK) [slides]
John Wonderlich (The Sunlight Foundation), How the Sunlight Foundation Promotes Transparency in Government [no slides]
J.L. Needham (Google), Ensuring government is only one search away: Implementing the Sitemap protocol [slides]
Lynne Bradley, Carrie Russell (American Library Association) [slides]

From small organizations such as MySociety to large ones such as Google, we heard how the citizenry is interacting with the Government. MySociety's Fix My Street allows a citizen to report local problems (graffiti, fly tipping, broken paving slabs, or street lighting) based on postal code search. The citizen can stick a pin on a map and the service will send the information to the relevant authority. Masses of geographical and political jurisdiction information are needed to make this project work. The project achieved a good relationship with the government and has attempted to teach the government how to continue this kind of participation. We also heard examples of innovative services (both for-profit and non-profit) such as Follow the Money, MapLight, OpenCongress, GovTrack and TheyWorkForYou, all of which build on government data but provide services that the government agencies could not or would not offer. In some cases this is because the agency could not afford to do so and in other cases, the services cast a critical, unflattering light on government actions. Needless to say, numerous commercial services are also built up by adding value to government data; everything from mapping services to advanced data analysis of government activities has been offered.

Most third party services that enable re-use and integration of government data are forced to overcome a common problem: the raw data on which they depend is in uneven formats and thus requires hard work to make it usable. Some, such as OpenHearings or LOUIS (whose goal is to create a comprehensive, completely indexed and cross-referenced depository of federal documents from the executive and legislative branches of government), do all that hard work and then publish the information in standard formats such as RSS and iCalendar for others to reuse. Making the data available in standard, semantics-based formats from the outset would allow much easier reuse and would make government more transparent and accountable. Which data should come first? Since those services are usually driven by their users, one recommendation could be to set up a site where users could describe the pieces of information they want, which would also reveal whether certain key players show demand for some kind of information. When considering pricing for this information, governments should examine the charging ratio because in most cases it was written pre-Semantic Web, the economics have changed, and the cost of getting charges wrong could be enormous.

Despite problems with data format, public sector information is seen to be extremely trustworthy and authoritative. Citizens usually get to government information in the same way they get to news stories or other sources of information: through a search engine. Some government data on the Web is not readily discoverable through major search engines, however. In some cases an agency may think it does not need the Web channel or may be concerned that a user navigating deep into a Web site may be left with inadequate context. Google, Microsoft and Yahoo are promoting the sitemap protocol, a way to tell the search engine what can be used and what cannot, keeping the owner of the information in control. It is used on sites such as plainlanguage.gov.
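As a rough illustration of what implementing the sitemap protocol involves, the sketch below builds a small sitemap file with Python's standard library. A sitemap lists the pages a site owner wants search engines to know about. The page URLs and dates are invented, and the helper name `build_sitemap` is hypothetical; only the XML namespace and the `urlset`, `url`, `loc`, and `lastmod` element names come from the published protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified) pairs.

    `last_modified` is a YYYY-MM-DD string, or None to omit the element.
    """
    # Serialise the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical agency pages.
sitemap_xml = build_sitemap([
    ("https://www.example.org/reports/2007/annual.html", "2007-06-18"),
    ("https://www.example.org/forms/index.html", None),
])
```

The resulting text would typically be saved as `sitemap.xml` at the root of the site, so that deep pages become discoverable without the search engine having to follow links to reach them.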

Finally, some citizens who lack Internet access at home or in their place of employment go to public libraries to use the Web. Public libraries are trusted institutions and their buildings are often unusually sturdy, due to age. Thus, they are often able to withstand hurricane and disaster damage, making them exceptionally useful as central points for information regarding family members, emergency services, and other necessities during natural disasters. Currently, almost 100% of libraries in the US offer Internet and WiFi access, and they are starting to play a new role, for example helping with taxes, immigration, and insurance, and serving as a portal to other government services.

Part 5 - Challenges in interoperability and standards compliance

Chair: Daniel Weitzner (W3C/MIT)

David Capozzi (US Access Board) [slides]
Judy Brewer (W3C/WAI) [slides]
Miguel Porrúa (Organization of American States) [slides]
Kevin Novak (Library of Congress) [no slides]

W3C established the Web Accessibility Initiative (WAI) in 1997 to develop strategies, guidelines, and resources to help make the Web accessible to people with disabilities. WAI develops accessibility guidelines, specifically the Web Content Accessibility Guidelines (WCAG), the Authoring Tool Accessibility Guidelines (ATAG) and the User Agent Accessibility Guidelines (UAAG). The most widely known of these is WCAG, which has been referenced in Web accessibility policies in many countries. US Section 508 requires federal departments and agencies that develop, procure, maintain, or use electronic and information technology to ensure that Federal employees and members of the public with disabilities have access to and use of information and data comparable to that of the employees and members of the public without disabilities. The focus of Section 508 is improved access to mainstream technologies and improved interoperability with assistive technologies. The Web portion of the initial Section 508 standard was generally based on WCAG 1.0.

Broad adoption of WCAG has not come without challenges. Many government policies and standards diverged from the global standard in one way or another, creating fragmentation and raising the need for harmonization. As a result, there are a variety of WCAG 1.0 spin-offs around the world, and authoring tool developers saw no unified market. In the US in 2006, the Access Board convened an advisory committee (TEITAC) to advise on updating the Section 508 standards and bringing them into closer harmonization with current standards. Harmonization of accessibility standards is an issue at the provincial/state and local level as well as at the national level. W3C/WAI coordinates with international standards organizations to promote increased harmonization of accessibility standards. In addition to its work on Web accessibility guidelines, WAI also reviews W3C technologies to ensure support for accessibility needs, and develops materials for education and outreach on Web accessibility, including materials to prepare the Web community for the next release of the Web Content Accessibility Guidelines, WCAG 2.0. When allocating resources to Web accessibility, it is advisable to use W3C/WAI guidelines as these are the international standard, and to use any available local resources to train Web developers and accessibility assessment resources.

We heard from the Organization of American States (SEDI/OAS) about the political framework and projects in Latin America and the Caribbean (LAC). As a result of the demands from LAC countries for horizontal cooperation, OAS and other international organizations established REDGEALC, the Network of LAC eGovernment Leaders, to improve cooperation, training, and exchange of experts, and to organize workshops and annual exchange meetings. REDGEALC has 60 members from 32 countries. Their work is based mainly on two documents: the Declaration of Santo Domingo, “Good governance and development in the Knowledge-based society”, signed in the Dominican Republic in June 2006, and eLAC 2007: Plan of Action on the Information Society for Latin America and the Caribbean, within the framework of the WSIS process. Some of the eLAC goals are directly related to eGovernment and interoperability standards. Some of the most advanced countries in eGovernment and interoperability are Brazil, Chile, Mexico, Colombia and Trinidad & Tobago, with Colombia having one of the more successful efforts in interoperability, a framework based on its own XML schemas (GEL-XML). Nonetheless, at the latest REDGEALC meeting it was agreed that there is still a great lack of interoperability in the region, and people still have to fill out the same forms dozens of times.

The US Library of Congress (LoC) exemplifies many challenges faced by an institution that handles a very large base of information. One of the problems is the lack of consistent HTML implementations in browsers, making it costly to make a Web page display correctly in all of them. There are also metadata consistency issues, e.g. you may often find a photograph and all you have is the information that exists at the bottom of the photograph and no further information. So you have significant issues in terms of search and display. The Library also has various different priorities. The search technologies are a decade old and use old tools that are file structured. Web Accessibility is also important and even though the Library does not have to comply with Section 508, it has chosen to do so. Site-mapping is a laudable goal but it's difficult due to the fact that the Library is organized under a file system. It's also a legacy and funding issue. The LoC is trying to separate content from information but it takes time to crank out standardized pages and captioned webcasts.

Part 6 - Position papers

Improving the delivery of eGovernment services through the use of Web technologies

Chair: José Manuel Alonso (W3C/CTIC)

Anil Saldhana (Redhat), Secure E-Government Portals- Building a web of trust and convenience for global citizens [slides]
Kevin Novak, Michelle Springer (Library of Congress), Government as a Participant in Social Networks. Adding Authority to the Conversation [slides, slides]
Chris Testa (Library of Congress), Designing the User-Centric eGovernment: The THOMAS Legislative Information System [slides]

The Web has become a much more active experience and the question now is how this service will benefit “me”, the individual, in an active rather than a passive sense. Institutions such as the Library of Congress are authoritative sources of information, having a large knowledge base, offering 22+ million items online to millions of users from almost 200 countries, but Wikipedia and others are quickly becoming the first stop. Institutions need to evolve and must now meet users in their spaces, communities, and environments, and recognize the importance of the relationship to their users and the relationship and relevance of the content to their users. The new issue is how, as an authoritative source, to handle user-created content among various growing social networks. These government and academic institutions are slow to change but need to adapt to this new model. The Library opened a blog and an RSS feed, and is experimenting with Flickr to gain a better understanding of how social tagging and community input could benefit both the Library and users of the collections. Can those tags added by the public improve the information about the picture itself? And if so, how can that information be mixed with the authoritative information that LoC already has in its systems? Institutions are becoming more open to outside participation but there is also the responsibility of being an authoritative source. These questions are easily generalizable and applicable to many public institutions.

Services such as THOMAS, the much-used legislative information system and a product of intergovernmental cooperation, are also evolving but not without challenges. The audience of THOMAS is very diverse and this presents a key challenge: defining intuitive user interfaces when all these different users have different behaviors. On the architectural side, there's no URI that identifies the definitive source of a bill. The Library is also adding more metadata to the system to allow the user to search everything in just one search box, as well as a top-level taxonomy, enhanced navigation, and data re-use.

There was also discussion about how to build a trust context for the citizen in order to increase use of services and the importance of governments providing the average user a secure government service. Having a secure portal is an efficient and timely way to provide eGovernment services. One of the biggest problems is how to make the end-user feel more secure while making her life easier. One potential improvement is to delegate the manual security procedures as much as possible to technology, such as Web browsers, making them more "security-aware". The technology already exists; what is needed is to provide the context to make it work and to inform the user better about the security context in which she is operating (e.g. making it impossible for hackers to fake security indicators). There is also the idea of a single federated identity that works across government sites, so that when you follow a link from one trusted eGovernment service you can trust that you will be securely transferred to another eGovernment service. Services need to be interconnected to make them friendlier to use.

Policy Approaches

Chair: Ralph Swick (W3C/MIT)

Phillip Hallam-Baker (Verisign), A Pre-History of Web Politics [slides]
Jeffrey C. Griffith, Beyond Transparency: New Standards for Legislative Information Systems [slides]
John Sheridan (UK Office of Public Sector Information), Position Paper [slides]
Patrice McDermott (OpentheGovernment.org), What is E-Government – How Will It Affect Us?

In Phillip Hallam-Baker's words, "the Web is about changing power and politics". If you control the information you can control the power, and freeing that information frees that power. There have been laws in the US for decades requiring the government to disseminate information. Often the agencies themselves cannot even put up all of their information. Right now it is hard to get a policy paper or memo without knowledge of the structure of the agency. This gap could be closed by a description of the record locators in terms of who makes what and when they are disposed of. Yet for most agencies, such data is not available on the Web. It's not possible to get to a truly Web-enabled government until the government releases the information that it has. Much information is unavailable not because it's not technically feasible but because either the political will is not there or there is a political decision to keep the data private, which means there's still a very long way to go to transparent government. Transparency is necessary but not sufficient; clarity, context, timeliness and accuracy are other important features this information needs. Citizens are using the Web more and more and are demanding more information. There is a lot of information that citizens can’t find and can’t use, either because it's in print, or because it has not been cataloged or indexed. This is also a problem between agencies, since different agencies can't know everything other agencies are doing.

Legislative bodies have a complicated organizational structure and it's not always clear who is in charge of what. A common challenge is that the definitive decision results from a combination of the management decision and the political decision; what is also needed is technologists who inform the political layer and are willing to implement innovative decisions. Governments should not think about the technology, but think about the functional interfaces between government and the public. Standards can make these interfaces more efficient. Consider e-forms: instead of trying to do every form exchange in electronic format, if government allows citizens from the beginning to print out forms and send them in, that would be a good start to have eGovernment working and the system could evolve gradually. Another example is the first version of THOMAS: it had its problems, but the public loved it because they had access to information they never had before. Agencies also have to see that the most innovative services in this sense, such as the already mentioned OpenCongress and GovTrack, are being developed by outside organizations.

Some governments are going a step further and providing the information in formats that are easy to access and reuse. The UK OPSI is working on having referenceable URIs in every section of every document it publishes and is working on publishing the London Gazette in RDF form and using Semantic Web technology in defining small notices. This could act as a root leading to a trunk of data-mashing and flowering of utility.

Semantic Interoperability of Information on the Web

Chair: Brand Niemann (EPA/SICoP) [slides]

Mills Davis (Project10X/SICoP), Position Paper from the Federal CIO Council's Semantic Interoperability Community of Practice (SICoP) [slides]
Chris Testa, David Woodward (Library of Congress), Improving Search for Government Resources: Open Standards and Government, Industry Collaboration [slides]
Gary Berg-Cross (Engineering Management & Integration (EM&I)), Exploring eGov Cooperation and Knowledge Sharing using Geospatial Ontologies in a Semantic Wiki [slides]
James Bryce Clark (OASIS), Practical lurches towards semantic interoperability, and standards mash-ups in public sector data [slides]
Richard C. Murphy (General Services Administration), Information Flow in the Federal Enterprise Redux: Governing Federations, Sharing Information and Ensuring Privacy [slides]

As we heard several times, citizens cannot find the information they want. One of the biggest issues is that much information is not on the Web at all, and the first step is to get it out there. Government Web sites and knowledge portals are among the most complex Web sites in existence in terms of size (368 million Web pages in the US .gov domain in 2005), number of users, number of information providers and diversity of information. At the US Library of Congress, the deep and broad sets of collections are often not reachable from the lead page search field. LoC's approach to the problem is metasearching: integrating its different catalogs to leverage the knowledge spread across applications. The main criterion was that citizens had to be able to search all of the resources that were available on the net. One part of the solution is not to change the underlying systems but to leverage them to improve discovery of information. After a short development time, the system can offer a metasearch that lets the user see one result set drawn from all the various systems.
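The metasearch idea can be illustrated with a small sketch. It is not the Library of Congress implementation: the catalogs are in-memory lists, and `search_catalog`, `metasearch`, `Hit` and the simple term-overlap score are all assumptions made for illustration. The point it shows is that each underlying system is queried as-is and only the results are merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # name of the catalog the record came from
    title: str
    score: float  # fraction of query terms found in the title


def search_catalog(name, records, query):
    """Stand-in for one existing catalog's own search interface."""
    terms = query.lower().split()
    hits = []
    for title in records:
        matched = sum(term in title.lower() for term in terms)
        if matched:
            hits.append(Hit(name, title, matched / len(terms)))
    return hits


def metasearch(catalogs, query):
    """Query every catalog unchanged and merge the hits into one ranked list."""
    merged = []
    for name, records in catalogs.items():
        merged.extend(search_catalog(name, records, query))
    # Best score first; ties broken alphabetically for a stable order.
    return sorted(merged, key=lambda hit: (-hit.score, hit.title))


if __name__ == "__main__":
    catalogs = {
        "maps": ["Civil War maps", "Railroad maps"],
        "photos": ["Civil War photographs"],
    }
    for hit in metasearch(catalogs, "civil war maps"):
        print(f"{hit.score:.2f}  {hit.source:6}  {hit.title}")
```

In a real deployment each `search_catalog` call would be an adapter over a separate system's native query interface, which is why the underlying systems need not change.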

Web content is growing, but it does not always cohere in ways that make it easy to navigate and use. This problem also appears when defining enterprise architectures: the meaning of things in a strategy is not well related to the things below them, and many of them are expressed in natural language, which people can understand but which is difficult to process by machine. Documents frequently use roundabout language that is hard to read and that many people cannot understand, so misunderstandings arise (e.g. how a given document or policy is applied depends on who reads it). This should be avoided, but it is not easy yet. The Semantic Web is the enabling technology of collaboration and cooperation to improve these scenarios, but procurement categories and intergovernmental funding remain major problems for research into the technologies that would allow progress toward the ubiquitous Web, increased social connectivity and increased knowledge connectivity. It is very important that increased semantic standards are supported by analysis.

The Semantic Web is a work in progress and its deployment is increasing, but there are also emerging technologies such as those in the AQUAINT (advanced language understanding, reasoning about time, and question answering) and NIMD (research and development for gathering intelligence from massive data) programs. These two programs provide approaches that allow statements to be disambiguated and given context, which helps knowledge sharing move forward.

We heard some successful examples of the use of Semantic Web based standards in real use cases from OASIS, a member-led, international, non-profit standards consortium concentrating on global information exchange. In the EU Auto Repair case, automobile manufacturers and the people who repair automobiles came to OASIS and developed a standard that relies on RDF to exchange information over time to accomplish a variety of tasks. There was initially a lack of political agreement to accept it, but eventually it was passed into law.

There is also the question of open standards. Increasingly, it matters to government regulators and implementers whether standards are developed under an open, fair, vendor-neutral process. Apparently, US agencies prefer voluntary consensus standards to federally created ones. We need to make sure all the standards bodies are working together. Standards organizations are often not very good at teaching people how to use a standard or at explaining best practices (what a service is and when it should be used) so that people can use the technology as it was designed to be used. The Semantic Web is one of the areas where more work of this kind could lead to better results.

Standards can create efficiencies in the marketplace. Some work at US agencies such as GSA has aimed to specify knowledge artifacts within language standards. Some early standardization work is also done through what is called a profile, an informal representation of what people would like to standardize.

Some US agencies have invested in semantic technologies; the question of privacy and trust in the data came up again, and they had to examine the risk exchange. For example, when dealing with privacy issues, there is a difference between personal and governmental privacy: personal privacy is control over the information that you give out, while governmental privacy covers who may see the information that the government holds or collects about you. It is likely that semantic technologies could be both a means to expose more data and a solution for protecting its proper usage. Resources such as the US Privacy Act, which are written very algorithmically, could be modeled in OWL so that a user could ask questions of the code and get answers.

Web Services and Architectures

Chair: Kevin Novak (Library of Congress)

Brad J. Cox (Binary Group), Paving the Bare Spots: Towards an Enterprise-wide Defense Service Bus (and amplification) [slides, slides]
Cory Casanave (Model Driven Solutions), The Architecture of Services: Achieving Business Value with SOA in the Government [slides]

Much of the discussion at the workshop was about higher-level information services, with little focus on the existing infrastructure that makes them possible, such as Service-Oriented Architectures (SOA), or on how security and interoperability are handled in these projects. This session concentrated on the latter and on why standards and interoperability policy are necessary but not sufficient. We were briefly introduced to SOA and then to other, slightly different architectural approaches: Service Component Architecture (SCA) and Model Driven Architecture (MDA).

Although for some SOA simply means exposing capabilities as services and using these to make new services and “mash up” applications, SOA is more than that. It is an enterprise and business architecture approach, a way to understand and integrate the enterprise in the context of its community and as a network of business services, and a way to expose existing capabilities in order to integrate applications and create new composite solutions. Like any architectural solution, SOA-based ones should be planned for longevity and loose coupling. Governments can create interoperability policy directives and standards and mandate that solutions conform to them. Brad J. Cox proposed a different model: getting the desired behavior by having a community of practitioners provide reference implementations.

In any case, there are gaps in the process and things that the government feels it absolutely must have, security being the name for the bundle, and all of those security features have to be added to each and every service at the code level. A new approach is the Service Component Architecture, an emerging OASIS standard based on JBI (“Java Business Integration”) that is programming language agnostic. It is a component-based approach that replaces monolithic code and allows off-the-shelf components to be re-used without any code change. This means that the components can be pre-tested and stockpiled, and ultimately configured not with manual XML commands but with some higher-level language.
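The contrast between adding security to every service at the code level and assembling it from re-usable components can be sketched as follows. This is only an illustration of the composition idea in plain Python; it does not use SCA or JBI, and `tax_lookup`, `require_role` and `assemble` are hypothetical names invented for the example.

```python
def tax_lookup(request):
    """Hypothetical business component: it contains no security code."""
    return {"owner": request["owner"], "balance": 0}


def require_role(role):
    """Re-usable security component that wraps any service unchanged."""
    def wrap(service):
        def secured(request):
            if role not in request.get("roles", ()):
                raise PermissionError(f"role {role!r} required")
            return service(request)
        return secured
    return wrap


def assemble(service, *policies):
    """Compose a service from pre-tested parts, without editing their code."""
    for policy in policies:
        service = policy(service)
    return service


# The security requirement is declared at assembly time, not inside tax_lookup.
secure_lookup = assemble(tax_lookup, require_role("caseworker"))

if __name__ == "__main__":
    print(secure_lookup({"owner": "A. Citizen", "roles": ["caseworker"]}))
```

The same `require_role` component could be tested once and then applied to any number of services, which is the kind of stockpiling the paragraph above describes.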

Treating the enterprise as services at both the business and technology levels makes it possible to decide whether to use XML alone or RDF in XML; the technologies will change, but you can’t isolate your business semantics from that. In the Model Driven Architecture approach, the basic idea is to have layers of architecture and a logical systems model of how the technical solutions will work, from which you get to a technical model that has the actual technical specifications.

Next steps and wrap-up

The final workshop session considered key lessons learned and identified possible next steps that W3C and WSRI could take together with the eGovernment user, vendor and research communities around the world.

A. Key learning and goals

After two days of policy and technology presentations, the overarching theme, heard over and over again, is the need to take institutional, legal and technical steps toward publishing data with re-use in mind. We identified six main points in relation to this goal:

Shift the risk and share the rewards: giving independent service creators open data leverages their energy
Linked Data is the goal: all public sector information published by governments should be presented on the Web using standard structured data and Semantic Web formats so that users and third-party service providers can explore, discover and exploit links amongst this data and links between PSI and other information on the Web.
Start simple: get information on the Web in any open, public format at all. Once information is on the Web, we can work on semantically enabling it, but if it is not up at all nothing can be done.
Apparent difficulty of standards compliance: even today it is difficult to build Web pages that work well across different browsers, and designing to standards such as those for Web accessibility is perceived by some as costly.
Semantic Web as next step: while most commercial systems already use XML/SOA, it has been hard to convince governments to move on to the Semantic Web, as they still do not perceive it to be a mainstream technology.
Global harmonization: standards should tend toward global harmonization to ease procurement and enable sharing of best practice recipes.

B. Next steps

Six key next steps toward improving access to government through better use of the Web and the goal of more open, linkable PSI are:

Best practice guidelines: lessons drawn from the successes (and failures) of efforts at opening, sharing, and re-using government data should be collected into a set of best practices that identify productive technical, institutional and public policy paths toward more open government information.
Ease standards compliance: partly through additional effort to package, promote, and train on best practices and existing material; standards bodies should try harder to speak in terms of government needs.
Increase citizen participation: recognize new channels, get the information to citizens where they are looking for it (YouTube, Flickr, and similar) and make better use of wikis and blogs as means to increase citizens' awareness and participation.
Collective outreach by standards bodies: Leading standards bodies working in fields that support better government data integration and sharing, including but not necessarily limited to the World Wide Web Consortium (W3C), the Internet Engineering Task Force (IETF), OASIS, and the Open Geospatial Consortium (OGC), should work together to promote use of their open standards and identify any gaps to be filled in creating a complete suite of standards to enable open government information.
Advancing the state-of-the-art in data integration strategies through pilot studies and proof-of-concept implementation: Governments and computer science researchers ought to continue to work together to advance the state-of-the-art in data integration and build useful, deployable proof-of-concept demos that use actual government information and demonstrate real benefit from linked data integration. These proof-of-concept tools ought to be targeted to applications that will show real improvement in areas that politicians, government offices and citizens actually need; they should also address the need to provide business cases for transitioning from XML/SOA systems to Semantic Web based ones.
Business case for data integration: gather case studies that demonstrate tangible benefits of government use of open standards to help get information out of proprietary standards.

Thanks and Acknowledgements

W3C & WSRI thank:

UK participants for traveling a long distance to share their perspectives
Enthusiastic support from many in the US Government
Herb Lin and staff at the National Academy of Sciences
Eugene Zinovyev (Center for Democracy and Technology) for taking minutes
University of Maryland Mindlab for workshop support
Judy Brewer (W3C) and Ralph Swick (W3C) for editorial suggestions

Report: Daniel Weitzner and José M. Alonso
$Id: summary.html,v 1.94 2008/02/08 16:29:09 jalonso Exp $
