Difference between revisions of "Use Cases"

From Data on the Web Best Practices
 
=== [[Open Data Life Cycle]] ===
 
 
==== Data on the Web ====
 
Data from diverse domains (e.g. governmental data, cultural heritage, scientific data, cross-domain) available on the Web in a machine-processable format.
 
 
==== Data on the Web Life Cycle ====
 
 
* A set of tasks or activities that take place during the process of publishing and using data on the Web.
 
 
* The process may pass through several iterations and may be represented using a spiral model.
 
 
[Figure: the Data on the Web life cycle (spiral model)]
 
 
==== An Overview of the Data on the Web Life Cycle ====
 
 
===== Data collection =====
 
* Source selection: identification of data sources that may offer relevant data (e.g. relational databases, XML files, Excel documents)
 
 
===== Data Generation =====
 
* 1st iteration: Dataset design
 
** Define the schema of the target dataset (structural metadata)
 
** Choose standard vocabularies
 
*** Data (e.g. FOAF, DC, SKOS, Data Cube)

*** Dataset (e.g. DCAT, PROV, VoID, Data Quality Vocabulary)

*** Data Catalog (e.g. DCAT)

** Choose data formats (machine-processable data)
 
** Create new vocabularies
 
** …
 
* 2nd iteration: ETL process (Extract, Transform and Load)
 
** Extract data from the selected data sources, transform it according to the decisions made during the dataset design, and load it into the target dataset (see the sketch below)
 
**Metadata generation
 
** Produce (manually or automatically) structured metadata according to the metadata standards defined during the dataset design
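As a concrete illustration of this step, here is a minimal Python sketch (illustrative only) that extracts rows from a relational source, transforms them according to a hypothetical target schema and loads them into a CSV dataset; the table, column and file names are invented for the example.

<pre>
# Minimal ETL sketch: extract rows from a relational source, transform them
# to match a hypothetical target schema, and load them into a CSV dataset.
# Table, column and file names are assumptions made for the example.
import csv
import sqlite3

def extract(db_path):
    """Extract raw rows from the selected data source (a SQLite file here)."""
    with sqlite3.connect(db_path) as conn:
        conn.row_factory = sqlite3.Row
        for row in conn.execute("SELECT id, name, created_at FROM source_table"):
            yield row

def transform(row):
    """Apply the decisions made during the dataset design step."""
    return {
        "identifier": row["id"],
        "title": row["name"].strip().title(),
        "issued": str(row["created_at"])[:10],  # keep only the date part
    }

def load(records, target_path):
    """Load the transformed records into the target dataset (a CSV file here)."""
    with open(target_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["identifier", "title", "issued"])
        writer.writeheader()
        writer.writerows(records)

if __name__ == "__main__":
    load((transform(row) for row in extract("source.db")), "target-dataset.csv")
</pre>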
 
===== Data Distribution =====
 
* 1st iteration: Plan

** URI design: Design URIs that will persist and will continue to mean the same thing in the long term

** Choose one or more solutions for data publishing: data catalogue, API, SPARQL endpoint, dataset dump, …

* 2nd iteration: Publish

** Publish data and metadata: Make data and metadata available on the Web

* 3rd iteration: Update

** Update data: Make a new version of the dataset available on the Web

** Update metadata: Make a new version of the metadata available on the Web
 
===== Data usage =====
 
* Explore data: Bring important aspects of the data into focus for further analysis

* Analyze data: Develop applications, build visualizations, …

* Give feedback: Provide useful information about the dataset (e.g. dataset relevance, data quality, …)

* Provide data usage descriptions
 
  
 

Revision as of 00:55, 20 March 2014

Use Case Notes

Types of use cases

What more do we need? What are we missing?

These may not be groupings for the document; they could be just a checklist to make sure we've accommodated everyone we need to. But the actual document may be arranged differently.

Topics

Do we have enough of each of these? What are we missing?

  • Governments
  • Cultural History
  • Research

Technologies

Do we have scenarios that cover each of these technologies? (Not to focus on technologies; just to make sure we aren't missing any types of Data on the Web)

  • CSV
  • Linked data
  • JSON/XML

Use Cases

To add a new use-case, copy the use-case template and complete all of the sections. Use-case elements are optional, depending on information available. If you want to add a challenge or requirement to somebody else's use-case, please add your name in brackets after your update.

Use Case Template: Please Copy

Contributor:

Overview:

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges:

Potential Requirements:

DBpedia

Contributor:

Overview:

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges:

Potential Requirements:

Documented Support and Release of Data

Contributor: Deirdre Lee (based on email by Leigh Dodds)

Overview: While many cases of Data on the Web may contain meta-data about the creation date and last update, the regularity of the release schedule is not always clear. Similarly, how and by whom the dataset is supported should also be made clear in the meta-data. These attributes are necessary to improve the reliability of the data, so that third-party users can trust the timely delivery of the data and have a follow-up point should there be any issues.

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges:

  • Describe release schedule in meta-data
  • Describe support mechanisms in meta-data

Potential Requirements:

  • Propose use of the DCAT properties dct:accrualPeriodicity and dcat:contactPoint (see the sketch below)
  • Potentially extend DCAT?
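A minimal sketch of how this could look, using Python and rdflib to attach dct:accrualPeriodicity and dcat:contactPoint to a dataset description; the dataset URI, update frequency and contact details are placeholders, not part of the original use case.

<pre>
# Sketch: describe a dataset's release schedule and support contact with
# DCAT/Dublin Core terms. The dataset URI and all values are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")
VCARD = Namespace("http://www.w3.org/2006/vcard/ns#")

g = Graph()
dataset = URIRef("http://example.org/dataset/bus-stops")
contact = URIRef("http://example.org/dataset/bus-stops#contact")

g.add((dataset, RDF.type, DCAT.Dataset))
# Release schedule: how often the dataset is updated.
g.add((dataset, DCT.accrualPeriodicity, URIRef("http://purl.org/cld/freq/monthly")))
# Support: who to contact should there be any issues with the data.
g.add((dataset, DCAT.contactPoint, contact))
g.add((contact, RDF.type, VCARD.Kind))
g.add((contact, VCARD.fn, Literal("Open Data Support Team")))
g.add((contact, VCARD.hasEmail, URIRef("mailto:opendata@example.org")))

print(g.serialize(format="turtle"))
</pre>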

Feedback Loop for Corrections

Contributor: Deirdre Lee (based on email by Leigh Dodds and OKF Greece workshop)

Overview: One of the often-quoted advantages of publishing Open Data is improved data quality: many eyes looking at a dataset help spot errors and holes quicker than a public body could by itself. For example, when bus-stop data is published, it may turn out that the official location of a bus-stop is not always accurate, but when the data is mashed up with OpenStreetMap (OSM), the mistake is identified. However, how this 'improved' data is fed back to the public body is not clear. Should there be an automated mechanism for this? How can the improvement be described in a machine-readable format? What is best practice for reincorporating such improvements?

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges:

  • Should there be an automated mechanism for this?
  • How can the improvement be described in a machine readable format?
  • What is best practice for reincorporating such improvements?

Potential Requirements:


Datasets required for Natural Disaster Management

Contributor: Deirdre Lee (based on OKF Greece workshop)

Overview: Many of the datasets that are required for Natural Disaster Management, for example critical infrastructure, utility services and road networks, are not available online because they are also deemed to be datasets that could be used to plan attacks on homeland security. (will expand on this use-case once slides are available)

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges:

Potential Requirements:

OKFN Transport WG

Contributor: Deirdre Lee (based on OKF Greece workshop)

Overview: The OKFN Transport WG has identified the following shortcomings with transport data on the web... (will expand on this use-case once slides are available)

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges:

Potential Requirements:

Publication of Data via APIs

Contributor: Deirdre Lee

Overview: APIs are commonly used to publish data in formats designed for machine consumption, as opposed to the corresponding HTML pages, whose main aim is to deliver content suitable for human consumption. Questions remain around how APIs can best be designed to publish data, and even whether APIs are the most suitable way of publishing data at all (http://ruben.verborgh.org/blog/2013/11/29/the-lie-of-the-api/). Could use of HTTP and URIs be sufficient? If the goal is to facilitate machine-readable data, what is best practice?

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage: Developer
  • Quality:
  • Size: Use of APIs can serve to increase size of data transfer
  • Type/format: html/xml/json/rdf
  • Rate of change: static/real-time
  • Data lifespan:
  • Potential audience: machine-readable

Technical Challenges:

  • APIs can be too clunky/rich in their functionality, which may increase the number of calls necessary and the size of data transferred, reducing performance
  • Collaboration between API providers and users is necessary to agree on 'useful' calls
  • API key agreements could restrict the openness of Open Data?
  • Documentation accompanying APIs can be lacking
  • What is best practice for publishing streams of real-time data (with/without APIs)?
  • Each resource should have one URI uniquely identifying it. There can then be different representations of the resource (xml/html/json/rdf), selected via content negotiation (see the sketch below)
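As an illustration of the last point, the sketch below serves one URI per resource and selects the representation via content negotiation on the Accept header. Python with Flask is used only as an example framework; the route, resource and data are hypothetical.

<pre>
# Sketch: one URI per resource, with the representation (JSON or HTML)
# chosen by content negotiation on the Accept header. Route and data are
# hypothetical examples, not a real service.
from flask import Flask, jsonify, render_template_string, request

app = Flask(__name__)

BUS_STOPS = {"42": {"id": "42", "name": "Main Street", "lat": 53.27, "lon": -9.05}}

@app.route("/bus-stops/<stop_id>")
def bus_stop(stop_id):
    stop = BUS_STOPS.get(stop_id)
    if stop is None:
        return "Not found", 404
    best = request.accept_mimetypes.best_match(["application/json", "text/html"])
    if best == "application/json":
        return jsonify(stop)            # machine-readable representation
    return render_template_string(      # human-readable representation
        "<h1>{{ stop.name }}</h1><p>{{ stop.lat }}, {{ stop.lon }}</p>", stop=stop)

if __name__ == "__main__":
    app.run()
</pre>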

Potential Requirements:

NYC Open Data Program

Contributor: Steven Adler

Overview: Carole Post was appointed by Mayor Bloomberg as Commissioner of the NYC Department of Information Technology and Telecommunications (DoITT) in 2010 and was the first woman in the city's history to be CIO. She was the architect of NYC's Open Data program, sponsored the Open Data Portal and helped pass the city's Open Data Legislation. On March 11, she gave a presentation to the W3C on her experiences changing the city culture and building the Open Data Portal. A recording of her presentation is provided here: Carole Post Webinar - NYC. A copy of her presentation in PDF can be found here: Carole Post Presentation on NYC Open Data


Elements:

Recife Open Data Portal

Contributor: Bernadette Lóscio

Overview: Recife is a beautiful city situated in the Northeast of Brazil and is famous for being one of Brazil's biggest tech hubs. Recife is also one of the first Brazilian cities to release data generated by public sector organisations for public use as Open Data. The Open Data Portal Recife was created to offer access to a repository of governmental machine-readable data about several domains, including finances, health, education and tourism. Data is available in CSV and GeoJSON formats, and every dataset has a metadata description, i.e. a description of the data that helps in understanding and using it. However, the metadata is not described using standard vocabularies or taxonomies. In general, data is created in a static way: data from relational databases is exported in CSV format and then published in the data catalogue. Currently, they are working to have data generated dynamically from the contents of the relational databases, so that data will be available as soon as it is created. The main phases of the development of this initiative were: educating people with appropriate knowledge concerning Open Data; identifying the sources of data that potential consumers could find useful; extracting and transforming data from the original data sources to the open data format; configuring and installing the open data catalogue tool; and publishing the data and releasing the portal.

Elements:

  • Domains: Base registers, Cultural heritage information, Geographic information, Infrastructure information, Social data and Tourism Information
  • Obligation/motivation: Data that must be provided to the public under a legal obligation (Brazilian Information Access Act, enacted in 2012); provide public data to the citizens
  • Usage: Data that supports democracy and transparency; Data used by application developers
  • Quality: Verified and clean data
  • Size: in general small to medium CSV files
  • Type/format: CSV, GeoJSON
  • Rate of change: different rates of changes depending on the data source
  • Data lifespan:
  • Potential audience: application developers, startups, government organizations

Technical Challenges:

  • Use common vocabs to facilitate data integration
  • Provide structural metadata to help data understanding and usage (see the sketch below)
  • Automate the data publishing process to keep data up to date and accurate
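One possible way to tackle the structural-metadata challenge is sketched below: a small Python script that writes a CSVW (CSV on the Web) metadata file describing the columns of a published CSV file. The file name, column names and values are invented for illustration.

<pre>
# Sketch: generate structural metadata for a published CSV file using the
# CSVW (CSV on the Web) metadata vocabulary. File name, columns and values
# are hypothetical examples.
import json

metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "health-units.csv",
    "dc:title": "Health units of the city of Recife",
    "dc:license": "http://creativecommons.org/licenses/by/4.0/",
    "tableSchema": {
        "columns": [
            {"name": "unit_id", "titles": "Unit ID", "datatype": "integer"},
            {"name": "unit_name", "titles": "Unit name", "datatype": "string"},
            {"name": "latitude", "titles": "Latitude", "datatype": "number"},
            {"name": "longitude", "titles": "Longitude", "datatype": "number"},
        ],
        "primaryKey": "unit_id",
    },
}

with open("health-units.csv-metadata.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2, ensure_ascii=False)
</pre>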

Dados.gov.br

Contributor: Yaso

Overview: Dados.gov.br is the open data portal of Brazil's Federal Government. The site was built collaboratively, in a network led by three technicians from the Ministry of Planning, who managed WG3 of INDA, the National Infrastructure for Open Data. CKAN was chosen because it is Free Software and presents a more independent solution for making the Federal Government's data catalogue available on the Internet.

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains: federal budget, addresses, Infrastructure information, e-gov tools usage, social data, geographic information, political information, Transport information
  • Obligation/motivation: Data that must be provided to the public under a legal obligation, the so-called LAI, the Brazilian Information Access Act, enacted in 2012
  • Usage:
  • Data that is the basis for services to the public;
  • Data that has commercial re-use potential.
  • Quality: Authoritative, clean data, vetted and guaranteed;
  • Lineage/Derivation: Data came from various publishers. As a catalog, the site has faced several challenges; one of them was to integrate the various technologies and formulas used by publishers to provide datasets in the portal.
  • Size:
  • Type/format: Tabular data, text data
  • Rate of change: There is fixed data and data with a high rate of change
  • Data lifespan:
  • Potential audience:

Technical Challenges:

  • data integration (lack of vocabs)
  • collaborative construction of the portal: managing online sprints and balancing public expectations
  • Licensing the data in the portal. Most of the data in the portal does not have a specific data licence; different types of licences are applied to the datasets.

ISO GEO Story

Contributor: Ghislain Atemezing

Overview: ISO GEO is a company managing catalogue records of geographic information in XML, conforming to ISO 19139 (ISO 19139 is a French adaptation of ISO 19115). An excerpt is here: http://cl.ly/3A1p0g2U0A2z. They export thousands of records like that today, but they need to manage them better. In their platform, they store the information in a more conventional manner, and use this standard to export datasets compliant with INSPIRE interoperability requirements, or via the CSW protocol. Sometimes, they have to enrich their metadata with other records, produced by tools like GeoSource and accessed through an SDI (Spatial Data Infrastructure), alongside their own metadata records.

A sample containing 402 metadata records in ISO 19139 is available for public consultation at http://geobretagne.fr/geonetwork/srv/fr/main.home. They want to be able to integrate all the different implementations of ISO 19139 in different tools into a single framework, to better understand the thousands of metadata records they use in their day-to-day business. The types of information recorded in each file (see the example at http://www.eurecom.fr/~atemezin/datalift/isogeo/5cb5cbeb-fiche1.xml) are the following: contact info (metadata) [date issued]; spatial representation; reference system info [code space]; spatial resolution; geographic extent of the data; file distribution; data quality; process steps, etc.
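A minimal sketch of how individual fields might be pulled out of such an ISO 19139 record with Python's standard library is shown below; the element paths are simplified assumptions and may need adjusting to the actual records.

<pre>
# Sketch: extract a few descriptive fields from an ISO 19139 metadata record.
# The element paths below are simplified assumptions about the record layout.
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

def text(root, path):
    node = root.find(path, NS)
    return node.text.strip() if node is not None and node.text else None

root = ET.parse("5cb5cbeb-fiche1.xml").getroot()
record = {
    "file_identifier": text(root, "gmd:fileIdentifier/gco:CharacterString"),
    "date_stamp": text(root, "gmd:dateStamp/gco:Date"),
    "title": text(root, ".//gmd:CI_Citation/gmd:title/gco:CharacterString"),
    "abstract": text(root, ".//gmd:abstract/gco:CharacterString"),
}
print(record)
</pre>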

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains: Geographic information,
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size: hundreds (~500) records at regional level
  • Type/format: XML, CSW API
  • Rate of change: Low rate of change
  • Data lifespan:
  • Potential audience:

Technical Challenges:

  • Achieve interoperability between supporting applications, e.g. validation and discovery services built over the metadata repository
  • Capture the semantics of the current metadata records with respect to the ISO 19139 standard
  • Unify the way of accessing each record within the catalogue at different levels, e.g. local, regional, national or EU level

Potential Requirements:

Dutch basic registers

Contributor: Christophe Guéret

Overview: The Netherlands have a set of registers they are looking at opening and exposing as Linked (Open) Data in the context of the project "PiLOD". The registers contain information about buildings, people, businesses and other entities that public bodies may want to refer to in their daily activities. One of them is, for instance, the public tax service ("Belastingdienst"), which regularly pulls data out of several registers, stores this data in a big Oracle instance and curates it. This costly and time-consuming process could be optimised by providing on-demand access to up-to-date descriptions provided by the register owners.

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges: In terms of challenges, linking is for once not much of an issue, as registers already cross-reference unique identifiers (see also http://www.wikixl.nl/wiki/gemma/index.php/Ontsluiting_basisgegevens). A URI scheme with predictable URIs is being considered for implementation. Actual challenges include:

  • Capacity: at this point, it cannot be expected that every register owner takes care of publishing their own data. Some of them export what they have to the national open data portal. This data has been used to do some testing with third-party publication by PiLODers, but this is rather sensitive as a long-term strategy (governmental data has to be traceable/trustable as such). The middle-ground solution currently deployed is the PiLOD platform, a (semi-)official platform for publishing register data.
  • Privacy: some of the register data is personal or may become so when linked to other data (e.g. disambiguating personal data based on addresses). Some registers will need to provide secured access to some of their data to certain people only (Linked Data, not Open). Others can go along with open data as long as they get a precise log of who is using what.
  • Revenue: institutions working under mixed gov/non-gov funding generate part of their revenue by selling some of the data they curate. Switching to an open data model will generate a direct loss in revenue that has to be backed up by other means. This does not have to mean closing the data; e.g. a model of open dereferencing + paid dumps can be considered, as well as other indirect revenue streams.

Potential Requirements:

Wind Characterization Scientific Study

Contributor: [1]

Overview: This use case describes a data management facility being constructed to support scientific offshore wind energy research for the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) Wind and Water Power Program. The Reference Facility for Renewable Energy (RFORE) project is responsible for collecting wind characterization data from remote sensing and in situ instruments located on an offshore platform. This raw data is collected by the Data Management Facility (DMF) and processed into a standardized NetCDF format. Both the raw measurements and processed data are archived in the PNNL Institutional Computing (PIC) petascale computing facility. The DMF will record all processing history, quality assurance work, problem reporting, and maintenance activities for both instrumentation and data.

All datasets, instrumentation, and activities are cataloged providing a seamless knowledge representation of the scientific study. The DMF catalog relies on linked open vocabularies and domain vocabularies to make the study data searchable.

Scientists will be able to use the catalog for faceted browsing, ad-hoc searches, and query by example. For accessing individual datasets, a REST GET interface to the archive will be provided.
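A minimal sketch of how a single processed dataset might be retrieved over the REST GET interface and read is shown below; the endpoint URL and the NetCDF variable name are hypothetical and do not describe the actual facility.

<pre>
# Sketch: fetch one processed dataset over a plain HTTP GET and read a
# variable from the NetCDF file. The URL and variable name are invented.
import requests
from netCDF4 import Dataset

URL = "https://example.org/rfore/archive/datasets/wind_profile_20140301.nc"

resp = requests.get(URL, timeout=60)
resp.raise_for_status()
with open("wind_profile_20140301.nc", "wb") as f:
    f.write(resp.content)

with Dataset("wind_profile_20140301.nc") as nc:
    wind_speed = nc.variables["wind_speed"][:]   # assumed variable name
    print(wind_speed.shape, wind_speed.mean())
</pre>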

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains:
  • Obligation/motivation:
  • Usage:
  • Quality:
  • Size:
  • Type/format:
  • Rate of change:
  • Data lifespan:
  • Potential audience:

Technical Challenges: For accessing numerous datasets, scientists will access the archive directly using other protocols such as sftp, rsync and scp, and access techniques such as HPN-SSH (http://www.psc.edu/index.php/hpn-ssh).

Potential Requirements:

BuildingEye: SME use of public data

Contributor: Deirdre Lee

Overview: Buildingeye.com makes building and planning information easier to find and understand by mapping what's happening in your city. In Ireland, local authorities handle planning applications and usually provide some customised views of the data (PDFs, maps, etc.) on their own websites. However, there isn't an easy way to get a nationwide view of the data. BuildingEye, an independent SME, built http://mypp.ie/ to achieve this. However, as the local authorities didn't have Open Data portals, BuildingEye had to ask each local authority directly for its data. It was granted access by some authorities, but not all. The data it did receive was in different formats and of varying quality/detail. BuildingEye harmonised this data for its own system. However, if another SME wanted to use this data, they would have to go through the same process and again ask each local authority for the data.
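The sketch below illustrates the kind of harmonisation BuildingEye had to perform: loading files delivered in different formats and layouts by different authorities and mapping them onto one common schema. Python with pandas is used; the file names and column mappings are invented.

<pre>
# Sketch: harmonise planning-application data delivered by different local
# authorities in different formats and layouts into one common schema.
# File names and column mappings are invented for illustration.
import pandas as pd

COMMON_COLUMNS = ["application_id", "address", "decision", "decision_date"]

def load_authority_a(path):
    # Authority A delivers CSV with its own column names.
    df = pd.read_csv(path)
    return df.rename(columns={
        "AppRef": "application_id", "SiteAddress": "address",
        "Decision": "decision", "DecisionDate": "decision_date",
    })[COMMON_COLUMNS]

def load_authority_b(path):
    # Authority B delivers an Excel spreadsheet with a different layout.
    df = pd.read_excel(path)
    return df.rename(columns={
        "file_number": "application_id", "location": "address",
        "outcome": "decision", "date_of_decision": "decision_date",
    })[COMMON_COLUMNS]

harmonised = pd.concat(
    [load_authority_a("authority_a.csv"), load_authority_b("authority_b.xlsx")],
    ignore_index=True,
)
harmonised.to_csv("planning_applications_all.csv", index=False)
</pre>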

Elements: (Each element described in more detail at Use-Case Elements )

  • Domains: Planning data
  • Obligation/motivation: demand from SME
  • Usage: Commercial usage
  • Quality: standardised, interoperable across local authorities
  • Size: medium
  • Type/format: structured according to legacy system schema
  • Rate of change: daily
  • Data lifespan:
  • Potential audience: Business, citizens
  • “Governance”: local authorities

Technical Challenges:

  • Access to data is currently a manual process, on a case by case basis
  • Data is provided in different formats, e.g. database dumps, spreadsheets
  • Data is structured differently, depending on the legacy system schema, concepts and terms not interoperable
  • No official Open license associated with the data
  • Data is not available for further reuse by other parties

Potential Requirements:

  • Creation of top-down policy on Open Data to ensure common understanding and approach
  • Top-down guidance on recommended Open license usage
  • Standardised, non-proprietary formats
  • Availability of recommended domain-specific vocabularies.

Considerations

Use-Case Elements

  • Domains, e.g.
    • Base registers, e.g. addresses, vehicles, buildings;
    • Business information, e.g. patent and trademark information, public tender databases;
    • Cultural heritage information, e.g. library, museum, archive collections;
    • Geographic information, e.g. maps, aerial photos, geology;
    • Infrastructure information, e.g. electricity grid, telecommunications, water supply, garbage collection;
    • Legal information, e.g. supranational (e.g. EU) and national legislation and treaties, court decisions;
    • Meteorological information, e.g. real-time weather information and forecasts, climate data and models;
    • Political information, e.g. parliamentary proceedings, voting records, budget data, election results;
    • Social data, e.g. various types of statistics (economic, employment, health, population, public administration, social);
    • Tourism information, e.g. events, festivals and guided tours;
    • Transport information, e.g. information on traffic flows, work on roads and public transport.
  • Obligation/motivation, e.g.
    • Data that must be provided to the public under a legal obligation, e.g. legislation, parliamentary and local council proceedings (dependent on specific jurisdiction);
    • Data that is a (by-)product of the public task, e.g. base registers, crime records
  • Usage, e.g.
    • Data that supports democracy and transparency;
    • Data that is the basis for services to the public;
    • Data that has commercial re-use potential.
    • Data that the public provides;
    • Utilization Rates once the Data is published;
  • Quality, e.g.
    • Authoritative, clean data, vetted and guaranteed;
    • Unverified or dirty data.
  • Lineage/Derivation, e.g.
    • Where the Data Came from;
    • What formulas were used to process the data;
    • How long the controlling authority had the data;
  • Size (ranging from small CSV files of less than a megabyte to potentially tera- or petabytes of sensor or image data)
  • Type/format, e.g.
    • Text, e.g. legislation, public announcements, public procurement;
    • Image, e.g. aerial photos, satellite images;
    • Video, e.g. traffic and security cameras;
    • Tabular data, e.g. statistics, spending data, sensor data (such as traffic, weather, air quality).
    • Data Classification
  • Rate of change, e.g.
    • Fixed data, e.g. laws and regulations, geography, results from a particular census or election;
    • Low rate of change, e.g. road maps, info on buildings, climate data;
    • Medium rate of change, e.g. timetables, statistics;
    • High rate of change, e.g. real-time traffic flows and airplane location, weather data
  • Data lifespan
  • Potential audience
  • Certification/Governance
    • Individuals or systems that certified the data for publication
    • Processes and steps documented to publish
    • Public access and redress for data quality
    • Indication of FOIA Status

Linked Data Glossary

A glossary of terms defined and used to describe Linked Data, its associated vocabularies and Best Practices: http://www.w3.org/TR/ld-glossary/

Common Questions to consider for Open-Data Use-Cases

  1. Did you have a legislative or regulatory mandate to publish Open Data?
  2. What were the political obstacles you faced to publish Open Data?
  3. Did your citizens expect Open Data?
  4. Did your citizens understand the uses of Open Data?
  5. Did you publish data and information available in other forms (print, web, etc) first?
  6. How did you inventory your data prior to publishing?
  7. Did you classify your data as part of the inventory?
  8. How did you transform printed materials into Open Data?
  9. Does your city certify the quality of the data published, and what steps are involved in certification?
  10. Do you have data traceability and lineage - i.e., do you know where your data came from and who has transformed it?
  11. Can you provide an audit trail of data usage and security prior to publication?
  12. Can you track the utility of the data published?
  13. Are you using URIs to identify data elements?
  14. Do you have a Data Architecture?
  15. What is your Data Governance structure and program?
  16. Do you have a Chief Data Officer and Data Governance Council who make decisions about what to publish and how?
  17. Do you have an Open Data Policy?
  18. Do you do any Open Data Risk Assessments?
  19. Can you compare your Open Data to neighboring cities and regions?
  20. Do you provide any Open Data visualization and analytics on top of your publication portal?
  21. Do you have a common application development framework and cloud hosting environment to maintain Open Data apps?
  22. What legal agreements and frameworks have you developed to protect your citizens and your city from the abuse and misuse of Open Data?

Stories

NYC Council needs modern and inexpensive member services and constituent services tools

Date: Monday, 23 Feb 2014
From: Noel Hidalgo, Executive Director of BetaNYC
To: NY City Council’s Committee on Rules, Privileges and Elections
Subject: For a modern 21st Century City, NY Council needs modern and inexpensive member services and constituent services tools.

Dear Chairman and Committee Member,

Good afternoon. It is a great honor to address you and represent New York City’s technology community, particularly a rather active group of technologists: the civic hackers. I am Noel Hidalgo, the Executive Director and co-founder of BetaNYC [1]. With over 1,500 members, BetaNYC’s mission is to build a city powered by the people, for the people, for the 21st Century. Last fall, we published a “People’s Roadmap to a Digital New York City” where we outline our civic technology values and 30 policy ideas for a progressive digital city [2]. We are a member-driven organization and members of the New York City Transparency Working Group [3], a coalition of good government groups that supported the City’s transformative Open Data Law.

In 2008, BetaNYC got its start by building a small app on top of Twitter. This tool, Twitter Vote Report, was built over the course of several of what were then called developer days and are now hacknights, and enabled over 11,300 individuals to use a digital and social tool to provide election protection. [4]

Around the world, apps like this catalyzed our current civic hacking moment. Today, hundreds of thousands of developers, designers, mappers, hackers, and yackers (the policy wonks) volunteer their time to analyze data, build public engagement applications, and use their skills to improve the quality of life of their neighbors. This past weekend, we had Manhattan Borough President Gale Brewer, Councilmember Ben Kallos, Councilmember Mark Levine, a representative from Councilmember Rosie Mendez, and representatives from five Community Boards kick off a challenge to over 100 civic hackers to prototype 21st Century interfaces to NYC’s open data. [15]

Through this conversation on rules reform, you have an opportunity to continue the pioneering work that a small, talented team of civic hackers and I did WITHIN the New York State Senate.

In 2004, I moved from Boston to work for then-Senator Patterson’s Minority Information Services department. In 2009, I re-joined the NY State Senate in its first Chief Information Officer’s office. Our team’s mission was to move the State Senate from zero to hero, depoliticize technology, and build open, reusable tools for all.

In the course of four months, we modernized the Senate’s public information portal, leading the way for two years of digital transparency, efficiency, and participation. These initiatives were award-winning and done under the banner of “Open Senate”. From the blog of Andrew Hoppin, the former NY State Senate CIO [5]:

Open Senate is an online “Gov 2.0” program intended to make the Senate one of the most transparent, efficient, and participatory legislative bodies in the nation. Open Senate comprises multiple sub-projects led by the Office of the Chief Information Officer [CIO] in the New York State Senate, ranging from migrating to cost-effective, open-source software solutions, to developing and sharing original web services providing access to government transparency data, to promoting the use of social networks and online citizen engagement.

We did this because we all know how New Yorkers are getting their information. I don’t need to sit here and spout off academic numbers on digital connectivity; one just has to hop into a subway station to see just about everyone on some sort of digital device. For a modern NY City Council with 21st century member services, the council needs a Chief Information Officer and dedicated staff. The role of this office would be similar to the NY Senate’s CIO: it would be empowered to undertake work ranging from migrating to cost-effective, open-source software solutions, to developing and sharing original web services providing access to government transparency data, to promoting the use of social networks and online citizen engagement.

Through this office, the Council would gain an empowered digital and information officer to coordinate the development and enhancement of member and constituent services.

Member services could be improved with the following.

  • Online and modern digital information tools.
    • Imagine a council website that you can call your own and that includes official videos, photos, hearings, press releases, petitions, interactive maps of your district, online forms, event notifications, and online townhalls.

  • Usable and updateable committee websites.
  • Constituent tracking & relationship management tools.
    • Imagine being able to take a constituent issue and automatically file a 311 complaint and monitor the status of the complaint to completion. Imagine being able to send targeted constituent messages and reduce your paper mailings.
    • Imagine being able to survey your constituents via a mobile app or sms.
  • Better business and internal technology practices
    • No matter where you are, from desktops to mobile devices, you could always have access to the council's internal systems while on the go.
  • More usable interfaces to legislation
    • Imagine a simpler interface to Legistar that integrates constituent comments and public feedback.
  • Real-time dashboards of 311 call and ticket status, municipal agency performance tracking, and budget expenditure tracking.
    • Imagine a monitor in your office and a website you could send to your constituents that demonstrates government performance in your district.
  • A universal participatory budgeting tool that works for all council districts.
    • Imagine a tool that cuts across the digital divide and empowers all to participate in participatory budgeting.

In our “People’s Roadmap to a Digital New York City,” we specifically call on the Council to adopt the following programs. This is a brief summary of them:

  • Create "We the People of NYC," a petition tool for any elected representative. [6]
  • Update and centralize NYC’s Freedom of Information Law [7] [8]
  • Publish the City Record Online [9]
  • Expand the 311 system by implementing and growing the Open311 standard [10]
  • Release government content under a Creative Commons license [11]
  • Equip Community Boards with better tools [12]
  • Expand Participatory Budgeting [13]
  • Put the NYC Charter, Rules, and Code online [14]

Hidalgo, Noel Monday, 24 Feb 2014 BetaNYC’s testimony in favor of better member services

URLs Referenced:



Palo Alto Open Data Story

On February 17th we heard a use case presentation from Jonathan Reichental, CIO of the City of Palo Alto.
A recording of the use case presentation can be found here: Palo Alto - Open by Default
1. We can explore the use of URIs for Open Data elements and physical things in a city that have multiple data elements
2. Cities are not yet tagging their data with metadata to allow comparability
3. There are not yet mechanisms to allow citizens to improve data completeness
4. Cities have internal processes for assuring data quality including sign-offs from IT and public officials but these activities are not recorded in metadata and provided with the datasets
5. Cities are not tracing origin and lineage
6. Tuples would be a good way to identify relationships between things and data elements, which could allow machine comparability of data sets in an internet of things that open data describes (see the sketch below)
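As an illustration of point 6, the sketch below uses Python and rdflib to relate a physical thing in the city (identified by a URI) to the data elements that describe it; the URIs, classes and property names are invented for the example.

<pre>
# Sketch: use URIs and triples ("tuples") to relate a physical thing in the
# city to the open data elements that describe it. All URIs, classes and
# properties are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

CITY = Namespace("http://data.example-city.org/id/")
EX = Namespace("http://data.example-city.org/def/")

g = Graph()
signal = CITY["traffic-signal/embarcadero-and-el-camino"]
reading = CITY["observation/2014-02-17T10-00"]

g.add((signal, RDF.type, EX.TrafficSignal))
g.add((signal, RDFS.label, Literal("Traffic signal, Embarcadero & El Camino")))
g.add((reading, EX.observedThing, signal))   # links the data element to the thing
g.add((reading, EX.vehicleCount, Literal(412)))
print(g.serialize(format="turtle"))
</pre>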

Palo Alto pledged to be a partner with the W3C in our WG, which is a great outcome.


Use Cases Document Outline
