Projects/FB-PM-Demo

From W3C eGovernment Wiki

Freebase Background

(NOTE - most of the W3C eGov-IG folks probably know all this about Freebase and the other stuff mentioned. This is again a recreation of work done within a US Fed Gov only wiki, in an attempt to establish the utility of GLD.)

Anyone can create a 'base' on Freebase. I created one called 'bizmo'. It is not listed in the Freebase public directory of bases (mostly because I have some obvious IP licensing issues here, and I'm just using this implementation as an 'academic' example). While it can be accessed directly if you know the URL, it won't make much sense at this early stage to anyone who doesn't know what's going on, which is why I've provided the Interaction script steps below. (Skip ahead to it if you already know about RDF, Linked Data, WOA, etc.) I am the only member and the administrator of this base. Social media features such as community contributions and discussion forums are available per base, and you can get feeds on individual contributions and/or overall base activity (updates).

Caveat Vendor

I'm not selling Freebase, although I really like Freebase (insert joke here). I chose Freebase/Parallax for this particular demo because it enabled me to quickly and easily sketch out a datamodel that can link across disparate domains like: Performance Management, Enterprise/Segment/Solution/Technical Architecture, Capital Planning, Earned Value Management, SDLC, SOA, etc. I've actually only worked on a few of these domains thus far, and recreating these concepts in this way has legal issues I've yet to address. The way the Freebase site actually works is consistent with Web Oriented Architecture (WOA) principles, and it also allowed me to demonstrate a novel way for a user to experience a dataset based on this (draft, work in progress) graph oriented datamodel.

Interaction script

(NOTE for Govies who unfortunately are still using IE6 - Parallax/Freebase isn't real happy with IE6, but everything should work fine on pretty much any other browser you're likely to be using. You do not need to have a Freebase account or be logged in to go through the steps below.)

Below are step-by-step instructions to experience the many-to-many set visualization and faceted browsing interaction capabilities using the (experimental) Parallax interface to the bizmo (or any other) base on freebase.com:

  1. type 'exhibit 53' (with no quotes) into the (only) form field on the Freebase Labs Parallax homepage to search through all bases on freebase.com (a search suggestion box should appear and say 'Working...' - if not, press space or enter)
    • the suggestion box populates, saying 'Topics mentioning Exhibit 53 in their text content'
  2. select the first suggestion that says '...in Exhibit 53 collection (2)'
    • the result is a page that shows two instances of the 'Exhibit 53' topic (or datatype), one HHS Exhibit 53 topic and one GSA Exhibit 53 topic
  3. on the top right where the interface screen says 'Connections from the topics on this page', click on the 'Contains (3)' link
    • the result is a page that shows three instances of an 'Exhibit 53 Recordset' topic
  4. again on the top right where the interface screen says 'Connections from the topics on this page', click on 'more connections', and when the search/filter selection dialog box comes up, make a connection 'To Other Types of Topic' by clicking on the 'Supports Federal Goal' property
    • the result is a page that shows the 'National Health Information Network' topic, which 'Supports Federal Goal' 'Health Care Reform'. You have just traversed a graph datamodel, starting from data originally exposed by the ITDB (in this case, line item level Exhibit 53 data), to understand its (example) relationship to (placeholder) Administration priorities
  5. on this page, select the 'National Health Information Network' (NHIN) link
    • the result is a page about that resource with other associated properties that link to other topics
  6. select the 'view on freebase' link, and that takes you out of the Parallax interface and to the unique Web resource on Freebase that represents the NHIN.

(Instance counts like (2) or (3) will change based on how many topics Parallax finds in the base, which may change as my work on bizmo progresses.)

I have only (manually) incorporated targeted bits of data from the US Fed Gov ITDB, just enough to enable the interaction script above and to exercise and demonstrate the connective fabric of the emerging datamodel that I'm actually proposing to work on. I haven't automated any ITDB data ingest into bizmo because that's easy to do and it's not the point I'm trying to make with this effort.

Bizmo Notes

Every data schema element or entity, and every instance thereof, in any base on freebase.com is referred to as a 'topic', so a topic is either a datatype specification or an instance of some datatype. In the technical sense, topic is the base type that all other types are built from. Topics have relationships to other topics, called 'properties'. The underlying generic data model is a graph, based on 'triples' of the form Topic-Property-Topic (1-2-3). Each Topic and Property is a Web resource that can be dereferenced by HTTP GET'ing URIs that return HTML, RDF/XML or JSON syntax representations on Freebase. These ideas are a few of the key concepts in Web Oriented Architecture and Linked Data, and in the 'RESTful' architectural style underlying those buzzwords.

The Freebase Topic-Property-Topic triples are referred to as 'Subject Predicate Object' or sometimes 'Node Arc Node' in the RDF and Linked Data communities. Whether you call them predicates or properties, links are the key to data federation and integration across disparately designed/developed/deployed (aka federated) datasets. In contrast, when a Community of Interest comes together to define message oriented (consumer and provider) Information Exchange Package schemas, à la the Data Reference Model and its leading example NIEM, it may be more appropriate to (policy enable) tree based standard data structures (like XSDs). But when that coordination cost is too high, graph orientation makes crowdsourced integration possible through open standards that don't necessarily require shared standard schemas across all participants in a 'federation'.
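
To make the triple idea concrete, here is a minimal Python sketch. It is an illustration only, not how Freebase stores or serves data. It holds a few of the bizmo schema triples from this page as plain subject/predicate/object records and matches them against a pattern in which None acts as a wildcard. The names Triple, TRIPLES and match are made up for this example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a Topic
    predicate: str  # a Property linking two Topics
    obj: str        # another Topic


# A few schema triples taken from the bizmo lists on this page.
TRIPLES = [
    Triple("Goal", "amplifies", "Vision"),
    Triple("Objective", "quantifies", "Goal"),
    Triple("Federal Agency", "Maintains", "Exhibit 53"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None means 'anything'."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which topics link *to* Goal, and through which property?
for t in match(TRIPLES, obj="Goal"):
    print(t.subject, "/", t.predicate, "/", t.obj)
```

Because every fact has the same three-part shape, the one match function works for any domain that gets added later.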

RDFS Notes on Bizmo

Triples (Topic - Property - Topic) in this (draft, work in progress) bizmo datamodel:

Schema

From the OMG BMM

  • Goal / amplifies / Vision
  • Objective / quantifies / Goal

Mixed, but CPIC concepts mostly

  • Federal Enterprise / (has) Fed Ent Goal / (of type) Goal
  • Federal Agency / Maintains / Exhibit 53
  • Exhibit 53 / Contains / Exhibit 53 Recordset
  • Exhibit 53 Recordset / (has a bunch of properties including) Supports Federal Goal / Goal

Instances

When you look at topic types, such as an Exhibit 53 Recordset, on the right under 'Exhibit 53 Recordset Topics' you see instances of this type. In the case of this type, there are (currently) 3 instances, and this link takes you to the 'view' of the instances of an Exhibit 53 Recordset type.

  • Obama / is of type / Federal Enterprise
  • Obama / has a Fed Ent Goal / Health Care Reform
  • HHS / is of type / Federal Agency
  • HHS / maintains / HHS Exhibit 53
  • HHS Exhibit 53 / contains / Nat Health Info Network Connect
  • Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
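
The instance triples above are enough to replay the gist of the Interaction script in a few lines of Python. This is a rough sketch under the assumption that a Parallax 'connection' amounts to following one property from a set of topics to the set of topics it links to. The helper names follow and traverse are invented here and have nothing to do with the Freebase API.

```python
# Instance triples copied from the list above: (topic, property, topic).
INSTANCES = [
    ("Obama", "is of type", "Federal Enterprise"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
]


def follow(triples, topics, prop):
    """From a set of topics, follow one property to the set of linked topics."""
    return {o for (s, p, o) in triples if s in topics and p == prop}


def traverse(triples, start, path):
    """Follow a sequence of properties, one set-to-set hop at a time."""
    current = set(start)
    for prop in path:
        current = follow(triples, current, prop)
    return current


# Agency -> Exhibit 53 -> recordset -> Administration goal
goals = traverse(INSTANCES, {"HHS"}, ["maintains", "contains", "supports Obama Goal"])
print(goals)  # {'Health Care Reform'}
```

Each hop works on a whole set of topics at once, which is what lets the same code handle many-to-many relationships without special cases.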

The 'Federal Enterprise' type/topic contains triples (properties that link to other types) that use the open standard Business Motivation Metamodel (BMM), and has an example 'Obama' instance (which is shown on the right under 'Federal Enterprise Topics'). On that Federal Enterprise schema topic definition page/resource, if you select the property 'Federal Enterprise Goal', it links to another type 'Goal', and when you select 'Goal' on that page, it shows a 'Goal Topic' on the right with an instance called 'Health Care Reform'. Goal is a datatype/schema/topic from the BMM (among many others from the BMM) that is linked to by the Obama instance of the Federal Enterprise. Given a Federal Enterprise schema that links to a Goal, and an instance of that Goal serving as a Federal Enterprise Goal, the connectivity from Agency Portfolio Investment to Administration Goal is achieved once the Exhibit 53 Recordset property 'Supports Federal Goal' links to that Goal instance. If you're into US Fed Gov stuff, you might also find the 'Assessment' type from the BMM interesting, along with others like 'Policy' and 'Directive', since these could be logical base types for many of OMB's work products and analytical requirements.

Other Notables

I created this structure solely to enable a quick and dirty demonstration of the utility of graph oriented datasets and graph traversal. The 'connections' that the novel Freebase Parallax 'faceted browsing' interface provides (as in the Interaction script above) are just the Properties of the datamodel that link one topic to another. No system changes are required to introduce any new Topics or Properties to an existing datamodel. The underlying data structure is just a triple, and this dynamic structure can easily accommodate evolving datamodels in any domain without changes that would otherwise be required to static structures in relational data stores or XML Schemas. Furthermore, the processing of this dynamic graph of data remains consistent as new domains are introduced and linked. This is a big deal for enhanced adaptability of information, not to mention increased agility of supporting IT infrastructures. Given these circumstances, higher ROI and lower TCO are likely to follow.
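
A small Python sketch of that claim, under stated assumptions: the graph is just a set of triples, and the 'connections' of a topic are computed from whatever triples happen to be present. The connections function is a made-up stand-in for what a faceted browser might compute, and the 'implemented by' property and 'Example SOA Service' topic are hypothetical additions that are not part of bizmo.

```python
from collections import defaultdict


def connections(triples, topic):
    """Group the outgoing links of a topic by property name."""
    out = defaultdict(set)
    for s, p, o in triples:
        if s == topic:
            out[p].add(o)
    return dict(out)


graph = {
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
}
before = connections(graph, "Nat Health Info Network Connect")

# Hypothetical new property and topic from another domain (not in bizmo).
# Adding them is just adding one more triple; nothing else changes.
graph.add(("Nat Health Info Network Connect", "implemented by", "Example SOA Service"))
after = connections(graph, "Nat Health Info Network Connect")

print(sorted(before))  # ['supports Obama Goal']
print(sorted(after))   # ['implemented by', 'supports Obama Goal']
```

The function that lists connections never had to learn about the new property, which is the adaptability argument in miniature. A relational table or an XML Schema would typically need a new column or element first.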

One of the great things about graph orientation is that it makes it easy to 'bridge' between disparate data models and standard metamodels. I began with the BMM and some simple types and properties to represent budgetary items like an OMB Exhibit 53. Some simple properties create the linkage from low level EVM data to high level Strategic Goals and their quantifying Objectives. The transparency of the ITDB gives us 'who' and 'what' EVM-like info, and the addition of, and linkage to, a BMM based data structure gives us 'why' info.

What's Next?

From here, I/we/domain-SMEs/the-crowd can easily add other open standard metamodels for SOA, BPM (and whatever else...), giving us an opportunity to add lots of 'how' info to the existing ITDB 'what' and the emerging 'why' that this BMM based work in progress demonstrates. Ultimately, there are other power tools that make this RDFS stuff look simplistic, but it's a good starting point to explain and socialize most of the technical concepts and approaches.

Using this WOA approach, all of these Web resources are things that can and should be indexed by, registered with, or otherwise available on data.gov. When these Web resource representations are feed entries with embedded metadata using the W3C RDFa standard, the state of a recordset (like an Exhibit 53 line item) can be automatically published with structured data that can be extracted from the human readable page, which is one approach to 'real-time' enabling data.gov. Continuing with the radical disruptive transparency we now have (thanks VK!), it would also be pretty easy to show how any (ITDB, Freebase, Virtuoso, whatever) powered data aggregation/integration/dissemination site can leverage and link to other sites in the Linked Open Data cloud that expose SPARQL endpoints to obtain RDF data. One example is the usaspending.gov Grants and Contracts databases that I exposed (accessible within GSA only) using the D2R application mentioned on thenationaldialogue.org and in the 'Putting Government Data Online' whitepaper.

Other notes on Freebase

IMHO the Freebase implementation is really dead on in terms of Web 2.0, with its underlying Resource Description Framework Schema (RDFS) based graph orientation, which leverages and is largely populated from external Linked Open Data (LOD) cloud sites like dbpedia.org (and others), and with the faceted browsing interaction direction of Parallax. Where the ITDB currently starts with an FEA categorization and ultimately drills down into a single dataset, this approach allows for either top-down or bottom-up traversal that can span multiple datasets. The user doesn't even need to know anything about the domain of interest. The Freebase API provides a proprietary query language called MQL. Although MQL is based on the de facto standard JavaScript Object Notation (JSON), which makes it readily accessible to programmers, the Linked Open Data cloud typically exposes 'SPARQL endpoints' (SPARQL Protocol and RDF Query Language), and SPARQL is the W3C standard and recommended approach. As with any other surface level language, translation from one syntax to the other is relatively straightforward. Freebase also gets the social aspects of its bases right: each Parallax result page is a 'permalinked' (persistent) Web resource (page) that can be embedded as a widget in any other Web (blog, wiki, whatever) page. The base topics and properties can also be used to drive various dimensional reports that can be linked to or embedded, representing one of the best examples of Web centric ad-hoc analytical reporting capabilities that I know of, blurring the lines between Web 2.0 techniques and old school Business Intelligence in a very cool way.
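
As a rough illustration of the MQL versus SPARQL point, the Python sketch below writes a query in the MQL query-by-example style (a JSON object in which null or an empty list marks a value to be filled in) and turns it into an equivalent SPARQL SELECT string. Everything specific here is an assumption made for the example: the simplified type name, the property keys, the http://example.org/bizmo/ namespace and the to_sparql helper. Real Freebase type ids look like paths, and this toy translator only handles flat, single-level queries.

```python
import json

# Query-by-example in the MQL style. None (JSON null) and [] mark the
# values to be returned. Type name and property keys are hypothetical.
mql_query = {
    "type": "exhibit_53_recordset",
    "name": None,
    "supports_federal_goal": [],
}


def to_sparql(query, ns="http://example.org/bizmo/"):
    """Translate a flat query-by-example dict into a SPARQL SELECT string."""
    # Keys whose value is a placeholder become SPARQL variables.
    wanted = [k for k, v in query.items() if k != "type" and v in (None, [])]
    select = " ".join(f"?{k}" for k in wanted)
    patterns = [f"  ?item a b:{query['type']} ."]
    patterns += [f"  ?item b:{k} ?{k} ." for k in wanted]
    return "\n".join(
        [f"PREFIX b: <{ns}>", f"SELECT {select} WHERE {{"] + patterns + ["}"]
    )


print(json.dumps(mql_query, indent=2))  # the JSON form of the query
print(to_sparql(mql_query))             # the same question as SPARQL text
```

Both forms ask the same thing of the graph: find items of a given type and return the values of two of their properties. That is why mapping between the two is mostly a matter of syntax.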

george at thomas dot name