Day 1 - 5 October 2009
Introductions and Invited Speakers
9:00 "Logistics", Richard Campbell, FDIC (no slides)
Hello, I would like to welcome you to this workshop on behalf of the FDIC. (He runs through the logistics, including cautioning people to avoid bringing their cell phones close to the wireless mikes for fear of very loud squeals). I would like to introduce the FDIC's CIO Mike Bartell who has been very much involved in XBRL for years, and has spoken at multiple conferences about it, and is one of our leading people here. He has a 10 minute speech to give.
9:05 Opening comments by Mike Bartell, CIO, FDIC (no slides)
Welcome to the FDIC, I am glad that you all could make it. This is about our third conference on XBRL here. We've hosted a couple of XBRL US conferences over the years, so we've been at it for a little while now. I've been involved in the XBRL International group for a number of years. Diane and I and others. Anybody that will show up this early on a Monday morning to talk about the Semantic Web, you all deserve a badge.
On financial data standards, whilst there has been a tremendous amount of progress, there is still a long way to go, and not just technically, although that will be primarily the focus of this workshop, but also from a business standpoint. You still talk to CFOs and financial people, and ask them, do you feel comfortable about the state of your financial data? Do they feel informed, do they think they can inform their peers, the regulators, other business partners by sharing financial information without having to qualify it, or without an army of people behind them to vet it? The reality of the situation is that most CFOs are still somewhat in the dark in terms of what data standards and the potential of the Semantic Web really are, and why they are important to business, yet business relies on financial data to survive and grow.
To a large extent the technology that is available and the data standards that exist are not being realized at the level they probably should be. So, I'm hopeful that not only the technology part can continue to make ground, but that from a business point, the financial folks, the non-technical people start to see the benefits and business value of interactive data and being able to share information that isn't qualified or without throwing large numbers of people at it.
The analogy I would make is like the barcode idea, but think in terms of the languages spoken across the globe and the dialects within those languages. Financial data is in many ways the same thing. It may be very well known and consumable at the local level, but when you try to bring it up a notch and try to share it across organizations, industries, and heck even here in Washington, across agencies, or even bureaus, it becomes an impossible journey. You almost have to apply an unbelievable level of roll up to be able to make meaning of shared information, or take a long time.
That's a bad state, we need to do better and can do better. Technology exists to do that, and we need to work more across organizations, and to try to leverage XBRL's potential. Like I said, technology isn't the real problem. It is really about the business and organization, and the extent that technology can open doors and shine a light on that, that's a great thing.
There's a couple of milestones in October, it will be ten years since the first meeting of XBRL International, and I have been involved for the last four years. This month as well marks the four years of the call system from all of the banks, and represents one of the largest uses of XBRL at that time and probably even today. We're very proud of it, and it has proven the business case for it. In 2006 we issued a white paper on the benefits on the business values and efficiency gains from using XBRL. A lot has changed since then. I'd like to see more kinds of such white papers to show the financial community what can really come from this.
Rather than using XML and basic tagging, start to really experience the power of XBRL on a global scale. For those in Washington and those in government, it would be great to start at a local level to demonstrate financial data exchange even among agencies. Maybe what we need to do is to have some small pilots at the federal level and to show what is possible, rather than bringing together armies of people to standardize data the old fashioned way. If we can do this and roll up data at the federal level and look like a single organization from the financial standpoint instead of a collection of stovepipe organizations like we are today.
So today and tomorrow should be exciting, you are going to hear a lot of great discussion. I appreciate the opportunity for the FDIC to host the meeting, along with the W3C and XBRL International. Don't forget, it would be great to bring business into this, not just the technology, we need to bring them together. The technology is relatively mature, it is the business side where we ought to wake people up and show them the possible business values from sharing data and working across organizations at a national, government level and maybe even at a global level. Should be a great two days. Thanks again for letting us, the FDIC, host for you.
(URL for CDR system - https://cdr.ffiec.gov/public/)
9:15 "Introduction to the Workshop from the Chairs", Dave Raggett, W3C and Diane Mueller, JustSystems & XBRL International
Welcome. I am incredibly excited and thrilled to see so many of you here. I would really like to thank Mike Bartell and Richard Campbell from the FDIC for the amazing facilities that they have loaned us for this event. My co-chair is Dave Raggett from the W3C, who also sits on the XBRL International Standards Board. I am with JustSystems and I am the Vice Chair of XBRL International, and sit on the Board of Directors, and am involved in a number of the technical working groups as well.
To give you a really quick introduction to what the format of the workshop will be. We have stated some themes for what will be touched upon during the day. Although the workshop has been organized by W3C and XBRL International, we have a number of people here from other standards groups, e.g. StratML and from NIEM. We're trying to do some cross pollination and what we need to do in terms of harmonization. We are trying to bring in some of the business use cases, as Mike pointed out in his talk. We have done a lot of work with preparers and will talk about the SEC use case in a little bit here. This technology and the idea of financial information impacts the whole of society. The capital markets, the individual investors, government to government communication of information. It really is a huge ecosystem that we are talking about, through which the financial information flows.
We are hoping to bring together not only the financial institutions, but also the regulators, the standards people, and people from the Semantic Web research communities. So we are trying to do a Workshop and brainstorming and discussions. It's an open and public forum. Everything is being recorded and will be transcribed and available. We invite your participation in discussions and break-out topics. Use the flip charts to write ideas. We have invited speakers, panels, thanks to those who volunteered. Tomorrow we will do some break-out sessions based on what you write on the flip charts. There are separate flip charts for Challenges, Resources, Topics for Breakouts, Commentaries.
We will make a report and publish it as well. We encourage you to be very active. So about the flow of the workshop. Three key questions. What is. What's next. What's missing. What is: hear about current challenges. Talk about harmonization. Don't mean singing [laughs]. And in the afternoon we'll talk about What's missing. What do we need to do to connect the dots? Some interesting conundrums. What are the consumers going to do with all this data? What are the use cases? How do we go about feeding the Semantic Web? The Semantic Web is waiting with bated breath for all these data filings.
Some logistics. Please use mics to queue up. We have people on the phone. Use the IRC channel (#xbrl) and Twitter (#w3cxbrl). If you are giving a talk or demo, please try to use the FDIC computer. And come see us. So our goals. Bringing all of these communities together. We want to nurture this ecosystem, and tease out what are the requirements to support this, and how to resource and fund it. That's often the stumbling block, how to resource it. No matter where you are, try to get the right people in the room at the right time, and get those projects funded. So I would like to invite Tony Fragnito, CEO of XBRL International to talk next.
9:20 "The role of XBRL International", Tony Fragnito, XBRL International [slides]
I want to start by apologizing to those of you who have already heard an introduction to XBRL. Our Mission [slide]. A 501(c)(6) organization, 600 participants, 24 jurisdictions. Relationships with international associations. This network provides a collaborative organization. We have a tremendous breadth of people involved. Standards development, best practices board. We are global, always looking for places to meet; broad representation. Board of directors and an int'l steering committee, comprised of reps from jurisdictions and at-large members from the business reporting supply chain. That group is growing, nearly 30 individuals. Our work is accomplished through the work of volunteers and a staff of five. Small staff organization, but look at contributions from all the jurisdictions that advocate for XBRL at the local level. We really have had a significant impact in our ten year history. XBRL is global. This map demonstrates global reach. In China, Japan, Australia, and a new activity in South America.
We have tremendous reach, all around the globe. What we do: Specification is the area where we put most of our resources. Our steering committee are focused on the specifications, and on use cases as the way to evolve the standards. So why join. The standard is very young. XII has existed for 10 years. We rely on volunteers. And encourage you as individuals and employers to join any activity that is worth your time. What we have done with best practices board is to create a database of projects. At least 45 countries with active XBRL projects, and many have multiple projects. Talk about breadth of XBRL, you can see many projects. Where are they? Largest slice in the gov't sector. Not-for-profit, exchanges. One of the comments I hear is, where is the data. Not as much data as people anticipate. Projects are in reporting systems in gov't, like call reports. As public company reporting data becomes available, we will see an increase in data. Back in 2007. Looking at all the projects, the number of distinct projects was over 200. And what we know is coming online. Several projects are in implementation phases. Like the SEC, requiring filing. Look at SBR projects, Companies House in UK. In 2009-2011. These are our projections on projects that have reported. See that number is above 4.5 million instances per year. So it shows the huge increase in data over the next few years.
9:25 "The role of the World Wide Web Consortium", Karen Myers, W3C [slides]
Starting off with a little fun. The Internet begins 40 years ago. Easy Rider was top grossing film. 20 years ago, the WWW was envisioned, and Raiders of the Lost Ark was at the box office. Tim Berners-Lee founds the World Wide Web Consortium (W3C). W3C - International standards consortium. HTML, XML, Semantic Technologies. All ROYALTY-FREE, to be open and interoperable. What does that mean? W3C is a market-making org, not just a standards body. Billion dollar web areas. Emerging markets. Video in the Web is exploding. Workshop at Cisco. Tagging, time texting video. Make video a first class citizen. The Web of Data, as Tony F mentioned. Many initiatives on linking data. Oil and Gas, Healthcare and Life Sciences. Also an EGov initiative, financial services and reporting. How does all this happen? Organizations make a commitment (see value, get vision) and devote resources. In 65 working groups. Also W3C staff. An effective process: neutral, open, collaborative. Bring together the constituencies/stakeholders. Salute to members, request to raise hands. [numerous hands raised]. Also invited experts. If you have questions, feel free to ask Karen! Diane invites David Blaszkowsky to the podium.
9:30 am "Financial Reporting and the SEC", David Blaszkowsky, Director, Office of Interactive Disclosure, Securities and Exchange Commission (SEC), no slides
Thanks very much Diane and Dave. At some point this went from a few lines of comments to an invited talk. I won't take a lot of time and will be speaking from some informal notes. First of all let me say good morning and thanks to all of you for being here for what I consider to be an extremely important and timely opportunity for how we start to put to use all of the wonderful information that is now starting to become available. There is already a lot available now and a lot lot more on its way. How to make it yield forth the benefits that have been talked about, that has been anticipated for so long. I do want to thank W3C, XII and FDIC for organizing this meeting and do hope that now we are running our program, that we will be able to make a bid for sponsoring the next one at some point.
(SEC disclaimer) My comments today reflect my own opinions and do not reflect those of the SEC. ... We need to look for the pay off, there is lots of good information coming in, how do we put it to use either in the marketplace or in our ability to supervise in the areas we are responsible for. What we have done is really built on all of the great work by the FDIC. So this is my salute to the FDIC and happy birthday, I guess the fourth anniversary. I salute that and all of the other improvements to XBRL, improvements in taxonomies, improvements in software to make all of this happen.
We have made a few innovations of our own in the past really four years since we got into the XBRL business, first with the voluntary filing program which went for two and a half, maybe three years and then yielded rules last Winter that I know that many of you were very happy to see after all of the delays, that we did promulgate rules to require all US listed companies to prepare XBRL filings over the course of three years. Also rules to require mutual funds to start filing their risk and return information. That won't start until 2011, but this is vital information for investors as well as regulators, and other players out there, and a rule to require ratings information from NRSROs to be provided in XBRL format as well.
So a lot of exciting things have happened from all of that action, and we have just completed a few weeks ago our first filing period for quarterlies for the largest 500 companies or so, the ones with five billion or more in public float, and you know it seems to have worked. We are kind of happy about that, but we're not going to wave any flags about it. The technologies for intake worked. The technologies for viewing and being able to make sense of it have worked, and it is going a little bit beyond where we expected it to be.
Not only are we taking in quarterlies, but we have received at least one 10K so far. I was interested to see that we have received our first footnote detailed tagged filing at the end of last week as well. As many of you know well, the questions of how that footnote tagging will be done, well now someone has done it. I don't quite know how it looks as yet, but I'm hoping that this meeting will help to address the challenges of what we are going to do with that information as it comes in, and to make sense of what will be very complex, and very interesting information.
So it is an exciting time and it feels good, but interactive disclosure, interactive data. In response to Diane's question we use interactive data rather than XBRL because it is about those who file and those who use information. It is not about feeling cool. The data alone isn't the point. It is not just about access, it is about operationalizing such that we get not just access but transparency. That's what we're looking for, bringing it past the point of disclosures and disclosure in various forms, whether it is like HTML, God bless it, or XBRL, God bless it too. Making it ever more meaningful and bringing it to a regime where disclosure is at an ever higher plane of transparency, and we are not there yet!
But we are at the most important transition time, a real inflection point. I would just like to relate to you, and as I was writing this it scared me a bit. About 30 years ago, I started college, I feel very old around those kind of things, but at the end of the first week of orientation, which was roughly about this time, at our school we had an aims of education address. Kind of a pompous tradition in some ways. We were told how hard we had worked on our SAT tests, but that was now irrelevant, and we had to now focus on getting educated in a liberal sense, not just the fill in the box sense in standardized tests. At the end of that speech, we got to scurry off for a good night's sleep or whatever was our ambition. Thinking about it, at least where we are in the SEC's program, and other programs around the world where people are reaching out to us to share ideas and their learning, and to hear about ours, I see a similar kind of point.
An awful lot of success and great and successful people in this room, much has been done, and much has been achieved in this 10 year old endeavor. But it is at risk in that it is at an inflection point. XBRL has passed its SATs. It is now ruled in multiple countries around the world either for internal SBR type programs or in other cases, it's for public information such as securities as in our model, or the Japanese and Korean models. But we need to clarify the aims of XBRL for financial data. Using it to make our markets safer, better use of capital, more free flowing to those who need it. That is what really is the challenge here. We have accomplished a lot, but we have a whole new phase to hit in order to make this revolution in data truly stick, because if it doesn't go beyond what is merely about data, it won't prove its value in the investments that have been made, and will be found to be wanting.
And part of that is what I saw when I looked over the agenda for this meeting today, gets to in many different ways from many different directions. About multi-use software, not just one-off software. Market tools that might even be off the shelf for those who want to make sense of this. Really making the data sing to all those who want it, and not just those who are able to apply it to a particular purpose, perhaps an internal purpose for which it was developed. Not that that is a bad thing, we all want the reports that we are able to generate, but in order to make this stick in the long run, how do we make the marketplace really want this information, so the tools will be built, which by the way will drive down the pricing and increase the innovation and increase the value of future tools in a very virtuous circle that we have seen in so many areas of software innovation over the decades.
I think that the SEC's program, our own contribution to this is that we have forced that moment, and our continued activities will hopefully egg it on. On top of this each step has forced improvements in XBRL, and the challenges in dealing with extensions in a large scale and across many hundreds even thousands of filers. These are the kinds of things that we know have caused in some cases inspiration and in some cases indigestion in folks, but these are all challenges to push forward the thinking of those who provide software, those who provide information, and those who use it.
Of course, what I hope will be coming out of here will be continued steps towards fully fleshing out architectures to accommodate the maximum complexity of content that we and others might need to see in content and use from content. So in ten years in the US we've had much public info, certainly the FDIC data, the FFIEC data, which has been there for four years now, but within the US, a challenge is now that not a lot of this information has been made publicly available. With our (SEC) information now available we are hoping that people will be jumping in from all walks of life, from all perspectives, from all interests, which really gets us to the urgency of this meeting.
There is lots of data starting to flow. There is, and with respect to the many software organizations that are here, there is relatively little in the way of software to use it, certainly to use it beyond the most basic purposes. I know that there is a lot going on in labs, I hear people tell me that we've got this or we've got that, but there is very little available for us to take information and do interesting things with it and that's a tough pattern. It isn't by itself, just having the data out there, a great contribution to transparency. We've had data sitting out there, like our Forms 3, 4 and 5, but no tools to manage them with, to transform them.
So to access what is one of the themes of this workshop, I'll just call it transparency, in my case, it is not just about the data, but about how the data is applied. Critics over the past two years, certainly when I came to this XBRL endeavor 2 years ago, the critique was really about cost, and practicality. Well I think you know that we've solved that. We've found that tagging can be done relatively inexpensively, relatively painlessly. I still say relatively, some would argue with that perhaps, but certainly the press and many contexts within industry would suggest that that is the case. So now we are back at the point to justify the cost, and address the challenges that are out there to answer the question, just what can I do with it? What is the point of it if there is no way to use it? Why aren't the existing sources and tools good enough, to be very direct about this, to give this information to enforcement accountants, and enforcement lawyers in our offices and our regional offices and elsewhere in our organization?
Superior data may be its own justification, but technically and philosophically, that may be appropriate, but economically and bureaucratically, no that's not good enough, and again we need to come back again for the need for tools. To a financial web as a vision, yes, but first we need practical baby steps to make this work, at least for my purposes here. For conventional tasks and social tagging, we need tools that are easy to use, and that exploit the unique benefits of XBRL. That's how you justify it, not just doing what it is possible to do with other sources that are out there. How do we exploit these many benefits of XBRL that have been talked about in articles, that many of you out here have written, and many of you have spoken about, and I have spoken about over the past two years.
How to bring that to the multiple constituencies, for which we are talking, so us as regulators are certainly keen to make sense out of this, but also retail investors, where much of the promise has been made. To the institutional investors, to strategic analysts, to issuers themselves who are providing this information, to do new and better things, and run their companies in greater compliance and greater success.
Let me talk for a moment about what we are looking for, what we are hoping to gain. We're hoping to gain real deep value from this information. Again how to do our jobs better, which has at least two major components to it. To do our current tasks better as they are, the conventional tests, to automate what we are already doing. Certainly that's quite possible, we can find footnotes more rapidly and compare footnotes as a simple objective. But how to do new kinds of things. I saw some wonderful pictures up there in Karen's presentation, how to link different kinds of information together, how to red flag, how to enable us to do our job faster and better in order to protect the markets and protect investors in a superior way.
We've begun to issue RFI's and RFP's and those of you who are here with commercial organizations who produce software and tools, you know that we are out there, and that we are excited about opportunities to see what you have for us and to ultimately bring some of these tools into our organization. But also what we are hoping to see is a market come to life, because it is not just about us, it is about the tools that investors of all kinds will be looking to have. So we are hoping that there will be a dynamic market for this coming out of the kinds of deliberations that you will have at this workshop.
So am I fixated on GAAP? Am I fixated on mutual funds and NRSROs, and not talking enough about the Semantic Web and all of that? As I said earlier, yes. I am, and I am not going to apologize for that. I don't mean to focus on those to the exclusion of all the neat initiatives for corporate actions, social responsibility tagging and neat things that some of you in the room here are involved in. All of these together, to be relevant to us, need to work in simple tools. So I know that may be narrow, and I know that some of you may be interested in general ledger or other exciting extensions (no pun intended) that can come from XBRL.
There are so many operational improvements that can come from this revolution that is now ten years old. But so many of these are dependent, at least in our case and perhaps in the larger US case on our progress, our data, certainly the SEC data, is out there supplementing the FDIC data that has been out there for four years. A lot of people are looking at it. It is a kind of poster child for XBRL and what people might benefit. If not explicitly the case, at least implicitly.
I am not worried that you won't come up with things at the end of this workshop, and at the end of this month and at the end of the year, we will start seeing wonderful tools, things that we can do with it. I look forward to great success coming from this workshop and look forward to great ideas, and to my colleagues and team members who are here, I know that we are going to come back from these two days with an awful lot, and an awful lot that we will be able to bring to our colleagues and to the commissioners and our chairman as evidence that we are going to make a good result of the investment from the SEC's perspective. So with that thank you very much for indulging me with these comments and thank you to the organizers and I wish you all a great workshop.
10:00 am "The National Information Exchange Model (NIEM) Program", Anthony Hoang (DHS OCIO) and Justin Stekervetz (Deloitte) on behalf of Donna Roy, Director, Enterprise Data Management Office, and Executive Director for the National Information Exchange Model [slides]
Donna Roy passes on her regards, had a sudden conflict. She is grateful for the opportunity for us to present today the National Information Exchange Model (NIEM) Program.
Our goal today is to present a little bit about NIEM as a program and what we are doing for the next 12 - 18 months. Given your topic today, I have provided time for Justin to cover the Recovery Act. What is NIEM? The Program was founded 4 years ago and focused on information sharing. NIEM is about information in motion as opposed to information as stored, managed, and used. As I meet with people from Dept of Homeland Security, I am asked: I think I know what NIEM is. Do I have to change my system? Do I have to go to my 30 year old undocumented system? No - just information as it moves beyond the firewall, across boundaries. The program addresses the information-as-it-moves problem. What do you get "in the box" with NIEM? Two life cycles - two heart beats. Like oxygen in the life cycle. Right hand side - flower petal. Left hand side - fish hook. If you get nothing else, this explains it. The LHS - the flower petal - is our approach to managing a large data model. Equivalent in XBRL might be taxonomies. Our way to manage vocabulary, data models, organized around a CORE. Opening up the core you see people and other common constructs. Share semantics, concepts, data objects that traverse domains. Governed by all communities of interests. Reaching out to the petals, you see the communities. Things specific to a specific context, and others that cross contexts. NIEM 2.1 has 7000 - 8000 data components organized across the flower petal. Approximately 1500 - 2000 in the core, the rest in the domain areas. Uses XML Schema.
We have a model to do this. We have schema subset generation tool. It allows you to search the model. I can do a specific search. It's like shopping Amazon.com. pick the components and properties you need. swipe credit card [joking]. but you get the large data model. It's about taking parts of large NIEM model. and applying it to discrete business needs. If you were involved in increasing XML taxonomies. there is a lot of vetting. So what's value of a highly valued model without a good process. And that's the lifecycle process. that feeds back.
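[Editor's sketch, not part of the talk.] The "shopping" idea described above, where you search a large model, pick only the components you need, and still end up with a consistent subset, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the toy model, the component names, and the `search` and `build_subset` functions are all hypothetical and are not the actual NIEM schema subset generation tool or its interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str              # component name (hypothetical examples below)
    namespace: str         # "core" or a domain "petal"
    depends_on: tuple = () # other components this one needs

# Hypothetical toy model standing in for the large shared data model.
MODEL = {c.name: c for c in [
    Component("PersonType", "core", ("PersonNameType",)),
    Component("PersonNameType", "core"),
    Component("LocationType", "core", ("AddressType",)),
    Component("AddressType", "core"),
    Component("VesselType", "maritime", ("LocationType",)),
]}

def search(model, keyword):
    """Return the names of components whose name contains the keyword."""
    kw = keyword.lower()
    return sorted(name for name in model if kw in name.lower())

def build_subset(model, wanted):
    """Return the wanted components plus everything they depend on."""
    subset = set()
    stack = list(wanted)
    while stack:
        name = stack.pop()
        if name in subset:
            continue
        if name not in model:
            raise KeyError(f"unknown component: {name}")
        subset.add(name)
        stack.extend(model[name].depends_on)
    return sorted(subset)

if __name__ == "__main__":
    print(search(MODEL, "person"))
    print(build_subset(MODEL, ["VesselType"]))
```

The point of the sketch is only that picking one domain component pulls in the core components it relies on, so the subset stays self-consistent without the user carrying the whole model.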
There are common approaches to doing extension. Use traditional XML schema methodologies, and added augmentation to deal with multiple inheritance. We're talking about extensions. You can map to the NIEM model 80 percent. Now there is a 20 percent gap; so let's say 10 percent you cannot share, but another 10 percent can provide value back to the community, so you'll provide those extensions back. The governance is going into the screening domain. Or it will iterate. It goes through vetting. It's a back and forth process between the two diagrams. Integrate governance, architecture, SEMs, logical model itself, and into the XML schema, exchange, how it gets implemented into what gets onto the wire. Fully integrated from governance to what hits the wire. That's the lifecycle process. The flower petal diagram and our governance approach, and the IPD, the fish hook.
Information exchange? With this is a lot of business documents. Specific context data. You will find business rules. Some people want RuleML. So paint the business context for this information exchange. So if you pick up an IPD from PA or TX or FBI, all these IPDs follow the same format. A couple of program updates. We put out the NIEM 2.1 release. Haven't had a major release of NIEM in two years. The core did not change, it's a minor release, just the petal areas. The ones in red are brand new domains. ChemBio. Nuke. Family services; HHS. Maritime program with Dept of Navy. Successful NIEM implementations. We look at defining characteristics. It's a multi-tiered implementation. Articulating value across all levels of the enterprise. We had a training.
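[Editor's sketch, not part of the talk.] The 80 / 10 / 10 split described above (elements that map to the shared model, gap elements offered back to the community as extensions, and gap elements kept private) can be worked through with a small example. This is a minimal sketch under stated assumptions: the element names and the `classify_elements` function are hypothetical, and the percentages simply mirror the speaker's illustrative figures.

```python
def classify_elements(local_elements, shared_model, shareable):
    """Split local elements into mapped, contributed-back, and private groups.

    local_elements: names used by one exchange partner
    shared_model:   names already present in the shared model
    shareable:      gap names the partner is willing to contribute back
    """
    mapped = [e for e in local_elements if e in shared_model]
    gap = [e for e in local_elements if e not in shared_model]
    contributed = [e for e in gap if e in shareable]
    private = [e for e in gap if e not in shareable]
    total = len(local_elements)
    return {
        "mapped": mapped,
        "contributed": contributed,
        "private": private,
        "percent_mapped": 100 * len(mapped) / total,
        "percent_contributed": 100 * len(contributed) / total,
        "percent_private": 100 * len(private) / total,
    }

if __name__ == "__main__":
    # Ten hypothetical local elements: eight exist in the shared model.
    shared = {f"Shared{i}" for i in range(8)}
    local = sorted(shared) + ["LocalGrantCode", "InternalAuditFlag"]
    result = classify_elements(local, shared, shareable={"LocalGrantCode"})
    print(result["percent_mapped"], result["percent_contributed"],
          result["percent_private"])
```

In the governance loop the speaker describes, only the "contributed" group would go through vetting as candidate extensions; the "private" group never leaves the organization.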
Deep dive into XML Schema, NIEM. We got feedback. That said, 'I'm not XML literate'. I need to know about project time, scope, budget. So we have taken our program training and broken it into 7 modules. A big investment for us this year. Key focus areas: Health. where health community overlaps with Homeland Security and local interests. Public Health, like H1N1. Donna Roy would say, 'Person, credential, benefit'. Applies to many areas. a high value area to focus on. Second area is transparency with XBRL. We had Campbell Pryde present at NIEM conference last week. and Mark Bolgiano, XBRL US. Cyber Security, network defense is another area. Domain independence. We want to make those flower petals self-sustaining. Give them more capabilities. Better platforms.
My colleague from SEC said this. Government relies on tools. we rely on vendors to make tools. We don't build very many. In terms of end-user tools. to exploit or create, we will rely on industry. Exploratory areas moving forward. Worked for first time with civil society groups. they are tech savvy. I learned that architectural style matters. in this work of transparency. We want to learn from this initiative. and realize that architectual style does matter. Most of initiatives have been around SOA architecture. but we will be exploring new areas. Last few minutes for Justin.
Speaker: Justin Staggerwitz.
I did work on Recovery Act budget work. Team was tasked with data exchanges for requirements of act. Three for schemas. Financial weekly, funding notifications, and representations. You see various data models we came across in OMB and GSA systems. Looked at USAspending.gov. others to come up with elements that you see here. NIEM was brought in. due to its extensive reach at local and state levels. a lot of states know these systems. PA, GA, TX use NIEM extensively. Also aligned well with transparency requirements of the Act. NIEM is comprised of core set and extension. We created a number of extensions for financial activity report. Go to NIEM.gov and see extensions. Concentrate on recipient reporting. Off Recovery.gov. There is another site, Federalreporting.gov. Data fed in by recipients. 10 Oct recipients upload to Web site. Then agencies can do reviews of data. 30 Oct. the data will be available to public on data.gov. So where are we headed? Once we see public data. we will know more. We are going through an exploratory phase. power of XBRL is strong. would be advantageous to work in harmony.
Question from audience.
Mike Foley, IBM.
It's not about cross-over from XBRL. How does NIEM compare to OASIS core components. I'm curious about those cross-overs and XBRL as well.
Anthony: A great question: How do we compare cores when it comes to UN/CEFACT. and XBRL core. another area where I am not so versed. What you want is higher level. Concept of the core. We did evaluate CCTS. it is commercially trade focused. What we really needed was something to address governmental processes. you would find similarities. where do you draw the line for core and extensions. We deal with same things in terms of having a core model. Practically driven by the governance authorities. What belongs in core or not. Like SemWeb, what belongs in an upper ontology or not. Drawing that line is difficult. As more comes, they become candidates for core.
Question: What is timeline for integration of XBRL into NIEM model.
scribe: the public interest community wanted to see integration. What about path dependencies. It will be harder to implement later on.
Anthony: So two questions:. What timelines exist today for XBRL and NIEM connection. Secondly, will it be even more difficult to implement since it wasn't done out of the gate. To the first point, we are working on it with Mark Bolgiano and Campbell Pryde.
Anthony: Strategic business value. we need help identifying those use cases. those business case areas. That would be a great question to put out to this audience. Are there transactional needs, financial information. those two attributes in particular. Where NIEM and XBRL come in. and figure out the joint value prop. I don't think it will be difficult for Recovery.gov to implement it later on. It was about leveraging existing content management systems. We were not ready for that capability. There is the data and the reports that come out. Two ways to look at the data. We need to identify the value added within architectural areas. It's not a one-size-fits-all approach. For me personally, I learned a lot from what the open gov't groups. wanted in terms of open architectures. But internal groups weren't ready. and do within the timelines. It was a reach. It's about timing, but I think it will happen.
Q: Recovery.gov loves XBRL, but didn't have time to do it.
Anthony: yes, that's fair.
Diane: Thank you Anthony and Jeffrey. So NIEM and XBRL use cases would be good topics for the white boards for our breakout sessions. Thank you both for coming.
Financial Reporting with XBRL
10:45 am "Financial reporting at the front line" (Schema+XLink slides), Walter Hamscher, Member of Technical Staff, Office of Interactive Disclosure, SEC
Walter provided pointers to sample data consolidated from two different kinds of data available from the SEC website:
Starting again with Speaker Walter Hamscher. Please think about questions, challenges and put up on flip charts. Will take questions hope for lively discussion. Good morning, I'm Walter Hamscher. DISCLAIMER. My thoughts are my own. Title is a pun. want to get to key indices for accessing XBRL data. ("Keys" to SEC interactive data). Drinking from Fire hose (or tepid bath). Please ask questions at any time. Subtitle for talk. How I Learned to Stop Kvetching and Love EDGAR [on screen]. Ahead of its time. People not leveraging FDIC data. Will discuss, as with any good talk, syntax and semantics. Why it is what it is. time permitting EDGAR. and "so what". EDGAR a quantum leap better. good, fast, cheap (absolutely free). Syntax from the Top Down: EDGAR in General, RSS feed, One Sample Interactive Data Submission. From data in envelope to XBRL.
Here is EDGAR in general (on screen). You can put in names, ticker symbols, CIK. Central Index Key, the one best identifier. companies go by different names, merge, split. but CIK is a KEY. If you know an org's CIK, you will get data relevant to them. Many Filers have similar names. Once you find the right Filer, you will see indicator showing Interactive Data as appropriate. Original HTML/ASCII example on screen. Conventional representation for a financial statement. Looking "inside" you will see an HTML 3.2 + style not including CSS. all formatting is localized. Often generated by MS Word. Lots of formatting detail for small amounts of data. Good news is you can take a small piece of it and it will present. very localized. That's important, as we will see in a moment. Stuck with this level of HTML. constraints from security, transport reasons. Looking at Interactive Data, we see something different. You start with the RSS feed. Last 100 filings with Interactive Data.
Trivial to take it once an hour and assemble entire sequence. If you look at the individual filings, you will find an "envelope". RSS formatting. and important information about the filing itself. Important properties same as for any other data feed. irredundant. timely - if there was an amendment, I want to see the last one. Who is the Filer? Can also see assistant directorship to help guide industry. All identified by URL. whether an instance or a taxonomy. Going into the data. you can see the Interactive Data viewer. Report is similar to prior HTML example with addition. of a few important features. Click on the line item heading, and you will see the actual concept, the namespace. if you see gross profit, as published in the taxonomy, fixed, unchangeable. Filers can call it what they want, but you can see the single concept. A very non-trivial exercise to determine what data corresponds to which. We have an accounting issue and what the law can require. Gross profit can be called Net Revenue on the FS. In red, you see the summary of the financial statement. and down here you see the Notes to the FS. It is insufficient to look at a line item like Net Revenues and know what it means. without the Notes. In magazine industry, e.g., the Notes explain when they recognize subscription revenue. Note on screen - here is a text block placed inside of tags. that is tremendously useful. still useful as people can read it and use the table of contents approach. XBRL revealed. If you look at a note - typical one on screen. Might be formatted as HTML; next year, every number that appears in the note will have its own separate tag. true of the entire Note. Note as a whole has one tag. If this were a live demonstration, I would be able to hover my mouse and show you the [individual items]. were also separately and individually tagged. with additional metadata to tie it all together.
On IRC, Daniel_Bennett: wow. Walt pointed out an RSS feed that seems to blend RSS 2.0 with some RSS 1.0.
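The hourly polling idea from the talk (take the "last 100 filings" feed once an hour, assemble the whole sequence, keep it irredundant, and show the last amendment) can be sketched roughly as follows. This is only an illustration: the entry keys `accession`, `cik`, `period` and `filed`, and all sample values, are hypothetical and are not the actual fields of the SEC feed.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Dict[str, str]  # hypothetical, simplified view of one feed item


def merge_snapshots(snapshots: Iterable[List[Entry]]) -> List[Entry]:
    """Combine overlapping hourly snapshots into one irredundant sequence."""
    unique: Dict[str, Entry] = {}
    for snapshot in snapshots:
        for entry in snapshot:
            # The same filing shows up in consecutive snapshots; keep one copy.
            unique.setdefault(entry["accession"], entry)
    return sorted(unique.values(), key=lambda e: (e["filed"], e["accession"]))


def latest_versions(entries: List[Entry]) -> List[Entry]:
    """Keep only the most recent filing per filer and period (entries in filing order)."""
    latest: Dict[Tuple[str, str], Entry] = {}
    for entry in entries:
        latest[(entry["cik"], entry["period"])] = entry  # later filings overwrite earlier ones
    return list(latest.values())


# Made-up sample data: filer 10 files, then amends, its 2009Q2 report.
hour_1 = [
    {"accession": "0001", "cik": "10", "period": "2009Q2", "filed": "2009-08-01"},
    {"accession": "0002", "cik": "20", "period": "2009Q2", "filed": "2009-08-02"},
]
hour_2 = [
    {"accession": "0002", "cik": "20", "period": "2009Q2", "filed": "2009-08-02"},
    {"accession": "0003", "cik": "10", "period": "2009Q2", "filed": "2009-08-05"},
]

history = merge_snapshots([hour_1, hour_2])
current = latest_versions(history)
```

Here `history` holds every filing once, while `current` holds only the amended filing for filer 10 plus the single filing for filer 20.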
Individual numbers in Notes would pop out. That's why the data will be considerably better than what we have. Underlying model is on screen. Want to show you just how much is inside the file. Here is the Consolidated Financial Statement. Pink concepts (w apologies to the color blind). columns with contextual information (incl period of time). and information on how company derived numbers (calculation links) to check the arithmetic in the files. People assume numbers sent to SEC normally add correctly. But there have long been sign flips and other noise. Most often not in the "most important" large numbers. But market also interested in "less important" figures. XBRL provides interrelationships and calculations - way more than a chunk of HTML. Just because the display (on screen) is one representation doesn't mean you are limited to that presentation. You can display just the line items going into the Statement of Stockholders' Equity. or pivot it just like a spreadsheet; things that were columns can now be rows. If there are four dimensions in the underlying data, that could be 24 different ways to display it. add the time dimension for lots of possibilities of organizing the data. The requirement for the Filer is to provide the detail. Many rules in EDGAR Filer Manual on how to present the data to be compliant. Another viewer on screen; incremental display another tool. This isn't like PDF where you view with a single viewer. A data format with a variety of applications to show it and use it. If all you want is a linear file, you can go back to the HTML/PDF traditional filing. Explaining the technology behind XBRL.
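The figure of 24 mentioned above is the number of ways to order four dimensions (4 x 3 x 2 x 1). A small sketch makes the count concrete; the dimension names are made up for illustration and do not come from the talk.

```python
from itertools import permutations

# Hypothetical dimension names; only the count of four comes from the talk.
dimensions = ["line item", "period", "segment", "scenario"]

# Each ordering is one way to nest the dimensions as rows and columns.
orderings = list(permutations(dimensions))

print(len(orderings))  # 24
```

Adding a fifth dimension, such as time, raises the count to 120, which is why the number of possible displays grows so quickly.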
A venerable set of slides. Role of XML Schema, XLink, extensibility, taxonomies. You have an instance document (same term as used elsewhere in XML world). Organization [of US GAAP instance] is flat. Company, section of company are context. Facts with IDREF pointing to context. Facts point to Concepts in Taxonomy. Gross Profit, Fair Value Disclosure - Text Note - can be numbers, text, dates. concepts in XML Schema. Also use XLink for interrelationships [and certain resources]. DTD and schemas often thought of as being fixed in time. XBRL has to allow the creator of the document to EXTEND the concepts that are available. None of the laws or accounting standards [in the US] fix what can be presented or called. So we made XBRL so it could be extended. On screen syntax of an instance. Context with start dates and end dates. Content [of US GAAP filing] is quite flat. with concept, digits (reliability) content, pointers to context, units. There is a long list of these things. Role of XML Schema in validation. true for us, FDIC, and others. You want primitive data. You want compound data structures. You also need to be able to do some kind of arithmetic. also have co-constraints amongst data values. Different types of filings (e.g. 10K) will have. different set of requirements than other filings or reports. Also cross-document constraints. We use XML Schema for primitive data types and compound data structures. We use XBRL for calculated data values and co-constraints. Must be global, must allow customization. We have 10Qs and 10Ks with different requirements. Can use Formula for that. Nothing in XBRL for cross-document analysis. I am a fan of relational databases for that purpose.
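The flat instance structure described above (contexts with start and end dates, and facts that point to a context and a unit) might look roughly like the sketch below. The instance is a made-up, heavily simplified example: the `ex` namespace, the concept name, the identifier value and the numbers are all placeholders, and a real filing carries far more detail.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"

# Made-up miniature instance; "ex" stands in for a real taxonomy namespace.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2008-01-01</xbrli:startDate>
      <xbrli:endDate>2008-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:GrossProfit contextRef="FY2008" unitRef="USD" decimals="-3">1200000</ex:GrossProfit>
</xbrli:xbrl>"""


def read_facts(xml_text):
    """Return each fact together with the period of the context it points to."""
    root = ET.fromstring(xml_text)
    contexts = {}
    for ctx in root.findall(f"{{{XBRLI}}}context"):
        period = ctx.find(f"{{{XBRLI}}}period")
        contexts[ctx.get("id")] = {
            "start": period.findtext(f"{{{XBRLI}}}startDate"),
            "end": period.findtext(f"{{{XBRLI}}}endDate"),
        }
    facts = []
    for element in root:
        ref = element.get("contextRef")
        if ref is None:
            continue  # contexts and units themselves carry no contextRef
        facts.append({
            "concept": element.tag.split("}", 1)[1],  # local name of the concept
            "value": element.text,
            "decimals": element.get("decimals"),
            "period": contexts[ref],
        })
    return facts


facts = read_facts(INSTANCE)
```

Each entry in `facts` pairs a concept and value with the period taken from its context, which is the lookup a viewer has to perform before it can lay the data out in columns.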
XML Linking language. For an XML item, you have labels, presentation, calculations. We use XML Linking language to represent relationships. [Graphics illustrated onscreen]. Let's group that stuff together in a linkbase and create different kinds of linkbases. Label, definition, presentation, reference (calculation). When we say linkbase, we mean a kind of relationship between concepts. We have a mechanism to be able to overlay and extend. [describing complex illustration onscreen]. We allow this overlay to happen in a taxonomy or an XBRL instance document. we start by defining policies (which often have nothing to do with data). People would create company, use filing agent, we would try to extract and analyze data later. Idealized model, we get nicely structured bunches of data which go into big honker database organized across 4, 6, 10 dimensions. and when people want data, they can get what they want. Financial regulation governed by Reg S-X, and you cannot require more than what it requires. A report is a sales pitch (the 10K and 10Q) and so the notion is that it is much more than a data file.
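The calculation relationships mentioned in the talk let software check the arithmetic in a filing, including the sign flips Walter described. A minimal sketch of that kind of check, assuming a calculation relationship is reduced to a parent concept and a list of weighted children, is shown here. The concept names, weights, values and tolerance are illustrative assumptions, not taken from any real taxonomy or filing.

```python
# Hypothetical calculation relationships: parent -> [(child, weight), ...]
CALCULATIONS = {
    "GrossProfit": [("Revenues", 1.0), ("CostOfRevenue", -1.0)],
}


def check_calculations(calculations, facts, tolerance=0.5):
    """Return (parent, reported, expected) for every sum that does not add up."""
    problems = []
    for parent, children in calculations.items():
        if parent not in facts or any(child not in facts for child, _ in children):
            continue  # cannot check a sum with missing items
        expected = sum(weight * facts[child] for child, weight in children)
        if abs(expected - facts[parent]) > tolerance:
            problems.append((parent, facts[parent], expected))
    return problems


# Made-up values: the first set adds up, the second has a flipped sign on cost.
good = {"Revenues": 5000, "CostOfRevenue": 3800, "GrossProfit": 1200}
flipped = {"Revenues": 5000, "CostOfRevenue": -3800, "GrossProfit": 1200}

print(check_calculations(CALCULATIONS, good))     # []
print(check_calculations(CALCULATIONS, flipped))  # [('GrossProfit', 1200, 8800.0)]
```

The flipped sign shows up as a large gap between the reported and the expected total, which a flat HTML filing gives software no way to detect.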
There are intermediaries everywhere. That example I gave you about purchase. financial property and assets and IP. I want to report as same line item. together they are not immaterial. If I had 20 categories of purchases. that cannot work. Something about extensions. although top level is clear. but when you get to order and materiality. the extensions get weird. but they are the solution to the problem. Government has to give people the ability to customize the presentation of data. May see extensions in the details, not part of material items. That's all I can say about that. When we talk about that multi-dimensionality, people get weirded out. So if I'm at a bus stop looking at schedules. you are dealing with dimensions. to force into two dimensions. when interested in underlying dimensions. So we're going to work on that.
Diane: any questions?.
Louis Matherne, Clarity Systems (and original co-founder of XBRL, when he was at AICPA): There is a big gap between data modeling and the presentation side.
(people in audience nod in agreement with Louis's point).
Walter: We are going to turn around way you want to see by getting dimensions right. does not hold today between filing and engine.
Diane: Walter will be on panel after lunch. Thank you, Walter. Next up is Linda Powell, Federal Reserve Board.
11:15 am "Federal Reserve Board's Micro Data Reference Manual (MDRM)", Linda Powell, Federal Reserve Board of Governors
Speaker: Linda Powell. I am filling in for Mark Montoya. Won't be the same presentation.
Linda: Disclaimer. The views are my own, not those of the Federal Reserve. Federal Reserve has responsibility for monetary policy. as well as reserve. A little background. We are receiving a lot more data than in past. So much more metadata. and explosion of financial data. historically we had data we purchased from vendors. and it used to be economists would buy a data set and manage it. And the data collected by FedReserve, FDIC and other banking regulators. Data from Federal Reserve system and other regulators has been centralized. We had metadata. We did a good job organizing, documenting. But that was not the case with data we purchased. So I have spent time with that. We had a business problem. Lots of data. not all of it well documented. And we need to manage our metadata better. We have 3 data repositories. FRS (MDRM). Metadata is data about data. or what the economists call the code book. Our collection level metadata. is housed in one repository. what kinds of periods, what types. at high level. where is data from. A few years ago we created our vendor metadata repository. Variable level data we purchase from vendors. MDRM is like a combo of bottom two.
I will do a brief demo of MDRM. What's name of data collection or survey? How long collecting, frequency, if confidential. info about overall data set. Variable level data. Person who developed MDRM (not me). was forward thinking. they designed the nomenclature to be semantic for humans. It was created before the www. Person saw that economists and regulators needed to understand all the data. and not research every variable. Every variable has a two-part MDRM number. What is the data collection, call report. mnemonic for domestic, int'l data. or combination of two. tells you collection of data you are looking at. The suffix, four characters. is the accounting concept you are looking at. 2170 example. is total assets. Doesn't matter if I'm looking at domestic finance, insurance, or bank holding companies. If I am looking for total assets. I look to 2170. MDRM covers financial data collected and stored at Fed Reserve. and structured data. daily structure of bank data. within banking industry. and covers the supervisory data. when regulators go out and examine the banks. I am going to do a quick demo. It is a public Web site. From the Federal Reserve's main page. Under reporting forms. forms we use to collect our data. there is a link to the micro data reference manual. we make almost all data available to public. So our sister companies can access the MDRM manual. Broken out to variable level and collection level metadata. Click on collection level.
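The two-part numbering described above can be illustrated with a short sketch. It assumes an identifier is written as the collection mnemonic immediately followed by the four-character suffix. The sample identifiers and the small concept table are illustrative only; just the 2170 example comes from the talk.

```python
# Illustrative lookup; only "2170 is total assets" comes from the talk.
CONCEPTS = {"2170": "total assets"}


def split_mdrm(identifier):
    """Split an MDRM-style identifier into (collection mnemonic, concept suffix)."""
    identifier = identifier.strip().upper()
    if len(identifier) <= 4:
        raise ValueError(f"expected a mnemonic followed by a 4-character suffix: {identifier!r}")
    return identifier[:-4], identifier[-4:]


def same_concept(first, second):
    """Two variables describe the same accounting concept when their suffixes match."""
    return split_mdrm(first)[1] == split_mdrm(second)[1]


# Sample identifiers, used here purely as an illustration.
prefix, suffix = split_mdrm("RCFD2170")
print(prefix, suffix, CONCEPTS.get(suffix))  # RCFD 2170 total assets
print(same_concept("RCFD2170", "BHCK2170"))  # True
```

Because the suffix is shared across collections, a user who knows 2170 can find total assets in any series without researching each variable separately.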
These are all of the series we collect. For those familiar with Basel II. Describe what is the series, the sub-mnemonics. confidentiality of data. So the variable level metadata. There are a number of ways to search. Can type in "total assets". But I'm going to do look-up with MDRM number. total assets is a popular term. Used on data collections. So then we have different ways to drill down. See all the series with total assets. and there is a description. get into the nitty gritty descriptions. there are differences with total assets on bank, mortgage company. So we detailed these differences. and the distinctions between industries I should take into account. So while I'm out here. I'll talk about. the central data repository project. Earlier someone asked about CDR familiarity. May I see a show of hands?
[about five go up].
A few years ago some folks from FDIC, OCC, FedRes got together. and started a project to collect data using XBRL. decided to use MDRM as basis for the taxonomy. Banks were used to seeing the MDRM numbers. It was designed to be semantic for economists and individuals. and translated nicely for semantic for the Web. Show you the CDR Web site. So this is the main portal for banks. for CDR. Most banks use vendors who have created software. They download the taxonomies. which change yearly and more lately quarterly. Vendors go out and download the taxonomy. and import the taxonomy into their software.
This is one of the largest industry advances. Used to be vendors got MSWord docs or Excel spreadsheets. And people had to manually read through the docs. So the download functionality includes the edits. Vendors can import into their software and give it to the banks. So once we receive it. Once regulatory agencies receive data.
we then make available to general public. So you can see if I had put in Huntington. you can see four or five banks. Click on which period you want. Here is the actual call report filed by Huntington Nat'l bank. some is confidential data. most is public. Banks don't send data pretty like this. We get an instance document with metadata behind the scenes. Because it is so well formatted and documented. We can go in and create a document. We can also present in a variety of ways. Just by changing the XBRL presentation. This is driven by the MDRM numbers. So that's the demo on MDRM. I have been speaking about MDRM, and other metadata repositories. The end users don't want to go to three repositories. They want it easily and in one place. So why did we create a vendor data repository?. Because we buy such a massive quantity of data from vendors. It was too hard to store in MDRM. One of best features of MDRM is also a complaint. Because it forces Rules. you cannot call assets something other than 2170. In long run good for end users. but causes a bottleneck in the beginning.
Collection level data we implemented in 2004. Users on day one said, "Great, this is all I need". then next day, said "I want more". So we decided to give them more. We decided to look at the int'l standards. Dance is data and news catalogue. So we looked at Int'l standards. All three meta data repositories were created for a reason. Here is the MDRM Schema. I'm going to talk about the DANCE application. We went through and looked at int'l standards. For collection level we went to Dublin Core. They had robust collection level data. Worked nicely for DANCE. original variables plus variables we added. Then we looked at vendor metadata repository. Then we looked at XBRL. designed for financial reporting. So here were the original variables in black. In the vendor metadata repository. yellow based on our review of XBRL. We did something similar with the MDRM. Over time we will incorporate more of the XBRL and Dublin Core variables into the MDRM. What I'd like to show here. is within the vendor metadata repository. we have a table for the FDRB name. and the Dublin core name, and XBRL name. Fed publishes other data. SDMX element name. DDI. Another int'l metadata protocol. focused on social sciences data. The Fed is interested in things like GDP. number of cars produced. For every element in vendor metadata schema. we wanted to tie it back to what it's comparable to in the regulatory data. So we added this table. to tie back programmatically.
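The crosswalk table Linda describes, which ties each vendor metadata element to its counterpart in other vocabularies, can be pictured with a small sketch. Every element name below is a made-up placeholder; only the idea of mapping one element across MDRM, XBRL and Dublin Core comes from the talk.

```python
# Hypothetical crosswalk rows; None means no comparable element was identified.
CROSSWALK = [
    {"vendor": "TOT_ASSETS", "mdrm": "2170", "xbrl": "Assets", "dublin_core": None},
    {"vendor": "DATASET_NAME", "mdrm": None, "xbrl": None, "dublin_core": "title"},
]


def translate(name, source, target, table=CROSSWALK):
    """Look up the name used by `target` for an element known by `name` in `source`."""
    for row in table:
        if row.get(source) == name:
            return row.get(target)
    return None


print(translate("TOT_ASSETS", "vendor", "mdrm"))           # 2170
print(translate("title", "dublin_core", "vendor"))         # DATASET_NAME
print(translate("TOT_ASSETS", "vendor", "dublin_core"))    # None
```

A lookup like this is what allows a front end to tie a purchased data set back to the regulatory data programmatically instead of by hand.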
We can develop interactive systems so users can get the data. 2170 on MDRM and something else on Bloomberg data set. Here are my concluding thoughts. A lot of people talk about importance of metadata for discovering data. But in addition to discovering data. It's important for understanding data. Such as GAAP accounting rules. talk about specific distinctions. I found review of int'l standards for metadata to be very useful. Getting a breadth of knowledge. and depth of topics, material, data types. that you cannot get on your own. So not me in my office trying to figure it all out. in 2004 when we first did our data catalogue. I got hundreds of people's ideas by using int'l standards. Using a nomenclature that spans data collections is important. Many different collections of data: thrifts, credit unions. What's reported for credit unions may not be applicable to banks. but some overlap. So if you give some concepts same name. it is very helpful to end user. they know right away it's the same concept for total assets for example.
Diane: Any questions?.
Ms. Shaw, FDIC: Are you clairvoyant? Are you seeing Fed requiring.
Linda: No. When we reach point for int'l standards. that knowing difference between them is important.
Ms Shaw: Good disclaimer.
Roger: I wonder about design of data coming into FDIC. When Walter Hamscher was speaking about data coming into FDIC. my understanding is you have a one-size fits all. wonder about your thoughts on design.
(Note: question from Roger Debreceny, Shidler College of Business, University of Hawaii at Manoa.)
Linda: We have designed our data collection process. and getting it across entire industry. We have regulators who go out into the fields.
Linda: And that's where they will get information about an institution having something unique. like toxic assets. Don't know if there are thoughts about collecting that. Banks don't want to tell us if they have something toxic. that gets discovered during supervision process. I don't know if SEC has examiners that go out. I think they rely on accounting firms. Is that correct?.
?: Yes we do have examiners.
Linda: different types of toxic assets are captured through examination process. not the reporting process. Would be interesting to see how useful.
Matt, IBM: on input and external models. dimensions were not present. In taxonomy and models, is there intention to make more dimensionally oriented.
Linda: I'm not sure what you mean by dimensions.
Matt: you need to learn the table explicitly.
Mike Rowling, IBM.
Matt: I'm looking for explicit use of metadata.
Matt: areas of exploration without having to know the model. a general XBRL tool, or learn the model?.
Linda: No one wants to learn the model. So we have created different views and a front end to discover and review the data.
Diane: Thank you Linda.
If you heard things you want to call out. Please add comments to the white pads. then into the afternoon sessions.
12:15 "Legislative XML: Injecting XBRL into the Appropriations Supply Chain", Daniel Bennett, eCitizen Foundation
Documents should be stored at dependable URLs (slide 5). Being able to click on the rendering (slide 6) and see the source data. Daniel: I came to this from work on Legislative XML. The use of human readable, yet machine processable data.
The data could be linked to (citable) with permanent, dependable URIs. Today a high percentage of legislation is drafted this way. I went to a meeting where Diane introduced me to XBRL (slide 8). Could we tag the financial data? Lawyers need what you see is what you get, so InlineXBRL looks appealing in that regard. If we get this right, we can track from the OMB through the agencies to the recipients. We haven't got there yet, but this is the direction we are moving in.
Slide 12 describes some of the considerations involved.
Because of the URIs, users will be able to follow the links all the way back. Daniel prepares to show a movie, which unfortunately can't be linked from the minutes. He encourages people to get involved (slide 15). and build a path to trustworthy data!
The movie features several key people from the SEC explaining the opportunities for using tagged linked data.
Question: XBRL is a global standard, but there is a lack of coordination on legislative XML across the USA.
JH Snider, iSolon: Emphasizes importance of global standards.
Daniel: Joe Carmel is not here, I'll try my best to answer. In 2001 we talked about issue of states, Senate, House, NARA, GPO. we all had a vision of possibility to do this. the people who understood legislation thought it was too big to get done.
XBRL has dealt with those issues. XBRL's metalevel makes it easier to do, to build in rules. When people talk about general ledger stuff. it's about passing data. but not so much validating it in the open. But that has changed. 9-10 years ago, we hadn't gotten to that point. So now XBRL allows you to layer in other standards. Have your own taxonomy for how committees are set up. and then build standards around those things. and go down to the municipal layer.
Look to the future; doing bond reporting and use EMMA.
JH Snider: It's more compelling if it's 40,000 legislatures, not just one.
Daniel: I missed an important slide. About laws, rules and regulations.
Users must evaluate the current laws and regs. In other words, because they have not come up with standards. all of the data is suspect. you have to take it back. you cannot trust it.
When you build systems, you need, business, legal, and tech layers. Until laws and regs are done in a standard way. at least with citations. the ability to do citations is doomed. Code is law; law is code is point we are at.
Walter Hamscher: Does legislative XML. propose a universal naming scheme for legislation?. a unique identifier for the fragments.
Daniel: Yes, citations should be URLs with. identifiers. They are unique; more browsers are getting XPointer support.
Walter: So you mentioned different versions. is there a canonicalization of those names. the official, the original.
Daniel: You hit the point on the head. you know how to do in printed material. If someone quotes the law, but no link back to it. it's a morass. that's a key point. every aspect, including financial data. that should be citable to that instance.
Walter: Would that go back to others like Lexis Nexis.
Daniel: We all produce silos. interoperability can be done down the line if standards are used.
Walter: Do you have an example of URIs for legislation.
Daniel: We have not worked that out. Web site called Legislink.org. LOC created the Handle system. use XPath into XML versions. we are trying to work it out. working with Tom Bruce.
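The exchange above leaves the URI scheme for legislation open, so the following is only a sketch of one possibility: a citation written as a document URL plus a fragment identifier, resolved with an XPath expression into the XML version. The URL, the document structure and the `id` values are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Invented stand-in for an XML version of a bill, keyed by its (invented) URL.
DOCUMENTS = {
    "http://example.org/bills/hr0000.xml": """<bill>
  <section id="sec1"><text>Short title.</text></section>
  <section id="sec2"><text>Appropriation of funds.</text></section>
</bill>""",
}


def resolve_citation(uri, documents=DOCUMENTS):
    """Return the element that a 'document URL + #fragment' citation points to."""
    base, fragment = urldefrag(uri)
    root = ET.fromstring(documents[base])
    # ElementTree supports this limited XPath form: any element with a matching id.
    return root.find(f".//*[@id='{fragment}']")


cited = resolve_citation("http://example.org/bills/hr0000.xml#sec2")
print(cited.find("text").text)  # Appropriation of funds.
```

With stable URLs and stable identifiers, a quotation of a provision can always be traced back to the exact fragment it came from, which is the property Daniel is asking for.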
Walter: Start small, with the EDGAR Filer manual.
Daniel: Let's talk.
Diane: This morning was about the "What is". This afternoon we are starting to draw out the "What's missing". Daniel referenced the absence of citations from legal documents. Next we have Cate Long from Multiple-Markets. and the head of an open source project called Risky.
12:45 "Reporting of default statistics by credit rating agencies", Cate Long, Multiple Markets
Cate: Good afternoon.
I would like to share information about SEC's mandate for credit rating agencies. I would like to thank Diane and Dave, FDIC, W3C and XBRL for hosting event. I'm learning a lot and hope you are, too. Like to give a general overview of the credit markets. Might be helpful to share background information. Ratings are opinions expressed by credit rating organizations. These organizations have been in news lately. Looking at these more carefully. A number of investor groups were looking at rating agencies. due to mis-priced risk.
Ratings are opinions. In the last crisis, they were spectacularly wrong. They are embedded in laws in US and int'l. Generally ratings express the risk of default only. that it would default. The fixed income market. goes by a number of names. the bond or debt or credit markets. all refer to this large market. Used by gov'ts, corporations, financial institutions to raise funds. Put on balance sheets of banks, pensions, mutual funds. Relative size of market.
US debt issuance.
These numbers are down from peak in 2007. Bond market has declined, but still an enormous market. A bond is a different type of security than a stock. issued for a fixed period of time usually. the issuer agrees to pay the investor a fixed value at a future date. and to pay investor the amount of the face value of the bond at a future date. Underwriters are the investment banks. they match the issuer with the buyers. Like JPMorgan Chase, Merrill Lynch. they look at credit markets, current conditions for issuance. depending upon where in credit cycle. Look at how far out to put the paper, long or short term. and the yield.
Credit agencies are brought in at this point. They create their opinion on the credit worthiness of the issuer. This is important part of process. Debt syndication connects buyers and sellers. There is high demand for debt. Particularly good paper. It's an institutional process. Retail investors tend to pick up bonds in odd lots. 90 percent of bonds are held to maturity. Tend to go off balance sheets around credit event. good paper will be bought by II and held to maturity. Investors tend to be big institutions. Many rated products out there.
[slide shows types of bonds].
The process of credit agency and underwriter coming together. So an example is Ford Credit. They issued a spread over a Treasury Bond. Ford has weak credit. They paid 8.7 percent.
Lowest ratings. Weak issuers can go to market. Ask if you have questions. You probably know these names from recent media. Moody's, Standard & Poor's who issue almost 90 percent. So legislative focus on this oligopoly. They are forward looking. Going back to Ford Credit, a triple C, weak credit. If Walmart or Microsoft issued bonds. they have strong balance sheets. Microsoft is a double A name. so it gives you the relative risk of owning securities. and how much return you should get for owning that risk. If you own Ford Credit, you should get a lot of yield. because it's possible they won't pay you back. We take those ratings and aggregate them on a broad, quantitative basis. We get default statistics. Some private and gov't entities. have defaulted on their obligations. Even high-rated issuers will default. That points to capital or liquidity structure. It's not a magic formula. Lehman, Bear Stearns and AIG were all rated "A" at time of collapse. That's important.
An "A" rating doesn't mean it won't ever default. But SEC and bond markets want to understand the risk of default. So we went back to look at the data. And used that looking forward. to determine probability. by extension you can look at rating agencies. and likelihood of default. Here are two sets of default stats. from Moody's and S&P. Look to single B, a low rating. the risk of default for Moody's was 1.3 percent. and S&P was 2 percent. though comparable scales, the default statistics vary. and these differences are big in the bond markets. Ratings are useful for investors to understand risk. but not fixed. So how do SEC and Congress legislate usefulness and standardization. in this space? It is complicated. there are some missing pieces.
In 2006, after many hearings, Congress passed the Credit Rating Agency Reform Act of 2006. Requires reporting of default statistics. SEC adopts rules for "performance statistics". Agencies needed to publish default statistics on their Web site. SEC went further and proposed that agencies expose individual data. Lehman was rated single A up until the time it went bankrupt. What SEC is doing with the new rule is to require agencies to expose their ratings, so we can compare how the ratings agencies exposed risk over time. It helps us to evaluate the accuracy and performance of various rating firms. So this new SEC rule, adopted in Sept. Ratings firms get paid to rate issuers. They are going to have to expose all their paid ratings in XBRL on their Web site, one year after they have done it. There is also another group of ratings agencies that don't get paid.
SEC gave them two years to expose to public in XBRL. Not sure if you published XBRL tags yet. Until tags are adopted, agencies can publish data in any format. I see David shaking head yes. So for public, academics, journalists, we can begin to mash up data. Researchers have taken default rates by industry groups by time. Hotel gaming industry is cyclical. On a macro basis, start taking credit ratings and mash them up with GDP, growth rates. Authors wanted to make point of other countries' asset exposure in US. Many things we can do with XBRL in the financial markets. The financial markets are different from banking space. As part of looking at XBRL, and looking to knit them together: electronic trading standard is FIX. FpML is mark-up language for [?].
Because of my work on the Hill. There is a pending bill to make XBRL the standard for disclosure to the US government. (FpML is a standard for swaps, derivatives and structured products). The Netherlands and Australia are doing this.
Diane: Open up to questions.
What's missing is taxonomies and tags. Standard business reporting.
Ben H: When credit agencies report on individual success or failure, when do they have to exercise more due diligence and look at the transactions beneath the financial statements? The SEC has been rule writing in the structured finance area. There is now in the Senate and House a fight about liability if agencies don't look further and do more due diligence. Fight over how much.
Ben: Will they require some XML based, vouchable system. on the companies they report on.
Cate: The services provide them with data. unlikely to do wholesale. Investors and credit rating agencies looking at. encourage more analysis. not just their own.
WalterH: When rating agencies publish a set of ratings per company. they have their own numbered schemes. SEC publishes its data. Is there any chance NRSRO require the CIK or some other identifier to be linked. If not, who should maintain that concordance.
Cate: I have been encouraging open sourcing. hard to create transparency in fixed income markets. Is CIK system adequate?
Cate: Offline, a good project. Deutsche exchange looking to do something.
Walter: Different agencies are limited to what info they can ask for. We cannot publish our concordance. it has to happen somewhere else.
Cate: That may explain why you have not published the tags.
Cate: Yes, I just wanted to flush that issue out.
Walter: Yes, all the agencies must recognize that is an issue.
Require CIK in the data published.
Cate: Rating is issue-specific.
Walter: one to many mapping is better than nothing.
Cate: I sense a solution in this room and maybe we can find it.
RogerD: A similar question to Walter's. I have been working with SEC's XBRL data. I had to take public data. and go back to private data. to pull out data points. to mash together data. Has to be a public policy reason.
A comment. You were appropriately focused on performance rating. Intention that each rating or modification must come out in public domain? What volume?
Cate: A million plus. 700 thousand in large firms. Every tranche has a rating. top three, other seven. 1,000 to maybe 25,000. the database size, cohort size; they roll forward. Let me go back to the public identifier. We have stock symbols. used openly. equity market information is much higher. We need to find a better system.
Diane: That would be good question for break-out.
Ariel Blumencjweg: The information you collect for ratings. most people don't pay that much attention to it. the first rating issue has been in terms of analyzing credit information. capturing capital structure information. I wondered if XBRL community had thought about. extending the standards to the fixed income side. or the balance sheet side of the companies. One of the things going into the ?. senior secure. senior debt. that should be fairly limited extension. of a label system. and not go into the unique identifiers for every security.
Cate: we need an open, common shared way of sharing data.
Ariel: There is an initiative in the asset-backed securities space to create a database for low level data. Industry has managed to make it a closed system.
Diane: Thank you all for participation.
Diane: Set up for panel.
1:15 pm "Best Practices for Improving Data and Metadata Accessibility", panel discussion moderated by Diane Mueller, JustSystems
With the current push to improve public access to data on the Web, the interoperability and harmonization of the data and the metadata that is used to describe it is one of the keys to ensuring the Web continues to be accessible to all participants. This panel will discuss current best practices and lessons learned that could be applied and what steps might be taken to ensure that the Web continues to be accessible to all participants in the financial reporting supply chain.
- Howard Kaplan, SEC
- Kevin Webb, Sunlight Foundation
- Daniel Bennett, eCitizen Foundation
- Walter Hamscher, SEC
- Dennis Newcomber, XBRL International, Best Practices Board
Diane: Broad topic of meta data access. Like us to think about lessons learned from implementations. what are best practices. and what can we learn from each other. many are done in various places. I was happy to hear Cate talk about the concept of mashing up the data. make links between these different sets of data. Issues around lots of industry standards. and how do we harmonize.
Like to start off with Dennis Newcomber of the XBRL International (XII) Best Practices Board. Also invite you to participate in questions. From XII perspective, what are the best practices for XBRL content online beyond US?
Dennis: Beyond US GAAP project.
Let me point out, that XII cannot force people to disclose data publicly. speaking on US GAAP. First we have a process. and post on XII Web site. So you can go look at actual taxonomies and supporting documentation. We have a collaborative environment. Generated by a team of volunteers. publicly available. That's about it.
Kevin Webb, Sunlight Labs.
outsider perspective. like eCitizens. One of things we were discussing. are challenges around identification and disambiguation. how are we talking about the same identities.
Kevin: At Sunlight Foundation, looking at Fed spending data. it's a stumbling block. not enough agreement. in terms of spending. hard to know where it went, what the purpose was. It's a hard problem. As we start to build cross-government data systems. we have to look at how we build. I have looked at FedRes work and FDIC. best work for building hierarchies. Within Federal spending, not really. often hand off to private industry. We need to come back to that decision.
Diane: With that outsider's perspective. some of financial content can appear daunting, or be misunderstood. There are some lessons from the Recovery.gov. some of complexity is unavoidable, but more needs to be done to make it understandable for citizens. What's missing? What else needs to be done?
Kevin: The way the Federal budget process works is complicated. even people in private sector don't understand appropriations and budgeting processes, what state things are in, and how decisions play out. We have a team of journalists looking to translate this, but I haven't seen an interpretive layer on Recovery.gov side.
Diane: Daniel, the OBM, GTO, what's your perspective on making this more accessible?
Daniel: There was a study recently. for how people can get info on the Internet. There are two pieces: the architectural problem. every piece of data should be at a URL. and it should be structured so it can be consumed. and it should be human readable. Then get to other part of real people and their lives. When you create better data. at URLs, well architected. People will build the apps into Twitter. So the more, better structured data you have. you can grab the stuff. But if you document it, there can be translation tools.
Diane: You mentioned investigative journalism. It is more important to get good information, but there is a challenge of authentication of sources, citations, etc.
Daniel: I like talking about trust. if you have dependable URLs, and good structured information. people love SQL data bases. but what they are not doing is putting it out as a document. and it's well documented, and easy-to-find URL. that's citation, the ability to cite things. Vision of Tim Berners-Lee, but we're still talking about it. If we understand and move forward. We go to SEC.gov. we know it's put out by SEC. so we can trust the domains. If people grab that, they should always bring the link back to the original data. so they can always go back.
Diane: Daniel brought up concept of citing down to the cell. logistics of access to financial data. We learned it's not enough to make the data available. need infrastructure. What do you see as roadblocks to eco-system?
Walter: Challenge to answer that. People don't experience data except through an application, so it really doesn't matter how good the data is, and I'm surprised how simple that access has to be. The XBRL data on the SEC Web site: different types of data, valuable stuff in one header, find pointer to document you want. (SGML header to EDGAR filing has good content without even digging inside.) I want to say to vendors and software developers who want to get into XBRL: you can set sights lower and add value. No need to take on a big problem. If you want to publish an equivalence class, take impaired inventory. You need an equivalence table for compared inventory and a given time period. Takes some accounting knowledge and analysis. A translation table is quite valuable. Doesn't have to be huge. Set sights lower. Drill into that data and provide the mapping. Prepared inventories for example. Go into a financial statement: revenue recognition, executive compensation. Think about the application, the end to end, so it fits into somebody's spreadsheet. Focus on that application.
Dave: I would add something. Private sector is in a better position to make the data accessible. Short of rethinking EDGAR, private vendors have a lot more flexibility in where they can link to. A company might harvest files and provide links back to the original company. The SEC cannot do that. We have also seen and become aware of some really interesting ways to present the data. Not just a question of accessing the data, but how you want to work with it. Visualization tools. Combined with other social data. Over time, trusted companies will take that same data and present it with tools and ways to manipulate it. SEC is not providing for the average investor.
Diane: Where are you in understanding government funding for the development of these tools? Where to deploy resources.
Daniel: It's vital that governments publish data in simple formats. Just by putting it out there, it can be used in real time. ways to access. then it's available for value-add. The issue is the gov't needs to put out the data. Then the vendors or non-profits play with it. everyone can mash-up the data.
That is an important government function. The data we want to see is the data from system of record. What we get is from USA spending. second class citizens. some of data is rife with errors. have to ask what if we had access to original data.
Daniel: Having URLs is crucial. For example elections. FEC has information about elections. Go to Web site and check into information about 17th Congressional district, but that information is at a proprietary .com. You can create URIs that are not truly URLs. Gov't can produce good URLs for metadata.
Diane: You are repeating. NIEM and NIST have rich metadata behind the firewall. The White House visitor data log is now public. I traced the URL to a CSV file. and I looked at SEC site. for visitors. and this is beginning of trying to do mash-ups. There is a lot of extrapolating. I won't discourage anyone from putting raw data out there. But what's missing is good strong meta planners. There is the contest, Apps for America. Wonder if we can do more to let people know data is out there.
Kevin: We discover where weaknesses are with public scrutiny. public sector may not know how bad it is. Contract data; cannot understand where the money is going. Until gov'ts know people are looking at them. then it becomes a communal effort to fix it.
Diane: I'd like to open it up.
Jim Snider, iSolon: Question for Kevin: What is the appropriate level of government to solve the URI problem: inter-agency, intra-agency, rather than a government-wide problem? other databases aren't just XBRL. agencies. I know there was a proposal for identifier practices across government. Should this be at the macro level, or within communities?
Kevin: I think people who look at it day-to-day can speak to it differently. This is a fundamental problem that we have to solve. cuts across all of it. Whether private sector transparency. or gov't oversight. and you have been stymied for decades. it cuts across the ability to build great systems. I think the government needs to step up and recognize this. Another area is CUSIPS. too much control for private industry. government has to step up to role. digital naming.
David: I don't think that the highest levels of various gov't agencies recognize the nature of this problem and the importance of providing information. It would need to become a priority of Congress or OMB. someone would need to require it, put in place a plan, and fund it.
Jim: So this is a larger problem than XBRL community.
Daniel: It's easily solvable today. Use Web standards. Cool URIs. ISBN. By creating a URL you are naming an object. Once you have done that, track translation tables between systems. www.ibm.com can be linked to CIK if there is a URL for that. whatever internal number, just append the domain. great for metadata.
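Daniel's suggestion (mint a URL by appending an internal number to a domain, then keep translation tables between systems) might look roughly like the sketch below. The base URLs, the "agency" scheme name and the sample identifiers are all hypothetical; no published SEC or rating-agency endpoint is implied. The concordance is one-to-many, in line with Walter's earlier remark that a one-to-many mapping is better than nothing.

```python
from urllib.parse import quote

# Hypothetical base URLs; neither is a real published endpoint.
BASES = {
    "cik": "https://example.gov/id/cik/",
    "agency": "https://ratings.example.com/id/issuer/",
}


def mint_uri(scheme: str, local_id: str) -> str:
    """Append an internal identifier to a domain the publisher controls."""
    return BASES[scheme] + quote(local_id.strip(), safe="")


def build_concordance(pairs):
    """Map each agency issuer URI to the set of CIK URIs it corresponds to."""
    table = {}
    for agency_id, cik in pairs:
        key = mint_uri("agency", agency_id)
        table.setdefault(key, set()).add(mint_uri("cik", cik))
    return table


# Made-up identifiers: issuer A-100 maps to two registrant numbers.
pairs = [("A-100", "0000000001"), ("A-100", "0000000002"), ("B-7", "0000000003")]
concordance = build_concordance(pairs)

for issuer_uri, cik_uris in sorted(concordance.items()):
    print(issuer_uri, "->", sorted(cik_uris))
```

Because every identifier becomes a URL, the table itself can be published and cited like any other Web resource, whoever ends up maintaining it.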
Dave: I didn't mean to say it's a technical problem. It's a matter of making it a priority. Set aside the money, time and staff to make it happen.
Walter: I see it as a lot of small, binary efforts. some people see it as the Second Coming.
Kevin: Less a technical issue than ownership. We have a naming standard. but get to issue of rights. at government level.
DavidN: This is a common problem in large companies. [?] Issue of naming is core. Semantic Web provides technology solutions. but power to name is power to control. so legal issues. privatization. so there may not be correct paradigms. think about digital paradigms. at eCitizen Foundation we held workshops on names. Universities and vendors should get together. Organize a top-level commission to look at a next-generation naming scheme. take into account states. put request forward to the Obama Administration. we hope you will come to next identity workshop.
Patrick Slattery, Deloitte: same disclaimer. good deal of complexity, some risk. I cannot see a private organ take on risk without monetary. proposal just made is right way to go.
Diane: So we have identified that identity is one of roadblocks. to a Semantic Web financial system.
Walter: I don't see roadblocks. Hard work has been done. Getting the SEC to mandate this for public companies is an important step forward. The idea that everything we get comes back out in one format is a tremendous thing. building tools and applications to take advantage of that data is important. I have to emphasize it happens bottom up not top down, in a localized fashion. like impaired inventory. they don't need the entire mass of it. Think in terms of smaller applications. things that can deliver data into Excel. Income tax notes. huge potential value not being put there. They get fascinated with the taxonomy, but lose sight of the data. you get the idea.
Diane: People create the mappings. so sharing of that. so mappings can be reused. Have them publicly available. Is there a role for XBRL II to host or vet?
Diane: other industry sectors that have addressed these problems?
Daniel: I just want to touch on financial systems. you should apply BLT. business, legal, and technology layer. where SEC says there is no way to acknowledge the law definitive way. a system for legal implications for how you do stuff with money. other stuff has to get done. Wikipedia has been a great experiment. eCitizen has been looking at crowd sourcing dispute resolution. when it comes to taxonomies.
Diane: And tools for collaborating on taxonomies.
Walter: A lot of public companies have wikipedia pages.
Diane: Other questions? Thank our speakers.
Diane: Invite next speaker.
Dan Schutzer, FSTC not here.
Serving the Investor Community
2:45 "Leveraging XBRL from Japan and Korea to provide Global Insights", Haksu Kim, VP - Global Fundamentals, Thomson Reuters
We are working with documents in Korea and Japan. Thomson Reuters went through merger. We were dealing with various data. Show you some of the areas. Two areas: Market division. XBRL is fast, accurate, cheap, and scalable. We can analyze, model, and trade data. Making data at a competitive price is a challenge. Cost-conscious production facility. XBRL came into our life at the right time. Benefits us. Internet based; exchange. XBRL is more convenient than CSV files. Better automation. Internally looking at XBRL, process is simple: document comes from market, we put internal ids, parse the data and feed data to internal users, and then through product to customers. We really take a small portion of XBRL technology. Comes down to a relational database.
We have been talking about issue for some time. Most difficult part was convincing management to buy into this process. Many of us were learning at the beginning. We wanted to show what we could get through. Feed the data into the process. Those things sorted by priority. Priority is less important than before. Our experience is 2.3 seconds. Once data is coming in, we store in temporary XBRL database. Then back out to CSV files. We have taken full advantage of XBRL. Not all data comes in XBRL format. Had to go back to see what is missing. Put original data into final database from which we serve our product. We get about 80 percent of data points in XBRL. The efficiency we gained was about 20 percent. We have more work to do to get to full automation. We could not execute so much internally, so we hired outside. Once data is coming out, a couple communications we use are media headlines. This is now automated. Our clients expect data feeds from us. They feed it into their model. Those are the basic processes we are taking.
Look at properties and benefits. Accurate, cheap, scalable. We used to have language concerns. now XBRL is language neutral. Automation is to make it cheaper. Our plans are to invest in community. We focus on financial data, but are not limited to this. also private company data. Some challenges in using XBRL. Not many countries are on board. We get 95 percent of data from press releases. XBRL is not full set of data.
Tokyo Stock Exchange was not ready to adopt. they have been promising filings since January. They were not ready to take on new taxonomy. Some other challenges. When there are two sets of data in one document. Legacy infrastructure is not ready for the new technology. Same tag used twice or three times. Net income from income statement used in cash flow statement, but considered same data. Our system is not ready to take that. We take CSV file and push it to that. data needs to be pushed twice. how do you know which has been pushed more than once? It's not easy. Internally we can hire consultants, but we can only do so much. We don't have internal resources to do that. Other things are about storage of XBRL documents. Our products are not ready to interface. We talked about using our currency with XBRL. So that's what we went through so far. The filing itself is there. It's the iceberg above the water. But we're at primitive stage of using it. The most valuable part. even language is tough. accounting standards are confusing when accounting standards differ.
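The flow described here (document arrives, internal ids are assigned, facts are parsed and pushed out as CSV) can be sketched in a few lines. This is only an illustration under stated assumptions: the sample instance, the `http://example.com/taxonomy` namespace, the element names and the internal codes `REV` and `NI` are all invented, and the speaker's actual system is not described in this detail. Elements with no internal code are set aside, which is roughly where company extensions would land.

```python
import csv
import io
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"  # standard XBRL instance namespace

# Invented sample instance; taxonomy namespace and element names are hypothetical.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/taxonomy">
  <context id="FY2008">
    <entity><identifier scheme="http://example.com/ids">12345</identifier></entity>
    <period><startDate>2008-04-01</startDate><endDate>2009-03-31</endDate></period>
  </context>
  <unit id="JPY"><measure>iso4217:JPY</measure></unit>
  <ex:NetSales contextRef="FY2008" unitRef="JPY" decimals="0">1000</ex:NetSales>
  <ex:NetIncome contextRef="FY2008" unitRef="JPY" decimals="0">80</ex:NetIncome>
  <ex:CompanySpecificItem contextRef="FY2008" unitRef="JPY" decimals="0">5</ex:CompanySpecificItem>
</xbrl>"""

# Hypothetical mapping from taxonomy element to an internal data code.
INTERNAL_CODES = {
    "{http://example.com/taxonomy}NetSales": "REV",
    "{http://example.com/taxonomy}NetIncome": "NI",
}


def extract_facts(xml_text):
    """Return (rows, unmapped): mapped facts and tags with no internal code."""
    root = ET.fromstring(xml_text)
    rows, unmapped = [], []
    for el in root:
        # Skip contexts, units and anything else in the instance namespace.
        if el.tag.startswith("{" + XBRLI + "}") or "contextRef" not in el.attrib:
            continue
        code = INTERNAL_CODES.get(el.tag)
        if code is None:
            unmapped.append(el.tag)  # e.g. a company extension element
            continue
        rows.append((code, el.get("contextRef"), el.get("unitRef"), el.text.strip()))
    return rows, unmapped


def to_csv(rows):
    """Serialize the mapped facts as CSV text for a downstream feed."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["code", "context", "unit", "value"])
    writer.writerows(rows)
    return buf.getvalue()


rows, unmapped = extract_facts(SAMPLE)
print(to_csv(rows))
print("unmapped:", unmapped)
```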
Diane: Thank you. you have drawn out some interesting points.
Roger: That was a fantastic presentation. You are in a position to observe in the real world the export of many XBRL initiatives. I got the impression that you are still taking these XBRL feeds and forcing them against your internal taxonomy. Are you getting the benefit, pushing information to users? Did I hear you say that in US you get info from press releases?
Haksu: Taxonomies. we put our internal code. we like to map in [?]. when a company use base taxonomy. we know what to do. but we don't know what to do with extensions. We put some logic to assign our code. and meld them together.
Technology point of view. Japan, they have one taxonomy. industrial formats are shared. but only CNI format. Look at Korean format. Elements not used in CNI, we had to map all those together. Technical point of view. We need to be prepared in advance. When implemented, they will bring up these new issues. Consistency challenge with implementations. I did say 90 percent of data is from press releases in US. We push data out immediately to our users. We are leaders in the market. Only those not available in PR, like 10K and 10Q. better than two years ago, we now have a full income statement in the press release. So US is a lower priority.
Diane: Thank you very much. Coffee on the second floor. be back at ten of four.
3:30 "A pathway to improved fundamental financial analysis: The Singapore experience", David Watson, WHK Horwath
David: present experience in Singapore providing interactive access to financial data. strategy is to develop XBRL solutions to prepare for when Australia comes online next year. Singapore mandated public & private companies to file in XBRL in November 2007. mandate based on transparency, better reporting, better business analytics.
(ACRA is the Singapore regulator involved).
David: recognised a need for systems to aggregate and analyze XBRL data. worked with Singapore government to build Open Analytics. launched less than 10 days ago. Singapore mandated requirement for filing in XBRL but have not (until now) made the data available back to the public. led to perception that XBRL was only a compliance cost, with no one else benefitting.
David: strategic aim was to develop a solution that can be reused in Australia despite initial focus on Singapore.
David: immediacy goal - get analysis of new filing within 5 minutes. today, analysis available as of midnight of the day a filing is made.
David: interface to Open Analytics premise is that you can search for company name or registration #. searches all public and private companies that report to ACRA. excludes only those co's that report to monetary regulatory agency (e.g. banks). Singapore doesn't have a requirement for privacy once data is lodged with the regulatory authority. perform comparative assessment - compare a company against peers as per industry classification. first thing available is basic report on filed financial data, augmented with some analyses (PDF report). looking at company's performance in isolation - 28 key ratios in 7 perspectives - profitability, liquidity, growth, capital mgmt, etc. can drill in to learn how to interpret ratios. can view trend analyses. provide audit trail back to elements within Singapore XBRL taxonomy. "industry analysis". scatterplot of revenue for hotels and restaurants filing in Singapore. pick competitors to plot against one another. in the past could purchase benchmark statistics from benchmark services. solutions like this give more context as to whether this is meaningful - e.g. historical context.
scribe: Singapore gov't back-processing non-xbrl bond data back to 2004. strong positive feedback from press/journalists. academic institutions: "gives us unprecedented access to local data". "analysis 1" - developed many years ago, XBRL compliant recently. analysis 1 is a solution designed to allow consultants etc to add value based on XBRL. xbrl acts as conduit to allow data to flow straight into analytical systems. understand cash flows, what-if scenarios - suite of analytic tools for engaging the client, XBRL makes this possible. benefits to derive from xbrl must be coupled with the costs of adhering to the standard. in singapore, what to do with the data was an afterthought. better to ensure that there's not too much time delay before seeing tangible benefits. close the feedback loop.
Questioner1: Are you going back to 2004 to retroactively input XBRL data for companies? how?
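As a rough illustration of ratio analysis with an audit trail back to taxonomy elements, the sketch below computes three ratios from tagged values and records which elements fed each one. The element names, the figures and the choice of ratios are assumptions for illustration; the 28 ratios used by Open Analytics and the Singapore taxonomy element names are not given in these notes.

```python
# Hypothetical tagged values for one company; element names and figures are invented.
facts = {
    "Revenue": 1000.0,
    "RevenuePriorYear": 800.0,
    "ProfitLoss": 80.0,
    "CurrentAssets": 300.0,
    "CurrentLiabilities": 150.0,
}

# Each ratio lists the elements it is built from, giving a simple audit trail.
RATIOS = {
    "net_profit_margin": (("ProfitLoss", "Revenue"), lambda p, r: p / r),
    "current_ratio": (("CurrentAssets", "CurrentLiabilities"), lambda a, l: a / l),
    "revenue_growth": (("Revenue", "RevenuePriorYear"), lambda r, r0: (r - r0) / r0),
}


def compute_ratios(facts):
    """Return {ratio name: (value, source elements)} for the ratios defined above."""
    results = {}
    for name, (elements, formula) in RATIOS.items():
        values = [facts[e] for e in elements]
        results[name] = (formula(*values), elements)
    return results


results = compute_ratios(facts)
for name, (value, elements) in results.items():
    print(f"{name}: {value:.3f} (from {', '.join(elements)})")
```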
David: Not sure.
Questioner2: (1) Can you tell us a bit about the project undertaken to implement this? (2) how do you handle extensions?
David: (1) started at a conference, (???). (2) have not given much thought to the extensions.
Diane: Is Open Analytics publicly available?
David: Was a requirement that the solution be affordable - it is pay per view. what I've shown costs about US$30 to access per year.
Questioner3: how to compare companies with different fiscal years?
David: assumption that companies with year-end in same calendar year are comparable.
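The comparability assumption David states can be written down directly: companies whose fiscal year ends in the same calendar year fall into the same peer group. The company names and dates below are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def group_by_year_end(filings):
    """Group company names by the calendar year in which their fiscal year ends."""
    groups = defaultdict(list)
    for name, year_end in filings:
        groups[year_end.year].append(name)
    return dict(groups)


# Invented filings: (company name, fiscal year-end date).
filings = [
    ("Alpha Pte", date(2008, 3, 31)),
    ("Beta Pte", date(2008, 12, 31)),
    ("Gamma Pte", date(2009, 6, 30)),
]

peers = group_by_year_end(filings)
print(peers)
```

Note that under this rule a March year-end and a December year-end are treated as the same period even though they overlap by only three months.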
Questioner4: For $30 you can access your competitors that are private, not just publicly traded information?
Questioner5: Do you have companies that operate in multiple segments? How is that handled for reporting and what-if?
David: 2 industry classification schemes - SSIC and SGX. problem with SSIC is that the majority of public companies have SSIC classification of "holding company". tried to provide another way of segmenting by using SGX (stock exchange / sectorial classification).
Diane: Next is panel discussion.
4:30 "Finding the Nuggets in Financial Reports", panel session moderated by Diane Mueller, JustSystems
As corporate investor-relations sites, social media and financial information start adding XBRL and other data standards into the mix, we’ll take a step back and discuss what the users of this information want to do with it. What does having access to financial content as XBRL mean in practice for investors; just what are investors doing with the content they have today, and what do they expect to do in the future? What do investors/consumers expect from a fusion of cloud computing, search engines and the Semantic Web?
- Michelle Leder, Footnoted.org (slides)
- Dan Schutzer, FSTC
- David Watson, WHK Horwath
- Brian Broesder, AOS
- Ashu Bhatnagar, Good Morning Research, Softpark
Ashu Bhatnagar, Good Morning Research, Softpark - RDF tagging of XBRL documents.
scribe: web site for sharing of financial tagging.
Brian Broesder: focus on level 3 financial instruments.
Michelle Leder: footnoted.org.
Diane: financial nuggets are embedded in text blocks.
Ashu, you've enabled people annotating financial statements - how does that process work and what are some roadblocks to it?
Ashu: free site - goodmorningresearch.com - inspired by real-life experience at research firm. research analysts producing financial models. sales folks sharing insights with buyer side hedgefund clients et al. insights were annotated to make info quickly available. in the firm, used Microsoft Sharepoint. another approach using other software - traders and analysts exchanged (electronic) notes (Tamale). people should be able to comment on xbrl files rendered as a table. post comments, ratings. hurdles are noise factor. lots of abuse if you open it to anyone.
Michelle: site looks at things companies bury in their sec filings (footnoted.org). (technical difficulties).
Diane: What is AOS? how do your tools help determine what's relevant and how do clients use that info?
Brian: Sophisticated institutional investors w/ unsophisticated tools. primary tool is Excel. difficult to look across a portfolio of securities. ability to click through multiple holdings and find risk, take action, etc. we primarily deal with private securities. lots of disparate assets. try to find shared identifiers to identify common characteristics to look through a centralized platform to view complex financial instruments. pull out most relevant information from each type of asset. find overlap between asset classes. need a much more powerful database than Excel to do this.
Diane: Where do you see XBRL fitting in?
Brian: I'm new to XBRL - we're going to figure out how we might use it. private security info is scattered across many formats.
Diane: David, we see with wikiinvest, youtube-like tools, lots of re-use and linking back to authentic source - have you thought about how people can share analytics and link back to the source?
David: model we've put forth - version 1 in Singapore - is read-only. no save button, can't add commentary. no reason we couldn't do that down the track. being able to supplement with your own analysis is very powerful.
Diane: how big was the team and project for Singapore?
David: developed Open Analytics in 3.5 months.
Diane: what else can we do to make SEC filings more accessible?
Ashu: sold on integration of xbrl + semantic web technologies. Thomson Reuters OpenCalais extracts RDF from text - currently a standalone technology outside of the domain of XBRL. future is in integrating technologies to solve investment mgmt, asset mgmt problems.
On IRC, GeoffShuetrim ciao till tomorrow.
Day 2 - 6 October 2009
Welcomes everyone. thanks FDIC for wonderful hosting and speakers. Today's focus is more technical, including focus on Semantic Web. talk about developing ontologies. nurturing the financial ecosystem. how do we resource all those pieces, bring together individual silos. We saw some great applications yesterday, such as Singapore. many different views of the data. how do we make this more of a connected ecosystem? How do we get a common identity identifier, tagged with metadata. lacking common links. A lot on our plate today. Yesterday was about "What is" and some of "What's missing". Today look at the ecosystem. Let's talk about the break-out sessions. We're looking to have four focus topics. At lunch we'll talk about them. I have teased out four.
Some people have asked us about a diagram of this ecosystem, to map this out. The identification. One of key things is next steps. how do we promote the collaboration with XBRL, W3C, NIEM, etc. how do we keep this conversation alive and continue the collaboration. It's a public and open forum. Everything will be published in the report findings. We have been asked: when you step up to ask a question, please state name and affiliation slowly before you speak. And please sign release. This is being video taped. Dave, do you want to put up your slides. Dave is a W3C Fellow on the financial data on the Web, and my co-chair for this Workshop.
9:35 "Introduction to linked data and Semantic Web technology", Dave Raggett, W3C
Semantic Web is about giving computers better understanding. a Web of meaning. Think about booking some travel. to do actions you have to go to many Web sites, but they don't meet all your needs, and may be biased in terms of marketing. If you want to combine the information, you have to put a lot of effort into collecting information. that's a waste of our effort. We should let the machines do the work, and that's what the Semantic Web is about.
It is, essentially, the Web of Data and the technologies to realize that.
Here are some of the languages.
RDF, OWL, Rules, GRDDL, SPARQL, SKOS. POWDER, RDFa. the devil is in the details. Main point is about Linked Data. and the power of URLs. key insight is relationship of these triples. thanks to the uniform representation. Let's look at an example. a book store. slides adapted from Ivan Herman. Here is a relational database table talking about books. So you can take that data and map into some relations. The arrows combine relationships. Like the ISBN number. Ovals take an idea of one to many or many to many. you have to create those. Next data source is a spreadsheet. from France. They are using their own terminology for the same book. translation for the same book. the terminology is different. So we abstract into binary relations. then you merge the data sets. We note that the URIs are the same, so we merge the nodes. If it's the same URI, it's the same resource.
Now we can add something additional. Computer doesn't speak French. so you have to tell it that auteur and author are the same thing, and it's a person. So here is the graph on bottom right. and now additional information about the author Gosh. You can now ask for information from different sources related to this author. We started with different data sets. You map into some abstraction based on URIs. then you can manipulate. So the key thing is that by using the URIs, you can combine data sets from different organizations and countries. and use the power of the Web. So the essence of making good things happen is to have the data sources. and use standard protocols and query mechanisms. combine name spaces. also data sets. Do remote queries or do local queries. Also use URIs for APIs. We want to encourage people to innovate. More data, more applications. Here is a diagram. Linked Open Data Cloud. People who are making their public data available. People who have contributed their data.
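A minimal sketch of the merge described in the bookstore example, assuming triples are held as plain Python tuples. All URIs, titles and names below are hypothetical placeholders, not the data from the slides. Two sources describe the same book under the same URI, and one extra mapping tells the machine that the French predicate means the same as the English one.

```python
# Triples are (subject, predicate, object). Every URI here is invented for illustration.
BOOK = "http://example.org/isbn/0000000000"
PERSON = "http://example.org/people/some-author"

english_source = {
    (BOOK, "http://example.org/en#title", "An Example Title"),
    (BOOK, "http://example.org/en#author", PERSON),
}
french_source = {
    (BOOK, "http://example.org/fr#titre", "Un titre d'exemple"),
    (BOOK, "http://example.org/fr#auteur", PERSON),
}

# The extra knowledge the computer must be told: auteur and author are the same thing.
SAME_PROPERTY = {"http://example.org/fr#auteur": "http://example.org/en#author"}


def merge(*sources, same_property):
    """Union the triple sets, rewriting equivalent predicates to one name.

    Nodes with the same URI merge automatically because a set keeps one copy
    of each identical triple.
    """
    merged = set()
    for source in sources:
        for s, p, o in source:
            merged.add((s, same_property.get(p, p), o))
    return merged


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


merged = merge(english_source, french_source, same_property=SAME_PROPERTY)
print(objects(merged, BOOK, "http://example.org/en#author"))
```

In a real deployment an RDF store would do this work; the point of the sketch is only that shared URIs make the merge a set union.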
Question: What do the arrows mean?
References between the data sources. where they build connections to the different data sets. and a year later, more people are exposing their data. So now we need to do the same with data sources, to add value. SEC by making its data available, is opening up new possibilities. and as others do we will have more information.
Question from Walter Hamscher.
The US Census data points to Geonames. Does that mean US Census points to this third party site?
Dave: I can get back to you later on that.
so, say, there is an entry on Amsterdam in dbpedia and there is an entry in geonames, and then there are links that 'join' these two in the datasets.
Dave: OWL ontologies allow for richer semantics. There will be a talk about translating XBRL taxonomies into OWL. OWL has many possibilities, but comes at a cost. if you have lots of rich meaning, it can slow things down. So there are many OWL profiles. OWL DL is used by many. OWL2 looks at relationship between SemWeb tech and relational database technologies. Rule Interchange Format. Rule languages are sometimes more convenient to express. Many kinds of rule languages. You can use RIF between similar families. Relationship between XBRL and the Semantic Web. If you have XBRL, why bother to translate into another format? Because XBRL uses XLink heavily. it becomes expensive to process. can convert into other models. Rather than process XBRL directly. preprocess into format that supports query. No standard query languages for those models. So perhaps SemWeb query languages can help. SemWeb are mature standards. Complex queries, large data sets. SemWeb makes it easier to combine large, diverse data sets. Can allow people to start talking about what combinations of data are possible.
Thought I would show a few examples. This is part of US GAAP taxonomy. converted into RDF taxonomy using Turtle. Two reporting concepts. and a parent/child relationship. An example of an XBRL instance file. an identifier, start and end date, currency measure. XBRL taxonomies loosely equate to OWL ontologies. But there are some details. Yes, I'll use US GAAP, but may take away some relationships. automated mapping is possible. We will have a talk later. SemWeb can provide richer meaning.
Diane has been speaking about an ecosystem. Some possible ideas here. Publishers of raw data, investor relations sites, new sources. data aggregators like Thomson Reuters. Possibility of doing this with SemWeb by making triples directly queryable. Some particular reporting concept, and ask quickly. High-level APIs. Upload smart queries into the network. Idea being means to upload scripts to capture some of that. Scripts could do things like custom analytics. Smart search engines. Google allows you to search. query across thousands of machines. Why can't we do same for financial information. But get search engine to do more useful work. Search engines are getting smarter, but specify the intent of the search. and it may be based on your own preferences. Which brings up the topic of privacy. If you have this linked information, you do need to look at this. I am involved in a EU project on privacy. Thank you.
Question: Walter Hamscher, US SEC.
There are several different layers of language you described. in XBRL there is no inheritance layer. Do you have to translate to an ontology to get inheritance? Some are properties. all liabilities have those properties. there are different levels of OWL.
Walter: What technology layer do I need to get inheritance?
Dave: OWL DL would be a good choice for inheritance. the computable one. OWL Full can capture Semantics.
Lee Feigenbaum: There is a layer in between.
RDF Schema is effectively a layer in between RDF and OWL and will handle inheritance. [See slide 28]
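A minimal sketch of the kind of inheritance RDF Schema provides, assuming a small hand-written class hierarchy. The `ex:` names are hypothetical and are not taken from any XBRL taxonomy. In RDFS, `rdfs:subClassOf` is transitive, and anything typed as a subclass is also an instance of every superclass; the code below computes that closure by hand.

```python
# Hypothetical rdfs:subClassOf statements: child class -> set of direct parents.
SUBCLASS_OF = {
    "ex:LongTermDebt": {"ex:Liabilities"},
    "ex:Liabilities": {"ex:BalanceSheetItem"},
}


def superclasses(cls):
    """Return all direct and indirect superclasses of `cls` (transitive closure)."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def inferred_types(asserted_type):
    """Types an RDFS reasoner would attach to a resource asserted as `asserted_type`."""
    return {asserted_type} | superclasses(asserted_type)


print(sorted(inferred_types("ex:LongTermDebt")))
```

A statement made about every `ex:Liabilities` resource therefore also applies to anything typed `ex:LongTermDebt`, which is the inheritance Walter asks about.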
Eric Cohen, XBRL Global Ledger WG: You spoke about expressing XBRL with OWL. Are you talking about taxonomies or instances with OWL. is that the movement of point A to point C.
Dave: First point was what you apply OWL for. in this case mapping taxonomies into OWL. We will hear a talk later. Taking XML schema into OWL. The example I showed. mapped linkbases into RDF. I was not making interpretations, just mapping taxonomy directly. If you want to add rich semantics, that's an addition.
Eric: Follow up questions. talking about triples. Where would those relations. and encourage innovation. Where would those relations be stored? So someone could see doc name, date and relationship? Is there a central repository where people can see and reuse this?
Dave: That is a good question. gets back to issue of provenance. show me the source of the data. you can imagine that SEC could provide direct access to RDF. and provide data in some form, and let others provide translations to APIs. that's question of what the ecosystem should look like, who are the players.
Eric: Last point. XBRL wasn't developed just for e-reporting. it is ecosystem of business reporting, transactions, business events. so many different types of reports. I have seen a few papers in this area. Does it make sense to turn 500 transactions into RDF or OWL. XBRL GL is at that detailed level. a seamless audit trail in the business reporting supply chain. My question is where it makes sense for XBRL and OWL. Like RIF for some business rules. What is the complementary solution. versus end reporting and full reconciliation.
Dave: You can have data in many different representations. and then having common abstractions in RDF and OWL. but there are other data sets you will want to combine. XBRL and SemWeb are not in competition.
Question from ? FDIC.
Query language itself; how you publish data. Query language, in your view, is that going through a. HTTP protocol. is that browser or server level?
Dave: The underlying abstraction is in terms of triples. which represent vocabularies. Query mechanisms, like SPARQL. you can get triples. or you can get results rendered back in another form. Optimization is like database queries. You really want to make queries the responsibility of the engine.
Question: Is it about browser, and then SQL?
Dave: Some implementations of SemWeb are built on top of relational databases.
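A minimal sketch of what querying triples means, assuming an in-memory list of tuples rather than a real SPARQL engine. The `ex:` predicates, the entity names and the values are hypothetical. Each call to `match` plays the role of one triple pattern, with `None` standing in for a variable.

```python
# Hypothetical facts: which entity reports which fact, and what the fact says.
GRAPH = [
    ("ex:AcmeCo", "ex:reports", "ex:fact1"),
    ("ex:fact1", "ex:concept", "us-gaap:Revenues"),
    ("ex:fact1", "ex:value", "100"),
    ("ex:BetaCorp", "ex:reports", "ex:fact2"),
    ("ex:fact2", "ex:concept", "us-gaap:Revenues"),
    ("ex:fact2", "ex:value", "250"),
]


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None behaves like a query variable."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def revenue_by_entity(graph):
    """Join three patterns by hand. Roughly the SPARQL query:
    SELECT ?entity ?value WHERE {
        ?entity ex:reports ?fact .
        ?fact ex:concept us-gaap:Revenues ; ex:value ?value }
    """
    results = {}
    for fact, _, _ in match(graph, p="ex:concept", o="us-gaap:Revenues"):
        for entity, _, _ in match(graph, p="ex:reports", o=fact):
            for _, _, value in match(graph, s=fact, p="ex:value"):
                results[entity] = value
    return results


print(revenue_by_entity(GRAPH))
```

A real engine would choose the join order and indexes itself, which is the sense in which optimization is left to the engine, whether it sits on a triple store or on a relational database.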
9:05 "Open discussion of the impact of Semantically-enabled data and techniques within eGov projects, e.g. Data.gov", Brand Niemann, EPA (PPTX)
Diane: We are running short on time. Some clarifications needed. Glad that Dave set the stage for further discussions. Thanks for intro to SemWeb. Introducing Brand Niemann, EPA.
Diane: We have other information and tutorials to share on different topics.
Brand: Thank you Diane and Dave for invitation to have an open discussion. I'd like to pick up on Dave's presentation. Concept of operations for Obama Administration initiative called Data.gov. 100 data sets. that various US agencies have submitted. to make gov't agency more transparent and usable. Federal CIO has asked for version 2.0. Those of us in SemWeb community have lobbied for SemWeb in version 2. "Success" draft document. states that Semantic Web will enable Data.gov. first, do outreach for people to do this, to put data into SemWeb format. For those of you to whom this is new, who does that RDF and OWL enablement of the data. Seems like a fair amount of you have those questions. I won't attempt to answer, but will provide broader context. I would like your initial reaction to this. There are several meetings coming up to present these reactions. That meeting will take place toward end of October. in context of senior data.gov and federal CIOs. Want to see financial data handled in that as well. You have heard about Open Linked Data. My role in EPA and in W3C eGov IG is to explain to senior level people in administration. Key concepts of RDF and OWL for senior managers.
Works for structured, semi-structured and un-structured information. Metadata and data travel together. Last week we had enterprise architecture training. You are all about explaining and constructing architecture. for your organization. and explain the business case for it, the rationale. those metrics important to senior people. interface between technical and CEO. information exchange and interface in organizations. Last week we did enterprise architecture in the agencies. We came to step four. All agencies will need to do this to justify their investments in technology. and submit to Office of Management and Budget. So a step four and principle five. I said this is where the Semantic Web fits in. It's what I put forward to EPA and OMB. three categories. How you mark-up data; business problem; and what result to present. Many silos. we create over and over again. Put the mark-up in the data. and we will get more powerful applications.
Last three are SemWeb standards. Most agencies are using XML. But XML is not powerful enough and not designed for dynamic run-time applications. that OWL and RIF were designed to do. We want to leave data where it is. and create a data Web. May use this to explain to senior managers why you want to do this. Before we go into discussion. A brief history of how we got here. Over last five years, many communities of practice have been meeting. Like SOA. Yesterday we discussed a lot of similar issues. We have talked about how to bring XBRL and NIEM together. Yet another mark-up language (YAML). people keep inventing languages with XML. Obviously there is a need to bridge across this, the Semantic Web. We are working on this for April conference. Benefit of intelligence sharing environment.
I asked Jeremy Warren. is that still the goal and he said yes, we are using ontologies. We had an ontology workshop at National Science Foundation. We had a good diversity of groups. Imagine a graph with semantic expressiveness and benefit. Doesn't make sense to build an ontology. a useful paradigm to bring diverse groups together around this issue. At Semantic Technologies Conference, Dave and Diane showed XBRL and RDF. Lee Feigenbaum involved in pilots. And we have SemWeb meet-ups to build community with people around Open Linked Data. We have a long history of bringing together groups using XML mark-up. and collaborate on use of SemWeb technologies. Here are links to more information. I'll stop here. Happy to answer questions.
Diane: I'll ask first question.
Diane Mueller, JustSystems and XBRL Consortium.
Yesterday the issue of identity identification was a hot topic. We need to look at resolving the data sets. Has that issue come up in your conversations.
Brand: Yes, that question has come up many times. One pilot project done by Arun ?. He was looking at global acquisition. He built a dynamic ontology that bridged human resource and financial data. Another pilot was done with data reference. We looked at statistical abstract. Has existed for 100 years. 1500 databases. What data.gov implemented in version one. is to use that taxonomy. What I further recommended is that we build an ontology. What we have is four levels of information. that represents considerable expertise. 40 topics, subtopics,. we have standardized d. Library of Congress has made progress in this area. Data modeler who started in classic way. discovered and embraced the SemWeb as a better way to do data modeling.
(Arun is possibly Arun Majumdar, who was at the aforementioned NSF conference - see http://nsfaccountingontology.wik.is/Participants/Arun_Majumdar).
Diane: What we discussed is trying to get someone to own that problem. so the standards get rolled out in an international way. identify entities in our data sets. Lots of great ideas, but no one stepping up to own the problem. If we can ask the Obama Admin CTO to take ownership of that in the US.
Brand: I think that Tim Berners-Lee and Jim Hendler probably made that argument when they met with Vivek Kundra. There is also a ground swell to put data standards, data models into a ? form. this is a community effort. I couldn't agree more that there needs to be a focal point for funding.
Question, Ben Hu, DAMA: Some comments.
this conference is very useful. I am looking for something to come together.
not sure we are getting there. Ontology, level of abstraction. but we still come back to square one. So many ontologies popping up. How do you see another ten years. if we use ontologies. will people find out more easily, or is some order needed? Right balance between standards efforts and innovation. Where is this line?
Brand: It's a fuzzy line; we see examples of both. the Ontolog forum. society of new applications. Lots of flowers blooming, moving forward. The new emphasis on transparency and openness in this admin has opened the door for SemWeb work. Next six months with SOA will be critical. If people in agencies step forward and contribute data in RDF and OWL, we will see progress. if not, we will miss an opportunity. This has been around ten years. There is much enthusiasm in Linked Open Data. SemWeb meet-ups are enthusiastic.
Ben: You mentioned enterprise architecture. Is there an architecture that is amendable. so many different standards already out there. Can we have a richer architecture picture to pull things together.
Brand: How else do we do federation across the Web? What other activity is there?
Ben: Starting from vocabulary. you cannot just invent new vocabulary every day. vetting out the process.
Dave: We are running short on time. Shared vocabularies is part of the opportunity for the collaborative process.
JH Snider, iSolon: Address the identity question.
OMB looking to take ownership. take examples there. they are raising this at high levels. to realise potential of SemWeb.
Brand: Community wants to retain ownership. I went to a vocabulary camp recently. ontology developed and owned by the community. I don't hear much discussion of who owns it, it's just there and open and available. and get on with the applications work.
Diane: Thank you for your talk and discussion. I would like to bring up Herm Fischer.
10:00 "Rendering and visualizing XBRL and non-XBRL data from multiple entities", Herm Fischer, UBmatrix & Mark V Systems
One of things I need to think about is whom am I representing.
the preparer of the data, the authority, the aggregator. or the person trying to access this, the consumer. an institution or an individual. My theme is multiple entities. Do we report in one instance document, or separate instances that share. The examples I will show use XSLT. and where we succeeded in doing a Web Service. What does multiple entity rendering mean. multiple browsers in tab. or in same row. equivalent thing in same party. do we grab data from source in its original. and prepare in advance. Or grab on the fly. lots of issues. Does it come in as XBRL. or did it come in in a Web form, gleaned from PDF or HTML, or Word doc? stuff is auto-extracted and semantics inferred. perhaps post-modern world the stuff is provided in a schema. allows me to do semantic type stuff, or do I infer a structure? To me that's in the future. So we'll stick with XBRL centric world. and how that leads to a database centric future. So what I need to know. as I take this multiple entity data and combine it together. Need to know how it lines up. or maybe different tags.
To me, the model of the data, the concepts in there. If I've got different entities. France has only one chart of accounts. only one set of concepts. Prudential reporting in Europe. each country extends them, and bank extensions. SEC submitter. can extend own concepts. Need to line them up. EDInet is like US GAAP prior to 2008. where taxonomy is taken with its linkbases. rather than cut and paste the schedule. So everybody has a different way of doing it. And rendering different data from different periods. small or big tweaks, the linkbases have changed. We now have a versioning spec coming through evolutions. We have XBRL Formula that can merge things together. The linkbase shows the semantics. I think these linkbases are very important. how you align them makes them challenging. One of technologies we have is iXBRL. it is a way to keep in sync the rendering in an instance doc and not lose the synchronization that represents the rendered data.
Here is an example of what that technology does. It's simple enough. so a fairly elaborate XSLT can suck out data. Rendering with table cell might have a number, but it's recoverable back. green stuff shows numeric items. XBRL wants to see things in an ISO notation. in Belgium, commas and dot formats vary. so there is a no display dif. It was difficult to come to agreement, but it's something simple. It's a rendering way. for multiple entity? Single way. If I have a multi-entity instance document. and there is more than one instance, this is a solution. But different instance docs from different companies, getting more complex. Something tracked along, a rendering linkbase. meant to be. not sure if will continue. an entity part of taxonomy set. How to format dimensions. Goes along with one DTS. If I merge a bunch of DTS for different years and companies, that's different.
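A minimal sketch of the number recovery Herm mentions, where a rendered cell shows a locale-formatted number but the canonical value must still be recoverable. The function name `to_canonical` and its `decimal_sep` parameter are hypothetical and are not part of the iXBRL specification; the sketch only assumes that the rendering declares which character is the decimal separator.

```python
from decimal import Decimal


def to_canonical(displayed: str, decimal_sep: str) -> Decimal:
    """Convert a displayed number to a canonical decimal value.

    `decimal_sep` is the character the rendering uses as decimal separator,
    either "." (as in 1,234.56) or "," (as in 1.234,56).
    """
    if decimal_sep not in {".", ","}:
        raise ValueError("decimal_sep must be '.' or ','")
    group_sep = "," if decimal_sep == "." else "."
    # Drop spaces, non-breaking spaces and grouping separators.
    cleaned = displayed.strip().replace(" ", "").replace("\u00a0", "")
    cleaned = cleaned.replace(group_sep, "")
    # Normalise the decimal separator to a dot.
    cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


# The same underlying value, displayed in two conventions.
print(to_canonical("1.234.567,89", ","))
print(to_canonical("1,234,567.89", "."))
```

Using `Decimal` rather than `float` keeps the reported digits exact, which matters when the rendered figure has to match the tagged fact.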
Dave mentioned if you cannot do with XSLT, XML people don't think it's real. XSLT is XML. Here's one I did. Provide a multi-entity submission all in one instance document. Each submission, SEC style. Make sure this XSLT is agnostic of dimension names. so it was hard. Idea to come up with dynamic composition. use the entity model expressed in dimension axes. presentation base. shared accounts, formally called line items. and take random instances and render them. Idea to use XSLT and take tables with these dimensions. Both came from walking from XSLT. to carve up the rows. Multiple entities, different funds; more challenging to write. So it's achievable. A model extraction pass. and a rendering pass. for headers. with functions for XBRL processors. Here is a Web service that combines separate filings. every year different taxonomy. reduced to a number of filings. clicks from filings, periods. which part of schedule. maps into column headers and rows. So when multiple entities are selected. they can put rows in columns. Merge logically, not by filing date, another challenge.
Each period is a separate submission. So what has to be done is a tree merge; it's complex. A lot of sparse rows and columns. To me this tree merge is a complex, challenging area. So a lesson learned. There is a big footprint. Dealing with XBRL processors. 50-150 megabytes. Do a Web search of ten entities over 3 years. that's nearly a gig footprint. try that on your Tomcat server and there is smoke coming out. I fixed this with a caching strategy. a band-aid approach. Not the right thing. Grab things into Excel. We have other technologies for separate instance documents. The versioning spec is helpful. Versioning spec has been split into six profiles. base ones do name space and mapping. So what I'm thinking, like what Walter said yesterday. We need a different way. We have a versioning spec for different models. and map into something neutral. Sounds like Codd relational databases. and algebra. I think this is the right conclusion. the meaning of XBRL will change. We're not using floppy discs, Web is ubiquitous. XBRL has to move into a newer media.
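A minimal sketch of the tree merge Herm describes, assuming each submission's presentation tree is a nested dict of concept names and each period's facts are a flat dict. The concept names and values are hypothetical, and real presentation linkbases carry ordering and labels that this sketch ignores. It shows why the merged table ends up sparse: a row that exists in only one period has gaps in the others.

```python
def merge_trees(a, b):
    """Merge two presentation trees (nested dicts) without mutating either."""
    merged = dict(a)
    for name, subtree in b.items():
        merged[name] = merge_trees(merged.get(name, {}), subtree)
    return merged


def rows(tree, depth=0):
    """Flatten a tree into (depth, concept) rows in presentation order."""
    for name, subtree in tree.items():
        yield depth, name
        yield from rows(subtree, depth + 1)


def render(tree, facts_by_period):
    """Build a sparse table: one row per concept, one column per period."""
    periods = sorted(facts_by_period)
    return [
        (depth, name, [facts_by_period[p].get(name) for p in periods])
        for depth, name in rows(tree)
    ]


# Two hypothetical submissions whose trees differ slightly.
tree_2007 = {"Assets": {"Cash": {}, "Inventory": {}}}
tree_2008 = {"Assets": {"Cash": {}, "Goodwill": {}}, "Liabilities": {}}
facts = {
    "2007": {"Cash": 10, "Inventory": 5},
    "2008": {"Cash": 12, "Goodwill": 3},
}

table = render(merge_trees(tree_2007, tree_2008), facts)
for depth, name, values in table:
    print("  " * depth + name, values)
```

Matching rows by concept name only works when the filings share concepts; where extensions or renamed concepts are involved, some mapping step would be needed first.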
My footprint on gigabyte. Main points were multi-entity. Means that challenging entities are emerging. Line item semantics, dimensional semantics, period version issues. In Japan that was an issue. There is the rendering tooling issues. Do I have Excel logic, Web features. Main thing is that our media needs to evolve. I have been working with XBRL for a number of years. So we need to move beyond that "USB stick".
Diane: Thank you Herm. Time for questions and coffee.
Question: Walter Hamscher, SEC.
You covered a lot of ground, all very interesting. Go back to two points. about the multi-entity work. and reconcile two points. Merge trees in different instances. notion of calculation tree. deep graph structures. you also said you need to put them into relational databases so they can compute. Are you looking at finite set? Those are two different computing paradigms? Is there a magic reconciliation.
Herm: The tree version in Japan example. It was reasonably static. the concepts used were so different. they put in a special label for what to line up; took human preparation. I don't have a good example for that. Here we are using US GAAP, so there should be good alignments in the three. I think those are implementation issues to be dealt with. I built these trees offline. when rendering very fast. under three or four seconds. but if there are 50K items, this needs to be revisited. Big name engines do have associative logic. not just reverse tables but graph structures.
Question: Brand Niemann: The most impressive SemWeb application is done with SPIN.
SPARQL Inferencing Notation. What they did was refactor artifacts into OWL. used SPIN to get to a new ontology. semantically harmonized artifact across the others. you may want to do that with XBRL. try them in this modeling environment. to get this higher form of Semantic interoperability.
Daniel Bennett, eCitizen Foundation: Have you looked at adding name spaces within the document?
What would appear first? Did you look at overlaying IDs? Citations or links into the document. A certain portion of presentation in the doc. Look at IDs and use URL# tag to get to a certain point.
Herm: These were single purposed solutions. XSLT used functions of XSL processor to extract model of taxonomy of the line items and dimensions. Also built. modeling or view. form fact items, built a structure easy to process using keys and functions. using XSLT too. a neutral form. a custom extraction easy to write rendering base. I don't think it's scalable. an example about doing single purpose things with taxonomy driven data. Japan example with binary caching is nicely deployable. but not scalable to European reporting. There I would want to link the whole thing and go to relational. Or maybe OMG idea. some kind of joining process that can be mapped. that tag to same thing.
Daniel: Helps to have a URI to deep-link into it.
Just wondering if you used IDs in that XSLT experiment to allow citation to the middle.
Herm: Just for a quick and dirty demo to use XBRL, not to express right way to go forward.
Diane: We could talk about that for a long time. Thank you Herm.
On IRC, BenjaminGrosof: I'm back on. Please let me know if you'd like to test the phone setup for my presentation, e.g., during this break. Waves to Karen, Dave, and Diane..
Hi Ben. Glad you are back on irc. You will be calling in soon I think. We did a test this morning and seems to be good. Let me go check with audio right now.
Yes, give me one second. I will speak to you and ask you to respond.
10:45 "XBRL Taxonomy Extension Comparability Issues and Potential Semantic Web Solutions", Ashu Bhatnagar, Good Morning Research, Softpark
Diane: Welcome Ashu Bhatnagar our next speaker.
I bring my experience from a Wall Street project. People there are waiting to see applications for XBRL data. and make more analytical tools. Before I begin. Like to know if you know the term alpha. on the sales floor. where brokers are trading stocks. people looking to find that profit. that requires all the data delivered to them. be in as much detail as possible. wide a level of filings as possible. institutional investors account for 75 percent of transactions. Requiring data from 3-4000 companies.
So when market opens. trading strategies before day begins. time is of essence. five to 15 minutes is it. cannot humanly analyze the data. Long tail to the data. Let me begin. Talk about taxonomy and how it fits into XBRL taxonomy and SemWeb space. talk about extensions as seen by filers, tech vendors, regulators. and some personal experiences. Institutional investor in Australia. And look at some potential solutions. So here is an XBRL Architecture. bringing this up for the SemWeb colleagues. they are familiar with a layered diagram. This is as close as you can get. Instance doc in yellow. middle is taxonomy schema. not get into the arrows and exports and their directions. Taxonomy extensions at the bottom. What is the Wall Street view of the data? They think data is a commodity. But organization of the data will be very valuable. Increasingly so. They will not be willing to share it. Precedent in how to search and organize data. Lots of people will be interested in this. High speed trading operates in milliseconds. Common financial metrics. then analysts. They need specific taxonomies.
On IRC, BenjaminGrosof: My updated position paper, and talk slides, are now posted on my personal web page http://www.mit.edu/~bgrosof/.
those financial metrics are used for metrics, benchmarking, etc. Role of investors. this is where SemWeb comes in. In world of financial systems, necessary for folksonomy. Most recent filings with XBRL. lot of time extensions being done. because the legal department requires exact commas and names be used. So label needs to be modified. Large part of extensions are at the label level. So may need to give both. XBRL standard will not change. If you also look at comparability. It's not at data or information level, it's at the analysis level. Moves through data supply chain. at decision-making level. Screen it and find comparables. This is from taxonomy guide. Not every company will find mapping tags relevant for their industry and needs. Two types. Entity Specific Extensions. Receiving Files Extensions. certain forms. Like Basel II. The folksonomy will soon be at bottom of this pyramid. the users will create new extensions.
XBRL Cloud. tool for top 100 filings of XBRL. It analyzes filings. and what kind of extensions. A good source to look at. Find what kind of extensions are being done by different companies. a lot coming in from legal disclosure. Information supply chain. The moment you start analyzing. Somebody has to normalize the data. Why put into SQL structure? Most of tools are built on SQL. Take XBRL data and raise it to a normal SQL database. Based on SPARQL, could be analyzed at file level. One of key challenges. is hardly any time data is useful to Wall Street. unless it's used, merged with non-XBRL data. compare the price, volume, index. Needs to be a lot of data. Somebody else has to create XBRL data for indexes, for example. That kind of information Thomson Reuters, FactSet, Bloomberg. that's where the big market is. Where comparability issues are for filers. Filers want to take off filings for extensions. Interested in deadlines, and error free filing. Do an extension for legal risk. For regulators, picture is like Wall Street analyst.
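A minimal sketch of the normalization step Ashu describes, assuming facts have already been extracted from filings into rows. The entity names, the extension tag `beta:NetSalesAndServiceRevenue` and all values are hypothetical. A small mapping table relates each company-specific extension to a standard concept so that ordinary SQL tooling can compare across filers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact (entity TEXT, concept TEXT, period TEXT, value REAL);
-- Maps an extension concept to the standard concept it is treated as.
CREATE TABLE concept_map (concept TEXT PRIMARY KEY, normalized TEXT);
""")

conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?, ?)",
    [
        ("AcmeCo", "us-gaap:Revenues", "2009Q2", 100.0),
        ("BetaCorp", "beta:NetSalesAndServiceRevenue", "2009Q2", 250.0),
        ("BetaCorp", "us-gaap:Assets", "2009Q2", 900.0),
    ],
)
# Somebody has to decide this mapping; here it is asserted by hand.
conn.execute(
    "INSERT INTO concept_map VALUES (?, ?)",
    ("beta:NetSalesAndServiceRevenue", "us-gaap:Revenues"),
)

comparable = conn.execute("""
    SELECT f.entity, COALESCE(m.normalized, f.concept) AS concept, f.value
    FROM fact AS f
    LEFT JOIN concept_map AS m ON m.concept = f.concept
    WHERE COALESCE(m.normalized, f.concept) = 'us-gaap:Revenues'
    ORDER BY f.entity
""").fetchall()

print(comparable)
```

The hard part is the content of `concept_map`, not the query; that mapping is where the comparability work, manual or automated, actually happens.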
Ideally extensions should be given to an XBRL jurisdiction. and they do the extensions. time intensive. If SemWeb tech or Google could automate, that would be good. For technology vendors. Before XBRL it was a world of HTML and PDF pages. Thomson Reuters takes this data and puts into a useful form. XBRL is like a zipped file. Still need to open and extract. Then there is a culture thing. Wall Street folks are very siloed. Finance and accounting experts, but don't know anything about technology. What are differences and nuances for dealing with that. Q3 2008 may not be an apples and oranges comparison. What does business need from the tech, and what can tech do for business. Comparative issues. How they are remapping. I have not seen major data aggregators on board with XBRL in a significant way. It's more a matter of when. not if. When they do extract, then adding taxonomy will be good.
Show you a hedge fund experience from Australia. They are not yet using XBRL; no XBRL tools to consume hedge fund data. Basic for semantics. built with RDF. Grid which is 2006 companies on Bombay stock exchange. Any data cell can be clicked to open property boxes. here are my tags and labels. each of tags is actionable. Finally, last point. about the trust issue. Where does that fit. Trust issue about who is doing what. Authored and edited by selected experts; by experts and non-experts; "buyers beware". Semantic Web architecture on left compared to XBRL architecture. It neatly overlaps into a vertical segment. Issues with proof and trust are well addressed in Semantic Web [stack]. I am looking for a man-machine-man system. Front end is humans. then machine mash-ups. Thank you.
Diane: Thank you Ashu. What you just described is part of the underpinnings of the financial ecosystem. about Trust. and brings up other issues. Look forward to further conversations. Introduce Mike Cataldo, Cambridge Semantics.
Walter Hamscher: It was a hot August day.
I was sitting and thinking about Excel spreadsheets. what I was just thinking about. And Karen Myers from W3C sent me an email. introducing me to a new W3C member, Cambridge Semantics. located not far from me. So I sent them some data. I sent URL to Diane and Dave. a couple data sets. that even within SEC, SemWeb is a good way to combine data.
11:15 "The agility of Semantic Web tools for XBRL and other financial data", Michael Cataldo, Cambridge Semantics
Speaker: Mike Cataldo, CEO Cambridge Semantics. Reason why I asked Walter up here, he won't ask me any questions! [laughs]. Start off with an overview of how we look at the world of Semantics. then go into the example we looked at with Walter, SEC. We think semantics is a whole new paradigm. We think semantic Web is Semantic Enterprise. What I see as a CEO is the cost to buy or build information systems. what you wish for. Relationship, more you spend to what you wish for. May not get custom because it's too expensive. But game-changing technology is something you can access. Like PCs. then the Web. We think semantics is in the same space. allows us to offer advantages to those who come first. revenue increase, cost containment, risk mitigation. We met with a large financial institution president. And he said, 'I asked our crew about our exposure with Lehman Bros. and they didn't know'. Yesterday's game changing tech is today's mainstream tech. So what's my exposure to whatever? Pre PCs, took a long time to answer. now look at traditional data structures. Semantic technologies allow everybody to play very quickly. users only need to describe it. then you can get it. allows integration to happen at the desktop. real time insight into the data; we'll show some examples. reduced time and more play. XBRL meets Semantic Web standards. So what do I mean by that? Compliance? No. Lee thought it might be Internet dating? No [laughs]. XBRL takes what was unstructured and adds structured data. Semantic standards add meaning to that data. Now it becomes very consumable, usable, and agile. Bring Walter back in. This is your slide. This is an XBRL schema.
Walter: We have 437 live filings. mostly 10Qs and 10Ks. with that complexity with trees. it boils down into. 9 tables. this database on SEC Web site. has this structure. So what I said yesterday was take a slice of it, not boil ocean. They are looking at facts. public float value. small sub-source of data. I encourage you to play around with that data. The meaning is obvious; it's a Microsoft Access diagram.
Mike: Yes, the meaning idea is obvious to the techies, but not non-techies. So this is structure, but need to add meaning. Add tools so that non-techies can use that information. and combine data with other data. with live use. and share data with someone else. drill back to see source. So here is a scenario. Identify cases of naked short selling. by semantically linking with fails-to-deliver reports. So here is the source data from Walter, SEC. We dropped it into an Excel spreadsheet. Plug-in called Anzo for Excel. allows spreadsheet data to be linked to SemWeb. what happens here. Ontology is created from the spreadsheet data. We don't want to create one giant ontology in advance. create one focused one to connect to others later. What's in this box. is a fails-to-deliver ontology. Idea is to do simple things like link contents of spreadsheet to concepts in the ontology. See highlighted here. is the fails data. This data matches this concept. and has in common this with other data. the concepts match. part of semantic fabric and available. Another product we use is Anzo on the Web. Here is how it works.
You have the ontology exposed. Go through and choose the data. Pull data into a view. can sort it. reformat it. and filter or drill down; term is faceted browsing; we're adding a facet. Narrow down by industry. filters show on left side. Petroleum drilling list. from same source data. and look at that any way we want. Look at if it fails to deliver as percent of market cap. Pull that in, create a scatter plot view. Something is wrong. So question is, what happened. Go back to source data. This process takes about one hour. now we go back and look at source of bad data. Value of these certain companies. expressed in billions, and it's a percentage. Go back and look at graphed data which tells a story. Shows a high increase of fails-to-deliver in a time frame. which coincides with the mortgage crisis. So merging XBRL with Semantic standards. we can combine data from various sources. learn more from it, combine it, share it.
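Editor's sketch (not part of the demo): the bad-data hunt described above, where fails-to-deliver as a percent of market cap looked wrong because some company values were expressed in billions, can be illustrated with a simple plausibility check. The class name, field names, threshold and sample values below are all hypothetical; this is not how Anzo works internally.

```python
from dataclasses import dataclass


@dataclass
class CompanyRow:
    name: str
    fails_value: float   # dollar value of shares that failed to deliver
    market_cap: float    # market capitalization exactly as it appears in the source


def fails_pct_of_market_cap(row: CompanyRow) -> float:
    """Fails-to-deliver expressed as a percentage of market cap."""
    return 100.0 * row.fails_value / row.market_cap


def suspect_rows(rows, max_plausible_pct=100.0):
    """Names of rows whose percentage is implausibly large (assumed threshold)."""
    return [r.name for r in rows if fails_pct_of_market_cap(r) > max_plausible_pct]


# Made-up illustration data: Company B's market cap was entered in billions
# (2.0) while everything else is in dollars, so its percentage explodes.
rows = [
    CompanyRow("Company A", fails_value=5_000_000.0, market_cap=2_000_000_000.0),
    CompanyRow("Company B", fails_value=5_000_000.0, market_cap=2.0),
]
print(suspect_rows(rows))
```

A check like this only points at rows worth drilling back into; deciding which unit the source intended still needs a look at the source data.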
A couple more views. Then move onto Recovery.gov stuff. Same data arranged by company. narrow down to banks. XBRL Meets Semantic Standards; Love at First Site? Great Match. So Recovery.gov. If we want to answer the question: Is there a correlation between. Recovery.gov has spreadsheet format about how dollars were spent. and Census.gov has spreadsheet. So now three different sources in a spreadsheet. So we answer the question of job creation by state graphically on map. shown by state. or create other views of same data. Process is pretty straightforward. Link source data to semantic fabric. create ontology, go through process. pull data into visualization tools. In this case, multiple data sets. add filters. narrow down. pretty straightforward. When you can combine structure with meaning. and apply semantic standards, it brings our ability to understand data to a whole new level.
Eric Cohen, XBRL GL.
Mike, you said something different from what I say. You took a jump I did not catch. XML is about structure and Semantic adds meaning. XBRL describes and brings a meaning. how to structure things. using linkbases, this is where you can get meaning. and how to relate in different way. What I didn't see was Semantic Web in there. Saw tools with bells and whistles. seeing XBRL tools. data from Walter. but I didn't see the rich metadata and taxonomies in XBRL. I missed where OWL, RDF, added to XBRL. and added the meaning which you say is structure.
Mike: As far as not seeing OWL. our approach is to keep that away from the user. For a non-technical user we want that to sit underneath the covers.
Eric: Can we peel off and see the inside? I have seen demos without semWeb. make sure I can see it.
Mike: Lee can give you a much better level of meaning.
LeeFeigenbaum: I think it's a good question but hard to do justice to it. I am not an XBRL expert. XBRL has done a lot of work to overcome lack of meaning in XML. You have added lots of levels of indirection. to capture properties in XBRL. When I look at that from outside, it's something specific to XBRL world. by using W3C semweb standards, I have a flexible set of tools. to link in other data. fails data. doesn't have consistent filing with companies. This example is a tool box to let end users do things on the fly. One set of data uses CIK, others use CUSIP. We are linking on company name. that's a partial answer to your very big question.
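Editor's sketch (not shown in the session): Lee's point that one data set is keyed by CIK and another by CUSIP, so the link is made on company name, could look roughly like the following. The normalization rules, suffix list, field names and identifier values are assumptions made for illustration, not the method Cambridge Semantics used.

```python
import re

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "ltd", "llc", "plc"}


def normalize_name(name: str) -> str:
    """Lower-case, strip punctuation and drop trailing corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def link_by_name(filings, fails):
    """Join CIK-keyed filings to CUSIP-keyed fails rows on normalized name."""
    by_name = {normalize_name(f["name"]): f for f in filings}
    linked = []
    for row in fails:
        match = by_name.get(normalize_name(row["name"]))
        if match is not None:
            linked.append({"name": match["name"], "cik": match["cik"], "cusip": row["cusip"]})
    return linked


# Made-up records; the identifiers are placeholders, not real CIKs or CUSIPs.
filings = [{"name": "Example Oil Corp.", "cik": "0000000001"}]
fails = [
    {"name": "EXAMPLE OIL CORP", "cusip": "000000AA0"},
    {"name": "Unrelated Bank, Inc.", "cusip": "000000BB0"},
]
print(link_by_name(filings, fails))
```

Name matching is fragile (renames, subsidiaries, abbreviations), which is one reason the entity identity topic comes up again in the break-out sessions.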
Ben Hu, DAMA.
For enterprises. you mentioned don't worry about knowledge base or domain, come later. Do you find semantics is incompatible. it's a quick thing and get it done.
How do you manage that? It's not always semantically re-engineerable. It's not easy.
Mike: No, we have not run into that. some different schools of thought. We want to provide tools for enterprises to derive quickly. Choice to implement quickly and bring together later. or define everything up front. so our architecture brings together smaller ontologies first.
Diane: I love the boil the ocean analogy. appreciate the demo today.
On IRC, BenjaminGrosof: wrt Eric Cohen's question about how Semantic Web helps introduce meaning, there are two basic complementing aspects. The first is expressive and rigorous logical knowledge that precisely defines the intention of assertions in terms of what conclusions they sanction. The second is (hyper)textual documentation of a human-readable kind, e.g., for primitive/leaf concepts.
11:45 "Opportunities for Semantic Web knowledge representation to help XBRL", Benjamin Grosof, Vulcan Inc. [remote]
Diane: Next, we'll bring up Ben Grosof.
Speaker: Ben Grosof, Vulcan, Inc. I will be speaking about opps for semweb and knowledge representation for XBRL. I work for company of Paul Allen. Worked on knowledge representation. also advise venture capital. Have been working on XBRL since 2000 at MIT. From 2004-2007 I was a scientific advisor to XBRL. and also helped connect XBRL to W3C. Feel like this workshop is a bit like my baby, too and delighted to see it happen. I have worked a lot on semantic rules. Some industry standards on rules. Rule Interchange Format and Rule Profile. and value around SemWeb. some ideas in Oracle and IBM products. I will be speaking about use of knowledge representation techniques. in XBRL and realm of apps that XBRL addresses. First talk about overall relationships.
History of Parallax. XBRL and SemWeb evolved separately in parallel. communities non-overlapping but increasing synergy. Large opps for synergy to leverage and share technical approaches and application domain. Overall three axes. Better expressiveness and understanding of expressiveness. Techniques for interoperability. what you can do and what will be tough. and third, Performance optimization techniques. Dave Raggett's talk. and Mike Cataldo and Lee Feigenbaum's talks address the knowledge representation issue. RDF is better sharing than plain XML; a complement not a substitute. think of it as a layer on top. having directed graphs, tree, traversal is what you care about. rather than order traversal. all good things. Focus on semantic rules. better for sharing than business rules. One important qualitative point. They can handle exception, change in update and formulation relatively gracefully. Talk about default, higher order. I'll walk through some examples.
Another important thing with KR. is sophistication and knowledge acquisition. targets business users. area of UI and authoring. of taxonomy, mapping relationships, taxonomy extensions. Queries used; techniques such as rascal. Semantic wikis. Semantic MediaWiki Plus run by Vulcan. Slides and position paper are on my Web site. PDF and PPT have links you can click. Via knowledge interchange is important. being able to do automated translation between KR formats. support business rules systems. semweb community has put energy into. XBRL not as much in systems rules area. existing compliance systems. are all in business rules systems. draw on SemWeb community there will be an advantage for XBRL. A third area where SemWeb can offer to XBRL. ontologies and knowledge bases. so if working in Oil and Gas, or HCLS. Fourth is Virality. SemWeb has penetration in ecommerce, health, media, social networking, marketing. has an intersection with the XBRL ecosystem community.
In turn, SemWeb folks should collaborate with XBRL folks. Financial reporting. XBRL is in every aspect of business, gov't and non-profit. it's quite practical, good platform. for adoption. XBRL offers a technical challenge to firm up connections to XML Schemas. some connection there but could be made better. Next drill down into Semantic uses for XBRL.
Semantic rules good for mapping. define an extension, map one to another. this is fairly demanding from an expressiveness viewpoint. Let me give examples. someone gives info without pricing. or shipping. When you look at e to e ratios.
in financial ratios. not just what fiscal year is what Ashu mentioned. but last four quarters of reporting. and the next quarter. with or without appreciation. what changes under accounting rules. revenue for a sub-category. See an example shortly. More generally, the issue of mapping between different sources and consumers of financial info. each has its own analytic or pro forma view. different jurisdictions. another issue, big one, dealing with exceptions and one-time events. A few years ago I worked as a financial analyst for a Wall Street firm. they published a quarterly newsletter; I prepared ratios. this is pre-Web. the footnotes were where all the real action was. Michelle's slides yesterday said it was all about that. An example from those days. was sale of mid-town NYC building. Oh, so that's why they make it look reasonable even though they lost money on loans.
Another important use of semantic rules for XBRL. XBRL has info to be integrated with call rules, regulation, laws. revolve around trust, privacy, security, access control, authorization. authorization to see an account, compliance, governance and other operations. also rules in analytics and monitoring. may be contextualized. triggered actions based on monitoring. SPARQL and XQuery can provide. ways for querying. decisions and triggered actions.
Example of an exception in an ontology translation. Let's say a company is reporting. where categorization includes price of small company acquired for its intellectual property. This is common in IT or biotech. but may want to exclude certain info. Get the feeling. the first is the rule. Normally bring it over. Next rule says, what counts for acquisition, does not count. So we give that another label, acquisition for not operating. Then we see a rule which is an exception. Fourth rule counts as an acquisition. Then we have other information. about R&D salaries. and acquisition amounts. and what it includes. what counts or does not count toward an R&D operating costs. map from the pro forma. which may use a taxonomy extension from one to the other.
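Editor's sketch (not from the slides): the pattern Ben describes, a default mapping rule that is overridden by a more specific exception, can be approximated with prioritized rules where the highest-priority applicable rule wins. This is plain Python with hypothetical rule names, item fields and priorities; it is not RIF or SILK syntax and does not reproduce the four rules on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    priority: int                      # higher priority overrides lower
    applies: Callable[[dict], bool]    # does the rule fire for this item?
    counts_as_rnd_operating: bool      # conclusion the rule draws


# Hypothetical rules: a default, plus an exception for acquisitions.
RULES = [
    Rule("default: carry R&D items over", 0,
         lambda item: item.get("category") == "R&D", True),
    Rule("exception: acquisition price is non-operating", 1,
         lambda item: item.get("kind") == "acquisition", False),
]


def counts_toward_rnd_operating(item: dict) -> Optional[bool]:
    """Return the conclusion of the highest-priority applicable rule, or None."""
    applicable = [rule for rule in RULES if rule.applies(item)]
    if not applicable:
        return None
    return max(applicable, key=lambda rule: rule.priority).counts_as_rnd_operating


# Made-up line items from a source taxonomy.
salaries = {"category": "R&D", "kind": "salaries"}
ip_purchase = {"category": "R&D", "kind": "acquisition"}
print(counts_toward_rnd_operating(salaries), counts_toward_rnd_operating(ip_purchase))
```

Adding a further exception means appending one more rule with a higher priority, rather than rewriting the default, which is the graceful handling of change that the talk attributes to semantic rules.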
SemWeb Rules. user integration, business partner in M&A setting. before absorbed. Familiarity and training. standardized rules better than a one-off system. Easier to modify. better quality and transparency. open literature. Provable guarantees help with governance. Reducing vendor lock-in. and expressive power. and various kinds of reformulations.
Diane: Five minutes.
Ben: Background for KR for meaning. slide 16. declarative logic. basis for almost all structured knowledge management. databases, semantic rule standards, ontologies.
Semantic Web standards "stack". what's new is logic framework. SILK is a large research program. brings key features you need for ontology mapping. closely integrated actions, higher order for reformulations. there is a lot more detail available through links in deck. and the position paper. slide 20 wrap up. What's coming with Semantic Rules. They will increase in adoption. and in expressiveness. by W3C extension for standards. plan to propose a new dialect. under framework. similar to SILK. KR challenges. What can't we do? numerical reasoning integrated with symbolic reasoning needs more research. specifics of money, time and date. convenience around contextual reformulation for ontology mapping. OWL and RDF are weak. To-dos for you and the community. Learn about SemWeb and KR if you are XBRL person. think about strengths and weaknesses. plan ahead about design choices. More info on Web page. Soon more detailed tutorial presentation on SemWeb Rules. I'll be giving at ISWC in Chantilly, VA later this month. Thanks for your attention.
Diane: Thank you for the depth and overview of bringing the two communities together. there is a higher order of expressiveness down path of SemWeb tech. but still needs to be collaboration of domain expertise in XBRL taxonomy. Where do you see that collaboration happening? Still requires a lot of domain expertise.
Ben: I think you put your finger on something important. XBRL community is similar to other industry consortia. that have developed detailed taxonomies in XML or OWL. things like ACORD. those best owned by industry sector, in this case XBRL International and other financial industry. What's critical is bring SemWeb rules people to engage in that. Would help if governmental and private sector would bring resources to that. A lot of this is development and standards work. I am encouraged by success of XBRL in mobilizing SEC and adopting standards. Given financial melt-down and its impact, important to see gov't resources dedicated. Like tens of millions, not single digit millions. per year over the next year or two. Ramp up to hundreds of millions. Just in US alone. Critical issue is how to make the case to whom. Look at Recovery.gov and amount of money Treasury is putting into things. a percent should be going into oversight.
Diane: I think there is a huge consensus in this room on that point. All the time. People are hungry looking. Thank you for your time.
Diane: Half hour break. We are running behind. Over lunch we'll talk about break-out topics.
12:45 "Challenges for combining different sources of financial data", Edward Curry, DERI
Ben Manning, Intuit: You have a diagram of funnel.
of data for users. This approach seems to require anticipation of what users need. What about something more reactive to feedback. Something people bring up and retain. Not just rely on queries that are a one-way flow. People could rate it and there would be preservation of the knowledge.
Ed Curry: Yes, queries are performed by the users.
and do take into account the presentation of results.
Diane: Thank you Edward. Next from DERI is Sean O'Riain.
1:15 "Social Semantics and Linked Data for improving access to financial data on the Web", Sean O'Riain, DERI
Speaker: Sean O'Riain. I am presenting on behalf of Alexandra Passant. Look beyond boundaries. not just financial facts. Ed was speaking about the heavier end of our work. This is the lighter end. We looked at Financial Linked Data. and Social Linked Data. People were talking about toxic assets. would be good to link trend with facts. make information more transparent and discoverable. If you can link data you can traverse it. Linked data to enable transparent discussions. Four key principles for linked data. Wikipedia also has a good primer on linked data and Tim Berners-Lee's TED talk 2009. Four principles. URIs as names for things. Use HTTP URIs to look up those names. Get back other info and link to A, B, C; get to a linked data graph. Essential thing is you can jump across silos. In terms of technologies. for enabling linked financial data. You need to represent the data you are looking at. Ontologies will do that. Key point is that the ontology tells you about the semantics. so you can combine statements. Different ways to export and wrap RDF data. Three ontologies. community, people, linked data definitions. pull information, people, and conversations together. This morning Dave showed you the Linked Open Data Cloud. you just link to the data. BBC is a good example. they use dbpedia as their source. to present information on their Web site. Get information on geonames. Go to geonames to get geo info; get topics; dip in and link information. This allows an advanced querying capability. A large linked data cloud. SPARQL is the W3C standard for querying the RDF graph. It has some limitations, but looking to extend RDF model to include. a [?] aspect. See comparisons among the diverse data.
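Editor's sketch (not part of the talk): the idea that HTTP URIs name things, and that following links lets you jump across silos, can be shown with a toy triple store in plain Python. The example.org URIs, the vocabulary and the data are made up; a real deployment would use an RDF library and SPARQL rather than this hand-rolled pattern matcher.

```python
# Each statement is a (subject, predicate, object) triple; URIs are hypothetical.
VOCAB = "http://example.org/vocab/"
ACME = "http://example.org/company/acme"
TEXAS = "http://example.org/place/texas"

TRIPLES = {
    (ACME, VOCAB + "name", "Acme Drilling"),        # from a "financial" data set
    (ACME, VOCAB + "locatedIn", TEXAS),             # link into another data set
    (TEXAS, VOCAB + "name", "Texas"),               # from a "geographic" data set
}


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def names_of_linked_places(company_uri):
    """Follow locatedIn links from a company, then look up each place's name."""
    names = []
    for _, _, place in match(s=company_uri, p=VOCAB + "locatedIn"):
        names.extend(obj for _, _, obj in match(s=place, p=VOCAB + "name"))
    return names


print(names_of_linked_places(ACME))
```

The two-step lookup in `names_of_linked_places` is the kind of join a SPARQL query with two triple patterns would express declaratively.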
Social Web. main driver behind Web 2.0. collaborations: blogs, wikis, tweets. Discussions around topics of interest. No easy movement across wikis and blogs. big problem. opportunity for people to comment on this data. Begin to comment on financial data. Apply linked data principles to that discussion forum and make data machine readable. Would provide move to greater insights of financial discussions. part of the ecosystem if you like. Two vocabularies. One is FOAF, friend of a friend. allows you to aggregate discussion in forum or blog. very lightweight. easy to integrate, community to support, APIs exist. Here is the ontology. Contextualize and publish a conversation. get a greater level of analytics; but can still aggregate the data. An example that we did, a mash-up. A search engine Sig.ma. It aggregates people's data. uses a widget. Can see who is talking about the same topic. Can see across different discussion groups, bulletin boards. Actually doing a variation of this for [?]. Social semantics and financial linked data. Look at linking the agencies and grants. Look at bottom. see a post about a grant, in relation to an agency. See who is saying what about a topic. If you have a library of ontologies, you can use own, or extend it. Data transparency. Identifying authors. is an employee talking? Are they talking about mergers and acquisitions. You see that first level of trust. Begins to get into provenance discussion. hopping across communities. Idea here is, can you come up with a hot topic before it happens.
Is there something hot that we should pay attention to. Looking at community effects. Conversation of topics. Looking at non-technical users; light-weight. Use URI identifiers. We will look at privacy and trust issues. All those questions about where it came from, do I trust where it came from. for what reason is information being provided. Linked data; how, what should we link. Start at ground level slowly. and layer on. get the aggregated effect. Lets you do that lightweight, then add on the analytics. Get our governments to share public data. could link more data; need an ontology.
Diane: Thanks very much. I am interested in the noise on the Web effect. At Harvard Law Lab there was a discussion about crowd sourcing to get feedback. Seems that social ontologies. would be good way to crowd source if there is noise.
Sean: If you can link you can aggregate. even mediocre analytics. can give you a big bang for the buck.
Diane: Identification and citation of using URIs in the post.
Sean: Yes, you can link back.
Walter: If URIs are implicit in the Web. is there a database of URIs worth looking at?
Walter: It's implicit in the structure.
Sean: Use a URI at any point.
Walter: Can I make predicates that decompose the URI? I could build a database that references relationships.
MateusCruz: I will present on behalf of Paulo.
Walter: You are really pointing out the Web object. it's not really like a database.
Walter: here is the revenue number for 1996, are you really talking about that particular number, as for all you know there may have been an amendment later on. everything in this column says it's about revenue. May want to talk about this in trust and privacy. not everything about company on Web is reliable. I'm done.
Patrick Slattery, Deloitte: Thanks for your presentation. Have you considered some form of tiered participation to manage noise? And reason I ask. Beyond commentary, have you looked at ways to share models?
With financial side, needs to be trustworthy.
Ashu: I would like to add to Walter's comments. What I believe SemWeb brings beyond identification of URIs. if we look into databases. The result with Google could be 10K matches. the question has 10K documents. so SemWeb lets you refine your query further. to an advanced search. That is rules uniqueness.
Diane: Thank you very much Sean, very thought provoking.
1:45 "Multidimensional queries on financial data", Matheus H. Silqueira on behalf of Paulo Caetano da Silva, Central Bank of Brazil [remote]
Diane: Mateus could you introduce yourself.
First apologize for Paulo Caetano da Silva. who could not present today.
Mateus explains XML technologies related to project: XLink.
XLink Based Data Metamodel. First we did mathematical formalization of the data model. We had some changes, additions based on XBRL changes.
Summary of the data used by XPath+. extension based on XPath. allows queries on a document in which links are used to represent information. Useful in XBRL. widely used. You can navigate in linked space.
We talk about query statements of proposed language: LMDQL. multidimensional queries. First one where we can specify a variable for query. and a different way to represent. under assets. left part of slide we have government, bank, and private bank. two ways to bridge the bank node.
LMDQL Operators. one of main operators is OperatorDefinition. create in real time. can be stored in a library so it can be retrieved for later use. and for horizontal analysis. When they have, for example.
Diane: You have to wrap up. analysis operators are based on ?.
Scribenick having difficulty hearing.
LMDQL Implementation Aspects.
Diane: Thank you Mateus for stepping in for Paulo. We are going to switch order and do break-out sessions before the last panel.
2:15 "The MUSING Approach for Combining XBRL and Semantic Web Data", Christian Leibold, Semantic Technology Institute, Austria
I am happy to speak for a lot of people. The Musing project. or Initiative. a project funded by EC. Semantic Intelligence. I'd like to talk about our motivation, how and why we use ontologies. We integrated XBRL. and tell how we accessed the information. and some conclusions.
The Musings initiative. showcase in three vertical domains. Finance sector, credit risk. Internationalization, risks evolving from business in int'l context. Operational risk management and education tools. Focus on IT intensive organizations. Many people in the project. Christian names participants.
Organizational structure. Semantic based approach. not just about integrating data but about integrating the knowledge.
slide: Linguistic Structuring.
a German example. says a new member of the board was appointed on March 7. there is a predicate. We can do that in a couple languages: German, French, English, Italian and Dutch. Example from Belgian national bank. Semantic specification drawn from annotated funds. Information fusion. Combining XBRL structural recap. Deriving ontologies from XBRL structures. Linkage between a taxonomy and rest of our ontology family. points to context and to an item. to whole context or to a single item. with all the related information. we can point to single item or the whole instance. Context concept and domain ontologies. Musing knowledge base is a legal entity.
Example of data associated with XBRL. Each box is one ontology, one problem area. generic and upper level information. more specific info farther up you go.
Use case from several work streams. credit risk. regional indicators from internationalization. This ontology structure was used as a schema for a repository. Do a full closure and dump into a data base. Use SPARQL queries to select an update.
Diane: Save time for questions; go ahead and conclude.
Christian: To conclude. Opportunities for continued work. We did some annotation in context of business rules. We integrated XBRL to a knowledge base. Happy about that. Thanks for your attention.
Diane: Thank you for traveling to be here.
The depth and breadth of projects. are impressive. Like to collaborate with you to cross-pollinate. the Deutsche GAAP.
Christian: We have not looked into US GAAP; focus was EU. Israeli XBRL may be similar; we have some experience with that. Come to point where we have reached sufficient quality.
Diane: Other questions? Thank you for presentation.
Dave: We hope to hear from Roberto in Spain.
2:45 "Ontological challenges for financial information: some lessons from the Rhizomik initiative", Roberto García, Universitat de Lleida, Spain [remote]
Diane: we are putting up slides; can you introduce yourself and topic.
Diane: Thanks, Roberto.
Speaker: Let me introduce myself.
(the phone sound quality deteriorated badly at this point)
On IRC, BenjaminGrosof: Diane and Dave: If you'd like me or other remote people to join in a break out session, please let me/us know how. Otherwise, I'll assume I won't. Thanks.
Diane: ten minute coffee break. then do the break-outs. 45 minutes each. moderator and a spokesperson. and a short presentation.
Break out sessions
Diane: Welcome back from the break. Here are some potential break-out topics.
1. Solving the Entity Identity Mismatch.
Show of hands on this topic? [8 hands].
2. Financial EcoSystem.
We have a wonderful reporting supply chain diagram. Now that we have talked about silos. and the XML-enhanced silos. to show to outside world. what about diagramming that. [3 hands].
3. Dealing with text in financial information. processing that text. we heard about inference. hand on finding text. [1 hand].
4. Collaborating and sharing ontologies. DERI, Musings, Rhizome. [1 hand].
5. Resourcing the collaboration. [3 hands].
Diane: So maybe combine five and six.
6. Potential pilot projects. looking for resources and funding. [four hands].
Diane: I hear entity identity and resourcing. Other topics?
Dave: There are two rooms with tables and chairs, and space in foyer areas.
Diane: Let's use the two break-out rooms. take a flip chart board.
Appoint a facilitator and a spokesperson to report findings. Next steps and key issues addressed. Anyone else left. Who's left, I'll work on the ecosystem diagram. stay in this room.
Be back here at 4:20pm and ready to present your findings.
Get your coffee!
On IRC, BenjaminGrosof: I'll be off for about an hour for another meeting, will try to rejoin for the final discussion after the breakouts.
On IRC, GeoffShuetrim: Have the breakout sessions ended yet?
Ending in 5 minutes.
almost ready to present.
We come back from the break out sessions.
There were 3 groups: a) financial entity identity management, b) resourcing the collaboration and potential pilot projects, and c) diagramming an interconnected financial ecosystem.
Diane reports on the ecosystem group.
Ecosystem break-out group
We didn't get the diagram done in the 45 minutes, but we did get some useful insights, e.g. producers and consumers. We looked at what we could tackle, e.g. the concept of findability on the Web. By making the data easier to find, more people will come. What kinds of content are out there, e.g. CSV, Excel spreadsheets, XML files. We should encourage meta enhanced content e.g. XML and RDF rather than CSV and Excel, although those can be good with supplemental metadata, provided this can be located by search engines. The other piece of the puzzle is to work with the search engine vendors, and to help them to support financial data better.
We would like to work with search engines to enable them to exploit richer metadata. There are also the considerations of what kinds of queries people make, and how they can be handled efficiently. Some issues we discussed include:
- authentic relevant source to queries
- repeatability, e.g. same results 20 mins later
- concerns around privacy
- entity naming issues
We would like to continue this discussion especially around findability, and wonder what would be the appropriate forum for that.
Entity Naming break-out group
Cate reports on the financial entity identity management group. [20 mins into recording]
(we have flip charts and a typed up summary).
Thanks to everyone who participated, we had quite a lively group. In the financial services space there are many naming schemes for financial entities. This can create problems for mashing up data from different sources and creating a shared view. An open source system for naming would greatly help. This could cover the relationships between entities and the hierarchy of subentities within a given entity.
We quickly went through the systems that are out there:
- CUSIP and DTCC
- ISIN (Europe)
- SEC CIK
- AVOX (Deutsche Börse, open source wiki solution)
- IRS system
Please advise us of other relevant systems. We quickly brainstormed about these and desirable characteristics of naming systems. The semantic possibilities and all the information out there that you could bring together. It was mentioned that the States already register companies, and Delaware has an extensive, reliable database that registers the corporate entity at the source. Unfortunately that is a proprietary dataset, and it is unlikely that we can convince the States to put it into the public domain.
The DNS model is one that is familiar, and one that the technical participants noted some problems with. The wiki model as used by AVOX has some possibilities. Then we started getting into the hard core of what opportunities there are. The reality is that we already have major naming systems such as that held by the IRS. How could we merge these into the public domain? That's where the discussion got really interesting. The most promising approach is the merged agency model that would be a harmonization process among the regulators and government agencies that oversee financial institutions. Everyone maintains their own names, but exposes enough information to allow for comparison of names across systems.
The importance of this dialog is in being able to merge datasets. But it also allows us to index companies across this data, which would be of incalculable value.
Linda's notes on naming entities: We talked about the different existing models, both public and private. We also talked about the attributes of good naming systems. There would need to be some sort of governance or responsible party who would have something to say about what is reasonable. There would have to be properties like persistence over time, it needs to be international, companies could identify themselves, and depending on where the information came from you would have varying degrees of confidence, presumably if it came from regulators, you would have higher confidence than if it came from a Wikipedia type of forum. There would need to be a confidence level indicator.
There is a great desire to capture the hierarchical organization of companies, and the details of ownership. A myriad of attribute type information. The name of the institution, where it is located, what kind of a business it is. The scheme needs to be extensible, so that companies could add a bit more information about themselves.
Someone notes that while the Delaware database is closed, New Jersey is operating a public naming service for businesses. You can get to it on the business gateway from the Department of Revenue. It includes the name of the company and the owner.
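Editor's sketch (not produced by the break-out group): the desired properties above, identifiers from several schemes kept side by side, a confidence level that depends on the source, and a hierarchy of entities, could be modeled roughly as follows. The class names, the source categories and the confidence scores are hypothetical, as are the identifier values.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical confidence per kind of source; regulators rank above wiki-style forums.
SOURCE_CONFIDENCE = {"regulator": 0.9, "self-reported": 0.6, "wiki": 0.3}


@dataclass
class IdentifierClaim:
    scheme: str   # e.g. "CIK", "CUSIP", "ISIN"
    value: str
    source: str   # key into SOURCE_CONFIDENCE


@dataclass
class Entity:
    name: str
    claims: List[IdentifierClaim] = field(default_factory=list)
    parent: Optional["Entity"] = None   # owning entity, if any


def best_claim(entity: Entity, scheme: str) -> Optional[IdentifierClaim]:
    """Pick the claim for a scheme that comes from the most trusted source."""
    candidates = [c for c in entity.claims if c.scheme == scheme]
    if not candidates:
        return None
    return max(candidates, key=lambda c: SOURCE_CONFIDENCE.get(c.source, 0.0))


def ancestors(entity: Entity) -> List[str]:
    """Names of the owning entities, nearest first."""
    chain = []
    node = entity.parent
    while node is not None:
        chain.append(node.name)
        node = node.parent
    return chain


# Made-up example: two conflicting CIK claims for the same subsidiary.
holding = Entity("Example Holdings")
bank = Entity(
    "Example Bank",
    claims=[
        IdentifierClaim("CIK", "0000000000", "wiki"),
        IdentifierClaim("CIK", "0000000001", "regulator"),
    ],
    parent=holding,
)
print(best_claim(bank, "CIK").value, ancestors(bank))
```

Keeping every claim, rather than overwriting, matches the merged agency idea that each party maintains its own names while exposing enough to compare across systems.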
Resourcing the collaboration and potential pilot projects
Karen reports back on the resourcing break-out group: Since we are going last we had the luxury of throwing together a couple of slides. Our group looked at resourcing and pilot projects. We took a broad high level look, and focused on the benefits of bringing the appropriation process and the reporting process together at Federal, State and Local level.
We want to show that this has real value and that it isn't just smoke and mirrors. We don't list XBRL and the Semantic Web, which at this level are enablers. We talked about approaches, and feel that the state library communities have rich experience in working with taxonomies that could be usefully brought to bear.
Tim McNamar notes that we do have an XML-based system in the US Senate for draft bills. It wouldn't be a big stretch to add the proposed appropriations to that for tracking purposes. We have a nascent project to do two things. We have an agency in the Department of Defense that wants to do that and wants to track the appropriations right out to the recipients and to be able to feed the results back to Congress. What do you do to learn the lessons and roll it out to the entire government? We have come up with the idea of raising some private money to support that as a public/private partnership. I guess that we will be able to set up a foundation to manage this within a few months.
Tim notes that only a small part of the government knows about XML, so the challenge will be to reach out to good success stories and educate people on how to replicate them. We want to look at Medicare and Medicaid, and that will necessitate state taxonomies. He would like to get help with making all of this fly and encouraging greater transparency in the use of government funds.
Diane wraps up with a summary of next steps and thanks to the organizers and to the FDIC for hosting.
Diane: So how do we continue the conversation with various communities, and how do we get that funded?
How does W3C work? Through the chartering of Working Groups or Interest Groups. These involve some staff support. This isn't the case for Incubator Groups, which offer the lowest costs. So the Identity Group could possibly come together as a W3C Incubator Group.
Diane: So if we could create an Incubator Group on this topic, for the financial web and ecosystem. Karen, can we keep the mailing list open, and do some mailing out of findings and reports, and maybe some wikis?
Karen: The process is that we need three member companies to start an Incubator Group. A mailing list would be really easy to set up, and we could also provide a wiki to go along with that.
Diane: a mailing list would be a great start, and we need to keep this at an international level, and find a way to fund the conversation.
Tim: As discussed in the break-out session, I cannot raise money for an international activity at this time, not until we have some demonstrations. I cannot sell that now.
Diane: It's the end of the day. Would anyone like to share comments, or point out glaring holes?
Dazza Greenwood, eCitizens Foundation.
We have been talking about financial accounting for business. There is a broader context, with bigger business benefits. What you get if we do this right is the transformation of business itself.
Dave: We will be taking the recordings and putting together more detailed minutes from them and the IRC log, as well as the reports from the break-out sessions. We will get that back to you, and set up a mailing list. For an Incubator Group you need a scoped charter, and we can set up discussions to facilitate that.
Diane: There may be other identity folks who want to incorporate their requirements into other groups. Thanks again to XBRL International, W3C and FDIC for hosting. I look forward to the next time.
On IRC, GeoffShuetrim: Thanks again for supporting my remote attendance. It has been very informative.