W3C

Schema.org Panel Minutes

Minutes of the public panel at the Semantic Technology & Business Conference, San Francisco, on 6 June 2012

Panelists:
Dan Brickley (schema.org), R.V. Guha (Google), Steve Macbeth (Microsoft), Peter Mika (Yahoo!), Jeff Preston (Disney Interactive), Evan Sandhaus (The New York Times), Alexander Shubin (Yandex)
Moderator:
Ivan Herman, W3C
Scribe:
Karen Myers, W3C


Ivan: Before we start the real thing
... just realize that there is a video recording of this session
... know that you are being recorded.
... Welcome to schema.org panel
... some of you may remember that we had a BOF one year ago when Schema.org was announced
... it was an interesting panel
... Good to come back and see what has happened and what will happen
... I would like to do a round of self introductions
... My name is Ivan Herman, W3C Semantic Web Activity Lead
... Guha, Google, I started this whole madness
... My name is Alex from Yandex in Moscow
... role is to use SW technologies to improve search and verticals
... Steve Macbeth work for Bing's platform team
... improve search
... Dan Brickley, work on schema.org
... Evan Sandhaus, NYTimes lead architect and work on schema.org and news properties
... Jeff Preston, Disney Interactive games, web sites
... Peter Mika, Yahoo, and work on schema.org
... I will begin this session by asking some questions, but involve you in the discussion as much as possible
... Only a few questions for the start
... I said mainly focus on the future, but let's see what has been achieved in the past year
... Dan or Guha, do you have some feeling for what has been achieved
... what is the acceptance, presence

Guha: We are happy at the adoption, openness
... we see many hundreds of thousands of sites using schema.org mark-up; a year ago, if you had asked whether we would approach these numbers, I would have laughed
... adoption rate is huge
... and a year ago, people thought that the major search engines were going to hijack the Semantic Web
... we have worked with W3C to work on new formats coming out
... formats are taking place in open way
... and have more intense scrutiny
... Over summer
... lots of activity taking place, including use of RDFa in schema.org
... and use of GoodRelations has come to fruition
... we are way happier than we thought we would be

Steve Macbeth: Degree to which people inside the company are looking at schema.org, looking at interoperability. Windows 8 was supporting schema.org as a way to share apps
... looking at this as a way to share data
... and help accelerate this
... as more than a search engine but also an entry point
... take a look at work of publishers
... our view on adoption is about 7% of pages we crawl have mark-up
... compared with a year ago when we started, it is pretty amazing adoption
... vibrancy around extensions and building up is great to see

Peter Mika: We are very excited about the adoption; there have been two studies at a WWW2012 workshop
... some statistics
... analyzing microdata, RDFa and schema.org adoption; please check out these studies

DanBri: There is a class for comment
... we focused on cleaning up design mistakes
... external references is great example of Guha changing his mind in public
... good that we listened
... most of energy that there is a public list
... boring internal list
... when a product manager has a great idea for a schema I send him to the public list
... because they will get better feedback, not just because it's the morally right thing to do

Ivan: I put on my official hat
... we had this discussion about microdata and RDFa
... just one statement
... that RDFa 1.1 we expect to publish tomorrow
... we had lots of interaction with schema.org people

DanBri: As a standard, a recommendation
... let's say thank you

[applause]

scribe: please stand up if you were part of this work

[2-3 people stand]

[applause]

DanBri: It's better than microdata
... it's a synthesis, it's the best thing out there
... Schema.org does not consume anything
... eventually they will consume it
... Microdata is out there
... but we care about the schema, the vocabulary
... there is a great conversation with MS guys about microdata
... and with Norm Walsh
... the conversation and details of syntax has moved on

Ivan: I move to the other side
... the people who consume it and the people who produce it
... maybe Jeff you can start

Jeff, Disney: What I think we gained the most from schema mark-up: we used to have a problem with videos and games
... lots of games that kids play and watch
... Phineas and Ferb videos and games
... often got mixed up in the search engine pages
... wanted video but ended on game site
... once we started wrapping videos with schema.org, we noticed that we saw it less
... better guest experience
... because they landed on what they were searching for
... improved user experience as we provide better mark-up

Evan: I would like to pivot slightly about implementing schema.org
... At NYT we were active in the int'l news council (IPTC)
... worked with rNews to align with schema.org
... we were all very gratified at the degree of openness to add the properties
... the IPTC felt satisfied to have structured data for news stories
... NYT properties
... we print in multiple venues
... schema.org did not have alternative headline
... people search for the headline
... but in different geo regions we use different headlines
... so by adding alternative headline, this gives ecosystem better data and help our print readers find what they are looking for
... NYT has a five-step plan for implementation of schema.org
... first phase started in Jan.
... article pages at NYTimes.com have schema.org markup that aligns with IPTC's rNews
... can see mark-up in the pages
... We are about to roll out rNews/schema.org
... One thing that is a big benefit that accrues to publishers
... that is not as much on radar here
... if a vendor wants to integrate with you, you need to give them access to a data tier
... schema.org gives vendors new way to integrate with content
... without popping the hood
... NewsRight is now integrating with NYTimes
... and as new opportunities come up, it will be a good way to integrate without creating burden for NYTimes engineering staff
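
[Illustration of the alternative-headline point above. A minimal sketch, assuming microdata syntax: it renders a schema.org NewsArticle carrying both the headline and alternativeHeadline properties. The function name and the sample headlines are hypothetical, not actual NYTimes.com markup.]

```python
from html import escape


def render_news_article(headline: str, alternative_headline: str) -> str:
    """Render a schema.org NewsArticle as HTML microdata (illustrative only)."""
    return (
        '<article itemscope itemtype="http://schema.org/NewsArticle">\n'
        f'  <h1 itemprop="headline">{escape(headline)}</h1>\n'
        # The alternative headline is not shown on the page, so it goes in <meta>.
        f'  <meta itemprop="alternativeHeadline" content="{escape(alternative_headline)}">\n'
        '</article>'
    )


# Hypothetical web and print headlines for the same story.
markup = render_news_article("Web headline", "Print headline")
print(markup)
```

[A consumer that knows only the print headline can then match the same article through the alternativeHeadline value.]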

Guha: We did a project with US Veterans Affairs
... there will be many veterans returning who will be looking for jobs
... people would markup job postings as to whether it was veteran-friendly
... the DOD dept created a search engine
... that searches subset of web that have markup that says these jobs are veteran friendly
... this was unthinkable five years ago
... we say this is how you can say your job is veteran friendly
... and point them to the search engine
... it was built by a few people over a few days
... so not just make existing search results better, but create a new class of search applications on the web
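
[Illustration of the veteran-friendly job search described above. A rough sketch, not the engine that was built: it assumes postings are marked up in microdata as schema.org JobPosting and flagged through the specialCommitments property with the value VeteranCommit. The page URLs, the sample HTML and the helper names are hypothetical.]

```python
from html.parser import HTMLParser

JOB_POSTING = "http://schema.org/JobPosting"


class JobPostingScanner(HTMLParser):
    """Collect specialCommitments values from a page with JobPosting markup."""

    def __init__(self):
        super().__init__()
        self.has_job_posting = False
        self.commitments = []
        self._capture = False  # True while waiting for the property's text

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("itemtype") == JOB_POSTING:
            self.has_job_posting = True
        if attributes.get("itemprop") == "specialCommitments":
            if attributes.get("content"):
                # Value given in an attribute, e.g. on a <meta> element.
                self.commitments.append(attributes["content"])
            else:
                self._capture = True

    def handle_data(self, data):
        if self._capture and data.strip():
            self.commitments.append(data.strip())
            self._capture = False

    def handle_endtag(self, tag):
        self._capture = False


def is_veteran_friendly(html: str) -> bool:
    scanner = JobPostingScanner()
    scanner.feed(html)
    return scanner.has_job_posting and any(
        "VeteranCommit" in value for value in scanner.commitments
    )


# Hypothetical crawl results: URL -> page HTML.
pages = {
    "http://example.com/jobs/1": (
        '<div itemscope itemtype="http://schema.org/JobPosting">'
        '<span itemprop="title">Mechanic</span>'
        '<span itemprop="specialCommitments">VeteranCommit</span></div>'
    ),
    "http://example.com/jobs/2": (
        '<div itemscope itemtype="http://schema.org/JobPosting">'
        '<span itemprop="title">Clerk</span></div>'
    ),
    "http://example.com/about": "<p>No markup here.</p>",
}

veteran_friendly = [url for url, html in pages.items() if is_veteran_friendly(html)]
print(veteran_friendly)
```

[The scanner ignores item nesting for brevity; a real consumer would use a full microdata parser.]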

Ivan: My next question is what exactly the search engines do with that data
... I understand if you do not answer all the details
... start with Peter

Peter: so I would give a generic response.
... the best I can do here
... the way people should think about schema.org
... in 2006 search engines came together to create sitemaps to describe structure of your web site
... to me schema.org is an extension
... a semantic way to tell us what your site is all about
... search engine needs to understand your site and direct users there
... that is the key thing we are trying to make happen

Steve Macbeth: I think that the investment we have made over the last year is moving away from where search was
... vertically siloed place
... work others have been doing
... to bring structured data closer together
... move away from highly verticalized web
... to larger corpus
... research on how to build a search engine in a new way
... not a lot of surface features leveraging schema.org
... it is an important initiative to us
... better relevance, richer features
... once we figure out how they work, the features will be scaled
... do in a limited way
... cost of scaling that feature is very high
... I view schema.org as way of scaling structured data
... and experiment in one domain
... get something working and you can be confident that that approach will scale easily
... how to bring the structured data world and web corpus together
... big release in early May are early signs of that
... we introduced a new three-pane approach
... social, structured data
... more visual representation from schema.org mark-up
... and we can leverage to do better job

Alexander Shubin: two directions: rich snippets or enhanced results
... on search engine page
... you can get snippets
... another direction is to improve entity ?
... for video search
... get entities from web
... check quality of information
... if you do not get these entities
... we are improving that
... for now we have about 15 different programs for schema.org mark-up
... these two directions
... We have ideas for how to improve indexing and pages
... main problem here
... we can say chicken and egg
... we want to use it
... we can use it only when sites use schema.org mark-up
... I think those are the main directions of using it

Guha: there was a start-up convention
... how does the starter work
... Gene ? was asked this
... he said very well

[laugh]

... A few weeks ago we announced the Knowledge Graph, which gives you more structured data on the right-hand side
... WSJ is making investment in structured data
... Wikipedia, feeds, all this mark-up on the pages
... not a single page where this comes
... or targeting a specific feature
... but structured data is permeating indexing and the way we think about it

Ivan: My last question, before audience Q&A
... how do you see the process evolving in the future

DanBri: we are making this up as we go along
... no one has tried to scale up this much before
... spent my life around web standards
... cannot improve schemas in one room of people around volcanoes and hair dressers
... We are doing many more new areas
... improve tech documentation, TV and audio, a sample of next package
... after that, activity streams and sports
... goes to long tail quickly
... only way to scale is by being modular
... rNews is great example
... came to us with something fully formed
... and we did some integration
... same with sports
... always room for more integration
... already have sports center
... if we have another hierarchy to cover sports in detail
... maybe too complicated if you go path of karate clubs
... we are hosted at W3C, in effect like a W3C Community Group
... perhaps the domains could become WGs
... we also have the option to use W3C's teleconference facilities

Evan: The schema.org sponsors were open to IPTC concerns
... if there were two properties with same name
... our willingness to be flexible
... encourage you to be flexible
... end result is going to be better than what any one group can come up with on its own

Guha: I would like to thank rNews for being the guinea pigs
... in the health vocabulary area
... we have NCBI
... and National Library of Medicine on board
... who are helping us to craft this vocabulary
... because wide-spread adoption of vocab will help adoption
... we are also doing this in genealogy area
... we expect to partner with organizations that know their topics well
... we are the glue that holds this together

DanBri: Guha mentioned the external enumerations
... as a group we are the wrong people to come up with places of worship
... we have a list of six or seven groups
... so external enumerations is our word for integration with these other lists like Wikipedia
... connect to external vocabs

Ivan: the question to the audience
... some infrastructure issues again

EricF: please stand, shout and we will repeat the question

Ivan: Questions

Greg Beane, Australia: I am interested in comments about structured data, but I don't know what you are referring to
... I am disappointed with the poor quality of structured data from data.gov.uk
... are you seeing structured data sets
... are you crawling them
... is it available
... can you apply some pressure to get that done

Guha: I have no idea about what is happening in AU and UK
... in US we had discussions with dept of education
... and other gov't initiatives
... we are still working on them
... US gov't has huge sources
... would be good to market that
... hoping that would happen

Ivan: Asked Jim to answer questions

Jim Hendler: we have begun the discussions; which agency is releasing it but not down to level

Ivan: more questions

Mike Uschold: I have not researched schema.org; people use words schema, vocabulary... what is it?

PeterM: we use schema
... Mike was asking the form of this ontology or schema, the richness of this representation
... it is an ontology
... more friendly to the web master community
... it has classes, properties, data types
... all of that is in there
... we publish structured versions in RDFa and in OWL and in microdata

Kendall Clark: I just did a search on schema.org, could not find terms for [labor union, gay, lesbian, terrorist]
... what is in there reflects the point of view of the big corporations making it
... not the only point of view on the web
... so the original idea about SW
... anyone can say anything about anything and that is completely gone with this approach
... we should mourn this
... not a question

Guha: Semantic Web is not gone
... you are welcome to use what you want
... we are a bunch of corporations, I understand
... what we did was to document
... you say radical and I have no idea if you are talking about ions or free radicals
... how do you expect us to understand what you meant
... so we wanted to create a documentation site
... you are free to use and mix with ours and your vocabularies
... we are not telling the world this and this alone
... merely the service for the web master community
... if you do this, our apps, our programs can provide you with this functionality
... not saying only use this
... try to make it easier
... Look at the rise of adoption
... there is a need for this clarity
... we think about doing more mark-up
... but that's not what web masters wake up thinking
... for that class of people we wanted to make it clear

Peter: some of these things are not in that level of detail
... you are welcome to propose extensions for lesbians on the public mailing list
... we are not taking away your ability to make extensions
... We are making sure if you use schema.org vocabulary that we understand what you mean
... understand the standard terms
... you can create extensions through site maps

Alex: We have no goal to do every vocab on schema.org
... some directions
... to look and the search engines are interested
... special vocabs for special staff
... another direction is to provide some mechanism
... to publish this if they want to
... they can use this special vocab through extension mechanisms that you are invited to use
... I think the best idea, if you are interested, is to find some lesbian or gay vocab and send it to the public mailing list
... and we will think about including it
... as you propose

DanBri: to be fair to Kendall's point, there was a lot of monolithic language
... that was unfortunate and we have moved away from that
... search engine sites are not going to focus on lesbians and terrorists
... our hand over to wikipedia community can be in your machine readable data
... if you want to add multiple types
... go crazy
... that is a separate conversation from search engine products

Ivan: other questions?

Paul Guall, developer from NY: take a more bourgeois view
... I first thought it would be hard for smaller entities to consume the data
... wonder about progress on tools

Evan: one thing I would like to say is that the IPTC
... is talking with Parsley to address community interest to do that
... has been work on two blogging platforms
... plug-in for WordPress
... Stéphane Corlosquet has helped you embed in your Drupal blog
... to help democratize implementation of schema.org

Steve Macbeth: I think the investments we make in schema.org are helpful to developers
... and creates incentives for publishers to consume that markup
... I think there is huge value
... we create an incentive that no individual developer can create
... the more markup there is, the more individual developers can consume
... existence comes from that
... it's part of what we give back to the community
... and why we opened up work to the community so that people can have more of a say
... we do get opinions on the mailing list
... we don't look at the nature of the person
... we don't listen to NYTimes more than to an individual; we look at the merits of the idea
... to make it consistent and holistic

Peter Mika: there has been an improvement, more browsers, parsers
... expect tools to multiply; there are some good tools out there

DanBri: There was a conscious design choice
... easier on consumers with schema.org vs. microformats
... millions of sites is better than a handful of sites

Ivan: questions

Stéphane Corlosquet, Drupal developer: there is the Structured Data Linter that Gregg Kellogg and I developed last year
... you can put in your URL and you can visualize your data
... First question is about the licensing of schema.org
... terms of service
... I cannot relate that to any well known license
... is it going to be in public domain license?

Guha: I believe that structure is CC0
... we did not want to be in a position that at some point
... some troll comes back and says some word is patented
... and that you, by encouraging people, are liable for this
... also wanted to make clear to web masters that we would not tell them that they are infringing on our stuff
... we are giving stronger guarantees
... if you are a web master, go ahead and use
... but we cannot guarantee a patent troll in Texas will not give you trouble [a few chuckles]

Stéphane Corlosquet: where do you draw the line on what will be part of the schema; where do you cut the long tail of specialized schemas

Guha: let me answer second part first
... we realized we were in long tail with list of countries, cuisines
... and we needed a way to link to different long tails
... which led to enumerations
... in case of GoodRelations
... we have a healthy debate about what should we do
... whole bunch of things about nutrition
... we don't have a well defined rule to apply

Evan: I want to add as a practitioner and NYTimes rep
... external enumerations, could also be called controlled headings
... schema.org will be mechanism to link our concepts to a web of data
... NYTimes tags everything by hand
... and we have for 160 years
... I am excited that we will be able to link to new entities
... the work of the last decade of SW, so I am excited about schema.org

?: Domain ranges; inferencing requirements?

Guha: Very very little
... now there may be certain inferences; we decided to keep it that way
... many of us have long history with knowledge representation
... I feel that SW took a wrong turn when things went to OWL
... took a long time to adopt
... loose about what inferences can be drawn
... will lead and guide usage of this vocabulary

DanBri: You can get inferences out of the class hierarchy
... there is a class that we have tended to ignore
... talking about inferences with moving
... same things as property
... think that is where the fun part for inference lies; when you are talking about the same thing
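
[Illustration of the class-hierarchy inference mentioned above. A minimal sketch: from a declared type, a consumer can infer every more general type. The parent table is a small hand-copied excerpt of the schema.org hierarchy, and assuming a single parent per class is a simplification; the function name is hypothetical.]

```python
# Excerpt of the schema.org class hierarchy: child -> parent.
PARENT = {
    "NewsArticle": "Article",
    "Article": "CreativeWork",
    "CreativeWork": "Thing",
    "JobPosting": "Intangible",
    "Intangible": "Thing",
}


def inferred_types(declared: str) -> list:
    """Return the declared type followed by all of its ancestors."""
    types = [declared]
    while types[-1] in PARENT:
        types.append(PARENT[types[-1]])
    return types


# A page that only says NewsArticle can also be treated as an Article,
# a CreativeWork and a Thing.
print(inferred_types("NewsArticle"))
```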

EricF: no more time

Ivan: thank you everyone