Semantic Web Structured Data Schemas at TPAC -- 23 Sep 2016

<phila> scribe: phila

<scribe> scribeNick: phila

Tour de table

betehess: We've released a lot of schema.org, planning to release more.
... have a few things we'd like to see in schema.org

danbri: what about W3C infrastructure etc.

betehess: I don't know what it means to operate with W3C in this case.

danbri: It seemed a shame not to get together. We used to have very energetic hacky meetings.
... Over time SWIG stopped meeting at TPAC... fizzled out into a bunch of mailing lists.
... schema.org has had a weird relationship with W3C. We use a W3C mailing list, now a CG
... series of conversations with W3C what it might do in this space.
... schema.org has 400 open issues. I'd really like to fix some of those if we can.
... Maybe given who's here we should talk about process issues.

jtandy: Introduces self.
... I want to surface my data on the web. If it's not in a search engine, it's not really on the Web. In my/geospatial community, people publish through Web services (WFS etc.)
... These aren't indexed. I'd like that data to be accessed.
... There are shortfalls in schema.org
... Also want to understand what users have to do...
... I'm not doing it to create structured data on the web.
... I personally don't believe we're in a situation where machines can automagically infer things found on the Web.
... It's about consistent naming etc.
... I've worked in the WMO to publish a lot of their controlled vocabulary. I've been using SKOS-based registry

-> http://codes.wmo.int WMO Codes Registry

jtandy: I have a weather schema, can we try and unpack the black art of data description, data feeds

<Ralph> PhilA: I'm interested particularly in what W3C should do to be better at [vocab support]

<Ralph> ... in hindsight it was a bad decision for W3C to not get involved in supporting big vocabulary development

<jtandy> [ weather schema open issue: https://github.com/schemaorg/schemaorg/issues/362 ]

<Ralph> ... the REC Track process doesn't suit many vocabs

<Ralph> ... there's lack of understanding of the difference between the specification and the namespace document

<Ralph> ... we had discussions this week on some namespace questions

fsasaki: I've been in the multilingual area at W3C. I'm a fellow of the DFKI language tech centre
... I want to use schema.org for cross lingual access

<Ralph> PhilA: Doug Schepers has a demo consisting of an SVG document containing structured data

<fsasaki> (and for using structured data that is off the web for generation schema.org on the web)

fsasaki: In weather reports, these are often auto-generated from data that is not on the Web. What's generated is just text. The text is already structured

<Ralph> ... the structured data allows for a screen reader to provide an incredibly rich description of the data in the file

phila: Talks about Doug Schepers' accessibility SVG demo

Ralph: I'd like to see how the CG can work, and is working, with other bits of W3C. If there are unmet needs in terms of process, we'd like to meet them.
... I think the schema.org experience can be replicated for others. Tooling, process etc.
... I'd like to understand what we can do.

Francesco: Working in Florence research institute. Want to know more about how schema.org works
... Where I work, about 18 months ago we were thinking of adopting schema.org to integrate all the research data in our various databases.

danbri: Do you find yourself developing schemas?

Francesco: No. We couldn't find a lot of support and couldn't see the benefit of doing this on our site.
... I'm here as a user more than a contributor.

danbri: schema.org grew quickly because for some people there were clear benefits. It may be being used deep within Google as well.
... In the context of W3C, one of our roles ... if you publish it, we think people on your side should be able to use it. If a widget in your browser tells you about what's there you'd see problems.

Francesco: People are interested in schema.org because it's an SEO booster. But what are the restful architecture advantages
... We're still very interested.

danbri: We've struggled to get people to adopt this for 15 years, we have components...

ericP: How do you optimise the balance between chaos and process to make sure that the output is useful.
... The more process, the more restrictions, but the less process you have, the more chaos you get.
... There are two different poles and people tend to cluster around one or the other.

fabgandon: I'm a leader of a research team at INRIA, working since 1999. So as part of that we have a stable namespace server. We publish all those with LD principles.
... Multilingual schemas
... Should multilingualism be important? We annotate, we publish in LOV. We publish the French chapter of dbpedia. The latest is the whole history of the French dbpedia released as LD.
... We are interested in every step of the lifecycle.

danbri: How do you feel about W3C's role? What should happen?

fabgandon: My initial perception is that W3C shouldn't be involved in vocabs not related to the Web. LDP clearly is, for example.

danbri: Hosting of namespaces?

fabgandon: Hmm... I don't want it to be a bottleneck.
... Can be tricky in terms of governance.

ericP: You mean process bottleneck or tech

fabgandon: I wouldn't want all the namespaces hosted at one place.

<Zakim> jtandy, you wanted to ask what "fully compliant" means

jtandy: When I look at schema.org, I don't imagine making my websites fully compliant. Is it a little smattering? Is it a full thing?

<Zakim> Ralph, you wanted to comment on decentralized namespaces

Francesco: Standards help with accessibility, integration etc.
... But it's useful to have under control, what's the structured data in a university?

<Zakim> Ralph, you wanted to comment on decentralized namespace

Ralph: Phil clarified one part of Fabien's point - we're not seeking to centralise namespaces
... Recent experience suggests that developers don't find the proliferation of namespaces helpful
... So the attraction of schema.org is that it's one place.

fabgandon: If the issue is namespaces. OGP didn't want the webmasters to have to add 10 namespaces to their pages

<jtandy> http://www.w3.org/2011/rdfa-context/rdfa-1.1

Ralph: The context file in JSON-LD is meant to help that.

danbri: A lot of this dates back to 1997 when DC held their first workshop.
... They saw themselves as one of a future many metadata schemas.
... When we did RDFS, we didn't need people to separate out different aspects. We haven't got a social process to match the distributed nature.
... JSON-LD context files are meant to be easy for webmasters.
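
[Illustration, not from the meeting: a minimal sketch of what a context does for a webmaster. The inline context, the sample data and the helper `expand_keys` are assumptions made up for this example; it is a toy, not a JSON-LD processor.]

```python
import json

# Hypothetical inline context: maps short names to full IRIs.
# A real page would more likely point "@context" at a shared context file.
CONTEXT = {
    "name": "http://schema.org/name",
    "Person": "http://schema.org/Person",
    "knows": "http://xmlns.com/foaf/0.1/knows",
}


def expand_keys(node, context):
    """Toy expansion: replace known short keys and @type values with IRIs.

    Ignores nesting, @id, lists and everything else a real processor handles.
    """
    out = {}
    for key, value in node.items():
        if key == "@type":
            out[key] = context.get(value, value)
        else:
            out[context.get(key, key)] = value
    return out


# The author writes only short names; the context supplies the namespaces.
doc = {"@type": "Person", "name": "Alice"}
expanded = expand_keys(doc, CONTEXT)
print(json.dumps(expanded, indent=2))
```

The markup itself never mentions a namespace; terms from two vocabularies sit behind one context.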

betehess: If you want to put schema.org on your website you have to think about what you're trying to do.
... It's important to bear in mind validation. You have to use Google's tools, follow their assumptions etc.
... Adoption is always a problem. There are many people who know nothing about Sem Web so schema.org helps in that education.
... The use of JSON-LD by Google has got several new people looking at JSON-LD
... They know JSON, so they're already comfortable.
... [Something about OG]
... The perception of og is ... FB only uses a fraction of it.

danbri: People don't use schema.org because it looks good but because it's useful.

betehess: It's becoming a way for our website to talk to others.

danbri: So people are consuming your schema.org?

betehess: Yes

danbri: That would be good to document.
... It's an unfortunate pressure on schema.org that people see it as a Google-only thing. Bing, Yandex and Yahoo are there too.

<Zakim> jtandy, you wanted to ask how to get schema.org to play nicely with other vocabs

jtandy: How do you get schema.org to play nicely with other vocabs
... The big one people want to use... we're happy making SKOS concept schemes - it might be useful to create a SKOS-like thing in schema.org

danbri: We did SKOS on the back of SWAD Europe, it had much the same purpose as schema.org. Structured data for people who don't think in triples
... What we've done well in RDF is establish SKOS as widely deployed
... In schema.org we have a thing on job postings. The definition includes links to various other things including some spreadsheets that look a lot like SKOS.
... It would be good if that data were exposed in a more accessible way. schema.org is not the place to define a hierarchy of restaurant cuisines.

ericP: We don't want to be a central point of evil, let alone a central point of failure.
... W3C has various failure avoidance mechanisms. We're a pretty safe place to do stuff.
... If you want a central place, we're a reasonably good place. But there's no complacency
... Typical question - is it rdf:Class or rdfs:Class? The more we have a single namespace the better.
... We can also improve tooling to help of course.
... Working on FHIR, people like schema.org so maybe we can share health records using schema.org markup
... Few people want to share health records online
... issues around namespaces

danbri: Much as I love the RDF community... it's just a name for a technology... Semantic Web rebranding brought in some new people... then we had another rebranding for a subset.
... So we ended up with two communities at two poles.
... Both ends naive. LD thought you could find production grade data on the open web.
... We're never going to get to the stage where you just query a SPARQL endpoint and use it.

[Discussion of clean data, DBPedia, pharma resources]

danbri: BBC teams found DBPedia useful but it drifts and can break. You need to keep track

[Demo from Felix]

<fsasaki> http://fsasaki.github.io/stuff/tekom2016/

[Discussion]

fsasaki: Take a term like snow in EN, schnee in DE
... They all have a certain meaning
... many cultures have lots of meaning for a term
... At a high level they refer to the same concept of snow
... want a language-agnostic level
... Concept, meaning, expression is the hierarchy

danbri: Big discussion with the i18n group about RDF in the old days
... Where is the RDF world now with JSON?
... I just joined the WPWG to try and clean up microdata

fsasaki: Whatever you do you're screwed. You have the option of a separate field for a string and its language
... Currently discussing Activity Streams, Web Annotations... OK, some people say do the separate field solution. Better in JSON-LD. All have their own ways to interpret
... Spans with language info don't work in JSON
... People want some control characters inside the string. It's a no-go
... Another thing is directionality
... No one is happy...
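
[Illustration, not from the meeting: a minimal sketch of the "separate field for a string and its language" option, using JSON-LD's value-object keywords `@value` and `@language`. The helper `lang_string` and the sample strings are assumptions for this example.]

```python
import json


def lang_string(text, lang):
    """Pair a whole string with one language tag, JSON-LD value-object style."""
    return {"@value": text, "@language": lang}


# One entry per language; each tag applies to the entire string.
title = [lang_string("snow", "en"), lang_string("Schnee", "de")]
print(json.dumps(title, ensure_ascii=False))
```

The tag covers the whole string, so this form cannot mark a span in another language inside the string, which is the limitation raised above.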

danbri: Microdata is OK but when you go to JSON-LD it breaks

fsasaki: The micropub spec builds on microformats which is bad for i18n

<fsasaki> [background on the json / i18n metadata issue, see this document: http://w3c.github.io/i18n-discuss/notes/json-bidi.html ]

newton: Would like to talk about translations

VeraMeister: Introduces topic of CMS adding structured data automatically. Not all concepts are available in schema.org.
... Course and Course Instance are in pending

danbri: Do you like the definitions?

VeraMeister: Yes. It makes sense. It's a kind of thinking. We also use the concept of CreativeWork
... A year ago there was a request for an education CG, but her way to think is more commercial. We're more concerned with organisational side of universities.

danbri: That led to a separate CG looking at courses etc. We think it's finished and I expect it to be in the next release.

<jtandy> scribe: Jeremy Tandy

<jtandy> scribenick: jtandy

betehess: there are a number of issues around tooling
... when putting the data out there
... we need to validate
... we need tools to do the validation
... same thing with the Facebook validator
... there was no good way of doing this
... so we started developing our own tools
... using schema [schema.org] validation
... it's all done "manually" - which is to say that we write bespoke code to do this
... just like the W3C validator, the tool needs to tell you what's wrong
... if the community was to use SHACL or [SHEX?]
... to describe the rules, then [we wouldn't need to start from scratch]
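
[Illustration, not from the meeting: a minimal sketch of the difference between hand-written checks and rules kept as data, which is roughly the role a shapes language would play. The rule set `ARTICLE_RULES` and the sample node are hypothetical; this is plain Python, not SHACL or ShEx.]

```python
# Hypothetical rule set: required properties and their expected Python types.
ARTICLE_RULES = {
    "headline": str,
    "datePublished": str,
}


def validate(node, rules):
    """Return a list of human-readable problems; an empty list means a pass."""
    problems = []
    for prop, expected in rules.items():
        if prop not in node:
            problems.append(f"missing property: {prop}")
        elif not isinstance(node[prop], expected):
            problems.append(f"{prop}: expected {expected.__name__}")
    return problems


node = {"@type": "NewsArticle", "headline": "Example"}
print(validate(node, ARTICLE_RULES))
```

Because the rules are data, a different consumer could supply a different rule set without touching `validate`.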

danbri: we phrased this in terms of validation - but also we need to think about meeting the needs of particular consumers

betehess: agrees - [different outcomes require use of different schema.org terms]

phila: I also saw this when I was doing some SEO stuff
... all was "CreativeWork" - but I wanted subtypes
... (but I was too tired to figure that out at 3AM)
... even in your own system, you [implicitly] use profiles

danbri: the structured data testing tool from Google does several things
... it will check syntax
... then it will look up the latest version of schema.org and try to validate against that
... errors are reported like "red ink on the page"
... it's intimidating
... we need to move toward saying that "you've passed the basic tests"
... we have triples coming through
... even if they won't fit into Google tool x & y

phila: that's what SHACL is for -

ericP: another question is how do I find what systems / applications can consume this data
... if you're already marking your data up, it would be useful to say what systems / components you know can use the data

betehess: [@@]

ericP: you wouldn't add extra triples [to describe]

betehess: but we don't really know how people are using data

phila: and we never will know
... I tend only to use microdata because that's the only format universally used

betehess: things have moved on

danbri: yandex has json validator ... most read RDFa now
... but I wrote a page that describes how Google uses the structured data

phila: it would be useful to provide SHACL rules that describe the profiles used by each of the services (Bing, Yandex, Google search) and other tools
...

danbri: we do something close to SHACL when we submit to the schema.org repo
... mostly it's SPARQL queries - these aim to ensure that what's in the repo is well structured
... some of these tests are about policies for managing terms; e.g. inverse properties being redundantly asserted
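
[Illustration, not from the meeting: a minimal sketch of one possible sanity check over inverse properties, written in plain Python rather than SPARQL. The sample triples are made up for the example, and the minutes do not say whether the repo's policy requires or forbids asserting an inverse in both directions; the function only reports where it happens.]

```python
INVERSE_OF = "http://schema.org/inverseOf"

# Hypothetical sample triples as (subject, predicate, object).
triples = [
    ("http://schema.org/hasPart", INVERSE_OF, "http://schema.org/isPartOf"),
    ("http://schema.org/isPartOf", INVERSE_OF, "http://schema.org/hasPart"),
    ("http://schema.org/member", INVERSE_OF, "http://schema.org/memberOf"),
]


def doubly_asserted_inverses(triples):
    """Return sorted property pairs whose inverse link is stated in both directions."""
    pairs = {(s, o) for s, p, o in triples if p == INVERSE_OF}
    both = {tuple(sorted(pair)) for pair in pairs if (pair[1], pair[0]) in pairs}
    return sorted(both)


print(doubly_asserted_inverses(triples))
```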

phila: on that sanity checking, how much effort would it be to add "stable", "testing" etc. categories to each term in schema.org?
... marking terms as "stable" would give confidence to user

danbri: this is pretty much doable ... testing and [@@] are "pending" ... stable is stable ... everything else is on the spectrum

betehess: it's useful to see the usage numbers for each term - the more people that use, the more stable the term

phila: so long as the 3M websites used the term correctly

danbri: talks about some new features being prepped for US election; it's only on a few sites
... we want to avoid objective rules like "it's only stable if it's on 1000+ sites"
... we don't delete things
... we've just introduced the schema.org "attic" which is where we can hide stuff we don't use anymore
... we're reticent to say that we won't ever change things any more
... [talks about changes to schema:Person]
... it would be useful to collect evidence of the use of terms in formal settings

phila: we're looking for mechanisms to say "this term is stable"
... ericp was suggesting that if a term is used in a REC it must be stable
... this is quite a good idea

RESOLUTION: This is not a formal resolution but it creates a link in the page:

<phila> If we let schema.org know that a standard references a term, that term can refer back to the standard. That makes it harder/less likely that schema.org will make changes

[and this is a way of asserting the stability of a term]

fabien: you might want to amend the definitions of terms to reflect the way that people actually use the terms
... we do this in the LOV community (?)

<fabgandon> ... http://lov.okfn.org/

vera: question - can someone explain [@@]

danbri: it came out of the experience of matching tidy RDFa into scruffy HTML / WHATWG
... the microdata "fork" of RDFa made a lot of concessions to simple publication
... but at a cost to machine readability
... schema.org tries to re-use terms in many places

> domainIncludes & rangeIncludes ... is a little looser than the "neat and tidy" OWL

betehess: back on subject about SHACL
... schema.org reflects what people are actually doing
... but many people don't refer to the text

<fabgandon> paper "Analyzing Schema.org" http://iswc2014.semanticweb.org/raw.githubusercontent.com/lidingpku/iswc2014/master/paper/87960257-analyzing-schemaorg.pdf?raw=true

betehess: SHACL could be used to show how you are _using_ the ontology in a given context

danbri: suspects that SHACL and SHEX could be part of the critical infrastructure in the next couple of years
... schema.org retains a lot of flexibility

<phila> schema.org Data Model

<betehess> [[ We also expect that often, where we expect a property value of type Person, Place, Organization or some other subClassOf Thing, we will get a text string, even if our schemas don't formally document that expectation. In the spirit of "some data is better than none", search engines will often accept this markup and do the best we can. ]]

danbri: we don't promise that these types go with these properties - this might evolve over time
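
[Illustration, not from the meeting: a minimal sketch of the looser rangeIncludes style combined with the "some data is better than none" tolerance quoted above. The table `RANGE_INCLUDES` is a hand-written excerpt and `check_value` is a hypothetical helper; no search engine's actual behaviour is implied.]

```python
# Hand-written excerpt: property -> types listed as expected values.
RANGE_INCLUDES = {
    "author": {"Person", "Organization"},
}


def check_value(prop, value):
    """Classify a value as 'expected', 'tolerated' (plain text) or 'unexpected'."""
    expected = RANGE_INCLUDES.get(prop, set())
    if isinstance(value, dict):
        return "expected" if value.get("@type") in expected else "unexpected"
    if isinstance(value, str):
        # A bare string where an entity is expected is accepted, not rejected.
        return "tolerated"
    return "unexpected"


print(check_value("author", {"@type": "Person", "name": "Alice"}))
print(check_value("author", "Alice"))
```

Any one of the listed types is acceptable, which is the sense in which this is looser than a strict single-range declaration.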

[ericP does the adapter dance ... HDMI to VGA ... sigh]

betehess: shares his screen

<betehess> https://www.apple.com/newsroom/2016/09/highlights-from-apple-music-festival-10.html

betehess: looking at the source for the above page
... line 1324
... if we'd had a "shape" for this block of code, then we could have asserted some additional rules about properties like datePublished, dateModified
... we have additional rules about how these terms are used based on our policies
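
[Illustration, not from the meeting: a minimal sketch of the kind of extra rule a publisher might attach to datePublished and dateModified. The minutes do not record the actual rules, so both checks here (ISO 8601 dates, modification not earlier than publication) are assumptions, as is the helper `check_dates`.]

```python
from datetime import date


def check_dates(node):
    """Return problems found in datePublished/dateModified; empty list if none."""
    problems = []
    parsed = {}
    for prop in ("datePublished", "dateModified"):
        try:
            parsed[prop] = date.fromisoformat(node[prop])
        except KeyError:
            problems.append(f"missing {prop}")
        except (TypeError, ValueError):
            problems.append(f"{prop} is not an ISO 8601 date")
    # Only compare when both values parsed cleanly.
    if len(parsed) == 2 and parsed["dateModified"] < parsed["datePublished"]:
        problems.append("dateModified is earlier than datePublished")
    return problems


print(check_dates({"datePublished": "2016-09-23", "dateModified": "2016-09-22"}))
```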

ericP: notes that you are using schema.org's JSON-LD context
... does that cause problems

betehess: no - schema.org's context does what we need here
... including "width" and "height"

ericP: so you're saying all the properties are in the @context - but Apple only wanted to constrain the use of those properties?

betehess: yes

danbri: ecosystem question
... for a long time we didn't use the @context
... we = Google BTW
... but we do now
... we still don't use other people's contexts
... would people here like to see Google consuming multiple contexts?
... everything [more or less] gets converted into triples; right now we only support the schema.org context - but we aspire to do more

ericP: does something wrong in Google parsing happen if more contexts are used?

danbri: there are two cases
... i) people override the schema.org context with some extra stuff e.g. facebook.schema.org (??)

... ii) people referencing multiple contexts ... see https://github.com/schemaorg/schemaorg/issues/1186

newton: I would like to use schema:Person and foaf:Person - how does Google decide which one to use

danbri: Google made a decision a while ago, we used to use things from multiple namespaces ... but now we try to use just one "big" namespace

betehess: so what next?

ericP: do you want a tutorial?

<newton> +1

> folks seem to be happy for a 10-minute aside on SHACL

fabgandon: my research team have a validator and would love to work with interesting use cases ... like these

<betehess> betehess: is it an implementation of SHACL? or ShEx? or custom thing?

fabgandon: but [there's lots of moving parts]

phila: hopefully the SHAPEs stuff is moving to just one spec

ericP: does his data-shapes/SHACL tutorial ... [not minuted]

Ed Draft spec is at https://w3c.github.io/data-shapes/shacl/

and https://w3c.github.io/data-shapes/shacl-abstract-syntax/

<phila> schema.org's context file

<phila> danbri: This file is very big

schema.org's Context File

<phila> issue 1186

https://github.com/schemaorg/schemaorg/issues/1186

<ericP> schema.org JSON-LD

<phila> danbri: We wanted to be able to say for each property, whether it expects strings or URIs

<phila> ... http://gs1.org/voc/cheeseFirmness

<phila> danbri: They have lots of terms...

<phila> ... The JSON-LD Contexts allows us to flatten it all down to one file

<phila> ... Found that using two simple contexts leads to mistakes/ambiguities

<phila> ... Because you don't know whether Person or Product came from which namespace

<phila> ... Change the order of the context files you get different triples

<phila> ... Same discussion around XML 10 years ago

<phila> ... People were parsing RDF/XML files with XSLT etc.

<phila> ... Web devs won't parse RDF/XML and handle triples
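
[Illustration, not from the meeting: a minimal sketch of why the order of context files matters. When a term is defined in more than one context, the later definition wins, so swapping the order changes which IRI a short name resolves to. The two contexts are simplified, hypothetical stand-ins and `merge_contexts` is a toy, not a JSON-LD processor.]

```python
# Two simplified, hypothetical contexts that both define the short name "Person".
CTX_A = {"Person": "http://schema.org/Person", "name": "http://schema.org/name"}
CTX_B = {"Person": "http://xmlns.com/foaf/0.1/Person"}


def merge_contexts(*contexts):
    """Apply contexts in order; a later definition overrides an earlier one."""
    merged = {}
    for ctx in contexts:
        merged.update(ctx)
    return merged


# Same markup, same two contexts, different order, different type IRI.
print(merge_contexts(CTX_A, CTX_B)["Person"])
print(merge_contexts(CTX_B, CTX_A)["Person"])
```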

<phila> ericP: Do devs care about any of this?

<phila> danbri: That's the problem.

<phila> ... We want to decentralise. To plug in GS1 and wikidata, we have to make our context file a lot bigger.

<phila> ... Talks about Foreign Fetch

<phila> ... Can be used to access local copy held in your service worker, rather than having to get the original. It's cache plus logic. It can work across multiple sites.

<phila> [Lunch]

<danbri> see also https://github.com/schemaorg/schemaorg/issues/894 from Richard Wallis

<danbri> http://pending.schema.org/partOfEnumerationValueSet

<phila> [Unscribed session looking at schema.org issues]

<phila> https://github.com/schemaorg/schemaorg/issues/894

<danbri> <http://schema.org/codeValue> <http://www.w3.org/2000/01/rdf-schema#comment> "The actual code." <http://health-lifesci.schema.org/#3.2> .

<danbri> <http://schema.org/code> <http://www.w3.org/2000/01/rdf-schema#label> "code" <http://health-lifesci.schema.org/#3.2> .

<danbri> Consider http://health-lifesci.webschemas.org/code http://health-lifesci.webschemas.org/codeValue

<danbri> health-lifesci.schema.org

<danbri> https://twitter.com/danbri/status/763391811603861505

<phila> We think that RJW's proposal can be handled using existing schema.org terms. Make code into MedicalCode, make code a super property