TPAC/2011/Semantic Syntaxes
Semantic Syntaxes
- Proposer: Tantek Çelik
- Discussion Leader: Ben Adida
- Type of session: discussion
At the recent schema.org workshop, there was considerable discussion of which syntax to use for adding semantic information to HTML documents: microdata, microformats, or RDFa.
Ben Adida presented on the evolution of RDFa 1.1 and RDFa 1.1 Lite, and noted that RDFa has based many of its simplifications on microformats' syntax.
microdata itself has been evolving since it was first proposed, based on use-cases provided by RDFa proponents.
microformats has also been evolving with microformats 2, and most recently is proposing to use the "itemref" innovation of microdata over the previous "include-pattern".
It was clear from the discussion in the room that multiple syntaxes are actively co-evolving and learning from/with each other.
If you're interested in semantic syntaxes (microdata, microformats 2.0, RDFa), this session is for you. Topics:
- How are syntaxes evolving?
- What features are syntaxes borrowing from each other?
- Is there a common (JSON?) data model that syntaxes are converging on?
notes
Scribe: Fabien Gandon (FabGandon), on channel #semsyn of irc:irc.w3.org:6665
FabGandon changed topic to: Semantics and Syntax: what syntax to use for adding semantic information to HTML http://www.w3.org/wiki/TPAC2011/Semantic_Syntaxes
FabGandon: Ben Adida opening the session with the history of microformats, the small “s” semantic web.
FabGandon: ... vcard, events, places ...
tantek: please capture notes persistently on the wiki page also: http://www.w3.org/wiki/TPAC2011/Semantic_Syntaxes#notes
FabGandon: ... at the time at Creative Commons, and microformats looked great
FabGandon: ... Ben commenting on http://en.wikipedia.org/wiki/Microformat
FabGandon: ... microformats didn't look like the right solution because remixing was not a use case
FabGandon: ... RDFa was a reaction to this
FabGandon: ... many people think RDFa is only for XHTML
FabGandon: ... (showing RDFa 1.1 Lite markup example)
FabGandon: ... http://manu.sporny.org/2011/rdfa-lite/
FabGandon: ... RDFa 1.1: profiles, vocab, etc.
JeniT: RDFa 1.1 Lite W3C Editor's Draft is at http://www.w3.org/2010/02/rdfa/sources/rdfa-lite/Overview-src.html
FabGandon: ... SearchMonkey, 2006 Yahoo product using RDFa to customize search results http://developer.yahoo.com/searchmonkey/
FabGandon: ... really nice pipeline
FabGandon: tantek: Yahoo never actually deployed such an approach in their main search engine
FabGandon: ... only used in SearchMonkey
KevinMarks: "energetic discussions with Ian" is my new band name
FabGandon: Ben Adida: Microdata, by Ian Hickson, focusing on search engines http://www.w3.org/TR/microdata/
FabGandon: ... another syntax where you can plug in any vocabulary
tantek: Hixie defined a syntax in microdata, and a licensing vocabulary and also a set of sample vocabularies, vcard in microdata, vevent (from iCalendar)
FabGandon: ... RDFa makes it easy to mix different vocs, Microdata simplifies the way the page is adorned with property-value pairs
FabGandon: ... http://schema.org/ provides a list of vocs using microdata, see http://schema.org/docs/schemas.html
FabGandon: ... Tantek made the point at the Schema.org workshop that having multiple syntaxes is a good thing.
KevinMarks: also http://www.data-vocabulary.org/ was hixie's original schema home for microdata
FabGandon: Tantek: Timeline: microformats -> RDFa 1.0 -> Microdata -> RDFa 1.1 -> Microformats 2.0, each one building on the experience of the previous ones
FabGandon: ... microformats 2.0 http://microformats.org/wiki/microformats-2
FabGandon: ... surprising feedback is that even microformats are sometimes too complex a syntax
FabGandon: ... microformats 2 provides a list of optimizations to simplify markup (e.g. root classes)
FabGandon: ... avoid hierarchy mechanisms in the mark-up
FabGandon: ... avoid class name collisions by separating the syntax from the vocabularies
FabGandon: ... website updates by different people tend to lead to loss of markup
KevinMarks: that happened to me with Google Profiles - someone stripped out my hCard classes :(
FabGandon: ... so use prefix class names (scribe was lost here)
FabGandon: ... example of microformats 2: <a class="h-card" href="http://benward.me">Ben Ward</a>
KevinMarks: "h-*" for root class names; "p-*" for simple (text) properties; "u-*" for URL properties, e.g. "u-url", "u-photo", "u-logo"; "dt-*" for datetime properties; "e-*" for properties where the entire contained element hierarchy is the value
FabGandon: Tantek: examples of prefixes: h- for root class names, p- for properties, u- for URL properties, dt- for datetime properties, e- for properties whose value is the entire contained element hierarchy, etc.
hober: Can I use different prefixes for the same property in different instances of the same format?
KevinMarks: http://microformats.org/wiki/microformats-2#naming_conventions_for_generic_parsing
KevinMarks: in principle, yes
FabGandon: Noah: question about the simplification of the hierarchy mechanisms.
FabGandon: Tantek: the vocs with the least hierarchy are the ones that got most adopted.
KevinMarks: we forgot the other semantic markup <script> with JSON in...
tantek: and most reliably adopted
tantek: KevinMarks - that's invisible
FabGandon: Ben Adida: RDFa tries to reuse the RDF stack as much as possible.
tantek: and duplicated (violates DRY)
KevinMarks: yes
KevinMarks: we should discuss though, and OGP way too
KevinMarks: to explain why/what
FabGandon: Gregg Kellogg: mapping of microdata to RDF
KevinMarks: also mapping of microformats to RDF via GRDDL
FabGandon: ... the mapping was removed because there is no right answer identified for now.
FabGandon: ... one proposal is to come up with a form of registry of mappings
FabGandon: Phil Archer: working with the European Commission on developing new vocs
tantek: ... great that microformats and microdata end up with the same JSON
FabGandon: ... are there use cases to tell me when to use microformats vs. RDFa vs. microdata?
tantek: http://www.w3.org/wiki/Html-data-tf
KevinMarks: Monica should explain OGP
JeniT: PhilA, working on it: http://www.w3.org/wiki/Choosing_an_HTML_Data_Format
FabGandon: Ben Adida: RDFa strongest when you need to mix vocs from different sources and/or when you need the RDF stack
FabGandon: Tantek: the focus on syntax is not important during dev, the important question is the voc.
FabGandon: ... let's move the hard questions to the vocabulary, which is the most important for communication
FabGandon: ... if we do our job the syntax won't be the problem.
FabGandon: Monica: at Socialcast we use several syntaxes
KevinMarks: Monica Wilkinson: OGP puts data in the <head> - violates DRY on purpose to avoid designers 'breaking things'
FabGandon: Ian: the syntax also depends on who is the consumer
FabGandon: Key: the problem is also that many people don't know enough about these technologies to even see the syntax problem.
FabGandon: Alex Russell: the problem is also that they don't perceive the added value of doing that
FabGandon: ... one way to address that is through web components
FabGandon: ... encapsulate the UI value and the data value at the same time
FabGandon: Ian: microdata is also useful in drag and drop actions for instance
KevinMarks: the rel- microformats have become part of HTML5, per Alex's point
FabGandon: Hadley Beeman: how far are we in separating the syntax and vocs?
FabGandon: Ben Adida: RDFa did that from the start
KevinMarks: "people didn't know about it" is not a feature
FabGandon: Tantek: if everyone does his own voc you end up with Babel
FabGandon: TimBL: there will always be a small number of small vocs that are heavily used, and then a long tail
FabGandon: (no way I can capture TimBL hyper-speech)
tantek: Alex Russell and TimBL having a healthy discussion about vocabularies.
KevinMarks: timbl: vocabularies have a fractal nature - we should not build just for the big head or long tail of vocabularies
KevinMarks: timbl: it worries me when you say "we built the web in wishy-washy way, so we can do this in wishy-washy way"
KevinMarks: timbl: if I put the data on many websites I should be able to reconstitute the database table without loss
FabGandon: Alex Russell: more and more data are ending up in JavaScript and we need to get that back into a declarative format.
FabGandon: ... meaning drifts with the updates of the system
FabGandon: ... I have hopes for slang, not for fixed vocs.
FabGandon: Tantek: the immediate UI experience is the best way to have high quality data
KevinMarks: tantek: first person benefits are the greatest path to high data quality. Add to addressbook link meant that data was much better
FabGandon: ... schema.org diverges from existing vocs and that’s a mistake.
FabGandon: Noah: couldn't we use the validators to promote common practices
FabGandon: ... alert people on what is going on
FabGandon: Eric Franzon: what is the state of tool dev? plugins, etc.
FabGandon: Ben Adida: several CMSs include them: microformats in WordPress, RDFa in Drupal, etc.
FabGandon: Ben Adida: partial reuse should be supported, it's better than nothing.
KevinMarks: look at the #semsyn tag on twitter too FabGandon
FabGandon: Gregg Kellogg: one of the problems is the lack of indirection
FabGandon: Ben Adida: Web dev and Vocab dev are two different communities.
FabGandon: TimBL: RDF engines should be able to do follow-your-nose on the voc mapping.
FabGandon: ... validator may be too far away but browsers have the ability to show the source
FabGandon: ... this could be where we could have the view data and be able to correct any problem before copy paste
FabGandon: Alex: who's putting the data in the page in the first place.
gkellogg_: In RDFa, @vocab allows for a form of indirection
FabGandon: ... think of it in evolutionary terms,
tantek: Steve Zilles on web developers vs. scripts putting markup on the web.
FabGandon: Steve Zilles: web devs are not the only ones to put the data in the pages, also scripts
FabGandon: Tantek: there is always a human, a human created and maintained the script.
FabGandon: Steve Zilles: yes but it comes from a DB with a schema.
tantek: Our experience (back at Technorati) was that even data from databases (DB) rots over time. Up to 30-40% of RSS/Atom feeds were broken / inconsistent with the *visible* HTML pages.
tantek: Databases / scripts are not long term.
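The prefix conventions quoted in the notes above (h-, p-, u-, dt-, e-) are what let a parser read microformats 2 markup without knowing the vocabulary in advance. The following is a minimal Python sketch of that idea only, not the microformats 2 parsing algorithm: it reports which prefixed class names appear on which elements and does not extract property values. The names `PREFIXES`, `classify`, `PrefixCollector` and `collect` are made up for this illustration.

```python
from html.parser import HTMLParser

# Prefix -> kind of value, following the naming conventions quoted in the notes.
# The kind labels ("root", "text", ...) are illustrative names, not spec terms.
PREFIXES = {
    "h": "root",        # h-*  root class names
    "p": "text",        # p-*  simple (text) properties
    "u": "url",         # u-*  URL properties
    "dt": "datetime",   # dt-* datetime properties
    "e": "embedded",    # e-*  whole contained element hierarchy is the value
}


def classify(class_name):
    """Return (kind, name) for a prefixed class name, or None otherwise."""
    prefix, sep, name = class_name.partition("-")
    if sep and name and prefix in PREFIXES:
        return PREFIXES[prefix], name
    return None


class PrefixCollector(HTMLParser):
    """Collect (tag, kind, name) for every prefixed class name seen."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        for class_name in classes:
            hit = classify(class_name)
            if hit:
                self.found.append((tag, *hit))


def collect(html):
    """Parse an HTML string and return the prefixed class names it contains."""
    collector = PrefixCollector()
    collector.feed(html)
    collector.close()
    return collector.found


if __name__ == "__main__":
    # The example markup shown during the session.
    print(collect('<a class="h-card" href="http://benward.me">Ben Ward</a>'))
```

Because the prefix alone says how a value should be read, the same code works for any vocabulary, which is the point made in the session about separating syntax from vocabularies.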
Twitter Archive Dump
fabien_gandon 02/11/2011 19:22 breakout session #semsyn at #w3c #tpac what syntax to use to add semantic information to HTML http://t.co/cOLcNHwy
JeniT 02/11/2011 19:23 @fabien_gandon Are you going to live-tweet? #semsyn?
kevinmarks 02/11/2011 19:24 #tpac #semsyn @benadida is explaining #microformats history - the lower case semantic web http://t.co/OK2uwNaV
fabien_gandon 02/11/2011 19:26 @JeniT on irc.w3.org:6665 channel #semsyn
kevinmarks 02/11/2011 19:28 #tpac #semsyn claims @benadida remixing fields from other schemas was not a #microformats goal
hadleybeeman 02/11/2011 19:32 RDFa lite 1.1 - W3C Editor's Draft 30 October 2011, via @jeniT http://t.co/FKCteMBR #linkeddata #semsyn #TPAC
kevinmarks 02/11/2011 19:35 energetic discussions with Ian is my new band name #tpac #semsyn
eyeonprofit 02/11/2011 19:36 RT @kevinmarks: "energetic discussions with Ian" is my new band name #tpac #semsyn
bsletten 02/11/2011 19:40 RT @kevinmarks: #tpac #semsyn @benadida where #microformats, RDFa, microdata agree is on using the actual contents of the page as data (the DRY principle)
kevinmarks 02/11/2011 19:40 #tpac #semsyn @benadida where #microformats, RDFa, microdata agree is on using the actual contents of the page as data (the DRY principle)
kevinmarks 02/11/2011 19:43 #tpac #semsyn @t #microformats RDFa and microdata have all been devloped int he open, which shows that open specification works
kevinmarks 02/11/2011 19:44 #tpac #semsyn @t now explaining the http://t.co/T8obHriv - now simpler and more coherent. washes brighter.
ciberch 02/11/2011 19:44 RT @kevinmarks: #tpac #semsyn @t #microformats RDFa and microdata have all been devloped int he open, which shows that open specification works
kevinmarks 02/11/2011 19:46 #tpac #semsyn @t: every social networking site has a name, photo and URL per person, so we can assume p-name u-url and u-photo for h-card
kevinmarks 02/11/2011 19:47 #tpac #semsyn @t: the more complex and hierarchical the syntax is, the more it reduces data quality (per Guha)
kevinmarks 02/11/2011 19:48 #tpac #semsyn @t there was no way to write a generic #microformats parser - with http://t.co/T8obHriv this is possible
kevinmarks 02/11/2011 20:01 #tpac #semsyn @benadida RDFa is at its best when you want to mix already-existing vocabularies without seeking consensus or need RDF stack
kevinmarks 02/11/2011 20:02 #tpac #semsyn @t the right thing to do is develop an open vocabulary first, then worry about the syntactic mapping to #microformats et al
kevinmarks 02/11/2011 20:06 #tpac #semsyn @t the vocabulary is about agreement; people stripping out code is a syntax issue
kevinmarks 02/11/2011 20:08 #tpac #semsyn Alex Russell:we get to a point where the search engine pipeline and the end-user are seeing different things on the page
kevinmarks 02/11/2011 20:08 #tpac #semsyn Alex Russell: when you mark up with #microfromats et al you aren't directly addressing the primary user of your page
kevinmarks 02/11/2011 20:10 #tpac #semsyn @slightlylate: we should treat these syntaxes as things that should be in HTML eventually and become first class
kevinmarks 02/11/2011 20:10 #tpac #semsyn @slightlylate: data we mark up is probabalistically semantic - not first-person semantic
kevinmarks 02/11/2011 20:14 #tpac #semsyn @timberners_lee vocabularies have a fractal nature - we should not build just for the big head or long tail of vocabularies
kevinmarks 02/11/2011 20:15 #tpac #semsyn @slightlylate: yes data is wishy washy - enterprise cases are full of this
kevinmarks 02/11/2011 20:15 #tpac #semsyn @timberners_lee: it worries me when you say "we built the web in wishy-washy way, so we can do this in wishy-washy way"
kevinmarks 02/11/2011 20:16 #tpac #semsyn @timberners_lee: if I put the data on many websites I should be able to reconstitute the database table without loss
LogicalB0T 02/11/2011 20:16 Fascinating. RT @kevinmarks - #tpac #semsyn @slightlylate: yes data is wishy washy - enterprise cases are full of this
kevinmarks 02/11/2011 20:18 #tpac #semsyn @slightlylate: I see more and more data in JSON on the web, and if we want a declarative form people make a second version
kevinmarks 02/11/2011 20:19 #tpac #semsyn @timberners_lee: data cleanliness is always a problem
kevinmarks 02/11/2011 20:20 #tpac #semsyn @slightlylate: meaning drifts over time - we're not going to get there by defining ontologies ahead of time
kevinmarks 02/11/2011 20:21 #tpac #semsyn @t: first person benefits are the greatest path to high data quality. Add to addressbook link meant that data was much better
kevinmarks 02/11/2011 20:22 #tpac #semsyn @t: if you're making up semantics for the sake of it, it will rot. 'you might someday look nicer in a search engine' !enough
kevinmarks 02/11/2011 20:23 #tpac #semsyn @t: RFC 6350 - vcard4 drew on Portable Contacts, hCard experience. http://t.co/K9aXzf9R Person ignored this
MartijnLinssen 02/11/2011 20:24 @kevinmarks With all due disrespect, W3C is a tech-fest run by nerds. We need business standards #tpac #semsyn
ciberch 02/11/2011 20:24 What are the main use cases for #semsyn (micro formats, microdata, RDFa), stream publishing ? ala #facebook
kevinmarks 02/11/2011 20:24 #tpac #semsyn @t: http://t.co/K9aXzf9R diverged from every existing vocabulary arbitrarily. and made things worse.
ciberch 02/11/2011 20:25 Or HTML APIs ? #semsyn
kevinmarks 02/11/2011 20:29 #tpac #semsyn @ciberch: having HTML APIs that make sense of the data on the page will drive this (see http://t.co/CKSXwVML )
tonyfish 02/11/2011 20:30 RT @kevinmarks: #tpac #semsyn @timberners_lee: data cleanliness is always a problem
tonyfish 02/11/2011 20:30 RT @kevinmarks: #tpac #semsyn @timberners_lee: if I put the data on many websites I should be able to reconstitute the database table without loss
tonyfish 02/11/2011 20:30 RT @kevinmarks: #tpac #semsyn @slightlylate: yes data is wishy washy - enterprise cases are full of this
tonyfish 02/11/2011 20:31 RT @kevinmarks: #tpac #semsyn @timberners_lee: it worries me when you say "we built the web in wishy-washy way, so we can do this in wishy-washy way"
kevinmarks 02/11/2011 20:32 #tpac #semsyn @t: as soon as you say indirection or subclass, you've lost most web developers @benadida: save pain for vocab developers
kevinmarks 02/11/2011 20:33 #tpac #semsyn @timberners_lee: I like what python does - from foaf import date - can bring in namespace pieces from elsewhere
kevinmarks 02/11/2011 20:34 #tpac #semsyn @timberners_lee: just as a browser has view source - we should have view data too
kevinmarks 02/11/2011 20:34 #tpac #semsyn http://t.co/wKzNTpjI enables bringing in a vocabulary to define keys
kevinmarks 02/11/2011 20:35 #tpac #semsyn @slightlylate: you view source on something to work out how it was done and borrow it for your own site.
ciberch 02/11/2011 20:39 @kevinmarks yup ideally we will move away from js apis that build iframes to pull html markup for a widget #semsyn
hadleybeeman 02/11/2011 20:41 Very good session on semantic syntaxes: RDFa, microformats and microdata run by @benadida & @t. #semsyn #TPAC
- Html-data-tf
- ...