October 23, 2016

ishida >> blog

New app, Pinyin phonetics

Picture of the page in action.

A new Pinyin phonetics web app is now available. As you type Hanyu pinyin in the top box, a phonetic transcription appears in the lower box.

I put this app together to help me get closer to the correct pronunciation of names of people, cities, etc. that I come across while reading about Chinese history. There was often some combination of letters that I couldn’t quite remember how to pronounce.

It’s not intended to be perfect, though I think it’s pretty good overall. It works with input whether or not it has tonal accents. If you can suggest improvements, please raise a github issue.

See the notes file for a description of the less obvious phonetic symbols.

In case you want something to play with, here’s the text in the picture: Dìzào zhēnzhèng quánqiú tōngxíng de wànwéiwǎng. It means “Making the World Wide Web truly worldwide”, and the Han version is 缔造真正全球通行的万维网.

by r12a at October 23, 2016 05:23 PM

October 19, 2016

W3C Blog

DOI/DONA vs. the Internet

I’ve recently been looking at some of the DO (Digital Object) literature, and DOI/DONA, and I’m still not clear what it amounts to nowadays. I’ve discussed it a bit internally at W3C and wanted to share my views here as well.

If it weren’t for the IoT and ITU context, I’d say it would have a negligible effect on the existing Internet architecture using URI/http/DNS/IP. But we do have this context of IoT chaos in terms of standardization (the main reason why we’re working on the Web of Things, or WoT, as a more abstract layer), so I think it’s worth discussing a bit more in our community.

Anyway, this is part of an old story.

I remember one particular aspect that was discussed on some of our W3C lists in 2003, when DOI tried to get a URI scheme (i.e. doi:), after having rejected the idea of a URN namespace (e.g. urn:doi:) or a non-IETF-tree scheme (like org:doi:) as too much of a second-class-citizen option.

For novices, DOI is just another persistent identifier catalog and syntax, using a “global/local” grammar (e.g. 10.101/something), which should ideally be presented as urn:doi:10.101/something to fit with our architecture (this apparently works in some tools, even though urn:doi is not a registered URN namespace), or even as doi:10.101/something, using a new URI scheme this time (which is also used sometimes in interfaces or papers, even though it’s not registered with IANA either, and doesn’t resolve as such).

In the end, today, all DOIs use the DNS and the http URI scheme, as in http://dx.doi.org/10.101/something, to enter their resolution space (which is the main reason why they didn’t get their doi: scheme in the first place: they have no independent resolution “running code” of their own for the scheme itself, they just use http with DNS and URL querying).
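
To make that proxying step concrete, here is a minimal sketch of what the HTTP exchange looks like, written in Python with the requests library (my own illustration, not something from the DOI documentation); the DOI used is the post’s 10.101/something placeholder, so a real DOI would have to be substituted to see an actual redirect:

    import requests

    # The dx.doi.org (now doi.org) proxy enters the Handle resolution space over
    # plain HTTP: the server looks the DOI up in its own registry and answers
    # with a redirect to whatever URL is currently registered for that DOI.
    doi = "10.101/something"   # placeholder from the post; substitute a real DOI

    resp = requests.get("https://dx.doi.org/" + doi, allow_redirects=False)

    print(resp.status_code)               # typically 302/303 for a registered DOI
    print(resp.headers.get("Location"))   # the URL the DOI currently resolves to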

But the point is, once the initial http://dx.doi.org proxying is done, they (DOI/DONA) provide a full resolution system independent of DNS for their own internal PI syntax, with a commercially based hierarchical registration system comparable, from a distance, to ICANN/IANA/DNS, with registries, fees, registrants, etc. I haven’t looked at their pricing, persistency policies, etc. This is a service that the scholarly community seems to appreciate a lot, e.g. to be able to dereference an ISBN into some resources about it (e.g. the book itself, a summary, some metadata, a link to a bookstore, etc.). What’s their competition in this space? Local university/librarian portals with URL querying using ISBN? PURL or ARK, maybe? But those come without a central root (Global Handle Registry) like the one DONA is providing.

It’s also still unclear to me how their hierarchical name space is organized: by countries, by industries, both, flat? Using what semantics? Whatever it is, better or worse than ICANN’s gTLD, ccTLD and subdomain policies, one paper was saying that for China/Russia/Iran, for example, one main advantage over ICANN is that it’s not run by a California not-for-profit but by a Swiss not-for-profit – one which, by the way, has singled out the ITU in its Statutes as a partner of choice. There goes our usual “multistakeholder matters/geography doesn’t” argument down the drain.

An interesting question is why all these countries, somehow active in ICANN/IETF, are at the same time trying to fragment the root. I read somewhere that the same person who is/was overseeing the ITU/DOI work is also on the ICANN GAC, for instance, so there is communication. Another question worth asking is why there is a recent RFC asking all IETF RFCs to also provide DOIs. As if an ietf.org URL was not stable enough.

The ITU move to endorse DOI is clearly political, and there is no denying that having the top I* organizations headquartered in the States, and having Trump as a potential head of those same states, is worrisome for lots of folks around the globe.

BTW, there also seems to be a metadata stack of some sort being used/integrated in the DOI system, called indecs, which I haven’t looked at, and some questions related to URN syntax not being able to support their needs for structured identifiers.

So far, I haven’t seen any deployment of a custom protocol of their own (like http or ftp) to justify doi: as a first-class URI citizen, e.g. something that browsers would implement more readily. It looks like the Handle part of their system defines that, or used to (there was an hdl: URI scheme available at some point). Same thing for the deployment of a “bind” of their own that OSs would have to implement as well, to connect user agents to their DOI servers directly, without going through DNS. For now, DOI looks like an alternate root that doesn’t use the open DNS software infrastructure (but has to use a URL to enter its resolution space).

Once they do that – deploy software that connects directly to their main resolver/handle, and I think they will if they reach enough critical mass – what’s behind their doi: syntax doesn’t use DNS or IP; it’s just a private identifier binding space run by the DONA/DOI organization/servers and their registries, with a promise of uniqueness and persistency. (Maybe I’m just used to W3C and ICANN and we’re just as opaque for newbies, but I can’t say it’s very transparent to me in terms of who is controlling what, what ontologies are used, etc. Then again, they don’t really sell cashable global names like coconuts.net, but dull series of unique IDs/numbers, which look more like IP numbers than domain names from the outset, except that they are supposed to be assigned to the same resources forever.)

Of course, today, most (all?) doi: identifiers not only use URLs to resolve their ID (funnily enough, at some point a year or so back, doi.org resolution was down because DOI had forgotten to renew its .org DNS registration), but they also return URLs as their main typed values in DOI records, since that’s the most easily resolvable kind of ID today on the Internet.

Overall, I’m not too worried, since being on the same root as the rest of the planet brings more economic advantages than anything else nowadays. But let’s not forget the IoT context, with most of the “Things” already under government control (i.e. your fridge, your electrical plug or bulb, your car are all already subject to government conformance requirements), making them an easy target for governments to impose a particular network interface instead of the Internet stack.


by Daniel Dardailler at October 19, 2016 06:42 PM

W3C and Big Data

The term big data, or to give it marketing spin, Big Data, means different things to different people. For operators of wind turbine arrays it might mean handling the 2GB of data created by each turbine every hour describing its power generation and the sensor readings that are then used to predict and prevent component failure. For an agronomist, it might be the result of running natural language processing on thousands of academic journal articles and combining that with images of different vine varieties, and their own research, to derive new insights for wine makers. These are not random examples, rather, they are two of the seven pilots being undertaken in the Big Data Europe project, co-funded by the EU’s Horizon 2020 program.

What’s that got to do with W3C? Why are we even in the project?

Two of the Vs of big data, velocity and volume, are being tackled effectively by improved computing infrastructures – tools like Apache Flink and Hadoop HDFS respectively. The Big Data Europe project uses Docker and Docker Swarm to make it easy to instantiate any number of such components in a customized work flow. Apart from the relative triviality of a Web-based UI, this has nothing to do with W3C technologies. Where we do come in, though, is in tackling the V that creates most problems: variety.

Block diagram of the Big Data Europe architecture, showing, among other things, the Semantic Layer. Note the role of Docker Swarm and the Semantic Big Data application.

Mixing data from different sources, and doing so at scale, can be hard. It requires a layer within the architecture that adds meaning and interoperability to the data; in other words, it needs a semantic layer. The use of URI references as identifiers, and of RDF-encoded ontologies, to create a semantification application is a key component of the Big Data Europe Platform, realized by the SANSA stack, a data flow processing engine that provides data distribution and fault tolerance for distributed computations over large-scale RDF datasets.

The project will be presenting its work on Dockerization and semantification at the forthcoming Apache Big Data Europe event in Seville, 14-16 November for which registration is open.

by Phil Archer at October 19, 2016 12:30 PM

October 17, 2016

W3C Blog

I 💙 [love] W3C… what about you? (Meetup in Barcelona)

The W3C Office in Spain is glad to present I 💙 W3C… what about you?, a meetup to be held on October 24 in Barcelona. This informal meeting aims at gathering the local industry, public bodies and researchers to discuss challenges, opportunities and potential collaboration. In an open discussion, W3C staff and Members will present the latest activities of the Consortium that will be of interest for the Spanish community.

All local companies working with Web technologies are invited to participate, presenting their challenges, needs, and successful projects using W3C standards (or not!). We would like to understand interests and barriers for those companies in terms of Web technologies.

Although everyone will be invited to introduce their own topics, we will focus on some of the hottest technologies such as: Web Payments, Blockchain, Web of Things, Data on the Web (Big Data, Open Data, Linked Data), Web Apps, and Digital Publishing.

The event will be open and free, but registration is required. Read more about the event, and book your seat now!


by Martin Alvarez-Espinar at October 17, 2016 01:13 PM

October 12, 2016

W3C Blog

WCAG 2.1 under exploration, comments requested by 1 November

The Web Content Accessibility Guidelines Working Group announces a plan to develop WCAG 2.1, which builds on but does not supersede WCAG 2.0. The group would like input from stakeholders on this plan.

Web Content Accessibility Guidelines (WCAG) 2.0 became a W3C Recommendation on 11 December 2008. It has been one of the major resources for making web content accessible to users with disabilities, referenced by accessibility policies of many countries and organizations, translated into twenty languages, and it has become an ISO standard (ISO/IEC 40500:2012). Supporting these references, WCAG 2.0 was structured to be a stable resource, and technology-specific implementation guidance was provided separately (in the Techniques and Understanding supporting documents) and updated as web technologies evolve.

WCAG 2.0 remains relevant nearly a decade after finalization. Technology has, however, evolved in new directions. For instance, the widespread use of mobile devices with small screens and primarily touch-based user input methods has led to challenges in making content that conforms to WCAG 2.0 accessible on those devices. Technology evolution also makes it possible to meet the needs of more users: users with low vision, or with cognitive, language, or learning disabilities, stand to see new benefits that should be better represented in the guidelines. Further, the increasing role of the web in our lives means technologies such as digital books, payment systems, driverless vehicles, etc. now need to be addressed by web accessibility guidelines.

In 2015, the WCAG Working Group chose to develop extensions to WCAG 2.0, in order to provide targeted guidance quickly, without changing the meaning of conformance to WCAG 2.0 itself or disturbing policies that reference WCAG 2.0. In reviewing the Requirements for WCAG 2.0 Extensions, however, it became apparent that the interrelationship of extensions could be complicated, and accessibility for some user groups could vary if organizations chose to meet some extensions but not others.

After careful deliberation and consultation, the Working Group has now decided not to put the new guidance in extensions, and instead to work on an updated version of Web Content Accessibility Guidelines. This dot-release, WCAG 2.1, will build on WCAG 2.0 to provide guidance urgently needed for today’s technologies. As a dot-release, it will have a restricted scope, be as similar to WCAG 2.0 as possible, and be fully backwards compatible. While WCAG 2.1 will be available to organizations that wish to follow updated advice, WCAG 2.0 will not be retired. References to WCAG 2.0 and within it will continue to be valid, and sites that conform to WCAG 2.0 will still have valid conformance claims. Over time, the hope is that policies and sites will migrate to the newer guidelines at times that make sense for them. Because of the backwards compatibility, sites conforming either to WCAG 2.0 or to WCAG 2.1 will share a common base of accessibility conformance.

In order to develop WCAG 2.1, the Working Group needs to obtain support from the W3C Membership. The current WCAG charter anticipated this refresh and indicated a plan to combine content, user agents, and authoring tools within a single Working Group (and thus the scope of Web Content Accessibility Guidelines, Authoring Tool Accessibility Guidelines, and User Agent Accessibility Guidelines). The Working Group plans to propose a new charter (current draft in-progress) to include WCAG 2.1 and then publish a first review version of WCAG 2.1 in early 2017, aiming to finalize the new 2.1 specification by mid 2018. This is fast work for standards of this type, and it will be necessary to focus on the most critical issues for today’s technology, while reserving other issues for future work. The group will also work on requirements and a first draft for a new major version update (3.0 version) of accessibility guidelines that encompasses the full scopes listed above. Issues that cannot be resolved in time for the mid-2018 publication of WCAG 2.1 are expected to be deferred to later versions of WCAG 2 or to the restructured 3.0 guidelines.

Input about this project will help to ensure that it meets the needs of content developers and web users, and is beneficial, not disruptive, to organizations that use WCAG 2.0.

At a high level, we call reviewers’ attention to the following points and aspects of the WCAG WG’s ongoing work mode:

  • WCAG 2.0 remains an active Recommendation, available for reference by sites and policies.
  • New web accessibility guidance may address technology changes.
  • Updated guidance will be incorporated via dot-releases such as WCAG 2.1 (and ultimately 3.0) rather than through extensions.

To send feedback to the working group, please send email to public-comments-wcag20@w3.org. You can also reply to this message to carry out discussion on this list. In order for us to process feedback in time to take next steps on this plan, we request feedback as early as possible, and by 1 November 2016. If you would like to be more involved in shaping this work, consider joining the WCAG Working Group.

Thank you for your time in helping keep web accessibility guidelines current.

Andrew Kirkpatrick, WCAG WG co-chair; Joshue O Connor, WCAG WG co-chair; Michael Cooper, WCAG WG staff contact

by Michael Cooper at October 12, 2016 09:08 PM

October 11, 2016

W3C Blog

W3C Fall 2016 meeting and W3C Highlights

We held this year’s all work-group annual meeting a few weeks ago in Lisbon, Portugal. TPAC 2016 was a very productive and smooth meeting, where more than 550 experts from the Web community met. It was the highest attendance ever for a TPAC held in Europe, and close to the highest ever anywhere. Notably, it was the first time in TPAC history that there were temperatures above 20°C (68°F) every day.

Almost 40 Working and Interest Groups met and as many breakout sessions took place during the Technical Plenary unconference, to discuss emerging technologies that may benefit from standardization work at W3C. Read more on the advancements to the Open Web Platform and specific industry requirements for the next generation Web in the press release we issued as the meeting week concluded.

A few topics stood out, reflecting the innovative nature of the meeting: the Blockchain session was crammed with people, and Web VR is another big and promising topic that we are looking at (the W3C Workshop on Web & Virtual Reality takes place next week in Silicon Valley).

The week was punctuated by a number of notable moments:

  • Tim shared his vision on redecentralizing the Web during the Plenary session;
  • The Publishing Community met. This is part of the effort to bring the Publishing community closer to the Web community (read our May announcement);
  • We opened the meeting to W3C Community Groups, 20 of which took us up on the offer and met throughout the week;
  • The Web of Things PlugFest, where all envisioned building blocks came to life during demos, was a highlight. We held a series of PlugFests timed with the Web of Things Interest Group face-to-face meetings this year and were able to show the interoperability work to the bigger audience of TPAC. The W3C Advisory Committee has been reviewing a proposed Web of Things Working Group charter.
  • W3C Members had an opportunity to discuss the recent W3C staff reorganization which changes focus from domains to strategy, project, industry, architecture & technology management.
  • A small and peaceful demonstration against W3C’s work in EME took place and gave way to conversations between protesters and members of the W3C Team and W3C community.
  • We entertained our audience with a private screening of the documentary film ForEveryone.net about Sir Tim Berners-Lee, the history of the Web and a call to protect its future. During his interview with Coralie Mercier after the showing, Tim touched emotionally on the experience of watching his own movie in the company of the Web developer community that has worked with him for decades to enhance the Web.

Lastly, as part of preparation for TPAC, we published for the Membership “W3C Highlights – September 2016,” now public, which I invite you to read.

by Jeff Jaffe at October 11, 2016 03:54 PM

September 30, 2016

ishida >> blog

Timeline: 16 Kingdoms period

This shows the durations of dynasties and kingdoms of China during the period known as the 16 Kingdoms. Click on the image below to see an interactive version with a guide that follows your cursor and indicates the year.

Chart of timelines

See a map of territories around 409 CE. The dates and ethnic data are from Wikipedia.

Update 2016-10-03: I found it easier to work with the chart if the kingdoms are grouped by name/proximity, so changed the default to that. You can, however, still access the strictly chronological version.

by r12a at September 30, 2016 09:59 PM

September 19, 2016

W3C Blog

Bringing Virtual Reality to the Web platform

The world of Virtual Reality in 2016 feels a lot like the world of Mobile in 2007: a lot of expectations around a new way to interact with users, a lot of innovations in hardware and software to make it a reality, and a lot of unknowns as to how these interactions would develop.

Having played a tiny role in making the Web a better platform for mobile devices, and with the feeling that Progressive Web Apps are finally bringing us where we need to be in that space, I have been looking in the past few months at where the Web needs to be to provide at least one of the major platforms for Virtual Reality experiences.

Beyond the expected application of VR to gaming and video consumption, many innovative use cases have emerged to make Virtual Reality a compelling platform for e-commerce, news (see for instance NYTVR), learning and discovery, communication and social inclusiveness, engineering, and more.

As such, VR feels to me like a big new set of opportunities for creativity and expression, with its more immersive and more direct interactions. The Web ought to be able to cater for this space of innovation.

The Web comes with well-known strengths in that space:

  • As the number of headsets and other associated VR devices grows by the day, the plasticity of the Open Web Platform in adapting content and services to a great many device types, varying in processing power, resolution, interactions and operating systems, no longer needs to be demonstrated, and is certain to give content and service providers a uniform platform on which to build.
  • As wearing a headset tends to isolate users from their external environment, there is a risk that VR experiences remain limited to intense but somewhat exclusive types of content or applications (e.g. games or videos); but the Web has proved excellent at providing an on-ramp to engaging users (as Progressive Web Apps demonstrate). While it’s hard to imagine oneself immersing into a real-estate VR experience while looking for a house, the idea of starting a VR experience while browsing a particularly appealing house on a real-estate Web site seems much more compelling.
  • The Web was created first and foremost to facilitate sharing, and the continued demonstration of the power of URLs to enable this some 25 years after its inception is a testament to the robustness and strength of that approach. VR would hardly be the first ecosystem to benefit from the reach and social effect enabled by the Web.
  • Finally, as a fundamentally open platform that anyone can use, build on and contribute to, the Web can ensure that the new space of creativity enabled by VR is not stifled by the rules and constraints of closed and proprietary platforms.

But to make these strengths applicable to VR, the Web obviously needs to provide the basic technical bricks that are necessary to build VR experiences.

Fortunately, many such technologies are already in place or are making good progress toward widespread deployment.

WebGL has provided the basic layer for 3D graphics for a number of years and has now reached widespread deployment.

The Gamepad API brings the necessary interface to the various types of devices used to navigate in virtual experiences.

The Web Audio API features, among its many amazing capabilities, spatialized audio, providing a critical component to truly immersive experiences.

But critically, the possibility of projecting graphics to VR headsets, taking into account their optical and geometrical specificities, has recently been enabled experimentally via the WebVR API, which Mozilla started (and has released in its nightly builds), soon joined by Google and Samsung with their respective browsers, and more recently by Microsoft.

While this collection of APIs can easily be perceived as a steep learning curve for many Web developers, another project pushed by Mozilla, A-Frame, demonstrates the expressive power of encapsulating such APIs in Web Components. With A-Frame, a few lines of HTML-like markup suffice to create a first VR-enabled scene, including all the plumbing needed to handle the switch from regular browsing to the more immersive view.

WebVR is being developed in a W3C Community Group, but is not on the W3C standardization track yet. It will be one of the core topics of the upcoming W3C Workshop on Web & Virtual Reality I am organizing next month (October 19-20) in California. The goal of that event (open to all practitioners of the field) will be to establish the overall roadmap to standardization to make the Web a robust platform for Virtual Reality.

Let’s all work together to make sure Web & VR grow together harmoniously!

by Dominique Hazaël-Massieux at September 19, 2016 12:19 PM

September 16, 2016

ishida >> blog

New Persian character picker

Picture of the page in action.

A new Persian Character Picker web app is now available. The picker allows you to produce or analyse runs of Persian text using the Arabic script. Character pickers are especially useful for people who don’t know a script well, as characters are displayed in ways that aid identification.

The picker is able to produce UN transcriptions of the text in the box. The transcription appears just below the input box, where you can copy it, move it into the input box at the caret, or delete it. In order to obtain a full transcription it is necessary to add short vowel diacritics to places that could have more than one pronunciation, but the picker can work out the vowels needed for many letter combinations.

See the help file for more information.

by r12a at September 16, 2016 06:26 AM

September 15, 2016

W3C Blog

HTML – from 5.1 to 5.2

There is a First Public Working Draft of HTML 5.2. There is also a Proposed Recommendation of HTML 5.1. What does that mean? What happened this year, what didn’t? And what next?

First, the Proposed Recommendation. W3C develops specifications, like HTML 5.1, and when they are “done”, as agreed by the W3C, they are published as a “Recommendation”. Which means what it says – W3C Recommends that the Web use the specification as a standard.

HTML 5.0 was published as a Recommendation a bit over 2 years ago. It was a massive change from HTML 4, published before the 21st Century began. And it was a very big improvement. But not everything was as good as it could be.

A couple of years before the HTML 5 Recommendation was published, a decision was taken to get it done in 2014. Early this year, we explained that we were planning to release HTML 5.1 this year.

There is an implementation report for HTML 5.1 that shows almost all of the things we added since HTML 5.0 are implemented, and work out there on the Web already. Some things that didn’t work, or did but don’t any more, were removed.

HTML 5.1 certainly isn’t perfect, but we are convinced it is a big improvement over HTML 5.0, and so it should become the latest W3C Recommendation for HTML. That’s why we have asked W3C to make it a Proposed Recommendation. That means it gets a formal review from W3C’s members to advise Tim Berners-Lee whether this should be a W3C Recommendation, before he makes a decision.

Meanwhile, we are already working on a replacement. We believe HTML 5.1 is today the best forward looking, reality-based, HTML specification ever. So our goal with HTML 5.2 is to improve on that.

As well as fixing bugs people find in HTML 5.1, we are working to describe HTML as it really will be in late 2017. By then Custom Elements are likely to be more than just a one-browser project and we will document how they fit in with HTML. We expect improvements in the ability to use HTML for editing content, using e.g. contenteditable, and perhaps some advances in javascript. Other features that have been incubating, for example in the Web Platform Incubator Community Group, will reach the level of maturity needed for a W3C Recommendation.

We have wanted to make the specification of HTML more modular, and easier to read, for a long time. Both of those are difficult, time-consuming jobs. They are both harder to do than people have hoped over the last few years. We have worked on strategies to deal with making HTML more modular, but so far we have only broken out one “module”: ARIA in HTML.

We hope to break out at least one more substantial module in the next year. Whether that happens depends on sufficient participation and commitment from across the community.

We will further improve our testing efforts, and make sure that HTML 5.2 describes things that work, and will be implemented around the Web. We have developed a process for HTML 5.1 that ensures we don’t introduce things that don’t work, and that we remove things already there that don’t reflect reality.

And we will continue working to a timeline, with the HTML 5.2 specification heading for Recommendation around the end of 2017.

By which time, we will probably also be working on a replacement for it, because the Web seems like it will continue to develop for some time to come…

by Charles McCathie Nevile at September 15, 2016 11:45 AM

Just how should we share data on the Web?

The UK government is currently running a survey to elicit ideas on how it should update data.gov.uk. As one of the oldest such portals, despite various stages of evolution and upgrade, it is, unsurprisingly, showing signs of age. Yesterday’s blog post by Owen Boswarva offers a good summary of the kind of issues that arise when considering the primary and secondary functions of a data portal. Boswarva emphasizes the need for discovery metadata (title, description, issue date, subject matter etc.) which is certainly crucial, but so too is structural metadata (use our Tabular Metadata standards to describe your CSV, for example), licensing information, the use of URIs as identifiers for and within datasets, information about the quality of the data, location information, update cycle, contact point, feedback loops, usage information and more.

It’s these kinds of questions that gave rise to the Data on the Web Best Practices WG, whose primary document is now at Candidate Recommendation. We need help from the likes of Owen Boswarva and data.gov.* portals around the world to help us gather evidence of implementation, of course. The work is part of a bigger picture that includes two ancillary vocabularies that can be used to provide structured information about data quality and dataset usage, the outputs of the Spatial Data on the Web Working Group, in which we’re collaborating with fellow standards body OGC, and the Permissions and Obligations Expression WG that is developing machine readable license terms and more, beginning with the output of the ODRL Community Group.

A more policy-oriented view is provided by a complementary set of Best Practices developed by the EU-funded Share-PSI project. It was under the aegis of that project that the role of the portal was discussed at great length at a workshop I ran back in November last year. That showed that a portal must be a lot more than a catalog: it should be the focus of a community.

Last year’s workshop took place a week after the launch of the European Data Portal, itself a relaunch in response to experience gained through running earlier versions. One of the aims of that particular portal is that it should act as a gateway to datasets available throughout Europe. That implies commonly agreed discovery metadata standards, for which W3C Recommends the Data Catalog Vocabulary, DCAT. However, it’s not enough. What profile of DCAT should you use? The EU’s DCAT-AP is a good place to start, but how do you validate against that? Enter SHACL, for example.
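
As a rough illustration of the kind of discovery metadata involved, here is a small sketch that builds a DCAT description of an invented dataset using the Python rdflib library (rdflib and all the example URIs and values are my assumptions, not part of this post); a profile such as DCAT-AP would add further mandatory properties on top of this:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF, XSD

    DCAT = Namespace("http://www.w3.org/ns/dcat#")

    g = Graph()
    g.bind("dcat", DCAT)
    g.bind("dct", DCTERMS)

    # A hypothetical dataset and one CSV distribution of it.
    ds = URIRef("http://example.org/dataset/road-traffic-counts")
    g.add((ds, RDF.type, DCAT.Dataset))
    g.add((ds, DCTERMS.title, Literal("Road traffic counts", lang="en")))
    g.add((ds, DCTERMS.description, Literal("Hourly traffic counts by road segment", lang="en")))
    g.add((ds, DCTERMS.issued, Literal("2016-09-15", datatype=XSD.date)))

    dist = URIRef("http://example.org/dataset/road-traffic-counts/csv")
    g.add((dist, RDF.type, DCAT.Distribution))
    g.add((dist, DCAT.downloadURL, URIRef("http://example.org/data/traffic.csv")))
    g.add((dist, DCAT.mediaType, Literal("text/csv")))
    g.add((ds, DCAT.distribution, dist))

    # Serialize as Turtle, the form in which catalogs commonly exchange this metadata.
    print(g.serialize(format="turtle"))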

Those last points highlight the need for further work in this area which is one of the motivations for the Smart Descriptions & Smarter Vocabularies (SDSVoc) workshop later this year that we’re running in collaboration with the VRE4EIC project. We also want to talk in more general terms about vocabulary development & management at W3C.

Like any successful activity, if data is to be discoverable, usable, useful and used, it needs to be shared among people who have a stake in the whole process. We need to think in terms of an ecosystem, not a shop window.

by Phil Archer at September 15, 2016 10:12 AM

September 14, 2016

ishida >> blog

Timeline: 5 dynasties & 10 kingdoms

This shows the durations of dynasties and kingdoms of China in the 900s. Click on the image below to see an interactive version that shows a guide that follows your cursor and indicates the year.

Chart of timelines

See a map of territories around 944 CE.

by r12a at September 14, 2016 08:54 AM

September 13, 2016

W3C Blog

Portable Web Publications Use Cases and Requirements FPWD

How do publications differ from web sites? What are the nuances of publishing on the web and making use of the tools of the Open Web Platform? Do publishers really need more than linked web sites? Yes, we do! Portable Web Publications Use Cases and Requirements provides detailed use cases and requirements from the Digital Publishing Interest Group, focusing on two primary issues. These use cases look at the portability of published works, which allows users to transfer their books, articles, and magazines from state to state and device to device. The document also seeks to define the book or publication as a rightful citizen of the Open Web Platform. Thousands of years of successful history, knowledge and information sharing in easily consumable, producible, and storable formats must be recognized as we focus on the tools of the Open Web Platform and what it means for Publishers, Authors, and Readers today. We welcome your feedback on GitHub.

by Tzviya Siegman at September 13, 2016 03:17 PM

September 12, 2016

W3C Blog

Dave Raggett at Industry of Things World


It is our pleasure to announce that Dave Raggett, W3C lead for the Web of Things, and Georg Rehm, Manager of the German/Austrian Office of W3C, will hold a workshop at Industry of Things World in Berlin, Germany on Monday September 19, 2016.

Workshop description: The value of the Internet of Things will be in the services, especially those that combine different sources of information. However, today there is a severe lack of interoperability across different platforms, which increases risks and lowers the return on investment. We can expect the striking heterogeneity in standards to continue as different platforms serve different needs. This workshop will provide an opportunity to discuss the properties and characteristics of overarching umbrella standards that are able to bridge the gaps between platforms. The big challenge is to enable semantic interoperability and end-to-end security. The workshop will introduce the work being done in the World Wide Web Consortium (W3C), with its Web of Things group, in collaboration with other alliances and standards development organisations. These future Web of Things standards are being designed on the foundation of the rich set of existing W3C standards such as RDF, XML, OWL, and the wider Semantic Web stack. To enable a high level of interaction among the attendees, the number of places at the workshop is strictly limited, so please register as soon as possible.

by Coralie Mercier at September 12, 2016 01:48 PM

September 07, 2016

ishida >> blog

Notes on case conversion

Examples of case conversion.

These are notes culled from various places. There may well be some copy-pasting involved, but I did it long enough ago that I no longer remember all the sources. But these are notes, it’s not an article.

Case conversion is not always possible in Unicode by applying a fixed offset to a codepoint, although this can work for the ASCII range (add 32 to go from upper- to lowercase) and for many other characters in the Latin extensions (add 1). There are many cases where the corresponding cased character is in another block, or at an irregularly offset location.
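
A quick check in Python (using nothing beyond the built-in str methods) shows both where the fixed-offset idea works and where it breaks down; the examples are mine:

    # ASCII: lowercase = uppercase + 32 (A is U+0041, a is U+0061).
    print(chr(ord('A') + 32))      # 'a'

    # Latin Extended-A often pairs upper/lower case as neighbouring codepoints:
    # U+0100 LATIN CAPITAL LETTER A WITH MACRON + 1 gives U+0101.
    print(chr(ord('\u0100') + 1))  # 'ā'

    # But no fixed offset is universal: the uppercase of ſ (U+017F, long s)
    # is plain S (U+0053), in a completely different block, so real case
    # conversion has to use the Unicode mapping tables.
    print('\u017f'.upper())        # 'S'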

In addition, there are linguistic issues that mean that simple mappings of one character to another are not sufficient for case conversion.

In German, the uppercase of ß is SS. German and Greek cannot, however, be easily transformed from upper to lower case: German because SS could be converted either to ß or ss, depending on the word; Greek because all tonos marks are omitted in upper case, eg. does ΑΘΗΝΑ convert to Αθηνά (the goddess) or Αθήνα (capital of Greece)? German may also uppercase ß to ẞ sometimes for things like signboards.

Also Greek converts uppercase sigma to either a final or non-final form, depending on the position in a word, eg. ΟΔΥΣΣΕΥΣ becomes οδυσσευς. This contextual difference is easy to manage, however, compared to the lexical issues in the previous paragraph.
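
Python’s built-in str methods apply Unicode’s default, language-independent case mappings, so they make a handy illustration of both points (a quick sketch, not a solution to the lexical problems just described):

    # German: ß uppercases to 'SS' under the default mappings, and lowercasing
    # 'SS' cannot know whether the original word had ß or ss.
    print('Straße'.upper())      # 'STRASSE'
    print('STRASSE'.lower())     # 'strasse' – the ß is not recoverable

    # Greek: capital sigma lowercases to final ς at the end of a word and to
    # medial σ elsewhere; this contextual rule comes from Unicode SpecialCasing.
    print('ΟΔΥΣΣΕΥΣ'.lower())    # 'οδυσσευς'

    # The ΑΘΗΝΑ question (Αθηνά or Αθήνα?) is lexical, so no character-level
    # algorithm can decide it.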

In Serbo-Croatian there is an important distinction between uppercase and titlecase. The single letter dž converts to DŽ when the whole word is uppercased, but Dž when titlecased. Both of these forms revert to dž in lowercase, so there is no ambiguity here.

In Dutch, the titlecase of ijsvogel is IJsvogel, ie. the first two letters commonly have to be titlecased. There is a single character IJ (U+0132 LATIN CAPITAL LIGATURE IJ) in Unicode that will behave as expected, but this single character is very often not available on a keyboard, and so the word is commonly written with the two letters I+J.

In Greek, tonos diacritics are dropped during uppercasing, but not dialytika. Greek diphthongs with a tonos over the first vowel are converted during uppercasing to no tonos but a dialytika over the second vowel of the diphthong, eg. Νεράιδα becomes ΝΕΡΑΪΔΑ. A letter with both tonos and dialytika above drops the tonos but keeps the dialytika, eg. ευφυΐα becomes ΕΥΦΥΪΑ. Also, contrary to the initial rule mentioned here, Greek does not drop the tonos on the disjunctive eta (usually meaning ‘or’), eg. ήσουν ή εγώ ή εσύ becomes ΗΣΟΥΝ Ή ΕΓΩ Ή ΕΣΥ (note that the initial eta is not disjunctive, and so does drop the tonos). This maintains the distinction between the ‘either/or’ ή and η, the feminine form of the article in the nominative singular.

A Greek titlecased vowel, ie. a vowel at the start of a word that is capitalised, retains its tonos accent, eg. Όμηρος.

Turkish, Azeri, Tatar and Bashkir pair dotted and undotted i’s, which requires special, language-specific handling for case conversion. For example, the name of the second largest city in Turkey is “Diyarbakır”, which contains both the dotted and dotless letters i. When rendered into upper case, this word appears like this: DİYARBAKIR.
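
Because the default Unicode mappings are language-independent, the Turkish problem is easy to demonstrate with Python’s built-ins (locale-aware conversion needs a library such as ICU, which is not shown here):

    word = 'Diyarbakır'

    # Default uppercasing maps both dotted i and dotless ı to plain I,
    # so the distinction that Turkish spelling relies on is lost.
    print(word.upper())   # 'DIYARBAKIR' rather than the expected 'DİYARBAKIR'
    print('I'.lower())    # 'i' – Turkish would expect 'ı' here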

Lithuanian also has language-specific rules that retain the dot over i when combined with accents, eg. i̇̀ i̇́ i̇̃, whereas the capital I has no dot.

Sometimes European French omits accents from uppercase letters, whereas French Canadian typically does not. However, this is more of a stylistic than a linguistic rule. Sometimes French people uppercase œ to OE, but this is mostly due to issues with lack of keyboard support, it seems (as is the issue with French accents).

Capitalisation may ignore leading symbols and punctuation for a word, and titlecase the first casing letter. This skipping applies not only to non-letters: a letter such as the (non-casing version of the) glottal stop, ʔ, may be ignored at the start of a word, and the following letter titlecased, in IPA or Americanist phonetic transcriptions. (Note that, to avoid confusion, there are separate case-paired characters available for use in orthographies such as Chipewyan, Dogrib and Slavey. These are Ɂ and ɂ.)

Another issue for titlecasing is that not all words in a sequence are necessarily titlecased. German uses capital letters to start noun words, but not verbs or adjectives. French and Italian may expect to titlecase the ‘A’ in “L’Action”, since that is the start of a word. In English, it is common not to titlecase words like ‘for’, ‘of’, ‘the’ and so forth in titles.

Unicode provides only algorithms for generic case conversion and case folding. CLDR provides some more detail, though it is hard to programmatically achieve all the requirements for case conversion.

Case folding is a way of converting to a standard sequence of (lowercase) characters that can be used for comparisons of strings. (Note that this sequence may not represent normal lowercase text: for example, both the uppercase Greek sigma and lowercase final sigma are converted to a normal sigma, and the German ß is converted to ‘ss’.) There are also different flavours of case folding available: common, full, and simple.
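
Python exposes the default full case folding directly as str.casefold(), which makes the difference from ordinary lowercasing easy to see (again just a quick check of the defaults):

    # Case folding normalises strings for comparison; it is not meant to
    # produce readable lowercase text.
    print('Straße'.casefold())    # 'strasse' (ß folds to 'ss')
    print('Straße'.lower())       # 'straße'  (plain lowercasing keeps ß)

    # Capital sigma and final sigma both fold to the ordinary σ, so the two
    # spellings compare equal after folding.
    print('ΟΔΥΣΣΕΥΣ'.casefold() == 'Οδυσσευς'.casefold())   # True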

by r12a at September 07, 2016 04:03 PM

August 31, 2016

W3C Blog

Memento at the W3C

The W3C Wiki and the W3C specifications are now accessible using the Memento “Time Travel for the Web” protocol. This is the result of a collaboration between the W3C, the Prototyping Team of the Los Alamos National Laboratory, and the Web Science and Digital Library Research Group at Old Dominion University.

The Memento protocol is a straightforward extension of HTTP that adds a time dimension to the Web. It supports integrating live web resources, resources in versioning systems, and archived resources in web archives into an interoperable, distributed, machine-accessible versioning system for the entire web. The protocol is broadly supported by web archives. Recently, its use was recommended in the W3C Data on the Web Best Practices where data versioning is concerned. But resource versioning systems have been slow to adopt it. Hopefully, the investment made by the W3C will convince others to follow suit.

Memento is formally specified in RFC7089; a brief overview is available from the Memento web site. In essence, the protocol associates two special types of resources with a web resource, both made discoverable using typed links in the HTTP Link header. A TimeGate is capable of datetime negotiation, a variant on content negotiation. It provides access to the version of the web resource as it existed around a preferred datetime expressed by a client using the Accept-Datetime header; the version resource itself includes a Memento-Datetime header, which expresses the resource’s actual version datetime. A TimeMap provides an overview of versions of the web resource and their version datetimes. The need for datetime negotiation had already been suggested by Tim Berners-Lee in his W3C Note about Generic Resources but it was not until 2009 that datetime negotiation was effectively introduced in an arXiv.org preprint Memento: Time Travel for the Web.


Memento provides a bridge between the present and the past Web

Adding Memento support to versioning systems allows a client to uniformly access the version of a resource that was active at a certain moment in time (TimeGate) and to obtain its version history (TimeMap). When a version page in a system that supports Memento links to a resource that resides in another system that supports Memento, a client can uniformly access the version of the linked resource that was active at the same moment in time. If the linked resource is in a system that does not support Memento – it does not expose a TimeGate – the client can fall back to a default TimeGate that operates across web archives and retrieve an archived resource using the uniform datetime negotiation approach. Alternatively, the client can resort to the TimeGate of a specific web archive, such as that of the Internet Archive or the Portuguese Web Archive. But, while resource versioning systems hold on to their entire resource history, web archives merely store discrete observations of (some) web resources. As such, with pages retrieved from web archives, there is no certainty that the archived page was active at that same time, but rather only around that same time.

A variety of tools is available to add support to systems that handle resource versions and expose associated APIs. Memento support was added to the W3C Wiki pages by deploying the Memento Extension for MediaWiki. Memento support for W3C specifications was realized by installing a Generic TimeGate Server for which a handler was implemented that interfaces with the versioning capabilities offered by the W3C API.

Memento can be leveraged programmatically, for example by adding Accept-Datetime headers to curl commands, or by using the Python Memento Client Library. The Time Travel portal exposes an API that covers web archives and resource versioning systems with Memento support. The API can, for example, be used to construct a URI that redirects to the version of a resource as it existed around a given date.
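
As an illustration of that programmatic use, here is a small sketch of datetime negotiation using the Python requests library; the aggregate TimeGate URL pattern of the Time Travel service is an assumption on my part, and any Memento TimeGate (such as the one now fronting the W3C Wiki) would be queried the same way:

    import requests

    # Ask a TimeGate for the version of a resource that was active around a
    # preferred datetime, expressed in the Accept-Datetime request header.
    resource = "https://www.w3.org/TR/webarch/"
    timegate = "http://timetravel.mementoweb.org/timegate/" + resource  # assumed URL pattern

    resp = requests.get(
        timegate,
        headers={"Accept-Datetime": "Wed, 01 Sep 2004 00:00:00 GMT"},
    )

    # After the redirect, the selected version (the "memento") reports its own
    # datetime and advertises related resources in the Link header.
    print(resp.url)                              # URI of the selected version
    print(resp.headers.get("Memento-Datetime"))  # when that version dates from
    print(resp.headers.get("Link", "")[:200])    # rel="original", rel="timemap", ...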

Browsers do not yet natively support Memento, but its cross-time and cross-server capabilities can be experienced by installing the Memento extension for Chrome. Try it out for yourself. Browse over to the W3C AWWW and pick some dates in the extension’s calendar between 1 September 2002 and 1 September 2004. Navigate to the version of the specification that was current at the selected dates by right-clicking the page and choosing “Get near saved date …” from the context menu. Notice how the centrality of REST in the specification diminishes over time. In each version, find the reference to the IANA URI Registry and right-click the link, this time choosing “Get near memento date …” to see the Registry as it existed around the time of the version of the AWWW specification you are on. You will retrieve versions of the Registry from web archives and notice its evolution over time, for example around 1 September 2002 and around 1 September 2004. Compare the archived state of the Registry with its current state by right-clicking in an archived page and choosing “Get at current date”.


by Herbert Van De Sompel at August 31, 2016 07:24 AM

August 30, 2016

ishida >> blog

Language subtag tool now links to Wikipedia

The language subtag lookup tool now has links to Wikipedia search for all languages and scripts listed. This helps for finding information about languages, now that SIL limits access to their Ethnologue, and offers a new source of information for the scripts listed.

Picture of the page in action.

by r12a at August 30, 2016 10:38 AM

August 26, 2016

W3C Blog

Building Blocks to Blockchains: a Report on the W3C Blockchains and the Web Workshop

In June, W3C hosted a workshop to determine whether there were any aspects of blockchains that intersected with Web technologies, and if there were any specific technologies that were mature enough to consider for incubation toward standardization. We had lots of promising discussions and identified several next steps. You can read more details in the W3C Blockchains and the Web workshop report, released today.

Following the success of the workshop, we have begun to coordinate blockchain activities in the newly-formed Blockchain Community Group, which has chairs from Asia (Youngwhan “Nick” Lee), Europe (Marta Piekarska), and North America (Doug Schepers). The Blockchain CG has a regular coordination meeting on Thursdays, and is planning to start topic-specific short-term technical teleconferences as needed.

Blockchain comprises a broad set of cross-domain technologies, and thus our two chief tasks are to:

  1. monitor the work of other groups (Web Payments, Internet of Things, etc.), as well as groups outside of W3C, to make sure we are aware of what is happening in those fields so we can complement their work; and
  2. propose use cases beyond what is happening in existing working groups to ensure we identify the applications of blockchain that do and do not make sense. There is not yet consensus on which applications of blockchain technology are appropriate uses of the technology, but there is general agreement on a subset of useful applications, and these are the ones which we plan to dedicate resources to.

The Blockchain CG is intended partly to coordinate the activities of other topic-specific CGs, in addition to working on its own reports and deliverables. We will continue to work closely together with the new Blockchain Digital Assets Community Group (formed as an outcome of the workshop), and participate in the already-active Interledger Payments CG. Moreover we will collaborate with the Verifiable Claims task force of the Web Payments Interest Group.

As part of this ongoing coordination, we are planning an informal meeting during the W3C’s TPAC meeting in Lisbon, Portugal. It will take place on Tuesday 20th of September from 10:30 to 12:30, and we will also have a short session during the Web Payments Interest Group f2f on Friday 23 September; the Interledger Payments CG is also meeting at TPAC that week. Leading up to TPAC, we will build an agenda to make the best use of the time we have together in Lisbon. If you wish to attend TPAC, you must be a member of one of the Community Groups or Working Groups meeting there, and register by September 2.

After the September face-to-face meeting, the Blockchain CG will continue with regular calls, and will incubate low-hanging fruit by beginning to draft specifications and build a community interested in Recommendation-track work, perhaps including the Chainpoint specification, which was discussed in detail at the workshop.

We are also considering a second blockchain workshop, possibly on the US West Coast, where we will work on more technical aspects and specifications that can contribute to W3C standardization, with particular focus on client-side technologies.

By the end of the year, we hope to have laid the groundwork for possible candidates for formal standardization.

We encourage interested people and organizations to join the W3C Blockchain Community Group to keep informed about future developments. We are expanding the scope of that group to include coordination for our various activities around blockchains, including links to related specific-topic community groups, such as the new Blockchain Digital Assets CG.

This post was co-written by Marta Piekarska and Doug Schepers.

by Doug Schepers at August 26, 2016 10:39 AM

August 25, 2016

ishida >> blog

Right-to-left scripts

These are just some notes for future reference. The following scripts in Unicode 9.0 are normally written from right to left.

Scripts containing characters with the bidirectional property value Arabic_Letter (AL) are marked with an asterisk; the remaining scripts have characters with the property value Right_To_Left (R):

In modern use

Arabic *
Syriac *
Thaana *

Limited modern use

Mende Kikakui (small numbers)
Old Hungarian
Samaritan (religious)


Imperial Aramaic
Old South Arabian
Old North Arabian
Old Turkic
Pahlavi, (Inscriptional)
Parthian, (Inscriptional)

by r12a at August 25, 2016 07:32 PM

August 17, 2016

Reinventing Fire

Topic of Cancer

I’m now officially a cancer survivor! Achievement unlocked!

A couple weeks ago, on July 27th, during a routine colonoscopy, they found a mass in my ascending colon which turned out to have some cancer cells.

I immediately went to UNC Hospital, a world-class local teaching hospital, and they did a CT scan on me. There are no signs that the cancer has spread. I was asymptomatic, so they caught it very early. The only reason I did the colonoscopy is that there’s a history of colon cancer in my family.

Yesterday, I had surgery to remove my ascending colon (an operation they call a “right colectomy”). They used a robot (named da Vinci!) operated by their chief GI oncology surgeon, and made 5 small incisions: 4 on the left side of my belly to cut out that part of the right colon; and a slightly larger one below my belly to remove the tissue (ruining my bikini line).

Everything went fine (I made sure in advance that this was a good robot and not a killer robot that might pull a gun on me), and I’m recovering well. I walked three times today so far, and even drank some clear liquids. I’ll probably be back on my feet and at home sometime this weekend. Visitors are welcome!

There are very few long-term negative effects from this surgery, if any.

They still don’t know for certain what stage the cancer was at, or if it’s spread to my lymph nodes; they’ll be doing a biopsy on my removed colon and lymph nodes to determine if I have to do chemotherapy. As of right now, they are optimistic that it has not spread, and even if it has, the chemo for this kind of cancer is typically pretty mild. If it hasn’t spread (or “metastasized”), then I’m already cured by having the tumor removed. In either case, I’m going to recover quickly.

My Dad had colon cancer, and came through fine. My eldest sister also had colon cancer over a decade ago, and it had even metastasized, and her chemo went fine… and cancer treatments have greatly improved in the past few years.

So, nobody should worry. I didn’t mention it widely, because I didn’t want to cause needless grief to anyone until after the operation was done. Cancer is such a scary word, and I don’t think this is going to be as serious as it might otherwise sound.

I’ll be seeing a geneticist in the coming weeks to determine exactly what signature of cancer I have, so I know what I’m dealing with. And I want to give more information to my family, because this runs in our genes, and if I’d gotten a colonoscopy a few years ago, they could have removed the polyp in the early stages and I’d have never developed cancer. (And because I’m otherwise healthy, I probably wouldn’t have gotten the colonoscopy if I hadn’t had insurance, which I probably wouldn’t have had if Obamacare didn’t mandate it. Thanks, Obama!)

Yay, science!

Future Plans

So, the cliché here is for me to say that this has opened my eyes to the ephemerality and immediacy of life, and that I’m planning to make major decisions in my life that prioritize what I truly value, based on my experience with cancer.

But the fact is, I’ve already been doing that recently, and while the cancer underscores this, I’ve already been making big plans for the future. I’ll post soon about some exciting new projects I’m trying to get underway, things that are far outside my comfort zone for which I’ll need to transform myself (you know, in a not-cancerous sort of way). I’ve already reduced my hours at W3C to 50%, and I’m looking at changing my role and remaining time there; I love the mission of W3C, which I see as a valuable kind of public service, so no matter what, I’ll probably stay involved there in some capacity for the foreseeable future. But I feel myself pulled toward building software and social systems, not just specifications. Stay tuned for more soon!

I’m optimistic and excited, not just about leaving behind this roadbump of cancer, but of new possibilities and new missions to change the world for the better in my own small ways.


Update: Today (Friday, 26 August), I got the results of my biopsy from my oncologist, and I’m pleased to announce that I have no more colon cancer! The results were that the cancer was “well-differentiated, no activity in lymph nodes”, meaning that there was no metastasis, and I’m cured. This whole “adventure” emerged, played out, and concluded in just a month: I heard there was a tumor, was diagnosed with cancer, consulted an oncologist, had surgery, recovered, and got my cancer-free results all in 30 days. It felt much longer!

by Shepazu at August 17, 2016 08:39 PM