W3C

DOI/DONA vs. the Internet

I’ve recently been looking at some of the DO (Digital Object) literature, and DOI/DONA, and I’m still not clear what it amounts to nowadays. I’ve discussed it a bit internally at W3C and wanted to share my views here as well.

If it weren’t for the IoT and ITU context, I’d say it’ll have a negligible effect on the existing Internet architecture using URI/http/DNS/IP. But we do have this context of IoT chaos in terms of standardization (the main reason why we’re working on the Web of Things, or WoT, as a more abstract layer), so I think it’s worth discussing a bit more in our community.

Anyway, this is part of an old story.

I remember one particular aspect that was discussed on some of our W3C lists in 2003, when DOI tried to get a URI scheme (i.e. doi:) after having rejected the idea of a URN namespace (e.g. urn:doi:) or a non-IETF-tree scheme (like org:doi:) as feeling too much like second-class citizenship.

For the novices, DOI is just another persistent identifier catalog and syntax, using a “global/local” grammar (e.g. 10.101/something), which should ideally be presented as urn:doi:10.101/something to fit with our architecture (this apparently works in some tools, even though urn:doi is not a registered URN namespace) or even as doi:10.101/something, using a new URI scheme this time (which is also used sometimes in interfaces or papers, even though it’s not registered with IANA either, and doesn’t resolve as such).
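To make the three presentations concrete, here is a minimal Python sketch that strips whichever wrapper is present and splits the name into its “global” and “local” parts. The function name split_doi and the list of wrappers are my own illustration, not anything defined by DOI; the only rules assumed are that the prefix starts with “10.” and that the first “/” separates the two parts.

```python
from typing import Tuple

# Presentation wrappers discussed in the text; longest first so that
# "urn:doi:" is tried before "doi:".
WRAPPERS = ("http://dx.doi.org/", "urn:doi:", "doi:")


def split_doi(identifier: str) -> Tuple[str, str]:
    """Strip a known wrapper, then split into (prefix, suffix).

    Hypothetical helper for illustration only.
    """
    for wrapper in WRAPPERS:
        if identifier.lower().startswith(wrapper):
            identifier = identifier[len(wrapper):]
            break
    # The "global" part is everything before the first slash.
    prefix, sep, suffix = identifier.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError("not a DOI-like name: %r" % identifier)
    return prefix, suffix


if __name__ == "__main__":
    for form in ("10.101/something",
                 "doi:10.101/something",
                 "urn:doi:10.101/something"):
        print(form, "->", split_doi(form))
```

All three spellings reduce to the same prefix/suffix pair, which is the point: the wrapper is presentation, and the identifier proper is what sits inside it.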

In the end, today, all DOIs use the DNS and the http URL scheme, as in http://dx.doi.org/10.101/something, to enter their resolution space (which is the main reason why they didn’t get their doi: scheme in the first place: they have no independent resolution “running code” of their own for the scheme itself, they just use http with DNS and URL querying).
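As a rough illustration of what “entering the resolution space” means in practice, the sketch below builds the proxy URL for a DOI name and then pulls out the part of it that still depends on DNS. The helper name doi_to_url is hypothetical, the percent-encoding of unusual characters in the suffix is my assumption about what a careful client would do, and no network request is made.

```python
from urllib.parse import quote, urlsplit

# Proxy base taken from the example in the text.
PROXY = "http://dx.doi.org/"


def doi_to_url(doi_name: str) -> str:
    """Map a bare DOI name (prefix/suffix) to its http proxy URL.

    Hypothetical helper; slashes are kept, other unsafe characters
    are percent-encoded.
    """
    return PROXY + quote(doi_name, safe="/")


if __name__ == "__main__":
    url = doi_to_url("10.101/something")
    parts = urlsplit(url)
    print(url)
    # The scheme and host are plain http and DNS; only the path
    # carries the DOI name itself.
    print(parts.scheme, parts.hostname, parts.path)
```

Everything before the path is ordinary Web plumbing, so the DOI only becomes “its own” identifier once the proxy hands it to the resolution system behind it.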

But the point is, once the initial http://dx.doi.org proxying is done, they (DOI/DONA) provide a full resolution system independent of DNS for their own internal PI (persistent identifier) syntax, with a commercially based hierarchical registration system comparable, from a distance, to ICANN/IANA/DNS, with registries, fees, registrants, etc. I haven’t looked at their pricing, persistence policies, etc. This is a service that the scholarly community seems to appreciate a lot, e.g. to be able to dereference an ISBN into some resources about it (the book itself, a summary, some metadata, a link to a bookstore, etc.). What’s their competition in this space? Local university/library portals with URL querying using ISBN? purl, ark maybe? But those come without a central root (Global Handle Registry) like the one DONA is providing.

It’s also still unclear to me how their hierarchical namespace is organized: by countries, by industries, both, or flat? And using what semantics? Whatever it is, and whether it is better or worse than ICANN’s gTLD, ccTLD and subdomain policies, one paper said that for China, Russia or Iran, for example, one main advantage over ICANN is that it’s run not by a California not-for-profit but by a Swiss not-for-profit, which BTW has the ITU singled out in its Statutes as a partner of choice. There goes our usual “multistakeholder matters/geography doesn’t” argument, down the drain.

An interesting question is why all these countries, somehow active in ICANN/IETF, are at the same time trying to fragment the root. I read somewhere that the same person who is/was overseeing the ITU/DOI work is also on the ICANN GAC for instance, so there is communication. Another question worth asking is why there is a recent RFC asking all IETF RFCs to also provide DOIs, as if an ietf.org URL were not stable enough.

The ITU move to endorse DOI is clearly political, and there is no denying that having the top I* organizations headquartered in the States, and having Trump as a potential head of those same States, is worrisome for lots of folks around the globe.

BTW, there also seems to be a metadata stack of some sort used by or integrated in the DOI system, called indecs, which I haven’t looked at, and there are some questions related to URN syntax not being able to support their needs for structured identifiers.

So far, I haven’t seen any deployment of a custom protocol of their own (like http or ftp) to justify doi: as a first-class URI citizen, e.g. something that browsers would implement more readily. It looks like the Handle part of their system defines that, or used to (there was an hdl: URI scheme available at some point). Same thing for the deployment of a “bind” of their own that OSs would have to implement as well, to connect user agents to their DOI servers directly, without going through DNS. For now, DOI looks like an alternate root that doesn’t use the open DNS software infrastructure (but has to use a URL to access its resolution space).

Once they do that, i.e. deploy software that connects directly to their main resolver/handle (and I think they will if they reach enough critical mass), what’s behind their doi: syntax doesn’t use DNS or IP: it’s just a private identifier binding space run by the DONA/DOI organization/servers and their registries, with a promise of uniqueness and persistence. Maybe I’m just used to W3C and ICANN, and we’re just as opaque to newbies, but I can’t say it’s very transparent to me in terms of who is controlling what, what ontologies are used, etc. Then again, they don’t really sell cashable global names like coconuts.net, but dull series of unique IDs/numbers, which look more like IP numbers than domain names from the outset, except that they are supposed to be assigned to the same resources forever.

Of course, today, most (all?) doi: identifiers not only use URLs to resolve their ID (funnily enough, a year or so back, doi.org resolution was down because DOI had forgotten to renew its .org DNS registration), but they also return URLs as the main typed values in DOI records, since that’s the most easily resolvable ID on the Internet today.

Overall, I’m not too worried, since being on the same root as the rest of the planet brings more economic advantages than anything else nowadays. But let’s not forget the IoT context: most of the “Things” are already under government control (your fridge, your electrical plug or bulb, your car are all already subject to government conformance rules), which makes them an easy target for governments wanting to impose a particular network interface instead of the Internet stack.