This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 7682 - link tag: rel: associate pages about the same organization across many sites
Summary: link tag: rel: associate pages about the same organization across many sites
Status: VERIFIED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords: NE, TrackerIssue
Depends on:
Blocks:
 
Reported: 2009-09-21 01:17 UTC by Nick Levinson
Modified: 2010-10-04 14:57 UTC
CC List: 6 users


Description Nick Levinson 2009-09-21 01:17:43 UTC
Different websites may have pages about the same organization. Several organizations (businesses, government agencies, institutions, ad hoc criminal conspiracies, etc.) may have the same name and all may be written about on multiple sites.

A link element naming the organization and providing data that is standardized could help search engines organize their listings to reduce accidental intermixing. It wouldn't be perfect; e.g., an organization may have listed its name differently in different places; a website owner may erroneously enter the wrong data; nationality may vary with a citizenship change; or its functional headquarters and its legal headquarters may be far apart. But, in general, listings with this element could be more successfully separated.

Writing and parsing the link element would be a bit more complex than with other link elements, but I think this is manageable and the method I propose has been applied elsewhere.

I propose that the rel value be "canonical-organization" and that its title attribute be reserved for a special meaning and syntax. The title attribute's syntax would be in the form of title="name: XYZ Greasy Spoon, Inc.; headquarters: South Beach, Staten Island, New York, NY, US; ident-scheme: ; ident: ;".

Each subattribute (e.g., "name") would be optional.

For the subattribute headquarters, if a subvalue is supplied, a nation would be required. The nation would be represented by a standard code.

For the subattribute ident-scheme, a list of schemes could be developed later, perhaps with each prefixed by a code for the scheme's nation and a hyphen. Schemes could include privately owned but widely available databases of moderately well-known organizations. Subvalues for ident-scheme and ident must not be entered until a list of schemes, and the style of ident values for each scheme, is centralized; after that, the scheme must be in that list and ident's subvalue must conform to the specified style.

If only whitespace or a null is between the colon and the semicolon, that is equivalent to the subattribute not appearing.

A final semicolon before the closing quote mark is optional and may be imputed.

More subattributes might be added in the future, so page authors must not invent new ones in the meantime.

No subvalue could contain a colon or a semicolon. If one is needed or wanted, a character entity must represent the colon or the semicolon.
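
The title syntax described so far can be illustrated with a minimal parsing sketch. It is not part of the proposal and rests on several assumptions: the parser sees the raw title text before entity decoding, subattribute names are matched case-insensitively, unknown subattributes are silently skipped, and a literal colon inside a subvalue is treated as an error. The function name parse_org_title is hypothetical. Note that a character entity such as &#58; itself ends in a semicolon, so the sketch matches entities as single units before splitting on semicolons.

```python
import html
import re

# Subattribute names taken from the proposal; anything else is skipped here
# (an assumption: the proposal only says authors must not invent new ones).
KNOWN_SUBATTRIBUTES = {"name", "headquarters", "ident-scheme", "ident"}

# A character reference such as &#58; ends in ";" itself, so it is matched as
# one unit and its trailing ";" is not mistaken for a field separator.
_ENTITY = r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);"
_FIELD = re.compile(rf"(?:{_ENTITY}|[^;])+")


def parse_org_title(raw_title: str) -> dict[str, str]:
    """Parse the raw (not yet entity-decoded) title text into subattributes."""
    result: dict[str, str] = {}
    for field in _FIELD.findall(raw_title):
        key, colon, raw_value = field.partition(":")
        key = key.strip().lower()
        if not colon or key not in KNOWN_SUBATTRIBUTES:
            continue  # malformed or unknown subattribute: skipped (assumption)
        if ":" in raw_value:
            # The proposal forbids literal colons; raising is an assumption.
            raise ValueError(f"literal colon in subvalue of {key!r}")
        value = html.unescape(raw_value).strip()
        if value:  # empty or whitespace-only is equivalent to "not present"
            result[key] = value
    return result


example = (
    "name: XYZ Greasy Spoon, Inc.; "
    "headquarters: South Beach, Staten Island, New York, NY, US; "
    "ident-scheme: ; ident: ;"
)
print(parse_org_title(example))
```

For the example title from the proposal, only name and headquarters are returned, because the empty ident-scheme and ident subvalues count as absent. The final semicolon can be omitted without changing the result.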

Nations would be identified by standard two-letter codes. For nations that no longer exist and do not have two-letter codes, e.g., Roman Empire and Van Lang, longer codes must be used, since about 200 2-letter codes are already in use and only 676 exist, and longer codes would prevent future conflict or exhaustion. A list of deceased nations and their longer codes would have to be established, possibly based on a standard gazetteer.

Multiple link elements with this rel value would be permitted, and UAs should apply all of them. That permits multiple names (e.g., corporate and d/b/a), ident-schemes, and idents to identify the one organization more certainly.
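
As a rough illustration of applying several such link elements together, the sketch below unions their subvalues per subattribute. It assumes each link's title has already been parsed into a dict mapping subattribute name to subvalue; the function name merge_org_links and the second sample name are hypothetical, and how a UA or search engine would actually combine the data is not specified in this proposal.

```python
from collections import defaultdict


def merge_org_links(parsed_links: list[dict[str, str]]) -> dict[str, set[str]]:
    """Union the subvalues from several links, keyed by subattribute name."""
    merged: defaultdict[str, set[str]] = defaultdict(set)
    for fields in parsed_links:
        for key, value in fields.items():
            merged[key].add(value)
    return dict(merged)


links = [
    {"name": "XYZ Greasy Spoon, Inc."},  # corporate name from the proposal
    {"name": "XYZ Diner"},  # hypothetical d/b/a name, for illustration only
]
combined = merge_org_links(links)
print(sorted(combined["name"]))
```

Keeping every distinct value, rather than letting a later link overwrite an earlier one, matches the stated goal of letting multiple names and idents point to the one organization.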

No rev value would be meaningful.

See also Bug 7681, on "canonical-human".

Thank you.

-- 
Nick
Comment 1 Jeremy Keith 2009-09-21 09:17:00 UTC
Please propose rel values on the wiki:

http://wiki.whatwg.org/wiki/RelExtensions

Please note:

"For the "Status" section to be changed to "Accepted", the proposed keyword must either have been through the Microformats process, and been approved by the Microformats community; or must be defined by a W3C specification in the Candidate Recommendation or Recommendation state. If it fails to go through this process, it is "Rejected"."
Comment 2 Nick Levinson 2009-09-22 01:54:11 UTC
The RelExtensions page prefers that the description be brief and that a link to more details be supplied, and HTML5 spec proposals are solicited through this bug reporting system, so this enhancement request appears to be the appropriate medium. Omitting most of the details just for the sake of brevity would have invited the opinion that the link element couldn't handle what's being proposed, so I had to explain an approach that could solve likely problems.

Thanks.

-- 
Nick
Comment 3 Jeremy Keith 2009-09-22 09:19:38 UTC
Nick, the process for adding a rel value is far from brief:

"For the "Status" section to be changed to "Accepted", the proposed keyword must either have been through the Microformats process, and been approved by the Microformats community; or must be defined by a W3C specification in the Candidate Recommendation or Recommendation state. If it fails to go through this process, it is "Rejected"."

I strongly suggest going down the route of having a proposed value approved by the microformats community (if it doesn't already exist in a W3C spec). If you don't, the status can never be updated to "accepted", no matter how many "bugs" you file here.
Comment 4 Nick Levinson 2009-09-22 16:37:52 UTC
I had already posted to RelExtensions before your comments, with a brief description and a link.

The description (not the process) should be brief, and RelExtensions gives a choice of two approval methods, so I chose one, filing a grand total of 2 bugs/enhancements for the 2 values, one for each.

I'm looking again at microformats.org and will follow up appropriately.

Thanks.

-- 
Nick
Comment 5 Nick Levinson 2009-09-26 21:22:05 UTC
hCard and FOAF are too limiting, as discussed in Bug 7681 Comment 5, and don't do what canonical-organization would. For example, FOAF and hCard each lack 3 of the 4 fields I proposed for the link element rel.

Arguably, effort could go into amending hCard and/or FOAF, but that seems more complicated. hCard appears committed to maintaining a nearly 1:1 relationship with vCard, which describes itself as stable, while FOAF is oriented to online communities, an orientation that would have to be overcome to generalize its use. Amending HTML5 to the same effect is easier: it simply adds a new rel value, and does so where Web designers are likelier to see it (HTML5).

One difficulty with my proposal is that an organizational headquarters is often inconsistently identified between the functionally dominant one (e.g., where the CEO sits) and the legal one (e.g., according to incorporation law), but that's probably solvable with headquarters-main and headquarters-legal.

Thank you.

-- 
Nick
Comment 6 Ian 'Hixie' Hickson 2009-09-29 07:40:20 UTC
Before we add this to the spec, we need implementation experience, a more formal specification, research on its usefulness, etc:

http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F
Comment 7 Maciej Stachowiak 2010-03-14 14:51:31 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
  http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.
Comment 8 Nick Levinson 2010-03-28 20:33:36 UTC
I'm not sure what needs to be made more formal for a spec. The need to find content on one organization across many websites, when many organizations may have the same name, is well known. When a canonical page is not agreed upon or doesn't exist, the need can be met by each website offering extensive matchable biographical data, so that a search engine can compare and match the data, distinguish different organizations with the same name, and group websites about the same organization together. I've had difficulty getting UA makers to respond to feature requests before HTML5 support. We need HTML5 support for some of them to prioritize the feature. Thus, I'm requesting escalation.

Suggested title: link tag: rel: associate pages about the same organization across many sites for searches without a canonical page and despite a confusingly indistinct name

Suggested text:

Different websites may have pages about the same organization. Several organizations (businesses, government agencies, institutions, ad hoc criminal conspiracies, etc.) may have the same name and all may be written about on multiple sites.

A link element naming the organization and providing data that is standardized could help search engines organize their listings to reduce accidental intermixing. It wouldn't be perfect; e.g., an organization may have listed its name differently in different places; a website owner may erroneously enter the wrong data; nationality may vary with a citizenship change; or its functional headquarters and its legal headquarters may be far apart. But, in general, listings with this element could be more successfully separated.

Writing and parsing the link element would be a bit more complex than with other link elements, but I think this is manageable and the method I propose has been applied elsewhere.

I propose that the rel value be "canonical-organization" and that its title attribute be reserved for a special meaning and syntax. The title attribute's syntax would be in the form of title="name: XYZ Greasy Spoon, Inc.; headquarters: South Beach, Staten Island, New York, NY, US; ident-scheme: ; ident: ;".

Each subattribute (e.g., "name") would be optional.

For the subattribute headquarters, if a subvalue is supplied, a nation would be required. The nation would be represented by a standard code. One difficulty is that an organizational headquarters is often inconsistently identified between the functionally dominant one (e.g., where the CEO sits) and the legal one (e.g., according to incorporation law), but that's probably solvable with headquarters-main and headquarters-legal.

For the subattribute ident-scheme, a list of schemes could be developed later, perhaps with each prefixed by a code for the scheme's nation and a hyphen. Schemes could include privately owned but widely available databases of moderately well-known organizations. Subvalues for ident-scheme and ident must not be entered until a list of schemes, and the style of ident values for each scheme, is centralized; after that, the scheme must be in that list and ident's subvalue must conform to the specified style.

If only whitespace or a null is between the colon and the semicolon, that is equivalent to the subattribute not appearing.

A final semicolon before the closing quote mark is optional and may be imputed.

More subattributes might be added in the future, so page authors must not invent new ones in the meantime.

No subvalue could contain a colon or a semicolon. If one is needed or wanted, a character entity must represent the colon or the semicolon.

Nations would be identified by standard two-letter codes. For nations that no longer exist and do not have two-letter codes, e.g., Roman Empire and Van Lang, longer codes must be used, since about 200 2-letter codes are already in use and only 676 exist, and longer codes would prevent future conflict or exhaustion. A list of deceased nations and their longer codes would have to be established, possibly based on a standard gazetteer.

Multiple link elements with this rel value would be permitted, and UAs should apply all of them. That permits multiple names (e.g., corporate and d/b/a), ident-schemes, and idents to identify the one organization more certainly.

No rev value would be meaningful.

hCard and FOAF are too limiting, as discussed in Bug 7681 Comment 5, and don't do what canonical-organization would. For example, FOAF and hCard each lack 3 of the 4 fields I proposed for the link element rel. Arguably, effort could go into amending hCard and/or FOAF, but that seems more complicated. There appears to be a commitment to maintaining a nearly 1:1 relationship between hCard and vCard, which describes itself as stable, while FOAF is oriented to online communities, an orientation that would have to be overcome to generalize its use. Amending HTML5 to the same effect is easier: it simply adds a new rel value, and does so where Web designers are likelier to see it (HTML5).

Closely related is the corresponding value for humans, in Bug 7681.
Comment 9 Maciej Stachowiak 2010-03-28 21:51:18 UTC
As discussed in other bugs, moving this back to VERIFIED since it should not be both reopened and escalated.
Comment 10 Nick Levinson 2010-04-11 20:49:57 UTC
Thanks for setting this back to Verified; option 2 is correct.

FOAF is proposed in bug 7681. The remaining possible problems with FOAF are when it will be usable in HTML and whether its vocabulary will be extended enough, and by agreement, for search engines to rely on it. Wikipedia is not extensive enough to serve as a central repository of organizational names for less-than-notable organizations that might matter to local blogs and the like, e.g., institutions with common, easily confused names. We need a system whereby search engines can recognize a single organization that has no home page, and one that does not require one author to trust another.
Comment 11 Maciej Stachowiak 2010-05-12 03:41:48 UTC
http://www.w3.org/html/wg/tracker/issues/114