[Bug 7681] New: link tag: rel: associate pages about the same person across many sites

http://www.w3.org/Bugs/Public/show_bug.cgi?id=7681

           Summary: link tag: rel: associate pages about the same person
                    across many sites
           Product: HTML WG
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: HTML5 spec proposals
        AssignedTo: dave.null@w3.org
        ReportedBy: Nick_Levinson@yahoo.com
         QAContact: public-html-bugzilla@w3.org
                CC: ian@hixie.ch, mike@w3.org, public-html@w3.org


Different websites may have pages about the same person. Several people may
have the same name and all may be written about on multiple sites. Search
engines have difficulty associating pages that are about the same person
without erroneously intermixing other people with the same name, especially
when none of the people are extraordinarily famous and popular in searches
(when they are, search engines may have algorithms for more sophisticated
associational analysis).

Libraries solve this for authors by distinguishing among them with birth years
and death years. Other biographical sources offer vague dates for when someone
flourished and provide nationalities or birth places.

A link element naming the person and providing data that is standardized could
help search engines organize their listings to reduce accidental intermixing.
It wouldn't be perfect; e.g., a person may have reported multiple ages from
which different birth years are calculated; a website owner may erroneously
enter the wrong data; nationality may vary with a citizenship change; or
historians may disagree. But, in general, listings with this element could be
more successfully separated.

Writing and parsing the link element would be a bit more complex than with
other link elements, but I think this is manageable and the method I propose
has been applied elsewhere.

I propose that the rel value be "canonical-human" and that its title attribute
be reserved for a special meaning and syntax. The title attribute's syntax
would be in the form of title="name: Asashi T. Fung; born: 1723; died: 1799;
flourished: 1740s-1750s; nationality: FR; birthplace: Honolulu, Hawaii, US;
ident-scheme: ; ident: ;".

Each subattribute (e.g., "name") would be optional. For example, "flourished"
would likely be used only when birth and death years are unknown.
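
As a rough illustration only, here is a Python sketch of how an authoring tool
might assemble the title value while leaving out any subattribute that has no
subvalue. The function name build_title and the ordering list are hypothetical;
the order simply follows the example above, and nothing here escapes colons or
semicolons inside subvalues.

```python
# Order taken from the example title in this proposal (an assumption, since
# the proposal does not say that order matters).
SUBATTRIBUTE_ORDER = [
    "name", "born", "died", "flourished",
    "nationality", "birthplace", "ident-scheme", "ident",
]


def build_title(fields):
    """Join the known subattributes as 'key: value;' pairs, skipping empty ones."""
    parts = []
    for key in SUBATTRIBUTE_ORDER:
        value = str(fields.get(key, "")).strip()
        if value:  # every subattribute is optional
            parts.append(f"{key}: {value}")
    # A final semicolon is emitted here, although it is optional.
    return "; ".join(parts) + (";" if parts else "")


title = build_title({"name": "Asashi T. Fung", "flourished": "1740s-1750s"})
print(f'<link rel="canonical-human" title="{title}">')
```

In this sketch "flourished" appears only because no birth or death year was
given; a caller that knows those years would pass them instead.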

For the subattribute birthplace, if a subvalue is supplied, a nation would be
required. The nation of the birthplace would be represented by one of the same
codes used for nationality.

For the subattribute ident-scheme, a list of schemes could be developed later,
perhaps each to be prefixed by a code for the scheme's nation and a hyphen.
Schemes could include privately-owned but widely available databases of
moderately-well-known people. Subvalues for ident-scheme and ident must not be
entered until a list of schemes and the style of ident values for each scheme
is centralized. After that, the scheme must be in that list and ident's
subvalue must conform to the style specified for that scheme.

If only whitespace or a null is between the colon and the semicolon, that is
equivalent to the subattribute not appearing.

A final semicolon before the closing quote mark is optional and may be imputed.
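
To make the two rules above concrete, here is a minimal Python sketch of a
parser for the title value. The function name parse_canonical_human_title is
hypothetical, and the sketch assumes no subvalue contains a colon or a
semicolon.

```python
def parse_canonical_human_title(title):
    """Parse 'key: value; key: value;' into a dict of non-empty subvalues."""
    fields = {}
    for chunk in title.split(";"):
        key, sep, value = chunk.partition(":")
        key, value = key.strip(), value.strip()
        # No colon at all (e.g. the empty piece after a final semicolon) or an
        # empty subvalue is treated as if the subattribute did not appear.
        if sep and key and value:
            fields[key] = value
    return fields


example = (
    "name: Asashi T. Fung; born: 1723; died: 1799; "
    "flourished: 1740s-1750s; nationality: FR; "
    "birthplace: Honolulu, Hawaii, US; ident-scheme: ; ident: ;"
)
parsed = parse_canonical_human_title(example)
print(parsed["name"])        # Asashi T. Fung
print("ident" in parsed)     # False
```

With or without the final semicolon, the same dict comes back, which is what
"may be imputed" amounts to in this sketch.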

More subattributes might be added in the future, so page authors must not
invent new ones in the meantime.

No subvalue (e.g., "1723") could contain a literal colon or semicolon. If one
is needed or wanted, a character entity must represent the colon or the
semicolon.
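
One wrinkle is that a character entity such as &#59; itself ends in a
semicolon, so a parser cannot simply split on every semicolon. The Python
sketch below is one possible way to handle that, under the assumption (mine,
not part of the proposal) that the parser sees the raw attribute text before
an HTML parser has decoded character references. The names escape_subvalue,
split_raw_title and parse_raw_title are hypothetical.

```python
import html
import re

# A character reference such as &#58; &#x3A; or &colon; including its
# terminating semicolon.
_CHAR_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def escape_subvalue(value):
    """Replace '&', ':' and ';' with numeric character references in one pass."""
    return re.sub(r"[&:;]", lambda m: f"&#{ord(m.group())};", value)


def split_raw_title(raw_title):
    """Split on semicolons that do not terminate a character reference."""
    pieces, current, pos = [], [], 0
    while pos < len(raw_title):
        ref = _CHAR_REF.match(raw_title, pos)
        if ref:
            current.append(ref.group())  # keep the reference intact
            pos = ref.end()
        elif raw_title[pos] == ";":
            pieces.append("".join(current))
            current = []
            pos += 1
        else:
            current.append(raw_title[pos])
            pos += 1
    pieces.append("".join(current))
    return pieces


def parse_raw_title(raw_title):
    """Parse raw title text, decoding references only inside subvalues."""
    fields = {}
    for piece in split_raw_title(raw_title):
        key, sep, value = piece.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = html.unescape(value.strip())
    return fields


raw = "name: " + escape_subvalue("Fung; Asashi: the Elder") + "; born: 1723"
print(parse_raw_title(raw))
```

If the parser instead receives the already-decoded attribute value, the
escaped characters are indistinguishable from separators, so some other
convention would be needed in that case.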

The nationality and the birthplace would include a nation using standard
two-letter codes. For nations that no longer exist and do not have two-letter
codes, e.g., Roman Empire and Van Lang, longer codes must be used, since about
200 2-letter codes are already in use and only 676 exist, and longer codes
would prevent future conflict or exhaustion. A list of deceased nations and
their longer codes would have to be established, possibly based on a standard
gazetteer.
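
For what it is worth, the 676 figure is just 26 * 26. The Python sketch below
checks that arithmetic and classifies a nation code by shape alone. It assumes
(my assumption, drawn only from the example "Honolulu, Hawaii, US") that the
nation is the last comma-separated part of a birthplace, and that longer codes
are purely alphabetic. The code "ROMANEMPIRE" is invented for illustration; no
such list exists yet.

```python
import string

# Number of possible two-letter codes: 26 * 26 = 676.
TWO_LETTER_CODE_SPACE = len(string.ascii_uppercase) ** 2


def classify_nation_code(code):
    """Return 'two-letter', 'longer', or 'invalid' from the code's shape only."""
    code = code.strip()
    if not (code.isascii() and code.isalpha()):
        return "invalid"
    if len(code) == 2:
        return "two-letter"
    if len(code) > 2:
        return "longer"  # reserved here for nations that no longer exist
    return "invalid"


def birthplace_nation(birthplace):
    """Return the last comma-separated part, assumed to be the nation code."""
    return birthplace.rsplit(",", 1)[-1].strip()


nation = birthplace_nation("Honolulu, Hawaii, US")
print(TWO_LETTER_CODE_SPACE)                 # 676
print(nation, classify_nation_code(nation))  # US two-letter
print(classify_nation_code("ROMANEMPIRE"))   # longer (hypothetical code)
```

A real checker would look the code up in the actual lists rather than judge by
length, but the length test shows why the two kinds of code cannot collide.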

The rel value of "canonical-human" avoids the legal meaning of _person_ in the
U.S., and probably in other nations that rely on U.K. common law traditions,
where it includes corporations and other legally-recognized entities. A value
of "canonical-individual" may be too confusing if misunderstood as being about,
say, pages and not people at all.

No rev value would be meaningful.

Multiple link elements with this rel value would be permitted, and UAs should
apply all of them. That permits multiple names (e.g., spellings),
ident-schemes, and idents to identify the person more certainly.
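
As a sketch of what "apply all of them" could mean in practice, the Python
below collects every distinct subvalue for each subattribute across several
title values. The names parse_title and merge_titles are hypothetical, and
keeping the subvalues in first-seen order is my own choice.

```python
def parse_title(title):
    """Parse 'key: value; ...' into a dict, ignoring empty subvalues."""
    fields = {}
    for chunk in title.split(";"):
        key, sep, value = chunk.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def merge_titles(titles):
    """Combine several title values into {subattribute: [distinct subvalues]}."""
    merged = {}
    for title in titles:
        for key, value in parse_title(title).items():
            values = merged.setdefault(key, [])
            if value not in values:  # keep each subvalue once, in first-seen order
                values.append(value)
    return merged


titles = [
    "name: Asashi T. Fung; born: 1723; died: 1799;",
    "name: Asashi Fung; born: 1723",
]
print(merge_titles(titles))
```

Here the two spellings of the name are both kept, while the repeated birth
year is recorded once.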

A separate enhancement request for "canonical-organization" will likely be
posted shortly.

Thank you.

-- 
Nick



Received on Monday, 21 September 2009 00:52:12 UTC