7681 – link tag: rel: associate pages about the same person across many sites

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: link tag: rel: associate pages about the same person across many sites

Status:	VERIFIED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	pre-LC1 HTML Microdata (editor: Ian Hickson)
Version:	unspecified
Hardware:	All All

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:
Keywords:	NE, TrackerIssue

Depends on:
Blocks:

Reported:	2009-09-21 00:52 UTC by Nick Levinson
Modified:	2010-10-05 13:03 UTC
CC List:	7 users

Description Nick Levinson 2009-09-21 00:52:03 UTC

Different websites may have pages about the same person. Several people may have the same name and all may be written about on multiple sites. Search engines have difficulty associating pages that are about the same person without erroneously intermixing other people with the same name, especially when none of the people are extraordinarily famous and popular in searches (when they are, search engines may have algorithms for more sophisticated associational analysis).

Libraries solve this for authors by distinguishing among them with birth years and death years. Other biographical sources offer vague dates for when someone flourishide nationalities or birth places.

A link element naming the person and providing data that is standardized could help search engines organize their listings to reduce accidental intermixing. It wouldn't be perfect; e.g., a person may have reported multiple ages from which different birth years are calculated; a website owner may erroneously enter the wrong data; nationality may vary with a citizenship change; or historians may disagree. But, in general, listings with this element could be more successfully separated.

Writing and parsing the link element would be a bit more complex than with other link elements, but I think this is manageable and the method I propose has been applied elsewhere.

I propose that the rel value be "canonical-human" and that its title attribute be reserved for a special meaning and syntax. The title attribute's syntax would be in the form of title="name: Asashi T. Fung; born: 1723; died: 1799; flourished: 1740s-1750s; nationality: FR; birthplace: Honolulu, Hawaii, US; ident-scheme: ; ident: ;".

Each subattribute (e.g., "name") would be optional. For example, "flourished" would likely be used only when birth and death years are unknown.

For the subattribute birthplace, if a subvalue is supplied, a nation would be required. The nation of the birthplace would be represented by one of the same codes used for nationality.

For the subattribute ident-scheme, a list of schema could be developed later, perhaps each to be prefixed by a code for the scheme's nation and a hyphen. Schemes could include privately-owned but widely available databases of moderately-well-known people. Subvalues for ident-scheme and ident must not be entered until a list of schema and the style of ident values for a scheme is centralized and then the scheme must be in that list and ident's subvalue must conform to the specified style.

If only whitespace or a null is between the colon and the semicolon, that is equivalent to the subattribute not appearing.

A final semicolon before the closing quote mark is optional and may be imputed.

More subattributes might be added in the future, so page authors must not invent new ones in the meantime.

No subvalue (e.g., "1723") could contain a colon or a semicolon. If one is needed or wanted, a character entity must represent the colon or the semicolon.
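
As a rough illustration of the syntax rules above, here is a minimal Python sketch of how a consumer might read the proposed title value. It is not part of the proposal: the names parse_canonical_human and KNOWN_FIELDS are made up for this example, and silently skipping unknown subattributes is an assumption (the proposal only says authors must not invent them). The sketch works on the already-decoded attribute value and does not attempt the character-entity rule for colons and semicolons.

```python
# Subattribute names taken from the example title value in the proposal.
KNOWN_FIELDS = {
    "name", "born", "died", "flourished",
    "nationality", "birthplace", "ident-scheme", "ident",
}


def parse_canonical_human(title):
    """Split a value such as 'name: A; born: 1723;' into a dict of fields."""
    fields = {}
    for part in title.split(";"):
        if not part.strip():
            continue  # nothing after the optional final semicolon
        key, sep, value = part.partition(":")
        key = key.strip()
        value = value.strip()
        if not sep or key not in KNOWN_FIELDS:
            continue  # assumption: unknown subattributes are skipped
        if not value:
            continue  # only whitespace: same as the subattribute not appearing
        fields[key] = value
    return fields


example = ("name: Asashi T. Fung; born: 1723; died: 1799; "
           "flourished: 1740s-1750s; nationality: FR; "
           "birthplace: Honolulu, Hawaii, US; ident-scheme: ; ident: ;")
print(parse_canonical_human(example))
```

The empty ident-scheme and ident entries drop out of the result, which matches the rule that an empty subvalue is equivalent to the subattribute not appearing.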

The nationality and the birthplace would include a nation using standard two-letter codes. For nations that no longer exist and do not have two-letter codes, e.g., Roman Empire and Van Lang, longer codes must be used, since about 200 2-letter codes are already in use and only 676 exist, and longer codes would prevent future conflict or exhaustion. A list of deceased nations and their longer codes would have to be established, possibly based on a standard gazetteer.
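
The figure of 676 is simply 26 squared. The short Python sketch below checks that arithmetic and shows one possible shape check for nation codes. The helper names are invented for this example, and two things are assumptions rather than part of the proposal: that longer codes for former nations would also be uppercase letters, and that the nation is the last comma-separated part of a birthplace, as in the "Honolulu, Hawaii, US" example.

```python
import string

# Number of possible two-letter codes over A-Z.
TWO_LETTER_SPACE = len(string.ascii_uppercase) ** 2


def looks_like_nation_code(code):
    """Shape check only: two or more uppercase ASCII letters (assumed form)."""
    return len(code) >= 2 and all(c in string.ascii_uppercase for c in code)


def birthplace_nation(birthplace):
    """Assume the nation code is the last comma-separated part."""
    return birthplace.split(",")[-1].strip()


print(TWO_LETTER_SPACE)
print(birthplace_nation("Honolulu, Hawaii, US"))
print(looks_like_nation_code("FR"))
```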

The rel value of "canonical-human" avoids the legal meaning of _person_ in the U.S., and probably in other nations that rely on U.K. common law traditions, where it includes corporations and other legally-recognized entities. A value of "canonical-individual" may be too confusing if misunderstood as being about, say, pages and not people at all.

No rev value would be meaningful.

Multiple link elements with this rel value would be permitted, and UAs should apply all of them. That permits multiple names (e.g., spellings), ident-schemes, and idents to identify the person more certainly.

A separate enhancement request for "canonical-organization" will likely be posted shortly.

Thank you.

--
Nick

Comment 1 Jeremy Keith 2009-09-21 09:12:52 UTC

This use case is already covered by rel="me", listed on the rel extensions wiki:

http://wiki.whatwg.org/wiki/RelExtensions

Comment 2 Nick Levinson 2009-09-22 01:39:37 UTC

No, it doesn't. The "me" points to a canonical page, not a person. And which page is canonical for Attila the Hun might not be subject to general agreement.

This proposal is for describing the human. Multiple pages across many websites that have the same link element, because they contain the same personally identifying information, can be associated as representing the same person. That helps search engines.

Thanks.

-- 
Nick

Comment 3 Jeremy Keith 2009-09-22 09:16:27 UTC

Nick, the use case you are describing ("Multiple pages across many websites that have the same link element, because they contain the same personally identifying information, can be associated as representing the same person. That helps search engines.") is already implemented by Google using a combination of rel="me" and hCard and/or FOAF. http://code.google.com/apis/socialgraph/docs/otherme.html

In any case, please add rel value proposals to the wiki page rather than here in the bug tracker.

Comment 4 Nick Levinson 2009-09-22 16:28:18 UTC

Using "me" for other people sounds not very semantic, but if it works (with the other systems added) that'll be fine. Anyway, I'm not entirely into semantic coding when it makes usability worse, often the case with long-tail users.

I'm looking at Google's page.

Thanks.

-- 
Nick

Comment 5 Nick Levinson 2009-09-26 21:13:16 UTC

None of those are as useful for this use case. Each lacks most of the needed properties or their ready equivalents and each requires learning yet another technology when HTML5 already offers one we know.

The Google Social Graph API is limited to URLs and name (http://code.google.com/apis/socialgraph/docs/attributes.html) (and I'm unclear how you use the API for HTML markup). The person in whom you're interested has to have URLs you consider authoritative. They may not exist. To use it to describe specific data about a person other than URLs requires believing any URLs you cite are stable. You'd often have to limit the URLs to those you control. That makes otherme not very useful for many famous and semifamous people, including those in history. Many Web pages are about historical figures and many more are about modern people who are likely to be significant in history, like heads of state.

FOAF is for XML and therefore is compatible with XHTML, but is a bit more complicated to use with HTML, because some of its requirements don't apply to elsewhere in HTML. FOAF has many good features but, of 8 I proposed here, it lacks 6: death date, when flourished, nationality, birth place, and a way to refer to authoritative sources if they're not openly online (e.g., subscription databases and Who's Who books) (http://xmlns.com/foaf/spec/). In addition, despite having read probably dozens of books on Web matters (among hundreds on computers generally), I don't recall FOAF. It deserves publicity, but HTML already has that and already has a mechanism to do what I'm proposing, a mechanism described in books on the language.

hCard and the closely-related RDFa grammar Google supports are too limited, because they don't have enough fields available. Parsers are to ignore anything not understood. A proposal for a date-of-death field is pending for hCard, but not for other fields, and accepting the one proposal may require abandoning the nearly 1:1 relationship with the vCard RFC. Multiple birth dates are required when we know, say, a person was born October 16 but not whether that was in 1919 or 1918, often the case with entertainers, but hCard limits to a single date of birth or requires more vagueness than the known facts may justify. Birth and death dates may come from different calendars for people whose lives straddle a calendar change (one occurred about two and a half centuries ago in the U.S.) and hCard doesn't accommodate those changes. While fn is flexible enough, n isn't for some name methods found internationally and n is impliable from fn, the inflexibility creating erroneous results not attributable to the content author. An ident-scheme in this thread's proposal might refer to a large collection of biographies that may be in book form or in an access-limited website and thus not have a URL or a full URL, and hCard doesn't offer compatible properties.

I did find one problem with my proposal. Where I wrote "Other biographical sources offer vague dates for when someone flourishide nationalities or birth places.", I probably meant something like "Other biographical sources offer vague dates for when someone flourish[ed and, to distinguish someone, commonly prov]ide nationalities or birth places." I'm proposing we provide linkages for that kind of data.

Let's say Prof. X writes about Attila the Hun. So does Prof. Y. The two professors don't trust each other, but they agree on their subject and when he flourished. They don't want to link to each other's pages because they don't want to trust their rivals' work or stability. At the same time, search engines' content analyses are more geared to popular writing. One way scholarly writing may differ is by using key terms less often per thousand words of total text, because it's presumed readers already know what they're reading about, and that lowers ranking, which may increase the spread between their papers, making finding same-subject people results harder for searchers. And requiring search engines to analyze free-form text like "he was brought by the stork as the most beautiful baby you ever saw on April 16, 1963" to extract an identifying birthdate is too much to ask of an algorithm, so any technology we use for this general purpose is likely to need hand-coding, making the page author's time a factor.

Solution: If both professors place rel canonical-human "Attila the Hun" and what happen to be the same dates for flourishing or birth and death in their pages, once per page head, search engines can recognize that Prof. X and Prof. Y are almost certainly talking about the same person. The certainty will go up when using standard biographical identifiers. This becomes even more important when the name in question is coincidentally shared by multiple people, say, a Panamanian judge and an Indian moviemaker, and searchers aren't sure which nationality or occupation makes their subject important. The searchers want the search engines to separate the results by subject person. And the rel being essentially a line or two saves authoring time.

This link rel would solve the problem.

Thank you.

--
Nick

Comment 6 Maciej Stachowiak 2009-09-26 21:51:18 UTC

Please consider making your proposal for a new link value on the Wiki instead of in the bug tracker.

Some questions to consider: 

- What would a <link rel="canonical-human"> link to? You explained the semantics of the "title" attribute but not the "href".

- Is it intended that this link relation should describe the page linked to, or the page containing the link? If the former, wouldn't it be better to give the page itself a way to say what person it is about? If the latter, it seems inappropriate to use <link> instead of <meta>.

- How can we be confident that people would want to use this complicated format, or want to use it in the first place

Comment 7 Nick Levinson 2009-09-27 19:51:17 UTC

It's already in the wiki and has been since the day this began, but descriptions there should be brief, preventing detailing, and the wiki offers two routes to acceptance, this being one, where it's filed as a spec proposal/enhancement.

> - What would . . . [it] link to? You explained the semantics of the "title" attribute but not the "href".
I didn't add href because other rels and tags cover that. If adding href would be helpful and wouldn't create a problem with rev, I don't mind adding it, but I don't think it's needed.

Search engines would associate identical link elements from across the Internet and present the results together. If a dozen websites link about "Chris The Great" with the same birth/death dates, a search engine would assume they're probably about the same person, while placing some other person with the same name but different biographical data in a separate list of results. Results for "Chris The Great" could then say "Six people have that name", present one or two results for each, and then present more results matching the result you clicked.
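
A minimal Python sketch of that grouping step, under stated assumptions: the page URLs and the birth/death years are invented sample data, the fields are assumed to be already parsed into dicts, and "identical link elements" is taken literally as identical sets of fields. A real search engine would presumably need something more forgiving.

```python
from collections import defaultdict

# Hypothetical crawl results: (page URL, parsed canonical-human fields).
pages = [
    ("http://a.example/chris",
     {"name": "Chris The Great", "born": "1801", "died": "1870"}),
    ("http://b.example/bio",
     {"name": "Chris The Great", "born": "1801", "died": "1870"}),
    ("http://c.example/cg",
     {"name": "Chris The Great", "born": "1950"}),
]


def group_by_person(pages):
    """Group page URLs whose canonical-human fields are identical."""
    groups = defaultdict(list)
    for url, fields in pages:
        key = tuple(sorted(fields.items()))  # order-independent key
        groups[key].append(url)
    return groups


groups = group_by_person(pages)
print(f"{len(groups)} people have that name")
for key, urls in groups.items():
    print(dict(key), urls[:2])  # one or two results for each person
```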

> - . . . wouldn't it be better to give the page [linked to] . . . a way to say what person it is about?
That option exists now. But linking to a page presumes the author trusts the page they didn't author, and that trust shouldn't be required.

> - . . . If ["describ[ing] . . . the page containing the link"] . . ., it seems inappropriate to use <link> instead of <meta>.
A meta tag's grammar wouldn't be any simpler, meta seems less semantic, and no present meta tag offers the set of fields.

A link tag in a head doesn't present a link to a visitor, but already (with exceptions) is for search engines and other services to use to associate pages (e.g., pages about friends) in results. The link tag educates the search engine for more intelligent grouping. It is with that sense that this <link> was proposed.
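
For what reading such a tag might involve on the consuming side, here is a small Python sketch using the standard library's html.parser. The class name CanonicalHumanFinder and the sample page are made up for illustration; the title value reuses the example person from the original description.

```python
from html.parser import HTMLParser


class CanonicalHumanFinder(HTMLParser):
    """Collect title values of <link rel="canonical-human"> elements."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical-human" in rels and attr.get("title"):
            self.titles.append(attr["title"])


# Hypothetical page used only for this example.
page = """<html><head>
<title>Asashi T. Fung</title>
<link rel="stylesheet" href="site.css">
<link rel="canonical-human" title="name: Asashi T. Fung; born: 1723; died: 1799" />
</head><body><p>...</p></body></html>"""

finder = CanonicalHumanFinder()
finder.feed(page)
print(finder.titles)
```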

> - How can we be confident that people would want to use this complicated format, or want to use it in the first place[?]
The only ones who'd need to use it would be page authors and search engines. Visitors would not use it, but would benefit from its use by others. Growth in use could be gradual. The benefit begins as soon as a search engine sees two websites with the same link tag.

It would be optional, and all the fields within it are optional. Some other technologies, such as FOAF, are more complicated, even if revised to support what this proposal would support. This one is shorter and easier for busy page authors to apply.

Suppose you write fifty pages about Houdini Z. who flourished in the 1320s. You could write <link rel="canonical-human" title="name: Houdini Z.; flourished: 1320s" /> into your template and copy the whole template fifty times, and add your content. Or you could paste it into fifty heads; either way, you would not have to search for a string within the body in order to put the tag next to a relevant string. (You might need the tag on only one page, not all of them; that would be up to the search engine in parsing importance. I'd recommend having it on all, such as with a one-job template.) Suppose, while you write your fifty pages of content, Mr. Putin, as a direct descendant of Houdini Z., writes one page about the same Houdini Z. and inserts <link rel="canonical-human" title="name: Houdini Z; flourished: 1320s; ident-scheme: Lesser Magicians; ident: 322" />. A search engine would guess that you and Mr. Putin are writing about the same Houdini Z. And suppose Madonna writes about Houdini Z., a California music producer who hit it big in the 1990s, and writes <link rel="canonical-human" title="name: Houdini Z.; flourished: 1990s" />. A search engine would conclude that that's a different Houdini and present it separately.
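
The guess described in that example could be sketched roughly as follows in Python. This is only one possible reading: the rule used here (two records match when every field present in both agrees) and the name normalization that treats "Houdini Z." and "Houdini Z" alike are assumptions, and the function names are invented for this example.

```python
def normalize_name(name):
    """Assumed normalization: lowercase, drop periods and extra spaces."""
    return " ".join(name.replace(".", " ").lower().split())


def same_person_guess(a, b):
    """Guess True when every field present in both records agrees."""
    shared = set(a) & set(b)
    if "name" not in shared:
        return False  # without a name on both sides, make no guess
    for field in shared:
        left, right = a[field], b[field]
        if field == "name":
            left, right = normalize_name(left), normalize_name(right)
        if left != right:
            return False
    return True


yours = {"name": "Houdini Z.", "flourished": "1320s"}
putin = {"name": "Houdini Z", "flourished": "1320s",
         "ident-scheme": "Lesser Magicians", "ident": "322"}
madonna = {"name": "Houdini Z.", "flourished": "1990s"}

print(same_person_guess(yours, putin))
print(same_person_guess(yours, madonna))
```

Fields that appear on only one side, such as the ident-scheme and ident in the second record, are ignored by this rule rather than counted as a mismatch.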

Search engines have tried to group results in people searches, but haven't had much success in writing algorithms to do it. A9 with Amazon's backing tried and virtually gave up the service altogether. Results that try to group people systemically tend to be awful. Ranking that puts famous people first means if you want a less-well-known person you probably have to dig way down, and along the way read every snippet just in case. Grouping would eliminate most of that drudge with little inconvenience to searchers who want the most famous person. That kind of grouping is already in use for some single-site search engines (e.g., separating sales lit from downloads and repairs in a results list), but that can work because a single site can impose per-page codes to support grouping.

That benefit would motivate some page authors and search engines to employ the feature once it's standardized.

Thanks.

-- 
Nick

Comment 8 Ian 'Hixie' Hickson 2009-09-29 07:37:32 UTC

Before we add this to the spec, we need implementation experience, a more formal specification, research on its usefulness, etc:

http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_a_specification.3F

We don't just add keywords to the spec arbitrarily.

Comment 9 Maciej Stachowiak 2010-03-14 14:51:31 UTC

This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
  http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.

Comment 10 Nick Levinson 2010-03-28 20:15:26 UTC

I'm not sure what needs to be made more formal for a spec. The need to find content on one person across many websites when many people may have the same name is well known, and, when a canonical page is not agreed upon or doesn't exist, is resolvable by each website offering extensive matchable biographical data so a search engine can compare and match to distinguish different people with the same name so websites about the same person can be found together. I've had difficulty getting UA makers to respond to feature requests before HTML5 support. We need HTML5 support for some of them to prioritize the feature. Thus, I'm requesting escalation.

Suggested title: link tag: rel: associate pages about the same person across many sites for searches without a canonical page and despite a confusingly indistinct name

Suggested text:

Different websites may have pages about the same person. Several people may have the same name and all may be written about on multiple sites. Search engines have difficulty associating pages that are about the same person without erroneously intermixing other people with the same name, especially when none of the people are extraordinarily famous and popular in searches (when they are, search engines may have algorithms for more sophisticated associational analysis).

Libraries solve this for authors by distinguishing among them with birth years and death years. Other biographical sources offer vague dates for when someone flourished and, to distinguish someone, commonly provide nationalities or birth places.

Without a standard method, one search engine, A9, tried grouping people in results and, in my observation, failed abysmally. They no longer offer the service. This proposal would provide page authors with a tool that search engines could read for much better grouping of results.

A link element naming the person and providing, in the element, data that is standardized could help search engines organize their listings to reduce accidental intermixing. It wouldn't be perfect; e.g., a person may have reported multiple ages from which different birth years are calculated; a website owner may erroneously enter the wrong data; nationality may vary with a citizenship change; or historians may disagree. But, in general, listings with this element could be more successfully separated.

Writing and parsing the link element would be a bit more complex than with other link elements, but I think this is manageable and the method I propose has been applied elsewhere.

I propose that the rel value be "canonical-human" and that its title attribute be reserved for a special meaning and syntax. The title attribute's syntax would be in the form of title="name: Asashi T. Fung; born: 1723; died: 1799; flourished: 1740s-1750s; nationality: FR; birthplace: Honolulu, Hawaii, US; ident-scheme: ; ident: ;". No href attribute is needed.

Each subattribute (e.g., "name") would be optional. For example, "flourished" would likely be used only when birth and death years are unknown.

For the subattribute birthplace, if a subvalue is supplied, a nation would be required. The nation of the birthplace would be represented by one of the same codes used for nationality.

For the subattribute ident-scheme, a list of schema could be developed later, perhaps each to be prefixed by a code for the scheme's nation and a hyphen. Schemes could include privately-owned but widely available databases of moderately-well-known people. Subvalues for ident-scheme and ident must not be entered until a list of schema and the style of ident values for a scheme is centralized and then the scheme must be in that list and ident's subvalue must conform to the specified style.

If only whitespace or a null is between the colon and the semicolon, that is equivalent to the subattribute not appearing.

A final semicolon before the closing quote mark is optional and may be imputed.

More subattributes might be added in the future, so page authors must not invent new ones in the meantime.

No subvalue (e.g., "1723") could contain a colon or a semicolon. If one is needed or wanted, a character entity must represent the colon or the semicolon.

The nationality and the birthplace would include a nation using standard two-letter codes. For nations that no longer exist and do not have two-letter codes, e.g., Roman Empire and Van Lang, longer codes must be used, since about 200 2-letter codes are already in use and only 676 exist, and longer codes would prevent future conflict or exhaustion. A list of deceased nations and their longer codes would have to be established, possibly based on a standard gazetteer.

The rel value of "canonical-human" avoids the legal meaning of _person_ in the U.S., and probably in other nations that rely on U.K. common law traditions, where it includes corporations and other legally-recognized entities. A value of "canonical-individual" may be too confusing if misunderstood as being about, say, pages and not people at all.

This is already in the RelExtensions wiki, albeit without details.

No rev value would be meaningful.

Multiple link elements with this rel value would be permitted, and UAs should apply all of them. That permits multiple names (e.g., spellings), ident-schemes, and idents to identify the person more certainly.

Several other technologies fall short for the purpose:

--- Rel="me" is not adequate. The "me" points to a canonical page, not a person. And which page is canonical for Attila the Hun might not be subject to general agreement.

--- The Google Social Graph API is limited to URLs and name (http://code.google.com/apis/socialgraph/docs/attributes.html) (and I'm unclear how you use the API for HTML markup). The person in whom you're interested has to have URLs you consider authoritative. They may not exist. To use it to describe specific data about a person other than URLs requires believing any URLs you cite are stable. You'd often have to limit the URLs to those you control. That makes otherme not very useful for many famous and semifamous people, including those in history. Many Web pages are about historical figures and many more are about modern people who are likely to be significant in history, like heads of state.

--- FOAF is for XML and therefore is compatible with XHTML, but is a bit more complicated to use with HTML, because some of its requirements don't apply to elsewhere in HTML. FOAF has many good features but, of 8 I proposed here, it lacks 6: death date, when flourished, nationality, birth place, and a way to refer to authoritative sources if they're not openly online (e.g., subscription databases and Who's Who books) (http://xmlns.com/foaf/spec/). In addition, despite having read probably dozens of books on Web matters (among hundreds on computers generally), I didn't recall FOAF. It deserves publicity, but HTML already has that and already has a mechanism to do what I'm proposing, a mechanism described in books on the language.

--- hCard and the closely-related RDFa grammar Google supports are too limited, because they don't have enough fields available. Parsers are to ignore anything not understood. A proposal for a date-of-death field is pending for hCard, but not for other fields, and accepting the one proposal may require abandoning the nearly 1:1 relationship with the vCard RFC. Multiple birth dates are required when we know, say, a person was born October 16 but not whether that was in 1919 or 1918, often the case with entertainers and older women, but hCard limits to a single date of birth or requires more vagueness than the known facts may justify. Birth and death dates may come from different calendars for people whose lives straddle a calendar change (one occurred about two and a half centuries ago in the U.S.) and hCard doesn't accommodate those changes. While fn is flexible enough, n isn't for some name methods found internationally and n is impliable from fn, the inflexibility creating erroneous results not attributable to the content author. An ident-scheme might refer to a large collection of biographies that may be in book form or in an access-limited website and thus not have a URL or a full URL, and hCard doesn't offer compatible properties.

Example:

--- Let's say Prof. X writes about Attila the Hun. So does Prof. Y. The two professors don't trust each other, but they agree on their subject and when he flourished. They don't want to link to each other's pages because they don't want to trust their rivals' work or stability. At the same time, search engines' content analyses are more geared to popular writing. One way scholarly writing may differ is by using key terms less often per thousand words of total text, because it's presumed readers already know what they're reading about, and that lowers ranking, which may increase the spread between their papers, making finding same-subject people results harder for searchers. And requiring search engines to analyze free-form text like "he was brought by the stork as the most beautiful baby you ever saw on April 16, 1963" to extract an identifying birthdate is too much to ask of an algorithm, so any technology we use for this general purpose is likely to need hand-coding, making the page author's time a factor.

 --- Solution: If both professors place rel canonical-human "Attila the Hun" and what happen to be the same dates for flourishing or birth and death in their pages, once per page head, search engines can recognize that Prof. X and Prof. Y are almost certainly talking about the same person. The certainty will go up when using standard biographical identifiers. This becomes even more important when the name in question is coincidentally shared by multiple people, say, a Panamanian judge and an Indian moviemaker, and searchers aren't sure which nationality or occupation makes their subject important. The searchers want the search engines to separate the results by subject person. And the rel being essentially a line or two saves authoring time.

This proposal is for describing the human. Multiple pages across many websites that have the same link element, because they contain the same personally identifying information, can be associated as representing the same person. That helps search engines.

An enhancement request for "canonical-organization" is separate.

Comment 11 Maciej Stachowiak 2010-03-28 20:22:11 UTC

(In reply to comment #10)
> I'm not sure what needs to be made more formal for a spec. The need to find
> content on one person across many websites when many people may have the same
> name is well known, and, when a canonical page is not agreed upon or doesn't
> exist, is resolvable by each website offering extensive matchable biographical
> data so a search engine can compare and match to distinguish different people
> with the same name so websites about the same person can be found together.
> I've had difficulty getting UA makers to respond to feature requests before
> HTML5 support. We need HTML5 support for some of them to prioritize the
> feature. Thus, I'm requesting escalation.

It's incorrect process to both reopen the bug *and* request escalation. Please
pick one of the following:

1) Reopen bug for fresh consideration by the editor - you will get a full
Editor's Response with rationale and a spec diff link if any spec changes are
made.

2) Escalate to tracker for consideration by the full Working Group - a Change
Proposal will be required.

In case of (1), the TrackerRequest keyword should be removed for now (you will
still be entitled to request escalation once the editor replies again).

In case of (2), the bug should be moved back to VERIFIED - it will remain there
and will not be closed pending a Working Group Decision.

If you do not pick one of these in a couple of days, I will assume option 2.

Comment 12 Maciej Stachowiak 2010-03-28 21:52:04 UTC

Per discussion in other bugs, moving back to VERIFIED.

Comment 13 Toby Inkster 2010-03-29 18:33:22 UTC

(In reply to comment #10)
> --- Rel="me" is not adequate. The "me" points to a canonical page, not a
> person. And which page is canonical for Attila the Hun might not be subject to
> general agreement.

rel=me should be adequate. It is not defined to point to a "canonical page", just to another page about the same person. If your page on Attila the Hun pointed to Wikipedia's page on him using rel=me, and my page did the same, and Wikipedia cited Joe Bloggs' page on Attila using rel=me, then a crawler could easily determine that all the pages in question dealt with the same person.
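
A crawler along those lines could be sketched with a small union-find in Python. This is an illustration only: the URLs are placeholders, the function names are invented, and treating each rel=me link as joining two pages regardless of direction is an assumption made for the sketch.

```python
def find(parent, x):
    """Return the representative of x's group, compressing the path."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_pages(links):
    """Treat each rel=me link as joining two pages into one group."""
    parent = {}
    for src, dst in links:
        root_src, root_dst = find(parent, src), find(parent, dst)
        if root_src != root_dst:
            parent[root_src] = root_dst
    groups = {}
    for page in parent:
        groups.setdefault(find(parent, page), set()).add(page)
    return list(groups.values())


# Placeholder URLs standing in for the pages in the example above.
wiki = "http://wikipedia.example/Attila"
links = [
    ("http://yours.example/attila", wiki),
    ("http://mine.example/attila", wiki),
    (wiki, "http://joebloggs.example/attila"),
    ("http://x.example/someone-else", "http://y.example/someone-else"),
]

for group in group_pages(links):
    print(sorted(group))
```

The four Attila pages end up in one group and the unrelated pair in another, even though the first two pages never link to each other directly.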

> --- FOAF is for XML and therefore is compatible with XHTML, but is a bit more
> complicated to use with HTML, because some of its requirements don't apply to
> elsewhere in HTML. FOAF has many good features but, of 8 I proposed here, it
> lacks 6: death date, when flourished, nationality, birth place, and a way to
> refer to authoritative sources if they're not openly online (e.g., subscription
> databases and Who's Who books) (http://xmlns.com/foaf/spec/). In addition,
> despite having read probably dozens of books on Web matters (among hundreds on
> computers generally), I didn't recall FOAF. It deserves publicity, but HTML
> already has that and already has a mechanism to do what I'm proposing, a
> mechanism described in books on the language.

If you believe FOAF to be an XML-based standard, then you are mistaken: it is RDF based. RDF is an abstract data model that can be serialised in a variety of ways: XML is one such way of course, but it's also possible to use JSON or indeed HTML - the HTML Working Group is working on HTML+RDFa as a method of embedding RDFa in HTML. So it's certainly possible to embed a FOAF description of a person into a webpage.

As for it lacking the properties you propose, FOAF is occasionally revised, so you could bring them up on the FOAF mailing list. The editors of the FOAF spec are generally quite open to adding new features if they're shown to be useful. And FOAF, being RDF-based, is very extensible - any properties or classes that FOAF lacks, you can define yourself. For example, you might define a "historical person vocabulary" and use it in HTML+RDFa like this:

  <div xmlns:foaf="http://xmlns.com/foaf/0.1/"
       xmlns:hp="http://example.com/historical-people#"
       typeof="foaf:Person">
       <h1 property="foaf:name">Asashi T. Fung</h1>
       Born: <span property="foaf:birthday">1723</span>,
       died: <span property="hp:died">1799</span>,
       flourished: <span property="hp:floreat">1740s-1750s</span>.
       Nationality: <span rel="hp:nationality" resource="http://dbpedia.org/resource/France">FR</span>,
       birthplace: <span rel="hp:birthplace" resource="http://dbpedia.org/resource/Honolulu">Honolulu, Hawaii, US</span>.
       <br rev="foaf:primaryTopic" resource="">
  </div>
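
To give a feel for how a consumer might read such markup, here is a minimal sketch using only Python's standard html.parser. It is not a conforming RDFa processor: it ignores prefix resolution, subjects, rev and nesting, and simply collects the text of elements carrying a property attribute and the resource of elements carrying rel. The class name PropertyCollector is hypothetical, and the markup is a shortened copy of the example above.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect property/rel values from flat RDFa-style markup.

    Assumption: elements with a property attribute are not nested
    inside each other, as in the example markup.
    """

    def __init__(self):
        super().__init__()
        self.values = {}
        self._current = None  # property name being read, if any
        self._tag = None      # tag that opened the current property

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes:
            self._current = attributes["property"]
            self._tag = tag
            self.values[self._current] = ""
        if "rel" in attributes and "resource" in attributes:
            self.values[attributes["rel"]] = attributes["resource"]

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] += data

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._current = None
            self._tag = None


markup = """
<div typeof="foaf:Person">
  <h1 property="foaf:name">Asashi T. Fung</h1>
  Born: <span property="foaf:birthday">1723</span>,
  died: <span property="hp:died">1799</span>.
  Nationality: <span rel="hp:nationality"
    resource="http://dbpedia.org/resource/France">FR</span>
</div>
"""

collector = PropertyCollector()
collector.feed(markup)
values = collector.values
```

A search engine could then compare fields such as values["foaf:name"] and values["hp:died"] across pages to help tell apart people who share a name.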

Comment 14 Nick Levinson 2010-04-11 20:30:19 UTC

Thanks, Maciej; option 2 is what I intended.

Closely related: bug 7682, which is about organizations but otherwise similar.

FOAF 0.91 was for use with XML but 0.97 appears at a glance to go more easily with HTML. When will an HTML FOAF be available, do you think?

FOAF's vocabulary has to be agreed on or search engines would have difficulty matching different vocabularies meant by different authors for similar purposes. If the FOAF drafters are willing, then it's a good idea.

Wikipedia biographies are limited to notable people, a problem for specialists researching more obscure personages, e.g., genealogy sites, topical histories, and blogs reporting local goings-on where several blogs compete and cover the same people. So a more universal system is needed, one that lets a search engine match pages about the same person without relying on a central repository of people as a reference.

If FOAF will support that in a more-or-less timely fashion, with an agreed long-enough vocabulary and HTML5 compatibility, so no one else's site need be trusted about a subject person, then that may be a good solution.

Comment 15 Maciej Stachowiak 2010-04-11 21:07:35 UTC

(In reply to comment #14)
> Thanks, Maciej; option 2 is what I intended.
> 
> Closely related: bug 7682, which is about organizations but otherwise similar.
> 
> FOAF 0.91 was for use with XML but 0.97 appears at a glance to go more easily
> with HTML. When will an HTML FOAF be available, do you think?
> 

I can't speak for the FOAF authors, but I believe FOAF could be embedded in HTML using HTML+RDFa. Or perhaps someone will try to define a Microdata serialization of FOAF.

Comment 16 Toby Inkster 2010-04-11 21:40:32 UTC

(In reply to comment #14)
> FOAF 0.91 was for use with XML but 0.97 appears at a glance to go more easily
> with HTML. When will an HTML FOAF be available, do you think?

As I indicated above, no version of FOAF is XML-based - all versions are RDF-based. RDF is an abstract data model which can be represented in XML, JSON and various other formats and can be embedded in HTML. A W3C Recommendation for embedding RDF (including FOAF) in XHTML already exists today, and is in the process of being "backported" to HTML by the HTML Working Group. It's called RDFa <http://www.w3.org/TR/rdfa-syntax/>.

So FOAF already works in HTML. And it's already being used - http://sw-app.org/mic.xhtml, http://www.ivan-herman.net/foaf.html, http://tobyinkster.co.uk/, etc.

Comment 17 Maciej Stachowiak 2010-05-12 03:42:02 UTC

http://www.w3.org/html/wg/tracker/issues/115