This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6609 - negative keywords-not meta tags
Summary: negative keywords-not meta tags
Status: VERIFIED NEEDSINFO
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords: TrackerIssue
Depends on:
Blocks:
 
Reported: 2009-02-22 08:13 UTC by Nick Levinson
Modified: 2010-10-04 14:28 UTC
CC List: 4 users

Description Nick Levinson 2009-02-22 08:13:51 UTC
Rumor to the contrary notwithstanding, keyword meta elements do work, albeit within limits. I did a test and also found confirmatory recent discussion online about major search engines.

Insofar as they work, what's needed is a way to clarify relevance to one theme by distinguishing it from another. Negative keywords would thus be helpful. For example, a page about "virus" could be about computer viruses or biological viruses but usually won't be about both. While major search engines may be intelligent enough to distinguish in that well-known case, new subjects may not be well known to search engine managers, and thus an author may prefer to control how their theme is understood from the date of going live. A negative keyword could quickly clarify the theme of the page.

Using body text may not be adequate. Consider a doctor writing a carefully exhaustive article about aspirin's less-well-known uses and thus without discussing headaches, since almost everyone already knows about that use. Being careful, the doctor writes in the introduction that "the article will not discuss headaches." Someone does a search for "aspirin NOT headache". They should get that paper but they do not. A negative metatag may aid a search engine in understanding the doctor's thematic intention and thus in supplying what a searcher is seeking. Search engine designers would have to do some careful work to handle the aspirin case as intended but they could do that far more easily if we page authors have an HTML facility that would give search engines something to work with.
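
To make the aspirin case concrete, here is a minimal Python sketch of one way a search engine might weigh keywords-not against body text when answering "aspirin NOT headache". The function name `page_matches` and the rule that a keywords-not entry outranks a passing mention in the body are assumptions made for illustration only; no search engine is known to work this way.

```python
def page_matches(include, exclude, body_terms, keywords, keywords_not):
    """Decide whether a page satisfies an 'include NOT exclude' query.

    Hypothetical rule: a term listed in keywords-not is treated as
    "this page is not about it", even if the body mentions the term.
    """
    positive = set(body_terms) | set(keywords)
    # Every included term must appear in the body text or positive keywords.
    if not set(include) <= positive:
        return False
    for term in exclude:
        if term in keywords_not:
            # The author has declared the page is not about this term.
            continue
        if term in positive:
            # The term appears with no negative declaration, so reject.
            return False
    return True


# The doctor's article mentions "headache" once, in the introduction.
body = ["aspirin", "heart", "blood", "headache"]
keywords = ["aspirin", "heart", "blood"]

without_negative = page_matches(["aspirin"], ["headache"], body, keywords, [])
with_negative = page_matches(["aspirin"], ["headache"], body, keywords, ["headache"])
print(without_negative)  # False: the passing mention excludes the page
print(with_negative)     # True: keywords-not clarifies the author's intent
```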

Keyword metatags lost favor long ago after widespread abuse. However, search engines do use them, and I don't see how negative keywords are any more susceptible to abuse than positive ones. Further, a page author could use either positive or negative keywords without having to offer both, so there'd be no unwanted increase in the designer's workload. Optimizers could use essentially the same tools to generate either kind of keyword. The only risk, I think, is putting a word in both, but that would only be an author's error. Each search engine could prepare for that eventuality any way it sees fit, and editing software and validators could choose to alert an author to the apparent conflict without requiring the author to change an element. If a page author uses the same word in both but with differing case, because one represents a common product and the other a brand name, the author would take the risk of being misunderstood by a search engine, while a search engine might observe the case distinction and consider how to handle it. The page author could also use longer phrases, either positively or negatively, to make themes easier to distinguish.
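
The conflict check that an editor or validator could choose to run might look roughly like the following Python sketch. The helper names `parse_keywords` and `find_conflicts`, and the sample brand name "Acme", are hypothetical; the sketch only warns and never requires the author to change anything.

```python
def parse_keywords(content):
    """Split a comma-separated content value into trimmed, non-empty terms."""
    return [term.strip() for term in content.split(",") if term.strip()]


def find_conflicts(positive, negative):
    """Return (exact, case_only) conflicts between the two keyword lists.

    exact:     terms that appear identically in both lists
    case_only: (positive, negative) pairs that differ only in letter case,
               which may be deliberate (common product vs. brand name)
    """
    negative_exact = set(negative)
    negative_folded = {}
    for term in negative:
        negative_folded.setdefault(term.casefold(), term)

    exact, case_only = [], []
    for term in positive:
        if term in negative_exact:
            exact.append(term)
        elif term.casefold() in negative_folded:
            case_only.append((term, negative_folded[term.casefold()]))
    return exact, case_only


positive = parse_keywords("aspirin, heart, blood, Acme")
negative = parse_keywords("headache, blood, acme")
exact, case_only = find_conflicts(positive, negative)
print(exact)      # ['blood']
print(case_only)  # [('Acme', 'acme')]
```

Reporting case-only matches separately leaves room for the brand-name situation described above: a tool could flag them more gently than exact duplicates.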

Because of the relevance of Boolean NOT searches and for relative brevity and to avoid an abbreviation that may not be familiar to speakers of other languages, I propose calling it "keywords-not". I'm shortly proposing it in the Wiki at http://wiki.whatwg.org/wiki/MetaExtensions. The synonyms I'll list there do not relate to legacy content, of which I know none, but are what people would likely think of. I'm preparing to include keywords-not in a website I'm designing, but I don't know when the site will go live. My method will probably be to use a separate meta tag following the metatag for keywords used positively, since they can't be combined into one element, but I see no reason to require any position other than that both go into the head, as one tag already must. E.g.,

<head>
. . . . .
<meta name="keywords" content="aspirin,heart,blood" />
<meta name="keywords-not" content="headache" />
. . . . .
</head>
<body>
<h1>Aspirin Except For Headaches</h1>
<p>. . . .</p>
</body>
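
As a rough sketch of how a consumer (a crawler, validator, or editing tool) could read both elements from such a page, the following uses Python's standard `html.parser` module. The class name `KeywordMetaParser` is made up for this illustration, and treating the content value as a comma-separated list simply mirrors the example above.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collect terms from <meta name="keywords"> and <meta name="keywords-not">."""

    def __init__(self):
        super().__init__()
        self.keywords = {"keywords": [], "keywords-not": []}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.keywords:
            content = attributes.get("content") or ""
            self.keywords[name].extend(
                term.strip() for term in content.split(",") if term.strip()
            )


PAGE = """<head>
<meta name="keywords" content="aspirin,heart,blood" />
<meta name="keywords-not" content="headache" />
</head>
<body>
<h1>Aspirin Except For Headaches</h1>
</body>"""

parser = KeywordMetaParser()
parser.feed(PAGE)
parser.close()
print(parser.keywords["keywords"])      # ['aspirin', 'heart', 'blood']
print(parser.keywords["keywords-not"])  # ['headache']
```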

This responds to <http://www.w3.org/TR/html5/single-page/>, Working Draft, 12 February 2009. For Bugzilla, I selected all OSes; I develop on Win95a and 98SE and Linux and want pages to work on whatever users use.

Thank you.

-- 
Nick
Comment 1 Nick Levinson 2009-02-25 04:24:20 UTC
A couple more aspects:

Antonyms are usually a waste of time in this area, so keywords-not need not be invoked just to supply an antonym. Rather, it is for cases where the same word carries very different meanings, such as _virus_, or even opposite meanings, such as _sanction_. Thus, keywords-not would be written infrequently, although the sheer scale of the Web and of HTML usage means it would still be used enough to warrant recognition in a standard and adaptation by search engines.

Search engines give more weight to thematic words written directly into page content. However, some thematic words may be difficult for authors to work into text without going to some length to explain important complications, and that might make the whole page too cumbersome, losing readers. If the main text is to be short, leaving those secondary keywords out may be smarter writing of content. This is often true when stating principles, which may be more easily understood if stated in just a few words, leaving redundant particulars out. But searchers may still use various common particulars to find this principle via search engines. To support search, the keywords that represent the particulars and are not in the visible text should be put into meta tags. Some would go into a meta element named keywords. But, for some of them, keywords-not may be the more relevant name. And that would keep the positive keywords metatag from getting enormously long.

-- 
Nick
Comment 2 Ian 'Hixie' Hickson 2009-06-28 10:21:41 UTC
This is an interesting idea. Are there any search engine vendors who have tried implementing this experimentally? That would be the first step towards adding this to the specification.

http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_the_spec.3F
Comment 3 Maciej Stachowiak 2010-03-14 13:16:57 UTC
This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.
Comment 4 Nick Levinson 2010-03-28 19:19:36 UTC
I've had difficulty getting UA makers to respond to feature requests before HTML5 support. We need HTML5 support for some of them to prioritize the feature. Thus, I'm requesting escalation.

Suggested title: negative keywords-not meta tags

Suggested text:

Rumor to the contrary notwithstanding, keyword meta elements do work, albeit within limits. I did a test and also found confirmatory recent discussion online about major search engines.

Insofar as they work, what's needed is a way to clarify relevance to one theme by distinguishing it from another. Negative keywords would thus be helpful. For example, a page about "virus" could be about computer viruses or biological viruses but usually won't be about both. While major search engines may be intelligent enough to distinguish in that well-known case, new subjects may not be well known to search engine managers, and thus an author may prefer to control how their theme is understood from the date of going live. A negative keyword could quickly clarify the theme of the page.

Using body text may not be adequate. Consider a doctor writing a carefully exhaustive article about aspirin's less-well-known uses and thus without discussing headaches, since almost everyone already knows about that use. Being careful, the doctor writes in the introduction that "the article will not discuss headaches." Someone does a search for "aspirin NOT headache". They should get that paper but they do not. A negative metatag may aid a search engine in understanding the doctor's thematic intention and thus in supplying what a searcher is seeking. Search engine designers would have to do some careful work to handle the aspirin case as intended but they could do that far more easily if we page authors have an HTML facility that would give search engines something to work with.

Antonyms are usually a waste of time in this area, so keywords-not need not be invoked just to supply an antonym. Rather, it is for cases where the same word carries very different meanings, such as _virus_, or even opposite meanings, such as _sanction_. Thus, keywords-not would be written infrequently, although the sheer scale of the Web and of HTML usage means it would still be used enough to warrant recognition in a standard and adaptation by search engines.

Search engines give more weight to thematic words written directly into page content. However, some thematic words may be difficult for authors to work into text without going to some length to explain important complications, and that might make the whole page too cumbersome, losing readers. If the main text is to be short, leaving those secondary keywords out may be smarter writing of content. This is often true when stating principles, which may be more easily understood if stated in just a few words, leaving redundant particulars out. But searchers may still use various common particulars to find this principle via search engines. To support search, the keywords that represent the particulars and are not in the visible text should be put into meta tags. Some would go into a meta element named keywords. But, for some of them, keywords-not may be the more relevant name. And that would keep the positive keywords metatag from getting enormously long.

Keyword metatags lost favor long ago after widespread abuse. However, search engines do use them, and I don't see how negative keywords are any more susceptible to abuse than positive ones. Further, a page author could use either positive or negative keywords without having to offer both, so there'd be no unwanted increase in the designer's workload. Optimizers could use essentially the same tools to generate either kind of keyword. The only risk, I think, is putting a word in both, but that would only be an author's error. Each search engine could prepare for that eventuality any way it sees fit, and editing software and validators could choose to alert an author to the apparent conflict without requiring the author to change an element. If a page author uses the same word in both but with differing case, because one represents a common product and the other a brand name, the author would take the risk of being misunderstood by a search engine, while a search engine might observe the case distinction and consider how to handle it. The page author could also use longer phrases, either positively or negatively, to make themes easier to distinguish.

Because of the relevance of Boolean NOT searches and for relative brevity and to avoid an abbreviation that may not be familiar to speakers of other languages, I propose calling it "keywords-not". I'm preparing to include keywords-not in a website I'm designing, but I don't know when the site will go live. My method will probably be to use a separate meta tag following the metatag for keywords used positively, since they can't be combined into one element, but I see no reason to require any position other than that both go into the head, as one tag already must. E.g.,

<head>
. . . . .
<meta name="keywords" content="aspirin,heart,blood" />
<meta name="keywords-not" content="headache" />
. . . . .
</head>
<body>
<h1>Aspirin Except For Headaches</h1>
<p>. . . .</p>
</body>
Comment 5 Maciej Stachowiak 2010-03-28 19:47:15 UTC
(In reply to comment #4)
> I've had difficulty getting UA makers to respond to feature requests before
> HTML5 support. We need HTML5 support for some of them to prioritize the
> feature. Thus, I'm requesting escalation.

It's incorrect process to both reopen the bug *and* request escalation. Please pick one of the following:

1) Reopen bug for fresh consideration by the editor - you will get a full Editor's Response with rationale and a spec diff link if any spec changes are made.

2) Escalate to tracker for consideration by the full Working Group - a Change Proposal will be required.

In case of (1), the TrackerRequest keyword should be removed for now (you will still be entitled to request escalation once the editor replies again).

In case of (2), the bug should be moved back to VERIFIED - it will remain there and will not be closed pending a Working Group Decision.

If you do not pick one of these in a couple of days, I will assume option 2.
Comment 6 Nick Levinson 2010-03-28 21:30:13 UTC
Option 2 is what I intended. I don't see a Verified option on my login, so I can't make the change. Thank you for offering.
Comment 7 Maciej Stachowiak 2010-03-28 21:40:51 UTC
Moving back to VERIFIED state
Comment 8 Nick Levinson 2010-04-11 17:26:05 UTC
Asking search engine firms for improvements doesn't get answers, in my experience. Waiting for them to design for a feature that isn't remotely close to being already widespread and isn't in any standard is a chicken-and-egg problem. HTML5 or the WHATWG MetaExtensions wiki should finalize keywords-not, since its inclusion adds no burden for any page author who doesn't add the element and adding it is simple.

Thank you.
Comment 9 Maciej Stachowiak 2010-05-12 03:41:07 UTC
http://www.w3.org/html/wg/tracker/issues/112