6610 – add a preventable forced-fragment method

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: add a preventable forced-fragment method

Status:	VERIFIED NEEDSINFO

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	pre-LC1 HTML5 spec (editor: Ian Hickson)
Version:	unspecified
Hardware:	All All

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:
Keywords:	TrackerIssue

Depends on:
Blocks:

Reported:	2009-02-22 08:34 UTC by Nick Levinson
Modified:	2010-10-04 14:48 UTC
CC List:	5 users

See Also:

Description Nick Levinson 2009-02-22 08:34:05 UTC

Could you please add an ability for a user to go to a fragment even though the fragment has not been identified by the website owner? That way, I, as an ordinary user, could tell someone about an interesting item that's down a long page or buried in text, by sending a link that would take the user directly to the destination I intend.

Versions of this idea were discussed at http://esw.w3.org/topic/HTML/DocFragPointer and as bug 5744, which was closed as WONTFIX largely for lack of evidence of need: there was no evidence that workarounds are in common use or that browser makers' customer research reveals such a need.

The need to identify a locus within a document is more likely in academic papers and on Chinese sites, where longer pages are the norm. Scholarly citation guides sometimes point out the problem of citing pages in WWW documents given inconsistencies in printout paginations. PDFs are a partial solution since even documents without page numbering benefit from the PDF reader's own paginations. This is also relevant to forum threads that run at length; some fora support linking to a single post but not all do. Writing in longhand how to go to a fragment without an author's support is feasible but the longhand is offputting. People would rather have a link to the exact destination, and even a long URL is more convenient than a narrative description because it doesn't have to be remembered after one has left the citing document, especially among people who are used to a link being all they need to find what they want.

I suggest a double hash as the URL forced-fragment identifier, to be followed by a string from phrasing content or visible text. Even if no one has adopted or defined the double hash as a URL element, I recommend that HTML 5 recognize the double hash for the purpose in a draft. If the concept appears acceptable, that may be enough to justify asking IETF or another appropriate body concerned with URLs not to define the double hash in a conflicting way but to reserve it for this purpose. I don't think there's any conflict extant.

Thus, I might send a URL such as <http://www.example.com/directory/page.html##Further%20Reflection>. If "Further Reflection" appears in one place on the page as it would be interpreted by a standards-compliant browser, using that URL would take the visitor there, possibly saving some scrolling or any need to use a Find function. If "Further Reflection" appears twice on the page, the URL could take the visitor to the first occurrence. If "Further Reflection" doesn't appear on the page, the URL would take the visitor to the top of the page, as if the URL were <http://www.example.com/directory/page.html>. Alt text for an image would be treated as a destination only after text that would ordinarily appear in a browser, i.e., a visitor would be taken to the image only if "Further Reflection" appears nowhere but in alt texts. That accommodates page designs in which illustrations appear above or before section headlines and body texts.
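
The lookup order described above (visible text first, then alt text, otherwise the top of the page) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the function names, the tuple return values, and the idea of passing the page's visible text and alt texts as plain strings are all hypothetical, and a real user agent would work on the rendered document instead.

```python
from urllib.parse import unquote


def split_forced_fragment(url):
    """Split a URL at its first '##' into (base_url, decoded_string or None)."""
    base, separator, raw = url.partition("##")
    if not separator or not raw:
        return base, None
    # The string after '##' is percent-encoded, e.g. 'Further%20Reflection'.
    return base, unquote(raw)


def resolve_target(url, visible_text, alt_texts):
    """Return ('text', offset), ('alt', index), or ('top', 0).

    visible_text and alt_texts are hypothetical stand-ins for what a
    browser would extract from the rendered page.
    """
    _, needle = split_forced_fragment(url)
    if needle is None:
        return ("top", 0)
    # First occurrence in ordinarily visible text wins.
    offset = visible_text.find(needle)
    if offset != -1:
        return ("text", offset)
    # Alt texts are consulted only when the visible text has no match.
    for index, alt in enumerate(alt_texts):
        if needle in alt:
            return ("alt", index)
    # No match anywhere: behave as if no forced fragment had been given.
    return ("top", 0)
```

With the example URL above, a page whose visible text contains "Further Reflection" twice resolves to the first occurrence, and a page that has the phrase only in an image's alt text resolves to that image.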

Two or three problems could often arise. One is a user's difficulty in forming the linking URL, especially with percent-encoding. That could be left to third-party tools, although someone would have to develop such a tool. In time, a browser maker or extension maker could add a menubar or contextual menu command by which a selection in the window would be turned into a link with a forced fragment unless forbidden by the page author. That integration into the browser would facilitate automatically checking source code for anything needed to make the string successfully match later, such as finding an invisible character and stripping markup.

A related problem is failing to correctly quote a string for the URL's forced-fragment string, such as when a page uses a nonbreaking space the quoter doesn't notice. A comparable problem arises in quoting from PDFs that were made by scanning text as images or that have odd spacing and word-breaking for page layout purposes. As with PDFs, selecting and highlighting text will solve the problem often enough.
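
One way a tool might reduce the nonbreaking-space problem is to normalize both the quoted string and the page text before comparing them. The sketch below is an assumption about how that could be done, not something the proposal specifies: the helper names and the choice of which invisible characters to strip are hypothetical.

```python
import unicodedata

# Characters treated as invisible for matching purposes (an assumption):
# soft hyphen and zero-width space.
INVISIBLE = {"\u00ad", "\u200b"}


def normalize_for_matching(text):
    """Fold compatibility characters, drop invisibles, collapse whitespace."""
    # NFKC maps a nonbreaking space (U+00A0) to an ordinary space.
    folded = unicodedata.normalize("NFKC", text)
    visible = "".join(ch for ch in folded if ch not in INVISIBLE)
    # Collapse runs of whitespace, including line breaks, to single spaces.
    return " ".join(visible.split())


def quoted_string_matches(quoted, page_text):
    """True if the quoted string occurs in the page after normalization."""
    return normalize_for_matching(quoted) in normalize_for_matching(page_text)
```

Under this sketch, a quoter who types an ordinary space would still match a page that uses a nonbreaking space between the two words.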

The third problem is quoting a string that actually appears as artwork, as when a page author wanted platform-independent font choice, but in that case alt text could fulfill the user's need, so a forced-fragment URL would be feasible even there.

User agent designers should be advised that the browser should automatically scroll so that the top of the fragment is at the top of the browser's drawing area. That would be contrary to the behavior of many browsers, which present a found string at the bottom of the drawing area. That behavior may be retained for the Find function if the browser designer so wishes, but shouldn't be applied to fragment-finding.

As a complement, the ability to block going to an unapproved fragment should be provided, so that website owners who, for example, want to ensure that every user sees the top of a page can have it shown in the user's browser. In that case, the string beginning with ## (or other forced-fragment identifier) would be ignored, while being preserved for possible use for history or as a referer if a user clicks on a link on the destination page. To that end, if the general idea of forcing fragment identification is accepted, I would propose as an optional preventer for site owners a meta element using a new keyword, e.g., <meta name="allow-forcing-fragments" content="false" /> (the only other value being "true", trivial since the tag could be eliminated for true). I can add this to the appropriate Wiki for meta tag proposals.
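
A user agent or extension could check for the proposed opt-out before honoring a forced fragment. The following is a minimal sketch using Python's standard html.parser module; the meta name "allow-forcing-fragments" comes from the proposal above, while the class and function names are hypothetical, and treating a missing or unrecognized value as "true" is an assumption drawn from the remark that the tag could simply be omitted for true.

```python
from html.parser import HTMLParser


class ForcingPolicyParser(HTMLParser):
    """Records whether the page opts out of forced fragments."""

    def __init__(self):
        super().__init__()
        self.allowed = True  # default when no meta element is present

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip().lower()
        if name == "allow-forcing-fragments" and content == "false":
            self.allowed = False


def forcing_allowed(html_source):
    """Return False only if the page carries the proposed opt-out meta element."""
    parser = ForcingPolicyParser()
    parser.feed(html_source)
    parser.close()
    return parser.allowed
```

When forcing_allowed returns False, the user agent would ignore the string after ## and show the top of the page, while leaving the URL itself untouched for history and referer purposes.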

This responds to <http://www.w3.org/TR/html5/single-page/>, Working Draft, 12 February 2009. For Bugzilla, I selected all OSes; I develop on Win95a and 98SE and Linux and want pages to work on whatever users use.

Thank you.

--
Nick

Comment 1 Ian 'Hixie' Hickson 2009-02-22 09:05:49 UTC

Does XPointer address your needs?

http://www.w3.org/TR/xptr-framework/
http://www.w3.org/TR/xptr-xpointer/

Comment 2 Nick Levinson 2009-02-24 04:31:32 UTC

On first impression only, it looks to me like XPointer technology will work, if it's true a user can use XPointer to identify fragments that an author did not distinguish, but apparently XPointer is only for XML-related media types, which probably excludes most Web content. With HTML's prevalence, document authors generally strive for compatibility only with HTML and implicitly XHTML and may not take the time to divide a document into many fragments and identify them. Thus, in practice, it seems XPointer will usually be irrelevant. HTML can provide a rudimentary facility so that when an author does not signify fragments a reader can do so, even on old unfragmented documents, even without XML compatibility, and then the reader can tell a prospective reader how to get to the good part without any cooperation from the author.

We often don't notice the problem because of the American practice of having many short pages rather than a few long pages and because popular fast-pacing means many people give a page only a minute or two to reveal the desired information before the seeker goes elsewhere. Someone who leaves usually does not thereafter ask someone else where that missing information was. They'll just say the info wasn't there. So they don't find the information and the person who told them it's there is probably judged to be less reliable, because they promised information that evidently wasn't there.

An HTML facility needn't be as well-controlled as XPointer's. But there should be something.

Thanks, including for the links.

--
Nick

Comment 3 Erik Wilde 2009-02-28 02:26:57 UTC

http://dret.typepad.com/dretblog/2008/05/xhtml-fragment.html has some discussion about the general question of fragment identifiers for HTML5. it had been discussed on the HTML5 mailing list and was shot down because it is a classical "chicken and egg" problem, and therefore it is hard to justify doing this other than by saying "this will improve the web as a hypermedia and information system, is not extremely hard to do, can be done with perfect backwards compatibility, and therefore we do it."

my personal recollection of the discussion is that the whole argument of "improving HTML as a language for hypermedia and information representation" is not in the focus of the HTML5 effort. it is mostly about improving HTML as a language for building interfaces. personally, i think this is unfortunate and misses some low-hanging fruit, but the feedback on the mailing list was not too great, and apparently most people don't really care that much about this particular problem.

more specifically, the "##" approach looks problematic to me. this would require an update of the generic URI syntax, a fairly central document on the web. the approach i have proposed would be to improve fragment identifications in HTML5, and these could be pointers to elements, or could have "search semantics". deciding on the possible fragment identification mechanisms would be the second step after deciding that HTML5 should do something in that area.

Comment 4 Nick Levinson 2009-02-28 04:44:19 UTC

What if Google's and Yahoo's search results routinely included forced-fragment identifiers in most of their results? Their results often show phrasing that surrounds the string I wanted. But when I click on a result, I often have to use my browser's Find function to find what the search engine already found. If Yahoo or Google generated forced-fragment identifiers with their URLs, I'd save a step, and Google or Yahoo would be more relevant right away. An HTML technology that doesn't depend on content or page authors to cooperate would allow Yahoo and Google to generate forced-fragment identifiers for most of their results at little additional cost.

Some pages have many links embedded in the content (ignoring navbars, headers, etc.). Some have few. It's a lot of work to add lots of tags. Some high-quality content has few or no tags and will probably always have few or no tags. If we insist that only authors can link from their content, HTML 4/5 suffices, but we who read would like to be heard about someone else's work. Depending on elements works for people who understand invisible markup, which is not most people, or would require a browser or extension that would interpret a user's intention by translating into element identification (such as <h4>), and that seems complicated and very prone to the chicken-and-egg problem. Forced-fragment identifiers, I think, are easier to implement.

In that context, the chicken-and-egg problem is not with page or content authors. It's with a browser or extension designer, of whom only one is needed, which makes it easier, and the standard being in HTML5 would be a reason for someone to do that, solving the problem. Google or Yahoo wanting the facility would speed embracing the concept across the industry.

I understand hesitating to change any well-established standard format, including that for URLs. But I'm not sure that it's conceptually different than changing a computer language so that comments formerly never parsed by a compiler or interpreter are sometimes so parsed once a comment can possibly include an executable script, even though the script-announcing string could possibly have been used in a non-scripting comment earlier when it was legal. (Apple used to reserve some commands for future use. Woe befell the programmer who preempted one early. Perhaps that concept should be more widely considered in standards-setting, as it sometimes is now.) If a double-hash might be confusing, perhaps some other string could be proposed, although I think problems will be least if that string were to begin with a hash.

Avoiding proprietary ownership of a functionality means avoiding a solution that only one browser would understand, as that would force both a visitor and a prospective visitor to have the same browser, and that would be practical mainly if IE were that browser, unlikely if MS doesn't think there's a market large enough to justify the financial liability inherent in introducing any new feature. It would be better for Firefox or Opera to offer the URL-making feature, if then IE or almost any browser could apply it to any canvas, and then IE and others would be likelier to add URL-making.

Only two extensions are needed. One extension would make URLs by adding a forced-fragment identifier to the URL already in the address bar, deleting any existing non-forced fragment identifier. A menu command would be a good vehicle to that end. If text was preselected on the canvas, good; if not, selecting the command could bring up an alert requesting selection first, with the side effect of keeping the command almost never dimmed, thus in front of people's eyes. If the preselected text was very long, only its beginning would be needed, just enough of the beginning to ensure uniqueness on the page. The user could then edit the URL in the address bar to add more of the selection in anticipation of possible page edits, if desired.
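
The URL-making step just described can be sketched as follows. This is a rough illustration rather than a specification: the function names and the minimum prefix length of 8 characters are hypothetical, and the page text is assumed to be available as a plain string.

```python
from urllib.parse import quote, urldefrag


def occurs_once(needle, page_text):
    """True if needle appears exactly once in page_text (overlaps included)."""
    first = page_text.find(needle)
    return first != -1 and page_text.find(needle, first + 1) == -1


def shortest_unique_prefix(selection, page_text, minimum=8):
    """Shortest beginning of the selection that is unique on the page.

    Falls back to the whole selection if no prefix is unique.
    """
    start = min(minimum, len(selection))
    for length in range(start, len(selection) + 1):
        prefix = selection[:length]
        if occurs_once(prefix, page_text):
            return prefix
    return selection


def make_forced_fragment_url(current_url, selection, page_text):
    """Drop any existing fragment and append '##' plus the encoded prefix."""
    base, _old_fragment = urldefrag(current_url)
    prefix = shortest_unique_prefix(selection, page_text)
    # safe="" percent-encodes every reserved character, including '/' and '#'.
    return base + "##" + quote(prefix, safe="")
```

A user who wanted the link to survive later page edits could pass a longer selection or raise the minimum, which corresponds to editing the URL in the address bar to add more of the selection.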

The other extension would recognize and extract from the URL in the address bar the forced-fragment identifier, apply the browser's native search function to either the canvas or the source code to find the fragment, and scroll the page to put the found string at the top of the canvas. This seems to combine abilities already in most major browsers, so relatively little new programming would be required.

Dynamic or other page changes would only be a minor problem. If a string can't be found, so be it; the user would get the top of the page. A browser maker who wants to offer an additional bell or whistle could announce via an icon that the fragment could not be found, in addition to giving the user the top of the page, but that announcement would be optional, and the HTML standard wouldn't have to specify it.

I don't think the two purposes (improving interfaces and adding hypermedia abilities) are in contradiction and I don't think this proposal would bloat HTML5, since it would add only a couple of lines to the standard.

I lack the time to really pursue this idea. I have to leave it to others to decide if it's important enough to reopen this and to evaluate the browser extension mentioned by Jeff on your blog page. Meanwhile, your blog entry inspired the search-engine idea. Thanks.

--
Nick

Comment 5 Maciej Stachowiak 2010-03-14 13:16:58 UTC

This bug predates the HTML Working Group Decision Policy.

If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.

Comment 6 Nick Levinson 2010-03-28 19:39:46 UTC

This would help users find more content and find it faster, even when a page author has not created a fragment. I'm requesting escalation.

This bug is already assigned, so I don't know if its status is correct.

Suggested title: add a preventable forced-fragment method

Suggested text:

A URL with a special fragment identifier should take a user to the fragment another user intends even if the page author didn't intend it.

Thus, when a search engine shows me a snippet, I should be able to go directly to that snippet. Finding it is a problem with some long documents, and using the find function on a page isn't always feasible.

A page author should be able to disable the action of all such URLs by ignoring the special fragment identifier and taking the user to the top of the page.

I'm not sure if additional syntax for URLs needs to be defined or if the standard "#" fragment identifier can serve by having HTML5 recognize it even if the page author did not explicitly define an anchor for the fragment.

Comment 7 Maciej Stachowiak 2010-03-28 19:48:46 UTC

(In reply to comment #6)
> This would help users find more content and find it faster, even when a page
> author has not created a fragment. I'm requesting escalation.
> 

It's incorrect process to both reopen the bug *and* request escalation. Please pick one of the following:

1) Reopen bug for fresh consideration by the editor - you will get a full Editor's Response with rationale and a spec diff link if any spec changes are made.

2) Escalate to tracker for consideration by the full Working Group - a Change Proposal will be required.

In case of (1), the TrackerRequest keyword should be removed for now (you will still be entitled to request escalation once the editor replies again).

In case of (2), the bug should be moved back to VERIFIED - it will remain there and will not be closed pending a Working Group Decision.

If you do not pick one of these in a couple of days, I will assume option 2.

Comment 8 Maciej Stachowiak 2010-03-28 21:52:42 UTC

Per discussion in other bug, moving back to NEEDSINFO.

Comment 9 Maciej Stachowiak 2010-04-06 01:10:54 UTC

Is this meant to be in "HTML5: The Markup Language" rather than "HTML5 spec bugs"?

Comment 10 Nick Levinson 2010-04-11 17:41:57 UTC

Thanks, Maciej, for option 2; that's what I intended. The menu option for Spec Proposals seems more appropriate; I originally set the component as just "HTML5: The Markup Language" because it didn't say anything about having an editor other than Hixie, so I thought it was just a fancy name for HTML5 (I don't remember what the choices were back then), and didn't rethink it when it was later renamed with editor Michael(tm) Smith's name (I guess it's a planned manual); so I'm changing the component now. Thanks for that, too.

Comment 11 Maciej Stachowiak 2010-05-12 03:41:25 UTC

http://www.w3.org/html/wg/tracker/issues/113