Hypertext Links in HTML need "offline" support

As offline web activity becomes more popular, I believe that it will
become necessary for web site authors to be able to craft their content
so that it is downloaded in an optimal manner. Speaking as the technical
arm for a number of content creation-oriented web sites that are quite
interested in offline subscribers (e.g. www.tvguide.com,
www.foxnews.com), I believe that there is room for HTML tagging to
improve the offline web experience.

Current offline browsers (e.g. Freeloader, Netscape's Netcaster) take a
naive approach, in that they simply traverse all available links, to a
user-defined depth, caching all retrieved documents.  The result is that
inappropriate content is cached, and that links can unpredictably (from
the user's point of view) trigger attempts to connect over a LAN or
dial-up connection that isn't available.
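
To make the naive approach concrete, here is a minimal Python sketch of a
depth-limited traversal that caches everything it reaches. The SITE
mapping and the naive_crawl name are hypothetical stand-ins for real
network fetches; this is not how any particular offline browser is
implemented.

```python
from collections import deque

# Hypothetical in-memory site: page -> links found on that page.
SITE = {
    "index.html": ["body1.html", "audio.ra"],
    "body1.html": ["foo.moov", "index.html"],
    "audio.ra": [],
    "foo.moov": [],
}


def naive_crawl(start, max_depth, site=SITE):
    """Breadth-first traversal that caches every document within max_depth links."""
    cached = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        cached.append(url)  # every retrieved document is cached, wanted or not
        if depth == max_depth:
            continue  # do not follow links beyond the user-defined depth
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return cached
```

Note that the streaming audio.ra file is cached at depth 1 even though it
only makes sense online, which is the problem described above.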

As an alternative, I believe that there needs to be an attribute of an
anchor that can be used by browsers to optimize the content that they
cache locally.

The robots.txt file mechanism isn't sufficient, in that the content that
is appropriate for robots to scan clearly differs from the content that
is appropriate for humans to browse offline.  In addition, there may be
several levels of content that could be downloaded, while the
robots.txt mechanism is a simple include/exclude mechanism.

A meta-tag in HTML documents could help, but only for HTML documents. 
It's important that users (and thus browsers) can intelligently handle
the offline use of rich media, not just HTML.

I would propose that we add a "download" attribute to the <a> tag, which
would provide guidance to browsers that download web sites in the
background.  In addition, there would be a "download.txt" file, in the
same place as the robots.txt file, that would define what the levels of
the attribute mean.  This would allow browsers to let users select a
site to subscribe to, then select the level of content to download.

For example, imagine a news site that has these levels of offline
operation:

www.foo.com/download.txt:

* This file defines levels of downloading of content
1: current headlines
2: full text of major stories
3: multimedia edition
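
A browser could read such a file with very little code. The Python sketch
below assumes the format shown in the example: one "N: description" entry
per line, with lines starting with "*" treated as comments. The comment
convention and the parse_download_txt name are assumptions drawn from the
example, not part of any existing specification.

```python
def parse_download_txt(text):
    """Return {level: description} parsed from the text of a download.txt file."""
    levels = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or comment (assumed "*" comment marker)
        number, sep, description = line.partition(":")
        if not sep or not number.strip().isdecimal():
            continue  # ignore lines that do not look like "N: description"
        levels[int(number)] = description.strip()
    return levels


EXAMPLE = """\
* This file defines levels of downloading of content
1: current headlines
2: full text of major stories
3: multimedia edition
"""

levels = parse_download_txt(EXAMPLE)
```

The resulting mapping is what a browser would present to the user when
asking which level of content to subscribe to.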

Aside from the above, a value of zero (0) would mean that the content
should never be downloaded for offline use.  This would apply for files
that trigger inherently online activities, such as RealAudio streaming
audio sessions.

Links would be like this:

<a href=foo.moov download=3>a newsreel clip</a>
<a href=body1.html download=2>more information</a>
<a href=audio.ra download=0>streaming audio</a>

Links without any download attribute would be followed using whatever
scheme the offline browsers currently use.
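
The rules above could be applied roughly as follows. This Python sketch
uses the standard html.parser module; the class and function names are
hypothetical. It also assumes that levels are cumulative, so a user who
selects level 3 also receives levels 1 and 2. The proposal does not say
this explicitly, so treat it as one possible reading.

```python
from html.parser import HTMLParser


class DownloadLinkParser(HTMLParser):
    """Collect (href, download_level) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        raw = attributes.get("download")
        # A missing or non-numeric attribute is recorded as None.
        level = int(raw) if raw is not None and raw.isdecimal() else None
        self.links.append((href, level))


def should_download(level, selected_level):
    """True/False for tagged links; None means 'use the browser's default scheme'."""
    if level is None:
        return None
    if level == 0:
        return False  # zero: never download for offline use
    return level <= selected_level  # assumed: levels are cumulative


def links_to_fetch(html, selected_level):
    """Return hrefs of tagged links to download at the selected level."""
    parser = DownloadLinkParser()
    parser.feed(html)
    return [href for href, level in parser.links
            if should_download(level, selected_level)]


EXAMPLE_HTML = """\
<a href=foo.moov download=3>a newsreel clip</a>
<a href=body1.html download=2>more information</a>
<a href=audio.ra download=0>streaming audio</a>
"""
```

Untagged links are deliberately left out of the returned list, since they
would be handled by the browser's existing traversal scheme.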

If there are current discussions on this topic, please direct me to
them. I haven't found those discussions, and in talking with the major
vendors of this technology they don't seem to be aware of any, but I
have no desire to re-invent the wheel.

As an aside, it would also be useful if there were a similar pre-fetch
attribute that would tell browsers to pre-fetch the destination of the
link whenever bandwidth is available.

Received on Wednesday, 14 May 1997 13:24:23 UTC