Style Guide for online hypertext

This guide is designed to help you create a WWW hypertext database that effectively communicates your knowledge to the reader. It has been prepared in the light of comments by readers, and many demands by providers of online documentation. Some of the points made may be influenced by personal preference, and some may be common sense, but a collection of points has been demanded, and so here it is.

The guide is designed to be read sequentially, but feel free to depart from this. The sections are as follows:

Introduction
Etiquette for server administrators
Cool URIs don't change (1998)
Overall structure of a work
Within each document
Test your document
Ramblings: Related but more random thoughts. Under construction.
Background reading
Reader comments

The above lists all the parts of this guide except for individual reader comments.

To print this document

A single long page with all of them excluding reader comments is available for printing (but has dysfunctional links and is not in correct html).

This document is open to comment!

Suggestions are strongly invited: if you think of anything, mail it to timbl@w3.org, mentioning the Style Guide for Online Hypertext or its URL. I'm also interested in the URLs of other style guides, corporate house style guides, or your favorite book on style (hypertext or otherwise).

Introduction

You are going to write (or generate) some online hypertext. Because hypertext is potentially unconstrained, you are a little daunted. Do not be. You can write a document as simply as you like. In many ways, the simpler the better.

You will be writing a number of separate files. These files will be linked to each other, and to external documents, to make your final work.

You may think of your work as a "document", and if it were on paper, then you would call it that. In the online case though, we tend to refer to each individual file as a document. A document may correspond, in the book analogy, to a section or a subsection, or even a footnote. In this guide, we'll refer to the whole collection as a work.

The document is the unit by which information is picked up. At any one time, a document is completely loaded into the reader's computer. It is also normally the amount you edit at any one time, though with a good editor you will probably have a number of documents open at a time.

This guide has a bit on etiquette for each server, which mainly applies to the server administrator; the rest applies to anyone putting information onto any server. The section on structure discusses how you organize your material into documents. Another section discusses how to organize your material within a document.

Web Etiquette

There are a few conventions which will make for a more usable, less confusing web. As a server administrator, or webmaster as they are known (the term having been coined on this page, below), you should make sure these conventions apply to your data. This Guide gives more ideas for all information providers.

Your server administrator needs these things set up once per server:

A welcome page for outsiders

You don't have to have any particular structure to the data you publish: you can let it evolve as you think best. However, it is neat to have a document on each host which others can use to get a quick idea (with pointers) of what information is available there. You should put a "pass" line into your daemon rule file to map the document name "/" onto such a document. As well as a summary of what is available at your host, pointers to related hosts are a good idea.

Welcome home?

The welcome page for a server is often now called a "home" page because it is a good choice for a client to use as a home (default) page. The term "home" page means the default place to start your browser. Don't be confused by this, though. There are two separate concepts.

The welcome page will be welcoming those new to your server who want an overview of what it contains. It serves a similar purpose to your home page, but it differs in the audience it addresses. Often, it only confuses things to have two, so people within the organization use the welcome page as their home. This at least ensures that they are aware of the public view of the organization. I don't do this myself, as I have many personal things on my home page which I don't want on the organization's welcome page, nor on my own "welcome" page, my Bio. A welcome page may have explanations about what your server is all about which would be a waste of space on a home page for your local users. So you may want to make a separate home page for local users.

An alias for your server

If you have a serious server, then it may last longer than the machine on which it runs. Ask your internet domain name manager to make an alias for it, so that you can refer to it as "www.dom.edu" instead of as "mysun12.dom.edu". This means that when you change machines, you move the alias, and people's links to your data will still work.

In the future [3/94], clients will come out of the box configured to look for a local "www" machine and to use its welcome page as "home" if no other default is specified. This means that anyone starting such a client within your domain will get a relevant place to start.

An alias for yourself

You should make a mail alias "webmaster" on the server machine so that people who have problems with your server can mail you about it easily. This is similar to the "postmaster" alias for people who have mail problems with your machine.

Delegating control

The server administrator (the one with the root password) in principle has the power to turn the thing on or off, and control what happens. However, it is wise to have clearly delegated responsibility for separate areas of documentation. Maybe the server administrator has no responsibility at all for the actual content of the data, in which case he or she should just keep the machine running properly.

House style

The web has spread from the grass roots, without a central authority, and this has worked very well. This has been due in part to the creativity of information providers, and the freedom they have to express their information as directly and vividly as they can. Readers appreciate the variety this gives. However, in a large web they also enjoy a certain consistency.

If you are a person responsible for managing the information provided by your organization, you have to balance the advantages of a "house style" with the advantages of giving each group or author free rein. If you end up with decisions in this area, it is as well to write them down (not to mention put them on the web).

Cool URIs don't change

What makes a cool URI?
A cool URI is one which does not change.
What sorts of URI change?
URIs don't change: people change them.

There are no reasons at all in theory for people to change URIs (or stop maintaining documents), but millions of reasons in practice.

In theory, the domain name space owner owns the domain name space and therefore all URIs in it. Except for insolvency, nothing prevents the domain name owner from keeping the name. And in theory the URI space under your domain name is totally under your control, so you can make it as stable as you like. Pretty much the only good reason for a document to disappear from the Web is that the company which owned the domain name went out of business or can no longer afford to keep the server running. Then why are there so many dangling links in the world? Part of it is just lack of forethought. Here are some reasons you hear out there:

We just reorganized our website to make it better.

Do you really feel that the old URIs cannot be kept running? If so, you chose them very badly. Think of your new ones so that you will be able to keep them running after the next redesign.

We have so much material that we can't keep track of what is out of date and what is confidential and what is valid, and so we thought we'd better just turn the whole lot off.

That I can sympathise with - the W3C went through a period like that, when we had to carefully sift archival material for confidentiality before making the archives public. The solution is forethought: make sure you capture, with every document, its acceptable distribution, its creation date and ideally its expiry date. Keep this metadata.

Well, we found we had to move the files...

This is one of the lamest excuses. A lot of people don't know that servers such as Apache give you a lot of control over a flexible relationship between the URI of an object and where a file which represents it actually is in a file system. Think of the URI space as an abstract space, perfectly organized. Then, make a mapping onto whatever reality you actually use. Then, tell your server. You can even write bits of your server.
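As a rough illustration of "make a mapping, then tell your server", here is a minimal Python sketch. It assumes a hypothetical rule table that maps URI prefixes onto whatever directories happen to hold the files today, plus the welcome page that "/" maps onto; all the paths are invented for the example, and a real server such as Apache expresses the same idea in its own configuration rather than in code like this.

```python
from typing import Optional

# Hypothetical rules: (URI prefix, directory that holds those files today).
# When the files move, only this table changes; the published URIs do not.
RULES = [
    ("/1998/", "/export/disk2/archive98/"),
    ("/Style/", "/home/web/style/"),
]
WELCOME_PAGE = "/home/web/Welcome.html"  # the document that "/" maps onto


def resolve(uri_path: str) -> Optional[str]:
    """Return the file currently serving a URI path, or None if unmapped."""
    if uri_path == "/":
        return WELCOME_PAGE
    for prefix, directory in RULES:
        if uri_path.startswith(prefix):
            rest = uri_path[len(prefix):]
            # Refuse paths that try to climb out of the mapped directory.
            if ".." in rest.split("/"):
                return None
            return directory + rest
    return None
```

With these rules, a request for "/1998/12/01/chairs" is served from "/export/disk2/archive98/12/01/chairs". After the next disk reorganization you edit RULES, and nobody's links change.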

John doesn't maintain that file any more, Jane does.

Whatever was that URI doing with John's name in it? It was in his directory? I see.

We used to use a cgi script for this and now we use a binary program.

There is a crazy notion that pages produced by scripts have to be located in a "cgibin" or "cgi" area. This is exposing the mechanism of how you run your server. You change the mechanism (even keeping the content the same) and whoops - all your URIs change.

For example, take the National Science Foundation:

NSF Online Documents
http://www.nsf.gov/cgi-bin/pubsys/browser/odbrowse.pl

The main page for starting to look for documents is clearly not something to trust to still be there in a few years: "cgi-bin", "browser" and ".pl" all point to bits of how-we-do-it-now. By contrast, if you use the page to find a document, you first get an equally bad

Report of Working Group on Cryptology and Coding Theory
http://www.nsf.gov/cgi-bin/getpub?nsf9814

for the document's index page, but the HTML document itself is very much better:

http://www.nsf.gov/pubs/1998/nsf9814/nsf9814.htm

Looking at this one, the "pubs/1998" header is going to give any future archive service a good clue that the old 1998 document classification scheme is in progress. Though in 2098 the document numbers might look different, I can imagine this URI still being valid, and the NSF, or whatever carries on the archive, not being at all embarrassed about it.

I didn't think URLs have to be persistent - that was URNs.

This is probably one of the worst spin-offs of the URN discussions. Some seem to think that because there is research about namespaces which will be more persistent, they can be as lax about dangling links as they like, as URNs will fix all that. If you are one of these folks, then allow me to disillusion you.

Most URN schemes I have seen look something like an authority ID followed by either a date and a string you choose, or just a string you choose. This looks very like an HTTP URI. In other words, if you think your organization will be capable of creating URNs which will last, then prove it by doing it now and using them for your HTTP URIs. There is nothing about HTTP which makes your links break. It is your organization. Make yourself a database of document URN to current file, and let the web server use that to actually retrieve files.
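A minimal sketch of such a database, using Python's built-in sqlite3 module. The table name, column names and example paths are hypothetical; the point is only that the persistent name is the key, and the file location is a value you are free to update.

```python
import sqlite3
from typing import Optional


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table mapping persistent names to current files."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " name TEXT PRIMARY KEY,"  # the persistent URI; never reused
        " file TEXT NOT NULL)"     # where the file happens to live now
    )
    return db


def set_location(db: sqlite3.Connection, name: str, file: str) -> None:
    """Record or update where the document for `name` is stored."""
    with db:  # commits the change if no exception is raised
        db.execute(
            "INSERT OR REPLACE INTO documents (name, file) VALUES (?, ?)",
            (name, file),
        )


def current_file(db: sqlite3.Connection, name: str) -> Optional[str]:
    """What the web server would call to find the file for a request."""
    row = db.execute(
        "SELECT file FROM documents WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Moving a file is then one set_location call; the name that everyone links to stays exactly as it was.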

If you have got to this point, then unless you have the time, money and contacts to get some software design done, you might claim the next excuse:

We would like to but we just don't have the right tools.

Now here is one I can sympathise with. I agree entirely. What you need to do is to have the web server look up a persistent URI in an instant and return the file, wherever your current crazy filesystem has it stored away at the moment. You'd like to be able to store the URI in the file as a check, and constantly keep the database in tune with actuality. You'd like to store the relationships between different versions and translations of the same document, and you'd like to keep an independent record of the checksum to provide a guard against file corruption by accidental error. And web servers just don't come out of the box with these features. When you want to create a new document, your editor asks you for a URI instead of telling you.

You need to be able to change things like ownership, access, archive level, security level, and so on, of a document in the URI space without changing the URI.
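Here is a minimal Python sketch of two of the wished-for features: a record that keeps a document's URI fixed while its file location changes, and an independent SHA-256 checksum, taken when the document is registered, that is checked on every retrieval to catch accidental corruption. The Registry class and its methods are hypothetical and not part of any existing server.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large documents need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class Registry:
    """Hypothetical record: persistent URI -> (current file, checksum)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[Path, str]] = {}

    def register(self, uri: str, path: Path) -> None:
        self.entries[uri] = (path, sha256_of(path))

    def move(self, uri: str, new_path: Path) -> None:
        # Only the storage location changes; the URI and checksum are kept.
        _, checksum = self.entries[uri]
        self.entries[uri] = (new_path, checksum)

    def fetch(self, uri: str) -> bytes:
        path, checksum = self.entries[uri]
        if sha256_of(path) != checksum:
            raise ValueError(f"checksum mismatch for {uri}: file may be corrupt")
        return path.read_bytes()
```

Because the checksum is stored apart from the file, a damaged copy is detected at retrieval time rather than silently served.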

Too bad. But we'll get there. At W3C we are playing with "Jigedit" functionality (jigsaw server with editing) which does track versions, and we are experimenting with document creation scripts. If you make tools, servers and clients, take note!

This is an outstanding reason, which applies for example to many W3C pages including this one: so do what I say, not what I do.

Why should I care?

When you change a URI on your server, you can never completely tell who will have links to the old URI. They might have made links from regular web pages. They might have bookmarked your page. They might have scrawled the URI in the margin of a letter to a friend.

When someone follows a link and it breaks, they generally lose confidence in the owner of the server. They are also frustrated, both emotionally and practically, because they are prevented from accomplishing their goal.

Enough people complain all the time about dangling links that I hope the damage is obvious. I hope it is also obvious that the reputation damage is to the maintainer of the server whose document vanished.

So what should I do? Designing URIs

It is the duty of a Webmaster to allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years. This needs thought, and organization, and commitment.

URIs change when there is some information in them which changes. It is critical how you design them. (What, design a URI? I have to design URIs? Yes, you have to think about it.) Designing mostly means leaving information out.

The creation date of the document (the date the URI is issued) is one thing which will not change. It is very useful for separating requests which use a new system from those which use an old system. That is one thing which is good to start a URI with. If a document is in any way dated, even though it will be of interest for generations, then the date is a good starter.

The only exception is a page which is deliberately a "latest" page for, for example, the whole organization or a large part of it.

http://www.pathfinder.com/money/moneydaily/latest/

is the latest "Money daily" column in "Money" magazine. The main reason for not needing the date in this URI is that no one is likely to want a link to it which will outlast the magazine: if you want to link to the content, you would link to it where it appears separately in the archives as

http://www.pathfinder.com/money/moneydaily/1998/981212.moneyonline.html

(Looks good. Assumes that "money" will mean the same thing throughout the life of pathfinder.com. There is a duplication of "98" and an ".html" you don't need, but otherwise this looks a strong URI.)
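A minimal Python sketch of a URI-minting helper that follows this advice: it starts the path with the creation date and adds only a short name. The function name and the year/month/day granularity are assumptions for illustration, not a W3C tool.

```python
import re
from datetime import date


def mint_uri(base: str, created: date, short_name: str) -> str:
    """Build a URI of the form base/YYYY/MM/DD/short-name.

    The date is the one thing about the document that will never change;
    the short name is lower-cased, and anything other than letters and
    digits is collapsed to single hyphens.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    if not slug:
        raise ValueError("short_name must contain a letter or digit")
    return f"{base.rstrip('/')}/{created:%Y/%m/%d}/{slug}"
```

For example, mint_uri("http://www.w3.org", date(1998, 12, 1), "chairs") gives "http://www.w3.org/1998/12/01/chairs". Note that there is no extension and no status, author or access level in the result.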

What to leave out

Everything! After the creation date, putting any information in the name is asking for trouble one way or another.

Author's name - authorship can change with new versions. People quit organizations and hand things on.
Subject. This is tricky. It always looks good at the time. You classify your documents according to a breakdown of the work you are doing. That breakdown will change. Names for areas will change. At W3C we wanted to change "MarkUp" to "Markup" and then to "HTML" to reflect the actual content of the section. Also, beware that this is a flat name space. In 100 years are you sure you won't want to reuse anything? We wanted to reuse "History" and "Stylesheets" for example in our short life.
Status - directories like "old" and "draft" and so on, not to mention "latest" and "cool", appear all over file systems. Documents change status - or there would be no point in producing drafts. The latest version of a document needs a persistent identifier whatever its status is. Keep the status out of the name.
Access. At W3C we divide the site into "Team access", "Member access" and "Public access". Sounds good, but of course documents start off as team ideas, are discussed with members, and then go public. A shame indeed if every time some document is opened to wider discussion all the old links to it fail! We are switching to a simple date code now.
File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years' time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.
Software mechanisms. Look for "cgi", "exec" and other give-away "look what software we are using" bits in URIs. Anyone want to commit to using perl cgi scripts all their lives? Nope? Cut out the .pl. Read the server manual on how to do it.
Disk name - gimme a break! But I've seen it.
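Some of these can be caught mechanically. Below is a rough Python sketch that flags status words, access levels, software mechanisms and file name extensions in a URI's path. The word lists are hypothetical starting points to adapt to your own site; authors' names, subjects and disk names still need a human eye.

```python
from typing import List
from urllib.parse import urlsplit

# Hypothetical word lists: starting points to adapt to your own site.
STATUS_WORDS = {"old", "new", "draft", "latest", "cool"}
ACCESS_WORDS = {"team", "member", "public", "private"}
MECHANISM_WORDS = {"cgi", "cgi-bin", "cgibin", "exec"}
EXTENSIONS = (".html", ".htm", ".cgi", ".pl", ".php", ".asp")


def uri_warnings(uri: str) -> List[str]:
    """List the parts of a URI's path that are likely to change later."""
    segments = [s for s in urlsplit(uri).path.lower().split("/") if s]
    warnings: List[str] = []
    for segment in segments:
        if segment in STATUS_WORDS:
            warnings.append(f"status in name: {segment}")
        if segment in ACCESS_WORDS:
            warnings.append(f"access level in name: {segment}")
        if segment in MECHANISM_WORDS:
            warnings.append(f"software mechanism in name: {segment}")
    if segments and segments[-1].endswith(EXTENSIONS):
        warnings.append(f"file name extension: {segments[-1]}")
    return warnings
```

Run against the NSF browser URI quoted earlier, it reports both "cgi-bin" and the ".pl" extension; the dated W3C minutes URI produces no warnings.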

So a better example from our site is simply

http://www.w3.org/1998/12/01/chairs

a report of the minutes of a meeting of W3C chairpeople.

Don't forget the domain name.

Remember that this applies not only to the "path" part of a URI but to the server name. If you have separate servers for some of your stuff, remember that that division will be impossible to change without destroying many, many links. Some classic "look what software we are using today" domain names are "cgi.pathfinder.com", "secure" and "lists.w3.org". They are made to make administration of the servers easier. Whether it represents divisions in your company, or document status, or access level, or security level, be very, very careful before using more than one domain name for more than one type of document. Remember that you can hide many web servers inside one apparent web server using redirection and proxying.
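To make the "one apparent web server" idea concrete, here is a minimal Python sketch of the routing decision a front-end proxy makes: the public URI stays the same, and only an internal table decides which machine answers. The backend host names and prefixes are hypothetical; in practice this table would live in your proxy or server configuration.

```python
# Hypothetical internal machines. Outsiders only ever see one public host
# name; these names never appear in a published URI.
BACKENDS = [
    ("/Lists/", "http://lists-host.internal:8080"),
    ("/Search/", "http://search-host.internal:8000"),
]
DEFAULT_BACKEND = "http://static-host.internal"


def route(path: str) -> str:
    """Return the internal URL a front-end proxy would forward `path` to.

    The first matching prefix wins, so list more specific prefixes first.
    Moving a section to another machine means editing this table only.
    """
    for prefix, backend in BACKENDS:
        if path.startswith(prefix):
            return backend + path
    return DEFAULT_BACKEND + path
```

The mailing list archives can then move to a new machine, or back onto the main one, without any public link changing.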

Oh, and do think about your domain name. If your name ain't soap, will you want to be referred to as "soap.com" even when you have switched your product line to something else? (With apologies to whoever owns soap.com at the moment.)

Conclusion

Keeping URIs so that they will still be around in 2, 20 or 200 years is clearly not as simple as it sounds. However, all over the Web, webmasters are making decisions which will make it really difficult for themselves in the future. Often, this is because they are using tools whose task is seen as presenting the best site in the moment, and no one has evaluated what will happen to the links when things change. The message here is, however, that many, many things can change and your URIs can and should stay the same. They only can if you think about how you design them.

Structure

If you have in mind a body of information to put across to your reader, you probably have a mental organization for it. Normally this is a sort of hierarchical tree, like the chapters of a book if you were to write a book.

Keep this structure. It helps readers to have a tree structure as a basis for the book: it gives them a feeling of knowing where they are. You can also use this structure for organizing your files in directories.

You should also bear in mind:

The reader's structure

Remember always the audience for whom you are writing. If they are novices in the subject, it will normally help if you are firm about the structure of your work, so that they can learn the structure of the knowledge itself. For example, if you feel that the subject falls into three distinct areas, then that is an important thing to teach.

If, however, your readers will already have some knowledge in the subject, then they will already have formed their own structure for it. In this case they will consciously or subconsciously know where they expect to find things. If your structure is different from theirs, enforcing it too strongly will confuse them and put them off.

You may in this case have to resist a strong tendency to put across your own structure strongly and to the detriment of all others. There are two solutions.

If you have a single well-defined audience in mind, who will share a similar world view, then try to write exactly for that world view rather than yours.

If you are simultaneously writing for more than one group, then you must provide for both.

When you make a reference, qualify it with a clue to allow some people to skip it. For example, "If you really want to know how it works inside, see the Internals guide", or "A step-by-step introduction is in the tutorial".

Provide links for both readers' views. Your work will be more connected than a simple tree, but with proper qualification, no one should get lost.

Provide two separate tree "roots". For example, you can write a step-by-step tutorial and a functionally direct reference tree for the same data. Both will at the lowest level have the same data, but while the first will deal with the simple things first, the second may be functionally grouped. This is just like having several indexes to a book. The tutorial might also include information which the reference work does not.

Overlapping Trees

Here is an example of a work (describing some programming functions, say) with two separate structures:

			Tutorial			Reference
			   |				    |
		  Let's do it together		       -----------------
		from simple to difficult	      |			|
			    |			by Functional      Alphabetical
			    |			    group	     by name
		  Task oriented examples	      |			|
			    |			       -----------------
			    |				    |
		  Examples of use of		   Syntax definition for
		  specific functions   <-------->    specific functions

The novice user starts at the top left, and works his way down. Where he needs specific details, he will get down to the examples and from them a link to the underlying definitive descriptions of each. As far as he is concerned, he is reading a tree-structured work. In fact, he is reading the same information as the expert who, coming in to check on one particular function, then looks up an example of its use.

How big to make each document

The most important point here is that a document should put across a well-defined concept. It is not generally worth splitting one idea arbitrarily into two bits in order to make the bits smaller. Nor is it a good idea to put together ideas which are really separate just to make a bigger document.

A document can be as small as a footnote.

There are two upper limits on a document's size. One is that long documents take longer to transfer, and so a reader will not be able to simply jump to one and back as fast as he or she can think. This depends a lot on the link speed, of course.

The other limit is the difficulty for a reader of scrolling through large documents. Readers with character-based terminals don't generally read more than a few screens. They often only absorb what is on the first screen: if that is not interesting, they won't bother to scroll down. Readers are also put off by being left at the top of a large document.

Readers with graphic interfaces generally scroll through long documents with a scroll bar. When the scroll bar is moved a small amount, the document should move a sufficiently small amount so that some of the original window-full is still left in the window. This allows the reader to scan the document. If the document is much bigger than that, then it is basically unreadable, in that any movement of the scroll bar loses the place and leaves the reader disoriented.

One advantage of longer documents is that it is easier for readers with scroll bars to read through them in an uninterrupted flow, if that is how the document is written.

Also, one doesn't have to go to the trouble of making (or generating) so many links and keeping them up to date if things are altered. If making the links is a problem, just settle for one link to a contents page. Some browsers have "next" and "previous" buttons to allow a document to be browsed serially according to a list.

(In fact, one can normally scroll up and down explicitly page by page, but this gives the same feeling as the terminal interface.)

A rough guide, then, for the size of a document is:

For online help, menus giving access to other things: small enough to fit on 24 lines. Check this by using a terminal browser.
For textual documents, of the order of half a letter-sized (A4) page to 5 pages.

Refer or copy?

When you are setting up an information system which refers to information available elsewhere, be very careful before taking a copy.

Here are some reasons for leaving it where it is:

When it is updated, you will either have to have a way of finding out, and make a fresh copy, or you will end up with an out of date copy.
If you feel that your copy will be easier to access, remember that this is relative. You will have readers from other places who may find the original is closer. If the original has a serious access problem, you could find another server (maybe offer your own) as the definitive storage point.

Here are reasons for copying it:

If the information is transient, like a news article, you will have to take a copy.
You may want to refer to a particular version of something which will be changed and not saved in an archive.

You should be very wary before referring to your own private collections of the following, of which plenty of established collections exist:

Internet FAQ (Frequently Asked Questions) lists
RFCs (Request For Comments -- Internet standards etc)
Information by subject
The "best" URLs on the web. Make a personalized list of pointers to things of specific interest. It will be more valuable!

Within each document

This section of the style guide deals with the layout of text within a "document", the unit of retrieval of information on the web.

To be completed.

You should try to:

Sign It!

An important aspect of information which helps keep it up to date is that one can trace its author. Doing this with hypertext is easy -- all you have to do is put a link to a page about the author (or simply to the author's phone book entry).

Make a page for yourself with your mail address and phone number. At the bottom of files for which you are responsible, put a small note -- say just your initials -- and link it to that page. The address style (typically right justified) is useful for this.

Your author page is also a convenient place to put any disclaimers, copyright notices, etc. which law or convention require. It saves cluttering up the documents themselves with a long signature.

(If you are using the WorldWideWeb.app hypertext editor, then you can put this link from your default blank page so that it turns up on the bottom of each new document automatically)

The status of your information

Some information is definitive, some is hastily put together and incomplete. Both are useful to readers, so do not be shy to put information up which is incomplete or out of date -- it may be the best there is. However, do remember to state what the status is. When was it last updated? Is it complete? What is its scope? For a phone book for example, what set of people are in it?

Not every document needs a status declaration, if there is something in the overview page of the work which covers it.

You can of course also give a feel for the status of the text by its language ... bad spelling, missing capitals, and relaxed grammar all indicate informal notes. Careful use of verbs such as "shall" and "should", and the introduction of Long Capitalized Noun Phrases (LCNPs) will give at least the impression of an ISO standard. ;-)

Date it

In some cases it can be useful to put creation dates and last modified dates on your work. (Note that this is the sort of thing which one could make a server do automatically with a little programming).

Consider whether putting one on might later save the reader from following out-of-date information.
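As an example of that "little programming", here is a minimal Python sketch that derives a last-modified note from a file's modification time. It assumes the file system's timestamp is a fair proxy for when the content last changed, which is not always true (copying or restoring files can reset it); the function name is hypothetical.

```python
import os
from datetime import datetime, timezone


def last_modified_note(path: str) -> str:
    """Return a short 'Last modified' line for the bottom of a document."""
    mtime = os.path.getmtime(path)  # seconds since the epoch
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return f"Last modified {stamp:%Y-%m-%d} (UTC)"
```

A server, or the script that publishes your files, could append this line next to the author's signature so that it never goes stale.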

Linking to context

A major difference between writing part of a serial text, and an online document, is that your readers may have jumped in from anywhere. Even though you have only made links to it from one place, any other person may want to refer to that particular point, and will so make a link to that particular part of your work from their own. So you can't rely on your reader having followed your path through your work.

Of course if you are writing a tutorial, it will be important to keep the flow from one document to the next in the order you intended for its primary audience. You may not wish to cater specially for those who jump in out of the blue, but it is wise to leave them with enough clues so as not to be hopelessly lost. Some ways of doing this are:

Watch that your text and vocabulary stand by themselves. Starting a document with "The next thing we consider is..." or "The only solution to this problem is..." will certainly confuse.
Sometimes the opening words refer to the context, and can be linked to background information. For example, in the WWW project documentation, the first occurrence of the acronym WWW is often linked back to the central project document.
The navigation hints at the top or bottom of the document can give explicit pointers. Examples are at the bottom of this document.

It can also be useful to imagine as you are writing that you yourself may wish to reuse the document some day.

Navigational Icons

Icons make great navigational hints. It is very effective to have the same consistent icon throughout the work, always (except on the "top" page) linked back to the top page. This kills two birds with one stone: it gives consistency to the work, so readers know when they are in it and when they are outside it, and it also gives them a quick way of getting back to the top of it.

You can do the same thing with sections, so that at the top (or bottom) of each page you might have a small string of icons, the first to go back to the top of the work, the second to go back to the chapter, the third to go back to the section within the chapter, for example.

[This style guide was for a long time empty of icons because I was editing it with the old hypertext editor which doesn't handle images. I may fix that with time -tbl]

TITLE

The title of a document is specified by the TITLE element. The TITLE element should occur in the HEAD of the document.

There may only be one title in any document. It should identify the content of the document in a fairly wide context.

The title is not part of the text of the document, but is a property of the whole document. It may not contain anchors, paragraph marks, or highlighting. The title may be used to identify the node in a history list, to label the window displaying the node, etc. It is not normally displayed in the text of a document itself. Contrast titles with headings. The title should ideally be less than 64 characters in length, because many applications will display document titles in window titles, menus, etc. where there is only limited room. Whilst there is no limit on the length of a title (as it may be automatically generated from other data), information providers are warned that it may be truncated if long.

Examples of use

Appropriate titles might be

		<TITLE>Rivest and Neuman. 1989(b)</TITLE>

		<TITLE>A Recipe for Maple Syrup Flap-Jack</TITLE>

		<TITLE>Introduction -- AFS user's Guide</TITLE>

Examples of inappropriate titles are those which are only meaningful within context,

		<TITLE>Introduction</TITLE>

or too long,

	<TITLE>Remarks on the Quantum-Gravity effects of "Bean
	Pole" diversification in Mononucleosis patients in Developing
	Countries under Economic Conditions Prevalent during
	the Second half of the Twentieth Century, and Related Papers:
	a Summary</TITLE>
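Two of these rules are easy to check automatically: that there is exactly one TITLE, and that it is under 64 characters. Below is a minimal Python sketch using the standard library's html.parser; the function and class names are hypothetical, and whether a title makes sense out of context still needs a human reader.

```python
from html.parser import HTMLParser
from typing import List


class TitleCollector(HTMLParser):
    """Collect the text of every TITLE element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # html.parser lower-cases tag names
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def title_problems(html: str, limit: int = 64) -> List[str]:
    collector = TitleCollector()
    collector.feed(html)
    collector.close()
    problems: List[str] = []
    if len(collector.titles) != 1:
        problems.append(f"expected exactly one TITLE, found {len(collector.titles)}")
    for title in collector.titles:
        text = " ".join(title.split())  # collapse line breaks and extra spaces
        if len(text) >= limit:
            problems.append(f"title is {len(text)} characters long and may be truncated")
    return problems
```

The maple syrup title passes; the quantum-gravity title is reported as likely to be truncated.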

Device Independence

The hypertext you write is stored in HTML, which does not contain information about the fonts, paragraph shapes and spacing which should be used for displaying the document.

This gives great advantages in that your document will be rendered successfully on whatever platform it is viewed, including a plain text terminal.

You should be aware that different clients do use different spacing and fonts. You should be careful to use the structuring elements such as headers and lists in the way in which they were intended. If you don't like the rendering on your particular client, don't try to fix it by using inappropriate elements, or trying for example to force extra spacing with empty elements. This may well end up being interpreted differently by other clients and looking very strange. You can in many cases configure how the client displays each element.

For example:

Always use heading levels in order, with one heading level 1 at the top of the document, and if necessary several level 2 headings, and then if necessary several level 3 headings under each level 2 heading. If you don't like the way heading level 2 is formatted, fix it on your client, don't just skip to heading level 3.
Don't put extra spaces or blank lines into your text to pad it out, except in pre-formatted (PRE) sections.
Don't refer in your text to facets of particular browsers. Asking someone to "click here" won't make sense without a mouse, just as asking someone to "select a link by number" will betray the fact that you were using the line mode browser. Just leave a link. The instructions get boring as the user will normally know how to select a link.
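The first rule lends itself to a mechanical check. This is a minimal sketch, assuming Python 3's standard html.parser; heading_problems is a hypothetical helper, not part of any existing checker. It reports a document that does not start with a level 1 heading, has more than one, or goes down more than one level at a time (for example from H1 straight to H3).

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1 to 6) of each heading element, in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <H2> arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of ways in which the headings break the rule above."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    levels = parser.levels
    problems = []
    if levels and levels[0] != 1:
        problems.append(f"first heading is H{levels[0]}, not H1")
    if levels.count(1) > 1:
        problems.append("more than one H1")
    for previous, current in zip(levels, levels[1:]):
        # Going back up (H3 to H2) is fine; going down must be one step at a time.
        if current > previous + 1:
            problems.append(f"H{current} follows H{previous}, skipping a level")
    return problems
```

A report such as "H3 follows H1" is the cue to change how your client formats H2, rather than to skip the level.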

A choice of senses

It is not even wise to assume that your readers will be using a screen-based browser at all. The visually impaired, or those at work or driving, may be browsing the web using their ears rather than their eyes. The "click here" makes even less sense for them.

A few obvious things to do are:

Use ALT attributes on images whose meaning is needed for the document;
Don't use ALT text on purely decorative stuff;
Use tags for the purposes for which they were intended, so that future browsers can do even cleverer things with them.
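Deciding which images carry meaning needs a human, but a checker can at least list the candidates for you to review. The sketch below assumes Python 3's standard html.parser and a hypothetical helper, images_without_alt; it returns the SRC of every IMG element that has no ALT attribute at all. An image given an empty ALT (ALT="") is treated as a deliberate choice and is not reported.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the SRC of every IMG element that has no ALT attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased, whatever case the page uses.
        if tag == "img":
            names = {name for name, _ in attrs}
            if "alt" not in names:
                self.missing.append(dict(attrs).get("src") or "(no SRC)")


def images_without_alt(html):
    """Return the SRC values of images which have no ALT attribute at all."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing
```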

I can't give a complete summary of the do's and don'ts for making a web page "accessible".

Printable hypertext

In an ideal world, paper might not be necessary. In a next-to-ideal world, one would have enough time to write a hypertext version of a document and also to write a completely separate paper version. However, in the real world, you will probably want to generate any printed documents and online documents from the same file.

Suppose the HTML files will be the master, and you will generate the printable version from them by making one long document, and possibly printing it via translation into TeX or some word processor format, for example. You might not do this initially, but you might want to one day.

Try to avoid references in the text to online aspects. "See the section on device independence" is better than "For more on device independence, click here." In fact we are talking about a form of device independence.

Unfortunately, the recommended practices of signing each document and giving navigational links tend to mess up the printable copy, though one can of course develop ways of stripping them out if they follow a common format.

For example, the most common comment about this document was that it is difficult to print. I therefore made a single page version of the whole thing with a few scripts, and put a pointer to it from the cover page. But then people still ask, not having read the cover page. (The scripts were just bits of "sed", which I am not supporting. I have put rules in at the top and bottom of each page and the scripts use these to chop off bits which are not needed in the printed copy.)
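The sed scripts themselves are not published here. As a rough equivalent of the same idea, here is a minimal Python sketch, assuming that each page has a horizontal rule (an HR element) near the top and another near the bottom, with the navigation links and signature outside them. The names printable_body and combine are hypothetical.

```python
import re

# Matches an HR element in any case, with or without attributes, e.g. <HR> or <hr size=2>.
RULE = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)


def printable_body(page):
    """Return the text between the first and last horizontal rule.

    Assumes the navigation links and signature sit outside the two rules;
    a page without at least two rules is returned unchanged.
    """
    rules = list(RULE.finditer(page))
    if len(rules) < 2:
        return page
    return page[rules[0].end():rules[-1].start()]


def combine(pages):
    """Concatenate the bodies of several pages into one long printable document."""
    return "\n".join(printable_body(p) for p in pages)
```

Feeding the pages to combine in reading order gives one long document. A page with fewer than two rules is passed through whole, so the convention only pays off if it is followed consistently.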

Make your (hyper)text readable

This is just a little rant about two style issues in hypertext that I'm seeing more of and don't like much.

The first is the _here_ syndrome, e.g.:

	Information about Blah Blah Blah is available by clicking _here_.

where the word _here_ is the link. This style is really awkward; when you click on 'here', you have to look around to make sure it is the *right* here. Let me urge you, when you construct your HTML page, to make sure that the thing-you-click is actually some kind of title for what you will get when you click there. E.g. say

	Information about _Blah Blah Blah_ is now available.

And use:

	Information on _how to do searches_ is available.

instead of

	For information on how to do searches, choose _this link_.

Not quite as bad, but still awkward, is where someone uses a topic word as a link but the text still talks about the links:

   	Here are links to a _CREDITS_ page and _technical details_ ...

Instead, try to write something like

   	Many thanks go to _various people_ for their contributions.

   	_Technical details_ of this system are available now.

I.e., make your HTML page such that you can read it even if you don't follow any links.

Avoid talking about mechanics

Announcements of internet services have typically been followed by pages of information about how to use FTP, mail servers, etc., to get at the information. The WWW is designed to make all this unnecessary.

The temptation is to strip out these instructions and leave a link like:

	There is now WWW access to our large FTP archive which was
	previously only available by FTP, NFS and mail.  This
	collection includes much public domain software and text
	whose copyright has expired.

The web is read by people who don't need or, often, want to know about FTP and NFS - or even WWW! So the following is better:

	Our archive includes much public domain software and text
	whose copyright has expired.

Keeping on the subject of discourse rather than the mechanisms and protocols keeps the text shorter, which means people are more likely to read it.

Even when you are working within the web metaphor, use links, don't talk about them. For example

	You can read more about this in the tutorial which is
	linked to the home page

would obviously be better as

		The _tutorial_ has more about this.

Another common one is

	The tutorial contains sections on mowing, sharpening the mower,
	and buying a mower.

Give the reader a break, and let him or her jump straight there!

	The tutorial contains sections on _mowing_, _sharpening the mower_,
	and _buying a mower_.

Test your document

In a way your hypertext is like a book, which you should have proofread. In a way, it is like a program which you should have tested. At least get someone from the target group for which you wrote the document to read it and give you some feedback. Other ideas are:

Read the document using several different client programs, to ensure that you have formatted it in a device independent way.
Monitor the readership of your document. You can do this by analysing the server log files. You may find that some parts are not being read, perhaps because people are looking in the wrong place for them. You may see that people often follow a path and backtrack. If you can guess what they were looking for, you can make the clues around the link more helpful. (Remember to keep log information confidential until you have removed user information from it.)
Make it clear whether you will accept criticism or suggestions from your readers, and how they should send them.
Ask people to solve problems using the document, and report on their success. If they fail, find out what they were looking for, whether it was in the document at all, and if so, why they did not find it.

How much testing?

Testing takes time. The decision of how much testing you do is based on the quality of the document you wish to provide. You are balancing your reader's time and effort against yours. If your document is "selling" an idea, or if you are selling the document or providing a service, you will want to make it as easy as possible for the reader. If many people will read your work, a little of your time will save a lot of theirs.

If however you are documenting some obscure part of a system in which no one other than yourself is likely to be interested, or if you feel that your readers are lucky to have anything available at all, there is no point wasting time testing it. In the event of someone needing the information, they might have to go to some extra trouble to follow several links to find what they want, and then to understand what you have written. This may be the most efficient way of working. I emphasize this because there is a great deal of information which exists only for a fleeting moment in people's minds, or is hastily scribbled down in some file, and which may be important to posterity. It is better for this information to be available even in unpolished form than for it to be hidden out of embarrassment for its form. Before electronic technology, the effort of publishing was such that this information was never seen, and it was considered a waste, and an insult to one's readers, to publish something which was not of high quality. Nowadays, there is "publishing" at all levels, and both high quality and hasty documents have their value. It is important, though, to make it clear what the quality of a document is when making a reference to it, to avoid disappointment.

Monitoring the server log files will tell you which documents are really being read. You can use your time most efficiently to improve the quality of those. Of course, analysing the server log files also takes time!
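Much of that analysis can be automated. The sketch below assumes your server writes its log in the Common Log Format (one request per line: host, identity, user, date, request, status and size) and counts successful GET requests for each path. The name pages_read is hypothetical, and a server configured with another log format would need a different pattern. Because only the path is kept, the host and user fields are discarded on the way.

```python
import re
from collections import Counter

# One line of the Common Log Format, for example (made-up values):
# host.example - - [01/Jan/1995:12:00:00 +0000] "GET /Style/Test.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')


def pages_read(lines):
    """Count successful GET requests per path.

    Only the path is kept; host and user fields are never stored,
    which also helps keep the log information confidential.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed or differently formatted lines
        method, path, status = match.groups()
        if method == "GET" and status == "200":
            counts[path] += 1
    return counts
```

For example, pages_read(open(path)).most_common(10) lists the ten most-requested documents, where path is wherever your server keeps its log. Requests are only a rough proxy for reading: caches and repeat visits both distort the counts.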

Testing your HTML

If you are using hypertext editing software, then your files should always contain valid HTML. Currently though, many people are editing HTML files as plain text files and having to get the markup right themselves. If you are in this category, then it is well worth running the HTML you write through an HTML checker. (There are some pointers to these from the W3C HTML overview.) It is also a good idea when you use a new HTML generation tool to test its output once. There are pointers to clients (some with HTML editing capability), and to more lists, in the client list.

Acceptable Content

Other sections of the style guide have dealt with the layout and structure of text. Here I digress to broaden the notion of style into a consideration of what is acceptable in the actual content of information put on the web.

The web is intended to be a mapping of the knowledge of society in general, and not constrained into a particular format or level. One can therefore expect to find anything on it, from scribbled note to encyclopaedia, and styles and manners will vary. However, for information which is to be generally accessible there are some questions of acceptability which apply simply because the material has been published.

There is a tendency on the Internet to regard news and email as very informal media, in which tolerance is expected. However, you never know who may link to or be led to your public web document. In some countries there may be legal requirements as well as informal ethical codes. I would not advocate any global censorship in this regard, but that does not mean you should not think about which aspects below are relevant to the document you are writing.

Due Credit

The visibility of someone's work depends, on the web more than anywhere, on where it is referenced. If an academic paper purports to be a description of some state of affairs, the academic code requires that it refer to related work which may be of interest.

Commercial servers selling products are not in practice bound by such a code: you don't find hypertext links to competitors' products. Therefore, make it clear whether your list of services is intended to be fair, or is commercially biased. Both forms are appreciated by the public, but an advertisement masquerading as something else is not.

Acceptable content

Pornography is just the most often discussed form of content which is generally disapproved of and illegal. There are others: libel, material infringing copyright or other intellectual property rights, and material inciting criminal activity are also things which you would be wise to avoid. Bear in mind in this context:

your reader
whoever pays for your time, equipment and connectivity
your government

and try not to upset any of them.

Where you feel that something may offend your reader, you can to a certain extent protect both yourself and them by making an access path which goes through a warning page, and never yourself distribute URIs for anything behind that page without a similar warning. This is not, of course, foolproof.

The PICS initiative of the W3C consortium is aimed at allowing you (or anyone else) to rate your pages as to their acceptability. The idea is that parents and schools can then use rating systems of their choice to select suitable content for their children. This technology is expected to be available commercially some time in 1996.

Acceptable language

Unacceptable language is the simplest form of unacceptable content. Standards tend to vary from one country to another: in the US they are high. Here is a non-exhaustive non-definitive checklist of a few sorts of language to avoid.

Sexist: Any language which makes reference to the male or female of the human species when either could in fact apply. References to users, for example, as "he" or "she" can be equally offensive.
"Adult": Sexual innuendo or explicit vocabulary obviously is a no-no.
Racist: Politically correct language for referring to each ethnic group seems to vary from month to month. Because there are such strong feelings, it is wise to be careful.
Religiously assumptive: Whilst it is reasonable to discuss religion, inadvertently assuming that your reader is or is not of a particular religion can be offensive.


Thanks to my brother Michael for describing "how and why" trees he uses in his team-building training.

Why "why?"

Why is it important to be able to ask why?

Because without scientific curiosity we would not have scientific discovery.
Because democracy depends on an informed public (because, in principle, the public are needed to make decisions directly or indirectly);
Because an unquestioning individual can be hoodwinked and manipulated;
Because our society is constructed of a set of assumptions built on each other and on shifting conditions, and only by constant inspection can this framework of assumptions and beliefs be maintained in a sufficiently consistent state for us to stay out of serious trouble.

(If this were a good piece of hypertext, it would have well-researched links to webs on science, society, politics, ethics and philosophy, but it isn't and it doesn't. Suggestions for related material welcome.)

Background reading

Some other documents which may be of relevance, if you are reading the Style Guide for Online Hypertext:

All about the various HTML Specifications
A Beginner's Guide to writing HTML
World-Wide Web server software - a list of pointers
Web Etiquette -- for Server Administrators (now part of this guide)
The Web Designer, N Laviolette. Many references
The Sevloid guides to Web Design and to HTML, John Cooke, 1997

Personal views:

Thoughts on style from Jorn Barger.

This list is far from complete. There are many books now (1994) about how to build a web site and write HTML. If they have a common failing, it is that they assume a particular browser. Pick one which inspires you, as they are generally full of enthusiasm and good ideas. Browse the web, get your own ideas, and mail me mentioning the guide.

Disclaimer and Copyright

MIT / LCS 545 Technology Square, Cambridge MA 02139, USA

This information is provided in good faith but no warranty can be made for its accuracy. Opinions expressed are entirely those of myself and/or my colleagues and cannot be taken to represent views, past, present or future, of our employers.

Feel free to quote, but reproduction of this material in any form of storage, paper, etc. is forbidden without the express written permission of the author. Intellectual property rights in this material may be held by the author, CERN and/or MIT. All rights are reserved.

If you notice something incorrect or have any comment which you don't think is a FAQ, feel free to mail me. If I don't get around to answering, please forgive me -- I try to answer everything!

Tim Berners-Lee