Reform of SGML

It is true that SGML was designed from the standpoint of markup, that is, annotations on text as to how it should be formatted, rather than as a language. Here is my $.02 worth. I don't for a moment imagine that ISO would really clean it up at this stage (March 93). What follows is an incremental cleanup of just a few points of SGML syntax.

Clean up those brackets

The problem of interpreting the space between two tags would go away if there were a delimiter (say, a semicolon) meaning "end of tag, begin new tag". For some reason SGML uses an empty piece of text for this! That is like using a null string, or often a newline, as a statement separator.

Suppose one could then write

			<TAG1 ATTR ATTR2 ;
			 TAG2 ATTRSF SDF SDF>

instead of

			<TAG1 ATTR ATTR2>
			<TAG2 ATTRSF SDF SDF>

Try this with your average DTD and see how clean it looks! The result looks like what it should be: a computing language with text as parameters.
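As a rough illustration of the semicolon proposal, a preprocessor could expand grouped tags back into conventional SGML. The function name `expand_tag_groups` is hypothetical, and the sketch assumes that neither `;` nor `>` ever appears inside an attribute value.

```python
import re

# Hypothetical preprocessor for the proposed ";" tag separator.
# Simplification: ";" and ">" never occur inside attribute values.
TAG_GROUP = re.compile(r"<([^<>]*)>")


def expand_tag_groups(text: str) -> str:
    """Rewrite "<A x ; B y>" as "<A x>" and "<B y>" on separate lines."""

    def expand(match):
        # ";" means "end of tag, begin new tag", so each piece is one tag.
        parts = [p.strip() for p in match.group(1).split(";")]
        return "\n".join(f"<{p}>" for p in parts if p)

    return TAG_GROUP.sub(expand, text)


print(expand_tag_groups("<TAG1 ATTR ATTR2 ;\n TAG2 ATTRSF SDF SDF>"))
```

Text outside the brackets passes through untouched, so a document can mix grouped and ordinary tags.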

Free format

Suppose white space were allowed between the < and the first character of the tag. This is unthinkable to the markup-minded person, who wants a < by itself to be an error, but it looks SO much nicer to a language-minded person:
		<SECTION LEVEL=2>
		<STITLE ID=ABC>What Next?
		</STITLE>
		<IDX>
		   <FIG X=7 y=67  CAP="The solution">
		   Hello
		   </FIG>
		</IDX>

would come out like
		SECTION LEVEL=2;
		    STITLE ID=ABC  >What Next?<
		 /STITLE;
		 IDX;
		   FIG X=7 y=67  CAP="The solution"
		   >Hello<
		   /FIG;
		/IDX;

It makes much more sense to quote the text rather than the markup when there is much more markup than text. This way a document can look like a language with embedded text, or like text with embedded markup, depending on which predominates.
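A minimal sketch of the translation from conventional tags to this markup-first form, assuming a document with no comments, declarations or `>` inside attribute values. The name `to_markup_first` is hypothetical. Consecutive tags are separated by `;`, non-blank text is wrapped as `>...<`, and white space between tags is simply dropped, since it no longer has to mean anything; leading and trailing white space inside text is also trimmed, a further simplification.

```python
import re

# A tag "<...>" or a run of text between tags.
TOKEN = re.compile(r"<([^<>]*)>|([^<>]+)")


def to_markup_first(sgml: str) -> str:
    out = []
    for tag, text in TOKEN.findall(sgml):
        if tag:
            # Markup is now written bare; ";" ends it unless text follows.
            out.append(tag.strip() + ";")
        elif text.strip():
            quoted = ">" + text.strip() + "<"
            if out and out[-1].endswith(";"):
                # The ">" opening the text also ends the tag, so drop ";".
                out[-1] = out[-1][:-1] + " " + quoted
            else:
                out.append(quoted)
        # White space alone between tags is discarded.
    return "\n".join(out)


print(to_markup_first("<SECTION LEVEL=2>\n<STITLE ID=ABC>What Next?\n</STITLE>"))
```

This prints the first three lines of the converted example, with the title text on the same line as its STITLE tag.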

Unifying the quoting

Now, the astute reader will realize that the double quotes in the attribute value CAP="The solution" play basically the same role as the angle brackets left around text, and will suggest making them equivalent:
		SECTION LEVEL=2;
		    STITLE ID=ABC  >What Next?<
		 /STITLE;
		 IDX;
		   FIG X=7 y=67  CAP=>The solution<
		   >Hello<
		   /FIG;
		/IDX;

Now we have only one form of quoting, and we can easily distinguish markup from text: text is inside the quotes, markup outside.
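With a single quoting convention, separating the two becomes a simple scan: anything between a `>` and the next `<` is text, everything else is markup. This is a sketch under the assumption that quoted text never contains `<` (no escape mechanism is modelled); `split_markup_text` is a hypothetical name.

```python
def split_markup_text(src: str) -> list:
    """Split the proposed syntax into ("markup", ...) and ("text", ...) pieces."""
    pieces = []
    pos = 0
    while pos < len(src):
        start = src.find(">", pos)      # ">" opens a piece of text
        if start == -1:
            pieces.append(("markup", src[pos:]))
            break
        end = src.find("<", start + 1)  # "<" closes it again
        if end == -1:
            raise ValueError(f"unterminated text at offset {start}")
        pieces.append(("markup", src[pos:start]))
        pieces.append(("text", src[start + 1:end]))
        pos = end + 1
    # Keep text exactly as written; trim markup and drop blank markup runs.
    return [(kind, s if kind == "text" else s.strip())
            for kind, s in pieces if kind == "text" or s.strip()]


for piece in split_markup_text("FIG X=7 y=67 CAP=>The solution<\n>Hello<\n/FIG;"):
    print(piece)
```

Note that the attribute value and the element content come out as the same kind of piece, which is exactly the point of unifying the quoting.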

Mark the structure

An independent point is a fundamental bug in the language design: it is impossible to tell which elements are empty without the DTD. In other words, the structure is not apparent from the syntax. For a "structured markup language", that's pretty bad.

If you run Dynatext, for example, all you have to do is tell it which elements are empty, and it can do a good job without any DTD. It should really be possible to see the structure at a low level. So I would suggest some kind of symbol, mandatory on every element opening tag, so that a tag without it can be taken as empty: perhaps a trailing / for symmetry with the leading / of the closing tag. For example:

		SECTION/ LEVEL=2;
		    STITLE/ ID=ABC  >What Next?<
		    /STITLE;
		   IDX/;
		      FIG/ X=7 y=67  CAP=>The solution<
		         >Hello<
		      /FIG;
		   /IDX;

Now I can parse that and see that I am missing a section end.
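Here is a rough sketch of that check, assuming the rules illustrated above: `NAME/` opens an element, `/NAME` closes one, and a tag carrying neither mark is empty and needs no end. The function name `check_structure` and the exact tokenising rules are assumptions for illustration; no DTD is consulted.

```python
import re

ATTR_TEXT = re.compile(r"=\s*>[^<]*<")  # text quoted as an attribute value
TEXT = re.compile(r">[^<]*<")           # text following a tag


def check_structure(src: str) -> list:
    """Report nesting problems using the syntax alone, without a DTD."""
    src = ATTR_TEXT.sub("=_", src)  # attribute values don't affect structure
    src = TEXT.sub(";", src)        # quoted text also ends the preceding tag
    stack, problems = [], []
    for stmt in src.split(";"):
        words = stmt.split()
        if not words:
            continue
        head = words[0]
        if head.startswith("/"):
            # Closing tag; tolerate white space after the "/".
            name = head[1:] or (words[1] if len(words) > 1 else "")
            if stack and stack[-1] == name:
                stack.pop()
            elif name in stack:
                # Elements opened after `name` were never closed.
                while stack[-1] != name:
                    problems.append(f"missing end of {stack.pop()}")
                stack.pop()
            else:
                problems.append(f"unexpected end of {name}")
        elif head.endswith("/"):
            stack.append(head[:-1])  # opening tag: an end tag must follow
        # Anything else is an empty element: nothing to track.
    problems.extend(f"missing end of {name}" for name in reversed(stack))
    return problems


example = """SECTION/ LEVEL=2;
    STITLE/ ID=ABC  >What Next?<
    /STITLE;
    IDX/;
        FIG/ X=7 y=67  CAP=>The solution<
            >Hello<
        /FIG;
    /IDX;"""
print(check_structure(example))  # ['missing end of SECTION']
```

Because emptiness is visible in the tag itself, a plain stack is enough; without the trailing / the checker could not know whether a tag such as BR expects an end.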

Of course, real language people might want to use a different concrete syntax:

		{ section(level=2)
		    { stitle(id=abc)
		        "What Next?"
		    } stitle
		    { idx
		        { fig(x=7, y=67, cap="The solution")
		            "Hello"
		        } fig
		    } idx

but we wouldn't like SGML not to look like SGML, would we? :-)
	

Tim BL