# From: ianf@random.se (Ian Feldman, Keepers of The Setext Flame[tm])
# Newsgroups: alt.hypertext (complete, original headers at end of file)
# Date: Fri, 23 Apr 93 07:53:50 +0200
# Message-ID: <a7fd5104@random.se>
# X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml_01.etx
# Reply-To: setext-list-request@random.se
# Organization: random design -- "Opinions, cheaply"
# Lines: 349
# Summary: setext is to plaintext as RTF is to RTFM
# Subject: Re: Looking for Electronic Publshing formats... [long]


SGML vs setext

by Ian Feldman

Having fathered and mothered setext, the structure-enhanced text markup method designed for use primarily by smaller periodic online publications, I feel compelled to clarify certain misconceptions regarding the doubts expressed in this forum as to its usability as an electronic hypertext interchange format. Please observe the ambiguity of the subject of this debate: the original query was about "electronic formats for printed materials" for deployment in a multi-format browser using "Amiga's system of DataTypes to provide content-independent methods of viewing data" (both quotes author verbatim).

In time the discussion has come to be centered around SGML's alleged superiority and inevitability and, to a lesser extent, around whether setext is a viable solution for online-distributed matter. Having read just the basic introductory document about it, meant to provide the public at large with an easily-palatable foundation, Eliot Kimber of IBM has declared_ it to be "a very primitive, obviously easy to implement and interchange."

Admittedly, limited it may be, but 'primitive'? Anything judged through the prism of the SGML will by definition appear primitive (although setext is ALSO readable to the naked eye). In contrast, SGML et al. judged through the bias of human-readable-text/ASCII will appear unduly complex and mostly inaccessible to anyone having but the lowest common denominator hardware/software at their disposal (80% of all users? 90%?). Sure, everybody should have a Corvette... er, a SparcStation I mean, but as long as not everybody does we might just as well judge the setext on its own merits.

Eliot Kimber has many interesting things to say about the SGML, data-notation limits and markup methods in general, with all of which I couldn't agree more. However, he also seems oblivious to the lopsided logic present in the solution he advocates (here taken out of context but not misrepresentative of the whole):

simply add a layer between the source and the presentation
system that translates the SGML source into setext dynamically:

SGML Source --> SGML2SETEXT --> setext --> setext viewer

It strikes me as not a little ironic that in order to view enhanced plaintext (i.e. the setext) in a basic-structured manner, say an outline of the submitted text, one would have to first encode it with SGML, then pipe it through a filter with a DTD acronym thrown in for good measure. I'd have thought that, if setext is deemed adequate for some particular job, then surely it wouldn't have to be arrived at via the SGML-encoding route. In fact, and if I may contribute something of a truly heretical nature, I'd have thought that the opposite would be an altogether more agreeable solution:

    plaintext --> setext --> setext2SGML --> SGML viewer
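
As a rough illustration of the setext2SGML stage in the pipeline above, here is a minimal Python sketch. It is not an existing tool. It assumes the usual setext conventions (a title is a line underlined with "=", a subhead a line underlined with "-", bold is written **so** and italic ~so~), and the element names title, subhead, b and i are hypothetical, not taken from any real DTD.

```python
import re

def setext_to_sgml(text):
    """Map a tiny subset of setext typotags to SGML-style tags.

    Element names (title, subhead, b, i) are made up for illustration.
    """
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line.strip() and re.fullmatch(r"=+", nxt):
            # A text line underlined with "=" is taken to be the title.
            out.append(f"<title>{line.strip()}</title>")
            i += 2
        elif line.strip() and re.fullmatch(r"-+", nxt):
            # A text line underlined with "-" is taken to be a subhead.
            out.append(f"<subhead>{line.strip()}</subhead>")
            i += 2
        else:
            # Inline emphasis: **bold words** and ~italic~ (single word).
            line = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", line)
            line = re.sub(r"~(\w+)~", r"<i>\1</i>", line)
            out.append(line)
            i += 1
    return "\n".join(out)
```

The point of the sketch is only that the structure is already recoverable from the plaintext, so the conversion towards SGML can be a small mechanical filter.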

Obviously, Kimber has all the resources at his beck and call and expects that others will have them too. We may all yearn to become 1Mbit/sec-access high-flyers of the Internet, but in the meantime many of us have to make do with but Have-A-Mac and never enough funding to equip it with enough RAM to satisfy our needs.

setext in multimedia

The originator of this debate, Greg R Block, further had this_ to say:

: For the moment, setext appears to me to be the most practical 
: (universal, useable, general consumption) standard around for
: textual documents.

But ONLY for textual documents, and that is where part of the
problem lies.  SGML's advantage is that it can structure things in
definite ways, and embed things that are not necessarily text. 


Let me respectfully suggest that anyone claiming that setext's use at best starts and ends with ASCII text has obviously not done their homework. Those of you familiar with the NewsGrazer newsreader on the NeXT may recall that the data format employed there is that of a uuencoded richtext article preceded by an ASCII version of the same text. This enables it to propagate normally along the net, display as richtext on other NeXTs and, for the relevant top portion of it, in plain text elsewhere. Had the ASCII portion been setextized, it would have provided an additional, more universally parseable, dimension of structure. So potentially setext is as valid an encapsulation method for distributed-multimedial use as may be the RTF, SGML and the others. But unlike the others the text content of it will ALWAYS remain readable to the unaided eye while still offering limited --but hardly "small"-- amounts of extractable structure.

Nor has potential for use of setext in hypertext been overlooked. While arguably providing only one dedicated tag for linking of (text) elements, the concept that it follows resembles closely the format employed by WorldWideWeb's email server (unbeknownst to one another, the WWW team and I have arrived at similar solutions of verbose anchors in text referenced by expanded URLs or comments at end of documents). In this fashion even when viewed in unenhanced state the "administrivial" linkage data need not encroach upon the content of the document itself.
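
A minimal Python sketch of how such end-of-document link data could be paired with the anchors in the running text. It assumes that a hot word is written with a trailing underscore (as in this_ posting) and that its target is declared on a suppressed line of the form ".. _word URL". Both the function name and the example URL in the usage note are made up for illustration.

```python
import re

# Suppressed link line at the end of a document: ".. _word URL" (assumed form).
HREF = re.compile(r"^\.\. _(\S+) (\S+)\s*$")
# Hot word in running text: a word followed by one trailing underscore.
# The lookarounds keep _underlined_ words and snake_case out of the result.
HOT = re.compile(r"(?<!\w)([A-Za-z0-9][A-Za-z0-9-]*)_(?!\w)")

def resolve_links(text):
    """Return (hot word, target or None) pairs in order of appearance."""
    targets = {}
    body = []
    for line in text.splitlines():
        m = HREF.match(line)
        if m:
            targets[m.group(1)] = m.group(2)
        else:
            body.append(line)
    return [(word, targets.get(word)) for line in body for word in HOT.findall(line)]
```

Called on a text containing "declared_" and a closing line ".. _declared file://example.invalid/kimber.txt", it returns that word paired with the URL. Hot words without a declared target come back paired with None, and the readable text is left untouched.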

Philippe-Andre Prindeville adds_ this:

SGML allows one to put wrappers on data-types that SGML itself
isn't capable of parsing.  This shows a reasonable amount of
forethought (wish certain "commercial" standards had half a mind
to do so).  We obviously can't foresee all possible media types. 
But we can plan for their advent. 

Ditto for the setext... no limits on encapsulated data types. Anything that can be encoded in a transportable manner may be appended after the human-readable portion of a document and, optionally, made into matter that setext viewers suppress by default (in three different ways). Yet, although a dedicated browser is always a preferable solution, setexts do not automatically require one in order to be viewable. This is in marked [sic!] contrast to the statement expressed in the SGML FAQ 0.0:

# <A>99% of the fun with SGML can be had only with a parser, 
# so you do need one.         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A thought or two

If past experiences are anything to go by, the biggest obstacle to wider acceptance of the setext seems to be a common inability to think in terms of document models other than those intended for paper printout. Surprisingly many people, even among the hypertext.rules community, seem unaware that they really are subconsciously thinking of text ending up on paper, rather than (and despite any usual claims to the contrary) that of all-electronic delivery and "consumption." That, and a second, equally common misconception: that anything that's understandable also must be primitive, thus de facto unusable for the higher task at hand, whatever the latter may be.

As an extra service for the diagram-und-table individuals among you, here is an off-the-cuff attempt to summarize some of setext's attributes in relation to those of SGML and RTF. Not being an expert on either of these, I ask for your forgiveness should I happen to have misrepresented something.

Easy-O-Meter[tm]

                          RTF  ___________ SGML  __________ setext
 basic document  flat file with   an entity made    any text file
          model  embedded typo-   up of definable   interspaced by 
                 graphic \tags    logical elements  subheads (also
                 and no sense of  denoted by rigid  other unobtrusive
                 syntax (format   syntax and un-    optional elements
                 proprietary)     ambiguous <tags>  may be employed)
 --------------  ---------------  ----------------  -----------------
    generalized               no               YES                yes
        markup?
 --------------  ---------------  ----------------  -----------------
        primary  richtext         machine-assisted  bringing order to
      objective  interchange      large-scale       amorphous online-
                 format           text processing   distributed data
 --------------  ---------------  ----------------  -----------------
   papercopy-as              YES               yes                 NO
      ultimate-                                and
     -objective                                 no
 document model
 --------------  ---------------  ----------------  -----------------
       smallest  a character      a character       a word
       emphasis  (multistyled)    (multistyled)     (single style)
    granularity
 --------------  ---------------  ----------------  -----------------
   type of tags  \descriptor      <start> </end>    this and that
     # employed  ?                unlimited #       2 + 11 optional
 --------------  ---------------  ----------------  -----------------
 #typographical  a finite set     unlimited set     3 typographical
 tags employed?                                     1 hypertextual
 --------------  ---------------  ----------------  -----------------
   tag overhead  +25%?            +30%?             +9% (verified)
 --------------  ---------------  ----------------  -----------------
 parser/browser              yes               YES                 no
       required
 --------------  ---------------  ----------------  -----------------
        encoder               no               YES                 no
       required  but recommended                    but would be nice
 --------------  ---------------  ----------------  -----------------
   availability  many commercial  a few commercial  a free browser
       of tools  readers          full-scale        for the Macintosh
                 a few authoring  implementations   several end-user
                 implementations  1 known Windows   implementations
                 a few freeware   free browser + 1  PC/ unix parser
                 resources        free source code  engine undergoing
                                  parser/ browser   tests
 --------------  ---------------  ----------------  -----------------
      installed  predominantly    professional/     50,000-100,000
           base  word processors  large, always     weekly readers
                 (under Windows)  requiring         predominantly Mac
 93-04-23        MS Word native   dedicated tools   growing fast
 ==============  ===============  ================  =================


Wrapping it up in more ways than one

As an afterthought: it may come as a surprise to everyone that the SGML <FAQ version="0.0" date="1991-12-15">, penned by Erik Naggum, comes up in the Easy View browser for the Mac with certain of its elements emphasized as underlined richtext (version 2.3.1 of the EV, as yet being debugged, do not ask for a copy, please). Why is it so, you may wonder; has Erik been forced to employ some ``bastard'' format because SGML wouldn't do? No, of course not. Erik, at the time of writing it definitely oblivious of the very existence of setext, has simply seen the need to add visible emphasis to a FAQ intended for wide distribution, in a fashion that's commonly used on the net.

The setext neither has ambition nor makes any claims to be a "revolutionary" markup method -- wherever possible I have formalized the best of the current online usage and called it setext typotags this-and-that. Thus this SGML FAQ has de facto been enhanced in its plaintext state with no extra explicit encoding overhead. Now and then I also see on the net examples of what I'd call spontaneous-setexts, texts subdivided with valid setext subheads and title elements by their makers with no apparent knowledge of it whatsoever. If none of this provides a strong argument for usability of the method as such then I don't know what else might do.

Yes,

this posting is a setext (the word stands both for the method and a single structure-enhanced text). Had you been reading it in a dedicated mail shell or newsreader_ you could have been presented with something akin to:

 (306) "Re: Looking for Electronic Publshing formats..." (Ian Feldman, Keep...
 -----------------------------------------------------------------------------
 SGML vs setext <0>
    setext in multimedia <1>
    A thought or two <2>
    Easy-O-Meter[tm] <3>
    Wrapping it up in more ways than one <4>
    Yes, <5>

and then been able to access its parts in non-linear fashion. If nothing else then at least the setext has a capacity to provide unambiguous yet unobtrusive anchors within texts that are supposed to be universally accessible everywhere. WWW, WAIS and Gopher people please take note.
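
The kind of index shown above can be derived mechanically. The following Python sketch assumes that a title is a line underlined with "=" and a subhead a line underlined with "-"; the function names outline and render are made up for illustration and do not describe how Easy View or any other browser actually works.

```python
def outline(text):
    """Collect (level, heading) pairs: level 0 for the title, 1 for subheads."""
    lines = text.splitlines()
    entries = []
    # Look at each line together with the line that follows it.
    for prev, cur in zip(lines, lines[1:]):
        heading = prev.strip()
        mark = cur.strip()
        if not heading or len(mark) < 2:
            continue
        if set(mark) == {"="}:
            entries.append((0, heading))
        elif set(mark) == {"-"}:
            entries.append((1, heading))
    return entries

def render(entries):
    """Format the outline as a numbered, indented index."""
    return "\n".join(
        f"{'    ' * level}{heading} <{n}>"
        for n, (level, heading) in enumerate(entries)
    )
```

A reader program could then use the numbers as jump targets into the text, which is all the non-linear access described here requires.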

There are other markup formats, and many may well be "better" for their respective applications, but generally speaking there is no other that can make the following claim: there is MORE to me than meets the eye.

Ian "Xanadude in waiting" Feldman <ianf@random.se>
       XU/Server[tm] not responding -- still trying

 $$

# original headers, suppressed on account of appearing AFTER a twodot-tt
# Path: random.se!ianf
# From: ianf@random.se (Ian Feldman, Keepers of The Setext Flame[tm])
# Newsgroups: alt.hypertext,comp.multimedia,alt.news-media,comp.text,comp.text.sgml,comp.sys.amiga.multimedia
# Date: Fri, 23 Apr 93 07:53:50 +0200
# Message-ID: <a7fd5104@random.se>
# References: <1qn588INN27o@uwm.edu> <19930416.063132.922@almaden.ibm.com> <19930420.063124.67@almaden.ibm.com> <4942@ulysse.enst.fr>
# X-References: <1qff1hINNf5u@uwm.edu> <1993Apr16.011307.20939@gallant.apple.com> 
#   <19930416.063132.922@almaden.ibm.com> <1993Apr16.175131.28736@gallant.apple.com> 
#   <1qn506INN27o@uwm.edu> <1qn588INN27o@uwm.edu> <raj.735006847@cambridge>
# X-More-References: <4939@ulysse.enst.fr> <4942@ulysse.enst.fr> <19930419.113449.182@almaden.ibm.com> 
#   <1993Apr19.203208.2751@ornl.gov> <1993Apr20.004712.4298@gallant.apple.com> 
#   <1993Apr20.005046.4406@gallant.apple.com>
# X-Even-More-References: <19930420.063124.67@almaden.ibm.com> 
#   <2AWMs*7c1@dynam.adsp.sub.org> <1r2n6p$m1n@nigel.msen.com> 
# Followup-To: alt.hypertext,comp.multimedia,comp.text,comp.text.sgml
# X-Note: ---------------------------------------------------------------
# X-Also: First Mac browser for setext, the structure-enhanced ASCII text
# X-This: format in sumex-aim.stanford.edu:/info-mac/app/easy-view-22.hqx
# X-Note: ---------------------------------------------------------------
# X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml.etx
# Reply-To: setext-list-request@random.se
# Content-Type: setext/plain; charset=ascii_827
# Organization: random design -- "Opinions, cheaply"
# Lines: 306
# Summary: setext is to plaintext as RTF is to RTFM
# Subject: Re: Looking for Electronic Publshing formats... [long]