Date: Thu, 4 Jun 92 00:59:21 +0200
From: jfg@dxcern.cern.ch (Jean Francois Groff)
Sender: jfg@dxcern.cern.ch
To: barker@www1.cern.ch
Subject: forwarded message from Tim Berners-Lee

------- Start of forwarded message -------
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
    id AA27986; Wed, 3 Jun 92 16:56:29 +0200
Received: by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
    id AA08770; Wed, 3 Jun 92 16:55:12 MET DST
Message-Id: <9206031455.AA08770@nxoc01.cern.ch>
From: timbl@nxoc01.cern.ch (Tim Berners-Lee)
To: connolly@pixel.convex.com
Cc: timbl@nxoc01.cern.ch, wei@xcf.berkeley.edu, www-bug@nxoc01.cern.ch
Subject: Re: still no DTD, huh?
Date: Wed, 3 Jun 92 16:55:12 MET DST

Dan,

Taking your points in order before they pop off the screen.

1. I agree, attribute values ought to be quoted unless they contain only SGML-nice characters. The WWW browsers accept quoted or unquoted values. It is a bug in the NeXT editor that it exploits this feature. When we fix the NeXT editor then we will put the quotes in. All other browsers use the SGML.c parser in the W3 dist, which accepts quotes.

2. Yes, NEXTID will have to go. NEXTID will be an attribute of the document. We propose 3 doctypes, HTDOC, HTERR and HTFWD, to be described in the DTD. These will be such that any extra tags they define, and structure, will be safely ignored by old parsers.

3. Minimisation. This is copied from the BOOKMAKER style stuff. Basically, we use <P> as a paragraph separator rather than a paragraph begin or end. It can be regarded as a minimized paragraph element though. It's just that we actually parse it as an empty element with no end tag. That's still valid SGML and you could write it in the DTD that way. <LI> always has an opener and never a closer. The same applies to <DD> and <DT>. Note though that we have made sure that the browser will ignore closers to these, so we could define the DTD with them in, and optional.

4. Yes, sections appeal to me too.
Especially when making big HTML files out of lots of little ones. The effect of <SECTION> .. </SECTION> would be to demote all headings by one inside the section. I would be inclined then to have simply a <HEADING> tag which would be equivalent to H0 and map onto H1 within a section, or Hn within n sections. The SGML parser can't generate this stuff, but the editors could derive it from the style information. We would have to introduce <SECTION> early on to get a transition period. Then in HTML3 we would declare H2 etc. obsolete.

Pei Wei is maybe working on a DTD too, and Carl Barker at CERN is defining new features of HTML needed by new features in the protocol (things like <BODY NOTATION=postscript> and suchlike). Some of this is defined in a few "technical notes" linked to a list of technical notes linked to the W3 project page, if you want to see and comment.

(Carl: you could take this message in text form and link it in too)

Tim
________

Dan's message:

>From connolly@pixel.convex.com Wed Jun 3 04:23:34 1992
Return-Path: <connolly@pixel.convex.com>
Received: from dxmint.cern.ch by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
    id AA05562; Wed, 3 Jun 92 04:23:28 MET DST
Received: by dxmint.cern.ch (dxcern) (5.57/3.14)
    id AA27281; Wed, 3 Jun 92 04:21:34 +0200
Received: from pixel.convex.com by convex.convex.com (5.64/1.35)
    id AA25114; Tue, 2 Jun 92 21:21:17 -0500
Received: from localhost by pixel.convex.com (5.64/1.28)
    id AA23193; Tue, 2 Jun 92 21:21:15 -0500
Message-Id: <9206030221.AA23193@pixel.convex.com>
To: timbl@nxoc01.cern.ch
Subject: still no DTD, huh?
Date: Tue, 02 Jun 92 21:21:14 CDT
From: Dan Connolly <connolly@pixel.convex.com>
Status: R

by the way... replying to an address you sent me doesn't work...

- ------- Forwarded Message

   ----- Transcript of session follows -----
>>> RCPT To:<timbl@dxmint.cern.ch>
<<< 550 <timbl@dxmint.cern.ch>... Addressee unknown
550 timbl@dxmint.cern.ch...
User unknown

   ----- Unsent message follows -----
Date: Tue, 26 May 92 17:06:43 +0200
From: connolly (Dan Connolly)
Message-Id: <9205261506.AA25934@connie.de.convex.com>
To: timbl@dxmint.cern.ch
Subject: still no DTD, huh?
Cc: connolly@convex.com

I just browsed the web, hoping to find a DTD for HTML. No such luck.

One nifty part of the Chameleon project is an X windows grammar editor for developing context free grammars. It's a little clunky, but in addition to outputting editable Chameleon grammar files, it can write YACC specifications or !SGML DTD's! Finally! a simple DTD editor!

Unfortunately, it doesn't support attributes, and I don't think the DTD's it creates have minimization, but it could certainly save a lot of time in creating a DTD!

I'll see if I can prototype something when I get back. More later.

Dan

- ------- End of Forwarded Message

Well, I've been attempting to prototype something with Devegram, the Integrated Chameleon Architecture's (ICA's) grammar editor. I messed around a while and had it write out an SGML DTD to play with.

Unfortunately Devegram doesn't support many features of an SGML DTD which would be most convenient to describe HTML. So I've abandoned Devegram in favor of a text editor. But it did help with the initial prototype.

Now for the REAL problems:

HTML in its present form is very difficult to describe in SGML. I'm not experienced enough to say for sure, but I think it's impossible. The problems are mostly small and lexical in nature, but I'd say it's VERY important to make these changes NOW in order to be able to use SGML processing engines in WWW clients in the future.

An SGML document consists of 3 parts: the declaration, the prologue, and the instance.

The declaration lays the groundwork -- defines the encoding and interpretation of the character set(s), sets processing limits and bounds, and other lexical stuff. Applications generally use the default SGML declaration given in the standard.
Each SGML parser has a declaration that declares its feature list and limits. If HTML cannot be described with the default SGML declaration, this will severely limit the usable parsers. (One exception is the NAMELEN limit: many parsers have a value higher than 8.)

The prologue (sometimes called the DTD, though there may be more than one DOCTYPE in the prologue) gives the structure of the document -- the basic grammar and entities and such. This varies from one application to another, but generally one SGML declaration and prologue is used throughout an application. For example, CALS specifies an SGML declaration and some DTD's. The AAP also has a DTD.

The third part is the document instance. This is the part that varies from one document to another within an application domain.

I'm trying to use the default SGML declaration and design a DTD such that all HTML files are instances of that DTD.

- --- 1 ---

The first problem I've come across is that HTML attribute values are not quoted. That is:

    <A NAME=2 HREF=http://crnvmc.cern.ch./WHO>

yields

    sgmls: SGML error at ../../../WWW/WWW/LineMode/Defaults/default.html, line 8 at ":":
           Incorrect character in markup; markup terminated

I don't know what the exact syntax of an SGML attribute is, but it's not the same as HTML's "everything up to the next space or >" syntax.

- --- 2 ---

Next, all attributes have names. So I can't figure out a way to parse

    <NEXTID 10>

I could do

    <NEXTID n=10>

- --- 3 ---

The biggest problem is the somewhat random use of minimization. I can't seem to make SGML sense of it. More later. I don't have as much time as I thought to explain this.
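
For illustration only (not part of the original exchange): the quoting problem in point 1 could be worked around by a small pre-pass that wraps unquoted attribute values in double quotes before the file reaches an SGML parser. The following is a minimal Python sketch under that assumption. The function name quote_attributes and both regular expressions are hypothetical, and the sketch assumes no ">" inside quoted values and no whitespace around "=".

```python
import re

# A start tag: "<", a letter, then everything up to the next ">".
# Assumption: quoted values never contain ">".
TAG = re.compile(r"<[A-Za-z][^>]*>")

# One NAME=value pair. The value is either already quoted, or it follows
# HTML's "everything up to the next space or >" rule.
ATTR = re.compile(r"""([A-Za-z][A-Za-z0-9.\-]*)=("[^"]*"|'[^']*'|[^\s>]+)""")


def quote_attributes(html: str) -> str:
    """Hypothetical helper: add double quotes around unquoted attribute values."""

    def fix_attr(match: re.Match) -> str:
        name, value = match.group(1), match.group(2)
        if value[0] in "\"'":
            return match.group(0)  # already quoted, leave untouched
        return f'{name}="{value}"'

    def fix_tag(match: re.Match) -> str:
        # Only rewrite inside start tags, so "x=1" in running text is kept.
        return ATTR.sub(fix_attr, match.group(0))

    return TAG.sub(fix_tag, html)


if __name__ == "__main__":
    print(quote_attributes("<A NAME=2 HREF=http://crnvmc.cern.ch./WHO>"))
```

The sketch quotes every unquoted value rather than only those containing characters such as ":" or "/", which keeps the rule simple at the cost of a few redundant quotes.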
- --- 4 ---

I'd also like to be able to add a little more structure than just a "big list of tags and text" to the documents, like this:

    <HTML>
    <TITLE>foo</TITLE>
    <SECTION>
    <H1> header </H1>
    paragraph associated with above header
    <SUBSECTION>
    <H2> header </H2>
    stuff under H2
    </SUBSECTION>
    </SECTION>
    </HTML>

I can _almost_ get the SGML parser to infer the <SECTION> and </SECTION> tags, but not quite.

More later.

Dan
------- End of forwarded message -------
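
For illustration only (not part of the original exchange): Tim's suggestion in point 4 of his reply, a single <HEADING> tag that maps onto Hn within n nested sections, can be sketched as a depth counter over the tag stream. The following minimal Python sketch assumes well-nested <SECTION> tags without attributes. The function name resolve_headings is hypothetical, and dropping the <SECTION> tags from the output is a choice made here, not something either message specifies.

```python
import re

# Only the two tags of interest; everything else passes through unchanged.
TOKEN = re.compile(r"<(/?)(SECTION|HEADING)>", re.IGNORECASE)


def resolve_headings(html: str) -> str:
    """Hypothetical helper: rewrite <HEADING> as <Hn>, n = SECTION nesting depth."""
    depth = 0   # number of currently open <SECTION> elements
    out = []
    pos = 0
    for match in TOKEN.finditer(html):
        out.append(html[pos:match.start()])  # copy text before this tag
        pos = match.end()
        closing, name = match.group(1), match.group(2).upper()
        if name == "SECTION":
            depth += -1 if closing else 1
            if depth < 0:
                raise ValueError("</SECTION> without matching <SECTION>")
        else:
            # <HEADING> outside any section becomes H0, as in Tim's description.
            out.append(f"<{closing}H{depth}>")
    if depth != 0:
        raise ValueError("unclosed <SECTION>")
    out.append(html[pos:])
    return "".join(out)


if __name__ == "__main__":
    doc = ("<SECTION><HEADING>a</HEADING>"
           "<SECTION><HEADING>b</HEADING></SECTION></SECTION>")
    print(resolve_headings(doc))  # <H1>a</H1><H2>b</H2>
```

Because the level comes from nesting alone, a small file pasted inside another file's <SECTION> has its headings demoted automatically, which is the "big files out of lots of little ones" case Tim mentions.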