W3C | HTML | writings on web architecture
Let's suppose we're creating a new, highly generalized data format from scratch (this is not entirely hypothetical).
We're after a language in the sense of SGML, Lisp, or C more than an RPC-style presentation format, because it's for the purpose of machine-assisted human communication, and readability/writability by humans is too valuable to give up. It's essential for debugging and development purposes, since the applications tend to lead the tools, but there's also a bootstrapping and deployment advantage to data formats that people can understand by inspection; and my intuition says it's valuable for archival purposes.
See also: compound document architectures
Here are the good ideas I'd try to incorporate:
From the Lout 3.08 expert doc:
A @I symbol symbol. @Index Symbol is a name, like {@Code "@TeX"}, which stands for something other than itself. The initial @Code "@" is not compulsory, but it does make the name stand out clearly. A @I definition of a symbol declares a name to be a symbol, and says what the symbol stands for. The @I body of a definition body.of @Index { Body of a definition } is the part following the name, between the braces. To @I invoke invocation @Index { Invocation of a symbol } a symbol is to make use of it.
I think there's an irresolvable tension between "mostly text, with special characters for markup" languages like flatfiles and SGML vs. "mostly notation, with embedded strings" languages like s-expressions, C, SOIF, etc. SOIF's use of BLOBS is a nifty trick. In discussion with the CURL folks, TimBL suggests they can be combined (an earlier note about SGML along those lines).
Hmmm... an s-expression syntax with URLs as atoms sure would be nice. I wonder what happens if you take the Common Lisp reader and make the set of symbol-constituent characters the same as the set of URL characters. What happens to ()'s? Do you have to use <> or {} instead? Could the set of URL characters
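One way to poke at that question is a toy reader. The Python below is only a sketch under stated assumptions, not the Common Lisp reader: it takes the RFC 3986 unreserved and reserved characters plus "%" as the atom constituents, and, because "(" and ")" are themselves legal URL characters, it uses "{" and "}" as the list delimiters. The names URL_CHARS, tokenize, and read are hypothetical.

```python
import string

# Assumption: atom constituents are the RFC 3986 unreserved and reserved
# characters plus "%" (for percent-escapes). Note that "(" and ")" are in
# this set, which is why they cannot also serve as list delimiters here.
URL_CHARS = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")

OPEN, CLOSE = "{", "}"  # braces are not URL characters, so they are safe delimiters


def tokenize(text):
    """Split text into brace tokens and atoms made only of URL characters."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in (OPEN, CLOSE):
            tokens.append(ch)
            i += 1
        elif ch in URL_CHARS:
            j = i
            while j < len(text) and text[j] in URL_CHARS:
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


def read(text):
    """Parse one expression: an atom (str) or a nested list of expressions."""
    tokens = tokenize(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == OPEN:
            items = []
            while pos < len(tokens) and tokens[pos] != CLOSE:
                items.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing closing brace")
            pos += 1  # consume the closing brace
            return items
        if tok == CLOSE:
            raise ValueError("unexpected closing brace")
        return tok

    expr = parse()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return expr
```

With these assumptions, read("{link http://www.w3.org/ {rel mailto:a@example.org}}") yields a nested list whose atoms are whole URLs, while tokenize("(a)") returns the single atom "(a)": the parentheses are swallowed into the atom, which is exactly the conflict the question raises.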
Another list of good ideas comes to mind if we're not just talking about read-only, sequential access:
{} -- empty object
1.0 -- token (number)
"{}" -- string
Lout | Postscript | Trestle | Common Lisp | C |
---|---|---|---|---|
macro | backquote | #define macro | ||
definition | bind, def | macro | defmacro | |
letter | symbol constituent | |||
symbol (identifier or delimiter) | symbol |
The PICS and XML groups are doing just this. The IETF URN WG, the WEBDAV group, and the DSIG manifest group are all potential consumers.
Note also, there is a move afoot to settle the syntax of URLs once and for all.