samples directory (action 20220215-01)

In fulfillment of the action I took today, I have created an
ixml/samples directory.  (The action said ixml/grammars, but I don't
think anyone cared much about the name, and 'samples' seemed better when
I thought about it.  Franklin Delano Roosevelt agreed when I consulted
him.)

I have seeded the directory with a README.md file, a grammar for the
IETF Augmented BNF notation and a grammar for ISBN-13 book numbers.
(Strictly speaking, the language of ISBNs is regular and could be
recognized by a regular expression, but writing a regular expression
that requires the correct check digit is a bit of a challenge.  For that
matter, writing an ixml grammar that requires the correct check digit
also proved more challenging than I had expected.  If time allows, I
expect to add ISBN-10 and ISSN to the grammar as well.)
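For ISBN-13, the check-digit rule is that the thirteen digits, weighted alternately 1 and 3 (starting with 1), must sum to a multiple of 10.  A recognizer only needs to remember the current position and the running sum mod 10, which is why the language is regular; the same (position, residue) pairs can also serve as nonterminals of a right-linear grammar.  The Python sketch below illustrates both ideas.  It is not the grammar in the samples directory: the function names and the s-i-r naming scheme are hypothetical, and it assumes the input is 13 bare digits with no hyphens or spaces.

```python
# Weights for ISBN-13 positions 0..12: 1, 3, 1, 3, ..., 1.
WEIGHTS = [1, 3] * 6 + [1]


def step(i: int, r: int, d: int) -> int:
    """Transition: new running sum (mod 10) after reading digit d at position i."""
    return (r + WEIGHTS[i] * d) % 10


def is_valid_isbn13(s: str) -> bool:
    """Finite-state check: the only state carried is (position, running sum mod 10)."""
    if len(s) != 13 or any(c not in "0123456789" for c in s):
        return False
    r = 0
    for i, c in enumerate(s):
        r = step(i, r, int(c))
    return r == 0  # accept only if the weighted sum is a multiple of 10


def ixml_isbn13_rules() -> str:
    """Emit a right-linear ixml grammar (hypothetical naming scheme).

    Nonterminal s-i-r matches the digits from position i onward, given a
    running sum of r (mod 10) for the digits before position i.  The
    leading '-' marks these helper rules as hidden, so they produce no
    elements of their own.
    """
    lines = ["isbn13: s-0-0."]
    for i in range(13):
        # Only r = 0 is reachable at position 0; every residue is reachable later.
        for r in ([0] if i == 0 else range(10)):
            if i == 12:
                # Last position (weight 1): only the digit that makes the sum 0 mod 10.
                lines.append(f'-s-{i}-{r}: "{(10 - r) % 10}".')
            else:
                alts = "; ".join(
                    f'"{d}", s-{i + 1}-{step(i, r, d)}' for d in range(10)
                )
                lines.append(f"-s-{i}-{r}: {alts}.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(is_valid_isbn13("9780306406157"))  # True
    print(is_valid_isbn13("9780306406158"))  # False (wrong check digit)
    print(ixml_isbn13_rules().splitlines()[-1])  # -s-12-9: "1".
```

The generated grammar has 121 helper rules plus the start rule, which suggests why doing this by hand is tedious.  Extending the idea to hyphenated input would need extra alternatives that allow an optional hyphen between digits without changing the (position, residue) state.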

At the moment, neither grammar has been tested.

I think it would be useful to have ixml grammars that work for as many
well-known notations or syntaxes as we can manage.  Programming
languages, RELAX NG compact syntax, XPath, some flavor or other of SQL,
CSS, CSS Selectors, and URIs all come to mind.

What do people think about

  - Mail headers (RFC 822 and successors)
  
  - IETF dates, ISO dates, ...

  - The lexical spaces of the built-in datatypes of XSD

  - XPath and XSD regular expressions; other regex notations

  - XSLT match patterns (as distinct from XPath in general)

  - The subset of XPath which XSD processors are required to support for
    uniqueness constraints and assertions

  - REx grammar notation

  - Turtle, N3, other notations used in Semantic Web work

  - A grammar that can read XML and produce a kind of rudimentary
    representation of the XML (not, as things currently stand, the XML
    an XML parser would produce)

  - A rational form of CSV (if such a thing exists)

  - Some flavor of Markdown (there are so many to choose from) or one of
    its competitors.  Given our use of GitHub, perhaps GitHub Flavored
    Markdown would be helpful.

Are those worth looking for existing grammars for and/or writing ixml
grammars for?

Unfortunately, all of those seem rather computer-oriented; I am having
trouble thinking of things that are not computer-oriented but for which
there is something like an authoritative syntax.  The best I have
managed so far are:

  - The syntax(es) used for formal logic by various theorem provers (no
    two seem to use the same syntax, so there is a wide range of choice)

  - The syntax used in Principia Mathematica (if someone can figure out
    how to express the rules for dots in a context-free grammar)

  - If anyone understands how legal citations are structured in the
    U.S. or in any other jurisdiction, that would be interesting,
    especially if accompanied by an explanation for those of us who
    don't.

  - Are there grammar rules for things like chemical formulas?  For
    standard names of molecules?

  - The notation used for describing syntax trees in the SUSANNE corpus
    of English.  (If there are other documented syntax notations, I'd be
    happy to work on them.  My recollection is that the Hamburg
    Dependency Treebank has a non-XML notation.  And my recollection is
    that the syntax notations in the SUSANNE corpus don't always parse
    according to the rules given in the documentation.)

The more examples we can think of, the better.  I think people should
get double points for suggestions in non-computer domains, and double
points again if there is something like an authoritative definition of
the language in question.  

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Wednesday, 16 February 2022 05:01:38 UTC