FormalLanguageVsProse

From W3C Wiki

When defining a language (such as XHTML), some requirements are defined in prose (in the case of W3C, in English) while others are defined using a formal specification. Very often there is also an overlap between the two approaches; this is especially the case for XML languages, where a DTD or an XML Schema can serve two purposes: defining the syntactic requirements set by the language and/or validating documents that claim to conform to it.

Issues may arise when this overlap leads to contradictions between the prose and the formal language; such contradictions include:

  1. the English prose says more than the formal language does (sometimes this is unavoidable: a constraint simply cannot be stated in a particular formal language);
  2. the formal language says more (i.e., imposes more constraints) than the English prose does;
  3. the formal language says something different from the English prose.

Case 1 may lead to a situation where a document is valid with respect to the formal language but does not conform to the specification as described in the prose; case 2 leads to the reverse situation, where a document that conforms to the prose is rejected by the formal language.
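To make case 1 concrete, here is a minimal Python sketch. The element names and content model are hypothetical, loosely modelled on XHTML 1.0, whose Appendix B states in prose that an `a` element must not contain other `a` elements at any depth, a prohibition its DTDs cannot fully express. The "grammar" check below only looks at which children each element may have, so a document with an `a` nested inside an `em` inside an `a` passes it while failing the prose rule. `valid_per_grammar` and `conforms_per_prose` are illustrative names, not part of any real validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, loosely modelled on XHTML 1.0: <a> may not
# contain <a> directly, but it may contain <em>, and <em> may contain <a>.
ALLOWED_CHILDREN = {
    "p": {"a", "em"},
    "a": {"em"},
    "em": {"a", "em"},
}


def valid_per_grammar(root):
    """What the formal grammar checks: the root type and each element's children."""
    if root.tag != "p":
        return False
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:  # element type not declared at all
            return False
        if any(child.tag not in allowed for child in parent):
            return False
    return True


def conforms_per_prose(root):
    """Grammar validity plus a prose-only rule: no <a> at any depth inside <a>."""
    if not valid_per_grammar(root):
        return False
    for a in root.iter("a"):
        # Element.iter() yields the element itself first, so skip it.
        if any(inner is not a for inner in a.iter("a")):
            return False
    return True


doc = ET.fromstring("<p><a><em><a>nested</a></em></a></p>")
print(valid_per_grammar(doc))   # True
print(conforms_per_prose(doc))  # False
```

A validator built only from the grammar would accept `doc`; only a checker that also encodes the prose rule rejects it.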

Case 3 is obviously the worst, since it may then be impossible to say whether a document conforms to the specification or not.

See also a thread on www-qa on this very topic.

RFC 3930 has a section on meaning in human- versus computer-oriented languages that is relevant to this topic.

Guidelines for the use of formal languages in IETF specifications

Testing Formal Language Specifications

A usual characteristic of a formal language is that it can be processed by a computer; this suggests several ways of testing the formal language used in a specification:

  • when a large set of known conformant (or known non-conformant) instances of the language already exists, set up a test suite to check whether a validator based on the formal language identifies each document as valid or invalid as expected;
  • when the formal specification is encoded in different languages (e.g. XML DTD vs XML Schema), set up a test suite that generates random documents from each encoding and checks whether they validate against the others (cross-validation);
  • use the examples given in the specification as test cases in the validation test suite; this was done, e.g., for VoiceXML 2.1 using an XSLT stylesheet.
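A minimal sketch of the first approach, assuming a hypothetical `<list>`/`<item>` vocabulary whose prose requires one or more text-only `item` children. `validator` stands in for a validator derived from the formal language (it is not a real tool) and deliberately omits the text-only rule, so running the labelled corpus through it surfaces exactly the kind of gap between prose and formal language discussed earlier on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical labelled corpus: (document, expected to conform per the prose?).
# The prose is assumed to require one or more <item> children, each text-only.
CORPUS = [
    ("<list><item>a</item><item>b</item></list>", True),
    ("<list><item>a</item></list>", True),
    ("<list></list>", False),                      # no items
    ("<list><para>x</para></list>", False),        # wrong child type
    ("<list><item><item/></item></list>", False),  # item is not text-only
]


def validator(text):
    """Stand-in for a validator derived from the formal language. It checks
    'one or more <item> children' but not that items are text-only."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return (
        root.tag == "list"
        and len(root) > 0
        and all(child.tag == "item" for child in root)
    )


def run_suite(corpus):
    """Return the (document, label) pairs on which the validator disagrees."""
    return [(text, expected) for text, expected in corpus
            if validator(text) != expected]


for text, expected in run_suite(CORPUS):
    print(f"label={expected}, validator={not expected}: {text}")
```

Each reported disagreement is either a gap in the formal language or a mislabelled document; the suite only says where to look, not which side is wrong.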

Generation of documents based on XML Schema: see Convert Schemas to Documents; Generating Test Cases for XML-based Web Component Interactions Using Mutation Analysis.
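The cross-validation approach can be sketched on one content model written twice: once as a DTD-like regular expression and once as an XML Schema-like table of `minOccurs`/`maxOccurs` values. Both encodings, and the disagreement between them, are hypothetical; real tooling would read actual DTD and XSD files and generate complete documents rather than sequences of child tag names. Here sequences are generated from encoding B and checked against encoding A, and any rejection points to a place where the two encodings disagree.

```python
import random
import re

# One hypothetical content model for <doc>, "a title followed by sections",
# written in two notations.

# Encoding A (DTD-like "title, section+"), as a regex over the
# space-separated child tag names.
ENCODING_A = re.compile(r"title( section)+")

# Encoding B (XML Schema-like): (element, minOccurs, maxOccurs) in sequence.
# minOccurs=0 for section is a deliberate disagreement with encoding A.
ENCODING_B = [("title", 1, 1), ("section", 0, 3)]


def generate_from_b(rng):
    """Randomly generate a child-tag sequence that encoding B allows."""
    tags = []
    for name, min_occurs, max_occurs in ENCODING_B:
        tags += [name] * rng.randint(min_occurs, max_occurs)
    return tags


def valid_against_a(tags):
    """Check a child-tag sequence against encoding A."""
    return ENCODING_A.fullmatch(" ".join(tags)) is not None


def cross_validate(trials=200, seed=0):
    """Generate from B, validate against A; return the distinct rejects."""
    rng = random.Random(seed)
    rejected = set()
    for _ in range(trials):
        tags = generate_from_b(rng)
        if not valid_against_a(tags):
            rejected.add(" ".join(tags))
    return sorted(rejected)


print(cross_validate())  # a <doc> with a title but no section
```

Running the other direction matters too: B also caps `section` at three occurrences while A's `+` does not, a second disagreement that only documents generated from A (with more than three sections) would reveal.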

Yacker parses BNF grammars and creates parsers in Perl (and possibly in Python). It is (potentially) used with the SPARQL grammar.
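Yacker's own input format and output are not reproduced here. As a hypothetical illustration of what parsing BNF involves, the sketch below reads a restricted notation (one rule per line, `name ::= alt | alt`, whitespace-separated symbols, terminals in single quotes containing no spaces or `|`) into a Python dictionary. It then runs one simple sanity check on the grammar itself: finding nonterminals that are used but never defined. `parse_bnf` and `undefined_nonterminals` are illustrative names only.

```python
def parse_bnf(text):
    """Parse a restricted, hypothetical BNF notation into a dictionary:
    one rule per line, 'name ::= alt | alt', symbols separated by
    whitespace, terminals in single quotes (no spaces or '|' inside)."""
    grammar = {}
    for line in text.strip().splitlines():
        name, sep, body = line.partition("::=")
        if not sep:
            raise ValueError(f"not a rule: {line!r}")
        alternatives = []
        for alt in body.split("|"):
            symbols = []
            for token in alt.split():
                if len(token) >= 2 and token[0] == token[-1] == "'":
                    symbols.append(("t", token[1:-1]))  # terminal
                else:
                    symbols.append(("n", token))        # nonterminal
            alternatives.append(symbols)
        grammar[name.strip()] = alternatives
    return grammar


def undefined_nonterminals(grammar):
    """Nonterminals used on a right-hand side but never defined."""
    used = {sym for alts in grammar.values() for alt in alts
            for kind, sym in alt if kind == "n"}
    return used - set(grammar)


SPEC = """
list ::= 'item' | 'item' list
doc  ::= title list
"""
print(undefined_nonterminals(parse_bnf(SPEC)))  # {'title'}
```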

Generation of instances of a formal language: Polygen, an open-source tool.
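Polygen has its own grammar notation, which is not reproduced here. The sketch below only illustrates the underlying idea, random generation of instances from a grammar, using a hypothetical toy arithmetic-expression grammar. To guarantee termination, it assumes that the first alternative of every rule leads to terminals without recursion, and it forces that alternative once a depth limit is reached.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# each alternative a list of symbols; symbols that are not keys are
# terminals. The first alternative of every rule is non-recursive.
GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"]],
    "term": [["factor"], ["factor", "*", "term"]],
    "factor": [["digit"], ["(", "expr", ")"]],
    "digit": [["0"], ["1"], ["2"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a random string of terminals."""
    if symbol not in GRAMMAR:
        return symbol  # terminal
    alternatives = GRAMMAR[symbol]
    # Past max_depth, always take the first (non-recursive) alternative,
    # so the expansion is guaranteed to terminate.
    if depth >= max_depth:
        choice = alternatives[0]
    else:
        choice = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in choice)


rng = random.Random(42)
for _ in range(5):
    print(generate("expr", rng))
```

Feeding such instances to an independently written parser or validator for the same language is a cheap way to check that the two agree; with this particular grammar, every generated string also happens to be a valid Python arithmetic expression.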

Other formal languages include:
