This document is also available in these non-normative formats: XML.
When designing computer systems, one is often faced with a choice between using a more or less powerful language for publishing information, for expressing constraints, or for solving some problem. This finding explains the Principle of Least Power, which suggests choosing the least powerful language suitable for a given purpose.
This document is an editors' copy that has no official standing.
Additional TAG findings, both accepted and in draft state, may also be available. The TAG may incorporate this and other findings into future versions of the [AWWW]. Please send comments on this finding to the publicly archived TAG mailing list firstname.lastname@example.org (archive).
The World Wide Web is unique in its ability to promote information reuse on a global scale. Information published on the Web can be flexibly combined with other information, read by a broad range of software tools, and browsed by human users of the Web. For such reuse to succeed, the broadest possible range of tools must be capable of understanding the data on the Web, and the relationships among that data. Thus, when publishing information or programs on the Web, the choice of language is important.
Expressing constraints, relationships and processing
instructions in less powerful languages increases
the flexibility with which information can be reused.
The reason for this is that the less powerful the language, the
more you can do with the data stored in that language.
If you capture information in a simple declarative form,
anyone can write a program to analyze it in many ways.
Conversely, the Turing-complete languages are
shown by computer science to be equivalent in their ability
to compute any result of which a computer is capable, and are
in that sense the most powerful class of languages for computers.
The tradeoff for such power is that you typically cannot
determine what a program in a Turing-complete language will do
without actually running it.
Indeed, you often cannot tell in advance whether such a
program will even reach the point of producing useful output.
Of course, you can easily tell what a simple program
print "2+2" will do,
but given an arbitrary program you'd likely have to run it,
and possibly for a very long time.
Thus, there is a tradeoff in choosing between languages that
can solve a broad range of problems and
languages in which programs and data are easily analyzed.
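The point about needing to run a program to learn what it does can be illustrated with a short sketch. The Python function below is an assumption chosen for illustration and is not part of the original finding; the name collatz_steps is hypothetical. It repeatedly applies the Collatz rule to a positive integer. Whether its loop terminates for every positive input is a well-known open problem, so in general the practical way to learn the result for a given input is to run it.

```python
def collatz_steps(n: int) -> int:
    """Count how many Collatz iterations it takes for n to reach 1.

    Illustrative only: no general proof is known that this loop
    terminates for every positive integer.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Halve even numbers; map odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


if __name__ == "__main__":
    for start in (6, 27):
        print(start, collatz_steps(start))
```

By contrast, a fixed table of numbers published as plain data poses no such question: a reader can count, sort, or sum its entries without executing anything the publisher wrote.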
Less powerful languages are usually easier to secure. A bug-free regular expression processor, for example, is by definition free of many security exposures that are inherent in the more general runtime one might use for a language like C++. Because programs in simpler languages are easier to analyze, it's also easier to identify the security problems that they do have.
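One way to see why a regular-expression-style matcher is easy to reason about is to write a pattern as a finite transition table. The sketch below is a hedged illustration in Python, not a description of how any real regular expression engine is implemented. The table, the state names, and the helper functions are all hypothetical. It recognizes optionally signed decimal integers such as "-42".

```python
# Hypothetical transition table: (current state, character class) -> next state.
TRANSITIONS = {
    ("start", "sign"): "signed",
    ("start", "digit"): "number",
    ("signed", "digit"): "number",
    ("number", "digit"): "number",
}
ACCEPTING = {"number"}


def classify(ch: str) -> str:
    """Map one character to the character class used by the table."""
    if ch in "+-":
        return "sign"
    if ch in "0123456789":
        return "digit"
    return "other"


def matches(text: str) -> bool:
    """Run the table over text: at most one lookup per input character."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return False  # no transition defined, so reject
    return state in ACCEPTING


def reachable_states() -> set:
    """Analyze the pattern itself, without running it on any input."""
    seen = {"start"}
    frontier = ["start"]
    while frontier:
        current = frontier.pop()
        for (src, _), dst in TRANSITIONS.items():
            if src == current and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen
```

Because the pattern is only a table, the matcher does a bounded amount of work per input character and has no way to open files or call other code. The same table can also be inspected directly, as reachable_states does, which is the kind of analysis that is much harder for a program in a general-purpose language.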
In aiming for simplicity, one must of course go far enough but not too far. The language you choose must be powerful enough to successfully solve your problem, and indeed, complexity and lack of clarity can easily result from clumsy efforts to patch around use of a language that is too limited. Overall, though, the Web benefits when less powerful languages can be successfully applied.
Many Web technologies are designed to exploit the Principle of Least Power. HTML, for example, is intentionally designed not to be a full programming language so that many different things can be done with an HTML document: software can present the document in various styles, extract tables of contents, index it, and so on. Similarly, CSS is a declarative styling language that is easily analyzed. The Semantic Web is an attempt, largely, to map large quantities of existing data onto a common language so that the data can be analyzed in ways never dreamed of by its creators. If, for example, some weather data is published as a Web resource using RDF, a user can retrieve it as a table, perhaps average it, plot it, or deduce things from it in combination with other information. At the other end of the scale is the weather information conveyed by an ingeniously written Java applet. While the applet might provide a very cool user interface or other sophisticated features, the results of the program will not usually be predictable in advance. A search engine finding the resource will have no idea of what the weather data is or even, in the absence of other information, that it is a weather-related resource. The only way to find out what a Java applet means is generally to set it running and see what it does. Thus, HTML, CSS and the Semantic Web are examples of Web technologies designed with "least power" in mind. Web resources that use these technologies are more likely to be reused in flexible ways than those expressed in more powerful languages.
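The weather example can be sketched in a few lines. The Python below is a hedged illustration only: JSON stands in for RDF purely to keep the sketch self-contained, the readings are invented placeholder values, and the function names are hypothetical. It shows one published data set being tabulated, averaged, and queried by code the publisher never anticipated.

```python
import json
from statistics import mean

# Invented sample readings; stations, dates, and temperatures are placeholders.
WEATHER_JSON = """
[
  {"station": "A", "date": "2006-01-21", "temp_c": 4.0},
  {"station": "A", "date": "2006-01-22", "temp_c": 6.0},
  {"station": "B", "date": "2006-01-21", "temp_c": -1.0},
  {"station": "B", "date": "2006-01-22", "temp_c": 1.0}
]
"""

readings = json.loads(WEATHER_JSON)


def as_table(rows):
    """Render the readings as tab-separated text, one row per line."""
    return "\n".join(
        f'{r["station"]}\t{r["date"]}\t{r["temp_c"]}' for r in rows
    )


def average_by_station(rows):
    """Group temperatures by station and average each group."""
    by_station = {}
    for row in rows:
        by_station.setdefault(row["station"], []).append(row["temp_c"])
    return {name: mean(temps) for name, temps in by_station.items()}


def coldest(rows):
    """Return the reading with the lowest temperature."""
    return min(rows, key=lambda row: row["temp_c"])


if __name__ == "__main__":
    print(as_table(readings))
    print(average_by_station(readings))
    print(coldest(readings))
```

None of these three uses had to be foreseen by whoever published the data. If the same readings were available only through the display of an applet, each use would first require running the applet and working out what its output means.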
A different sort of scalability can be found when comparing Turing-complete languages. Although all have equivalent expressive power, functional languages such as Haskell and XSLT facilitate the creation of programs that may be easier to analyze than their imperative equivalents. Particularly when such languages are further subset to eliminate complex features (to eliminate recursion, perhaps, or to focus on template forms in XSLT), the resulting variants may be quite powerful yet easy to analyze.
The following should be considered before we publish:
The items in this section have not yet been addressed:
Should we provide lots of biblio references for programming languages, etc.? Pros: it's helpful. Cons: makes the text choppier to read, and most of the languages we reference are well known. For the moment, references are provided only for a few that may be obscure, such as JSON.
Remove this "To Do" <div> prior to formal publication.
Following is from Jon Hanna note to email@example.com, 20 Dec 2005:
The following "to do" items have been addressed:
Is it a principle? Roy suggests "no" (this was later determined to be a misunderstanding; Roy agrees it's a principle). Is the title OK? (Agreed, it's a principle and the title is OK.)
Tim BL to review. (Done for first draft)
Elliotte Rusty Harold says he's not sure whether SQL is Turing-complete. Jon Hanna answers that some dialects are, but the ANSI version isn't (or at one point wasn't). (Changed to indicate that some are, some aren't.)
Added reference to functional languages and Haskell OK? (Email feedback says "yes, mention." Clarified that functional languages are positioned as easier to analyze, not less desirable.)
New 3rd paragraph on "reuse on the Web" OK? (Tim: yes. Now moved to be 1st para)
Existence and wording of GPN OK? (yes)
Separation of and minor rewording of section 2 OK? (Tim had everything in one section.) (yes)
Decide editor list. [12/06/2005]: Dan suggests it's more accurate to include Noah as an editor, at least as long as the draft has text that Tim hasn't seen or approved. Noah has been added for now; we can revert to Tim-only if most of the final text proves to be his. (Agreed: Tim and Noah)
Use RFC2119 uppercase terms? If not, remove from intro and bibref. I'm leaning against [NRM] (done - no reference to RFC 2119)
Tim's original claims the Java applet "can't be analyzed at all". I've kept that for now, but it seems a bit strong. [NRM] (Tim agrees. Fixed.)
Current text notes that PLP is enunciated in a document dated 1998. The actual label on the document is " Date: 1998, last change: $Date: 2006/01/23 13:55:50 $ " and Roy suggests that PLP in particular was added in a 2002 update. Need to get the date straight. (reference to date removed)
Editorial: "the attraction of being an open-ended hook into which anything can be placed" Mixed metaphor? Can you place something into a hook? (paragraph removed)
Following are from Tim Berners-Lee note to firstname.lastname@example.org, 20 Dec 2005:
"I'd like to change the wording of the bit about RDF to not talk about RDF in the HTML file but compare and HTML file with an RDF file. The business of mixing them is a distraction." (Example reworded to avoid suggestion that the weather RDF is in an HTML page.)
"As regards SQL and truing completeness, I had assumed that it wasn't but then I seem to remember being told it was." (See above. Text now makes clear that some versions are and some aren't.)
Following is from Harry Halpin note to email@example.com, 20 Dec 2005:
"There are good examples of non-Turing complete languages that make sense to your ordinary programmer on the street, such as regular expressions." (Regular expressions added to list at front, and used in security example.)
Following is from Bill de hOra note to firstname.lastname@example.org, 22 Dec 2005:
"The spectacle of initially and deliberately weak languages that have had to have extra expressive power bolted on is so very common, and flies in the face of this advice, that I wonder if this principle is applicable." (New paragraph added discussing downsides of going too far.)
Following is from Mark Baker note to email@example.com, 22 Dec 2005:
Following are from TAG Teleconference of 20 Dec 2005: (I haven't duplicated reference to some that were covered above):
"TBL: suggests adding discussion of OWL and Rule Interchange Format and their limited expressive power, with more capable supersets available. TBL: also, the relationships between subset and superset languages can be particularly clear in the realm of logic." (New section added on scalable language families, using OWL as principal example. RIF not mentioned so far. We could add it.)
"DanC, you wanted to say I lean toward discussing HTML separately from the SemWeb, if only for story-telling/history reasons. and to suggest elaborating on the word "turing." (HTML now not discussed in RDF example. Several sentences added in section 1 introducing Turing completeness and implications thereof.)
19-Dec-2005 [NRM]: initial version
19-Dec-2005 [NRM]: corrected a few typos, set right date in this change log.
23-Jan-2006 [NRM]: Significant revision to account for comments made at the Dec. 20, 2005 TAG Teleconference and email received in Dec. and Jan. (Note: changed from discussing "procedural" to "imperative" languages.)