On the Design of Notations

Steven Pemberton, CWI, Amsterdam and W3C
Chair, W3C HTML and Forms Working Groups

ME

ME = Maine

Apparent Rule

Looking at GA = Georgia and FL = Florida, it appears that there is no real rule.

What I could work out is:

Codes

NE: Nevada or Nebraska?

It's Nebraska, but NB would have been a better choice

MI: Mississippi, Missouri, Michigan, or Minnesota?

It's Michigan, but MG would have been a better choice

MS: Mississippi, Missouri, or Minnesota?

It's Mississippi, but MP would have been a better choice.

Active and Passive

But solving these problems with reading 2-letter codes would still not solve the problem of writing them.

Winter school was open in December

Water is warm

Doing it better

I couldn't believe it wasn't possible to do the 2-letter codes better.

So I wrote a program.
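The program itself is not shown on the slides. As a rough sketch of what such a search might look like, the Python below tries two candidate rules on the states mentioned so far and reports which codes collide. The rule names (`first_two`, `first_and_last`), the `collisions` helper, and the short state list are assumptions made for illustration, not the author's code or rules.

```python
from collections import defaultdict

# A small sample of state names taken from the slides above (not all 50).
STATES = ["Maine", "Georgia", "Florida", "Nevada", "Nebraska",
          "Mississippi", "Missouri", "Michigan", "Minnesota"]


def first_two(name):
    """Hypothetical rule: the first two letters of the name."""
    return name[:2].upper()


def first_and_last(name):
    """Hypothetical rule: the first and the last letter of the name."""
    return (name[0] + name[-1]).upper()


def collisions(rule, names):
    """Return {code: [names]} for every code the rule assigns more than once."""
    by_code = defaultdict(list)
    for name in names:
        by_code[rule(name)].append(name)
    return {code: group for code, group in by_code.items() if len(group) > 1}


for rule in (first_two, first_and_last):
    print(rule.__name__, collisions(rule, STATES))
```

A rule is only usable if `collisions` comes back empty for the full list of states; neither of these two simple rules manages that even on this small sample.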

The best rule I came up with:

The point

My point here is that the 2-letter codes were introduced because of automation.

But that is no excuse.

1984: The introduction of the Macintosh

So this year is the 20th anniversary of the Mac. "The computer for the rest of us".

Well, the rest of them actually, because the type of person who turns up at an XML conference isn't likely to be your average computer user.

The average computer user

So let me tell you about people.

People have different psychological makeups.

One theory, of Jung, uses 4 variables:

Bruce "Tog" Tognazzini best described it in "Tog on Interface" in the section "The Goldilocks Theories"

...Tog...

People are strange...

The fact is, people are strange. Yes, even you.

Writing text

People are strange...

Playing chess: command interface, mouse interface, 'direct manipulation' interface with real pieces

Example

Hold your hand up

Count the number of triangles on the next screen

Drop your hand when you have counted

A lot of different shapes randomly placed

Now do the same

but count the red shapes

A lot of shapes, randomly coloured, randomly placed

People are strange

Now let me tell you something about computers

To demonstrate Moore's Law

Take a piece of paper, and divide it in two:

Representing the power of a modern computer with an area of the screen

Comparing this year's machine with one of 18 months ago

Comparing this year's machine with ones from 18 months and 3 years ago

Comparing the power of machines over the years, doubling each 18 months

This demonstrates that your current computer is more powerful than all other computers you have had put together
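The arithmetic behind that claim is a geometric series: if power doubles with every generation, the newest machine always exceeds the sum of all earlier ones by exactly one unit. A minimal sketch, assuming power exactly doubles every 18 months and that the oldest machine has power 1 (the function name is illustrative):

```python
def relative_powers(generations):
    """Relative power of each machine, oldest first, doubling each generation."""
    return [2 ** g for g in range(generations)]


powers = relative_powers(8)      # eight machines, 18 months apart
current = powers[-1]             # this year's machine
all_earlier = sum(powers[:-1])   # every machine before it, put together

print(current, all_earlier, current > all_earlier)
```

The comparison holds for any number of generations, since 1 + 2 + ... + 2^(n-1) = 2^n - 1.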

So how are we using all that extra power?

Badly...

Why aren't we using it to make people's (our!) lives better?

Programming

For instance programming.

In the 70's programmers were effectively free compared with the cost of the hardware. Nowadays it is the hardware that is effectively free.

According to the DoD, 90% of the cost of software is debugging.

According to Fred Brooks (The Mythical Man-Month), the number of bugs grows faster than linearly with code size L, as L^1.5.

I.e. a program that is 10 times longer is more than 31 times harder to write.
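The figure follows directly from the exponent. A small sketch, assuming the 1.5 power law quoted above (the function name `relative_effort` is introduced here for illustration):

```python
def relative_effort(size_ratio, exponent=1.5):
    """How much harder a program gets when it is size_ratio times longer."""
    return size_ratio ** exponent


for ratio in (2, 10, 100):
    print(f"{ratio}x longer -> {relative_effort(ratio):.1f}x harder")
```

Read the other way round, anything that makes a program 10 times shorter makes it roughly 30 times easier to write under this assumption.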

Programming languages

Programming languages seem to always be designed for the machine. That may have been a wise decision in the 60's.

In the early 80's a group of us sat down to design a programming language from the programmer's perspective.

Interesting team: Dick Grune (CVS), Guido van Rossum (Python), Lambert Meertens

Photo

Photo includes author and Guido van Rossum

How we did it

Imagine, hypothetically, that programmers are humans...

All evidence to the contrary:

Also pretend, just for a moment, that their chief method of communicating with a computer was with programming languages.

What should you do?

Treat it like a user interface design problem:

ABC

Order of magnitude easier to use: a program that would take you a week took you an afternoon.

Only 5 datatypes

Mathematicians and cryptographers loved it.

Trick is: supply high-level primitives

Moved on

Three stylesheet-generated views of the same document, a clock

Guido van Rossum: Python

Dick Grune: CVS

Lambert Meertens: Got another order of magnitude reduction

Others of us: Views, extensible markup language, structured vector graphics, stylesheets, a DOM, etc. (Ran on Atari ST)

Which brings us to the web

(and if we're honest, why we are here today)

Why was HTML successful?

  1. It fulfilled a need
  2. ...
  3. It was easy to use

So, let's go back to the early days

1991: 10 MHz, 1Mb, 20Mb

a computer mag

An HTML Document

<title>My HTML File</title>
<h1>My first Webpage</h1>
Welcome to my page!

Now let's fast forward 10 years:

2001: 1 GHz, 256Mb, 20Gb

a computer mag

Designing XML

When XML was designed...

Group: we don't need to consider authoring ease

This is OK. But beware: role model!

An HTML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xml:lang="en" lang="en">
<head>
  <title>Virtual Library</title>
</head>
<body>
  <p>Moved to
     <a href="http://example.org/">example.org</a>.
  </p>
</body>
</html>

The world gets to work

Groups: we don't need to consider programming ease

Fast forward again

2002

a computer mag

2003

a computer mag

2004

a computer mag

2005

a computer mag

2006

a computer mag

2007

a computer mag

2008

a computer mag

2009

a computer mag

2010: 100GHz, 64Gb, 20Tb

a computer mag

An HTML Document

<?xml version="1.1" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="xhtml2.css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 2.0//EN" "xhtml2.dtd"[
<!ATTLIST html
xmlns:xsi CDATA #FIXED "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation CDATA #IMPLIED >
<!-- long list of redundant ATTLIST declarations for ID ... -->
]>
<html xmlns="http://www.w3.org/2002/06/xhtml2" xml:lang="en"
xmlns:ev="http://www.w3.org/2001/xml-events"
xmlns:xfm="http://www.w3.org/2002/01/xforms"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ev="http://www.w3.org/2002/xml-events" xmlns:mumble bla bla bla ginger
xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2 xhtml2.xsd ...">
<head>
<title>My web page</title>
</head>
<body>
...
</body>
</html>

(Several things have been left out here for brevity)

Why not leave it to authoring tools?

  1. Not everyone uses 'em.
  2. It's a band-aid design approach: one bad design, a thousand fixes
  3. Anyway, have you ever looked at the quality of the markup produced by most user agents?

Some concluding remarks

"If there's an X in it, it must be good"

What's so hard about parsing?

Would you rather write

if a < 0
then a= -a

or

<if><rel op="less than"><var name="a"/><constant value="0"/></rel>
<then><set><var name="a"/><expr><neg><var name="a"/></neg></expr>
        </set></then>
</if>

Use the notation to match the circumstances.

Parsing is quite easy

It would be fairly easy to add a generalised part to the XML pipeline that parsed unmarked-up text, and produced XML as a parse tree: it's just a different sort of transform.
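To give a feel for how small such a transform can be, here is a sketch in Python that parses the plain-text conditional shown earlier and emits an XML parse tree using the element names from that slide. It handles only that one statement shape ("if x < y then z = [-]w"); the grammar, the function names, and the use of Python's standard `xml.etree.ElementTree` module are assumptions for illustration, not part of any XML pipeline specification.

```python
import re
import xml.etree.ElementTree as ET

# One token: an identifier, an unsigned integer, or one of the symbols < = -
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[<=-])")
KEYWORDS = {"if", "then"}


def tokenize(text):
    """Split unmarked-up source text into a flat list of tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_to_xml(text):
    """Parse 'if x < y then z = [-]w' and return the parse tree as XML text."""
    tokens = tokenize(text)
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def operand():
        token = take()
        if token.isdigit():
            return ET.Element("constant", value=token)
        if token in KEYWORDS or not (token[0].isalpha() or token[0] == "_"):
            raise SyntaxError(f"expected a name or number, found {token!r}")
        return ET.Element("var", name=token)

    take("if")
    root = ET.Element("if")
    rel = ET.SubElement(root, "rel", op="less than")
    rel.append(operand())
    take("<")
    rel.append(operand())

    take("then")
    assignment = ET.SubElement(ET.SubElement(root, "then"), "set")
    assignment.append(operand())          # the variable being assigned to
    take("=")
    expr = ET.SubElement(assignment, "expr")
    if pos < len(tokens) and tokens[pos] == "-":
        take("-")
        ET.SubElement(expr, "neg").append(operand())
    else:
        expr.append(operand())

    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ET.tostring(root, encoding="unicode")


print(parse_to_xml("if a < 0\nthen a= -a"))
```

The author writes the two-line form; the rest of the pipeline still receives XML.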

We could have our cake and eat it!

Conclusion

Computers are getting more powerful

People aren't.