Web Architecture: Information Representation

old source material: formalism draft

Abstract

The World Wide Web is the universe of network-accessible information, an embodyment of human knowledge. In this report on the software architecture of the web, we discuss formally (using Larch) the data models and structures used in the web, and their semantics; that is, their relationship to human knowledge.

@@From octets to characters to markup languages and message formats for email and other transports.

Keywords: Larch, Character Set, Hypermedia, MIME, Internet Media Types, SGML, ILU, Relational Data Model, GIF, Image Formats

@@terminology needs refinement: data/information, information representation, knowledge, knowledge representation, semantics, language, meta-information.

Information Primitives

@@Bits, bytes, characters, numbers, symbols, and Strings.

octet: trait

includes
  intfield(Octet, 0..255)
  sequence(OctetSeq, Octet)

@@discuss i18n. Use examples from "character set considered harmful"

character: trait

includes octet
 sequence(CharSeq, Char)

introduces
  encodes: Encoding, CharSeq, OctetSeq -> Bool
  "encodes(e,cs,os) ::=
  	In the encoding e, the sequence of
  		characters cs is represented by
  		the sequence of octets os"
  
  code: CodedCharSet, Char, Int
  "code(ccs, c, p) ::=
  	In the coded character set ccs,
  	the character at position p is c."

Simple Structures

lists, sequences, strings

Relations, Tables, and Databases

relational-table: trait

includes
  finiteSet(Rows, Row)
  list(Row, Value)
  set(Type, Value)
  list(Schema, ColDesc)
  set(Query, Row)

introduces:
  tuple: Table[Schema, Rows]
  tuple: ColDesc[name: Identifier, t: Type]

  empty: Schema -> Table

  check: Schema, Row -> Bool
  insert: Table, Row -> Table
  project: Table, FieldList -> Table
  select: Table, Query -> Table
  join: Table, Identifier, Table, Identifier -> Table

asserts:
  empty(s).schema = s
  empty(s).rows = {}

  check([], []) = True
  check(s, r) = r[0] \in s[0].type /\ check(rest(s), rest(r))

  check(t.schema, r) -> insert(t,r).schema = t.schema
  check(t.schema, r) -> insert(t,r).rows = insert(r, t.rows)

  select(t,q).schema = t.schema
  r \in select(t,q).rows = r \in t.rows /\ r \in q

  project(t,[]).schema = []
  @@ subseq of schema w/matching names

  r \in project(t, []) = (r = [])
  subseq(s, t.schema, map) -> (
    r \in project(t,s).rows = rr \in t.rows /\ subseq(r, rr, map) )

  join(t1, t2, n1, n2).schema = @@append(t1.schema, t2.schema)
  r \in join(t1, t2).rows = head(r, length(t1.schema)) \in t1.rows /\ tail(r, length(t2.schema)) \in t2.rows

relational-sort: trait

includes:
  relational-table

introduces:
  ordered: Schema, Order, Row, Row -> Bool

relational-query: trait


includes:
 relational-table,
 lambda(Schema for Formals, Row for Actuals, Bool for Value)
 expression

introduces:
  where: Schema, Expr -> Query

asserts:
  r \in where(s, e) = apply(lambda(s, e), r)


Regular Expressions and Finite State Machines

@@protocol state machines, language idioms

finite-state-machine: trait


introduces:
  transition: FSM, Symbol, State -> State
  accept: FSM, State: Bool

regular-expression: trait


introduces
  empty -> RE
  union: RE, RE -> RE (* RE / RE *)
  concatenate: RE, RE -> RE (* RE , RE *)
  kleene-star: RE -> RE (* RE* *)

asserts:
	Assoc(concatenate)
	AssocComm(union)
  ForAll a, b, c: RE
	a/a = a
	a** = a*
	a,a* = a*
	a*,a = a*
	a / (b,c) = (a/b), (a/c)

re-util: trait

includes:
  regular-expression

introduces
  optional: RE -> RE (* RE? *)
  one-or-more: RE -> RE (* RE+ *)
asserts:
	a? = a/empty
	a+ = a,a*

Languages and Grammars

grammar: trait

includes
  FiniteSet(Alphabet, Symbol)
  List(String, Symbol)
  Set(Language, String)
  FiniteSet(Productions, Production)
  Relation(Derives, String)

introduces
  tuple: Grammar[a: Alphabet, s: Symbol,
		p: Productions]
  tuple: Production[lhs,rhs]
  derives: Grammar -> Derives
  L: Grammar -> Language

asserts
  forall w: String, g: Grammar
    w \in L(g) = [g.s] <derives(g)*> w

Automated Parsing


scanner: trait

includes character
  sequence(Construct, Token)
  finite(State)

introduces:
  Token = [Symbol, OctetSeq]
  recognize: Scanner, State, OctetSeq,
  			Construct, State, OctetSeq -> Bool
  "recognize(s, si, osi, c, so, oso) ::=
    Scanner s, in state si, when presented
    with OctetSeq osi, recognizes Construct c,
    enters state so, and leaves oso remaining."

Message and Document Formats

@@more traits: internet media type, ILU message (pickle)

BodyPart: trait


introduces
  fields: BodyPart -> Integer
  fieldName: BodyPart, Integer -> Identifier
  fieldValue: BodyPart, Integer -> Identifier
  body: BodyPart -> OctetSeq

SGML: trait

includes:
  character
  list(Entity, Character)
  list(Names, Name)

introduces:
  document-character-set: Document -> CodedCharacterSet

  document-element: Document -> Element

  name: ElementType -> Name
  name: Element -> Name

  attributes: ElementType -> Names
  attributes: Element -> Names
  attrval : Element, Name -> String

  conforming: Document -> Bool

Turing Machines and Programs

read, @@cite "Ultimate..." papers

lambda: trait

includes
  list(Formals)
  list(Actuals)

introduces:
  lambda: Formals, Expr -> Closure
  %@@enclosing bindigs?
  apply: Closure, Actuals -> Value
  %@@machine state change

expression: trait

introduces:
  where: Expr -> Query

  id: Identifier -> Expr
  const: Value -> Expr
  equal: Expr, Expr -> Expr
  __ /\ __: Expr, Expr -> Expr
  __ \/ __: Expr, Expr -> Expr
  eval: Env, Expr -> Value
asserts forall v:Value
  eval(const(v)) = v

Objects, Operations, Attributes, and Types

Logic: Argument, Deduction, and Proof

Hypertext: web of nodes and links

(graph of vertices and edges)

semantics of n1 -l-> n2

is predicate: P(n1, n2)

object types

ontology

frames

rules

Logic: trait


includes
  Language(Formula)
  Set(Formulas, Formula)
  Relation(Rules, Formulas, Formula)
  Tree(Argument, Formula, Arc)

introduces
  axiom: Logic, Formula -> Bool
  theorem: Logic, Formula -> Bool
  proof: Logic, Argument, Formula -> Bool
  rules: Logic -> Rules

asserts
  forall l:Logic, f,n,a: Formula,
		hyp: Formulas
  theorems(hyp) /\ hyp \langle rules(l) \rangle f ->
		theorem(f)
  root(a, f)
  (forall n:Formula
		leaf(a, n) -> axiom(n)) /\
  (forall hyp: Formulas, c: Formula
	children(a, c, hyp) -> <hyp l f>))
		 -> proof(l, a, f)

  proof(l, a, f) -> theorem(l, f)		
implies
  axiom(f) -> theorem(f)

HTML forms

HTTP methods/URI/fields/content

CORBA/HTTP forms isomorphism

web/news/mail: postweb

layout (style sheets): dsssl flow objects, areas

"user experience": history, redirects, authoring idioms

MIME message

PEP: offer, instance

PICS?

character sets, I18N

Internet media type

grammars: HTML, LINCKS (bento?) (sgml-lex)

URI, resource, digital artifact, HTTP caching

transactions, check-in, check-out

libwww: M3 Rd/Wr for streams

change libwww parsers to be ILU message formats?

(ILU transports nest, but do message formats?)

FMS