Reference processing model
  - Logically, characters are UCS characters
      - For HTML, UCS is declared as the SGML Document Character Set
      - For XML, the grammar is based on characters (not bytes): "A character is
        an atomic unit of text as specified by ISO/IEC 10646"
      - For CSS, essentially the same: "A CSS style sheet is a sequence of characters
        from the UCS..."
  - On-the-wire encoding can be anything compatible with UCS (i.e. any encoding
    of a subset of UCS)
  - Identify encoding, perform transcoding on input, then deal only with Unicode
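
The last step (identify the encoding, transcode on input, then work only with Unicode) can be sketched roughly as follows. This is a minimal illustration, not a normative algorithm: the helper name `decode_input`, its parameters, the UTF-8 fallback, and the choice to let a byte order mark override the declared encoding are all assumptions made for this example. Each format defines its own detection rules.

```python
import codecs
from typing import Optional

# Byte order marks, with UTF-32 checked before UTF-16 because the
# UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_input(raw: bytes, declared: Optional[str] = None,
                 default: str = "utf-8") -> str:
    """Hypothetical helper: turn on-the-wire bytes into a Unicode string.

    Assumed precedence for this sketch: BOM, then the declared encoding
    (for example from a protocol header), then a default.
    """
    # Step 1: identify the encoding.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            # Step 2: transcode, dropping the BOM itself.
            return raw[len(bom):].decode(name)
    # No BOM: fall back to the declared encoding or the default.
    return raw.decode(declared or default)


# Step 3: everything after this point sees only Unicode characters,
# regardless of how the text was encoded on the wire.
text = decode_input(b"caf\xe9", declared="iso-8859-1")
assert text == "caf\u00e9"
```

Confining decoding to the input boundary means the rest of the processing never needs to know which on-the-wire encoding was used.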