Re: LC315 serialization escaping

Hi Jacek.

* Jacek Kopecky <jacek.kopecky@deri.org> [2005-09-15 18:42+0200]
> Further, header name (we should soon have a proposal to add an attribute
> specifying that) can consist only of printable ASCII characters except
> for colon, i.e. 33 to 57 and 59 to 126. We should similarly restrict the
> value of the attribute, except here we can safely create XML Schema type
> to catch this restriction.

While trying to implement LC315, I am wondering where you found this info.

RFC2616 says:

       message-header = field-name ":" [ field-value ]
       field-name     = token

       token          = 1*<any CHAR except CTLs or separators>
       CHAR           = <any US-ASCII character (octets 0 - 127)>
       separators     = "(" | ")" | "<" | ">" | "@"
                      | "," | ";" | ":" | "\" | <">
                      | "/" | "[" | "]" | "?" | "="
                      | "{" | "}" | SP | HT
       SP             = <US-ASCII SP, space (32)>
       HT             = <US-ASCII HT, horizontal-tab (9)>
       <">            = <US-ASCII double-quote mark (34)>

Which gave me the following candidates (original list from
Wikipedia[1]):

  Binary 	        Dec 	Hex 	Glyph
  0010 0001 	33 	21 	!
  0010 0011 	35 	23 	#
  0010 0100 	36 	24 	$
  0010 0101 	37 	25 	%
  0010 0110 	38 	26 	&
  0010 0111 	39 	27 	'
  0010 1010 	42 	2A 	*
  0010 1011 	43 	2B 	+
  0010 1101 	45 	2D 	-
  0010 1110 	46 	2E 	.
  0011 0000 	48 	30 	0
  0011 0001 	49 	31 	1
  0011 0010 	50 	32 	2
  0011 0011 	51 	33 	3
  0011 0100 	52 	34 	4
  0011 0101 	53 	35 	5
  0011 0110 	54 	36 	6
  0011 0111 	55 	37 	7
  0011 1000 	56 	38 	8
  0011 1001 	57 	39 	9
  0100 0001 	65 	41 	A
  0100 0010 	66 	42 	B
  0100 0011 	67 	43 	C
  0100 0100 	68 	44 	D
  0100 0101 	69 	45 	E
  0100 0110 	70 	46 	F
  0100 0111 	71 	47 	G
  0100 1000 	72 	48 	H
  0100 1001 	73 	49 	I
  0100 1010 	74 	4A 	J
  0100 1011 	75 	4B 	K
  0100 1100 	76 	4C 	L
  0100 1101 	77 	4D 	M
  0100 1110 	78 	4E 	N
  0100 1111 	79 	4F 	O
  0101 0000 	80 	50 	P
  0101 0001 	81 	51 	Q
  0101 0010 	82 	52 	R
  0101 0011 	83 	53 	S
  0101 0100 	84 	54 	T
  0101 0101 	85 	55 	U
  0101 0110 	86 	56 	V
  0101 0111 	87 	57 	W
  0101 1000 	88 	58 	X
  0101 1001 	89 	59 	Y
  0101 1010 	90 	5A 	Z
  0101 1110 	94 	5E 	^
  0101 1111 	95 	5F 	_
  0110 0000 	96 	60 	`
  0110 0001 	97 	61 	a
  0110 0010 	98 	62 	b
  0110 0011 	99 	63 	c
  0110 0100 	100 	64 	d
  0110 0101 	101 	65 	e
  0110 0110 	102 	66 	f
  0110 0111 	103 	67 	g
  0110 1000 	104 	68 	h
  0110 1001 	105 	69 	i
  0110 1010 	106 	6A 	j
  0110 1011 	107 	6B 	k
  0110 1100 	108 	6C 	l
  0110 1101 	109 	6D 	m
  0110 1110 	110 	6E 	n
  0110 1111 	111 	6F 	o
  0111 0000 	112 	70 	p
  0111 0001 	113 	71 	q
  0111 0010 	114 	72 	r
  0111 0011 	115 	73 	s
  0111 0100 	116 	74 	t
  0111 0101 	117 	75 	u
  0111 0110 	118 	76 	v
  0111 0111 	119 	77 	w
  0111 1000 	120 	78 	x
  0111 1001 	121 	79 	y
  0111 1010 	122 	7A 	z
  0111 1100 	124 	7C 	|
  0111 1110 	126 	7E 	~

Which can be summarized as:

  33 + 35-39 + 42-43 + 45-46 + 48-57 + 65-90 + 94-122 + 124 + 126
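
For what it's worth, the summary can be reproduced mechanically rather
than by hand. Here is a minimal sketch in Python, assuming CTL means
octets 0-31 plus DEL (127), as RFC2616 defines it. The names
SEPARATORS, CTLS, token_octets and summarize are only illustrative:

```python
# Separators as listed in the RFC2616 excerpt; SP and HT are " " and "\t".
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

# CTL per RFC2616: octets 0-31 and DEL (127).
CTLS = set(range(32)) | {127}

def token_octets():
    """Return the sorted octets allowed in a token (CHAR minus CTLs and separators)."""
    return [c for c in range(128) if c not in CTLS and chr(c) not in SEPARATORS]

def summarize(octets):
    """Collapse a sorted list of octets into 'a-b' runs joined by ' + '."""
    runs = []
    for c in octets:
        if runs and c == runs[-1][1] + 1:
            runs[-1][1] = c          # extend the current run
        else:
            runs.append([c, c])      # start a new run
    return " + ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)

print(summarize(token_octets()))
# prints: 33 + 35-39 + 42-43 + 45-46 + 48-57 + 65-90 + 94-122 + 124 + 126
```

That is 77 characters in total, and it excludes the colon along with
the other separators.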

which, if I didn't get this wrong, gives the following schema pattern:

  "[!#-'*+\-.0-9A-Z^-z|~]+"
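
A quick way to check the pattern against the RFC2616 token rule is
sketched below in Python. This uses Python's re module as a stand-in
for XML Schema regular expressions, on the assumption that the two
agree for this particular character class; fullmatch is used because
schema patterns are implicitly anchored. The helper is_token_char is
only illustrative:

```python
import re

# The proposed pattern, without the surrounding quotes.
PATTERN = re.compile(r"[!#-'*+\-.0-9A-Z^-z|~]+")

# Separators from the RFC2616 excerpt; SP and HT are " " and "\t".
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

def is_token_char(c):
    """True if c is a US-ASCII CHAR that is neither a CTL nor a separator."""
    o = ord(c)
    return 31 < o < 127 and c not in SEPARATORS

ascii_chars = [chr(o) for o in range(128)]
accepted = {c for c in ascii_chars if PATTERN.fullmatch(c)}
expected = {c for c in ascii_chars if is_token_char(c)}

print(accepted == expected)                      # prints: True
print(bool(PATTERN.fullmatch("Content-Type")))   # prints: True
print(bool(PATTERN.fullmatch("Content:Type")))   # prints: False
```

If a real XML Schema processor treats "\-" or the "^-z" range
differently, the check would need to be repeated with that processor.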

Do you agree?

Cheers,

Hugo

  1. http://en.wikipedia.org/wiki/Ascii
-- 
Hugo Haas - W3C
mailto:hugo@w3.org - http://www.w3.org/People/Hugo/

Received on Thursday, 27 October 2005 13:46:32 UTC