xml2asc/asc2xml

A rather futile pair of programs, but they were easy to write (180 lines of C in total, including comments), and they illustrate an issue with XML...

These two simple programs can be used to transcode any file from ASCII to UTF-8, or vice versa. Round-trip conversion is assured by the use of &#-escapes ("numeric character references") in the ASCII output. (Also a great way to create UTF-8 files if you have no UTF-8 editor...)

The program doesn't understand XML, but it leaves all ASCII characters alone, since they may have special roles in XML. That is, an ASCII character found in the input is copied verbatim and not written as &#nnn;. Escaped ASCII characters (&#nnn;, where nnn <= 127) are left escaped for the same reason.

Actually, this is not enough to handle every XML file. In XML, there are contexts in which non-ASCII characters cannot be written as &#-escapes. In particular, names of elements, attributes and entities cannot be encoded with &#-escapes.

For example, this XML file cannot be transcoded to ASCII, because of the "é" that occurs in an entity name (xml2asc would write the reference as &&#233;&#233;n;, which is not well-formed XML):

<!DOCTYPE xml SYSTEM "my.dtd">
<xml>&één;</xml>

Synopsis

Neither program accepts arguments. Just call them as:

xml2asc <file1 >file2

or

asc2xml <file2 >file1

The former reads a UTF-8 file and outputs it as ASCII, using &#-escapes for all characters that cannot be encoded in ASCII directly (Unicode code points above 127).
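The UTF-8 to ASCII direction can be sketched roughly as follows. This is not the actual xmlrecode.c source, only a minimal illustration under some assumptions: it works on an in-memory string rather than on standard input and output, the names decode_utf8 and utf8_to_ascii are made up for this example, and the escapes are written in decimal. Like the real program, it does not validate its input; malformed UTF-8 simply gives meaningless code points.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at s (s[0] must not be NUL).
 * Stores the code point in *cp and returns the number of bytes used.
 * No validation: malformed input yields a meaningless code point. */
static size_t decode_utf8(const unsigned char *s, unsigned long *cp)
{
    size_t n, i;

    if (s[0] < 0x80)      { *cp = s[0];        n = 1; }
    else if (s[0] < 0xE0) { *cp = s[0] & 0x1F; n = 2; }
    else if (s[0] < 0xF0) { *cp = s[0] & 0x0F; n = 3; }
    else                  { *cp = s[0] & 0x07; n = 4; }

    /* Add 6 bits per continuation byte; stop early at a non-continuation
     * byte so we never read past the terminating NUL. */
    for (i = 1; i < n && (s[i] & 0xC0) == 0x80; i++)
        *cp = (*cp << 6) | (s[i] & 0x3F);
    return i;
}

/* Hypothetical helper: convert NUL-terminated UTF-8 text to ASCII.
 * ASCII characters are copied verbatim, all others become &#nnn;.
 * Output is truncated if it does not fit. Returns the length written. */
size_t utf8_to_ascii(const char *in, char *out, size_t outsize)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t len = 0;

    assert(in != NULL && out != NULL && outsize > 0);
    while (*s != '\0') {
        unsigned long cp;
        char buf[16];
        size_t w;

        s += decode_utf8(s, &cp);
        if (cp < 128) {
            buf[0] = (char)cp;          /* ASCII: leave alone */
            w = 1;
        } else {
            w = (size_t)snprintf(buf, sizeof buf, "&#%lu;", cp);
        }
        if (len + w >= outsize)
            break;                      /* output buffer full */
        memcpy(out + len, buf, w);
        len += w;
    }
    out[len] = '\0';
    return len;
}
```

Note that existing markup and existing escapes such as &#65; pass through unchanged, because every character in them is ASCII.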

The latter reads an ASCII file and writes it as UTF-8, expanding all &#-escapes for characters >127.
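The reverse direction might look something like the sketch below. Again, this is an illustration and not the real source: the names encode_utf8 and ascii_to_utf8 are invented here, the input is an in-memory string, and only decimal escapes of the form &#nnn; are recognized (whether the real program also handles hexadecimal escapes is not stated above). As a small deviation from "no error checking", the sketch leaves an escape untouched if it lacks the closing semicolon or exceeds U+10FFFF.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Write the UTF-8 encoding of cp (at most U+10FFFF) to buf.
 * Returns the number of bytes written (1 to 4). */
static size_t encode_utf8(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | ((cp >> 18) & 0x07));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: expand &#nnn; escapes with nnn > 127 to UTF-8.
 * Everything else, including escapes for ASCII characters, is copied
 * verbatim. Output is truncated if it does not fit. */
size_t ascii_to_utf8(const char *in, char *out, size_t outsize)
{
    size_t len = 0;

    assert(in != NULL && out != NULL && outsize > 0);
    while (*in != '\0') {
        unsigned char buf[4];
        size_t w = 1;

        buf[0] = (unsigned char)*in;    /* default: copy one byte */
        if (in[0] == '&' && in[1] == '#' && isdigit((unsigned char)in[2])) {
            char *end;
            unsigned long cp = strtoul(in + 2, &end, 10);

            if (*end == ';' && cp > 127 && cp <= 0x10FFFF) {
                w = encode_utf8(cp, buf);
                in = end;               /* at ';', skipped by in++ below */
            }
        }
        if (len + w >= outsize)
            break;                      /* output buffer full */
        memcpy(out + len, buf, w);
        len += w;
        in++;
    }
    out[len] = '\0';
    return len;
}
```

Leaving escapes of 127 and below untouched is what keeps something like &#60; from turning into a literal "<" and changing the meaning of the XML.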

Neither program does any error checking. If there are syntax errors in the &#-escapes or if file1 is not a proper UTF-8 file, results are undefined.

Compilation

Download the source, call it "xmlrecode.c" and compile it, then link the result to both xml2asc and asc2xml. (There is a single binary; what it does depends on the name with which it is invoked.)
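Choosing the behaviour from the invocation name is usually done by looking at argv[0]. The following is a sketch of that idea, not the code in xmlrecode.c; the enum and the function mode_from_name are hypothetical names, and what the real binary does when called under any other name (for instance as plain xmlrecode) is an open question that this sketch answers with MODE_UNKNOWN.

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_XML2ASC, MODE_ASC2XML, MODE_UNKNOWN };

/* Hypothetical helper: pick the conversion direction from the name the
 * program was invoked with, ignoring any leading directory part. */
enum mode mode_from_name(const char *argv0)
{
    const char *slash, *name;

    assert(argv0 != NULL);
    slash = strrchr(argv0, '/');
    name = slash ? slash + 1 : argv0;   /* strip "dir/" prefix, if any */

    if (strcmp(name, "xml2asc") == 0)
        return MODE_XML2ASC;
    if (strcmp(name, "asc2xml") == 0)
        return MODE_ASC2XML;
    return MODE_UNKNOWN;
}
```

A main function would call mode_from_name(argv[0]) once and then run the matching conversion loop. Because the two names are hard links to one file, both always stay in sync with the compiled binary.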

There is also a Makefile, which consists of just these three lines:

all: xml2asc asc2xml
asc2xml: xmlrecode; ln $< $@
xml2asc: xmlrecode; ln $< $@

Bert Bos
Last modified: Mon Jan 19 19:33:37 MET

Copyright © 1997 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.