This example shows how the programs xmlpipe, unix2coll and coll2unix work.
Assume a Unix-style database of the following form:
tcpmux	1/tcp
echo	7/tcp
echo	7/udp
discard	9/tcp	sink null
discard	9/udp	sink null
systat	11/tcp	users
daytime	13/tcp
daytime	13/udp
netstat	15/tcp
chargen	19/tcp	ttytst source
chargen	19/udp	ttytst source
ftp-data	20/tcp
ftp	21/tcp
telnet	23/tcp
smtp	25/tcp	mail
time	37/tcp	timserver
time	37/udp	timserver
name	42/udp	nameserver
whois	43/tcp	nicname	# usually to sri-nic
domain	53/udp
domain	53/tcp
hostnames	101/tcp	hostname	# usually to sri-nic
sunrpc	111/udp	rpcbind
sunrpc	111/tcp	rpcbind
Every line represents one record, with tabs between the columns. Many Unix files have this structure, or come close enough that a simple command can convert them to this form. (The example above was created from the /etc/services file by removing comment lines and replacing multiple tabs with single tabs.)
There are three columns in this database (if we forget about the comment column), called service-name, port/protocol and aliases. The command line that converts it into an XML-style Web Collection is as follows:
cat file | unix2coll -v FS="\t" -v PROFILE=services service-name \
    port/protocol aliases
The -v FS="\t" sets the field separator to be a tab. The -v PROFILE=services defines the profile (which must syntactically look like a URL), and the other arguments provide the names of the fields. The result looks like this:
<webcollection>
  <record profile="services">
    <field name="service-name" value="tcpmux"/>
    <field name="port/protocol" value="1/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="echo"/>
    <field name="port/protocol" value="7/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="echo"/>
    <field name="port/protocol" value="7/udp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="discard"/>
    <field name="port/protocol" value="9/tcp"/>
    <field name="aliases" value="sink null"/>
  </record>
  <record profile="services">
    <field name="service-name" value="discard"/>
    <field name="port/protocol" value="9/udp"/>
    <field name="aliases" value="sink null"/>
  </record>
  <record profile="services">
    <field name="service-name" value="systat"/>
    <field name="port/protocol" value="11/tcp"/>
    <field name="aliases" value="users"/>
  </record>
  <record profile="services">
    <field name="service-name" value="daytime"/>
    <field name="port/protocol" value="13/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="daytime"/>
    <field name="port/protocol" value="13/udp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="netstat"/>
    <field name="port/protocol" value="15/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="chargen"/>
    <field name="port/protocol" value="19/tcp"/>
    <field name="aliases" value="ttytst source"/>
  </record>
  <record profile="services">
    <field name="service-name" value="chargen"/>
    <field name="port/protocol" value="19/udp"/>
    <field name="aliases" value="ttytst source"/>
  </record>
  <record profile="services">
    <field name="service-name" value="ftp-data"/>
    <field name="port/protocol" value="20/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="ftp"/>
    <field name="port/protocol" value="21/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="telnet"/>
    <field name="port/protocol" value="23/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="smtp"/>
    <field name="port/protocol" value="25/tcp"/>
    <field name="aliases" value="mail"/>
  </record>
  <record profile="services">
    <field name="service-name" value="time"/>
    <field name="port/protocol" value="37/tcp"/>
    <field name="aliases" value="timserver"/>
  </record>
  <record profile="services">
    <field name="service-name" value="time"/>
    <field name="port/protocol" value="37/udp"/>
    <field name="aliases" value="timserver"/>
  </record>
  <record profile="services">
    <field name="service-name" value="name"/>
    <field name="port/protocol" value="42/udp"/>
    <field name="aliases" value="nameserver"/>
  </record>
  <record profile="services">
    <field name="service-name" value="whois"/>
    <field name="port/protocol" value="43/tcp"/>
    <field name="aliases" value="nicname"/>
  </record>
  <record profile="services">
    <field name="service-name" value="domain"/>
    <field name="port/protocol" value="53/udp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="domain"/>
    <field name="port/protocol" value="53/tcp"/>
  </record>
  <record profile="services">
    <field name="service-name" value="hostnames"/>
    <field name="port/protocol" value="101/tcp"/>
    <field name="aliases" value="hostname"/>
  </record>
  <record profile="services">
    <field name="service-name" value="sunrpc"/>
    <field name="port/protocol" value="111/udp"/>
    <field name="aliases" value="rpcbind"/>
  </record>
  <record profile="services">
    <field name="service-name" value="sunrpc"/>
    <field name="port/protocol" value="111/tcp"/>
    <field name="aliases" value="rpcbind"/>
  </record>
</webcollection>
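For readers who want to see the mapping spelled out, the conversion from tab-separated lines to records can be sketched in a few lines of Python. This is only an illustration of the behaviour visible in the output above, not the actual unix2coll script (which is an AWK program whose source is not shown here). The function name unix_to_collection and its parameters are hypothetical. Two behaviours are inferred from the example and should be read as assumptions: empty columns produce no field element, and columns beyond the named fields (such as the comment column) are dropped.

```python
from xml.sax.saxutils import quoteattr


def unix_to_collection(lines, profile, field_names, sep="\t"):
    """Turn separator-delimited lines into Web Collection XML text.

    Hypothetical helper for illustration; not the original AWK script.
    """
    out = ["<webcollection>"]
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        values = line.split(sep)
        out.append("<record profile=%s>" % quoteattr(profile))
        # zip() stops at the shorter sequence, so columns beyond the
        # named fields (e.g. a trailing comment column) are ignored.
        for name, value in zip(field_names, values):
            if value:  # assumption: empty columns yield no <field>
                out.append(
                    "<field name=%s value=%s/>"
                    % (quoteattr(name), quoteattr(value))
                )
        out.append("</record>")
    out.append("</webcollection>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = [
        "tcpmux\t1/tcp",
        "discard\t9/tcp\tsink null",
        "whois\t43/tcp\tnicname\t# usually to sri-nic",
    ]
    fields = ["service-name", "port/protocol", "aliases"]
    print(unix_to_collection(sample, "services", fields))
```

quoteattr is used so that characters such as & or < in a value are escaped and the result stays well-formed XML.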
To convert the Web Collection back to a Unix-style database, we need two steps. The first is a Java program called `xmlpipe' that parses the XML and outputs the structure of the file in a form well suited to AWK-like scripts. Readers who know (n)sgmls will find the output similar. The second step is another small AWK program, `coll2unix', that selects the right records and fields and outputs them in a tabular format.
The command line looks like this:
java xmlpipe collfile | coll2unix services service-name port/protocol aliases
`collfile' is the Web Collections file shown above. The first argument to `coll2unix' is the profile that we are looking for, since a Web Collection may contain records with different profiles. The program will skip all records with a different profile.
The other arguments are the field names we are looking for. The corresponding fields from the selected records are output in the indicated order, with tabs in between the fields. The result is not surprising:
tcpmux	1/tcp
echo	7/tcp
echo	7/udp
discard	9/tcp	sink null
discard	9/udp	sink null
systat	11/tcp	users
daytime	13/tcp
daytime	13/udp
netstat	15/tcp
chargen	19/tcp	ttytst source
chargen	19/udp	ttytst source
ftp-data	20/tcp
ftp	21/tcp
telnet	23/tcp
smtp	25/tcp	mail
time	37/tcp	timserver
time	37/udp	timserver
name	42/udp	nameserver
whois	43/tcp	nicname
domain	53/udp
domain	53/tcp
hostnames	101/tcp	hostname
sunrpc	111/udp	rpcbind
sunrpc	111/tcp	rpcbind
The result is the same as the file we started with, except for the comments, which we didn't put in the Web Collection.
Of course, we could have added the comments as well, but that is left as an exercise for the reader...
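The reverse direction (select records by profile, then print the requested fields in the given order, separated by tabs) can be sketched in Python as well. This is not how the original pipeline works: it combines the two steps (xmlpipe and coll2unix) into one by parsing the XML directly with the standard library, and the function name collection_to_unix is hypothetical. Whether the original emits trailing tabs for missing fields is not stated; this sketch strips them, which is an assumption.

```python
import xml.etree.ElementTree as ET


def collection_to_unix(xml_text, profile, field_names, sep="\t"):
    """Return one separator-joined line per record with the given profile.

    Hypothetical helper for illustration; not the original tools.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter("record"):
        if record.get("profile") != profile:
            continue  # skip records that belong to other profiles
        # Map field name -> value for this record.
        values = {
            field.get("name"): field.get("value", "")
            for field in record.iter("field")
        }
        # Emit the fields in the order requested; missing ones are empty.
        row = sep.join(values.get(name, "") for name in field_names)
        # Assumption: drop trailing separators left by missing fields.
        rows.append(row.rstrip(sep))
    return rows


if __name__ == "__main__":
    sample = (
        '<webcollection>'
        '<record profile="services">'
        '<field name="service-name" value="smtp"/>'
        '<field name="port/protocol" value="25/tcp"/>'
        '<field name="aliases" value="mail"/>'
        '</record>'
        '</webcollection>'
    )
    fields = ["service-name", "port/protocol", "aliases"]
    for line in collection_to_unix(sample, "services", fields):
        print(line)
```

Because the field names are looked up per record, the output order follows the arguments rather than the order of the field elements in the file.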