W3C

File and operating system steps for XProc

W3C Working Group Note 2 August 2013

This Version:
http://www.w3.org/TR/2013/NOTE-xproc-fileos-20130802/
Latest Version:
http://www.w3.org/TR/xproc-fileos/
Editor:
Norman Walsh, MarkLogic Corporation

This document is also available in these non-normative formats: XML


Abstract

This note describes a set of new XProc steps designed to provide access to filesystem data and operating system information.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is an editor's draft of this document as a Working Group Note. This document is a product of the XML Processing Model Working Group as part of the W3C XML Activity. The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/2003/03/Translations/byTechnology?technology=xproc-template.

This Note defines some additional optional steps for use in XProc pipelines. The XML Processing Model Working Group expects that these new steps will be widely implemented and used.

Please report errors in this document to the public mailing list public-xml-processing-model-comments@w3.org (public archives are available).

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


Table of Contents

1 Introduction
2 Terminology
3 File system steps
3.1 pf:copy
3.2 pf:delete
3.3 pf:head
3.4 pf:info
3.5 pf:mkdir
3.6 pf:move
3.7 pf:tail
3.8 pf:tempfile
3.9 pf:touch
4 Operating system steps
4.1 po:info
4.2 po:cwd
4.3 po:env

Appendices

A References
A.1 Normative References
B List of Error Codes
B.1 Step Errors

1 Introduction

The [XProc: An XML Pipeline Language] specification treats the interface between the pipeline and its external environment as an implementation-defined boundary. This is entirely reasonable for a language that may be implemented on a wide variety of platforms and in a wide variety of systems. By electing to omit steps that would provide access to files or operating system details, the specification does not prejudice one platform over another.

However, in practice, many pipelines run on systems with quite common and widely interoperable environments. These environments support the notion of files and directories, of a current working directory, of search paths of various sorts delimited in platform-specific ways, etc.

This note defines a set of steps that provide access to those underlying abstractions. These steps can be implemented by a processor wishing to provide interoperable access to the filesystem or the operating system environment.

This specification uses the terms “file” and “directory” in a non-technical sense. They apply equally to files and directories common on modern operating systems and to esoteric artifacts that happen to have reasonably similar semantics in some embedded environment.

Which, if any, of these steps an implementation chooses to support is naturally implementation-defined.

2 Terminology

In this note the words must, must not, should, should not, may and recommended are to be interpreted as described in [RFC 2119].

3 File system steps

The filesystem steps are in the http://www.w3.org/ns/xproc-step/filesystem namespace, identified in this specification with the pf: prefix. These are optional steps not in the XProc namespace; declarations for them must therefore be provided in the pipeline where they are used.

Conceptually, these steps operate on files and directories. In the common case, files and directories will be identified with the file: URI scheme. For example,

<pf:delete href="file:///project/build/output.tmp"/>

However, it is implementation-defined whether these steps can operate on other schemes as well.
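As a minimal sketch of how a locally supplied declaration and a file: URI combine, the pipeline below declares pf:delete and then invokes it on the URI used above. The declaration is copied from section 3.2; the enclosing pipeline and its single result port are assumptions made for illustration, and the pipeline can only run on a processor that implements the step.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <!-- Declaration copied from section 3.2. It is needed here because
       pf:delete is not in the XProc namespace. -->
  <p:declare-step type="pf:delete">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="recursive" select="'false'"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- The primary output of the last step becomes the pipeline result. -->
  <pf:delete href="file:///project/build/output.tmp"/>
</p:declare-step>
```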

If an error occurs in one of the filesystem steps, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.

Some common errors are summarized here.

It is an error (err:XF0001) if the step attempts to read from a file that does not exist.

It is an error (err:XF0002) if the step attempts to write to a file or directory that is not writable.
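The following sketch shows one way a pipeline might react to a c:error result when fail-on-error is false. The file names and the status element are hypothetical; the only behavior taken from this note is that a failed step returns a c:error element instead of failing.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:copy">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="target" required="true"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- Assumption: file:///project/missing.xml does not exist, so the
       step would report err:XF0001 as a c:error on its result port. -->
  <pf:copy href="file:///project/missing.xml"
           target="file:///project/backup/"
           fail-on-error="false"/>

  <p:choose>
    <p:when test="/c:error">
      <!-- Replace the error with a hypothetical status document. -->
      <p:identity>
        <p:input port="source">
          <p:inline><status>copy-failed</status></p:inline>
        </p:input>
      </p:identity>
    </p:when>
    <p:otherwise>
      <!-- Pass the c:result from pf:copy through unchanged. -->
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

With the default fail-on-error value of true, the same failure would instead stop the pipeline unless it is wrapped in p:try.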

3.1 pf:copy

Copies a file.

<p:declare-step type="pf:copy">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="target" required="true"/>                     <!-- anyURI -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

The pf:copy step copies the file named in href to the new name specified in target. If the target is a directory, the step attempts to copy the file into that directory, preserving its base name.

If the copy is successful, the step returns a c:result element containing the absolute URI of the target.
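A possible invocation, sketched with hypothetical file names, copies a file to a new name. The result shown in the comment is inferred from the description above and is not taken from a real processor run.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:copy">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="target" required="true"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- Hypothetical source and target names. -->
  <pf:copy href="file:///project/src/report.xml"
           target="file:///project/backup/report-2013.xml"/>

  <!-- Expected shape of the result on success:
       <c:result xmlns:c="http://www.w3.org/ns/xproc-step"
         >file:///project/backup/report-2013.xml</c:result> -->
</p:declare-step>
```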

[Question: should target directories be created?]

3.2 pf:delete

Deletes a file or directory.

<p:declare-step type="pf:delete">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="recursive" select="'false'"/>                 <!-- boolean -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

The pf:delete step attempts to delete the file or directory named in href.

If the file or directory is successfully deleted, the step returns a c:result element containing the absolute URI of the deleted file.

If href specifies a directory, it can only be deleted if the recursive option is true or if the directory is empty.

It is an error (err:XF0003) if the href option specifies a directory, the directory is not empty, and the recursive option has the value false.
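The sketch below removes a directory tree, assuming a hypothetical build directory that is not empty. Setting recursive to true is what avoids err:XF0003 in that situation.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:delete">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="recursive" select="'false'"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- Hypothetical directory. Without recursive="true" a non-empty
       directory would raise err:XF0003. -->
  <pf:delete href="file:///project/build/" recursive="true"/>
</p:declare-step>
```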

3.3 pf:head

Returns the first few lines of a text file.

<p:declare-step type="pf:head">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="count" required="true"/>                      <!-- int -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

Returns the first count lines of the file named in href. If count is negative, the step returns all except those first lines.

The step returns a c:result element containing one c:line for each line. Lines are identified as described in XML, 2.11 End-of-Line Handling.
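To make the result format concrete, here is a sketch that assumes a hypothetical four-line text file. The outputs in the comment follow from the description above; the negative case assumes that a count of -1 means "skip the first line", which is one reading of the text.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:head">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="count" required="true"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- Assumed content of file:///project/notes.txt (hypothetical):
         alpha
         beta
         gamma
         delta -->
  <pf:head href="file:///project/notes.txt" count="2"/>

  <!-- Expected shape of the result for count="2":
         <c:result xmlns:c="http://www.w3.org/ns/xproc-step">
           <c:line>alpha</c:line>
           <c:line>beta</c:line>
         </c:result>
       For count="-1", on the reading given in the lead-in, the lines
       beta, gamma and delta would be returned instead. -->
</p:declare-step>
```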

3.4 pf:info

Returns information about a file or directory.

<p:declare-step type="pf:info">
     <p:output port="result" sequence="true"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

The pf:info step returns information about the file or directory named in href.

The step returns a c:directory for directories, a c:file for ordinary files, or a c:other for other kinds of filesystem objects. Implementations may also return more specific types, for example c:device, so anything other than c:directory or c:file must be interpreted as “other”. If the document doesn't exist, an empty sequence is returned.

The document element of the result, if there is one, will have the following attributes:

Attribute      Type         Description
readable       xs:boolean   “true” if the object is readable.
writable       xs:boolean   “true” if the object is writable.
hidden         xs:boolean   “true” if the object is hidden.
last-modified  xs:dateTime  The last modification time of the object expressed in UTC.
size           xs:integer   The size of the object in bytes.

If the value of a particular attribute is unknown or inapplicable for the particular kind of object, or in the case of boolean attributes, if it's false, then the attribute is not present. Additional implementation-defined attributes may be present, but they must be in a namespace.

If the href option specified is not a file: URI, then the result is implementation-defined.
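Because pf:info returns an empty sequence for a missing object, it can serve as an existence test. The sketch below pipes the result into the standard p:count step; the file name and the attribute values in the comment are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:info">
    <p:output port="result" sequence="true"/>
    <p:option name="href" required="true"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- For an ordinary, readable, writable, non-hidden file the step
       might return (values are hypothetical):
         <c:file xmlns:c="http://www.w3.org/ns/xproc-step"
                 readable="true" writable="true"
                 last-modified="2013-08-02T12:00:00Z" size="1024"/>
       Note that hidden is absent because it would be false. -->
  <pf:info href="file:///project/build/output.tmp"/>

  <!-- p:count yields a c:result containing 1 if the object exists
       and 0 if pf:info returned the empty sequence. -->
  <p:count/>
</p:declare-step>
```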

3.5 pf:mkdir

Creates a directory.

<p:declare-step type="pf:mkdir">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

The pf:mkdir step creates a directory with the name in href. If the name includes more than one directory component, all of the intermediate components are created. The path separator is implementation-defined.

The step returns a c:result element containing the absolute URI of the directory created.
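As an illustration of intermediate components being created, the sketch below asks for a nested directory in one call. The path is hypothetical, and the example assumes that none of build, reports or 2013 exists beforehand.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:mkdir">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- A single call is expected to create build, reports and 2013. -->
  <pf:mkdir href="file:///project/build/reports/2013/"/>
</p:declare-step>
```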

3.6 pf:move

Moves (renames) a file or directory.

<p:declare-step type="pf:move">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="target" required="true"/>                     <!-- anyURI -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

The pf:move step attempts to move (rename) the file specified in the href option to the new name specified in the target option.

If the target is a directory, the step attempts to move the file into that directory, preserving its base name.

If the move is successful, the step returns a c:result element containing the absolute URI of the new name of the file. The original file is effectively removed.

If the href option specifies a directory, device, or other special kind of object, the results are implementation-defined.
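A rename can be sketched as follows, with hypothetical file names. The example moves an ordinary file, since the behavior for directories and special objects is implementation-defined.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:move">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="target" required="true"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- Hypothetical names. On success the result is expected to hold
       the absolute URI of the new name, and draft.xml no longer exists. -->
  <pf:move href="file:///project/draft.xml"
           target="file:///project/final.xml"/>
</p:declare-step>
```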

3.7 pf:tail

Returns the last few lines of a text file.

<p:declare-step type="pf:tail">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="count" required="true"/>                      <!-- int -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

Returns the last count lines of the file named in href. If count is negative, the step returns all except those last lines.

The step returns a c:result element containing one c:line for each line. Lines are identified as described in XML, 2.11 End-of-Line Handling.
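The sketch below mirrors the pf:head case for the end of a file, again assuming a hypothetical four-line text file. The negative case assumes that a count of -1 means "drop the last line", which is one reading of the text.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:tail">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="count" required="true"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- Assumed content of file:///project/build.log (hypothetical):
         start
         compile
         test
         done -->
  <pf:tail href="file:///project/build.log" count="2"/>

  <!-- Expected shape of the result for count="2":
         <c:result xmlns:c="http://www.w3.org/ns/xproc-step">
           <c:line>test</c:line>
           <c:line>done</c:line>
         </c:result>
       For count="-1", on the reading given in the lead-in, the lines
       start, compile and test would be returned instead. -->
</p:declare-step>
```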

3.8 pf:tempfile

Creates a temporary file.

<p:declare-step type="pf:tempfile">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="prefix"/>                                     <!-- string -->
     <p:option name="suffix"/>                                     <!-- string -->
     <p:option name="delete-on-exit"/>                             <!-- boolean -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

The pf:tempfile step creates a temporary file. The temporary file is guaranteed not to already exist when pf:tempfile is called.

The file is created in the directory specified by the href option. If prefix is specified, the file's name will begin with that prefix. If suffix is specified, the file's name will end with that suffix.

The step returns a c:result element containing the absolute URI of the temporary file.

If the delete-on-exit option is true, then the temporary file will automatically be deleted when the processor terminates.
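The options can be combined as in the following sketch. The directory, prefix and suffix are hypothetical, and the generated name shown in the comment is only an example of what a processor might choose.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:tempfile">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="prefix"/>
    <p:option name="suffix"/>
    <p:option name="delete-on-exit"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- href names the directory in which the file is created. -->
  <pf:tempfile href="file:///tmp/"
               prefix="report-"
               suffix=".xml"
               delete-on-exit="true"/>

  <!-- A possible result (the middle part of the name is up to the
       processor):
         <c:result xmlns:c="http://www.w3.org/ns/xproc-step"
           >file:///tmp/report-483920.xml</c:result> -->
</p:declare-step>
```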

3.9 pf:touch

Updates the modification time of a file.

<p:declare-step type="pf:touch">
     <p:output port="result"/>
     <p:option name="href" required="true"/>                       <!-- anyURI -->
     <p:option name="timestamp"/>                                  <!-- xs:dateTime -->
     <p:option name="fail-on-error" select="'true'"/>              <!-- boolean -->
</p:declare-step>

The pf:touch step “touches” the file named in href. The file will be created if it does not exist.

If timestamp is specified, the modification time of the file will be updated to the specified time. If unspecified, the current date and time will be used.

The step returns a c:result element containing the absolute URI of the touched file.
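A sketch of setting an explicit modification time follows. The file name and timestamp are hypothetical; omitting the timestamp attribute would use the current date and time instead.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pf="http://www.w3.org/ns/xproc-step/filesystem"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="pf:touch">
    <p:output port="result"/>
    <p:option name="href" required="true"/>
    <p:option name="timestamp"/>
    <p:option name="fail-on-error" select="'true'"/>
  </p:declare-step>

  <!-- The timestamp is an xs:dateTime value. The file is created
       first if it does not exist. -->
  <pf:touch href="file:///project/build/stamp"
            timestamp="2013-08-02T00:00:00Z"/>
</p:declare-step>
```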

4 Operating system steps

The operating system steps are in the http://www.w3.org/ns/xproc-step/os namespace, identified in this specification with the po: prefix. These are optional steps not in the XProc namespace; declarations for them must therefore be provided in the pipeline where they are used.

4.1 po:info

Returns information about the operating system.

<p:declare-step type="po:info">
     <p:output port="result"/>
</p:declare-step>

The po:info step returns information about the operating system on which the processor is running. It returns a c:result element with attributes describing properties of the system. It should include the following properties:

file-separator

The file separator; usually “/” on Unix, “\” on Windows.

path-separator

The path separator; usually “:” on Unix, “;” on Windows.

os-architecture

The operating system architecture, for example “i386”.

os-name

The name of the operating system, for example “Mac OS X”.

os-version

The version of the operating system, for example “10.5.6”.

cwd

The current working directory.

user-name

The login name of the effective user, for example “ndw”.

user-home

The home directory of the effective user, for example “/home/ndw”.

The exact set of properties returned is implementation-defined.
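The sketch below invokes the step and shows, in a comment, what a result could look like on a Unix-like system. The attribute values reuse the examples listed above; the cwd value is hypothetical, and a real processor may return a different set of attributes.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:po="http://www.w3.org/ns/xproc-step/os"
                version="1.0">
  <p:output port="result"/>

  <p:declare-step type="po:info">
    <p:output port="result"/>
  </p:declare-step>

  <po:info/>

  <!-- A possible result (illustrative values only):
         <c:result xmlns:c="http://www.w3.org/ns/xproc-step"
                   file-separator="/"
                   path-separator=":"
                   os-architecture="i386"
                   os-name="Mac OS X"
                   os-version="10.5.6"
                   cwd="/home/ndw/project"
                   user-name="ndw"
                   user-home="/home/ndw"/> -->
</p:declare-step>
```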

4.2 po:cwd

Returns the current working directory of the processor.

<p:declare-step type="po:cwd">
     <p:output port="result" sequence="true"/>
</p:declare-step>

The po:cwd step returns a single c:result containing the current working directory. On systems which have no concept of a working directory, this step returns the empty sequence.

(This step exactly duplicates the cwd attribute on the c:result from po:info; it's just for convenience.)

4.3 po:env

Returns information about the environment.

<p:declare-step type="po:env">
     <p:output port="result"/>
</p:declare-step>

The po:env step returns information about the operating system environment. It returns a c:result containing zero or more c:env elements. Each c:env has name and value attributes containing the name and value of an environment variable.

On systems which have no concept of an environment, this step returns an empty c:result.
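One way to pick a single variable out of the result is to follow po:env with the standard p:filter step, as sketched below. The variable name HOME and the value in the comment are assumptions; if the variable is not set, the filter produces an empty sequence, which is why the pipeline output allows a sequence.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:po="http://www.w3.org/ns/xproc-step/os"
                version="1.0">
  <p:output port="result" sequence="true"/>

  <p:declare-step type="po:env">
    <p:output port="result"/>
  </p:declare-step>

  <!-- A possible result from po:env (illustrative values only):
         <c:result xmlns:c="http://www.w3.org/ns/xproc-step">
           <c:env name="HOME" value="/home/ndw"/>
           <c:env name="LANG" value="en_US.UTF-8"/>
         </c:result> -->
  <po:env/>

  <!-- Keep only the c:env element for the HOME variable, if any. -->
  <p:filter select="/c:result/c:env[@name='HOME']"/>
</p:declare-step>
```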

A References

A.1 Normative References

[RFC 2119] Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. Network Working Group, IETF, Mar 1997.

[XProc: An XML Pipeline Language] XProc: An XML Pipeline Language. Norman Walsh, Alex Milowski, and Henry S. Thompson, editors. W3C Recommendation 11 May 2010.

B List of Error Codes

B.1 Step Errors

The following errors are explicitly called out in this note.

Errors
err:XF0001

It is an error if the step attempts to read from a file that does not exist.

See: File system steps

err:XF0002

It is an error if the step attempts to write to a file or directory that is not writable.

See: File system steps

err:XF0003

It is an error if the href option specifies a directory, the directory is not empty, and the recursive option has the value false.

See: pf:delete

Other errors may also arise; see [XProc: An XML Pipeline Language] for a complete discussion of error codes.