Widgets 1.0: Packaging and Configuration

W3C Working Draft 14 April 2008

This version:: http://www.w3.org/TR/2008/WD-widgets-20080414/
Latest version:: http://www.w3.org/TR/widgets/
Previous version:: http://www.w3.org/TR/2007/WD-widgets-20071013/; http://www.w3.org/TR/2006/WD-widgets-20061109/
Latest Editor's draft:: http://dev.w3.org/2006/waf/widgets/
Version history:: Twitter messages (non-editorial changes only): http://twitter.com/widgetspecs (RSS)
Editor:: Marcos Caceres, Invited Expert

Abstract

This document defines a Zip-based packaging format and an XML-based configuration document format for widgets. The configuration document is a simple XML-based language that authors can use to record metadata and configuration parameters about a widget. The packaging format is a container for files required by a widget.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is produced by the Web Application Formats WG, part of the Rich Web Clients Activity in the W3C Interaction Domain. It is expected that this document will progress along the W3C's Recommendation track. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This is the W3C First Public Working Draft of the ... Please send comments to public-appformats@w3.org, the W3C's public email list for issues related to Web Application Formats. Archives of the list are available. A detailed list of changes from the previous version is also available from the W3C's CVS server.

Implementers should be aware that this document is not stable. Implementers who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this document before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

User agents that wish to extend this specification in any way are encouraged to discuss their extensions on a public forum, such as public-appformats so their extensions can be considered for standardization.

1. Introduction
2. Conformance
3. Widget User Agents
- 3.1 Zip support
4. Zip Archive
- 4.1 Dealing with invalid Zip archives
- 4.2 Extracting the File Data from a File Entry
5. Widget Resource
- 5.1 MIME Type
- 5.2 File Extension
6. Configuration Document
7. Steps for Processing a Widget Resource
8. Displaying Icons
Appendix
- Embedding a Widget Resource into an HTML Document
- RelaxNG Schema of the Configuration Document
Acknowledgements
References

1. Introduction

Widgets are a class of client-side web application for displaying and updating local or remote data, packaged in a way to allow a single download and installation on a client machine or device. Widgets typically run as stand alone applications outside of a web browser, but it is possible to embed them into web pages. Examples range from simple clocks, stock tickers, news casters, games and weather forecasters, to complex applications that pull data from multiple sources to be "mashed-up" and presented to a user in some interesting and useful way (see [Widgets-Landscape] for more information).

For widgets, this specification defines:

A [ZIP]-based format used to package the files that constitute a widget, as well as rules for how those packages are to be processed.
An XML-based vocabulary to create a configuration document (used to configure a widget at runtime), as well as the rules for how to parse a configuration document and defaults when a configuration document is unavailable.
Rules that allow a widget user agent to locate and launch the start file of a widget resource.
An auto-discovery mechanism to allow HTML user agents to "discover" a widget resource from within a HTML document.
An internationalization model, which automatically selects the appropriate content to display based on the end-user's locale.

Please note that this specification is part of a family of documents that together define widgets as a whole. The [Widgets-APIs] specification defines APIs to store preferences and capture events for widgets. The [Widgets-Digsig] specification defines a means for widgets to be digitally signed using a custom profile of the XML-Signature Syntax and Processing Specification. The [Widgets-Security] specification defines a security model to reduce privacy risks and potential damage to an end-user's computer or mobile device. The [Widgets-Updates] specification defines a version control model that allows widgets to be kept up-to-date over HTTP.

1.2 Design Goals

The design goals and requirements for this specification are addressed in the Widgets 1.0 Requirements [Widgets-Reqs] document.

1.3 Definitions

The following definitions are used globally throughout this specification. Please note that other definitions are given throughout this document and defined where they are used.

In this specification, a widget is understood to be an interactive single purpose application for displaying and/or updating local data or data on the Web, packaged in a way to allow a single download and installation on a user's machine, mobile phone, or mobile Internet device [Widgets-Landscape].

The space characters, for the purposes of this specification, are [Unicode] code points:

U+0020 SPACE,
U+0009 CHARACTER TABULATION (tab),
U+000A LINE FEED (LF),
U+000B LINE TABULATION,
U+000C FORM FEED (FF),
U+000D CARRIAGE RETURN (CR).

The control characters, for the purpose of this specification, are characters in the range U+0000 NUL to U+001F INFORMATION SEPARATOR 1, and character U+007F DELETE.
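Note: The following non-normative sketch, in Python, shows one way the two definitions above could be expressed as membership tests. The names SPACE_CHARACTERS, is_space_character and is_control_character are hypothetical and are not defined by this specification.

```python
# Code points listed in the definition of "space characters" above.
SPACE_CHARACTERS = frozenset("\u0020\u0009\u000A\u000B\u000C\u000D")


def is_space_character(ch: str) -> bool:
    """Return True if ch is one of the six space characters."""
    return ch in SPACE_CHARACTERS


def is_control_character(ch: str) -> bool:
    """Return True if ch is in U+0000..U+001F or is U+007F DELETE."""
    code = ord(ch)
    return code <= 0x1F or code == 0x7F
```

Under these definitions, tab, line feed, line tabulation, form feed and carriage return are both space characters and control characters; only U+0020 SPACE is a space character without also being a control character.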

A file entry is the amalgamation of data held by a local file header, file data, and (optional) data descriptor [ZIP] for each physical file stored in a zip archive.

A widget resource is digitally signed if it contains a file entry that conforms to the [Widgets-Digsig] specification.

2. Conformance

As well as sections marked as non-normative, all diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may and optional in this specification are to be interpreted as described in [RFC2119].

This specification describes the conformance criteria for user agents (relevant to implementers) and documents (relevant to authors and authoring tool implementers).

There are three classes of products that can claim conformance to this specification:

A widget user agent.
A widget resource.
A configuration document.

Products that generate widget resources and/or allow authoring of configuration documents must not claim conformance to this specification, though they can claim to produce widget resources or configuration documents that themselves are conforming to this specification. Authoring tools and markup generators must generate conforming configuration documents and/or widget resources.

Products that check the conformance of widget resources or configuration documents must not claim conformance to this specification, though they can claim to verify that a widget resource or a configuration document is conforming to the relevant section(s) of this specification.

A widget user agent must behave as described by this specification in order to claim conformance, even when faced with a non-conforming widget resource or a non-conforming configuration document.

3. Widget User Agents

A widget user agent is a user agent that attempts to support this specification, as well as the [Widgets-APIs], [Widgets-Security], [Widgets-Digsig], and [Widgets-Updates] specifications.

In addition to this specification, a widget user agent must implement:

The [Deflate] decompression method, and be able to extract stored (uncompressed) file entries from within a zip archive [ZIP].
[PNG], [GIF87], [JPEG] and [GIF89] as icon formats, and may support other formats (eg. jpg, bmp, svg).
[XML] and [XMLNS]
[DOM3CORE]

Note: In addition to supporting this specification, a widget user agent will typically support the following specifications.

HTTP 1.1
SVG
The [XMLHttpRequest] specification, or greater.
[HTML4] or greater
[ECMAScript] or greater
[CSS21] or greater

3.1 Zip support

A widget user agent is not required to implement or support any of the following aspects of the [ZIP] specification:

Any decompression algorithm, other than [Deflate].
File spanning or multiple volumes.
Zip64 extensions.
Any digital signature method, other than the method defined in [Widgets-Digsig].
Any decryption methods.
Any patented technology.

4. Zip Archive

A valid zip archive is a byte-stream that conforms to the production of a .ZIP file as defined by the Zip File Format Specification [ZIP], subject to the exclusions and the supported features and conditions specified below.

Note: Please take the time to read relevant sections of the [ZIP] specification to become familiar with zip-specific terms and how data is structured inside a Zip file before reading this section. In particular, see the local file header, file data, data descriptor, CRC-32, compression method field, file name field, general purpose bit 11, and the Appendix D - Language Encoding (EFS).

To conform to this specification, a zip archive must contain one or more file entries and must not be an invalid zip archive.

See "Step 2 - Verify the zip archive and its file entries" for instructions on how to process a zip archive.

Allowed Compression Methods

The compression method is the compression algorithm or storage method used to store the file data of a file entry. The compression method that was used to encode the file data of a file entry is identified by the numeric value derived from the compression method field [ZIP].

The valid compression methods for a file entry are:

[Deflate]:: The value of the compression method field is 8.
Stored (no compression) [ZIP]:: The value of the compression method field is 0.

Author requirements: File data must be compressed with either [Deflate] or stored [ZIP]. Using any other compression method will result in an invalid zip archive.

Version Needed to Extract

The version needed to extract is a 2 byte sequence in the local file header of a file entry that indicates the minimum supported ZIP specification version needed to extract the file data [ZIP].

The version needed to extract for file data that is stored must be 1.0 (recorded in the field as the value 10).

The version needed to extract for file data that is compressed using [Deflate] must be 2.0 (recorded in the field as the value 20).
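Note: The following non-normative Python sketch reads the compression method field and the version needed to extract from the fixed 30-byte part of a local file header and checks that the two agree. It assumes the [ZIP] convention that the version field holds the version number multiplied by ten (10 for 1.0, 20 for 2.0). The names LOCAL_HEADER, EXPECTED_VERSION and check_local_header are hypothetical.

```python
import struct

# Fixed-size part of a local file header [ZIP], little-endian:
# signature, version needed, flags, compression method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_HEADER_SIGNATURE = 0x04034B50

STORED, DEFLATE = 0, 8
# Assumption: the field stores version * 10 (1.0 -> 10, 2.0 -> 20).
EXPECTED_VERSION = {STORED: 10, DEFLATE: 20}


def check_local_header(header: bytes) -> bool:
    """Return True if the method is allowed and the version field matches it."""
    if len(header) < LOCAL_HEADER.size:
        return False
    signature, version, _flags, method, *_rest = LOCAL_HEADER.unpack_from(header)
    if signature != LOCAL_HEADER_SIGNATURE:
        return False
    # dict.get returns None for any method other than 0 or 8.
    return EXPECTED_VERSION.get(method) == version
```

Many existing Zip tools record 2.0 as the version needed to extract even for stored entries, so a strict check such as this one may reject archives that extract without difficulty.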

File and Folder Names

For the purpose of this specification, a zip relative path is the variable-length string derived from a file name field of a local file header from a file entry [ZIP].

Note: A zip relative path is said to be relative as it stores the string that represents file and folder names relative to where the zip archive was created on a file system (eg. images/bg.png), as opposed to storing an absolute path (eg. c:\images\bg.png). The value of a zip relative path will generally resemble the name of the file or folder(s) on the device on which the zip archive was created, except that the path delimiter is a U+002F SOLIDUS "/" character.

Author requirements: The zip relative path must be encoded as either [CP437] or [UTF-8]. Encoding the file name field using [UTF-8] is recommended. If the zip relative path is encoded using [UTF-8], then the general purpose bit 11 of the local file header must be set to 1, otherwise it must be set to 0.

Author requirements: It is recommended that authors keep their path lengths below 255 characters. Having excessively long path names (eg. over 120 characters) can also result in interoperability issues on some operating systems.

A valid zip relative path is one that matches the production of zip-rel-path in the following [ABNF]:

zip-rel-path   = ( *folder [ filename ] )
folder         = 1*243filename delimiter
delimiter      = %x2F   ; U+002F SOLIDUS
filename       = 1*255( *basename [file-extension] )
basename       = allowed-chars
file-extension = "." 1*allowed-chars
allowed-chars  = cp437-chars / utf8-chars
utf8-chars     = ascii-chars / <any code point U+0080 and beyond>
cp437-chars    = ascii-chars / %x80-FF
ascii-chars    = ALPHA / DIGIT / SP / "$" / "%" / "'" / "-" / "_" / "@"
                 / "~" / "`" / "!" / "(" / ")" / "^" / "#" / "&" / "+"
                 / "," / "." / "=" / "[" / "]"

ALPHA, DIGIT, and SP are defined in [ABNF] (but essentially represent alphanumerical characters and the space (0x20) character).

Note: Authors should be aware that, at the time of writing, there are some interoperability issues with regard to using characters outside the ascii-chars for file and folder names in a Zip archive. If an author chooses to use the cp437-chars or the utf8-chars, they should thoroughly test their widgets on various platforms prior to distribution; otherwise it is recommended that authors restrict file and folder names to the ascii-chars.

Note for implementers: as this specification does not put a restriction on path length, implementers need to be prepared to deal with path lengths longer than 260 characters.

Reserved characters

The following reserved characters must not appear anywhere in a filename:

Reserved characters
Character	CP437 code points	Unicode code points
<	0x3C	U+003C LESS-THAN SIGN
>	0x3E	U+003E GREATER-THAN SIGN
:	0x3A	U+003A COLON
"	0x22	U+0022 QUOTATION MARK
\	0x5C	U+005C REVERSE SOLIDUS
|	0x7C	U+007C VERTICAL LINE
?	0x3F	U+003F QUESTION MARK
*	0x2A	U+002A ASTERISK
;	0x3B	U+003B SEMICOLON
/	0x2F	U+002F SOLIDUS

Note to authors: It is recommended that authors avoid using the following words as either a folder or a basename in a zip relative path as they are reserved by some operating systems (case insensitive): CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9. For example, the following names are ok: "CON-tact.txt", "printer.lpt1", "DCOM1.pdf"; the following may not be: "com3.txt", "Lpt1", "CoM9.gif".

In addition, authors should also avoid having a "." U+002E FULL STOP as the last character of a file or folder name as some operating systems will remove the character when the file is extracted from the Zip archive onto the hard drive. Authors should also avoid having the space character (SP) at the start or end of a file name. Authors should also take caution when using the "+" U+002B PLUS SIGN, as it might cause issues on some operating systems.
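Note: The following non-normative Python sketch shows how an authoring tool might screen a single file or folder name (one path segment, not a whole zip relative path) against the reserved characters and the advice above. The function names are hypothetical, and treating everything before the first "." as the basename is an assumption made for this sketch.

```python
# Reserved characters from the table above.
RESERVED_CHARACTERS = frozenset('<>:"\\|?*;/')

# Words reserved by some operating systems (compared case-insensitively).
RESERVED_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)


def has_reserved_character(name: str) -> bool:
    """Return True if the name contains any reserved character."""
    return any(ch in RESERVED_CHARACTERS for ch in name)


def is_risky_name(name: str) -> bool:
    """Return True if the name goes against the advice to authors above."""
    basename = name.split(".", 1)[0]  # assumption: text before the first "."
    return (
        basename.upper() in RESERVED_NAMES
        or name.endswith(".")          # trailing FULL STOP may be dropped
        or name != name.strip(" ")     # leading or trailing SP
    )
```

With this sketch, "CON-tact.txt", "printer.lpt1" and "DCOM1.pdf" are not flagged, while "com3.txt", "Lpt1" and "CoM9.gif" are.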

4.1 Dealing with invalid Zip archives

An invalid zip archive is a zip archive that is either corrupt, meaning that a CRC-32 has failed for a file entry, or meets any of the following conditions:

Is split into multiple files or spans multiple volumes.
Uses ZIP64 extensions.
Is encrypted.
Is digitally signed using any digital signature method other than the digital signature method defined in [Widgets-Digsig].
Contains file entries compressed with any compression algorithm other than [Deflate] or stored (see the valid compression methods).
Contains any file entry whose file name field contains reserved characters.
Contains no file entries or only directories.
Meets any other condition defined in this specification that deems a zip archive to be treated as an invalid zip archive.

In the event that a widget user agent encounters an invalid zip archive during the steps for processing a widget resource, the widget user agent must abort any processing and should inform the end-user with an appropriate localized error dialog. The wording of appropriate localized error dialogs is left to the discretion of implementers, but their presence is nonetheless recommended.

Note: An example of an appropriate error message would be "Can't install this widget because it was created using an unsupported compression method."

4.2 Extracting the File Data from a File Entry

A widget user agent may decompress (or otherwise extract) the file data of a file entry into its decompressed or unstored representation.

Note: a widget user agent does not need to extract all the file entries in a zip archive at the same time. It may choose to only extract specific file entries as they are needed for processing.

For the required file entry in the zip archive, if the value of the file entry's compression method field is 8, use the result of applying the [Deflate] algorithm to the file data field. Otherwise, use the value of the file data field.
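Note: The rule above can be sketched, non-normatively, in Python using the standard zlib module, where a negative window size selects a raw [Deflate] stream with no zlib header. The function name extract_file_data is hypothetical. Raising an error for a method other than 0 or 8 is a choice made for this sketch; this specification deals with such entries when the zip archive is verified.

```python
import zlib

STORED, DEFLATE = 0, 8


def extract_file_data(file_data: bytes, compression_method: int) -> bytes:
    """Return the decompressed (or stored) bytes of one file entry."""
    if compression_method == DEFLATE:
        # wbits=-15 means a raw deflate stream, as used inside Zip archives.
        return zlib.decompress(file_data, -15)
    if compression_method == STORED:
        return file_data
    raise ValueError(f"unsupported compression method: {compression_method}")
```

Because the function returns bytes in memory rather than writing to disk, it can serve as a building block for a virtual file system.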

Note: as a security precaution, implementations are discouraged from extracting file entries from untrusted widgets directly onto the file system. Instead, implementations should consider a virtual file system or mapping to access file entries inside a zip archive.

5. Widget Resource

A widget resource is a byte-stream or file that is a valid zip archive.

A widget resource must contain a configuration document and may contain additional resources located either at the root of the archive or in sub-directories. In addition, a widget resource may also be digitally signed.

See "Step 1 - Acquire a Widget Resource Over HTTP or Local Storage" for instructions on how to process a widget resource.

5.1 MIME Type

Over the wire (eg. over HTTP), a widget resource must be labeled with an application/widget MIME type.

Note: Widget user agents can support other legacy/proprietary widget types, but they must remain conforming to this specification when dealing with widget resources.

The application/widget MIME type has not yet been registered with IANA.

5.2 File Extension

The file extension .wgt is required for a widget resource on systems where it is customary for file names to include an extension that symbolizes the MIME type.

The .wgt file extension in any case form is considered a valid file extension.

Author requirements: It is recommended that a widget resource is served over HTTP with the .wgt file extension in lower case form.

6. Configuration Document

A configuration document is an [XML] document that has a widget element at its root.

A config.xml file is the XML serialization of a configuration document.

A valid configuration document file name is config.xml, in any case form.

Author requirements: It is recommended that the config.xml file name be in lower case form.

The following is an example of a config.xml document:

<widget xmlns="http://www.w3.org/ns/widgets" 
        id="http://datadriven.com.au/exampleWidget" 
        version="2.0 Beta" 
        height="200" 
        width="200">
  <name>The example widget!</name>
  <description>
    A sample widget to demonstrate some of the possibilities.
  </description>
  <author url="http://foo-bar.example.org/"
    email="foo-bar@example.org">Foo Bar Corporation</author>  
  <icon src="icons/example.png" />  
  <content src="index.html"/> 
  <access network="true"/>  
  <license>
    Example license (based on MIT License)
    Copyright (c) 2008 The Foo Bar Corp.
    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
    THE SOFTWARE.
  </license> 
</widget>

Note: Widget user agents are encouraged to expose relevant information provided by the configuration document to the user. Having "visual metadata" encourages authors to make full use of the configuration document format.

Author requirements: At a minimum, a configuration document must declare a widget element and a content element. The content element must have the src attribute.

For example, the smallest possible configuration document would be:

<widget xmlns="http://www.w3.org/ns/widgets">
  <content src="somefile.html"/>
</widget>

See "Step 6 - Process the configuration document" for instructions on how to process a configuration document.
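Note: As a non-normative illustration, the minimum author requirements above (a widget root element in the configuration document namespace, with a content child that has a src attribute) could be checked in Python with the standard xml.etree.ElementTree module. The function name meets_minimum_requirements is hypothetical, and this sketch is not the processing algorithm defined by this specification.

```python
import xml.etree.ElementTree as ET

WIDGET_NS = "http://www.w3.org/ns/widgets"


def meets_minimum_requirements(xml_text: str) -> bool:
    """Check for a namespaced widget root with a content element and src."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed XML
    # ElementTree spells namespaced names as "{namespace}localname".
    if root.tag != f"{{{WIDGET_NS}}}widget":
        return False
    content = root.find(f"{{{WIDGET_NS}}}content")
    return content is not None and content.get("src") is not None
```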

6.1 Namespace

The configuration document namespace is http://www.w3.org/ns/widgets [XMLNS].

Author requirements: Authors must assign the configuration document namespace to the widget element. If it's omitted, the widget will be treated as an invalid zip archive.

6.2 Extensions

Vendors wishing to extend the configuration document format with their own XML elements and attributes may do so by using a separate namespace [XMLNS]. This specification does not define a model for processing XML elements outside the configuration document namespace. For the sake of interoperability, extensions to the configuration document are not recommended.

Example of extending the configuration document format:

<widget xmlns="http://www.w3.org/ns/widgets"
        xmlns:ex="http://widgextension.org/">
  <icon src="icon_ss.png"  ex:role="screenshot"/>
  <icon src="icon_big.png" ex:role="big"/>
  <ex:datasource>{a:"b",c:"d"}</ex:datasource>
  <content src="widget.html"/>
</widget>

6.3 Attribute Values and Types

A valid non-negative float is a string that consists of one or more characters in the range U+0030 (0) to U+0039 (9) followed by a single U+002E FULL STOP (".") character and one or more characters in the range U+0030 (0) to U+0039 (9). (eg. 1.0, 243.23, 23.006).

A valid non-negative integer is a string that consists of one or more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9). (eg. 2, 323, 23214).
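Note: The two definitions above translate directly into regular expressions. The following non-normative Python sketch uses the explicit class [0-9] rather than \d, because \d in Python also matches digits outside the range U+0030 to U+0039. The function names are hypothetical.

```python
import re

_NON_NEGATIVE_INTEGER = re.compile(r"[0-9]+")
_NON_NEGATIVE_FLOAT = re.compile(r"[0-9]+\.[0-9]+")


def is_valid_non_negative_integer(value: str) -> bool:
    """One or more ASCII digits and nothing else."""
    return _NON_NEGATIVE_INTEGER.fullmatch(value) is not None


def is_valid_non_negative_float(value: str) -> bool:
    """ASCII digits, a single FULL STOP, then ASCII digits."""
    return _NON_NEGATIVE_FLOAT.fullmatch(value) is not None
```

Note that, by these definitions, "23" is not a valid non-negative float and "23." is neither a valid float nor a valid integer.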

A valid version-tag is a string that matches the production for valid-version-tag in the following [ABNF]:

valid-version-tag  = version-identifier *( "." version-identifier )
version-identifier = string

Example valid version tags:: Version 1.0 Beta, 1.0 RC1, 1.0-Build1580, Happydog 5.1.2100

Note: For the purpose of this specification, version tags have no significant semantics; they are just treated as arbitrary strings (eg. '1.0' is not greater than '2.0', but is simply different). This may change in future versions of this specification.

Boolean attribute

An attribute defined as taking boolean values is referred to as a boolean attribute. A valid boolean value is a string that case insensitively matches true or false. The default behavior, which is used when the attribute is omitted or has a value other than the two allowed values, is false. Rules for exactly how a boolean attribute is to be treated are given when the term is used.
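Note: A non-normative Python sketch of the default behavior described above follows. The function name parse_boolean_attribute is hypothetical, and None is used here to represent an omitted attribute. Surrounding whitespace is deliberately not trimmed, since the definition above does not call for it.

```python
from typing import Optional


def parse_boolean_attribute(value: Optional[str]) -> bool:
    """Return True only for a case-insensitive match of "true"."""
    if value is None:
        return False  # attribute omitted
    # "false" and every other value fall through to the default, False.
    return value.lower() == "true"
```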

MIME type attribute

An attribute defined as containing MIME types. A valid MIME type is one that matches the production for valid-MIME-type in the following ABNF:


valid-MIME-type = type "/" subtype *(";" parameter)

The type, subtype, and parameter tokens are defined in [RFC2045].

URI attribute

An attribute defined as containing a valid URI or a valid path. A valid URI is one that matches the URI token of [RFC3986] or the IRI token of [RFC3987]. A valid path is one that matches the path token of [RFC3986].

6.4 The `widget` Element

The widget element serves as a container for the other elements; as such, it must be used.

Contexts in which this element must be used:: This is the root element of a configuration document.
Expected children (in any order):: name: zero or one; description: zero or one; icon: zero or more; access: zero or one; author: zero or one; license: zero or one; content: one

Attributes

id: Optional. A valid URI that specifies a unique identifier specific to the widget.
version: Optional. A valid version-tag that specifies the version of the widget.
height: Optional. A valid non-negative integer greater than 0 that controls the initial height dimensions of the widget in CSS pixels [CSS21]. When the value is missing, the widget user agent will assume the value 300.
width: Optional. A valid non-negative integer greater than 0 that controls the initial width dimensions of the widget in CSS pixels [CSS21]. When the value is missing, the widget user agent will assume the value 150.

Note: The width and height attributes are only relevant to widgets with a visual output.
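Note: The following non-normative Python sketch shows one way a widget user agent might derive the initial height and width from the attribute values. The function name initial_dimension is hypothetical. Falling back to the default when the value is present but invalid (or is 0) is an assumption of this sketch; the text above only states what happens when the value is missing.

```python
import re
from typing import Optional

DEFAULT_HEIGHT = 300  # assumed when the height attribute is missing
DEFAULT_WIDTH = 150   # assumed when the width attribute is missing


def initial_dimension(raw: Optional[str], default: int) -> int:
    """Return the attribute value in CSS pixels, or the default."""
    if raw is not None and re.fullmatch(r"[0-9]+", raw) and int(raw) > 0:
        return int(raw)
    # Assumption: invalid or zero values are treated like a missing value.
    return default
```

For example, initial_dimension("200", DEFAULT_HEIGHT) yields 200, while initial_dimension(None, DEFAULT_WIDTH) yields 150.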

6.5 The `name` Element

The name element represents the human-readable name for a widget resource that can be used, for example, in application menus or in other contexts.

Contexts in which this element may be used:: In a widget element.
Content model:: Text.
Occurrences:: Zero or one.

6.6 The `description` Element

The description element represents a human-readable description of the widget.

Contexts in which this element may be used:: In a widget element.
Content model:: Text.
Occurrences:: Zero or one.

6.7 The `author` Element

An author element represents a person or an organization involved in the creation of the widget.

Contexts in which this element may be used:: In the widget element.
Content model:: Text.
Occurrences:: Zero or one.

Attributes

url: Optional. A URI attribute that represents a link that is associated with the author (eg. the homepage of the author).
email: Optional. A string that represents an email address associated with the author.

6.8 The `license` Element

The license element represents an end-user license agreement or a copyright statement.

Contexts in which this element may be used:: In the widget element.
Content model:: Text.
Occurrences:: Zero or one.

6.9 The `icon` Element

The icon element represents an icon for the widget. A widget user agent should expose an icon in a way that it is visible to the end user.

Contexts in which this element may be used:: In the widget element.
Content model:: Empty.
Occurrences:: Zero or more.

Attributes

src: Required. A valid path that points to an image inside the widget resource.

6.10 The `access` Element

The access element indicates, through its attributes, what kind of security permissions are afforded to the widget by the widget user agent. When the access element is absent, a widget user agent must deny access to networked resources and to plugins.

Contexts in which this element may be used:: In the widget element.
Content model:: Empty.
Occurrences:: Zero or one.

Attributes

network: Optional. A boolean attribute that indicates that the widget might need to access resources over HTTP.
plugins (AT RISK - attribute will be removed unless compelling use cases are found.): Optional. A boolean attribute that indicates that the widget might need browser plugins (such as Flash or Java) to function.

Note: Widget user agents are not required to support, or otherwise implement, proprietary plugins.

Author requirements: If the access element is used, at least one of its attributes must be used.

6.11 The `content` Element

The content element is used by an author to declare which resource the widget user agent will use when it instantiates the widget.

Contexts in which this element must be used:: In the widget element.
Content model:: Empty.
Occurrences:: One.

Attributes

src: Required. A URI attribute that allows an author to point to a resource via a valid path.
type: Optional. A MIME Type attribute that indicates the MIME type of the resource referenced by the src attribute. When the value is missing, the widget user agent will assume the value text/html.

7. Steps for Processing a Widget Resource

Processing a widget resource involves 7 steps that a widget user agent must follow in order, responding accordingly if any of the steps results in an error. The procedures for what to do when an error is encountered are described in each section.

Note: the following steps and associated parsing rules are written with more concern for clarity than for efficiency. As such, user agents may optimize any of the steps (or may perform them in a different order) and parsing rules, so long as the end result is indistinguishable from the result that would be obtained by following the specification.

Step 1 - Acquire a Widget Resource Over HTTP or Local Storage

A widget user agent may acquire a widget resource over HTTP or from local storage (eg. from the user's hard drive).

If attempting to acquire a widget resource over HTTP, a widget user agent must only attempt to process resources whose Content-Type is application/widget (regardless of the file extension).

Unless supporting a legacy or proprietary MIME type, all other Content-Type values are in error and the widget user agent must treat the resource as an invalid zip archive.

When acquiring a widget resource from local storage, a widget user agent must process a resource regardless of the file extension (even no file extension).

The widget user agent must verify the zip archive and its file entries.

In this example, the Content-Type is in error, so the widget user agent would not attempt to process the following widget:

Request

GET /foo.wgt HTTP/1.1
Host: www.example.com
Accept: application/widget,*/*

Response

HTTP/1.1 200 OK
Date: Tue, 04 Sep 2007 00:00:38 GMT
Last-Modified: Mon, 03 Sep 2007 06:47:19 GMT
Content-Length: 1337
Content-Type: application/x-gadget
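Note: The Content-Type check for resources acquired over HTTP can be sketched, non-normatively, in Python as follows. The function name is_acceptable_content_type is hypothetical. Ignoring any parameters after ";" and comparing case-insensitively are assumptions of this sketch, and support for legacy or proprietary types is left out.

```python
WIDGET_MIME_TYPE = "application/widget"


def is_acceptable_content_type(header_value: str) -> bool:
    """Return True if the Content-Type labels a widget resource."""
    # Assumption: drop parameters such as "; charset=..." before comparing.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == WIDGET_MIME_TYPE
```

Applied to the response above, is_acceptable_content_type("application/x-gadget") returns False, so the resource would be treated as an invalid zip archive.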

Step 2 - Verify the zip archive and its file entries

To verify that a zip archive conforms to this specification (and hence can be treated as a widget resource), a widget user agent must perform the following checks on a zip archive.

If any of these checks are true (indicating that a condition is met), then the zip archive is an invalid zip archive:

Is split into multiple files or spans multiple volumes [ZIP].
Is encrypted, denoted by the presence of archive decryption header and an archive extra data record [ZIP].
Is digitally signed using any of the digital signature methods defined in [ZIP].
Contains zero file entries [ZIP].

Next, for each file entry in the zip archive, the widget user agent must perform the following checks. If any of these checks return true (indicating that a condition is met), then the zip archive is an invalid zip archive. For each file entry in the zip archive, check the following data from the local file header:

The value of the CRC-32 field (defined in [ZIP]) fails a CRC-32 check.
The version needed to extract is greater than 20 (meaning the archive is using a feature unsupported by this specification, such as Zip64).
The value of compression method field is not one of the valid compression methods (0 or 8).
The file name field is an empty string.
The file name field contains reserved characters, or control characters.
The file name field is a sequence exclusively composed of (one or more) space characters.
The file name field is not a valid zip relative path.
The value of the extended language encoding extra field (defined in [ZIP]) is not null, is not an empty string, and does not exactly match "UTF8".
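Note: Several of the checks above can be approximated, non-normatively, with Python's standard zipfile module, as sketched below. The function name is_invalid_zip_archive is hypothetical. The sketch has known gaps: zipfile reads entry metadata from the central directory rather than from each local file header, and the checks for multiple volumes, [ZIP] digital signatures, the language encoding extra field, and the full zip-rel-path grammar are not covered.

```python
import io
import zipfile

# Reserved characters from this specification, minus "/" (the path delimiter,
# which legitimately appears in the file name field).
RESERVED = frozenset('<>:"\\|?*;')


def is_invalid_zip_archive(data: bytes) -> bool:
    """Return True if any of the sketched checks is met."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return True
    with archive:
        entries = archive.infolist()
        if not entries:
            return True  # contains zero file entries
        for info in entries:
            name = info.filename
            if info.flag_bits & 0x1:
                return True  # general purpose bit 0: entry is encrypted
            if info.extract_version > 20:
                return True  # needs a feature beyond ZIP 2.0
            if info.compress_type not in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED):
                return True  # method other than 0 or 8
            if name.strip(" \t\n\x0b\x0c\r") == "":
                return True  # empty, or only space characters
            if any(c in RESERVED or ord(c) <= 0x1F or ord(c) == 0x7F for c in name):
                return True  # reserved or control character
        # testzip() returns the name of the first entry that fails its CRC-32.
        return archive.testzip() is not None
```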

Only once the above checks have been completed, a widget user agent must attempt to locate the digital signature.

Step 3 - Locate the Digital Signature

The steps to locate the digital signature are as follows:

Let signature be null.
For each file entry in the zip-archive:
1. if the of the filename field of the current file entry case-insensitively matches 'signature.xml' then:
  1. Let signature be the result of extracting the file data from a file entry for this file entry.
  2. Terminate this algorithm and go to "Step 4 - Process the Digital Signature".
If signature is null (meaning that no signature was found), then go to "Step 5 - Locate the Configuration Document".
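
The steps above might be sketched as follows. The function name locate_signature is hypothetical, and lower-casing the file name is only one possible reading of "case-insensitively matches".

```python
import io
import zipfile
from typing import Optional


def locate_signature(archive: zipfile.ZipFile) -> Optional[bytes]:
    """Hypothetical helper: return the data of the first entry named
    'signature.xml' (compared case-insensitively), or None if there is none."""
    for info in archive.infolist():
        if info.filename.lower() == "signature.xml":
            return archive.read(info)
    return None
```

A return value of None corresponds to the case where no signature was found and processing continues with locating the configuration document.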

Step 4 - Process the Digital Signature

With the signature derived from Step 3, apply the Procedure for Verifying a Digital Signature in the [Widgets-Digsig] specification.

Step 5 - Locate the Configuration Document

The config.xml file must be located at the root folder of a widget resource.

A widget user agent must not process any config.xml inside sub-folders of the root directory.
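
A minimal sketch of this lookup, assuming that an entry at the root of the archive has the exact file name "config.xml" with no folder prefix, and assuming a case-sensitive comparison (this section does not say which comparison applies). The function name locate_config is hypothetical.

```python
import io
import zipfile
from typing import Optional


def locate_config(archive: zipfile.ZipFile) -> Optional[bytes]:
    """Hypothetical helper: return the data of the root-level config.xml.

    Entries such as 'sub/config.xml' live in a sub-folder and are skipped.
    """
    for info in archive.infolist():
        if info.filename == "config.xml":
            return archive.read(info)
    return None
```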

Internationalization model will be defined here. Still under development.

Step 6 - Process the configuration document

The config.xml file must be processed as described in the following sub-sections. The purpose of processing the configuration document is to establish the configuration defaults, which are used during instantiation (Step 7) and at runtime [Widgets-Security].

The main algorithm for processing a configuration document is given by the rules for processing a configuration document. However, that algorithm makes use of the following additional processing rules:

Rules for Getting Text Content
Rules for Parsing a Non-negative Integer
Rules for Removing Whitespace
Rules for Identifying the Content Type of an Image

Configuration Defaults

Widget user agents must assume the following defaults prior to applying the rules for processing a configuration document.

widget id: null
widget version: null
widget name: null
widget width: 150
widget height: 300
allow plugins: False
allow network: False
author name: null
author email: null
author url: null
start file: null
content-type: text/html
widget locale: The system locale as an RFC3066 language code (e.g. en-us)
icons: Empty list (Should we allow "default.png", like Dashboard?)
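
One possible way to hold these defaults in an implementation is a small record type. The class name WidgetConfig and its field names are hypothetical; the "en-us" value for the locale is only a placeholder, since a real widget user agent would substitute the system locale.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WidgetConfig:
    """Hypothetical container mirroring the configuration defaults."""
    widget_id: Optional[str] = None
    widget_version: Optional[str] = None
    widget_name: Optional[str] = None
    widget_width: int = 150
    widget_height: int = 300
    allow_plugins: bool = False
    allow_network: bool = False
    author_name: Optional[str] = None
    author_email: Optional[str] = None
    author_url: Optional[str] = None
    start_file: Optional[str] = None
    content_type: str = "text/html"
    # Placeholder only: replace with the system locale as a language code.
    widget_locale: str = "en-us"
    # default_factory gives every instance its own empty list of icons.
    icons: List[str] = field(default_factory=list)
```

Processing a configuration document then amounts to creating one instance and overwriting individual fields as valid elements and attributes are encountered.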

Rules for Processing a Configuration Document

The rules for processing a configuration document are as given in the following algorithm.

In this section, the term in error is typically used to mean that an element or attribute in a configuration document is not conforming according to the rules of this specification. Rules for exactly how an element or attribute is to be treated when it is in error are always given when the term is used. Typically the specification will say that the erroneous DOM nodes must be ignored, meaning the widget user agent will act as if erroneous nodes were absent.

A correct resource is a resource that has been verified as residing inside a widget resource.

A supported MIME Type is one that a widget user agent is able to process or otherwise render. For icons, a widget user agent must apply rules for identifying the content type of an image to acquire its MIME type and verify that it is supported.

The term text node refers to any Text node, including CDATASection nodes (any Node with node type 3 or 4) as defined in [DOM3Core].

Let doc be the result of loading config.xml as a [DOM3Core] Document using an [XMLNS]-aware parser. If the document is not well-formed [XML], then the widget resource is an invalid zip archive. (Although a document must be well-formed, it does not necessarily need to be valid [XML].)
Let root element be the documentElement of doc.
If the root element is not a widget element, then treat this widget resource as an invalid zip archive.
Otherwise,

if it is a widget element:

If this element is not in the configuration document namespace, then treat this widget resource as an invalid zip archive.

If this element has no child nodes, then treat this resource as an invalid zip archive.

Let content-nodes be the content elements in the configuration document namespace contained by this element.

If content-nodes contains no items, then treat this resource as an invalid zip archive.

If the first element in the content-nodes does not have a src attribute, then treat this resource as an invalid zip archive.

For the widget element, if the height attribute is used, apply the rules for parsing a non-negative integer to its value. If the value is not in error and greater than 0, let widget height be the value of the height attribute.

If the width attribute is used, apply the rules for parsing a non-negative integer to its value. If the value is not in error and greater than 0, let widget width be the value of the width attribute.

If the id attribute is used, and it is a valid uri, then let widget id be the value of the id attribute.

If the version attribute is used, and it is a valid version-tag, then let widget version be the value of the version attribute.
For each child in the child nodes of the root element, starting with the first to the last, if it is:

A name element

If this is the first name element encountered, then let widget name be the result of applying the rules for getting text content to this element.

A description element

If this is not the first description element, then the element is in error and must be ignored.

If this is the first description element encountered, then let widget description be the result of applying the rules for getting text content to this element.

An author element

If this is not the first author element, then the element is in error and must be ignored.

If this is the first author element encountered, then let author name be the result of applying the rules for getting text content to this element.

If the url attribute is used, and it is a valid uri, then let author url be the value of the url attribute.

If the email attribute is used, then let author email be the value of the email attribute.

A license element

If this is not the first license element, then the element is in error and must be ignored.

If this is the first license element used, then let widget license be the result of applying the rules for getting text content to this element.

An icon element

If it has a src attribute that is a valid path that points to a correct resource, then add the value of the src attribute to the list of icons. If the src attribute is missing, then the element is in error and must be ignored.

If the resource identified by the src attribute is missing, or corrupt, or in an unsupported MIME type, then this element is in error and must be ignored.

A content element

If this is not the first content element, then the element is in error and must be ignored.

If this is the first content element and it has a src attribute that is a valid path that points to a correct resource, then let the start file be the value of the src attribute. If the src attribute is missing, or in error, then a widget user agent must treat the widget resource as an invalid zip archive.

If the type attribute was used, and is a valid MIME type that is supported, then let the content-type be the value of the type attribute. If the MIME type is invalid, or unsupported by the widget user agent, then a widget user agent must treat the widget resource as an invalid zip archive.

An access element

If this is not the first access element, then the element is in error and must be ignored.

If the network attribute is used, and it is a valid boolean value, then let allow network be the value of the network attribute.

(AT RISK OF BEING REMOVED) If the plugins attribute is used, and it is a valid boolean value, then let allow plugins be the value of the plugins attribute.

A Comment node, or a ProcessingInstruction node, or a Text node that only contains space characters, or a CDATASection node that only contains space characters, or anything else

This is in error and must be ignored.
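
The overall shape of these rules can be sketched with Python's xml.etree.ElementTree, as below. This is a partial sketch under stated assumptions, not a complete implementation: process_config and InvalidWidgetResource are hypothetical names, only the width, height, name, description and content handling is shown, the integer parsing is a simplified stand-in that rejects values containing anything other than digits and whitespace, the text extraction is a simplified stand-in for the rules for getting text content, and the src value is not verified to point to a correct resource.

```python
import re
import xml.etree.ElementTree as ET

WIDGET_NS = "http://www.w3.org/ns/widgets"


class InvalidWidgetResource(Exception):
    """Raised where the rules say to treat the resource as an invalid zip archive."""


def process_config(xml_text: str) -> dict:
    config = {"width": 150, "height": 300, "name": None,
              "description": None, "start_file": None}
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:  # not well-formed XML
        raise InvalidWidgetResource(str(exc))
    # The tag includes the namespace, so this also catches a wrong namespace.
    if root.tag != f"{{{WIDGET_NS}}}widget":
        raise InvalidWidgetResource("root is not a widget element")
    contents = root.findall(f"{{{WIDGET_NS}}}content")
    if not contents or "src" not in contents[0].attrib:
        raise InvalidWidgetResource("missing content element or src attribute")
    config["start_file"] = contents[0].get("src")

    # Simplified stand-in for the rules for parsing a non-negative integer.
    for attr in ("width", "height"):
        digits = "".join(root.get(attr, "").split())
        if re.fullmatch(r"[0-9]+", digits) and int(digits) > 0:
            config[attr] = int(digits)

    seen = set()
    for child in root:
        if child.tag in seen:
            continue  # only the first element of each kind is used
        seen.add(child.tag)
        for key in ("name", "description"):
            if child.tag == f"{{{WIDGET_NS}}}{key}":
                config[key] = "".join(child.itertext())
    return config
```

Raising an exception is a design choice of this sketch; it keeps the "treat as an invalid zip archive" outcome separate from elements that are merely in error and ignored.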

Rules for Getting Text Content

The rules for getting text content are as given in the following algorithm. The algorithm always returns a string, which may be empty.

Let input be the element to be processed.
Let result be an empty string.
Ignoring any Text nodes that only contain space characters, and any CDATASection nodes that only contain space characters, let result be the concatenation of the input's text nodes, in document order, as a string:
1. If the node is an element that contains any Text nodes, recursively apply the rules for getting text content to that element and concatenate the returned value to result.
Remove from result every occurrence of U+0009 CHARACTER TABULATION (tab), U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), and U+000D CARRIAGE RETURN (CR).
Return result.

Generally speaking, the processing model described in this section involves walking through an element's children and concatenating all the Text and CDATASection nodes into a single string. This is functionally equivalent to invoking the getTextContent() DOM3 Java interface on an element (part of the Node interface), but with any CR and LF characters removed. Subsequent instances of a correct element will be ignored.
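
A minimal sketch of this processing model using Python's xml.dom.minidom is given here. The function name get_text_content is hypothetical, and the set of space characters used to detect whitespace-only nodes is an assumption of this sketch.

```python
from xml.dom import Node, minidom

# Assumed set of space characters for detecting whitespace-only nodes.
SPACE_CHARS = " \t\n\x0b\x0c\r"
# Characters removed from the final result (tab, LF, VT, FF, CR).
REMOVED_CHARS = "\t\n\x0b\x0c\r"


def get_text_content(element) -> str:
    """Hypothetical helper: concatenate Text and CDATASection data in
    document order, skipping whitespace-only nodes, then drop line breaks."""
    parts = []
    for node in element.childNodes:
        if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            if node.data.strip(SPACE_CHARS) != "":
                parts.append(node.data)
        elif node.nodeType == Node.ELEMENT_NODE:
            # Child elements contribute their own text, recursively.
            parts.append(get_text_content(node))
    text = "".join(parts)
    return "".join(ch for ch in text if ch not in REMOVED_CHARS)
```

The function would be applied to the first name, description, author or license element, for instance to doc.documentElement after minidom.parseString.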

For example, the author and blink elements would be ignored, but their Text nodes would be extracted. The resulting widget name would be "The Awesome Super Dude Widget" (the second name element is ignored):

<widget xmlns="http://www.w3.org/ns/widgets">
   <name>
     The <blink>Awesome</blink> 
     <author email="dude@example.com">Super <blink>Dude</blink></author>
     Widget</name>
   <name>I will be ignored</name>
   <content src="main.html"/>
</widget>

Rules for Parsing a Non-negative Integer

The rules for parsing a non-negative integer are as given in the following algorithm. When invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm will either return zero, a positive integer, or an error. Leading spaces are ignored. Trailing spaces and any trailing garbage characters are ignored.

1. Let input be the string being parsed.
2. Let result have the value 0.
3. Let input be the result of applying the rules for removing whitespace on input.
4. If the length of input is 0, return an error.
5. Let position be a pointer into input, initially pointing at the start of the string.
6. Let nextchar be the character in input at position.
7. If the nextchar is not one of U+0030 (0) .. U+0039 (9), then return an error.
8. If the nextchar is one of U+0030 (0) .. U+0039 (9):
  1. Multiply result by ten.
  2. Add the value of the nextchar to result.
  3. Increment position.
  4. If position is not past the end of input, go to step 6.
9. Return result.
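
The algorithm might be sketched in Python as follows. The function name parse_non_negative_integer is hypothetical, and raising ValueError stands in for "return an error". The sketch follows the prose description, in which trailing non-digit characters are ignored; that reading, and the set of space characters, are assumptions of this sketch.

```python
# Assumed set of space characters removed before parsing.
SPACE_CHARS = " \t\n\x0b\x0c\r"


def parse_non_negative_integer(value: str) -> int:
    """Hypothetical helper: parse leading ASCII digits after removing whitespace."""
    stripped = "".join(ch for ch in value if ch not in SPACE_CHARS)
    if not stripped:
        raise ValueError("empty input")
    if not "0" <= stripped[0] <= "9":
        raise ValueError("input does not start with a digit")
    result = 0
    for ch in stripped:
        if not "0" <= ch <= "9":
            break  # trailing non-digit characters are ignored
        result = result * 10 + (ord(ch) - ord("0"))
    return result
```

A caller processing the width or height attribute would keep the default value whenever ValueError is raised or the result is 0.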

Rules for Removing Whitespace

The rules for removing whitespace are as given in the following algorithm. As with the previous algorithms, when this one is invoked, the steps must be followed in the order given, aborting at the first step that returns a value. This algorithm returns a string with all space characters removed.

Let input be the string to be parsed.
Let result be the empty string.
Let position be a pointer into input, initially pointing at the start of the string.
While position doesn't point past the end of input: if the character at position is not one of the space characters, append that character to the end of result; then advance position to the next character in input.
Return result.

Rules for Identifying the Content Type of an Image

The rules for identifying the content type of an image are given by the following algorithm.

If the first bytes of the decompressed file entry's data field match one of the byte sequences in the bytes in hexadecimal column of the following table, then the MIME type of the resource is the type given in the corresponding cell in the second column on the same row:

Image MIME Type Signatures
Bytes in Hexadecimal	MIME type	Comment	Defined by
47 49 46 38 37 61	image/gif	The string "`GIF87a`", a GIF signature.	[GIF87]
47 49 46 38 39 61	image/gif	The string "`GIF89a`", a GIF signature.	[GIF89]
89 50 4E 47 0D 0A 1A 0A	image/png	The PNG signature.	[PNG]
FF D8 FF	image/jpeg	A JPEG SOI marker followed by the first byte of another marker.	[JPEG]

Widget user agents must ignore any rows for file types that they do not support.
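
The table lookup can be sketched in a few lines of Python. The function name identify_image_type and the supported parameter are hypothetical; the parameter models the requirement to ignore rows for unsupported file types.

```python
from typing import FrozenSet, Optional

# (signature bytes, MIME type) pairs taken from the table above.
IMAGE_SIGNATURES = [
    (bytes.fromhex("47 49 46 38 37 61"), "image/gif"),        # "GIF87a"
    (bytes.fromhex("47 49 46 38 39 61"), "image/gif"),        # "GIF89a"
    (bytes.fromhex("89 50 4E 47 0D 0A 1A 0A"), "image/png"),  # PNG signature
    (bytes.fromhex("FF D8 FF"), "image/jpeg"),                # JPEG SOI + marker
]

ALL_TYPES = frozenset({"image/gif", "image/png", "image/jpeg"})


def identify_image_type(data: bytes,
                        supported: FrozenSet[str] = ALL_TYPES) -> Optional[str]:
    """Hypothetical helper: return the MIME type whose signature starts
    `data`, or None if no supported row matches."""
    for signature, mime_type in IMAGE_SIGNATURES:
        if mime_type in supported and data.startswith(signature):
            return mime_type
    return None
```

Here data is the decompressed content of the file entry referenced by an icon element; a result of None would leave that icon element in error.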

Step 7 - Instantiating the Start File

To be written... this should be deferred to the security spec. I think we need a "core spec" that brings everything together.

8. Displaying Icons

To be written... will define a way to deal with icons of different sizes.

Appendix

Embedding a Widget Resource into an HTML Document

This section is informative.

This section only applies to HTML user agents [HTML4] [XHTML].

Auto-discovery enables a user agent to identify and install a widget resource that is associated with an HTML page. When a page points to a widget resource, user agents should expose the presence of the widget resource to the end-user and allow the end-user to install the widget.

The link type "widget" indicates that a link of this type references a document that is a widget resource. In HTML, it may be specified for the a, area and link elements to create a hyperlink.
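
As an informal illustration of auto-discovery, the sketch below collects the targets of widget links from an HTML page using Python's html.parser. The class name WidgetLinkFinder is hypothetical, and treating rel as a space-separated, case-insensitive list of link types is an assumption of this sketch.

```python
from html.parser import HTMLParser


class WidgetLinkFinder(HTMLParser):
    """Hypothetical helper: record href values of a, area and link
    elements whose rel attribute contains the link type "widget"."""

    def __init__(self):
        super().__init__()
        self.widget_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "area", "link"):
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "widget" in rel_tokens and href:
            self.widget_urls.append(href)
```

After calling feed() with the page markup, widget_urls holds the candidate widget resources that a user agent could offer to install.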

For example:

<a rel="widget" href="http://widgets.example.org/exampleWidget">The Example Widget</a>

RelaxNG Schema of the Configuration Document

This section is informative.

The following RelaxNG schema is a representation of the elements and attributes of the configuration document. Products that check the conformance of widget resources or configuration documents can use this schema to validate configuration documents.

# Widgets 1.0 (Working Draft) RELAX NG schema

default namespace = "http://www.w3.org/ns/widgets"
namespace xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

xmllang.att = attribute xml:lang { xsd:language }

start = widget

widget = element widget {
  xmllang.att?,
  attribute id { text }?,
  attribute version { text }?,
  attribute height { xsd:string { pattern="[1-9]\d*" } }?,
  attribute width { xsd:string { pattern="[1-9]\d*" } }?,
  ( name? & 
    description? &  
    icon* & 
    access? &       
    author? &       
    license? &      
    content )
}

name = element name {
  xmllang.att?,
  text
}

description = element description {
  xmllang.att?,
  text
}

icon = element icon {
  attribute src { xsd:anyURI },
  empty
}

access = element access {
  ( attribute network { "true" | "false" } |
    attribute plugins { "true" | "false" } |
    ( attribute network { "true" | "false" },
      attribute plugins { "true" | "false" } ) ),
  empty
}

author = element author {
  xmllang.att?,
  attribute url { xsd:anyURI }?,
  attribute email { xsd:string { pattern=".*@.*" } }?,
  text
}

license = element license {
  xmllang.att?,
  text
}

content = element content {
  attribute src { xsd:anyURI },
  attribute type { text }?,
  empty
}

Acknowledgements

Special thanks go to Arve Bersvendsen, Anne van Kesteren and Charles McCathieNevile who helped edit the initial version of this specification. Special thanks also to David Håsäther for creating and maintaining the RelaxNG Schema for the configuration document.

Parts of this document reproduce text and behavior from the HTML5 specification and from the XBL 2.0 specification (as permitted by both specifications by their copyright).

The editor would also like to thank the following people for their contributions to this specification:

Arthur Barstow
Brian Wilson
Bjoern Hoehrmann
Cameron McCormack
David Pollington
Dean Jackson
Gautam Chandna
Geir Pedersen
Gorm Haug Eriksen
Guido Grassel
Ian Hickson
Jon Ferraiolo
Jouni Hakala
Lachlan Hunt
Marc Silbey
Olli Immonen
Tex Texin
Thomas Roessler

References

This section will be completed as the document matures.

[CP437]: cp437_DOSLatinUS to Unicode table. ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
[Deflate]: DEFLATE Compressed Data Format Specification version 1.3, P. Deutsch, The Internet Society, May 1996.
[DOM3Core]: Document Object Model (DOM) Level 3 Core Specification, A. Le Hors, P. Le Hégaret, L. Wood, G. Nicol, J. Robie, M. Champion, S. Byrne, editors. World Wide Web Consortium, April 2004.
[ABNF]: Augmented BNF for Syntax Specifications: ABNF. D. Crocker. The Internet Society, October 2005.
[CSS21]: Cascading Style Sheets, level 2 revision 1; CSS 2.1 Specification. Bert Bos, Ian Hickson, Tantek Çelik, Håkon Wium Lie. W3C, April 2006.
[RFC2045]: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, RFC 2045, N. Freed, N. Borenstein, November 1996.
Available at: http://www.ietf.org/rfc/rfc2045.txt
[RFC2119]: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997.
[RFC2822]: Internet Message Format. P. Resnick, IETF, April 2001.
[RFC3986]: Uniform Resource Identifier (URI): Generic Syntax
[RFC3987]: Internationalized Resource Identifiers (IRIs). M. Duerst, M. Suignard. IETF, January 2005.
[UTF-8]: RFC 2279. UTF-8, a transformation format of ISO 10646. F. Yergeau. January 1998. http://www.ietf.org/rfc/rfc2279.txt
[Widgets-APIs]: Widgets 1.0: API's and Events. A. Bersvendsen and M. Caceres, Eds. TBP.
[Widgets-Digsig]: Widgets 1.0: Digital Signature. M. Caceres, Ed. W3C, TBP.
[Widgets-Reqs]: Widgets 1.0 Requirements, M. Caceres, Ed. W3C, July 2007.
[Widgets-Security]: .... coming soon.
[Widgets-Updates]: ... coming soon.
[Widgets-Landscape]: ....
[XML]: Extensible Markup Language (XML) 1.0 (Third Edition). T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau. W3C, February 2004.
[XMLDsig]: XML-Signature Syntax and Processing. Mark Bartel, John Boyer, Barb Fox, Brian LaMacchia and Ed Simon, authors. Donald Eastlake, Joseph Reagle, and David Solo, editors. W3C, February 2002.
[XMLNS]: Namespaces in XML (Second Edition), T. Bray, D. Hollander, A. Layman, R. Tobin. W3C, August 2006. The latest version of the Namespaces in XML specification is available at http://www.w3.org/TR/REC-xml-names/
[ZIP]: .ZIP File Format Specification. PKWare Inc.
[PNG]: Portable Network Graphics (PNG) Specification (Second Edition), David Duce, ed., 10 November 2003. Available at http://www.w3.org/TR/PNG/.
[GIF87]: Graphics Interchange Format. CompuServe Incorporated, June 15, 1987. http://www.w3.org/Graphics/GIF/spec-gif87.txt
[JPEG]: ISO/IEC 10918. Information Technology Digital Compression And Coding Of Continuous-tone Still Images International Organization for Standardization (ISO).
[GIF89]: Graphics Interchange Format (sm). CompuServe Incorporated, 1990. Available at http://www.martinreddy.net/gfx/2d/GIF89a.txt
[HTML4]: HTML 4.01 Specification, D. Raggett, A. Le Hors, I. Jacobs, 24 December 1999. Latest version available at: http://www.w3.org/TR/html401
[XHTML]: "XHTML 1.0: The Extensible HyperText Markup Language", W3C Recommendation, S. Pemberton et al., 26 January 2000.
Available at: http://www.w3.org/TR/2000/REC-xhtml1-20000126
[ECMAScript]: ECMAScript Language Specification, Third Edition. ECMA, December 1999.
[Unicode]: The Unicode Consortium. The Unicode Standard, Version 4.0. Reading, Mass.: Addison-Wesley, 2003, as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database).
[XMLHttpRequest]

