W3C: WD-form-filling-960416

Client Side Automated Form Entry

W3C Working Draft WD-form-filling-960416

This version:
http://www.w3.org/pub/WWW/TR/WD-form-filling-960416.html
Latest version:
http://www.w3.org/pub/WWW/TR/WD-form-filling.html
Author:
Phillip M. Hallam-Baker <hallam@w3.org>

Status of this document

This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at: http://www.w3.org/pub/WWW/TR

Note: since working drafts are subject to frequent change, you are advised to reference the above URL, rather than the URLs for working drafts themselves.

Abstract

The same information is frequently requested on multiple forms. A mechanism is proposed which avoids the need for repetitive re-entry of the same data. Unlike previous proposals, this mechanism does not depend on the compilation of a dictionary of common terms. Each content provider is able to define private dictionaries where desirable. The scheme remains tractable since the interests of content providers will encourage use of established schemes whenever possible.

Introduction.

HTML forms have proved a highly useful extension to HTML. Unfortunately it is frequently necessary to re-enter the same information into many different forms, in particular personal identification information such as name and contact data.

While it is not possible to compile a list of all the information which might possibly be duplicated, it is possible to identify the fields in a form which are likely to be entered in another.

Subscription at Content Provider.

Many sites require or encourage users to go through some type of subscription procedure. In many cases these forms ask the user for the same set of data, which the user must retype each time. Providing the user with a rapid and painless means of entering this data reduces the cost of subscription and consequently increases the likelihood that a user will subscribe.

The type of data requested for subscriptions is generally related to identity and communication. In addition internet services are frequently interested in the nature of the connectivity their users have.

Similar requirements are the need to leave a "business card", or a delivery or billing address, at a site.

Workflow Application

In an Intranet a worker may be required to fill in multiple forms with company-specific fields such as a worker identification number. A company may also assign identifiers for use by people outside its organisation, such as a customer identification number or task tracking number.

An important special case of the need to fill out the same data in multiple forms is submission of federal and state tax returns.

A convention for labeling of form fields would also be very useful for interfacing automata to HTML forms designed for human use.

URIs vs Central Registration

Proposals for standardized HTML field names have been suggested since the first implementations of Web based forms. A recurring problem with such proposals was the difficulty of specifying a set of standardized field names which contained all the fields which might be commonly used.

Construction of a universal lexicon is a complex and difficult task. It is essential to distribute the task in a manner which permits individuals to add to the registry independently. In the Web the general approach has been to use the IANA DNS database as the sole name registry, since registering with DNS is effectively a requirement for use of the Web.

The approach taken in this proposal is to provide a mechanism which is capable of supporting an arbitrary number of labeling schemes. In practice however it is highly likely that users will wish to reuse existing schemes wherever possible since that will maximize their effectiveness.

Mechanism.

The mechanism consists of two parts: extensions to HTML to permit a collection of related default values to be declared, and an interface to some form of persistent storage. Since the data entered is likely to be of a personal nature, privacy issues must be addressed.

HTML Extensions.

The binding attribute is used to identify a set of bindings of form fields to client specific values. A binding is used to provide a default value if the extract attribute is specified in an input, select or textarea tag.

<PRE>
<FORM BINDING="http://www.w3.org/bindings/generic">
Full Name: <INPUT extract name="full-name">
Email:     <INPUT extract name="email">
Phone:     <INPUT extract name="telephone-work">
<INPUT name="">
</FORM>
</PRE>
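The way a client might act on this markup can be sketched as follows. This is a minimal illustration only, not part of the proposal: the class ExtractFieldCollector, the function default_values and the in-memory store keyed on (binding URI, field name) are hypothetical names, and the stored values are invented sample data.

```python
from html.parser import HTMLParser


class ExtractFieldCollector(HTMLParser):
    """Collects (binding URI, field name) pairs for fields flagged with extract."""

    def __init__(self):
        super().__init__()
        self.form_binding = None  # binding declared on the enclosing FORM
        self.fields = []          # (binding, name) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "form":
            self.form_binding = attrs.get("binding")
        elif tag in ("input", "select", "textarea") and "extract" in attrs:
            # A binding on the field itself takes precedence over the form's.
            binding = attrs.get("binding") or self.form_binding
            name = attrs.get("name")
            if binding and name:
                self.fields.append((binding, name))

    def handle_endtag(self, tag):
        if tag == "form":
            self.form_binding = None


def default_values(html, store):
    """Return {field name: stored value} for every extract field with a stored value."""
    collector = ExtractFieldCollector()
    collector.feed(html)
    collector.close()
    return {
        name: store[(binding, name)]
        for binding, name in collector.fields
        if (binding, name) in store
    }


GENERIC = "http://www.w3.org/bindings/generic"
FORM = """<FORM BINDING="http://www.w3.org/bindings/generic">
Full Name: <INPUT extract name="full-name">
Email:     <INPUT extract name="email">
Phone:     <INPUT extract name="telephone-work">
</FORM>"""

# Hypothetical stored values; no work telephone number has been saved yet.
store = {
    (GENERIC, "full-name"): "A. N. Other",
    (GENERIC, "email"): "another@example.org",
}
defaults = default_values(FORM, store)
```

Fields with no stored value are simply left without a default, so the user enters them by hand as usual.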

The binding attribute may be specified for individual fields. This permits a form to incorporate information from multiple bindings.

<PRE>
<FORM BINDING="http://www.w3.org/bindings/generic">
Full Name: <INPUT extract name="full-name">
Email:     <INPUT extract name="email">
Connection speed:
<INPUT TYPE="radio" extract NAME="connection-speed" 
	BINDING="http://www.w3.org/bindings/connection" VALUE="14.4"> 14.4 Kbps or lower
<INPUT TYPE="radio" NAME="connection-speed" VALUE="28.8"> 28.8 Kbps
<INPUT TYPE="radio" NAME="connection-speed" VALUE="256"> 256 Kbps
<INPUT TYPE="radio" NAME="connection-speed" VALUE="t1"> T1 Line
<INPUT TYPE="radio" NAME="connection-speed" VALUE="t3"> T3 Line
<INPUT TYPE="radio" NAME="connection-speed" VALUE="unknown"> Don't know
Service type:
<INPUT TYPE="radio" NAME="service-type" 
	BINDING="http://www.w3.org/bindings/connection" VALUE="isp"> ISP

</FORM>
</PRE>

Note that in this case the extract flag and binding attribute need only be defined for one of the "radio button" alternatives.
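One possible reading of this rule is sketched below: the extract flag and binding found on any one alternative are applied to every radio button sharing the same name. The field dictionaries, the function names and the stored values are assumptions made for illustration; the draft does not prescribe a data model.

```python
def radio_group_bindings(fields, form_binding):
    """Map each radio group name to its binding URI, if any alternative sets extract."""
    groups = {}
    for field in fields:
        if field.get("type") == "radio" and field.get("extract"):
            # A binding on the field takes precedence over the form's binding.
            groups[field["name"]] = field.get("binding") or form_binding
    return groups


def checked_alternatives(fields, form_binding, store):
    """Return {group name: value} for groups whose stored value matches an alternative."""
    groups = radio_group_bindings(fields, form_binding)
    checked = {}
    for field in fields:
        if field.get("type") != "radio":
            continue
        binding = groups.get(field["name"])
        if binding is None:
            continue  # no alternative in this group carries the extract flag
        if store.get((binding, field["name"])) == field.get("value"):
            checked[field["name"]] = field["value"]
    return checked


GENERIC = "http://www.w3.org/bindings/generic"
CONNECTION = "http://www.w3.org/bindings/connection"

# A cut-down version of the form above, already parsed into dictionaries.
fields = [
    {"type": "radio", "name": "connection-speed", "value": "14.4",
     "extract": True, "binding": CONNECTION},
    {"type": "radio", "name": "connection-speed", "value": "28.8"},
    {"type": "radio", "name": "connection-speed", "value": "t1"},
    {"type": "radio", "name": "service-type", "value": "isp",
     "binding": CONNECTION},  # binding present, but no extract flag
]
store = {
    (CONNECTION, "connection-speed"): "28.8",
    (CONNECTION, "service-type"): "isp",
}
checked = checked_alternatives(fields, GENERIC, store)
```

In this sketch the service-type group is not filled in, because none of its alternatives carries the extract flag even though a binding is given.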

Automatic entry of forms data must not be performed for hidden fields, read only fields, password fields or any other field that the user does not normally use for data entry. This means that the DTD should not permit use of the extract attribute with the type values hidden or password.
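A client-side check for this restriction might look like the sketch below. The function name may_autofill is hypothetical, and treating a readonly attribute as the marker for a read only field is an assumption: the draft does not say how such fields are identified.

```python
# Field types that must never receive automatic entry, per the text above.
NON_ENTRY_TYPES = {"hidden", "password"}


def may_autofill(attrs):
    """Decide whether a field may be given a stored default value.

    attrs maps lowercase attribute names to values; a valueless flag such as
    extract maps to None.
    """
    if "extract" not in attrs:
        return False  # the form author did not request a default
    field_type = (attrs.get("type") or "text").lower()
    if field_type in NON_ENTRY_TYPES:
        return False
    if "readonly" in attrs:  # assumption: this attribute marks read only fields
        return False
    return True
```

The check is deliberately made in the client as well as in the DTD, so that a malformed document cannot obtain a default for a hidden or password field.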

ISSUE: Should there be a means of flagging fields to individually require user confirmation for each entry? E.g. we might want a parcel tracking number to default to the last tracking number used but ask the user if it really is the same parcel.

ISSUE: Should there be a way to add data into the database except via a fill in form entry? This would permit a vendor to provide the tracking number to be used when making enquiries with the delivery service. The extract attribute might be generalized to a store=extract,insert form. This would also make it possible for a form to specify that an entry should not be saved.

Database Storage.

Some form of persistent storage is required to store the values entered by the user. In the simplest implementation this storage need only be indexed on the binding URI. In more sophisticated implementations designed to simplify Intranet workflow

ISSUE: Should advice be given on the nature of the database API to encourage providing information likely to be needed in future? The URI of the source document is likely to be the minimum required by an agent. Providing all the data in the relevant tag fields is likely to be very useful.
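As a rough sketch of the simplest case, the store below is indexed on the binding URI and field name and also records the URI of the source document, as the issue above suggests. The class BindingStore, its methods, the table layout and the example.com URIs are all hypothetical; the draft does not define a database API.

```python
import sqlite3


class BindingStore:
    """Persistent store of entered values, indexed on binding URI and field name."""

    def __init__(self, path=":memory:"):
        # Pass a file name instead of ":memory:" to keep values between sessions.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            " binding TEXT NOT NULL,"
            " field TEXT NOT NULL,"
            " value TEXT NOT NULL,"
            " source_uri TEXT,"
            " PRIMARY KEY (binding, field))"
        )

    def save(self, binding, field, value, source_uri=None):
        """Record the value the user entered, replacing any earlier one."""
        self.db.execute(
            "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
            (binding, field, value, source_uri),
        )
        self.db.commit()

    def lookup(self, binding, field):
        """Return the stored value, or None if nothing has been saved."""
        row = self.db.execute(
            "SELECT value FROM entries WHERE binding = ? AND field = ?",
            (binding, field),
        ).fetchone()
        return row[0] if row else None


GENERIC = "http://www.w3.org/bindings/generic"
db = BindingStore()
db.save(GENERIC, "email", "old@example.org", "http://example.com/subscribe")
db.save(GENERIC, "email", "new@example.org", "http://example.com/order")
```

Keying on the pair means that two bindings may both define a field called email without their values colliding.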

Privacy Concerns.

Automated filling in of forms raises a number of privacy concerns. In general automated form entry should be a convenience to the user only. It should not permit a server to obtain information without the explicit permission of the user.
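One way to honour this principle is to place the user between the store and the form, as in the sketch below. The function release_defaults and the confirm callback are hypothetical; a real browser would present a dialogue rather than call a function.

```python
def release_defaults(candidates, confirm):
    """Keep only the stored values that the user explicitly agrees to release.

    candidates maps field names to stored values; confirm is a callable taking
    (field, value) and returning True only if the user gives permission.
    """
    return {
        field: value
        for field, value in candidates.items()
        if confirm(field, value)
    }


candidates = {"full-name": "A. N. Other", "telephone-home": "555 0100"}

# Stand-in for a user who agrees to give a name but not a home phone number.
released = release_defaults(candidates, lambda field, value: field == "full-name")
```

Values that are not released are never placed in the form, so they cannot be submitted by accident.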

Summary of privacy recommendations.

Common Binding Sets.

This note defines two binding sets for common use. The first is a generic binding intended to cover most subscription type applications. The second provides internet specific information.

Generic Binding Set.

It is convenient to pre-define a collection of fields for generic identification purposes. This set will be assigned a URI within the w3.org space.

The omission of US social security numbers from this list is intentional. Social security numbers are issued by the US government for their own use and unauthorised usage is strongly discouraged. Accordingly it is inappropriate for such an index to be included in a generic list. For similar reasons there is no default password field.

Identity information.
full-name
Full name.
first-name
First name.
middle-name
Middle name.
middle-initial
Middle initials.
last-name
Last name.
country-residence
Country of residence.
nationality
Nationality.
date-of-birth
Date of birth.
username
Username (i.e. nickname).
Contact Information
email
Email address.
home-page
Home page.
telephone-work
Work phone number.
telephone-home
Home phone number.
telephone-mobile
Mobile phone number.
telephone-fax
Fax number.
Postal Information
address
The full postal address.
address-country
The postal country. Note that this may be different from country of residence, especially for military personnel posted overseas.
address-shire
Most countries are divided into counties. The US is divided into both states and counties, with states being the more relevant postal division. In order to avoid confusion the English term shire is used. The shires were ruled by independent kings and princes between the Roman and Norman occupations.
address-town
Postal town or city.
address-street
Postal address street and number information.
address-code
Postal code number.
Postal Information (billing)
billing-address
Billing version of address
billing-address-country
Billing version of address-country
billing-address-shire
Billing version of address-shire
billing-address-town
Billing version of address-town
billing-address-street
Billing version of address-street
billing-address-code
Billing version of address-code
Organisational Information.
organization
Organizational affiliation.
position
Organisational position.

Internet Connection Binding.

Enquiries concerning a subscriber's internet connection are the second most frequently recurring collection of fields.

connection-speed 14.4 | 28.8 | 256 | t1 | t3 | unknown
Speed of user's connection.
weekly-usage
Number of hours the user uses the internet per week.
net-years-usage
Number of years the user has been using the internet.
web-years-usage
Number of years the user has been using the Web.
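A client or agent might check stored values against this binding before offering them as defaults. The sketch below is illustrative only: the function name is hypothetical, and requiring the three usage fields to be non-negative numbers is an assumption, since the draft lists permitted values only for connection-speed.

```python
# Permitted values for connection-speed, as listed in the binding above.
CONNECTION_SPEEDS = {"14.4", "28.8", "256", "t1", "t3", "unknown"}

# Assumption: these fields hold non-negative numbers (hours or years).
NUMERIC_FIELDS = {"weekly-usage", "net-years-usage", "web-years-usage"}


def valid_connection_value(field, value):
    """Return True if value is acceptable for the named connection binding field."""
    if field == "connection-speed":
        return value in CONNECTION_SPEEDS
    if field in NUMERIC_FIELDS:
        try:
            return float(value) >= 0
        except ValueError:
            return False
    return False  # the field is not part of this binding
```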

Further Work.

The present proposal covers data entry into forms. It would be convenient to be able to encode data in an HTML document in a form which allowed it to be accessed in machine readable form. For example, people might start encoding their business card information into their homepage so that it can be captured and automatically entered into a database by an intelligent browser.

The binding attribute provides a generic method of binding assertions to an HTML form. There is no reason why those assertions should be limited to default values. Since the binding URI could be resolvable, this could provide a variety of information concerning the form entry process.

In the simplest scheme the binding might resolve to a collection of type and consistency constraints on the form data. Use of such constraints would provide functionality similar to that of JavaScript [Netscape] but with the advantage of permitting constraint specification to be separated from the document itself.

In a more sophisticated scheme the binding might be defined through a knowledge exchange protocol. A simple knowledge exchange scheme might be used to provide context sensitive help to the user amongst other uses.

More sophisticated knowledge exchange protocols might automate the collection and organisation of material relevant to a form. This might be used to simplify tasks such as filling in a tax return: information would be gathered automatically whenever a tax relevant transaction occurred during the year, and the data would then be collated according to instructions provided by the binding URI. Essentially the binding URI provides a linkage between HTML and an intelligent agent without requiring the capability of parsing unrestricted text.

Acknowledgments

Thanks are due to Dan Connolly, John Mallory, Rohit Khare. [Expand].
