W3C

Shop Log File Format

W3C NOTE 15 November 2000


This version:
http://www.w3.org/TR/2000/NOTE-shoplogfileformat-20001115
Latest version:
http://www.w3.org/TR/shoplogfileformat
Authors:
Erik Moeser, Exody E-Business Intelligence GmbH <Moeser@exody.net>


Abstract

Exody E-Business Intelligence GmbH has created a new log file format which logs all visitor and customer movements and revenue-related activities in online shops. Analysis of this data reveals the shop's strengths and weaknesses. The shop log file is written by the shop, not by the web server.

The log file format is essentially compatible with the W3C Extended Log File Format, as supported by commonly used web servers such as Microsoft Internet Information Server (IIS), Version 4 and later.


Status of this Document

This document is a submission to the World Wide Web Consortium from Exody and Intershop (see Submission Request, W3C Staff Comment). For a full list of all acknowledged Submissions, please see Acknowledged Submissions to W3C.

This document is a NOTE made available by W3C for discussion only. This indicates no endorsement of its content, nor that W3C has had any editorial control in its preparation, nor that W3C has, is, or will be allocating any resources to the issues addressed by the NOTE.

A list of current W3C technical documents can be found at the Technical Reports page.

Special features of the format

  • This format can be analyzed by all conventional log file analysis tools because it is compatible with the W3C specification.
  • This format integrates easily with standard shop systems or shop applications (e.g. those based on Java, Allaire ColdFusion or Microsoft Active Server Pages (ASP)).
  • This format is already supported by leading shop system vendors such as AEON, Hybris, Internolix, Intershop, Openshop and SAP.




Specification



This format is an extension of W3C Working Draft WD-logfile-960323.



Header



The first entry in the log file is the header.


#Software: Name and version of server (e.g. Some-Shop 2.0)
#Version: 1.0 (Version of specification, not of the shop system!)
#Date: yyyy-mm-dd hh:mm:ss (Date and time of log file creation)
#Fields: date time c-ip cs-customer-id cs-method cs-uri-stem cs-uri-query cs(User-Agent) cs(Referer)


Please refer to WD-logfile-960323 (http://www.w3.org/TR/WD-logfile.html) for the exact specification of the header. The fields listed in this example are the minimum that an implementation of the log file must include. Further fields can be added; see WD-logfile-960323 for examples.
The field cs-customer-id is the only non-standard field added by this specification. All the other shop specific information is encoded in the standard fields cs-method, cs-uri-stem and cs-uri-query.
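
As an illustration, the header could be produced as in the following minimal Python sketch. The names `build_header` and `MINIMAL_FIELDS` are hypothetical and not part of this specification; the field list is the minimal one given above, and the software name is the placeholder from the header description.

```python
from datetime import datetime

# Minimal field list required by this specification (further fields may follow).
MINIMAL_FIELDS = [
    "date", "time", "c-ip", "cs-customer-id", "cs-method",
    "cs-uri-stem", "cs-uri-query", "cs(User-Agent)", "cs(Referer)",
]


def build_header(software, created, fields=MINIMAL_FIELDS):
    """Return the four header lines as one string ending in a newline."""
    lines = [
        f"#Software: {software}",
        "#Version: 1.0",  # version of the specification, not of the shop system
        f"#Date: {created.strftime('%Y-%m-%d %H:%M:%S')}",
        "#Fields: " + " ".join(fields),
    ]
    return "\n".join(lines) + "\n"


header = build_header("Some-Shop 2.0", datetime(1999, 9, 7, 12, 53, 20))
print(header, end="")
```

A shop that logs additional fields would pass an extended list as `fields`, keeping the minimal fields in place.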



Body

Following the header, each "access" is recorded on a line by itself using the fields specified in the header above. Fields are separated by spaces. The following table presents an overview of the individual fields. The field "cs-customer-id" is newly introduced by this specification. The fields "cs-method", "cs-uri-stem" and "cs-uri-query" are standard fields, as known from web server log files, with extended meanings. This allows the log file to be analyzed with conventional analysis tools while enabling evaluations specific to shop operations.

| Field Name | Syntax | Content | Standard Field |
|---|---|---|---|
| date | yyyy-mm-dd | date of access | yes |
| time | hh:mm:ss | time of access | yes |
| c-ip | 123.45.67.89 | client IP address | yes |
| cs(User-Agent) | | agent | yes |
| cs(Referer) | | referer | yes |
| cs-customer-id | | customer number in shop system or character "-" | no |
| cs-method | GET, SEARCH, PROD, ADDBI or ORDER | see below | extended |
| cs-uri-stem | /page name [or] /event name | see below | extended |
| cs-uri-query | see below | see below | extended |


 


About the field cs-method



The field cs-method is set to
  • "GET" to denote events corresponding to pages that are not covered by any of the following values
  • "SEARCH" when a visitor does an on-site search, e.g. for a certain product
  • "PROD" to signify a product description view
  • "ADDBI" when a visitor adds a product to a shopping basket/cart, e.g. by clicking on "add product to basket"
  • "ORDER" when a visitor finalizes the order of the contents of the shopping basket
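
A body line could be assembled along the following lines. This is a sketch under stated assumptions: `format_entry` and `METHODS` are hypothetical names, the log uses the minimal field order from the header, and all text values passed in are already encoded so that they contain no spaces.

```python
from datetime import datetime

# The five values of cs-method defined by this specification.
METHODS = {"GET", "SEARCH", "PROD", "ADDBI", "ORDER"}


def format_entry(when, client_ip, customer_id, method, uri_stem,
                 uri_query="-", user_agent="-", referer="-"):
    """Build one log line in the minimal field order, separated by spaces."""
    if method not in METHODS:
        raise ValueError(f"unknown cs-method: {method!r}")
    fields = [
        when.strftime("%Y-%m-%d"),   # date
        when.strftime("%H:%M:%S"),   # time
        client_ip,                   # c-ip
        customer_id or "-",          # cs-customer-id, "-" if unknown
        method,                      # cs-method
        uri_stem,                    # cs-uri-stem
        uri_query,                   # cs-uri-query
        user_agent,                  # cs(User-Agent)
        referer,                     # cs(Referer)
    ]
    return " ".join(fields)


line = format_entry(datetime(1999, 9, 7, 13, 41, 41),
                    "192.168.1.179", "1001", "GET", "/Catalog")
print(line)
```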




About the field cs-uri-query



The field cs-uri-query consists of several sub-parameters separated by the delimiter "&". Which sub-parameters occur depends on the value of the field cs-method:
  • GET: a minus character "-"
  • SEARCH: the search query of the on-site search (e.g. "Douglas%20Adams")
  • PROD: Product_ID&Name&Category
  • ADDBI: Product_ID&Name&Category
  • ORDER: TotalPrice&Payment&Shipping, followed for each order position by
        &Product_ID&Name&Category&Units&NetUnitPrice&NetPurchasePrice
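
The layouts above could be assembled as in this sketch. The helper names `product_query` and `order_query` are hypothetical, the sample values are illustrative only, and text values are assumed to be already encoded so that they contain no "&" or space.

```python
def product_query(product_id, name, category="-"):
    """cs-uri-query for PROD and ADDBI: Product_ID&Name&Category."""
    return "&".join([product_id, name, category])


def order_query(total_price, payment, shipping, positions):
    """cs-uri-query for ORDER: three order values, then six per position."""
    parts = [total_price, payment, shipping]
    for product_id, name, category, units, unit_price, purchase_price in positions:
        parts += [product_id, name, category, str(units), unit_price, purchase_price]
    return "&".join(parts)


prod = product_query("10800-1", "French+Language+Courses", "/Language+Courses")
order = order_query(
    "200.0000", "Mastercard", "UPS",
    [("10800-1", "French+Language+Courses", "/Language+Courses",
      1, "200.0000", "125.0000")],
)
print(prod)
print(order)
```

An ORDER query therefore always has 3 + 6n sub-parameters for n order positions, which gives a parser a simple consistency check.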


Descriptions of the sub-parameters:

| Sub-Parameter | Description | Example |
|---|---|---|
| Product_ID | ID of a given product. | 10800-1 |
| Name | Name of a given product. | French Language Courses |
| Category | Shop category of a given product. If shop categories are organized in a tree structure, the category can be recorded like a directory path. If the shop does not contain any categories, the parameter is set to the minus character "-". | /Electronics/Music |
| TotalPrice | Total price of a given order, including all applicable discounts. Discounts may apply to the order as a whole, but must then not be divided among products. The total price does NOT include VAT or fees for shipping and handling. | 839.48 |
| Payment | Payment method used for a given order. | Mastercard |
| Shipping | Shipping method used for a given order. | UPS |
| Units | Number of units ordered of a given product. | 3 |
| NetUnitPrice | Price per unit at which the merchant sells a given product to a customer. The NetUnitPrice does NOT include VAT or discounts that apply to the order as a whole. Any discounts given for a bulk order are calculated for and applied to the NetUnitPrice. | 29.95 |
| NetPurchasePrice | Price per unit at which the merchant has purchased a given product. The NetPurchasePrice does not include VAT. If a NetPurchasePrice is not available, the log file records zero "0". | 19.95 |


 

Notes
  • All prices are net.
  • The specification does not prescribe a currency, but the same currency must be used throughout.
  • There is no currency symbol recorded.
  • Prices are recorded with up to four decimals, e.g. 123.45 or 5.6789.
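
One possible way to render prices in line with these notes is sketched below. The helper `format_price` is hypothetical; writing exactly four decimals and rounding half up are assumptions of this sketch, since the notes only require "up to four decimals" and do not name a rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_price(value):
    """Format a net price with four decimals and no currency symbol."""
    # Going through str() avoids binary floating-point artifacts for floats.
    amount = Decimal(str(value)).quantize(Decimal("0.0001"),
                                          rounding=ROUND_HALF_UP)
    return f"{amount:f}"


print(format_price("543.48"))
print(format_price(5.6789))
```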


Encoding of parameters and subparameters



WD-logfile-960323 recommends separating fields by white space and enclosing text fields (which can contain white space) in double quotes ("). However, separating fields by a space and URL-encoding text fields (replacing a space with a plus) seems to be more common, although it is not mentioned in the WD. For example, Microsoft Internet Information Server (one of the most widely used web servers today, according to the Netcraft survey) uses this encoding. We therefore recommend the following encoding:
  • Separate fields by a space
  • Do not enclose text fields in quotes
  • To ensure correct recognition of fields, text fields should be encoded with at least the following subset of the URL encoding:

    | Character | Code |
    |---|---|
    | [Space] | + |
    | + | %2B |
    | % | %25 |

  • To ensure correct recognition of subparameters of the field cs-uri-query, text subparameters of the field cs-uri-query should be encoded with at least the following subset of the URL encoding:

    | Character | Code |
    |---|---|
    | [Space] | + |
    | + | %2B |
    | & | %26 |
    | % | %25 |



For example, the character string "Abc & 100%" would thus be encoded as "Abc+%26+100%25".
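
The recommended minimal encoding could be implemented as in the following sketch. The function names `encode_field` and `encode_subparameter` are hypothetical; decoding uses `urllib.parse.unquote_plus` from the Python standard library, which reverses both the "+" and the "%XX" replacements.

```python
from urllib.parse import unquote_plus


def encode_field(text):
    """Encode a text field so that it contains no spaces.

    "%" is replaced first so that the escapes introduced afterwards
    are not encoded a second time.
    """
    return text.replace("%", "%25").replace("+", "%2B").replace(" ", "+")


def encode_subparameter(text):
    """Encode a text subparameter of cs-uri-query ("&" is escaped as well)."""
    return encode_field(text).replace("&", "%26")


encoded = encode_subparameter("Abc & 100%")
print(encoded)                # Abc+%26+100%25
print(unquote_plus(encoded))  # Abc & 100%
```

The order of the replacements matters: escaping "%" last would corrupt the "%2B" and "%26" sequences produced by the earlier steps.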




Example Log File According to the Specified Format



This example illustrates the specified format in a hypothetical sample shop. Note: Due to the page margins, the present document contains line wraps not found in log files. The log file will only record a carriage return to indicate a new access; each line thus corresponds to one access.

#Software: XYZshop 4.1
#Version: 1.0
#Date: 1999-09-07 12:53:20
#Fields: date time c-ip cs-customer-id cs-method cs-uri-stem cs-uri-query cs(User-Agent) cs(Referer)
1999-09-07 13:41:41 192.168.1.179 1001 GET /Catalog - - -
1999-09-07 13:42:02 192.168.1.179 1001 ADDBI /Basket 10806&Supersonic+6000+Stereo+System&/Electronics/Music - -
1999-09-07 13:42:09 192.168.1.179 1001 GET /List - - -
1999-09-07 13:42:12 192.168.1.179 1001 ADDBI /Basket 10338&HealthRider+Home+Pro+-+Chrome&/Sports/Equipment - -
1999-09-07 13:42:46 192.168.1.179 1001 PROD /Product 10328&Aiwa+AM/FM+Stereo&/Electronics/Music - -
1999-09-07 13:43:16 192.168.1.179 1001 GET /ViewBookmarks - - -
1999-09-07 13:43:37 192.168.1.179 1001 PROD /Product 10800-1&French+Language+Courses&/Language+Courses - -
1999-09-07 13:43:40 192.168.1.179 1001 ADDBI /Basket 10800-1&French+Language+Courses&/Language+Courses - -
1999-09-07 13:43:58 192.168.1.179 1001 GET /SelectPaymentMethod - - -
1999-09-07 13:44:04 192.168.1.179 1001 GET /MemberOrderInformation - - -
1999-09-07 13:44:06 192.168.1.179 1001 GET /Password - - -
1999-09-07 13:44:11 192.168.1.179 1001 ORDER /OrderConfirmation 893.48&5&UPS+-+US+2nd+Day+Air&10806&Supersonic+6000+Stereo+System&/Electronics/Music&1&150.0000&70.0000&10338&HealthRider+Home+Pro+-+Chrome&/Sports/Equipment&1&543.4800&434.7800&10800-1&French+Language+Courses&/Language+Courses&1&200.0000&125.0000 - -
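
An analysis tool could read such lines roughly as follows. This is a sketch, not a reference parser: `parse_line` and `parse_order` are hypothetical names, the field list is assumed to be the minimal one from the header, and the sample line is the ORDER access from the example, written as a single line without wraps.

```python
from decimal import Decimal
from urllib.parse import unquote_plus

FIELDS = ["date", "time", "c-ip", "cs-customer-id", "cs-method",
          "cs-uri-stem", "cs-uri-query", "cs(User-Agent)", "cs(Referer)"]


def parse_line(line, fields=FIELDS):
    """Split one body line into a dict keyed by field name."""
    values = line.rstrip("\r\n").split(" ")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


def parse_order(query):
    """Decode an ORDER cs-uri-query into order data and its positions."""
    parts = query.split("&")  # split first: an encoded "&" is still "%26" here
    if len(parts) < 3 or (len(parts) - 3) % 6 != 0:
        raise ValueError("malformed ORDER query")
    order = {
        "total_price": Decimal(parts[0]),
        "payment": unquote_plus(parts[1]),
        "shipping": unquote_plus(parts[2]),
        "positions": [],
    }
    for i in range(3, len(parts), 6):
        product_id, name, category, units, unit_price, purchase_price = parts[i:i + 6]
        order["positions"].append({
            "product_id": unquote_plus(product_id),
            "name": unquote_plus(name),
            "category": unquote_plus(category),
            "units": int(units),
            "net_unit_price": Decimal(unit_price),
            "net_purchase_price": Decimal(purchase_price),
        })
    return order


line = ("1999-09-07 13:44:11 192.168.1.179 1001 ORDER /OrderConfirmation "
        "893.48&5&UPS+-+US+2nd+Day+Air"
        "&10806&Supersonic+6000+Stereo+System&/Electronics/Music&1&150.0000&70.0000"
        "&10338&HealthRider+Home+Pro+-+Chrome&/Sports/Equipment&1&543.4800&434.7800"
        "&10800-1&French+Language+Courses&/Language+Courses&1&200.0000&125.0000"
        " - -")
entry = parse_line(line)
order = parse_order(entry["cs-uri-query"])
revenue = sum(p["units"] * p["net_unit_price"] for p in order["positions"])
print(order["shipping"], len(order["positions"]), revenue)
```

In this sample the position prices add up to the TotalPrice. That need not hold in general, because discounts on the order as a whole are reflected only in TotalPrice.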




Log File Rotation Recommended



In addition to the log file format specified above, we recommend that you implement log file rotation. It allows an administrator to determine a time period after which the shop will start to write a new log file. Commonly used web servers apply log file rotation to facilitate handling and storage of log files.

Specifically, we recommend offering a choice among daily, weekly and monthly log file rotation. We also recommend that shop log files be named to include the date in the file name, such as


shopYYYYMMDD.log


for example shop19990822.log, shop20000113.log, etc.

Please note that each new log file must begin with the header specified above.
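
The naming scheme could be derived from the rotation period as in this sketch. The functions `rotation_start` and `log_file_name` are hypothetical, and naming weekly files after the Monday of the week and monthly files after the first day of the month is an assumption of this sketch, not a requirement of the recommendation.

```python
from datetime import date, timedelta


def rotation_start(day, period="daily"):
    """Return the first day of the rotation period that contains `day`."""
    if period == "daily":
        return day
    if period == "weekly":
        return day - timedelta(days=day.weekday())  # Monday of that week
    if period == "monthly":
        return day.replace(day=1)
    raise ValueError(f"unknown rotation period: {period!r}")


def log_file_name(day, period="daily"):
    """Build a file name of the form shopYYYYMMDD.log."""
    return f"shop{rotation_start(day, period):%Y%m%d}.log"


print(log_file_name(date(1999, 8, 22)))             # shop19990822.log
print(log_file_name(date(2000, 1, 13), "weekly"))   # shop20000110.log
print(log_file_name(date(2000, 1, 13), "monthly"))  # shop20000101.log
```

Whenever the computed name changes, the shop would open the new file and write the header before the first access line.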





Contact Information



Exody E-Business Intelligence GmbH

Erik Moeser

Mergenthalerallee 55-59
65760 Eschborn
Germany

Mail: moeser@exody.net



Erik Moeser
Last modified: 2000/10/26 13:51:07