Unicorn Code Review 2006-09-28

A walkthrough of the Unicorn Project Java source code was conducted on the 28th of September, 2006. The collective notes from this walkthrough are given here, as an introduction to the code organization. They may become obsolete: check the Unicorn Project Documentation for an up-to-date documentation index.

Source Location

The Unicorn sources are located on W3C's public source repository. The repository can be used to retrieve or browse the code.

Input Classes

The org.w3c.unicorn.input classes describe all the forms of (user) input to Unicorn. URIInputModule, DirectInputModule and UploadInputModule are the handlers for their respective input methods.

One particularity is Upload. Upload is implemented by two classes, FileItem (direct implementation) and FakeUpload (re-created from bits). The former implements the normal "upload" method, while the latter acts as a proxy, creating an "upload" request from scratch. This is useful for an observer that can handle the upload method but cannot handle the direct input method.

Why only a fake Upload class? There is no need for a fake URI class, as a URI is just a string. The same goes for direct input, which is just given as a string parameter, whereas for upload there is more to be recreated from the original query.

Contract Classes

The Contract classes at org.w3c.unicorn.contract define the communication between the framework and the observers.

The contracts are written in WADL, although we use a slightly simplified version. Unicorn parses the observer's WADL files via XPath to build a Java object describing the capabilities of the observer, the parameters it takes, etc.
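As an illustration of the XPath approach, here is a minimal sketch that extracts the request parameter names from a contract. The XML document, its element names and the class name WadlXPathSketch are hypothetical stand-ins (a namespace-free, WADL-like structure), not the actual Unicorn contract format or the code in WADLUnmarshallerXPath.java. Only the standard javax.xml.xpath API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WadlXPathSketch {
    // Hypothetical, namespace-free stand-in for a simplified WADL contract.
    static final String SAMPLE =
        "<application>"
      + "<resources base='http://example.org/check'>"
      + "<resource>"
      + "<method name='GET' id='uri-check'>"
      + "<request>"
      + "<param name='uri' required='true'/>"
      + "<param name='output' required='false'/>"
      + "</request>"
      + "</method>"
      + "</resource>"
      + "</resources>"
      + "</application>";

    /** Returns the names of all request parameters declared in the document. */
    static List<String> paramNames(String wadl) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Select every <param> element nested in a method's <request>.
            NodeList nodes = (NodeList) xpath.evaluate(
                "//method/request/param",
                new InputSource(new StringReader(wadl)),
                XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not read contract", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(paramNames(SAMPLE)); // prints [uri, output]
    }
}
```

A real contract would also carry namespaces, which would require a NamespaceContext on the XPath object; the sketch leaves that out for brevity.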

The XPath code could be replaced by WSDL if we want to do "proper" Web Services. If we do, note that WADLUnmarshallerXPath.java may turn out to be a misleading name, but renaming should be easy.

The second part of the contract is an RDF file, with localization information about the observer (names, etc.), plus the names and descriptions for the UI parameters.

The RDF parsing class uses Jena to parse the RDF, then processes XPath requests (interesting because they can be reused).

Parsing the WADL and RDF files results in an Observer object.

Tasks

The Tasks classes in org.w3c.unicorn.tasklist handle what Unicorn shows to the end user. Tasks are useful groupings of observers. They help simplify the UI. Tasks also define a few default values to interface with the observers.

The implementation of the Tasks is very similar to the Contract code. The Tasks are defined by two files: an RDF description of the tasks and an XML task list.

First, JAXB is used to build a task object from the XML task file (using an XML schema to know the grammar of the XML task file); then the RDF file is parsed for descriptions, names and localization.

There was some work done to use only one file, by merging the XML task and RDF files. This was not completed, for the sake of simplicity: two files that are easy to edit are better than one incomprehensible file.

The RDF parsing code is similar to the one used for the contract RDF file.

Unmarshalling the Tasks

The TaskListUnmarshallerJAXB class is here to finish what JAXB starts. The objects provided by JAXB only form a tree in which tasks know the names of each observer. The TaskListUnmarshallerJAXB code takes care of expanding this, and of linking each name to the actual Observer object.

Jean-Gui notes that org/w3c/unicorn/tests/ExpandTest.java has a test (and simplified version) of the graph reading algorithm, which has unlimited depth but avoids infinite loops. This code is a little complex because it only replaces textual values taken by JAXB with actual objects. It gets a little hairy when walking through parameter values, for instance.
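The general idea of expanding names into objects while guarding against cycles can be sketched as follows. This is not the code from ExpandTest.java or TaskListUnmarshallerJAXB: the RawTask and Observer classes, and the assumption that a task may include other tasks by name, are hypothetical and only serve to show a depth-unlimited walk with a visited set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ExpandSketch {
    /** Hypothetical observer object that names get resolved to. */
    static final class Observer {
        final String id;
        Observer(String id) { this.id = id; }
    }

    /** A task as a parser might leave it: all references are plain names. */
    static final class RawTask {
        final List<String> observerNames;
        final List<String> includedTaskNames;
        RawTask(List<String> observerNames, List<String> includedTaskNames) {
            this.observerNames = observerNames;
            this.includedTaskNames = includedTaskNames;
        }
    }

    /** Resolves every observer reachable from a task, following includes to any depth. */
    static List<Observer> expand(String taskName, Map<String, RawTask> tasks,
                                 Map<String, Observer> observers) {
        Set<Observer> resolved = new LinkedHashSet<>();
        walk(taskName, tasks, observers, new HashSet<>(), resolved);
        return new ArrayList<>(resolved);
    }

    private static void walk(String name, Map<String, RawTask> tasks,
                             Map<String, Observer> observers,
                             Set<String> visited, Set<Observer> resolved) {
        // add() returns false if the task was already seen: this is the loop guard.
        if (!visited.add(name)) {
            return;
        }
        RawTask task = tasks.get(name);
        if (task == null) {
            throw new IllegalArgumentException("Unknown task: " + name);
        }
        for (String observerName : task.observerNames) {
            Observer observer = observers.get(observerName);
            if (observer == null) {
                throw new IllegalArgumentException("Unknown observer: " + observerName);
            }
            resolved.add(observer); // textual name replaced by the actual object
        }
        for (String included : task.includedTaskNames) {
            walk(included, tasks, observers, visited, resolved);
        }
    }

    public static void main(String[] args) {
        Map<String, RawTask> tasks = new HashMap<>();
        // "full" and "css-only" include each other on purpose, to form a cycle.
        tasks.put("full", new RawTask(Arrays.asList("markup"), Arrays.asList("css-only")));
        tasks.put("css-only", new RawTask(Arrays.asList("css"), Arrays.asList("full")));
        Map<String, Observer> observers = new HashMap<>();
        observers.put("markup", new Observer("markup"));
        observers.put("css", new Observer("css"));
        for (Observer o : expand("full", tasks, observers)) {
            System.out.println(o.id); // prints markup, then css
        }
    }
}
```

The visited set is what makes the unlimited depth safe: each task is expanded at most once, so a cycle simply stops the recursion.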

The parameters package

The parameters package creates all the UI widgets from the task lists. It links parameters of a task to parameters of observers. It also performs verifications on the types (e.g. strings, or a default value). For example, for the output, we say that there is an output parameter, which by default is e.g. ucn, but for some observers we give a specific value (e.g. xml).
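The default-plus-override mapping described above can be pictured with a small sketch. The class name ParameterMappingSketch and the observer identifiers are hypothetical; the values ucn and xml are taken from the example in the text. This is not the actual code of the parameters package.

```java
import java.util.HashMap;
import java.util.Map;

class ParameterMappingSketch {
    private final String defaultValue;
    // Observer-specific values that replace the default for that observer only.
    private final Map<String, String> perObserver = new HashMap<>();

    ParameterMappingSketch(String defaultValue) {
        this.defaultValue = defaultValue;
    }

    ParameterMappingSketch override(String observerId, String value) {
        perObserver.put(observerId, value);
        return this;
    }

    /** Returns the observer-specific value if one was given, else the default. */
    String valueFor(String observerId) {
        return perObserver.getOrDefault(observerId, defaultValue);
    }

    public static void main(String[] args) {
        // Hypothetical observer ids; "ucn" and "xml" come from the example above.
        ParameterMappingSketch output =
            new ParameterMappingSketch("ucn").override("css-validator", "xml");
        System.out.println(output.valueFor("css-validator"));    // prints xml
        System.out.println(output.valueFor("markup-validator")); // prints ucn
    }
}
```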

The XML parser's dump

org/w3c/unicorn/generated/ is where all the files generated by JAXB get written.

The Framework Object

Contract and Task are grouped in a Framework object. The Framework object is created statically, as a single instance, at framework initialization, thus avoiding concurrent access.

This means that any change to the tasks or contracts requires a reload of the servlet to actually be propagated to the system.
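One common way to get this behaviour in Java is a static, read-only instance built during class initialization. The sketch below assumes that pattern. The class FrameworkSketch, its loadTasks method and the task entries are hypothetical, and the actual Framework class may be organized differently.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class FrameworkSketch {
    // Built exactly once, when the class is initialized; the JVM guarantees
    // that class initialization is thread-safe.
    private static final FrameworkSketch INSTANCE = new FrameworkSketch(loadTasks());

    private final Map<String, String> tasks;

    private FrameworkSketch(Map<String, String> tasks) {
        // Read-only view: request threads can share it without locking.
        this.tasks = Collections.unmodifiableMap(tasks);
    }

    static FrameworkSketch get() {
        return INSTANCE;
    }

    Map<String, String> tasks() {
        return tasks;
    }

    // Hypothetical stand-in for parsing the task list and contracts.
    private static Map<String, String> loadTasks() {
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("full", "Full check");
        parsed.put("css", "CSS only");
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());     // prints true
        System.out.println(get().tasks().keySet()); // prints [full, css]
    }
}
```

Because nothing re-runs loadTasks after initialization, edits to the underlying files are only picked up when the class is loaded again, which matches the reload requirement described above.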

Sending Requests to Observers

org.w3c.unicorn.request has all the objects needed to make the requests to the observers. When the servlet receives a request, it maps all the parameters and creates a UnicornCall object, which will then create the requests.

The requests are flexible, to allow different HTTP methods (e.g. the direct request is an HTTP GET in the CSS validator, but an HTTP POST in the markup validator). Possibly, we could also add a URIRequest class for HTTP POST (even if it would make little sense).

Note that Request is a class and not an interface because many things are common across all requests. We therefore also use it as a factory.

RequestList is a way for us to store all the requests, along with the information common to all of them. We store requests by priority, which makes it easy to call them in order of priority, from high to low. The requests are then handled in batches: all the high priority ones first, then the medium ones if the high ones succeeded, then the low ones.

Lower priority requests are not handled if a higher priority request failed.
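The batching rule can be sketched as below. This is a simplified model and not the actual RequestList class: requests are plain strings, the Priority enum and the runAll method are hypothetical names, and the sketch assumes that every request of a batch is attempted even if an earlier one in the same batch failed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class RequestListSketch {
    // Declaration order matters: values() iterates from HIGH to LOW.
    enum Priority { HIGH, MEDIUM, LOW }

    private final Map<Priority, List<String>> byPriority = new EnumMap<>(Priority.class);

    void add(Priority priority, String request) {
        byPriority.computeIfAbsent(priority, p -> new ArrayList<>()).add(request);
    }

    /**
     * Runs the batches from HIGH to LOW and returns the requests that were
     * attempted. A failure in one batch prevents all lower batches from running.
     */
    List<String> runAll(Predicate<String> send) {
        List<String> attempted = new ArrayList<>();
        for (Priority priority : Priority.values()) {
            boolean batchSucceeded = true;
            List<String> batch = byPriority.getOrDefault(priority, Collections.emptyList());
            for (String request : batch) {
                attempted.add(request);
                if (!send.test(request)) {
                    batchSucceeded = false;
                }
            }
            if (!batchSucceeded) {
                break; // lower priorities are skipped
            }
        }
        return attempted;
    }

    public static void main(String[] args) {
        RequestListSketch list = new RequestListSketch();
        list.add(Priority.LOW, "links");
        list.add(Priority.HIGH, "markup");
        list.add(Priority.MEDIUM, "css");
        // Simulate a failing medium-priority request: "links" is never sent.
        System.out.println(list.runAll(r -> !r.equals("css"))); // prints [markup, css]
    }
}
```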

Template-generated files: index and output

org.w3c.unicorn.index is the index generator. What it does is:

picks template files in resources/templates
uses the task list already parsed
and generates the index files with the template engine (Velocity)

The indexes are generated once when the server is started, so if the tasks are modified, the server needs to be restarted, and the indexes are then regenerated automatically. One alternative would be to monitor task lists and contracts, reload them automatically and regenerate the indexes, but that's not necessary for now.

The Velocity template engine is really easy to use: you just give it the templates and all the objects you want to feed it, and Velocity just mixes the soup.

Velocity is not XML or HTML specific; it can generate just about anything. It can handle variables, conditionals, loops, and a bit of logic. The only difficult thing is knowing where to get the information: it takes a minimum of knowledge of the code and of the classes to know how to get the variables you need.

In the case of the index generation, it takes some knowledge of the task list object.

For localized documents, what needs to be done is to get the proper variable, with the locale, from within the template. Here is a sample using a localized variable:

<option value="$task.getID()">$task.getLongName("en")</option>

If the value does not exist for the requested language, the template manager will use the default language, which at the moment is hard coded in org/w3c/unicorn/util/LocalizedString.java. We could make it so that this is handled through the configuration file.
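The fallback behaviour can be illustrated with a minimal sketch. This is not the content of LocalizedString.java: the class name, the method names and the choice of "en" as the default language are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class LocalizedStringSketch {
    // Assumption: the hard-coded default language is English.
    static final String DEFAULT_LANGUAGE = "en";

    private final Map<String, String> values = new HashMap<>();

    LocalizedStringSketch put(String language, String value) {
        values.put(language, value);
        return this;
    }

    /** Returns the value for the language, or the default-language value if missing. */
    String get(String language) {
        String value = values.get(language);
        return value != null ? value : values.get(DEFAULT_LANGUAGE);
    }

    public static void main(String[] args) {
        LocalizedStringSketch longName = new LocalizedStringSketch()
            .put("en", "Full check")
            .put("fr", "Test complet");
        System.out.println(longName.get("fr")); // prints Test complet
        System.out.println(longName.get("de")); // prints Full check (fallback)
    }
}
```

Moving the default into the configuration file would amount to replacing the DEFAULT_LANGUAGE constant with a value read at startup.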

JG notes that localization is one of the areas that was not thoroughly tested.

We only have the Accept-Language passed, and the default language.

Output generation

org.w3c.unicorn.output is the code generating the Unicorn output from the observation responses it has received from the observers.

Output is managed by formatters, producing HTML, XML or plain text. In a way similar to the index generation, the output is created with templates and the Velocity engine, based on the results of the requests: a formatter takes a list of all the responses of all observers, and passes that through the templates.

The modules are used to define each type of output. At the moment this is limited to XHTMLoutput, but we could think of other formats: plain text, mail, EARL, etc.

The choice of output module is passed to Unicorn as a parameter (output=...).

Note that there is not yet anything in the index generator to add the UI widgets for selection of output modules. This could be added by passing the list of possible modules to the index generator.

Bits of testing code

org.w3c.unicorn.tests holds small tests of code. One exception is the Servlet code in that directory, which is actually the servlet code being used as the Unicorn servlet. Historically, what started as a bit of test code ended up working just fine, and was ... we may want to move it somewhere else, eventually.

Other test code bits include commandline, which was a first try for the framework; it probably still compiles but does not work any more.

Logging

Unicorn has an extensive logging system, configured in log4j.xml. We use log4j.xml to define the log levels and where the logged information must be routed:

logging is done by level: trace, info, warn (problems that were fixed automatically), error (blocking problems)
logging is also done by code package
and there is a full log file for quick lookup of issues.

<param name="File"
    value="${catalina.home}/webapps/unicorn/WEB-INF/logs/level/error.log"
    />

		 .
		 |-- all.log
		 |-- level
		 |   |-- debug.log
		 |   |-- error.log
		 |   |-- info.log
		 |   |-- trace.log
		 |   `-- warning.log
		 `-- package
		     |-- contract.log
		     |-- index.log
		     |-- input.log
		     |-- output.log
		     |-- request.log
		     |-- servlet.log
		     |-- tasklist.log
		     `-- util.log

The log4j configuration is adapted to a Tomcat installation; it would have to be modified for another installation.

Other files in the code structure

Note also the Tomcat policy file for management of resources, writing permissions, etc.