Unicorn Implementation Questions
Who will parse the document? (Solved)
There are several possibilities:
- UniCORN parses the document and dispatches fragments to each interested module.
This means that the framework must be able to parse any document, which can be quite hard to maintain, given that most (all?) validators and checkers already include a parser. It would probably also mean adapting the observers so that they can handle these fragments.
On the other hand, the framework has total control over what is sent to each observer and knows that the data is coherent (since it parsed the document itself).
- UniCORN determines which of the selected observers is best suited to parse the document and gives the document to this module, with instructions about which fragments to keep. The module processes the document and gives the results to the other modules (or to the framework... see below).
This mode would imply a huge amount of communication (to transfer the fragments) and a lot of modifications on the observers' side (for them to know how to handle these fragments). Finally, UniCORN would have to be able to determine the "best" parser, so it would be necessary to add a "parsing priority" to the contract for each document type an observer can handle.
The model would have "What is this" and "Who should handle this". "Who should handle this" is a subset of "Who can handle this"; the two sets may not be equal. For example, in an XSLT document the content "is" XHTML and XSLT at the same time, but should only be processed by the XSLT checker. Another example is link checking: sometimes you might want to check for conformance and broken links at the same time, but when, for example, the W3C Webmaster checks a document against pubrules, you typically just want to check for conformance.
- UniCORN gives the document to each selected module and lets them parse it.
This causes superfluous parsing. Another problem is that we may get inconsistencies, since we do the same job several times with different tools. For example, the CSS validator and the Markup validator don't parse (X)HTML in the same way, leading to different results in some cases (e.g. the Markup validator allows attributes written without spaces between them, whereas the CSS validator does not).
If we validated such a special document for both XHTML and CSS, we would get something like:
This page is valid XHTML!
CSS couldn't be checked since the XHTML is not valid.
The main advantage is its ease of implementation, since existing validators and checkers already parse whole documents. Another good point is that it may avoid heavy communication between the observers and UniCORN.
- Other solutions?
Personally (Jean-Guilhem), I think the best solution would be the third one, because it avoids huge modifications to the observers and the resulting bugs. I also don't think that having multiple parses would lead to many inconsistencies (existing validators are nearly perfect... ahem :-)), and if it did happen, a simple message could explain that the validators/checkers have some issues...
We have now chosen the third solution.
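The third solution can be sketched roughly as follows. This is only an illustration under assumed names: `DocumentObserver`, `ObserverResult` and `WholeDocumentDispatcher` are hypothetical and are not part of the actual UniCORN code. Each observer receives the unmodified document and parses it itself; the framework only collects the results and can detect when observers disagree, which is the case where a short explanatory message might be shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer abstraction: each observer parses the whole document itself.
interface DocumentObserver {
    ObserverResult check(String document);
}

// Hypothetical result holder: which observer answered, its verdict, and a message.
final class ObserverResult {
    final String observer;
    final boolean passed;
    final String message;

    ObserverResult(String observer, boolean passed, String message) {
        this.observer = observer;
        this.passed = passed;
        this.message = message;
    }
}

final class WholeDocumentDispatcher {
    // Third solution: hand the same, unmodified document to every selected observer.
    static List<ObserverResult> dispatch(String document, List<DocumentObserver> selected) {
        List<ObserverResult> results = new ArrayList<>();
        for (DocumentObserver observer : selected) {
            results.add(observer.check(document));
        }
        return results;
    }

    // True when at least one observer passed and at least one failed.
    static boolean disagree(List<ObserverResult> results) {
        boolean anyPassed = false;
        boolean anyFailed = false;
        for (ObserverResult result : results) {
            if (result.passed) {
                anyPassed = true;
            } else {
                anyFailed = true;
            }
        }
        return anyPassed && anyFailed;
    }
}
```

In this sketch the framework never inspects the document, which is what keeps the observers unchanged; the price is that every observer parses the same input again.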
How should observers interact with each other and with UniCORN? (Solved)
In the previous question, we saw that if one observer parses the document and emits fragments to the other observers, there are two possibilities:
- The "main" observer sends the fragments to the framework, which then has to dispatch them to the correct observers.
- The "main" observer communicates directly with the other ones (this implies that the "main" observer knows where to send the information).
This problem exists only if we choose the second solution in the previous question, but one question remains in any case: how do the observers and the framework communicate with each other? We think that the solution would be to use SOAP messages or EARL documents (and maybe only GET requests in the framework→observer direction if we choose the third answer in the previous question).
Currently the observers do not communicate with each other, so this question no longer applies.
Which implementation for the framework? (Solved)
Since validators/checkers are already developed using various technologies, UniCORN should be fairly technology-neutral; we only have to use a standard means of communication between the different entities (as explained before).
We think we will develop it in Java because it is quite easy to use and we already know it (unlike Perl). We still don't know which technology to use (servlets, JSP, ...) and need to read up on that.
Identify document type
Because the framework can validate/check many types of documents, such as XHTML documents, CSS stylesheets, XML documents and others, we need a way to identify the type of a document. In most cases, we can rely on the Content-Type provided by the remote server, or on the MIME type sent with a file upload.
The main problem is what to do in the case of direct input:
- Add a "mime-type" parameter to the framework's call (mandatory if the method is direct input), and a drop-down list to the "Direct input" interface.
- Let the framework do some parsing to identify the type. That could be quite hard if the user enters a CSS stylesheet, for example (hardly anything identifies it).
Currently we use a drop-down list in the "Direct input" interface.
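The rule described above (trust the Content-Type for URIs and uploads, require an explicit type for direct input) could look roughly like this. The class and method names are hypothetical, and the sketch assumes the drop-down value reaches the framework as the "mime-type" parameter.

```java
import java.util.Locale;

final class DocumentTypeResolver {
    // Strip parameters such as "; charset=utf-8" and normalise the case.
    static String baseType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // directInput: true when the user pasted the document into the form.
    // contentType: Content-Type from the remote server or from the file upload.
    // mimeTypeParam: value chosen in the drop-down list (assumed parameter).
    static String resolve(boolean directInput, String contentType, String mimeTypeParam) {
        if (directInput) {
            if (mimeTypeParam == null || mimeTypeParam.trim().isEmpty()) {
                throw new IllegalArgumentException("mime-type is mandatory for direct input");
            }
            return baseType(mimeTypeParam);
        }
        if (contentType == null || contentType.trim().isEmpty()) {
            throw new IllegalArgumentException("no Content-Type available for this document");
        }
        return baseType(contentType);
    }
}
```

Making the parameter mandatory for direct input avoids any content sniffing, which is the part identified above as hard to do reliably for CSS.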
Do we use the word "validation" correctly in these documents?
We hope we are not mistaken when we use this word. It is possible that we sometimes mix up validation and checking.
Observers output
Observers' results can be written in SOAP or EARL. The question is which of them would be best for communicating with the framework. SOAP output already exists for some validators, but it is not standardized (the CSS validator has one format, the Markup validator has a different one). So, will it be possible to handle these SOAP messages generically if they are all different? Yves seems to say yes, but we don't really understand how the framework would know the way to handle the results. For example, if one observer decides to write its messages in a <msg> tag and another one in <error>, the framework probably won't know how to manage them. So we think that having a "universal" WSDL would be a lot easier, but...
When Yves returns to work we will talk about this problem with him.
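To make the concern concrete, here is a small sketch of what the framework would need if every observer kept its own output format: a per-observer table saying which element carries a message. Everything here is an assumption for illustration (the class name, the observer names, and the simplified XML, which is not a real SOAP envelope); only the <msg> and <error> element names come from the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class ObserverMessageReader {
    // Hypothetical per-observer knowledge: the element name that holds a message.
    private final Map<String, String> messageTagByObserver = new HashMap<>();

    void register(String observer, String messageTag) {
        messageTagByObserver.put(observer, messageTag);
    }

    // Extract the message texts from one observer's (simplified) XML output.
    List<String> readMessages(String observer, String xml) {
        String tag = messageTagByObserver.get(observer);
        if (tag == null) {
            throw new IllegalArgumentException("unknown output format for observer: " + observer);
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> messages = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                messages.add(nodes.item(i).getTextContent().trim());
            }
            return messages;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not read observer output", e);
        }
    }
}
```

The `register` call is exactly the extra knowledge a shared output description would make unnecessary: with one common format, the table collapses to a single entry.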
File upload (Solved)
How can we handle file upload? The framework might read the document and send its content to the observers using the text parameter (this parameter exists in the CSS validator and the Markup validator) with the appropriate MIME type.
Another way might be to upload the file to each observer.
We use a specific class (ClientHttpRequest, written by Vlad Patryshev) to upload the file received from the client to each appropriate observer.
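For comparison, the first option (reading the uploaded file and forwarding its content through the text parameter) would amount to building a form body like the one below. This is a sketch of the alternative, not of the ClientHttpRequest-based upload that is actually used; the `type` parameter name is an assumption made for the example, and only `text` is mentioned above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class TextParameterForm {
    // Build an application/x-www-form-urlencoded body carrying the document text.
    // "type" is a hypothetical parameter name for the MIME type.
    static String encode(String documentText, String mimeType) {
        try {
            return "text=" + URLEncoder.encode(documentText, "UTF-8")
                    + "&type=" + URLEncoder.encode(mimeType, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen in practice: every Java platform supports UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }
}
```

This keeps a single request format for direct input and uploads, at the cost of holding the whole file in memory and encoding it before sending.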
Use of WSDL/WADL
Yves told us that it would be great to use WSDL or WADL to describe the contract. But we are not sure it will be enough to describe everything, because the contract is not only used to know how to call an observer. It also contains information on the possible values of the parameters, a description of the UI, ... We will discuss this problem with Yves as soon as he is back from holidays.
Parameters URI, DirectInput and FileUpload
These three parameters are special because they define the input method. So we need a way to handle observers that do not allow all of these input methods.
One approach: describe in the observer's RDF whether it can handle each parameter. In the client interface, enable only the input methods allowed by the selected task. Show, for the tasks that need it, information about the input methods that are not handled.
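One possible reading of this approach is sketched below. It assumes that an input method is enabled for a task as soon as at least one of the task's observers declares it, and that the observers which do not handle it are listed to the user. The names `InputMethod` and `TaskInputMethods` are hypothetical, and the in-memory map stands in for whatever the RDF contract would provide.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum InputMethod { URI, DIRECT_INPUT, FILE_UPLOAD }

final class TaskInputMethods {
    // Observer name -> input methods declared in its contract (assumed representation).
    private final Map<String, Set<InputMethod>> declared = new LinkedHashMap<>();

    void declare(String observer, Set<InputMethod> methods) {
        Set<InputMethod> copy = EnumSet.noneOf(InputMethod.class);
        copy.addAll(methods);
        declared.put(observer, copy);
    }

    // Input methods to enable in the interface: handled by at least one observer.
    Set<InputMethod> enabled() {
        Set<InputMethod> result = EnumSet.noneOf(InputMethod.class);
        for (Set<InputMethod> methods : declared.values()) {
            result.addAll(methods);
        }
        return result;
    }

    // Observers to mention in the "not handled" notice for a given input method.
    List<String> observersNotHandling(InputMethod method) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Set<InputMethod>> entry : declared.entrySet()) {
            if (!entry.getValue().contains(method)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }
}
```

If the stricter rule is preferred (enable a method only when every observer of the task handles it), `enabled()` would compute an intersection instead of a union and the notice would no longer be needed.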
Select one observer among two or more
In some cases it would be useful to select only the best observer to handle a document and skip the other observers, even if they can also handle the MIME type of the document.
Currently the UniCORN framework checks each observer and calls those that can handle the MIME type of the document.
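The difference between the current behaviour and the possible refinement can be sketched as follows. The `ObserverContract` class and its numeric priority are assumptions for illustration (the priority echoes the "parsing priority" idea from the first question); they do not describe the real contract format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ObserverContract {
    final String name;
    // MIME type -> priority; a higher value means "better suited" (hypothetical field).
    final Map<String, Integer> priorityByMimeType = new HashMap<>();

    ObserverContract(String name) {
        this.name = name;
    }

    ObserverContract handles(String mimeType, int priority) {
        priorityByMimeType.put(mimeType, priority);
        return this;
    }
}

final class ObserverSelector {
    // Current behaviour: every observer that can handle the MIME type is called.
    static List<ObserverContract> allHandling(String mimeType, List<ObserverContract> observers) {
        List<ObserverContract> matching = new ArrayList<>();
        for (ObserverContract observer : observers) {
            if (observer.priorityByMimeType.containsKey(mimeType)) {
                matching.add(observer);
            }
        }
        return matching;
    }

    // Possible refinement: only the highest-priority observer; null if none matches.
    // On equal priorities the observer listed first wins.
    static ObserverContract bestFor(String mimeType, List<ObserverContract> observers) {
        ObserverContract best = null;
        int bestPriority = Integer.MIN_VALUE;
        for (ObserverContract observer : observers) {
            Integer priority = observer.priorityByMimeType.get(mimeType);
            if (priority != null && (best == null || priority > bestPriority)) {
                best = observer;
                bestPriority = priority;
            }
        }
        return best;
    }
}
```

With such a priority, "who can handle this" is `allHandling` and "who should handle this" is `bestFor`, so the second set is always contained in the first.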
