Tim Berners-Lee

Date: January 6, 1997

Status: personal view. Editing status: Italic text is rough. Requires complete edit and possibly massaging, but content is basically there.

Web Architecture: More random notes

More notes not yet sorted

These are bits which have not been put into a design issues note but could be fodder in general for one

WS-Sociology

Much of Web Services, above the SOAP/WSDL layer, has been largely driven by large vendors, with the model that existing code is given a specification which is first published as a proprietary specification and then taken through the standards process. Characteristics of this environment for standardization are that the submitters expect a large amount of control, tolerate few changes to the specifications during the process, and are loth to commit to royalty-free open standards when the possibility exists of using intellectual property to protect the field which they hope to dominate.

Much of this work is being done under the OASIS process, which is more suited to this style of development.


Workflow

The roots of workflow automation applications as expressed in Web application orchestration can be found in rules engine applications and the static, step-by-step, rules-based automation of production and manufacturing processes. This kind of workflow is now heading towards supporting people-based workflows as well.

-- Business processes and workflow in the Web services world. Margie Virdell (virdell@us.ibm.com)
source


Math about fractal networks

see Fractal Web

[Amaya math play space! --- ignore this

Using a model where the conductance $c_{i j}$ between two nodes is the reciprocal of the "distance" $d_{i j}$ , the total conductance

$C_{i} = \sum_{j = 1}^{N} (c_{ij})$

you would hope would be finite (possibly distributed but related to the group size or power, or just limited to the valence of a node). If it is to be evenly distributed over orders of magnitude (fractally distributed), then the density ρ of nodes at a given distance has to be inversely proportional to the distance

$\rho(r) = \frac{\alpha}{r}$

A model for the conductance (1/distance) is tricky. The "six degrees" approach uses the shortest path, which does not lend itself to treatment very easily. Another approach would be to take the total conductance C between two nodes as the electrical conductance in the net: parallel conductances add, series resistances add, but calculating the conductance between a and b for an arbitrary net involves solving simultaneous equations. Suppose current S flows between nodes a and b, and the individual currents in the links between nodes i and j are $s_{i j}$; then the total current flowing into each node must be zero except for a and b:

$\sum_{j = 1}^{N} s_{i j} = S (\delta_{i a} - \delta_{i b})$

and the current flowing across a link is the voltage between the nodes multiplied by the individual link conductance (i ≠ j)

$s_{i j} = (v_{i} - v_{j}) a_{i j}$

where we know that
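The simultaneous equations above can be solved numerically for a small net. The following is only a sketch under stated assumptions: the function name `effective_conductance`, the matrix argument `a` (holding the link conductances $a_{i j}$, zero where there is no link) and the choice to fix the voltage of node b at zero are illustrative and not part of the original notes. It solves the current-balance equations by plain Gauss-Jordan elimination and returns S divided by the voltage between a and b.

```python
def effective_conductance(a, src, dst, current=1.0):
    """Total conductance between nodes src and dst of a resistive net.

    a[i][j] is the conductance of the link between i and j (0 if no link);
    the matrix is assumed symmetric and the net is assumed connected.
    """
    if src == dst:
        raise ValueError("src and dst must be different nodes")
    n = len(a)
    # Fix v[dst] = 0 and drop its equation, leaving n - 1 unknown voltages.
    keep = [i for i in range(n) if i != dst]
    rows = []
    for i in keep:
        total = sum(a[i][j] for j in range(n) if j != i)
        # Current balance at node i: sum_j (v_i - v_j) a_ij = S * delta_{i,src}
        row = [total if j == i else -a[i][j] for j in keep]
        row.append(current if i == src else 0.0)
        rows.append(row)
    m = len(rows)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("net is not connected")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    k = keep.index(src)
    v_src = rows[k][m] / rows[k][k]   # voltage of src relative to dst
    return current / v_src


# Two unit links in series, and a triangle of unit links.
series = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(effective_conductance(series, 0, 2))
print(effective_conductance(triangle, 0, 2))
```

The two small nets check the rules quoted in the text: two unit links in series should give half a unit, and adding a direct unit link in parallel should add one unit to that.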

Principles behind HTML

The design of HTML is influenced by traditions and experience from both the networking and the documentation communities. The networking and computing communities have understood for a long time the importance of isolating pieces of information which vary independently. The documentation and human interface communities have recognized that one important isolation to make is between "form" and "content". By "form" they mean the way information is presented on a screen, printed page, or for that matter in a multimedia presentation. By "content" they mean the actual text which contains the meaning.

The reasons are manifold. There is a general principle that the way information is presented is separate from the semantics of that information. In the publishing industry there is often different authorship of the style of information and of its textual content. This also applies to internal company documents: whereas one person may write a memorandum, it is another person who decides how that memorandum is distributed or put on the page according to the company stationery and conventions. The person who changes the style is different from the person who changes the content: the authorship is different, the rate of change is different, and the skills needed to make the change are different.

When documentation is available online, there are further reasons for doing this. One is that if the semantics of the information are available independently of the design, this allows processing which is essential to the effective indexing of information. Another is that it allows the presentation to be changed, especially for those who are disabled and cannot effectively make use of the original intended presentation.

On the WWW information is stored, transferred, but most importantly, referred to in a unit known as a resource, or document. The separation of style and text, or form and content, implies therefore that it should be possible for style and text to be stored at different WWW resources with different URIs.
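To illustrate the indexing argument, here is a small sketch using Python's standard `html.parser` module. The document text, the `example.org` stylesheet URI and the names `ContentAndStyle` and `split_content_and_style` are all made up for the example. It pulls out the words of the content for an index and, separately, the URI of the resource that holds the style, without ever needing to fetch or understand that style.

```python
from html.parser import HTMLParser


class ContentAndStyle(HTMLParser):
    """Collects the text content and the stylesheet URIs of a document."""

    def __init__(self):
        super().__init__()
        self.words = []        # content, usable for indexing
        self.stylesheets = []  # URIs of separate style resources

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            self.stylesheets.append(attributes.get("href"))

    def handle_data(self, data):
        self.words.extend(data.split())


def split_content_and_style(html_text):
    parser = ContentAndStyle()
    parser.feed(html_text)
    parser.close()
    return parser.words, parser.stylesheets


# Hypothetical memorandum whose style lives at a different URI.
DOCUMENT = (
    '<html><head>'
    '<link rel="stylesheet" href="http://example.org/style/house.css">'
    '</head><body><h1>Memo</h1><p>Budget approved.</p></body></html>'
)

words, stylesheets = split_content_and_style(DOCUMENT)
print(words)
print(stylesheets)
```

Because the style is only referred to by URI, whoever maintains the house style can change it without touching the memorandum, and a reader with different needs can substitute another stylesheet.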

January 6, 1997 - 7:00 p.m.

HTTP Semantics

As has been mentioned above, the semantics of GET in HTTP are fundamentally special: the result of a GET, the state which is distributed, is slowly changing state and may therefore be cached, and the GET operation can be optimized in special ways. There are, in fact, other operations in many systems whose results are reproducible, but GET, because it is so widely used, is typically cached much more frequently and to greater effect than other operations probably could be. If GET is a special operation, then PUT, which is in some ways the reverse of GET, is also special. Although caching does not apply to it, its special status is that once a document has been PUT, the expectation is that a following GET on the same URL will produce the identical result. This situation is of course complicated by the concept of generic URIs, which allow a GET on the URI to resolve to one of many different forms depending on the circumstances. One simple rule is to allow PUT operations only on URIs which refer to completely specified versions of an object. In fact, that is a fairly good rule which perhaps should be substantiated and made explicit in the specifications.

Axiom of PUT

PUT may only be made to a URI which is completely specific, i.e., one which when dereferenced will always produce the same stream of bits.
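A toy model may make the axiom concrete. This is a sketch only: the class `WebStore`, its methods and the example URIs are hypothetical and do not describe any real server. Specific URIs map to a single stream of bytes, generic URIs resolve to one of several specific URIs, and PUT is refused on generic URIs, so that a GET following a PUT on the same URI returns the identical result.

```python
class WebStore:
    """Toy in-memory server; every name here is illustrative."""

    def __init__(self):
        self.specific = {}  # specific URI -> bytes
        self.generic = {}   # generic URI -> {variant name: specific URI}

    def add_generic(self, uri, variants):
        """Declare uri as generic, resolving to one specific URI per variant."""
        self.generic[uri] = dict(variants)

    def put(self, uri, body):
        # The axiom: PUT is only allowed on completely specific URIs.
        if uri in self.generic:
            raise ValueError("PUT refused: %s is a generic URI" % uri)
        self.specific[uri] = bytes(body)

    def get(self, uri, variant=None):
        # A generic URI resolves to a specific one depending on circumstances.
        if uri in self.generic:
            uri = self.generic[uri][variant]
        return self.specific[uri]


store = WebStore()
store.put("/report.en.html", b"<p>Hello</p>")
store.put("/report.fr.html", b"<p>Bonjour</p>")
store.add_generic("/report", {"en": "/report.en.html", "fr": "/report.fr.html"})
print(store.get("/report.en.html"))
print(store.get("/report", "fr"))
```

Without the check in `put`, a PUT to `/report` would leave it unclear which variant had been replaced, and a later GET could return something other than what was stored.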

The semantics of POST were originally defined to allow new content to be developed. In order to understand this original intent, one needs to realize that the original assumption was that URIs would be dynamically allocated by the server rather than chosen for mnemonic value by the client. Therefore, when a new document was generated, a relevant piece of information to give to the server was a link from an existing document, and one could expect the server to reply with the URL which had been granted for the object. Also envisaged was that the client should specify archive criteria and access control for the newly created object, and that the server would use these to determine how and where it should internally store the object, and therefore a suitable URL to give it.

In practice, although URLs are often related to authorship, control, ownership and access, most servers are based on simple unix-like file systems, so typically the policy is first determined for given directories and then the user or the creator of the document chooses the URL of the new document to reflect the desired status of the document.

In the original definition of POST, the semantics were that a new piece of information was being submitted, that the HTTP headers provided information about the new object, and in particular that, by virtue of being posted to a particular URI, the document was created and made accessible by a new link from the document to which it was posted. The semantics of a document to which other documents can be posted are similar to those of a mailbox. The posting is like annotation or like sending mail to a person. Hence, the term "??????".

With the current modus operandi, in which users typically allocate URLs for new documents, POST does not allow them to do this; however, PUT had only been specified for use with the URIs of existing documents. Clearly, in a world in which URIs are allocated by the server, the POST which creates a new URI must be used before PUT can be used. Now an ambiguity exists between a PUT which is designed to create a new document and a PUT which is designed to overwrite an old document. This is the common difference in semantics, which exists in most file systems, between opening a file for creation and opening a file to be overwritten or, for that matter, appended to. Tackling it by specification of PUT is no great problem. [It just hasn't been done.]
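The distinction can be sketched in a few lines. Everything here is an assumption made for illustration: the class `Server`, the way it numbers new URIs, and especially the explicit `mode` argument to `put`, which is just one possible way of removing the create/overwrite ambiguity, borrowed from the file-system analogy. It is not taken from any HTTP specification.

```python
import itertools


class Server:
    """Toy server: POST allocates URIs, PUT must say what it intends."""

    def __init__(self):
        self.documents = {}             # URI -> body
        self.links = {}                 # URI -> URIs of documents posted to it
        self._ids = itertools.count(1)  # source of server-allocated names

    def post(self, target_uri, body):
        """Original POST semantics: the server chooses the new URI and the
        new document becomes reachable by a link from the target."""
        new_uri = "%s/%d" % (target_uri.rstrip("/"), next(self._ids))
        self.documents[new_uri] = body
        self.links.setdefault(target_uri, []).append(new_uri)
        return new_uri

    def put(self, uri, body, mode):
        """PUT with an explicit intent, like opening a file to create it
        versus opening it to overwrite it."""
        if mode not in ("create", "overwrite"):
            raise ValueError("unknown mode: %r" % (mode,))
        exists = uri in self.documents
        if mode == "create" and exists:
            raise FileExistsError(uri)
        if mode == "overwrite" and not exists:
            raise FileNotFoundError(uri)
        self.documents[uri] = body


server = Server()
uri = server.post("/inbox", "first draft")   # server allocates the URI
server.put(uri, "second draft", mode="overwrite")
print(uri, server.documents[uri], server.links["/inbox"])
```

In the server-allocated world the client never needs `mode="create"`; it only becomes necessary once users choose URLs for new documents themselves.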

The use of forms which can lead the user into making a POST to a given URI has extended the use of POST, and this has not been reflected in any documentation of the semantics of POST. Still, the analogy with a mail-box holds, as the mail-box can be seen, like many electronic mail mail-boxes, as that to which things to be done or processed are submitted. Indeed, the analogy with the creation of a new document with a link can be maintained as well: when a form is posted to a document, one can imagine that, even though it may be invisible to the user, there is a conceptual list of posted documents, each of which has a link from the resource to which it was posted. However, as the list is not visible, this analogy is wearing thin and is probably not shared by the majority of the development community.

Well-defined Interfaces

This question as to the significance of a POST to a given URL raises the whole issue of well-defined interfaces on the World Wide Web. There are HTTP headers which allow a client to know which additional HTTP methods are supported by a given resource. However, this information is only sufficient when each method has well-defined semantics in the specification. In the situation in which POST is used as a generic method for submitting any request, there is a need for information as to exactly what operations are available: in other words, what interface is exported. The current situation for human-initiated remote operations on the World Wide Web is that they are done with HTML forms, and therefore the HTML form carries with it both the syntax and the semantics of the operation. That is, the human being reading the form understands, or should understand, the significance of pressing the "submit" button, and the form itself contains information as to the type of the various parameters which may be submitted.

However, there is no way of querying a URL to know which forms it receives information from. Imagine, for example, that seats on an airplane could be found by browsing through a plan or picture of the airplane on the World Wide Web. Imagine that having found a seat, the user wanted to perform operations on it. Typically, links or buttons or forms on the bottom of the picture itself give an indication of the operations available. A possible link type indicating the relationship between an object and a form which could be filled in to apply to that object might be interesting. In any event, the conclusion must be that the original significance of POST as having the effect of an annotation or addition of new data to the Web is not the current one.

One could either make ways of exposing the definition of the interface exported using HTTP PUT, or one could require that the message posted be self-describing: that is, that there be enough information within the message to allow the object to determine what operation is requested and what the significance of each parameter is. Currently, if one attempts a POST to a URL using a form which has been incorrectly written, or written by somebody else with insufficient knowledge of the processing engine behind the URL, then there is no clean way, as you would expect with a remote object-oriented system, for the system to check and indicate that an incorrect interface is being used. It is simply up to the CGI script, or whatever is used to implement the processing engine, to be smart about checking its parameters. The definition of an interface of course contains two parts: the types of the parameters and the semantics of the operations. Remote procedure call systems typically define the syntax, that is to say the data types which should be submitted and the names of procedures, and typically they leave the documentation and the procedure and parameter names to define the semantics for a human being. In the future both aspects of this should be addressed for Web operations.
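As a sketch of the syntactic half of this, suppose the resource behind a URL published a description of the operations it accepts, and each posted message named its operation and parameters. The `INTERFACE` table, the operation names (borrowed from the airplane seat example) and the function `check_message` are all hypothetical. The check covers only names and data types; the semantics would still have to be documented for a human being.

```python
# Hypothetical published interface: operation name -> parameter name -> type.
INTERFACE = {
    "reserve_seat": {"flight": str, "seat": str},
    "release_seat": {"flight": str, "seat": str},
}


def check_message(message, interface=INTERFACE):
    """Return a list of problems; an empty list means the message fits."""
    operation = message.get("operation")
    if operation not in interface:
        return ["unknown operation: %r" % (operation,)]
    expected = interface[operation]
    parameters = message.get("parameters", {})
    problems = []
    for name, kind in expected.items():
        if name not in parameters:
            problems.append("missing parameter: %s" % name)
        elif not isinstance(parameters[name], kind):
            problems.append("parameter %s should be %s" % (name, kind.__name__))
    for name in parameters:
        if name not in expected:
            problems.append("unexpected parameter: %s" % name)
    return problems


good = {"operation": "reserve_seat",
        "parameters": {"flight": "XX100", "seat": "12A"}}
bad = {"operation": "reserve_seat",
       "parameters": {"flight": "XX100", "row": 12}}
print(check_message(good))
print(check_message(bad))
```

A form written by somebody with insufficient knowledge of the processing engine would then be rejected with a list of mismatches, rather than leaving each CGI script to guess.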

Remote operations of changes to state

Currently, the use of the Web seems to be divided into two types. There is the distribution of state, the browsable Web of intellect and hypertext documents, and there is the set of forms which allow all operations to happen within some black box, some machine which performs some undefined and unverifiable function. However, there are changes afoot which will bring remote operations and the state of the Web together, in that there will be operations other than PUT which explicitly change the state of the Web and are therefore verifiable by later recourse to the state of the Web.

For example, editing functions which create converted copies of documents, functions which manipulate version trees and configuration management systems, and functions which request servers to duplicate a document under another URL, or to insert metadata about a document or annotations on a document, can all be defined in terms of the state of the Web before and after. This situation is interesting in that it allows proofs of correctness and it allows systems to be tested. Although I admit that all the benefits are difficult for me to list at the moment, it seems clear that such a situation, in which there is state and changes are evident and well defined, has very desirable properties.

Certainly from the point of view of human interface, graphical user interfaces have the great advantage that the user seems to be manipulating state and can at any point see the results of his or her actions, as he or she for example drags folders from one place to another or changes a document in a WYSIWYG editor. Similarly, electronic commerce and global transactions would benefit from verifiability: for example, a mutually held list of invoices between two companies, visible to both sides and verifiable at any point, is an improvement on a situation in which an invoice is sent as a message without well-defined semantics as to where it will be put, what will be done with it, or when if ever it will be paid. It is clear (is it?) that not all operations should be represented in this way, as changes of state. However, it also seems clear that a very large number of important operations can be. Therefore, the definition of some operations which affect the state of the Web, and their implementation, deployment and actual use for those functions where it is appropriate, would be an important step.

Just to show the different ways of thinking about an operation, suppose I order a book from the bookshop. I can regard the operation of ordering as the sending of a message. The semantics are only understood by the bookshop. Alternatively I can regard it as a change to the number of books which I have on order with the company, which is part of a list of books which I call in principle "order from the company". If I change that number, I can go back and verify the number later. The list can be a private or public verifiable statement. I can sign it, and anyone with access to it can prove things about the relationship I have with the company by comparing it with the list of payments I have made to the company and the list of books they have delivered. This is a simplified example, but it shows how one can transform an operation expressed as a message into an operation expressed as a change of state.
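The bookshop example can be sketched as follows. The titles, prices and function names are invented for the illustration, and signing of the list is left out. The order is expressed as a change to a shared "on order" list rather than as a message, so that anyone holding the three lists can check them against each other afterwards.

```python
from collections import Counter


def order(on_order, title, copies=1):
    """Express an order as a change of state: return the updated list."""
    updated = Counter(on_order)
    updated[title] += copies
    return dict(updated)


def outstanding(on_order, delivered):
    """Books ordered but not yet delivered, per title."""
    remaining = Counter(on_order)
    remaining.subtract(delivered)
    return {title: n for title, n in remaining.items() if n != 0}


def amount_owed(on_order, prices, payments):
    """Value of everything on order minus the payments already made."""
    return sum(prices[title] * n for title, n in on_order.items()) - sum(payments)


# Hypothetical state shared between me and the bookshop.
on_order = order({"Title A": 1}, "Title A")        # now two copies on order
on_order = order(on_order, "Title B")
delivered = {"Title A": 1}
prices = {"Title A": 10, "Title B": 15}
payments = [10]

print(on_order)
print(outstanding(on_order, delivered))
print(amount_owed(on_order, prices, payments))
```

Nothing in `outstanding` or `amount_owed` depends on how the order was transmitted; they only read state, which is what makes the relationship verifiable after the fact.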

[January 23, 1997]

An Example of Metadata: the Application of Style Sheets

The metadata architecture outlined above is in fact nothing short of a simple language. The language is perhaps complex in that the information in this language consists not of a single program but of a whole series of different assertions by different people; on the other hand it is simple in that the basic unit, an assertion, is fundamentally very simple. Similarly, the specific case of the link is also very simple. One specific application for which links have been used is worth discussing here in that it points out the advantages and disadvantages of this architecture.

The link type "stylesheet" has been used to indicate that a particular document should be viewed using the style specified by a particular other resource, the stylesheet. As an assertion this is less rigid than something which can be declared to be true or false: it is, if you like, a statement of preference. The cascading stylesheets language [CSS] tackles this question by specifically describing algorithms for merging the preferences expressed by different parties: this is the "cascading" of stylesheets.

Consider the case in which two parties have made two assertions: one asserts that document "D" should be read using stylesheet "A" and the other asserts that document "D" should be read using stylesheet "B". If we take these assertions as metadata, the rules are that they stand by themselves. Either assertion may be made independently, and when both assertions are made the result is more information, which may or may not be consistent but consists simply of the application of both assertions. As was stated before, there is therefore no ordering when assertions from different places are taken into account: collections of assertions are inherently unordered in the World Wide Web metadata architecture. CSS, by contrast, has an ordering to the stylesheets which it cascades: every stylesheet, when taken into consideration, overrides any previous stylesheets in places where they clash. Therefore, we cannot represent the cascading of more than one stylesheet simply by making two different assertions, two different applications of stylesheets.
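The mismatch can be seen in a few lines. This is a deliberately simplified sketch: assertions are modelled as tuples in a set, a stylesheet as a dictionary of property values, and "cascading" as later sheets overriding earlier ones. The real CSS cascade has more rules than this, and all the names are illustrative.

```python
def combine_assertions(*sources):
    """Metadata-style combination: the union of all assertions made.
    The order in which sources are read makes no difference."""
    combined = set()
    for source in sources:
        combined |= set(source)
    return combined


def cascade(*stylesheets):
    """Simplified cascade: each sheet overrides earlier ones where they clash."""
    result = {}
    for sheet in stylesheets:
        result.update(sheet)
    return result


# Two parties independently assert which stylesheet applies to document D.
party_1 = {("D", "stylesheet", "A")}
party_2 = {("D", "stylesheet", "B")}
print(combine_assertions(party_1, party_2) == combine_assertions(party_2, party_1))

# The stylesheets themselves clash on one property.
sheet_a = {"color": "black", "font": "serif"}
sheet_b = {"color": "blue"}
print(cascade(sheet_a, sheet_b))
print(cascade(sheet_b, sheet_a))
```

The combined set of assertions says that both A and B apply to D but carries no information about which comes first, and that order is exactly what the cascade needs.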

The question arose as to how, then, to incorporate multiple stylesheets into the metadata architecture. One possibility is not to use the metadata architecture at all, and to just use CSS. That is, whenever multiple stylesheets need to be invoked, they must be invoked from a single CSS document which may invoke other stylesheets in turn through the "@import" facility. In this case, as the top level stylesheet and any imported stylesheets are expanded there is a clear sequence to the operations and the cascading operation is therefore well defined. However, we lose some of the power of the metadata architecture: we lose the ability of two different authors, two different authorities, to independently make assertions about how a document should be viewed. This could be very useful, for example in cases in which the website administrator makes some assertions, the author of the document makes some assertions, and perhaps even a third party makes an assertion: for example, an association for blind people may assert that a particular document is best read by blind people using a particular stylesheet, even when it has no control over the original document. Under these circumstances it is impossible for all parties to get together and write one single document, a master CSS stylesheet, which defines the cascading.

So we see here that the metadata architecture gives us one operation for combining assertions: it is the simple combination which is so intuitive that we all take it for granted whenever combining information. It is the AND function: when two statements are made, one statement AND the other is true. Because AND is commutative, it doesn't matter in which order one takes the statements. So, we have seen that implicit combination using AND gives us the power to combine information from multiple sources which are not aware of each other. We have also seen that there is a limitation to the power of this method of combining information, as exemplified by the cascading of stylesheets.

There are many conceivable practical solutions to this. One is to give links a weight, so that when the assertions are combined, even though they come from different sources, there is some idea of weighting which can be used, for example, to order the input to a cascading stylesheet engine, or in other cases to resolve conflict between apparently conflicting assertions. The idea of a weight on a link is quite interesting in the hypertext architecture in other ways, but would be a significant change from the simple @@@ to next tape??
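One way to picture the weighted-link idea, purely as a sketch: each stylesheet link carries a number, the links are collected in whatever order they happen to arrive, and the weights alone decide the order in which sheets are fed to a (much simplified) cascade. The function name, the convention that a higher weight overrides a lower one, and the example sheets are all assumptions made for the illustration.

```python
def cascade_by_weight(weighted_links, sheets):
    """Apply stylesheets in order of increasing link weight.

    weighted_links: iterable of (weight, stylesheet URI) pairs, in any order.
    sheets: stylesheet URI -> dictionary of property values.
    Higher weights are applied later and so override lower ones.
    """
    result = {}
    # Sorting makes the outcome independent of arrival order
    # (ties on weight are broken by URI).
    for _weight, uri in sorted(weighted_links):
        result.update(sheets[uri])
    return result


# Hypothetical sheets from a site administrator, an author and a third party.
sheets = {
    "site.css": {"color": "black", "font": "serif"},
    "author.css": {"color": "blue"},
    "large-print.css": {"font-size": "200%", "font": "sans-serif"},
}
links = [(1, "site.css"), (2, "author.css"), (3, "large-print.css")]

print(cascade_by_weight(links, sheets))
print(cascade_by_weight(reversed(links), sheets))
```

The assertions remain an unordered collection, as the metadata architecture requires; the ordering information has been moved into the assertions themselves.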