The project addresses problems frequently faced by users who look for information on the World-Wide Web (WWW), and by information providers who wish to create and maintain a multilingual information offering on the web.
The problem faced by information seekers is coping with the ever-increasing amount of information that is available on the web in different languages. Existing search services are very efficient at locating documents from a keyword query. However, the prevalent string-matching search techniques are too coarse-grained, because the query must be formulated with substrings of the actual document. Information in the same language can only be found if exactly the same words are used in the query and in the document, and documents in different languages can only be found by formulating separate queries in each language. Recall is poor because many relevant documents are not found. Precision is also poor, because the results often include documents that contain terms mentioned in the query but are irrelevant to it. Choosing among the set of automatically selected documents can therefore be a very time-consuming task for the user, because she has to pick out manually those documents which are actually relevant.
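The recall and precision problem described above can be illustrated with a minimal sketch. The toy document collection, the relevance judgements and the function names below are assumptions invented for this illustration; they do not describe any existing search service or the tools proposed in this project.

```python
# Toy collection: document id -> text. All contents are invented for illustration.
DOCS = {
    "d1": "cheap cars for sale in Zurich",
    "d2": "used automobiles at low prices",
    "d3": "gebrauchte Autos zu verkaufen",
    "d4": "cable cars in the Swiss Alps",
}

# Hypothetical relevance judgements for a user who wants to buy a car.
RELEVANT = {"d1", "d2", "d3"}


def substring_search(query, docs):
    """Return ids of documents that contain every query term as a substring."""
    terms = query.lower().split()
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower() for term in terms)
    }


def precision_recall(retrieved, relevant):
    """Compute (precision, recall) of a retrieved set against a relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = substring_search("cars", DOCS)
precision, recall = precision_recall(retrieved, RELEVANT)
print(sorted(retrieved), precision, recall)
```

In this toy setting the query "cars" misses the synonym in d2 and the German document d3, which lowers recall, and it retrieves the irrelevant cable-car document d4, which lowers precision.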
The problem faced by information providers is that there are currently no adequate tools for creating and managing multilingual information offerings for the web. When versions of a web page exist in different languages, it is very tedious to provide the appropriate hyperlinks for each version and to keep the versions consistent with each other. Nor are there adequate tools for delivering documents to a user in her preferred language. Moreover, there are currently no technologies available for automatic hyperlinking of documents, which could help to create a denser and more systematic network of links than manual hyperlinking.
An additional important drawback of existing web technology is the lack of information about the function of hyperlinks in a document. Different links can stand in different semantic relations to the document, i.e. they can point to different sorts of information, for example related documents, definitions, images, translated documents or external resources. The meaning of a link is not transparent to the reader of a document. As a consequence, the user cannot specify what kind of links she is interested in, and the content provider must restrict the links to a subset of the potentially interesting relations between documents in order to avoid overloading a document with links. The full potential of automatic intelligent hyperlinking, which can generate links for all sorts of different relationships within and between documents, cannot be exploited if only one kind of link is available; in particular, it is not possible to associate one anchor in a document with several links which stand in complementary relations to the source document.
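The idea of one anchor carrying several links with different semantic relations can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names TypedLink and Anchor, the relation labels and the example.org URLs are all hypothetical and are not taken from the proposed tools.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypedLink:
    """A link labelled with its semantic relation to the source document."""
    relation: str          # hypothetical labels, e.g. "definition", "translation"
    target: str            # URL of the linked resource
    language: str = "en"   # language of the target document


@dataclass
class Anchor:
    """One anchor in a document that may carry several typed links."""
    text: str
    links: list = field(default_factory=list)

    def select(self, relations):
        """Return only the links whose relation the user asked for."""
        wanted = set(relations)
        return [link for link in self.links if link.relation in wanted]


# Hypothetical anchor with three complementary links.
anchor = Anchor(
    "information retrieval",
    [
        TypedLink("definition", "http://example.org/glossary#ir"),
        TypedLink("translation", "http://example.org/de/ir.html", language="de"),
        TypedLink("related", "http://example.org/search.html"),
    ],
)

# A user interested only in definitions and translations.
selected = anchor.select(["definition", "translation"])
for link in selected:
    print(link.relation, link.target)
```

With the relation stored explicitly, the provider can attach all generated links to the anchor, and the selection by relation keeps the reader from being overloaded.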
The goal of the proposed project is the development of a leading-edge application that facilitates multilingual, selective access to information, together with navigation, browsing and filtering, in an efficient and purposeful way, and that supports the creation, indexing, linking and maintenance of multilingual documents by content providers.
The intended application can run entirely on the server of the content provider, so that the end user needs only a standard web browser such as Netscape or Mosaic.
The application will be realised as a group of interacting tools which improve access to information (search and navigation) in multilingual web document collections, and support the creation and maintenance of multilingual information offerings on the web by content providers. The set of tools will provide the following functionalities:
The envisaged new functionality concerning multilingual access and intelligent hyperlinking for the web is best achieved when the different tools are considered as interacting parts of an integrated application. On the one hand, the multilingual functionality allows retrieval over a wider range of documents, while advanced concept-based search filters the retrieved documents according to user goals. On the other hand, intelligent hyperlinking and interactive multilingual navigation tools provide effective support for the comfortable utilisation of web documents by a multilingual user community.