Profiles and Contexts: Definitions, Applications, and Problems

There have been a number of research projects geared towards creating contextualized information services: in essence, services which adapt the information to be presented to a profile of the user. As we have been building some experimental services, however, we have come to realize that the problem is not simply one of creating the most suitable filters, resolution, or matching engines. It is not simply a problem of creating an appropriate data model. Nor is it an issue of defining the pertinent profile of the user, or of creating an appropriate matching algorithm. It is all of these, and they are interdependent.

First, I will define the problem space as I perceive it. Then, I will discuss the various elements of the problem and of the solution. I will then talk about the metaproblems surrounding them, and finally discuss what impact this will have on the current work.

The problem at hand is creating a presentation that is optimized along a set of parameters, which include the adaptation of both content and presentation. The level of granularity of this adaptation is very high (at the level of lexemes, or of their semantic equivalents in graphical form), essentially producing an individualized presentation of the content at hand. (I will try to avoid the word "document" in this text, since a document is a fixed instance of a content object set with an associated fixed presentation, at least according to this author's definition.) This high granularity presupposes that the content is managed as a database, either by being marked up as one, by being managed in a DBMS, or through some equivalent means. The subject of the objects, as well as how their attributes are predicated, is immaterial to the current discussion.
It can be a city map, a stock quotation ticker, a billing record, or any kind of information set we usually think about in terms of databases or documents (since, marked up properly, a document becomes a database instance).

The adaptation of an object set of high granularity (e.g. an XML file marked up at the lexeme level, or a database containing individual lexemes) takes place by matching a description of the user's current context against the highly granular content object set, through a set of matching rules. The processing rules can be expressed in a multitude of ways, from a DBMS-internal language, to a formal rules language, to a programming language (high- or low-level). Their function is to express the algorithm whereby one or more profiles are read and their contents are used to process the object set into a presentation instance. There is currently no standardized way of expressing these algorithms, nor are the algorithms themselves standardized, and they are likely to remain the subject of innovation, and hence unsuitable for standardization, for some time to come. The actual matching can take place through the internal mechanisms of the DBMS, through a matching engine based on a formal language (such as Prolog), through a piece of specialized software, or through a variety of other means. How it takes place is immaterial to this discussion, but from a modeling standpoint, expressing the matching rules in a formal logic language has some advantages. Since this area is one of the research topics I want to point out for the future, however, I do not want to go deeper into it now.

The management of the data and its transformation are, then, rather better understood than the third component of this system: the profiles. It may seem as if the user is the only stakeholder in the system, and hence the only one requiring a profile. However, even if this were true, there is no consensus on what should go into a user profile.
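The matching step described above (granular content objects, a description of the user's context, and a set of matching rules) can be illustrated with a minimal Python sketch. It assumes a hypothetical data model in which each content object is one granular unit tagged with the concept it expresses, the context is a plain dictionary, and matching rules are ordinary predicates. None of the names (`ContentObject`, `language_rule`, `screen_rule`, `present`) come from any standard; a real system might instead use DBMS queries or Prolog clauses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ContentObject:
    """Hypothetical granular content unit, e.g. a single lexeme."""
    concept: str                                  # the concept this unit expresses
    text: str                                     # its surface form
    attrs: Dict[str, str] = field(default_factory=dict)

# A matching rule takes one content object and the context profile and
# returns True if the object is acceptable in that context.
Rule = Callable[[ContentObject, dict], bool]

def language_rule(obj: ContentObject, profile: dict) -> bool:
    """Accept language-neutral objects, or objects in a language the user reads."""
    lang = obj.attrs.get("lang")
    return lang is None or lang in profile.get("languages", [])

def screen_rule(obj: ContentObject, profile: dict) -> bool:
    """Reject objects longer than the device can show (hypothetical character limit)."""
    return len(obj.text) <= profile.get("screen_chars", 80)

def present(objects: List[ContentObject], profile: dict, rules: List[Rule]) -> Dict[str, str]:
    """Build a presentation instance: for each concept, keep the first
    candidate object that satisfies every matching rule."""
    presentation: Dict[str, str] = {}
    for obj in objects:
        if obj.concept not in presentation and all(rule(obj, profile) for rule in rules):
            presentation[obj.concept] = obj.text
    return presentation

objects = [
    ContentObject("greeting", "Good morning, valued customer", {"lang": "en"}),
    ContentObject("greeting", "Hello", {"lang": "en"}),
    ContentObject("greeting", "Hej", {"lang": "sv"}),
]
profile = {"languages": ["en"], "screen_chars": 12}
print(present(objects, profile, [language_rule, screen_rule]))  # {'greeting': 'Hello'}
```

Here the order of the candidate objects acts as a fallback ordering; a formal logic language would let the same rules be stated declaratively instead.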
To describe a person completely would require an infinite data set, since in the absence of knowledge about what should go into the profile we are forced to put everything there: clearly not very efficient. Moreover, certain information is needed for certain adaptations but not for others: credit card information, for instance, is needed for economic transactions but not for adapting content to a location. As a matter of fact, I hold that we can distinguish three levels of profiles: personal, situational, and contextual. This may be a spurious ontology, but it does create a layering of information that seems to correspond to real-world layering, although this needs further research.

In this division, the personal information set contains items that are specific to the user but not to the service used. This includes such basic things as name and address, birth date, shoe size, credit card number, etc. In a fully context-dependent system, this layer could also contain information such as the user's culture and proficiencies, eyesight, preferred presentation mode, etc. Apart from the privacy issues this would raise (more on which below), it would also become necessary to have some kind of preference expression mechanism (e.g. my preferred language is Swedish, but I am fluent in English and can speak and read German). This could be inherent in the profiles, or a metaprofile could be used to express it using processing rules. Again, this is an interesting research topic.

The situational profile contains information that changes through the user's actions, but independently of his interaction with the information. This includes items such as the location and the ambient temperature (which in some places can change drastically when moving just a few meters: in Singapore, for instance, the outdoor temperature is frequently 30 degrees Celsius, while the indoor temperature is 18 degrees).
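The three profile layers and the preference mechanism just described can be sketched as follows, under stated assumptions: the layer contents are hypothetical examples, and the language weights borrow the idea of the q-values in the HTTP `Accept-Language` header rather than any profile standard. The hypothetical function `best_variant` picks the content variant in the language the user ranks highest.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class LayeredProfile:
    """Hypothetical profile split into the three layers proposed in the text."""
    personal: Dict[str, object] = field(default_factory=dict)     # user-specific, service-independent
    situational: Dict[str, object] = field(default_factory=dict)  # changes with the user's actions
    contextual: Dict[str, object] = field(default_factory=dict)   # generated by interaction with data

def best_variant(variants: Dict[str, str], profile: LayeredProfile) -> Optional[str]:
    """Return the variant whose language has the highest preference weight
    in the personal layer, or None if the user reads none of them."""
    prefs: Dict[str, float] = profile.personal.get("languages", {})
    scored = [(prefs[lang], lang) for lang in variants if prefs.get(lang, 0.0) > 0.0]
    if not scored:
        return None
    _, lang = max(scored)
    return variants[lang]

me = LayeredProfile(
    # Swedish preferred, fluent English, some German (weights are illustrative)
    personal={"name": "A. User", "languages": {"sv": 1.0, "en": 0.9, "de": 0.5}},
    situational={"location": "Singapore", "temperature_c": 30.0},
    contextual={"bookmarks": ["http://www.w3.org/"]},
)
print(best_variant({"en": "Good morning", "de": "Guten Morgen"}, me))  # Good morning
```

Keeping the weights in the personal layer corresponds to the "inherent in the profiles" option; a metaprofile with processing rules would move this logic out of the profile itself.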
The situational profile also contains the device characteristics, since these are determined by the user's actions, although not by his interaction with the information set. The easiest way to understand this is to consider the personalization the user can perform using the "change profile" menu selection on his mobile phone, switching off the ring tone and putting the phone in vibration mode. This could not take place without a user action. The same is true when the user opens the flip of the R380 mobile organizer, turning it from a mobile phone into a PDA.

Parenthetically, this points to a separate problem space surrounding the profile complex: the expression has to be globally applicable. This implies the use of data ranges and types that are applicable in any possible terrestrial location. (It may be imprudent to exclude space applications, although including them may make the system too complex; designed properly, however, it would handle the coordinate systems, temperature ranges, and barometric pressures, to take just a few examples, of Mars as well as of Earth.) Global applicability also implies taking into account all the languages that may be in use. Several groups have been attempting to do this, at least for specific aspects: e.g. the Unicode Consortium for writing systems, and the W3C I18N (internationalization) working group for markup.

In the area of location, there are also several research questions. For instance, what, in the user's view, constitutes a location in relation to a given information search? And how does the conceptual location relate to the real location? (When I am in an airplane over the Pacific, for instance, I may conceptually be in the same location for twelve hours, even though my coordinates change rapidly.)
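One way to make situational values globally applicable, as argued above, is to store each one with an explicit unit and normalize it to SI units, so that no terrestrial (or Martian) value falls outside the representable range. The sketch below assumes a small, hypothetical set of accepted unit codes; a real profile vocabulary would need to define these formally.

```python
from typing import Dict

# Hypothetical unit codes accepted in a profile, mapped to pascals.
PRESSURE_TO_PA: Dict[str, float] = {"Pa": 1.0, "hPa": 100.0, "kPa": 1000.0, "atm": 101325.0}

def to_kelvin(value: float, unit: str) -> float:
    """Normalize a temperature given in K, C (Celsius), or F (Fahrenheit) to kelvin."""
    if unit == "K":
        kelvin = value
    elif unit == "C":
        kelvin = value + 273.15
    elif unit == "F":
        kelvin = (value - 32.0) * 5.0 / 9.0 + 273.15
    else:
        raise ValueError(f"unknown temperature unit: {unit!r}")
    if kelvin < 0.0:
        # The only hard physical limit: nothing is colder than absolute zero.
        raise ValueError("temperature below absolute zero")
    return kelvin

def to_pascal(value: float, unit: str) -> float:
    """Normalize a barometric pressure to pascals."""
    if unit not in PRESSURE_TO_PA:
        raise ValueError(f"unknown pressure unit: {unit!r}")
    return value * PRESSURE_TO_PA[unit]

# Outdoor and indoor temperatures from the Singapore example, normalized.
print(round(to_kelvin(30.0, "C"), 2), round(to_kelvin(18.0, "C"), 2))  # 303.15 291.15
```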
How to determine the user's coordinates is fairly clear, and indeed standardized (in the LIF AI, for instance), but it is not clear how the user experiences the location(s) these coordinates express, and this may indeed be hugely variable. The situational parameters may also include other information, such as the current network latency, throughput, and transmission speed. In a constrained sense, the delivery context of a current request (or rather, session) would be equivalent to the situational parameters I have just described.

The contextual information, meanwhile, would be that which is generated (consciously or not) through the user's interaction with a data set. The example that immediately comes to mind is the browsing history in the user's web browser, but notably, the bookmark file also expresses such contextual information. How this information is gathered and applied is another research question (as is which information is gathered, and how it is applied, to what, and by whom).

A further constraint is that the user's privacy must not be violated. This is an entirely separate complex of topics, and a very complex one in itself. There has been considerable research into it, as well as some standardization efforts (most notably the P3P work in the W3C), which have also been used in some efforts to handle profile management (e.g. the CC/PP work in the W3C).

There is also another problem: the user is not the only stakeholder. So far, I have not addressed the fact that there may be at least two other stakeholders in the system: the author of the information set and the service provider. If several profiles use similar element names, or even identical elements (the same classes), which would greatly facilitate matching the profiles to determine the optimal presentation of an information set, a way is needed to distinguish elements from one another, and to distinguish the same element as it occurs in different profiles.
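XML namespaces are one established mechanism for exactly this: the same local element name, qualified by different namespace URIs, denotes different elements. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports qualified names in the form `{uri}local`. The two profiles, their namespace URIs, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical namespace URIs for two stakeholders' profiles.
USER_NS = "http://example.org/profile/user"
PROVIDER_NS = "http://example.org/profile/provider"

USER_PROFILE = f'<profile xmlns="{USER_NS}"><location>Stockholm</location></profile>'
PROVIDER_PROFILE = f'<profile xmlns="{PROVIDER_NS}"><location>cell-4711</location></profile>'

def split_tag(tag: str) -> Tuple[str, str]:
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag

def merged_view(*documents: str) -> Dict[Tuple[str, str], str]:
    """Collect the leaf elements of several profiles, keyed by (namespace, local name),
    so that identically named elements from different profiles stay distinct."""
    view: Dict[Tuple[str, str], str] = {}
    for doc in documents:
        for elem in ET.fromstring(doc).iter():
            if len(elem) == 0:  # leaf element: carries a value
                view[split_tag(elem.tag)] = (elem.text or "").strip()
    return view

view = merged_view(USER_PROFILE, PROVIDER_PROFILE)
print(view[(USER_NS, "location")], view[(PROVIDER_NS, "location")])  # Stockholm cell-4711
```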
If a data management framework such as XML is used, the ability to distinguish elements in this way comes automatically (through XML namespaces). This is just one advantage of using such a framework; another is the automated consistency management, among other things, that it brings. (As a matter of fact, it is doubtful that there are any reasonable alternatives to XML, especially given that this is the format selected for MPEG-7.)

The author of an information set has a moral right to see it presented according to his preferences, under the copyright laws that apply to all information, quite apart from any additional constraints that may be imposed by an agreement, implicit or not, between the user and the copyright holder. For instance, the copyright holder may have made the material available for free under the implicit condition that the user watch the advertisements; or under the condition that the presentation cannot be changed; or that it can only be viewed after payment. That there are no mechanisms to express some of these constraints is immaterial, and the fact that there are widespread misconceptions about the actual conditions is beside the point, although understanding these misconceptions and how they have arisen is an interesting research topic for an anthropologist. (For instance, why, after more than 200 years of copyright law, do users not understand the concept of copyright? And, more importantly, why do authors not understand it?) The important point is that as the author of a document (and, in some jurisdictions but not others, of a database) you have a right to determine how it is viewed. These constraints could conceivably be expressed using the same mechanisms as those which declare the user's preferences for how a document should be viewed, and given suitable processing rules, an entirely new set of ways to express rights management could be constructed. The point here, however, is that the user is not the only stakeholder in the personalization and presentation of a dataset.
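If author constraints were expressed with the same mechanism as user preferences, as suggested above, a processing rule would have to reconcile the two. A minimal sketch, assuming a hypothetical convention in which the author marks some presentation properties as locked: locked properties keep the author's value, and everything else may be overridden by the user. The property names are illustrative only.

```python
from typing import Dict, Set

def resolve_presentation(author: Dict[str, object],
                         user: Dict[str, object],
                         locked: Set[str]) -> Dict[str, object]:
    """Merge author and user presentation preferences.

    Properties named in `locked` (hypothetical author constraints, e.g. "ads must
    be shown") keep the author's value; all other properties take the user's
    value when the user has expressed one. Neither input is modified.
    """
    result = dict(author)
    for prop, value in user.items():
        if prop not in locked:
            result[prop] = value
    return result

author_prefs = {"font_size": 12, "layout": "two-column", "show_ads": True}
user_prefs = {"font_size": 18, "show_ads": False}
print(resolve_presentation(author_prefs, user_prefs, locked={"show_ads"}))
# {'font_size': 18, 'layout': 'two-column', 'show_ads': True}
```

Conditions such as "viewable only after payment" would need more than a merge of this kind, for instance a check evaluated before any presentation instance is produced.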
The author is a second stakeholder due to his moral rights, although the management of these is frequently outsourced to the service provider. The service provider is the third generic stakeholder, but there may be several service providers in a system (for instance, content providers as well as network connectivity providers), and they will have different views of which information is required, and for what purpose. For instance, service providers (especially network providers) seem very interested in receiving payment for their services, something the user may not be interested in, were he not contractually bound to provide it. Service providers also have a different set of requirements: they need to know which network cell the user is in, what his home operator is, etc. (This may be expressed using the same parameters as other information: the location, for instance, can be expressed as the cell ID for both the user and the service provider, although the process of presenting that information is then completely different.)

From a privacy point of view, however, it is questionable whether the service provider has any moral right to establish a profile of the user. Apart from a unique identifier and the information needed for both parties to fulfill their contractual obligations, there is no need for the service provider to know anything about the user. One view, of course, is that the service provider maintains the profile in trust for the content provider, provided the service provider is delivering the service on behalf of the content provider. However, this gives rise to the important and interesting question of who owns the profile information, whether provided by the user or generated, and whether the current situation on the web, where practice seems to be that the service provider owns all rights to any profile information in its system, will continue to apply.
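The minimal-disclosure position above (a unique identifier plus whatever the contract requires) could be enforced by filtering the profile per stakeholder before anything is released. In this sketch the stakeholder names, field names, and allow-lists are hypothetical; deciding what belongs in them is precisely the open question raised above.

```python
from typing import Dict, Set

# Hypothetical allow-lists: the fields each stakeholder needs to fulfil its contract.
DISCLOSURE_POLICY: Dict[str, Set[str]] = {
    "network_provider": {"subscriber_id", "home_operator", "cell_id"},
    "content_provider": {"subscriber_id"},
}

def disclose(profile: Dict[str, object], stakeholder: str,
             policy: Dict[str, Set[str]] = DISCLOSURE_POLICY) -> Dict[str, object]:
    """Return only the profile fields the stakeholder is allowed to see.
    Unknown stakeholders get nothing (deny by default)."""
    allowed = policy.get(stakeholder, set())
    return {name: value for name, value in profile.items() if name in allowed}

profile = {
    "subscriber_id": "u-0001",
    "home_operator": "operator-a",
    "cell_id": "cell-4711",
    "shoe_size": 43,
    "browsing_history": ["http://www.w3.org/"],
}
print(disclose(profile, "content_provider"))  # {'subscriber_id': 'u-0001'}
```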
Another question is how the trading of profiles (the coincidental use of which could be illegal in some jurisdictions) can be regulated, governed, and controlled. This is potentially a much more important area for the application of digital rights management than controlling who listens to songs more than once. 3GPP is in the process of creating a user profile for its environment, which should offer important insight into the service providers' view of the information requirements of the 3G system. These requirements will, of course, have to be analyzed to understand whether they apply to other (future, 4G) systems as well.

I have only cursorily touched upon the temporal inconsistency of the data. The rate of change can range from several times per minute (the user's position) to once or twice in the user's lifetime (the name of an individual).

It is actually possible to regard a traditional web browser as a simple instance of the ideal system described above. First, it works on an object set which is marked up as a database instance (although the granularity, to be sure, is minimal); and it uses a simple user profile (the browser settings) to arrive at a personalized presentation (the way information is shown on the screen). That hardly anybody ever bothers to change the default settings is immaterial to this discussion.

Apart from the constraints mentioned above, there is another set of constraints: considerable standardization has already been done in the area of some stakeholder profiles. If a contextual information system is to be built, it of course becomes important that these profiles are aligned with an architecture in which they can be matched with other profiles. This is a structural problem of the profiles; a semantic problem is the construction of their ontologies, i.e. which terms they use to describe the information in the profile, and how those terms are formally described.
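The very different rates of change noted above suggest attaching a timestamp and a validity period to each profile value, so that a matching engine can tell which values must be refreshed before use. A minimal sketch; the `TimedValue` type and the validity periods are hypothetical and would in practice depend on the attribute and the application.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class TimedValue:
    """A profile value with the time it was observed and how long it may be trusted."""
    value: object
    observed_at: float  # seconds since the epoch
    max_age: float      # validity period in seconds

    def is_fresh(self, now: float) -> bool:
        return now - self.observed_at <= self.max_age

def stale_attributes(profile: Dict[str, TimedValue], now: Optional[float] = None) -> List[str]:
    """Names of attributes whose values have outlived their validity period."""
    if now is None:
        now = time.time()
    return sorted(name for name, tv in profile.items() if not tv.is_fresh(now))

profile = {
    # Position changes several times per minute: trust it for 20 seconds (hypothetical).
    "position": TimedValue((59.33, 18.07), observed_at=1000.0, max_age=20.0),
    # A name changes once or twice in a lifetime: treat it as valid indefinitely.
    "name": TimedValue("A. User", observed_at=0.0, max_age=float("inf")),
}
print(stale_attributes(profile, now=1030.0))  # ['position']
```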
I have briefly touched upon some of the problems in creating contextual services. While large parts of the problem space have been solved, and even standardized, there are other aspects, partly arising from these solutions, which still need to be researched and solved. We have an interesting time ahead.