Profiles and Contexts: Definitions, Applications, and Problems

There have been a number of research projects geared towards creating 
contextualized information services; in essence, services which adapt 
the information to be presented to a profile of the user. As we have 
been building some experimental services, however, we have come to 
realize that the problem is not simply one of creating the most suitable 
filters, resolution, or matching engines. It is not simply a problem of 
creating an appropriate data model. Nor is it an issue of defining the 
pertinent profile of the user, or creating an appropriate matching 
algorithm. It is all of these, and they are interdependent.
First, I will define the problem space, as I perceive it. Then, I will 
discuss the various elements of the problem and the solution. I will 
then talk about the metaproblems surrounding them, and finally discuss 
what impact this will have on the current work.
The problem at hand is creating a presentation that is optimized along a 
set of parameters, which include the adaptation of content and 
presentation. The level of granularity of this adaptation is very high 
(at the level of lexemes or their graphically semantic equivalents), 
essentially producing an individualized presentation of the content at 
hand (I will try to avoid the word "document" in this document, since a 
document is a fixed instance of a content object set with an associated 
fixed presentation, at least according to this author's definition).
The high granularity of the adaptation assumes that the content is 
managed as a database, either by being marked up as one, by being 
managed in a DBMS, or through some equivalent means. The subject of the 
objects, as well as how their attributes are predicated, is immaterial 
to the current discussion. It can be a city map, a stock quotation 
ticker, a billing record, or any kind of information set we usually 
think about in terms of databases or documents (since, marked up 
properly, a document becomes a database instance).
The adaptation of an object set of high granularity (e.g. an XML file 
marked up at the lexeme level, or a database containing individual 
lexemes) takes place by matching, through a set of matching rules, a 
description of the user's current context against the highly granular 
content object set.
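As an illustration only, the matching step can be sketched as a filter 
over the content objects. The sketch assumes the user's context is a 
flat dictionary and that each matching rule is a predicate over a 
(context, object) pair; the names `ContentObject`, `language_rule`, 
`device_rule` and `adapt` are hypothetical and not taken from any 
existing system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical context description: parameter name -> value.
Context = Dict[str, str]


@dataclass
class ContentObject:
    """A single highly granular unit of content, e.g. one lexeme or label."""
    text: str
    attributes: Dict[str, str] = field(default_factory=dict)


# A matching rule decides whether one object fits the current context.
MatchingRule = Callable[[Context, ContentObject], bool]


def language_rule(context: Context, obj: ContentObject) -> bool:
    # Objects without a language tag are treated as language-neutral.
    lang = obj.attributes.get("lang")
    return lang is None or lang == context.get("lang")


def device_rule(context: Context, obj: ContentObject) -> bool:
    # Hypothetical rule: long items are dropped on small screens.
    if context.get("screen") == "small":
        return obj.attributes.get("length") != "long"
    return True


def adapt(objects: List[ContentObject], context: Context,
          rules: List[MatchingRule]) -> List[str]:
    """Keep the objects that satisfy every rule, in their original order."""
    return [o.text for o in objects if all(rule(context, o) for rule in rules)]


content = [
    ContentObject("Old Town", {"lang": "en"}),
    ContentObject("Gamla stan", {"lang": "sv"}),
    ContentObject("Walking directions to the Old Town",
                  {"lang": "en", "length": "long"}),
    ContentObject("N", {}),  # a compass mark, shown in every language
]

print(adapt(content, {"lang": "en", "screen": "small"},
            [language_rule, device_rule]))  # ['Old Town', 'N']
```

A real system could equally express the rules in a DBMS language or a 
logic language; the predicate form is simply the smallest structure 
that shows rules, context and content meeting.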
The processing rules can be expressed in a multitude of ways, from a 
DBMS internal language, to a formal rules language, to a programming 
language (high- or low-level). Their function is to express the 
algorithm whereby the profile(s) is read and its (their) content(s) is 
used to process the object set into a presentation instance. There is 
currently no standardized way of expressing these algorithms, nor are 
the algorithms themselves standardized, and they are likely to be the 
subject of innovation - and hence, not suitable for standardization - 
for some time to come.
The actual matching can take place through the DBMS internal systems, 
through a matching engine based on a formal language (such as Prolog), 
through a piece of specialized software, or through a variety of other 
means. How it is done is immaterial to this discussion, but from a modeling 
standpoint, expressing the matching rules in a formal logic language has 
some advantages. Since this area is one of the research topics I want to 
point out for the future, I do not want to go deeper into this subject 
now, however. As can be concluded, the management of the data and its 
transformation are rather better understood than the final aspect, the 
profiles.
The third component of this system is the profiles. It may seem as if 
the user is the only stakeholder in the system, and hence the only one 
requiring a profile. However, even if this were true, there is no 
consensus on what should go into a user profile. To completely describe a 
person would require an infinite data set, since in the absence of 
knowledge about what should go into the profile we are forced to put 
everything there - clearly, not very efficient. The fact is also that 
certain information is needed for certain adaptations but not for others 
- credit card information, for instance, for economic transactions but 
not for the adaptation of content to a location.
As a matter of fact, I hold that we can distinguish three levels of 
profiles: Personal, situational, and contextual. This may be a spurious 
ontology, but it does create a layering of information that seems to 
correspond to real-world layering, although this needs further 
research.
In this division, the personal information set contains items that 
belong to the user regardless of the service used. This includes such 
basic things as name and address, birth date, shoe size, credit card 
number, etc.
In a totally context-dependent system, this layer could also contain 
information such as the user's culture and proficiencies, eyesight, 
preferred presentation mode, etc. Apart from the privacy aspects that 
would arise from this (more about which below), it would also become 
necessary to have some kind of preference expression mechanism (e.g. my 
preferred language is Swedish but I am fluent in English and can speak 
and read German). This could be inherent in the profiles, or a 
metaprofile could be used to express this using processing rules. Again, 
this is an interesting research topic.
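One possible form for such a preference expression, sketched here under 
the assumption of a simple three-step proficiency scale, ranks the 
user's languages and picks the best match among the languages a content 
set is available in. `PROFICIENCY` and `choose_language` are 
hypothetical names; the idea is similar in spirit to the quality values 
used in the HTTP `Accept-Language` header.

```python
from typing import Dict, List, Optional

# Hypothetical proficiency scale: a higher number means more comfortable.
PROFICIENCY = {"preferred": 3, "fluent": 2, "reads": 1}


def choose_language(preferences: Dict[str, str],
                    available: List[str]) -> Optional[str]:
    """Pick the available language the user is most comfortable with."""
    candidates = [lang for lang in available if lang in preferences]
    if not candidates:
        return None  # nothing the user can read; the caller must decide
    return max(candidates, key=lambda lang: PROFICIENCY[preferences[lang]])


# Swedish preferred, fluent in English, can speak and read German.
user = {"sv": "preferred", "en": "fluent", "de": "reads"}

print(choose_language(user, ["de", "en"]))  # en
print(choose_language(user, ["fr", "de"]))  # de
print(choose_language(user, ["fr"]))        # None
```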
The situational profile contains information that changes through the 
user's actions, but independently of his interaction with the 
information. This would include items such as the location and the 
surrounding temperature (which in some locations can change quite 
drastically by moving just a few meters - in Singapore, for instance, 
the outdoor temperature is frequently 30 degrees centigrade, while the 
indoor temperature is 18 degrees).
The situational profile also contains the device characteristics, since 
these are determined by the user's actions, although not by his 
interaction with the information set. The easiest way to understand this 
is to consider the personalization the user can conduct using the 
"change profile" menu selection in his mobile phone, switching off ring 
tones and setting it to vibration mode. This could not take place 
without a user action. The same is true when the user opens the 
hatch of the R380 mobile organizer, turning it from a mobile phone into 
a PDA.
Parenthetically, this points to a separate problem space surrounding the 
profile complex: the expression has to be globally applicable. This 
implies the use of data ranges and types which are applicable in any 
possible terrestrial location (it may be imprudent to exclude space 
applications, but the system may become too complex if they are 
included - and, designed properly, it would handle the coordinate 
systems, temperature ranges, and barometric pressures - just to take a 
few examples - of Mars as well as of Earth). 
However, global applicability also implies taking into account the 
possible languages being used. Several groups have been attempting to 
do this, at least for specialized aspects, e.g. the Unicode Consortium 
for writing systems, and the W3C I18N working group for markup.
In the area of location, there are also several research questions. For 
instance, what, in a user's view, constitutes a location in relation 
to certain information searches? And how does the conceptual location 
relate to the real location? (When I am in an airplane over the Pacific, 
for instance, I may conceptually be in the same location for twelve 
hours, even though my coordinates change rapidly.) How to determine the 
coordinates of the user is fairly clear, and indeed standardized (in the 
LIF AI, for instance), but it is not clear how the user experiences the 
location(s) these coordinates express, and this may indeed be hugely 
variable.
The situational parameters may also include other information, such as 
the current network latency, throughput, and transmission speed.
In a constrained sense, the delivery context of a current request (or 
rather, session) would be equivalent to the situational parameters I 
just described.
The contextual information, meanwhile, would be that which is generated 
(consciously or not) through the user's interaction with a data set. The 
example that immediately comes to mind is the browsing history of a user 
in his web browser, but notably, the bookmark file also expresses such 
contextual information. How this information is gathered and applied 
is another research question (as is which information is gathered and 
how it is applied to what by whom).
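To make the three levels concrete, here is a minimal sketch of a layered 
profile, assuming each layer is a plain dictionary. The class 
`LayeredProfile` and its `view` method are hypothetical; `view` hands an 
adaptation only the parameters it asks for, reflecting the earlier point 
that a location-based adaptation has no use for the credit card number. 
The lookup order between layers is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class LayeredProfile:
    """Hypothetical container for the personal, situational and contextual levels."""
    personal: Dict[str, Any] = field(default_factory=dict)     # stable, service-independent
    situational: Dict[str, Any] = field(default_factory=dict)  # changed by user actions
    contextual: Dict[str, Any] = field(default_factory=dict)   # generated by interaction

    def view(self, needed: Iterable[str]) -> Dict[str, Any]:
        """Return only the requested parameters, searching the layers in turn.

        If a name occurs in more than one layer, the first match in the
        order personal, situational, contextual is used (an assumption).
        """
        result: Dict[str, Any] = {}
        for name in needed:
            for layer in (self.personal, self.situational, self.contextual):
                if name in layer:
                    result[name] = layer[name]
                    break
        return result


profile = LayeredProfile(
    personal={"name": "A. User", "credit_card": "XXXX-XXXX", "language": "sv"},
    situational={"location": (1.29, 103.85), "ring_mode": "vibrate"},
    contextual={"bookmarks": ["http://example.org/"]},
)

# A location-based adaptation sees the location and language, nothing else.
print(profile.view(["location", "language"]))
```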
There is a further constraint, in that the user's privacy should not be 
violated. This is an entirely different set of topics, and a very 
complex one. There has been considerable research into it, as well 
as some standardization efforts (most notably, the P3P work in the 
W3C), which have also been utilized in some efforts to handle profile 
management (e.g. the CC/PP work in the W3C).
There is also another problem: the user is not the only 
stakeholder. So far, I have not discussed the fact that 
there may be at least two other stakeholders in the system: the author 
of the information set and the service provider.
In case several profiles exist which use similar element names, or even 
identical elements (the same classes) - something which will greatly 
facilitate the matching of profiles to determine the optimal 
presentation of an optimized information set - a way is needed to 
distinguish similar elements, as well as the same element in different 
profiles. If a data management framework such as XML is used, these 
properties come automatically (through XML namespaces). This is just one 
advantage of using such a framework; another is the automation of 
consistency management etc. that 
it brings (as a matter of fact, it is doubtful that there are any 
reasonable alternatives to XML, especially given that this is the format 
selected for MPEG-7).
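A minimal sketch of how XML namespaces keep identically named elements 
apart, using Python's standard `xml.etree.ElementTree`. The namespace 
URIs and the `location` element are hypothetical examples, not taken 
from any published profile vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per profile vocabulary.
NS = {
    "user": "http://example.org/profiles/user",
    "prov": "http://example.org/profiles/provider",
}

DOC = """<profiles xmlns:user="http://example.org/profiles/user"
          xmlns:prov="http://example.org/profiles/provider">
  <user:location>59.33,18.07</user:location>
  <prov:location>cell-4711</prov:location>
</profiles>"""

root = ET.fromstring(DOC)

# Both children share the local name "location", but their qualified
# names differ, so each profile's element can be addressed unambiguously.
user_location = root.find("user:location", NS).text
provider_location = root.find("prov:location", NS).text

print([child.tag for child in root])
print(user_location, provider_location)  # 59.33,18.07 cell-4711
```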
The author of an information set has a moral right to see it presented 
according to his preferences, under the copyright laws that apply to 
all information - apart from any additional 
constraints that may be imposed by an agreement between the user and the 
copyright holder, implicit or not (for instance, the copyright holder 
may have made the material available for free under the implicit 
condition that the user watch the advertisements; or he may have made it 
available under the condition that the presentation cannot be changed; 
or that it can only be viewed after payment). That there are no 
mechanisms to express some of these constraints is immaterial, and the 
fact that there are widespread perceptions of what the actual conditions 
are is totally beside the point - although understanding these and how 
they have arisen is an interesting research topic for an anthropologist 
(for instance, why, after more than 200 years of copyright laws, do users 
not understand the concept of copyright? And, more importantly, why do 
authors not?)
The important point is that as the author of a document (and, in some 
jurisdictions but not others, a database) you have a right to determine 
how it is viewed. Expressing these constraints could conceivably be done 
using the same mechanisms as those which declare the user's preferences 
for how a document should be viewed, and given suitable processing 
rules, an entirely new set of ways to express rights management could be 
constructed.
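As a rough sketch of expressing author constraints with the same 
mechanism as user preferences, both can be modeled as parameter-to-value 
mappings, with the author able to lock some parameters. The function 
`resolve` and its locking policy are assumptions made for illustration; 
no existing rights expression language is implied.

```python
from typing import Dict, Set, Tuple


def resolve(user_prefs: Dict[str, str],
            author_prefs: Dict[str, str],
            author_locked: Set[str]) -> Tuple[Dict[str, str], Set[str]]:
    """Merge author and user preferences into one presentation setting.

    Hypothetical policy: the author's values are the defaults, the user may
    override any parameter the author has not locked, and locked
    parameters always keep the author's value. Also returns the user
    wishes that had to be overridden.
    """
    merged = dict(author_prefs)
    overridden: Set[str] = set()
    for key, value in user_prefs.items():
        if key in author_locked and key in author_prefs:
            if value != author_prefs[key]:
                overridden.add(key)
        else:
            merged[key] = value
    return merged, overridden


# Free content offered on the condition that advertisements are shown.
author = {"layout": "two-column", "ads": "shown", "font": "serif"}
user = {"ads": "hidden", "font": "sans-serif"}

settings, refused = resolve(user, author, author_locked={"ads"})
print(settings)  # {'layout': 'two-column', 'ads': 'shown', 'font': 'sans-serif'}
print(refused)   # {'ads'}
```

Returning the overridden parameters lets a presentation layer tell the 
user why a wish was not honored, which is one small way of making such 
constraints visible.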
The point here, however, is that the user is not the only stakeholder in 
the personalization and presentation of a dataset.  The author is a 
second stakeholder due to his moral rights, although the management of 
these is frequently outsourced to the service provider.
The service provider is the third generic stakeholder, but there may be 
several service providers in a system (for instance, content providers 
as well as network connectivity providers), and they will have different 
views of which information is required, and for what purpose. For 
instance, service providers (especially network providers) seem to be 
very interested in receiving payment for their services (something the 
user may not be interested in providing, were he not contractually bound 
to do so). The service provider also has a different set of 
requirements; for instance, it needs to know which network cell the user 
is in, which is his home operator, etc. 
(This may be expressed using the same parameters as other information, 
for instance in the case of the location, which the user and the service 
provider can both express as the cell ID, although the process of 
presenting that information is then completely different.)
From a privacy point of view, however, it is questionable whether the 
service provider has any moral right to establish a profile of the user. 
Apart from a unique identifier and information needed for both 
parties to fulfill their contractual obligations, there is no need for 
the service provider to know anything about the user.
One view, of course, is that the service provider maintains the 
profile in trust for the content provider, provided that the service 
provider delivers the service as specified by the content provider. 
However, this gives rise to the important and interesting question of 
who owns the profile information, whether provided by the user or 
generated, and whether the current situation on the web, where practice 
seems to be that the service provider owns all rights to any profile 
information existing in his system, will apply. Another question is how 
the trading of profiles - the coincidental use of which could be illegal 
in some jurisdictions - can be regulated, governed and controlled (this 
is, potentially, a much more important area for the application of 
digital rights management than the control of who listens to songs more 
than once).
The 3GPP is in the process of creating a user profile for its 
environment, which should provide important insight into the service 
providers' view of the information requirements for the 3G system. These 
will, of course, have to be analyzed to understand whether they apply to 
other (future, 4G) systems as well.
I have only cursorily touched upon the temporal inconsistency of the 
data. The rate of change can range from several times per minute (the 
user's position) to once or twice during the user's lifetime (the name 
of an individual).
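One way to cope with such different rates of change is to stamp every 
profile value with the time it was observed and a validity window. The 
`TimedValue` class and the window lengths below are hypothetical, chosen 
only to contrast a fast-changing position with a name that practically 
never changes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any


@dataclass
class TimedValue:
    """A profile value stamped with when it was observed and how long it is trusted."""
    value: Any
    observed_at: datetime
    max_age: timedelta  # hypothetical per-attribute validity window

    def is_stale(self, now: datetime) -> bool:
        return now - self.observed_at > self.max_age


t0 = datetime(2001, 1, 1, 12, 0)
position = TimedValue((1.29, 103.85), t0, timedelta(seconds=20))
name = TimedValue("A. User", t0, timedelta(days=3650))

later = t0 + timedelta(minutes=5)
print(position.is_stale(later))  # True: the position must be re-measured
print(name.is_stale(later))      # False: the name is still trusted
```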
It is actually possible to generalize a traditional web browser into the 
ideal system described above. First, it works on an object set which is 
marked up as a database instance (although the granularity, to be sure, 
is minimal); and it uses a simple user profile (the browser settings) to 
arrive at a personalized presentation (the way information is shown on 
the screen). That hardly anybody ever bothers to change the default 
settings is immaterial to this discussion.
Apart from the constraints mentioned above, we come to another set of 
constraints: the fact that considerable standardization has already been 
done in the area of some stakeholder profiles. If a contextual 
information system is to be built, it of course becomes important that 
these profiles are aligned with an architecture in which they can be 
matched with other profiles. This is a structural problem of the 
profiles; a semantic problem is the construction of their ontologies, 
i.e. which terms they use to describe the information in the profile, 
and how those terms are formally described.
I have briefly touched upon some of the problems in creating contextual 
services. While large parts of the problem space have been solved, and 
even standardized, there are other aspects - partly arising from this - 
which need to be researched and solved. We have an interesting time ahead.