SWAD-Europe Deliverable 12.1.1:
Semantic web applications - analysis and selection
- Project name:
- Semantic Web Advanced Development for Europe (SWAD-Europe)
- Project Number:
- IST-2001-34732
- Workpackage name:
- 12.1 Open Demonstrators
- Workpackage description:
- http://www.w3.org/2001/sw/Europe/plan/workpackages/live/esw-wp-12.1.html
- Deliverable title:
- Semantic web applications - analysis and selection
- URI:
-
http://www.w3.org/2001/sw/Europe/reports/chosen_demos_rationale_report/hp-applications-selection.html
- Authors:
- Dave Reynolds , Steve
Cayzer, Ian Dickinson, HP Laboratories, Bristol, UK
Paul Shabajee, Graduate School of Education and ILRT, Bristol,
UK
- Abstract:
- This report concerns the selection of two open demonstrator
applications designed to both illustrated the nature of the
semantic web and to explore issues involved in developing
substantial semantic web applications given the current state of
the art.
We start with a discussion of the nature of the semantic web and in
particular what the key aspects of it are that should be brought
out by the demonstrators. Then, after a brief summary of the roles
that the demonstrators play within the overall SWAD-E project, we
look at existing and proposed semantic web applications.
Finally we describe our two chosen demonstrators - semantic
blogging for bibliographies and semantic community portals.
- Status:
-
First release.
Comments on this document are welcome and should be sent to Dave Reynolds or to the public-esw@w3.org list. An
archive of this list is available at http://lists.w3.org/Archives/Public/public-esw/
Contents
1 Introduction
2 The nature of the semantic web
3 Role of applications within SWAD-E
4 Criteria for selection
5 An overview of the application
space
6 Demonstrator 1: semantic blogging and
bibliographies
7 Demonstrator 2: semantic community
portals
A References
B Application survey
C Blogging and semantic
blogging
D Changes
This report is part of SWAD-Europe Work package
12.1: Open demonstrators. This workpackage covers the selection
and development of two demonstration applications designed to both
illustrate the nature of the semantic web and to explore issues
involved in developing substantial semantic web applications.
The aim of this report is to select the two specific
demonstrators to be developed and provide the rationale behind that
choice. We have also tried to put this choice into context. In
particular, we offer a picture of what the key features of the
semantic web are that the demonstrators should illustrate, together
with a survey of many known or proposed semantic web
applications.
This report is not intended to specify the detailed
functionality or architecture of the chosen applications. Separate
deliverables are scheduled to cover these.
We start with a discussion of the nature of the semantic web and
in particular what the key aspects of it are that should be brought
out by the demonstrators. Then, after a brief summary of the roles
that the demonstrators play within the overall SWAD-E project, we
look at existing and proposed semantic web applications. We have
grouped the applications we are aware of into different categories
and in the body of the report we just offer an overview of these
categories - details on the specific applications surveyed is
included in Appendix B. We hope to also
make this survey available in RDF format.
Finally we describe our two chosen demonstrators and the reasons
for selecting them.
Overview
Much has been written about the nature of the semantic web [SEMWEB] [SCIAM] and at first
glance the notion is fairly straightforward. The existing world
wide web allows anyone to publish human readable web pages that can
be connected via hyperlinks. The combination of a common format for
marking up such web pages, common access protocols to allow client
applications (browsers) to access and view the data and universal
hyperlinking, has transformed the way we publish and access
information. A simple description of the semantic web is that it is
an attempt to do for machine processable data what the world wide
web did for human readable documents. Namely to transform
information processing by providing a common way that data can be
accessed, linked together and understood. To turn the web from a
large hyperlinked book into a large interlinked database.
There are several motivations for this.
First there is the view that simply making data available is an
end in and of itself and will lead to benefits and applications.
Just as the world wide web gave everyone access to information that
previously would have been locked away in local file systems or
local networks, the semantic web can unlock access to data
currently hidden away in databases, freeing that data to be
accessed by applications and tools across the globe. In particular
the ability to link data from different data sources together
allows us to explore data in new ways and discover new
relationships and correlations.
Secondly, there is the appeal of automated processing of this
information. As long as the data we share across the web is, as
now, primarily in natural language there are limits to the ways
that our software systems can add value to this information because
robust natural language processing is beyond the current state of
the art. Current text processing techniques are sufficient to allow
us to index and retrieve such documents but not to perform any
meaning processing on their contents. We can't, for example,
robustly extract the details (artists, locations, dates) from an
article on upcoming concerts to check them against our diaries. We
can't, other than by using fragile web scraping techniques,
aggregate price information and ratings from different product
sites to assist with purchases. Making such information available
in machine interpretable form means that we can build applications
which actively process this information - collect it, analyze it,
filter it, correlate it, link it and apply it to the task at
hand.
Thirdly, there is the sense in which the semantic web is very
much an extension to the current web. The common representation for
data allows us to attach semantic information, metadata, to the
human readable web - allowing people and machines to work in closer
cooperation. This enables applications such as semantic search -
search engines that "understand" the difference between computer
chips and potato chips, and can therefore present a user with
semantically relevant results rather than just syntactic
matches.
Technology layers
To achieve this vision the semantic web is built as a series of
layers. These are not layers in a strict software architecture
sense but levels of functionality. At the base level the semantic
web builds on the rest of the web infrastructure - HTTP transport,
URIs for naming and location, XML as a common syntactic format. On
top of this existing infrastructure the semantic web adds two key
pieces:
- A common representation for semi-structured data and metadata,
RDF [RDF].
- A common representation for ontologies that enable the terms
used by the data layer to be defined and related to each other [RDFS] [DAML] [OWL].
The roadmap for the semantic web also offers a vision of future
layers [SEMWEB
LAYERS] to support richer knowledge representation and
to provide a trust infrastructure so that the results inferred from
semantic web data can be traced back to the assumptions that lead
to them. However, our aim in these demonstrators is to illustrate
the semantic web as defined so far - other work packages will
explore these other layers, especially the important trust
issues.
Features
There are three core aspects to the semantic web which we feel
are critical to capture and illustrate in the open
demonstrations.
- Data representation
- The foundation of the semantic web is a common format, RDF, to
represent data. This format is designed to be suited to
representing semi-structured data and metadata. Data is broken down
into conjunctions of individual assertions in the form of
subject/predicate/object triples. Each of the components (other
than simple literals) is a web URI and thus has a defined place in
the global namespace. This allows many sorts of data (property
values of objects, relationships between objects, value
annotations) to be represented uniformly and allows data from
multiple locations to be combined without accidental clashing of
property names or structure mismatches.
- Semantics
- The aspiration of the semantic web is to be able to express
meaning. It is the second layer of the semantic web - the
schema and ontology layer - that begins to do this. It enables the
properties and types used in the data layer to be related to each
other. To say, for example, whether two terms are distinct, or
equivalent or whether one term is a subset of another. This
capability allows a data source to expose its conceptual model
explicitly in machine processable form thus allowing a software
agent accessing it to make decisions on how the data can be
processed and what the semantic relationship is between data from
different sources.
Ontologies do not provide an absolute way of conveying semantics.
They allow classes and properties be related to other known classes
and properties thus allowing the meaning of new terms to derived
from combinations of other known and "understood" terms. However,
the semantic web does not require some standardized global upper
ontology to function. Like the web, it remains decentralized so
that data sources are free to mix and match terms from different
ontologies - so long as two entities share a common ontology they
can communicate.
- Webness
- There is nothing new about either semi-structured data
representation or explicit representation of conceptual models
through ontologies. The critical innovation of the semantic web
does is put both of these concepts into a web framework. This is
manifested in deceptively simple ways such as the use of URIs to
provide a global namespace for both entities and concepts
(properties, types). However, the impact is substantial. An agent
accessing a data source now has a means for discovering the
ontology associated with that data source. Ontologies can be
developed in a decentralized way to suit particular needs, but the
terms defined in different ontologies can be related and combined
to enable transformation of data from one domain to another.
Each of these features is important but it is the combination of
all three that forms the fundamental nature of the semantic web and
our demonstrators should ideally illustrate that combination. This
is a critical point to emphasize - there is a difference between
applications which happen to use parts of the semantic web stack
and applications which serve to demonstrate the vision of the
semantic web itself. For example the Mozilla browser [Mozilla] uses RDF internally to represent the
structure of mail messages, web links and so forth. This is a great
use of RDF but it lacks the use of deeper semantics or the webness
to be an illustration of the full semantic web vision. Similarly,
there have many applications of ontology technology (and indeed
richer knowledge representations) over the years which do not
themselves illustrate their role in connecting data representations
across the web.
Before we look at example applications we should clarify what
the role of the application work is within the SWAD-E project. We
see two different classes of role - communication and
investigation.
Communication
A key role for the demonstrators is to illustrate the nature and
value of the semantic web, to enable both users and developers to
understand the potential benefits. It is primarily to meet this
requirement that we prefer that our applications attempt to
illustrate the semantic web concept as a whole and avoid
concentrating too strongly on, for example, just the ontology
aspects.
As well as communicating value and potential, the demonstrations
should also communicate practicality and feasibility. A core aim of
the SWAD-E project is to ensure enough of the tools and
understanding are in place to allow practical development of
serious semantic applications. The demonstrators should show that
existing tools and techniques are indeed sufficiently mature to
support such applications or give specific guidance on any current
limitations.
Investigation and analysis
The second role of the demonstrators is to test the current
capabilities and limitation of existing semantic web standards and
toolkits. By pushing the technical boundaries, the demonstrator
work should generate advice for current developers on practical
limitations, important feedback to toolkit developers on key
requirements for future tools and guidance for future standards
development.
The semantic web is an ambitious vision and fully realizing it
will require substantial research - not simply engineering
development. While the aim of the demonstrators is not to tackle
such research issues head on, they do have an important role in
probing the boundaries of these hard problems to determine which
issues can be worked around and which are the critical ones that
should be targeted by future needs-driven research investments.
In particular, in exploring potential applications we found that
the issues raised by the semantic web vision of many decentralized
data sources with separate and evolving ontologies seem
particularly important. Such issues include:
- encouraging ontology convergence rather than explosive
divergence, perhaps by facilitating ontology reuse
- efficient data source combination (query routing aspects of
distributed query, ontology transformations needed to query
divergent sources) over high latency networks
- coping with conflicts and inconsistencies in both the data and
the merged ontologies - the semantic web will include many
different and divergent conceptualizations of the world, it should
be possible to discover and reason with these disagreements
As well as investigating the technical issues of semantic web
applications the demonstrators should also give some insights into
the social and economic issues of uptake. The semantic web is, by
definition, a network effect technology [NETWORK EFFECT].
Its value depends on its deployment and vice versa. Such issues
include the question of how to seed community ontologies (too top
down and they are rigid and slow to develop, too bottom up and you
have too much divergence for the network effect to kick in) and how
to encourage adoption of common access and linking approaches in
the absence of processing standards.
There is clearly a conflict between these two requirements. To
meet the needs of communication, publicity and advice to current
developers then the demonstrators should be modest, low risk
affairs picked for their ease of comprehension. To begin to probe
the research boundaries and give useful feedback on the limitations
of current technologies and guidance for future investment the
demonstrators should be ambitious and deliberately touch on some of
the research issues noted above.
We address this conflict in two ways. Firstly, by choosing one
demonstrator which we believe to be in the low risk, easy uptake
category and one which is more risky in involving more serious
issues of multiple ontologies. Secondly, in both cases we chose a
broad area with a modest initial core so that each demonstrator
should be expandable in many directions to explore a variety of
research and engineering issues when appropriate.
There is also a risk that our emphasis on practical
illustrations of feasibility will lead to demonstrators that fall
short of the hyped expectations that are being generated about the
semantic web. We regard this as an entirely acceptable risk -
communication not evangelism is our aim.
Given this view of the key features of the semantic web and the
role of the SWAD-E open demonstrators we can then summarize our
selection criteria as follows:
- illustrate the overall semantic web vision, good mix of
semi-structured data, webness and deeper semantics;
- a good chance of impact and uptake across a large enough
community to become a live application and not simply an artificial
demonstration;
- the semantic web aspects of the demonstrators should be visible
and apparent to the end users and not hidden behind the
scenes;
- a balance, across the pair, of low risk, low cost of entry and
higher risk exploration of the technical boundaries;
- basic feasibility, a meaningful core application must be
constructable within the 9 man month bounds, this feasibility
includes aspects of availability of data as well as technical
feasibility and also suggests that the bulk of the work should be
on the semantic web core and not for example on user interface
issues;
- where the application requires significant community support
(in terms of user community or markup of data) we must have reason
to believe that the community can be motivated to collaborate on
the project;
- the application and domain area should match the interests and
expertise of the builders and of their parent organization.
To get a deeper understanding the the space of possible semantic
web applications we conducted an initial survey of applications
including current known or completed projects and prior suggestions
and proposals. We don't claim that this survey is comprehensive but
we were able to find enough examples (approximately 60) to give a
reasonable picture. Summary information on these applications
(brief descriptions, links, status) have been capture in RDF format
and a subset of this information (translated to HTML) is included
in Appendix B.
We then conducted an initial informal clustering exercise which
led us to identify some 11 categories of types of application. A
description and discussion of each category is included below. This
categorization is imperfect - the categories overlap somewhat,
applications appear in multiple categories, and some category pairs
could be merged without a great loss information. However, this
level of structure has been very useful to us. It gives enough
detail to provide a good overview of the different ways in which
the community believes the semantic web can be applied; while at
the same time it is rather more succinct than the raw application
data and helps one to see the wood for the trees.
In addition to classifying the applications into these type
categories we have also looked at other dimensions of
classification, in particular in terms of the domain to which the
applications are applied. This is valuable in understanding the
breadth of current semantic web explorations though is not a
primary criterion for us in choosing a demonstrator - we are fairly
agnostic about the information domain itself.
One interesting alternative classification dimension that could
be explored further in future analysis is that of information
lifecycle phase. Some of our application categories emphasize the
use of metadata for management of information in the creation and
storage phases of the lifecycle, others during the discovery and
selection phases, or others still the application and delivery
phases. Traditionally the metadata used in the early lifecycle
phases is hidden away in the internals of the data format or the
particular content management system used. The semantic web
approach to making such metadata externally visible might enable
this data to be reused in later phases of the lifecycle. This "end
to end" use of semantic metadata could be a powerful motivator for
the semantic web and further work in developing an information
lifecycle model that explores this aspect could well be
fruitful.
Semantic web application categories
- 1.
Data integration
- In this class of applications we use the semantic web as a way
of exporting data from multiple datasources to allow integration
and cross-source queries. Several categories of applications
involve some data integration but the essence of this class is that
the data itself is seen as having substantial value and simply
"freeing" the data and providing cross database query is a value in
its own right. Thus the user of the application may be simply and
explicitly issuing queries to the merged data sources or viewing
the information.
- Examples
-
- Discussion
- This class of applications primarily illustrates the common
data-format aspects of the semantic web. Clearly there is an
element of distribution but many near term practical applications
will be intranet scale and applied to carefully selected clusters
of databases - the network effect of webness is only partially
present. The depth of semantics and ontology support can be quite
significant here. The data sources will have typically been
designed with a specific narrow set of queries in mind and
integrating the different data schemas to support cross-source
query may require nontrivial concept translation.
- 2. Data-dependent agents
- This cluster of applications is one where some software entity
(which we shall, informally, call an agent) is providing a service
to a user that is only possible if a rich and inhomogeneous set of
data can be integrated. In terms of technical work and challenges
this is virtually identical to the data integration cluster but in
this case it is the operations carried out by the agent that
defines the end-user value - the data integration itself is but a
means to that end.
- Examples
-
- Discussion
- Whilst the semantic web issues here are very similar to those
in the pure data integration category, the data sources tend to be
less homogenous and more widespread so that this category has a
greater "webness" score. For example, the shopping assistant has to
not only aggregate price and offer information but also ratings and
evaluations from prior customers and has to translate between the
customer's specification of the desired article or service and the
different descriptions used in the data sources. This is certainly
an important class of applications for the semantic web - witness
its prominent role in [SCIAM]. It is also one of the easiest to
communicate - the shopping assistant scenario is probably one of
the most compelling we have looked at.
However, from the point of view of a demonstration of value it
suffers from a high cost (you have to build an effective software
agent as well as succeed in the data integration challenge) and
that the value delivered from the semantic web aspects of the work
is indirect; it enables "cool" stuff but is neither itself "cool"
nor visible.
In practical terms this category of application is likely to be
slow to take off in the semantic web. The value to the data
providers in making their data available is often not clear and the
agents cannot deliver their value until a sufficiently
comprehensive set of data sources is available. This can be
partially overcome by using screen scraping techniques to
artificially make data available, e.g. in the Isoco personal
financial aggregator [GetSee].
Once enough such applications are available and the economic model
becomes clearer then network effect should make this a high value
area for the semantic web in the long run.
- 3. Knowledge management
- Knowledge management is a well defined field of research and
technology [KnowledgeManagement]
which comprises several different classes of application - from
community formation, through collaboration support to enterprise
knowledge preservation. It would be thus be possible to subdivide
this category further into these component subfields.
The essence of the term knowledge as used in the knowledge
management field is applied information. Simply storing or
organizing information is not sufficient to turn it into knowledge,
knowledge in this context is taken to be the ability to harness
that information to solve actual problems. Unless people are able
to apply the information to the task at hand the knowledge is
useless.
The common factor in these examples is that the collective
knowledge of some community is expressed in some information form
such as a set of documents (case studies, past problem reports,
notes on a bulletin board) and semantic web techniques can be used
to classify and structure the document set to allow it to be
matched against a problem. Ontologies provide the key tool for this
classification.
- Examples
-
- Discussion
- This is an very important application area of substantial
commercial importance which is certainly the target of many
semantic web related projects. It primarily exploits the ontology
management aspects of the semantic web stack. The documents
themselves will often not be particularly structured and there may
be no machine processable information beyond the document
classifications. It is also an area that is typically applied to a
specific community, often within a single discipline and within a
single organization - and as such may neither benefit from nor
illustrate the web nature of the semantic web. However there will
be applications of this class which transcend that generalization
and so could illustrate data integration and the global "webness"
of the semantic web to an adequate degree.
- 4.
Semantic indexing and semantic portals
- The web is already replete with examples of document
integration, providing organized access to large collections of
information in the form of web links. These include topic-specific
portals, generic structured directories like Yahoo! or DMOZ and information retrieval
directories like Google. The
semantic web offers the possibility that such portal services could
be based on deeper categorizations of links exploiting rich
ontologies. In particular, the categorizations, topic tags and
other annotations associated with the indexed resources may be
drawn from many locations and communities and integrated by the
portal, rather than, as at present, being entirely synthesized by
the portal. Further, if the indexing is based on some deeper
underlying semantics, many different structured views could be
synthesized to map onto the same resources via the same set of
semantic tags. Curriculum Online [CurriculumOnline]
is an example of this where educational resources are tagged
according a 2,000 term topic ontology, which is then mapped onto
the curriculum structure. This allows the tagging to remain valid
despite changes in the curriculum.
One interesting variation on this theme is where the
semantic-driven lookup is applied in parallel with standard web
operations like searches (e.g.TAP)
or link following (e.g. context
aware links). Here there is the added challenge of mapping a
user's unstructured query onto a structured semantic space and
using that additional semantic information to both disambiguate and
enrich the user's query.
- Examples
-
- Discussion
- This group of semantic web applications is particularly strong
at showing the connections between the semantic web and the
human-oriented world wide web.
If we are dealing with a single portal with a single modest
underlying ontology then the research challenges are small and
there already several examples of such systems. However, despite
the connections to the existing web, this class of applications may
not fully illustrate the webness of the semantic web because often
the ontologies and the original data remain hidden behind the
scenes. There are exceptions to this where these structures are
deliberately exported and exposed (e.g. TAP) or where data and
ontologies from multiple sources or multiple communities needs to
be combined (e.g. community Arkive, distributed topic portals). The
latter cases, where data spanning multiple ontologies needs to be
combined, rapidly pushes such applications over into the higher
research content (and thus higher risk) zone.
- 5. Personal information
management
- This category is concerned with applying semantic web
techniques to help individual users manage their own information.
Several of these have a strong community aspects of sharing both
information and categorization schemes, but they are all
person-centric - they are managed by the user primarily for their
own benefit. In contrast the examples in knowledge management
category are typically managed by some specific organization such
as an employer on behalf of the organization's collective
good.
- Examples
-
- Discussion
- The applications in this category are primarily exploiting the
semi-structured data representation layer of the semantic web. By
translating the many different data objects an individual has to
manage (events, appointments, phone lists, mail lists, mail
headers, filing categories, action lists) to a common RDF format
then it is easier to build highly reusable and extensible tools and
to link data across multiple formats. As in the last category, an
issue with these applications from the point of view of our
demonstrator goals is that the transformed data may remain hidden
inside the application and the benefits of the approach may not be
visible to anyone other than application developers (see Mozilla
for example). These applications do involve some format translation
and hence some schema mapping element but are typically not
exploiting the semantics aspect of the semantic web in a
particularly deep way.
However, for applications where the sharing of information is a key
feature and where some rich categorization or ontological structure
is involved then this category can be an excellent source of
demonstrators. It has the strong attraction that such applications
can be immediately useful to a single individual or a small group
and can then grow in value as the network of users grows - there is
not the barrier of making substantial external data sets available
or artificially stimulating a substantial "ignition" community that
arises in some of the other categories.
- 6. Metadata for annotating and
enriching
- The foundation layer of the semantic web, RDF, was originally
designed as primarily a format for metadata. So many semantic web
applications are aimed at some aspects of metadata management that
we found it useful to distinguish between several subclasses of
metadata applications. In this category we see applications where
the metadata is intended to be directly visible to the end user; it
is there to annotate or enrich the data itself. Typically this is
used as a means for the viewer of a resource to also see the
comments, opinions and ratings from a larger community of
users.
- Examples
-
- Discussion
- This category is an excellent demonstration of the webness and
semi-structured data aspects of the semantic web. The common data
format allows many different annotations and metadata to be
attached to the same underlying object without barriers or
restrictions ("anyone can say anything about anything").
Those annotations can then be aggregated and organized for access,
opening up new channels of communication between users.
Typically the depth of the semantics is low. The annotations and
enrichment are often generated by humans for humans and are not
machine processable in any important way (except perhaps for
numerical rating schemes). One exception to this is where the
annotations include some classification of the annotated resources
- combining different classification schemes is a core semantic web
issue. However, such applications are typically in the overlap
between this category and the next one - metadata for discovery
& selection - and are discussed there.
- 7. Metadata for description,
discovery and selection
- In this category the metadata is primarily used to help a user
locate a resource, product or service to meet their needs. The
division between this and the metadata for enrichment
category is blurred but essentially the idea is that in this
category the metadata is primarily of value during some search task
and adds less value during any subsequent phases. This category is
also closely related to those of semantic indexing and
knowledge management.
- Examples
-
- Discussion
- It is possible to treat any of the text annotations created by
applications in the earlier - metadata for enrichment - category as
additional descriptive terms to search on. However, the essence of
this group of applications is that some more structured
representation of the descriptive properties is used -
classification into a defined taxonomy, property annotations using
a controlled vocabulary, numeric and symbolic descriptive
properties. This enables search tools to offer discovery,
comparison and selection functions which are semantic based and and
are thus more selective and less ambiguous than those based simply
on text retrieval techniques.
This is a core semantic web application area and nicely combines
the web-of-metadata features of the last category with clear
illustration of the value of the more explicit semantics.
Furthermore the user is directly accessing the metadata in
formulating searches and viewing results and so the semantic web
features are less hidden than in, say, the data-dependent
agents category.
The drawbacks to this category are firstly that it can be hard to
bootstrap - a sufficient fraction of the universe of resources
being searched need to be semantically annotated before the
discovery and selection tools become useful. Secondly, it is quite
well represented already by existing and current projects indeed
there is already an annotation application within the SWAD-E
workplan.
- 8. Metadata for media and
content
- An important use for metadata is as a tool for content or media
management. Here the value of the metadata is mostly seen during
the creation and archiving of the content - it many not be very
visible or valuable to the end users of that content.
- Examples
-
- Discussion
- Here the metadata is primarily used for management of the
annotated objects. It is similar to the personal information
management category in that it emphasizes the common metadata
format aspects but can lack rich semantics or webness - the
metadata helps the content producers and the archive maintainers
but is not visible to the end user as much as the earlier two
metadata categories.
This "RDF inside" category of applications is still extremely
important in the development of the semantic web - the use of RDF
for such technical and management metadata eases its use for richer
external metadata. However, it is not a direct illustration of the
whole semantic web vision.
- 9. Knowledge formation
- In application classes such as semantic indexing and
knowledge management we saw that resources were classified
and indexed as a means to an end such as improved search or better
problem solving. However, in the knowledge formation category this
classification and relationship knowledge is of primary value in
its own right. Those involved in these communities are consciously
creating new organizational structures which have value beyond the
resources themselves.
- Examples
-
- Discussion
- This is an intriguing category. It is putting the
representation of structured knowledge at the forefront and
highlights the semantic side of the semantic web and the
differences between that and the current web quite well. It is
appealing to think that by enabling communities to cross-link and
structure collective information this way, new insights will be
gained that would not have been apparent from simple text indexing.
Even without such speculative benefits this class of applications
is building the web-based semantic structures that other categories
of applications can then exploit - for example the TAP Knowledge
base is foundation for the TAP semantic search application.
A possible drawback to this category is those applications aimed
narrowly at knowledge formation rather than its application may
appear a little niche and may thus be less convincing concerning
the broader value of the semantic web. However, those broader
applications that fit in the overlap between this category and some
of the others are strong potential candidates for our
demonstrators.
- 10. Catalogue and Thesaurus
management
- Structured and controlled vocabularies of terms play an
important role in many applications ranging from digital libraries
(Thesauri) through to B2B market places (catalogues). The
management of these structures can be challenging, as the continued
evolution of world being described causes the creation of new terms
and the restructuring of existing branches. Further, the ontology
representation techniques used in the semantic web offer the
possibility of richer representation of such categorization schemes
than the simple hierarchical keyword trees often used. This
application category is focused on just this issue of management of
vocabularies as a distinct requirement from their creation (e.g.
knowledge formation) or application (e.g. semantic
indexing).
- Examples
-
- Discussion
- This is certainly an important and challenging application
area. There is significant commercial interest in tasks such as
product catalogue management and integration, and there are many
interesting research challenges in the semi-automated mapping of
such catalogues [Fensel2002].
As a demonstration of the overall semantic web vision, however,
such applications are are not ideal in that they focus almost
exclusively on the ontology representation issues to the exclusion
of the webness and common data representation aspects. This is also
an area already being explored with the SWAD-E program (work
package 8) so for us to build another demonstrator focusing
entirely on this issue seems unnecessary. However, this is such a
core problem that many of the potential demonstrators will have
some aspect of vocabulary management.
- 11.
Syndication
- In this final category lie applications where the semantic web
representations are used as a common format for broadcasting
metadata around some network of users. In this case we are not
necessarily indexing, classifying or annotating - merely
disseminating the information.
- Examples
-
- Discussion
- This is a challenging application area to analyze. On the one
hand the whole world of blogging [Appendix C] and
RSS is an immensely successful and interesting growth area for
the world wide web and is an excellent illustration of how a simple
common metadata format can support aggregation and filtering of
content streams in useful and effective ways. On the other hand,
for the bulk of current applications any centrally agreed
representation would have worked and indeed there is still violent
disagreement over whether the pure XML or the RDF based approaches
to RSS are most appropriate. It terms of developer evangelism the
use of RDF here has been of mixed success - the apparent additional
complexity of RDF has not yet been fully offset by real
exploitation of the additional power it brings.
Curiously some of the webness aspects of the semantic web do not
come across that clearly from this application. A single common
global schema is useful, but the power of the semantic web in
supporting the combination of different sorts of data is only
beginning to be explored.
Despite these reservations this area does offer an excellent
infrastructure and design approach for lightweight publishing and
dissemination of structured metadata. Building upon this but
extending it towards applications which involve richer and more
varied structured data - semantic blogging - is a prime
candidate for a demonstrator.
Given this discussion of semantic web application categories as
related to our selection criteria (outlined in Section 4) we can see that the most
relevant application categories are those of semantic indexing,
knowledge formation and to some extent personal information
management, knowledge management and syndication.
From this analysis we created a short list of 10
applications:
- bibliography workbench
- community arkive
- ideas workbench
- digital library metadata applied to DSpace
- Gene database ontologies
- virtual community support
- distributed product naming
- personal information and content management based on the
ePerson prototype
- semantic portals
- distributed topic portals
These were then filtered and combined to arrive at the final two
proposals.
Our first chosen demonstrator takes the semantic blogging ideas
touched on in the last category above and applies them to a
specific application domain: bibliography management.
The semantic blogging core of this demonstrator will develop a
generic framework that could be applied to many different tasks
where a user community is incrementally publishing structured and
semantically rich (categorized and cross-linked) information. It
could thus be extended to encompass other proposals on our short
list - such as the ideas workbench or the distributed topic
portals. This generality is a source of risk; unless a specific
domain is chosen there is not enough application feedback to enable
the team to focus on just core values and key technical
challenges.
The bibliography management domain has the attraction of being
very specific with much available data, both personal data and
network accessible resources such as [CiteSeer]. It helps to focus the semantic
blogging area down nicely. Whilst bibliography management is an
important task in the research community, it could be seen as a
niche application in the wider community. However, the same tools
and approaches will be as applicable to dissemination and
management of other content such as business documents or news
items. By starting with a specific, but widespread, task of
personal interest to the developers we aim to keep the work focused
and relevant. Generalizing the results to related areas will be
straightforward.
Semantic blogging
Web-logging, typically abbreviated to "blogging", is a very
successful paradigm for lightweight publishing which has grow
sharply in popularity over the last two years. The notion of
semantic blogging builds upon this success and clear network value
of blogging by adding additional semantic structure to items shared
over the blog channels. In this way we add significant value
allowing navigation and search along semantic rather than simply
chronological or serendipitous connections. We provide extra
background on the blogging phenomenon and its extension to semantic
blogging in Appendix C.
Blogging, as it stands, already offers many compelling values.
It provides a very low barrier to entry for personal web publishing
and yet these personal publications are automatically syndicated
and aggregated via centralized servers (e.g. blogger.com) allowing
a wide community to access the blogs. Blogs have a simple to
understand structure and yet links between blogs and items (so
called blog rolling) supports the decentralized construction
of a rich information network.
Semantic blogging exploits this same personal publishing,
syndication, aggregation and subscription model but applies it to
structured items with richer metadata data. The metadata would
include classification of the items into one or more topic
ontologies, semantic links between items ("supports", "refutes",
"extends" etc.) as well as less formal annotations and ratings.
There are several ways this more structured data could extend the
power of blogging:
- Discovery. At present is it not easy to discover either
a channel of interest (e.g. "I would like to find blog channels
about the semantic web") or a collection of specific items of
interest (e.g. "Are there any more blog entries describing this
application idea?").
- Cross-linking. Current blogs support a single link
between the channel record and the blogged item. By extending this
mechanism to support linking between items (using a property
hierarchy) we can create a network of topic interconnections that
supports more flexible navigation. These links can themselves form
part of the disseminated content - for example to represent the
structure or scholarly discourse (c.f. ClaiMaker/Scholonto).
- Flexible aggregation and selection. The current blog
subscription mechanisms are in some ways both too fine (being
bounded by the individual blogger's channel of posts) and too
coarse (e.g. I might like Ian's technology channel but am only
interested in the semantic web bits). The richer categorization and
structure of semantic blog channels would make it easier for users
to create virtual blog channels which aggregate across multiple
bloggers but select from that aggregate according to other criteria
such as topic (or community rating).
- Integration with other sources and applications. The
structured nature of semantic blog channels makes it possible to
develop automated blog robots that can process and enhance the
blogged items. For example, in the bibliography domain transducers
would enable import and export via existing bibliography schemas
like BibTex [BibTeX] and automatic linking to
large repositories such as CiteSeer.
Bibliography management
Management of citation databases is a recurring problem in
scientific and research domains. Many tools exist which support
good integration between a personal bibliography and word
processors, e.g. Endnote [EndNote] and
ProCite [ProCite]. However, the ability to
index and annotate the citations in these tools is often limited
with a lack of support for structured or controlled indexing
vocabularies.
A researcher's database of monographs they have read, together
with their annotations and categorization, is a valuable resource
not only to that researcher but potentially to others in the same
field. It may help others discover references they were not aware
of themselves and the commentary and evaluation associated with the
records can be an invaluable summary and guide. This collaborative
discovery and evaluation is currently limited due to the
inaccessibility of personal bibliographies and the weaknesses of
current bibliography standards when it comes to representing rich
community annotations.
The semantic web approach to representation of bibliography
entries, their annotations and their classifications could have
several benefits outlined below. This could be approached in
several ways - as a centralized citation repository, as a personal
information management tool or as a community sharing tool. It is
the latter option that interests us, both because of its intrinsic
value and because it more clearly indicates the semantic web values
than either of the other two schemes. Thus we propose to take the
semantic blogging approach sketched above and apply it initially to
the management and dissemination of citations and associated
commentary. We see several benefits in this approach:
- Community categorization and shared ontologies. By
exploiting the semantic web ontology layer we are able to not only
represent rich topic hierarchies for classifying citations, but
also to link and share these topic sets across communities. Thus
different communities can use distinct classification schemes and
yet the data can be shared across the same infrastructure and
potentially the relationships between terms in the different
schemes can be explicitly represented to allow cross-community
search.
- Inter-document relationships. Traditional bibliography
schemes represent just properties of citations but not
relationships between them. As well as the direct citation
relationship ("this paper cites this set of prior papers")
represented by the Science Citation Index [ISI]
and CiteSeer [CiteSeer] there is value in
representing the implicit links between the referenced works and
the semantics of those links such as "provides evidence for",
"refutes", "supports". Some groups have already started to explore
this area both in general and more recently in a semantic web
context (ClaiMaker/Scholonto).
- Annotation. By indexing bibliography entries by URI (or
indirectly by property patterns) we enable rich annotation of the
entries from different sources to be integrated. For example,
comments and ratings from a peer community can be created
decentralized and aggregated. Further more there origin or
provenance of such annotations can be explicitly captured and
traced.
- Subscription. By using the semantic blogging approach
with its inherent time-based channel structure we make it possible
to subscribe to categories such as "what's going on in field x".
Very often the personal reading patterns and comments of a group of
colleagues with similar interests is an excellent guide to the
important papers in a field and enables one to keep up to date with
background topics without being overwhelmed. Informal email
networks are useful for this, but they tend to be closed and hard
to discover. Centrally maintained topic portals are easier to find,
but tend to be out of date and may not offer the best tradeoff
between completeness and selectivity. The semantic blogging
infrastructure, with its ability to discover and selectively
aggregate feeds, offers an excellent alternative, enabling
subscribers to plug into the recommendations and highlights from
peer readers.
- Citation of components and other media. Finally the use
of web infrastructure to designate and refer to the content items
means that the citation infrastructure can be extended to include
references to other web addressable objects such as media objects
and to sub-components of objects (though use of XPATH [XPATH] and similar
mechanisms).
Clearly not all of these features will be achievable within the
bounds of this project. However, a functional and interesting core
demonstrator is manageable and future extension of this core to
deliver some of the other features listed above could be the
subject of further open source community development. Defining the
precise boundaries of the initial core demonstrator will be the
subject of the next package of work and will be reported as part of
deliverable 12.1.2 - requirements analysis.
Given the inherent extensibility of our first choice of
demonstrator it is tempting to make the second demonstrator a
variant on the same theme. We felt, however, that this would lack
balance and that it was better to choose a reasonably distinct
second application to explore a different cut at the semantic web
development and research issues.
For our second application we have chosen the broad area of
semantic community portals.
Again we need to select a specific application domain to ground
this application and turn it into a feasible demonstrator. There is
a difficulty in doing so for this application in that we really
need an external user community that is in a position to provide
requirements, feedback on early prototypes and most importantly the
metadata content itself. Our initial choice is to develop an
external community portal for a subset of the Arkive [ARKive] media
repository. However, at this stage it is not certain that the
appropriate community links can be put in place. We may need to
switch our focus to another similar application in a different
domain - such as a related environmental biology topic like birds
or 'mini-beasts' as studied in the UK National Curriculum for
schools, or a more generic repository such as the DSpace digital
library [DSpace]. Our
proposal is to explore the practical issues of establishing a
suitable set of community links in parallel with the development of
demonstrator 1 so that these issues will have been resolved before
the scheduled start of demonstrator 2.
The notion of semantic portals was introduced earlier in Section 5. The idea is that a
collection of resources is indexed using a rich domain ontology (as
opposed to, say, a flat keyword list). A portal provides search and
navigation of the underlying resources by exploiting the structure
of this domain ontology. There may be an indirect mapping between
the navigation view provided by the access portal and the domain
semantics - the portal may be reorganized to suit different user
needs while the domain indexes remain stable and reusable. This
indirection is exploited, for example, in the Curriculum Online
project [Curriculum Online] in which the a 2,000 term
ontology of education concepts is used in the annotation of
educational resources whereas the access portal navigates these
annotated resources according the current UK national curriculum
requirements. The mapping from user search or navigation terms to
the domain ontology may itself be an inferred step - as in the TAP
semantic search demonstrator where free text search terms are
matched to property and class labels in the domain ontology to
support semantic augmentation of a conventional keyword search.
We used the qualifier community in the description of
this demonstrator for several reasons. Firstly, we are particularly
concerned with applications where some external community is
cooperating to develop the semantic indexing - both developing the
ontology itself and the categorization of the resources. Secondly,
we are looking at applications where in fact several communities
with different interests in the same underlying resource set need
different but overlapping categorizations. This combination enables
us to emphasize the web connectedness of the ontologies and indexed
resources and gives us an opportunity to explore the ontology
development, reuse and mapping issues raised by the semantic
web.
Our preferred starting point for this application is the Arkive
media repository. This is a long term archive of rich multimedia
about worldwide endangered species and a large cross-section of
non-endangered UK species. Prior work [Shabajee2002] has indicated that many
different user groups have interest in structured access to such
data. These include:
- School teachers, support staff, pupils and third party resource
developers - who need to relate the content to the UK national
curriculum;
- Higher Education (HE) and Further Education (FE) tutors and
students - for example, to illustrate lectures with appropriate
multimedia content depicting relevant animal behaviours;
- Researchers - whose interests may be specific to only some
species or phyla and may span behavioural, habitat and
environmental issues; and
- "Hobby" groups - for example bird watchers who are both
interested in accessing such multimedia resources and who have much
to offer in terms of current information on sightings and
distribution.
This is a domain where there is a rich taxonomy of species
information (though some scholarly disagreements remain) but a lack
of agreed ontologies to cover other aspects such as behaviour or
habitat. Further, different user communities have different depths
of interest in particular areas. There are also many external
portals and repositories relating to biological concepts and
descriptions to which the Arkive repository could be usefully
cross-linked.
All of this is substantially beyond the scope of the Arkive
project itself. The proposal for this demonstrator is explore the
approach of creating an external index and annotation store,
through which a subset of the different user communities can create
classifications, cross-links and annotations which reference the
same underlying repository. This multi-community enrichment of a
shared repository is a common usage pattern, which appears in other
areas such as academic digital libraries [DSpace] or museum and heritage portals [Museum portals].
The challenge of this demonstrator is to balance the desire to
begin exploring some of the technical issues involved in the
cross-community categorization and ontology development, with the
limitations of project resources and timescales. One way to reduce
the project scope in order to improve feasibility would be to look
at only a subset of the repository by species (for example focus
just on birds or 'mini-beasts'), by media (for example, concentrate
on still photographs and side step the complex problems of
annotating time-based media), and by user community. It is the task
of the initial requirements study phase to pick the precise
focusing subset and to build the links with potential annotators,
user groups and external ontology and data sources to make this
project feasible.
As noted above, we plan to manage the risk associated with this
by (a) beginning this requirements study phase earlier than
scheduled and do some work in parallel with development of
demonstrator 1, and (b) by retaining the option to switch to an
alternative application domain of the same category should the
community arkive specification prove intractable.
- [SEMWEB]
- W3C Semantic Web activity
http://www.w3.org/2001/sw/
- [SCIAM]
-
The Semantic Web, Scientific American, May 2001, Tim
Berners-Lee, James Hendler and Ora Lassila
- [RDF]
- Resource
Description Framework (RDF) Model and Syntax
Specification, O. Lassies and R. Swick, Editors. World
Wide Web Consortium. 22 February 1999. This version is
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222. The latest version of RDF
M&S is available at
http://www.w3.org/TR/REC-rdf-syntax.
- [RDFS]
- RDF
Vocabulary Description Language 1.0: RDF Schema, D.
Brickley, E.V. Guha, Editors, World Wide Web Consortium W3C Working
Draft, work in progress, 19 March 2002. This version of the RDF
Primer is http://www.w3.org/TR/2002/WD-rdf-schema-20020430/. The latest version of the RDF
Primer is at http://www.w3.org/TR/rdf-schema/.
- [OWL]
- Web Ontology working group - http://www.w3.org/2001/SW/WebOnt/.
- [DAML]
- DAML + OIL ontology language - http://www.daml.org/
- [SEMWEB
LAYERS]
- Semantic web architecture roadmap
http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0.html
- [Mozilla]
- Mozilla web browser.
http://www.mozilla.org/
- [NETWORK
EFFECT]
- Definition of the term network effect.
http://www.marketingterms.com/dictionary/network_effect/
- [KnowledgeManagement]
- Knowledge Management Tutorial: An Editorial Overview,
Antony Satyadas, Umesh Harigopal, Nathalie Cassaigne, IEEE Trans
Systems, Man and Cybernetics - part C, 31, #4, November 2001.
- [Fensel2002]
- Semantic
web application areas. Dieter Fensel, Christoph Bussler,
Ying Ding, Vera Kartseva, Michel Klein, Maksym Korotkiy, Borys
Omelayenko, and Ronny Siebes. In Proceedings of the 7th
International Workshop on Applications of Natural Language to
Information Systems (NLDB 2002), Stockholm, Sweden, June~27-28,
2002.
- [CiteSeer]
- CiteSeer,
Scientific Literature Data Library.
- [Shabajee2002]
- Adding value to large multimedia collections through
annotation technologies and tools: Serving communities of
interest, Shabajee, P., Miller, L. and Dingley, A. 2002, In
Museums and the Web 2002: Selected Papers from an International
Conference (Eds, Bearman, D. and Trant, J.) Archives & Museums
Informatics, Boston, USA. p101-111. Available:
http://www.archimuse.com/mw2002/papers/shabajee/shabajee.html
- [BibTeX]
- LaTeX: A Document Preparation System by Leslie Lamport,
1986, Addison-Wesley.
BibTeXing (
btxdoc.tex), by Oren Patashnik, February 1988, (BibTeX
distribution).
- [EndNote]
- EndNote Bibliography software tool.
http://www.endnote.com/
- [ProCite]
- ProCide Bibliography software tool.
http://www.procite.com/
- [ISI]
- ISI Web of Knowledge
http://isi2.isiknowledge.com/
- [XPATH]
- XML Path Language Version 1.0
http://www.w3.org/TR/xpath
- [ARKive]
- The ARKive project.
http://www.arkive.org.uk/
- [DSpace]
- The MIT DSpace digital repository.
http://web.mit.edu/dspace/
- [Winer]
- The History of Weblogs, Dave Winer
http://newhome.weblogs.com/historyOfWeblogs
- [Blood]
- Weblogs: a history and perspective, Rebecca Blood
http://www.rebeccablood.net/essays/weblog_history.html
- [Radio]
- Radio Userland
http://radio.userland.com/
- [MoveableType]
- MoveableType
http://www.moveabletype.org/
- [DC]
- Dublin Core Metadata Initiative
http://dublincore.org/
- RDF Site Summary 1.0 Modules: Dublin Core
http://web.resource.org/rss/1.0/modules/dc/
- RDF Site Summary 1.0 Modules: Syndication http://web.resource.org/rss/1.0/modules/syndication/
-
The appendix contains a summary of an RDF database of notes and
example applications and suggestions which was developed during the
course of this work. The survey is in no way comprehensive - there
are many applications that we have missed or not had time to create
a record for. Even for those that have been captured our short
descriptions may well not do justice to the depth of research and
development activities involved. Think of these as an example flags
in a map of semantic web applications, not as in depth reviews.
The appendix content has been moved to a separate document (hp-applications-survey.html)
in order to keep the size of the primary document manageable.
C Blogging and semantic
blogging
In this appendix we provide more back ground on the application
of Semantic Web technology to enhance the lightweight publishing
paradigm known as "web-logging", typically abbreviated to
"blogging". To ground the discussion, we concentrate on the use of
semantically-enhanced blogging in the context of the development
and sharing of bibliographic data by academics and researchers. We
start with a review of the operation of blogging today, and try to
identify some of the reasons why it has become popular. Then we
identify some key enhancements that could improve standard blogging
though the application of semantic web techniques, particularly the
context of bibliography sharing.
Blogging today: a review
The roots of blogging go back to 1997 [Winer][Blood]. Its
popularity, however, has risen sharply over the past two years.
This can be mostly attributed to the emergence of better tools for
bloggers (for example [Radio], [Movabletype]), and the network
effect, in which the success of a community-based activity
causes more participants to join in, further enhancing that
success.
These success factors suggest the two key values that explain
blogging. Firstly, a key driver for many is to address the high
cost of maintaining up-to-date web sites. In the typical scenario
for large web sites today, one or more dedicated individuals must
be responsible for developing and maintaining the web site, and
must possess a wide range of specialist skills from web application
architecture and database, through to graphical design. Absent
these skills and dedicated resources, web sites quickly become
out-of-date, bug-ridden, or in thrall to "under construction"
clip-art. If individuals with interesting stories to tell could be
freed from having to worry about web design, and simply focus on
content, such issues would - to a greater or lesser extent -
resolve themselves. The first key value for blogging, then, is to
provide a very low effort publishing medium, in which the
individual author can provide content via a simple web form, and a
back-end application would generate a polished, indexed view of
each of the author's writings. The standard organisation of these
contributions is as a series of diary or journal entries, indexed
by calendar date and time. This approach has been very successful,
and today a very large number of people generate such blogged
journals, ranging in quality from professional-standard journalism
to highly personal, subjective reflections.
The creation of the blogging tools has been facilitated by some
key standards for the format of blog entries. Firstly, RSS
(variously "rich site summary", "really simple syndication", "RDF
site summary" or other variants) provides a basic set of structural
metaphors. RSS structures blogs as series of items, where
such series is termed a channel. An item, minimally, has a
title, link, and body. A channel has a title and an ordered
sequence of items. The link, if present, is taken to be a pointer
to another piece of content that this item is commenting upon. Thus
blog entries may be created that comment on an item of news
(referring, perhaps, to the news report on Yahoo!), or, commonly is
a comment on a blog entry by some other person. In particular, RSS
defines an XML format for summarizing a user's blog entries. A
second key standard is the blogger API, which is discussed
further below.
The second success factor in blogging is the sharing of and
reuse of streams of such lightweight publications. Since the RSS
file summarises the current set of blog entries, it is a simple
matter to examine it, detect which entries are new or changed, and
highlight them to the user. Monitoring the changes in an RSS XML
file in this way is termed subscribing to that user's blog.
An aggregator is a desktop tool that provides a user with a
view of the new and changed items in all of the RSS channels to
which he or she is subscribed. By subscribing to a set of channels
that closely matches their interest, the user gains a highly
selective flow of information most likely to be of both relevance
and interest. As such channels get connected between members of the
blogging community, a rich network of quality information flows is
created.
There remains the problem of how users discover blog channels
they wish to subscribe to. There are four main mechanisms:
- The blogging tool that the user employs notifies one or more
central directories of the changes to the RSS XML file. While in
principle each such directory could have its own means of
submission, the blogger API has become the de facto standard
that such directories adhere to. The directories publish various
indexes of the blogs they know about, including the most recently
updated (over a window of, say, the preceding 60 minutes), the most
active blogs, or the blogs most subscribed-to by other users. In
addition, most directories are searchable by keyword (from the blog
title or description).
- On the published HTML containing the blog, many bloggers will
list the influential or entertaining blogs to which they subscribe.
A common pattern when reading one blogger's writings is to explore
the writing of one or more of the blogs they subscribe to. The
phenomenon of listing such subscriptions (and hence helping to
propagate interesting writing) is termed blog-rolling.
- A blog entry frequently refers to an entry by another blogger.
By following the link, the reader is taken to the source blog, to
which they can then subscribe.
- Typically, individual bloggers will reference their blog from
their personal home pages, email signatures, etc.
Given the metadata contained in the blog's RSS file, and the
blog-roll subscriptions, a range of tools for exploring the
meta-network of connections between bloggers have been developed,
and continue to evolve.
Upgrading blogging to semantic blogging
In this section, we outline some of the key changes that must be
added to standard blogging in order to achieve our vision of
semantic blogging.
- Currently, items have a very limited structure (title, link and
body). Semantically meaningful items will have much more structure,
and will be governed by one or more ontologies to give meaning to
the structure. The RSS 1.0 specification provides for the use of
modules to extend the content of the item, though currently
this is limited to Dublin Core metadata [DC][RSS-DC] and syndication metadata [RSS-Syndication]. Fortunately, RSS 1.0 (but
not all RSS variants) are standard RDF, so in principle it should
be possible to add arbitrary additional structure to items.
- Current blogs use channels to identify sub-streams of
information from one contributor. For example, an author may have a
blog which contains categories of information on Java programming,
semantic web technology, RDF tools, etc. Each of these categories
is available as a separate RSS channel. However, there is no formal
semantic relationship between these channels, and the channel
structure is relatively static and coarse-grained. A more flexible
approach would be to label each item with one or more ontological
categories, and use these to define implicitly the channels
available from that source.
- At present, links between items are limited to the single
'link' field in the item. There is no particular semantics
associated with this link. For semantic blogging, there is likely
to be a rich and extensible set of links between items. For
example, in the bibliography domain, there will be such linkages as
'cites', 'isCitedBy', 'extends', 'replacesVersion', etc.
- The aggregator tool, or the HTML presentation, will need to be
extended to present these additional semantic capabilities. In
addition, the capture tool must allow the user to enter such
information as is available. Neither should, as far as possible,
sacrifice the lightweight simplicity that provides one of the key
values of the blogging approach.
- The HTML presentation of the semantic blog will also need to be
extended to metaphors other than the reverse-chronlogical order of
the current calendar-based presentations.
- Given the additional semantic information present in the
network, there are probably better ways of discovering other blogs,
or channels, or even individual items. For example, allowing
distributed queries based on semantic terms, or persistent filters
that notified the user of new matching occurrences, would allow
much more effective discovery.
- The network infrastructure, which for blogging today is centred
around the human reader, could be extended to allow web-services or
autonomous agents to contribute additional source data or semantic
mark-up. Examples include the automatic querying of CiteSeer for
details on a publication, or using a shared reference ontology to
translate between the various source formats used by different
reference formatting tools.
Issues
There are some hard research problems that will need to be
addressed to some degree during the project.
- To commonly refer to a semantic item, it must have a consistent
URI. This is not available for many publications. This naming
problem occurs in other guises in Semantic Web projects, and is not
unique to this project. It is also not the key research issue,
though it will require some investigation. It may be that limiting
the source data in the initial versions of the applications to
those publications that have a well-known URL on CiteSeer or Amazon
may be sufficient. Alternatively the naming by property
pattern approach (as used in TAP or as implemented in the
query-by-example system of ePerson) might be
appropriate.
- Users will want to develop their own ontologies to markup the
content of their publications. However, to have maximum value to
the community, such taxonomies must be widely shared and reused The
problem of merging and maintaining such distributed ontologies is
again not unique to this project. It will require some solution,
else ontological balkanization will result. It is less clear that
an effective work-around exists.
- 29-10-2002
- Revised initial draft to fix about 100 typos, tweak sections 6
and 7, and add appendix C.
- 5-11-2002
- Added Appendix B, generated from RDF-based application
survey.