Social Network Intelligence BenchMark

From W3C Wiki

Social Intelligence Benchmark (SIB) - Version 0.8

Peter Boncz, Minh-Duc Pham CWI

Orri Erling, Ivan Mikhailov, Yrjana Rankka OpenLink Software


Introduction

As RDF has recently become the most popular Semantic Web technology for modeling and integrating large collections of linked open data on the web, many RDF/SPARQL benchmarks have been proposed to evaluate the performance of RDF stores and their efficiency in processing SPARQL. However, current benchmarks represent real RDF datasets poorly and are mostly relational-like [Dua11], so they hardly show the advantages of RDF/SPARQL over relational models in real systems. Moreover, even though an RDF dataset can be viewed as a large graph of RDF triples, to the best of our knowledge no real graph benchmark is available. Therefore, aiming to create test areas where RDF/SPARQL can truly excel, and a benchmark for challenging query processing over real graphs, we propose a novel RDF benchmark, the Social Intelligence BenchMark (SIB).

SIB takes the schema style of popular social networks such as Facebook as the baseline for designing an RDF-friendly, scientific benchmark. Specifically, it simulates the RDF backend of a social network site, in which users and their interactions form a social graph of social activities such as writing posts, posting comments, and creating/managing groups.

In this benchmark, the dataset scales with the number of users. The purely synthetic data is linked with RDF datasets from DBpedia in order to exploit the vast amount of information in the DBpedia knowledge base.

The benchmark specification contains three query mixes: interactive, update, and analysis. These query mixes are expressed in the SPARQL 1.1 Working Draft in order to use its advanced features, such as path expressions over RDF graphs.

Dataset Specification

The logical schema of this benchmark is similar to that of the BotNetBenchmark (BNBM), which can be considered the first version of SNIB, developed by Orri Erling, Ivan Mikhailov, and Yrjana Rankka (Virtuoso team members at OpenLink Software). However, SIB contains richer information for each entity, and more features that are commonly used in today's popular social networks (SNs). To simulate the relations in real SNs, the distribution of the generated data for each relation conforms to the data distributions observed in real SNs. In addition, real association rules are included to carry real data correlations over into the synthetic data.

User profile information is enriched with educational institution and company names. These names are retrieved from DBpedia and randomly assigned to each user. Note that there may be a regional correlation between the location of a user and the locations of her institutions/companies. The user profile also includes the user's gender, birthday, email address, and the source IP address from which the user joined the social network. A user can also share her current relationship status: "Single", "In a relationship", "Engaged", or "Married". For the last three statuses, another user account is associated with the status. A user lists some of her hobbies or interests in her profile, e.g., "like music", "like Hello-Kitty". These interests may play an important role in analyzing a user's characteristics. Currently, the data generator only uses singers (e.g., Britney Spears, Madonna, ...) as user interests. Each singer is also associated with a real DBpedia entity (e.g., <http://dbpedia.org/resource/Britney_Spears>).

Users in the social graph are connected to each other through friendships. The initiator of a friendship request may revoke it at any time. After receiving the request, the other user can decline or approve it. To simulate real behavior, a user may later approve a friendship request she has previously declined; this often happens after she checks and verifies the initiator's information. When the friendship is approved, the two users are connected by a symmetric pair of foaf:knows predicates.

A user uploads many photo albums, each of which contains many photos. For each photo, the user tags some people appearing in it. An uploaded photo also contains the EXIF information such as the location and time at which it was taken.

A user has a forum in which she writes her posts; she is its moderator and, naturally, a subscriber. Her friends reply to a post by writing comments, and show their interest in a post or a comment by clicking the "Like" button (or by re-tweeting the post, as in Twitter). Thus, a post or a comment has a list of users who "like" it.

A group is created by a user for people sharing similar interests or living in the same location. Each group also has many discussions/posts created by its members.

More and more people access the internet and browse their social networks from mobile devices, i.e., smartphones (e.g., in one month about 250 million people browse Facebook from their mobile devices). The brand names of these devices can be recorded/logged in the SN data. This information, combined with other characteristics of each user, can then be exploited to generate a realistically skewed distribution of users across mobile clients.

/* Social networks provide a number of applications for each user. Information about the applications that users are using is recorded for analyzing the activities of each user on SNs. (Not available in SIB) */

The users' IP addresses and other logging information such as date and time are logged whenever they perform any activity in the SN, such as writing posts or uploading photos. There should be a correlation between the IP domains and the geographical area of a user. The user agent used to browse the SN is also recorded, e.g., a user browses the SN from an iPhone.
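A minimal sketch of how a generator could keep a user's IP addresses consistent with her country, assuming per-country CIDR blocks of the kind listed in Dictionary 7. The table `COUNTRY_BLOCKS` and its ranges are hypothetical placeholders (IANA documentation prefixes), not real ipdeny data, and the function names are illustrative only.

```python
import ipaddress
import random

# Hypothetical per-country CIDR blocks (documentation prefixes, NOT real ipdeny data).
COUNTRY_BLOCKS = {
    "Thailand": ["192.0.2.0/24"],
    "Netherlands": ["198.51.100.0/24", "203.0.113.0/25"],
}


def random_ip_for_country(country, rng):
    """Pick a random host address from one of the country's CIDR blocks."""
    network = ipaddress.ip_network(rng.choice(COUNTRY_BLOCKS[country]))
    # Skip the network and broadcast addresses of the block.
    offset = rng.randrange(1, network.num_addresses - 1)
    return str(network.network_address + offset)


def country_of_ip(ip):
    """Reverse lookup: return the country whose blocks contain the address, or None."""
    addr = ipaddress.ip_address(ip)
    for country, blocks in COUNTRY_BLOCKS.items():
        if any(addr in ipaddress.ip_network(block) for block in blocks):
            return country
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    ip = random_ip_for_country("Netherlands", rng)
    print(ip, country_of_ip(ip))
```

Because every generated address is drawn from the user's own country blocks, a reverse lookup always recovers her country, which is the IP/geography correlation the text asks for.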


Data dictionaries

The data generator uses the following dictionaries to generate tags and random text for the contents of posts and comments.

  • Dictionary 1: Labels from DBpedia 3.6 categories, file "skos_categories_en.nt". These labels will be used for tags. (Note that the "categories_label_en.nt" file is not used since the terms in "skos_categories_en.nt" are more clearly categorized)
  • Dictionary 2: Random texts from DBpedia 3.6 extended abstracts, file "long_abstract_en.nt". Each comment or post is a random fragment from an abstract.
  • Dictionary 3: More than 70,000 personal names from DBpedia.
  • Dictionary 4: Location names from DBpedia links to Geonames, file geonames_links.nt.
  • Dictionary 5: 34,000 educational institutions from DBpedia ontology.
  • Dictionary 6: 148,000 organizations from DBpedia ontology.
  • Dictionary 7: All IP ranges for each country from http://www.ipdeny.com/ipblocks/
  • Dictionary 8: List of free email domains from http://www.zemskov.net/free-email-domains.html and best email domains from http://email.about.com
  • Dictionary 9: Popular web browsers such as Internet Explorer, Firefox, Chrome, ... and their popularity information from W3Schools, http://www.w3schools.com/browsers/browsers_stats.asp
  • Dictionary 10: List of stop words from http://www.ranks.nl/resources/stopwords.html
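Dictionary 2 states that each post or comment is a random fragment of an abstract. The sketch below shows one simple way to cut such a fragment with a length drawn from a range (e.g., the 50 to 20000 characters mentioned for post content). It is character-based and does not respect word boundaries; `random_fragment` is a hypothetical helper, not part of the SIB generator.

```python
import random


def random_fragment(abstract, min_len, max_len, rng):
    """Return a contiguous substring of `abstract` whose length lies in [min_len, max_len].

    If the abstract is shorter than min_len, the whole abstract is returned.
    """
    max_len = min(max_len, len(abstract))
    min_len = min(min_len, max_len)
    length = rng.randint(min_len, max_len)
    start = rng.randint(0, len(abstract) - length)
    return abstract[start:start + length]


if __name__ == "__main__":
    rng = random.Random(7)
    text = "Amsterdam is the capital and most populous city of the Netherlands. " * 10
    print(repr(random_fragment(text, 50, 120, rng)))
```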


The following gives the logical schema and example RDF triple instances for the social network benchmark.

Namespaces

@prefix       dc:  <http://purl.org/dc/elements/1.1/>  .
@prefix     sioc:  <http://rdfs.org/sioc/ns#>          .
@prefix    sioct:  <http://rdfs.org/sioc/types#>       .
@prefix     foaf:  <http://xmlns.com/foaf/0.1/>        .
@prefix  dcterms:  <http://purl.org/dc/terms/>         .
@prefix      dbp:  <http://dbpedia.org/ontology/>      .
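@prefix      sib:  <http://www.ins.cwi.nl/sib/vocabulary/> .
@prefix      rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix      xsd:  <http://www.w3.org/2001/XMLSchema#>  .
@prefix  dbpprop:  <http://dbpedia.org/property/>       .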


User, user account

A user has a first and a last name randomly drawn from dictionary 3, a location IRI, and an organization where she works or studies. A user has an account for all social interactions, and the account has forums for writing posts. A user specifies 0-10 things that she likes. (For users' interests, should we favor common interests such as Music, Film, Sport, ...?)



<http://sut/person/0>       a                   foaf:Person                            ;
                            foaf:gender         "male"                                 ;
                            foaf:birthday       "1983-03-23"^^xsd:date                 ;
                            foaf:mbox           "Narong@sibmail.com"                   ;
                            foaf:firstName      "Narong"                               ;
                            foaf:lastName       "Supinyo"                              ;
                            foaf:based_near     <http://dbpedia.org/resource/Thailand> , "Thailand" ;
                            dbpprop:latd        "15.059137592080845"^^xsd:double       ;
                            dbpprop:longd       "101.05913759208084"^^xsd:double       ;
                            foaf:organization   "Assumption College (Thailand)"        ;
                            sib:class_year      "2008"^^xsd:gYear                      ;
                            sib:workAt          "Angel Airlines (Thailand)"            ;
                            sib:workFrom        "2012"^^xsd:gYear                      ;
                            sib:browser         "Firefox"                              ;
                            sioc:ip_address     "101.108.0.4"                          .
<http://sut/user/fbg_liag>  a                   <http://sut/user>                      ,
                                                sioc:User                              ;
                            sioc:account_of     <http://sut/person/0>                  ;
                            sioc:moderator_of   <http://sut/forum/0>                   ,
                                                <http://sut/forum/1>                   ;
                            sioc:subscriber_of  <http://sut/forum/0>                   ,
                                                <http://sut/forum/1>                   ;
                            sib:status          "Married"                              ;
                            sib:in_relationship_with <http://sut/user/3>               ;
                            dc:created          "2000-01-01T00:00:00"^^xsd:dateTime    .
<http://sut/forum/0>        a                   sioc:Forum                             ;
                            dc:created          "2000-01-01T00:00:00"^^xsd:dateTime    .
<http://sut/forum/1>        a                   sioc:Forum                             ;
                            dc:created          "2000-01-01T00:00:00"^^xsd:dateTime    .
<http://sut/user/fbg_liag>  <http://sut/like>   """Britney Spears"""                   .
<http://sut/user/fbg_liag>  <http://sut/like>   """Madonna"""                          .


Groups / Events

A group is created by a user account, which becomes the group moderator and its first member. In this SIB version, each group contains up to 100 members who have common interests or live in the same area. This could be extended by also considering people studying at the same institute or working at the same company, etc. The creator tags the group to indicate the interests of its members and the content of discussions in the group. In SIB, the tags of a group are generated from the moderator's location information and interests. A group membership is an auxiliary subject created to track the date and time when each user joins (or optionally leaves) the group. A group also has forums in which members write posts.
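The group rules above (moderator as first member, at most 100 members, members sharing an interest or the location, tags derived from the moderator's location and interests) can be sketched as follows. The `User` type, the candidate pool, and the random selection are hypothetical illustrations, not the SIB generator's actual data model.

```python
import random
from dataclasses import dataclass, field

MAX_GROUP_MEMBERS = 100  # "each group contains up to 100 group members"


@dataclass
class User:
    uid: str
    location: str
    interests: set = field(default_factory=set)


def make_group(moderator, candidates, rng):
    """Return (members, tags) for a group created by `moderator`.

    The moderator is the first member; other members are drawn from candidates
    who live in the same location or share at least one interest.
    """
    eligible = [
        u for u in candidates
        if u.uid != moderator.uid
        and (u.location == moderator.location or u.interests & moderator.interests)
    ]
    rng.shuffle(eligible)
    members = [moderator] + eligible[:MAX_GROUP_MEMBERS - 1]
    # Tags come from the moderator's location and interests.
    tags = sorted({moderator.location} | moderator.interests)
    return members, tags


if __name__ == "__main__":
    rng = random.Random(3)
    mod = User("fhfc_ldbi", "Amsterdam", {"Madonna"})
    pool = [User("u1", "Amsterdam"), User("u2", "Bangkok", {"Madonna"}), User("u3", "Bangkok")]
    members, tags = make_group(mod, pool, rng)
    print([m.uid for m in members], tags)
```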


<http://sut/group/except%20Sirenia%20have%205263213Group>  a                   <http://sut/group>                                         , 
                                                                               sioc:Usergroup                                               ;
                                                           sioc:name           """except Sirenia have 5263213Group"""                       ;
                                                           sioc:subscriber_of  <http://sut/forum/12664>                                   , 
                                                                               <http://sut/forum/12665>                                     ;
                                                           dc:created          "2004-11-26T09:43:13"^^xsd:dateTime                            .
<http://sut/user/fhfc_ldbi>                                sioc:creator_of     <http://sut/group/except%20Sirenia%20have%205263213Group>    ;
                                                           sioc:moderator_of   <http://sut/forum/12664>, <http://sut/forum/12665>           .
<http://sut/forum/12664>                                   a                   sioc:Forum                                                   ;
                                                           dc:created          "2004-11-26T09:43:13"^^xsd:dateTime                            .
<http://sut/forum/12665>                                   a                   sioc:Forum                                                   ;
                                                           dc:created          "2004-11-26T09:43:13"^^xsd:dateTime                            .
<http://sut/group/except%20Sirenia%20have%205263213Group>  <http://sut/tag>    """Bridges in Italy"""                                         .
...
<http://sut/group/except%20Sirenia%20have%205263213Group>  <http://sut/tag>    """Subdivisions of Dominica"""                                 .
<http://sut/event/Graduation_Party>                        a                   <http://sut/event> , sioc:Event                              ;
                                                           sioc:name           """Graduation Party"""                                       ;
                                                           sioc:subscriber_of  <http://sut/forum/12864>                                   , 
                                                                               <http://sut/forum/12865>                                     ;
                                                           dc:created          "2011-02-22T09:43:13"^^xsd:dateTime                            ;
                                                           dcterms:date        "2011-03-20T00:00:00"^^xsd:dateTime                            .
<http://sut/user/fhfc_ldbi>                                sioc:member_of      <http://sut/event/Graduation_Party>                          ;
                                                           sioc:moderator_of   <http://sut/forum/12864>                                   ,
                                                                               <http://sut/forum/12865>                                       .
<http://sut/forum/12864>                                   a                   sioc:Forum                                                   ;
                                                           dc:created          "2011-02-22T09:43:13"^^xsd:dateTime                            .
<http://sut/forum/12865>                                   a                   sioc:Forum                                                   ;
                                                           dc:created          "2011-02-22T09:43:13"^^xsd:dateTime                            .
<http://sut/event/Graduation_Party>                        <http://sut/tag>    """Graduation"""                                               .


Group Memberships

Users enter and leave groups. For each membership, an auxiliary subject is created that tracks the dates the user has entered (and optionally left) the group.


<http://sut/group/the%20Jimmy%20Rogers%20Jr.2300987Group>  sioc:has_member                    <http://sut/user/ffca_ldde>                                  .
<http://sut/membership/1/1000000000>                       <http://sut/member-of-membership>  <http://sut/user/ffca_ldde>                                ;
                                                           <http://sut/group-of-membership>   <http://sut/group/the%20Jimmy%20Rogers%20Jr.2300987Group>  ;
                                                           <http://sut/added>                 "2000-01-13T20:04:21"^^xsd:dateTime                          .


Users handshaking

Users establish foaf:knows by a handshake: first A requests B, then B approves or declines; A may revoke the request at any time after making it. An approved handshake creates a symmetric pair of foaf:knows predicates. The state of the handshake is described by the properties of a special subject.

If A has requested B, then B will not request A.

Example of a requested and declined contact:

<http://sut/friendship/8>  <http://sut/memb>       <http://sut/user/fbg_liag>          , 
                                                  <http://sut/user/fedd_lhjd>           ;
                          <http://sut/initiator>  <http://sut/user/fbg_liag>            ;
                          <http://sut/requested>  "2000-04-07T05:12:00"^^xsd:dateTime   ;
                          <http://sut/declined>   "2001-02-06T10:17:43"^^xsd:dateTime   .


Example of a requested and approved contact:

<http://sut/friendship/25>    <http://sut/memb>       <http://sut/user/fbg_liag>          , 
                                                     <http://sut/user/fbhh_ldji>           ;
                             <http://sut/initiator>  <http://sut/user/fbg_liag>            ;
                             <http://sut/requested>  "2001-02-19T09:24:23"^^xsd:dateTime   ;
                             <http://sut/approved>   "2004-10-10T04:48:33"^^xsd:dateTime   .
<http://sut/user/fbg_liag>   foaf:knows              <http://sut/user/fbhh_ldji>           .
<http://sut/user/fbhh_ldji>  foaf:knows              <http://sut/user/fbg_liag>            .


Once created, a membership or handshake subject keeps all its triples even if the membership or handshake is later dropped by a user; however, the predicates sioc:has_member and foaf:knows are replaced by sioc:had_member and foaf:known.
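The handshake life cycle described in this section (request, decline, later approval, revocation by the initiator, and dropping an approved friendship) can be modeled as a small state machine. This is a hypothetical in-memory sketch for illustration; the class, method names, and status values are assumptions, and only the foaf:knows/foaf:known replacement is taken from the text.

```python
# Predicates used for active and dropped friendships (from the text).
KNOWS = "foaf:knows"
KNOWN = "foaf:known"


class Handshakes:
    """Hypothetical in-memory model of SIB friendship handshakes."""

    def __init__(self):
        # (initiator, target) -> record with a "status" and event timestamps
        self.records = {}
        # (subject, predicate, object) triples derived from the handshakes
        self.triples = set()

    def _key(self, a, b):
        """Return the record key for the pair (in either direction), or None."""
        for key in ((a, b), (b, a)):
            if key in self.records:
                return key
        return None

    def request(self, initiator, target, when):
        # If A has requested B, then B will not request A (nor A re-request B).
        if self._key(initiator, target) is not None:
            raise ValueError("a handshake between these users already exists")
        self.records[(initiator, target)] = {"status": "requested", "requested": when}

    def decline(self, initiator, target, when):
        rec = self.records[(initiator, target)]
        if rec["status"] != "requested":
            raise ValueError("only a pending request can be declined")
        rec.update(status="declined", declined=when)

    def approve(self, initiator, target, when):
        rec = self.records[(initiator, target)]
        # A previously declined request may still be approved later.
        if rec["status"] not in ("requested", "declined"):
            raise ValueError("request is not open for approval")
        rec.update(status="approved", approved=when)
        self.triples |= {(initiator, KNOWS, target), (target, KNOWS, initiator)}

    def revoke(self, initiator, target, when):
        rec = self.records[(initiator, target)]
        # Assumption: revocation applies to requests that are not yet approved.
        if rec["status"] not in ("requested", "declined"):
            raise ValueError("only an unapproved request can be revoked")
        rec.update(status="revoked", revoked=when)

    def drop(self, a, b):
        key = self._key(a, b)
        if key is None or self.records[key]["status"] != "approved":
            raise ValueError("only an approved friendship can be dropped")
        self.records[key]["status"] = "dropped"
        x, y = key
        # The handshake record keeps its data; foaf:knows is replaced by foaf:known.
        for s, o in ((x, y), (y, x)):
            self.triples.discard((s, KNOWS, o))
            self.triples.add((s, KNOWN, o))
```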


Post

A post is created by a user to open a social discussion. It is contained in a forum of which the user is the moderator. Any user can join the discussion by commenting on any post on which he has write permission. In this benchmark, write permission is granted to all friends of the post's creator, so they can comment on any post of that user. Note that a user can also reply to a specific comment in the discussion; discussions therefore have a tree shape.
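The write-permission rule and the resulting tree-shaped discussion can be sketched as below. The `Node` type and helper functions are hypothetical; letting the post's creator reply in her own discussion is an additional assumption not stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A post or comment in a discussion tree."""
    node_id: str
    author: str
    children: list = field(default_factory=list)


def can_comment(post_author, commenter, friends):
    """Friends of the post's creator may comment (assumption: the creator may too)."""
    return commenter == post_author or commenter in friends.get(post_author, set())


def add_comment(post, parent, comment_id, commenter, friends):
    """Attach a comment under `parent` (the post itself or an earlier comment)."""
    if not can_comment(post.author, commenter, friends):
        raise PermissionError(f"{commenter} may not comment on {post.node_id}")
    child = Node(comment_id, commenter)
    parent.children.append(child)
    return child


def depth(node):
    """Number of levels in the discussion tree rooted at `node`."""
    return 1 + max((depth(c) for c in node.children), default=0)
```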

A post contains some hash tags which provide the main ideas of the post. It also contains a list of users who are interested in it (i.e., people who click on the like button).

The content of each post is generated from the texts in dictionary 2. The tags for each content can be extracted with text annotation tools such as DBpedia Spotlight, or simply generated from the title of the content in dictionary 2.

Since a user can write a post using a user agent, e.g., an iPhone, and browse the social network with different web browsers from a specific IP address, SIB records the user agent, the browser, and the IP address for each post.


<http://sut/thread/0/post/0>   a                  <http://sut/post>                                           , 
                                                  sioc:Post                                                     ;
                               dcterms:title      """... 3 to 83 chars ..."""                                   ;
                               dc:created         "2000-01-01T00:00:00"^^xsd:dateTime                           ;
                               sioc:content       """... 50 to 20000 chars depending on type of forum ..."""      ;
                               <http://sut/agent> "iPhone"                                                        ;
                               sioc:ip_address   "120.92.68.1"                                                   .
<http://sut/forum/2/thread/0>  sioc:container_of  <http://sut/thread/0/post/0>                                    .
<http://sut/user/ffca_ldde>    sioc:creator_of    <http://sut/thread/0/post/0>                                    .
<http://sut/thread/0/post/0>   <http://sut/tag>   """Lost BBC episodes"""                                         .
...
<http://sut/thread/0/post/0>   <http://sut/tag>   """Comics creator BLP pop"""                                    .
<http://sut/thread/0/post/0>   <http://sut/like>  <http://sut/user/ffca_liag>                                     .
...
<http://sut/thread/0/post/0>   <http://sut/like>  <http://sut/user/ffca_loeg>                                     .


Photos and photo Albums

A user shares her photos by uploading a photo album containing a number of photos. Each photo records the date and time when it was taken, the geographical location (i.e., latitude and longitude) where it was taken, the IP address from which it was uploaded, and optionally the user agent used for uploading. The user can indicate the people appearing in her photo by tagging them in the uploaded photo.

<http://sut/photoalbum/1234>          rdf:type            sioct:ImageGallery                    ;
                                      dcterms:title       """ ...3 to 83 characters... """      ;
                                      dc:created          "2011-03-03T00:00:00"^^xsd:dateTime   .
<http://sut/user/ffca_ldde>           sioc:creator_of     <http://sut/photoalbum/1234>          .
<http://sut/photoalbum/1234/photo/0>  a                   <http://sut/photo>, sioc:Item         ;
                                      dbp:location        <http://sut/location/62452>           ;
                                      dbpprop:latd        "52.0"^^xsd:double                    ;
                                      dbpprop:longd       "19.3"^^xsd:double                    ;
                                      dc:created          "2011-03-03T00:00:00"^^xsd:dateTime   .
<http://sut/photoalbum/1234>          sioc:container_of   <http://sut/photoalbum/1234/photo/0>  .
<http://sut/photoalbum/1234/photo/0>  <http://sut/agent>  "iPhone"                              .
<http://sut/photoalbum/1234/photo/0>  <http://sut/tag>    <http://sut/user/ffca_ldde>           .
...
<http://sut/photoalbum/1234/photo/0>  <http://sut/tag>    <http://sut/user/ffca_liag>           .


Comments

A comment is written by a user in reply to a post or to a particular comment. A user can write a comment on any post in any forum where he has write permission. In this benchmark, write permission for commenting is granted to all friends of the post's creator, so they can comment on all of her posts. As with posts, a comment also records the user agent, the browser, and the IP address.

<http://sut/post/472278/cmt/1125658>   a                  <http://sut/comment>                   , 
                                                          sioc:Item                              ;
                                       dc:created         "2005-01-02T10:52:53"^^xsd:dateTime    ;
                                       sioc:reply_of      <http://sut/post/472278/cmt/1125657>   ;
                                       sioc:content       """..."""                              .
<http://sut/thread/26273/post/472278>  sioc:container_of  <http://sut/post/472278/cmt/1125658>   .
<http://sut/user/fgfc_lehb>            sioc:creator_of    <http://sut/post/472278/cmt/1125658>   .
<http://sut/post/472278/cmt/1125623>   a                  <http://sut/comment> , sioc:Item       ;
                                       dc:created         "2010-01-02T10:52:53"^^xsd:dateTime    ;
                                       sioc:content       """..."""                              .
<http://sut/photoalbum/1234/photo/0>   sioc:container_of  <http://sut/post/472278/cmt/1125623>   .
<http://sut/user/fgfc_lahb>            sioc:creator_of    <http://sut/post/472278/cmt/1125623>   .
<http://sut/post/472278/cmt/1125658>   <http://sut/like>  <http://sut/user/ffca_liag>            .
...
<http://sut/post/472278/cmt/1125658>   <http://sut/like>  <http://sut/user/ffca_loeg>            .


Simulation

As mentioned above, this benchmark simulates the interaction as well as the data distribution in real SNs.


Social Graph Analysis

SNs can first be viewed simply as a graph, the social graph, in which each node represents a user and an edge between two nodes represents the interaction between the two corresponding users. To generate the social graph, this benchmark considers the following parameters: social degree (i.e., the number of friends per user); clustering coefficient (the connectivity among the immediate neighbors/friends of a user); assortativity coefficient (the tendency of a node to connect to nodes of similar degree); and average path length (the average over all-pairs shortest paths). As social networks are commonly small-world networks, nodes with a lower social degree have a high clustering coefficient (i.e., high levels of local clustering at the edges of the graph), and the average path length between pairs of nodes is small. Since each particular SN may exhibit a different set of degrees/coefficients, the data generator needs to accept a number of parameters in addition to the scale parameter (i.e., the number of initiating users) in order to flexibly represent a specific real SN.
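Two of the graph parameters above can be measured on a generated social graph as sketched here: the local clustering coefficient of a node (the fraction of pairs of its friends who are themselves friends) and the average shortest-path length computed by breadth-first search from every node. The adjacency-set representation is an assumption for illustration; pairs that are not connected are simply left out of the average.

```python
from collections import deque
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of pairs of `node`'s neighbors that are connected to each other."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (BFS from each node)."""
    total = 0
    pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Undirected friendship graph as adjacency sets: a triangle a-b-c plus d attached to c.
    g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(local_clustering(g, "c"), average_path_length(g))
```

In this small example the low-degree nodes a and b have clustering coefficient 1.0, while the higher-degree node c has 1/3, matching the small-world pattern described in the text.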

For details of our social graph generator, see our TPCTC 2012 paper ([Duc12] in References).


User and Number of Posts/Photos

Users tend to write more posts if they have more friends [TWS]. Thus, the distribution of the number of posts per user is similar to the distribution of the number of friends per user. Additionally, users who have just joined the network may write more posts than those who have been members for a long time. Thus, the number of posts/photos per user is a function of both the number of friends and the number of days since she joined. For users who joined more than 30 days ago, we consider the number of posts/photos to depend only on the number of their friends.
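A minimal sketch of such a function, assuming a linear dependence on the number of friends and a boost for new users that fades linearly over the first 30 days. Only the 30-day threshold comes from the text; `POSTS_PER_FRIEND`, `NEW_USER_BOOST`, and the linear shapes are hypothetical choices.

```python
NEW_USER_WINDOW_DAYS = 30  # from the text: after 30 days only the friend count matters
POSTS_PER_FRIEND = 0.5     # hypothetical scaling factor
NEW_USER_BOOST = 2.0       # hypothetical activity multiplier on the day of joining


def expected_posts(num_friends, days_joined):
    """Expected number of posts/photos for a user (hypothetical model)."""
    base = POSTS_PER_FRIEND * num_friends
    if days_joined < NEW_USER_WINDOW_DAYS:
        # Newer users are assumed more active; the boost decays linearly to 1.0.
        remaining = 1.0 - days_joined / NEW_USER_WINDOW_DAYS
        return base * (1.0 + (NEW_USER_BOOST - 1.0) * remaining)
    return base
```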


Posts/Photos and Tags/Comments

Results from [Wil09] show that most active users only receive photo comments from a small segment (< 15%) of their friends, while the majority of users receive comments from ~5% of their friends. 57% of users self-identify with the photo albums they upload by tagging one or more photos.

A friend is considered involved in a post if he/she writes a comment on that post. The cumulative distribution between the number of wall posts and the number of friends involved follows a power-law distribution with alpha = 0.36, a = 0, b = 40 (estimated using the EasyFit tool).

The cumulative distribution between the number of photo comments and the number of friends involved follows a power-law distribution with alpha = 0.30, a = 0, b = 10 (estimated using the EasyFit tool).

We randomly generate 0-10 hash tags for each post, and 0-10 tags of the user's friends for each uploaded photo.
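The fitted distributions above can drive the generator through inverse-transform sampling. This sketch assumes that the "power-law" fitted with EasyFit is its bounded Power Function distribution with shape alpha and bounds a and b, whose CDF is F(x) = ((x - a) / (b - a))^alpha, and reads the sampled value as the number of friends involved in a single post or photo. The helper names and the uniform choice of 0-10 hash tags from a vocabulary are illustrative assumptions.

```python
import random


def sample_power_function(alpha, a, b, rng):
    """Inverse-transform sample from F(x) = ((x - a) / (b - a)) ** alpha on [a, b]."""
    u = rng.random()
    return a + (b - a) * u ** (1.0 / alpha)


def friends_involved_in_post(rng):
    # Parameters reported above for wall posts: alpha = 0.36, a = 0, b = 40.
    return int(sample_power_function(0.36, 0, 40, rng))


def friends_involved_in_photo(rng):
    # Parameters reported above for photo comments: alpha = 0.30, a = 0, b = 10.
    return int(sample_power_function(0.30, 0, 10, rng))


def random_hash_tags(vocabulary, rng):
    """Pick 0-10 distinct hash tags uniformly from a tag vocabulary (list)."""
    return rng.sample(vocabulary, rng.randint(0, min(10, len(vocabulary))))
```

With alpha < 1 most of the probability mass lies near a, so most posts involve only a few friends while a few posts involve many.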


Number of Groups per User

(Currently, the real information for users' groups is not clear. From FB statistics, on average a user is connected to 80 community pages, groups, and events)


User activities

There are many actions a user can take when she logs into an SN. These commonly include updating her profile/status, checking friends' profiles and their updates, uploading/browsing photos, writing a post, browsing her groups/communities, or searching for information. The probabilities with which a user performs these actions are not equal: analyses of SNs show that users spend most of their time on their own profiles, friends' updates, and photos.

Moreover, not all users are equally active, which differs from the assumption made in BotNetBM. For example, users who have just joined (e.g., less than 30 days ago) are more active than those who joined long before.
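A simple way to simulate such non-uniform behavior is a weighted random choice of actions per session, with more actions per session for new users. The action names, their weights, the 1-5 actions per session, and the factor of two for new users are all hypothetical; the text only states that profiles, friends' updates, and photos dominate and that users who joined less than 30 days ago are more active.

```python
import random

# Hypothetical relative weights; profiles, friends' updates and photos dominate.
ACTION_WEIGHTS = {
    "view_own_profile": 25,
    "browse_friend_updates": 30,
    "browse_photos": 20,
    "write_post": 10,
    "browse_groups": 10,
    "search": 5,
}


def actions_per_session(days_joined, rng):
    """Hypothetical: users who joined less than 30 days ago are twice as active."""
    base = rng.randint(1, 5)
    return base * 2 if days_joined < 30 else base


def simulate_session(days_joined, rng):
    """Return the list of actions a user performs in one session."""
    actions = list(ACTION_WEIGHTS)
    weights = list(ACTION_WEIGHTS.values())
    return rng.choices(actions, weights=weights, k=actions_per_session(days_joined, rng))
```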


Measurements

The metrics for the performance of RDF stores over SIB benchmark are:

  • Queries per second
  • Query mixes per hour
  • Total execution time

Note that, in a query mix, each query is executed a particular number of times, which reflects the popularity of that query in a real SN. For example, Q9 in the interactive query mix is much more popular than Q8, so it is executed more times than Q8.
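The three metrics can be derived from one timed run of a query mix in which each query is repeated according to its frequency. In this sketch the query callables and the frequencies are hypothetical stand-ins for SPARQL executions against the system under test, and query mixes per hour is computed from the duration of a single mix run.

```python
import time
from typing import Callable, Dict


def run_query_mix(queries: Dict[str, Callable[[], None]],
                  frequencies: Dict[str, int],
                  clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Run each query `frequencies[name]` times and report SIB-style metrics."""
    start = clock()
    executed = 0
    for name, count in frequencies.items():
        for _ in range(count):
            queries[name]()  # placeholder for executing the SPARQL query
            executed += 1
    elapsed = max(clock() - start, 1e-9)  # guard against a zero-length interval
    return {
        "total_execution_time_s": elapsed,
        "queries_per_second": executed / elapsed,
        "query_mixes_per_hour": 3600.0 / elapsed,
    }


if __name__ == "__main__":
    # Hypothetical frequencies: Q9 is more popular than Q8, so it runs more often.
    noop = lambda: None
    print(run_query_mix({"Q8": noop, "Q9": noop}, {"Q8": 1, "Q9": 9}))
```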


Interactive query mix


1. Find all users whose first names contain a particular string, e.g., "ijk" (regular expression text search)

prefix foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?name ?lastname ?institute  
WHERE {
        ?person foaf:organization ?institute.
        ?person foaf:firstName ?name.
        ?person foaf:lastName ?lastname.
        FILTER regex(?name, "ijk", "i")
}

2. Return the names of people who studied at the same school/organization at the same time (multiple pattern matching). These people are likely classmates/colleagues of the user.


prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>

SELECT DISTINCT ?name1 ?name2 ?institute1
WHERE {
        ?person1 foaf:lastName ?name1.
        ?person1 sib:class_year ?classyear.
        ?person2 foaf:lastName ?name2.
        ?person2 sib:class_year ?classyear.
        ?person1 foaf:organization ?institute1.
        ?person2 foaf:organization ?institute1.
        # Sharing ?classyear and ?institute1 joins on equal values; exclude self-pairs.
        FILTER (?person1 != ?person2)
} LIMIT 100


3. Find people who studied at the same school and are connected to you by a path of friend relationships of arbitrary length (the "Property Path Expression" of SPARQL 1.1; the query below expresses the transitive closure with Virtuoso's TRANSITIVE option)

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>

SELECT DISTINCT ?user1 ?user2 ?institute1 ?dist

WHERE {
        {
                SELECT ?user1 ?user2
                WHERE{
                        ?user1 foaf:knows ?user2
                }
        }
        OPTION ( TRANSITIVE, t_distinct, t_in(?user1), t_out(?user2),
                t_min (0), t_step ('step_no') as ?dist ).

        ?person1 foaf:lastName ?name1.
        FILTER (?person1 = %person%).
        ?person1 foaf:organization ?institute1.
        ?user1 sioc:account_of ?person1.

        ?person2 foaf:lastName ?name2.
        ?person2 foaf:organization ?institute2.
        ?user2 sioc:account_of ?person2.
        FILTER (?institute1 = ?institute2)
}
ORDER BY ?dist

LIMIT 10


4. Find singers who won American Music Awards and are liked by one of your friends. (exploit DBpedia knowledge bases for this award and singers who won it)

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>
prefix dbpprop: <http://dbpedia.org/property/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?singerName
WHERE {
        %user% foaf:knows ?friend.
        ?friend sib:like ?interest.
        ?show dbpprop:showName ?showname.
        ?show dbpprop:presenter ?singer.
        ?singer rdfs:label ?singerName.
        FILTER regex(?showname, "American Music Awards").
        FILTER (str(?singerName) = str(?interest))
}


5. Find all people living in a specific location, e.g., Amsterdam, who can be reached from a user within at most 3 friendship steps. (path expression of bounded length)

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>

SELECT DISTINCT ?user1 ?user2 ?dist

WHERE {
        {
                SELECT ?user1 ?user2
                WHERE{
                        ?user1 foaf:knows ?user2
                }
        }
        OPTION ( TRANSITIVE, t_distinct, t_in(?user1), t_out(?user2),
                t_min (1),t_max(3), t_step ('step_no') as ?dist ).

        ?person1 foaf:lastName ?name1.
        FILTER (?person1 = %person%).
        ?user1 sioc:account_of ?person1.

        ?person2 foaf:lastName ?name2.
        ?user2 sioc:account_of ?person2.
        ?person2 foaf:based_near %location%
}
ORDER BY ?dist

6. Show all of your friends who live in Europe. This requires information from DBpedia, for example, that Amsterdam and London are cities in Europe (regional correlation).

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT DISTINCT ?name2 ?eurocountryName  WHERE {
        %person% foaf:lastName ?name1.
        ?user1 sioc:account_of %person%.
        ?user1 foaf:knows ?user2.
        ?user2 sioc:account_of ?person2.
        ?person2 foaf:lastName ?name2.
        ?person2 foaf:based_near ?location2.
        ?eurocountry rdf:type yago:EuropeanCountries.
        ?eurocountry rdfs:label ?eurocountryName.
        FILTER regex(str(?location2), str(?eurocountryName))
} LIMIT 100

7. Find the top-10 suggested friends for a user: people who are currently not your friends but are friends of many of your friends (get all friends of your friends and order them by the number of people in your friend list who are connected to them).

prefix foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?friendOfFriend (COUNT(*) AS ?count)
WHERE {
        ?user0 foaf:knows ?friend.
        ?friend foaf:knows ?friendOfFriend.
        FILTER NOT EXISTS { ?user0 foaf:knows ?friendOfFriend }.
        FILTER (?user0 != ?friendOfFriend).
        FILTER (?user0 = %user%).
}
GROUP BY ?friendOfFriend
ORDER BY DESC(?count)
LIMIT 10
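The ranking in query 7 is a mutual-friend count. Assuming the foaf:knows triples have been loaded into a Python dict of sets (a hypothetical in-memory stand-in for the store), the same logic can be sketched as follows. Ties are broken by user id only to make the output deterministic; the SPARQL query leaves tie order unspecified.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def suggest_friends(knows: Dict[str, Set[str]], user: str, k: int = 10) -> List[Tuple[str, int]]:
    """Rank non-friends of `user` by how many of the user's friends know them."""
    friends = knows.get(user, set())
    counts: Counter = Counter()
    for friend in friends:
        for candidate in knows.get(friend, set()):
            # Same exclusions as the query: not the user, not already a friend.
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


knows = {"me": {"a", "b"}, "a": {"x", "y", "me"}, "b": {"x", "a"}}
print(suggest_friends(knows, "me"))  # [('x', 2), ('y', 1)]
```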

8. Return all users who have not joined a specific group but have more than 5 friends who joined it.

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>

SELECT ?user0 ?count
WHERE
{
        {
                SELECT ?user0 (COUNT(*) AS ?count)
                WHERE {
                        ?user0 foaf:knows ?friend.
                        ?group sioc:has_member ?friend.
                        FILTER (?group = %group%).
                        FILTER NOT EXISTS { ?group sioc:has_member ?user0 }.
                }
                GROUP BY ?user0
        }
        FILTER (?count > 5)
}
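Query 8 asks for more than 5 friends inside the group, i.e. at least 6. A minimal in-memory sketch of that rule, assuming hypothetical dict/set inputs standing in for the foaf:knows and sioc:has_member triples:

```python
from typing import Dict, List, Set


def users_to_invite(knows: Dict[str, Set[str]], members: Set[str], min_friends: int = 5) -> List[str]:
    """Users outside the group who have more than `min_friends` friends inside it."""
    return sorted(
        user
        for user, friends in knows.items()
        if user not in members and len(friends & members) > min_friends
    )


members = {f"f{i}" for i in range(1, 7)}        # f1 .. f6 joined the group
knows = {
    "u": {f"f{i}" for i in range(1, 7)},        # 6 friends in the group
    "v": {f"f{i}" for i in range(1, 6)},        # only 5 friends in the group
    "f1": {"f2", "f3", "f4", "f5", "f6", "u"},  # already a member
}
print(users_to_invite(knows, members))  # ['u']
```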

9. Show the 10 latest posts/tweets from your friends or their friends (ordered by posting time).


prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>

SELECT ?postcontent ?createDate
WHERE {
        {
                SELECT ?user1 ?friend
                WHERE {
                        { ?user1 foaf:knows ?friend }
                        UNION
                        { ?user1 foaf:knows ?user2.
                          ?user2 foaf:knows ?friend }
                }
        }

        ?user1 sioc:account_of %person%.
        ?friend sioc:moderator_of ?forum.
        ?forum sioc:container_of ?post.
        ?post sioc:content ?postcontent.
        ?post dc:created ?createDate
}
ORDER BY DESC(?createDate)
LIMIT 10

10. Show active posts/tweets: the 10 most recently commented posts/tweets from your friends (ordered by the timestamp of the last comment on each post).

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>

SELECT ?post (MAX(?commentdate) AS ?lastcommentdate)
WHERE {
        ?user1 sioc:account_of %person%.
        ?user1 foaf:knows ?friend.
        ?friend sioc:moderator_of ?forum.
        ?forum sioc:container_of ?post.
        ?post sioc:content ?postcontent.
        ?post sioc:container_of ?postcomment.
        ?postcomment sioc:content ?commentcontent.
        ?postcomment dc:created ?commentdate
}
GROUP BY ?post
ORDER BY DESC(?lastcommentdate)
LIMIT 10
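Ordering by "the last comment on each post" means aggregating per post (a maximum over comment dates) before sorting; sorting the raw comments instead can list the same post several times. A sketch of that per-post aggregation, using hypothetical in-memory (post, comment date) pairs:

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def latest_commented_posts(
    comments: Iterable[Tuple[str, datetime]], k: int = 10
) -> List[Tuple[str, datetime]]:
    """comments: (post_id, comment_date) pairs on posts written by the user's friends."""
    last: Dict[str, datetime] = {}
    for post, created in comments:
        # Keep only the newest comment date per post.
        if post not in last or created > last[post]:
            last[post] = created
    return sorted(last.items(), key=lambda item: item[1], reverse=True)[:k]


comments = [
    ("p1", datetime(2010, 12, 1)),
    ("p2", datetime(2010, 12, 3)),
    ("p1", datetime(2010, 12, 5)),
]
for post, when in latest_commented_posts(comments):
    print(post, when.date())  # p1 2010-12-05, then p2 2010-12-03
```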

11. Return the top-10 most interesting posts from your friends: first order by the number of "likes" on the posts (or, on Twitter, the number of retweets), then by the number of comments.

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>

SELECT ?post ?count (COUNT(?comment) AS ?countcomment)
WHERE
{
        {
                SELECT ?post (COUNT(?userLike) AS ?count)
                WHERE {
                        ?user1 sioc:account_of %person%.
                        ?user1 foaf:knows ?friend.
                        ?friend sioc:creator_of ?post.
                        ?post sioc:content ?postcontent.
                        ?post sib:like ?userLike.
                }
                GROUP BY ?post
        }
        OPTIONAL { ?post sioc:container_of ?comment }
}
GROUP BY ?post ?count
ORDER BY DESC(?count) DESC(?countcomment)
LIMIT 10
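The two-level ordering of query 11 is a lexicographic sort on (likes, comments). Assuming the per-post counts have already been computed (the dict names below are illustrative only), it reduces to a single sort key:

```python
from typing import Dict, List


def most_interesting(likes: Dict[str, int], comments: Dict[str, int], k: int = 10) -> List[str]:
    """Order posts by like count, then comment count, both descending."""
    # Posts without comments count as 0 comments; post id breaks remaining ties.
    return sorted(likes, key=lambda post: (-likes[post], -comments.get(post, 0), post))[:k]


likes = {"a": 3, "b": 3, "c": 5}
comments = {"a": 1, "b": 4}
print(most_interesting(likes, comments))  # ['c', 'b', 'a']
```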

12. Return all posts about an event (e.g., the unrest in Tunisia) from the last 10 days. Use the hashtags if they are available; if a post has no tag, check whether its content contains the terms of the searched event (free-text search and tag checking).

prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?post ?content
WHERE
{
        ?post a sib:Post.
        ?post dc:created ?createdDate.
        FILTER (xsd:dateTime(?createdDate) > "2010-12-01T00:00:00Z"^^xsd:dateTime).
        {
                ?post sioc:content ?content.
                ?post sib:tag ?tag.
                FILTER (?tag = "Tunisia")
        }
        UNION
        {
                ?post sioc:content ?content.
                FILTER (bif:contains(?content, "Tunisia"))
        }
}

13. Find people of the same gender as a user, within an age range and living in the same area, who are not friends of the user, and order them by the number of interests they share with the user.

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?user1 (COUNT(?interest1) AS ?count)
WHERE {
        ?user0 sioc:account_of ?person0.
        ?user1 sioc:account_of ?person1.

        ?user0 sib:like ?interest0.
        ?user1 sib:like ?interest1.
        FILTER (?interest0 = ?interest1).

        ?person0 foaf:based_near ?location0.
        ?person1 foaf:based_near ?location1.
        FILTER (?location0 = ?location1).

        ?person0 foaf:gender ?gender0.
        ?person1 foaf:gender ?gender1.
        FILTER (?gender0 = ?gender1).

        ?person1 foaf:birthday ?birthday.
        FILTER (xsd:date(?birthday) > "1975-01-01"^^xsd:date && xsd:date(?birthday) < "1985-01-01"^^xsd:date).

        FILTER NOT EXISTS { ?user0 foaf:knows ?user1 }.

        FILTER (?user0 != ?user1).
        FILTER (?user0 = %user%).
}
GROUP BY ?user1
ORDER BY DESC(?count)
LIMIT 10

14. Count the currently inactive users: all users who were activated more than 60 days ago and have not posted anything during the last 30 days.

prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT (COUNT(?user) AS ?numInactive)
WHERE
{
        ?user dc:date ?createdDate.
        FILTER (xsd:dateTime(?createdDate) < "2010-09-01T00:00:00Z"^^xsd:dateTime).
        FILTER NOT EXISTS {
                ?user sioc:creator_of ?post.
                ?post dc:created ?postdate.
                FILTER (xsd:dateTime(?postdate) > "2010-10-01T00:00:00Z"^^xsd:dateTime)
        }
}
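Query 14 hard-codes its two cut-off dates. One way to derive them from a reference date is sketched below with a hypothetical `now` of 2010-10-31 (the benchmark does not state which date it assumes); with that date the 60-day and 30-day cut-offs come out as 2010-09-01 and 2010-10-01, the literals used in the query. The input dicts are illustrative stand-ins for dc:date and the users' post dates.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def count_inactive(
    joined: Dict[str, datetime],
    posts: Dict[str, List[datetime]],
    now: datetime,
    min_age_days: int = 60,
    quiet_days: int = 30,
) -> int:
    activated_before = now - timedelta(days=min_age_days)
    quiet_since = now - timedelta(days=quiet_days)
    inactive = 0
    for user, joined_at in joined.items():
        if joined_at >= activated_before:
            continue  # account is younger than min_age_days
        if any(p > quiet_since for p in posts.get(user, [])):
            continue  # posted within the last quiet_days
        inactive += 1
    return inactive


now = datetime(2010, 10, 31)
print(now - timedelta(days=60), now - timedelta(days=30))  # 2010-09-01 00:00:00 2010-10-01 00:00:00
joined = {"a": datetime(2010, 6, 1), "b": datetime(2010, 6, 1), "c": datetime(2010, 10, 20)}
posts = {"a": [datetime(2010, 10, 15)], "b": [datetime(2010, 9, 15)]}
print(count_inactive(joined, posts, now))  # 1 (only "b")
```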

15. Show all photos posted by my friends in which I was tagged.

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>

SELECT DISTINCT ?photo
WHERE {
        ?photo a sib:Photo.
        ?user1 sioc:account_of %person%.
        ?photo sib:tag ?user1.
        ?user1 foaf:knows ?user2.
        ?photoalbum sioc:container_of ?photo.
        ?user2 sioc:creator_of ?photoalbum
}
LIMIT 100

16. Show the list of a user's top-10 close friends. Tip: if two people are tagged in the same photo, they are likely to be close friends or colleagues. Thus, sort your friends by the number of times that you and each friend are both tagged in a photo, then by the number of tags you have put on each friend (a user may not tag herself in the photos she uploads).

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>

SELECT ?user ?friend (COUNT(?photo) AS ?numPhoto)
WHERE
{
        ?user foaf:knows ?friend.
        FILTER (?user = %user%).
        ?photo sib:tag ?user.
        ?photo sib:tag ?friend.
}
GROUP BY ?user ?friend
ORDER BY DESC(?numPhoto)
LIMIT 10
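The first sort key of query 16, the number of photos in which both the user and a friend are tagged, can be sketched over a hypothetical in-memory map from photo to the set of tagged users. The secondary key mentioned in the text is not modelled here, just as the query does not compute it.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def close_friends(
    user: str, friends: Set[str], photo_tags: Dict[str, Set[str]], k: int = 10
) -> List[Tuple[str, int]]:
    """Rank friends by the number of photos in which both they and `user` are tagged."""
    shared: Counter = Counter()
    for tagged in photo_tags.values():
        if user in tagged:
            for friend in tagged & friends:
                shared[friend] += 1
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))[:k]


photo_tags = {"p1": {"me", "a", "b"}, "p2": {"me", "a"}, "p3": {"a", "b"}}
print(close_friends("me", {"a", "b", "c"}, photo_tags))  # [('a', 2), ('b', 1)]
```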

17. Find the top-10 friends or friends of friends who share your interests (based on the similarity between the tags in your posts and the tags in their posts).

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>

SELECT ?friend (COUNT(?friendposttag) AS ?numTag)
WHERE
{
        ?user foaf:knows ?friend.
        FILTER (?user = %user0%).
        ?user sioc:creator_of ?userpost.
        ?userpost a sioc:Post.
        ?userpost sib:tag ?tag.

        ?friend sioc:creator_of ?friendpost.
        ?friendpost a sioc:Post.
        ?friendpost sib:tag ?friendposttag.
        FILTER (str(?friendposttag) = str(?tag))

}
GROUP BY ?friend
ORDER BY DESC(?numTag)
LIMIT 10

18. What are the current hottest events/problems? (Get the hashtags from posts and order them by the number of their appearances in the last 10 days.)

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?tag (COUNT(?post) AS ?numPost)
WHERE
{
        ?post a sioc:Post.
        ?post dc:created ?postdate.
        FILTER (xsd:dateTime(?postdate) > "2010-12-01T00:00:00Z"^^xsd:dateTime).
        ?post sib:tag ?tag

}
GROUP BY ?tag
ORDER BY DESC(?numPost)
LIMIT 10
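Query 18 is a frequency count over a recent time window. A sketch over hypothetical (creation date, tags) pairs, counting each post at most once per tag:

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def hottest_tags(
    posts: Iterable[Tuple[datetime, Iterable[str]]], since: datetime, k: int = 10
) -> List[Tuple[str, int]]:
    counts: Counter = Counter()
    for created, tags in posts:
        if created > since:
            counts.update(set(tags))  # each post counts at most once per tag
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


posts = [
    (datetime(2010, 12, 5), ["Tunisia", "news"]),
    (datetime(2010, 12, 6), ["Tunisia"]),
    (datetime(2010, 11, 1), ["old"]),
]
print(hottest_tags(posts, since=datetime(2010, 12, 1)))  # [('Tunisia', 2), ('news', 1)]
```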

19. Which area is the most active? (Order locations by the total number of posts in the last 5 days.)

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?location (COUNT(?post) AS ?numPost)
WHERE
{
        ?person foaf:based_near ?location.
        ?user sioc:account_of ?person.
        ?user sioc:moderator_of ?forum.
        ?forum sioc:container_of ?post.
        ?post dc:created ?postdate.
        FILTER (xsd:dateTime(?postdate) > %recentDays%)

}
GROUP BY ?location ORDER BY DESC(?numPost)
LIMIT 10

20. Return the top-10 locations with the fastest growth in the number of users (count the number of people who joined during the last 10 days).

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?location (COUNT(?user) AS ?numuser)
WHERE
{
        ?person foaf:based_near ?location.
        ?user sioc:account_of ?person.
        ?user dc:date ?joindate.
        FILTER (xsd:dateTime(?joindate) > %recentdate%^^xsd:dateTime).
}

GROUP BY ?location

ORDER BY DESC(?numuser)

LIMIT 10


Queries 1 to 8 concern profiles and friends, 9 to 14 posts/tweets, 15 to 18 tagging, and 19 to 20 other information.


Update query mix

Basic user update actions can be divided into the following groups.

1. Profile

  • Update profile information


prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>

WITH %GraphURI%
DELETE
        {
                ?person foaf:gender ?gender.
                ?person foaf:firstName ?firstName.
                ?person foaf:lastName ?lastName.
        }
INSERT
        {
                ?person foaf:gender %newgender%.
                ?person foaf:firstName %newFirstName%.
                ?person foaf:lastName %newLastName%
        }
WHERE
        {
                ?person foaf:gender ?gender.
                ?person foaf:firstName ?firstName.
                ?person foaf:lastName ?lastName.
                FILTER (?person = %person%)
        }

(%GraphURI% names the graph in the graph store to be updated. For example, if the RDF datasets are loaded into the graph "http://localhost:8890", %GraphURI% can be <http://localhost:8890>.)

2. Posts/Tweets:

  • Add a post

prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix dcterms: <http://purl.org/dc/terms/>

WITH %GraphURI%
INSERT
{
        %Post%
           a sib:Post , sioc:Post ;
           dcterms:title %postTitle% ;
           dc:created %postCreatedDate% ;
           sioc:content %postContent%.
        ?forum sioc:container_of %Post%.
        ?user sioc:creator_of %Post%.
        %Post% sib:tag %postTag%.
}
WHERE {
        ?user sioc:moderator_of ?forum.
        ?user sioc:subscriber_of ?forum.
        FILTER (?user = %user%)
}
  • Remove a post
  • Add tags for your friends
  • Add/Remove a comment

3. Friends

  • Add a friend
  • Remove a friend

4. Group, Event

  • Join/Leave a group/event
  • Remove a user from a group (only moderator of the group can do this)
  • Add/Delete post in the group/event

5. Photos

  • Add/Delete a photo
  • Add/Remove tags in the photo (not only the uploader can remove a tag from his photo; any tagged user can also remove her own tag from the photo)
  • Add/Remove a comment

In addition, the update query mix for this benchmark contains more complicated update queries.

6. Remove all tags of a user from the photos or posts of her friends


prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix sioc: <http://rdfs.org/sioc/ns#>
prefix sib: <http://www.ins.cwi.nl/sib/vocabulary/>

WITH %GraphURI%
DELETE
{
        ?photo sib:tag ?user1
}
WHERE {
        ?photo a sib:Photo.
        ?user1 sioc:account_of %person%.
        ?photo sib:tag ?user1.
        ?user1 foaf:knows ?user2.
        ?photoalbum sioc:container_of ?photo.
        ?user2 sioc:creator_of ?photoalbum
}

7. Remove all friends of a user who do not have any interaction with her

8. Add the top-10 close friends of a user to all groups that he creates.


Analysis query mix

  1. Where and when to advertise a product, e.g., Hello Kitty?
    • Based on the IP domains and browsing times of Hello Kitty fans, we can determine where and when advertisements for Hello Kitty should be placed. It is also important to attract users who are not yet Hello Kitty fans but whose posts mention Hello Kitty.
  2. Who are the challengers of Hello Kitty?
    • Find the fictional entities (e.g., Totoro) that co-occur with Hello Kitty in users' posts.
    • Additionally, a fictional entity that does not appear in the same posts as Hello Kitty, but whose fans come from the same geographical areas as Hello Kitty fans, can also be considered a challenger or competitor of Hello Kitty.
  3. Who are iPhone users or potential iPhone clients?
    • iPhone or BlackBerry users (or potential users) can be identified from two sources of information. The first is the user agent used for browsing the SN: if a user browses from an iPhone, she is very likely an iPhone user. The second is the tags or words in her posts. This query compares the professions of iPhone users with those of BlackBerry users in order to find out who the current iPhone users are (the audience that advertisements for iPhone accessories should target) and who the potential iPhone users are.
  4. Wildfire
    • Find the first mentions of a concept in the last day such that the concept has not been mentioned before and occurs in more than 10% of the new posts in politics-related groups.
  5. Associated product
    • Which other products do people who consider or mention the iPhone also mention? This information can be used as a recommendation when a user buys an iPhone. For example, people who buy an iPhone also consider buying a bumper.
  6. Product lifetime
    • In this benchmark, the lifetime of a product or concept starts when it is first mentioned in a post. The lifetime ends when there are no more posts (or only a very small number of posts) about the product for 30 days. This can be regarded as the moment when few people still want to discuss the product, so the company should release information about a new version of the product, or about another product, to regain the users' attention.
    • Find the top-10 products with the longest lifetime.
    • When is the right time to release information about the new iPhone version?
  7. Troublemakers and Duplicates
    • This is for finding duplicated identities based on behavior patterns.
    • In this benchmark, a zealot is defined as a person who starts threads that get a large number of fast replies due to inflammatory content but do not last long. This person may start similar threads using different accounts. To find such a troublemaker's other accounts, we first identify one zealot, then look for others with a similar behavior pattern and subject matter. The results are sorted by the overlap with the content of the initial firebrand post, then by the similarity of the comments (replies).
    • Extra points are given for repetitive behavior and even more points for leaving/being banned from the groups where the behavior occurs. Repetitive behavior is posting a second time with the same concepts and getting a similar reaction.
    • Find a troublemaker and his duplicate accounts.
  8. Application accounts
    • Current SNs provide many applications/games to attract users. A user may create several accounts used only for these applications/games; for example, many Facebook users who play Mafia Wars have more than 5 accounts. Such application accounts can be detected from their activities, i.e., accounts whose posts are all about their application.
    • Find all the application accounts of an application.
  9. Expert finding
    • A user is considered an expert in a specialty if she/he has contributed to discussions in that domain for a long time and has many social connections to other people in that domain.
    • There can be several requests for users who are experts in different areas (these queries usually form a small graph of social connections).
    • Find a user who is an expert in both law and medicine.
    • Find a user who is an expert in computer science and has friends who are experts in mathematics and physics.
  10. Who is the new star?
    • A new star is defined as a person whose fan page has the fastest increase in the number of members during the last 30 days.
    • Find the top-5 new stars.
  11. The fastest propagating ideas
    • Which topic has the most users joining it in the last day? A user is considered to have joined a topic if she had not discussed it in the previous 10 days.
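Under the product-lifetime definition in item 6 above, a lifetime can be computed from the dates on which a product is mentioned. The sketch below is one possible reading, using the 30-day silence rule as its only criterion; the function name and inputs are hypothetical, and the "very small number of posts" variant is deliberately left out.

```python
from datetime import date
from typing import List, Optional, Tuple


def product_lifetime(
    mentions: List[date], today: date, quiet_days: int = 30
) -> Tuple[date, Optional[date]]:
    """Return (start, end) of a product's lifetime; end is None if it is still alive.

    Assumption: the lifetime ends at the last mention that is followed by more
    than `quiet_days` days without any mention.
    """
    if not mentions:
        raise ValueError("a product needs at least one mention")
    days = sorted(mentions)
    start = days[0]
    for prev, nxt in zip(days, days[1:]):
        if (nxt - prev).days > quiet_days:
            return start, prev  # discussion died out after `prev`
    if (today - days[-1]).days > quiet_days:
        return start, days[-1]
    return start, None


mentions = [date(2010, 1, 1), date(2010, 1, 20), date(2010, 3, 15)]
print(product_lifetime(mentions, today=date(2010, 4, 1)))
# (datetime.date(2010, 1, 1), datetime.date(2010, 1, 20))
```

To rank the top-10 products by lifetime, one could sort on `(end or today) - start` for each product.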


References

<To be updated soon>