W3C

Interview: BBC on Publishing and Linked Data

I chatted recently with Olivier Thereaux, Yves Raimond (senior technologist in R&D), and Silver Oliver (data architect) of the BBC about the Web, publishing, and linked data.

Ian: The BBC is prolific and large. How do you view yourselves?

Silver: The BBC is primarily a broadcasting organization. Content is developed or commissioned within different editorial domains (such as News or Music or Sports) then distributed through diverse channels (TV, Radio, web, apps, etc). This fragmentation exists also on the web, with development of individual sites being largely delegated to dedicated teams.

Ian: How do you move beyond silos?

Yves: We have a lot of data that we are now using to draw connections among various BBC TV and radio programs and entities in other domains, like music or nature. We also expose the corresponding data. For example the programmes site exposes data views giving details about all the music tracks played in a given radio programme, and those details link to (and draw from) artist profiles on the BBC’s music site… which themselves are also available as data views.

Olivier: We also reuse data that’s available on the Web (e.g., from MusicBrainz and Wikipedia). Because the public is curating the information, they can update it more rapidly than we could on our own. In a way, the Web is our Content Management System.

Ian: What are you using to aggregate and expose the data?

Yves: For the programmes and music site we use a relational database internally but then we expose the information in RDF.
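To illustrate the pattern Yves describes, here is a minimal sketch of reading rows from a relational store and emitting them as RDF in N-Triples syntax. It is not the BBC's implementation: the `tracks` table, its columns, the `example.org` namespace, and the `title` and `performedBy` properties are all hypothetical, chosen only to show the shape of the mapping.

```python
import sqlite3

# Hypothetical namespaces for illustration only; these are not BBC URIs.
BASE = "http://example.org/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value: str) -> str:
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(conn):
    """Yield N-Triples lines, two per row of the (hypothetical) tracks table."""
    query = "SELECT id, title, artist_id FROM tracks ORDER BY id"
    for track_id, title, artist_id in conn.execute(query):
        subject = f"<{BASE}track/{track_id}>"
        # A column holding plain text becomes a literal.
        yield f'{subject} <{VOCAB}title> "{escape_literal(title)}" .'
        # A foreign key becomes a link to another resource.
        yield f"{subject} <{VOCAB}performedBy> <{BASE}artist/{artist_id}> ."


# Build a tiny in-memory database standing in for the internal store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, artist_id TEXT)")
conn.execute("INSERT INTO tracks VALUES (1, 'Example \"Song\"', 'a1')")

triples = list(rows_to_ntriples(conn))
for line in triples:
    print(line)
```

The point of the sketch is that foreign keys turn into links between resources, which is what lets a track's data view point at an artist's data view.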

Olivier: And we benefit from the ways that people have innovated around the RDF data we expose. When people play with the interfaces and massage the data, we can build on their experience.

Ian: Why not use RDF internally?

Yves: I think the main reason is that the people who originally built these sites were unaware of RDF, or were concerned about using an unfamiliar technology on such a big project. But we use it with other projects.

Ian: How has your use of data affected reporting?

Silver: In the past our editorial efforts have been captured in whole HTML documents. This causes problems for reuse in new data views and across platforms and applications (including IPTV). The key is in working with existing editorial workflows to capture a subset of machine-readable information. In its simplest form this might be a byline and a small number of tags describing what the story is about.

Ian: How do reporters use the data to make connections between stories?

Silver: Connections have always happened, but the process didn’t scale. Linking between sports and news was a manual process and reliant on a journalist’s knowledge of BBC output. But now we have rich data models behind the scenes. These models help the BBC editorial staff represent their understanding of the world and our audience’s interests, and let us make connections in a scalable fashion.

Olivier: The data is a substrate that pre-populates a lot of the site, and then journalists can focus on the stories and not re-entering the data bits.

Silver: In sport, for example, we pay for the sport data (fixtures, results and statistics) then we write stories about match reports, and tagging ensures that everything gets linked properly. That’s how we built the sites for the 2010 World Cup or the 2012 Olympics.

Ian: Do the reporters add data to the system directly?

Silver: Yes, we ask them to tag the stories they pull together so that we can put those stories into different contexts (or aggregations). We were quite happy to realize the natural curatorial process was already happening; we just needed to give people a way to capture data.
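As a rough sketch of the tagging idea Silver describes, the following shows how a small amount of captured metadata (a byline and a few tags) could be enough to place one story in several aggregations. The `Story` class, its fields, and the sample stories are hypothetical and are not drawn from any BBC system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Story:
    """Hypothetical minimal machine-readable record for a story."""
    headline: str
    byline: str
    tags: frozenset = field(default_factory=frozenset)


def aggregate_by_tag(stories):
    """Build a mapping from each tag to the headlines carrying that tag."""
    index = defaultdict(list)
    for story in stories:
        for tag in sorted(story.tags):
            index[tag].append(story.headline)
    return dict(index)


stories = [
    Story("Match report", "A. Reporter", frozenset({"football", "world-cup"})),
    Story("Stadium funding", "B. Reporter", frozenset({"politics", "football"})),
]

index = aggregate_by_tag(stories)
print(index["football"])
```

Here the second story appears under both "politics" and "football" without anyone linking the two sections by hand.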

Ian: You mentioned buying and using data from various sources, including commercial ones. Do you make use of data provenance information?

Yves: We need to be very transparent about where our data comes from: our reporters, partners, official organisations, and sometimes our audience too.

Olivier: There is an interesting tension between making use of provenance information and ensuring user privacy. These days people expect to receive personalized content. To achieve that we make use of “attention data”: what you watch or like. We have been looking at how to guarantee that we uphold privacy while at the same time asking for the minimal amount of information to tailor the best experience. That’s probably less about “Do Not Track” and closer to the spirit of W3C’s older P3P technology. On the other hand, we want to know whether information is reliable. This is challenging for user-generated content in particular: who is the user? how much do we trust them?

Ian: Do you think making provenance information available to readers can help digital literacy?

Silver: We had an interesting debate internally whether to include links from health stories to the journals that published the original research. Some felt that readers would not be interested in the links or would find the research complex. Others encouraged the links so that the community could respond to our articles with their own interpretations, including challenging the articles from various angles. This, in turn, would generate more discussion and perspectives from a much larger audience.

Ian: How did it turn out?

Silver: In stories about politics we have begun to include links to relevant legislation. And we are exploring how to extend the linking to pull in data from these sources to weave into BBC story-telling. For example, data about committees that commented on bills, which members of parliament commented, and so on. These data models allow us to make more connections among stories, as we discussed earlier.

Ian: This sounds like a linked data project!

Silver: Internally we have wholesale signed up for and understood the value of linked data as a way to manage our organizational complexity. We will draw data from various sources and use RDF to stitch them together. We can make use of the information in ways we could not do before because it was either too costly or unmanageable. Semantic Web technology is now core to our strategy as an enterprise.
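A toy illustration of what "stitching together" with RDF can mean: when two sources use the same identifier for a thing, their statements can simply be combined and then queried across the join. The sketch below models triples as plain Python tuples. The identifiers, property names, and data are hypothetical, and a real system would use an RDF store rather than sets.

```python
# Hypothetical identifier shared by both sources.
ARTIST = "http://example.org/artist/a1"
TRACK = "http://example.org/track/1"

# Statements from an internal source: which artist performed a track.
internal = {
    (TRACK, "performedBy", ARTIST),
}

# Statements from an external source: facts about the same artist.
external = {
    (ARTIST, "name", "Example Artist"),
    (ARTIST, "homepage", "http://example.org/home"),
}


def merge(*graphs):
    """Merging graphs is a set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def artist_names_for_track(graph, track):
    """Follow performedBy from a track, then read each artist's name."""
    artists = {o for s, p, o in graph if s == track and p == "performedBy"}
    return sorted(o for s, p, o in graph if s in artists and p == "name")


combined = merge(internal, external)
print(artist_names_for_track(combined, TRACK))
```

Neither source alone can answer "what is the name of the artist on this track"; the merged set can, because both sides agree on the artist's identifier.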

Ian: Have you measured cost savings by using Semantic Web technology?

Silver: It’s still too early to say. There were costs associated with our initial projects, since we needed to acquire expertise. But we have since been able to roll out highly trafficked BBC content using Semantic Web technology.

Ian: Thank you all so much for your time!