Dr. John Breslin is a senior semantic web researcher at the Digital Enterprise Research Institute (DERI) in Galway, Ireland. We met at the BlogNation Japan launch party at Web 2.0 Expo Japan, and then again at BlogTalk 2008, a conference he was organising in Cork. I was interested to learn a little more about DERI and how it engages with business, so I emailed him some questions, which he has kindly found time in his mad schedule to answer.
How did DERI come about, and why the focus on semantic technologies?
DERI was established at the National University of Ireland, Galway in late 2003 as part of an initiative by Science Foundation Ireland, an Irish government-funded agency, to establish what are called Centres for Science, Engineering and Technology (CSETs) in the areas of ICT and biotech at various third-level institutions around Ireland. The chosen focus for DERI was the Semantic Web, as there was and is a recognised need for research into how to manage the information explosion on the Web using semantic technologies.
How did the Seoul and Stanford branches get started?
The senior researchers and directors at DERI had long-established ties with Stanford University; DERI director Professor Decker previously worked there for some time. In conjunction with Mike Genesereth and Charles Petrie from the Stanford Logic Group, NUI Galway agreed with Stanford University to establish "DERI Stanford" under whose umbrella formal research collaborations between DERI Galway and Stanford could progress.
As regards Korea, DERI researchers had been working with staff from Seoul National University for a number of years, in particular the Biomedical Knowledge Engineering Laboratory (BiKE) led by Professor Hong-Gee Kim. There has also been a series of researcher exchanges between our two institutions (and DERI Galway also has a PhD student who originated in SNU), so a formal arrangement was confirmed in late 2007 leading to the creation of DERI Seoul.
Any specific reason for Korea in the Asia region?
With rapidly growing communications and scientific infrastructures in Korea, it was recognised that there are a large number of Korean institutions (academic and commercial) who are focusing on the application of semantic technologies to the ICT and biotech domains. As such, it made sense to establish DERI Seoul to liaise with these organisations and to identify common challenges that could be tackled through various research projects (some of these in conjunction with DERI Galway).
I see on your website that your mission includes assisting the commercialisation of semantic technologies through "Business Development Outreach". Can you tell me how you engage with businesses, perhaps with some examples?
As part of DERI's remit from Science Foundation Ireland, we have two "outreach" branches, the first being community and education outreach (working with local groups and schools) and the second being business outreach. Business outreach means that we have staff at DERI who work with various local SMEs and multinationals, describing the research areas that DERI is involved in and investigating whether semantic technologies could be applied to solve research challenges in these organisations.
Business outreach also includes efforts to bring together related companies to solve common research requirements (e.g., the Elite initiative brings together seven or eight companies in the domain of semantics applied to e-learning). DERI has also attracted a number of companies to Galway based on our reputation and our expertise base; some of these have relocated staff and others are establishing new bases near NUI Galway.
In what areas do you see semantic technologies being used commercially, and perhaps for online consumer applications in particular?
I think that we are now beginning to see the real commercial applications of what can be done when all kinds of things on the Web are connected together using semantics. This is evident both in the attention being given to startup companies in this space like Powerset (Powerlabs), Metaweb (Freebase) and Radar Networks (Twine), and in the recent announcements from big companies including Reuters (Calais API), Yahoo! (semantically-enhanced search) and Google (Social Graph API) about what they are doing with semantic data. There has been a lot of talk recently about the social graph (notably from Google's Brad Fitzpatrick), which looks at how people are connected together (friends, colleagues, neighbours, etc.), and how such connections can be leveraged across websites. In the Semantic Web, it is not just people who are connected together in some meaningful way, but documents, events, places, hobbies, pictures, you name it! And it is the commercial applications that exploit these connections that are now becoming interesting.
Radar Networks' Nova Spivack recently gave a keynote talk at BlogTalk 2008 as CEO of one of the companies that is practically applying Semantic Web technologies to social software applications. Radar have a product called Twine, which is a "knowledge networking" application that allows users to share, organise, and find information with people they trust. I find Twine very interesting, and as well as using it to gather information about SIOC (more below), I intend to use it to gather and publish personal interests that I think will be of interest to the public.
Part of your work is with SIOC - a web standards submission for connecting online communities. How is that process going?
SIOC is an initiative that I've been working on for the past four years at DERI (with Uldis Bojars, Stefan Decker, and others) that aims to make semantic data available from online communities and Web 2.0 spaces, and to use and leverage that data in interesting and useful ways. As well as being the Irish word for frost, SIOC stands for "Semantically-Interlinked Online Communities" and the schema or vocabulary of terms that serves as its basis was recently submitted as a Member Submission to the World Wide Web Consortium (W3C). We have about 30 or 40 SIOC applications and modules that use and consume SIOC data already.
The process to achieve traction with SIOC was as follows. Firstly, we created the schema of terms (Site, User, Forum, Post, Container, Item, etc.). Then, we made some SIOC metadata exporters for various open-source discussion systems and popular community sites, in the hope that we could "infect" the Web infrastructure with semantics - during the next upgrade cycle, gigabytes of community data can become available (a case in point will be the forthcoming SIOC producer for Irish message board site boards.ie). To produce this mass of linked data from various online communities, we wanted to allow people to easily integrate SIOC with their open-source applications and services, such that there are now SIOC data producers and wrappers available for a range of systems including b2evolution, Dotclear, Drupal, phpBB, WordPress, mailing lists, IRC, Twitter, Jaiku, and others. The next step was to produce easy-to-use APIs for writing your own SIOC applications. We have APIs already for PHP, Ruby on Rails and Java. As well as academic papers about SIOC, we then provided some easy-to-read documentation and usage examples at our SIOC website (http://sioc-project.org/).
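To make the schema terms above a little more concrete, here is a minimal sketch of what a SIOC data producer does: it takes a post from a discussion system and describes it as RDF (here in Turtle syntax) using the SIOC vocabulary. This is not code from any of the actual exporters John mentions. The `post_to_turtle` function, the dictionary keys and the example.org URLs are all made up for illustration, and the class names follow the terms listed in the answer above, so check the current SIOC specification before relying on any of them.

```python
# Illustrative sketch only: a hand-rolled SIOC exporter for a single post.
# The function name, dict keys and example.org URLs are hypothetical.

SIOC = "http://rdfs.org/sioc/ns#"        # SIOC Core Ontology namespace
DCTERMS = "http://purl.org/dc/terms/"    # Dublin Core terms, used for the title


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def post_to_turtle(post):
    """Describe one post, its author, its forum and its site as SIOC data."""
    title = escape_literal(post["title"])
    body = escape_literal(post["body"])
    lines = [
        f"@prefix sioc: <{SIOC}> .",
        f"@prefix dcterms: <{DCTERMS}> .",
        "",
        # The site hosts the forum, and the forum contains the post.
        f"<{post['site_url']}> a sioc:Site .",
        f"<{post['forum_url']}> a sioc:Forum ;",
        f"    sioc:has_host <{post['site_url']}> .",
        # Class name taken from the interview's term list (an assumption).
        f"<{post['author_url']}> a sioc:User .",
        f"<{post['url']}> a sioc:Post ;",
        f'    dcterms:title "{title}" ;',
        f"    sioc:has_creator <{post['author_url']}> ;",
        f"    sioc:has_container <{post['forum_url']}> ;",
        f'    sioc:content "{body}" .',
    ]
    return "\n".join(lines)


# Hypothetical input, shaped like a row pulled from a forum database.
example_post = {
    "url": "http://example.org/forum/1/post/42",
    "title": "Hello SIOC",
    "body": 'SIOC is also the Irish word for "frost".',
    "author_url": "http://example.org/user/alice",
    "forum_url": "http://example.org/forum/1",
    "site_url": "http://example.org/",
}

if __name__ == "__main__":
    print(post_to_turtle(example_post))
```

A real exporter would more likely build the data with an RDF library or one of the SIOC APIs rather than by string formatting, but the idea is the same: every post, forum and user gets a URL, and the SIOC terms say how they relate to each other.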
Are there some interesting implementations of SIOC? How is uptake of the blog service plugins?
I think that the interesting applications that are appearing now are those Web 2.0 or Semantic Web applications that realise the advantages of producing SIOC data (and other semantic formats, especially FOAF). Companies like Seesmic, Talis, OpenLink Software and Radar Networks have either implemented SIOC support in their commercial applications or will do so shortly.
The WordPress plugin is probably the most popular SIOC data producer, but Giovanni Tummarello and his team will shortly release a very interesting plugin that shows one advantage of producing SIOC data from various sites. This new plugin will allow you to click on an icon beside a blog poster or commenter and view a synopsis of their content and topics created across a range of semantically-enabled websites (as gathered by the semantic indexer Sindice).
Thanks again to John for sharing this with us. For more information, check out John's blog: http://johnbreslin.com/