Nifty Develops Spam Blog Filter - Finds 40% of Domestic Blogs to be "Spam Blogs"

Nifty Labs, the marketing research group within Nifty Corporation, has developed filtering technology to automate the detection of so-called "Spam Blogs" - blogs whose sole purpose is to artificially inflate traffic and affiliate commissions.  Japan has been renowned for the number of such sites, and it is certainly an issue when trying to gauge the true blogging population and activity level here.

The survey combined several filtering techniques, and data came from a 100,000 article sample from 5 months of Nifty's Buzz Pulse Blog analysis service.  Nifty says Buzz Pulse indexes 90% of Japan's blogs, including over 450 million articles as of March 2008.  The average level of Spam Blogging was about 40% ...

2007-10 39.3%
2007-11 40.1%
2007-12 39.7%
2008-01 39.9%
2008-02 40.5%
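
As a quick check on the "about 40%" figure, the five monthly rates above can be averaged directly.  This is a minimal sketch that uses only the numbers in the table; the variable names are my own, and it is a simple unweighted mean, since the per-month sample sizes are not given.

```python
from statistics import mean

# Monthly spam-blog rates (%) exactly as listed in the table above.
monthly_spam_rate = {
    "2007-10": 39.3,
    "2007-11": 40.1,
    "2007-12": 39.7,
    "2008-01": 39.9,
    "2008-02": 40.5,
}

# Unweighted mean across the five months (assumption: equal weight per month).
average = mean(monthly_spam_rate.values())

# Gap between the highest and lowest month, in percentage points.
spread = max(monthly_spam_rate.values()) - min(monthly_spam_rate.values())

print(f"Average: {average:.1f}%")      # Average: 39.9%
print(f"Spread: {spread:.1f} points")  # Spread: 1.2 points
```

The rate barely moves from month to month, which fits the "about 40%" summary.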

 

Nifty plans to make this information available on their BuzzSeeQer site - the online service for BuzzPulse.

 

Press Release: http://www.nifty.co.jp/cs/07shimo/detail/080326003337/1.htm (Japanese)

Original Blog: http://bb.watch.impress.co.jp/cda/news/21375.html (Japanese)

 

Interview with Dr. John Breslin - DERI, Semantic web, industry outreach ...

Dr. John Breslin is a senior semantic web researcher working at the Digital Enterprise Research Institute in Galway, Ireland.  We met at the BlogNation Japan launch party at Web2.0 Expo Japan, and then again at BlogTalk2008 - a conference he was organising in Cork.  I was interested to learn a little more about DERI, and how it engages with business, so I fired him some questions by email, which he has kindly found time in his mad schedule to answer.


How did DERI come about, and why the focus on semantic technologies?

DERI was established at the National University of Ireland, Galway in late 2003 as part of an initiative by Science Foundation Ireland, an Irish government-funded agency, to establish what are called Centres for Science, Engineering and Technology (CSETs) in the areas of ICT and biotech at various third-level institutions around Ireland.  The chosen focus for DERI was the Semantic Web, as there was and is a recognised need for research into how to manage the information explosion on the Web using semantic technologies.

How did the Seoul and Stanford branches get started?

The senior researchers and directors at DERI had long-established ties with Stanford University; DERI director Professor Decker previously worked there for some time.  In conjunction with Mike Genesereth and Charles Petrie from the Stanford Logic Group, NUI Galway agreed with Stanford University to establish "DERI Stanford" under whose umbrella formal research collaborations between DERI Galway and Stanford could progress.

As regards Korea, DERI researchers had been working with staff from Seoul National University for a number of years, in particular the Biomedical Knowledge Engineering Laboratory (BiKE) led by Professor Hong-Gee Kim.  There has also been a series of researcher exchanges between our two institutions (and DERI Galway also has a PhD student who originated in SNU), so a formal arrangement was confirmed in late 2007 leading to the creation of DERI Seoul.

Any specific reason for Korea in the Asia region?

With rapidly growing communications and scientific infrastructures in Korea, it was recognised that there are a large number of Korean institutions (academic and commercial) who are focusing on the application of semantic technologies to the ICT and biotech domains.  As such, it made sense to establish DERI Seoul to liaise with these organisations and to identify common challenges that could be tackled through various research projects (some of these in conjunction with DERI Galway).

I see on your website that your mission includes assisting the commercialisation of semantic technologies through "Business Development Outreach".  Can you tell me how you engage with businesses, perhaps with some examples?

As part of DERI's remit from Science Foundation Ireland, we have two "outreach" branches, the first being community and education outreach (working with local groups and schools) and the second is business outreach.  Business outreach means that we have staff at DERI who work with various local SMEs and multinationals, describing the research areas that DERI is involved in and investigating if there is a potential need for the application of semantic technologies to solve research challenges in these organisations.

Business outreach also includes efforts to bring together related companies to solve common research requirements (e.g., the Elite initiative brings together seven or eight companies in the domain of semantics applied to e-learning).  DERI has also attracted a number of companies to Galway based on our reputation and our expertise base; some of these have relocated staff and others are establishing new bases near NUI Galway.

In what areas do you see semantic technologies being used commercially, and perhaps for online consumer applications in particular?

I think that we are now beginning to see the real commercial applications of what can be done when all kinds of things on the Web are connected together using semantics.  This is obvious in the attention being given to startup companies in this space like Powerlabs (Powerset), Metaweb (Freebase) and Radar Networks (Twine), and also since many big companies including Reuters (Calais API), Yahoo! (semantically-enhanced search) and Google (Social Graph API) have recently announced what they are doing with semantic data.  There has been a lot of talk recently about the social graph (notably from Google's Brad Fitzpatrick), which looks at how people are connected together (friends, colleagues, neighbours, etc.), and how such connections can be leveraged across websites.  In the Semantic Web, it is not just people who are connected together in some meaningful way, but documents, events, places, hobbies, pictures, you name it!  And it is the commercial applications that exploit these connections that are now becoming interesting.

Radar Networks' Nova Spivack recently gave a keynote talk at BlogTalk 2008 as CEO of one of the companies that is practically applying Semantic Web technologies to social software applications.  Radar have a product called Twine, which is a "knowledge networking" application that allows users to share, organise, and find information with people they trust.  I find Twine very interesting, and as well as using it to gather information about SIOC (more below), I intend to use it to gather and publish personal interests that I think will be of interest to the public.

Part of your work is with SIOC - a web standards submission for connecting online communities.  How is that process going?

SIOC is an initiative that I've been working on for the past four years at DERI (with Uldis Bojars, Stefan Decker, and others) that aims to make semantic data available from online communities and Web 2.0 spaces, and to use and leverage that data in interesting and useful ways.  As well as being the Irish word for frost, SIOC stands for "Semantically-Interlinked Online Communities" and the schema or vocabulary of terms that serves as its basis was recently submitted as a Member Submission to the World Wide Web Consortium (W3C).  We have about 30 or 40 SIOC applications and modules that use and consume SIOC data already.

The process to achieve traction with SIOC was as follows.  Firstly, we created the schema of terms (Site, User, Forum, Post, Container, Item, etc.).  Then, we made some SIOC metadata exporters for various open-source discussion systems and popular community sites, in the hope that we could "infect" the Web infrastructure with semantics - during the next upgrade cycle, gigabytes of community data can become available (a case in point will be the forthcoming SIOC producer for Irish message board site boards.ie).  To produce this mass of linked data from various online communities, we wanted to allow people to easily integrate SIOC with their open-source applications and services, such that there are now SIOC data producers and wrappers available for a range of systems including b2evolution, Dotclear, Drupal, phpBB, WordPress, mailing lists, IRC, Twitter, Jaiku, and others.  The next step was to produce easy-to-use APIs for writing your own SIOC applications.  We have APIs already for PHP, Ruby on Rails and Java.  As well as academic papers about SIOC, we then provided some easy-to-read documentation and usage examples at our SIOC website (http://sioc-project.org/).
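
To give a feel for what a SIOC data producer emits, here is a minimal sketch in Python that writes one post as RDF in Turtle syntax.  It is not the output of any of the exporters mentioned above: the function name and the example.org URLs are made up for illustration, and only the SIOC namespace, the class names (Site, Forum, User, Post) and the properties has_host, has_container, has_creator and content come from the SIOC vocabulary.  Later revisions of the vocabulary may name the user class differently, so treat sioc:User as an assumption tied to the schema terms listed above.

```python
SIOC_NS = "http://rdfs.org/sioc/ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def post_to_turtle(site, forum, user, post, title, content):
    """Describe one post with SIOC terms (hypothetical helper, not a SIOC API)."""
    lines = [
        f"@prefix sioc: <{SIOC_NS}> .",
        f"@prefix dcterms: <{DCTERMS_NS}> .",
        "",
        f"<{site}> a sioc:Site .",
        f"<{forum}> a sioc:Forum ;",
        f"    sioc:has_host <{site}> .",
        f"<{user}> a sioc:User .",
        f"<{post}> a sioc:Post ;",
        f"    sioc:has_container <{forum}> ;",
        f"    sioc:has_creator <{user}> ;",
        f"    dcterms:title {turtle_literal(title)} ;",
        f"    sioc:content {turtle_literal(content)} .",
    ]
    return "\n".join(lines)


# Example with made-up URLs: a single blog post in a single forum.
turtle = post_to_turtle(
    site="http://example.org/",
    forum="http://example.org/blog",
    user="http://example.org/user/alice",
    post="http://example.org/blog/2008/03/hello",
    title="Hello SIOC",
    content='A first post, with a "quoted" word.',
)
print(turtle)
```

Because every exporter describes its posts with the same handful of terms, an aggregator can merge data from a WordPress blog and a phpBB board without knowing anything about either system's database.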

Are there some interesting implementations of SIOC?  How is uptake of the blog service plugins?

I think that the interesting applications that are appearing now are those Web 2.0 or Semantic Web applications that realise the advantages of producing SIOC data (and other semantic formats, especially FOAF).  Companies like Seesmic, Talis, OpenLink Software and Radar Networks have either implemented SIOC support in their commercial applications or will do so shortly.

The WordPress plugin is probably the most popular SIOC data producer, but Giovanni Tummarello and his team will shortly release a very interesting plugin that shows one advantage of producing SIOC data from various sites.  This new plugin will allow you to click on an icon beside a blog poster or commenter and view a synopsis of their content and topics created across a range of semantically-enabled websites (as gathered by the semantic indexer Sindice).

 

Thanks again to John for sharing this with us.  For more information, check out John's Blog: http://johnbreslin.com/

BlogTalk2008 - Michael Breidenbrücker - Let's face it: Web 2.0 is all about advertising...

Keynote talk: Michael Breidenbrücker (Lovely Systems, Last.fm) (Introduction by Thomas Burg)
Let's face it: Web 2.0 is all about advertising

 

 

BlogTalk2008 - Joe Lamantia - The DIY future: what happens when everyone designs social media?

WebCamp on SNP 2008 Cork - Ajit Jaokar - Social Privacy and Revocation

Ajit Jaokar - Privacy and revocation, two sides of the same coin: using Google OpenSocial APIs to illustrate a new privacy model for the social web

Ajit Jaokar was top batter for the WebCamp on Social Network Portability.  Ajit is on the Web2.0 workgroup, is deep in mobile, and advises various startups.  This talk was about a concept concerning socially driven reputation models as a mechanism for software certification.

ZDNet Japan Builder Techday: Open API & Beyond - 5) Building an Application "Nowadays"

Building an "imadoki", or "nowadays", application was the topic of the panel discussion.

On stage were Tatsuhiko Miyagawa, software engineer from Six Apart, and Yukihisa Yonemochi, Technology Evangelist from IBM.

While the contrasting views on Openness and Open Source didn't lead to any passionate disagreement, it was interesting to see the two companies' differing approaches.  Miyagawa-san is a big contributor to Open Source Perl projects, and while IBM also invest in open source projects, they have a corporate manual for engaging with OSS, and the projects they run themselves are subject to different licences. 

The one project that has been raising its head recently is Project Zero.  Project Zero is a Java-based engine for creating RESTful web applications that can be programmed against in either Java or PHP, and is designed to give PHP and Java folks the simplicity and productivity of a Rails-like experience.  The project is not open source, though, and falls under IBM's Community Driven Commercial Software Development (CDCD) model.  This means that the code is open for anyone to inspect and suggest corrections or improvements for, but IBM are the only committers, so they retain control over exactly what makes it into the final build.

This approach allows them to provide a high level of warranty for their products, but I feel that the whole idea of community-driven development is that, in a way, the strength of the community provides the warranty, and in a more responsive way than having a central control bottleneck.  If I have a specific issue with the code, then I can suggest that IBM fix it, but it still remains at their discretion whether to fix it.  Similarly, say that the IBM powers that be decide that they no longer consider the product "strategic" and decide to orphan it.  How does that work if the community needs to continue using and supporting it?

Next, Yonemochi-san presented some slides showing how IBM planned to support what he described as an explosion in the growth of applications.

The Geek Spectrum

The US research presented describes what I call the "Geek Spectrum" of application developers, from hard-core J2EE coders, through scripting-language-based Web Developers, to business users who have traditionally put together macros, but can now create mashups and hook into other services by assembling apps from building blocks.

To enable each of these segments of the spectrum, IBM has various tools, and the aforementioned Project Zero is targeted at the upper end of the Web Developer layer.

Finally they both talked about Open Source, and there were questions about how it worked in Japan versus in the West.  Miyagawa-san said that feedback for open source projects comes much quicker overseas, probably because people are more willing to speak their minds.  Yonemochi-san raised some embarrassed chuckles when he said that Japan had been compared to a black hole for Open Source ... everything gets sucked in, but nothing ever comes out again.  Harsh!  He also said that due to the difference in geographical size, some companies in the US retain onsite engineers, whereas in Japan they might not, simply from the practical standpoint of physical response time.  Most places in Japan are reachable within a few hours, but that is not so in the US.


Web Camp on Social Network Portability

Will blog about this later, but now streaming to Rob-TV:

Web+DB Press video panel series on REST

The Browser Coming to a Media Player Near You

I don't know what the chances are we'll see them on these shores, but the new versions of Archos' wifi-enabled media players have version 9 of Opera's web browser pre-installed, and the ability to download all standard media types plus Flash.  The platform also supports Opera Widgets.

This product certainly gives the iPod Touch a run for its money, with twice the storage, and about a USD70 lower price tag.  This kind of device-based web browsing experience will become more common, and when it gets an always-on, everywhere wireless connection, it could become the main interface to the 'net for a segment of the population.

Opera is available as a free download for some platforms, but the specialised mobile versions cost money ... there is a $30 activation fee to get it going on the Archos. 

I vaguely remember mailing the company listed as Archos' distributor in Japan about a unit I was interested in about three years ago, but never heard back from them.  Might be a nice time to raise awareness here.  (They might also like to fix the product links on their website!)

 

Original Article: Opera 9 Browser To Power Archos Media Players

Web2expo: How mixi conquered the Japanese SNS market.... mixi and OpenSocial

IT Media ran an article about mixi president Kasahara Kenji's presentation at Web2.0 Expo Tokyo last week.

mixi launched in February 2004, and overtook competitor GREE as Japan's top SNS within its first year of operations.  Now the service has 12 million users, and a very solid 60% of these have logged in in the last 3 days.

