Data
From SIOC Wiki
We provide a collection of data resources for those interested in studying or processing SIOC.
Boards.ie
- For more explanation, see Data/Boards.ie.
The discussions on the Irish web forum site http://boards.ie/ are available here as SIOC data. The top-level site documents link to users (and on to FOAF files) as well as to top-level forums. Forums link to subforums and threads, which in turn link to individual posts. Posts link to each other through replies and quotes. The FOAF files link to each other, describing a social network based on the users' "buddy lists".
In total, the data covers 10 years, comprises around 9 million documents, and takes about 50 gigabytes of disk space, so we slice it into smaller archives. The first slices are all data for the year 1998, plus the site, forum, user and FOAF documents. All slices are in the RDF/XML file format.
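Because the documents are RDF/XML and point at each other by URL, the link structure described above can be followed with nothing more than an XML parser. The following is a minimal sketch using Python's standard library; the sample document, its example.org URLs, and the helper name extract_links are illustrative assumptions and are not taken from the actual dump, although sioc:parent_of and sioc:container_of are properties of the SIOC vocabulary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample document (an assumption, not copied from the dump).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#">
  <sioc:Forum rdf:about="http://example.org/forum/7">
    <sioc:parent_of rdf:resource="http://example.org/forum/8"/>
    <sioc:container_of rdf:resource="http://example.org/thread/1"/>
  </sioc:Forum>
</rdf:RDF>"""


def extract_links(rdf_xml):
    """Return (property, target URL) pairs for every rdf:resource attribute."""
    root = ET.fromstring(rdf_xml)
    links = []
    for elem in root.iter():
        target = elem.get("{%s}resource" % RDF_NS)
        if target is not None:
            # Strip the "{namespace}" prefix that ElementTree puts on tag names.
            prop = elem.tag.split("}")[-1]
            links.append((prop, target))
    return links


for prop, target in extract_links(SAMPLE):
    print(prop, target)
```

A full RDF library would handle more of the RDF/XML syntax (nested node elements, rdf:nodeID and so on); this sketch only collects rdf:resource references, which is enough to walk from a forum to its subforums and threads.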
Downloads
Slice archives by year/size: 1998/2MB 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
Slice archives by type/size: Posts Threads Forums/6MB Site/1MB Users/6MB FOAF/7MB
Services
Services based on this data:
Access over HTTP
If you want to access the files over HTTP (an easy way to make the links work), you can use an HTTP proxy, such as a simple proxy written in Python.
find 1998/ -type f | python http_proxy.py 12345
http_proxy=localhost:12345 wget "http://boards.ie/vbulletin/sioc.php?sioc_type=forum&sioc_id=7"
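For readers who want to see how such a proxy could work, here is a minimal sketch in Python. It is not the http_proxy.py script referred to above. It assumes, purely for illustration, that each archived file's base name is the percent-encoded URL it was fetched from; the real archive layout may differ, in which case build_index would need a different mapping. The names build_index, make_handler and serve are hypothetical.

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote


def build_index(paths):
    """Map original URLs to local file paths.

    ASSUMPTION: the base name of each file is the percent-encoded URL.
    """
    index = {}
    for path in paths:
        path = path.strip()
        if path:
            name = path.rsplit("/", 1)[-1]
            index[unquote(name)] = path
    return index


def make_handler(index):
    class ArchiveHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # A client configured with an HTTP proxy sends the absolute URL
            # in the request line, so self.path is the full URL.
            local = index.get(self.path)
            if local is None:
                self.send_error(404, "Not in archive")
                return
            with open(local, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "application/rdf+xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ArchiveHandler


def serve(port, paths):
    """Serve the indexed files on localhost:port until interrupted."""
    handler = make_handler(build_index(paths))
    HTTPServer(("localhost", port), handler).serve_forever()
```

Saved under a name of your choice, the sketch could be started by calling serve(int(sys.argv[1]), sys.stdin), which mirrors the invocation above: file paths arrive on standard input and the port number is the first argument.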
