Data/Boards.ie/Structure

From SIOC Wiki

Revision as of 21:01, 2 September 2008 by Thosch


What the data looks like

Top-down links between the documents

In order to "navigate" the complete content you have to start at the top-level sioc:Site document. It links to every sioc:Forum and sioc:User. Note that the Site document is paged: page 1 points to the first 20 forums, the first 20 users and to page 2, which in turn lists forums and users 21-40, and so on.
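The paging rule amounts to simple arithmetic. The sketch below assumes a fixed page size of 20 items per type and 1-based numbering of both pages and items; `item_range` and `page_for_item` are hypothetical helper names, not part of the exporter.

```python
# Assumed paging scheme: 20 forums and 20 users per page of the site document.
PAGE_SIZE = 20

def item_range(page):
    """Return the 1-based (first, last) item numbers listed on `page` (hypothetical helper)."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    first = (page - 1) * PAGE_SIZE + 1
    return first, first + PAGE_SIZE - 1

def page_for_item(index):
    """Return the page that lists the 1-based item `index` (hypothetical helper)."""
    if index < 1:
        raise ValueError("items are numbered from 1")
    return (index - 1) // PAGE_SIZE + 1

print(item_range(2))      # (21, 40)
print(page_for_item(41))  # 3
```

The last page may of course list fewer than 20 items.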

A sioc:User document contains a link to a FOAF file with information about the foaf:Person that owns the user account.

sioc:Forums link to the sioc:Threads they contain, which in turn link to each individual sioc:Post published in these threads.
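Because the links to other RDF/XML documents are given as rdfs:seeAlso properties, a top-down walk can simply follow them. The sketch below is a breadth-first traversal under that assumption; `crawl` is a hypothetical helper, and `fetch` is a function you supply that returns the XML for a URL, so the traversal can be tried without network access. It uses Python's standard `xml.etree.ElementTree`, which reads RDF/XML as plain XML rather than as an RDF graph.

```python
import xml.etree.ElementTree as ET
from collections import deque

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def crawl(start_url, fetch, limit=1000):
    """Breadth-first walk over RDF/XML documents linked via rdfs:seeAlso.

    `fetch` (caller-supplied, hypothetical) maps a URL to the document's XML
    as str or bytes. `limit` caps how many documents are visited.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        root = ET.fromstring(fetch(url))
        # Each rdfs:seeAlso target is treated as another document to visit.
        for el in root.iter(f"{{{RDFS}}}seeAlso"):
            target = el.get(f"{{{RDF}}}resource")
            if target and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Against the live site, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`, since `ET.fromstring` also accepts bytes; keep `limit` small, as the full data set spans millions of documents.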

Structure of content

File format

All files use RDF/XML data format. It is a representation of RDF graph data in XML syntax.

Participants can choose any tools for processing this information; widely available RDF tools exist for most programming languages.

URIs

URIs are used to identify resources. We differentiate between:

  1. URIs for HTML pages
  2. URIs pointing to their corresponding RDF/XML page (using rdfs:seeAlso)
  3. URIs for SIOC concepts

1. This is the usual URL under which the vBulletin forum, thread, post and user profile pages can be retrieved, e.g.:

http://boards.ie/vbulletin/forumdisplay.php?f=1
http://boards.ie/vbulletin/showpost.php?p=54636504
http://boards.ie/vbulletin/member.php?u=31714

When there are <sioc:link> and <sioc:links_to> properties in the data, they always point to HTML pages.

2. The URIs for the RDF/XML documents are constructed by appending sioc.php? followed by the type of the SIOC concept (site, forum, thread, post or user), the vBulletin ID of the object and, optionally, a paging parameter. For example, to get the second page of the RDF data for thread #190769 we would append the following to the base URL http://boards.ie/vbulletin/:

sioc.php?sioc_type=thread&sioc_id=190769&page=2

The URIs of the FOAF files are structured a little differently: they simply use foaf.php?u=userid. This gives URIs like the following for the first pages of the RDF/XML documents:

http://boards.ie/vbulletin/sioc.php?sioc_type=site  (no id appended as there is only one site)
http://boards.ie/vbulletin/sioc.php?sioc_type=forum&sioc_id=1
http://boards.ie/vbulletin/sioc.php?sioc_type=thread&sioc_id=1
http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=1
http://boards.ie/vbulletin/sioc.php?sioc_type=user&sioc_id=1
http://boards.ie/vbulletin/foaf.php?u=1

The <rdfs:seeAlso> property is used to point to URIs like these.
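The URI scheme above can be captured in two small builder functions. `sioc_rdf_url` and `foaf_url` are hypothetical names; the sketch assumes the first page is requested without a `page` parameter, as in the list above, and uses `urllib.parse.urlencode` to join the query parameters.

```python
from urllib.parse import urlencode

BASE = "http://boards.ie/vbulletin/"
SIOC_TYPES = {"site", "forum", "thread", "post", "user"}

def sioc_rdf_url(sioc_type, sioc_id=None, page=None):
    """Build the sioc.php URL of one page of a concept's RDF/XML (hypothetical helper)."""
    if sioc_type not in SIOC_TYPES:
        raise ValueError(f"unknown sioc_type: {sioc_type!r}")
    params = {"sioc_type": sioc_type}
    if sioc_id is not None:  # the site itself has no id
        params["sioc_id"] = sioc_id
    if page is not None:     # assumed to be omitted for the first page
        params["page"] = page
    return BASE + "sioc.php?" + urlencode(params)

def foaf_url(user_id):
    """Build the URL of a user's FOAF file (hypothetical helper)."""
    return BASE + "foaf.php?" + urlencode({"u": user_id})

print(sioc_rdf_url("thread", 190769, page=2))
# http://boards.ie/vbulletin/sioc.php?sioc_type=thread&sioc_id=190769&page=2
```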

3. The unique identifiers of the SIOC concepts sioc:Site, sioc:Forum, sioc:Thread and sioc:Post are the same as the URLs of the HTML pages that display them. sioc:User and foaf:Person are exceptions, as they are not considered information resources. Their URIs are constructed by appending #user to the URL of the HTML user profile page, or #person to the URL of the RDF/XML FOAF file:

http://boards.ie/vbulletin/foaf.php?u=3289#person
http://boards.ie/vbulletin/member.php?u=3289#user
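A minimal sketch of the fragment convention for accounts and persons, assuming numeric vBulletin user IDs; the helper names are hypothetical. Removing the fragment with `urllib.parse.urldefrag` gives back the document URL: the FOAF file for a person, but the HTML profile page (not the RDF/XML) for a user.

```python
from urllib.parse import urldefrag

BASE = "http://boards.ie/vbulletin/"

def user_uri(user_id):
    """URI of the sioc:User account (hypothetical helper)."""
    return f"{BASE}member.php?u={user_id}#user"

def person_uri(user_id):
    """URI of the foaf:Person owning the account (hypothetical helper)."""
    return f"{BASE}foaf.php?u={user_id}#person"

def document_of(concept_uri):
    """Drop the fragment to get the URL of the page describing the concept."""
    return urldefrag(concept_uri).url

print(person_uri(3289))               # http://boards.ie/vbulletin/foaf.php?u=3289#person
print(document_of(person_uri(3289)))  # http://boards.ie/vbulletin/foaf.php?u=3289
```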

foaf:Document

Each page is a foaf:Document. Its "primary topic" is always a SIOC concept (site, forum, etc.) which is also described in further detail in the same document.

If we take a look at the top-level document http://boards.ie/vbulletin/sioc.php?sioc_type=site we see a section <foaf:Document> as shown in the example below. Besides information about the title and description of this document, it tells us that it has the <foaf:primaryTopic> http://boards.ie/vbulletin/, which is the URI for the sioc:Site itself.

<foaf:Document rdf:about="">
	<dc:title>SIOC profile for "boards.ie"</dc:title>
	<dc:description>A SIOC profile describes the structure and contents of a community site (e.g., weblog) in a machine processable form. For more information refer to the <a href="http://rdfs.org/sioc">SIOC project page</a></dc:description>
	<foaf:primaryTopic rdf:resource="http://boards.ie/vbulletin/"/>
	<admin:generatorAgent rdf:resource="http://wiki.sioc-project.org/index.php/PHPExportAPI?version=1.01"/>
	<admin:generatorAgent rdf:resource="http://sw.deri.org/svn/sw/2005/08/sioc/vbulletin/"/>
</foaf:Document>
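As a first parsing example, the sketch below reads the primary topic of a document with Python's standard `xml.etree.ElementTree`. It assumes the standard RDF and FOAF namespace URIs and that foaf:Document is a direct child of the rdf:RDF root element; `primary_topic` is a hypothetical helper. A plain XML parser does not understand RDF semantics, so a dedicated RDF library is the more robust choice for real processing.

```python
import xml.etree.ElementTree as ET

# Assumed standard namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
RESOURCE = f"{{{NS['rdf']}}}resource"

def primary_topic(xml_text):
    """Return the foaf:primaryTopic target of the document, or None (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # the root element is rdf:RDF
    topic = root.find("foaf:Document/foaf:primaryTopic", NS)
    return topic.get(RESOURCE) if topic is not None else None

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Document rdf:about="">
    <foaf:primaryTopic rdf:resource="http://boards.ie/vbulletin/"/>
  </foaf:Document>
</rdf:RDF>"""

print(primary_topic(SAMPLE))  # http://boards.ie/vbulletin/
```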


sioc:Site

In continuation of the example from above, we now look at the section of the same document that actually describes the sioc:Site.

It links to each (public) forum of the site using <sioc:host_of>; the example below shows that the site has a forum with the URI http://boards.ie/vbulletin/forumdisplay.php?f=1. The site also has a usergroup that lists every registered user via <sioc:has_member>.

<sioc:Site rdf:about="http://boards.ie/vbulletin/">
	<sioc:host_of rdf:resource="http://boards.ie/vbulletin/forumdisplay.php?f=1"/>
	<sioc:host_of rdf:resource="http://boards.ie/vbulletin/forumdisplay.php?f=2"/>
	[... many more forums ...]
	<sioc:has_Usergroup rdf:resource="http://boards.ie/vbulletin/memberlist.php#usergroup"/>
</sioc:Site>

<sioc:Usergroup rdf:about="http://boards.ie/vbulletin/memberlist.php#usergroup">
	<sioc:has_member>
		<sioc:User rdf:about="http://boards.ie/vbulletin/member.php?u=1#user">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=user&amp;sioc_id=1"/>
		</sioc:User>
	</sioc:has_member>
	[... many more users ...]
</sioc:Usergroup>
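Under the same assumptions (standard namespace URIs, and sioc:Site and sioc:Usergroup as top-level elements under rdf:RDF), the forum and user URIs on one page of the site document could be collected like this; `site_index` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
}
ABOUT = f"{{{NS['rdf']}}}about"
RESOURCE = f"{{{NS['rdf']}}}resource"

def site_index(xml_text):
    """Return (forum URIs, user URIs) listed on one site page (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    forums = [el.get(RESOURCE) for el in root.iterfind("sioc:Site/sioc:host_of", NS)]
    users = [el.get(ABOUT)
             for el in root.iterfind("sioc:Usergroup/sioc:has_member/sioc:User", NS)]
    return forums, users

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:sioc="http://rdfs.org/sioc/ns#">
  <sioc:Site rdf:about="http://boards.ie/vbulletin/">
    <sioc:host_of rdf:resource="http://boards.ie/vbulletin/forumdisplay.php?f=1"/>
    <sioc:host_of rdf:resource="http://boards.ie/vbulletin/forumdisplay.php?f=2"/>
  </sioc:Site>
  <sioc:Usergroup rdf:about="http://boards.ie/vbulletin/memberlist.php#usergroup">
    <sioc:has_member>
      <sioc:User rdf:about="http://boards.ie/vbulletin/member.php?u=1#user">
        <rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=user&amp;sioc_id=1"/>
      </sioc:User>
    </sioc:has_member>
  </sioc:Usergroup>
</rdf:RDF>"""

forums, users = site_index(SAMPLE)
print(forums, users)
```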

The data for sioc:Site is paged; each page lists 20 users and 20 forums. An rdfs:seeAlso is used to point to the URI of the next page:

<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=site&amp;page=2"/>
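A sketch for spotting the next-page link, assuming that only paging links carry a `page` query parameter; `next_page_links` is a hypothetical name, and the same check applies to the paged forum and thread documents.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def next_page_links(xml_text):
    """Return rdfs:seeAlso targets whose query has a `page` parameter (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    links = []
    for el in root.iter(f"{{{RDFS}}}seeAlso"):
        url = el.get(f"{{{RDF}}}resource")
        if url and "page" in parse_qs(urlsplit(url).query):
            links.append(url)
    return links

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:sioc="http://rdfs.org/sioc/ns#">
  <sioc:Site rdf:about="http://boards.ie/vbulletin/">
    <rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=site&amp;page=2"/>
  </sioc:Site>
  <sioc:User rdf:about="http://boards.ie/vbulletin/member.php?u=1#user">
    <rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=user&amp;sioc_id=1"/>
  </sioc:User>
</rdf:RDF>"""

print(next_page_links(SAMPLE))
```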

sioc:Forum

Each discussion forum is a generic sioc:Forum. But in the case of our vBulletin board it is also an instance of the more specific type sioct:MessageBoard.

On a forum's page you can see which threads it contains and links to those threads. This is expressed by using <sioc:parent_of>. In the example below you can see that this forum contains threads #190394 and #190769. If there are more than 20 threads the forum's data will be paged (again using rdfs:seeAlso).

You can also see if this forum is a sub-forum of another one, by looking for <sioc:has_parent>. In our example you can see that it is in fact a subforum of forum #177.

<sioct:MessageBoard rdf:about="http://boards.ie/vbulletin/forumdisplay.php?f=475">
	<rdf:type rdf:resource="http://rdfs.org/sioc/ns#Forum" />
	<sioc:link rdf:resource="http://boards.ie/vbulletin/forumdisplay.php?f=475"/>
	<dc:title>Galway City</dc:title>
	<dc:description>Fast growing city in the west.</dc:description>
	<sioc:has_parent>
		<sioc:Forum rdf:about="http://boards.ie/vbulletin/forumdisplay.php?f=177">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=forum&amp;sioc_id=177"/>
		</sioc:Forum>
	</sioc:has_parent>
	<sioc:parent_of>
		<sioc:Thread rdf:about="http://boards.ie/vbulletin/showthread.php?t=190394">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=thread&amp;sioc_id=190394"/>
		</sioc:Thread>
	</sioc:parent_of>
	<sioc:parent_of>
		<sioc:Thread rdf:about="http://boards.ie/vbulletin/showthread.php?t=190769">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=thread&amp;sioc_id=190769"/>
		</sioc:Thread>
	</sioc:parent_of>
	[... many more threads ...]
	<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=forum&amp;sioc_id=475&amp;page=2"/>
</sioct:MessageBoard>
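Since the forum is written as a sioct:MessageBoard element with an additional rdf:type, a plain XML reader has to check the rdf:type children to see that it is also a sioc:Forum. The sketch below (hypothetical helper `describe_forum`) extracts that flag together with the title, the parent forum and the listed threads, assuming the standard namespace URIs, including `http://rdfs.org/sioc/types#` for sioct.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "sioct": "http://rdfs.org/sioc/types#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
ABOUT = f"{{{NS['rdf']}}}about"
RESOURCE = f"{{{NS['rdf']}}}resource"

def describe_forum(xml_text):
    """Summarise the sioct:MessageBoard in a forum document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    board = root.find("sioct:MessageBoard", NS)
    types = {t.get(RESOURCE) for t in board.findall("rdf:type", NS)}
    parent = board.find("sioc:has_parent/sioc:Forum", NS)
    return {
        "uri": board.get(ABOUT),
        "is_sioc_forum": NS["sioc"] + "Forum" in types,
        "title": board.findtext("dc:title", namespaces=NS),
        "parent": parent.get(ABOUT) if parent is not None else None,
        "threads": [t.get(ABOUT) for t in board.findall("sioc:parent_of/sioc:Thread", NS)],
    }

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#"
         xmlns:sioct="http://rdfs.org/sioc/types#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <sioct:MessageBoard rdf:about="http://boards.ie/vbulletin/forumdisplay.php?f=475">
    <rdf:type rdf:resource="http://rdfs.org/sioc/ns#Forum"/>
    <dc:title>Galway City</dc:title>
    <sioc:has_parent>
      <sioc:Forum rdf:about="http://boards.ie/vbulletin/forumdisplay.php?f=177"/>
    </sioc:has_parent>
    <sioc:parent_of>
      <sioc:Thread rdf:about="http://boards.ie/vbulletin/showthread.php?t=190394"/>
    </sioc:parent_of>
  </sioct:MessageBoard>
</rdf:RDF>"""

print(describe_forum(SAMPLE)["parent"])  # http://boards.ie/vbulletin/forumdisplay.php?f=177
```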

sioc:Thread

The data for a thread shows which posts it contains via <sioc:container_of>. It is paged when the thread has more than 20 posts in total. The forum that contains the thread is given by <sioc:has_parent>.

<sioc:Thread rdf:about="http://boards.ie/vbulletin/showthread.php?t=2055195633">
	<sioc:link rdf:resource="http://boards.ie/vbulletin/showthread.php?t=2055195633"/>
	<sioc:num_views>1079</sioc:num_views>
	<dc:title>Weekly Soccer game (kick about)</dc:title>
	<dcterms:created>2007-12-05T13:03:03Z</dcterms:created>
	<sioc:has_parent>
		<sioc:Forum rdf:about="http://boards.ie/vbulletin/forumdisplay.php?f=475">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=forum&amp;sioc_id=475"/>
		</sioc:Forum>
	</sioc:has_parent>
	<sioc:container_of>
		<sioc:Post rdf:about="http://boards.ie/vbulletin/showpost.php?p=54582327">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=post&amp;sioc_id=54582327"/>
			<sioc:next_by_date rdf:resource="http://boards.ie/vbulletin/showpost.php?p=54589700"/>
		</sioc:Post>
	</sioc:container_of>
</sioc:Thread>
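A thread document can be summarised the same way. This sketch (hypothetical helper `describe_thread`) assumes timestamps always use the `Z` suffix shown above, which it rewrites to `+00:00` so that `datetime.fromisoformat` also accepts it on Python versions before 3.11.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
ABOUT = f"{{{NS['rdf']}}}about"
RESOURCE = f"{{{NS['rdf']}}}resource"

def describe_thread(xml_text):
    """Summarise the sioc:Thread in a thread document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    thread = root.find("sioc:Thread", NS)
    created = thread.findtext("dcterms:created", namespaces=NS)
    parent = thread.find("sioc:has_parent/sioc:Forum", NS)
    posts = []
    for post in thread.findall("sioc:container_of/sioc:Post", NS):
        nxt = post.find("sioc:next_by_date", NS)
        posts.append((post.get(ABOUT), nxt.get(RESOURCE) if nxt is not None else None))
    return {
        "title": thread.findtext("dc:title", namespaces=NS),
        "views": int(thread.findtext("sioc:num_views", default="0", namespaces=NS)),
        "created": datetime.fromisoformat(created.replace("Z", "+00:00")),
        "forum": parent.get(ABOUT) if parent is not None else None,
        "posts": posts,  # (post URI, next post by date or None)
    }

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <sioc:Thread rdf:about="http://boards.ie/vbulletin/showthread.php?t=2055195633">
    <sioc:num_views>1079</sioc:num_views>
    <dc:title>Weekly Soccer game (kick about)</dc:title>
    <dcterms:created>2007-12-05T13:03:03Z</dcterms:created>
    <sioc:has_parent>
      <sioc:Forum rdf:about="http://boards.ie/vbulletin/forumdisplay.php?f=475"/>
    </sioc:has_parent>
    <sioc:container_of>
      <sioc:Post rdf:about="http://boards.ie/vbulletin/showpost.php?p=54582327">
        <sioc:next_by_date rdf:resource="http://boards.ie/vbulletin/showpost.php?p=54589700"/>
      </sioc:Post>
    </sioc:container_of>
  </sioc:Thread>
</rdf:RDF>"""

print(describe_thread(SAMPLE)["views"])  # 1079
```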

sioc:Post

Each post in a thread is a generic sioc:Post. As with sioc:Forum and sioct:MessageBoard, each sioc:Post is at the same time a sioct:BoardPost.

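A post has a creator: <sioc:has_creator> links to the creator's sioc:User account, and <foaf:maker> links to the foaf:Person who owns that account, each with an rdfs:seeAlso pointing to the corresponding RDF/XML file. The post's creation date and time is given by dcterms:created.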

A post can have links to the posts it replies to.
In the example, <sioc:reply_of> declares the post to be a reply to the post with the URI http://boards.ie/vbulletin/showpost.php?p=54636434. This information was derived from the fact that post #54636434 is quoted in the content of the example post. As multi-quotes are possible, some posts are replies to more than one other post.
Some posts contain no quotes but are still marked as replies to others, because the database holds information about the reply structure. This information is not 100% accurate, however, as it is impossible to determine which post the author actually intended to reply to.
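The page does not show how quotes were turned into sioc:reply_of links, so the following is only a hypothetical reconstruction. It assumes vBulletin quote tags of the form `[QUOTE=name;postid]`, as in the post example in this section; quotes without a post ID yield nothing, and replies derived from the database are not covered. `quoted_post_uris` is a hypothetical helper.

```python
import re

# Assumed vBulletin quote syntax: [QUOTE=username;postid] ... [/QUOTE]
QUOTE_RE = re.compile(r"\[quote=[^;\]]*;(\d+)\]", re.IGNORECASE)
POST_URL = "http://boards.ie/vbulletin/showpost.php?p={}"

def quoted_post_uris(bbcode):
    """Return URIs of quoted posts in order of appearance, without duplicates."""
    uris = []
    for post_id in QUOTE_RE.findall(bbcode):
        uri = POST_URL.format(post_id)
        if uri not in uris:
            uris.append(uri)
    return uris

content = ("[QUOTE=JIZZLORD;54636434]where is drom?[/QUOTE]It's where "
           '[URL="http://www.salthilldevon.ie/"]Salthill Devon[/URL] play.')
print(quoted_post_uris(content))
# ['http://boards.ie/vbulletin/showpost.php?p=54636434']
```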

If a post's content contains a link to any http or ftp resource, it should be listed as a <sioc:links_to>, as can be seen in the example (note that not all links were found, because the regular expression used to extract links from the content is imperfect).
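A link extractor of this kind might look like the sketch below. The regular expressions are illustrative assumptions, not the ones used for the export, and they are imperfect too: for instance, trailing punctuation after a bare URL is kept as part of the link. `extract_links` is a hypothetical helper.

```python
import re

# Illustrative patterns only: [URL="..."]label[/URL] (quotes optional)
# and bare http/https/ftp URLs.
URL_TAG_RE = re.compile(r'\[url="?((?:https?|ftp)://[^"\]]+)"?\](.*?)\[/url\]',
                        re.IGNORECASE | re.DOTALL)
BARE_URL_RE = re.compile(r'(?<![="\w])((?:https?|ftp)://[^\s\[\]"<>]+)', re.IGNORECASE)

def extract_links(bbcode):
    """Map each URL to its label (None for bare URLs); tagged links come first."""
    links = {}
    for url, label in URL_TAG_RE.findall(bbcode):
        links.setdefault(url, label)
    # Blank out tagged links so their URLs are not matched a second time.
    for url in BARE_URL_RE.findall(URL_TAG_RE.sub(" ", bbcode)):
        links.setdefault(url, None)
    return links

content = ("[QUOTE=JIZZLORD;54636434]where is drom?[/QUOTE]It's where "
           '[URL="http://www.salthilldevon.ie/"]Salthill Devon[/URL] play.')
print(extract_links(content))  # {'http://www.salthilldevon.ie/': 'Salthill Devon'}
```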

<sioct:BoardPost rdf:about="http://boards.ie/vbulletin/showpost.php?p=54636504">
	<rdf:type rdf:resource="http://rdfs.org/sioc/ns#Post" />
	<sioc:has_creator>
		<sioc:User rdf:about="http://boards.ie/vbulletin/member.php?u=29810#user">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=user&amp;sioc_id=29810"/>
		</sioc:User>
	</sioc:has_creator>
	<foaf:maker>
		<foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=29810#person">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/foaf.php?u=29810"/>
		</foaf:Person>
	</foaf:maker>
	<dcterms:created>2007-12-12T13:03:03Z</dcterms:created>
	<sioc:content>where is drom?It's where Salthill Devon (http://www.salthilldevon.ie/) play.</sioc:content>
	<content:encoded><![CDATA[[QUOTE=JIZZLORD;54636434]where is drom?[/QUOTE]It's where [URL="http://www.salthilldevon.ie/"]Salthill Devon[/URL] play.]]></content:encoded>
	<sioc:links_to rdfs:label="Salthill Devon" rdf:resource="http://www.salthilldevon.ie/"/>
	<sioc:reply_of>
		<sioc:Post rdf:about="http://boards.ie/vbulletin/showpost.php?p=54636434">
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=post&amp;sioc_id=54636434"/>
		</sioc:Post>
	</sioc:reply_of>
</sioct:BoardPost>
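The post fields discussed above can be pulled out as follows. `describe_post` is a hypothetical helper; it assumes the standard SIOC, SIOC types, FOAF, Dublin Core terms and RDFS namespace URIs and a sioct:BoardPost directly under rdf:RDF.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "sioct": "http://rdfs.org/sioc/types#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
ABOUT = f"{{{NS['rdf']}}}about"
RESOURCE = f"{{{NS['rdf']}}}resource"
LABEL = f"{{{NS['rdfs']}}}label"

def describe_post(xml_text):
    """Summarise a sioct:BoardPost (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    post = root.find("sioct:BoardPost", NS)
    creator = post.find("sioc:has_creator/sioc:User", NS)
    maker = post.find("foaf:maker/foaf:Person", NS)
    return {
        "uri": post.get(ABOUT),
        "account": creator.get(ABOUT) if creator is not None else None,
        "person": maker.get(ABOUT) if maker is not None else None,
        "created": post.findtext("dcterms:created", namespaces=NS),
        "reply_of": [p.get(ABOUT) for p in post.findall("sioc:reply_of/sioc:Post", NS)],
        "links_to": {link.get(RESOURCE): link.get(LABEL)
                     for link in post.findall("sioc:links_to", NS)},
    }

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:sioc="http://rdfs.org/sioc/ns#"
         xmlns:sioct="http://rdfs.org/sioc/types#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <sioct:BoardPost rdf:about="http://boards.ie/vbulletin/showpost.php?p=54636504">
    <sioc:has_creator>
      <sioc:User rdf:about="http://boards.ie/vbulletin/member.php?u=29810#user"/>
    </sioc:has_creator>
    <foaf:maker>
      <foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=29810#person"/>
    </foaf:maker>
    <dcterms:created>2007-12-12T13:03:03Z</dcterms:created>
    <sioc:links_to rdfs:label="Salthill Devon" rdf:resource="http://www.salthilldevon.ie/"/>
    <sioc:reply_of>
      <sioc:Post rdf:about="http://boards.ie/vbulletin/showpost.php?p=54636434"/>
    </sioc:reply_of>
  </sioct:BoardPost>
</rdf:RDF>"""

print(describe_post(SAMPLE)["reply_of"])
```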

If the post contains an image, there will be a link to it like in the following excerpt of another post:

<sioct:BoardPost rdf:about="http://boards.ie/vbulletin/showpost.php?p=54742177">
	<dcterms:hasPart>
		<dcmitype:Image rdf:about="http://a1259.g.akamai.net/f/1259/5586/5d/images.art.com/images/-/Good-Clown--C10101183.jpeg"/>
	</dcterms:hasPart>
</sioct:BoardPost>
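Images can be listed with a single path query. This is a hypothetical sketch (`post_images`) that assumes the DCMI Type namespace `http://purl.org/dc/dcmitype/` and the Dublin Core terms namespace for dcterms.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioct": "http://rdfs.org/sioc/types#",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
}
ABOUT = f"{{{NS['rdf']}}}about"

def post_images(xml_text):
    """Return the URIs of all dcmitype:Image parts in the document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [img.get(ABOUT) for img in root.iterfind(".//dcterms:hasPart/dcmitype:Image", NS)]

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioct="http://rdfs.org/sioc/types#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:dcmitype="http://purl.org/dc/dcmitype/">
  <sioct:BoardPost rdf:about="http://boards.ie/vbulletin/showpost.php?p=54742177">
    <dcterms:hasPart>
      <dcmitype:Image rdf:about="http://a1259.g.akamai.net/f/1259/5586/5d/images.art.com/images/-/Good-Clown--C10101183.jpeg"/>
    </dcterms:hasPart>
  </sioct:BoardPost>
</rdf:RDF>"""

print(post_images(SAMPLE))
```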

sioc:User

When it comes to sioc:User and foaf:Person there is an important distinction to make: a foaf:Person holds an online account, and this online account is the sioc:User. So in our sense a user is not a person, but only an account that belongs to a person (and a person can have several user accounts on different sites). A person uses her sioc:User account to create posts, so the triples denoting authorship have the form sioc:User sioc:creator_of sioc:Post (or, inversely, sioc:Post sioc:has_creator sioc:User, as in the post documents).
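The account/person distinction matters when aggregating authorship. The sketch below works on plain (subject, predicate, object) tuples, a simplifying assumption in place of a real RDF graph: it adds sioc:creator_of as the inverse of sioc:has_creator and then groups posts by the foaf:Person holding each account. The helper names and the second account on `example.org` are hypothetical.

```python
from collections import defaultdict

SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
INVERSES = {SIOC + "has_creator": SIOC + "creator_of",
            SIOC + "creator_of": SIOC + "has_creator"}

def with_inverses(triples):
    """Return the triples plus the inverse of each has_creator/creator_of triple."""
    base = set(triples)
    result = set(base)
    for s, p, o in base:
        if p in INVERSES:
            result.add((o, INVERSES[p], s))
    return result

def posts_by_person(triples):
    """Map each foaf:Person to the posts created with any account they hold."""
    accounts = defaultdict(set)  # person -> sioc:User accounts
    posts = defaultdict(set)     # sioc:User -> posts
    for s, p, o in with_inverses(triples):
        if p == FOAF + "holdsAccount":
            accounts[s].add(o)
        elif p == SIOC + "creator_of":
            posts[s].add(o)
    return {person: set().union(*(posts[a] for a in accts))
            for person, accts in accounts.items()}

PERSON = "http://boards.ie/vbulletin/foaf.php?u=29810#person"
ACCOUNT = "http://boards.ie/vbulletin/member.php?u=29810#user"
OTHER = "http://example.org/forum/user/7#user"  # hypothetical account on another site
triples = [
    ("http://boards.ie/vbulletin/showpost.php?p=54636504", SIOC + "has_creator", ACCOUNT),
    ("http://example.org/forum/post/1", SIOC + "has_creator", OTHER),
    (PERSON, FOAF + "holdsAccount", ACCOUNT),
    (PERSON, FOAF + "holdsAccount", OTHER),
]
print(posts_by_person(triples))
```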

Each sioc:User account links to the FOAF profile (foaf.php?u=userid) of the person who holds it.

The sioc:Role just displays a name for different roles like "Registered User", "Moderator", "Banned User", "Administrator".

<foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=3289#person">
	<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/foaf.php?u=3289"/>
	<foaf:holdsAccount>
		<sioc:User rdf:about="http://boards.ie/vbulletin/member.php?u=3289#user">
			<sioc:name>mike65</sioc:name>
			<sioc:has_function>
				<sioc:Role>
					<sioc:name>Moderator</sioc:name>
				</sioc:Role>
			</sioc:has_function>
		</sioc:User>
	</foaf:holdsAccount>
</foaf:Person>
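Reading account names and roles from a fragment like this one can be sketched as follows; `account_roles` is a hypothetical helper and assumes the standard SIOC and FOAF namespace URIs.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
ABOUT = f"{{{NS['rdf']}}}about"

def account_roles(xml_text):
    """Map each held sioc:User URI to (account name, [role names]) (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    accounts = {}
    for user in root.iterfind(".//foaf:holdsAccount/sioc:User", NS):
        # Direct child sioc:name is the account name; role names sit deeper.
        name = user.findtext("sioc:name", namespaces=NS)
        roles = [role.findtext("sioc:name", namespaces=NS)
                 for role in user.findall("sioc:has_function/sioc:Role", NS)]
        accounts[user.get(ABOUT)] = (name, roles)
    return accounts

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=3289#person">
    <foaf:holdsAccount>
      <sioc:User rdf:about="http://boards.ie/vbulletin/member.php?u=3289#user">
        <sioc:name>mike65</sioc:name>
        <sioc:has_function>
          <sioc:Role><sioc:name>Moderator</sioc:name></sioc:Role>
        </sioc:has_function>
      </sioc:User>
    </foaf:holdsAccount>
  </foaf:Person>
</rdf:RDF>"""

print(account_roles(SAMPLE))
```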

foaf:Person

The documents for foaf:Person differ a little from the rest: they lack the <foaf:Document> section described in the foaf:Document section above. (A different exporter produced these files; it should be updated in the future to generate the same output as for the SIOC RDF documents.) The FOAF file URIs also have a different structure, http://boards.ie/vbulletin/foaf.php?u=userid, and the URI for the person itself has #person appended.

The example shows that a person holds an online account (a foaf:OnlineAccount whose URI is that of the corresponding sioc:User).

A person can mark other users as "buddies" in their vBulletin profile; these are expressed using <foaf:knows>.

<foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=3289#person">
	<foaf:name>mike65</foaf:name>
	<foaf:nick>mike65</foaf:nick>
	<foaf:depiction rdf:resource="http://boards.ie/vbulletin/images/avatars/Looney_Tunes_-_Sylvester.gif"/>
	<foaf:knows>
		<foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=1854#person">
			<foaf:nick>dahamsta</foaf:nick>
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/foaf.php?u=1854"/>
		</foaf:Person>
	</foaf:knows>
	<foaf:knows>
		<foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=4051#person">
			<foaf:nick>Tristrame</foaf:nick>
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/foaf.php?u=4051"/>
		</foaf:Person>
	</foaf:knows>
	<foaf:holdsAccount>
		<foaf:OnlineAccount rdf:about="http://boards.ie/vbulletin/member.php?u=3289#user">
			<foaf:accountName>mike65</foaf:accountName>
			<foaf:accountServiceHomepage rdf:resource="http://boards.ie/vbulletin/" />
			<rdfs:seeAlso rdf:resource="http://boards.ie/vbulletin/sioc.php?sioc_type=user&amp;sioc_id=3289" />
		</foaf:OnlineAccount>
	</foaf:holdsAccount>
</foaf:Person>
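The buddy lists can be turned into an adjacency list. `buddies` is a hypothetical helper; it assumes the standard FOAF namespace and only looks at foaf:Person elements directly under rdf:RDF.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
ABOUT = f"{{{NS['rdf']}}}about"

def buddies(xml_text):
    """Map each top-level person URI to a list of (buddy URI, nick) (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    graph = {}
    for person in root.findall("foaf:Person", NS):
        graph[person.get(ABOUT)] = [
            (friend.get(ABOUT), friend.findtext("foaf:nick", namespaces=NS))
            for friend in person.findall("foaf:knows/foaf:Person", NS)
        ]
    return graph

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=3289#person">
    <foaf:nick>mike65</foaf:nick>
    <foaf:knows>
      <foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=1854#person">
        <foaf:nick>dahamsta</foaf:nick>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person rdf:about="http://boards.ie/vbulletin/foaf.php?u=4051#person">
        <foaf:nick>Tristrame</foaf:nick>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

print(buddies(SAMPLE))
```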

Known problems

Some posts (and possibly a few thread titles) contain special or control characters that make the documents containing them invalid XML. Of the 9 million documents, this should affect perhaps one or two thousand.
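One possible workaround, sketched under assumptions: try a normal parse and, if it fails, remove characters that XML 1.0 does not allow and parse again. This silently drops those characters, does not help if the bad characters are written as numeric character references such as &#1;, and expects the document already decoded to a `str`; `parse_lenient` is a hypothetical name. If the retry also fails, the `ParseError` is raised as usual.

```python
import re
import xml.etree.ElementTree as ET

# Characters outside the XML 1.0 "Char" production (e.g. most C0 control characters).
INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

def parse_lenient(xml_text):
    """Parse `xml_text`; on failure strip disallowed characters and retry.

    Returns (root element, whether characters had to be removed).
    """
    try:
        return ET.fromstring(xml_text), False
    except ET.ParseError:
        cleaned = INVALID_XML_CHARS.sub("", xml_text)
        return ET.fromstring(cleaned), True

root, repaired = parse_lenient("<content>bad\x01char</content>")
print(root.text, repaired)  # badchar True
```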


Error reports by users

Please describe problems with the data here.

More information

If you have questions about this competition please contact us for more information.
