Data/Boards.ie/Error reports

From SIOC Wiki



Error reports

Please describe problems with the data here.


Sep 3, 2008 Illegal URI

The post file http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=71610 has an illegal URI on line 56. The Jena parser's error report is:

WARN [main] (RDFDefaultErrorHandler.java:36) - 
http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=71610# (line 56 column 101): 
{W107} Bad URI: <http://redjohno@eudoramail.com> 
Code: 59/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.

Presumably the http: should be mailto:, but I don't know whether this stems from an error in the source data or indicates a problem with the RDF generation script.
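To look for other links with the same problem, one could scan the exported URIs for http: URIs that carry a user-info component (the part before the @). That component is what Jena's PROHIBITED_COMPONENT_PRESENT in USER warning refers to. The Java sketch below uses only java.net.URI. The class and method names are hypothetical. Suggesting a mailto: replacement assumes the value really is a mistyped e-mail address, and the report above leaves that question open.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper: flags http(s) URIs that contain a user-info component,
// such as <http://redjohno@eudoramail.com>, and suggests a mailto: replacement.
class SuspectEmailUri {

    // Returns a suggested mailto: URI if the input looks like an e-mail address
    // written with an http: scheme; otherwise returns Optional.empty().
    static Optional<String> suggestMailto(String raw) {
        URI uri;
        try {
            uri = new URI(raw);
        } catch (URISyntaxException e) {
            return Optional.empty(); // unparseable; needs manual review instead
        }
        String scheme = uri.getScheme();
        boolean isHttp = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        String user = uri.getUserInfo();
        String host = uri.getHost();
        String path = uri.getPath();
        // Assumption: an e-mail-like URI has user info and a host, but no path or query.
        boolean noPath = path == null || path.isEmpty();
        if (isHttp && user != null && host != null && noPath && uri.getQuery() == null) {
            return Optional.of("mailto:" + user + "@" + host);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(suggestMailto("http://redjohno@eudoramail.com").orElse("(ok)"));
        System.out.println(suggestMailto("http://boards.ie/vbulletin/").orElse("(ok)"));
    }
}
```

Running main prints mailto:redjohno@eudoramail.com for the reported URI and (ok) for an ordinary boards.ie URL.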

Sep 5, 2008 Sesame issue with empty URI properties

In nearly every file there is a line like this:

<foaf:Document rdf:about="">

The "rdf:about" without a URI makes Sesame's RDF/XML parser crash with an error like this:

Not a valid (absolute) URI:  [line 17, column 3]

If the rdf:about is removed, everything works fine.

Sesame uses a SAX parser and creates a new URI from the value the parser delivers. Here that value is "", which Sesame does not accept as a valid URI. I do not know whether this is a bug in Sesame or whether empty URIs are allowed in general, but Sesame at least rejects them.
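For background, rdf:about="" in RDF/XML is a relative URI reference. An empty reference resolves to the base URI of the document itself, so a parser can only produce an absolute URI if it knows that base. Simply deleting the attribute also changes the data, because the foaf:Document then becomes a blank node. Where a parser cannot be given a base URI, one possible workaround is to write the document URI into the attribute before parsing. This is an assumption and was not proposed in this thread. The Java sketch below does a plain string replacement. The class and method names are hypothetical, and it assumes the document URI (for example, the post URL) is already known.

```java
// Hypothetical pre-processing step: replace rdf:about="" with an explicit,
// XML-escaped document URI so a parser without a base URI can still read the file.
// Assumption: the attribute appears literally as rdf:about="" (rdf prefix, double
// quotes); other spellings are not handled.
class EmptyAboutFix {

    // Escapes characters that may not appear literally in a double-quoted XML attribute.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;") // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace("\"", "&quot;");
    }

    // Replaces every empty rdf:about attribute with the given absolute document URI.
    static String fillEmptyAbout(String rdfXml, String documentUri) {
        return rdfXml.replace("rdf:about=\"\"", "rdf:about=\"" + escapeAttr(documentUri) + "\"");
    }

    public static void main(String[] args) {
        String in = "<foaf:Document rdf:about=\"\">";
        String uri = "http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=71610";
        System.out.println(fillEmptyAbout(in, uri));
    }
}
```

The & in the post URL must become &amp; inside the attribute. Without that escaping, the rewritten file would no longer be well-formed XML.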

Solution

This problem was discussed at http://tuukka.iki.fi/tmp/sioc-2008-09-02.html, where a solution for Jena was suggested. Hopefully it works with Sesame, too.

Yes, there is a way to do this in Sesame:

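// The file name is the URL-encoded post URI, so decoding it gives the base URI that rdf:about="" resolves against.
connection.add(file, URLDecoder.decode(file.getName(), "utf-8"), RDFFormat.RDFXML);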

Thanks for the suggestion.
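The Sesame call works because each file in the archive appears to be named after the URL-encoded URI of the post it describes (compare the example file name in the Sep 24 entry). Decoding the name therefore recovers the base URI. The sketch below covers only that decoding step, using the standard library. The class name is hypothetical, and the code assumes Java 10 or later for URLDecoder.decode(String, Charset).

```java
import java.io.File;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the Sesame snippet: the last path component of a
// post file is the URL-encoded post URI, so decoding it yields the base URI.
class BaseUriFromFileName {

    static String baseUri(File file) {
        // decode(String, Charset) avoids the checked UnsupportedEncodingException
        // that decode(String, "utf-8") declares. Note: URLDecoder also turns '+' into a space.
        return URLDecoder.decode(file.getName(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        File f = new File("post/000/000/http%3A%2F%2Fboards.ie%2Fvbulletin%2Fsioc.php%3Fsioc_type%3Dpost%26sioc_id%3D1967011");
        System.out.println(baseUri(f));
    }
}
```

For the example file, main prints http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=1967011.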

Sep 23, 2008 Problems downloading the zipped files

Some people have reported timeouts when downloading the very large files. We've made a second store of the data sets available, which you can access using the same username and password as for the original download site.

You can access the second store at download.sioc-project.org.

Sep 24, 2008 Post files without actual data

randomity	some of the files in boards.ie-post.tar.gz have no data. They have a <?xml..., a <rdf:RDF... , and a full <foaf:Document> declaration, but no actual data
randomity	e.g. post/000/000/http%3A%2F%2Fboards.ie%2Fvbulletin%2Fsioc.php%3Fsioc_type%3Dpost%26sioc_id%3D1967011
thosch	randomity: these should all be posts that were deleted/are access restricted. 
thosch	if you check the post of the respective id on the boards.ie server you can see it: http://boards.ie/vbulletin/showpost.php?p=1967011
thosch	but of course these posts shouldn't be in the dump, but it wasn't easy to remove all of them.
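To find such files in bulk, one could parse each post file and check whether rdf:RDF contains anything besides the foaf:Document element. Any hits can then be looked up on boards.ie as thosch suggests. The Java sketch below assumes that an "empty" file looks exactly as randomity describes. The class and method names are hypothetical. The showpost.php URL pattern is taken from thosch's example, and the foaf namespace is the standard http://xmlns.com/foaf/0.1/.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical checker. Assumption: a post file "without actual data" is one whose
// rdf:RDF root element has no child element other than foaf:Document.
class EmptyPostCheck {
    static final String FOAF_NS = "http://xmlns.com/foaf/0.1/";
    static final Pattern SIOC_ID = Pattern.compile("[?&]sioc_id=(\\d+)");

    static boolean hasNoData(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // ignore whitespace text and comments
                }
                boolean isFoafDocument = FOAF_NS.equals(child.getNamespaceURI())
                        && "Document".equals(child.getLocalName());
                if (!isFoafDocument) {
                    return false; // some other resource, e.g. sioc:Post, is present
                }
            }
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse RDF/XML", e);
        }
    }

    // Maps a post URI such as ...sioc.php?sioc_type=post&sioc_id=1967011 to the
    // showpost.php page thosch uses to check whether the post was deleted.
    static Optional<String> showPostUrl(String postUri) {
        Matcher m = SIOC_ID.matcher(postUri);
        return m.find()
                ? Optional.of("http://boards.ie/vbulletin/showpost.php?p=" + m.group(1))
                : Optional.empty();
    }

    public static void main(String[] args) {
        String ns = " xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:foaf=\"" + FOAF_NS + "\"";
        String empty = "<rdf:RDF" + ns + "><foaf:Document rdf:about=\"\"/></rdf:RDF>";
        System.out.println(hasNoData(empty));
        System.out.println(showPostUrl(
                "http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=1967011").orElse("?"));
    }
}
```

Here main prints true for the empty sample, followed by the showpost.php URL for post 1967011.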