Error reports

Please describe problems with the data here

Sep 3, 2008 Illegal URI

The post file has an illegal URI on line 56. The Jena parser's error report is:

WARN [main] ( - (line 56 column 101): 
{W107} Bad URI: <> 
Code: 59/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.

Presumably the http: should be mailto:, but I don't know if it stems from an error in the source data, or indicates a problem with the RDF generation script.
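
A pre-check along these lines might help to find further occurrences before running the parser. This is only a minimal sketch in plain Java using java.net.URI; it is not part of Jena and not the generation script. The class and method names are made up for illustration, the example values in main are hypothetical (the actual URI from line 56 is not reproduced), and it assumes the offending value has the form http://user@host.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class HttpUserInfoCheck {

    /**
     * Returns true if the value is an http or https URI that carries a
     * user component, the combination the Jena warning complains about.
     */
    static boolean hasProhibitedUserComponent(String value) {
        try {
            URI uri = new URI(value);
            String scheme = uri.getScheme();
            boolean isHttp = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            // getUserInfo() is null when the authority has no "user@" part.
            return isHttp && uri.getUserInfo() != null;
        } catch (URISyntaxException e) {
            // Not parseable at all: a different problem than the one checked here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only to show both outcomes.
        System.out.println(hasProhibitedUserComponent("http://someone@example.org"));
        System.out.println(hasProhibitedUserComponent("mailto:someone@example.org"));
    }
}
```

A mailto: URI is opaque to java.net.URI, so it never reports a user component and passes the check.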

Sep 5, 2008 Sesame issue with empty URI properties

In nearly every file there is a line like this:

<foaf:Document rdf:about="">

The "rdf:about" without a URI makes Sesame's RDF/XML parser crash with an error like this:

Not a valid (absolute) URI:  [line 17, column 3]

If the rdf:about is removed, everything works fine.

Sesame uses a SAX parser and creates a new URI from the value the parser delivers. In this case "" is not a valid URI for Sesame. I do not know whether this is a bug in Sesame or whether empty URIs are not allowed in general. At least Sesame does not like them.


The described problem was discussed here and a solution for Jena was suggested. Hopefully it works with Sesame, too.

Yes, there is a way to do this in Sesame:

connection.add(file, URLDecoder.decode(file.getName(), "utf-8"), RDFFormat.RDFXML);

Thanks for the suggestion.
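
For anyone wondering why this helps: an empty rdf:about is a relative reference, and it is resolved against the base URI of the document. The second argument of connection.add supplies that base URI, here taken from the percent-encoded file name. The following is only a sketch of the decoding step in plain Java, without Sesame; the class and method names and the file name in main are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class BaseUriFromFileName {

    /** Decodes a percent-encoded file name so it can be passed as the base URI. */
    static String baseUriFor(String fileName) {
        try {
            return URLDecoder.decode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file name; the real names in the dump may look different.
        String fileName = "http%3A%2F%2Fexample.org%2Fpost%3Fsioc_id%3D1967011";
        System.out.println(baseUriFor(fileName));
    }
}
```

Note that URLDecoder also turns a "+" into a space, which may matter if a file name contains a literal plus sign.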

Sep 23, 2008 Problems downloading the zipped files

Some people have reported timeouts when downloading the very large files. We've made a second store of the data sets available, which you can access with the same username and password as for the original download site.

You can access the second store at

Sep 24, 2008 Post files without actual data

randomity	some of the files in have no data. They have a <?xml..., a <rdf:RDF... , and a full <foaf:Document> declaration, but no actual data
randomity	e.g. post/000/000/
randomity	sioc_id%3D1967011
thosch	randomity: these should all be posts that were deleted/are access restricted. 
thosch	if you check the post of the respective id on the server you can see it:
thosch	but of course these posts shouldn't be in the dump, but it wasn't easy to remove all of them.
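
Until these files are removed from the dump, they could be skipped while loading. The following is a rough sketch in plain Java, not part of any official tooling. It assumes that an "empty" file is one where rdf:RDF contains nothing but foaf:Document elements, which matches the description above but has not been checked against the whole dump; the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class EmptyPostCheck {

    private static final String FOAF_NS = "http://xmlns.com/foaf/0.1/";

    /**
     * Returns true if the root element has no child elements other than
     * foaf:Document (assumed here to mean "no actual post data").
     */
    static boolean looksEmpty(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI/getLocalName
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                boolean isFoafDocument = FOAF_NS.equals(child.getNamespaceURI())
                        && "Document".equals(child.getLocalName());
                if (!isFoafDocument) {
                    return false; // some other resource is described, so there is data
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }
}
```

A loader could call looksEmpty on the content of each file and skip the file when it returns true.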
