Error reports
Please describe problems with the data here.
Sep 3, 2008 Illegal URI
The post file http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=71610 has an illegal URI on line 56. The Jena parser's error report is:
WARN [main] (RDFDefaultErrorHandler.java:36) - http://boards.ie/vbulletin/sioc.php?sioc_type=post&sioc_id=71610# (line 56 column 101): {W107} Bad URI: <http://redjohno@eudoramail.com> Code: 59/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.
Presumably the http: should be mailto:, but I don't know whether it stems from an error in the source data or indicates a problem with the RDF generation script.
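Until the cause is known, affected values can at least be spotted. The following is a minimal sketch, assuming the fix is simply to re-prefix such values. It uses java.net.URI to flag http/https URIs that consist only of user info and a host (no path, query or fragment), which is what an e-mail address wrongly prefixed with http:// looks like. The class and method names (MailtoHeuristic, suggestMailto) are hypothetical, and the heuristic could misfire on a genuine web URI that carries user info.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper: flags http(s) URIs that look like mis-prefixed e-mail addresses.
public class MailtoHeuristic {

    // Returns a mailto: suggestion if the value looks like "http://user@host",
    // i.e. it has user info and a host but no path, query or fragment.
    // This is an assumed heuristic, not a rule taken from the SIOC exporter.
    public static Optional<String> suggestMailto(String value) {
        final URI uri;
        try {
            uri = new URI(value);
        } catch (URISyntaxException e) {
            return Optional.empty(); // not parseable at all; needs manual inspection
        }
        String scheme = uri.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            return Optional.empty();
        }
        String userInfo = uri.getUserInfo();
        String host = uri.getHost();
        String path = uri.getPath();
        boolean noPath = path == null || path.isEmpty() || path.equals("/");
        // A password part ("user:pass") suggests real credentials, not an e-mail address.
        if (userInfo == null || userInfo.contains(":") || host == null
                || !noPath || uri.getQuery() != null || uri.getFragment() != null) {
            return Optional.empty();
        }
        return Optional.of("mailto:" + userInfo + "@" + host);
    }

    public static void main(String[] args) {
        System.out.println(suggestMailto("http://redjohno@eudoramail.com"));
        System.out.println(suggestMailto("http://boards.ie/vbulletin/"));
    }
}
```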
Sep 5, 2008 Sesame issue with empty URI properties
In nearly every file there is a line like this:
<foaf:Document rdf:about="">
The rdf:about attribute with an empty value (no URI) makes Sesame's RDF/XML parser fail with an error like this:
Not a valid (absolute) URI: [line 17, column 3]
If the rdf:about attribute is removed, everything works fine.
Sesame uses a SAX parser and creates a new URI from the value the parser delivers. Here that value is "", which Sesame does not accept as a valid URI. I do not know whether this is a bug in Sesame or whether empty URIs are disallowed in general; in any case, Sesame does not accept them.
Solution
The problem was discussed at http://tuukka.iki.fi/tmp/sioc-2008-09-02.html, where a solution for Jena was suggested. Hopefully it works with Sesame, too.
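Yes, there is a way to do this in Sesame: pass the document's URI as the base URI when adding the file.
// the file name is the URL-encoded URI of the post, so decoding it gives the base URI
connection.add(file, URLDecoder.decode(file.getName(), "utf-8"), RDFFormat.RDFXML);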
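As background (from the RDF/XML specification, not from the original discussion): rdf:about="" is a relative URI reference that a parser resolves against the document's base URI, so it only fails when no base URI is available. The call above supplies one by decoding the dump's file name. A minimal, self-contained sketch of that decoding step follows, using the file name from the Sep 24 report. The class and method names (BaseUriFromFileName, baseUriFor, resolveEmptyAbout) are hypothetical, and resolveEmptyAbout handles only the empty-reference case rather than general URI resolution.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helpers illustrating how a dump file name maps to a base URI.
public class BaseUriFromFileName {

    // The dump stores each post under its URL-encoded URI, so decoding the
    // file name yields the document URI to use as the base URI when parsing.
    public static String baseUriFor(String fileName) {
        try {
            return URLDecoder.decode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Handles only the empty reference: "" denotes the base URI itself,
    // with any fragment removed. Non-empty values are assumed to be
    // absolute already and are returned unchanged.
    public static String resolveEmptyAbout(String about, String baseUri) {
        if (!about.isEmpty()) {
            return about;
        }
        int hash = baseUri.indexOf('#');
        return hash >= 0 ? baseUri.substring(0, hash) : baseUri;
    }

    public static void main(String[] args) {
        String fileName = "http%3A%2F%2Fboards.ie%2Fvbulletin%2Fsioc.php"
                + "%3Fsioc_type%3Dpost%26sioc_id%3D1967011";
        String base = baseUriFor(fileName);
        System.out.println(base);
        System.out.println(resolveEmptyAbout("", base));
    }
}
```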
Thanks for the suggestion.
Sep 23, 2008 Problems downloading the zipped files
Some people have reported timeouts when downloading the very large files. We've made a second store of the data sets available, which you can access using the same username and password as for the original download site.
You can access the second store at download.sioc-project.org.
Sep 24, 2008 Post files without actual data
randomity: Some of the files in boards.ie-post.tar.gz have no data. They have a <?xml ...>, an <rdf:RDF ...> and a full <foaf:Document> declaration, but no actual data, e.g. post/000/000/http%3A%2F%2Fboards.ie%2Fvbulletin%2Fsioc.php%3Fsioc_type%3Dpost%26sioc_id%3D1967011
thosch: randomity: these should all be posts that were deleted or are access-restricted.
thosch: If you check the post with the respective ID on the boards.ie server, you can see it: http://boards.ie/vbulletin/showpost.php?p=1967011
thosch: But of course these posts shouldn't be in the dump; it just wasn't easy to remove all of them.
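Such files can be skipped when loading the dump. The following is a minimal sketch, assuming that a file with real content contains at least one sioc:Post element, i.e. that posts are written as typed nodes in the SIOC namespace http://rdfs.org/sioc/ns#. Posts typed only via rdf:type would be missed, so treat this as a heuristic. The class and method names (EmptyPostFilter, hasPostData) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical filter for post files that contain only the foaf:Document header.
public class EmptyPostFilter {

    private static final String SIOC_NS = "http://rdfs.org/sioc/ns#";

    // Returns true if the RDF/XML text contains at least one sioc:Post element.
    public static boolean hasPostData(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(rdfXml)));
            return doc.getElementsByTagNameNS(SIOC_NS, "Post").getLength() > 0;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not parseable as XML", e);
        }
    }

    public static void main(String[] args) {
        String headerOnly = "<?xml version=\"1.0\"?>\n"
                + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n"
                + "         xmlns:foaf=\"http://xmlns.com/foaf/0.1/\">\n"
                + "  <foaf:Document rdf:about=\"\"/>\n"
                + "</rdf:RDF>";
        System.out.println(hasPostData(headerOnly));
    }
}
```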