SIOC/CommonMistakes

From W3C Wiki

This page lists common mistakes made when exporting SIOC data.

While this page is written specifically for SIOC data exporters on community sites, most of the problems and solutions also apply to other applications that produce XML or RDF/XML, especially those that embed HTML content in RDF/XML.

Invalid XML entities

Symptoms:

 XML parser error - Entity 'nbsp' not defined

Reason:

 XML predefines only five entities (&lt; &gt; &amp; &quot; &apos;). All other named entities, such as the HTML entity &nbsp;, are invalid in XML unless they are explicitly declared.

Solutions:

 a) replace all invalid entities with their numeric character references, e.g. &#160; for &nbsp; (preferred)
 b) explicitly declare the missing entities in the document's DTD
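Solution a) can be sketched in Python as follows. This is only a minimal illustration, not part of any SIOC exporter: the function name to_numeric_entities is hypothetical, and the lookup table is the HTML 4 entity list from the standard library's html.entities module, so names outside that list are left untouched for manual review.

```python
import re
from html.entities import name2codepoint

# The five entities that XML predefines; these must be left as they are.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches named entity references such as &nbsp; or &eacute;
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text):
    """Replace HTML named entities with numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: leave it unchanged so it can be reviewed by hand.
            return match.group(0)
        return "&#%d;" % codepoint
    return _ENTITY_RE.sub(repl, text)
```

For example, to_numeric_entities("a&nbsp;b &amp; c") leaves &amp; alone and turns &nbsp; into &#160;.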

Invalid labels

When adding labels to a resource, take care to replace " with its XML equivalent &quot;, especially when extracting links from posts. For example,

<sioc:reference rdfs:label=""State of the blogosphere"" rdfs:resource="http://www.sifry.com/alerts/archives/000419.html"/>

should be

<sioc:reference rdfs:label="&quot;State of the blogosphere&quot;" rdfs:resource="http://www.sifry.com/alerts/archives/000419.html"/>
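One possible way to produce the corrected attribute value is the escape function from Python's standard xml.sax.saxutils module, which escapes &, < and > by default and accepts extra replacements as a dictionary. The helper name label_attribute below is hypothetical and only serves as an illustration.

```python
from xml.sax.saxutils import escape


def label_attribute(label):
    """Return an rdfs:label attribute with the value escaped for XML."""
    # escape() handles &, < and > itself; the double quote is added here
    # because the value is wrapped in double quotes.
    return 'rdfs:label="%s"' % escape(label, {'"': "&quot;"})
```

Calling label_attribute('"State of the blogosphere"') yields the rdfs:label attribute shown in the corrected example above.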

CDATA sections

When using <![CDATA[ ]]> to enclose character data, you have to check whether the actual content includes "]]>", because that sequence ends the CDATA section prematurely.

Wikipedia's CDATA article addresses this problem: to encode "]]>" inside a CDATA section, replace all occurrences with the following:

 ]]]]><![CDATA[>

(This effectively ends the CDATA section after "]]" and starts a new one before ">".)
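The replacement described above can be sketched in Python as a small helper. The function name cdata_section is hypothetical; the code simply applies the substitution and then wraps the result in a CDATA section.

```python
def cdata_section(text):
    """Wrap text in a CDATA section, splitting it wherever "]]>" occurs."""
    # Each "]]>" in the content closes the current section after "]]"
    # and opens a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"
```

An XML parser reads the adjacent sections as one run of character data, so the original text, including "]]>", is recovered unchanged.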