Semantic Data Extractor



Every so often, someone writes to me or to the public-qa-dev mailing list to report a bug in the semantic data extractor, or simply to say thanks for it.

I'm always pleasantly surprised to hear that what started as a 10-minute demonstrator of the semantics attached to HTML is actually used as a tool by a number of developers.

With a name such as "semantic data extractor", it was a bit of a shame that the tool didn't highlight the use of GRDDL or RDFa on pages that use either of these technologies; I have just added detection of both to the extractor.
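
To give a rough idea of what spotting these two technologies can involve, here is a minimal Python sketch. It is not the extractor's actual code, only an illustration under a few assumptions: GRDDL is recognized by the `http://www.w3.org/2003/g/data-view` profile on `<head>` or by a `rel="transformation"` link, and RDFa is guessed from the presence of a handful of RDFa attributes. The names `SemanticHintParser`, `RDFA_ATTRS` and `detect` are made up for this example.

```python
from html.parser import HTMLParser

# Profile URI that GRDDL-enabled XHTML pages declare on <head>.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Heuristic (assumption): attributes that rarely appear outside RDFa.
RDFA_ATTRS = {"about", "property", "typeof", "resource", "datatype"}


class SemanticHintParser(HTMLParser):
    """Records whether a page shows signs of GRDDL or RDFa."""

    def __init__(self):
        super().__init__()
        self.grddl = False
        self.rdfa = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "head" and GRDDL_PROFILE in attr.get("profile", "").split():
            self.grddl = True
        if tag in ("link", "a") and "transformation" in attr.get("rel", "").lower().split():
            self.grddl = True
        if any(name in RDFA_ATTRS for name in attr):
            self.rdfa = True


def detect(html):
    """Return a dict of booleans: {'grddl': ..., 'rdfa': ...}."""
    parser = SemanticHintParser()
    parser.feed(html)
    parser.close()
    return {"grddl": parser.grddl, "rdfa": parser.rdfa}
```

Calling `detect(page_source)` on a page's markup returns the two flags. Since this only looks at attribute names, it can report false positives on pages that use the same attribute names for other purposes.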

As a bonus, I have also added detection of non-semantic markup: at this time, it detects purely-wrapping <div>, empty <span>, and tables with a single row or a single column (which are likely to be layout tables); if you have suggestions for detecting other non-semantic markup, let me know!
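
The three checks listed above could be sketched roughly as follows in Python. This is an illustration, not the extractor's implementation, and it rests on stated assumptions: the input is well-formed XHTML that an XML parser accepts, a "purely-wrapping" `<div>` is taken to mean a `<div>` with exactly one child element and no text of its own, and `colspan` is ignored when counting columns. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, e.g. '{http://www.w3.org/1999/xhtml}div' -> 'div'."""
    return tag.rsplit("}", 1)[-1]


def has_text(el):
    """True if the element directly contains non-whitespace text."""
    if (el.text or "").strip():
        return True
    return any((child.tail or "").strip() for child in el)


def table_rows(table):
    """Rows of this table only: direct <tr> children or those in thead/tbody/tfoot."""
    rows = []
    for child in table:
        name = local(child.tag)
        if name == "tr":
            rows.append(child)
        elif name in ("thead", "tbody", "tfoot"):
            rows.extend(r for r in child if local(r.tag) == "tr")
    return rows


def find_non_semantic(xhtml):
    """Return a list of findings, in document order."""
    findings = []
    for el in ET.fromstring(xhtml).iter():
        name = local(el.tag)
        children = list(el)
        if name == "div" and len(children) == 1 and not has_text(el):
            findings.append("purely-wrapping div")
        elif name == "span" and not children and not has_text(el):
            findings.append("empty span")
        elif name == "table":
            rows = table_rows(el)
            # Number of td/th cells per row (colspan is not taken into account).
            widths = [sum(1 for c in r if local(c.tag) in ("td", "th")) for r in rows]
            if len(rows) == 1:
                findings.append("single-row table")
            elif rows and all(w == 1 for w in widths):
                findings.append("single-column table")
    return findings
```

A real-world version would need a more forgiving parser, since much of the HTML on the web is not well-formed XML.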
