Re: [html-techs] Table Type

I'm still looking at the problem of determining table type and found
research that may help.

There's a recent paper from Yalin Wang and Jianying Hu that has an
interesting approach to determining table type[1].

They classify tables as genuine or non-genuine. In our terminology these
would be data tables and layout tables.

Along with their research they have compiled a large database of over 11,000
tables that have been classified.

I've also found an earlier paper from Hsin-Hsi Chen, Shih-Chung Tsai and
Jin-He Tsai that deals with table classification[2] that may be helpful.

And another paper that contains an algorithm for determining table type[3].

All the algorithms described in these papers work, and the Wang/Hu method
works very well, but seem much too complex for the average person to
implement.

Our simple TH rule (data tables must have them, layout tables can't have
them) seems much more practical.

I'm still reviewing the Wang/Hu table database for where this rule breaks
and have invited Yalin Wang to comment on our suggestions.

[1] http://www2002.org/CDROM/refereed/199/
[2]
http://citeseer.nj.nec.com/cache/papers/cs/21313/http:zSzzSznlp3.korea.ac.krzSzproceedingzSzcoling2000zSzCOLINGzSzpszSz025.pdf/chen00mining.pdf
[3] http://www.csc.liv.ac.uk/~wda2001/Papers/13_yoshida_wda2001.pdf

Chris

Received on Thursday, 19 February 2004 10:08:25 UTC