Most frequently encountered invalid elements/attributes

Hi all,

I've just had MAMA do a validation pass on about 4.8 million URLs as
part of an updated study of URLs that it previously analyzed. Olivier
and the rest of the kind W3C crew helped with this process, and I just
wanted to give a big thanks for that.

There is a lot more analysis and filtering of the results to do
before I can speak to what was discovered, but there was one specific
request that I can already say something about. Keep in mind that
these results are pretty raw, and I can try to do some further
correlation if needed.

The main validation errors that MAMA encountered in its last crawl
were error 76 (element not defined) and error 108 (no attribute X).
Because of the way I set up MAMA's storage last time, it did not save
the arguments for individual error messages. Since 76 and 108 were the
most frequent, it was interesting (especially for Olivier and company)
to find out this time which elements and attributes were generating
the most errors.
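For anyone wanting to reproduce this kind of tally, the ranking boils down to keeping each error's argument alongside its message ID and counting. The sketch below assumes a hypothetical record format of (message_id, argument) pairs; it is not MAMA's actual storage or code. The helper names top_arguments and print_table are illustrative only, and lowercasing the names is an added assumption (HTML element and attribute names are case-insensitive).

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Message IDs as named in the post; the (message_id, argument) record
# format is a hypothetical stand-in for whatever MAMA actually stores.
ELEMENT_NOT_DEFINED = 76  # "element not defined"
NO_ATTRIBUTE = 108        # "no attribute X"


def top_arguments(
    errors: Iterable[Tuple[int, str]], message_id: int, n: int = 50
) -> List[Tuple[int, str, int]]:
    """Return (rank, argument, count) for the n most frequent arguments
    of one message ID. Names are lowercased (an assumption) because HTML
    element and attribute names are case-insensitive."""
    counts = Counter(arg.lower() for mid, arg in errors if mid == message_id)
    return [
        (rank, arg, count)
        for rank, (arg, count) in enumerate(counts.most_common(n), start=1)
    ]


def print_table(rows: List[Tuple[int, str, int]], label: str) -> None:
    """Print rows in the same column layout as the tables in this post."""
    print(f"{'Rank':<10}{label:<20}Quantity")
    for rank, arg, count in rows:
        print(f"{rank:<10}{arg:<20}{count}")


if __name__ == "__main__":
    # Hypothetical sample records, not MAMA data.
    sample = [
        (76, "embed"), (76, "frame"), (76, "EMBED"),
        (108, "height"), (108, "width"), (108, "height"),
    ]
    print_table(top_arguments(sample, ELEMENT_NOT_DEFINED), "Element")
    print_table(top_arguments(sample, NO_ATTRIBUTE), "Attribute")
```

Counter.most_common does the sorting, so the only real design decision is what to key on; here it is the argument alone, per message ID.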

Here's a list of the top 50 "element not defined" error arguments:

Rank      Element             Quantity
1         embed               596216
2         frame               261478
3         frameset            261414
4         marquee             119502
5         script              101868
6         font                98239
7         meta                97210
8         nobr                85973
9         a                   82982
10        img                 69357
11        center              67397
12        iframe              59825
13        br                  59763
14        td                  58999
15        tr                  57505
16        table               56409
17        o:p                 56238
18        div                 43928
19        p                   40632
20        csscriptdict        28110
21        span                28060
22        csactiondict        27004
23        spacer              26298
24        noscript            26142
25        noindex             24848
26        b                   23482
27        bgsound             22625
28        layer               22304
29        u                   22061
30        blink               20352
31        link                20092
32        input               20049
33        title               19783
34        csobj               19578
35        ilayer              18940
36        tbody               17637
37        scr                 17237
38        variable            16058
39        strong              15946
40        form                14862
41        body                14527
42        head                13999
43        noembed             13139
44        style               12139
45        st1:place           12094
46        param               12008
47        csactions           11831
48        csaction            11787
49        object              11774
50        html                10918

And the list of the top 50 "No attribute X" error arguments:

Rank      Attribute           Quantity
1         height              1624934
2         src                 1018458
3         width               926904
4         topmargin           884663
5         leftmargin          831174
6         marginheight        792137
7         background          791243
8         marginwidth         786816
9         name                755187
10        border              745194
11        type                685526
12        pluginspage         498477
13        quality             494275
14        bordercolor         436465
15        align               384137
16        frameborder         321235
17        bgcolor             318466
18        target              289435
19        scrolling           253640
20        framespacing        239515
21        language            224452
22        rows                208679
23        color               197183
24        id                  193811
25        cols                193689
26        valign              159971
27        rightmargin         153635
28        allowscriptaccess   151814
29        style               136092
30        wmode               132042
31        alt                 127582
32        href                125676
33        bottommargin        122285
34        content             116613
35        onmouseover         111657
36        onmouseout          103736
37        onclick             100550
38        hspace              99552
39        size                93957
40        class               93321
41        loop                92015
42        vspace              89939
43        onload              79416
44        allowfullscreen     74648
45        cellpadding         73775
46        bordercolorlight    72975
47        cellspacing         71222
48        scrollamount        69989
49        bordercolordark     69810
50        face                68687

(It might be interesting for the error message to also list the
element on which the attribute error occurs; that would help explain
why height occurs almost twice as often as width here.)
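If the validator did report the element, the tally could be keyed by (element, attribute) instead of by attribute alone, and each attribute's total split across the elements it appeared on. This is a minimal sketch assuming hypothetical (element, attribute) records, which the current error messages do not provide; the function names are illustrative. From the table's own totals, height/width = 1,624,934 / 926,904, roughly 1.75.

```python
from collections import Counter
from typing import Iterable, Tuple


def attribute_errors_by_element(errors: Iterable[Tuple[str, str]]) -> Counter:
    """Count "no attribute X" errors keyed by (element, attribute).

    `errors` holds hypothetical (element, attribute) pairs; the validator
    message described above reports only the attribute.
    """
    return Counter((el.lower(), attr.lower()) for el, attr in errors)


def breakdown(counts: Counter, attribute: str) -> Counter:
    """Split one attribute's total across the elements it appeared on."""
    return Counter(
        {el: n for (el, attr), n in counts.items() if attr == attribute.lower()}
    )


if __name__ == "__main__":
    # Hypothetical records for illustration only, not MAMA data.
    sample = [
        ("td", "height"), ("td", "height"), ("table", "height"),
        ("img", "width"), ("table", "width"),
    ]
    counts = attribute_errors_by_element(sample)
    for attribute in ("height", "width"):
        print(attribute, breakdown(counts, attribute).most_common())
```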

Hope this is interesting and/or helpful,

Received on Monday, 16 March 2009 15:33:51 UTC