Re: Confidence Claims - more discussion

Shadi Abou-Zahra wrote:
> 
> Hi,
> 
> On today's ERT WG call we had some more discussion about the
> earl:confidence property and how we should be handling it in EARL. It
> seems we could agree on the following:
> 
> 1. Even though the confidence claims are more related to test case
> descriptions than to the results, expressing them as part of the result
> is still an important aspect. Several tools are already using this
> property and would like to continue doing so.

Indeed.

Whether confidence is a property of the test or the result is up to the
tool.  For example, take the classic case of ALT attributes.

Tool 1 has one test for alt attributes.  It reports a violation with
high confidence if there is no ALT, or if the ALT is from a list of
bogosity-detector keywords like "bullet" or "spacer".  It reports
the violation with lower confidence if the ALT contains one of those
words in a phrase (telling "small red bullet" apart from "inserting
the bullet" would need parsing that is a bit too ambitious), or if
the ALT ends with a suspicious string like ".gif" or ".jpe?g".  In
this tool, the confidence is
a property of the result.
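
A minimal sketch of how a Tool 1 style check might look, for illustration
only.  The function name, the "high"/"low" labels, and returning no
confidence for a pass are assumptions; only the two keywords and the two
suffix patterns come from the description above.

```python
import re

# Only "bullet" and "spacer" are named above; a real list would be longer.
BOGUS_WORDS = {"bullet", "spacer"}
# Suspicious endings from the description: ".gif" or ".jpe?g".
SUSPICIOUS_SUFFIX = re.compile(r"\.(gif|jpe?g)$", re.IGNORECASE)


def check_alt(alt):
    """One test, many outcomes: return (outcome, confidence) for an image.

    `alt` is the ALT text, or None when the attribute is absent.
    """
    if alt is None:
        return ("fail", "high")          # no ALT at all
    text = alt.strip().lower()
    if text in BOGUS_WORDS:
        return ("fail", "high")          # ALT is exactly a bogus keyword
    words = set(re.findall(r"[a-z]+", text))
    if words & BOGUS_WORDS or SUSPICIOUS_SUFFIX.search(text):
        return ("fail", "low")           # keyword in a phrase, or filename-like
    return ("pass", None)                # assumption: no confidence on a pass
```

The confidence here is decided per call, so the same test can report the
same violation at different confidence levels.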

Tool 2 has a series of different tests for alts.  Overall it tests
the same things as Tool 1, but each test has only a single yes/no
result.  Here confidence is a property of the test.  The test that
flags "bullet" has a higher confidence than the test that flags
"small red bullet".  But although confidence is defined as a
property of the test, it can also be expressed as a property
of the result.
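
The Tool 2 arrangement could be sketched along these lines.  The test
names, the table layout, and the confidence assigned to each test are
hypothetical; the point is only that confidence is fixed per test and then
copied into each result.

```python
import re

BOGUS_WORDS = {"bullet", "spacer"}  # only these two are named above


def is_missing(alt):
    return alt is None


def is_bogus_word(alt):
    return alt is not None and alt.strip().lower() in BOGUS_WORDS


def contains_bogus_word(alt):
    # Flags a keyword inside a longer phrase, not the bare keyword itself.
    if alt is None or is_bogus_word(alt):
        return False
    return bool(set(re.findall(r"[a-z]+", alt.lower())) & BOGUS_WORDS)


def looks_like_filename(alt):
    if alt is None:
        return False
    return re.search(r"\.(gif|jpe?g)$", alt.strip(), re.IGNORECASE) is not None


# Each entry is (test name, confidence of the test, yes/no predicate).
ALT_TESTS = [
    ("alt-missing", "high", is_missing),
    ("alt-is-bogus-word", "high", is_bogus_word),
    ("alt-contains-bogus-word", "low", contains_bogus_word),
    ("alt-looks-like-filename", "low", looks_like_filename),
]


def run_alt_tests(alt):
    """Run every test; each result carries its test's fixed confidence."""
    return [
        {"test": name, "flagged": predicate(alt), "confidence": confidence}
        for name, confidence, predicate in ALT_TESTS
    ]
```

Because every result simply repeats its test's confidence, a report can
still state confidence on the result without losing anything.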

My own Valet tool works with small and simple tests, as described
for Tool 2.  I'm not sure where Chris's tool fits.  But we should
be able to accommodate either case in EARL.  Making confidence a
property of the result works for both cases; making it a property
of the test case could be problematic for Tool 1.

> 
> 2. earl:confidence is not simply a relay of the respective property in
> the test case description (testcase:confidence as a pseudo URI), but it
> is "processed" by the evaluation tool before it is inserted into a
> report. An example is to override the value in the test case description
> when a human evaluator executes the test.
> 
> 3. earl:confidence should be based on a numeric value (such as
> percentage or interval). The values "high", "medium", and "low" should
> be mapped to appropriate numeric values but should remain available for
> describing ordinal values.

Hmmm.  I seem to recollect suggesting that after you had expressed a
preference for numeric values, but before Chris had described his use
of (and preference for) heuristic high/medium/low values.  I'll accept
either form, but I'm not sure we agreed on making numeric values the
primary/canonical form.

> 
> 4. There will always be differences between tool results, also in
> earl:confidence. However, more clarity on how to assign confidence
> values will reduce the gap that is currently causing reduced
> interoperability of reports between tools.
> 
> 5. It is unclear if the documentation on how to assign confidence is
> within the scope of EARL (as part of the EARL primer for example) or
> rather something relevant and important to have.

Hmmm.  Maybe as a starting point, Chris and I could write notes on how
we currently use confidence values?  I'm not sure how useful that would
be, but if you think it would help, I'll certainly give it some
thought.

-- 
Nick Kew

Received on Tuesday, 31 May 2005 21:18:06 UTC