ERT WG -- 31 May 2005

EARL Requirements

saz: any reaction on the requirements documents?
... it is a "will / should / may" type of document
... is anything missing?

niq: not sure about F02 and F03
... validity of results (?)

saz: you are suggesting "the persistence of validity of results"

jim: good idea

saz: i agree

sorry: (

<JibberJim> mute JibberJim

<JibberJim> mute Jim_Ley

<JibberJim> aarrgh, stupid phone

saz: next stage is to publish as a working draft
... get some feedback from outside the group
... success of EARL will depend on whether these requirements are met
... this is an initial working draft - anybody willing to take up future editing of this?
... collect feedback and incorporate it?

jim: I will take that

saz: we can work together
... we can publish very soon, in the next week or two

EARL Confidence

saz: we have been discussing dropping confidence values
... use some form of percentage instead
... maybe an issue for the test case description rather than the test result

is this chris?

niq: may not be appropriate for some applications; allow heuristic values, and also keep high, medium, low

saz: do people agree to adopt numerical values, and work out exactly what values we take up?

<niq> I agree with what JibberJim just said, too

chris: i prefer to keep low-medium-high, maybe people can find it useful, it's useful for me as well

saz: i hesitate to put it as an optional property. only useful for fully automated tests
... 75% of the time it works, which is not the case
... describe in the spec how we can calculate high-medium-low, otherwise we encourage less interoperability, tools not able to exchange this info
... numerical values can be an extension
... chris how did u use it in your tool?

chris: if you have alt text if appropriate, then u say with high-confidence

<niq> Zakim: q+ to say Valet assigns confidence values to results, to determine how likely it is that a guideline has been violated

chris: if the user can make a decision, high medium and low
... 90% certainty, it will be difficult to exchange the data
... we have to define how high medium and low relate to numerical values
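One way to picture the relation chris asks for is a small lookup from a numeric confidence to a coarse level. This is only a sketch: the thresholds (0.9 and 0.5) and the name `to_level` are assumptions made up for illustration, and the group did not agree on any numbers in this call.

```python
# Hypothetical thresholds, highest first; not agreed by the group.
LEVELS = [(0.9, "high"), (0.5, "medium"), (0.0, "low")]


def to_level(confidence: float) -> str:
    """Map a numeric confidence in [0, 1] to high, medium, or low."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Return the first level whose threshold the value reaches.
    for threshold, level in LEVELS:
        if confidence >= threshold:
            return level
    return "low"  # not reached: the last threshold is 0.0


if __name__ == "__main__":
    for value in (0.9, 0.75, 0.1):
        print(value, to_level(value))
```

If two tools published the same table, a "high" from one could be read the same way as a "high" from the other, which is the exchange problem raised above.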

<Zakim> JibberJim, you wanted to say I think this is more of a test case, how reliably you can detect something is a function of the test case rather than of the result

jim: not sure if it is good idea having a machine giving a value for a good or bad alt text
... but u can use it like I am accurate 80% of the time

<Zakim> niq, you wanted to say Valet assigns confidence values to results, to determine how likely it is that a guideline has been violated and to say I also use "certain"

niq: two different rules, no alt at all is a violation of guideline for sure

saz: i think this is a good use case
... however what about, if there is alt text but it does not seem right

niq: that would give a lower confidence result
... different test

<niq> s/low/lower/ :-)

thanx

saz: this is an important use case
... confidence "how good is the test" but with results you should be careful
... would somebody be willing to abstract it in a more generic way?

chris: we have been looking at this, it is a bit loose, are u thinking something even more abstract?

saz: no, the examples apply to WCAG 1.0, but we have to have something more generic
... if a human says pass, this confidence is high
... take things like that into account

chris: exam-result, collect response, is wrong with high level of certainty
... the most generic way is to say high-medium-low

saz: two different developers, they both produce confidence values in a similar way, so as to compare results with each other
... high low is not so comparable
... define how you use the confidence level

chris: two cases of the same level will be interchangeable

who is speaking?

jim: interoperability, we want to know if two tests are equivalent

saz: we know success or fail, but high-medium-low is more granular
... e.g. one tool is fully auto and the other in semi, the second has high confidence
... the result will be the same, the confidence and the test mode will be different
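saz's example can be sketched as two records for the same test that agree on the outcome but differ in test mode and confidence. The field names, the test identifier, and the "medium" confidence given to the automatic tool are illustrative assumptions, not taken from any EARL schema or from the call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """Hypothetical record of one test outcome (not an EARL schema)."""
    test: str
    result: str      # "pass" or "fail"
    mode: str        # e.g. "automatic", "semiautomatic", "manual"
    confidence: str  # "high", "medium", or "low"


def same_outcome(a: Assertion, b: Assertion) -> bool:
    """Two assertions agree if they report the same result for the same test."""
    return a.test == b.test and a.result == b.result


# Assumed values for illustration only.
auto = Assertion("img-alt-present", "fail", "automatic", "medium")
semi = Assertion("img-alt-present", "fail", "semiautomatic", "high")

if __name__ == "__main__":
    print(same_outcome(auto, semi))            # compares test and result only
    print(auto.mode, auto.confidence)
    print(semi.mode, semi.confidence)
```

Comparing only the test and the result keeps the two reports interoperable, while mode and confidence stay available as extra information about how each result was reached.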

<Zakim> niq, you wanted to say we must expect different tools to differ in some cases. Shouldn't be a problem

saz: we have to give more detail on how to use this property

chris: it will be up to the tools

saz: what we need is what are the factors that influence confidence
... e.g. manually, automatically, heuristically, or other

very difficult to follow, but line

niq: we should not correlate test mode directly to confidence level

saz: does anyone want to check this? e.g. if we take the WCAG test suite

chris: each of the tests has an assigned confidence level?

<ChrisR> yes, each of the tests have an assigned confidence level

saz: the confidence value you take from the test?
... if you have confidence values, do you want them in the report?
... thinking of a checklist - if human, if semi automatic test etc.
... would that be helpful?

chris: yes that's what we have

<niq> that was me, sorry

niq: that's what we have

Sorry about that

saz: is confidence level related to the test description, or do we want a way of "calculating"
... are we interested in developing something like that?

chris: not what's in the test description, it depends on the result

niq: it depends on the tool, unless there is a TDL, where it is a property of the test case

ci: it depends on how the test was done
... not of the test result

ca: not sure cannot tell..

jim: i am not sure, previously it was part of the test, rather than result. I am happy if we can find out later

<JibberJim> okay, cut me off mid call why don't you phone...

<JibberJim> "the conference is restricted at this time"

<JibberJim> 'cos it's after 6?

saz: on this call it seems the majority wants a confidence property, but optional

<niq> JibberJim: zakim told you that?

saz: because different developers use it differently

<JibberJim> yes

<niq> ok
