ERT WG f2f -- 4 Mar 2005

Test Suites (with some people from WCAG TTF)

Visitors: Ben, Wendy, Ken, John, Michael, Al

<AWK> SAZ: meeting focus for the morning is looking at how the WCAG 2.0 test suite fits with ERT work

<AWK> SAZ: want to reveiw expectations for Test Suites between groups

<AWK> MC: WCAG needs test suites to demonstrate code ao match techniques

<AWK> MC: all techniques need to have positive and negative tests

<AWK> MC: not entirely clear what will become of the test suites, once fully cooked.

<AWK> MC: Potential impact for user agents and authoring tools using test suites

<AWK> MC: Chris Ridpath working on test suite - tries to create 'atomic unit

<AWK> ...tests

<AWK> MC: in test suites WCAG doesn't want to dictate what Eval tools do

<AWK> KHS: is EARL is to be extensible, we need to think about the wider range of uses for it (e.g. security, privacy, etc.)

<AWK> LS: Not clear on what the usage of earl is - to comply to a technique/accessible fora checkpoint?

<AWK> LS: concerned that the techniques are very technology specific

<AWK> LS: EARL's usage from the user perspective complicated by linking to potentially multiple technology specific techniques

<AWK> LS: concerned about difficulty in getting a 1 to 1 mapping between guideline/user need and technique

<AWK> CMN: EARL just reports pass/fail result. Doesn't care whether the test is useful/comprehensive - that's up to the test suite makers

<AWK> CMN: the aspect of EARL where a claim references "according to..." is key.

<AWK> CMN: how we describe the relationship between sets of tests is important and being able to cross-relate with tests done by another tools is also

<AWK> KK: EARL seems to be based on a binary result - what about multiple results that may be meaningful?

<chaalsBOS> KK for example a compiler that notes multiple errors, not just "didn't compile"

<Zakim> Michael, you wanted to discuss my opinions as an ERT developer, contrasted from my role in WCAG

<AWK> MC: As an eval tool developer, wants to be able to prove that tool is better than competitor

<AWK> MC: in role of chair wants standards harminization

<AWK> MC: roles in conflict

<AWK> MC: managing the involvement of Eval tools is difficult and falls into ERT scope. (I think this is what he said, in part)

<AWK> AG: To KK's point - multiple results would be represented in multiple entries in EARL

<AWK> AG: CityUniversity's work testing specific tasks but allowing for multiple paths was useful

<AWK> AG: Can use logic in RDF to furnish logical relationships between tests

<Zakim> chaalsBOS, you wanted to describe how the multiple result works (for example HTML validation)

<AWK> CMN: agrees with AG

<AWK> CMN: AERT is a useful set of tests. Complex issues in establishing suite that defines 'accessibility'

<AWK> CMN: what ERT would like is an identifier (URI) for each test

<AWK> CMN: discussed methdods for displaying confidence and more, these are very difficult to standardize

<shadi> ack to request version number for each test

<AWK> LS: drew picture of RDF diagram

<AWK> LS: "this" conforms to "guideline" (not test)

<AWK> LS: then has two options to increase descriptiveness

<chaalsBOS> [Ah yes, this turns a lightbulb on again].

<AWK> LS: "guideline" by "the following test"

<AWK> LS: or instead of "conforms" the term "details" could be used

<AWK> JS: Interested in there being ways to address things that are machine testable as well as not easily (e.g. reading level)

<Zakim> Michael, you wanted to ask about identifiers for new tests

<AWK> MC: test suite won't have fully coverage of all tests done by eval tools

<Zakim> shadi, you wanted to request version number for each test

<AWK> Would an eval tool use a URI of a competitor? Question about this policy issue.

<AWK> SAZ: wants version number for tests

<AWK> AG: should use identifier instead of version number

<AWK> AG a la dublin core

<AWK> WC: Can user testing be represented in EARL if earl deals with binary tests?

<Zakim> chaalsBOS, you wanted to suggest that in earl:message, or something analagous, we want to be able to add justification information. This is particularly important for results

<AWK> CMN: EARL doesn't provide much value on user tests. Must provide pass/fail result

<AWK> CMN: to LS's point, there is value in being able to descrip the how and why of the pass/fail result in EARL

<AWK> CMN: EARL tests have modes - (test done manually, heuristically, etc). Want to be able to point to these as a basis for making conformance claims (is this what you mean?)

<AWK> CMN: a different version of a test is a different test.

<AWK> CMN: there should be a way to relate them, though

<AWK> CMN: on Michael's question - RDF doesn't care what the URI looks like.

<AWK> CMN: If a tool creates a test it is great if you can actually put the uri in a browser and get info about the test.

<Zakim> Al, you wanted to ask if John Slatin is saying that WCAG might offer mensuration concepts to ER in the "clear and simple" area

<AWK> AL: Mensuration = measurement techniques

<AWK> AG: how do you create an observation instrument?

<chaalsBOS> Hmmm. EARL doesn't provide (currently) a lot of interoperable stuff you can use to refine results from sujective testing - you can describe your justification in an earl:message.

<chaalsBOS> For this, which is LS point as well as Wendy's, it would be useful to extend the model of EARL to allow for some explicit justification information. This has been talked about before, and particularly in the context of heuristic mode assertions, where you claim that based on some other results you infer or deduce that a particular test has a given result (e.g. failed to meet some requirement of HTML, so failed to confom to HTML validity, or passed

<chaalsBOS> all WCAG priority 1 checkpoints, therefore passed conforming to WCAG level-A)

<AWK> JS: with readability, the measures deal with whether the conditions for readability are present

<chaalsBOS> So we need to add this issue into our new issue list in the group and deal with it

<AWK> JS: not if some text is actually readable.

<AWK> JS: There are other methods for getting at comprehension and we need to get a techniques for these

<AWK> LS: flexibility in techniques is crucial

<AWK> LS: allowed in some domains and not other

<Zakim> shadi, you wanted to prepare group for coffee break

<shadi> [coffee break]

ag: two kinds of things from WCAG like groups: unit test exposure definitions and knowledge base plugged into rollup process

WCAG 1 conformance has boolean rollup from checkpoints to levels

up to test set to make assertions about relations between test set and checkpoints

EARL designed for this

js: for WCAG 2, three levels of conformance, level 1 means you've met all SC at that level etc.

Architecture: Principles, Guidelines, Success Criteria

SC are explictily written as boolean testable assertions

Test suites provide evidence to support testable assertions in SC

<Zakim> shadi, you wanted to explain some differences between WCAG 1.0 & 2.0 models

saz: in WCAG 1.0, lowest defined unit was techniques

in WCAG 2.0, goes further, down to test cases

for ERT group this is good

want not just tests but a conformance model, published in machine readable format, so tools can present to people as preferred and allow inferences about level of conformance

wac: not sure what this would look like

not WCAG WG's top priority

might do after Rec

saz: tests in good XML, allows URIs to tests

would like to know how to use test descriptions in machine readable format (not in ERT charter)

can discuss with Chris since he's in ERT WG as well as WCAG WG

wac: WCAG WG has discussed ways to farm out work to other groups

techniques and test suites, WCAG could provide guidance, but expect work to be done elsewhere

process document could bridge WCAG and ERT

saz: ERT hasn't discussed picking up test suite development yet

some groups might be willing to contribute (e.g., SVG, SMIL)

but question whether they coordinate with WCAG or ERT

ERT lacks sufficient experience

js: in SVG mtg yesterday, were careful that all examples be accessible

accessibility chapter could have more concentrated discussion of accessibility issues

we could use that resource if they're open to it

we're not necessarily thinking specifically about test suites, but if we can coordinate with other groups on materials they produce, might make it easier on us to create our tests

<Zakim> chaalsBOS, you wanted to ask for the WCAG1 to WCAG2 mappings

joint WCAG + ERT + {technology X group} coordination

cmn: mapping of tests to guidelines is a key deliverable

provides use case for how tests relate to each other

would like clear information on how to contribute tests to WCAG

<Zakim> wendy, you wanted to say, "test suite feedback from wednesday's lunch discussion"

wac: when we get large numbers of tests, how do we vet them efficiently?

we do have a techniques submission form and could provide process document for test case development

accessibility needs to be addressed in spec as prequel to being addressed in test suites

skills are distributed: spec knowledge, accessibility, testing, I18N, etc.; how do we bring it all together?

W3C working on tools and processes

more groups need to do, slower project goes

CR mandate to do test suites adds a big burden

need to discuss with QA, focused on specgl

if we prototype, should coordinate with QA to make it extensible and useful across W3C

<Zakim> chaalsBOS, you wanted to respond on test suites

cmn: XQuery looking at 400,000 tests

have to look at interactions, not just atomic tests

this is a lot of work, and very important

WCAG not in a vacuum; context of technologies

there's a few thousand pages of content to figure out what to draw tests from

but shouldn't drop this in order to get to Rec faster

experience in real world is "these things don't work because they weren't tested"

ls: QA work is also pretty boring, core developers skip it, therefore often outsourced

XQuery or XSL group allowed different companies to submit test suites, they didn't try to develop entirely on their own

aa: interactions are the major real world feature; atomic checkpoint requirements not the major issue for users

ls: validate not just against spec, but against user experience

saz: can user experience be broken down into tests?

<shadi> wac: there is something we need to scratch at

<wendy> lol

js: doing research on this stuff

<Andrew> ls: talked about the relevant priority of different checkpoints in different places; eg sometimes a P3 is more important to a page or path than a P1

possible, desirable to do analysis

cmn: yes, yes

EARL doesn't describe tests

relation between atomic tests something to take on

description of user tests leave to side, but useful

real world test analysis wants to know about user tests that were performed

subjective analysis possible and desirable, but more difficult to define tests

ls: would be useful for someone to draw the model, and we could see what missing

<chaalsBOS> http://www.w3.org/2001/sw/Europe/talks/031202-earl/all.es.htm has a simple vision of EARL at its most basic

<chaalsBOS> http://www.w3.org/2001/sw/Europe/talks/031202-earl/all.en.htm has a simple vision of EARL at its most basic in english...

ag: what can ERT bring to WCAG

is there ongoing evaluation activity, and anything running EARL could be used for gathering

perform gap analysis a la Rich for Web today

help WCAG to know what gaps are

saz: lot of overlap with WCAG group, much of Techniques work should have been ERT, but ERT wasn't chartered

wac: benefit of WCAG doing itself is kept us honest - though this doesn't mean we can't keep doing it

<wendy> ack

<Zakim> wendy, you wanted to say, "would allow us focus in our work - create techniques w/testable statements. hand to others to interpret and show us where vague/arbitrary, etc. also,

wac: if WCAG could focus on techniques for testable assertions and then hand it off

reviewers would then come back to us when it doesn't make sense

would also help us at CR by having an externally developed test suite to show interpretability of guidelines

this would free us to focus us on guidelines, SC, techniques

<wendy> (and by external - i mean non-wcag, so ert would fit that description)

work plan for EARL?

saz: pulling things together right now into requirements and issue tracking

feel ERT will need to dig through test suites to see if they meet ERT needs

may come back with modifications

anyway need to become more involved in development of this type of resource

but can't project availability for that

cmn: we have a bit of a car, and a bit more of a car in a box. We're now unpacking the box and figuring out how to put car together.

we've looked at some big pieces of work

three years ago, ERT WG looked at this stuff, but in a year couldn't figure it out

right now, could solidify EARL in a year if very focused (but no bets)

to take on test suites would require additional resources, or EARL will slow down too much

wac: strongly recommend a project plan; would have helped WCAG if adopted at start

saz: ERT group still forming right now

useful to have checked in between the groups today

Test Descriptions

<chaalsBOS> CMN add issue from Lisa's stuff this morning. Being able to add the evidence, and or rules to define it.

<chaalsBOS> CMN explains in painful and only slightly confused detail

<chaalsBOS> A simple, but complete EARL example from http://www.w3.org/2001/sw/Europe/talks/200311-earl/all.htm

<chaalsBOS> [[[

<chaalsBOS> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

<chaalsBOS> xmlns:earl="http://www.w3.org/WAI/ER/EARL/nmg-strawman#">

<chaalsBOS> <earl:message>This page is valid XHTML</earl:message>

<chaalsBOS> ]]]

<shadi> A1: test 1 passed

<shadi> A2: test 2 passed

<shadi> A3: test 3 passed

<shadi> A4: test A1, A2, A3 passed

<chaalsBOS> I am suggesting adding something like

<shadi> [discussion on evidence for assertions]

<shadi> conclusions: evidence could replace confidence claims

<ericP> if y'all want to chat about an earl server, give me a shout

<ericP> 'zat mean i should drop by now?

<ericP> where are you?

<chaalsBOS> constitution 321

<ericP> roger. i'm on the queue just now. probably 5 min ETA

<ericP> (q is in SWBP)

<ericP> while waiting for q, here's a bit of status:

<ericP> i've rewritten the annotations server to be more generic.

<ericP> it accepts bookmarks and i think i threw an old earl query in there as well.

<ericP> this query enables the annotation server to "host" an earl assertion. that is, provide an identifier for it in HTTP space.

<ericP> i'd like to know if the SPARQL interface to the annotaitons server will serve everything that earl needs.

<chaalsBOS> Oki, come talk to us aboot it.

<ericP> will do

<shadi> http://lists.w3.org/Archives/Public/public-wai-ert/2005Mar/0003.html

Discussion on Annotea and SPARQL (by EricP)

<chaalsBOS> ericP joins

<chaalsBOS> EGP Have been reworking annotea server to be more general - it can accept a bunch of RDF. If it has a query that identifies an EARL annotation, you give it an anonymous one and it hosts it for you and gives it a URI. It also means you can query it by doing a GET on the URI, and you can update it.

<chaalsBOS> ... does that serve the EARL group? Has anybody looked at SPARQL? Are the queries on EARL documents served by SPARQL.

<chaalsBOS> ... SPARQL is an RDF query language. Lets you look for a graph pattern - a superset of RDF, talking about all the things that RDF does, and goes beyond it because it uses variables.

<chaalsBOS> ... there are value constraints on variables too. aiming for candidate rec in a month.

<chaalsBOS> New working draft http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050217/

<chaalsBOS> CMN: WOuld like to have this. Does it have Access control?

<chaalsBOS> EGP: Thanks for asking that question. Yeah, it does have access control already...

<chaalsBOS> ... the model is that for any operation, (GET, PUT, POST) you can require or not require authentication.

<chaalsBOS> ... not bombproof security - just allows you to have a username associated with what got done.

<chaalsBOS> LS Can you have mulitple identities

<chaalsBOS> EGP On the server as is, if you have no access, it asks for an email address and creates a user name. Then it creates an account, but you need a different email address for each one

<chaalsBOS> ... which is pretty easy

<chaalsBOS> ... The annotest server (a public Annotea server) has the policy that anyone can query or get any page, but you have to have autherntication to write content. If you overwrite something you have to be the same user that created it in the first place. Haven't yet got groups who would overwrite stuff. But no use case yet.

<chaalsBOS> CMN I have a use case for you...

<chaalsBOS> CMN Does SPARQL query over collections?

<chaalsBOS> EGP Ernhh...

<chaalsBOS> ... we punted on that in SPARQL. I think we can make this server handle collections but it won't mean that any other server will e able to do the same thing.

<chaalsBOS> ... ut it is likely that whatever I do will get stole^H^H^H^H^H dumped on the community

<chaalsBOS> ... so could be adopted. It won't be a standard feature of SPARQL ut can be added to the server as an extension.

<ericP> charles: why didn't you add collections/where are the challenges?

<ericP> ... will it get into rev2/ when will this happen?

<chaalsBOS> EGP The problem was that we have a notion of closure. Fault of database geeks (who are wierd)

<chaalsBOS> ... any SPARQL query can produce RDF that you can re-run the query over and get an answer.

<chaalsBOS> ... first query trims a large set of data, and you can chain a second query on it. This is a Good Thing TM as demonstrated by the database world.

<chaalsBOS> ... the challenge: How do we represent the answer to a collection? We could just send a collection down. There was a proposal to have a special accessor predicate. But if I serialise those I lose closure because it isn't that, and I lose the edges on my collection.

<chaalsBOS> ... when you give the answers we don't have completeness. There is a notion in SPRQL that you can shut off a stream at any time, and if you do that you can break a collection. It is esoteric and ahrd to do in a short time.

<chaalsBOS> ... That said, it is doable, and is the first priority of a version 2.

<chaalsBOS> ... (IMHO)

<chaalsBOS> ... I think that we are likely to go on vacation for 6 months after getting SPARQL 1 out, and then come back and spend a year to get SPARQL 2 out.

<chaalsBOS> ... when one of us comes up with a way to deal with collections it will get quick widespread adoption.

<chaalsBOS> CMN Sounds OK...

<chaalsBOS> EGP How important is Collection

<chaalsBOS> CMN we have a listed issue. I think it will be important... but not consensus on that.

<chaalsBOS> EGP How big are the collections?

<chaalsBOS> size could be any size...

<chaalsBOS> EGP the workaround is to write an OPTIONAL query, that traverses the list manually up to a given number of members. If you go beyond that, then it starts to get ugly - you have to extend the number of OPTIONAL lines in the query by the number of memers a list might have.

<ericP> SPARQL

<ericP> g:SPARQL

<ericP> OPTIONALs in SPARQL -> http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050217/#optionals

<ericP> SELECT ?name ?mbox

<ericP> WHERE ( ?x foaf:name ?name )

<ericP> OPTIONAL ( ?x foaf:mbox ?mbox )

<ericP> considering:

<ericP> SELECT ?name ?mbox

<ericP> WHERE { ?x foaf:name ?name .

Refine Requirements and Issues Documents

Document: http://www.w3.org/WAI/ER/2005/03/requirements.html

Document: http://www.w3.org/WAI/ER/2005/03/issues.html

ERT WG f2f

4 Mar 2005

Attendees

Contents

Test Suites (with some people from WCAG TTF)

Test Descriptions

Discussion on Annotea and SPARQL (by EricP)

Refine Requirements and Issues Documents

Summary of Action Items