<Cliff> pwd?
<scribe> scribe: CarlosD
round of introductions
Wilco_: Kristian couldn't make it, but Jean-Yves will take over
Jean-Yves: we have been thinking about how to promote ACT-rules and what the benefits are for business
... there has been a growing interest in ACT-rules
... businesses need confidence in the tests they are using
... but sometimes it is hard to address some of the questions they ask
... sometimes even we don't know what the real coverage is
... having a dataset of pages that we could test against, we could identify not only the SCs that are being tested
... we could also have a better estimate of the percentage of SC violations that ACT-rules can identify
... it would be easier to evaluate the usefulness of ACT
... it would also help identify which new rules would represent the biggest gain
... but we have the challenge of building this dataset
Wilco_: you're suggesting we build a set of pages where we know how many violations there are, and we would be able to work out how many would be caught by ACT-rules
Jean-Yves: correct
... the data set should be built on real pages
... the current test cases are useful but don't allow us to
estimate the number of problems in real websites
Wilco_: any suggestions on how we would build this?
Jean-Yves: not really... we need
to discuss this
... of course, the larger the data set the better it would
be
... we could build it incrementally
... if our organizations are assessing pages we could be adding
them to the data set
... but there are issues to consider: we might need to archive
the pages
... they might need to be anonymized
anne_thyme: as part of the WAI-Tools project we used a data set of pages to assess the rules
... they were real pages, and I agree that we should use real
pages
... but we might need multiple testers to review the pages,
because they will find different problems, and we should try to
find as many problems as possible
... we will need to decide on the manual testing methodology
for these checks
Jean-Yves: one benefit of this project could be the identification of places in WCAG that need further clarification
Drew_Nielson: I agree with the
use of real pages
... but we need to define a method for incorporating pages into our data set
... there are copyright issues
... but also, if we test the page and find issues with it, that can be negative for the owner of the page
... even if we anonymize it, it could be possible to identify the owner by comparing the code
Wilco_: how many people would be interested in contributing to this project?
Jean-Yves: I believe we need a
static archive of the page
... otherwise if the page changes we would need to do the
manual check again
Wilco_: one potential place to
start is with the EU monitoring of the web accessibility
directive
... I believe some of the manual assessments should be public
TomBrunet: we might need to consider other aspects of the page, not just the violations
anne_thyme: the testing pages in
Denmark will not be public
... maybe we can use the data from gov.uk
<Wilco_> Carlos: Don't think Portugal monitoring pages are public
Helen_: on people only checking
what's wrong and not what's right
... when auditing people also tend to mark what passes
kathyeng_: the Trusted Tester training course has practice content where we know what should pass and fail
... this might be available
... but I don't know how "real world" the content is
Wilco_: I guess that is key for this project: being able to find out how much a rule or implementation covers
TomBrunet: on the testing, there is implicit stuff that we don't check, because it is part of the context of the test
Wilco_: what would be next steps?
Jean-Yves: we need to research
what we can put into the dataset
... we can start with a small dataset
... that will allow us to start comparing the findings of the tools with those of manual testers
... that could allow comparing the granularity of checks from automated rules and manual tests
Drew_Nielson: suggestion to
address copyright issues
... in the US anything created by a public body is essentially
free to use
... we could start collecting examples of real world pages
there
<Helen_> +1 to that idea
Wilco_: can we set up a meeting between the people interested in the project to define the initial steps?
Interested people: Jean-Yves, Wilco, Carlos, Andrew, Cliff, Helen, Daniel
<TomBrunet> Just for context and thought on what I mean.. w3c.org has 500 DOM elements. Sanity check of name/role/value verification alone would be 1500 test results. Most of these checks are implicitly skipped because they're just correct by default.
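[Editor's sketch, not part of the minutes: the arithmetic behind TomBrunet's comment above, assuming one name, one role, and one value check per DOM element.]

```python
# Scale of exhaustive name/role/value verification, per the comment above.
# The figures come from the minutes; the variable names are illustrative.
dom_elements = 500        # DOM elements on w3c.org, per TomBrunet
checks_per_element = 3    # name, role, value
total_results = dom_elements * checks_per_element
print(total_results)      # → 1500 test results
```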
Wilco_: the ACT CG website has a
table of implementations at the bottom of every rule with at
least one implementation
... the table reports consistency and whether the data is
complete
... and allows checking individual results for every test case of a rule
... they don't have to match exactly
... these tables are going to move to the WAI website
... and the rules are going to be linked from the WCAG
techniques
... but we're currently redefining what we mean by a consistent
implementation
... for consistency we defined levels
scribe: it can be complete, which means no false positives or false negatives, and you report on the SC
... a complete implementation cannot have untested results
... if there are untested results, it will be a partially consistent implementation
... with a partial implementation you can have false negatives, but not false positives
... the minimal implementation is a new level that shows you can test for the existence of something, but not the requirements
... basically, it tests the applicability of a rule
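[Editor's sketch, not part of the minutes: one way the consistency levels described above could be derived from an implementation's results on a rule's test cases. The function, field names, and result structure are hypothetical; the "minimal" (applicability-only) level is omitted for brevity.]

```python
def consistency_level(results):
    """Classify an implementation's consistency on one rule.

    results: list of dicts with hypothetical keys
    'expected' ('passed'/'failed') and
    'actual' ('passed'/'failed'/'untested').
    """
    # A false positive: reporting a failure on a test case that should pass.
    false_positive = any(
        r["expected"] == "passed" and r["actual"] == "failed" for r in results
    )
    # A false negative: passing a test case that should fail.
    false_negative = any(
        r["expected"] == "failed" and r["actual"] == "passed" for r in results
    )
    untested = any(r["actual"] == "untested" for r in results)

    if false_positive:
        # Partial implementations may miss problems, but per the minutes
        # they must never report false positives.
        return "inconsistent"
    if not false_negative and not untested:
        return "complete"   # no false positives/negatives, nothing untested
    return "partial"        # false negatives and/or untested results present
```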
TomBrunet: does the test case id change when you modify a test case?
Wilco_: yes
TomBrunet: does the implementation report get updated when these changes happen? we have submitted one report and haven't seen it
Wilco_: I haven't been able to
update it yet
... we also want to start reporting the coverage of an
implementation
... the coverage is a metric of how much you covered compared
to the total number of test cases
... tools report cantTell results, so this metric will account
for that
... we're also determining automated coverage
... for tools that only do automated this will be equal to the
total coverage
... for tools that include semi-automated checks it will allow distinguishing automated from total coverage
... Is this a good direction?
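[Editor's sketch, not part of the minutes: a possible shape for the coverage metric described above, comparing covered test cases against the total, with cantTell results tracked separately. All names are hypothetical.]

```python
def coverage(results):
    """Summarize an implementation's coverage of a rule's test cases.

    results: list of per-test-case outcome strings, e.g.
    'passed', 'failed', 'cantTell', or 'untested'.
    """
    total = len(results)
    # A test case counts as covered when the tool produced a definite outcome.
    covered = sum(1 for r in results if r in ("passed", "failed"))
    # cantTell outcomes still require a human to finish the check,
    # so the metric accounts for them separately.
    cant_tell = sum(1 for r in results if r == "cantTell")
    return {
        "coverage": covered / total,
        "cantTell_ratio": cant_tell / total,
    }
```

For example, a tool with definite results on two of four test cases and one cantTell would report a coverage of 0.5 and a cantTell ratio of 0.25.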
Skotkjerra: can you provide further context on why we are allowing false negatives but not false positives?
Wilco_: it comes from what ACT
rules are designed to do - find problems
... in almost all cases where an ACT rule does not find a
problem, further testing needs to be done
Skotkjerra: but if we want to
measure the quality of the coverage we also need to know what
is not covered
... the best way to have few false positives is to test as little, or as conservatively, as possible
... we might be sending the wrong signal
Wilco_: is the implication that
we shouldn't be reporting partial implementations?
... either you have a complete implementation or you
don't
... we're running out of time... mail me if you have any
concerns about this process
Present: Daniel, Jean-Yves, CarlosD, Helen_, kathyeng_, anne_thyme, ToddL_, Wilco_, Drew_Nielson, TomBrunet, Skotkjerra_