Silver Task Force & Community Group Virtual Face to Face

07 May 2020


jeanne, Lauriat, ChrisLoiselle, sajkaj, PeterKorn, Detlev, bruce_bailey, JF, OmarBonilla, Makoto, Fazio_, KimD, JakeAbma, MichaelC, CharlesHall, kirkwood, Fazio, OmarBonilla_, CharlesHall_
Shawn, jeanne
ChrisLoiselle, Detlev, bruce_bailey, sajkaj


<ChrisLoiselle> scribe:ChrisLoiselle

<Detlev> ok

<Detlev> will scribe

<Lauriat> Thank you, Chris & Detlev!

Jeanne: How to include functional user needs is very important to our work. This two hour block is dedicated to this topic.

I've drawn a diagram that illustrates this. Then we can discuss how and where to update the functional user needs.

Jake: I would like to focus on a solution around functional needs. Benchmarking and benchmarking use within UX.

Jeanne: Let's talk to this in agenda topic two in about 15 minutes. Jake: I have some links and will add those at that time on how we can test

pass / fails and task based flows

<Lauriat> Excellent, thank you, Jake! Looking forward to talking through it.

<jeanne> https://docs.google.com/drawings/d/16OF5F72Sv3B6GvEiAOWPRqwI7WN8nCE-_mNXrJXbYJA/edit

Jeanne: Let us talk to the architecture. Jeanne provides link to Silver Architecture May 2020

Guidelines and Conformance. Guidelines are individual guidelines. Guidelines have how to and methods. All tabs within how to and methods are listed underneath these parent content areas.

On the diagram, conformance is located on the right-hand side. Conformance has buckets/tabs for scope, samples/paths, and total score.

<sajkaj> Janina wonders whether "total score" is more complexity than we need

Bruce: Conformance isn't linked within Guidelines yet? Jeanne: Yes, the architecture is not complete yet, but it is an ongoing visible representation of the architecture we are building.
... How do guidelines expand to the how-to? How do we get from guidelines to conformance?

Peter: Do you see a place in this approach for what we discussed yesterday on site's designed paths?

Jeanne: Fits under conformance > Samples / task paths.

Peter: How does that relate to scoring?
... If we have a simple website with two user paths: do all guidelines apply, or are only four guidelines potentially an issue? With two user tasks, for example, how would this work with scoring?

Jeanne: You'd go through, for each guideline > method > test. Each test would have a score. Test scores would add up (normalized) to a total score.
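As a rough sketch of the roll-up Jeanne describes: per-test scores are normalized and summed into a total. Everything here (the function names, scoring as earned/possible points, and the equal-weight averaging) is an assumption for illustration, not anything the group decided.

```python
# Illustrative sketch only: one possible way guideline test scores could be
# normalized and rolled up into a total score. All names and the simple
# equal-weight averaging scheme are assumptions, not a Silver decision.

def normalize(scores):
    """Scale raw (earned, possible) score pairs into 0..1 fractions."""
    return [earned / possible for earned, possible in scores if possible > 0]

def total_score(guideline_scores):
    """Average the normalized per-guideline scores into one percentage."""
    normalized = normalize(guideline_scores)
    if not normalized:
        return 0.0
    return round(100 * sum(normalized) / len(normalized), 1)

# Example: three guidelines, each scored as (points earned, points possible).
print(total_score([(4, 5), (2, 2), (3, 6)]))  # 76.7
```

The normalization step matters because, as discussed later, guidelines that do not apply would be eliminated before averaging so every applicable guideline can feature in the total.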

Shawn: Experiments with examples will be completed to see how useful this actually is. Peter: I agree that a sample would be great before building out more robust / detailed examples.

Detlev: Scoring and the how-to: how does that fit in?

Jeanne: A fallback method for scoring whether you are meeting user needs.

Shawn and Jeanne: A sort of placeholder to be built out.

JF: Conformance not being built into the full architecture? Seems to be off on its own per diagram?

Scoring is in two places? But would conformance fit under the how-to somewhere?

Jeanne: Writing process talks to user needs and functional outcomes

Conformance shouldn't be as far off to the right as it is currently located in the diagram.

JF: Entities would be interested in increasing score from 86 percent to 88 percent (or points)

<kirkwood> JF is correct

<kirkwood> it's unfortunate how important the scoring has become

Shawn: We are going to try this out in examples to provide actual data points to work with moving forward. What we are talking today about is how functional needs are fitting into the architecture

Jeanne: At the end of analyzing user needs, we write the functional outcomes for each guideline. The guideline should be normative.

The how-to is informative and on a separate page. Methods are one-to-many, so there is a listing of methods. If it is technology specific, it would feed into the total score.

If not, scoring would be pulled in from the how-to tab.

Detlev: Has a decision been made on a total score? Is that open still?

<sajkaj> I hope we set time somewhere to discuss pros and cons of "total score"

Jeanne: Individual guidelines would be scored and would move to a total score that possibly is bucketed by "bronze, gold , etc."

<JF> +1 to Detlev

<Fazio_> I like the idea of functional needs scores

Detlev: Separate scores for different functional user needs should be reviewed

<Lauriat> +1 to Detlev

Janina: Eager to hear from Jake. I'm wondering about the value of a total score. Getting things down to a single adjective could hide issues for certain users. Both guideline-by-guideline scores and an entire total score may be difficult to review. Perhaps a table of scores is beneficial?

JF: Compares it to a FICO score: length of credit history, etc. There are measurable things that impact the score. At the end of the day, what is the score? What can they do to improve the score?

<Lauriat> +1 to JF, though we need to keep Janina's point in mind as we work through how to make that happen.

<Fazio_> average functional needs scores

Total score including the functional needs would be beneficial

JohnK: I agree with what the group is saying. The score depends on the audience: a management point of view differs from a team of people putting things into a process, which needs a more iterative/detailed approach to scoring. High level and granular are both needed.

<Lauriat> +1, well put.

<jeanne> +1 JohnK for a table of scores

<JakeAbma> Benchmarking UX: Tracking Metrics

<KimD> +1 to Janina, Kirkwood

<JakeAbma> https://www.nngroup.com/articles/benchmarking-ux/

<JakeAbma> Quantitative vs. Qualitative Usability Testing

Jake: I would like to present tracking metrics and benchmarking

<JakeAbma> https://www.nngroup.com/articles/quant-vs-qual/

<JakeAbma> Qualitative vs. Quantitative UX Research

<JakeAbma> https://www.nngroup.com/videos/qualitative-vs-quantitative-research/

Jake: New tests are based on benchmarks then you test against it again and again.

Benchmarking is based on tasks

ACT fits within this.

<jeanne> We talked about including benchmark tests in Silver, but didn't have anyone who knew how to write them.

<Fazio_> +1 to benchmarking UX quantitative & qualitative testing

UX measurement of quality and task completion is well known; we are adding the layer of accessibility on top of this. Quantitative results compare to benchmarks; qualitative results map to ACT rules. It is up to us what to do with the two different results. Do we want to merge them into a total score, or keep separate qualitative and quantitative scores?

<Fazio_> Also jibes with ISO

<jeanne> +1 to reversing the names quality vs quanty - but it's a minor detail

<KimD> +1 to Jake & leaning into UX models

<jon_avila> I agree with Detlev - heading presence, level are quantitative while heading label purpose might be more qualitative. user testing is more qualitative.

Detlev: Is WCAG quantitative metric analysis like a benchmark, while screen reader user testing covers the more qualitative aspects?

<JF> +1 to Detlev

Jake: The links talk to the differences in each type of test. Per the link provided (for reference) Qualitative research informs the design process; quantitative research provides a basis for benchmarking programs and ROI calculations.

A test framework for the quantitative benchmark would be reviewed. The task completion part would end up with a percentage score.

<Zakim> jeanne, you wanted to talk about benchmark testing and Silver research

<jeanne> Giorgio Brajnik

Jeanne: In the Research and Development Working Group, accessibility testing was reviewed and benchmark testing was discussed.

Giorgio Brajnik was the lead on the presentation or paper...

Janina: Mentions she may be able to follow up on that particular topic

JF: Detlev's concern +1 to 80/20 rule.
... Are we talking to establishing benchmarks on functional needs?

Jake: The benchmark will concentrate on the 20 percent. The experiment would be on whether it fits well. In the end, a benchmark would be needed.

Composite benchmarks could be viewed across different functional needs. JF: Styling of headings for visual users is a need for one user group vs. others.

JF: Different users have different needs. Optimum for each user group. The developers would need to merge the profiles together for an end product. Real needs and real users.

<Zakim> Lauriat, you wanted to mention the lowest-score-applies idea we've discussed now and again

Shawn: the concern about a consolidated score is valid, but the same applies to a total score. Perhaps a lowest-score-applies idea, where the lowest score is the consolidated score. If you want a better score, raise the score for that particular functional need / outcome.
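The lowest-score-applies idea Shawn mentions can be sketched in a few lines. The functional-need category names and the 0-100 scale below are invented for the example; the only point is that the overall score is the minimum, so improving the weakest area is the only way to raise the total.

```python
# Illustrative sketch of "lowest score applies": the consolidated score is
# the minimum of the per-functional-need scores. Category names and the
# 0-100 scale are assumptions made up for this example.

def consolidated_score(scores_by_need):
    """Return the lowest per-functional-need score as the overall score."""
    return min(scores_by_need.values())

scores = {"vision": 92, "hearing": 88, "cognitive": 71, "mobility": 95}
print(consolidated_score(scores))  # 71: only raising "cognitive" raises the total
```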

Rachael: I have worked with an org that uses a Likert scale with WCAG 2.1 and has scored a lot of websites. In this situation 5 = Pass in 2.1, since it's a hard pass or fail. When broken down by functional areas though the #

<Zakim> bruce_bailey, you wanted to ask Rachael how many sites she scored?

Bruce to Rachael: How many have you scored? Rachael: Two thousand sites scored.

<jon_avila> was the scoring based on automated and manual testing?

Rachael: It was a merged version of automated and manual per Success Criteria

<JF> +1

<sajkaj> +1

Shawn: The level of testing for the total score and for individual scores needs to be looked at.

<KimD> +1

<bruce_bailey> +1

<jeanne> +1

<Detlev> not sure what that would mean on the detail level...

<Makoto> +1

<OmarBonilla> +1

Detlev: Filtering of results for different user groups. The current technique, h1-h6 for heading levels, benefits blind users, but how is this related to other user needs?

<Detlev> OK, remember that

JF's heading example talked to functional user needs.

<Detlev> Got you (will take over scribing)

Shawn: I.e. use of screen readers and impact of headings for screen reader users vs. visual markup of headings.

<Detlev> scribe: Detlev

<ChrisLoiselle> Thanks Detlev!

Shawn: Describing the way functional user needs relate to particular tests / tasks

<Zakim> jeanne, you wanted to ask Jake to explain where benchmarking by functional User need is by guideline or by overall task completion?

Jeanne: Jake do you envision benchmarking on a guideline or at a task completion level?

Jake: Its on the task completion level - we need to create examples to show people how it works
... Aspects of headings would be at the qualitative level, while task completion would be at the quantitative benchmarking level
... tasks have to be very specific and fine-grained (find a telephone number, find a price)
... so you can measure whether the user can complete the process, not whether it is implemented well technically
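Jake's fine-grained task-completion benchmark can be sketched as a completion rate over specific tasks. The task names and the booleans below are invented; the shape of the measurement (completed or not, yielding a percentage) is what the NN Group articles linked above describe.

```python
# Illustrative sketch of a task-completion benchmark: each fine-grained task
# is recorded as completed or not, yielding a percentage score. Task names
# and results are made up for this example.

def completion_rate(results):
    """Percentage of attempted tasks that were completed."""
    completed = sum(1 for done in results.values() if done)
    return round(100 * completed / len(results), 1)

results = {
    "find telephone number": True,
    "find price": True,
    "submit contact form": False,
    "change language": True,
}
print(completion_rate(results))  # 75.0
```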

<ChrisLoiselle> Per NNgroup , Benchmarking is a method https://www.nngroup.com/articles/ux-research-cheat-sheet/

Jake: all well documented on the links to NN Group stuff above

Jeanne: Put benchmarking into diagram - is that represented correctly?
... its by task completion, not guideline

Shawn: talking about extra NN group link above

Jeanne: Like that it is later in the process - my concern is not wanting to have task completion testing at the end of the dev process
... risk that people move resources from design to testing part of the process
... Going back to architecture: We start with user needs / seven functional needs - we have agreement that in the conformance section in the end product we will have a score by functional need area

Shawn: ..as a result of the benchmark approach

Jeanne: someone using a product can then see how guidelines relate to functional user needs

Shawn: if you have a particular guideline mapping to 3 different functional needs we would need to be able to assess how the content measures up against each of these
... that's why that definition should be informative, to give more flexibility

Jeanne: We could normalise individual guidelines by eliminating those that didn't apply, so they can all feature in a total score

Detlev: an example would help to understand this

Jeanne: will look for an example
... was from TPAC session in France
... Alastair and Wilco were involved

Shawn: Have we arrived at the action items for this to move on to next agenda items?

<jeanne> drop item 1

<jeanne> drop item 2

How to update the Functional User Needs?

<jeanne> older http://mandate376.standards.eu/standard/functional-statements

Jeanne: we have different functional user needs, here is the older one
... reads User Accessibility Needs

Bruce: The US comparison to 508 requirements?

Jeanne: just the list of user accessibility needs
... cognitive disabilities were broken down more

Bruce: will take a minute

Jeanne: Lets talk about concerns about this list

<Rachael> needed to be useful varies greatly. For example, the score needed for someone who is blind to use a site is much higher than for someone who experiences seizures. So if that approach is taken, we will have to normalize scores at the testing level across functional areas.

<CharlesHall> Map of Functional Needs to WCAG SC - EN 301 549 Annex B https://docs.google.com/spreadsheets/d/1W5CSvU4XxWXNneu4ibokjcYUCsG386xL1rGOiTrDvt8/edit?usp=sharing

<CharlesHall> Disabilities https://docs.google.com/spreadsheets/d/12wcZh1SgnL52Sz6gYHoLKyWQi5viAMv28kmgnOv06-k/edit?usp=sharing

Jeanne: its missing some key disability areas

<CharlesHall> Coga Functions https://docs.google.com/document/d/1QsiD0Y0lLCXvbmOOC4-EPf-2lFEPoEMaqNomQtPzBQI/edit?usp=sharing

Jeanne: color blindness addressed as lack of colour perception

<jon_avila> It also doesn't address with limited vision and with limited hearing and multiple disabilities

Jeanne: cognitive issues are lumped together

<bruce_bailey> Here is the place in the Revised 508 Standards:

<bruce_bailey> https://www.access-board.gov/guidelines-and-standards/communications-and-it/about-the-ict-refresh/final-rule/text-of-the-standards-and-guidelines#302-functional-performance-criteria

JonA: It addresses situations of limited vision and limited audio perception

<jon_avila> CVAA

<PeterKorn> CVAA - Communications and Video Accessibility Act of 2010

Jeanne: Can you find that CVAA Jon, for the record?

<PeterKorn> Or more properly, The 21st Century Communications and Video Accessibility Act

<bruce_bailey> We use: Without Vision, With Limited Vision, Without Perception of Color, Without Hearing, With Limited Hearing, Without Speech, With Limited Manipulation, With Limited Reach and Strength, With Limited Language, Cognitive, and Learning Abilities

Jeanne: Makoto, any further gaps you know of?

<jon_avila> https://www.law.cornell.edu/cfr/text/47/14.21

<bruce_bailey> So 508 is nine categories, 302.1 through 302.9

<jon_avila> (ii) Operable with low vision and limited or no hearing. Provide at least one mode that permits operation by users with visual acuity between 20/70 and 20/200, without relying on audio output.

Jeanne: (reading from list CVAA?)
... then further breakdown regarding availability of information

<Fazio_> Operable with prosthetic device seems complicated to test

<CharlesHall> operable without tactile sensory information (i have touched the screen hard enough)

<scribe> ...(continues reading)

JF: The thing about CVAA is it keeps using 'operable' - in WCAG we have four principles; we also need to think about perception
... CVAA assumes you can perceive

<bruce_bailey> (2) All information necessary to operate and use the product, including but not limited to, text, static or dynamic images, icons, labels, sounds, or incidental operating cues, [shall] comply with each of the following, assessed independently:

Jon: next section in CVAA mirrors perception, also references seizures

JF: Lots of overlap between CVAA and Section 508, and airline requirements as well

Makoto: Japanese standards have the same kind of list of needs, need to look for that

Shawn: How do we want to break this down is the main topic here - then reference sources

Jeanne: need to look more broadly at other countries (South America, Africa)

<CharlesHall> and the list needs to include ‘intersectional needs’

Jeanne: there is a lot of overlap - how do we resolve the conflicts and have a list of our own that can be used and is acceptable? JF raised the harmonisation issue before

CharlesH: did you identify contradictions or lack of harmony? We want to extend the lists rather than contradict them

Jeanne: There are minor places of disagreement

Shawn: It's mostly for the writing process: these are the ones we pick for functional outcomes, methods, tests - so we need to draw up a list - it could be referenced; are there examples of how that is done in WCAG?

<KimD> And mental health issues (anxiety, etc)

Jeanne: Vestibular disorders: when these lists were drawn up, this was not on anyone's radar then

<Fazio_> Our COGA Content Usable guide has a lot of that too

Jeanne: the design pattern creating a problem for vestibular disorders started perhaps 5 years ago

<Fazio_> for instance PTSD depression etc, all have cognitive impact that can be aggravated by ICT design

DavidF: We are consolidating the stuff out there, not contradicting things out there

<Fazio_> we have supporting research as well

Janina: make it easier to add to the list as new knowledge emerges

<Lauriat> +1 to Kim & Fazio

DavidF: There's a lot of work done in the COGA TF on this; it identified new user needs that are not yet covered
... dyscalculia, mental fatigue, and others

Jeanne: David Swallow did interesting work on what can be done to reduce anxiety

<jon_avila> The cognitive category could really be broken out and expanded to include things like perception of emotion, anxiety, distraction, etc.

<davidofyork> I am still on the call, hi! That's great to hear, I didn't know that.

DavidF: Wording can set the wrong tone; there is a lot of research on what triggers anxiety and prevents proper use
... some people just can't perceive
... Lots of things that have not yet been included in WCAG

Detlev: Should be as simple as possible

<Rachael> 1+

<Lauriat> +1

DavidF: We have broken that down in categories, so Silver should accommodate that

<jeanne> +1 to including the detailed cognitive needs in WCAG3

CharlesH: To Detlev: Do you think information architecture could solve that concern? If one is selective of, say, 25 categories and only picks

<Fazio_> We can benchmark neuropsychological eval categories

<Fazio_> So we won't be reinventing the wheel if necessary

Jake: Still wondering what results we want to deliver - if all the different functional needs should be covered, the effort will be far too high - it would take a lot of time to include all the needs in the testing outcome

<Fazio_> +1 to Jake

<jeanne> +1 to Jake about putting more effort of the functional needs analysis into the guideline writing and add to benchmark testing.

Jake: the number doesn't matter if we create a baseline, we need to provide the baseline for benchmarking - we can include the cognitive needs in that benchmark so it will be covered

<Fazio_> For instance visual complexity, low contrast, etc create mental fatigue - a COGA issue

Jake: it is our work to create the benchmark that is inclusive of these needs so testers can use that - the tester then does not need to know about it in any detail

<Fazio_> So, grouping would be effective

<jeanne> +1 to put the burden on us and not the individual testers

Rachael: it would be good to put together a tree structure, but perhaps not to decide on grouping right now - too early

<Lauriat> +1 to Rachael

<Fazio_> should we paste content usable link?

Jeanne: Would be helpful to have some of it now - we should put the burden on us (+1 to Jake), not on individual testers

<KimD> +1 to Jeanne, Jake - Functional needs is for us to figure out

<Lauriat> +1

<kirkwood> +1

Jeanne: would be good to have a draft version when writing the individual guidelines - maybe we can have a separate functional needs note that anyone could use

<CharlesHall> +1 to central doc for all groups

<Fazio_> I'll help

Janina: agree with MichaelC to start writing an extended list of functional needs

<CharlesHall> if you create a doc, i will contribute to it

Shawn: We got David F and CharlesH as volunteers

<Zakim> Lauriat, you wanted to mention grouping as a way of showing our work.

Shawn: the grouping is important to have a concise list that can be expanded

<Fazio_> we have 2 EU peeps in COGA

<Fazio_> we can ask them also

Jeanne: Can anyone in the EU or elsewhere contribute, so it is not US-centric?

<PeterKorn> Regrets. I need to leave early

DavidF: We have discussed how the different user needs contribute to fatigue, how to aggregate that - that would require more research
... many aspects (white space, contrast etc) can contribute

<CharlesHall> there are also issues like depth perception for XR

Jeanne: Anyone else who can contribute?

<Lauriat> +1 Charles, good example.

Jeanne: anyone at FAST (Framework for Accessible Specification of Technologies)?

MichaelC: It is a core that should inform different guidelines

<Fazio_> being blind in half of each eye having depth perception is contingent on cocking my head and angling my eyes

Jeanne: Recap: We are pursuing adding scoring by functional need types, contributing to a total score, and working through examples of how that could work; and creating a group to produce a comprehensive list of user functional needs

<sajkaj> Fast Intro: https://www.w3.org/2019/Talks/0516_FAST_MC/ Checklist: http://w3c.github.io/pfwg/wtag/checklist.html

Chase task-based scoring (Giorgio B. / Janina)

<jeanne> +1 for concrete examples

Jake: Would like to create examples so people get clarity on how this works - it is only with examples that you start seeing the flaws - check how quantitative and qualitative can be merged

Jeanne: ping me on these examples
... any issues regarding functional needs still to be tackled?

<CharlesHall> the complexity of intersectional needs or the conflict of needs is a big challenge

JF: We have to expand it beyond seven or nine; what is the right number of categories? The risk is that it gets too granular - 20-30 gets to be too much. We need to find the right balance between the issues and the needs of content creators

Jeanne: adds this to Wiki list of outstanding issues

Janina: We resume in an hour

<bruce_bailey> scribe: bruce_bailey

Putting the Conformance pieces together

The objective of the next two hours is to take pieces from the last 36 hours and put them together to get a substantive list of what still needs to be done

Jeanne: I have updated the diagram
... outlined in red are the functional user needs
... and added arrows for what informs what

<ChrisLoiselle> could we have the link for the diagram again?

Jeanne: still a little rough, but should be better

<jeanne> https://docs.google.com/drawings/d/1hYzmiqrvNo_ymuXbtV5P5oKiyKIJRpevkHQ-KDD8l4o/

link to new illustration

scribe: I took out color, and will be taking out arrows
... based on the call yesterday, split into two routes of testing, which need names
... in the conformance process, we have a task-based pass
... many orgs will do both, but not all will want to

<Fazio> I'd assume they'd do what's cheapest

can both approaches work?

John Foliot: it will be a combination, especially at launch

scribe: with new pages, less user testing and more page based approach

<Lauriat> +1 to JF, though I think we can still express the overall results in terms of tasks.

scribe: functional walk through useful, but old (like 2007) not really useful

<Fazio> +1 JF concern about dated walk throughs

scribe: so old dated walkthru should not be part of scoring

<CharlesHall_> so does scope include a date?

Jeanne: I like what you raise about maintenance being an important issue, and that has not been part of this
... is maintenance more traditional than task based?

JF: That kind of task based testing has not really been happening today very frequently
... focus is on "this weeks updates" for example, a new blog post

<sajkaj> I think John's example is talking about Challenge #3 from the Challenges doc

JF: we know the blog post is published in an accessible setting, but the author's prose might use "click green icon on left"

Jeanne: John, in your experience, what do clients do today? Say a bank asks for a review of client service. How do you scope it?

JF: We have a range of options, automated manual scanning, expert review, or screen-reader user acceptance testing
... so we dialog with customer about what problem they are trying to solve

Jeanne: suppose customer asks for just "create new account" how do you scope that?

JF: We set expectations and negotiate around what is a representative sample.

Jeanne: Same question to TPG.

Charles (not for TPG): a cognitive walkthrough typically does not look at individual content

<Fazio> We do task based scripts put together with input from owners intent

<Fazio> plus wireframes

scribe: so we could look at the scope of the claim, and it is not as time sensitive because it looks at process and not editorial content

<Lauriat> +1 to Charles

Jeanne: that helps me; when I worked at TPG, customers really wanted TPG to only look at the new section
... we would scope out tasks for the new section, and representative pages in that section.
... so we did a whole-page look and a task-flow look, and then tested with different assistive technologies.

<CharlesHall_> essentially, a new cognitive walkthrough evaluation would be required if the task changed or the pattern changed, but not if there are routine editorial maintenance releases

Jeanne: The client says "this is what I want to test" and TPG works out what that implies

<davidofyork> (Sorry, I'm having mic issues). Jeanne, I'm rarely involved in scoping anymore, but from what I gather the process is similar to what you've described.

JF: for companies like mine, we have dashboard-like tools, but customers also want feedback to limit risks
... so they need dashboard views in addition to our testing

<Fazio> to Jake's earlier quant qual direction automated test provides quantitative data to scope qualitative testing

JF: we have APIs and tools to put into their processes, but we don't have tools for editorial evaluation
... like "click on the red header" instructions; that needs human evaluation

<Lauriat> Big +1 to both levels of testing and tracking!

JF: a large company needs need to have confidence about divisions

Jeanne: Would you say this is a reflection of the traditional viewpoint versus the task-based side?

JF: There is a 3rd aspect: the dashboard spider tools

Jeanne: That is the traditional side; it does not care about tasks

JF: The human-based testing, if used to assess an initial score - how does that age?

Shawn: Your example of initial human testing is a wide comprehensive sweep [JF agrees]
... so minor updates over the years, does manual testing happen for the incremental additions?

JF: Yes, sure. Gives example of migration to Drupal. Lots of testing at beginning, but fewer editorial controls over time.

Shawn: Thinks teams like those at Google, work a little bit different, so accessibility reviews are more contemporary

<Fazio> +1 to JF's concern

JF: Consider the NYS school system. Set up as robustly as possible, but then editor controls are not strong after the initial rollout

John Kirkwood: Agreed, this is a real issue with the network I oversee.

Shawn: Can we have a date stamp of last inspected, like one sees in an elevator? Can we scope those "sell by" dates?
... for a traditional website, an initial review could work for years. With other technologies, say VR, a six-month-old review might not be meaningful.
... We definitely need clarification that a review does not cover content after X date, but how long a review should be good for is an issue.

JF: With WordPress, site content writer might not update control panel setting.

<sajkaj> Apropos this discussion, I still use lynx

<Fazio> It helps comply with ISO

<Fazio> WCAG is ISO also

<Zakim> jeanne, you wanted to say that it isn't a business of standards to say how long conformance can last, that is regulations

John Kirkwood: We need to be very careful about getting into maintenance process. Our stuff needs to be timeless if possible.

<Lauriat> +1 to jeanne, especially if we define compliance at a lower level like a task.

Jeanne: I am not thinking that it should be up to us how long a review should be good for.

<KimD> +1 to Jeanne

<JakeAbma> +1 to Jeanne, not up to us

Jeanne: We can say what is the basis for accessibility, and it is up to the site owner and regulators to worry about expiry.
... I would like to go back to the time stamp approach we agreed to last month.

JF: Our clients are looking for a true value report.
... a 3 year old usability study is not valuable

<sajkaj> Why is that up to us?

<KimD> +1 determining expiration date or value is not on us.

Jeanne: The usability study has a date stamp, so it does not factor into the score today.

JF: Industry wants up to date dashboard.

David Fazio: Can prior tests factor into the current day score?

<Lauriat> -1, that's a different level of testing and reporting.

Jeanne: Let's have the discussion about legacy sub-tasks.

JF: Anything that affects the score has to be figured out now.

DavidF: We have the issue that WCAG 1.0 conformance implied 2.0 conformance, but this might not work with 3.0.

<Lauriat> -1 to tying compliance to litigation as a matter of how we define conformance.

Jeanne: Do we want to include earlier human scoring?

Do legacy subtasks include in a project?

<JF> Q: when does current become legacy?

<JF> and why?

ShawnL: In the course of path-based testing, you have to test sub-paths. Sometimes it makes sense to test them separately.

To the question of when current becomes legacy: it is case-by-case, and not important if we have transparency.

<sajkaj> +100 to Shawn

scribe: it does not fit in terms of conformance.

JF: We are working on standards, so a standard way of approaching a problem.
... All I am saying is that the value of human testing diminishes over time.
... if an author fixes headings, scores go up. But what impact does a 2018 evaluation have on the dashboard score?
... For example, leading up to v3 of a website, a company does extensive user testing. The company does not want to lose that work.

Jeanne: If the company does a new conformance statement, those old tests cannot be counted.
... Asking about traditional versus path based assessment.

DavidF: Even pages that have not changed are affected by updates to hardware and browsers

KimD: I am hesitant to say that something has an expiration date. Mom-and-pop pizza site could be fine years later.

<kirkwood> +1 to Kim

KimD: Many things go out of date, but that should not be a flat assumption.

<jeanne> +1 to Kim. Pizza shop not changing and transparency

<sajkaj> What Kim is saying is why I can still use lynx successfully much of the time

KimD: I like what Shawn said about transparency and exposure, so if company cites years old report, that is on them.

<JF> @janina, much, but not always

<kirkwood> +1 to disclosure

<Fazio> then we can't factor it in I would think

It is all about disclosure. Someone challenging the claim sees the date of testing.

JF: As mentioned, even without changing any code, the website could stop working well since browsers change.
... Mechanical tests could show how things kept working over time.

<kirkwood> agree with KIm

<Lauriat> +1 to Kim

KimD: Answer for me is disclosure.

<sajkaj> +1 to Kim

<jeanne> +1 for not including old tests in the score. It doesn't make sense

<kirkwood> disclosure allows courts to decide

<Lauriat> +1 to Kirkwood as to why

DavidF: This is still a problem; an example is trying to open old documents in new versions of software.

Jeanne: I would like to propose that we allow people who want to make an overall report to include when a particular test was done.
... That lets us report an overall score with full transparency about when the review was done.

Shawn: I want to highlight two examples of uses of conformance.
... 1: conformance used to make a judgement call, say in a court

<kirkwood> think we should have a use case of ‘archived material’ included

Shawn: 2: the other setting is clients asking "where am I now" -- conformance needs to support that
... if we haven't tested this part in six months, with NVDA and browser updates, that might not be recent enough.

<Zakim> JF, you wanted to ask if regulators have been polled on that

JF: The other use case is the regulator use case.
... Have we talked to the regulators? They are key stake holders.
... Accuracy, currency, and timeliness of the report are very important.
... A judge in court would not be impressed with a years-old report.

John Kirkwood: The robustness of current standards is why they have been so well respected...

scribe: if we are talking about large data sets, say archives, and it was accessible then, it is probably still very accessible

<KimD> +1

<Lauriat> +1

scribe: can't overlook foundations for new stuff
... archived materials is a huge amount of data

<sajkaj> +1 to John K

Detlev: JF asked about regulators, thinking about EN 301 549 which trickles down to 28 member states

based on WCAG 2.1 and 50 success criteria

scribe: I am worried that we get too complex, with grading depending on many things and timelines
... wonder how this will work as a benchmark for 27 countries. Big question mark for me.

<JF> +1

Shawn: JF's and Detlev's comments remind me that regulators are part of our core audience; we need to reach out to them and find out if our cases work for them.

<KimD> +1

<Zakim> Lauriat, you wanted to speak to the stakeholder point JF raised, a very good reminder.

Shawn: those writing regulation and those needing to use regulations to make judgment calls

Jeanne: Would like a straw poll / temperature on this topic.
... what would be a statement that could summarize this issue?

<jeanne> Do we want to include older results in current conformance with a date stamp?

<sajkaj> +1

<kirkwood> ! yes with a date stamp

<Rachael> +1

Jeanne: Do we want to include older results (with a date stamp) into current conformance reports?

<kirkwood> +1

<Fazio> depends if it factors into new score and how

<JF> if the date has no impact on the final score, it's window dressing

<Lauriat> +1 to Fazio.

DetLev: I still do not understand this question.

<Detlev> it depends on whether the old stuff can be seen, cross-checked - provided it hasn't changed basically

<KimD> +1

Jake: a time stamp or not does not change the responsibility to say what is accessible or not

<Detlev> that is Jake, Bruce!

<jeanne> +1 with a date

Jake: in the Dutch government, with EN 301 549, agencies have to affirm every year.

<OmarBonilla> Too many external factors affect the validity of past reports, including the current state of ATs, browsers, etc. For some industries, such as those that serve governments, those governments may work on old tech and require considerable backwards compatibility. Others may be more bleeding edge. The time stamp is fine, but the significance or the "atomic decay" as it were, the rate of that decay is affected by a lot of things.

<kirkwood> it does matter. it shows the management of accessibility, frequency and the correlation with technology and state

DavidF: I think the question is if old test factor into the new contemporary score.

<kirkwood> and the courts understand it

<Rachael> I would like a statement to the effect that you can include previous reviews with a date stamp but ideally do the complete test. Then add this discussion summarized in the document.

Jake: It is not an issue. If you claim 100% but it is 50% because the old test is not valid, that is your responsibility.

<sajkaj> Murder doesn't invalidate laws against it

DavidF: Can old automated test from 2018 factor to new 2020 silver report?

<Rachael> If the code baseline hasn’t changed, why can’t you?

Jake: It is up to the person making the conformance claim to be accurate.

DavidF: But we have the test and score.

<Detlev> BRUCE, it is JAKE that has been speaking, not me!!

<JF> @Rachael, because your code maybe hasn't changed, but Firefox and NVDA have...

Jeanne: Score is for what you are testing, not the overall site.

David: Say headers were accessible in 1998, but in 2021 that is not enough.
... we have new standards and expectation.

Jeanne: No one is proposing that.

David: JF points out that people want to make use of old testing.

<sajkaj> scribe: sajkaj

df: Notes web is just one factor/page in EN conformance
... Worried about whether we're building a solid foundation for 3.0 conformance claims

sl: We're working through complex interdependencies, so it's confusing still.
... We don't yet know which work, and which don't.
... And we need to get good at the complexity before we can simplify
... Believe too many variables assumed in Jeanne's straw poll

<Fazio> tests/claims should be dated yes

js: We agree we want transparency

<KimD> +1

<Fazio> yes +1

js: We don't want old results in new score, but we don't want to invalidate old results

<KimD> +1

sl: Don't believe we're in the business of assigning and predetermining expiration dates

<Fazio> Can we poll member companies of their preference?

<Fazio> do they want to factor in old scores? will they object if we say no?

<OmarBonilla> +1 to us not being the ones to determine expiration dates. The rate of "decay" of the old tests is dependent on too many other factors for us to make a blanket call on that.

sl: Believe we agree on expiration, but unclear about underlying problem
... We could have "date last tested" with appropriate granularity
... Could have some kind of decay notion
... Or, it's just the claimant's responsibility

df: Believe we should poll, though not sure how feasible
... Believe people will want backward compatibility

<Detlev> +1 to DavidF

sl: Believe we can have some backward compatibility in some of our testing, but don't believe we can do that in conformance

df: I'm thinking at task level

<Detlev> * Janina, you have df both for David Fazio AND for me...

df: for each task based test

sl: We don't want to make it difficult to move from 2 to 3

df: If date last tested factors ...

sl: Needs to point to 3 if it claims 3

<Zakim> sajkaj, you wanted to suggest it's up to the claimant when they move from 2.x to 3.x

<Fazio> Yes

<jeanne> Options:

<jeanne> 1 Date lasted tested and the version of WCAG included in the claim

<jeanne> 2 Date last tested decaying over time included in an overall score on a conformance claim

<jeanne> 3 The responsibility of the client to represent their claim truthfully.

<JakeAbma> 3

<Lauriat> 3

<Rachael> 1 (but 3 is also true)

sj: I like both 1 and 3, but believe our job is only 3

<KimD> 1 (although 3 is true also)

<jeanne> 1

<Lauriat> +1 to Janina, why I voted that way

<CharlesHall_> 1

<OmarBonilla> 1, plus 3 is true

<kirkwood> 1

<Detlev> can't say

<Fazio> Option 1 is less messy

<Detlev> I suggest a carefully worded survey

<bruce_bailey> 1

<Fazio> I have a feeling public will want option 2

sj: Suggests 1 and 2 not in conflict with 3

<Lauriat> +1 to Detlev, though I think we may want to prototype 1 & 2 to see how well they work?

js: Will make a more carefully worded survey
... So, believe we were asking whether we wanted two separate tracks, traditional and/or task based

sj: Clarifies wcag-em; task-based; or combo

js: Yes, but maybe separately reported

df: Not sure how much task testing we'll actually get
... Difficult to organize and do correctly with real users

sl: Clarifies that task based testing isn't necessarily user based

df: OK, my misunderstanding. That makes sense.

sl: Q about 2.1 conformance claims: Is any distinction required ...
... Do we need to take today's testing approaches into account in scoping 3 conformance for page based

<Fazio> I think the public expects it

sl: Don't want to have traditional compete with task based in conformance

<Fazio> Understanding WCAG says similar

sl: If the two ways of arriving at a claim compete, that's bad
... We should specify how the conformance is described

df: Public expects this to be as easy as possible, and the transition to be as smooth as possible
... Notes Understanding says it's possible to meet criteria but still be unusable for some pwd

sl: We don't need to specify how testing is done,
... But specifying how conformance is claimed will set up the structure to expose how it was done

<Fazio> Just keep in mind no one wants to run 2 audits of the same pages, elements etc

<OmarBonilla> +1 to lack of desire for redundant testing

sl: We're just supporting a definition of how you show conformance

js: There's a lot of capitalized resources for today's approach which isn't conducive to task based testing

sl: why not?

js: Because it would take investment to retool for tasks path testing
... We want to support people's way of working

<Fazio> which gives more legal credibility

sl: Not worried about our testing becoming outdated by 3

js: Interested to hear about that

sl: Notes that log output can be associated with test flow and abstract data from that

<Fazio> lol

sl: If you define the path, then your page based testing provides your data
... Also supports describing impact on users when a particular failure on a particular page is noted

<Fazio> crickets

js: Can we ask people to do that?

sl: No, but we don't need to

<Fazio> +1 to Shawns flow

sl: We provide the building blocks

js: More vpats?

sl: vpat may adopt some of this
... It always comes down to describing what I was trying to do

kd: Asking SL whether, if we only specify task conformance, we'd get the same results as page based?
... Are you saying task based and traditional would yield the same results?

<Fazio> +1 again

sl: No, that you can extract task description out of page based testing

kd: If we only test components, we miss stuff

df: Agrees with SL

<Detlev> -q

df: Can meet all kinds of reqs and still end up challenged

<Zakim> sajkaj, you wanted to say extracting task conformjance out of page testing assumes all rthe relevant pages have been tested

<Detlev> +1 to DavidF

kd: Reminds SJ about login piece
... Adds examples, ...

<Detlev> +1 to Kim

sl: Agrees with the idea, but don't believe we can mandate all the pieces

sl: We can have noninterference testing

sl: For login we can reuse UXS world to show how the task is defined

df: Concerned that people would focus on a few tasks and miss other aspects of their pages
... That may make page based more thorough
... We shouldn't lose what works well today

sl: It's already possible to claim only a few pages for conformance today -- or main features
... You just can't say the entire site (or app) conforms based on such testing
... Don't believe page based does that good a job anymore

<Fazio> +1

df: Repeated errors in a part of pages that no one interacts with shouldn't work against a11y
... Aware that many things come up through interaction and aren't really a page in the traditional sense
... It should be transparent how many of a total are tested

sl: We shouldn't narrow the scope down and avoid what affects users
... We haven't yet figured out the mechanics
... How to go through a given task and what's in testing scope as you audit
... A footer strangeness has little impact, but autoplaying video would make a big mess

<KimD> Both are bugs - that goes to severity

js: Severity goes with task based and is not well documented via traditional

sl: Think about both; it impacts individual tests and the overall score

js: Moving to wrap up where we are and what to work on ...

<jeanne> https://www.w3.org/WAI/GL/task-forces/silver/wiki/Conformance_Issues_and_Decisions#How_do_companies_track_their_issues_over_time

<jeanne> https://www.w3.org/WAI/GL/task-forces/silver/wiki/Conformance_Issues_and_Decisions

js: Above is Jeanne's open issues tracker
... Asking about "needs more work" parts
... composite rules that could be gamed
... we want generic advice about breaking tasks into path parts
... Granularity of reporting ...
... How should older testing affect scoring?

js: How large should the functional needs list be?

js: Will survey how companies track over time

<CharlesHall_> and appropriate grouping / hierarchy of user needs

js: +1 to Charles
... Task based vs traditional--where and how to factor noninterference; no resolution and continue discussion. SL?

sl: Still confused by the framing; if conformance measure is at the task level, it doesn't matter imo

js: Recalls df's point re current tooling

sl: Believe they continue useful and usable

<Detlev> DavidF's point...

sl: It's how you use the output of those tools

js: Asks for additional actions?
... Recalls functional needs work with FAST

<Fazio> agreed

sl: Believe today we've mostly reinforced yesterday's scoping discussion -- we need to see how things fit
... We need to explain our outcomes as best as we can to our stakeholders for their comment and reaction

js: Thanks all around!!

<kirkwood> best to all!

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2020/05/07 19:59:42 $
