<ChrisLoiselle> scribe:ChrisLoiselle
<Detlev> ok
<Detlev> will scribe
<Lauriat> Thank you, Chris & Detlev!
Jeanne: How to include functional user needs is very important to our work. This two hour block is dedicated to this topic.
I've drawn a diagram that talks to this. Then we can talk to how and where to update the functional user needs.
Jake: I would like to focus on a solution around functional needs. Benchmarking and benchmarking use within UX.
Jeanne: Let's talk to this in agenda topic two in about 15 minutes. Jake: I have some links and will add those at that time on how we can test
pass / fails and task based flows
<Lauriat> Excellent, thank you, Jake! Looking forward to talking through it.
<jeanne> https://docs.google.com/drawings/d/16OF5F72Sv3B6GvEiAOWPRqwI7WN8nCE-_mNXrJXbYJA/edit
Jeanne: Let us talk to the architecture. Jeanne provides link to Silver Architecture May 2020
Guidelines and Conformance. Guidelines are individual guidelines. Guidelines have how to and methods. All tabs within how to and methods are listed underneath these parent content areas.
On the diagram, conformance is located on right hand side of the diagram. Conformance has buckets / tabs for scope , samples / paths and total score.
<sajkaj> Janina wonders whether "total score" is more complexity than we need
Bruce: Conformance isn't linked
within Guidelines yet? Jeanne: Yes, the architecture is not complete yet, but it is an ongoing, visible representation of the architecture we are building.
... How do guidelines expand into How Tos? How do we get from guidelines to conformance?
Peter: Do you see a place in this approach for what we discussed yesterday on site's designed paths?
Jeanne: Fits under conformance > Samples / task paths.
Peter: How does that relate to
scoring?
... If we have a simple website with two user paths, do all guidelines apply, or only the 4 guidelines that are potentially an issue? With 2 user tasks, for example, how would this work with scoring?
Jeanne: You'd go through each guideline > method > test. Each test would have a score. Test scores would add up (normalized) to a total score.
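Jeanne's guideline > method > test roll-up can be sketched in code. This is purely illustrative: the guideline names, the scores, and the equal-weight averaging are invented assumptions, not an adopted Silver scoring formula.

```python
# Hypothetical sketch of the guideline > method > test roll-up Jeanne
# describes. All names, scores, and the equal-weight averaging are
# invented for illustration; Silver had not settled on a formula.

def guideline_score(test_scores):
    """One guideline's score: the average of its method/test scores (0-100)."""
    return sum(test_scores) / len(test_scores)

def total_score(guidelines):
    """Normalize by averaging only the guidelines that apply to the site."""
    applicable = [s for s in guidelines.values() if s is not None]
    return sum(guideline_score(s) for s in applicable) / len(applicable)

site = {
    "text-alternatives": [100, 80],  # two tests under this guideline
    "headings": [60],
    "captions": None,                # no audio/video content: not applicable
}
print(total_score(site))  # 75.0 -- "captions" is excluded from the average
```

This also shows one way Jeanne's later point about normalizing out non-applicable guidelines could work: they are simply dropped before averaging.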
Shawn: Experiments with examples will be completed to see how useful this actually is. Peter: I agree that a sample would be great before building out more robust / detailed examples.
Detlev: Scoring and the How To - how does that fit in?
Jeanne: Fallback method to how you can score if you are meeting user needs.
Shawn and Jeanne: A sort of placeholder to be built out.
JF: Conformance not being built into the full architecture? It seems to be off on its own per the diagram?
Scoring is in two places? But would conformance fit under How To somewhere?
Jeanne: The writing process speaks to user needs and functional outcomes
Conformance shouldn't be visually off to the right as far as it currently is in the diagram.
JF: Entities would be interested in increasing score from 86 percent to 88 percent (or points)
<kirkwood> JF is correct
<kirkwood> it's unfortunate how important the scoring has become
Shawn: We are going to try this out in examples to provide actual data points to work with moving forward. What we are talking today about is how functional needs are fitting into the architecture
Jeanne: At the end of analyzing user needs, we write the functional outcomes for each guideline. The guideline should be normative.
The How To is informative and on a separate page. Methods are one-to-many, so there is a listing of methods. If a method is technology specific, it would feed into the total score.
If not, scoring would be pulled in from the How To tab
Detlev: Has a decision been made on a total score? Is that open still?
<sajkaj> I hope we set time somewhere to discuss pros and cons of "total score"
Jeanne: Individual guidelines would be scored and would move to a total score that possibly is bucketed by "bronze, gold, etc."
<JF> +1 to Detlev
<Fazio_> I like the idea of functional needs scores
Detlev: Separate scores for different functional user needs should be reviewed
<Lauriat> +1 to Detlev
Janina: Eager to hear from Jake. I'm wondering about the value of a total score. Getting things down to a single adjective could hide issues for certain users. Both guideline-by-guideline scores and an entire total score may be difficult to review. Perhaps a table of scores is beneficial?
JF: Talks to the FICO score: length of credit history, etc. There are measurable things that impact the score. At the end of the day, what is the score? What can they do to improve the score?
<Lauriat> +1 to JF, though we need to keep Janina's point in mind as we work through how to make that happen.
<Fazio_> average functional needs scores
... A total score including the functional needs would be beneficial
JohnK: I agree with what the group is saying. The score depends on the audience. A management point of view differs from a team of people putting things into a process, which takes a more iterative, detailed approach to scoring. Both high level and granular are needed.
<Lauriat> +1, well put.
<jeanne> +1 JohnK for a table of scores
<JakeAbma> Benchmarking UX: Tracking Metrics
<KimD> +1 to Janina, Kirkwood
<JakeAbma> https://www.nngroup.com/articles/benchmarking-ux/
<JakeAbma> Quantitative vs. Qualitative Usability Testing
Jake: I would like to present tracking metrics and benchmarking
<JakeAbma> https://www.nngroup.com/articles/quant-vs-qual/
<JakeAbma> Qualitative vs. Quantitative UX Research
<JakeAbma> https://www.nngroup.com/videos/qualitative-vs-quantitative-research/
Jake: New tests are based on benchmarks; then you test against them again and again.
Benchmarking is based on tasks.
ACT fits within this.
<jeanne> We talked about including benchmark tests in Silver, but didn't have anyone who knew how to write them.
<Fazio_> +1 to benchmarking UX quantitative & qualitative testing
User Experience work on quality and task completion is well known; we are adding the layer of accessibility on top of this. Quantitative compares to benchmarks; qualitative rests on ACT rules. It is up to us what to do with two different results. Do we want to merge these for a total score? Or keep a qualitative vs. a quantitative score?
<Fazio_> Also jives with ISO
<jeanne> +1 to reversing the names qualitative vs. quantitative - but it's a minor detail
<KimD> +1 to Jake & leaning into UX models
<jon_avila> I agree with Detlev - heading presence, level are quantitative while heading label purpose might be more qualitative. user testing is more qualitative.
Detlev: WCAG is quantitative metric analysis, like a benchmark. Screen reader users and qualitative aspects - is user testing more qualitative?
<JF> +1 to Detlev
Jake: The links talk to the differences in each type of test. Per the link provided (for reference) Qualitative research informs the design process; quantitative research provides a basis for benchmarking programs and ROI calculations.
Test framework for quantitative benchmark would be reviewed. Task completion part , would end up with percentage score.
<Zakim> jeanne, you wanted to talk about benchmark testing and Silver research
<jeanne> Giorgio Brajnik
Jeanne: In the Research and Development Working Group, accessibility testing was reviewed and benchmark testing was discussed.
Giorgio Brajnik was the lead on the presentation or paper...
Janina: Mentions she may be able to follow up on that particular topic
JF: +1 to Detlev's concern and to the 80/20 rule.
... Are we talking about establishing benchmarks on functional needs?
Jake: The benchmark will concentrate on the 20 percent. The experiment would be on whether it fits well. In the end, a benchmark would be needed.
Composite benchmarks could be viewed across different functional needs. JF: Styling of headings for visual users is a need for one user group vs. others.
JF: Different users have different needs. Optimum for each user group. The developers would need to merge the profiles together for an end product. Real needs and real users.
<Zakim> Lauriat, you wanted to mention the lowest-score-applies idea we've discussed now and again
Shawn: concern about consolidated score is valid however the same is said for score for total. Perhaps a lowest-score applies idea, where the lowest score is the consolidated score. If you want a better score, raise that score for that particular functional need / outcome
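A minimal sketch of the lowest-score-applies idea Shawn mentions, with hypothetical functional-need categories and scores (none of these numbers come from the meeting):

```python
# Lowest-score-applies: the consolidated score is the minimum across the
# per-functional-need scores, so the only way to raise the overall score
# is to improve the weakest area. Categories and values are invented.

scores_by_need = {
    "without vision": 92,
    "with limited vision": 88,
    "without hearing": 95,
    "cognitive": 71,
}

consolidated = min(scores_by_need.values())
weakest = min(scores_by_need, key=scores_by_need.get)
print(consolidated, weakest)  # 71 cognitive
```

Unlike an average, this consolidation cannot mask a poor score in one functional area behind strong scores elsewhere, which is the concern Janina raised about a single total.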
Rachael: I have worked with an org that uses a Likert scale with WCAG 2.1 and has scored a lot of websites. In this situation 5 = Pass in 2.1, since it's a hard pass or fail. When broken down by functional areas, though, the #
<Zakim> bruce_bailey, you wanted to ask Rachael how many sites she scored?
Bruce to Rachael : How many have you scored? Rachael: Two thousand sites scored.
<jon_avila> was the scoring based on automated and manual testing?
Rachael: It was a merged version of automated and manual per Success Criteria
<JF> +1
<sajkaj> +1
Shawn: Total score and individual score level of testing needs to be looked at.
<KimD> +1
<bruce_bailey> +1
<jeanne> +1
<Detlev> not sure what that would mean on the detail level...
<Makoto> +1
<OmarBonilla> +1
Detlev: Filtering of results for different user groups. A current technique: h1-h6 for heading levels, to benefit blind users. But how is this related to other user needs?
<Detlev> OK, remember that
JF's heading example talked to functional user needs.
<Detlev> Got you (will take over scribing)
Shawn: I.e. use of screen readers and impact of headings for screen reader users vs. visual markup of headings.
<Detlev> scribe: Detlev
<ChrisLoiselle> Thanks Detlev!
Shawn: Describing the way functional user needs relate to particular tests / tasks
<Zakim> jeanne, you wanted to ask Jake to explain where benchmarking by functional User need is by guideline or by overall task completion?
Jeanne: Jake do you envision benchmarking on a guideline or at a task completion level?
Jake: Its on the task completion
level - we need to create examples to show people how it
works
... Aspects of headings would be at the qualitative level, while task completion would be at the quantitative benchmarking level
... tasks have to be very specific and fine-grained (find a telephone number, find a price)
... so you can measure whether the user can complete the process, not whether it is implemented well technically
<ChrisLoiselle> Per NNgroup , Benchmarking is a method https://www.nngroup.com/articles/ux-research-cheat-sheet/
Jake: all well documented on the links to NN Group stuff above
Jeanne: Put benchmarking into
diagram - is that represented correctly?
... it's by task completion, not by guideline
Shawn: talking about extra NN group link above
Jeanne: I like that it is later in the process - my concern is not wanting task completion testing only at the end of the dev process
... there is a risk that people move resources from the design part to the testing part of the process
... Going back to architecture: We start with user needs /
seven functional needs - we have agreement that in the
conformance section in the end product we will have a score by
functional need area
Shawn: ..as a result of the benchmark approach
Jeanne: someone using a product can then see how guidelines relate to functional user needs
Shawn: if you have a particular
guideline mapping to 3 different functional needs we would need
to be able to assess how the content measures up against each
of these
... that's why that definition should be informative, to give more flexibility
Jeanne: We could normalise individual guidelines by eliminating those that didn't apply, so they can all feature in a total score
Detlev: an example would help to understand this
Jeanne: will look for an
example
... was from TPAC session in France
... Alastair and Wilco were involved
Shawn: Have we arrived at the action items for this to move on to next agenda items?
<jeanne> drop item 1
<jeanne> drop item 2
<jeanne> older http://mandate376.standards.eu/standard/functional-statements
Jeanne: we have different
functional user needs, here is the older one
... reads User Accessibility Needs
Bruce: The US comparison to 508 requirements?
Jeanne: just the list of user
accessibility needs
... cognitive disabilities were broken down more
Bruce: will take a minute
Jeanne: Lets talk about concerns about this list
<Rachael> [the score] needed to be useful varies greatly. For example, the score needed for someone who is blind to use a site is much higher than for someone who experiences seizures. So if that approach is taken, we will have to normalize scores at the testing level across functional areas.
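Rachael's point about normalizing at the testing level could look something like this sketch, where each functional area has its own "usable" threshold and raw scores are rescaled against it. The areas and thresholds are invented placeholders, not proposed values:

```python
# Sketch of per-functional-area normalization: a raw score is rescaled
# against that area's (hypothetical) usability threshold, so 100 means
# "meets the bar for this group" regardless of how high the raw bar is.

THRESHOLDS = {
    "without vision": 95,   # per Rachael, blind users need a higher raw score
    "seizure safety": 70,   # invented numbers, for illustration only
}

def normalized(raw_score, area):
    return min(100.0, raw_score / THRESHOLDS[area] * 100)

print(normalized(95, "without vision"))   # 100.0 -- meets that area's bar
print(normalized(65, "seizure safety"))   # just under that area's bar
```

After this rescaling, scores from different functional areas become comparable and can feed the same aggregation, whatever form that aggregation takes.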
<CharlesHall> Map of Functional Needs to WCAG SC - EN 301 549 Annex B https://docs.google.com/spreadsheets/d/1W5CSvU4XxWXNneu4ibokjcYUCsG386xL1rGOiTrDvt8/edit?usp=sharing
<CharlesHall> Disabilities https://docs.google.com/spreadsheets/d/12wcZh1SgnL52Sz6gYHoLKyWQi5viAMv28kmgnOv06-k/edit?usp=sharing
Jeanne: its missing some key disability areas
<CharlesHall> Coga Functions https://docs.google.com/document/d/1QsiD0Y0lLCXvbmOOC4-EPf-2lFEPoEMaqNomQtPzBQI/edit?usp=sharing
Jeanne: color blindness addressed as lack of colour perception
<jon_avila> It also doesn't address with limited vision and with limited hearing and multiple disabilities
Jeanne: cognitive issues lumped together
<bruce_bailey> Here is the place in the Revised 508 Standards:
JonA: Addresses situation of limited vision and limited audio perception
<jon_avila> CVAA
<PeterKorn> CVAA - Communications and Video Accessibility Act of 2010
Jeanne: Can you find that CVAA Jon, for the record?
<PeterKorn> Or more properly, The 21st Century Communications and Video Accessibility Act
<bruce_bailey> We use: Without Vision, With Limited Vision, Without Perception of Color, Without Hearing, With Limited Hearing, Without Speech, With Limited Manipulation, With Limited Reach and Strength, With Limited Language, Cognitive, and Learning Abilities
Jeanne: Makoto, any further gaps you know of?
<jon_avila> https://www.law.cornell.edu/cfr/text/47/14.21
<bruce_bailey> So 508 is nine categories, 302.1 through 302.9
<jon_avila> (ii) Operable with low vision and limited or no hearing. Provide at least one mode that permits operation by users with visual acuity between 20/70 and 20/200, without relying on audio output.
Jeanne: (reading from list
CVAA?)
... then further breakdown regarding availability of
information
<Fazio_> Operable with prosthetic device seems complicated to test
<CharlesHall> operable without tactile sensory information (i have touched the screen hard enough)
<scribe> ...(continues reading)
JF: The thing about CVAA is it keeps using 'operable' - in WCAG we have four principles; we also need to think about perception
... CVAA assumes you can perceive
<bruce_bailey> (2) All information necessary to operate and use the product, including but not limited to, text, static or dynamic images, icons, labels, sounds, or incidental operating cues, [shall] comply with each of the following, assessed independently:
Jon: next section in CVAA mirrors perception, also references seizures
JF: Lots of overlap between CVAA, Section 508, and airline requirements as well
Makoto: Japanese standards have the same kind of list of needs, need to look for that
Shawn: How do we want to break this down is the main topic here - then reference sources
Jeanne: need to look more broadly at other countries (South America, Africa)
<CharlesHall> and the list needs to include ‘intersectional needs’
Jeanne: there is a lot of overlap - how do we resolve the conflicts and have a list of our own that can be used and is acceptable? JF raised the harmonisation issue before
CharlesH: did you identify contradictions or a lack of harmony? We want to extend the lists rather than contradict them
Jeanne: There are minor places of disagreement
Shawn: It's mostly for the writing process: these are the ones we pick for functional outcomes, methods, and tests - so we need to draw up a list - it could be referenced; are there examples of how that is done in WCAG?
<KimD> And mental health issues (anxiety, etc)
Jeanne: Vestibular disorders: when these lists were drawn up, this was not on anyone's radar then
<Fazio_> Our COGA Content Usable guide has a lot of that too
Jeanne: the design pattern creating a problem for vestibular disorders started perhaps 5 years ago
<Fazio_> for instance PTSD depression etc, all have cognitive impact that can be aggravated by ICT design
DavidF: We are consolidating the stuff out there, not contradicting things out there
<Fazio_> we have supporting research as well
Janina: make it easier to add to the list as new knowledge emerges
<Lauriat> +1 to Kim & Fazio
DavidF: There's a lot of work done in the COGA TF on this; it identified new user needs that are not yet covered
... dyscalculia, mental fatigue, and others
Jeanne: David Swallow did interesting work on what can be done to reduce anxiety
<jon_avila> The cognitive category could really be broken out and expanded to include things like perception of emotion, anxiety, distraction, etc.
<davidofyork> I am still on the call, hi! That's great to hear, I didn't know that.
DavidF: Wording can set the wrong tone; there is a lot of research on what triggers anxiety and prevents proper use
... some people just can't perceive
... Lots of things that have not yet been included in WCAG
Detlev: Should be as simple as possible
<Rachael> 1+
<Lauriat> +1
DavidF: We have broken that down in categories, so Silver should accommodate that
<jeanne> +1 to including the detailed cognitive needs in WCAG3
CharlesH: To Detlev: Do you think info architecture could solve that concern? If one is selective of, say, 25 categories and only picks
<Fazio_> We can benchmark neuropsychological eval categories
<Fazio_> So we won't be reinventing the wheel if necessary
Jake: Still wondering what results we want to deliver - if all the different functional needs should be covered, the effort will be far too high - it would take a lot of time to include all the needs in the testing outcome
<Fazio_> +1 to Jake
<jeanne> +1 to Jake about putting more effort of the functional needs analysis into the guideline writing and add to benchmark testing.
Jake: the number doesn't matter if we create a baseline, we need to provide the baseline for benchmarking - we can include the cognitive needs in that benchmark so it will be covered
<Fazio_> For instance visual complexity, low contrast, etc create mental fatigue - a COGA issue
Jake: it is our work to create the benchmark that is inclusive of these needs so testers can use that - the tester then does not need to know about it in any detail
<Fazio_> So, grouping would be effective
<jeanne> +1 to put the burden on us and not the individual testers
Rachael: it would be good to put together a tree structure, but perhaps not to decide on grouping right now - too early
<Lauriat> +1 to Rachael
<Fazio_> should we paste content usable link?
Jeanne: Would be helpful to have some of it now - we should put the burden on us (+1 to Jake), not on individual testers
<KimD> +1 to Jeanne, Jake - Functional needs is for us to figure out
<Lauriat> +1
<kirkwood> +1
Jeanne: would be good to have a draft version when writing the individual guidelines - maybe we can have a separate functional needs note that anyone could use
<CharlesHall> +1 to central doc for all groups
<Fazio_> I'll help
Janina: agree with MichaelC to start writing an extended list of functional needs
<CharlesHall> if you create a doc, i will contribute to it
Shawn: We got David F and CharlesH as volunteers
<Zakim> Lauriat, you wanted to mention grouping as a way of showing our work.
Shawn: the grouping is important to have a concise list that can be expanded
<Fazio_> we have 2 EU peeps in COGA
<Fazio_> we can ask them also
Jeanne: Can anyone in the EU or elsewhere contribute so it is not US-centric
<PeterKorn> Regrets. I need to leave early
DavidF: We have discussed how the
different user needs contribute to fatigue, how to aggregate
that - that would require more research
... many aspects (white space, contrast etc) can contribute
<CharlesHall> there are also issues like depth perception for XR
Jeanne: Anyone else who can contribute?
<Lauriat> +1 Charles, good example.
Jeanne: anyone at FAST (Framework for accessible specification of technologies)
MichaelC: It is a core that should inform different guidelines
<Fazio_> being blind in half of each eye having depth perception is contingent on cocking my head and angling my eyes
Jeanne: Recap: We are pursuing adding scoring by functional need types, contributing to a total score, working through examples of how that could work; and creating a group to draw up a comprehensive list of user functional needs
<sajkaj> Fast Intro: https://www.w3.org/2019/Talks/0516_FAST_MC/ Checklist: http://w3c.github.io/pfwg/wtag/checklist.html
Chase task based scoring (Giorgio B. / Janina)
<jeanne> +1 for concrete examples
Jake: Would like to create examples so people get clarity on how this works - it is only with examples that you start seeing the flaws - check how quantitative and qualitative can be merged
Jeanne: ping me on these
examples
... any issues reg. functional needs to be still tackled?
<CharlesHall> the complexity of intersectional needs or the conflict of needs is a big challenge
JF: We have to expand it beyond seven or nine; what is the right number of categories? The risk is that it gets too granular - 20-30 gets to be too much - we need to find the right balance between the issues and the needs of content creators
Jeanne: adds this to Wiki list of outstanding issues
Janina: We resume in an hour
<bruce_bailey> scribe: bruce_bailey
The objective of the next two hours is to take pieces from the last 36 hours and put them together to get a substantive list of what still needs to be done
Jeanne: i have updated
diagram
... outline in red is functional user needs
... and added arrows for what informs what
<ChrisLoiselle> could we have the link for the diagram again?
Jeanne: still a little rough, but should be better
<jeanne> https://docs.google.com/drawings/d/1hYzmiqrvNo_ymuXbtV5P5oKiyKIJRpevkHQ-KDD8l4o/
link to new illustration
scribe: I took out color, and will be taking out arrows
... based on the call yesterday, split into two routes of testing, which need names
... in the conformance process, we have task based pass
... many orgs will do both, but not all will want to
<Fazio> I'd assume they'd do what's cheapest
can both approaches work?
John Foliot: it will be a combination, especially at launch
scribe: with new pages, less user testing and more page based approach
<Lauriat> +1 to JF, though I think we can still express the overall results in terms of tasks.
scribe: a functional walkthrough is useful, but an old one (like from 2007) is not really useful
<Fazio> +1 JF concern about dated walk throughs
scribe: so old, dated walkthroughs should not be part of scoring
<CharlesHall_> so does scope include a date?
Jeanne: I like what you raise about maintenance being an important issue; that has not been part of this
... is maintenance more traditional than task based?
JF: That kind of task based testing has not really been happening very frequently today
... the focus is on "this week's updates", for example a new blog post
<sajkaj> I think John's example is talking about Challenge #3 from the Challenges doc
JF: we know the blog post is published in an accessible setting, but the author's prose might use "click green icon on left"
Jeanne: John, in your experience, what do clients do today? Say a bank asks for a review of client service. How do you scope it?
JF: We have a range of options: automated and manual scanning, expert review, or screen-reader user acceptance testing
... so we dialog with the customer about what problem they are trying to solve
Jeanne: suppose customer asks for just "create new account" how do you scope that?
JF: We set expectations and negotiate around what is a representative sample.
Jeanne: Same question to TPG.
Charles (not for TPG): a cognitive walkthrough typically does not look at individual content
<Fazio> We do task based scripts put together with input from owners intent
<Fazio> plus wireframes
scribe: so we could look at the scope of the claim, and it is not as time sensitive because it looks at process and not editorial content
<Lauriat> +1 to Charles
Jeanne: that helps me; when I worked at TPG, customers really wanted TPG to only look at the new section
... we would scope out tasks for the new section, and representative pages in that section.
... so we did a whole-page look and a task-flow look, and then tested with different assistive technologies.
<CharlesHall_> essentially, a new cognitive walkthrough evaluation would be required if the task changed or the pattern changed, but not if there are routine editorial maintenance releases
Jeanne: Client says "this is what i want to test" and TPG working out what that implies
<davidofyork> (Sorry, I'm having mic issues). Jeanne, I'm rarely involved in scoping anymore, but from what I gather the process is similar to what you've described.
JF: for companies like mine, we have dashboard-like tools, but customers also want feedback to limit risks
... so they need dashboard views in addition to our testing
<Fazio> to Jake's earlier quant qual direction automated test provides quantitative data to scope qualitative testing
JF: we have APIs and tools to put into their processes, but we don't have tools for editorial evaluation
... like "click on the red header" instructions - that needs human evaluation
<Lauriat> Big +1 to both levels of testing and tracking!
JF: a large company needs to have confidence about its divisions
Jeanne: Would you say this is a reflection of the traditional viewpoint vs. the task based side?
JF: There is a 3rd aspect: the dashboard spider tools
Jeanne: That is the traditional side; it does not care about tasks
JF: The human based testing, if used to assess initial score, how does that age?
Shawn: Your example of initial human testing is a wide, comprehensive sweep [JF agrees]
... so with minor updates over the years, does manual testing happen for the incremental additions?
JF: Yes, sure. Gives example of migration to Drupal. Lots of testing at beginning, but fewer editorial controls over time.
Shawn: Thinks teams like those at Google, work a little bit different, so accessibility reviews are more contemporary
<Fazio> +1 to JF's concern
JF: Consider NYS school system. Set up as robust as possible, but then editor controls are not strong after the initial rollout
John Kirkwood: Agreed, this is a real issue with the network I oversee.
Shawn: Can we have a date stamp
of last inspected, like one sees in an elevator, can we scope
those "sell by date"
... for a traditional website, an initial review could work for years. With other technologies, say VR, a six-month-old review might not be meaningful.
... We definitely need clarification that review does not cover
content after X date, but how long review should be good for is
an issue.
JF: With WordPress, site content writer might not update control panel setting.
<sajkaj> Apropos this discussion, I still use lynx
<Fazio> It helps comply with ISO
<Fazio> WCAG is ISO also
<Zakim> jeanne, you wanted to say that it isn't a business of standards to say how long conformance can last, that is regulations
John Kirkwood: We need to be very careful about getting into maintenance process. Our stuff needs to be timeless if possible.
<Lauriat> +1 to jeanne, especially if we define compliance at a lower level like a task.
Jeanne: I am not thinking that it should be up to us about how long a review should be good for.
<KimD> +1 to Jeanne
<JakeAbma> +1 to Jeanne, not up to us
Jeanne: We can say what the basis for accessibility is, and it is up to the site owner and regulators to worry about expiry.
... I would like to go back to the time stamp approach we
agreed to last month.
JF: Our clients are looking for a
true value report.
... a 3 year old usability study is not valuable
<sajkaj> Why is that up to us?
<KimD> +1 determining expiration date or value is not on us.
Jeanne: The usability study has a date stamp, so does not factor for the score today.
JF: Industry wants up to date dashboard.
David Fazio: Can prior tests factor into the current day score?
<Lauriat> -1, that's a different level of testing and reporting.
Jeanne: Lets have the discussion about legacy sub tasks.
JF: Anything that affects the score has to be figured out now.
DavidF: We have the issue that WCAG 1.0 conforming implies 2.0 conforming, but this might not work with 3.0.
<Lauriat> -1 to tying compliance to litigation as a matter of how we define conformance.
Jeanne: Do we want to include earlier human scoring?
Are legacy subtasks included in a project?
<JF> Q: when does current become legacy?
<JF> and why?
ShawnL: In the course of path-based testing, you have to test sub-paths. Sometimes it makes sense to test them separately.
To the question of when current becomes legacy: it is case-by-case, and not important if we have transparency.
<sajkaj> +100 to Shawn
scribe: it does not fit in terms of conformance.
JF: We are working on standards,
so a standard way of approaching a problem.
... All I am saying is that the value of human testing
diminishes over time.
... if an author fixes headings, scores go up. But what impact does a 2018 evaluation have on the dashboard score?
... Ex: leading up to v3 of a website, a company does extensive user testing. The company does not want to lose that work.
Jeanne: If the company does a new conformance statement, those old tests cannot be counted.
... Asking about traditional versus path based assessment.
DavidF: Even pages that have not changed are affected by updates to hardware and browsers
KimD: I am hesitant to say that something has an expiration date. Mom-and-pop pizza site could be fine years later.
<kirkwood> +1 to Kim
KimD: Many things go out of date, but that should not be a flat assumption.
<jeanne> +1 to Kim. Pizza shop not changing and transparency
<sajkaj> What Kim is saying is why I can still use lynx successfully much of the time
KimD: I like what Shawn said about transparency and exposure, so if company cites years old report, that is on them.
<JF> @janina, much, but not always
<kirkwood> +1 to disclosure
<Fazio> then we can't factor it in I would think
It is all about disclosure. Someone challenging the claim sees the date of testing.
JF: As mentioned, without changing any code, the website might no longer work well, since browsers change.
... Mechanical tests could show how things kept working over time.
<kirkwood> agree with KIm
<Lauriat> +1 to Kim
KimD: Answer for me is disclosure.
<sajkaj> +1 to Kim
<jeanne> +1 for not including old tests in the score. It doesn't make sense
<kirkwood> disclosure allows courts to decide
<Lauriat> +1 to Kirkwood as to why
DavidF: This is still a problem, example with trying open old documents in new versions of software.
Jeanne: I would like to propose that we allow people who want to make an overall report to include the date when a particular test was done.
... That lets us report an overall score with full transparency about when each review was done.
Shawn: I want to highlight two
example of use of conformance.
... 1) conformance used to make a judgement call, say in a court
<kirkwood> think we should have a use case of ‘archived material’ included
Shawn: 2) the other setting is clients asking "where am I now" -- conformance needs to support that
... if we haven't tested this part in six months, with NVDA and browser updates, that might not be recent enough.
<Zakim> JF, you wanted to ask if regulators have been polled on that
JF: The other use case is the regulator use case.
... Have we talked to the regulators? They are key stakeholders.
... Accuracy, currency, and timeliness of the report are very important.
... A judge in court would not be impressed with a years-old report.
John Kirkwood: The robustness of current standards is why they have been so well respected...
scribe: if we are talking about large data sets, say archives, if it was accessible then, it is probably still very accessible
<KimD> +1
<Lauriat> +1
scribe: can't overlook foundations for new stuff
... archived materials are a huge amount of data
<sajkaj> +1 to John K
Detlev: JF asked about regulators; thinking about EN 301 549, which trickles down to 28 member states, based on WCAG 2.1 and 50 success criteria
scribe: I am worried that we get too complex, graded depending on many things and timelines
... I wonder how this will work as a benchmark for 27 countries. Big question mark for me.
<JF> +1
Shawn: JF and DetLev's comments remind me that regulators are part of our core audience; we need to reach out to them and find out if our cases work for them.
<KimD> +1
<Zakim> Lauriat, you wanted to speak to the stakeholder point JF raised, a very good reminder.
Shawn: those writing regulation and those needing to use regulations to make judgement calls
Jeanne: Would like a straw poll /
temperature on this topic.
... what would be a statement that could summarize this
issue?
<jeanne> Do we want to include older results in current conformance with a date stamp?
<sajkaj> +1
<kirkwood> ! yes with a date stamp
<Rachael> +1
Jeanne: Do we want to include older results (with a date stamp) into current conformance reports?
<kirkwood> +1
<Fazio> depends if it factors into new score and how
<JF> if the date has no impact on the final score, it's window dressing
<Lauriat> +1 to Fazio.
Jake: I still do not understand this question.
<Detlev> it depends on whether the old stuff can be seen, cross-checked - provided it hasn't changed, basically
<KimD> +1
Jake: a time stamp or not does not change the responsibility to say what is accessible or not
<Detlev> that is Jake, Bruce!
<jeanne> +1 with a date
Jake: in the Dutch government, with EN 301 549, agencies have to affirm every year.
<OmarBonilla> Too many external factors affect the validity of past reports, including the current state of ATs, browsers, etc. For some industries, such as those that serve governments, those governments may work on old tech and require considerable backwards compatibility. Others may be more bleeding edge. The time stamp is fine, but the significance or the "atomic decay" as it were, the rate of that decay is affected by a lot of things.
<kirkwood> it does matter. it shows the management of accessibility, frequency and the correlation with technology and state
DavidF: I think the question is if old tests factor into the new contemporary score.
<kirkwood> and the courts understand it
<Rachael> I would like a statement to the effect that you can include previous reviews with a date stamp but ideally do the complete test. Then add this discussion summarized in the document.
Jake: It is not an issue. If you claim 100% but it's 50% because the old test is no longer valid, that is your responsibility.
<sajkaj> Murder doesn't invalidate laws against it
DavidF: Can an old automated test from 2018 factor into a new 2020 Silver report?
<Rachael> If the code baseline hasn’t changed, why can’t you?
Jake: It is up to the person making the conformance claim to be accurate.
DavidF: But we have the test and score.
<Detlev> BRUCE, it is JAKE that has been speaking, not me!!
<JF> @Rachael, because your code maybe hasn't changed, but Firefox and NVDA have...
Jeanne: Score is for what you are testing, not the overall site.
David: Say headers were accessible
in 1998, but in 2021 that is not enough.
... we have new standards and expectations.
Jeanne: No one proposing that.
David: JF points out that people want to make use of old testing.
<sajkaj> scribe: sajkaj
df: Notes web is just one
factor/page in EN conformance
... Worried about whether we're building a solid foundation for
3.0 conformance claims
sl: We're working through complex
interdependencies, so it's confusing still.
... We don't yet know which work, and which don't.
... And we need to get good at the complexity before we can
simplify
... Believe too many variables assumed in Jeanne's straw
poll
<Fazio> tests/claims should be dated yes
js: We agree we want transparency
<KimD> +1
<Fazio> yes +1
js: We don't want old results in new score, but we don't want to invalidate old results
<KimD> +1
sl: Don't believe we're in the business of assigning and predetermining expiration dates
<Fazio> Can we poll member companies of their preference?
<Fazio> do they want to factor in old scores? will they object if we say no?
<OmarBonilla> +1 to us not being the ones to determine expiration dates. The rate of "decay" of the old tests is dependent on too many other factors for us to make a blanket call on that.
sl: Believe we agree on
expiration, but unclear about underlying problem
... We could have "date last tested" with appropriate
granularity
... Could have some kind of decay notion
... Or, it's just the claimant's responsibility
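[Scribe note: a minimal sketch of the "decay" option Shawn lists above, in which an old test result is weighted down as its test date ages. The half-life value and all names here are illustrative assumptions only; the group did not agree on any formula.]

```python
from datetime import date

# Assumed half-life: a result's weight halves after one year.
# This value is purely illustrative.
HALF_LIFE_DAYS = 365

def decayed_weight(tested_on: date, today: date) -> float:
    """Weight of a test result, decaying with the age of its date stamp."""
    age_days = (today - tested_on).days
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def decayed_score(results, today: date) -> float:
    """Weighted average of (raw_score, tested_on) pairs.

    Fresh results count fully; stale results count less, rather
    than being dropped or expiring outright.
    """
    pairs = [(score * decayed_weight(tested_on, today),
              decayed_weight(tested_on, today))
             for score, tested_on in results]
    total_weight = sum(w for _, w in pairs)
    return sum(s for s, _ in pairs) / total_weight if total_weight else 0.0
```

A result tested a year ago would contribute at half weight; the rate of decay is exactly the kind of parameter Omar notes depends on external factors (AT and browser churn), which is why the group leaned toward leaving it to the claimant.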
df: Believe we should poll,
though not sure how feasible
... Believe people will want backward compatibility
<Detlev> +1 to DavidF
sl: Believe we can have some backward compatibility in some of our testing, but don't believe we can do that in conformance
df: I'm thinking at task level
<Detlev> * Janina, you have df both for David Fazio AND for me...
df: for each task based test
sl: We don't want to make it difficult to move from 2 to 3
df: If date last tested factors ...
sl: Needs to point to 3 if it claims 3
<Zakim> sajkaj, you wanted to suggest it's up to the claimant when they move from 2.x to 3.x
<Fazio> Yes
<jeanne> Options:
<jeanne> 1 Date lasted tested and the version of WCAG included in the claim
<jeanne> 2 Date last tested decaying over time included in an overall score on a conformance claim
<jeanne> 3 The responsibility of the client to represent their claim truthfully.
<JakeAbma> 3
<Lauriat> 3
<Rachael> 1 (but 3 is also true)
sj: I like both 1 and 3, but believe our job is only 3
<KimD> 1 (although 3 is true also)
<jeanne> 1
<Lauriat> +1 to Janina, why I voted that way
<CharlesHall_> 1
<OmarBonilla> 1, plus 3 is true
<kirkwood> 1
<Detlev> can't say
<Fazio> Option 1 is less messy
<Detlev> I suggest a carefully worded survey
<bruce_bailey> 1
<Fazio> I have a feeling public will want option 2
sj: Suggests 1 and 2 not in conflict with 3
<Lauriat> +1 to Detlev, though I think we may want to prototype 1 & 2 to see how well they work?
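[Scribe note: a minimal sketch of what a claim record under option 1 above might carry: the date last tested and the guideline version, recorded for transparency without affecting the score. Field names are hypothetical, for illustration only.]

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ClaimEntry:
    """One scoped result in a conformance report (illustrative shape)."""
    scope: str          # e.g. a task path or page sample
    score: float        # normalized test score, unchanged by age
    wcag_version: str   # version the tests were run against
    last_tested: date   # date stamp, disclosed but not scored
```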
js: Will make a more carefully
worded survey
... So, believe we were asking whether we wanted two separate
tracks, traditional and/or task based
sj: Clarifies wcag-em; task-based; or combo
js: Yes, but maybe separately reported
df: Not sure how much task
testing we'll actually get
... Difficult to organize and do correctly with real users
sl: Clarifies that task based testing isn't necessarily user based
df: OK, my misunderstanding. That makes sense.
sl: Q about 2.1 conformance
claims: Is any distinction required ...
... Do we need to take today's testing approaches into account
in scoping 3 conformance for page based
<Fazio> I think the public expects it
sl: Don't want to have traditional compete with task based in conformance
<Fazio> Understanding WCAG says similar
sl: If the two ways of arriving
at a claim compete, that's bad
... We should specify how the conformance is described
df: Public expects this to be as
easy as possible, and the transition to be as smooth as
possible
... Notes Understanding says it's possible to meet criteria but
still be unusable for some pwd
sl: We don't need to specify how
testing is done,
... But specifying how conformance is claimed will set up the
structure to expose how it was done
<Fazio> Just keep in mind no one wants to run 2 audits of the same pages, elements etc
<OmarBonilla> +1 to lack of desire for redundant testing
sl: We're just supporting a definition of how you show conformance
js: There's a lot of capital invested in today's approach, which isn't conducive to task based testing
sl: why not?
js: Because it would take
investment to retool for task path testing
... We want to support people's way of working
<Fazio> which gives more legal credibility
sl: Not worried about our testing becoming outdated by 3
js: Interested to hear about that
sl: Notes that log output can be associated with test flow and abstract data from that
<Fazio> lol
sl: If you define the path, then
your page based testing provides your data
... Also supports describing impact on users when a particular
failure on a particular page is noted
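[Scribe note: a minimal sketch of Shawn's point that defining a task as an ordered path of pages lets task-level results be derived from page-based test output. All page names and tests are hypothetical.]

```python
# Page-based test output: page -> list of (test name, passed) pairs.
page_results = {
    "/login":    [("contrast", True), ("labels", True)],
    "/checkout": [("contrast", True), ("autoplay", False)],
}

def task_passes(path, results):
    """A task passes only if every test on every page in its path passes.

    Pages with no recorded results contribute nothing here; a real
    scheme would have to decide how to treat untested pages.
    """
    return all(passed
               for page in path
               for _test, passed in results.get(page, []))

checkout_task = ["/login", "/checkout"]
```

Under this sketch the login-only task would pass while the checkout task would fail on the autoplay result, illustrating how a single page failure can be reported in terms of its impact on a task.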
<Fazio> crickets
js: Can we ask people to do that?
sl: No, but we don't need to
<Fazio> +1 to Shawns flow
sl: We provide the building blocks
js: More vpats?
sl: vpat may adopt some of
this
... It always comes down to describing what I was trying to
do
kd: Asking SL: if we only specify
task conformance, would we get the same results as page based?
... Are you saying task based and traditional would yield the
same results?
<Fazio> +1 again
sl: No, that you can extract task description out of page based testing
kd: If we only test components, we miss stuff
df: Agrees with SL
<Detlev> -q
df: Can meet all kinds of reqs and still end up challenged
<Zakim> sajkaj, you wanted to say extracting task conformance out of page testing assumes all the relevant pages have been tested
<Detlev> +1 to DavidF
kd: Reminds SJ about login
piece
... Adds examples, ...
<Detlev> +1 to Kim
sl: Agrees with the idea, but don't believe we can mandate all the pieces
sl: We can have noninterference testing
sl: For login we can reuse UXS world to show how the task is defined
df: Concerned that people would
focus on a few tasks and miss other aspects of their
pages
... That may make page based more thorough
... We shouldn't lose what works well today
sl: It's already possible to
claim only a few pages for conformance today -- or main
features
... You just can't say the entire site (or app) conforms based
on such testing
... Don't believe page based does that good a job anymore
<Fazio> +1
df: Repeated errors in a part of
pages that no one interacts with shouldn't work against
a11y
... Aware that many things come up through interaction and
aren't really a page in the traditional sense
... It should be transparent how many of a total are tested
sl: We shouldn't narrow the scope
down and avoid what affects users
... We haven't yet figured out the mechanics
... How to go through a given task and what's in testing scope
as you audit
... A footer strangeness has little impact, but an autoplaying
video would make a big mess
<KimD> Both are bugs - that goes to severity
js: Severity goes with task based and not well documented via traditional
sl: Think about both; it impacts individual tests and the overall score
js: Moving to wrap up where we are and what to work on ...
<jeanne> https://www.w3.org/WAI/GL/task-forces/silver/wiki/Conformance_Issues_and_Decisions
js: Above is Jeanne's open issues
tracker
... Asking about "needs more work" parts
... composite rules that could be gamed
... we want generic advice about breaking tasks into path
parts
... Granularity of reporting ...
... How should older testing affect scoring?
js: How large should the functional needs list be?
js: Will survey how companies track over time
<CharlesHall_> and appropriate grouping / hierarchy of user needs
js: +1 to Charles
... Task based vs traditional--where and how to factor
noninterference; no resolution and continue discussion. SL?
sl: Still confused by the framing; if conformance measure is at the task level, it doesn't matter imo
js: Recalls df's point re current tooling
sl: Believe they continue useful and usable
<Detlev> DavidF's point...
sl: It's how you use the output of those tools
js: Asks for additional
actions??
... Recalls functional needs work with FAST
<Fazio> agreed
sl: Believe today we've mostly
reinforced yesterday's scoping discussion -- we need to see how
things fit
... We need to explain our outcomes as best as we can to our
stakeholders for their comment and reaction
js: Thanks all around!!
<kirkwood> best to all!
Present: jeanne, Lauriat, ChrisLoiselle, sajkaj, PeterKorn, Detlev, bruce_bailey, JF, OmarBonilla, Makoto, Fazio, KimD, JakeAbma, MichaelC, CharlesHall, kirkwood
Scribes: ChrisLoiselle, Detlev, bruce_bailey, sajkaj