<pkorn> Testing web-based IRC...
<shari> present+
<pkorn> Semi-present...
<pkorn> (having audio issues)
<scribe> Scribe: Chuck
Jeanne: Had a conversation in the
content meeting and then in the conformance meeting. What do the 2
different sub-groups need from each other?
... Starting with content group: What do you need from
conformance group to help you with your work?
Chuck: We need a conformance model.
:-)
Jeanne: It's a good question.
For those who are writing content: Chuck, Bruce, and several other
people have started writing content.
... Where are you being held up that would be unblocked by
getting something from Conformance?
<silence>
Jeanne: Flipping around. From conformance, what do you need from content group?
JF: On Tuesday (sorry, I missed
some of the call due to airport travel), what we started talking
about was rolling up SC into the larger functional
needs.
... We were looking at multiple SC that are related in some way,
shape, or form to textual alternatives for non-text
content.
... Sean was rolling this up, identifying 3-4 sc which had an
impact.
... I was thinking about this... at some level we need to think
about what individual SC are going to be worth, points-wise, so that
as the content group comes back with an SC...
... it has to have a worth value attached to it as part of the
scoring.
... We need to have an awareness of what the functional need is
going to be worth (score)...
... If we start from the higher functional level, we can give
points to... or
... do we look at individual SC, give them individual
scores... and add up for the overall functional need?
... Do we start from the pieces or do we start from the
whole?
Jeanne: We haven't yet agreed on what we are going to measure.
JF: In some ways we have, though;
we said that Bronze will be equivalent to WCAG A and AA. We need to
measure that.
... We need to measure each sc which will be part of a larger
score.
PK: <introduction>
Call in user 2: <introductions>
PK: JF, you mentioned points for
different SC. A different cut would be to look at the
testability of the SC. That which is programmatically
determinable will be more amenable in larger sites.
... May be a valuable cut at the content side.
JF: I agree Peter.
Jeanne: You said that the testability is a part of the measurability, particularly for larger organizations.
PK: Any site which is large, complex, dynamic... if you have a site that updates tens of thousands of pages a day, human testing becomes infeasible.
Jeanne: We've been looking at
that over the last years. Several proposals are on the table.
... Which is why I don't want us to jump to the point system
when we haven't figured out what we will measure.
Janina:
<unmuting?????>
... Wondering if it has been determined if we will use a
scoring system or if it's just a proposal.
<KimD> +1 to Jeanne - It doesn't seem like we're ready to figure out the point system quite yet
Janina: People are talking about it as if it's a foregone conclusion.
Jeanne: One of the docs I put in the agenda today... for the conformance model, is a summary of 6 months of work done over the past year.
Jeanne: This is the current
conformance draft.
... What that summarizes is the work we did based on 18 months
of silver research, then the analysis and leading up to silver
design sprints.
... At design sprint we had some potential solutions
identified. Different sub-groups worked on a prototype
conformance model so that it could identify issues.
... How we got to a point system is that we wanted to give
rewards to orgs that did more.
... Not just say "here's the checklist, here's the minimum you
must do. Anything less than 100% means you get nothing."
... We wanted to tie that into changes we made to the information
architecture. We wanted to flatten... we would have guidelines
(including most principles, guidelines, SC), then methods,
... which include the techniques and SC in WCAG 2.1 that are very
technology specific. We want a structure where the guidelines
are technology neutral. Techniques move into methods.
... Guidelines would be technology agnostic. We want to keep
measurement in the technique. That gives a number of different
options, more ways to measure than a true/false SC.
... We've been using Facebook as an example. Yes, we need "this"
guideline (color contrast, for example); we could say "these are
the ways we meet the guideline": traditional techniques,
... or another way. As long as they meet the need, they get the
points.
... We've been talking about numerous kinds of tests <lists
a bunch>
<Zakim> janina, you wanted to ask whether points has reached decision of the group?
Jeanne: Number of different kinds
of tests which make Silver more flexible. Would be helpful for
many different types of orgs.
... That's a high level summary.
Janina: So the answer is "yes" we will do some kind of scoring.
Jeanne: Yes. We did a lot of work
on scoring a year ago. One of the issues we ran into during
feasibility tests is that it didn't work the way we expected it
to. We determined we need different
... kinds of scoring than what exists in WCAG 2.1.
... That's where we left it.
... Part of what we are trying to do is write enough content
with some of these new types of tests (rubric, COGA
walkthrough) so that we could have real data to exercise in the
point system.
... That's where we ran into problems, having solid
examples.
... That's where we were in May. And then there was a push to
stop that and work on conformance.
... We have received a number of proposals for point systems.
... Also have the content group work and move forward
simultaneously, so we don't lose time.
... We are starting to come up with the needs to have data to
test with.
... Where this q started: Is there anything specific that the
content people need from conformance group.
... What the conformance group needs is pretty clear. Need data
to use.
... We set up a 3 step process to evaluate conformance. Pros
& Cons, Feasibility Test of survivors, merge the good
points, and then run against real sc.
... We want data driven and not opinion driven.
Janina: Appreciate the summary.
Jeanne: Very welcome. Things become clearer when we talk about it.
Janina: I don't understand rubric yet.
Jeanne: A common term in
education. Every homework assignment has a rubric. In
elementary school... if you did the absolute minimum you would
get 1 pt. If you did more you get 2 pts.
... If you did a good job you get 3 pts. If you did an excellent
job you get 4 pts. The rubric is the definition of the bands.
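The rubric banding Jeanne describes (1 point for the bare minimum up to 4 for excellent work) can be sketched as a simple lookup. The band names and point values below are illustrative assumptions, not anything the group agreed on:

```python
# Hypothetical rubric sketch: map a quality band to points, following
# Jeanne's elementary-school example (1 = bare minimum ... 4 = excellent).
# The band names are invented for illustration.
RUBRIC_BANDS = {
    "minimum": 1,
    "adequate": 2,
    "good": 3,
    "excellent": 4,
}

def rubric_score(band: str) -> int:
    """Return the points for a rubric band."""
    return RUBRIC_BANDS[band]

print(rubric_score("good"))  # 3
```

The point of a rubric, as described here, is that the bands themselves are defined up front, so two evaluators applying the same rubric should land in the same band.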
Janina: You're welcome.
Chuck: I didn't know that
either.
... Kids have been out of kindergarten for many years.
Janina: I did some research, but didn't find what I needed.
<JF> Develop a point and ranking system that will allow more nuanced measurement of the content or product: e.g. a bronze, silver, gold, platinum rating where the bronze rating represents the minimal conformance (roughly equivalent to meeting WCAG 2 AA), and increasing ranks include inclusive design principles, task-based assessment, and usability testing.
JF: Want to go back to
something.... respectfully disagree. We have things to be
measured, and things to measure with.
... Point #3: We need to have something that is equivalent to
WCAG 2.1 SC. If you meet all of the A and AA, you are roughly
at bronze. In the WCAG 2.x model, you are at 100%
... We know that we will accept less than 100%, but 100% =
bronze.
... If you add up those points will meet 100% bronze. We do
have things that can be measured, and we need to think about
that. We have new SC coming. 16 new in 2.2 (on the
agenda).
... 9 new, 4 additions, 3 updates. They need a value, and will
contribute to the overall points. If we acknowledge that rubrics
and COGA walkthroughs add additional points, that's fine.
... But the existing WCAG can be defined now.
... We can start at a higher level thing. "All text
alternatives can have...."
... If you succeed at all 3, you get the maximum points.
... Chuck said we need a conformance model. What do the
conformance model people need. They need to know the
values.
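JF's proposal, as stated above, is that each WCAG 2.x A/AA SC carries a point value and that earning 100% of the total equals Bronze. A minimal sketch of that idea follows; the SC chosen and their weights are invented placeholders, since the group has not assigned any values:

```python
# Hypothetical scoring sketch: each WCAG 2.x A/AA SC carries a point
# value, and points earned as a percentage of the total map to Bronze.
# The weights below are invented for illustration only.
SC_POINTS = {
    "1.1.1 Non-text Content": 3,
    "1.4.3 Contrast (Minimum)": 2,
    "2.4.4 Link Purpose (In Context)": 1,
}

def percent_of_bronze(passed):
    """Percentage of the Bronze total earned by the SC in `passed`."""
    earned = sum(pts for sc, pts in SC_POINTS.items() if sc in passed)
    return 100.0 * earned / sum(SC_POINTS.values())

# Passing two of the three sample SC earns 5 of 6 points:
print(round(percent_of_bronze({"1.1.1 Non-text Content",
                               "1.4.3 Contrast (Minimum)"}), 1))  # 83.3
```

This is the "chisel it down into percentages" view JF describes later in the call; the open question Jeanne raises is whether the weights themselves can be set before the group decides what is actually being measured.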
Jeanne: Content people are expecting conformance people to provide.
PK: I appreciate something I
heard John say. The doc suggests that base-level bronze = 100%
of WCAG 2.0 AA, and higher levels silver & gold would be
beyond 2.0.
... Is that the consensus position of Silver? That doesn't seem
to address the problem you outlined that came out of 16-18
months of study. Getting all pages that update frequently
doesn't seem possible.
... There's a mismatch between points for meeting WCAG and the
fact that we aren't able to accurately test all pages in any
kind of snapshot to come up with points in the first place.
Jeanne: Correct, that's why I haven't wanted to jump into what John wants to address. I think we have a few things to work out first. The spreadsheet with points will be easy to do.
<KimD> +1 to Jeanne - we aren't ready to get in the weeds about points yet
Jeanne: The hard question is
these issues that are more structural. What do we measure, how,
and will it actually provide benefit. There's a number of
issues we still need to look at.
... I'd like to move on to next agenda.
JF: One of the things. PK, to your
question: we don't have a definitive answer. Mikoto tossed a
number over the wall; we would base the conformance model not
on pages...
... Rather, he mentioned a representative sampling... 40
screens. You would list those 40 in your conformance statement:
40 key components, things, pages... your conformance
would be based on that.
... A representative score.
... An org like Amazon (any large org) that the score would be
reporting would be representative rather than granular. Which
is impossible.
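The representative-sampling idea JF relays from Mikoto could be sketched as below. The sample size of 40 comes from the discussion; everything else (random selection, the seed, the screen names) is an assumption for illustration, and a real sample would more likely be a purposive list of key tasks and pages:

```python
import random

# Hypothetical sketch: base conformance on a representative sample of
# screens (Mikoto's suggested 40), not on every page of a large site.
# Random sampling here is a stand-in; a real conformance statement
# would more likely list hand-picked key tasks and components.
SAMPLE_SIZE = 40

def conformance_sample(all_screens, seed=0):
    """Pick the screens a conformance statement would list and score."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    if len(all_screens) <= SAMPLE_SIZE:
        return list(all_screens)
    return rng.sample(list(all_screens), SAMPLE_SIZE)

screens = [f"screen-{i}" for i in range(10000)]
print(len(conformance_sample(screens)))  # 40
```

This reflects JF's point: for an org the size of Amazon, the reported score would be representative rather than granular, because scoring every page is impossible.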
... Jeanne... what is the unit of measure? It's hard to measure
because we haven't defined the unit of measure.
... We don't know how we will break it out in the spreadsheet,
other than we've been calling it a point.
Jeanne: If you knew the unit of measure, what would you do next?
JF: We would apply that unit of measure against existing sc.
<CharlesHall> i understood the unit of measure to be within the conformance model. i think how that aligns to content is assigning values using that unit.
Jeanne: We aren't ready to do
that yet. Because we need to figure out WHAT we are measuring.
We looked at your proposal... Bruce B. raised an important
point.
... We were giving lower points for lower hanging fruit.
... What are we going to measure and what are the longer term
impacts. We can figure that out before we work on
spreadsheets.
JF: You want to figure out what we want to measure, I want to figure out how we are going to measure and what with...
Jeanne: I think that we'll build on the work that ACT came up with. Tests.
JF: Does each test have same value and worth?
Jeanne: We need to discuss that. Then we'll know what a point is worth. And we can turn that into a spreadsheet that has meaning.
JF: There's 2 ways of looking at
it. It can be a % or it can be a number. Something needs to =
bronze and bronze needs to = WCAG 2.0 AA.
... I've got the beginning of a rubric. Meet these, you are
bronze.
... I've got a starting point. Bronze = 100% of WCAG 2.0 AA.
Can I chisel that down?
... If the 30-some A and AA total up to equal bronze, then I can
chisel down the SC into percentages for each standard.
Jeanne: I think that's overly
simplistic. We have some other issues to solve.
... We work these things out, then we can chisel out the
individual standards.
JF: One of the problems we want to solve is that large websites can have a conformance statement.
Jeanne: I'm kind of wearing a chair's hat and defending a year's worth of work. Could use some help.
PK: I didn't work on it, but I am in queue.
Jeanne: Go ahead.
<jeanne> acl jf
PK: Curious about the notion that
40 samples is the right number.
... Facebook has 1.5B people on the planet.
Chuck: That was just a number thrown out so we could advance.
PK: Why is that the right number? Where did that come from?
JF: it was tossed out to start the conversation. Mikoto is using that in Japan. It's a starting point. Not carved in stone, in steam rather.
PK: John, it sounds like you have been driving off of the statement that Bronze = 100% of 2.0. I thought that Jeanne didn't agree with that, but on the page that's what it says. Is this a final decision?
Jeanne: That's in jello.
PK: That doesn't get to the problem statement.
Jeanne: Correct. That's why I say
it's Jello. It's a goal; we want to give people a path to
Silver. If you are successful today, you will meet SOME level
of Silver.
... And we discussed AA would meet some level.
PK: Do we need a different term?
Jeanne: That might be helpful.
PK: That conformance meant what we did when we looked microscopically, but now need a newer term for a larger site?
Jeanne: We talked about moving
away from the page model altogether. Part of our mandate is
to go beyond the web, and include current and future techs.
... The page model has to go away anyway. That's when we started
looking at a whole-site eval.
... That's why we aren't expecting an exact equivalent of
AA to silver/bronze.
... WCAG AA would roughly map to Silver's bronze level. We can't
go from a page model to a whole-site model and have it match.
That's why I haven't wanted to get into talking about what
Bronze is valued at.
PK: If we are moving away from
page model, notion of evaluating a subset of pages... don't you
need something that applies at the page level distinct from
something that applies at the site level?
... How do you then measure by points or anything else?
Jeanne: The 40 page example was a use case that Mikoto proposed. No consensus on that.
PK: I'm more saying that in any kind of
evaluation, you are doing a couple of things: looking at a subset
of the whole if it's large, and looking at a snapshot in time. When
looking at a subset, regardless of quantity...
... You are still evaluating that subset by presumably looking
at the things you can measure from the sc.
... Or am I not on solid ground anymore?
Jeanne: This is one of the key
issues we were working on when we had to stop. We recognized
what you are pointing out. There was some discussion; I'd have
to review the minutes to re-familiarize with solutions.
... I think they were looking at having some guidelines that
are measured the way we measure today, but others would be
measured by task completion, and some would be process
accomplishment.
PK: When that work gets picked up again, where will it occur? Here?
Jeanne: Yes, as we speak.
PK: Back to my opening comment to
John. Absent enough solidness to start working on points
towards a point total that equals bronze...
... would some quantifying of testability be a useful thing
that the content group can work on to help with the conformance
effort?
JF: Yes, we need to boil it down to unit tests. Which is ACT. And we have a good collection of unit tests which map to WCAG 2.0 AA. What are those points worth?
PK: So... I'm trying to say
something different. Which of these unit tests need what level
of human involvement and judgement to do the test?
... What I can't tell easily is if the alt text is correct.
JF: In my straw man proposal
(that's all it was) I had thought about that. I suggested that
level of effort would be a multiplier in coming up with a
value. Click a button get a score...
... will be worth something. Click a button and human reviews,
worth something more. Broadly speaking.
PK: Not sure I agree with the worth more. Worth for what?
JF: In terms of... one of the
things we are trying to do is, there is some gamification going
on. Someone recommended multiple currencies.
... In that regard, Peter, the more effort you need to invest to
ensure success, shouldn't that be rewarded proportionally?
PK: For what purpose?
JF: For your conformance
score?
... Why?
Jeanne: There are other ways to
value the effectiveness of meeting the needs. Effectiveness
would be another factor. That's in one of the proposals.
... We didn't get to talking about different proposals. I would
like to remind everyone to review the proposals and write up
pros/cons/risks/holes (that which is missing).
... Email them to me, or post them, or put them on Google and send
me the link, so that we can collect these. We have 3 so far. We
will look at these Tuesday evening.
PK: My last thought, John, is a
measure of programmatic testability, or testability-hardness
challenge... standing on its own without any value
judgement... just a strict "how much programmatic testing is
possible"
... would be a valuable model.
... Content effort would be fruitful.
JF: I think you are right. The
ability to do programmatic testing at a low level: there are
certain existing SC that are fully testable programmatically.
But some of the tests have a greater contribution for more
users.
... Language of doc vs. language of content. Where language of
doc is easy to implement and test, and impacts X users. Getting
inline words is slightly harder to test, because human eval is
needed of the text on screen.
... You need to test for words you believe need to be marked
up. Doing so, the impact of doing that right is much more
involved than language of doc. When comparing both SC against
each other...
... one has a greater impact than the other, and has a greater
effort than the other. When I put those together, one should be
worth more than the other. That's the way the logic breaks
out.
PK: Take the worth value
judgement out of it; just say "we've got 2 dimensions": one is
how much effort it takes to evaluate, one is how much it
impacts the customer.
... The second one will be harder. The impact of a mislabeled
word on a page, if the word isn't very important, will be
different.
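PK's framing keeps the two dimensions separate per check (evaluation effort and user impact) rather than collapsing them into a single worth. A sketch of that structure, where the check names come from the language-of-document example discussed above and the numeric scales are invented:

```python
from dataclasses import dataclass

# Two independent dimensions per check, kept separate as PK suggests
# instead of pre-combined into one "worth". The 1-5 scales and the
# ratings assigned below are invented for illustration.
@dataclass
class CheckProfile:
    name: str
    evaluation_effort: int  # 1 = fully automatable ... 5 = heavy human judgement
    user_impact: int        # 1 = minor inconvenience ... 5 = blocks key tasks

checks = [
    CheckProfile("language of document", evaluation_effort=1, user_impact=2),
    CheckProfile("language of parts (inline words)", evaluation_effort=4, user_impact=3),
]

# A conformance model can weight the dimensions however it decides,
# rather than baking a value judgement into the check data itself.
for c in checks:
    print(c.name, c.evaluation_effort, c.user_impact)
```

Keeping the dimensions orthogonal defers exactly the question JF and PK disagree on: whether higher effort should translate into higher worth.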
Chuck: Maybe you can work on a suggestion...
PK: Thanks for the invite. I'm now trying to understand the existing ones.
Jeanne: I echo Chuck's
invitation. Maybe you can jot down some of these ideas.
... Another thing is that there are different types of measures
we can use.
... It's the first proposal in the list <reviewing>
... Called "Scoring Parameters", it's linked on the wiki page.
It's one member's analysis of different things that could be
measured and why.
... On that note, time to end the call.
... Remember the different meetings on Tuesday.
7am ET US on Tuesday will be next conformance meeting.
[Minutes formatted by scribe.perl, Revision 1.154]
Present: jeanne, KimD, JF, janina, CharlesHall, Chuck, shari, Jennison, MichaelC, johnkirkwood
Regrets: JohnKirkwood, Bruce, Angela
Scribe: Chuck
Date: 02 Aug 2019