<pkorn> Testing web-based IRC...
<shari> present+
<pkorn> Semi-present...
<pkorn> (having audio issues)
<scribe> Scribe: Chuck
Jeanne: Had a conversation in the
content meeting and then in the conformance meeting. What do the 2
different sub-groups need from each other?
... Starting with content group: What do you need from
conformance group to help you with your work?
Chuck: We need a conformance model.
:-)
Jeanne: It's a good question.
For those who are writing content: Chuck, Bruce, and several other
people have started writing content.
... Where are you being held up that would be unblocked by
getting something from Conformance?
<silence>
Jeanne: Flipping around. From conformance, what do you need from content group?
JF: On Tuesday (sorry, I missed
some of the call due to airport travel), what we started talking
about was rolling up SC into the larger functional
needs.
... We were looking at multiple SC that are related in some way,
shape, or form to textual alternatives for non-text
content.
... Sean was rolling this up, identifying 3-4 sc which had an
impact.
... I was thinking about this... at some level we need to think
about what individual SC are going to be worth, points-wise, so that
as the content group comes back with an SC...
... it has to have a worth value attached to it as part of the
scoring.
... We need to have an awareness of what the functional need is
going to be worth (score)...
... If we start from the higher functional level, we can give
points to... or
... do we look at individual SC, give them individual
scores... and add up for the overall functional need?
... Do we start from the pieces or do we start from the
whole?
Jeanne: We haven't yet agreed on what we are going to measure.
JF: In some ways we have, though;
we said that Bronze will be equivalent to WCAG A and AA. We need to
measure that.
... We need to measure each sc which will be part of a larger
score.
PK: <introduction>
Call in user 2: <introductions>
PK: JF, you mentioned points for
different SC. A different cut would be to look at the
testability of the SC. That which is programmatically
determinable will be more amenable in larger sites.
... May be a valuable cut at the content side.
JF: I agree Peter.
Jeanne: You said that the testability is a part of the measurability, particularly for larger organizations.
PK: Any site which is large, complex, dynamic... if you have a site that updates tens of thousands of pages a day, human testing becomes infeasible.
Jeanne: We've been looking at
that over the last years. Several proposals are on the table.
... Which is why I don't want us to jump to the point system
when we haven't figured out what we will measure.
Janina:
<unmuting?????>
... Wondering if it has been determined if we will use a
scoring system or if it's just a proposal.
<KimD> +1 to Jeanne - It doesn't seem like we're ready to figure out the point system quite yet
Janina: People are talking about it as if it's a foregone conclusion.
Jeanne: One of the docs I put in the agenda today... for the conformance model, is a summary of 6 months of work done over the past year.
Jeanne: This is the current
conformance draft.
... What that summarizes is the work we did based on 18 months
of silver research, then the analysis and leading up to silver
design sprints.
... At design sprint we had some potential solutions
identified. Different sub-groups worked on a prototype
conformance model so that it could identify issues.
... How we got to a point system is that we wanted to give
rewards to orgs that did more.
... Not just say "here's the checklist, here's the minimum you
must do. Anything less than 100% means you get nothing."
... We wanted to tie that into changes we made to the information
architecture. We wanted to flatten... we would have guidelines
(including most principles, guidelines, SC), then methods,
... which include the techniques and SC in WCAG 2.1 that are very
technology specific. We want a structure where the guidelines
are technology neutral. Techniques move into methods.
... Guidelines would be technology agnostic. We want to keep
measurement in the technique. That gives a number of different
options, more ways to measure than a true/false SC.
... We've been using Facebook as an example. Yes, we need "this"
guideline (color contrast, for example); we could say "these are
the ways we meet the guideline": traditional techniques,
... or another way. As long as they meet the need, they get the
points.
... We've been talking about numerous kinds of tests <lists
a bunch>
<Zakim> janina, you wanted to ask whether points has reached decision of the group?
Jeanne: Number of different kinds
of tests which make Silver more flexible. Would be helpful for
many different types of orgs.
... That's a high level summary.
Janina: So the answer is "yes" we will do some kind of scoring.
Jeanne: Yes. We did a lot of work
on scoring a year ago. One of the issues we ran into during
feasibility tests is that it didn't work the way we expected it
to. We determined we need different
... kinds of scoring than what exists in WCAG 2.1.
... That's where we left it.
... Part of what we are trying to do is write enough content
with some of these new types of tests (rubric, COGA
walkthrough) so that we could have real data to exercise in the
point system.
... That's where we ran into problems, having solid
examples.
... That's where we were in May. And then there was a push to
stop that and work on conformance.
... We have received a number of proposals for point systems.
... Also have the content group work and move forward
simultaneously, so we don't lose time.
... We are starting to come up with the needs to have data to
test with.
... Where this q started: Is there anything specific that the
content people need from conformance group.
... What the conformance group needs is pretty clear. Need data
to use.
... We set up a 3 step process to evaluate conformance. Pros
& Cons, Feasibility Test of survivors, merge the good
points, and then run against real sc.
... We want data driven and not opinion driven.
Janina: Appreciate the summary.
Jeanne: Very welcome. Things become clearer when we talk about it.
Janina: I don't understand rubric yet.
Jeanne: A common term in
education. Every homework assignment has a rubric. In
elementary school... if you did the absolute minimum you would
get 1 pt. If you did more you get 2 pts.
... If you did a good job you get 3 pts. If you did an excellent
job you get 4 pts. The rubric is the definition of the bands.
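The rubric banding Jeanne describes (1 point for the bare minimum up to 4 for excellent work) can be sketched as a simple lookup. The band names and point values below are illustrative assumptions, not anything the group agreed on:

```python
# Hypothetical rubric sketch: map a quality band to points, following
# Jeanne's elementary-school example (1 = bare minimum ... 4 = excellent).
# The band names are invented for illustration.
RUBRIC_BANDS = {
    "minimum": 1,
    "adequate": 2,
    "good": 3,
    "excellent": 4,
}

def rubric_score(band: str) -> int:
    """Return the points for a rubric band."""
    return RUBRIC_BANDS[band]

print(rubric_score("good"))  # 3
```

The point of a rubric, as described here, is that the bands themselves are defined up front, so two evaluators applying the same rubric should land in the same band.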
Janina: You're welcome.
Chuck: I didn't know that
either.
... Kids have been out of kindergarten for many years.
Janina: I did some research, but didn't find what I needed.
<JF> Develop a point and ranking system that will allow more nuanced measurement of the content or product: e.g. a bronze, silver, gold, platinum rating where the bronze rating represents the minimal conformance (roughly equivalent to meeting WCAG 2 AA), and increasing ranks include inclusive design principles, task-based assessment, and usability testing.
JF: Want to go back to
something.... respectfully disagree. We have things to be
measured, and things to measure with.
... Point #3: We need to have something that is equivalent to
WCAG 2.1 SC. If you meet all of the A and AA, you are roughly
at bronze. In the WCAG 2.x model, you are at 100%
... We know that we will accept less than 100%, but 100% =
bronze.
... If you add up those points will meet 100% bronze. We do
have things that can be measured, and we need to think about
that. We have new SC coming. 16 new in 2.2 (on the
agenda).
... 9 new, 4 additions, 3 updates. They need a value, and will
contribute to the overall points. If we acknowledge that rubrics
and COGA walkthroughs add additional points, that's fine.
... But the existing WCAG can be defined now.
... We can start at a higher level thing. "All text
alternatives can have...."
... If you succeed at all 3, you get the maximum points.
... Chuck said we need a conformance model. What do the
conformance model people need. They need to know the
values.
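JF's proposal, as stated above, is that each WCAG 2.x A/AA SC carries a point value and that earning 100% of the total equals Bronze. A minimal sketch of that idea follows; the SC chosen and their weights are invented placeholders, since the group has not assigned any values:

```python
# Hypothetical scoring sketch: each WCAG 2.x A/AA SC carries a point
# value, and points earned as a percentage of the total map to Bronze.
# The weights below are invented for illustration only.
SC_POINTS = {
    "1.1.1 Non-text Content": 3,
    "1.4.3 Contrast (Minimum)": 2,
    "2.4.4 Link Purpose (In Context)": 1,
}

def percent_of_bronze(passed):
    """Percentage of the Bronze total earned by the SC in `passed`."""
    earned = sum(pts for sc, pts in SC_POINTS.items() if sc in passed)
    return 100.0 * earned / sum(SC_POINTS.values())

# Passing two of the three sample SC earns 5 of 6 points:
print(round(percent_of_bronze({"1.1.1 Non-text Content",
                               "1.4.3 Contrast (Minimum)"}), 1))  # 83.3
```

This is the "chisel it down into percentages" view JF describes later in the call; the open question Jeanne raises is whether the weights themselves can be set before the group decides what is actually being measured.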
Jeanne: Content people are expecting conformance people to provide.
PK: I appreciate something I
heard John say. The doc suggests that base-level bronze = 100%
of WCAG 2.0 AA, and higher levels silver & gold would be
beyond 2.0.
... Is that the consensus position of Silver? That doesn't seem
to address the problem you outlined that came out of 16-18
months of study. Getting all pages that update frequently
doesn't seem possible.
... There's a mismatch between points for meeting WCAG and the
fact that we aren't able to accurately test all pages in any
kind of snapshot to come up with points in the first place.
Jeanne: Correct, that's why I haven't wanted to jump into what John wants to address. I think we have a few things to work out first. The spreadsheet with points will be easy to do.
<KimD> +1 to Jeanne - we aren't ready to get in the weeds about points yet
Jeanne: The hard question is
these issues that are more structural. What do we measure, how,
and will it actually provide benefit. There's a number of
issues we still need to look at.
... I'd like to move on to next agenda.
JF: One of the things. PK, to your
question: we don't have a definitive answer. Mikoto tossed a
number over the wall; we would base the conformance model not
on pages...
... Rather, he mentioned a representative sampling... 40
screens. You would list those 40 in your conformance statement:
40 key components, things, pages... your conformance
would be based on that.
... A representative score.
... An org like Amazon (any large org) that the score would be
reporting would be representative rather than granular. Which
is impossible.
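The representative-sampling idea JF relays from Mikoto could be sketched as below. The sample size of 40 comes from the discussion; everything else (random selection, the seed, the screen names) is an assumption for illustration, and a real sample would more likely be a purposive list of key tasks and pages:

```python
import random

# Hypothetical sketch: base conformance on a representative sample of
# screens (Mikoto's suggested 40), not on every page of a large site.
# Random sampling here is a stand-in; a real conformance statement
# would more likely list hand-picked key tasks and components.
SAMPLE_SIZE = 40

def conformance_sample(all_screens, seed=0):
    """Pick the screens a conformance statement would list and score."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    if len(all_screens) <= SAMPLE_SIZE:
        return list(all_screens)
    return rng.sample(list(all_screens), SAMPLE_SIZE)

screens = [f"screen-{i}" for i in range(10000)]
print(len(conformance_sample(screens)))  # 40
```

This reflects JF's point: for an org the size of Amazon, the reported score would be representative rather than granular, because scoring every page is impossible.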
... Jeanne... what is the unit of measure? It's hard to measure
because we haven't defined the unit of measure.
... We don't know how we will break it out in the spreadsheet,
other than we've been calling it a point.
Jeanne: If you knew the unit of measure, what would you do next?
JF: We would apply that unit of measure against existing sc.
<CharlesHall> i understood the unit of measure to be within the conformance model. i think how that aligns to content is assigning values using that unit.
Jeanne: We aren't ready to do
that yet. Because we need to figure out WHAT we are measuring.
We looked at your proposal... Bruce B. raised an important
point.
... We were giving lower points for lower hanging fruit.
... What are we going to measure and what are the longer term
impacts. We can figure that out before we work on
spreadsheets.
JF: You want to figure out what we want to measure, I want to figure out how we are going to measure and what with...
Jeanne: I think that we'll build on the work that ACT came up with. Tests.
JF: Does each test have same value and worth?
Jeanne: We need to discuss that. Then we'll know what a point is worth. And we can turn that into a spreadsheet that has meaning.
JF: There's 2 ways of looking at
it. It can be a % or it can be a number. Something needs to =
bronze and bronze needs to = WCAG 2.0 AA.
... I've got the beginning of a rubric. Meet these, you are
bronze.
... I've got a starting point. Bronze = 100% of WCAG 2.0 AA.
Can I chisel that down?
... If the 30-some A and AA total up to equal bronze, then I can
chisel down the SC into percentages for each standard.
Jeanne: I think that's overly
simplistic. We have some other issues to solve.
... We work these things out, then we can chisel out the
individual standards.
JF: One of the problems we want to solve is that large websites can have a conformance statement.
Jeanne: I'm kind of wearing a chair's hat and defending a year's worth of work. Could use some help.
PK: I didn't work on it, but I am in queue.
Jeanne: Go ahead.
<jeanne> acl jf
PK: Curious about the notion that
40 samples is the right number.
... Facebook has 1.5B people on the planet.
Chuck: That was just a number thrown out so we could advance.
PK: Why is that the right number? Where did that come from?
JF: it was tossed out to start the conversation. Mikoto is using that in Japan. It's a starting point. Not carved in stone, in steam rather.
PK: John, it sounds like you have been driving off of the statement that Bronze = 100% of 2.0. I thought that Jeanne didn't agree with that, but on the page that's what it says. Is this a final decision?
Jeanne: That's in jello.
PK: That doesn't get to the problem statement.
Jeanne: Correct. That's why I say
it's Jello. It's a goal; we want to give people a path to
Silver. If you are successful today, you will meet SOME level
of Silver.
... And we discussed AA would meet some level.
PK: Do we need a different term?
Jeanne: That might be helpful.
PK: That conformance meant what we did when we looked microscopically, but now need a newer term for a larger site?
Jeanne: We talked about moving
away from the page model altogether. Part of our mandate is
to go beyond the web, and include current and future techs.
... The page model has to go away anyway. That's when we started
looking at a whole-site eval.
... That's why we aren't expecting an exact equivalent of
AA to silver/bronze.
... WCAG AA would roughly map to Silver's bronze level. We can't
go from a page model to a whole-site model and have it match.
That's why I haven't wanted to get into talking about what
Bronze is valued at.
PK: If we are moving away from
page model, notion of evaluating a subset of pages... don't you
need something that applies at the page level distinct from
something that applies at the site level?
... How do you then measure by points or anything else?
Jeanne: The 40 page example was a use case that Mikoto proposed. No consensus on that.
PK: I'm more saying that in any kind of
evaluation, you are doing a couple of things: looking at a subset
of the whole if it's large, and looking at a snapshot in time. When
looking at a subset, regardless of quantity...
... You are still evaluating that subset by presumably looking
at the things you can measure from the sc.
... Or am I not on solid ground anymore?
Jeanne: This is one of the key
issues we were working on when we had to stop. We recognized
what you are pointing out. There was some discussion; I'd have
to review the minutes to re-familiarize with solutions.
... I think they were looking at having some guidelines that
are measured the way we measure today, but others would be
measured by task completion, and some would be process
accomplishment.
PK: When that work gets picked up again, where will it occur? Here?
Jeanne: Yes, as we speak.
PK: Back to my opening comment to
John. Absent enough solidness to start working on points
towards a point total that equals bronze...
... would some quantifying of testability be a useful thing
that the content group can work on to help with the conformance
effort?
JF: Yes, we need to boil it down to unit tests. Which is ACT. And we have a good collection of unit tests which map to WCAG 2.0 AA. What are those points worth?
PK: So... I'm trying to say
something different. Which of these unit tests need what level
of human involvement and judgement to do the test?
... What I can't tell easily is if the alt text is correct.
JF: In my straw man proposal
(that's all it was) I had thought about that. I suggested that
level of effort would be a multiplier in coming up with a
value. Click a button get a score...
... will be worth something. Click a button and human reviews,
worth something more. Broadly speaking.
PK: Not sure I agree with the worth more. Worth for what?
JF: In terms of... one of the
things we are trying to do is, there is some gamification going
on. Someone recommended multiple currencies.
... In that regard, Peter, the more effort you need to invest to
ensure success, shouldn't that be rewarded proportionally?
PK: For what purpose?
JF: For your conformance
score?
... Why?
Jeanne: There are other ways to
value the effectiveness of meeting the needs. Effectiveness
would be another factor. That's in one of the proposals.
... We didn't get to talking about different proposals. I would
like to remind everyone to review the proposals and write up
pros/cons/risks/holes (that which is missing).
... Email them to me, or post them, or put them on Google and send
me the link, so that we can collect these. We have 3 so far. We
will look at these Tuesday evening.
PK: My last thought, John, is a
measure of programmatic testability, or testability-hardness
challenge... standing on its own without any value
judgement... just a strict "how much programmatic testing is
possible"
... would be a valuable model.
... Content effort would be fruitful.
JF: I think you are right. The
ability to do programmatic testing at a low level: there are
certain existing SC that are fully testable programmatically.
But some of the tests have a greater contribution for more
users.
... Language of doc vs. language of content. Where language of
doc is easy to implement and test, and impacts X users. Getting
inline words is slightly harder to test, because human eval is
needed of the text on screen.
... You need to test for words you believe need to be marked
up. Doing so, the impact of doing that right is much more
involved than language of doc. When comparing both SC against
each other...
... one has a greater impact than the other, and has a greater
effort than the other. When I put those together, one should be
worth more than the other. That's the way the logic breaks
out.
PK: Take the worth value
judgement out of it; just say "we've got 2 dimensions": one is
how much effort it takes to evaluate, one is how much it
impacts the customer.
... The second one will be harder. The impact of a mislabeled
word on a page, if the word isn't very important, will be
different.
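PK's framing keeps the two dimensions separate per check (evaluation effort and user impact) rather than collapsing them into a single worth. A sketch of that structure, where the check names come from the language-of-document example discussed above and the numeric scales are invented:

```python
from dataclasses import dataclass

# Two independent dimensions per check, kept separate as PK suggests
# instead of pre-combined into one "worth". The 1-5 scales and the
# ratings assigned below are invented for illustration.
@dataclass
class CheckProfile:
    name: str
    evaluation_effort: int  # 1 = fully automatable ... 5 = heavy human judgement
    user_impact: int        # 1 = minor inconvenience ... 5 = blocks key tasks

checks = [
    CheckProfile("language of document", evaluation_effort=1, user_impact=2),
    CheckProfile("language of parts (inline words)", evaluation_effort=4, user_impact=3),
]

# A conformance model can weight the dimensions however it decides,
# rather than baking a value judgement into the check data itself.
for c in checks:
    print(c.name, c.evaluation_effort, c.user_impact)
```

Keeping the dimensions orthogonal defers exactly the question JF and PK disagree on: whether higher effort should translate into higher worth.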
Chuck: Maybe you can work on a suggestion...
PK: Thanks for the invite. I'm now trying to understand the existing ones.
Jeanne: I echo Chuck's
invitation. Maybe you can jot down some of these ideas.
... Another thing is that there are different types of measures
we can use.
... It's the first proposal in the list <reviewing>
... Called "Scoring Parameters", it's linked on the wiki page.
It's one member's analysis of different things that could be
measured and why.
... On that note, time to end the call.
... Remember the different meetings on Tuesday.
7am ET US on Tuesday will be next conformance meeting.
[Minutes formatted by scribe.perl, Revision 1.154]
Present: jeanne, KimD, JF, janina, CharlesHall, Chuck, shari, Jennison, MichaelC, johnkirkwood
Regrets: JohnKirkwood, Bruce, Angela
Scribe: Chuck
Date: 02 Aug 2019