testing with stakeholders
Scribe: Jeanne
Angela: We are adding a section on the Structure of Silver so people can add information to the correct section. We will be able to see the Style Guide in action next Tuesday, when the Accessibility Guidelines Working Group uses it for their testing.
... we recruited a new person to help with Plain
Language.
... I will be mailing out a new request to the Plain Language
team to work on translating some new plain language sections
for the prototype for user testing.
Jeanne: Then it will be ready for
testing with users.
... any ideas for what we should be testing for plain
language?
Charles: One test needs to evaluate outcomes and map them back to the Style Guide: the end result, and whether it conforms to the instructions in the Style Guide.
... we also need to test the style guide itself, to see whether it is understandable and can be followed. We should recruit new people to test it.
Shawn: I was wondering whether we should test if we can rewrite SC by following the style guide.
Charles: A survey might be the right way to get that answer.
Jeanne: I thought we would be setting tasks for people to find information in the plain language prototype and evaluating if they could find it and understand it. I didn't think about testing the style guide because that is for our use.
Angela: I didn't think we would be testing the style guide, but it would help with the naysayers.
Charles: Jeanne has two separate
tests, finding and understanding.
... finding the information is a separate activity from the information itself.
Jeanne: In the broadest sense of testing, considering what we would show to a single person, could we do both?
Charles: Yes, but for statistically valid academic information, we have to separate the two.
... the Style Guide is a set of instructions for how to write. We need to test if they produce the outcome we want. We also want to test if the Style Guide itself is an ideal solution. If the Style Guide says A, B, and C, and the outcome delivers A, B, and C, but we actually need A, B, C, and D, we need to know that now.
Angela: No matter how well someone follows the style guide, all the translations and rewrites will need editing. I don't know how that will impact the tests.
Jeanne: One of the things that has come out of the AGWG review is that we need technical review.
... is it worth spending the time?
Charles: If it is only used as a suggestion, then it's not worth spending the time. But if conforming to the style guide is going to be a requirement for something to make it into Silver, then it's worth it to test the Style Guide.
Angela: Could we handle it with the editors?
Jeanne: I worry that we would be tying our hands too soon. We may have changes to the style guide as we get further into writing content.
Kim: There are so many variables, it is hard to test if the plain language translation is accurate, because there are so many things that we haven't decided on yet.
... I think we need a made-up success criterion so people won't get caught up in what they already think it is. Once we have written a few of these, then we can test the plain language version.
Angela: The made-up criterion is an interesting idea.
Jeanne: Do you think we are ready to test the tabs and the organization of the tabs?
Kim: Yes, we need to find out if the direction we are going serves the needs of the people using it.
Charles: We need to find out if the labels on the tabs are correct. Is this what you expected to find here?
Jeanne: Do we give people tasks to do, do we ask their opinions? What's the best way to approach it?
Charles: Start a google doc with
assumptions that we need to validate: like, we assume that
people will understand the tab headings correspond to
activities. Then we ask questions to determine that.
... like "Get started" implies a new project from scratch, when
you might be remediating something that's been around for 10
years.
Jeanne: I can start a google doc, and we can ask some of the plain language experts to contribute to it.
Charles: We can set a deadline and work backwards from it to build a schedule. Say we want to finish the report before Christmas: then we determine when we need the results back, how long the survey stays open, how many days to write the questions, etc.
... I have limited time, but I'll work on the final report and
socialize the tests.
Jeanne: Has anyone talked with
Mike Crabb?
... I will reach out to him this week.
... are there any tests that we should do around the
information architecture?
Charles: We need to do a card sort of the tags. I haven't found the software that will allow us to do a sizeable card sort test that is free.
Shawn: The homework assignment that we want the AGWG members to do. That will help test the ability to use the architecture and the maintainability features.
Kim: What are the elements you are looking for in a card sort tool?
Charles: We need something free that scales beyond 30 cards and 10 participants.
... some of it will already be tested from the plain language
survey.
... one of the steps we need Mike Crabb to do, is to move the
IA prototype into W3C Github repo so traffic and latency
doesn't distort the results.
Jeanne: I will ask about that when I talk with him.
Charles: It would be valuable to have the filtering and Methods set up in two ways. Today the page starts with no results and the filter displays Methods. I am more used to seeing all the Methods and using the filter to reduce them. That way we could do A/B testing of the two approaches.
... I would start with assumptions of other patterns. The
Methods tab starts in an empty state. If development had a (3)
after it, it would help. Having a match between expectations
and results is always helpful.
Jeanne: Start with slide 26; the next 5 slides are new for A11yBOS.
Shawn: We have to watch out with assigning points where we don't know what the user experience will be, like "the first rule of ARIA is don't use ARIA". Using the built-in semantics should score higher points. For audio, a transcript plus captions should score higher.
Jeanne: We were thinking that an automated test would be 1 point, guidance that is now AAA would be 3 points, a more complex task evaluation could be 5 points, and butts-in-seats testing would be 15 points, for example. None of the points are exact -- it's just for illustration.
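[The illustrative point values above could be sketched as a simple tally. Everything here is hypothetical: the method names and point values come only from Jeanne's example and are not a decided scoring scheme.]

```python
# Hypothetical point values from the discussion: 1 for an automated test,
# 3 for guidance that is now AAA, 5 for a more complex task evaluation,
# 15 for in-person ("butts-in-seats") testing. Illustration only.
POINTS = {
    "automated_test": 1,
    "aaa_guidance": 3,
    "task_evaluation": 5,
    "in_person_testing": 15,
}

def score(methods):
    """Total the points for the testing methods applied to a Method."""
    return sum(POINTS[m] for m in methods)

# Example: automated test plus in-person testing.
print(score(["automated_test", "in_person_testing"]))  # 16
```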
Charles: Every button on the site could be working except for one, but if that one is the login button, then it fails.
Shawn: This gets to the intersection of task completion and Methods and tests.
Jeanne: Let's talk about this more on Tuesday.
Present: KimD, Charles, Lauriat, AngelaAccessForAll, jeanne
Date: 02 Nov 2018