Meeting minutes
Classifying parts of an AT response
jugglinmike: Some responses can be nonverbal, e.g. a mode change.
We started this discussion last meeting.
It is related to separating testing into two steps: collection of responses and assignment of verdicts.
So far, we have conflated those two steps.
We might need to recognize the difference to separate what automation can do from what it can't do.
Matt_King: The first thing we need to discuss is: what is the anatomy of a response?
Matt_King: It's more like, what parts of a response do we need to characterize?
Matt_King: Do we need to characterize 100 percent of an AT response to a command?
Matt_King: One example: when we use the full settings, they can give a lot of information about how to operate a control. Generally speaking, that response is something we ignore. The one exception is that, as a human doing this evaluation, if that info is incorrect, then we say we observed an undesirable behavior.
Matt_King: In general, however, we don't really care about these responses
Matt_King: So do we need to characterize every element of a response, including those kinds of responses we don't really care about?
michael_fairchild: That's a good point.
Matt_King: Should one of our characterizations be "Don't care"?
Matt_King: So machines will gather everything, but humans only answer questions based on the assertions. Maybe this isn't the right question.
michael_fairchild: We could say that there are aspects we don't care about
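A minimal sketch of how a per-part classification with a "don't care" category could be modeled, written here as hypothetical TypeScript (every name is illustrative, not from the ARIA-AT codebase):

```typescript
// Hypothetical classification for each part of a collected AT response.
type ResponsePartClassification =
  | "asserted"     // covered by a formal assertion in the test plan
  | "dont-care"    // collected by machines, ignored when assigning verdicts
  | "undesirable"; // e.g. incorrect instructions for operating a control

interface ResponsePart {
  text: string; // the spoken output fragment
  classification: ResponsePartClassification;
  assertionId?: string; // set only when classification is "asserted"
}
```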
Matt_King: Z brought up last time that there are specific things like "the state is conveyed," and we haven't treated those undesirable behaviors as assertions. They are more like negative assertions.
mzgoddard: In normal web development, you don't normally test that a certain error isn't thrown.
mzgoddard: Currently, the way we document the underlying assumption is with undesirable assertions.
jugglinmike: Undesirable behaviors could also be called exceptional behaviors.
Matt_King: I don't know if "exceptional" captures the undesirable aspect. They are both problematic and undesirable.
michael_fairchild: I'd argue that "exceptional" makes sense in a programmatic context, but not as well in a human context.
jugglinmike: How about "erroneous"?
Matt_King: Maybe we don't need to wordsmith or rename unexpected behaviors. We need to decide whether we formally treat those as assertions in this classification of AT responses.
Matt_King: Right now, for human testing, it's a yes or no on every single test whether an unexpected behavior occurred; if one did, the human tester will describe which one occurred.
Matt_King: So we are in agreement: there is a clear difference between collecting responses and analyzing responses.
Matt_King: How should we label these two steps in the process? Are "collecting" and "analyzing" the right words?
jugglinmike: I have been using those words. We should also discuss using the term "verdict assignment."
Matt_King: The analysis is more than just verdict assignment.
Unless each part of the response is a different assertion, it's more than just a verdict assignment.
mzgoddard: What would be expected to be stored in a database for a verdict assignment?
Matt_King: For example, in the case of excess verbosity, the response is yes or no. Would that be a verdict assignment?
jugglinmike: Part of me thinks it's better to explicitly classify which part of the response was excess verbosity, rather than just yes or no.
jugglinmike: That only makes sense to me if there are going to be multiple text-based AT responses; then you would be disambiguating something.
Matt_King: So we have an input field there for the user to describe the unexpected behaviors. I expect that would be stored, and that is more than just verdict assignment.
mzgoddard: So the output of the analysis is a numerical value and an assessment.
mzgoddard: The first step for automating verdicts is using existing matching data for tests and responses, but we would need humans to assign the initial verdicts, because the machine response is not going to match the human response without lots of modification.
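One possible shape for a stored verdict record, sketched as hypothetical TypeScript; all field names here are assumptions for illustration, not the actual database schema:

```typescript
// Hypothetical record pairing a collected response with its analysis.
interface VerdictRecord {
  testId: string;
  commandId: string;
  collectedResponse: string; // raw AT output, human-transcribed or machine-captured
  source: "human" | "machine";
  assertionVerdicts: Record<string, "pass" | "fail">; // keyed by assertion ID
  unexpectedBehavior: boolean; // the per-test yes/no answered today
  unexpectedBehaviorDescription?: string; // the free-text field mentioned above
}
```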
Matt_King: There is a part of me that is wondering, if a human is running a test today, whether we want to automate the collection of the responses they observe.
Matt_King: We still need the human there to collect any response the automated collector didn't collect.
Matt_King: There will be parts that are not collectable at the start; it will require more development.
mzgoddard: I think that's a good goal, but it will be tough to achieve on someone's system.
mzgoddard: While we are the ones developing that tool, a bug in our stuff may leave someone's machine in a state where it stops responding.
Matt_King: I think we could use the NVDA add-on to try this out.
Matt_King: I'm wondering if there is a world in which you start the test by pressing a start button on a webpage, the human performs a command, the machine collects the output, then the human presses the stop button.
mzgoddard: I think there may be a security concern there.
Matt_King: We could try to normalize parts of the human responses, then have the NVDA add-on collect responses, do string comparisons between them, and work toward a convention that way.
Matt_King: It's sort of an in-between step between collecting and analyzing responses.
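A rough sketch of the normalize-then-compare step described above, assuming simple case, punctuation, and whitespace normalization (the actual normalization rules would need to be worked out):

```typescript
// Normalize an AT response so that a human transcription and a machine
// capture of the same speech are more likely to compare equal.
function normalizeResponse(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[.,;:!?'"()]/g, "") // drop punctuation a human may or may not type
    .replace(/\s+/g, " ")         // collapse runs of whitespace
    .trim();
}

// String-compare a human-recorded response against a machine-collected one.
function responsesMatch(human: string, machine: string): boolean {
  return normalizeResponse(human) === normalizeResponse(machine);
}
```

Lowercasing and punctuation stripping are only a starting point; homophones and localization, raised later in the discussion, would need more than string comparison.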
Matt_King: I guess we have to have consensus before proceeding.
mzgoddard: I think we could store verdicts for automated responses with the human verdicts and responses
Matt_King: I just had an idea; let's look at how we do it today.
Matt_King: We might have what we need.
Matt_King: A human runs a test, then a machine runs a whole test plan.
Matt_King: We already have code that looks for conflicts between two people.
Matt_King: It does this by comparing assertion verdicts.
Matt_King: You could have a similar set of code that just looks for conflicts in output, after normalizing the output on both sides.
Matt_King: Then you have normalized output.
Matt_King: In the case where there is a conflict after normalizing the output, a human can rerun the test and review. If the output matches, the human can update the verdict.
Matt_King: If the runner was a machine, we would need a couple of different buttons in the interface for the human to review the test.
Matt_King: If there are conflicts, then we would review with our normal conflict resolution process
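A sketch of output-level conflict detection between a human run and a machine run, modeled on the existing assertion-verdict conflict check; the data shapes and normalization here are assumptions, not existing code:

```typescript
// Hypothetical conflict check: compare normalized output from a human run
// and a machine run of the same test, flagging mismatches for human review.
interface TestRun {
  testId: string;
  output: string;
}

const normalize = (s: string): string =>
  s.toLowerCase().replace(/\s+/g, " ").trim();

function findOutputConflicts(
  humanRuns: TestRun[],
  machineRuns: TestRun[]
): string[] {
  const machineByTest = new Map(
    machineRuns.map((run): [string, TestRun] => [run.testId, run])
  );
  const conflicts: string[] = [];
  for (const humanRun of humanRuns) {
    const machineRun = machineByTest.get(humanRun.testId);
    if (machineRun && normalize(machineRun.output) !== normalize(humanRun.output)) {
      conflicts.push(humanRun.testId); // route to the usual conflict resolution process
    }
  }
  return conflicts;
}
```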
jugglinmike: Can you say more about what an equivalent output could be?
Matt_King: I don't know what the machine recorded responses will look like
Matt_King: Right now, we are operating under the assumption that the machine output could differ from the human output
jugglinmike: One issue could be homophones, or localization
jugglinmike: One aspect we haven't talked about is how and when the order of responses matters.
Matt_King: I have a hard time imagining that the order will differ between humans and machines with the same configuration; it would have to be a configuration issue.
jugglinmike: I'm thinking of a mode switch
jugglinmike: The human might interpret the events differently than the machine would.
Matt_King: We may need a way to code for events.
jugglinmike: "Events" is a new term for this conversation.
jugglinmike: Right now we say "here are the things that happened in a response"; I don't know if the data structure captures the order of those responses.
Matt_King: The response back from the API call will match the output the human would hear.
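If order and nonverbal events need to be captured, a collected response could be recorded as an ordered list of events rather than a single string; a hypothetical sketch, with all event kinds invented for illustration:

```typescript
// Hypothetical ordered event log for a single AT response, so nonverbal
// events like mode switches keep their position relative to speech.
type ResponseEvent =
  | { kind: "speech"; text: string }
  | { kind: "modeChange"; from: string; to: string }
  | { kind: "sound"; description: string };

interface CollectedResponse {
  testId: string;
  events: ResponseEvent[]; // array order records the order the events occurred
}
```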
mzgoddard: I think we need to write this into the spec
mzgoddard: When do we record the speech response? When it begins or when it ends?
mzgoddard: The human may perceive a different start and end of a response than a machine would
jugglinmike: That's related to what I was thinking, but different.
jugglinmike: I'm still thinking about the need for process and UI for recognizing equivalency in responses.
Matt_King: It's clear to me that for some of these things, we just need to move forward with experimental implementations and see what issues we run into.
Matt_King: We should put together a framework on how we want to do that
Matt_King: We can't make it all perfect at the outset. Let's get real-world situations, then figure out how to deal with them.
Matt_King: I think we have a strong sense of what we need to anticipate!