Talk:ARIA 1.1 Automated Testing

Let's have the discussion about design in here!

start Command

Joanie->Shane: Possible problems with "title of the window being tested" include: 1) Is it the window's accessible name or the displayed-to-sighted-users titlebar content? These two typically coincide, but you cannot be 100% sure. 2) The accessible name and/or displayed title might include the user agent name, and unstable user agents might have different names (e.g. Firefox is "Nightly"). 3) Is there any danger of multiple open windows having the same name, or of the test author doing something that makes it harder for the ATTA to locate the desired page?

The above can always be dealt with by each ATTA as needed, but perhaps it would make test runs more reliable if we maximize the uniqueness of what gets sent to the ATTA for the purpose of locating the test window. The test case URI should hopefully be reliably unique for any given test run, and all ATs should be able to obtain it. I don't know if other platforms could use the user agent's process ID to verify they had the right accessible application. I can on my platform.

Shane->Joanie: The URI is obviously available to the script that runs in the window. By title I literally meant the title of the HTML page - I assumed that was relevant. Is the URI something that can be easily found in the A11Y tree?

Joanie->Shane: Then if nothing else, I'd change "window" to "HTML page" or "(top level) document" or something like that. Because my ATTA would do one thing for displayed/titlebar title, another thing for accessible-window name, and a third thing for document title. As for relevance, it's not irrelevant. But it relies upon the test creator doing the right thing (providing a guaranteed-unique title). I'm pretty sure that the URI of a loaded document can be easily found in the A11y tree. It definitely can on my platform, and I think it would be odd if it couldn't on others.

Shane->Joanie: Okay - I am convinced. I will change the name to URL.
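
For illustration, a 'start' command payload keyed on the URL might then look something like this (the field names and URL below are made up for the sake of example, not an agreed-upon format):

{
  "command": "start",
  "url": "http://localhost:8000/aria/checkbox-unchecked.html"
}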

test Command

@@@TODO@@@ Joanie has suggested that this command take a list of tests, and that the ATTA could return a list of responses. Thoughts?

Joanie->Shane, et al.: Above it says "list of tests"; what I'm suggesting is "list of testable statements" (to use the language in the Example Dialog). For instance, the table for an unchecked checkbox for ATK has a role expectation (ROLE_CHECK_BOX) and two state expectations (has STATE_CHECKABLE, lacks STATE_CHECKED). I'm suggesting (optionally) sending all three in the data JSON structure. The ATTA would return a single response JSON structure which contains one result per assertion sent.

Shane->Joanie, et al.: I agree completely. This feels like a reasonable extension. The JSON structure an ATTA would receive as part of a 'test' command in the 'data' parameter might then be:

[
  { "state": "STATE_CHECKABLE", "value": true },
  { "state": "STATE_CHECKED", "value": "<undefined>" }
]

And the response result value might look like:

[
  { "result": "PASS", "message": "" },
  { "result": "FAIL", "message": "STATE_CHECKED was defined" }
]

Is that what you are thinking?

JonGunderson->All: I think we need a list of all possible properties and values. Do we need some type of ID for the tests that pass or fail? Otherwise it may be difficult to know which assertion passed or failed.

Joanie->Shane: Yes. And optionally send the role assertion as well.
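
To make the ID suggestion concrete: the request and response might pair up as follows, where the 'id' field is hypothetical and is echoed back in each result so the harness knows which assertion it belongs to:

[
  { "id": 1, "role": "ROLE_CHECK_BOX" },
  { "id": 2, "state": "STATE_CHECKABLE", "value": true },
  { "id": 3, "state": "STATE_CHECKED", "value": "<undefined>" }
]

And the corresponding response:

[
  { "id": 1, "result": "PASS", "message": "" },
  { "id": 2, "result": "PASS", "message": "" },
  { "id": 3, "result": "FAIL", "message": "STATE_CHECKED was defined" }
]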

@@@TODO@@@ Joanie also thinks we might want an argument to retry a testable statement. If you agree, stop reading. :)

Example scenario: The testable statement is that a checkbox has STATE_CHECKED (which, in terms of implementation, means ATK_STATE_CHECKED). However, we have to test this client-side, which means AT-SPI. With this test in mind, consider the following reality: AT-SPI maintains a cache which it updates in response to signals from the app/toolkit. If an implementer fails to emit object:state-changed:checked on a checkbox, AT-SPI's cache will be stale and it will look like the implementer failed to implement the correct state. If you clear AT-SPI's cache and check again, however, you will discover the implementer actually did implement the correct state and the failure is due to the implementer's failure to also emit the expected signal. The lack of signal is bad, but in a test which is not evaluating signal emission, a result of FAIL is arguably bogus. But glossing over the issue by creating an ATTA which always clears the cache strikes Joanie as not quite right. If we allow a retry, the first result would be FAIL, the second result would be PASS and contain a message explaining the cache was cleared, and we'd have more accurate results about the implementation being tested.
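
One way the retry argument might be expressed in the 'test' command data is a per-assertion flag (the 'allow_retry' name is invented for illustration):

[
  { "state": "STATE_CHECKED", "value": "<undefined>", "allow_retry": true }
]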

JonGunderson->Joanie: It sounds like the browser not issuing the signal could be viewed as a failure to fully implement the feature, since ATs would have the same problem of getting stale information.

Joanie->JonGunderson: I agree with what you say in terms of real-world needs. Orca has code that clears the cache for that very reason. However, in terms of testing, I think we should have two tests: 1) Is the expected state being put there by the implementer? 2) Is the expected signal being emitted by the implementer? 1 and 2 can be independent in terms of user agent code. If we are testing 1 and say "FAIL: Not implemented," we may be wrong. It may indeed be implemented. Knowing this distinction means we have more accurate test results (needed for CR) and can file a bug against the user agent which makes sense to those developers (telling them they haven't exposed the state when they know they have tends to lead to rabbit holes and/or no fix -- until we tell them what's really wrong). Thus what I'm suggesting is: don't FAIL it (without explanation) and don't PASS it (without explanation). Instead, we indicate what actually happened.

Shane->Joanie: WPT doesn't envision multiple results for a single test or subtest. So I don't know how we would tell the tester that this happened. I agree it is a good idea. Just not sure what it would mean.

Joanie->Shane: Hmmm.... If WPT tests were ARIA checkboxes, we'd call that "mixed". ;) Jokes aside, could the solution be a tri-state result: PASS, FAIL, PASS_ON_RETRY?

Shane->Joanie: We don't actually have a way to add new "results" to WPT. At least, not without changing the test environment and the reporting tools. What we *could* do is allow the ATTA to return information that the JS would use to change the "name" of the subtest in such a way that it was clear it passed on rerun. Would that work for you?
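
As a sketch of that suggestion: the result would stay within PASS/FAIL, but carry extra fields (names hypothetical) that the JS could use to rename the subtest, e.g. by appending "(passed on retry)":

[
  { "result": "PASS", "message": "", "retried": true, "retry_message": "AT-SPI cache cleared before retry" }
]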

Other Feature Requests and API questions

Joanie->Shane: I'm thinking we might want a public method in the ATTA to check versions and dependencies. In the case of ATK/AT-SPI, we want to be sure the testing environment has the required libraries installed and at least the minimum version. (Related aside, that minimum version will need to be bumped for aria-errormessage and aria-details.) We could forgo this verification, but then we are in danger of bogus FAIL reports. And I'm afraid I don't trust people to RTFM. ;)

Right now, my ATTA-in-progress is doing these sanity checks upon initialization. But if the ATTA is re-launched multiple times during a testing session, it seems a waste of time to re-do those checks.
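
As a sketch, a dedicated command for this (the 'checkenvironment' name, fields, and version numbers below are all invented for illustration) might return something like:

{
  "dependencies": [
    { "name": "atk", "minimum": "2.20.0", "found": "2.22.0", "ok": true },
    { "name": "at-spi2-core", "minimum": "2.20.0", "found": "2.18.3", "ok": false }
  ]
}

The harness could then issue this once per testing session and skip it when the ATTA is relaunched.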

Related to this: One of the things the ATTA is supposed to report is the supported version of the API. In (at least) my case, the supported version is a minimum, not an exact version. I think that knowing the actual version of the accessibility APIs in the test environment is also relevant -- and perhaps should be reported as part of the results. If you agree, then it's another one of those things which should be done once during a testing session, but does not need to be (re)done if the ATTA is relaunched during that session.
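
For example, the information reported back might distinguish the two versions along these lines (field names illustrative only):

{
  "API": "ATK",
  "minimumVersion": "2.20",
  "actualVersion": "2.22.0"
}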