DID WG Topic Call on test suites — Minutes

Date: 2020-09-01

See also the Agenda and the IRC Log

Attendees

Present: Markus Sabadello, Orie Steele, Wayne Chang, Dmitri Zagidulin, Manu Sporny, Brent Zundel

Regrets:

Guests:

Chair: Brent Zundel

Scribe(s): Brent Zundel

Orie Steele: we have discussed this before. The last time was at the F2F, where we got through some resolutions
… i.e., it would be nice to be able to submit test results to a thing, especially if there was a docker container folks could run.
… we liked the idea of throwing stuff at a webserver and getting results.
… then there was some yaml vs json conversation.

Dmitri Zagidulin: the reason for YAML was - comments!

Orie Steele: so I sat down and wrote everything in JSON
… so now we have a docker container that runs a server that runs tests based on a JSON file you submit.
… there are suggested inputs and outputs.
… it would assume your system is doing something that produces the inputs and outputs.
… it only took a couple of hours. I didn’t spend more time because I wanted to get a pulse check of where we’re at.
… it checks a set of assertions
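
A minimal sketch of what such a submitted scenario file might look like, written here as a commented JavaScript object literal; every field name (`name`, `fixtures`, `input`, `output`) is an illustrative assumption, not the actual schema from the strawman repo:

```js
// Hypothetical shape of a scenario submission. All field names are
// assumptions for illustration, not the real schema.
const scenario = {
  name: 'did-resolution', // which group of assertions to run
  fixtures: [
    {
      input: { did: 'did:example:123' }, // what the implementation was given
      output: {
        // what the implementation produced
        didDocument: {
          id: 'did:example:123',
        },
      },
    },
  ],
};
```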

Wayne Chang: we just implemented a Rust implementation that passes the VC test suite; it was really ergonomic.
… we wished it did more checking of signatures, etc., but it was a good interface.
… using docker as well sounds nice.

Manu Sporny: at a high level that sounds good. I think encapsulating the entire test suite in docker is good. The last time we talked, it seemed that people wanted the docker container to give an implementation some inputs, which would spit out outputs
… then the test suite will say whether the output is correct.
… but we may also want negative tests.
… it would help me if you could walk me through the details of the inputs and outputs.
… what are the details of how the invocation of the docker container happens?

Markus Sabadello: I have similar questions
… feels like the container may need some parameters: where the implementation is, and how it communicates with the implementation

Orie Steele: we talked about a data model test suite and a callback-style test suite. I did the former; the latter is way beyond what I have the time to work on.
… there’s a test suite that confirms that a set of data matches expectations; the output is a result of checking those expectations.
… you can do that on any piece of data without external dependencies.
… then the call to resolve returns the proper outputs, if you have an implementation that can take those inputs and produce those outputs.
… an implementer will need to write a framework to pass between the two.
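
One hedged sketch of that implementer-side glue, assuming a Node environment and a local `resolve` function; the module path and fixture file layout are invented for illustration:

```js
// Hypothetical glue code: call your own implementation to produce the
// input/output fixtures the test suite compares against. The module
// path and fixture layout are assumptions, not part of the strawman.
const fs = require('fs');
const { resolve } = require('./my-did-implementation');

async function writeFixture(did) {
  const didDocument = await resolve(did);
  const fixture = { input: { did }, output: { didDocument } };
  fs.writeFileSync(
    `./fixtures/${encodeURIComponent(did)}.json`,
    JSON.stringify(fixture, null, 2)
  );
}

writeFixture('did:example:123');
```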

Markus Sabadello: I don’t really understand.
… this doesn’t test a live implementation?

Orie Steele: correct
… the first thing you need to do is agree to a structure for inputs and outputs. Then the test asserts that the output matches what is expected.

Manu Sporny: it’s a bit flipped from what I thought. So if I had to do an implementation, and the test calls for btcr, then I need to do that.

Orie Steele: the implementer would need to submit a set of JSON files that the tests compare to the data model.

Manu Sporny: okay, that is what I thought.

Orie Steele: this lets an implementer call the docker container with their new set of fixtures and show they’re compliant.
… a DID method implementer needs to write some code that outputs fixtures that look right.

Manu Sporny: that works for producers, but what about consumers?

Orie Steele: the scenario there is: the input is a DID; make sure the DID follows the ABNF; then you get test results from that.
… I think it works for consumers as well, the scenario is described in JSON, we are responsible to make sure that the scenarios cover the normative statements in the spec.

Markus Sabadello: I think the VC test suite has different groupings for different options. For example the VC test suite can take a VC, a VP, or a JWT.
… for our scenarios, we need to check the DID document syntax, etc.

Orie Steele: all of this is predicated on a JSON file that specifies everything the program needs to evaluate so that it covers everything that needs to be tested.

Manu Sporny: for the HTTP interface, what are we doing? POSTing a scenario and the input?

Orie Steele: the way the endpoints work today, you can specify the scenario using a POST request.
… the response to that is a scenario test object.
… you can submit a massive JSON object that tests everything.
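
As a rough sketch of that interaction, assuming Node 18+ (global fetch, top-level await); the endpoint path and payload/response shapes are guesses, not the documented API of the strawman server:

```js
// Hypothetical client call against the dockerized test server. The
// endpoint path and payload/response shapes are assumptions.
const scenario = {
  name: 'did-resolution',
  fixtures: [{ input: { did: 'did:example:123' }, output: { /* ... */ } }],
};

const res = await fetch('http://localhost:8080/scenarios', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(scenario),
});
const report = await res.json(); // the "scenario test object": assertion results
console.log(report);
```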

Manu Sporny: when it is running, are these examples in the readme?

Orie Steele: there are fixtures committed, basically yes.
… it may be more helpful to look at the specific fixtures.
… and this is all strawman.
… but the question is, is this on the right path?
… if we can avoid anything other than testing fixtures that test the normative statements, that would be good.

Manu Sporny: this looks good. For the response, you’ve got a resolve scenario, the input DID, and the output you should get back. The scenario includes input and output. POST that to an endpoint
… there are a set of assertions that run against that.
… then the response is a set of assertion results

Orie Steele: correct

Manu Sporny: then I take all the results and dump them in a file and I’m done?

Orie Steele: you can always ask for test results of all scenarios you have. It is JSON, so you can do surgery if you need to.
… it’s possible we could auto-format things, but I’m not volunteering to do that.
… the key is that the implementer submits the inputs and the outputs.
… that way we don’t need to wait on a network.

Manu Sporny: what happens when the output is an error?
… maybe the scenario is resolve/negative. What is the expected output if the input is a known-bad DID?

Orie Steele: for negative tests you have a set of structured failures as the outputs.
… for example, a DID resolution failure: bad ABNF.
… the most painful part of this is that you need to write a bunch of javascript programs that [something I missed]
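
A hedged sketch of what such a structured-failure fixture could look like; the error code and field names are invented placeholders, not taken from the spec or the strawman repo:

```js
// Hypothetical negative fixture: a known-bad DID plus the structured
// failure the implementation is expected to report. The error code is
// an invented placeholder.
const negativeFixture = {
  input: { did: 'did;example;123' }, // deliberately violates the ABNF
  output: {
    error: 'invalid-did',
    message: 'DID does not match the did-syntax ABNF',
  },
};
```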

Manu Sporny: https://github.com/OR13/did-core-tests

Orie Steele: Here is an example of the kinds of programs test suite authors would need to write

Orie Steele: https://github.com/OR13/did-core-tests/blob/master/packages/did-core-test-server/src/services/scenarios/resolve.js
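
For readers of these minutes, a toy version of the kind of small program being described might look like the following; the assertion wording and fixture shape are assumptions (see the linked resolve.js for the real thing):

```js
// Toy assertion in the spirit of the linked resolve.js: read one
// fixture and check one normative statement. The fixture shape is an
// assumption for illustration.
function assertDidDocumentHasId(fixture) {
  const doc = fixture.output.didDocument;
  return {
    assertion: 'The DID document MUST contain an id property',
    passed: typeof doc.id === 'string' && doc.id.startsWith('did:'),
  };
}

// e.g. assertDidDocumentHasId({ output: { didDocument: { id: 'did:example:123' } } })
// => { assertion: '...', passed: true }
```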

Markus Sabadello: looking at the resolve scenario, what does it look like if the scenario is just parsing?
… would the output just be true or false?

Orie Steele: I haven’t thought through all of the structure. Parsing could be problematic.
… the fixtures are always in JSON, but they could store different representations.

Markus Sabadello: I was just talking about parsing a DID

Orie Steele: if we know something is a byte array, we may need to handle that on the assertion side. If you’re looking at a response type of CBOR, you may need to have a set of tests that pass or fail based on what you provide there.

Manu Sporny: I’m thinking about how to parallelize the work.
… it feels like the scenarios and assertions could be worked on in parallel.

Orie Steele: Each set of assertions could map to a set of normative statements that could be implemented as a set of small programs contributed to a repo.
… so anybody who can write a javascript program that reads JSON should be able to do that.

Wayne Chang: what is the initial set of DID methods for the core tests?

Manu Sporny: did:key! :)

Orie Steele: I don’t know. It would be a big mistake to pick a complicated one.
… in a way we could build all of this with did:example
… it would be pretty trivial to implement did:key from that.
… the problem is did:example isn’t real.

Orie Steele: did:key and did:web would be good first candidates

Markus Sabadello: wondering if we could use this call to come up with an initial set of scenarios.
… then we can fill in the tests.
… resolution feels like a more advanced scenario. Then there’s parsing a DID string or a DID document.

Manu Sporny: I thought it may be helpful to go through what Amy put together. She’s categorized them for us.
… there are positive and negative DID syntax tests.
… it may be that our scenarios map cleanly to sections of the spec.

Orie Steele: there’s string tests, did resolution. Create, update, deactivate, and read - tests around those would be helpful.
… but let’s go through 384 and see what shakes out.
… this conforming producer/consumer language, I’m not sure it’s testable beyond the structure of the inputs and outputs.
… the scenarios around producing and consuming are the more complicated ones: looseness in typing as a feature vs. a defect; there’s possibly a whole host of scenarios around that.
… the easiest ones are about strings.
… DID syntax could be the scenario, or each property could be its own scenario.

Manu Sporny: so how would you do a negative test?

Orie Steele: the ASCII one, hmm, I’m not sure. You would have to do a did-syntax-negative test: pass in a DID encoded as a UTF-8 array and get a failure case.
… in order to test something as ASCII, you need to pass something that’s not ASCII.
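
As an illustration only, a deliberately simplified regex approximation of the DID syntax ABNF that such a negative test might use; the real test should implement the full grammar from the spec, including colons and percent-encoding in the method-specific id:

```js
// Deliberately simplified approximation of the did-syntax ABNF, for
// illustration only; it does not cover the full grammar.
const DID_RE = /^did:[a-z0-9]+:[A-Za-z0-9._-]+$/;

console.log(DID_RE.test('did:example:123'));  // true
console.log(DID_RE.test('did:Example:123'));  // false: method name must be lowercase
console.log(DID_RE.test('did:example:123é')); // false: non-ASCII character
```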

Manu Sporny: basically you’re taking a bad DID and the test suite is saying, yep, it’s a bad DID.

Orie Steele: so yes, that is testing the spec rather than the DID method.
… method implementers may never submit tests for those and won’t get results for those scenarios.

Markus Sabadello: also wondering what the JSON file looks like. Does the JSON file always need an input and output? Also, would the negative tests be a separate scenario?

Orie Steele: the input file doesn’t have to contain expected outputs.
… every test that can, should.
… regarding the grouping, it would not be the end of the world if the scenarios were very fine-grained.
… things become difficult to maintain if they are too large.
… each file should be labeled clearly and test one thing.
… we could write a scenario that tests everything, but we want a focused input that only tests a small set of things, along with just the assertions for those things.

Manu Sporny: so we have DID syntax positive and negative tests.
… parameters positive and negative tests
… path?

Orie Steele: I’ve never seen path used, so I’m not sure.

Manu Sporny: query and fragment seem straightforward. They could be on their own or part of the ABNF

Orie Steele: if you think there will be more added over time, that means we need more scenarios.
… there are a lot of scenarios we could add around that. My point is that if fragment is going to show up in a bunch of places, it may be better to group scenarios by function rather than position in the spec.

Wayne Chang: making an annotation: because we’re following a path specification from IETF, I don’t think there’s much in the way of normative tests we’ll need.

Manu Sporny: good point. do we want to put normative language in or leave it? That may be an issue.

Wayne Chang: I’ll open an issue.

Manu Sporny: representations, extensibility. Now we’re into some of the properties. The DID subject must have an id. Would we re-use inputs here?

Orie Steele: you could re-use assertions, imported as their own module, but that requires a binding to the input structure.

Manu Sporny: seeing if any don’t map. basically all the properties are fairly straightforward.

Orie Steele: hardest thing may be checking that an identifier isn’t re-used. Not sure if there are tests for comparing complex data from one section to another.
… we could save ourselves by kicking a bunch of normative stuff out.
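
A sketch of how the identifier re-use check mentioned above could be written, assuming the same hypothetical fixture shape as in the earlier examples:

```js
// Hypothetical cross-section check: no id is re-used across the
// verification methods and services of one DID document. Property
// names follow the DID Core vocabulary; the check itself is an
// illustrative assumption, not the actual test suite code.
function assertIdsUnique(didDocument) {
  const ids = [
    ...(didDocument.verificationMethod || []),
    ...(didDocument.service || []),
  ].map((entry) => entry.id);
  return {
    assertion: 'ids MUST NOT be re-used within a DID document',
    passed: new Set(ids).size === ids.length,
  };
}
```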

Manu Sporny: I agree, but good luck.

Orie Steele: I’ll invite them to write the tests for normative statements imported from other specs.

Manu Sporny: core representations are pretty straightforward.

Orie Steele: There are untestable statements.
… you can see the content type in the input and test that the output matches, but what does “determine” mean?
… I don’t know that the language we’re looking at will help them get what they want.

Wayne Chang: do we have some higher-order tests, like: you must satisfy at least one core representation, plus something else?

Orie Steele: I think that’s handled by the way the scenarios are written.
… are we going to do these scenarios for each representation? I think we need to.
… given the set of representations that are supported, you must submit tests for each representation you support.
… consumers must not determine the representation of a document through its content alone; that isn’t testable.

Wayne Chang: you could have something that looks like JSON but is missing a comma.

Markus Sabadello: or if it’s missing the content type.

Orie Steele: who is the test targeted at? You can’t tell third-party software not to do content parsing.

Manu Sporny: this looks good to me; we should proceed.

Orie Steele: we need to have a formal DID WG call where we say “we are going to write a bunch of tests like this”
… it will be very hard to change this once we’re part of the way through, so the group needs to look into this.

Wayne Chang: when we talk about did method spec testing, we should think about testing the test suites of the methods, rather than the specs themselves.

Orie Steele: not sure we’re on the hook for that.

Manu Sporny: are did methods going to use this suite to say they’re conformant with the spec?

Orie Steele: yes, and the did spec authors will use it to show the spec is testable

Markus Sabadello: primary function of the test suite is to test the spec. I think this is great.

Manu Sporny: Thank you to Brent for scribing!!! :)