WSC WG face-to-face -- 30 May 2007

Agenda Bashing

agenda bashing

next f2f discussion moved to later on

Conformance model

tlr on what we need to do for our specs

tlr: what does it mean for us to recommend something?
... wants us to make testable statements ...
... by structuring the rec text into requirements, good practice and ...
... implementation techniques ...
... also need to specify assumptions, e.g. icon display doesn't ...
... work with screen reader, have to say that visual display ...
... is an assumption ...

robert: wants no device dependence
... we should assume everything is usable device independent including for accessibility, mobility etc

tlr: yes, but some implementation techniques are device dependent and we do want to document those
... WCAG does content accessibility guidelines...
... but we're not talking so much about content, more about user agents (chrome etc)

robert: US govt. accessibility rules != w3c ones

tlr: let's consider some rec-text as a group now...
... the stuff about favicons.

1st rec is that sites should not incorporate favicons at this time

phb: for display that informs trust decisions, only display authenticated information

text here is at: http://www.w3.org/2006/WSC/drafts/rec/#favicon-favicons-rec

beltzner: who are we expecting to read/conform to REC?

(basically are we addressing UAs and/or sites)

phb: some types of site may pay attention (e.g. FIs)

mez: charter allows us to include both

serge: any evidence about favicons being trusted?

maritza: yep

tyler: still in the dark about what's a useful REC, hard to have abstract discussion
... current bank practice shows that FIs may drag their feet anyway

yngve: maybe move favicon to somewhere user normally doesn't trust?

sean: favicons being used, how likely is it that something else would be accepted?

mez: back to what needs to be in the text for FPWD?

tlr: shows that there are lots of favicons in use
... shows example of padlock favicon
... this applies to UAs that display bitmaps
... and for which favicon uses (e.g. bookmarks, desktop, location bar...)
... address bar is where favicon shows where we are now

mike: tlr is asking us to be very specific about MUST/SHOULD/etc

tlr: level of abstraction needs to be right, current text very far from being specific enough

tyler: would like to document stuff we're planning to do to get feedback

mez: we can do that

tlr: says what REC means

tyler: we're not competent yet to do MUST/SHOULD

tlr: ok to use MUST/SHOULD in FPWD even if it's likely to change (don't worry)

mez: FPWD is important to get attention/feedback, important that status is clear

rob: MUST/SHOULD/MAY all exist?

tlr: yes, we can use rfc 2119 or something else
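A minimal sketch of what "testable statements" with RFC 2119 keywords could look like in practice, assuming the keywords are written in upper case as RFC 2119 suggests. The draft sentences and the function name `requirement_levels` are invented for illustration; they are not text from the WSC drafts.

```python
import re

# RFC 2119 requirement-level keywords. Longer phrases come first so that
# "MUST NOT" is matched before "MUST".
RFC2119 = re.compile(
    r"\b(MUST NOT|SHALL NOT|SHOULD NOT|NOT RECOMMENDED|"
    r"MUST|REQUIRED|SHALL|SHOULD|RECOMMENDED|MAY|OPTIONAL)\b"
)


def requirement_levels(text):
    """Return the RFC 2119 keywords found in text, in order of appearance."""
    return RFC2119.findall(text)


# Hypothetical draft sentences (assumption: not actual recommendation text).
draft = (
    "User agents MUST NOT display unauthenticated logos in chrome. "
    "Sites SHOULD NOT rely on favicons as a security indicator. "
    "User agents MAY offer a text equivalent."
)

print(requirement_levels(draft))
```

A scan like this only finds the keywords; whether each statement is actually testable still needs human review.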

audian: back to tlr's bullets, is what tlr said what we want?

tlr: types on screen

phb: text equivalent is <title>
... lots of chat...

serge: we need text about definitions, e.g. saying what we mean...
... when we say "verified sites"

mez: we have a glossary

phb: different logos should have different authentication types/levels

<ses> Hi. Stuart isn't really awake right now but he'll be recording what he sees in the jabber room until he wakes up (most likely for the post-lunch discussion.)

break for n/w

<beltzner> tlr, http://beltzner.ca/webdav/forthomas.txt

<beltzner> Mez_, http://www.w3.org/2002/09/wbs/39814/f2f3sched/results

<Mez_> serge and sduffy, here is the glossary, which should include chrome

<Mez_> http://www.w3.org/2006/WSC/wiki/Glossary

back now...

tlr: postpone rec formatting discussion until usability testing

tyler: why change the template?

tlr: to be able to have conformance requirements
... tlr typing a new template..
... new template's most important bits are applicability, requirement and techniques
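One way to picture the three template parts named above (applicability, requirement, techniques) is as a small record type. This is only a sketch: the class name, field types, and the `is_complete` check are assumptions, not the template tlr typed during the meeting.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    """One entry in a hypothetical recommendation template."""

    title: str
    applicability: str  # which user agents or sites, and under what assumptions
    requirement: str    # the testable statement itself
    techniques: list = field(default_factory=list)  # implementation techniques

    def is_complete(self):
        """True when applicability and requirement are both filled in."""
        return bool(self.applicability.strip()) and bool(self.requirement.strip())


# Illustrative entry; the applicability wording echoes the favicon
# discussion, the requirement text is a placeholder (assumption).
favicon = Recommendation(
    title="Favicons",
    applicability="User agents that display bitmaps in the location bar",
    requirement="(to be drafted)",
)

print(favicon.is_complete())
```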

<tlr> ACTION: thomas to update template with material from discussion; notify e-mail list [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action01]

<trackbot> Created ACTION-227 - Update template with material from discussion; notify e-mail list [on Thomas Roessler - due 2007-06-06].

<tlr> http://www.w3.org/2002/09/wbs/39814/f2f3sched/

for next f2f - fill in questionnaire before lunch (next 2 hrs)

<beltzner> scribe is beltzner

<beltzner> tlr, ^ make that happen

<johnath> scribenick beltzner

<johnath> does that do it?

<tlr> ScribeNick: beltzner

<tlr> Chair: MEZ

Intermediate agenda bashing

(there is some agenda bashing happening as we mitigate for technology-availability)

Mez_: call to order, Rachna to start session on "Usability Testing"

<Mez_> tlr, please action rachna to share the slides somehow; tx

<scribe> ACTION: rachna to share slides about usability testing from dublin f2f [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action02]

<trackbot> Created ACTION-228 - Share slides about usability testing from dublin f2f [on Rachna Dhamija - due 2007-06-06].

Mez_: agenda reordering: s/Rachna/BillD

Robustness Testing

billd: Robustness Testing
... current browser environments from a user standpoint bring a bunch of technologies together ...

<Mez_> tlr, please action bill to share his slides too. tx.

<scribe> ACTION: bill to share his slides on robustness testing from the dublin f2f [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action04]

<trackbot> Created ACTION-229 - Share his slides on robustness testing from the dublin f2f [on Bill Doyle - due 2007-06-06].

rachna: do we have a definition for robustness?

Mez_: no, one should be added to the glossary

<scribe> ACTION: bill to define robustness for WSC glossary [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action05]

<trackbot> Created ACTION-230 - Define robustness for WSC glossary [on Bill Doyle - due 2007-06-06].

<Mez_> tx beltzner

bill: in IT circles, a "tiger team" was used to test effectiveness of IT security measures ...
... one side would attack, another side would try to detect and evaluate effectiveness of process and procedures being tested ..

Mez_: I see some potential overlap between items to be tested for robustness vs. user understanding or usability ...

<Mez_> things that web content can do that exactly emulates the security context information displays in our technical report are "pure" robustness attacks

<Mez_> they will leverage either user agent vulnerabilities, or design gaps or issues in the user agents

bill: the attacking team might go after the OS, plugins, user agent, network layer

<stephenF> tlr has the magic zakim trick for the phone

<tlr> stephen, context?

bill: your browser actually sends a lot of information when you visit websites

<stephenF> tlr - Mez wants the phone on

bill: [demonstrates metasploit]

<Mez_> jan vidar, is that you?

<Mez_> can you hear us?

beltzner: this seems to be out of scope, though, since we're talking about user agent and system exploits

bill: well, exploits are out of scope, but the UI for security context is in scope

beltzner: in so far as the user agent isn't exploited, yes

bill: so if the patches are out of date, at the user agent, OS or plugin level, and the user is at risk

PHB: so, this used to be a concern with things like macromedia, where the browser allowed plugins to take control easily and thus become exploited

<Zakim> tlr, you wanted to ask what our question is

tlr: our deliverables are about how browsers and site authors should do things, and I wonder what are we asking of a robustness testing process?

rachna: it's hard to enumerate the robustness tests in advance, really

<tlr> beltzner: should we have indicators that help people ensure they have the latest browser?

rachna: thus far we've talked about security indicators in chrome about the web content, but not indicators about ensuring that the user is running with an unexploited browser
... is that in scope?

Mez_: or in our goals? I don't have an immediate reaction

???: I'm hearing a lot of "the user can't determine", and want to remind people that the user isn't an administrator or security professional

sduffy: this feels like an odd road to me, in terms of whether or not the user has an up-to-date browser

rachna: but I think the up-to-date-ness has a larger impact on security

Rob: when we look at testing (robustness, user, etc) we're talking about user-agents ...
... users are doing a lot of different things in the content-area, not just content, but applications, and sometimes applications in those applications ..
... shouldn't we be breaking out our testing per area/recommendation developed by this group?

Mez_: The reason we have three one-hour discussions about testing is to lead into planning those three categories of testing.

stephenF: if you're talking about application updates and such, there's an IETF working group on network endpoint assessment; they'll address most of these issues

<tlr> http://www.ietf.org/html.charters/nea-charter.html

sduffy: my main objection to including user agent updates in scope is that it doesn't end up solving the problem since the OS could be out of date

tlr: I don't even understand what it means for us to recommend that a user has the latest user agent, since the user isn't the compliance target

<tlr> ... or is he?

tlr: we have a bunch of items that describe current robustness practises that have not been migrated to the format required by our recommendation template
... in my opinion that should be a priority so that we can recommend appropriate robustness tests

Mez_: that's not what we're talking about, IMO, so I'll call that out of scope for the current conversation as is all of browser-updating

bill: how about the second issue of user agents divulging information about the user?

Mez_: how does that fit in our scope?

beltzner: it does in that if the user doesn't mean to provide information to non-trusted websites, and isn't aware of the information being provided by default

Mez_: if we make a recommendation about privacy, then we should ensure that the recommendation is robust
... until we make a recommendation about that, though, discussing how to test its robustness seems wrong
... more conversation about scope definition ...

tlr: what I hear you (Rob) saying is that robustness testing can help us identify weaknesses in all sorts of web applications, and while that's valuable, it's not within the charter of this working group

<johnath> :)

tlr: tlr continues his point, driving it home with the force of a pile driver ...

Mez_: anything else to share, billd?

bill-d: these were the points I wanted to raise

sduffy: we decided that some aspects of web apps were in scope ...
...
... there's a difference between SQL injection and website vulnerability where clicking on a URL results in XSS/untrusted content ...
... the former is out of scope, the latter is in scope ...
... so web-apps as a whole aren't out of scope, are they?

johnath: it feels like we have enough work testing the robustness of user facing display of web security context
... so I'm excited enough without taking on extra worries about user agent/plugin/OS robustness

bill-d: I'm still trying to lock down discussions of what's in-scope/out-of-scope, which is why I wanted to bring this up again

Mez_: well, now you know: it's testing the recommendations, not the entire system

bill-d: still not clear where we are on divulging of information to websites

beltzner: that's not part of our problem statement yet, let alone our goals, let alone ...

Mez_: proposes an action on starting a discussion about the information divulged to websites by user agents?

tlr: IMO, it stretches the notion from communicating web security context to one about privacy

johnath: I think it's doomed for other reasons, but I don't think that precludes the discussion on the group

<scribe> ACTION: bill to start a discussion about including descriptions of the information divulged to websites by user-agents [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action06]

<trackbot> Created ACTION-231 - Start a discussion about including descriptions of the information divulged to websites by user-agents [on Bill Doyle - due 2007-06-06].

<johnath> meeting adjourned till 12:50pm local for lunch

<ses> When does local lunch end?

<johnath> ses: 5 minutes left

Mez_: Rachna, take it away!

<johnath> ses: starting back up now

<tlr> jvkrey, we're restarting

security usability testing

rachna: so, I'll start by defining what I mean by usability testing ...
... traditional security methodology of robustness is good but not sufficient ...
... HCI methodology isn't sufficient since the attackers are modifying along with us ...
... so I propose "red team" usability testing, where we actively attack the user
... so both "can we use the system" and "how can we attack the user to confound them"
... I have a bunch of questions
... 1. Will we test ideas or specific implementations of ideas?

<ses> I just joined

Mez_: how would we test a concept?

rachna: so for example, we could test a variety of implementations instead of a specific one

<maritzaj> http://www.w3.org/2006/WSC/wiki/SharedBookmarks

tyler: the tricky thing is once they make their implementation, they stopped testing the concept

<maritzaj> #2 under usability studies about internet security

<Mez_> http://www.simson.net/ref/2006/CHI-security-toolbar-final.pdf

serge: instead of testing a toolbar, the study in question (see maritzaj's link) tested the effectiveness of each indicator

<ses> Is anyone else on the phone? I could barely hear Rachna and I can't hear Serge at all.

ses, sec, I'll move the phone

<ses> And he's got the phone so we should be able to hear him :)

beltzner: one could test the concepts on which a design is founded, instead of the design itself

maritzaj: the answers will likely vary per recommendation

Mez_: so I don't understand how we'd decide whether to test a concept or a design

rachna: yeah, I think we'll need to figure that out

tyler: part of this might be recognizing patterns in our recommendations and test abstractions that cover aspects in each recommendation
... to what extent do you think we can/should rely on the literature instead of retesting some of those findings?

johnath: we have a huge body of research that has led to some of these recommendations, I think it should be up to us to point to the foundation and identify areas for follow up testing

rachna: well, where huge = from 2005 onwards
... 2. At what level of fidelity should we be testing?

<ses> Calling the existing body of research huge is the kind of statement that could lead a small research area to turn downright anorexic.

rachna: low-fi prototyping is sketches, medium-fi is flash or web mockups, high-fi is extensions or browser modifications

maritzaj: previous research will come into play as references

rachna: using lower-fidelity prototypes will increase our bandwidth
... 3. What should we be testing?
... learnability, efficiency, skills required, flexibility, satisfaction, errors, compliance rates

Mez_: remembers a study, sort of, that might be about how instructions affected the results ...

maritzaj: the Jackson/MSR one?

Mez_: yes! and I haven't heard anything about that dimension

rachna: I think you're referring to a problem that exists in that they had to describe EV certificates to one group

tyler: yes, but I thought they controlled for that

<PHB> PHB thought he was on the queue before, had disconnected when I powered up the VPN

Audian: often helps to start with a list of assertions and verify/validate those first, then move onto low-fi mockups, and use those for the validation

rachna: yes, that's an excellent way to do designs

<ses> Can't hear Phil.

<ses> I can't stay awake without some cursing in my direction.

<ses> :)

<Audian> what = a quality test base?

<Audian> statistical relevance

rachna: we need to identify the goals of a study, as well: why do users behave the way they do? what are users reliably capable of performing? does technology X protect against attack A? etc, etc.

Mez_: 100% usable security is a dream, not a goal, and we should make sure that the studies aren't tasked with finding the 100% solution ...
... people who believe in a recommendation will always argue that the hit rate is "good enough", though, which worries me.

tyler: do we want to provide a target hit rate, then?

<Zakim> johnath, you wanted to respond to Mez, tyler, on quality of data

johnath: it's easy to say "20%" is worse than "40%", but the trick will be showing statistical significance in the effectiveness of the recommendation
... the test you want is "this creates _an_ improvement"

johnath: at which point we can defend the assertion
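To illustrate johnath's point that "20%" versus "40%" only matters once it is statistically significant, here is a minimal sketch of a pooled two-proportion z-test. The sample sizes are made-up numbers, the function name is hypothetical, and the normal approximation is an assumption that is weak for very small groups, where an exact test would be the safer choice.

```python
import math


def two_proportion_p_value(hits_a, n_a, hits_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so nothing to distinguish
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) is the two-sided tail probability of a normal.
    return math.erfc(abs(z) / math.sqrt(2))


# Same 20% vs 40% rates, different (invented) study sizes.
small_study = two_proportion_p_value(2, 10, 4, 10)
large_study = two_proportion_p_value(20, 100, 40, 100)

print(f"10 per group:  p = {small_study:.3f}")
print(f"100 per group: p = {large_study:.4f}")
```

With ten participants per group the difference does not reach the conventional 0.05 threshold; with a hundred per group it does, which is the sense in which the size of the study, not just the two percentages, decides whether the assertion can be defended.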

<Mez_> better for the majority of web users

beltzner: once a recommendation is established as significant, if it competes against another recommendation, then we can do a comparison test

tyler: so what does it take for something like that to get put into Mozilla?

beltzner: bring it forward to the Firefox product team (it's an open meeting) and propose it; in my experience, you need to prove the worth and value for the majority of the web

rachna: should we test in-lab or in the wild?

tlr: do we need to answer this now, or leave this to you?

rachna: we can come back to this later, but it depends on the resources?

PHB: if we can work out a way of doing it in the wild, it would be much better

<Mez_> +1 to phb

<Mez_> from an industry pov

PHB: oftentimes in-lab participants already know something about security

tyler: this intersects with the fidelity; deployable add-ons are more easily tested "in the wild"

maritzaj: tradeoffs for both, in-wild experiments aren't easy to organize

<ses> Everyone forgot to project. I haven't heard any of the speakers.

<ses> Beltzner and Rachna are coming in clear.

dan: in lab can be used to filter and then in the wild can be used to test larger populations

tyler: bias is obviously a concern, but studies (like rachna and stuart's) have shown a non-significant correlation between bias and effectiveness

rachna: well, that doesn't mean it didn't exist, since we were controlling for it, not testing for it

serge: we didn't find any correlation either
... we discussed some of these issues at CHI, and the differences seemed to break down as wild being good for quantitative, lab being good for qualitative

tim: there are various audiences for the user agents which will affect the demographics of in the wild testing

dan: we could do tests through deploying in various banks, etc, to get a good cross section

Audian: I've often seen domain experts being far more critical of a solution until the "would you recommend this to a friend" at which point they all decided they would

rachna: another complication is that each proposal can be attacked in a different way

stephenF: how do you figure out the workload?

rachna: depends on the attack, and whether or not there are easily-reused exploits we can copy and paste

stephenF: there's some shortcuts we can take

PHB: we should be tactical about some things, like we know that w.r.t. phishing, a takedown service shifts the problem to another target, but doesn't kill the overall problem
... or in crypto, adding a bit to the key doubles the work for the attacker
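PHB's crypto aside can be checked with simple arithmetic, assuming the attacker's best option is exhaustive search over the whole key space (an assumption; real attacks may do better). The function name and the key lengths below are chosen only for illustration.

```python
def keyspace(bits):
    """Number of possible keys for a key of the given length in bits."""
    return 2 ** bits


# Adding one bit doubles the number of keys a brute-force attacker
# has to consider, whatever the starting length.
for bits in (56, 128):
    ratio = keyspace(bits + 1) // keyspace(bits)
    print(f"{bits} -> {bits + 1} bits: work multiplied by {ratio}")
```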

<PHB> Well it would be rather nice to know if the solution was intended to be tactical or strategic

tlr: does this imply a different set of scenarios or can we just rely on our existing ones?

<PHB> Rather a lot of solutions being sold a year ago as strategic turned out to be distinctly tactical

tyler: our recommendations should each address the threat that they attempt to defeat

<Mez_> http://www.w3.org/2006/WSC/wiki/ThreatTrees

<PHB> Sitekey

rachna: does that mean modifying our threat trees to include these (on her slides) attacks?

tyler: I think it would be a different section

rachna: I'm very confused

dan: we need to make sure the recommendations are significant at attacking the problems (or something? help?)

rachna: another question is do we want to discover the usability problems, or do we want to assert significant effect?

tlr: that is a question which we should try to answer today

<johnath> actually

<johnath> Serge is saying it

serge: my feeling is that we should be trying to assert effectiveness

<johnath> oh okay...

maritzaj: small N might be useful in the early stages as we try to filter down

<Mez_> +1 to johnath

johnath: sprinkles +1s all over the PhD students

serge: I'm on a grant, we might be able to get similar resources from other groups

<Zakim> Mez_, you wanted to tell serge that we're testing recommendations

Mez_: we're gonna be testing our recommendations, so you should be testing those

<johnath> serge: I think Mez misinterpreted your "I have a grant to work on this stuff" to mean "I have stuff that I am already testing, that maybe I haven't mentioned here yet" instead of "I have money, I can probably help out with testing our recs."

<serge> no, what I meant is that recommendations can be implemented and tested

<johnath> right, yes, but not that you were just incidentally mentioning unrelated thesis work :)

rachna: testing requires IRB approval for me, this adds to overhead

Mez_: the W3C staffers won't be doing the testing, so there's no MIT/IRB requirement

<serge> hey, if the recommendations can be included in that, great. Likewise, those in the corporate world have a vested interest in making the user studies happen because the results can be incorporated into products.

<ses> Many IRBs demand to be involved if anyone in their institution will be an author on findings. If the note is considered a finding, that could cause issues.

<PHB> No human subjects oversight? time to redo Milgram?

<Mez_> tim andI would never redo milgram

<johnath> Mez is too nice for Milgram

<johnath> aww

<Mez_> ses: author/editors only I presume; I pesume acknowledgements wouldn't be an issue

<ses> I'm the one who couldn't spell.

<Mez_> hahaha

<tlr> ses, MEZ, I'd think it's most useful to aim at scholarly publication of original results and summarize / cite that in the W3C deliverables

<ses> My spalling suks.

<serge> I'm currently working on a related study, if anyone's interested I can summarize it

<Mez_> we're running a bit long; how about in email serge?

<serge> okay, in that case I might just wait until I have more results.

<Mez_> what's the eta on that?

<serge> maybe 2-3 weeks.

<Mez_> ok, sounds just fine

<Mez_> beltzner; want to give serge the action item?

<scribe> ACTION: serge to share results from his study once he has them [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action07]

<trackbot> Created ACTION-232 - Share results from his study once he has them [on Serge Egelman - due 2007-06-06].

<tlr> ACTION: rdhamija2 to make sure Jagatic et al on social phishing is in SharedBookmarks [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action08]

<trackbot> Created ACTION-233 - Make sure Jagatic et al on social phishing is in SharedBookmarks [on Rachna Dhamija - due 2007-06-06].

<tlr> ACTION-232 due 2007-06-30

<serge> 6/6 is not 2-3 weeks, unless we're on some crazy new calendar system...

<Mez_> http://www.indiana.edu/~phishing/social-network-experiment/phishing-preprint.pdf

<tlr> serge, see my remark about due date

<Mez_> is the jagatic study, and it's in our shared bookmarks

<serge> thanks

<Mez_> beltzner, give rachna an action on the ebay www2006 jakobsson paper; I can't find it in shared bookmarks or on the web

<scribe> ACTION: rdhamija2 to add www2006 jakobsson, Florencio & Hursley MSR paper to our shared bookmarks list [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action09]

<trackbot> Created ACTION-234 - Add www2006 jakobsson, Florencio & Hursley MSR paper to our shared bookmarks list [on Rachna Dhamija - due 2007-06-06].

<Audian> battery almost dead

tyler: so it looks like there's two in the wild tactics: actively attack and measure effectiveness, or instrument existing browsers or our solutions

<tlr> ACTION-234 confuses several papers

<tlr> Jakobsson Ratkiewicz is what I meant, it's from WWW 2006. http://www2006.org/programme/item.php?id=3533

<Mez_> tlr, give whatever other actions are needed

beltzner: it seems to me like active attacks are way of validating our threat trees, not our solutions

tlr, want to take the action from rachna? I'm sure she wouldn't mind

bill-d: if we make changes to the user experience, how do you test in the wild?

rachna: right, and that's tough in lab as well, since sometimes users need to be trained, or the act of them being in the lab ends up training them

Audian: there's ways of doing it almost at random by pulling people aside in schools and malls

stephenF: possibility that some of the "hits" are false positives of users entering wrong passwords on purpose

<Mez_> tlr or beltzner - an action on rachna to update timeline with things like irb turnaround

<Mez_> please

rachna: once we have proposals, we need to enter low-fi prototyping phase, then figure out what we're trying to prove, set up the studies, set up the infrastructure, etc, and this all requires resources and time

<scribe> ACTION: rdhamija2 to update / create a user testing timeline with things like IRB turnaround, setup, etc. [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action10]

<trackbot> Created ACTION-235 - Update / create a user testing timeline with things like IRB turnaround, setup, etc. [on Rachna Dhamija - due 2007-06-06].

serge: there have been cases of people in studies entering real information, in counterpoint to stephenF

maritzaj: reiterating an earlier point, we should tie references in the shared bookmarks to recommendations

<maritzaj> http://www.w3.org/2006/WSC/wiki/StatusQuoUserStudyResults

Audian: how do we know who's submitting resources for testing

<scribe> ACTION: rdhamija2 to track donations of time and resources for usability testing [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action12]

<trackbot> Created ACTION-236 - Track donations of time and resources for usability testing [on Rachna Dhamija - due 2007-06-06].

<scribe> ACTION: maritza to drive process of tying recommendations to references in SharedBookmarks [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action13]

<trackbot> Created ACTION-237 - Drive process of tying recommendations to references in SharedBookmarks [on Maritza Johnson - due 2007-06-06].

<scribe> ACTION: rhdamija2 create and document user testing plan (with links to timeline, donations, prototypers, etc) [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action14]

<trackbot> Sorry, couldn't find user - rhdamija2

<scribe> ACTION: rdhamija2 create and document user testing plan (with links to timeline, donations, prototypers, etc) [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action15]

<trackbot> Created ACTION-238 - Create and document user testing plan (with links to timeline, donations, prototypers, etc) [on Rachna Dhamija - due 2007-06-06].

<ses> I'm signing off. I've had too hard a time keeping up and staying focused given the quality of the call. (I also need to drive into work at some point.)

<johnath> break until 2:30 local time (12 minutes)

implementation / testing / etc

beltzner: big believer in prototyping, sketching, whiteboarding - to build wireframes
... use that as a way of expressing things better than text
... allows communication and discussion
... but once that finishes, there should be no limits on what technology is used.
... should be something that enables testers to get what they want out of it - HTML, Flash, Firefox extensions
... all the way to an installable browser client which can be downloaded.

tlr: might include changes to other browsers (e.g. Opera)
... on the one hand - prototypes for testing
... on the other hand - things taken up by user agent implementers.
... what recommendations are sufficiently spelled out to allow for implementation?
... what more do the browser vendors need to understand for each of the recommendations?
... what are the reasonable expectations for how long it would take to implement?

beltzner: kind of putting the cart before the horse ... let's get the prototypes available so that anyone can run them (any browser vendor included)

tyler: in terms of time it takes - have experience with add-ons for Firefox and IE.
... much easier (in Tyler's experience) in Firefox than in IE.
... IE requires use of the COM libraries
... Firefox lacks some documentation (source browsing required in some cases to understand Firefox operation)
... IE supports .html and .hta; with HTA, the entire window format is under the control of the HTML (HTA) file, which allows testing out toolbars and such.

mez: not sure this covers all the testing types

serge: everything that requires attacking the user will require HIGH fidelity prototype

RobY: hoping that what we want to test for can be tested programmatically
... all should be programmatically testable (i.e. in a high-fidelity agent)

serge: clarify "doing tests" vs. "doing studies"

RobY: example: assure that a C# file did not contain X ... this can be done by a program testing for this.
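RobY's example of a program checking that a file does not contain X might look roughly like the sketch below. The function name, the sample markup, and the forbidden string are all assumptions made up for illustration; a real conformance check would depend on how the recommendation is finally worded.

```python
def violations(source_text, forbidden):
    """Return (line_number, line) pairs where the forbidden string occurs."""
    return [
        (number, line)
        for number, line in enumerate(source_text.splitlines(), start=1)
        if forbidden in line
    ]


# Hypothetical input: a page that declares a favicon.
sample = '<head>\n<link rel="icon" href="/favicon.ico">\n</head>'

for number, line in violations(sample, 'rel="icon"'):
    print(f"line {number}: {line}")
```

A plain substring match like this is the simplest possible form; it says nothing about whether the recommendation is usable, which is the distinction serge draws between "doing tests" and "doing studies".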

mez: hold off on questions on how to do conformance testing for a half hour.
... have been assuming that someone will be testing EVERY recommendation we produce

<serge> so how do we make claims about the recommendation without doing studies?

tyler: the model is likely to be the "champion" model - whoever is most interested, will take up the flag

<beltzner> serge, you must be new to W3C recommendations!

<serge> I was on the P3P 1.1 group

<beltzner> for your wounds, sir

<serge> there's a difference between turning existing privacy standards into electronic form and recommending arbitrary design guidelines for user agents.

<serge> if we're going to make recommendations, we need some data to support them.

<beltzner> serge, I reject your argument entirely, but that's a topic for when we're drinking

tlr: there is a candidate recommendation step of going ahead further
... this could include something that requires an implementation step in order to go further
... implementation might require coaxing and encouragement
... who, at this point, is in a position to say that they might be able to start doing something?

RobY: one of the things mentioned was getting people to write the test. ... for best practices, we don't have to write the tests, we just need to let developers know what to test.

<stephenF> dinner: looking at www.boxtyhouse.ie for 6pm

<stephenF> coffee: more outside now

Yngve: when there are more particulars, we can look at getting a team involved.
... cannot say when we will be able to test.

mez: as part of going forward - there needs to be something about implementation/prototype and conformance, robustness, and usability test
... and for these three, there's going to have to be some sort of implementation
... would like to demonstrate something by our next face to face

bill-d: what does it take to get someone to sign up?

mez: people who are champions and are capable of doing it themselves (or coax someone else) are going to get their recommendations through first.

<Zakim> johnath, you wanted to point out that this is a helpful step anyhow - particularly under a champion model

johnath: echoing that some implementation is going to be a healthy thing. And having the champion make someone write it gives this a > 1 collaboration effort

serge: what's the point of doing conformance testing if we're only using this to bolster the recommendation

mez: hold conformance testing for 15 minutes

tlr: I agree on the importance of implementation
... we should not create an environment in which there are proposals which correspond to a person's personal burden to implement/push.
... must do this as a group - advance recommendations which WE agree should move forward

<johnath> staikos: conversation is already underway - backscroll should give you a good enough idea as to whether you object horribly :)

<staikos> one thing I hope will be covered, if it hasn't, is html5. I'm not sure how many of you have read the draft spec for this but it kind of turns our work on its head

mez: is there someone who thinks we can get to consensus on a recommendation without appropriate testing - please speak up.

<hal> dropping off to attend ws-sx tc call - back in 30 mins or less

<staikos> you know, with things like web pages able to open files, sockets, register themselves as protocol handlers, etc

tyler: we have to be sensitive that there are limited developer resources in the sky to work on this

<Audian> maybe not "invisible" but the lack of a favicon is hard to test...it's been there for a long time and now it isn't

tlr: there may be consensus on recommendations which should be looked at further
... don't want to get bogged down on waiting for an implementation step

tyler: should we issue recommendations for things that have NOT been tested?

tlr: recommendation is the final stage on the recommendation track ... and this means it HAS been subjected to tests
... what we're working on right now is "drafts for recommendation" ... which we can document now ... without testing having been done.

<Zakim> johnath, you wanted to say that my support for the champion model didn't imply that implementation was *necessary*

johnath: if no one touches a recommendation and it drops on the floor ... then maybe that is OK. Likes the idea of champions for a recommendation

serge: seems weird to come up with draft recommendations before testing them out to see if they're useful.

<Zakim> tlr, you wanted to make an ontological point

tlr: perhaps we are again having some terminology conflicts

mez: we need a noun for what was discussed in Lightning Discussions

rachna: PROPOSALs is offered up

stephenf: not everyone is in the room - public draft is useful to get wider input

mez: believes we can put out a first draft with only expert opinions ... and hopefully by June/July
... this last discussion has been quite good
... break for 30 minutes.

<johnath> bill-d: http://www.boxtyhouse.ie/

<johnath> staikos: any recommendations for or against?

<staikos> well

<staikos> the best food I had in Dublin was at an italian cafe :)

starting up again

stephenF: 6PM - Temple Bar (10-15 minute walk)

functional and conformance testing

<stephenF> staikos: they're just getting what they asked for:-)

RobY: conformance testing is for telling testers how to test what has been defined
... do we want to test compliance to this standard - or do we just want to put it out? It is a significant amount of work to do all of this testing.
... do we need conformance testing for the developers who implement these recommendations?
... unless we put things in that say what a developer cannot do, Rob doesn't see the need for adding in conformance testing

mez: the bulk of our recommendations will be towards user agent developers ... though some recommendations will be pointed towards content providers
... but conformance testing not required for user agents/user agent developers?

RobY: the set of user agent developers is not a huge community

serge: agree with Rob that we should not focus on conformance testing

<tlr> thomas: testability of work product vs. broad-scale testing vs. ability to test a limited population

<tlr> ... when limited population, then need some tests, but don't need ability to automate that testing ...

<Mez_> http://www.w3.org/2005/10/Process-20051014/process.html

<Mez_> "Part of a Working Group's activities is developing code and test suites "

<Mez_> http://www.w3.org/QA/WG/2005/01/test-faq

<Mez_> Two types of testing are particularly helpful:

<Mez_> Conformance testing

<Mez_> Focuses on testing only what is formally required in the specification in order to verify whether an implementation conforms to its specifications. Conformance testing does not focus on performance, usability, the capability of an implementation to stand up under stress, or interoperability; nor does it focus on any implementation-specific details not formally required by the specification.

<Zakim> stephenF, you wanted to ask can we have some examples of such tests? (in a minute)

tyler: there are some well known test cases - how the browser renders certain things in certain ways
... one thing we may want to specify is that key sequences used on first authentication with a site should be different from a second or subsequent authentication to the same site.

<serge> This seems to be a matter of charter, from 1.3 in the Process Document: "The Working Group charter sets expectations about each group's deliverables (e.g., technical reports, test suites, and tutorials)."

<tlr> ... or not. ;-)

tlr: as far as conformance testing is concerned - we are not expected to build an automated test suite
... we would be required to formulate a test suite that could be followed to evaluate conformance. the test MAY consist of manual work (like examining a user interface)

dan: conformance testing is likely not as big a deal as some of the other parts of our recommendation

tlr: this is conformance testing work, but not as detailed or involved as it has seemed to be implied so far today
... let's get to writing recommendations and examples of using these (which should lead to conformance tests)

stephenF: there could be quite a bit of testing needed - lots of configuration settings and such

RobY: as it becomes more and more defined, more and more folks/companies will take interest

tlr: critical piece is to have tests and examples. More critical to have an example and a test with it than to have an implementation.
... requirement -> example + testcase -> then implementation

tyler: one place of potential problem - if something makes a request/requirement of a third party on the authenticity of something
... this would be difficult to find a non-conformant and conformant example.

tlr: two ways around that:

<stephenF> ways 1 & 2?

tlr: 1) if you speak about conformance, give a definition of "trusted". ... phrase like "There shall be a phrase or outside-managed list which is consulted."
... 2) the other way is to declare that "trusted" is defined as follows ....

<johnath> ping

InScopeByCategory

<Mez_> http://www.w3.org/2006/WSC/Group/track/actions/179

hal: should be able to just walk through the information in the wiki

<tlr> http://www.w3.org/2006/WSC/wiki/InScopebyCategory

<johnath> rachna: don't forget your DVI->VGA donglything - tlr just unplugged it

<Mez_> http://www.w3.org/2006/WSC/drafts/note/#filters

<Mez_> 5.5 Content based detection

<Mez_> Techniques commonly used by intrusion detection systems, virus scanners and spam filters to detect illegitimate requests based on their content are out of scope for this Working Group. These techniques include recognizing known attacks by analyzing the served URLs, graphics or markup. The heuristics used in these tools are a moving target and so not a suitable subject for standardization. The Working Group will not recommend any checks on the content served by web si

<Mez_> 5.5 is part of out of scope

tyler: some of these seem to line up with what has been proposed in PII Editor work

Bill-d: thought identity management systems are out of scope

stephenF: there are things for Semantic Approaches that could be suggested which are out of scope (so they won't be suggested)

group: first two under semantic approaches are deemed IN scope

<hal> I am not able to hear most of the discussion

<Mez_> hold on a bit

rachna: even the third item (federated identity management) has elements in-scope (as a form-filler extension seems in-scope)

<johnath> I won't be the one to recommend OpenID as a proposal. :)

mez: our intent was to look at these today to see if there were concrete proposals which should be put forth
... hearing nothing from the group, it appears NO.
... onto the next category - What doesn't work

hal: this has had much discussion already, so let us skip
... move on to Education category
... it is unresolved whether users understand that they are making "risk management" decisions
... next category General Principles
... some of these are conflicting
... next category New Indicators

tyler: has Firefox reserved any "drawing modes" for itself?
... such as transparency?

beltzner: only thing we've reserved is chrome
... one way is to have the element cross the information boundary

tlr: we have talked about existing robustness practices
... still needs to be pulled together from raw material in the wiki
... existing practices need to be written up

rachna: history and petnames still in?

tyler: petnames are still in

tlr: see here antipatterns for SSL certificate ... but not patterns

mez: some of the positive is wrapped up in Jonathan's proposal

johnath: both identity and what is a secure page are in the recommendations

<tlr> +1 to skipping over process recommendations

hal: skip over process indicators
... final section - technical recommendations
... comprehensive architecture for web authentication is out of scope
... incorporate viable authentication techniques - should be covered
... next several are really "motherhood"
... extensibility so authentication can be continuously improved - not sure how to write a recommendation
... specify infrastructure is out of scope
... metadata has already been discussed.

<johnath> hal - ping

tlr: if there are recommendations around trusted attention sequences - then there might be a deployment recommendation that sites include certain instructions

<Mez_> welcome back hal

<Mez_> taking a minute

hal: petnames is in play
... matching certificate contents is in play
... user controlled notation is in the same vein
... default blocking mode - is like safe browsing mode proposal that is under discussion right now

<hal> hello

tyler: SSL can detect a suspected MITM attack - currently user agent pops a dialog box. Should this be switched to just being an Error 404 not found?

yngve: opera indicates that potential "eavesdropping" may be underway - so similar dialog

tlr: there is stuff in the wiki that needs to be pulled together

<tlr> ACTION-177 closed without doing

<tlr> ACTION: farrell to pick up on ACTION-177, complement with review of TLS spec and exceptions given there; goal is to limit user interaction when not needed - due 2007-06-19 [recorded in http://www.w3.org/2007/05/30-wsc-minutes.html#action16]

<trackbot> Created ACTION-240 - pick up on ACTION-177, complement with review of TLS spec and exceptions given there; goal is to limit user interaction when not needed [on Stephen Farrell - due 2007-06-19].

<tlr> ACTION-240 due 2007-06-26

<staikos> what time are you wrapping up?

<johnath> tlr: what's the urlhack to allow editing when I don't have the "edit this action" link?

<tlr> append /edit

<beltzner> staikos: 12:30 EDT

<staikos> heh guess it's not worth calling now

<tlr> but if you don't have that link, it means you're looking at the public version

hal: secure letterhead is something still in play

<johnath> tlr: so how do I log in to the action tracker?

hal: Service Security Requirement (SSR) record in DNS proposal - should we work on it?

mez: appears to be no interest from the group.

hal: leverage new features (from workshop)

beltzner: xul:browsermessage - it's the mark up which indicates what comes up when a "pop-up" is blocked or if something should be installed

<Mez_> rachna

<Mez_> is talking

rachna: APIs for anti-phishing? This could be APIs for third party services

mez: no comments or interest reflected by the group

day One wrap-up

mez: agenda for tomorrow - lead off on logistics for next face-to-face
... tyler on remaining Note issues that we have
... bulk of the day walking through the editor's draft

<beltzner> it doesn't appear that the xul:notificationbox can be used in content

<beltzner> reference is here http://developer.mozilla.org/en/docs/XUL:notificationbox

adjourned for the day

<Mez_> http://www.boxtyhouse.ie/

Summary of Action Items

ACTION-228 - Share slides about usability testing from dublin f2f [on Rachna Dhamija - due 2007-06-06].

ACTION-229 - Share his slides on robustness testing from the dublin f2f [on Bill Doyle - due 2007-06-06].

ACTION-230 - Define robustness for WSC glossary [on Bill Doyle - due 2007-06-06].

ACTION-231 - Start a discussion about including descriptions of the information divulged to websites by user-agents [on Bill Doyle - due 2007-06-06].

ACTION-232 - Share results from his study once he has them [on Serge Egelman - due 2007-06-06].

ACTION-233 - Make sure Jagatic et al on social phishing is in SharedBookmarks [on Rachna Dhamija - due 2007-06-06].

ACTION-234 - Add www2006 jakobsson, Florencio & Hursley MSR paper to our shared bookmarks list [on Rachna Dhamija - due 2007-06-06].

ACTION-235 - Update / create a user testing timeline with things like IRB turnaround, setup, etc. [on Rachna Dhamija - due 2007-06-06].

ACTION-236 - Track donations of time and resources for usability testing [on Rachna Dhamija - due 2007-06-06].

ACTION-237 - Drive process of tying recommendations to references in SharedBookmarks [on Maritza Johnson - due 2007-06-06].

ACTION-238 - Create and document user testing plan (with links to timeline, donations, prototypers, etc) [on Rachna Dhamija - due 2007-06-06].

ACTION-240 - pick up on ACTION-177, complement with review of TLS spec and exceptions given there; goal is to limit user interaction when not needed [on Stephen Farrell - due 2007-06-19].

WSC WG face-to-face
30 May 2007

Attendees

Contents

Agenda Bashing

Conformance model

Intermediate agenda bashing

Robustness Testing

security usability testing

implementation / testing / etc

functional and conformance testing

InScopeByCategory

day One wrap-up

Summary of Action Items
