Usability Evaluation - First Cut

This is a working document to review the proposed recommendations from a usability perspective. Our goal is to find similarities, shared assumptions and differences among the proposals. We describe each according to the following outline:

  1. Requirements - What preconditions are necessary to study this proposal (e.g., system requirements)? What user population is assumed (skills, level of expertise, training, etc.)?
  2. User Expectations - What expectations does the proposal make about user behavior?
  3. Relevant Literature - In this section, we map the assumptions discussed above to relevant literature in usable security and HCI-Sec.
    1. What is known? Do we have any existing experimental or observational data about the user expectations in this proposal?
    2. What is unknown? What are the unanswered research questions?
  4. Testing
    1. Possible study hypotheses and designs - new study designs, or tie-ins with existing or ongoing studies.
    2. What we will need to study this (missing details, prototype, infrastructure, etc.)
    3. Heuristic Evaluation where appropriate (Note: most of the proposals are not detailed enough to do a heuristic evaluation)

PII EditorBar

  1. Requirements - this proposal assumes that:
    1. Users have the extension installed in their browser.
    2. Users are novice users; they are not security experts and have received no training.
  2. User Expectations- this proposal assumes that:
    1. Users will complete the bootstrap scenario (e.g. they will select petnames for sites they care about).
    2. When a user wants to fill out a web form, the user will invoke the attention sequence to move focus from the web form to the editor bar.
    3. The user will notice spoofing scenarios and won't submit PII data to a webpage when:
      1. an illegitimate mimic page requests information that has previously been submitted to the legitimate website
      2. an illegitimate page tries to convince the user to create a new relationship with the site. Either:
        1. The user will remember that they previously created a relationship with this website and be suspicious, OR
        2. The browser will detect the reuse of a petname, because the user will assign the spoofed site the same petname they previously gave the legitimate site (see the sketch at the end of this section).
    4. Users will pick unique petnames, and they won't be tricked by PII editor bar spoofing that attempts to mimic their customization.
  3. Relevant Literature
    1. What is known
      1. Can users use the attention sequence correctly? One prior study (A Usability Study and Critique of Two Password Managers by Sonia Chiasson and P.C. van Oorschot) indicates that users have trouble remembering to activate an attention sequence, activating it at the right time (e.g., when focus is in the text box), or knowing when it has been activated. For example, in the study of Password Multiplier, they found that users would not know whether they had entered the attention sequence, even though the feedback was obvious (a pop-up dialog box). The feedback of the attention sequence in the PII editor bar is more subtle (focus moves from the form to the bar), so we expect that many users will have a problem with this.
      2. Are users willing to make site specific nicknames? From deployments of the petname toolbar (e.g., at HP), we can get an estimate of how many users gave a petname to what sites, and how many chose the same petname for a given site.
    2. Unknown
      1. Will users be willing to use the attention sequence for each form field? Shifting focus away from each field may require more effort than typing itself, because it is cumbersome and requires a new skill (most users do not use keyboard shortcuts and instead use the mouse to move between fields).
      2. Will users correctly fulfill the behavioral expectations described above and not succumb to spoofing attacks?
  4. Testing
    1. We require a prototype implementation of this proposal to do further analysis, because much of the usability and security will depend on specific design decisions.
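
As a concrete reading of the petname-reuse expectation above, the sketch below shows one way the editor bar could detect that a petname is being reused for a different origin. It is a minimal illustration only, assuming a simple origin-to-petname store; the `petnames` map and `warnUser` function are placeholders, not part of the proposal.

```typescript
// Hypothetical sketch: warn when a user tries to give a new origin a petname
// that is already bound to a different origin (possible spoofing attempt).
const petnames = new Map<string, string>(); // origin -> petname

function assignPetname(origin: string, petname: string): void {
  for (const [knownOrigin, knownName] of petnames) {
    if (knownName === petname && knownOrigin !== origin) {
      // The user is reusing a petname they gave to another site; in the
      // spoofing scenario this is the signal the editor bar should surface.
      warnUser(`"${petname}" is already your name for ${knownOrigin}.`);
      return;
    }
  }
  petnames.set(origin, petname);
}

function warnUser(message: string): void {
  // Placeholder for whatever warning UI the editor bar would actually use.
  console.warn(message);
}
```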

Identity Signal

  1. Requirements - this proposal assumes that:
    1. Users are novice users; they are not security experts and have received no training.
  2. User Expectations - this proposal assumes that:
    1. Users will notice the identity signal if it is in primary chrome.
    2. Users will seek out the identity signal information if it is in secondary chrome.
    3. Users will understand the difference between the identity-info-available and identity-info-not-available indicators (we assume that the latter is an actual indicator and not merely the absence of the former; a minimal sketch appears at the end of this section).
    4. Users will be able to distinguish between the identity signal indicator in primary chrome and a spoofed copy of the indicator that appears in the page content.
  3. Relevant Literature
    1. What is known
      1. In current browser interfaces, we know that most users do not notice information in primary chrome. This is partially because the indicators are in the periphery of the user's area of focus, and because the indicators are meaningless to the user (e.g., the URL bar is hard to interpret, and users don't know what a yellow background in the address bar means).
      2. In current browser interfaces, we know that most users do not seek information in secondary chrome. This is partially because the affordances for doing so are not obvious (most users don't notice the lock or know that they can click on the lock) and because the resulting information is not easy to understand (e.g., certificate info).
      3. Most users are not suspicious and do not question the identity of the sites they visit [Why Phishing Works, Dhamija, et al.]
      4. If users are used to seeing an identity-info-not-available indicator regularly and for legitimate sites, their suspicions will not be aroused when they see this indicator on illegitimate sites.
    2. Unknown
      1. If peripheral identity indicators are made more understandable, will they be noticed and relied upon by users?
      2. If indicator affordances are made obvious and the resulting information is understandable, will users seek identity information?
      3. In one study (Why Phishing Works, Dhamija et al.), 50% of users did not notice a picture-in-picture attack where a spoofed positive indicator (the Firefox SSL yellow address bar) was presented in the content of an illegitimate website (with no yellow address bar in the real chrome). How does the presence of a negative indicator compare to the absence of a positive indicator? (Check: does the Jackson study address this?)
  4. Testing
    1. The goal of this proposal is not to prevent spoofing attacks, but only to aid users in finding identity information and to give them an easier way to do so. Whether this proposal accomplishes that will depend on the particular implementation of the idea.
    2. Is there a way to combine this proposal in a comparative study with the other EV-related proposals?
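
As a purely illustrative reading of the two-indicator assumption above, the following sketch shows one way the indicator state could be chosen, assuming a hypothetical `CertificateInfo` object that exposes validated organization data; the proposal itself does not specify how the signal is derived, so treat the details as assumptions.

```typescript
// Hypothetical sketch: choose between an identity-info-available indicator and
// an explicit identity-info-not-available indicator (rather than just hiding
// the positive indicator, whose absence users may not notice).
type IndicatorState = "identity-info-available" | "identity-info-not-available";

interface CertificateInfo {
  validated: boolean;     // certificate chain validated by the browser
  organization?: string;  // e.g., the subject organization field, if present
}

function indicatorFor(cert: CertificateInfo | null): IndicatorState {
  if (cert && cert.validated && cert.organization) {
    return "identity-info-available";
  }
  // An explicit negative state, shown even when no identity info exists.
  return "identity-info-not-available";
}
```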

Revisiting Past Decisions

  1. Assumptions
    1. This proposal requires that a trust decision made at one time not persist past that time, or that the user be able to revisit this trust decision. This assumes that users want to revisit decisions they have made in the past.
  2. Relevant literature
    1. Some literature on adaptive interfaces discusses the opposite problem: once a user makes a decision, how can the system learn from that decision so that the user does not have to revisit similar decisions over and over again? (For example, when a user decides to accept or reject a cookie, the system can learn from a few instances and create a cookie acceptance policy; see the sketch at the end of this section.)
    2. This proposal might be combined with a proposal for *reducing* trust decisions. When making a trust decision, we expect that most users would prefer to be given the option to revisit their decisions, but not be *required* to.
  3. Testing
    1. This proposal is hard to evaluate without any specific examples or implementation. It appears to be a complementary feature to many proposals.
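
To make the cookie example above concrete, here is a minimal sketch of learning a per-site policy from a few consistent decisions. The data structures and the threshold of three consistent answers are invented for illustration; they are not part of any proposal.

```typescript
// Hypothetical sketch: after a few consistent per-site cookie decisions,
// generalize them into a policy so the user is not asked again.
type Decision = "accept" | "reject";

const decisions = new Map<string, Decision[]>(); // site -> past decisions
const policy = new Map<string, Decision>();      // site -> learned policy
const LEARN_AFTER = 3; // invented threshold: 3 consistent answers in a row

function recordDecision(site: string, decision: Decision): void {
  const history = decisions.get(site) ?? [];
  history.push(decision);
  decisions.set(site, history);

  const recent = history.slice(-LEARN_AFTER);
  if (recent.length === LEARN_AFTER && recent.every(d => d === decision)) {
    policy.set(site, decision); // stop prompting; apply the learned policy
  }
}

function decide(site: string): Decision | "ask-user" {
  return policy.get(site) ?? "ask-user";
}
```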

Page Security Score

  1. Requirements- this proposal assumes that:
    1. Users may be novice or advanced (novice or unmotivated users are not assumed to benefit as much from this proposal).
  2. User Expectations
    1. Users will notice the summary indicator in primary UI.
    2. Users will make decisions about online behavior based on the level of security risk as indicated by the possible summary indicators (e.g., colors, numerical score ranges).
    3. Users will make meaningful distinctions between the summary indicators.
    4. Technical users will examine the raw page score and find this useful in adjusting their online behavior (or in advising others?)
    5. Users will not be fooled by the image of the most trustworthy indicator in the content of an untrustworthy page.
  3. Relevant Literature
    1. There are many studies on binary indicators (lock/broken lock/no lock) and three-way indicators (red/green/white IE7 address bar, traffic light).
    2. It is well known that color is not an effective way to indicate numerical quantity or rank (the exception is a gradient of a single color, e.g., greyscale). A temperature bar is a more effective technique. No studies of multi-level indicators exist in the computer security indicator literature; we might be able to draw on the literature on other warning indicators. (A small sketch of a banded score-to-indicator mapping appears at the end of this section.)
  4. Testing
    1. What is the best type and number of indicators to convey security score?
    2. Do users make meaningful distinctions between the indicators?
    3. What is the distribution of indicators? If the majority of sites that users visit fall in the lower ranges, will users habituate to ignoring the indicators the same way they have with binary and colored indicators?
    4. It is likely that if the scores change with user behavior (e.g., if the score increases with repeat visits), the perception of the indicator will be very different from existing indicators, which are based on external factors.
    5. There are a number of interesting things to test here, which can be studied very easily using low-fidelity prototyping methods.
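
To make the indicator questions above concrete, the sketch below maps a raw 0-100 page score onto a small number of ordered bands, in the spirit of a temperature bar rather than arbitrary colors. The score range, band boundaries, and labels are assumptions; choosing the right number of bands is itself one of the study questions.

```typescript
// Hypothetical sketch: collapse a raw 0-100 page security score into a small
// number of ordered bands, suitable for a temperature-bar style indicator.
interface Band {
  min: number;
  label: string;
}

// Invented boundaries and labels; how many bands users can distinguish is a study question.
const BANDS: Band[] = [
  { min: 80, label: "high" },
  { min: 50, label: "medium" },
  { min: 0,  label: "low" },
];

function bandFor(score: number): string {
  const clamped = Math.max(0, Math.min(100, score));
  return BANDS.find(b => clamped >= b.min)!.label;
}

// Example: bandFor(72) === "medium"; technical users could still inspect the raw score.
```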

Security Protocol Error Messages

  1. Requirements:
    1. This proposal assumes average users, because these users fail to understand current SSL warnings/indicators
    2. This proposal assumes they will be using their regular browsers
    3. This proposal assumes that these new warnings will not interfere with advanced users
  2. Expectations- this proposal assumes that:
    1. current SSL icons go unnoticed by average users
    2. SSL warnings are dismissed by most users without reading them
    3. users will notice active warnings that interrupt the primary task, provided they are displayed rarely
    4. only a subset of current SSL warnings need to be displayed for most users
    5. many current SSL errors can be eliminated because the actual risk of an attack is low
  3. Relevant literature:
    1. What is known:
      1. Users do not know to look for the lock icon or HTTPS unless they are primed for security. Additionally, *very* few users understand how to read relevant information from a certificate (http://tjwhalen.googlepages.com/eye-tracking_gi.pdf).

      2. Users are habituated to clicking yes to dialog boxes regarding security. They will not read the dialog, and instead find a way of dismissing and continuing on to their primary task (http://cups.cs.cmu.edu/soups/2007/proceedings/p76_brustoloni.pdf).

      3. Users will easily fall for man-in-the-middle (MITM) attacks because of habituation and because they do not understand risk or the concept of certificates (http://www.cs.pitt.edu/~jcb/papers/www2005.pdf).

    2. What is unknown:
      1. How do expert users interact with current SSL indicators and warnings?
      2. What is the likelihood of the consequences of ignoring an SSL warning?
      3. For critical SSL errors, will users obey the warnings if they interrupt the primary task?
    • Thus, SSL warnings should be separated into two classes: those that only expert users will care about (passive indicators), which carry reasonably low risk when ignored, and high-risk warnings where the primary task is interrupted (active indicators); see the sketch at the end of this section.
  4. Testing:
    • The study will be done in three parts: a survey of IT professionals, an in-the-wild study of security conscious users, and a laboratory study of active SSL warnings.
    • The survey will be online and sent to IT professionals who have no doubt dealt with SSL warnings in the past. We will show them screenshots of these warnings, ask them what they mean, what actions the warning wants them to take, and the perceived risk level. This will tell us how IT professionals deal with the current warnings, and will also give some insights into actual vs. perceived risk with SSL warnings.
    • I currently run a Tor exit node. I plan on running a MITM attack on a certain percentage of SSL connections, substituting my own self-signed certificate when a user initiates an SSL connection. We can be reasonably assured that the browser will warn the user and can then examine whether they heed the warning and navigate away, or ignore the warning and proceed. This will tell us how security-conscious users interact with warnings regarding self-signed certificates, which can be compared with the survey data.
    • I plan on designing my own active SSL warnings to be displayed when a user encounters a critical SSL error. These warnings will be designed similarly to the ones used by IE7 and Firefox 2 that warn about phishing. One use case for this warning is a user visiting a site with a revoked certificate or a domain mismatch. Results from this study will be compared to those of the previous study to see whether more users heed these warnings.
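
A minimal sketch of the two-class split described above. The error names and their assignment to classes are assumptions; deciding which errors belong in which class (and which can be suppressed entirely) is exactly what the proposed studies would inform.

```typescript
// Hypothetical sketch: route SSL errors either to a passive indicator (low
// risk, mainly of interest to experts) or to an active, task-interrupting
// warning (high risk). The error names and their assignments are assumptions.
type WarningClass = "passive-indicator" | "active-interruption";

const HIGH_RISK_ERRORS = new Set([
  "certificate-revoked",
  "domain-mismatch",
  "untrusted-issuer",
]);

function classify(sslError: string): WarningClass {
  return HIGH_RISK_ERRORS.has(sslError)
    ? "active-interruption"   // interrupt the primary task
    : "passive-indicator";    // low risk; surface only for expert users
}
```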

EV Certs, Secure Letterhead, Favicons and Certificate Logos

* All of these proposals rely on the same assumptions and concepts, so we can test them together.

  1. Requirements:
    1. This proposal assumes average users
    2. This proposal assumes it will be integrated in most web browsers
  2. Assumptions- these proposals assume that:
    1. users will notice passive indicators in chrome
    2. users will not fall for spoofed indicators
    3. users can tell the difference between indicators in chrome and indicators in content
    4. only sites with true EV certs can modify the chrome to display these particular indicators
    5. only reputable websites can obtain EV certs
    6. users can be trained to look for EV certs
  3. Relevant literature:
    1. What is known:
      1. Users do not look at passive indicators when they are not explicitly told to look for security indicators (http://tjwhalen.googlepages.com/eye-tracking_gi.pdf).

      2. Users do not notice and do not trust passive indicators used for phishing (both positive and negative; http://www.simson.net/ref/2006/CHI-security-toolbar-final.pdf)

      3. Users do not understand the difference between chrome and content (http://people.deas.harvard.edu/~rachna/papers/why_phishing_works.pdf; http://cups.cs.cmu.edu/soups/2006/proceedings/p79_downs.pdf)

      4. Users are unable to notice picture-in-picture attacks (again, they cannot distinguish between chrome and content; http://usablesecurity.org/papers/jackson.pdf), nor do they notice the lack of EV information in the address bar.

    2. What is unknown:
      1. If properly trained, will users be able to identify EV certs and notice sites without them?
      2. Assuming users can be trained properly, will they fall for other attacks, such as installing new root certificates or other plugins that defeat the purpose of the EV certs?
      3. Can a malicious individual purchase an EV cert anonymously?
  4. Testing:
    • We will conduct a study in three parts, to examine the last three assumptions:
    • Users will come into our laboratory and be given a tutorial on EV certs. They will be trained to look for them in the address bar, and to transact only with websites that have EV certs. They will be given tests to determine how well they understand these concepts. Only after passing these tests will they begin the second phase of the study. This will show whether they are capable of learning to look for EV certs.
    • In phase two, we will create a shopping task where a user will interact with a site without an EV cert. The experimental group will be shown a popup box that says something along the lines of "to turn the address bar green, click here to install this plugin." The link will go to a browser extension that, if installed, turns the address bar green for that particular site. The control group will not see this link. Another possible experimental group could be shown a picture-in-picture attack. If users install the plugin, this will show that regardless of whether we can train people to use EV properly (and consistently), there will always be ways around it.
    • The final phase of the experiment will examine the issuing procedures for EV certs. If a cert can be issued anonymously, then it doesn't matter if we can train users to look for them or not. Phishing sites will start using EV certs if they come into widespread usage.
  5. Hypothesis:
    • Even if users can be trained to look for EV certs, they will still fall victim to social engineering and spoofing. But this may all be moot if it's possible for an attacker to buy an EV cert anonymously.

Safe-Browsing Mode

  1. Requirements.
    1. Target user group - the motivated user. This is assumed to mean a user who is concerned enough with the security of the accounts they access online to be willing to take extra security precautions before conducting sensitive transactions.
  2. User Expectations
    1. Users will invoke SBM before sensitive transactions
    2. Users will populate their own white list of sites to access using SBM
    3. Users will learn to perform all sensitive transactions only in SBM
    4. Users will be able to differentiate between regular browsing and SBM
    5. Users want website identity assurance enough to take active precautions to check site identity.
  3. Relevant Literature
    1. What is known
      1. An evaluation of WebWallet showed users could be trained to invoke a secondary password manager before entering credentials at a site. For the motivated user, an evaluation of expectation 1 above might show similar results.

      2. Survey results show users want security over convenience. Also, the number of customers willing to pay for a SecurID token for login provides additional evidence that users are willing to take extra measures.
      3. Many studies show users are unable to differentiate between content and chrome and are highly susceptible to visual spoofing attacks, so expectation 4 above will likely not hold.
      4. A consistent interface across browsers will help avoid unnecessary confusion for the user and minimize attackers' opportunities to spoof the interface.
    2. What is not known
      1. Would a different look and feel be enough to differentiate between safe browsing mode and regular browsing?
      2. Will users be able to create their own white list? What will be the user behavior when they attempt to access a site that safe browsing mode won't display?
      3. Is it sufficient/acceptable to provide a solution only for the motivated user?
  4. Testing
    1. A study to evaluate SBM might include asking a user to complete a number of tasks at both real and phishing sites. They would be briefed with instructions on using SBM beforehand and would need to populate their white list while they completed the tasks. It would be interesting to see which sites they attempted to access using SBM. This would also serve as an evaluation of the white list creation process. It could also be used to gather feedback on what the user would do when faced with a site they couldn't access through SBM.
    2. Hypothesis: SBM will fail if the user cannot distinguish SBM from regular browsing mode. User confidence in SBM will be greatly decreased if an attacker can convince them they are in SBM when they are not.
    3. Note on testing: Before continuing with a usability evaluation, a design for the SBM interface is necessary. The process for adding and removing sites on the whitelist also needs to be outlined (a minimal sketch of one possible whitelist check appears below).
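
Since the whitelist add/remove process still needs to be outlined, the following is only one possible sketch of it, assuming the list is keyed by hostname and that non-whitelisted sites are simply blocked in SBM; both choices are assumptions rather than details of the proposal.

```typescript
// Hypothetical sketch: in Safe-Browsing Mode, only user-whitelisted hosts are
// displayed; anything else is blocked with an explanation and an option to add.
const whitelist = new Set<string>();

function addToWhitelist(url: string): void {
  whitelist.add(new URL(url).hostname);
}

function removeFromWhitelist(url: string): void {
  whitelist.delete(new URL(url).hostname);
}

function allowedInSBM(url: string): boolean {
  return whitelist.has(new URL(url).hostname);
}

// Example flow: the user adds their bank while completing a task, then a
// phishing look-alike on a different hostname fails the check.
addToWhitelist("https://www.example-bank.com/login");
console.log(allowedInSBM("https://www.example-bank.com/account")); // true
console.log(allowedInSBM("https://www.examp1e-bank.com/login"));   // false
```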

BMA Browser Recs

What is a secure page

(Maritza's WIP)

BrowserLockDown

Note: It's unclear whether this proposal is being suggested only for organizations where users are forced to follow specific policies, or whether it would also be used by the at-home user who more or less decides their own policy. Some of these comments may not apply if this proposal isn't intended for the average (at-home) user.

  1. Requirements - this proposal assumes that:
    1. Each site a user visits will be assigned the correct mode in which it should be accessed.
    2. It is possible to distribute profiles within a community in a secure manner.
  2. User Expectations - this proposal assumes that:
    1. Users will be capable of assigning a profile to each domain they visit (outside a closed environment like work); see the sketch at the end of this section.
    2. Users will be able to install new profiles and initialize a white list of domains where each profile should be used.
    3. Users will not disable or override the settings when a profile prevents them from completing a task.
  3. Relevant Literature
    1. What is known
      1. When users are prevented from completing a task without a comprehensible reason, they will disable the feature, or continue and investigate further to see whether they agree. The interface suggested in the proposal for showing why a site was blocked would need to give enough information to convey the reasoning behind the block, so it does not end up being just a way for the user to know which profile to disable.
    2. Unknown
      1. How users will react to an implicit policy.
      2. What security decisions can be generalized and are there any that require an actual user's response? Are there any decisions a user would prefer to make?
  4. Testing
    • Note: Will the user know which mode they are browsing in, or would this information only show up in the interface that displays why a site was blocked? From the recommendation it sounds like the policy will be enforced without much user interaction; if this isn't the case, more detail should be included in the recommendation, and this section should be rewritten.
      1. If the policy is completely implicit to the user, the setup process would need to be evaluated with a group of participants who are knowledgeable enough to either create the profile or install one from a list of profiles our group provides.
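
To make the profile-per-domain assumption concrete, here is a minimal sketch of a domain-to-profile lookup with a restrictive default for unassigned domains; the profile names and the fallback choice are assumptions, since the recommendation does not specify them.

```typescript
// Hypothetical sketch: each domain is assigned a browsing profile; domains
// without an assignment fall back to the most restrictive profile.
type Profile = "locked-down" | "standard" | "trusted";

const profileByDomain = new Map<string, Profile>([
  ["intranet.example.com", "trusted"],   // e.g., assigned by an organization
  ["news.example.org", "standard"],      // e.g., assigned by the user at home
]);

function profileFor(url: string): Profile {
  const host = new URL(url).hostname;
  // Assumed default: unknown domains get the most restrictive profile.
  return profileByDomain.get(host) ?? "locked-down";
}

// Example: profileFor("https://unknown.example.net/") === "locked-down"
```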