W3C

Evaluation and Report Language (EARL) 1.0

(not yet a) W3C Working Draft 4 October 2002

This version:
http://www.w3.org/WAI/ER/2002/10/WD-EARL10-20021004.html
Latest version:
http://www.w3.org/WAI/ER/EARL/
Previous version:
http://www.w3.org/WAI/ER/2002/06/21-earl1.html
Editors:
Wendy Chisholm, W3C
Sean B. Palmer

Abstract

This is a W3C Working Draft produced by the Evaluation and Repair Tools Working Group (ERT WG). The purpose of this document is to explain how and why to use Evaluation and Report Language (EARL) 1.0. The ERT Working Group encourages feedback about this document as well as implementation of the language in authoring tools, testing tools, search engines, and other relevant tools.

Evaluation and Report Language (EARL) is a general-purpose language for expressing test results. This specification describes how to use EARL to describe test results and defines a basic vocabulary for this purpose.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This draft document may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Send comments about this document to the Evaluation and Repair Tools Working Group. The archives for this list are publicly available.

This document has been produced as part of the W3C Web Accessibility Initiative (WAI). The ERT WG is part of the WAI Technical Activity. The goals of the ERT WG are discussed in the Working Group charter.



1. Introduction

Evaluation and Report Language (EARL) is a language to express test results. Tests can include a variety of scenarios from bug reports to testing software or Web content for conformance to a specification.

Recording test results in EARL creates a variety of opportunities. The data can be--

@@ s/express/make statements about??

@@s/Recording/Stating??

How EARL fits into the testing process

The following figure illustrates a generalized end-to-end test process. (description).

An illustration of an end-to-end testing process

The following roles are common in software development.

In small organizations, all of these roles might be performed by a single person. In large organizations, there might be multiple people in each role who need to coordinate.

Product developers may accumulate evaluations from a variety of testers. Machine-comprehensible exchange of this information allows the developer or manager to more easily collect and compare this data. Having the data in a machine-understandable form supports the following possible work-flow:

  1. Tester tests software using an evaluation tool.
  2. Tool stores data in EARL.
  3. Developer imports test results into development tool.
  4. Developer makes repairs.
  5. Tool stores data in EARL.
  6. Manager keeps track of tester and developer data and is able to track the progress of tests and repairs.

A variety of user scenarios are covered in more detail in the "User Scenarios" chapter.

An RDF Vocabulary

This section attempts to describe RDF and EARL in non-technical terms. For a more technical RDF Primer, refer to [RDF-PRIMER].

intro paragraph attempt #1: Resource Description Framework (RDF) is a general-purpose language for describing information on the World Wide Web. The purpose of RDF is to make machine-readable information on the Web machine-understandable. A description of information (data about data) is called metadata. EARL is a vocabulary to express metadata that describes how a resource performed against a test.

intro paragraph attempt #2: Resource Description Framework (RDF) is a general-purpose language for describing information. RDF uses the World Wide Web as a venue for publishing and exchanging information. The purpose of RDF is to make machine-readable information machine-understandable. EARL is an RDF vocabulary used to make statements about how a resource performed against a test.

RDF uses "triples" to describe information. A triple is a statement that contains a subject, a predicate (a verb), and an object. The simplest statement in EARL is:

an Assertor ---asserts---> an Assertion
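As a rough illustration of the triple structure, the statement above can be modelled as a three-part record. This is only a sketch: the `Triple` type, the `describe` helper, and the example.org identifiers are assumptions made for this example and are not part of EARL or RDF.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers; "asserts" mirrors the informal arrow above.
statement = Triple(
    subject="http://example.org/#assertor123",
    predicate="asserts",
    obj="http://example.org/#assertion-1",
)


def describe(triple: Triple) -> str:
    """Render a triple in the arrow notation used in this section."""
    return f"{triple.subject} ---{triple.predicate}---> {triple.obj}"


print(describe(statement))
```

A set of such records is enough to represent a graph: each record is one labelled arrow between two resources.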

The following diagram illustrates a basic EARL statement. The assertor and assertion resources are "typed." When making information machine-understandable, the rules and relationships between pieces of information are declared. @@This gets into the class structure used in RDF. (SVG version of abbreviated basic assertion).

A graph of a simple assertion. Namespaces have been abbreviated for clarity.

"rdf:" and "earl:" are abbreviations for the namespaces that uniquely identify where the data originates or where that type of data is defined. @@namespaces primer?

The following diagram shows the statement with the namespaces spelled out. (SVG version of basic assertion with full namespace references).

Illustration of a basic assertion

For more information on RDF, please refer to the following references:

@@ include stuff from:

@@dave's image of "test environment architecture"

2. User Scenarios

This section outlines some typical examples of how EARL could be used and by whom.

1 - Web accessibility consultant

(single user, multiple tools, single site)

Scenario: A consultant evaluating a client's Web site uses a variety of evaluation tools to generate an accessibility report of the site. Where more than one tool performs the same test, the consultant wants to compare the results of the test between tools.

Checking for accessibility is similar to using a spell-checker on a document. The spell-checker knows that some spellings are wrong, but there are many others that it can only flag as suspicious and that require a human to decide. For example, "teh" is almost certainly a typo for "the", while people's names are likely to be flagged as misspellings even when they are correct.

Accessibility is similar in that the tools can only perform some of the checks and rely on a human to perform the final check. In matters of syntax, a machine can be confident in its results, but beyond syntax a human needs to make the assertion. For example, determining whether an image has a text equivalent is a matter of syntax (i.e. rule matching), while determining whether the text equivalent is appropriate, that is, whether it properly conveys the meaning of the image, is something only a human can do.

@@describe when EARL is generated, where it is stored, how it is used (for each scenario).

Questions that the consultant should be able to derive from EARL statements:

The consultant should be able to use the EARL data to programmatically derive a report for the client.
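One way such a report could be derived is sketched below. It assumes the EARL data has already been reduced to (tool, test case, result) rows; the tool names, test case URIs, and results are invented for illustration, and `find_disagreements` is a hypothetical helper rather than anything EARL defines.

```python
from collections import defaultdict

# Hypothetical, already-extracted assertions: (tool, test case, result).
assertions = [
    ("tool-A", "http://example.org/#tc-1", "pass"),
    ("tool-B", "http://example.org/#tc-1", "fail"),
    ("tool-A", "http://example.org/#tc-2", "pass"),
    ("tool-B", "http://example.org/#tc-2", "pass"),
]


def find_disagreements(rows):
    """Return {test case: {tool: result}} for tests where the tools differ."""
    by_test = defaultdict(dict)
    for tool, testcase, result in rows:
        by_test[testcase][tool] = result
    return {
        testcase: results
        for testcase, results in by_test.items()
        if len(set(results.values())) > 1
    }


for testcase, results in find_disagreements(assertions).items():
    print(testcase, results)
```

Here only the first test case is reported, because the two tools agree on the second. Disagreements like this are the cases where a human check is most likely to be needed.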

@@earl example - graph of combined results of tool A and tool B.

2 - Web site developer

Scenario: A developer maintaining a company's Web site fixes bugs reported by a team of testers. Where a tester has identified a bug, the developer should be able to answer the following questions from the EARL generated by the test team's tools:

The developer may combine the testers' data with data from the project history or with other developers and answer the following questions:

3 - W3C Working Group

Scenario: A W3C Working Group is trying to meet its exit criteria for Candidate Recommendation by developing a test suite to show at least two independent implementations. As a user agent (UA) is tested, the results are stored in EARL. Periodically, the working group will ask the following questions to see how much more implementation work is needed:

@@earl example: graph of data, chart of generated result? (referring to test suites. CSS test suite data (currently on IE6, add other UAs))

4 - User agent developer

(many sets of tests, one UA)

Scenario: A user agent developer wants to determine conformance that can be claimed for her user agent product. Using the data generated by the working group as they tested her product against their test suite, she can find out:

@@earl example: graph of data, chart of generated result?

5 - Student taking an online assessment

Scenario: A student, who is deaf, is using an online education tool and needs to take an assessment. The system constructs the assessment from a set of existing assessment pieces for the current lesson. The student has a learning profile that is matched against which tests need to be passed. Matching the student's profile (not EARL) against the accessibility profile of the data (EARL), the education tool is able to assemble an assessment that does not use sound and presents all information visually.

@@earl example: something to point to within IMS work?

6 - Managing a Web site

Scenario: A Web site development unit that includes database developers, Web page developers and quality assurance testers synthesizes design from the public relations office and content from the operations units. Multiple tests conducted by the development unit, public relations office, and the operations units need to share results to report successful development or specific points of test failure and track status of work on repair of failures. The manager needs to track answers to the following questions:

@@earl example: be able to derive a management chart of some sort from EARL reports between time x and time y?

3. Classes

/* brief intro of RDF class/property model (if haven't already describe in the "RDF Vocabulary" section. If have, then point to that?) */

Assertion

An assertion is a statement about the results of performing a test.

An assertion can have the following properties:

Here is an example assertion block:

<earl:Assertion rdf:about="http://example.org/#assertion-1">
  <earl:subject rdf:resource="http://example.org/#someID02495"/>
  <earl:result rdf:resource="&earl;pass"/>
  <earl:mode rdf:resource="&earl;manual"/> 
  <earl:testcase rdf:resource="http://example.org/#tc-1"/>   
  <earl:assertedBy rdf:resource="http://example.org/#assertor123" />
</earl:Assertion>
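The following sketch shows how a consuming tool might read a block like this one using only the Python standard library. It rests on several assumptions: the block is wrapped in an `rdf:RDF` element with namespace declarations, the EARL namespace is the one declared in Appendix A, the result and mode values are written as full URIs in that namespace, and `read_assertions` is a hypothetical helper. It handles only this one serialization and is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EARL = "http://www.w3.org/WAI/ER/EARL/nmg-strawman#"  # namespace from Appendix A

# Assumed wrapper around the example block, with entities written out in full.
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:earl="{EARL}">
  <earl:Assertion rdf:about="http://example.org/#assertion-1">
    <earl:subject rdf:resource="http://example.org/#someID02495"/>
    <earl:result rdf:resource="{EARL}pass"/>
    <earl:mode rdf:resource="{EARL}manual"/>
    <earl:testcase rdf:resource="http://example.org/#tc-1"/>
    <earl:assertedBy rdf:resource="http://example.org/#assertor123"/>
  </earl:Assertion>
</rdf:RDF>"""


def read_assertions(xml_text):
    """Map each assertion URI to a {property name: target URI} dict."""
    root = ET.fromstring(xml_text)
    found = {}
    for node in root.findall(f"{{{EARL}}}Assertion"):
        about = node.get(f"{{{RDF}}}about")
        props = {}
        for child in node:
            name = child.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
            props[name] = child.get(f"{{{RDF}}}resource")
        found[about] = props
    return found


assertions = read_assertions(DOCUMENT)
print(assertions["http://example.org/#assertion-1"]["assertedBy"])
# prints http://example.org/#assertor123
```

A production tool would more likely use a full RDF parser, since the same graph can be serialized in RDF/XML in several different ways.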

Assertor

An assertor states the results of a test (i.e. an assertor asserts an assertion). An assertor may be a person or a machine.

Subclasses of the Assertor class

Person
The Assertor is a human being.
Tool
The Assertor is a tool, such as a black box testing tool or an evaluation and repair tool.

The assertor in the following example is a person and therefore Person (a subclass of Assertor) is used to describe the assertor.

<earl:Person rdf:about="http://example.org/#assertor123">
  <earl:name>Bob B. Bobbington</earl:name>
  <earl:email rdf:resource="mailto:bob@example.org"/>
</earl:Person>

TestSubject

The class of things that have been evaluated. A test subject needs to be qualified with enough information to identify it unambiguously. You may use a single unambiguous property or an unambiguous combination of properties.

Subclasses of the TestSubject class

Tool
A tool. Most likely a piece of software such as an authoring tool, or evaluation and repair tool.
UserAgent
A piece of software used to access information on the World Wide Web.
WebContent
Information on the World Wide Web.

The subject in the following example is Web content and therefore WebContent (a subclass of TestSubject) is used to describe the test subject.

<earl:WebContent rdf:about="http://example.org/#someID02495">
  <earl:reprOf rdf:resource="http://www.w3.org/" /> 
  <earl:date>2001-05-17T23:07:35Z</earl:date> 
</earl:WebContent>

ResultProperty

The result of the test.

Instances of the ResultProperty

Properties of result

The following example shows the validity, confidence, and message properties applied to a result:

<earl:result rdf:parseType="Resource">
  <earl:validity rdf:resource="&earl;fail"/>
  <earl:confidence rdf:resource="&earl;high"/>
  <earl:message>malformed element in line 23</earl:message>
</earl:result>
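To suggest how these three properties might be used together, the sketch below tallies results by validity and picks out the ones a human may want to re-check. The rows are invented sample data (only the first mirrors the example above), the values are the local names of the validity and confidence levels listed in Appendix A, and `summarize` and `needs_review` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical results, already reduced to plain dictionaries.
results = [
    {"validity": "fail", "confidence": "high",
     "message": "malformed element in line 23"},
    {"validity": "pass", "confidence": "high", "message": ""},
    {"validity": "fail", "confidence": "low",
     "message": "possible missing text equivalent"},
    {"validity": "cannotTell", "confidence": "medium", "message": ""},
]


def summarize(rows):
    """Count how many results fall under each validity level."""
    return Counter(row["validity"] for row in rows)


def needs_review(rows):
    """Results that are not high confidence, or whose validity is cannotTell."""
    return [
        row for row in rows
        if row["confidence"] != "high" or row["validity"] == "cannotTell"
    ]


print(summarize(results))
for row in needs_review(results):
    print(row["validity"], row["confidence"], row["message"])
```

Whether confidence should drive review in this way is a policy choice for the consuming tool; EARL itself only records the values.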

TestCase

A TestCase is a resource that another resource is validated against - a test that can either be passed or failed. This may in fact include many things - validation classes, code test cases, or more subjective guidelines such as WCAG.

<earl:TestCase rdf:about="http://example.org/#tc-1">
  <earl:testId rdf:resource="http://example.org/MyTestCaseThingy-1" />
</earl:TestCase>

4. Properties

Assertion Properties

assertedBy

For earl:assertedBy(y,x), the assertor (x) asserts the assertion (y).

result

The result of the test. Refer to TestResult for possible values.

subject

That which is being tested.

testCase

The test that the test subject is put to.

testMode

testMode indicates whether the test was conducted manually or automatically, or whether the result was derived from other test results (heuristic).

5. Extensibility

@@describe how EARL can be extended. Use PageValet and HiSoft as examples? Create others?

6. Examples

User Scenarios

These used to be in the "user scenarios" section at the top of the document, but I've moved them back here because that earlier section was getting too long and I wanted to focus the reader on the most common scenarios. It's good info, so I didn't want to lose it, but don't think it is needed for the general understanding and application of EARL.

@@for each user scenario described above, create a sample piece of EARL.

Web site tester

Scenario: A tester evaluating a company's Web site uses a variety of testing tools to discover possible bugs on the site.

"power developer" that uses programming tools to produce site, not a WYSIWYG editor commonly used by less technical folk. QA folk use different tools than developers. (@@Jenae Andershonis would be good reviewer for this scenario)

queries:

Usability testing - manual tests

Scenario: test subjects evaluating sites - usability testing. e.g. together people make an assessment about alt-text. @@schema - enough info about test environment? (@@description of person would be the needs, testSubject which tests were relevant to that issue)

queries:

Certifier

Scenario: Person derives conformance claim from test data. In other words, they derive EARL statements from EARL statements by asking the following questions:

@@issue? confidence ratings? and how to use them.

@@schema - confidence. Do we need it? Should it be on the test description rather than on the results? The only use case I can think of is covered by Can't Tell. So far I have seen confidence ratings used in Page Valet to indicate how commonly things that might cause problems actually do.

Grading students

Scenario: test results across semester.

queries

Change organizational policy to meet new requirements

Scenario: Need to meet a new set of requirements. Query existing results, using a new expression of how to derive a result, to see whether any testing is missing.

Information harvesting

Scenario: A robot tries to grab contact info from Web pages. It tracks which pages fail and which tools fail. (à la Nick Gibbins' scenario)

7. Contributors

Giorgio Brajnik, Dan Brickley, Daniel Dardailler, Nick Gibbins, Al Gilman, Nadia Heninger, Ian Hickson, Leonard Kasday, Nick Kew, Jim Ley, William Loughborough, John Lutts, Charles McCathieNevile, Libby Miller, Tom Martin, Sean B. Palmer, Dave Pawson, Eric Prud'hommeaux, Chris Ridpath, Aaron Swartz, Rob Yonaitis

8. References

9. Appendix A: EARL 1.0 Schema

The EARL 1.0 Schema is available in RDF/XML and N3. (@@link to once publish on site)

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE rdf:RDF [
  <!ENTITY earl 'http://www.w3.org/WAI/ER/EARL/nmg-strawman#'>
  <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <!ENTITY rdfs 'http://www.w3.org/TR/1999/PR-rdf-schema-19990303#'>
]>
<rdf:RDF xmlns:earl="&earl;"  
         xmlns:rdf="&rdf;" 
         xmlns:rdfs="&rdfs;"> 
<!-- Classes -->
  <rdfs:Class rdf:about="&earl;Assertion" rdfs:label="Assertion">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;Assertor"  rdfs:label="Assertor">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;ConfidenceLevel" rdfs:label="ConfidenceLevel">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;Platform" rdfs:label="Platform"> 
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;TestCase"  rdfs:label="TestCase"> 
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;TestResult"  rdfs:label="TestResult">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;TestSubject" rdfs:label="TestSubject"> 
     <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;Tool"  rdfs:label="Tool"> 
       <rdfs:subClassOf rdf:resource="&earl;TestSubject"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;UserAgent"  rdfs:label="UserAgent">
       <rdfs:subClassOf rdf:resource="&earl;TestSubject"/>
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;ValidityLevel" rdfs:label="ValidityLevel"> 
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;WebContent"  rdfs:label="WebContent">
       <rdfs:subClassOf rdf:resource="&earl;TestSubject"/>
  </rdfs:Class> 
<!-- Properties -->
  <rdf:Property rdf:about="&earl;assertedBy" rdfs:label="assertedBy">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;confidence"  rdfs:label="confidence">
       <rdfs:range rdf:resource="&earl;ConfidenceLevel"/> 
       <rdfs:domain rdf:resource="&earl;TestResult"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;contactInfo"  rdfs:label="contactInfo">
       <rdfs:range rdf:resource="&rdfs;Resource"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;email"  rdfs:label="email"> 
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
       <rdfs:subPropertyOf rdf:resource="&earl;contactInfo"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;format"  rdfs:label="format">
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;WebContent"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;message"  rdfs:label="message">
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;TestResult"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;name"  rdfs:label="name">
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;platform"  rdfs:label="platform"> 
       <rdfs:range rdf:resource="&rdfs;Resource"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;reprOf"  rdfs:label="reprOf"> 
       <rdfs:range rdf:resource="&rdfs;Resource"/> 
       <rdfs:domain rdf:resource="&earl;WebContent"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;result"  rdfs:label="result">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;TestResult"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;subject"  rdfs:label="subject">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;TestSubject"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;testcase"  rdfs:label="testcase">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;TestCase"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;validity"  rdfs:label="validity">   
       <rdfs:domain rdf:resource="&earl;TestResult"/> 
       <rdfs:range rdf:resource="&earl;ValidityLevel"/> 
  </rdf:Property>
<!-- Instances of Classes -->
  <earl:ValidityLevel rdf:about="&earl;cannotTell"  rdfs:label="cannotTell"/>
  <earl:ValidityLevel rdf:about="&earl;fail"  rdfs:label="fail"/>
  <earl:ConfidenceLevel rdf:about="&earl;high"  rdfs:label="high"/>
  <earl:ConfidenceLevel rdf:about="&earl;low"  rdfs:label="low"/>
  <earl:ConfidenceLevel rdf:about="&earl;medium"  rdfs:label="medium"/>
  <earl:ValidityLevel rdf:about="&earl;notApplicable" rdfs:label="notApplicable"/>
  <earl:ValidityLevel rdf:about="&earl;notTested"  rdfs:label="notTested"/>
  <earl:ValidityLevel rdf:about="&earl;pass"  rdfs:label="pass"/> 
</rdf:RDF>
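The domain and range declarations in the schema can be read as a simple consistency check on statements, as in the sketch below. Two caveats apply. First, the tables are a hand transcription of only part of the schema above. Second, RDF Schema domain and range are normally used to infer the types of resources rather than to reject data, so treating them as constraints is an assumption of this example. The `is_a` and `check` helpers are hypothetical.

```python
# Domain and range of a few properties, transcribed from the schema above.
SCHEMA = {
    "assertedBy": ("Assertion", "Assertor"),
    "result": ("Assertion", "TestResult"),
    "subject": ("Assertion", "TestSubject"),
    "testcase": ("Assertion", "TestCase"),
    "validity": ("TestResult", "ValidityLevel"),
    "confidence": ("TestResult", "ConfidenceLevel"),
}

# Subclass links from the schema (child -> parent).
SUBCLASS_OF = {
    "Tool": "TestSubject",
    "UserAgent": "TestSubject",
    "WebContent": "TestSubject",
}


def is_a(cls, expected):
    """True if cls equals expected or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == expected:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def check(subject_cls, prop, object_cls):
    """Check one statement's classes against the property's domain and range."""
    domain, range_ = SCHEMA[prop]
    return is_a(subject_cls, domain) and is_a(object_cls, range_)


print(check("Assertion", "subject", "WebContent"))    # True
print(check("Assertion", "result", "ValidityLevel"))  # False
```

The second call fails because, in this schema, a validity level is attached to a TestResult through the validity property rather than being the direct value of result.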

11. Appendix C: History and Background

The story of earl....

12. Appendix D: Differences between 0.95 and 1.0