Evaluation and Report Language (EARL) 1.0

1. Introduction

Evaluation and Report Language (EARL) is a language to express test results. Tests can include a variety of scenarios from bug reports to testing software or Web content for conformance to a specification.

Recording test results in EARL creates a variety of opportunities. The data can be:

exchanged between tools;
used to create reports;
combined to compare how different test subjects fared on the same test.

@@ s/express/make statements about??

@@s/Recording/Stating??

How EARL fits into the testing process

The following figure illustrates a generalized end-to-end test process. (description).

An illustration of an end-to-end testing process

The following roles are common in software development.

product manager - responsible for delivering the product
product designer - designs the product and documents this in a design specification
tester - takes the product through a series of tests to find bugs
developer - creates something to satisfy the design specification

In small organizations, all of these roles might be performed by a single person. In large organizations, there might be multiple people in each role who need to coordinate.

Product developers may accumulate evaluations from a variety of testers. Exchanging this information in a machine-comprehensible form allows the developer or manager to collect and compare the data more easily. Having the data in a machine-understandable form supports the following possible workflow:

Tester tests software using an evaluation tool.
Tool stores data in EARL.
Developer imports test results into development tool.
Developer makes repairs.
Tool stores data in EARL.
Manager keeps track of tester and developer data and is able to track the progress of tests and repairs.

A variety of user scenarios are covered in more detail in the "User Scenarios" chapter.

An RDF Vocabulary

This section attempts to describe RDF and EARL in non-technical terms. For a more technical RDF Primer, refer to [RDF-PRIMER].

intro paragraph attempt #1: Resource Description Framework (RDF) is a general-purpose language for describing information on the World Wide Web. The purpose of RDF is to make machine-readable information on the Web machine-understandable. A description of information (data about data) is called metadata. EARL is a vocabulary to express metadata that describes how a resource performed against a test.

"describing information on the World Wide Web" is ambiguous. RDF can describe people, places, and things, not only digital information.
Thus, the term "metadata" does not really fit.

intro paragraph attempt #2: Resource Description Framework (RDF) is a general-purpose language for describing information. RDF uses the World Wide Web as a venue for publishing and exchanging information. The purpose of RDF is to make machine-readable information machine-understandable. EARL is an RDF vocabulary used to make statements about how a resource performed against a test.

got rid of the use of metadata
tried to make the use of the web less ambiguous.

RDF uses "triples" to describe information. A triple is a statement that contains a subject, a predicate (a verb), and an object. The simplest statement in EARL is:

an Assertor ---asserts---> an Assertion
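As a rough illustration of the triple structure described above, the statement can be modeled as a three-part record. This is only a sketch: the example.org identifiers are hypothetical, and the "asserts" label is taken from the line above rather than from a property defined in the schema.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, for illustration only.
ASSERTOR = "http://example.org/#assertor123"
ASSERTION = "http://example.org/#assertion-1"

# "asserts" mirrors the informal label used in the text above.
statement = Triple(ASSERTOR, "asserts", ASSERTION)


def describe(t: Triple) -> str:
    """Render a triple in the arrow notation used in this section."""
    return f"{t.subject} ---{t.predicate}---> {t.obj}"
```

Calling describe(statement) renders the triple in the same arrow notation as the line above, with the hypothetical identifiers in place of the class names.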

The following diagram illustrates a basic EARL statement. The assertor and assertion resources are "typed." When making information machine-understandable, the rules and relationships between pieces of information are declared. @@This gets into the class structure used in RDF. (SVG version of abbreviated basic assertion).

A graph of a simple assertion. Namespaces have been abbreviated for clarity.

"rdf:" and "earl:" are abbreviations for the namespaces that uniquely identify where the data originates or where that type of data is defined. @@namespaces primer?

Illustration of a basic assertion

For more information on RDF, please refer to the following references:

RDF Primer - Frank Manola and Eric Miller (2002)
Resource Description Framework (RDF) Model and Syntax Specification - Ora Lassila and Ralph R. Swick (1999)
Why RDF model is different from the XML model - Tim Berners-Lee (1998)

@@ include stuff from:

http://infomesh.net/2001/earl1.0/#intro
http://infomesh.net/2001/earl1.0/#model
http://www.w3.org/2001/03/earl/#introduction
http://www.w3.org/WAI/ER/EARL/intro.html

@@dave's image of "test environment architecture"

2. User Scenarios

This section outlines some typical examples of how EARL could be used and by whom.

1 - Web accessibility consultant

(single user, multiple tools, single site)

Scenario: A consultant evaluating a client's Web site uses a variety of evaluation tools to generate an accessibility report of the site. Where more than one tool performs the same test, the consultant wants to compare the results of the test between tools.

Checking for accessibility is similar to using a spell-checker on a document. The spell-checker knows that some spellings are wrong, but many others merely look suspicious and require a human to decide. For example, "teh" is likely a typo for "the", while people's names are likely to be flagged as misspellings even when they are correct.

Accessibility is similar in that the tools can only perform some of the checks and rely on a human to perform the final check. In matters of syntax, a machine can be confident in results, but beyond syntax the human needs to make an assertion. For example, determining if an image has a text equivalent is a matter of syntax (i.e. rule matching), while determining if the text equivalent is appropriate, that is, whether the meaning of the image is properly conveyed in text, is something only a human can do.

@@describe when EARL is generated, where it is stored, how it is used (for each scenario).

Questions that the consultant should be able to derive from EARL statements:

Have all of the tests been completed? If not, which tests are not complete?
Are there conflicts between results?
Are the results inconclusive because test X relies on test Y?
Can I derive a result from the results that I have?

The consultant should be able to use the EARL data to programmatically derive a report for the client.
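The first two questions above could be answered along the following lines once the EARL data has been loaded. This is a minimal sketch, not part of EARL itself: the flat record layout (assertor, subject, test case, validity), the tool names, and the test case identifiers are all assumptions made for illustration.

```python
from collections import defaultdict

# Each record: (assertor, subject, test case, validity).
# All values are hypothetical sample data.
results = [
    ("toolA", "http://example.org/page1", "tc-1", "pass"),
    ("toolB", "http://example.org/page1", "tc-1", "fail"),
    ("toolA", "http://example.org/page1", "tc-2", "pass"),
    ("toolB", "http://example.org/page1", "tc-2", "pass"),
    ("toolA", "http://example.org/page1", "tc-3", "cannotTell"),
]


def find_conflicts(records):
    """Return {(subject, test case): {assertor: validity}} where assertors disagree."""
    by_test = defaultdict(dict)
    for assertor, subject, testcase, validity in records:
        by_test[(subject, testcase)][assertor] = validity
    return {key: verdicts for key, verdicts in by_test.items()
            if len(set(verdicts.values())) > 1}


def incomplete_tests(records, all_testcases):
    """Return, sorted, the test cases that have no pass or fail verdict yet."""
    decided = {tc for _, _, tc, validity in records
               if validity in ("pass", "fail")}
    return sorted(set(all_testcases) - decided)
```

Here a test counts as complete only when some assertor has reported pass or fail for it; a consultant might reasonably choose a different rule.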

@@earl example - graph of combined results of tool A and tool B.

2 - Web site developer

Scenario: A developer maintaining a company's Web site fixes bugs reported by a team of testers. Where a tester has identified a bug, the developer should be able to answer the following questions from the EARL generated by the test team's tools:

Where is the problem?
Is the problem real?
Has it already been fixed?
Which tester generated this bug report?
What tools did the tester use to identify the bug?

The developer may combine the testers' data with data from the project history or with other developers and answer the following questions:

Is there a history of this problem or something similar?
Have any test results been invalidated by changes?
What should the tester retest?

3 - W3C Working Group

Scenario: A W3C Working Group is trying to meet its exit criteria for Candidate Recommendation by developing a test suite to show at least two independent implementations. As each user agent (UA) is tested, the results are stored in EARL. Periodically, the working group will ask the following questions to see how much more implementation work is needed:

Is every test passed by at least two UAs?
Which browsers do not support all of the items in the specification?
What items does browser X not support?
Some tests may be more important to pass than others. If so, the report should be sorted in a way that makes this clear.
Which parts of the spec are supported?
Which parts of the spec are not supported?

@@earl example: graph of data, chart of generated result? (referring to test suites. CSS test suite data (currently on IE6, add other UAs))

4 - User agent developer

(many sets of tests, one UA)

Scenario: A user agent developer wants to determine what conformance can be claimed for her user agent product. Using the data generated by the working group as they tested her product against their test suite, she can find out:

Which tests does my UA pass or fail?
What percentage of tests are passed for each group (e.g. css1 vs css2 vs css3)?
Which are the most severe failures?
How does my tool compare to a competitor?
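The percentage question above could be computed roughly as follows. This is a sketch under assumptions: each result is taken to carry a group label such as css1 or css2, which the draft schema does not define, and the sample data is invented.

```python
from collections import Counter

# Each record: (test group, validity). Hypothetical sample data.
results = [
    ("css1", "pass"), ("css1", "pass"), ("css1", "fail"), ("css1", "pass"),
    ("css2", "pass"), ("css2", "fail"),
]


def pass_rate_by_group(records):
    """Return {group: percentage of tests in that group with validity 'pass'}."""
    total = Counter()
    passed = Counter()
    for group, validity in records:
        total[group] += 1
        if validity == "pass":
            passed[group] += 1
    return {group: 100.0 * passed[group] / total[group] for group in total}
```

Results such as cannotTell or notTested count against the pass rate in this sketch; excluding them from the total would be an equally defensible choice.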

@@earl example: graph of data, chart of generated result?

5 - Student taking an online assessment

Scenario: A student, who is deaf, is using an online education tool and needs to take an assessment. The system constructs the assessment from a set of existing assessment pieces for the current lesson. The student's learning profile determines which tests each assessment piece needs to pass. By matching the student's profile (not EARL) against the accessibility profile of the data (EARL), the education tool is able to assemble an assessment that does not use sound and presents all information visually.

@@earl example: something to point to within IMS work?

6 - Managing a Web site

Scenario: A Web site development unit that includes database developers, Web page developers and quality assurance testers synthesizes design from the public relations office and content from the operations units. Multiple tests conducted by the development unit, public relations office, and the operations units need to share results to report successful development or specific points of test failure and track status of work on repair of failures. The manager needs to track answers to the following questions:

Where are the problems?
Who is working on them?
What is the status of repairs?
What are the changes over time?

@@earl example: be able to derive a management chart of some sort from EARL reports between time x and time y?

3. Classes

/* brief intro of RDF class/property model (if not already described in the "RDF Vocabulary" section; if it is, point to that?) */

Assertion

An assertion is a statement about the results of performing a test.

An assertion can have the following properties:

assertedBy
subject
testCase
result
testMode

Here is an example assertion block:

<earl:Assertion rdf:about="http://example.org/#assertion-1">
  <earl:subject rdf:resource="http://example.org/#someID02495"/>
  <earl:result rdf:resource="pass"/> 
  <earl:mode rdf:resource="&earl;manual"/> 
  <earl:testcase rdf:resource="http://example.org/#tc-1"/>   
  <earl:assertedBy rdf:resource="http://example.org/#assertor123" />
</earl:Assertion>
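A tool that consumes such a block might read it along the following lines. This is a minimal sketch using only the Python standard library. It is not a general RDF parser: it assumes the assertion appears as a direct child of rdf:RDF in exactly this shape, it writes the namespaces out in full instead of using the &earl; entity, and it takes the EARL namespace URI from the schema in Appendix A.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the Appendix A schema.
EARL = "http://www.w3.org/WAI/ER/EARL/nmg-strawman#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A trimmed copy of the assertion above, wrapped in rdf:RDF.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:earl="{EARL}">
  <earl:Assertion rdf:about="http://example.org/#assertion-1">
    <earl:subject rdf:resource="http://example.org/#someID02495"/>
    <earl:testcase rdf:resource="http://example.org/#tc-1"/>
    <earl:assertedBy rdf:resource="http://example.org/#assertor123"/>
  </earl:Assertion>
</rdf:RDF>"""


def read_assertions(xml_text):
    """Return one dict per earl:Assertion, mapping property name to resource URI."""
    root = ET.fromstring(xml_text)
    assertions = []
    for node in root.findall(f"{{{EARL}}}Assertion"):
        record = {"about": node.get(f"{{{RDF}}}about")}
        for child in node:
            # Tags look like "{namespace}localname"; keep the local name only.
            name = child.tag.split("}", 1)[1]
            record[name] = child.get(f"{{{RDF}}}resource")
        assertions.append(record)
    return assertions
```

Because RDF/XML allows the same graph to be written in several ways, a production tool would more likely use a real RDF parser; the sketch only shows how the properties of an assertion map onto a simple record.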

Assertor

An assertor states the results of a test (i.e. an assertor asserts an assertion). An assertor may be a person or a machine.

Subclasses of the Assertor class

Person: The Assertor is a human being.
Tool: The Assertor is a tool, such as a black box testing tool or an evaluation and repair tool.

The assertor in the following example is a person and therefore Person (a subclass of Assertor) is used to describe the assertor.

<earl:Person rdf:about="http://example.org/#assertor123">
  <earl:name>Bob B. Bobbington</earl:name>
  <earl:email rdf:resource="mailto:bob@example.org"/>
</earl:Person>

TestSubject

The class of things that have been evaluated. It needs to be qualified with some type of information in order to make it unambiguous. You may use an unambiguous property, or unambiguous constellation of properties.

Subclasses of the TestSubject class

Tool: A tool. Most likely a piece of software such as an authoring tool, or evaluation and repair tool.
UserAgent: A piece of software used to access information on the World Wide Web.
WebContent: Information on the World Wide Web.

The subject in the following example is Web content and therefore WebContent (a subclass of TestSubject) is used to describe the test subject.

<earl:WebContent rdf:about="http://example.org/#someID02495">
  <earl:reprOf rdf:resource="http://www.w3.org/" /> 
  <earl:date>2001-05-17T23:07:35Z</earl:date> 
</earl:WebContent>

ResultProperty

The result of the test.

Instances of the ResultProperty

cannotTell
fails
notApplicableTo
notTestedAgainst
passes
suspectAgainst

Properties of result

validity
confidence

The following example shows the validity, confidence, and message properties applied to a result:

<earl:result rdf:parseType="Resource">
  <earl:validity rdf:resource="&earl;fail"/>
  <earl:confidence rdf:resource="&earl;high"/>
  <earl:message>malformed element in line 23</earl:message>
</earl:result>
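One possible use of these properties is to filter results before reporting them. The sketch below assumes that the confidence levels named in the schema (low, medium, high) are ordered from least to most confident; the draft does not state an ordering, so that is an assumption, as are the class name Result and the sample data.

```python
from dataclasses import dataclass

# Assumed ordering of the schema's ConfidenceLevel instances.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Result:
    """A test result with the three properties shown in the example above."""
    validity: str
    confidence: str
    message: str = ""


# Hypothetical sample data; the first entry mirrors the example above.
sample = [
    Result("fail", "high", "malformed element in line 23"),
    Result("fail", "low", "possible problem"),
    Result("pass", "high"),
]


def confident_failures(results, minimum="medium"):
    """Return failing results whose confidence is at least `minimum`."""
    floor = CONFIDENCE_ORDER[minimum]
    return [r for r in results
            if r.validity == "fail" and CONFIDENCE_ORDER[r.confidence] >= floor]
```

With the default threshold, only the first sample result is returned; lowering the threshold to "low" also returns the second.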

TestCase

A TestCase is a resource that another resource is validated against - a test that can either be passed or failed. This may in fact include many things - validation classes, code test cases, or more subjective guidelines such as WCAG.

<earl:TestCase rdf:about="http://example.org/#tc-1">
  <earl:testId rdf:resource="http://example.org/MyTestCaseThingy-1" />
</earl:TestCase>

testCase

testMode

testMode indicates if the test was conducted manually, automatically, or derived from other test results (heuristic).

Domain: Assertion
Range: TestMode

5. Extensibility

@@describe how EARL can be extended. Use PageValet and HiSoft as examples? Create others?

6. Examples

User Scenarios

These used to be in the "user scenarios" section at the top of the document, but I've moved them back here because that earlier section was getting too long and I wanted to focus the reader on the most common scenarios. It's good info, so I didn't want to lose it, but don't think it is needed for the general understanding and application of EARL.

@@for each user scenario described above, create a sample piece of EARL.

Web site tester

Scenario: A tester evaluating a company's Web site uses a variety of testing tools to discover possible bugs on the site.

"power developer" that uses programming tools to produce site, not a WYSIWYG editor commonly used by less technical folk. QA folk use different tools than developers. (@@Jenae Andershonis would be good reviewer for this scenario)

queries:

something that machine says or person?
group report where need to be fixed (developer vs content creator)

Usability testing - manual tests

Scenario: test subjects evaluating sites - usability testing. e.g. together people make an assessment about alt-text. @@schema - enough info about test environment? (@@description of person would be the needs, testSubject which tests were relevant to that issue)

queries:

Is this page accessible for someone who is deaf? blind? both? (does it meet the tests that meet the user's needs) (again, @@schema - how group tests?)

Certifier

Scenario: Person derives conformance claim from test data. In other words, they derive EARL statements from EARL statements by asking the following questions:

What is the end result?
If I find a faulty test, where did this result come from? (@@schema - do we have traceability of heuristically derived results - was it asserted by someone or derived from other assertions?)

@@issue? confidence ratings? and how to use them.

@@schema - confidence. Do we need it? Should it be on the test description rather than on results? The only use case I can think of is covered by Can't Tell. So far I have seen confidence ratings used in Page Valet to indicate how commonly things that might cause problems actually do.

Grading students

Scenario: test results across semester.

queries

have 2 students been getting the same grades?
who passed/failed?
how does this class's grades compare with another group (previous year's, other classes of this subject, etc.)

Change organizational policy to meet new requirements

Scenario: An organization needs to meet a new set of requirements. It queries existing results, using a new expression of how to derive a result, to see whether any testing is missing.

combining individual tests into suites. ???@@ask CMN to clarify
is there a use case for comparing test suites - the success or use of test suites.

Information harvesting

Scenario: A robot tries to grab contact info from Web pages. It tracks which pages fail and which tools fail. (à la Nick Gibbins's scenario)

7. Contributors

Giorgio Brajnik, Dan Brickley, Daniel Dardailler, Nick Gibbins, Al Gilman, Nadia Heninger, Ian Hickson, Leonard Kasday, Nick Kew, Jim Ley, William Loughborough, John Lutts, Charles McCathieNevile, Libby Miller, Tom Martin, Sean B. Palmer, Dave Pawson, Eric Prud'hommeaux, Chris Ridpath, Aaron Swartz, Rob Yonaitis

8. References

9. Appendix A: EARL 1.0 Schema

The EARL 1.0 Schema is available in RDF/XML and N3. (@@link to once published on site)

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE rdf:RDF [

  <!ENTITY earl 'http://www.w3.org/WAI/ER/EARL/nmg-strawman#'>
  <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <!ENTITY rdfs 'http://www.w3.org/TR/1999/PR-rdf-schema-19990303#'>
]> 
<rdf:RDF xmlns:earl="&earl;"  
         xmlns:rdf="&rdf;" 
         xmlns:rdfs="&rdfs;"> 
<!-- Classes -->
  <rdfs:Class rdf:about="&earl;Assertion" rdfs:label="Assertion">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;Assertor"  rdfs:label="Assertor">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;ConfidenceLevel" rdfs:label="ConfidenceLevel">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;Platform" rdfs:label="Platform"> 
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;TestCase"  rdfs:label="TestCase"> 
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;TestResult"  rdfs:label="TestResult">
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;TestSubject" rdfs:label="TestSubject"> 
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;Tool"  rdfs:label="Tool"> 
       <rdfs:subClassOf rdf:resource="&earl;TestSubject"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;UserAgent"  rdfs:label="UserAgent">
       <rdfs:subClassOf rdf:resource="&earl;TestSubject"/>
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;ValidityLevel" rdfs:label="ValidityLevel"> 
       <rdfs:subClassOf rdf:resource="&rdfs;Resource"/> 
  </rdfs:Class> 
  <rdfs:Class rdf:about="&earl;WebContent"  rdfs:label="WebContent">
       <rdfs:subClassOf rdf:resource="&earl;TestSubject"/>
  </rdfs:Class> 
<!-- Properties -->
  <rdf:Property rdf:about="&earl;assertedBy" rdfs:label="assertedBy">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;confidence"  rdfs:label="confidence">
       <rdfs:range rdf:resource="&earl;ConfidenceLevel"/> 
       <rdfs:domain rdf:resource="&earl;TestResult"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;contactInfo"  rdfs:label="contactInfo">
       <rdfs:range rdf:resource="&rdfs;Resource"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;email"  rdfs:label="email"> 
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
       <rdfs:subPropertyOf rdf:resource="&earl;contactInfo"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;format"  rdfs:label="format">
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;WebContent"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;message"  rdfs:label="message">
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;TestResult"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;name"  rdfs:label="name">
       <rdfs:range rdf:resource="&rdfs;Literal"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;platform"  rdfs:label="platform"> 
       <rdfs:range rdf:resource="&rdfs;Resource"/> 
       <rdfs:domain rdf:resource="&earl;Assertor"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;reprOf"  rdfs:label="reprOf"> 
       <rdfs:range rdf:resource="&rdfs;Resource"/> 
       <rdfs:domain rdf:resource="&earl;WebContent"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;result"  rdfs:label="result">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;TestResult"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;subject"  rdfs:label="subject">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;TestSubject"/> 
  </rdf:Property>
  <rdf:Property rdf:about="&earl;testcase"  rdfs:label="testcase">
       <rdfs:domain rdf:resource="&earl;Assertion"/> 
       <rdfs:range rdf:resource="&earl;TestCase"/> 
  </rdf:Property> 
  <rdf:Property rdf:about="&earl;validity"  rdfs:label="validity">   
       <rdfs:domain rdf:resource="&earl;TestResult"/> 
       <rdfs:range rdf:resource="&earl;ValidityLevel"/> 
  </rdf:Property>
<!-- Instances of Classes -->
  <earl:ValidityLevel rdf:about="&earl;cannotTell"  rdfs:label="cannotTell"/>
  <earl:ValidityLevel rdf:about="&earl;fail"  rdfs:label="fail"/>
  <earl:ConfidenceLevel rdf:about="&earl;high"  rdfs:label="high"/>
  <earl:ConfidenceLevel rdf:about="&earl;low"  rdfs:label="low"/>
  <earl:ConfidenceLevel rdf:about="&earl;medium"  rdfs:label="medium"/>
  <earl:ValidityLevel rdf:about="&earl;notApplicable" rdfs:label="notApplicable"/>
  <earl:ValidityLevel rdf:about="&earl;notTested"  rdfs:label="notTested"/>
  <earl:ValidityLevel rdf:about="&earl;pass"  rdfs:label="pass"/> 
</rdf:RDF>

11. Appendix C: History and Background

The story of earl....

12. Appendix D: Differences between 0.95 and 1.0

1.0 uses properties instead of reification
got rid of x, y, z
added a, b, c

(not yet a) W3C Working Draft 4 October 2002