From HTML WG Wiki
Associating attributions, citations, quotations and references
Corresponds to issue number 50
In addition to quotations, authors need to mark up attributions of ideas that are not direct quotations. Authors may also use the CITE element to mark up attribution of works or of the persons originating ideas. Finally, these attributions are typically tied together through a reference list, bibliography or sources list. This proposal uses global attributes to associate: 1) attributions, 2) quotations, 3) citations, and 4) bibliographic reference items. It also relies on already defined (and perhaps newly proposed) URI schemes or URN namespaces to uniquely attribute quotations, ideas, originators, works and bibliographic reference information.
Problem statements / use cases
- Authors need a way to attribute non-direct quotations and other ideas to their originators
- Authors and users often want to view the quotations, ideas, citations and other attributions in a document all compiled together as a list
- While an IRI is a very compact way to encode bibliographic locating information and attribute ideas and quotations to persons and organizations, authors and users require a way to view a more human-readable version of the same information
- Authors often want to provide annotated bibliographies and make annotated attributions and quotations
- On sites like Wikipedia, the need for authors to follow standard attribution practice is especially important. As a cow path needing paving, Wikipedia has developed some tortured server-side methods to achieve this result
- A problem with server-side solutions is that resources must first be processed server-side to meet some of the needs of the user whereas client-side processing allows a user to tailor presentation specifically to their needs even with static resources
To allow authors to encode this information within HTML documents, this proposal calls for the use of IRIs to identify bibliographic resources and the originators and creators of bibliographic resources, whether individuals or organizations. The proposal makes use of the existing quotation elements (Q and BLOCKQUOTE) while also adding support for crediting originating authors and contributors for non-quotation ideas (called attributions here). The proposal makes use of the existing CITE element to include inline citations of bibliographic resources (usually by title). For HTML authoring of bibliographic reference lists and annotated bibliographic reference lists, the proposal makes use of the OL/UL and DL elements respectively. By using existing elements, the proposal permits the graceful degradation of new HTML5 content in legacy UAs.
- adds global attributes: @attributeto, @cite, @subcite and @annotatedby
- adds a new type attribute to the list elements with a value of "biblio"
- makes use of the existing CITE element and the proposed PN and DEFINE elements to divide the dual-purpose CITE element into persons and organizations on one hand and their works on the other hand
- adds UA conformance requirements for associating and presenting associations between and among attributions, citations, quotations, and bibliographic references
- adds UA recommendations or options to perform online lookup of non-URL IRI information based in particular on URI schemes such as 'urn', 'mailto', and 'ldap' as well as 'http'
- liaises with the CSS WG over pseudo-elements and other properties for Q and BLOCKQUOTE
In summary, that's four new global bibliographic attributes (@attributeto, @cite, @subcite and @annotatedby), one attribute for list elements, and some new UA conformance requirements, recommendations and options that did not exist in HTML 4.01. Since the proposal can be accomplished without any new elements, it should degrade gracefully in pre-HTML5 UAs. Resource identifiers (URLs, URNs, URIs, IRIs, etc.) are used whenever possible; however, a mechanism is provided to include further description of bibliographic resources and idea originators within the same document or another author-provided or referenced document.
Global attribution and referencing attributes
In order for authors to include proper attributions and locations for citations, quotations, and otherwise referenced works, several global attributes should be added to HTML.
For associating, crediting and locating bibliographic references:
- subcite(string): for a page, column, page-location, time-index (or a range of the same); or if placed within the parentheses of RURI(), a relative URI completing the IRI from cite
The difference between the @cite and @subcite attributes is this: authors can ensure that all logically identical bibliographic references share the same @cite URL as a locator for the bibliographic reference, while using the @subcite attribute to provide more detailed locating information that differentiates bibliographic references that otherwise share the same @cite value.
For annotating those citations, attributions and quotations:
- annotatedby(a space-separated list of IRIs)
For attributing ideas to individuals or organizations:
Non-normatively, the HTML5 draft could recommend the use of Wikipedia as a standardized, relatively persistent, universally available IRI for famous originators of ideas; for those less famous, the author or authoring community would need to define their own IRI conventions. This suggests four broad options for attributeto IRIs:
- Wikipedia URLs for famous organizations and persons living or dead
- Online identifiers of persons in a particular role such as mailto:, ldap: and persistent http: resources precisely and uniquely identifying the person or organization
- Author-provided document fragments that identify a person or organization
- Author or authoring community coined IRIs that do not necessarily serve as online resource locators
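The four options above could be told apart roughly by inspecting the IRI itself. The following is a minimal sketch, not part of the proposal: the category labels, the function name classify_attributeto, and the rule that any other scheme counts as a coined IRI are all assumptions made for illustration. Whether an http: IRI is truly a persistent identifier cannot be decided from syntax alone, so this is only a heuristic.

```python
from urllib.parse import urlsplit

# Hypothetical category labels; the proposal describes the options only in prose.
WIKIPEDIA = "wikipedia"
ONLINE_IDENTIFIER = "online-identifier"
DOCUMENT_FRAGMENT = "document-fragment"
COINED_IRI = "coined-iri"

# Schemes the proposal names as online identifiers (https is added as an assumption).
ONLINE_SCHEMES = {"mailto", "ldap", "http", "https"}


def classify_attributeto(value: str) -> str:
    """Guess which of the four broad options an attributeto value falls under."""
    parts = urlsplit(value.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    is_wikipedia_host = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    if scheme in {"http", "https"} and is_wikipedia_host:
        return WIKIPEDIA
    if scheme in ONLINE_SCHEMES:
        return ONLINE_IDENTIFIER
    if not scheme:
        # No scheme: a bare IDREF such as 'Darwin' or a '#fragment' reference.
        return DOCUMENT_FRAGMENT
    # Any other scheme (for example 'urn') is treated here as a coined IRI.
    return COINED_IRI
```

A UA could use such a classification to decide whether to offer an online lookup, jump to a fragment in the document, or simply group matching values.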
The @cite attribute should remain for the INS and DEL elements or for generally attributing edits in the document to their proper editor. However, the new attributes permit authors to provide adequate information within the document to automatically generate bibliographic information in a stylesheet-determined manner. Note that this separates the two concepts currently collapsed in HTML (in both the cite attribute and the CITE element): that of 1) attribution and 2) locating a reference. Though it may be possible to extract the attribution from a network-resolvable location URI, this will not work reliably under all conditions.
On a CITE element, these attributes would provide attribution by connecting the work named in the CITE element to an IRI identifying its author, and would associate the marked-up name of a book or other creative work with an IRI representing that work. For example, an IRI might use the 'urn' scheme with an ISBN namespace such as:
Faust (Note how MoinMoin provides this feature on the server-side and resolves this ISBN URN; this is the same method this proposed solution recommends for client-side processing)
As <pn cite='mailto:email@example.com'>Dan Connolly</pn> said, <q cite='http://lists.w3.org/Archives/Public/public-html/2007Aug/0182.html' >...</q>.
UAs could optionally provide lookup information for URL IRIs or non-URL IRIs. For example, Wikimedia provides an elaborate lookup mechanism to associate ISBN information with an ISBN identifier. While Wikimedia, MoinMoin and other content management software perform this URN processing on the server side, it is better handled on the client side, which gives users control over the resolving locations and the handling of the data resulting from resolution.
To mark up attributions and locations where the author or work is not already identified by a universally recognized IRI (or at least one recognized universally in the community targeted by the author), authors would make use of an IDREF to a list item within a bibliographic list, either within the same document or in another document.
<p>In a <cite cite='myPrivateLetterFromDarwin'>private correspondence</cite>, Charles Darwin assured me that ...</p>
<dl type='biblio'>
<dt cite='myPrivateLetterFromDarwin' >Letter from Darwin to me dated <time>2 December 1875</time></dt>
<dd>This letter not only shows Darwin’s interest in ...</dd>
</dl>
Similarly, to mark up attributions and locations where the author is not already identified by a universally recognized IRI (or at least one recognized universally in the community targeted by the author), authors would make use of an IDREF to a DEFINE or PN element (DefiningTermsEtc: proposed elsewhere), either within the same document or in another document. This then provides information on the originator of an idea or quotation that readers can use to verify the source or to give proper credit.
<p>In a <cite cite='myPrivateLetterFromDarwin' attributeto='Darwin' >private correspondence</cite>, <pn variantof='Darwin' >Charles Darwin</pn> assured me that ...</p>
⋮
<dl type='biblio'>
<dt cite='myPrivateLetterFromDarwin' attributeto='Darwin' >Letter from Darwin to me dated <time>2 December 1875</time></dt>
<dd>This letter not only shows Darwin’s interest in ...</dd>
</dl>
⋮
<define id='Darwin' term='Darwin' >Charles Darwin was a pioneering biologist in the field of evolution and ...</define>
Attribute @type='biblio' for the list elements
To provide a mechanism to mark up a reference list, this proposal recommends adding a new semantically distinct list: a bibliographic reference list using the existing list elements DL, OL, or UL and their child elements. These lists would indicate a semantically distinct subtype: a bibliographic reference list. The @cite attribute on the list items would help associate the entry in the bibliographic list with quotations (Q and BLOCKQUOTE), citations (CITE) and other ideas (any element) attributed throughout the document (those sharing the same @cite IRI). With the addition of a global @cite attribute, the LI element would also be matchable with the bibliographic credits within a document.
The content model of the source item would be the same as LI: either block or inline, but not both. The contents of the list item in a bibliographic list represent the title of a bibliographic entry or additional bibliographic metadata at the author's discretion. To create an annotated bibliography, authors must use the DL with type='biblio', where the DT element contains the bibliographic metadata and the DD element contains the author's annotation on the bibliographic item.
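As an illustration of the annotated form, an authoring tool could emit the DL markup from a list of entries. This is only a sketch under assumptions: the function name build_annotated_biblio and the (cite, title, annotation) tuple layout are hypothetical, and the type='biblio' and cite attributes are the ones proposed here, not existing HTML.

```python
from html import escape


def build_annotated_biblio(entries):
    """Render (cite, title, annotation) tuples as a DL with type='biblio'.

    Each DT carries the bibliographic metadata and the proposed cite
    attribute; each DD carries the author's annotation.
    """
    lines = ["<dl type='biblio'>"]
    for cite, title, annotation in entries:
        # escape() also escapes quote characters by default, so the value
        # is safe inside the single-quoted cite attribute.
        lines.append(f"  <dt cite='{escape(cite)}'>{escape(title)}</dt>")
        lines.append(f"  <dd>{escape(annotation)}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_annotated_biblio([
        ("urn:ISBN:0553213482", "Faust", "An annotation written by the author."),
    ]))
```

Because the output uses only DL, DT and DD, a legacy UA would still render it as an ordinary definition list.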
subcite attribute and reference associations
The subcite attribute permits authors to divide the locating identifier for bibliographic resources into two parts. The first part, pointing to the primary URN or URL, is expressed with the cite attribute. Authors express the second part in the subcite attribute. This attribute generally takes a string expressing a specific page, page range, time index, or time-index range. Alternatively, authors can use the keyword RURI with parentheses enclosing a relative URI, which is combined with the cite attribute to form a complete IRI. For example:
<q cite='http://www.example.com/documents/html/thesis_summary/' subcite='RURI(../../main-content/chapter4#section2a?query)' >
<q cite='urn:ISBN:0553213482' subcite='32-34' >
In the first example the cite and subcite attributes would be automatically combined to form the URL:
In the second example, the two attributes indicate a specific edition of Goethe’s Faust and specifically the pages 32-34.
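The two cases above could be handled by a UA along the following lines. This is a hedged sketch: the proposal does not spell out the combination rule, so the code assumes ordinary RFC 3986 relative-reference resolution as implemented by Python's urljoin, and the function name resolve_reference and its (iri, locator) return shape are hypothetical.

```python
import re
from urllib.parse import urljoin

# Matches the proposed RURI(...) keyword form of a subcite value.
_RURI = re.compile(r"^\s*RURI\((.*)\)\s*$", re.DOTALL)


def resolve_reference(cite, subcite=None):
    """Return (iri, locator) for a cite/subcite pair.

    If subcite uses the RURI() form, the relative reference is resolved
    against cite and locator is None. Otherwise cite is returned unchanged
    and subcite is passed through as a plain locator string, such as a
    page range or time index.
    """
    if subcite is None:
        return cite, None
    match = _RURI.match(subcite)
    if match:
        # Assumption: standard relative-reference resolution against cite.
        return urljoin(cite, match.group(1).strip()), None
    return cite, subcite.strip()


if __name__ == "__main__":
    print(resolve_reference(
        "http://www.example.com/documents/html/thesis_summary/",
        "RURI(../../main-content/chapter4#section2a?query)",
    ))
    print(resolve_reference("urn:ISBN:0553213482", "32-34"))
```

Keeping the page range as a separate locator string, rather than folding it into the IRI, preserves the property that all references to the same work share one cite value.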
[need a way to attribute a single attribution (not a quotation but an idea) to multiple non-collaborating originators (for example, attributing the origination of calculus to Newton and Leibniz); perhaps an initial or ending BOOKMARK or A element or another specialized element inside the element needing attribution]
IRI schema extensions and examples
Many online sources can be cited with a simple URL value on the @cite attribute. Authors (individuals or organizations) may also have URLs that can act as IRIs that uniquely and semi-persistently identify those authors. However, the common URL schemes for @cite values should be supplemented by other IRI schemes such as 'urn' and 'mailto' to help match ideas, quotations, citations and other attributions to their sources. The 'urn' scheme already has many registered URN namespaces to provide for precise identification of sources.
The HTML5 WG could facilitate better citation practices by providing more varied examples of @cite IRI values in the HTML5 recommendation. Similarly, the WG could liaise with other organizations to ensure that the needed IRI schemes or URN namespaces are available for sourcing of citations.
Using these varied IRI schemes, UAs can assist users in looking up source details online, even providing automatic reference list generation from online sources for presentation in a device-independent manner with flexible styling.
New UA conformance recommendations
To make these proposed facilities as powerful as possible for authors and users, UAs should provide mechanisms to associate attributions throughout a document with one another and with their associated source items in an author-created or even UA-generated source list. This is especially true for interactive UAs.
However, even for non-interactive UAs, source list generation could be a powerful advanced feature. Much of this data is already readily and publicly accessible through online library catalogs and online bookstores. Wikipedia includes an extensive list of sites throughout the world for the resolution of ISBN IRIs. UAs could provide the same list or a subset of this list for users to choose from.
Secondarily, users may have online access to the actual resources referenced by the ISBN through an institutional library. Users could provide another ordered list of online helper IRIs for the UA to attempt retrieval of the ISBN-referenced resource (or a resource in any other URN namespace).
This suggests the need for interactive bibliographic processing UAs to provide two online helper defaults for users:
- URN resolution to a URL for metadata retrieval (perhaps a separate online helper for each URN namespace)
- URN resolution to a URL for resource retrieval (a ranked list of locations where the UA may attempt URN-resolved resource retrieval)
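The two helper defaults could be modeled as simple lookup tables keyed by URN namespace. The sketch below is illustrative only: the example.org, example.edu and example.com hosts are placeholders rather than real services, and the table and function names are hypothetical.

```python
from urllib.parse import quote

# Hypothetical helper templates keyed by URN namespace; the hosts are placeholders.
METADATA_HELPERS = {
    "isbn": "https://catalog.example.org/isbn/{nss}",
}
# Ranked list per namespace: the UA would try these in order.
RETRIEVAL_HELPERS = {
    "isbn": [
        "https://library.example.edu/fulltext/isbn/{nss}",
        "https://bookstore.example.com/isbn/{nss}",
    ],
}


def split_urn(iri):
    """Split 'urn:<namespace>:<specific part>' into (namespace, specific part)."""
    scheme, _, rest = iri.partition(":")
    namespace, sep, nss = rest.partition(":")
    if scheme.lower() != "urn" or not sep or not namespace or not nss:
        raise ValueError(f"not a URN: {iri!r}")
    # URN namespace identifiers are case-insensitive, so normalize to lowercase.
    return namespace.lower(), nss


def metadata_url(iri):
    """Return the metadata lookup URL for a URN, or None if no helper is configured."""
    namespace, nss = split_urn(iri)
    template = METADATA_HELPERS.get(namespace)
    return template.format(nss=quote(nss, safe="")) if template else None


def retrieval_urls(iri):
    """Return the ranked list of URLs where the UA may try to fetch the resource."""
    namespace, nss = split_urn(iri)
    templates = RETRIEVAL_HELPERS.get(namespace, [])
    return [t.format(nss=quote(nss, safe="")) for t in templates]
```

In an interactive UA, both tables would be user-editable preferences, which is the user control that client-side resolution is meant to provide.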
Even without resolving URN IRIs and retrieving the associated metadata or the referenced resources themselves, the referenced material in a document would all be associated through matching cite='IRI' values. This would then associate attributions (Q, BLOCKQUOTE, A, SPAN, DIV), citations (CITE), and reference list items (LI or DT/DD groups). These cited fragments would be hierarchically related, where each attribution relates to a source citation and potentially a reference list item (even in the case where the reference list item is not explicitly included in the document but is generated by the UA).
- Attributions (either structured or phrase statements of others' ideas or direct quotations in the body of a document)
- Citations (in-body citations of a person, book, article or other source)
- Reference list items (a separate item in a list of citations of a person, book, article or other source that may be included explicitly for annotated references or for better graceful degradation in older UAs)
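The association by matching cite values could be sketched with a small parser pass over the document. This is a minimal illustration, not a conformance algorithm: the class name CiteIndexer, the three bucket names, and the rule that only LI or DT elements inside a type='biblio' list count as reference list items are assumptions drawn from the description above.

```python
from collections import defaultdict
from html.parser import HTMLParser

LIST_TAGS = {"dl", "ol", "ul"}


def _empty_entry():
    return {"attributions": [], "citations": [], "references": []}


class CiteIndexer(HTMLParser):
    """Group elements that carry a cite attribute by their cite value."""

    def __init__(self):
        super().__init__()
        self.index = defaultdict(_empty_entry)
        # One flag per open list: True when the list has type='biblio'.
        self._list_stack = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in LIST_TAGS:
            self._list_stack.append(attributes.get("type") == "biblio")
        cite = attributes.get("cite")
        if not cite:
            return
        in_biblio = bool(self._list_stack) and self._list_stack[-1]
        if tag == "cite":
            kind = "citations"
        elif tag in {"li", "dt"} and in_biblio:
            kind = "references"
        else:
            # Q, BLOCKQUOTE, SPAN, DIV and any other element with a cite value.
            kind = "attributions"
        self.index[cite][kind].append(tag)

    def handle_endtag(self, tag):
        if tag in LIST_TAGS and self._list_stack:
            self._list_stack.pop()


def index_by_cite(markup):
    """Return {cite value: {"attributions": [...], "citations": [...], "references": [...]}}."""
    parser = CiteIndexer()
    parser.feed(markup)
    parser.close()
    return dict(parser.index)
```

A cite value whose "references" bucket is empty marks a source for which the UA could generate a reference list item itself.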
In an inspector, UAs may provide a list of cited references with a sublist for each citation, quotation or attribution instance within the document. UAs may then allow users to easily access (e.g., scroll into view) each such instance of the reference.
Liaison with CSS3 WG
Liaise with the CSS3 WG to provide pseudo-element selectors to add content around Q and BLOCKQUOTE elements. This might be accomplished with the existing ::before and ::after selectors and perhaps through adding support for DOM attributes in CSS 'content' properties and CSS attribute selectors. Needs for styling improvements include:
- The ability to deal with punctuation around quotations that is not a part of the quotation (some style guides call for including the punctuation within the quotation marks while others call for not including such punctuation within the quotation marks).
- The inclusion of parenthetical or otherwise presented content from an attribution element's @cite, @subcite, and @annotatedby attributes after the attribution element's generated box through a mechanism such as the CSS3 'content' property.
- Eventually the ability to refer to properties of an attribution that are derived from the attribution's @cite, @subcite, and @annotatedby attributes, such as the 'title', 'authors', etc.
- Eventually the ability to specify through CSS the automatic generation of a reference list from bibliographic references, attributions and citations indicated in a global @cite attribute.
Discussion and Evaluation
Originally introduced in a review of the HTML5 draft section on phrase elements.
- Issue-tracker Issue:50
- MuchAdoAboutQ: Much ado about Q: Proposal to eliminate the Q / BLOCKQUOTE distinction and associating quotations with sources
- QuotationBlockVInline: Proposal for eliminating authoring distinctions between block and non-block semantics
- AddedAttributeQuotationMarks#preview: New markup attribute on Q and BLOCKQUOTE: "marks"
- Proposals facilitating the automatic generation of lists of document data
- Subtext: footnotes and endnotes from subtext markup
- bibliographic source reference list (this proposal)
- DefiningTermsEtc: glossary and indexes from VAR, TERM, PN and ABBR markup
- outline or table of contents from H and SECTION markup
- SemanticPresentationLegendCSS: presentation / semantic legend through CSS properties and supplemental HTML