18520 – The <code> element should get a dedicated attribute for describing the computer language

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://www.whatwg.org/specs/web-apps/...

Reported:	2012-08-10 09:19 UTC by contributor
Modified:	2013-03-22 18:12 UTC
CC List:	3 users

Description contributor 2012-08-10 09:19:42 UTC

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html
Multipage: http://www.whatwg.org/C#head
Complete: http://www.whatwg.org/c#head

Comment:
The <code> element should get a dedicated attribute for describing the
computer language

Posted from: 92.79.191.201
User agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0

Comment 1 Axel Dahmen 2012-08-10 09:26:39 UTC

I believe that limiting the value space of class names for dedicated purposes is not a good way of implementing standards.

There should be a dedicated string value attribute for defining the computer language used.

The attribute should have a dedicated name, like "clang" or "clanguage", and its values should be MIME types, so that XPath expressions can be easily and reliably used to find the appropriate HTML <code> elements for syntax highlighting.
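
A minimal sketch of the kind of lookup this proposal has in mind, assuming the attribute is called "clang" (one of the names suggested above) and that "text/x-python" and "text/css" serve as MIME-type values. Neither the attribute nor these values is defined by any specification; the helper name is illustrative only. The sketch uses the limited XPath subset of Python's standard xml.etree.ElementTree on a well-formed fragment:

```python
import xml.etree.ElementTree as ET

# Well-formed sample markup; the "clang" attribute is hypothetical.
SAMPLE = """
<body>
  <p><code clang="text/x-python">print("hi")</code></p>
  <p><code clang="text/css">a { color: red }</code></p>
  <p><code>no language given</code></p>
</body>
"""

def find_code_by_type(root, mime_type):
    """Return the text of every <code> whose clang attribute equals mime_type."""
    # ElementTree's XPath subset supports the [@attrib='value'] predicate.
    return [el.text for el in root.findall(f".//code[@clang='{mime_type}']")]

root = ET.fromstring(SAMPLE)
python_blocks = find_code_by_type(root, "text/x-python")
```

Here python_blocks holds only the text of the first <code> element; the element without the attribute is never matched.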

Comment 2 Ian 'Hixie' Hickson 2012-10-26 23:57:09 UTC

Axel: What's the use case?

Comment 3 Axel Dahmen 2012-10-27 10:47:28 UTC

Ian,

I actually see three use cases here:


1. Users and applications would not require extra testing to check whether a class name assigned to a <code> element is prefixed by "language-".

   If a prefix on class names were used, Web design applications such as DreamWeaver would need special code that checks the HTML text for the above class prefix and warns the user not to use class names prefixed with "language-" on <code> elements, in order to avoid unexpected behaviour in particular display applications, e.g. web browsers.


2. With a dedicated attribute, code implementing syntax highlighting could select its parser filters from that attribute rather than through string manipulation, thereby speeding up processing and making it less error-prone.


3. Providing a structured HTML language design and allowing for robust and well structured programming.

Comment 4 Ian 'Hixie' Hickson 2012-10-29 22:06:18 UTC

1 isn't a use case, since if there's no reason to mark the language, there's no reason to avoid the class names. It's just something that you have to worry about _if_ there is a use case.

I disagree with the premise of 2; string manipulation of the class attribute is trivial and wouldn't affect performance or reliability, IMHO, at least not compared to a dedicated attribute which has its own costs.

3 isn't a use case, it's just a design philosophy. It would apply if there was a use case, but not if there wasn't.


Implementing syntax highlighting is a use case, but it's not clear to me that it happens enough to warrant a dedicated attribute. Browsers haven't shown any interest in implementing dedicated syntax highlighting, and scripts can already do it fine as it is.
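
For comparison, a rough sketch of the class-based lookup that a highlighting script might perform, assuming the "language-" prefix convention discussed in this bug. The function name is illustrative only and is not taken from any existing highlighter:

```python
PREFIX = "language-"

def language_from_class(class_attr):
    """Return the first language named by a "language-" class token, or None."""
    # The class attribute is a whitespace-separated list of tokens.
    for token in class_attr.split():
        # Require something after the prefix so a bare "language-" is ignored.
        if token.startswith(PREFIX) and len(token) > len(PREFIX):
            return token[len(PREFIX):]
    return None
```

For example, language_from_class("example language-python") returns "python". Any token that starts with the prefix is treated as a language name, whatever the author meant by it.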

Comment 5 Axel Dahmen 2012-10-30 10:58:16 UTC

I disagree. Here's an example for #1:

<html>
<head><title></title>
<style>
  .language-header  {}
  .language-item  {}
  .language-footer  {}
</style>
</head>
<body>
  <div>
    <div class="language-header">The book is available in these languages:</div>
    <div><code class="language-item">DE</code></div>
    <div><code class="language-item">EN</code></div>
    <div><code class="language-item">FR</code></div>
    <div class="language-footer">(Not for resale)</div>
  </div>
</body>
</html>

Regardless of the fact that there currently is no language called "item", a design application would always have to parse the HTML code for <code> elements having a class name beginning with "language-", and it would have to warn the user that some browsers might display unexpected results.

Comment 6 Ian 'Hixie' Hickson 2012-10-30 17:39:06 UTC

No browsers will show unexpected results because no browsers will do anything with these class names. The language-* class names are just a suggested convention, they're not a defined semantic.

Comment 7 Axel Dahmen 2012-10-31 09:33:11 UTC

Yes, from today's point of view.

But given the Firefox Web Developer menu items or the Internet Explorer F12 Developer Tools, there *may* be in the future.

And if they don't, some future add-on might, and could become a standard tool.

So, still, using class names for dedicated purposes is a bad design decision for a worldwide standard.

Comment 8 Ian 'Hixie' Hickson 2012-10-31 19:56:58 UTC

Any such tools would be non-conforming. There's no way to tell what language a <code>'s contents are in without having coordinated with the page author.

Now, if there are tools such as those you describe who want to implement that kind of thing, then that's a different matter, and we can at that point add such a feature. Are they interested in implementing such a feature?

Comment 9 Axel Dahmen 2012-11-11 11:58:24 UTC

> Any such tools would be non-conforming.

Yes, today. But not if you are going to define a mechanism for automatically determining a code's language. No matter which way you're doing it. From that moment on, such tools are conforming by definition.


> There's no way to tell what language a <code>'s contents are in without having coordinated with the page author.

In the HTML5 description it reads: "authors who wish to mark code elements with the language used" ...

I don't understand the gap. So what's the mechanism described above for, then? Anything other than telling what language a <code>'s contents are in?


> Now, if there are tools such as those you describe who want to implement that kind of thing, then that's a different matter, and we can at that point add such a feature. Are they interested in implementing such a feature?

You're asking the wrong guy here. You should put this question to them.

Just a few links:

  http://code.google.com/p/google-code-prettify/
  http://dense13.com/blog/2008/08/17/new-javascript-syntax-highlighter-shjs/

Comment 10 Ian 'Hixie' Hickson 2013-01-31 22:46:31 UTC

> In the HTML5 description it reads: "authors who wish to mark code elements
> with the language used" ...
> 
> I don't understand the gap. So what's the mechanism described above for, then?
> Anything other than telling what language a <code>'s contents are in?

That's just documenting a possible way authors can mark this up for their own use. I've tried to make this clearer.


> Just a few links:
> 
>   http://code.google.com/p/google-code-prettify/
>   http://dense13.com/blog/2008/08/17/new-javascript-syntax-highlighter-shjs/

These all seem to just be scripts that work within the page, so they don't need a standard way to do things — they just need to document a convention for the author to use.

Comment 11 contributor 2013-01-31 22:47:35 UTC

Checked in as WHATWG revision r7682.
Check-in comment: Clarify that this is not a convention, just a possible technique for the author's own use.
http://html5.org/tools/web-apps-tracker?from=7681&to=7682

Comment 12 Ian 'Hixie' Hickson 2013-03-22 18:12:00 UTC

Please reopen this bug if there are use cases (that is, if someone wants to write software that needs to know the programming language of the contents of a <code> block and yet cannot coordinate with the author, e.g. because it's a browser or search engine and not a script that the author chooses and embeds).