This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 16688 - Consider rewriting algorithms as full algorithms so variables don't appear as if they're globals
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: PC Windows 3.1
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 23155 23927
Reported: 2012-04-10 16:40 UTC by Anne
Modified: 2014-03-27 16:49 UTC (History)
CC List: 4 users


Description Anne 2012-04-10 16:40:11 UTC
Global variables such as "iso-2022-kr state" seem to have unnecessarily long names (shortening, e.g., "euc-kr lead" to "lead" would make it confusingly similar to the local variable "lead", but that can be fixed differently). The use of the definite article with global variables is not consistent.
Comment 1 Anne 2013-01-14 11:26:08 UTC
Do you have specifics on incorrect usage of the definite article? That seems like something that should be fixed.
Comment 2 pub-w3 2013-01-14 19:42:16 UTC
A clear example of inconsistent (not necessarily incorrect) article usage is ‘set iso-2022-kr state to’ alongside ‘[s]et the iso-2022-kr state to’.

More generally, it is not entirely clear why global variables tend to be used with a definite article in phrases like ‘the zxc is initially...’ but not elsewhere.
Comment 3 Anne 2013-01-15 10:28:35 UTC
So what I think I should actually do is make the encoder/decoder algorithms self-contained and inline the "global variables".

In addition I need to design some kind of stream object concept and an operation that can inject units into that stream object (for the error conditions).

Need to think about this some more.
Comment 4 Anne 2013-08-30 18:28:56 UTC
A stream consists of a sequence of units ended with EOF.

A stream's next operation returns either the next unit, EOF, or PENDING if neither a unit nor EOF is available.

TODO: Add a concept of "buffer" or a prepend operation.

Need to think about output of encoder/decoder operation. Want to have either all units until PENDING (API), all units including EOF (sequence), or next unit (parsers).
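
For illustration, the stream concept above could be sketched roughly as follows in Python. This is only a sketch under assumptions: the names `Stream`, `push`, `close`, `prepend`, and `read`, and the use of sentinel objects for EOF and PENDING, are hypothetical and do not come from the spec. `prepend` stands in for the "buffer or prepend operation" from the TODO.

```python
from collections import deque

EOF = object()      # sentinel: the stream has ended
PENDING = object()  # sentinel: no unit available yet, and the stream has not ended


class Stream:
    """Hypothetical stream: a sequence of units (bytes or code points) ended with EOF."""

    def __init__(self):
        self._units = deque()
        self._closed = False

    def push(self, *units):
        """Append units at the end, e.g. as more input arrives."""
        if self._closed:
            raise ValueError("cannot push to a closed stream")
        self._units.extend(units)

    def close(self):
        """Mark the end of input; read() returns EOF once the units run out."""
        self._closed = True

    def prepend(self, *units):
        """Put units back at the front so the next read() returns them first, in order."""
        self._units.extendleft(reversed(units))

    def read(self):
        """The 'next' operation: the next unit, EOF, or PENDING."""
        if self._units:
            return self._units.popleft()
        return EOF if self._closed else PENDING
```

In this sketch, an API-style consumer would call `read()` until it sees PENDING, a whole-sequence consumer would call it until EOF, and a parser would call it once per unit.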
Comment 5 Anne 2013-09-04 16:21:59 UTC
Alternative plan for inlining the variables. We do something like:

The utf-8 decoder is utf-8's decoder. The utf-8 encoder is utf-8's encoder. The utf-8 decoder and utf-8 encoder have an associated utf-8 code point, utf-8 bytes seen, ...

The utf-8 decoder's inner loop is: <current algorithm>


So basically we turn it into a clearer class system, and then we can also define more clearly how the inner loop is invoked and when the variables associated with the encoder/decoder are reset.

Combined with the abstract stream concept above, does that make sense?
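
As a rough sketch of what that class system might look like, the Python below keeps the former "global variables" as instance state on a decoder object and exposes the inner loop as a `handler` method. Everything here is an assumption made for illustration: the class and attribute names, the sentinel objects, the `decode` driver, and the use of a `deque` as a stand-in for the stream are hypothetical, and the byte-range checks are a simplified reading of UTF-8 decoding rather than the spec's wording.

```python
from collections import deque

EOF = object()       # end-of-stream marker
CONTINUE = object()  # handler consumed the byte and needs more input
FINISHED = object()  # handler is done
ERROR = object()     # handler hit a malformed sequence


class Utf8Decoder:
    """Hypothetical decoder object: the state that used to read as global lives here."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.code_point = 0
        self.bytes_seen = 0
        self.bytes_needed = 0
        self.lower, self.upper = 0x80, 0xBF

    def handler(self, stream, byte):
        """Inner loop: process one byte (or EOF) and report the outcome."""
        if byte is EOF:
            if self.bytes_needed:
                self.bytes_needed = 0
                return ERROR  # input ended in the middle of a sequence
            return FINISHED
        if self.bytes_needed == 0:
            if byte <= 0x7F:
                return byte
            if 0xC2 <= byte <= 0xDF:
                self.bytes_needed, self.code_point = 1, byte & 0x1F
            elif 0xE0 <= byte <= 0xEF:
                if byte == 0xE0:
                    self.lower = 0xA0  # reject overlong forms
                if byte == 0xED:
                    self.upper = 0x9F  # reject surrogates
                self.bytes_needed, self.code_point = 2, byte & 0x0F
            elif 0xF0 <= byte <= 0xF4:
                if byte == 0xF0:
                    self.lower = 0x90  # reject overlong forms
                if byte == 0xF4:
                    self.upper = 0x8F  # reject values above U+10FFFF
                self.bytes_needed, self.code_point = 3, byte & 0x07
            else:
                return ERROR
            return CONTINUE
        if not self.lower <= byte <= self.upper:
            self.reset()
            stream.appendleft(byte)  # put the byte back so it is processed again
            return ERROR
        self.lower, self.upper = 0x80, 0xBF
        self.code_point = (self.code_point << 6) | (byte & 0x3F)
        self.bytes_seen += 1
        if self.bytes_seen != self.bytes_needed:
            return CONTINUE
        code_point = self.code_point
        self.reset()
        return code_point


def decode(data):
    """Hypothetical driver: run a fresh decoder over all bytes, replacing errors with U+FFFD."""
    stream = deque(data)
    decoder = Utf8Decoder()
    out = []
    while True:
        byte = stream.popleft() if stream else EOF
        result = decoder.handler(stream, byte)
        if result is FINISHED:
            return "".join(out)
        if result is ERROR:
            out.append("\ufffd")
        elif result is not CONTINUE:
            out.append(chr(result))
```

Because each call to `decode` creates a new `Utf8Decoder`, the question of when the associated variables are reset is answered by object lifetime instead of by prose in each algorithm.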
Comment 6 Joshua Bell 2013-09-04 16:26:59 UTC
SGTM
Comment 7 Anne 2014-03-26 18:35:33 UTC
https://github.com/whatwg/encoding/commit/dc8e4c10c9b4a91f188f3145c2e31ddec4d52a78

This is a massive change. Review appreciated.
Comment 8 Joshua Bell 2014-03-26 19:33:02 UTC
The new TextEncoder/TextDecoder "serialized stream" algorithms appear to be unreferenced. They need to be integrated into the encode()/decode() method descriptions.

(Also, there's a harmless typo in the source nearby: </codE>)
Comment 9 Joshua Bell 2014-03-26 19:34:03 UTC
(In reply to Joshua Bell from comment #8)
> The new TextEncoder/TextDecoder "serialized stream" algorithms appear to be

^^ "serialize stream"

> unreferenced. They need to be integrated into the encode()/decode() method
> descriptions.
> 
> (Also, there's a harmless typo in the source nearby: </codE>)

^^ Not harmless, mucks up the formatting on the TextEncoder's description.
Comment 10 Anne 2014-03-27 12:25:49 UTC
I fixed the typo. The algorithm is cross-referenced. If you go to http://encoding.spec.whatwg.org/#concept-td-serialize and click the bold term you will find where it is referenced from. I do not always reuse the exact term as it does not always make sense in context.
Comment 11 Joshua Bell 2014-03-27 16:49:43 UTC
(In reply to Anne from comment #10)
> I fixed the typo. The algorithm is cross-referenced. If you go to
> http://encoding.spec.whatwg.org/#concept-td-serialize and click the bold
> term you will find where it is referenced from. I do not always reuse the
> exact term as it does not always make sense in context.

Got it. 

That looks good to me. I haven't noticed any other glitches apart from the BOM handling ones mentioned in the other bug. Nice job on the refactor!