Re: [EventSource] feedback from implementors

On Mon, 21 Sep 2009 11:39:14 -0400, Per-Erik Brodin  
<per-erik.brodin@ericsson.com> wrote:

> Michael A. Puls II wrote:
>> On Fri, 18 Sep 2009 11:37:24 -0400, Per-Erik Brodin wrote:
>>
>>> When parsing an event stream, allowing carriage return, carriage return
>>> line feed, and line feed to denote line endings introduces unnecessary
>>> ambiguity into the spec. For example, the sequence "\r\r\n\n" could be
>>> interpreted as three or four line endings.
>>
>> That would always be 3 line endings: a Mac, a Windows and a Unix one. "\n\r\n\r"
>> would be the reverse order, but still 3.
> So what you are saying is that "\r\n" will always be a Windows line
> ending and never a Mac line ending followed by a Unix line ending?

Ideally, yes, imo.
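Here is a quick JavaScript sketch of what I mean. countLineEndings is only a name made up for illustration. The alternation tries "\r\n" before a lone "\r" or "\n", so a CR directly followed by LF is always consumed as one Windows line ending:

```javascript
// Count line endings, treating "\r\n" as a single ending.
// The alternation tries "\r\n" first, so a CR followed by LF
// is never split into a Mac ending plus a Unix ending.
function countLineEndings(text) {
  const matches = text.match(/\r\n|\r|\n/g);
  return matches ? matches.length : 0;
}

console.log(countLineEndings("\r\r\n\n")); // 3: "\r", "\r\n", "\n"
console.log(countLineEndings("\n\r\n\r")); // 3: "\n", "\r\n", "\r"
```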

>>
>> Universal newline normalization for input with mixed newline formats:
>>
>> // text is the input string; each call returns a new string
>> // normalize newlines to \n
>> text.replace(/\r\n|\r/g, "\n");
>> // normalize newlines to \r\n
>> text.replace(/\r\n|\r|\n/g, "\r\n");
>> // normalize newlines to \r
>> text.replace(/\r\n|\n/g, "\r");
> While regular expressions are greedy by default, I have been told that
> there is no way to express such behavior using ABNF. For what it is
> worth, that means that the current ABNF definition of the event stream
> format can't stand on its own.
>
>>
>> Ideally, I think it's often best to do the first to normalize to \n
>> for processing (like if you need to know line count) and then normalize
>> to a different format *if needed* afterwards.
>>
>> IMO
>>
> Keep in mind that we are parsing a continuous stream where data arrives
> in chunks. It is entirely possible for a "\r\n" pair to be split between
> two chunks. That could be handled either by 1) dispatching an event
> immediately upon receiving a carriage return and then, when the next
> chunk arrives, "remembering" that the previous chunk ended in a carriage
> return and discarding the first character if it is a line feed, or 2)
> not dispatching an event until the character after the carriage return
> has been received, which could delay event dispatch. Both of these
> options are far from ideal.

#1 sounds like it makes great sense, imo. Ideally, even if you're handling  
things in chunks, the end result should be the same as if you got it all  
at once. In other words, if you can help it, don't let the chunkiness mess  
up your desired newline handling :).
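For what it's worth, #1 doesn't need much state. Here's a rough JavaScript sketch of it, not anything from the spec: LineSplitter, push and onLine are made-up names, and it only splits lines, leaving field parsing and event dispatch out. Each line goes out as soon as its CR or LF arrives, and one flag remembers whether the last character seen was a CR, so a LF that starts the next chunk is dropped:

```javascript
// Incremental line splitter for chunked input (option #1 above).
// A line is emitted as soon as its terminating CR or LF arrives;
// a LF that immediately follows a CR (even across chunks) is
// treated as the second half of that CRLF and discarded.
class LineSplitter {
  constructor(onLine) {
    this.onLine = onLine;   // callback invoked with each complete line
    this.buffer = "";       // partial line carried between chunks
    this.lastWasCR = false; // was the last character seen a "\r"?
  }

  push(chunk) {
    for (const ch of chunk) {
      if (this.lastWasCR && ch === "\n") {
        // Second half of a "\r\n"; the line was already emitted at the "\r".
        this.lastWasCR = false;
        continue;
      }
      this.lastWasCR = false;
      if (ch === "\r" || ch === "\n") {
        this.onLine(this.buffer);
        this.buffer = "";
        this.lastWasCR = ch === "\r";
      } else {
        this.buffer += ch;
      }
    }
  }
}

// Usage: a "\r\n" split across two chunks.
const lines = [];
const splitter = new LineSplitter((line) => lines.push(line));
splitter.push("data: a\r");   // "data: a" is emitted right away
splitter.push("\ndata: b\n"); // leading "\n" is dropped, then "data: b"
console.log(lines); // [ 'data: a', 'data: b' ]
```

Since the flag is checked per character rather than per chunk, feeding the stream in one piece or split at any point yields the same lines.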

Of course, it'd be nice if there were only ever \n to deal with.

-- 
Michael

Received on Monday, 21 September 2009 16:16:47 UTC