Server-sent Events and Load Balancing

Typical load balancing scenarios for HTTP center around routing an incoming
HTTP request to an appropriate cluster node based on some assessment of load
for all the nodes in the cluster.  The assumption is that the frequency of
HTTP response completion is sufficiently high to automatically re-balance
the load across the cluster nodes.

Server-sent Events changes this assumption by significantly increasing the
average duration of each HTTP response, and also by breaking the correlation
between connectedness and response throughput.  Each node in a Server-sent
Events cluster may have many thousands of connected clients, but only a few
tens may be active at any given moment.  If enough of those active clients
are co-located on the same cluster node, then throughput of the cluster as
a whole may be sub-optimal, and the cluster would benefit from re-targeting
one or more of the active clients to a different node.

As currently specified, Server-sent Events seems to make this impossible to
achieve without relying on semantics that were not intended.  One can set
"retry" to zero, then complete the response normally, causing the client to
reconnect right away (and be load balanced to a different node in the
cluster than the one previously used).  The reconnected stream can
immediately reset "retry" to something on the order of 2-3 seconds, but what
happens if the network or server fails before a successful reconnect?  When
is the value of "retry" automatically reset by the client to the default
value?
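
To make the workaround concrete, here is a rough sketch of the bytes a
server might write at each step.  The helper names and the 3000 ms value
are my own assumptions for illustration (the "retry" field is expressed
in milliseconds); nothing here is mandated by the specification.

```python
def retry_field(milliseconds: int) -> bytes:
    """Encode a "retry" field followed by the blank line that ends the block."""
    if milliseconds < 0:
        raise ValueError("retry must be a non-negative integer")
    return f"retry: {milliseconds}\n\n".encode("utf-8")


def handoff_tail() -> bytes:
    # Hypothetical helper: written just before the server completes the
    # response, so that the client reconnects with no delay.
    return retry_field(0)


def reconnect_head(normal_retry_ms: int = 3000) -> bytes:
    # Hypothetical helper: written first on the reconnected stream to
    # restore a failure-recovery delay (3000 ms is an assumed value).
    return retry_field(normal_retry_ms)
```

The open question remains the window between the two writes: if the
reconnect fails, the client is still holding a retry value of zero.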

If the semantics of "retry" are intended for failure recovery only, then we
need another way to trigger a reconnect for the non-failure use-case.

For example:
....
reconnect\n
\n
[end-of-stream]
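
As a sketch of how a client might recognize such a directive, assuming
the usual field parsing rule that a line without a colon is a field name
with an empty value.  The "reconnect" field is only the proposal above,
and the function names are hypothetical.

```python
def parse_field(line: str) -> tuple[str, str]:
    """Split one event-stream line into (field name, value)."""
    if ":" not in line:
        # No colon: the whole line is the field name, the value is empty.
        return line, ""
    name, _, value = line.partition(":")
    if value.startswith(" "):
        value = value[1:]  # a single leading space is not part of the value
    return name, value


def wants_reconnect(event_block: str) -> bool:
    # Hypothetical check for the proposed "reconnect" directive.
    lines = [line for line in event_block.split("\n") if line]
    return any(parse_field(line)[0] == "reconnect" for line in lines)
```

A client seeing this directive followed by end-of-stream would reconnect
immediately and leave its stored "retry" value untouched.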

If the semantics of "retry" are intended to cover both failure and
server-initiated reconnect use-cases, then perhaps the "retry" syntax could
be extended along these lines:
....
retry: 0;reset\n
\n
[end-of-stream]

This would set the value of "retry" to zero for an immediate reconnect,
then automatically restore the previously defined value so that a network
failure is handled in the usual way.
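
A minimal sketch of the client-side state this implies, assuming the
";reset" suffix marks a one-shot value.  The class, its method names and
the 3000 ms default are hypothetical, and the syntax itself is only the
proposal above.

```python
class RetryState:
    """Tracks the reconnect delay, with an optional one-shot override."""

    def __init__(self, default_ms: int = 3000):  # assumed default
        self.retry_ms = default_ms
        self._one_shot_ms = None

    def on_retry_field(self, value: str) -> None:
        number, _, flag = value.partition(";")
        if not (number.isascii() and number.isdigit()):
            return  # ignore a malformed value
        if flag == "reset":
            # Proposed syntax: applies to the next reconnect only.
            self._one_shot_ms = int(number)
        elif flag == "":
            self.retry_ms = int(number)
        # Any other suffix is ignored in this sketch.

    def next_delay_ms(self) -> int:
        if self._one_shot_ms is not None:
            delay, self._one_shot_ms = self._one_shot_ms, None
            return delay
        return self.retry_ms
```

With this state, "retry: 2000" followed later by "retry: 0;reset" yields
a delay of 0 for the first reconnect attempt and 2000 ms for any attempt
after that, so a failed handoff falls back to normal recovery.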

Kind Regards,
John Fallows.
-- 
>|< Kaazing Corporation >|<
John Fallows | CTO | +1.650.943.2436
800 W. El Camino Real, Ste 180 | Mountain View, CA 94040, USA

Received on Monday, 27 October 2008 22:08:31 UTC