
Push and Pull: Who Moves The Data?

What should the interface be for an object which transforms an unlimited stream of data? Do you tell it where to get its input (setSource) or do you give it bits of input as you have them (add)? Do you tell it where to put the output (setSink) or ask for results when you want them (fetch)?

This question brings up a conflict between two styles of programming, which I call "has control" and "provides services". It's easiest to write transformers which have control, but they don't always fit with other objects. In particular, objects which always have control cannot be combined in a processing pipeline or called by an event loop unless you have multithreading or you buffer all the data.
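To make the "buffer all the data" escape hatch concrete, here is a minimal sketch of two stages that both insist on having control, joined by a buffer. The names Buffer, run_doubler, run_incrementer and pipeline are hypothetical, invented for this illustration, and None is assumed to mark the end of the data.

```python
from collections import deque


class Buffer:
    """Hypothetical passive store: accepts data via add, hands it back via fetch."""

    def __init__(self):
        self._items = deque()

    def add(self, item):
        self._items.append(item)

    def fetch(self):
        # Assumption: None signals "no more data", so items must not be None.
        return self._items.popleft() if self._items else None


def run_doubler(source, sink):
    """A stage that has control: it loops until its source is exhausted."""
    item = source.fetch()
    while item is not None:
        sink.add(item * 2)
        item = source.fetch()


def run_incrementer(source, sink):
    """A second stage that also insists on having control."""
    item = source.fetch()
    while item is not None:
        sink.add(item + 1)
        item = source.fetch()


def pipeline(values):
    first, middle, last = Buffer(), Buffer(), Buffer()
    for value in values:
        first.add(value)
    run_doubler(first, middle)      # must run to completion...
    run_incrementer(middle, last)   # ...before this stage can even start
    results = []
    item = last.fetch()
    while item is not None:
        results.append(item)
        item = last.fetch()
    return results
```

Without threads, the second stage cannot start until the first has returned, so the middle buffer ends up holding the entire intermediate stream. That is fine for a short list and impossible for an unlimited stream.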

                    Consumes Data                       Provides Data

Has Control         Reader: calls fetch/read/recv,      Writer: calls add/write/send,
                    may implement setSource             may implement setSink

Provides Services   Sink: implements add/write/send     Source: implements fetch/read/recv
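The four roles in the table can be sketched as follows. This is only an illustration: ListSource, ListSink, count_items and write_countdown are hypothetical names, and using None as the end-of-data marker is an assumption of the sketch.

```python
class ListSource:
    """Source: provides services and provides data (implements fetch)."""

    def __init__(self, items):
        self._items = list(items)

    def fetch(self):
        # Assumption: None means the data is exhausted.
        return self._items.pop(0) if self._items else None


class ListSink:
    """Sink: provides services and consumes data (implements add)."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)


def count_items(source):
    """Reader: has control and consumes data by calling fetch."""
    count = 0
    while source.fetch() is not None:
        count += 1
    return count


def write_countdown(sink, start):
    """Writer: has control and provides data by calling add."""
    for value in range(start, 0, -1):
        sink.add(value)
```

The two classes never decide when anything happens; they wait to be called. The two functions own the loop, which is what "has control" means here.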

Any transforming module (such as a parser, or almost any other input-stream -> output-stream program) both provides and consumes data. It can obtain its data by being a Reader or a Sink; it can provide its data by being a Writer or a Source:

Reader-Writer (Active)
Standalone programs are like this. They read and write whenever they want. A yacc parser is like this, reading by calling yylex and writing whenever and whatever it wants. This kind of unit needs a "run" method to activate it.
Sink-Writer (Write Filter)
An element in an output pipeline, these units receive some data, then provide some data by calling someone else; when that someone else returns control, they can return control. btyacc "-S" parsers are like this. The unit hangs if output takes time.
Reader-Source (Read Filter)
An element in an input pipeline, these units are asked for data and must in turn ask someone else for some data. A lex parser is like this: you ask for the next token, it reads bytes until it has one, then returns that token. The unit hangs if input takes time.
Sink-Source (Passive)
This kind of unit must have storage; the user adds some data, then fetches results. Every struct is like this, as are (non-virtual) collections.
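As a rough sketch, here is one trivial transformation (upper-casing strings) written in each of the four shapes. All class names are hypothetical, None is assumed to mean "nothing available", and ListSource and ListSink are stand-ins for whatever sits at the ends of the pipeline.

```python
class ListSource:
    def __init__(self, items):
        self._items = list(items)

    def fetch(self):
        return self._items.pop(0) if self._items else None


class ListSink:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)


class ActiveUpper:
    """Reader-Writer: holds both ends and needs run() to activate it."""

    def __init__(self, source, sink):
        self.source, self.sink = source, sink

    def run(self):
        item = self.source.fetch()
        while item is not None:
            self.sink.add(item.upper())
            item = self.source.fetch()


class WriteFilterUpper:
    """Sink-Writer: is handed data through add() and pushes the result on."""

    def __init__(self, sink):
        self.sink = sink

    def add(self, item):
        # Returns to the caller only after the downstream add() returns.
        self.sink.add(item.upper())


class ReadFilterUpper:
    """Reader-Source: is asked through fetch() and pulls from upstream."""

    def __init__(self, source):
        self.source = source

    def fetch(self):
        # Waits for as long as the upstream fetch() takes.
        item = self.source.fetch()
        return None if item is None else item.upper()


class PassiveUpper:
    """Sink-Source: must store results between add() and fetch()."""

    def __init__(self):
        self._ready = []

    def add(self, item):
        self._ready.append(item.upper())

    def fetch(self):
        return self._ready.pop(0) if self._ready else None
```

Only ActiveUpper has a loop of its own. The two filters borrow their caller's control on one side and take control on the other, and PassiveUpper never calls anyone, which is why it is the only one that needs storage.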

Observation: setSource and setSink are often not explicit, and are only meaningful if the element is in some sort of pipeline.

Observation: Library routines should generally provide services, not have control. That means they should not be entirely in the top row.

Observation: Stream processing in a Sink-Source (passive) manner can be difficult. The models don't fit together very well.
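One possible illustration of the difficulty is a passive unit (a sink on one side, a source on the other) that splits a character stream into lines. PassiveLineSplitter is a hypothetical name, and the behaviour of fetch when nothing is ready is an assumption of this sketch, not a settled convention.

```python
class PassiveLineSplitter:
    """Hypothetical passive unit: add() takes text chunks, fetch() returns lines."""

    def __init__(self):
        self._partial = ""  # text of a line that has not been terminated yet
        self._lines = []    # completed lines waiting to be fetched

    def add(self, chunk):
        # A chunk may end in the middle of a line, so the unfinished tail
        # must be remembered explicitly until the next add() call.
        *complete, self._partial = (self._partial + chunk).split("\n")
        self._lines.extend(complete)

    def fetch(self):
        # Assumption: None means "nothing ready yet", which the caller
        # cannot tell apart from "end of stream" without extra signalling.
        return self._lines.pop(0) if self._lines else None
```

An active version would keep the unfinished line in a local variable inside its read loop. Here the same state has to live in the object, and the caller has to cope with fetch coming back empty even though more data may arrive later.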

Terminology: I use "source" and "sink" to mean about the same thing as C++ "istream" and "ostream" or java.io "InputStream" and "OutputStream". The java.io package in JDK 1.1 introduced character streams and used the unfortunate terms "Reader" (for an InputStream of characters) and "Writer" (for an OutputStream of characters). I assume they let an understandable desire to promote character I/O outweigh any desire for consistent naming. I would call the classes CharSource (or CharacterSource) and CharSink (or CharacterSink) if I needed them (which I don't, yet).


Home to blindfold. This page generated via doxygen 1.2.11.1 Wed Oct 10 16:40:37 2001.