15 Filter Effects


15.1 Caveat

(The following discussion is a proposal for how filter effects could work in SVG. Most of this material is very preliminary and presents an approach currently under consideration by the SVG working group.)

15.2 Introduction

A model for adding declarative raster-based rendering effects to a 2D graphics environment is presented. As a result, the expressiveness of the traditional 2D rendering model is greatly enhanced, while still preserving the device independence, scalability, and high level geometric description of the underlying graphics. This model can be easily added to modern 2D graphics systems by the implementation of a handful of straightforward image processing routines.

15.3 Background

On the Web, many graphics are presented as bitmap images in GIF, JPEG, or PNG format. Among the many disadvantages of this approach is the general difficulty of keeping the raster data in sync with the rest of the Web site. Many times, a web site designer must resort to a bitmap editor simply to change the title of a button. As the Web becomes more dynamic, we want a way to describe the "piece parts" of a site in a more flexible format. SVG is our chance to do this. What we propose here is a declarative raster effects model which, when combined with the 2D power of SVG, can describe much of the common artwork on the Web in such a way that client-side generation and alteration can be performed easily.

15.4 Basic Model

The raster effects model presented here is a special case of a "filtering" operation as currently proposed for SVG. A filtering operation replaces one or more graphic primitives with another set of graphic primitives on the fly as they are rendered. For example, a simple filter could replace one graphic with two -- by adding a black copy of the original, offset to create a drop shadow. For raster effects, the model is to replace the input graphics with one or more raster "layers" which contain processed renderings of the original graphic. In the above example, the shadow could be blurred and become a raster layer. (Ideally the original graphic could still be rendered as vectors, allowing a soft drop shadow while retaining the vector nature of the original graphic.)

But isn't image processing horrible and evil because it's device dependent?

Well, the answer is clearly no. Images are routinely rendered via "device independent" 2D graphics systems. Most artists using image editing applications such as Photoshop rarely, if ever, work with device pixels or at device resolution. Photographs and other continuous tone images are scanned or painted and printed on a variety of output devices all the time. We propose to take this one step further -- rather than always feeding images as raw sample data into the 2D imaging model, why not add a mechanism to specify image data by generating it on the fly within the 2D graphics system itself?

Consider the traditional 2D graphics pipeline:

[Figure: Traditional 2D graphics pipeline]

Vector graphics primitives are specified abstractly and rendered onto the output device through a geometric transformation called the current transformation matrix, or CTM. The CTM allows the vector graphics to be specified in a device-independent coordinate system. At rendering time, the CTM accounts for any differences in resolution or orientation between the input vector description space and the device coordinate system. According to the "painter's model", areas on the device which are outside of the vector graphic shapes remain unchanged from their previous contents (in this case the droplet pattern).

Consider now altering this pipeline slightly to allow rendering the graphics to an intermediate continuous tone image, which is then rendered onto the output device in a second pass:

[Figure: Rendering via continuous tone intermediate image]

We introduce a new transformation matrix called the Effect Transformation Matrix, or ETM. Vector primitives are rendered via the ETM onto an intermediate continuous tone image. This image is then rendered onto the output device using the standard 2D imaging path via a modified transform, CTM', such that the net effect of the ETM followed by CTM' is equivalent to the original CTM. It is important to note that the intermediate continuous tone image contains coverage information, so that non-rendered parts of the original graphic are transparent in the intermediate image and remain unaffected on the output device, as required by the painter's model. A physical analog of this process is to imagine rendering the vector primitives onto a sheet of clear acetate and then transforming and overlaying the acetate sheet onto the output device. The resulting imaging model remains as device-independent as the original one, except that we are now using the 2D imaging model itself to generate images to render.
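
As an illustration (not part of the proposal), the relationship between the transforms can be sketched in Python with NumPy, assuming 3x3 homogeneous matrices and the row-vector convention (p' = p * M), which matches the (EffectMatrix)^-1 * CTM formula given in section 15.5.2 below:

    import numpy as np

    # Compute the modified transform CTM' so that rendering via the ETM and
    # then transforming the intermediate image by CTM' has the same net
    # effect as the original CTM.
    def ctm_prime(etm, ctm):
        return np.linalg.inv(etm) @ ctm    # CTM' = ETM^-1 * CTM

    # Example: the effect rasterizes at double resolution; CTM' undoes the
    # extra scale so the net transform is unchanged.
    ETM = np.diag([2.0, 2.0, 1.0])                    # user -> image space
    CTM = np.array([[3.0,  0.0, 0.0],
                    [0.0,  3.0, 0.0],
                    [10.0, 20.0, 1.0]])               # user -> device space
    assert np.allclose(ETM @ ctm_prime(ETM, CTM), CTM)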

So far we haven't really added any new expressiveness to the imaging model; we have merely reformulated the traditional 2D rendering model to allow an intermediate continuous tone rasterization phase. However, we can now extend this further by allowing the application of image processing operations to the intermediate image, still without sacrificing device independence. In our model, the intermediate image can be operated on by a number of image processing operations which can affect both the color and coverage channels. The resulting image(s) are then rendered onto the device in the same way as above.

[Figure: Rendering via continuous tone intermediate image with image processing]

In the picture above, the intermediate image was processed in two ways. First, a simple bump-map lighting calculation was applied to add a 3D look; then another copy of the original layer was offset, blurred, and colored black to form a shadow. The resulting transparent layers were then rendered via the painter's model onto the output device.

But isn't image processing an unbounded problem?

Of course, in general this is true. However, if we limit ourselves to the set of image processing operations frequently used for artistic effect, the set is much more limited (although admittedly still open ended). There are numerous books on creating "Web Art" with various image editing applications. Applications such as Adobe ImageStyler (which created the illustrations in this paper) in fact use a very limited, fixed set of image processing primitives. Used in combination, this relatively small set of operations can produce an extremely wide variety of interesting and useful effects. The group of operations proposed here is based on a survey of "how-to" books, applications, examples seen on the Web, and our own experience implementing painting, image editing, and compositing systems over the past decade. That said, the set of operations should certainly be reviewed and refined by others with relevant experience. We are confident, however, that with a dozen or so basic imaging operations, we could easily describe most of the common effects used on the Web in SVG.

15.5 The Model in More Detail

15.5.1 Image Coordinate Space

An effect's image processing operations all occur in a raster coordinate space called image space. Within image space, each image has a position and extent (width and height) which are quantized to integers such that all images align precisely with each other and with the pixel grid in image space. The positions and extents are used in image processing calculations. For single-image operations like GaussianBlur, the extent and position of the result will be grown by the width of the filter kernel. For operations which take more than one image, like Composite, the images are "registered" so that corresponding pixels in image space are operated on. The extent of the result of a Composite(Over) could be as large as the extent of the bounding box containing both input images. Similarly, the extent of a Composite(In) operation will be no larger than the extent of the intersection rectangle of the input images. Tile images and constant color images have infinite extent. Images can be offset within image space during the effect by the Offset operation.
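
As an illustration (not part of the proposal), this extent arithmetic can be sketched in Python, treating extents as integer rectangles (x, y, width, height) in image space:

    def grow(r, k):
        # Single-image filters such as GaussianBlur grow the extent by the
        # kernel half-width k on every side.
        x, y, w, h = r
        return (x - k, y - k, w + 2 * k, h + 2 * k)

    def union(a, b):
        # Upper bound on the extent of Composite(Over): the bounding box
        # containing both input extents.
        x, y = min(a[0], b[0]), min(a[1], b[1])
        return (x, y,
                max(a[0] + a[2], b[0] + b[2]) - x,
                max(a[1] + a[3], b[1] + b[3]) - y)

    def intersect(a, b):
        # Upper bound on the extent of Composite(In): the intersection
        # rectangle of the input extents (empty if they do not overlap).
        x, y = max(a[0], b[0]), max(a[1], b[1])
        return (x, y,
                max(0, min(a[0] + a[2], b[0] + b[2]) - x),
                max(0, min(a[1] + a[3], b[1] + b[3]) - y))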

15.5.2 Declarative Processing Representation

Data Flow Model

A raster effect is defined by specifying a data flow of processing nodes. Each node takes some number of input RGBA rasters and produces a single result. These nodes are wired together to create an overall effect. The overall effect is specified by the following attributes.

Effect Specification

Name (id)
    Id for this effect.

EffectMatrix (matrix; optional)
    Transformation matrix specifying the transform from current user space to Image Space (called the ETM in the example above). If not present, this defaults to the current User to Viewport Space transformation. If the Viewport Space to Device Space transform is the identity (as it commonly is for rendering on the display), Image Space, Device Space, and Viewport Space are all the same and some computation can be avoided.

Padding (left, right, top, bottom; optional)
    Determines the amount of extra padding used by the effect. These numbers are integers specified in Image Space and reflect the difference between the SourceGraphic's extent in Image Space and the extent of the output from the final Merge node.

NodeList (array; required)
    The ordered list of processing nodes for this effect. The last processing node in the list must be a Merge node. The result of the Merge node is a single RGBA image which is then rendered onto the output device via the transformation matrix

        (EffectMatrix)^-1 * CTM

    such that the original graphic is subject to the same net transformation (i.e. the CTM) as specified by the SVG state.

15.5.3 Image Processing Filter Nodes

All filtering operations take 1 to N input RGBA images, plus additional attributes as parameters, and produce a single output RGBA image. Each input is a named reference to a previous node, or one of the special "builtin" inputs. The builtin inputs represent the "raw material" for the effect and are based on image space renderings of various pieces of the current SVG state.

Built-in Image Inputs:

SourceGraphic
    This built-in input represents the result of rendering the source SVG graphic to an initially clear RGBA raster in Image Space. Pixels left untouched by the original graphic will be left clear. The image is specified to be rendered in linear RGBA pixels. The alpha channel of this image captures any anti-aliasing specified by SVG. (Since the raster is linear, the alpha channel of this image will represent the exact percent coverage of each pixel.)

SourceAlpha
    Same as SourceGraphic except that only the alpha channel is specified. The color channels of this image are implicitly black and are unaffected by any image processing operations. Again, pixels unpainted by the SourceGraphic will be 0. The SourceAlpha image also reflects any Opacity settings in the SourceGraphic.

FillPaint
    This image represents the fill color data specified by the current SVG rendering state, transformed to image space. The FillPaint image has conceptually infinite extent in Image Space, since it is usually either just a constant color or a tile. Frequently this image is opaque everywhere, but it might not be if the "paint" itself has alpha, as in the case of an alpha gradient or a transparent pattern. For the simple case where the source graphic represents a simple filled object, it is guaranteed that:

        SourceGraphic = In(FillPaint, SourceAlpha)

    where In(A,B) represents the resulting image of the Porter-Duff compositing operation A in B (see below).

StrokePaint
    Similar to FillPaint, except for the stroke color as specified in SVG. Again, for the simple case where the source graphic represents a stroked path, it is guaranteed that:

        SourceGraphic = In(StrokePaint, SourceAlpha)

    where In(A,B) represents the resulting image of the Porter-Duff compositing operation A in B (see below).
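
The In(A,B) guarantee above can be illustrated with a short Python/NumPy sketch (the arrays and helper are hypothetical, not part of the proposal), operating on premultiplied RGBA rasters of shape (height, width, 4):

    import numpy as np

    def porter_duff_in(a, b):
        # A in B on premultiplied pixels: every channel of A, including its
        # alpha, is scaled by B's coverage.
        return a * b[..., 3:4]

    # A 2x2 example: an opaque dark-red FillPaint masked by 25% coverage.
    fill_paint = np.full((2, 2, 4), [0.5, 0.0, 0.0, 1.0])
    source_alpha = np.zeros((2, 2, 4))
    source_alpha[..., 3] = 0.25
    source_graphic = porter_duff_in(fill_paint, source_alpha)
    # Every pixel is now [0.125, 0, 0, 0.25]: the fill color premultiplied
    # by the exact coverage, as the guarantee requires.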

15.5.4 Image Processing Nodes

The filtering operations are wired together into a directed acyclic graph to produce a single merged result. The acyclic nature is enforced by disallowing any forward references. That is, the input to any node must either be the name of a previous node or one of the built-in image sources above.

All filter nodes share the following attributes:

NodeType (string)
    Identifier for the type of operation that this filter node performs (e.g. "Composite" or "ColorMatrix").

NodeName (id)
    Name for this stage of the filter. The scope of the name is local to this effect, and it must be unique across all nodes. Since every filter node has exactly one output, the NodeName is used to specify the result of this node as an input to a later filter.

Input1, Input2, Input3, ... (idref)
    Identifies the source image data to use for the filter: a reference to a previous filter node's NodeName, or to one of the builtin image inputs above.

The following is a catalog of the individual processing nodes proposed. All filters operate on linear, premultiplied RGBA samples. Filters which work more naturally on non-premultiplied data (ColorMatrix and ComponentTransfer) will temporarily undo and redo the premultiplication as specified.
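
A minimal sketch of that premultiplication round trip (illustrative Python/NumPy; the helper names are hypothetical):

    import numpy as np

    def unpremultiply(img):
        # Divide the color channels by alpha; fully transparent pixels are
        # left untouched, since their color is undefined.
        out = img.copy()
        a = img[..., 3:4]
        np.divide(out[..., :3], a, out=out[..., :3], where=a > 0)
        return out

    def premultiply(img):
        out = img.copy()
        out[..., :3] *= out[..., 3:4]
        return out

    def on_straight_rgba(filter_fn, img):
        # Run a filter that expects non-premultiplied data (ColorMatrix,
        # ComponentTransfer) inside the premultiplied pipeline.
        return premultiply(filter_fn(unpremultiply(img)))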

NodeType: ColorMatrix
Image Inputs: 1
Attributes: matrix (a 20-element color transform)

Description

This filter performs the following matrix transformation:

        | R' |   | a00 a01 a02 a03 a04 |   | R |
        | G' |   | a10 a11 a12 a13 a14 |   | G |
        | B' | = | a20 a21 a22 a23 a24 | * | B |
        | A' |   | a30 a31 a32 a33 a34 |   | A |
        | 1  |   |  0   0   0   0   1  |   | 1 |

for every pixel. The RGBA values are temporarily un-premultiplied for this operation, and the R'G'B'A' results are re-premultiplied afterwards.

Comments

This filter allows linear conversion to and from other useful color spaces, such as YCrCb. Saturation and desaturation can be implemented this way, as can operations like converting grayscale data to transparency values. We might consider shorthand notations for certain common operations such as hue-rotation, brightness, luminance-to-alpha, etc.

Implementation issues

These matrices often perform an identity mapping on the alpha channel. If that is the case, an implementation can avoid the costly undoing and redoing of the premultiplication for all pixels with A = 1.
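
As an illustration (not normative), the per-pixel matrix product can be written in a few lines of Python/NumPy, taking img as non-premultiplied float RGBA of shape (h, w, 4) and m as the 4x5 matrix of a00..a34:

    import numpy as np

    def color_matrix(img, m):
        h, w, _ = img.shape
        rgba1 = np.concatenate([img, np.ones((h, w, 1))], axis=-1)
        return rgba1 @ m.T         # applies the 4x5 matrix to [R G B A 1]

    # Example: a luminance-to-alpha matrix, one of the shorthand candidates
    # mentioned above (the Rec. 709 luma weights here are illustrative).
    luminance_to_alpha = np.array([
        [0.0,    0.0,    0.0,    0.0, 0.0],
        [0.0,    0.0,    0.0,    0.0, 0.0],
        [0.0,    0.0,    0.0,    0.0, 0.0],
        [0.2126, 0.7152, 0.0722, 0.0, 0.0],
    ])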

NodeType: Color
Image Inputs: none
Attributes: color (an RGBA color)

Description

Creates an image with infinite extent, filled with the specified color.

NodeType: ComponentTransfer
Image Inputs: 1
Attributes: Fr, Fg, Fb, Fa (four transfer functions; TBD -- how to specify?)

Description

This filter performs a component-wise remapping of the data:

        R' = Fr(R)
        G' = Fg(G)
        B' = Fb(B)
        A' = Fa(A)

for every pixel. The RGBA values are temporarily un-premultiplied for this operation, and the results are re-premultiplied afterwards.

Comments

This filter allows operations like brightness adjustment, contrast adjustment, color balance, or thresholding. We might want to consider some predefined transfer functions such as identity, gamma, the sRGB transfer function, sine-wave, etc.

Implementation issues

As with the ColorMatrix filter, the undoing and redoing of the premultiplication can be avoided where Fa is the identity transform and A = 1.
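
A minimal sketch of the component-wise remapping (illustrative Python/NumPy on non-premultiplied RGBA; identity and gamma are two of the predefined transfer functions suggested above):

    import numpy as np

    def identity(c):
        return c

    def gamma(g):
        return lambda c: np.power(c, g)

    def component_transfer(img, fr, fg, fb, fa):
        out = np.empty_like(img)
        for i, f in enumerate((fr, fg, fb, fa)):
            out[..., i] = f(img[..., i])
        return out

    # Example: brighten the color channels while leaving alpha as the
    # identity (one of the conditions noted above for skipping the
    # premultiplication round trip):
    # out = component_transfer(img, gamma(0.8), gamma(0.8), gamma(0.8), identity)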

NodeType: Composite
Image Inputs: 2
Attributes: operator (one of over, in, out, atop, xor, add, sub, mul, dissolve); dissolve-value (only if operator is dissolve)

Description

This filter combines the two input images pixel-wise in image space.

over, in, atop, out, xor, and dissolve use the Porter-Duff compositing operations.

For these operations the extent of the resulting image can be affected. In other words, even if two input images do not overlap in image space, the extent of the result of over will essentially be the union of the extents of the two input images.

add, sub, and mul perform component-wise arithmetic.

Comments

add, sub, and mul are necessary to make use of the output from the DiffuseLighting and SpecularLighting filters.
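
A minimal sketch of several of these operators (illustrative Python/NumPy on premultiplied RGBA arrays of identical, already-registered extent):

    import numpy as np

    def over(a, b):
        # Porter-Duff A over B on premultiplied pixels: A + (1 - alphaA) * B.
        return a + (1.0 - a[..., 3:4]) * b

    def pd_in(a, b):
        # Porter-Duff A in B: A scaled by B's coverage.
        return a * b[..., 3:4]

    def add(a, b):
        # Component-wise sum (clamped to [0, 1] here), e.g. for combining
        # light maps as described in the comments.
        return np.clip(a + b, 0.0, 1.0)

    def mul(a, b):
        # Component-wise product, e.g. for applying a DiffuseLighting map
        # to a texture.
        return a * b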

NodeType: DiffuseLighting
Image Inputs: 1
Attributes: lots -- TBD

Description

Lights an image using its alpha channel as a bump map.

Comments

This filter produces a light-and-shadow map, which can be combined with a texture image using the mul compositing method. Multiple light sources can be simulated by adding several of these light maps together before applying the result to the texture image.

NodeType: DisplacementMap
Image Inputs: 2
Attributes: scale; x-channel-selector (one of R, G, B, or A); y-channel-selector (one of R, G, B, or A)

Description

Uses Input2 to spatially displace Input1, similar to the Photoshop displacement filter. This is the transformation to be performed:

P'(x,y) <- P( x + scale * XC(x,y), y + scale * YC(x,y) )

where P(x,y) is the source image, Input1, and P'(x,y) is the destination. XC(x,y) and YC(x,y) are the component values of Input2 designated by the x-channel-selector and y-channel-selector. For example, to use the R component of Input2 to control displacement in x and the G component of Input2 to control displacement in y, set x-channel-selector to "R" and y-channel-selector to "G".

Comments

The displacement map defines the inverse of the mapping performed.

Implementation issues

This filter can have an arbitrary, non-localized effect on the input, which might require substantial buffering in the processing pipeline. However, with this formulation, any intermediate buffering needs can be determined from scale, which bounds the maximum displacement in either x or y.
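
An illustrative Python/NumPy sketch of the formula above, using nearest-neighbour sampling, channel indices R=0 ... A=3, and clamping at the extent boundary for brevity:

    import numpy as np

    def displacement_map(input1, input2, scale, x_channel=0, y_channel=1):
        h, w, _ = input1.shape
        ys, xs = np.mgrid[0:h, 0:w]
        # P'(x, y) <- P(x + scale * XC(x, y), y + scale * YC(x, y))
        sx = np.clip(np.rint(xs + scale * input2[..., x_channel]), 0, w - 1)
        sy = np.clip(np.rint(ys + scale * input2[..., y_channel]), 0, h - 1)
        return input1[sy.astype(int), sx.astype(int)]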

NodeType: GaussianBlur
Image Inputs: 1
Attributes: std-deviation

Description

Performs a Gaussian blur on the input image.

Implementation Issues

The implementation could choose to use an approximation to the Gaussian blur with a separable filter whose performance scales linearly with resolution. [Suggested implementation forthcoming.]

Frequently this operation will take place on alpha-only images, such as that produced by the built-in input SourceAlpha. The implementation may notice this and optimize the single-channel case.

If the input has infinite extent and is constant, this operation has no effect. If the input has infinite extent and is a tile, the filter is evaluated with periodic boundary conditions.
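
One possible (non-normative) separable implementation in Python/NumPy: the 2-D Gaussian factors into a horizontal and a vertical 1-D convolution, and pixels beyond the input extent are treated as clear (zero), consistent with the extent-growth rule of 15.5.1. For brevity this sketch keeps the input extent rather than growing it, and it does not handle the periodic (tile) case:

    import numpy as np

    def gaussian_kernel_1d(std_dev, truncate=3.0):
        r = max(1, int(truncate * std_dev + 0.5))
        x = np.arange(-r, r + 1, dtype=float)
        k = np.exp(-0.5 * (x / std_dev) ** 2)
        return k / k.sum()

    def gaussian_blur(img, std_dev):
        k = gaussian_kernel_1d(std_dev)
        out = np.empty_like(img)
        for c in range(img.shape[2]):   # alpha-only inputs need one pass
            rows = np.apply_along_axis(np.convolve, 1, img[..., c], k, 'same')
            out[..., c] = np.apply_along_axis(np.convolve, 0, rows, k, 'same')
        return out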

NodeType: Image
Image Inputs: none
Attributes: imageref (reference to external image data); imaging-matrix (optional matrix used to render the image)

Description

Refers to an external image which is loaded or rendered into an RGBA raster. If imaging-matrix is not specified, the image takes on its natural width and height and is positioned at 0,0 in image space.

The imageref could refer to an external image, or simply be a reference to another piece of SVG. This node produces an image similar to the builtin image source SourceGraphic, except from an external source.

NodeType: Merge
Image Inputs: any number
Attributes: none

Description

Composites the input image layers on top of each other using the over operator, with Input1 on the bottom and the last specified input, InputN, on top.

A Merge node is required as the last node of any effect.

Comments

Many effects produce a number of intermediate layers in order to create the final output image. This filter allows us to collapse those into a single image. Although this could be done using n-1 Composite filters, it is more convenient to have this common operation available in this form, and it offers the implementation some additional flexibility (see below).

Implementation issues

The canonical implementation of Merge is to render the entire effect into one RGBA layer and then render the resulting layer onto the output device. In certain cases (in particular if the output device itself is a continuous tone device), since merging is associative, it may be a sufficient approximation to evaluate the effect one layer at a time and render each layer individually onto the output device, bottom to top.

If the Merge specifies the final "on top" input to be SourceGraphic, the implementation is encouraged to render the layers up to that point, and then render the SourceGraphic directly from its vector description on top. This optimization is also possible if an intermediate layer is specified as SourceGraphic, but it requires evaluating and drawing the intermediate effect layers.
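
The canonical behaviour is a fold with the over operator, bottom to top; here is an illustrative Python/NumPy sketch on registered premultiplied RGBA layers (the layer names in the usage comment are hypothetical):

    import numpy as np
    from functools import reduce

    def over(a, b):
        # Porter-Duff A over B on premultiplied pixels (as in Composite).
        return a + (1.0 - a[..., 3:4]) * b

    def merge(layers):
        # layers[0] is Input1 (bottom); layers[-1] is InputN (top).
        return reduce(lambda acc, top: over(top, acc), layers)

    # e.g. merge([drop_shadow, source_graphic]) puts the source over its shadow.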

NodeType: Morphology
Image Inputs: 1
Attributes: operator (one of erode or dilate); radius (extent of the operation)

Description

This filter is intended to have a similar effect to the min/max filter in Photoshop and the width layer attribute in ImageStyler. It is useful for "fattening" or "thinning" an alpha channel.

The dilation (or erosion) kernel is a square of side 2*radius + 1.

Implementation issues

Frequently this operation will take place on alpha-only images, such as that produced by the built-in input SourceAlpha. In that case, the implementation might want to optimize the single-channel case.

If the input has infinite extent and is constant, this operation has no effect. If the input has infinite extent and is a tile, the filter is evaluated with periodic boundary conditions.
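
An illustrative Python/NumPy sketch of the single-channel case, using the (2*radius + 1)-square kernel described above and treating pixels outside the extent as clear; a real implementation would use a sliding-window algorithm rather than materializing every offset:

    import numpy as np

    def morphology(channel, radius, operator="dilate"):
        h, w = channel.shape
        p = np.pad(channel, radius)       # outside the extent is clear (0)
        windows = np.stack([p[dy:dy + h, dx:dx + w]
                            for dy in range(2 * radius + 1)
                            for dx in range(2 * radius + 1)])
        # dilate fattens the channel (max); erode thins it (min).
        return windows.max(axis=0) if operator == "dilate" else windows.min(axis=0)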

NodeType: Offset
Image Inputs: 1
Attributes: dx, dy

Description

Offsets an image relative to its current position in image space by the specified vector.

Comments

This is important for effects like drop shadows.

NodeType: SpecularLighting
Image Inputs: 1
Attributes: lots -- TBD

Description

Creates specular reflections using the alpha channel as a bump map.

Comments

This filter produces an image which contains the specular reflection part of the lighting calculation. Such a map is intended to be combined with a texture using the add compositing method. Multiple light sources can be simulated by adding several of these light maps together before applying the result to the texture image.

Implementation issues

The DiffuseLighting and SpecularLighting filters will often be applied together. An implementation may detect this and calculate both maps in one pass, instead of two.

NodeType: Tile
Image Inputs: 1
Attributes: none

Description

Creates an image with infinite extent by replicating the source image in image space.

NodeType: Turbulence
Image Inputs: 1
Attributes: base-frequency; num-octaves; type (one of fractal-noise or turbulence)

Description

Adds noise to an image using the Perlin turbulence function. It is possible to create bandwidth-limited noise by synthesizing only one octave. For a detailed description of the Perlin turbulence function, see "Texturing and Modeling", Ebert et al, AP Professional, 1994.

If the input image is infinite in extent, as is the case with a constant color or a tile, the resulting image will have maximal size in image space. (That is, its position and extent will match that of the output of the final Merge node, as specified by the effect's Padding attribute.)

Comments

This filter allows the synthesis of artificial textures like clouds or marble.

Implementation issues

It might be useful to provide an actual implementation for the turbulence function, so that consistent results are achievable.
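
As a sketch of the structure of such an implementation (not the normative function -- a simple hash-based value noise stands in here for Perlin's gradient noise), the octave summation looks like this in Python/NumPy:

    import numpy as np

    def _lattice(xi, yi, seed=0):
        # Deterministic pseudo-random value in [0, 1) per lattice point.
        n = (xi * 374761393 + yi * 668265263 + seed) & 0xFFFFFFFF
        n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
        return (n ^ (n >> 16)) / 2.0**32

    def _value_noise(x, y):
        xi, yi = np.floor(x).astype(np.int64), np.floor(y).astype(np.int64)
        fx, fy = x - xi, y - yi
        fx, fy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)  # smoothstep
        n00, n10 = _lattice(xi, yi), _lattice(xi + 1, yi)
        n01, n11 = _lattice(xi, yi + 1), _lattice(xi + 1, yi + 1)
        top, bottom = n00 + fx * (n10 - n00), n01 + fx * (n11 - n01)
        return top + fy * (bottom - top)

    def turbulence(x, y, base_frequency, num_octaves, kind="turbulence"):
        # kind corresponds to the type attribute: "turbulence" folds the
        # signed noise; "fractal-noise" sums it as is.
        total, freq, amp = 0.0, base_frequency, 1.0
        for _ in range(num_octaves):
            n = _value_noise(x * freq, y * freq)
            total += (np.abs(2.0 * n - 1.0) if kind == "turbulence" else n) * amp
            freq, amp = freq * 2.0, amp * 0.5
        return total

    # Usage: ys, xs = np.mgrid[0:h, 0:w]; noise = turbulence(xs, ys, 1/16, 4)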


Additional notes and comments: