Minutes 2006-08-04: Friday morning

Present:
  Norm (chair)
  Murray (host)
  Jeni (scribe)
  Henry
  Mohamed
  Alex
  Richard

Norm: What do we want to say about pipelines, pipeline libraries,
recursive pipelines, etc.? First: is it reasonable to have a pipeline
inside another pipeline?

Henry: I would like to, for modularity. It's a choice to package up
steps into a named pipeline.

Richard: You should be able to do that, but in other programming
languages you have multiple functions and usually do not put
functions inside other functions. If you do have functions inside
functions, it's usually to give the inner function access to
information in the outer function. There's also hiding the function
from the outside environment...

Henry: Hiding isn't a big deal.

Richard: If we don't want to access names in the outer pipeline, then
we don't need this.

Henry: I don't want the inner pipelines to access information from the
outside one. In Java, if I'm a novice, I write classes inside other
classes.

Jeni: But in Java, you have to create another file. In our language,
you don't, so why would you want to embed it?

Murray: What's the difference whether it's embedded or not?

Henry: When I pass the file to another user, it's no longer obvious
which pipeline should be run.

Murray: Not yet, but we could provide a mechanism to say which one is
going to be run. But I was really asking Richard why he cared...

Richard: Because in other languages, there are other semantics
associated with nesting a function inside another one, to do with
accessing information in the outer function.

Henry: That's why it doesn't work for me. Suppose my common code used
a pipeline parameter.

Murray: Why isn't it all in scope when the functions are at the same
level? It doesn't matter if our pipeline language works differently
from programming languages.

Norm (and others): Yes it does.

Richard: We also might, in the future, want to provide some semantics
to nested pipelines. Here's a Java example:

class foo {
  int a;
  class bar {
    int b, c;
    // ...
    void add() {
      a = b + c;  // 'a' is the field of the enclosing foo instance
    }
  }
}

'a' is available in the inner class 'bar', but not from outside.

Murray: If the function is outside, you have to pass it the
arguments. If the function is inside, you don't have to pass the
argument into the function: it's just there.
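Murray's contrast can be sketched in Java. This is a minimal illustration only; `Scoping`, `Helper`, `Inner`, `add` and `addInside` are hypothetical names. The top-level helper has to be handed the value of `a`, while the inner class reads it from the enclosing instance.

```java
// Minimal sketch of Murray's contrast; all names here are hypothetical.
class Scoping {
    int a = 10; // a value belonging to the "outer function"

    // Inside: the inner class sees 'a' without it being passed in.
    class Inner {
        int addInside(int b) {
            return a + b; // 'a' resolves to Scoping.this.a
        }
    }

    public static void main(String[] args) {
        Scoping s = new Scoping();
        System.out.println(Helper.add(s.a, 5));         // prints 15
        System.out.println(s.new Inner().addInside(5)); // prints 15
    }
}

// Outside: a top-level helper cannot see Scoping.a, so 'a' must be passed in.
class Helper {
    static int add(int a, int b) {
        return a + b;
    }
}
```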

Henry: I want it to be a software engineering choice. If I have:

<pipe>
  S1
  S2
  S3
  ...
</pipe>

Suppose I want to package S2 and S3 into a named pipeline. I'm happy to
have named inputs and outputs, and to have it encapsulated, but I want
to have parameters passed in automatically. I want parameters to be
lexically scoped, but not ports.

Richard: A choose can use ports from outside: normally ports are
lexically scoped as well.

Alex: What would be the problem with saying that you have to declare
parameters? If you want to pass a parameter, then you should declare
it.

Henry: I could live with that. But then it doesn't matter whether it's
inside or outside. I have in mind the simple user with a single
pipeline element.

Richard: Nesting should correspond to scoping.

Alex: Nesting with encapsulation makes sense to me: the pipelines are
only accessible in the parent pipeline.

Norm: It seems odd that it's only one level deep.

Henry: I agree with Alex.

Alex: But because it's a black box, this doesn't solve the pipeline
library problem.

Henry: We have that as well.

Jeni: We need pipeline libraries, and they do what we need to do, so
why make the language more complex by adding this ability?

Henry: You can't use that argument, because removing constructs from the
language doesn't make it simpler to use the language.

Murray: Using nested pipelines makes absolute sense to me.

Example:

<pipeline>
  <pipeline name="a">
  </pipeline>

  <pipeline name="b">
    <pipeline name="c">
    </pipeline>
  </pipeline>

  <step>...</step>
</pipeline>

Can you call 'a' from 'b'?

Henry: No.

Murray: Then I don't understand.

Norm: If 'b' can't call 'a', then my user who wants to modularise
something common to 'a' and 'b' by pulling it out into 'd', but then
can't call it, is completely baffled.

Murray: The step asks me to run 'b'. Surely I should be aware of 'a'.
That's what makes sense to the naive user.

Henry: So named pipelines always get put into the pipeline library.
You can run one of those by name.

Norm: So now 'a', 'b' and 'c' are all peers and all callable from each
other.

Alex: The library changes as you go in: you add things to it. When you
go inside 'b', 'c' is added to the pipeline scope.

Murray: Naive user: 'c' is inside 'b', so the only way I can run 'c'
is by invoking 'b'. So I can run a pipeline that's inside me or
outside me, but no one else can run pipelines that are inside me.
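Murray's rule (a pipeline can run pipelines nested inside it or declared in an enclosing scope, but not pipelines nested inside some other pipeline) is ordinary lexical scoping. A minimal Java sketch using the earlier example tree; `PipelineNode` and `canCall` are hypothetical names, and the sketch models only name visibility, not execution:

```java
import java.util.ArrayList;
import java.util.List;

class NestingDemo {
    public static void main(String[] args) {
        // Murray's example: the outer <pipeline> contains 'a' and 'b'; 'b' contains 'c'.
        PipelineNode root = new PipelineNode("(top)", null);
        PipelineNode a = new PipelineNode("a", root);
        PipelineNode b = new PipelineNode("b", root);
        PipelineNode c = new PipelineNode("c", b);

        System.out.println(b.canCall(a)); // true: 'a' is declared in an enclosing scope
        System.out.println(b.canCall(c)); // true: 'c' is inside 'b'
        System.out.println(a.canCall(c)); // false: 'c' is hidden inside 'b'
    }
}

// Hypothetical model of a named pipeline and the pipelines declared inside it.
class PipelineNode {
    final String name;
    final PipelineNode parent;
    final List<PipelineNode> children = new ArrayList<>();

    PipelineNode(String name, PipelineNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // True if 'target' is declared directly inside this pipeline or inside
    // any pipeline that encloses it.
    boolean canCall(PipelineNode target) {
        for (PipelineNode scope = this; scope != null; scope = scope.parent) {
            if (scope.children.contains(target)) {
                return true;
            }
        }
        return false;
    }
}
```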

Richard: I agree that we can do this, but if we do, we will have to
decide a lot of things that are quite complicated, and we should leave
it 'til version 2.0.

Murray: That's a good reason for not doing this.

Alex: Personally, I think pipeline libraries are useful, but let's
leave *them* to version 2.0, because import is complicated.

Henry: I assumed that you'd just specify all your pipeline libraries
on the command line.

Norm: I think we need to have pipeline libraries. I think Richard's
right that pipeline libraries with pipelines all at the same level is
sufficient. We might later think it's too much work for naive users.
We can always do that later.

Richard: I think we should do it later. We shouldn't pre-empt the
semantics of nested pipelines, which we might add later.

Murray: We have to do the libraries, with some include mechanism. I
like the nesting, but I understand Richard's argument that this is too
much for us to take on right now. I don't think we should make the
decision now: I think we should include it in the document, say we're
uncertain, and then later pull it, unless users come back saying that
they really need it.

Alex: Don't we have group? Can't we use that?

Henry, Norm: It's not the same thing.

Murray: Can we call this procedure rather than pipeline?

Norm: Let's talk about that later.

Murray: How is this nesting thing not like groups?

Richard, Norm: Groups get executed when you come across them: they
just provide some scope: you can't call them again.

Murray: Can we conflate them?

Norm: I don't like the idea of asking the public whether we should do
something. All we'll ever get from the public is "yes, we should do
it". We should give them the minimum, and get them to ask for more.

Alex: So can we talk about pipeline libraries?

Henry: <pipeline-library> contains zero or more <pipeline> elements.
We're done.

Jeni: We need defaulting.

Henry: <pipeline-library> contains zero or more <pipeline> elements,
and a default-pipeline attribute that points to one of them.

Norm: Let's get agreement on pipeline libraries.

Alex: We shouldn't have default-pipeline. We just supply the QName
when we call the pipeline.

Norm: If you have to point to the library, then it's no cost to
provide the name as well. To review: A pipeline library contains zero
or more pipelines, all of which have names. (Zero-or-more or
one-or-more...) I don't feel strongly about defaulting.

Richard: I want to be able to refer directly to the library. Just as
in C you have a 'main', a library can have a default pipeline in it
that gets executed if you are given the library.

Norm: Java has this functionality. It seems no effort, and has some use.

Mohamed: What about including other libraries?

Richard: We should use import rather than include. Include implies
textual inclusion. With import, the pipeline library might be already
compiled, and the only thing available is some packaged
information.

Alex: Can you import inside the pipeline library?

Richard: Yes, you have to import from the pipeline library.

Norm: I suggest we leave off default-pipeline attribute for now. The
<import> has a source attribute that points to the imported library. It
can go in <pipeline-library> and in <pipeline>.

Murray: I think pipeline libraries should have a name for debugging
purposes, so that if I load one, any debugging information raised can
identify it.

Richard: I think it should have a name as well.

Norm: OK, optionally have a name.

Jeni: We shouldn't allow <import> within <pipeline>.

Richard: You might have a single <pipeline> element in a file; you
should be able to import pipelines into it.

Jeni: No: if you need to reuse pipelines, you have to ramp up to
having a pipeline library.

Mohamed: You should import pipelines by QName rather than URI.

Alex: I would be happy with an <import> that omitted the URI and just
told the implementation which pipelines you need, by name.

Richard: So do you expect a catalog mechanism so that I can get
libraries by URI when I'm not connected to the 'net? Is this our
problem?

Norm: This isn't our problem, just as it isn't XSLT's or Schema's
problem: it's implementation-defined how the documents are retrieved
given a URI.

Henry: Suppose I have a pipeline and Richard says he has a library. I
thought that I had to say on the command line where the library is,
but everyone said that was crazy. So I need an import library
statement that I can put in my pipeline.

Jeni: You add <pipeline-library> around it and add <import>.

...much discussion about the requirement for naive users to add
<import> in their standalone <pipeline> skipped...

Norm: I'm looking for a compromise. Suppose we go back to the GCC
model: you supply the pipeline libraries at the command line.

Richard: I think pipelines are going to be little things that they
want to run. They don't want to have to do this at the pipeline. I
think we should allow <import> within <pipeline> when <pipeline> is a
document element. But in a pipeline library, you have to put it at the
top level.

Alex: So if I rip out a pipeline from the pipeline library and try to
run it, then it would be invalid. Plus if I put a pipeline into a
library, I need to move the <import> into the top level of the
pipeline library.

Murray: What was the logic behind not having the wrapper with a
standalone pipeline and putting the <import> inside that wrapper?

Norm: Most users are going to have simple pipelines, and they're not
going to want to write the wrapper.

Alex: If <pipeline> can have <import> inside it, then it should be
able to do that within a pipeline library.

Jeni: Are the imported pipelines visible within the <pipeline> itself
or in the entire library?

Alex: Only in the <pipeline> that contains the <import>.

Norm: Recap:

We will have a <pipeline-library> element that can contain pipelines.
It has an optional name. You can import pipelines from another
pipeline library. A pipeline can also stand by itself, which can
import other pipeline libraries. You can import a standalone pipeline.

Jeni: Circularity?

Norm: If you import a library that you've already imported, you don't
need to worry: all the pipelines you import are available.

Murray: I should be able to have an import in a pipeline in a pipeline
library, so I can cut and paste.

...

Norm: What about saying that a standalone pipeline can't be imported?
We have a syntactic wart in allowing import within a pipeline in one
place and not another; this is a way of getting around it.

Richard: To go back: if A imports B and C, then C shouldn't be able to
access pipelines in B.

Alex: In XSLT, you can.

Richard: In C you can't.

Norm: In XSLT you can.

Richard: It means that there are libraries that will work in some
contexts but not in others.

Norm: We can say that if any pipeline library contains a step that
references a pipeline that isn't imported then it's an error.

Richard: So names are globally scoped.

   A
  / \
 B  C
    |
    D

A can see things in B and C and D. B can only see things in B. C can
see things in C and D. D can only see things in D.

Richard: So everything in the libraries that you import gets
automatically exported. What about circularity?

   A <-+
  / \  |
 B  C  |
    |  |
    D -+

Henry: Where you start is the top (A). You stop at D.

(agreement)
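The visibility rule in the two diagrams is a transitive closure over the import graph, and keeping a set of libraries already visited is one way to get Henry's "you stop at D" behaviour when imports form a cycle. A minimal Java sketch, assuming imports are given as a map from a library name to the names it imports; `ImportScope` and `inScope` are hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ImportDemo {
    public static void main(String[] args) {
        // First diagram: A imports B and C; C imports D.
        Map<String, List<String>> imports = new HashMap<>();
        imports.put("A", List.of("B", "C"));
        imports.put("B", List.of());
        imports.put("C", List.of("D"));
        imports.put("D", List.of());
        System.out.println(ImportScope.inScope("A", imports)); // [A, B, C, D]
        System.out.println(ImportScope.inScope("B", imports)); // [B]
        System.out.println(ImportScope.inScope("C", imports)); // [C, D]

        // Second diagram: D also imports A, closing a cycle.
        imports.put("D", List.of("A"));
        System.out.println(ImportScope.inScope("A", imports)); // [A, B, C, D]
    }
}

class ImportScope {
    // Libraries whose pipelines are in scope from 'start': itself plus
    // everything it imports, recursively. A library already in 'visited'
    // is not expanded again, so cycles and repeated imports terminate.
    static Set<String> inScope(String start, Map<String, List<String>> imports) {
        Set<String> visited = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String lib = pending.pop();
            if (!visited.add(lib)) {
                continue; // already seen: stop here
            }
            for (String imported : imports.getOrDefault(lib, List.of())) {
                pending.push(imported);
            }
        }
        return visited;
    }
}
```

Because a library that has already been visited is skipped, importing the same library twice adds nothing new, which matches Norm's answer to Jeni's circularity question.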

Norm: The name for the import statement is <import> with an attribute
called 'source' (this is consistent with what we do with <input>).

Alex: In pipeline libraries, we also have to deal with declaring components.

Norm: Yes, we need to deal with extension components.

Alex: We should put it in the pipeline libraries.

DECISION: We have pipeline libraries with <pipeline-library> document
elements, with an optional name attribute and containing multiple
pipelines. We have standalone pipelines with <pipeline> document
element. Both can have <import source="URI" />* as children of the
document element. This points to either a pipeline library or a
standalone pipeline. As well as the built-in components and
implementation-defined components, a pipeline library or a standalone
pipeline has in scope all the pipelines of all the pipeline libraries
or standalone pipelines that it imports, recursively.

No consensus on a default pipeline to run within a pipeline library.

BREAK

Inputs and outputs.

Henry: This isn't a proposal for naming, it's an analysis that may help.

A component is a named box with named things that data comes into and
named things that data comes out of. We have the ability to replicate
them, and use things to connect these boxes together. I propose
declaring components with:

<comp name="xslt">
  <inputs>
    <port name="doc" arity="1" />
    <port name="ss" arity="1" />
  </inputs>
  <outputs>
    <port name="result" arity="1" />
  </outputs>
</comp>

and parameters go in here as well, but this discussion doesn't
incorporate parameters.
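Henry's declaration gives each component a name plus named input and output ports, each with an arity. A minimal Java sketch of how an implementation might hold such a declaration and use it to check the ports a step binds; `ComponentDecl`, `unboundInputs` and `unknownInputs` are hypothetical, and these checks are an illustration rather than anything the group agreed:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ComponentDemo {
    public static void main(String[] args) {
        // Henry's xslt declaration: inputs 'doc' and 'ss', output 'result', each arity 1.
        ComponentDecl xslt = new ComponentDecl("xslt",
                Map.of("doc", "1", "ss", "1"),
                Map.of("result", "1"));

        // A hypothetical <step kind="xslt"> that binds 'doc' and a misspelled 'style'.
        Set<String> bound = Set.of("doc", "style");
        System.out.println(xslt.unboundInputs(bound)); // [ss]
        System.out.println(xslt.unknownInputs(bound)); // [style]
    }
}

class ComponentDecl {
    final String name;
    final Map<String, String> inputs;  // input port name -> arity
    final Map<String, String> outputs; // output port name -> arity

    ComponentDecl(String name, Map<String, String> inputs, Map<String, String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }

    // Declared input ports that the step leaves unconnected.
    Set<String> unboundInputs(Set<String> boundPorts) {
        Set<String> missing = new TreeSet<>(inputs.keySet());
        missing.removeAll(boundPorts);
        return missing;
    }

    // Input ports named by the step that the declaration does not have.
    Set<String> unknownInputs(Set<String> boundPorts) {
        Set<String> unknown = new TreeSet<>(boundPorts);
        unknown.removeAll(inputs.keySet());
        return unknown;
    }
}
```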

We have something new now, which covers four language constructs: group,
for-each/viewport, choose and when. These are all containers for
steps, with their own paired in/out at the top and out/in at the
bottom. Choose actually looks almost like this, but the things inside
are containers as well.

<step kind="xslt">
  <input name="doc" (source="p!x" | href="http://...")
                    [select="..."] />
</step>

This is similar to what we've talked about before, except that
source->href and ref->source.

So how do we do the in/out and the out/in for the containers? We
have a combination of <port> and <input>:

<iface name="x" arity="..."
       (source="p!x" | href="http://...")
       [select="..."] />

<oface name="y" arity="..."
       (@source | @href), @select? />

Richard: What about pipelines?

Henry: Pipelines are like components, in that they have some named
ports at the top and the bottom. But we can't call them inputs and
outputs.

The value of the source attribute must always be the name of a
component, then '!', then the name of a port on a component or the
name of a port on an <oface>.
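Source values such as `p!x` in the `<step>` example pair a step name with a port name. A minimal Java sketch of splitting such a value, assuming exactly one '!' with non-empty names on both sides; `SourceRef` is a hypothetical class, and the minutes do not define the syntax beyond the `p!x` examples:

```java
class SourceDemo {
    public static void main(String[] args) {
        SourceRef ref = SourceRef.parse("p!x");
        System.out.println(ref.step + " / " + ref.port); // p / x
    }
}

class SourceRef {
    final String step; // name of the component (step)
    final String port; // name of one of its ports

    SourceRef(String step, String port) {
        this.step = step;
        this.port = port;
    }

    // Splits a source value of the form "step!port", rejecting values with
    // no '!', more than one '!', or an empty step or port name.
    static SourceRef parse(String value) {
        int bang = value.indexOf('!');
        if (bang <= 0 || bang == value.length() - 1
                || value.indexOf('!', bang + 1) != -1) {
            throw new IllegalArgumentException("expected step!port, got: " + value);
        }
        return new SourceRef(value.substring(0, bang), value.substring(bang + 1));
    }
}
```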

Richard: I think pipelines have all of these things. You need to say
what inputs they have, just like for component definitions. And you
need to define the inputs for use within the pipeline, and you need to
bind an input for use within the pipeline.

General agreement.

Richard: An input for a pipeline doesn't have a source.

Henry: It *could*.

Richard: But it doesn't *need* it. <iface> got its source from <input>.

Jeni makes the point that the out-facing ports may have different
names from the in-facing ports within the container.

Henry combines them by making them siblings and writes up:

<iface|oface>
  <input @name, (@source | @href), @select? />
  <port @name, @arity />
</iface|oface>

Henry: I'd like to digest this for a while before we discuss names.

Richard objects to the naming of one thing <port> and another thing
<input> since an input is a port.

We decide to think on the naming for a while.

---

Core components
---------------

Norm: We've talked about various components like XInclude, validate,
XSLT. What are the others?

List:

XInclude
XSLT[1|2]
validate*
xquery
load
save
identity
httprequest
aggregate
disaggregate
subsequence
escape (string to XML for RSS)
unescape (XML to string for RSS)
XPath[1|2]filter
wrap
wrap-sequence
insert (attributes|elements|change values)
ns-rename
delete (subtrees|attributes)
rename (attributes|elements)
strip whitespace
absolutize (absolutize selected URIs)
prettyprint
exec
os-access (get directory/environment variable etc)
sort (sorts elements)
regex (destructures a string)
bitbucket/sink
doc-replace (replaces an input with another one)
diff
c14n
encrypt
decrypt
sign
verify
label (adds IDs to all elements)
line number
push-tag (wrap selected elements with a wrapper)
soap-exchange
SPARQL
manifest/packaging
render XSL-FO/SVG/MathML
tagsoup
wikify
sgml-in
schema-check
apply (pipeline)
grddl (returns RDF from XML document)
STX (streaming transformation)
NVDL (namespace validation)
uptranslate
downtranslate
forward-chain-RDF
replicate
load-escaping-entity-references
save-disable-output-escaping

(During the course of generating the list)

Henry: I need two versions of load/save/identity, for different arities.

Richard: We've agreed that a sequence of one document is acceptable to
a port with an arity of 1.

Henry: I think either we should declare arities and enforce them
statically, or, if we're not doing that, we don't need two versions of
load/save/identity.
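Richard's point means arity can be checked against the actual sequence at run time, so a port with arity 1 accepts any sequence that happens to contain exactly one document. A minimal Java sketch of that dynamic check, the alternative to the static enforcement Henry describes; `Arity.accepts` is hypothetical, and `*` for "any number of documents" is an assumed notation that does not appear in the minutes:

```java
import java.util.List;

class ArityDemo {
    public static void main(String[] args) {
        System.out.println(Arity.accepts("1", List.of("doc1")));         // true
        System.out.println(Arity.accepts("1", List.of("doc1", "doc2"))); // false
        System.out.println(Arity.accepts("*", List.of()));               // true
    }
}

class Arity {
    // "1": exactly one document; a one-item sequence counts as that document.
    // "*": hypothetical notation for any number of documents, including none.
    static boolean accepts(String arity, List<?> documents) {
        switch (arity) {
            case "1":
                return documents.size() == 1;
            case "*":
                return true;
            default:
                throw new IllegalArgumentException("unknown arity: " + arity);
        }
    }
}
```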

Jeni: We need load as well as an href/source attribute, so that the
URI can, for example, be passed in as a parameter.

...

We have agreement that xml:base processing happens automatically, but
we have to talk about what happens in terms of the base URI of outputs.

We also need to talk about security at some point.

...

Alex: We should have modules of components that vendors may implement.

(general agreement)

...

Murray: What about entities?

Henry explains a case where the entities were escaped on load and
unescaped on save. We need to talk about character encodings in the
pipeline: we need to provide a way of preserving a character encoding
through the components.

Murray: I use entities for reuse: I don't want them expanded.

Norm: You have to use XInclude or some other mechanism.

Richard: Nothing else in the XML stack does this.

Henry: I want a load-while-escaping-entities step.

Richard: We could have a component that turns the DTD into an XML
document that can be passed through to a later component, that can
then reconstruct the DTD for the entities.

We want to come back to preserving entities.

...

Henry: I'd like to talk about built-in parameters which have
information from the XML declaration.

Richard: Encoding and version are in the Infoset already.

...

Alex: I want to have some general declarations on serialization parameters.

Henry: We should put those on the output port declaration, to give
hints to the implementation.

...

What about core components? If no one objects, they're included...

XInclude
XSLT 1.0
validate
identity
aggregate

Alex objects to load because he wants httprequest.

Murray objects to all the rest.

We decide to take a different tack.

BREAK FOR LUNCH