Scheme48 for Collaborative Engineering?

Daniel W. Connolly
connolly@hal.com
$Id: scheme48-review.html,v 1.1 1994/07/18 20:10:32 connolly Exp $

I've been poring over the scheme48-0.36 stuff for the last week or so, and I finally got to the point where I can read some of the code, and I've got a bunch of ideas and questions.

I'm coming at this from the comp.lang.* background, with an eye toward selecting technology for collaborative development of WWW applications.

I am concerned with the trend of developing "killer apps" using limited technology and/or hopelessly un-reusable code. The Mosaic code is kinda messy. There was an outcry to rewrite it as a Tcl app. But Tcl doesn't lend itself to any sort of static analysis or optimization.

I see scheme48 as a basis for developing correct, well-abstracted code that can run efficiently. This is the only kind of code that has the potential to solve a given problem "once and for all" (or at least well enough that for the next 5 or 6 years, we will only have to retarget/recompile -- not rewrite -- the code).

The scheme48vm seems to be very fast, and since it supports full continuations, it should be able to support any of the popular interpreted programming languages -- python, perl, icon, tcl, smalltalk, etc. (as well as ML, prolog, and other functional/logic languages), with suitable bytecode compilers and runtime libraries.

I like lisp and scheme, and the combination of one bright person and a good lisp/scheme development environment can produce some great things. I'm not so optimistic about what happens when you get a hundred folks, some bright and some not-so-bright, and task them with writing an application in scheme. I'm interested in integrating mechanisms into scheme48 to support such collaborative engineering.

lisp/scheme code is notoriously difficult to read and maintain. I have heard this claim at least a hundred times, and I used to dismiss it as rhetoric. But when I contrast my initial experience with Modula-3 (and its wannabe-workalike, python) with my voyage through the scheme code in the scheme48-0.36 distribution, I think I have some real issues to back this claim.

The Namespace Problem

scheme48 has a module system, but the namespace is still essentially flat! Consider:

interface X defines names A, B, and C
interface Y defines C, D, and E
I want to write a module that uses both X's C and Y's C. How do I do it?

[Perhaps I've missed something... perhaps it's possible to rename identifiers in an interface definition...]
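
For comparison, here is a minimal Python sketch of the kind of disambiguation being asked for. The modules X and Y are built in place with types.ModuleType so the example is self-contained; their contents are placeholders for the interfaces above, and nothing here says how scheme48 itself would do it.

```python
import types

# Two stand-in modules that both export a name C
# (placeholders for interfaces X and Y; the values are arbitrary).
X = types.ModuleType("X")
X.A, X.B, X.C = "X.A", "X.B", "X.C"

Y = types.ModuleType("Y")
Y.C, Y.D, Y.E = "Y.C", "Y.D", "Y.E"


def use_both():
    """Qualified access keeps the two C's apart."""
    return (X.C, Y.C)


# Local aliases play the role of "from X import C as x_c".
x_c, y_c = X.C, Y.C
```

The point is only that the reader of the using module can tell, from that module alone, which C is meant.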

Also, it's not the case that once I've found a suitably unique interface/package/module name, I can use any name I want within the package. I still have to worry about collisions between names I make up and names from other packages that folks might want to use with this one.

Reading this style of code is no picnic either. If I print out some scheme source file foo.scm and come across:

	(define (foo-func a1 a2)
		(bar-func (+ a1 a2)))

there is nothing in foo.scm that tells me where to find bar-func. I have to discover the config file(s?) that include foo.scm, and then I have to search each of the structures that are open when foo.scm is included. Basically, to find the definition of bar-func, I have to search ALL names defined in the whole application! And if grep finds "(define (bar-func ..." more than once, I have to determine which structures are open when compiling foo.scm, and discover which of the bar-func's is the right one.
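
The grep step described above can be sketched in a few lines of Python. The function name find_definitions and the idea of passing sources as a dictionary are assumptions made for illustration. The sketch only recognises the two plain define forms, so it misses names introduced by macros such as define-record-type, and it does nothing to decide which of several hits is the one actually in scope.

```python
import re


def find_definitions(sources, name):
    """Return (filename, line_number) pairs where `name` appears to be defined.

    `sources` maps a file name to the text of that file. Only the forms
    (define (name ...) ...) and (define name ...) are recognised.
    """
    # The lookahead stops "bar-func" from matching "bar-func-2".
    pattern = re.compile(
        r"\(define\s+\(?" + re.escape(name) + r"(?=[\s()]|$)"
    )
    hits = []
    for filename, text in sorted(sources.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((filename, lineno))
    return hits
```

More than one hit is exactly the ambiguous case: the search says where definitions live, not which structure the using file has open.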

Are there mechanisms in the scheme48 development environment to help in this area?

I see that there is work being done on some "infix" language or dialect of scheme. I like this idea, and I strongly suggest that it support a Modula-3/python style namespace, using

	import X, Y, Z
	import X as xx, Y as yy
	from X import a,b,c

But for the purpose of static analysis, leave out python's:

	from X import *
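
To make the static-analysis point concrete, here is a small sketch using Python's standard ast module. The helper name imported_names is made up for this example, and only absolute imports are handled. With explicit imports, every bound name can be traced to its module without running anything; a star import leaves the set of bound names unknown until the other module is inspected.

```python
import ast


def imported_names(source):
    """Map each name bound by an import statement to where it comes from.

    Returns (bindings, star_modules). star_modules lists the modules
    imported with "*", whose names cannot be resolved from this file alone.
    Absolute imports are assumed.
    """
    bindings, star_modules = {}, []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    bindings[alias.asname] = alias.name
                else:
                    # "import a.b" binds only the top-level name "a".
                    top = alias.name.split(".")[0]
                    bindings[top] = top
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == "*":
                    star_modules.append(node.module)
                else:
                    bound = alias.asname or alias.name
                    bindings[bound] = f"{node.module}.{alias.name}"
    return bindings, star_modules
```

The source is only parsed, never executed, so the modules named in it do not need to exist.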

This doesn't address the issue of how to make effective use of existing bodies of code written in scheme. Hmmm....

Lack of Syntactic Hints

The scheme macro system is nifty. scheme48 uses it to represent a zillion different (define-xxx ...) forms. Those forms get real hard to read real fast. The ubiquitous (define-record-type ...) form is a good example.

Is it possible/practical to use keyword arguments, if not in real functions, then at least in the syntactic macros? I wouldn't mind seeing:

	(define-record-type rec :brand :rec
			    :constructor (make-rec f1 f2)
			    :fields
			     (f1 rec-f1 rec-set-f1!)
			     (f2 rec-f2 rec-set-f2!)
			    )

Information hiding

Is there anything preventing somebody from using record-set! on pretty much any record object? If not, it seems that the language does not support information hiding, and it seems that it would be very difficult to develop "safe" interfaces.

Static Analysis

When developing large applications, my experience is that static type checking reduces the net time to develop correct implementations, even though it increases the time to write the initial code.

I don't fully grok ML, but I understand that it has a powerful type system, and I gather it could be compiled for the scheme48 vm. I'm looking into that...

I saw some code in the scheme48 distribution that appeared to do static type checking of scheme programs. This looks interesting. At least with scheme, there is the possibility of static analysis and optimization, unlike tcl and python.

A Smalltalk Object system?

I understand that there are many ways to do object-oriented programming in scheme. But I doubt code can be reused across techniques. So in order to build a large code base of reusable smalltalk-like objects, a community must choose one technique.

The namespace for interfaces, classes, methods, variables, types, and such is also an issue for "programming in the large." For example, a python-style method call such as

	obj.meth(arg1, arg2, arg3)

might have to be written as something like

	((python-getattr obj 'meth) arg1 arg2 arg3)

Foreign function interface

Several descriptions of Scheme48 imply that scheme48 has a powerful and easy-to-use FFI. I have experience with the Tcl FFI, which is simple but limited because all data is represented as strings, and the Python FFI, which is powerful, but difficult to use (and unsuitable for multi-threaded programs) because of reference counting.

Why were the socket functions coded using vm extensions instead of foreign functions?

How does a C function create and manipulate objects in the VM heap? Since the heap "address" of an object can change during garbage collection, how does a C object refer to a scheme stob?
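
One common answer in runtimes with a moving collector is indirection: foreign code holds a stable handle, and the runtime keeps a table from handles to current addresses that it updates on every collection. The toy Python model below illustrates that idea only. The class Heap and its methods are invented for this sketch, and it makes no claim about how scheme48 actually handles the problem.

```python
class Heap:
    """Toy moving heap: a list index stands in for a heap address."""

    def __init__(self):
        self.cells = []        # current object storage
        self.handles = {}      # handle id -> current index (the only roots)
        self.next_handle = 0

    def allocate(self, value):
        """Store value and return a stable handle, never the raw index."""
        self.cells.append(value)
        handle = self.next_handle
        self.next_handle += 1
        self.handles[handle] = len(self.cells) - 1
        return handle

    def deref(self, handle):
        """Look up the object through the handle table."""
        return self.cells[self.handles[handle]]

    def release(self, handle):
        """Foreign code no longer needs the object."""
        del self.handles[handle]

    def collect(self):
        """Copy live cells to fresh storage and fix up the handle table."""
        new_cells = []
        for handle, old_index in list(self.handles.items()):
            new_cells.append(self.cells[old_index])
            self.handles[handle] = len(new_cells) - 1
        self.cells = new_cells
```

The foreign side never sees an index, so objects are free to move; the cost is one extra lookup per access and the need to release handles explicitly.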

Some Questions...

It looks like the VM heap is allocated in main with a single call to malloc(), and it never grows after that. Is that right? Do you plan to change that? If not, why not?

I've looked at several lisp implementations, but not very sophisticated ones. I understand reference counting and mark-and-sweep garbage collection. I have heard the terms "incremental" and "generational" garbage collection, but I don't fully grok them. The scheme48 documentation mentions a "copying" garbage collection scheme. What are the plusses and minuses of incremental, generational, and copying garbage collection, and how does the scheme48vm score? (If this can't be explained without my reading a bunch of stuff to get up to speed, could somebody point me toward some reading materials, preferably internet resources?)
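
For readers in the same position, the general idea of a two-space copying collector can be sketched as follows. This is a generic textbook-style illustration in Python, not a description of the scheme48 collector. The names Ref and copy_collect are made up, objects are modelled as lists of fields, and a dictionary stands in for the forwarding pointers a real collector would write into the old space.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """A pointer field: the index of another object in the current space."""
    index: int


def copy_collect(from_space, roots):
    """Copy everything reachable from roots into a new space.

    Returns (to_space, new_roots). Unreachable objects are simply never
    copied, so the work done is proportional to the live data.
    """
    to_space = []
    forward = {}  # old index -> new index

    def copy(ref):
        # Copy an object the first time it is seen; reuse the copy afterwards.
        if ref.index not in forward:
            forward[ref.index] = len(to_space)
            to_space.append(list(from_space[ref.index]))
        return Ref(forward[ref.index])

    new_roots = [copy(r) for r in roots]

    # Scan the copied objects in order, copying whatever they point to.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, Ref):
                obj[i] = copy(field)
        scan += 1
    return to_space, new_roots
```

Even this toy shows the usual trade-off: live objects end up compacted and garbage costs nothing to reclaim, but a second space is needed and every surviving object changes address.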