Radical Imperativism is the view that all computation happens in some order, and by golly we should be able to take advantage of it. --OLW 17:20, 10 September 2007 (PDT)
 Internal Define is Wrong
I am cutting and pasting this from a pair of emails I sent to the ERR5RS list. Making the writing a single coherent piece conforming to wiki formatting will take some time. --OLW 01:23, 11 September 2007 (PDT)
Thanks for your thoughtful response, Andre. I will present an alternative viewpoint, which I hope will help delineate the subset of programs that should be defined by ERR5RS. Otherwise I would just be venting my spleen without making a positive contribution. I disagree with most of your assertions below to the extent they are unqualified as being for consistency with R6RS. Consistency with R6RS (short of bug-for-bug compatibility, to use Will's words) is a reasonable goal without having to make overly broad claims. First, I need to make a digression
After some contemplation, I think a significant problem is the nature of "define". To my knowledge, it is the only construct in R5RS which falsifies the "a very small number of rules for forming expressions, with no restrictions on how they are composed" statement. In particular, you can't place defines arbitrarily in a body. What's more, the chosen semantics for internal defines are inconsistent with those of top-level defines. I suppose the intent was to remove some layers of nesting, but the effect in R6RS has been taken to the extreme - top-level defines have been redefined to be consistent with internal defines! For example, let's take
(define bar 1) (define foo (lamdba (x) (if x (define bar 100) (set! bar (+ bar 1))) #t)))
There is one sensible, consistent way to interpret this code, which is that the (define bar 100) establishes a global binding of bar, not a local one. The requirement derives from lexical scoping. This is illustrating by the following sequence of expressions (hopefully it is obvious why this sequence represents the undesirable alternative):
(foo #f) (display bar) => 2 (foo #t) (display bar) => 100 (foo #f) (display bar) => 101
What does this have to do with "shared" versus "separate" bindings? Only that the internal-define requirements are all about separating phases, where the top-level definitions (in the presence of syntax-case, anyway) are effectively evaluated in phases above the expansion of all subsequent expressions, with bindings shared in downward phases. You can write programs that obey the internal-define requirements with top-level semantics, but not vice-versa. Therefore the top-level notion of define (and hence the implied phasing and down-phase sharing of bindings) is more desirable for a power-hungry programmer.
Alas, I do not harbor any hope that the meaning of the define keyword will be made consistent in any foreseeable version of Scheme.
> There were a lot of messages in that thread. In the first message, > Andre prefered a single set of bindings for all phases, but then > changed his mind a few messages later.
I think it is clear that if one insists on a single set of bindings for expand-time and runtime, one would have to recompile a program before running it every time one restarts a session, or otherwise include the compile-time image in the compiled object file. I think that would be an undesirable constraint to impose on compiler-based Scheme systems.
Only the reachable compile-time bindings. Whether it is desirable depends on the user of the language, not the compiler writer.
This also suffers from the internal-vs-top-level define problem - a programmer can write programs that do not need to share variables between phases in implementations that do not require it, but not vice-versa. Note that I am not suggesting run-time should be able to modify compile time values, just refer to them. It seems to me the environment of the higher phase could be regarded as a module implicitly exporting its bindings to the run-time, with the corresponding restriction on mutation.
(For simplicity and performance, my implementation does share a single memory location for the bindings of the same identifier in phases higher than "expand", but it syntactically prohibits access to bindings, even though they may be present in memory, that are used outside their declared levels. So to an extent enforcing levels syntactically is orthogonal to whether the actual objects are shared or not. Communication between levels is possible using mutation of shared objects, but this is not portable and should not be used. The only way in which information can reliably be communicated between levels is through syntax objects and datum->syntax on objects that have an external representation, as described in the thread you mentioned).
Yes, I saw Matthew Flatt denigrating 3D macros. For my taste, implementations should be required to accept arbitrary data in syntax object, in the same spirit that other values are self-quoting [yes, I am conflating constants with values]. It's not clear why limiting the programmer to expressions that have an external syntactic representations is a good idea. On the one hand, you see what the compiler writer wishes to provide, and on the other what the program author wishes to write. If an expression appears meaningful, it should be meaningful.
I agree with the premise of R6RS that portable programs be written to not depend on the unspecified details of the instantiation semantics. This could be seen as analogous to the constraint that portable programs must not depend on the unspecified evaluation order of function calls.
But this is an incorrect statement. A program whose intended meaning depends upon the order of evaluation is not portable, but I can certainly write programs whose output/state varies with the order of evaluation, yet does not depend on that order for correctness. In the same way, I might have a use for knowing the order of instantiation of libraries without requiring any particular order for correct functioning.
> I know I have found an analogous feature of PLT's > syntax-case incredibly annoying (not being able to directly use > variable definitions inside a syntax-case body).
I know it is annoying, but it is also correct. It avoids the even more annoying phenomenon of code that works in the REPL suddenly stopping working when you compile the same code (at compilation time the runtime variable does not have value, so the reference to it from syntax-case will fail). A good reference on this kind of thing can be found in Matthew Flatt's paper whose title ends with "you want it when", which I would recommend if you have not already read it.
If you mean "correct" in any sense other than "portable R6RS", you are not making an defensible claim. What is annoying is that the compiler doesn't make the code work like it did at the REPL, not that the REPL does the right thing.
> If this is correct, could you (Andre, Will, or both) provide a > single coherent explanation of why the separated binding are the > better semantics? In any case, as a user I would prefer to know that > it was going to be one or the other for certain.
As Will said, programs should not depend on unspecified aspects of the semantics. Maybe a good question for err5rs is whether this semantics-independence should be enforced by the implementation (e.g., providing only syntax-rules) or whether one should come up with a simple set of programmer's responsibilities that would, if followed, ensure such independence.
I don't think it should be enforced by the semantics. That is part of the problem with R6RS. The reason for ERR5RS to adopt a strict subset of the R6RS library syntax and semantics is that no consensus appears reachable on what the semantics of the unrestricted subset should be. It is a shame R6RS will not explicitly state what programs will have a definite meaning, and explicitly leave the rest undefined. The more programs you given definitions, the fewer implementations will support ERR5RS.
At least, this is how I understand this aspect of the project, and believe this email demonstrates why the reason is sound. I have also come to understand the lament of some rrrs mailing list poster that an implementor's workshop might encourage implementors to (heaven forfend) be involved in defining the language they are implementing.
Lynn PS. Separated bindings are really a kind of dynamic scoping. The following illustrates what I think lexical scoping would require (visibility of bindings in downward phases):
(define bar 10) (define-syntax foo (lambda (x) (define bar 'a) (syntax-case x () ((_) #'bar) ((_ y) #'(set! bar y))))))) bar => 10 (foo) => a (foo 'b) => bar => 10 (foo) => b
On 9/8/07, AndrevanTonder <firstname.lastname@example.org> wrote:
> On Sat, 8 Sep 2007, Lynn Winebarger wrote: > > > suppose the intent was to remove some layers of nesting, but the effect in > > R6RS has been taken to the extreme - top-level defines have been redefined > > to be consistent with internal defines! > > No. Actually R6RS internal definitions are now consistent with *R5RS* toplevel > definitions (except for forbidding internal redefintions, which R5RS also did). > So R6RS is more consistent with R5RS than R5RS was with itself ;-)
I don't think so, except that R6RS mandates left-to-right evaluation of letrec bindings, which is just another gratuitous change. I refer to http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-8.html#%_sec_5.1 :[from 5.1]
Definitions and syntax definitions occurring at the top level of a program can be interpreted declaratively. They cause bindings to be created in the top level environment or modify the value of existing top-level bindings. Expressions occurring at the top level of a program are interpreted imperatively; they are executed in order when the program is invoked or loaded, and typically perform some kind of initialization.
R5.97RS has the following mandate for top-level defines (from http://www.r6rs.org/document/html-5.97/r6rs-Z-H-13.html#node_chap_10 ) :
Because definitions and expressions can be interleaved in a <top-level body> (see chapter 8), the expander's processing of a <top-level body> is somewhat more complicated. It behaves as described above for a <body> or <library body> with the following exceptions. When the expander finds a nondefinition, it defers its expansion and continues scanning for definitions. Once it reaches the end of the set of forms, it processes the deferred right-hand-side and body expressions, then generates the equivalent of a letrec* form from the defined variables, expanded right-hand-side expressions, and expanded body expressions. For each body expression <expression> that appears before a variable definition in the body, a dummy binding is created at the corresponding place within the set of letrec* bindings, with a fresh temporary variable on the left-hand side and the equivalent of (begin <expression> <unspecified>), where <unspecified> is a side-effect-free expression returning an unspecified value, on the right-hand side, so that left-to-right evaluation order is preserved. The begin wrapper allows <expression> to evaluate to an arbitrary number of values.
This is considerably different in intent from the above. According to Will Clinger, ([]), the high-level (syntax-rules, to be specific) macros specified in R5RS are indifferent to when the expansion is actually performed. It should be clear the above specification of R6RS is all about making syntax-case macros work consistently with separated phases rather than consistently with prior notions of Scheme. Despite allowing defines to be interspersed at the top-level, those defines might as well be internal defines (modified for letrec* order of evaluation)..
> > (define bar 1) > > (define foo > > (lamdba (x) > > (if x > > (define bar 100) > > (set! bar (+ bar 1))) > > #t))) > > > > There is one sensible, consistent way to interpret this code, which is that > > the (define bar 100) establishes a global binding of bar, not a local one. > > No. Even if you could have a definition as a branch of IF (which you cannot), > the LAMBDA introduces a local scope, so assuming that DEFINE is a binding > form, the inner BAR would not be visible to the toplevel. The concept of local > scope has been part of Scheme forever and is in fact part of the original > raison d'etre of Scheme.
This was posed as a thought experiment in language design. For the purposes of the experiment, I took the sentence "Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.", that has appeared in all RnRS for n>=2, as being inviolate.
With that rule in place, the restrictions on internal defines would have to be discarded as they directly conflict with it. Therefore, in our gedanken Scheme, having define in a branch of if must be perfectly sensible even in local scope. While you are correct (of course) about lambda introducing local scope, that local scope does not contain a binding of BAR.
> (define bar[*] 1) > (define foo > (lamdba (x) > (if x >[D] (define bar[^] 100) >[M] (set! bar (+ bar 1))) > #t)))
I have annotated the program to identify the spots of interest. You left out the invocations of foo that illustrate why my suggested interpetation is necessary.
 (foo #f)  (display bar) => 2  (foo #t)  (display bar) => 100  (foo #f)  (display bar) => 101
- In , the second branch is taken, and the [M] statement is executed. Since the [D] statement has never executed, bar naturally scopes to the outer definition. Line  follows.
- In , the first branch is taken, and the define form is executed. Does this define extend the local scope with the variable bar, or does it assign a new value to bar[*]? We will leave this for the moment.
- In , the second branch is taken. Consistency requires that (a) bar refers to the same binding as in , and (b) that it refers to the same binding as bar[^] from .
If  causes a new binding to be added to the local scope, requirement (b) is in trouble if (a) is followed.
The most obvious counter to this line of argument is that the defines are meta-operations (extending environments at compile time), so bar would have a local binding in all instances. On the other hand, if the "(if x" became "(if #f", we could not immediately discard the branch for truth without checking for definitions. This seems unpleasant to me, but it might be palatable for others.[*]
While I have been writing this out, the following post on comp.lang.scheme occurred: [] , showing that this notion of define has some pragmatic appeal.
[*] R6RS question: What is the result of
(define foo #f) (define bar (lambda () (if #f (let-syntax ((fuzz (begin (set! foo 1) (syntax-rules x () ((_ y ...) 1))))) <whatever>) (let-syntax ((fuzz (lambda (x) (syntax-case x () ((keyword y ...) (datum->syntax #'keyword foo)))))) fuzz))))
(bar) => ???
I don't know the answer, but let's assume you take whatever steps are necessary in your implementation of choice so that foo is defined in the same phase as the expansion of the definition of bar.
> > the top-level definitions (in the presence of syntax-case, anyway) are > > effectively evaluated in phases above the expansion of all subsequent > > expressions, with bindings shared in downward phases. > > This is only true in a REPL setting. To extend this to a compiler-only > Scheme implementation is possible, but would require the compiler to evaluate > each definition before continuing to compile subsequent expressions. > Imagine the following program: > > (define pi-to-four-billion-digits > (calculation-taking-40-days)) > > (define-syntax format-result > (lambda (e) (syntax-case e -----))) > > (format-result pi-to-four-billion-digits) > > Adopting your suggestion, /compiling/ this program will take 40 days. And then > you have not even run it.
It would only take 40 days if pi-to-four-billion-digits was required at expand-time in one of the subsequent expressions. In which case it should take 40 days, or compilation should be deferred to run-time. I trust the author understands the ramifications of his/her code, and intended it to be so.
> > It is a shame > > R6RS will not explicitly state what programs will have a definite meaning, > > and explicitly leave the rest undefined. > > Exactly! > > > (define bar 10) > > (define-syntax foo > > (lambda (x) > > (define bar 'a) > > (syntax-case x () > > ((_) #'bar) > > ((_ y) #'(set! bar y))))))) > > Again, the lambda introduces a local scope, so the inner definition of BAR > cannot modify the outer BAR (since forever in Scheme).
First: Your response assumes "define" should have something to do with the scopes introduced by lambdas. "define" as it exists at top-level is anathema to lambda-introduced scopes. Just for specificity, "forever"=R2RS.
Second: Your response appears to miss the point of the example. I am probably to blame for that because I used internal-define instead of let-binding the inner BAR. The point is that the macro should expand to code referencing and modifying the expand-time binding of BAR, not the global binding at all. I claim that is the lexically apparent binding for the person reading the code. Indeed, if you review the original definition of LISP, the reason for dynamic scoping is the mechanism for abstraction: the lambda expression is simply quoted and then evaluated in the extant environment when invoked. The separate binding model is a much more complex implementation of the same type of phenomenon.