ERR5RS:Syntax and semantics of miscellaneous constructs

From SchemePunks

Jump to: navigation, search

Contents

[edit] Summary

This section attempts to sort the potential R5RS->ERR5RS changes by controversiality. This is just meant to be suggestive, and any given item may move to a different heading.

[edit] Uncontroversial changes (should adopt)

  • Make Let(rec)-syntax a splicing form.
  • Allow sequences of the form (begin <definition> ... <expression0> <expression> ...) in bodies.
  • Adopt syntax-rules patterns already in widespread use such as tail patterns of the format (<pattern> ... <pattern> <ellipses> <pattern> ...).
  • Adopt ellipsis quotation, e.g., (... ...).
  • Adopt the syntax-rules wild-card _ pattern.
  • Require macro-generated top-level definitions to be "hidden".
  • Allow the continuations of non-final expressions in sequences to accept an arbitrary number of values.
  • Adopt the R6RS extensions to quasiquote syntax. These fix a well-known shortcoming of the R5RS version. But should one adopt the somewhat complex R6RS rule for which parts of result is new?

[edit] Bug fixes (should adopt)

  • Adopt Will Clinger's specification for call/cc safe higher order procedures such as map.
  • Remove delay and force from base library. They are not safe for space. See SRFI-45.

[edit] Somewhat controversial changes (should adopt)

  • Provide letrec* in base language.
  • Keep restriction of definitions preceding expressions in library bodies.
  • Macro literals such as else, unquote, ... , etc., should refer to bindings so that they can be exported, imported, renamed, excluded, and so on.
  • Providing environment for constructing environments for eval from libraries.
  • Making clear that multiple returns to the continuation of a toplevel or library definition right hand side or expression have unspecified behaviour.
  • Decide on what, if anything, load means when called from within a library.
  • Disallowing definitions in eval unless in the interaction environment.

[edit] Somewhat controversial changes (should not adopt)

  • Letrec* semantics for internal definitions.
  • The call/cc abbreviation.
  • Providing let-values and let*-values in the base language.

[edit] Very controversial changes (should not adopt)

  • Disallowing multiple returns to the continuation of a letrec, letrec* or internal define right hand side.
  • Identifier macros.
  • Adopting R6RS expansion process for top-level bodies, which allows forward references to macros in earlier variable definition right hand sides.

[edit] Undecided

  • Internal define-syntax.
  • Adopting R6RS expansion process for in internal bodies. This is mainly of importance if internal define-syntax is adopted, allowing forward definitions to macros in earlier variable defintion right hand sides.
  • Adopting R6RS expansion process for library bodies, which allows forward references to macros in earlier variable definition right hand sides.
  • If force and delay are kept in the base library or an R5RS compatibility library, make them work with multiple values (if only to provide an example of how to do this).

[edit] Discussion

The following changes and additions were adopted in R6RS. ERR5RS should decide how to deal with them.

[edit] Toplevel programs

R6RS defines a <top-level program> construct that approximates the concept of an R5RS program in contexts where the R5RS program is standalone, for example when it is compiled as a unit. In fact, most such R5RS programs can be mapped into valid R6RS <top-level program>s with minor changes. The main differences are:

  • R6RS top-level programs must begin with an import clause, in the absence of which no bindings are available.
  • R6RS top-level programs do not allow redefinition or mutation of primitives (or any imports). Variable bindings local to the program can be mutated but not redefined.

ERR5RS should probably decide if and how ERR5RS programs will relate to R6RS <top-level program>s. For example, can ERR5RS standalone programs contain import clauses for static import of bindings, and should the restrictions on redefinition be adopted or not.

One possible proposal that subsumes R6RS top level programs under an extended syntax for R5RS programs is described at the page ERR5RS:Libraries‎. (AvT)
I think that ERR5RS should drop the requirement that programs must begin with an import clause. It breaks R5RS compatibility and makes one-liners a pain. One-liners are useful for scripting:
echo "(+ 5 7)" | scheme
Of course when dropping the requirement of an import clause, ERR5RS would have to provide some reasonable default bindings. R5RS maybe? --Nmh 23:53, 14 September 2007 (PDT)

[edit] Library bodies

R6RS library bodies have a similar semantics to the R5RS top-level as long as we only consider high-level macros and forget about redefinitions. However , in R6RS libraries, expressions cannot precede definitions. Should one relax this for ERR5RS libraries or keep it for R6RS compatibility?

[edit] Let-syntax as a splicing form

R5RS requires let-syntax to introduce a new scope (see 5.2.2), whereas R6RS requires let-syntax to be a splicing form. ERR5RS should probably make let-syntax and letrec-syntax into splicing forms, because this change was well-motivated and generated no controversy when Kent Dybvig proposed it at the Scheme workshop in Baltimore some years ago [citation needed]. Furthermore there are unlikely to be many (if any) existing R5RS programs that would be affected by this change.

[edit] Internal definition semantics

Does ERR5RS wish to adopt letrec* semantics for bodies? Currently, a correct R5RS body will be correct in R6RS unless the body uses call/cc to expose the implicit mutation, but the opposite is not true.

Here is one issue of keeping R5RS semantics for internal definitions: It is a little easier to mess up, for example define-record-type and similar macros, which expand into a sequence of definitions of which later ones depend on earlier ones. Instead of expanding (define-record-type name constructor ----) to

 (define name (make-record-type -------))
 (define constructor (record-constructor name))
 ...

one has to remember to eta-transform to

 (define (constructor field ...) ((record-constructor name) field ...)) 

and hope that the implementation suitably optimizes this.

[edit] Letrec*

Should ERR5RS include letrec*?

[edit] Restrictions on multiple returns

In R6RS a correct program may not return more than once to the continuation of a letrec, letrec* or define right hand side. Should ERR5RS adopt this restriction?

Kumoyuki 06:48, 14 September 2007 (PDT) - NO. Is returning multiple times to the RHS of SET! also going to be restricted then? This restriction in R6RS seems fairly insane. I must be missing something, please explain :)
OK, I liked the rhyme. I can understand how multiple returns to a given RHS inside a letrec might give rise to portability problems, but that is no worse than undefined argument evaluation order. But the ordering is well-defined for define and letrec*. Seriously now, what am I missing?
Some of the arguments for disallowing multiple returns: They allow one to expose set! in a program that does not use set!, making it more difficult to detect mutation and presumably inhibiting certain optimizations. It makes it impossible to provide a purely functional subset of Scheme as a library unless one excludes call/cc or redefines define and letrec. It makes letrec less abstract by allowing more of its internals to be exposed. It also goes at least against the spirit of the intent of internal define, which is supposed to be a binding construct, not a mutation operator (also, remember that R5RS internal definitions do not have an order). In addition, I think multiple returns to toplevel definitions bring up a host of further problems - I have no idea what their meaning should be. There is also a library immutability loophole discussed at http://lists.r6rs.org/pipermail/r6rs-discuss/2007-March/001853.html. By the way, implementations should definitely not be required to detect this, an option is to simply state that it is undefined.
An argument for allowing multiple returns: Call/cc is often used to build abstractions that are precisely designed to allow the programmer to decouple the multiple-return aspect of the semantics and reuse single-return code. In such cases, this restriction may make it very hard to write portable programs, since it is often requires a nonlocal analysis whether a given RHS returns multiply. Consider the following two expressions. Only the second returns multiply. (AvT)
(let ()
  (define x (amb 0 1))
  x)

(amb-collect
  (let ()
    (define x (amb 0 1))
    x))

[edit] Call/cc safety of higher-order procedures

R6RS higher-order procedures such as map are required to be call/cc safe. Should ERR5RS adopt this?

Yes! In my opinion, the R5RS already requires this, but the reasoning required to derive that conclusion from the R5RS itself is either incorrect or is too subtle for all implementors to have made the deduction. Requiring call/cc safety for the standard higher-order procedures, in my opinion, would be a clarification rather than a change. (Wclinger 05:06, 14 September 2007 (PDT))
Kumoyuki 06:56, 14 September 2007 (PDT) - Not to look too much like a thicko here, but what is 'call/cc safety'?
The classic gotcha is a definition of map that accumulates its result as a list in reverse order, destructively reverses that list, and then returns the destructively updated list. If the first argument to map captures one of its continuations and arranges for that continuation to be invoked after the original return from map, then the second result may not even have the same number of elements as in the argument lists (see example below). In my opinion, this clearly violates the R5RS specification of map, which says it "applies proc element-wise to the elements of the lists and returns a list of the results, in order"; nonetheless this has been a common bug in implementations of the R5RS, so ERR5RS should clarify the point. (Wclinger 07:21, 14 September 2007 (PDT))
To say this is implied, you must read "returns a list of the results, in order" as "every invocation of the continuation of the call to map must be passed a list of the results, in order." This is a pretty strong notion of "returns", which the evidence suggests is not the most common. Nevertheless, defining "returns" uniformly in this manner would give us uniform call/cc safety without specifying details about what will or will not be mutated. - OLW 18:01, 14 September 2007 (PDT)
There is also the problem that constructing the second result of map may mutate, and therefore invalidate, the result returned the first time. This is the first gotcha I ever ran into when I used call/cc and map together. It does not seem that this can be inferred to be wrong from the R5RS language. To fix this, R6RS requires the second return not to mutate the result of the first. (AvT)
You're right that it cannot be inferred to be wrong from the R5RS language. That's because it isn't wrong. I would say even Will's reading of "returns" is a little excessive, but it's still acceptable. I don't think we should encourage people to write code that relies on multiple returns from system functions implying unshared values. In particular, your mandate would be practically impossible to achieve without always restarting map. Even if you construct a list fresh every time, the user could mutate the last cons cell, and then return into the call. How is map supposed to deal with that? I suppose if you require map returns immutable cells, but I don't think I could go along with that. - OLW 11:09, 15 September 2007 (PDT)
The problem we are referring to is not about the user mutating the cons cells. The problem is when the system does it unasked. If the system code for the map procedure is allowed to mutate the prior results that have been returned to the user already, it is clear that it becomes impossible to reason about the ouput of the program. It is elementary to fix this - the standard simple functional implementation of map does the right thing already. I have added a couple more examples, from experience of actual implementations, below of how map should not behave. In neither case did the user mutate any cons cells, yet map nevertheless corrupted the results. (AvT)
I think the proper fix is for ERR5RS to require map (and the rest of that gang) to defer allocation of mutable result structure until all of the calls have returned. We could go further by requiring map etc to copy mutable argument structure before any of the calls have been performed, but I don't think we need to go that far. (Wclinger 11:35, 18 September 2007 (PDT))
Do you have some case in mind that is not caught by the R6RS language? I suspect this is stronger than R6RS, but I am not sure. I am also not sure what you mean to happen if, for example, I want to access a result of a previous return before backtracking to deliver the next result (is it allocated?), and who is responsible for resolving the undecidable question of when the last return has occurred (I assume the programmer). This kind of thing kind of boggles my mind a little. (AvT)
Yes, what I proposed is stronger than the R6RS restriction; under my proposal, eq? comparisons would reveal that the results of multiple returns never share structure, but the R5.97RS draft allows structure-sharing so long as map itself never mutates the returned result. Whether all of the calls have returned is easily decided, however, and is decided properly by all of the natural ways to write map etc. You may have thought I had said the allocation must be deferred until all returns from all of the calls have occurred, which is of course undecidable. (Wclinger 15:52, 18 September 2007 (PDT))
Ah, now it makes sense. Am I right in supposing that this would require copying the result before map returns, at least in programs that use set-car! or set-cdr!? Have you tried any quantification of the efficiency impact? (I know there were some experiments on the R6RS list, but I don't recall if this particular semantics was one of them.) (AvT)
What I proposed does not imply copying the result. As for performance, my experiments with vector-map showed that a bummed call/cc-safe vector-map is about 35% faster than the naive unsafe vector-map, which is about the same as the advantage of naive unsafe over naive safe vector-map. Running the benchmarks for map, I see similar results (substituting 25% for 35%). These differences are tiny compared to differences between the performance of different implementations of Scheme. As for myself, I would rather use a fast but safe implementation than a slow but buggy one. Although that philosophy represents a radical change from traditional attitudes, I recommend it for ERR5RS. (Wclinger 08:26, 21 September 2007 (PDT))
I agree. The built-ins should work correctly with respect to the full semantics, not just with respect to a functional sub-language. (AvT)
Just for clarity, the safe-map below would have the property you suggest, while the version without the temporary variable would not (due to unspecified order of evaluation). - OLW 01:36, 22 September 2007 (PDT)
Why is it necessary for the system map to obey these restrictions? Scheme is not a functional language. The system map should be fast for the implementation's storage management strategy in the typical uses of such primitives - otherwise there would be no need to require it from the implementation. A "call/cc-safe" map can be provided by a library for those who wish this level of assurance. - OLW 16:51, 18 September 2007 (PDT)
; In Will's opinion, the R5RS map is not allowed to behave like this one:

(define (bad-map f x)
  (do ((x x (cdr x))
       (y '() (cons (f (car x)) y)))
      ((null? x) (reverse! y))))

(define ugly-k #f)

(define (ugly n)
  (call-with-current-continuation
   (lambda (k)
     (if (= n 3)
         (set! ugly-k k))
     (+ n 1))))

(bad-map ugly '(1 2 3 4 5 6))  ; => (2 3 4 5 6 7)

(ugly-k 99)                    ; => (7 6 5 4 3 99 5 6 7)

; In Andre's opinion, the ERR5RS map should not be allowed to behave like this either:

(define (f x) 
  (amb 0 1))

(amb-collect (map f '(0 0)))  ; => ((0 1) (0 1) (1 1) (1 1)) - wrong!
                              ;    Correct result should be ((0 0) (0 1) (1 0) (1 1))


; Another example, not using AMB:

(let ((resume #f)
       (results '()))
   (set! results
         (cons (map (lambda (x)
                      (call/cc (lambda (k)
                                 (unless resume (set! resume k))
                                 0)))
                    '(#f #f))
               results ))
   (display results)(newline)
   (if resume
       (let ((resume* resume))
         (set! resume #f)
         (resume* 1))))

 => displays
    ((0 0))
    ((1 0) (0 0))
    ((1 1) (1 1) (0 0))  ; Wrong! - should be ((1 1) (1 0) (0 0)) !
(define safe-map
  (lambda (f ls)
    (let recurse ((ls ls))
      (if (null? ls)
	  '()
	  (let ((tmp (f (car ls))))
	    (cons tmp (recurse (cdr ls))))))))

[edit] Internal define-syntax

R6RS allows internal define-syntax. Does ERR5RS want to adopt this?

Again, not adopting it may preclude internal uses of macros like define-record-type, which might expand into a sequence of both definitions and syntax definitions.

Part of the solution is to design define-record-type so it can expand into a sequence of value definitions, with no syntax definitions. (Wclinger 11:37, 18 September 2007 (PDT))
This will work for define-record-type, but won't work for conceivable extensions that also emit macros, for example a macro to construct instances by field name instead of by positional constructor. Another example for which this won't work is define-enumeration in R6RS. That may be fine, but it may be worth considering whether internal define-syntax is really that expensive to add. (AvT)
I am not opposed to adding internal define-syntax, which I regard as an unfortunate idea whose time may have come. I was advocating a design rule whose purpose is to keep Scheme centered upon its semantic foundation, the lambda calculus, by discouraging the proliferation of ad hoc syntactic extensions whose semantics must be explained separately. The rule is: Never design a macro whose semantics can't be described by a moderately straightforward expansion into value definitions. (Wclinger 16:09, 18 September 2007 (PDT))
I think an alternative solution would be to define let-record-type that can be used in expression contexts (cf. let-include). I'm not saying this is pretty, though. Alan Watson

[edit] Internal begin

R5RS does not allow sequences of the form (begin <definition> ... <expression0> <expression> ...) in bodies, whereas R6RS allows it.

[edit] Macro extensions

R6RS introduces a form identifier-syntax for identifier macros having special behaviour inside set!.

R6RS allows identifier macros in syntax-rules. Note that this is orthogonal to the previous paragraph.

Identifier macros break one of the invariants on which we have long relied when reading Scheme code: constants and variable references do not have side effects. When that objection to identifier macros has been raised, the usual response is that we should continue to rely on that invariant when reading code, while relying on the authors of identifier macros to preserve that invariant for us. In my opinion, however, the advocates of identifier macros have not yet satisfied the obligation that should accompany any newly proposed feature: to show that the benefits of the feature substantially outweigh its disadvantages. Therefore ERR5RS should not require or recommend identifier macros. (Wclinger 05:14, 14 September 2007 (PDT))
Kumoyuki 07:00, 14 September 2007 (PDT) - seconded - The only thing I like about identifier macros is the ability to have true compile-time constants. The rest of the proposal gives me the willies.'

R6RS extends the syntax-rules pattern language with:

  • New ellipsis patterns already in widespread use such as tail patterns of the format (<pattern> ... <pattern> <ellipses> <pattern> ...).
  • Ellipsis quotation, e.g., (... ...).
  • The wild-card _.

I have at times found tail patterns and ellipsis quotation to be quite useful. The adoption of wild-cards in R6RS means that some valid R5RS macros will no longer work in R6RS, and vice versa.

[edit] Environments

R6RS contains this procedure for packaging a bunch of libraries for use with eval. Since this is pretty much necessary of one wants to use eval in a predictable way from inside libraries, and since this comes almost free with an implementation of libraries, should ERR5RS support it?

[edit] Eval

R6RS prohibits eval from defining or mutating bindings in its environment. In R5RS, there is one case where allowing eval to define new bindings is quite useful, namely when the environment is (interaction-environment), and many R5RS implementations allows this. So one probably will not want to adopt the R6RS restriction for eval.

[edit] Bindings for literals

In R6RS, macro literals such as else, unquote, ... , etc., refer to bindings that can be exported, imported, renamed, excluded, and so on. R5RS allows these to be unbound. If ERR5RS libraries are to be portable to R6RS implementations, ERR5RS must require these literals to refer to bindings.

Kumoyuki 07:02, 14 September 2007 (PDT) - Thicko hat again. Why? Is this a side effect of identifier macros?
No. It is just so that you can, for example, rename them to avoid name conflicts or to translate them to a different language, or exclude their use in a library. For example, some R5RS macros using the identifier _ will not work in R6RS, because in R6RS _ is a wild-card. Still, the R5RS macros will work in an R6RS library that excludes _ from its imports. Since only bindings are imported and exported, this means _ must refer to a binding. Normally, _ would be bound to a macro transformer that just raises and error message if _ is used outside a syntax-rules pattern.(AvT)

[edit] Hidden toplevel definitions

R5RS does not fully specify whether top-level definitions are binding forms. This affects whether macro-generated top-level definitions are "hidden" or not. This is a well-known deficiency of R5RS, which ERR5RS should fix.

Kumoyuki 07:12, 14 September 2007 (PDT) - Yes. But I have to admit that I thought it is pretty much required since without it the creation of top-level binding macros is pretty hard.

[edit] Non-final continuations

In contrast to R5RS, R6RS specifies that the continuations of non-final expressions in sequences accept an arbitrary number of values.

[edit] Call/cc abbreviation

Does ERR5RS provide the call/cc abbreviation as standard, like R6RS?