ERR5RS:Syntax and semantics of miscellaneous constructs
From SchemePunks
[edit] Summary
This section attempts to sort the potential R5RS->ERR5RS changes by controversiality. This is just meant to be suggestive, and any given item may move to a different heading.
[edit] Uncontroversial changes (should adopt)
- Make
Let(rec)-syntaxa splicing form. - Allow sequences of the form
(begin <definition> ... <expression0> <expression> ...)in bodies. - Adopt
syntax-rulespatterns already in widespread use such as tail patterns of the format(<pattern> ... <pattern> <ellipses> <pattern> ...). - Adopt ellipsis quotation, e.g.,
(... ...). - Adopt the
syntax-ruleswild-card_pattern. - Require macro-generated top-level definitions to be "hidden".
- Allow the continuations of non-final expressions in sequences to accept an arbitrary number of values.
- Adopt the R6RS extensions to
quasiquotesyntax. These fix a well-known shortcoming of the R5RS version. But should one adopt the somewhat complex R6RS rule for which parts of result is new?
[edit] Bug fixes (should adopt)
- Adopt Will Clinger's specification for
call/ccsafe higher order procedures such asmap. - Remove
delayandforcefrom base library. They are not safe for space. See SRFI-45.
[edit] Somewhat controversial changes (should adopt)
- Provide
letrec*in base language. - Keep restriction of definitions preceding expressions in library bodies.
- Macro literals such as else, unquote, ... , etc., should refer to bindings so that they can be exported, imported, renamed, excluded, and so on.
- Providing
environmentfor constructing environments forevalfrom libraries. - Making clear that multiple returns to the continuation of a toplevel or library definition right hand side or expression have unspecified behaviour.
- Decide on what, if anything,
loadmeans when called from within a library. - Disallowing definitions in
evalunless in the interaction environment.
[edit] Somewhat controversial changes (should not adopt)
-
Letrec*semantics for internal definitions. - The
call/ccabbreviation. - Providing
let-valuesandlet*-valuesin the base language.
[edit] Very controversial changes (should not adopt)
- Disallowing multiple returns to the continuation of a letrec, letrec* or internal define right hand side.
- Identifier macros.
- Adopting R6RS expansion process for top-level bodies, which allows forward references to macros in earlier variable definition right hand sides.
[edit] Undecided
- Internal
define-syntax. - Adopting R6RS expansion process for in internal bodies. This is mainly of importance if internal
define-syntaxis adopted, allowing forward definitions to macros in earlier variable defintion right hand sides. - Adopting R6RS expansion process for library bodies, which allows forward references to macros in earlier variable definition right hand sides.
- If
forceanddelayare kept in the base library or an R5RS compatibility library, make them work with multiple values (if only to provide an example of how to do this).
[edit] Discussion
The following changes and additions were adopted in R6RS. ERR5RS should decide how to deal with them.
[edit] Toplevel programs
R6RS defines a <top-level program> construct that approximates the concept of an R5RS program in contexts where the R5RS program is standalone, for example when it is compiled as a unit. In fact, most such R5RS programs can be mapped into valid R6RS <top-level program>s with minor changes. The main differences are:
- R6RS top-level programs must begin with an import clause, in the absence of which no bindings are available.
- R6RS top-level programs do not allow redefinition or mutation of primitives (or any imports). Variable bindings local to the program can be mutated but not redefined.
ERR5RS should probably decide if and how ERR5RS programs will relate to R6RS <top-level program>s. For example, can ERR5RS standalone programs contain import clauses for static import of bindings, and should the restrictions on redefinition be adopted or not.
- One possible proposal that subsumes R6RS top level programs under an extended syntax for R5RS programs is described at the page ERR5RS:Libraries. (AvT)
- I think that ERR5RS should drop the requirement that programs must begin with an import clause. It breaks R5RS compatibility and makes one-liners a pain. One-liners are useful for scripting:
echo "(+ 5 7)" | scheme
- Of course when dropping the requirement of an import clause, ERR5RS would have to provide some reasonable default bindings. R5RS maybe? --Nmh 23:53, 14 September 2007 (PDT)
[edit] Library bodies
R6RS library bodies have a similar semantics to the R5RS top-level as long as we only consider high-level macros and forget about redefinitions. However , in R6RS libraries, expressions cannot precede definitions. Should one relax this for ERR5RS libraries or keep it for R6RS compatibility?
[edit] Let-syntax as a splicing form
R5RS requires let-syntax to introduce a new scope (see 5.2.2), whereas R6RS requires let-syntax to be a splicing form. ERR5RS should probably make let-syntax and letrec-syntax into splicing forms, because this change was well-motivated and generated no controversy when Kent Dybvig proposed it at the Scheme workshop in Baltimore some years ago [citation needed]. Furthermore there are unlikely to be many (if any) existing R5RS programs that would be affected by this change.
[edit] Internal definition semantics
Does ERR5RS wish to adopt letrec* semantics for bodies? Currently, a correct R5RS body will be correct in R6RS unless the body uses call/cc to expose the implicit mutation, but the opposite is not true.
Here is one issue of keeping R5RS semantics for internal definitions: It is a little easier to mess up, for example define-record-type and similar macros, which expand into a sequence of definitions of which later ones depend on earlier ones. Instead of expanding (define-record-type name constructor ----) to
(define name (make-record-type -------)) (define constructor (record-constructor name)) ...
one has to remember to eta-transform to
(define (constructor field ...) ((record-constructor name) field ...))
and hope that the implementation suitably optimizes this.
[edit] Letrec*
Should ERR5RS include letrec*?
[edit] Restrictions on multiple returns
In R6RS a correct program may not return more than once to the continuation of a letrec, letrec* or define right hand side. Should ERR5RS adopt this restriction?
- Kumoyuki 06:48, 14 September 2007 (PDT) - NO. Is returning multiple times to the RHS of SET! also going to be restricted then? This restriction in R6RS seems fairly insane. I must be missing something, please explain :)
- OK, I liked the rhyme. I can understand how multiple returns to a given RHS inside a letrec might give rise to portability problems, but that is no worse than undefined argument evaluation order. But the ordering is well-defined for define and letrec*. Seriously now, what am I missing?
- Some of the arguments for disallowing multiple returns: They allow one to expose
set!in a program that does not useset!, making it more difficult to detect mutation and presumably inhibiting certain optimizations. It makes it impossible to provide a purely functional subset of Scheme as a library unless one excludescall/ccor redefinesdefineandletrec. It makesletrecless abstract by allowing more of its internals to be exposed. It also goes at least against the spirit of the intent of internaldefine, which is supposed to be a binding construct, not a mutation operator (also, remember that R5RS internal definitions do not have an order). In addition, I think multiple returns to toplevel definitions bring up a host of further problems - I have no idea what their meaning should be. There is also a library immutability loophole discussed at http://lists.r6rs.org/pipermail/r6rs-discuss/2007-March/001853.html. By the way, implementations should definitely not be required to detect this, an option is to simply state that it is undefined.
- Some of the arguments for disallowing multiple returns: They allow one to expose
- An argument for allowing multiple returns:
Call/ccis often used to build abstractions that are precisely designed to allow the programmer to decouple the multiple-return aspect of the semantics and reuse single-return code. In such cases, this restriction may make it very hard to write portable programs, since it is often requires a nonlocal analysis whether a given RHS returns multiply. Consider the following two expressions. Only the second returns multiply. (AvT)
- An argument for allowing multiple returns:
(let ()
(define x (amb 0 1))
x)
(amb-collect
(let ()
(define x (amb 0 1))
x))
[edit] Call/cc safety of higher-order procedures
R6RS higher-order procedures such as map are required to be call/cc safe. Should ERR5RS adopt this?
- Yes! In my opinion, the R5RS already requires this, but the reasoning required to derive that conclusion from the R5RS itself is either incorrect or is too subtle for all implementors to have made the deduction. Requiring call/cc safety for the standard higher-order procedures, in my opinion, would be a clarification rather than a change. (Wclinger 05:06, 14 September 2007 (PDT))
- Kumoyuki 06:56, 14 September 2007 (PDT) - Not to look too much like a thicko here, but what is 'call/cc safety'?
- The classic gotcha is a definition of map that accumulates its result as a list in reverse order, destructively reverses that list, and then returns the destructively updated list. If the first argument to map captures one of its continuations and arranges for that continuation to be invoked after the original return from map, then the second result may not even have the same number of elements as in the argument lists (see example below). In my opinion, this clearly violates the R5RS specification of map, which says it "applies proc element-wise to the elements of the lists and returns a list of the results, in order"; nonetheless this has been a common bug in implementations of the R5RS, so ERR5RS should clarify the point. (Wclinger 07:21, 14 September 2007 (PDT))
- To say this is implied, you must read "returns a list of the results, in order" as "every invocation of the continuation of the call to map must be passed a list of the results, in order." This is a pretty strong notion of "returns", which the evidence suggests is not the most common. Nevertheless, defining "returns" uniformly in this manner would give us uniform call/cc safety without specifying details about what will or will not be mutated. - OLW 18:01, 14 September 2007 (PDT)
- There is also the problem that constructing the second result of
mapmay mutate, and therefore invalidate, the result returned the first time. This is the first gotcha I ever ran into when I usedcall/ccandmaptogether. It does not seem that this can be inferred to be wrong from the R5RS language. To fix this, R6RS requires the second return not to mutate the result of the first. (AvT)
- There is also the problem that constructing the second result of
- You're right that it cannot be inferred to be wrong from the R5RS language. That's because it isn't wrong. I would say even Will's reading of "returns" is a little excessive, but it's still acceptable. I don't think we should encourage people to write code that relies on multiple returns from system functions implying unshared values. In particular, your mandate would be practically impossible to achieve without always restarting map. Even if you construct a list fresh every time, the user could mutate the last cons cell, and then return into the call. How is map supposed to deal with that? I suppose if you require map returns immutable cells, but I don't think I could go along with that. - OLW 11:09, 15 September 2007 (PDT)
- The problem we are referring to is not about the user mutating the cons cells. The problem is when the system does it unasked. If the system code for the
mapprocedure is allowed to mutate the prior results that have been returned to the user already, it is clear that it becomes impossible to reason about the ouput of the program. It is elementary to fix this - the standard simple functional implementation of map does the right thing already. I have added a couple more examples, from experience of actual implementations, below of howmapshould not behave. In neither case did the user mutate any cons cells, yetmapnevertheless corrupted the results. (AvT)
- The problem we are referring to is not about the user mutating the cons cells. The problem is when the system does it unasked. If the system code for the
- I think the proper fix is for ERR5RS to require
map(and the rest of that gang) to defer allocation of mutable result structure until all of the calls have returned. We could go further by requiringmapetc to copy mutable argument structure before any of the calls have been performed, but I don't think we need to go that far. (Wclinger 11:35, 18 September 2007 (PDT))
- I think the proper fix is for ERR5RS to require
- Do you have some case in mind that is not caught by the R6RS language? I suspect this is stronger than R6RS, but I am not sure. I am also not sure what you mean to happen if, for example, I want to access a result of a previous return before backtracking to deliver the next result (is it allocated?), and who is responsible for resolving the undecidable question of when the last return has occurred (I assume the programmer). This kind of thing kind of boggles my mind a little. (AvT)
- Yes, what I proposed is stronger than the R6RS restriction; under my proposal,
eq?comparisons would reveal that the results of multiple returns never share structure, but the R5.97RS draft allows structure-sharing so long asmapitself never mutates the returned result. Whether all of the calls have returned is easily decided, however, and is decided properly by all of the natural ways to writemapetc. You may have thought I had said the allocation must be deferred until all returns from all of the calls have occurred, which is of course undecidable. (Wclinger 15:52, 18 September 2007 (PDT))
- Yes, what I proposed is stronger than the R6RS restriction; under my proposal,
- Ah, now it makes sense. Am I right in supposing that this would require copying the result before
mapreturns, at least in programs that useset-car!orset-cdr!? Have you tried any quantification of the efficiency impact? (I know there were some experiments on the R6RS list, but I don't recall if this particular semantics was one of them.) (AvT)
- Ah, now it makes sense. Am I right in supposing that this would require copying the result before
- What I proposed does not imply copying the result. As for performance, my experiments with
vector-mapshowed that a bummed call/cc-safevector-mapis about 35% faster than the naive unsafevector-map, which is about the same as the advantage of naive unsafe over naive safevector-map. Running the benchmarks formap, I see similar results (substituting 25% for 35%). These differences are tiny compared to differences between the performance of different implementations of Scheme. As for myself, I would rather use a fast but safe implementation than a slow but buggy one. Although that philosophy represents a radical change from traditional attitudes, I recommend it for ERR5RS. (Wclinger 08:26, 21 September 2007 (PDT))
- What I proposed does not imply copying the result. As for performance, my experiments with
- I agree. The built-ins should work correctly with respect to the full semantics, not just with respect to a functional sub-language. (AvT)
- Just for clarity, the
safe-mapbelow would have the property you suggest, while the version without the temporary variable would not (due to unspecified order of evaluation). - OLW 01:36, 22 September 2007 (PDT)
- Just for clarity, the
- Why is it necessary for the system
mapto obey these restrictions? Scheme is not a functional language. The system map should be fast for the implementation's storage management strategy in the typical uses of such primitives - otherwise there would be no need to require it from the implementation. A "call/cc-safe" map can be provided by a library for those who wish this level of assurance. - OLW 16:51, 18 September 2007 (PDT)
- Why is it necessary for the system
; In Will's opinion, the R5RS map is not allowed to behave like this one:
(define (bad-map f x)
(do ((x x (cdr x))
(y '() (cons (f (car x)) y)))
((null? x) (reverse! y))))
(define ugly-k #f)
(define (ugly n)
(call-with-current-continuation
(lambda (k)
(if (= n 3)
(set! ugly-k k))
(+ n 1))))
(bad-map ugly '(1 2 3 4 5 6)) ; => (2 3 4 5 6 7)
(ugly-k 99) ; => (7 6 5 4 3 99 5 6 7)
; In Andre's opinion, the ERR5RS map should not be allowed to behave like this either:
(define (f x)
(amb 0 1))
(amb-collect (map f '(0 0))) ; => ((0 1) (0 1) (1 1) (1 1)) - wrong!
; Correct result should be ((0 0) (0 1) (1 0) (1 1))
; Another example, not using AMB:
(let ((resume #f)
(results '()))
(set! results
(cons (map (lambda (x)
(call/cc (lambda (k)
(unless resume (set! resume k))
0)))
'(#f #f))
results ))
(display results)(newline)
(if resume
(let ((resume* resume))
(set! resume #f)
(resume* 1))))
=> displays
((0 0))
((1 0) (0 0))
((1 1) (1 1) (0 0)) ; Wrong! - should be ((1 1) (1 0) (0 0)) !
(define safe-map
(lambda (f ls)
(let recurse ((ls ls))
(if (null? ls)
'()
(let ((tmp (f (car ls))))
(cons tmp (recurse (cdr ls))))))))
[edit] Internal define-syntax
R6RS allows internal define-syntax. Does ERR5RS want to adopt this?
Again, not adopting it may preclude internal uses of macros like define-record-type, which might expand into a sequence of both definitions and syntax definitions.
- Part of the solution is to design
define-record-typeso it can expand into a sequence of value definitions, with no syntax definitions. (Wclinger 11:37, 18 September 2007 (PDT))
- This will work for
define-record-type, but won't work for conceivable extensions that also emit macros, for example a macro to construct instances by field name instead of by positional constructor. Another example for which this won't work isdefine-enumerationin R6RS. That may be fine, but it may be worth considering whether internaldefine-syntaxis really that expensive to add. (AvT)
- This will work for
- I am not opposed to adding internal
define-syntax, which I regard as an unfortunate idea whose time may have come. I was advocating a design rule whose purpose is to keep Scheme centered upon its semantic foundation, the lambda calculus, by discouraging the proliferation of ad hoc syntactic extensions whose semantics must be explained separately. The rule is: Never design a macro whose semantics can't be described by a moderately straightforward expansion into value definitions. (Wclinger 16:09, 18 September 2007 (PDT))
- I am not opposed to adding internal
- I think an alternative solution would be to define
let-record-typethat can be used in expression contexts (cf.let-include). I'm not saying this is pretty, though. Alan Watson
[edit] Internal begin
R5RS does not allow sequences of the form (begin <definition> ... <expression0> <expression> ...) in bodies, whereas R6RS allows it.
[edit] Macro extensions
R6RS introduces a form identifier-syntax for identifier macros having special behaviour inside set!.
R6RS allows identifier macros in syntax-rules. Note that this is orthogonal to the previous paragraph.
- Identifier macros break one of the invariants on which we have long relied when reading Scheme code: constants and variable references do not have side effects. When that objection to identifier macros has been raised, the usual response is that we should continue to rely on that invariant when reading code, while relying on the authors of identifier macros to preserve that invariant for us. In my opinion, however, the advocates of identifier macros have not yet satisfied the obligation that should accompany any newly proposed feature: to show that the benefits of the feature substantially outweigh its disadvantages. Therefore ERR5RS should not require or recommend identifier macros. (Wclinger 05:14, 14 September 2007 (PDT))
- Kumoyuki 07:00, 14 September 2007 (PDT) - seconded - The only thing I like about identifier macros is the ability to have true compile-time constants. The rest of the proposal gives me the willies.'
R6RS extends the syntax-rules pattern language with:
- New ellipsis patterns already in widespread use such as tail patterns of the format
(<pattern> ... <pattern> <ellipses> <pattern> ...). - Ellipsis quotation, e.g.,
(... ...). - The wild-card
_.
I have at times found tail patterns and ellipsis quotation to be quite useful. The adoption of wild-cards in R6RS means that some valid R5RS macros will no longer work in R6RS, and vice versa.
[edit] Environments
R6RS contains this procedure for packaging a bunch of libraries for use with eval. Since this is pretty much necessary of one wants to use eval in a predictable way from inside libraries, and since this comes almost free with an implementation of libraries, should ERR5RS support it?
[edit] Eval
R6RS prohibits eval from defining or mutating bindings in its environment. In R5RS, there is one case where allowing eval to define new bindings is quite useful, namely when the environment is (interaction-environment), and many R5RS implementations allows this. So one probably will not want to adopt the R6RS restriction for eval.
[edit] Bindings for literals
In R6RS, macro literals such as else, unquote, ... , etc., refer to bindings that can be exported, imported, renamed, excluded, and so on. R5RS allows these to be unbound. If ERR5RS libraries are to be portable to R6RS implementations, ERR5RS must require these literals to refer to bindings.
- Kumoyuki 07:02, 14 September 2007 (PDT) - Thicko hat again. Why? Is this a side effect of identifier macros?
- No. It is just so that you can, for example, rename them to avoid name conflicts or to translate them to a different language, or exclude their use in a library. For example, some R5RS macros using the identifier
_will not work in R6RS, because in R6RS_is a wild-card. Still, the R5RS macros will work in an R6RS library that excludes_from its imports. Since only bindings are imported and exported, this means_must refer to a binding. Normally,_would be bound to a macro transformer that just raises and error message if_is used outside asyntax-rulespattern.(AvT)
- No. It is just so that you can, for example, rename them to avoid name conflicts or to translate them to a different language, or exclude their use in a library. For example, some R5RS macros using the identifier
[edit] Hidden toplevel definitions
R5RS does not fully specify whether top-level definitions are binding forms. This affects whether macro-generated top-level definitions are "hidden" or not. This is a well-known deficiency of R5RS, which ERR5RS should fix.
- Kumoyuki 07:12, 14 September 2007 (PDT) - Yes. But I have to admit that I thought it is pretty much required since without it the creation of top-level binding macros is pretty hard.
[edit] Non-final continuations
In contrast to R5RS, R6RS specifies that the continuations of non-final expressions in sequences accept an arbitrary number of values.
[edit] Call/cc abbreviation
Does ERR5RS provide the call/cc abbreviation as standard, like R6RS?

