To Scheme, or Not to Scheme: Scheming Schemers and Non-Scheming Schemers, or Keeping the Fun in Scheme

By October 23, 2009

Do you use the Scheme programming language? If so, do you program mainly in a serious mood to write applications, or in a crafty mood to have fun? In other words, do you consider yourself a non-Scheming Schemer, or a Scheming Schemer? I consider myself a Scheming Schemer: I program in Scheme mainly in a crafty mood just to have fun. To quote Alan Perlis from the dedication in SICP:

“I think that it’s extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are. I think we’re responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don’t become missionaries. Don’t feel as if you’re Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don’t feel as if the key to successful computing is only in your hands. What’s in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more.”

Alan J. Perlis (April 1, 1922-February 7, 1990)

Nevertheless, as many of you probably know, some recent developments in the evolution of the Scheme programming language have reduced the influence of the language. In particular, the R5RS vs. R6RS schism and the replacement of the 6.001 course at MIT, based on Scheme, with a 6.01 course, based on Python, are two events that have created much controversy.

Concerned over such events, recently, I posted a thread, entitled “Ideas for an SICP’?” on the comp.lang.scheme USENET newsgroup, asking for suggestions for an alternative to SICP with a similar content, as follows:

[W]ith Scheme replaced by Python at MIT, this special role of Scheme as the vehicle for teaching sophisticated students of computer science has been greatly diminished. Arguments against SICP as being too difficult for an introductory textbook notwithstanding, the presence and usage of that textbook contributed greatly to the significance of

Scheme as a tool in teaching introductory computer science.

It seems that SICP could use a replacement. What is needed is an alternative textbook to use Scheme in a role that cannot be fulfilled by such languages as Python, in order to foster creativity and originality in programming for future freethinking hackers. In addition, such an alternative textbook would need to be actively used by leading educational institutions of introductory computer science

in raising a new generation of future Scheme hackers.

Does anybody have any suggestions for a plan that could lead to the birth and growth of such an alternative leading textbook? Many programmers tend to be strongly influenced by the first textbook that they encounter in learning programming; whether that language is Scheme or Python could have great effect on the future influence of such languages. The SICP phenomenon has been done once; why not give

rise to a new SICP’ phenonemon?

There were several responses. In particular, one user, Ray, responded as follows:

[N]obody who’s grown up with the web, and who thinks of computers as being primarily communications devices, will believe that that makes a language anything other than a crippled toy if you can’t interface with the hardware capabilities of the machine, enabling you to do something as “simple” as writing a web browser in it, managing network connections, handling Graphical UI elements, and rendering

text and graphics on the screen.


Now, consider what Scheme’s got that Python doesn’t got. It comes down to syntactic abstraction and continuations. Courses based on SICP don’t use them, so MIT had nothing

to lose by going to Python.

Perhaps. But has Scheme really lost the essence of its appeal?

I disagree. Recently, I started reading a guide (albeit in Japanese, since I can read that language as well) by Shiro Kawai on the Gauche implementation of Scheme, and opened up a chapter on continuations. (For those of you who do not know, continuations are a control mechanism in Scheme which allows assignment of control flow to a variable, allowing a process to be “continued” (hence the name) from the point where the continuation was saved.) Since I had not yet fully understood continuations, I found the chapter extremely interesting, and could not stop reading. At one point, I encountered the following procedure which used a continuation, written in continuation-passing style (a.k.a. “CPS” style) (in which, rather than assigning the continuation directly to a variable, the continuation is explicitly passed as a parameter), to calculate the factorial function:

(define (fact/cps n cont) (if (= n 0) (cont 1) (fact/cps (- n 1) (lambda (a) (cont (* n a))))))

This procedure returns the same results as the following (much simpler) one (listed on the same page), which does not use a continuation:

(define (fact n) (if (= n 0) 1 (* n (fact (- n 1)))))

In particular, I started at the mysterious “(a)” variable in the following piece of code above, wondering what that variable represented:

(fact/cps (- n 1) (lambda (a) (cont (* n a))))

Suddenly, it dawned upon me: The “(a)” variable stored the parameter that was passed to the continuation (“(lambda (a) (cont (* n a)))”), which, in turn, captured the control state of the procedure at that point of execution, which was, in turn, passed back as a parameter to the enclosing procedure! In short, the continuation was a microcosm of the execution context of the procedure at that point in time, encapsulated in a lambda abstraction!

Here, “fact/cps” is the name of the procedure, which stands for “factorial in CPS (continuation-passing style) form.”

Suppose we call “fact/cps” in the most simple case:  a value of 0 for the first parameter, and a function that simply returns the parameter passed to it as a second parameter:

(fact/cps 0 (lambda (a) a))

Then, in “fact/cps”, “(= n 0)” is true in the if-statement, so “(cont 1)” is called, where “cont” is simply the second parameter of the enclosing “fact/cps” function, or “(lambda (a) a)” (the continuation), which returns the value of the parameter “a”, which is 1 in this case, so 1 is returned.

Let’s be a little bolder, and use a value of 1 for the first parameter:

(fact/cps 1 (lambda (a) a))

This time, “(= n 0)” is false in the if-statement, so the following recursive call is made (substituting 1 for n):

(fact/cps (- 1 1) (lambda (a) (cont (* 1 a))))

In the recursive call, (- 1 1) is substituted for the first parameter of “fact/cps”, and “(lambda (a) (cont (* 1 a))” is substituted for the second parameter of “fact/cps”.  This is the same as the following call (reducing the first parameter to a value):

(fact/cps 0 (lambda (a) (cont (* 1 a))))

However, since we had passed an identity function, “(lambda (a) a)” for “cont” in the enclosing function, this reduces to the following call (expanding “cont” to “(lambda (a) a)”):

(fact/cps 0 (lambda (a) ((lambda (a) a) (* 1 a))))

Here, the explicit continuation “(lambda (a) ((lambda (a) a) (* 1 a))))” first takes whatever is handed to it as the parameter “a”, and passes it to the inner function (lambda abstraction, actually, but we’ll dismiss that point here), so we reach “((lambda (a) a) (* 1 a))”.  But this is just the identity function on the inner “(* 1 a)”.  So this part just reduces to “‘(* 1 a)”, which is the same as just “a”, which is whatever value is passed to this continuation, so this continuation is just the identity function.

So when “fact/cps” is recursively called with 0 for the first parameter n and this identity value continuation “(lambda (a) ((lambda (a) a) (* 1 a))))” for the second parameter cont, we first reach “(= n 0)” as the condition in the if-statement, which is true.  This leads to evaluating the following:

(cont 1)

But “(cont 1)” is just this identity continuation called with 1, so it just returns the parameter 1.  So, “(fact/cps 1 (lambda (a) a))” is just the value passed to it:


Of course, we didn’t need to pass the identity function “(lambda (a) a)” as the second parameter to “fact/cps”.  We could have formatted the output, for example, by passing a formatting function, instead (to borrow the syntax of Gauche Scheme):

(lambda (a) (format #t "The factorial value is ~a." a))

Then we could have invoked “fact/cps” with that formatting function for the function to be passed as the continuation, as follows:

(fact/cps 3 (lambda (a) (format #t "The factorial value is ~a." a)))

This would have returned the following:

The factorial value is 6.#<undef>

Alternatively, we could have chosen to multiple whatever was returned by 2, just to screw up the function, as follows:

(fact/cps 3 (lambda (a) (* 2 a)))

This would have returned the following:


Hey, why not combine the two functions, and get Scheme to say something funny?

(fact/cps 3 (lambda (a) (format #t "I do solemnly swear that the factorial value is ~a." (* 2 a))))

Scheme then would have returned the following:

I do solemnly swear that the factorial value is 12.

Despite (or maybe because of?) all this exploratory monkey-business, in the “Aha!” moment described above, I felt an ecstasy of enlightenment that I do not often experience elsewhere (er, elsewhen, rather).

Such “Aha!” moments are crucial to appreciating the fun in computer science. They are commonly found whenever a deeper level of understanding is achieved by contemplating something which is not obvious at first. I have noticed that they are found more easily in Scheme than in some other programming languages, because of the flexibility that the language allows in even esoteric expressions.

In order to enjoy programming, one must appreciate the fun in programming, and it is difficult to appreciate this factor without experiencing an “Aha!” moment. The deeper the understanding, the more intense the exhilaration associated. Continuations and syntactic abstraction, in particular, are very abstruse (some may even say “arcane”) topics, especially when first encountered, and can require relatively deep understanding. Hence, by providing opportunities to learn such concepts, Scheme can provide an ideal opportunity to experience the fun of programming.  Thus, the continued need for Scheme.

Not all books that use Scheme adopt this approach.  An alternative approach is to structure the curriculum so that the learning becomes a linear process, rather than a series of leaps, so that all the parts fit together neatly like solving a jigsaw puzzle, rather than like climbing a mountain.

Not to be critical of this approach, but not everybody prefers it.  In one critique, José Antonio Ortega Ruiz, in his blog, “A Scheme bookshelf « programming musings,” contrasts one well-known textbook that uses this alternative approach, How to Design Programs (a.k.a. “HtDP”), with SICP, as follows:

The most cited alternative to SICP is How to Design Programs by Felleisen, Findler, Flatt and Krishnamurthi. Its authors have even published a rationale, The Structure and Interpretation of the Computer Science Curriculum, on why they think SICP is not well suited to teaching programming and how their book tries to fix the problems they’ve observed. I won’t try to argue against such eminent schemers, but, frankly, my feeling is that HtDP is a far, far cry from SICP. HtDP almost made me yawn, and there’s no magic to be seen.

Ortega-Ruiz is somewhat harsh in his critique of HtDP.  After all, according to the authors of the paper explaining the rationale behind the book, The Structure and Interpretation of the Computer Science Curriculum, HtDP was created in the first place to rectify two major perceived problems with SICP (page 9 of the paper):

… SICP doesn’t state how to program and how to manage the design of a program. It leaves these things implicit and implies that students can discover a discipline of design and programming on their own. The course presents the various uses and roles of programming ideas with a series of examples. Some exercises then ask students to modify this code basis, requiring students to read and study code; others ask them to solve similar problems, which means they have to study the construction and to change it to the best of their abilities. In short, SICP students learn by copying and modifying code, which is barely an improvement over typical

programming text books.

SICP’s second major problem concerns its selection of examples and exercises. All
of these use complex domain knowledge….

While these topics are interesting to students who use computing in electrical engineering and to those who already have significant experience of programming and computing, they assume too much understanding from students who haven’t understood programming yet and they assume too much domain knowledge from any beginning student who needs to acquire program design skills. On the average, beginners are not interested in mathematics and electrical engineering, and they do not have ready access to the domain knowledge necessary for solving the domain problems. As a result, SICP students must spend a considerable effort on the do- main knowledge and often end up confusing domain knowledge and program design knowledge. They may even come to the conclusion that programming is a shallow

activity and that what truly matters is an understanding of domain knowledge.

While these are all valid points, Ortega-Ruiz’s last clause, “there’s no magic to be seen,” actually describes the key conflict here.  What exactly is this “magic?”  To be experimentally borderline facetious, according to Arthur C. Clarke,

Any sufficiently advanced technology is indistinguishable from magic.

Arthur C. Clarke, “Profiles of The Future”, 1961 (Clarke’s third law)
English physicist & science fiction author (1917 – )

So maybe we’re actually referring to “any sufficiently advanced technology.”  What do we mean by “sufficiently advanced?”  I would suggest (to use a recursive definition), “sufficiently advanced to the point that deep understanding unachievable superficially is required to understand the material.”

Whether or not this “magic” is to be used in pedagogy actually relates to the fundamental design philosophy behind HtDP, as opposed to that behind SICP.  To quote the above-referenced paper explaining the rationale behind HtDP, “The Structure and Interpretation of the Computer Science Curriculum,” as follows (page 11):

The recipes also introduce a new distinction into program design: structural ver- sus generative recursion. The structural design recipes in the first half of the book match the structure of a function to the structure of a data definition. When the data definition happens to be self-referential, the function is recursive; when there is a group of definitions with mutual cross-references, there is a group of function definitions with mutual references among the functions. In contrast, generative re- cursion concerns the generation of new problem data in the middle of the problem

solving process and the re-use of the problem solving method.

Compare insort and kwik, two standard sort functions:

;; (listof X) -> (listof X) (define (insort l ) (cond [(empty? l ) empty] [else (place (first l ) (insort (rest l )))])) ;; (listof X) -> (listof X) (define (kwik l ) (cond [(empty? l ) empty] [else (append (kwik (larger (first l ) l )) (first l ) (kwik (smaller (first l ) l )))]))

The first function, insort , recurs on a structural portion of the given datum, namely, (rest l ). The second function, kwik, recurs on data that are generated by some other functions. To design a structurally recursive function is usually a straightforward process. To design a generative recursive function, however, almost always requires some ad hoc insight into the process. Often this insight is derived from some mathe- matical idea. In addition, while structurally recursive functions naturally terminate for all inputs, a generative recursive function may diverge. htdp therefore suggests that students add a discussion about termination to the definition of generative

recursive functions.

HtDP takes pains to remove the requirement for this “ad hoc insight” into the problem-solving process.  The authors of the book then make the following claim (same page):

Distinguishing the two forms of recursion and focusing on the structural case
makes our approach scalable to the object-oriented (OO) world.

That may be so, but that contrasts sharply with the spirit of the original quotation by Alan Perlis above:

Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are. I think we’re responsible for stretching them, setting them off in new directions, and keeping fun in the house.

So we have two sharply contrasting approaches:  One to use Scheme for fun, and the other to use Scheme for scalability.  Again, this basically is a matter of taste.

In his above-mentioned post in the above-mentioned thread, Ray contrasted the advantages of Scheme vs. Python as follows:

[S]cheme still has no standard means of managing network connections or rendering anything on the screen.  Python has these things.

Now, consider what Scheme’s got that Python doesn’t got. It comes down to syntactic abstraction and continuations. Courses based on SICP don’t use them, so MIT had nothing to lose by going to Python.

SICP doesn’t use syntactic abstraction.  In the first edition, this was because Scheme didn’t have them yet. In the current edition …. well, here’s the footnote about macros from the current edition, page 373:

Practical Lisp systems provide a mechanism that allows users to add new derived expressions and specify their implementation as syntactic transformations without modifying the evaluator. Such a user-defined transformation is called a /macro/. Although it is easy to add an elementary mechanism for defining macros, the resulting language has subtle name-conflict problems. There has been much research on mechnaisms for macro definitions that do not cause these  difficulties. See, for example, Kohlbecker 1986, Clinger and Rees 1991, and Hanson 1991.

(Aside: “practical” lisp systems have them; the dialect covered in the book does not.  Students can and do draw

the obvious conclusion….)

Granted, specific implementations do have these functions.  But there is no single main implementation of Scheme that everybody uses, and the libraries that implement these functions are not necessarily portable across implementations.  Therefore, Scheme as a language (as opposed to a specific implementation) does not have these functions.

But is that necessarily bad?  I don’t think so.  I think that the whole point of Scheme, as a language, is, to quote Alan Perlis above, that we are “[not] responsible for the successful, error-free perfect use of these machines, [but] for stretching them, setting them off in new directions, and keeping fun in the house.”

To sum, when I first approached SICP, I found it too challenging to digest.  I had to quit reading it repeatedly, and then return to it later, and I still have read only a portion of the book.  But I became entranced with the magic of computer science as demonstrated by such creatures as tail recursion in Scheme in SICP.  And it was precisely this magic that kept me returning to computer science in general, and to Scheme in particular.

On the other hand, books such as HtDP are very comforting and reassuring.  While SICP sometimes makes me wonder why I am such an idiot, HtDP makes me feel as if I am no longer an idiot.  I no longer need to think for hours and hours during my sleep about how to overcome a particular problem.  Books such as HtDP make the material very straightforward.  However, by doing so, they also remove all the magic, and break the spell.

I feel that an intermediate approach is better.  The magic is necessary, but the sorcery in SICP can be too much at first.  However, the jigsaw-puzzle approach of HtDP seems too straightforward.  There is not enough exploration to maintain interest after a certain level of reader sophistication.  Paradoxically, although I can read HtDP much, much faster than SICP, I also fall asleep reading it just as much faster, and actually haven’t read so far in it.  A gentler approach than that of SICP, which still offers more exploration than HtDP, would be a better compromise.

Also, I feel that the greatest strength of Scheme lies in its flexibility for exploratory programming.  Scheme shares one quality that is also shared by such addictive games as Tetris:  It is relatively simple to learn, yet extremely difficult to master.  Writing simple procedures to calculate such functions as the factorial function or the Fibonacci sequence is deceptively simple at first.  But when the student ventures into such deeper areas as tail recursion, continuations, and syntactic abstractions, the procedures can become tantalizingly complex.

To conclude, shouldn’t Scheme really be a language for scheming programmers to figure out mainly how to have fun?  I like to be a Scheming Schemer, always scheming plots for stretching the lambda abstractions to set them off in new directions, mainly just to have fun.  Are you a Scheming Schemer?