As I listen to “Soup” by Can I’m reminded of the time when Alan Kay took some time to post on Hacker News addressing his thoughts and ideas about object-oriented programming. For raw context read the original thread, but in this post I’ll attempt to annotate some of what was posted. For the purpose of this post anything in italics and all footnotes are assumed to be my words, blockquotes are questions from other HN denizens, and everything else are quotes from Dr. Kay.
The 10,000ft view
I had several stages about “objects”. The first was the collision 50 years ago in my first weeks of (ARPA) grad school of my background in math, molecular biology, systems and programming, etc., with Sketchpad, Simula, and the proposed ARPAnet. This led to an observation that almost certainly wasn’t original — it was almost a tautology — that since you could divide up a computer into virtual computers intercommunicating ad infinitum you would (a) retain full power of expression, and (b) always be able to model anything that could be modeled, and (c) be able to scale cosmically beyond existing ways to divide up computers. I loved this. Time sharing “processes” were already manifestations of such virtual machines but they lacked pragmatic universality because of their overheads (so find ways to get rid of the overheads …)
Though you could model anything — including data structures — that was (to me) not even close to the point (it got you back into the soup). The big deal was encapsulation and messaging to provide loose couplings that would work under extreme scaling (in manners reminiscent of Biology and Ecology).
As we know, modern OO programming is all about using classes to define and model data structures. While the use of interface-first design has helped manage the complexities added, I wonder if the point is still missed with this practice.
A second stage was to mix in “the Lisp world” of Lisp itself, McCarthy’s ideas about robots and temporal logics, the AI work going on within ARPA (especially at MIT), and especially Carl Hewitt’s PLANNER language. One idea was that objects could be like servers and could be goal-oriented with PLANNER-type goals as the interface language.
A third stage were a series of Smalltalks at Parc that attempted to find a pragmatic balance between what was inevitably needed in the future and what could be done on the Alto at Parc (with 128K bytes of memory, half of which was used for the display!). This was done in partnership with Dan Ingalls and other talented folks in our group. The idealist in me gritted my teeth, but the practical results were good. A fourth stage (at Parc) was to deeply revisit the temporal logic and “world-line” ideas (more on this below).
A fascinating example of an early Smalltalk is the [ST-71 version](http://worrydream.com/EarlyHistoryOfSmalltalk/) that has a somewhat familiar implementation pattern matched with a logical matching system.
A fifth stage was to seriously think about scaling again, and to look at e.g. Gelernter’s Linda “coordination language” as an approach to do loose coupling via description matching in a general publish and describe manner. I still like this idea, and would like to see it advanced to the point where objects can actually “negotiate meaning” with each other.
McCarthy’s Temporal Logic: “Real Functions in Time”
There’s lots of context from the past that will help understanding the points of view presented here. I will refer to this and that in passing, and then try to provide a list of some of the references (I think of this as “basic CS knowledge” but much of it will likely be encountered for the first time here).
Most of my ways of thinking about all this ultimately trace their paths back to John McCarthy in the late 50s. John was an excellent mathematician1 and logician. He wanted to be able to do consistent reasoning himself — and he wanted his programs and robots to be able to do the same. Robots were a key, because he wanted a robot to be in Philadelphia at one time and in New York at another. In an ordinary logic this is a problem. But John fixed it by adding an extra parameter to all “facts” that represented the “time frame” when a fact was true. This created a simple temporal logic, with a visualization of “collections of facts” as stacked “layers” of world-lines.
This can easily be generalized to world-lines of “variables”, “data”, “objects” etc. From the individual point of view “values” are replaced by “histories” of values, and from the system point of view the whole system is represented by its stable state at each time the system is between computations. Simula later used a weaker, but useful version of this.
I should also mention Christopher Strachey — a great fan of Lisp and McCarthy — who realized that many kinds of programming could be unified and also be made safer by always using “old” values (from the previous frame) to make new values, which are installed in a the new frame. He realized this by looking at how clean “tail recursion” was in Lisp, and then saw that it could be written much more understandably as a kind of loop involving what looked like assignment statements, but in which the right hand side took values from time t and the variables assigned into existed in time t+1 (and only one such assignment could be made). This unified functional programming and “imperative like” programming via simulating time as well as state.
And let me just mention the programming language Lucid, by Ashcroft and Wadge, which extended many of Strachey’s ideas …
It’s also worth looking at “atomic transactions” on data bases as a very similar idea with “coarse grain”. Nothing ever gets smashed, instead things are organized so that new versions are created in a non-destructive way without race conditions. There is a history of versions.
The key notion here is that “time is a good idea” — we want it, and we want to deal with it in safe and reasonable ways — and most if not all of those ways can be purely functional transitions between sequences of stable world-line states.
This is the core idea behind the Datomic database.
The just computed stable state is very useful. It will never be changed again — so it represents a “version” of the system simulation — and it can be safely used as value sources for the functional transitions to the next stable state. It can also be used as sources for creating visualizations of the world at that instant. The history can be used for debugging, undos, roll-backs, etc.
In this model — again partly from McCarthy, Strachey, Simula, etc., — “time doesn’t exist between stable states”: the “clock” only advances when each new state is completed. The CPU itself doesn’t act as a clock as far as programs are concerned.
This gives rise to a very simple way to do deterministic relationships that has an intrinsic and clean model of time.
For a variety of reasons — none of them very good — this way of being safe lost out in the 60s in favor of allowing race conditions in imperative programming and then trying to protect against them using terrible semaphores, etc which can lead to lock ups.
I’ve mentioned a little about my sequence of thoughts about objects. At some point, anyone interested in messaging between objects who knew about Lisp, would have to be drawn to “apply” and to notice that a kind of object (a lambda “thing”, which could be a closure) was bound to parameters (which kind of looked like a message). This got deeper if one was aware of how Lisp 1.5 had been implemented with the possibility of late bound parameter evaluation — FEXPRs rather than EXPRs — the unevaluated expressions could be passed as parameters and evaled later. This allowed the ungainly “special forms” (like the conditional) to be dispensed with, they could be written as a kind of vanilla lazy function.
From a purely aesthetic stand-point of the LISP programming model, FEXPRs are the pinnacle of beauty to me.
By using the temporal modeling mentioned above, one could loosen the “gears” of “eval-apply” and get functional relationships between temporal layers via safe messaging.
So, because I’ve always liked the “simulation perspective” on computing, I think of “objects” and “functions” as being complementary ideas and not at odds at all. (I have many other motivations on the side, including always wondering what a good language for children should be epistemologically … but that’s another story.)
I suspect that the other story is anchored in the work of Seymour Papert
Do you still think it is possible to build a system with the organically-growable scale of the internet but with some of the nice, predictable pseudotime-like semantics? — [Adam Solove](https://twitter.com/asolove)
Good questions. To keep my first comment as short as I could make it, I didn’t mention the next wave of researchers who got interested in this perspective on computing, for example: Dave Reed (“the slash in TCP/IP”) in his 1978 MIT thesis, Dave Jefferson and Henry Sowizral at RAND (“Time Warp”), Leslie Lamport, etc.
And the next round about 15 years ago — for Croquet — which included Dave Reed, David A Smith, Andreas Raab, myself, etc. There were three or four different versions of this mechanism, all of which worked, and the designs progressively got better.
There are several sets of issues and tradeoffs. Reed’s initial ideas in the 70s were about what an operating system for the whole Internet should look like. One of the many interesting ideas he proposed was to deal with the slow latencies and large numbers of potential users by distributing cloned computations organized by pseudo-time, and using the slow Internet for just input and synch deltas. This is what we first implemented in Croquet in the early 2000s when a typical good Internet round trip for a ping was about 80-100 milliseconds. This is good enough to create a massively parallel game like World of Warcraft (or even a flight simulator game) without any servers at all by just using the distributed machines that the game players are using to play on.
Further versions did more, and more comprehensively.
(A side note is that Croquet without the real-time graphics and interaction, etc. automatically provides a distributed object-oriented data base, etc. Pseudo-time is the larger idea underneath protected distributed atomic transaction data-bases, etc.)
Your last question is a good one, and it has been discussed over the years. Can it be done? How much would need to be done, and where? Etc.
The object work we did at Parc was a part of the network thinking going on in the ARPA/Parc community. And there was an eventual intent to think of higher-level network entities. A quite good intermediate real implementation of “net level entities” was the LOCUS system by Gerry Popek and his group at UCLA after he spent a year at Parc thinking about what network systems should be like (this was really good, and the first several chapters of his book are well worth reading). It was a migratory load balancing idea over heterogeneous machine types, which was quite complementary to the pseudo-time ideas.
I would like to see a highly talented group take another pass at your last question. Resilience to scaling is quite difficult to predict ahead of time, and it’s good to do real implementations. Still, the 9 orders of magnitude that the Internet needed was possible with the scheme that the ARPA/Parc community had, and Croquet worked perhaps better than it should have (especially with just a few people working on it). In any case, I think it is a bit crazy to not use pseudo-time if you know about it, and it’s a bit amateurish not to be aware of it. In the middle are religions whose identities are built on various “one true way”s (and this is the opposite of the way any science should be conducted).
An example: Let’s say I have an existing object and a clone of it where I have changed some of the methods’ behavior, but I want to understand all the consequences of these differences across all the distributed objects they cooperate with. — [Adam Solove](https://twitter.com/asolove)
Yes, I think you are characterizing what is called “possible worlds reasoning” in AI. A notable system was “ART” (Automated Reasoning Tool) by Inference Corp some years ago. More recently you might be interested in the “Worlds” work by Alex Warth et al.. This is a highly efficient design and implementation of fine grain pseudo-time contexts for general programming.
But what you are doing is good and important. We like a quotation by Goethe: “We should all share in the excitement of discovery without vain attempts to claim priority”. In other words, it doesn’t matter when good ideas happen — it is enough that they do happen.
OO and mutable state
The way I have looked at this is that a big problem is the confusion of “objects” with “abstract data types” and the clinging to “data” and “assignment statements”. As soon as encapsulation is really enforced the object can do what it needs to deal with the design parameters (including keeping histories, etc.).
Again: it’s really worth looking at Dave Reed’s MIT thesis …
Kay at times uses the phrase “not-quite-a-field” to refer to the tech industry. The statements below might give a sense of the nature of the so-called-field as Alan Kay sees it.
I don’t identify the general notion of “functional programming” with any particular language, including Haskell. I stay with the “idea of a function” as a reliable mapping. Similarly, I don’t identify the general notion of “object-oriented” (as I characterized it long ago) with any particular language. Both of these terms have now been “colonized” and their meanings changed, etc.