Facebook


Emacs has flickered for 30 years. Now, it should be flicker-free. I’ve just landed support for double-buffered rendering for the X11 port. Now you should be able to edit, resize, and introduce bugs in your awful codebase without seeing a partially-rendered buffer or being incited to murder by barely perceptible white flashes that appear while you edit and disappear when you look straight at them.

You might say, “That’s great, but double-buffered rendering is the textbook solution to the problem of displaying incomplete rendering to users and driving them to kill their dogs in maniacal frustration.” That’s true, but Emacs predates those textbooks. GNU Emacs is an old-school C program emulating a 1980s Symbolics Lisp Machine emulating an old-fashioned Motif-style Xt toolkit emulating a 1970s text terminal emulating a 1960s teletype. Compiling Emacs is a challenge. Adding modern rendering features to the redisplay engine is a miracle.

“Emacs is a great operating system”, the old joke goes, “but it just needs a decent text editor.” That’s not so far from the truth: Emacs is basically a Lisp interpreter married to a big bunch of C code called redisplay. (Iä! Iä! Fhtagn!) Under normal operation, Emacs sits idle waiting for input, reads that input, maps the input to a command function, executes the command function, and displays the result of executing that command. It’s a fairly simple model: at a high level, it’s not so different from the read-eval-print loop that you see when you run /usr/bin/python3.
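To make that loop concrete, here is a toy sketch in C, the language of Emacs's core. Everything in it is hypothetical: `command_loop`, `lookup_key`, `self_insert`, `delete_backward`, and `redisplay_sim` are illustrative names, the "buffer" is one fixed-size line, and the "keymap" is a single conditional. Emacs's real command loop (in keyboard.c) is far more involved.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical editor state: a single line of text. */
static char buffer_text[64];
static size_t buffer_len;

typedef void (*Command)(char key);

/* Toy command: insert the typed character (if there is room). */
static void self_insert(char key)
{
    if (buffer_len + 1 < sizeof buffer_text) {
        buffer_text[buffer_len++] = key;
        buffer_text[buffer_len] = '\0';
    }
}

/* Toy command: delete the character before point. */
static void delete_backward(char key)
{
    (void)key;
    if (buffer_len > 0)
        buffer_text[--buffer_len] = '\0';
}

/* Map an input key to the command it runs (a stand-in for a keymap). */
static Command lookup_key(char key)
{
    return key == '\b' ? delete_backward : self_insert;
}

/* Stand-in for redisplay: show the result of the last command. */
static void redisplay_sim(void)
{
    printf("[%s]\n", buffer_text);
}

/* Read a key, look up its command, execute it, display; repeat until
   the input runs out. */
static void command_loop(const char *input)
{
    assert(input != NULL);
    for (; *input != '\0'; input++) {
        Command cmd = lookup_key(*input);
        cmd(*input);
        redisplay_sim();
    }
}
```

Calling `command_loop("emacz\bs")` types `emacz`, deletes the `z` with a backspace, types `s`, and leaves `emacs` in the buffer, printing the buffer after every key: read, execute, display, repeat.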

But Python has a simple command line interface. Emacs is a visual system, so “display[ing] the result” has some subtlety to it. Emacs organizes its view of the outside world into frames (what the rest of the world calls “windows”), windows (which the rest of the world calls “panes”), and buffers (which the rest of the world calls “documents”). At any given time, a user might be looking at any number of frames, displaying any number of buffers distributed into any number of panes. When a user hits a key, Emacs does whatever the key-command says to do, then updates these frames, windows, and buffers to reflect the results of whatever changes that command made. The act of updating display to reflect Emacs’ internal model of the world is called redisplay. One simple approach to implementing redisplay is to just redraw all the frames, windows, and buffers from scratch. This approach might be good enough for a shitty 2016 video game like Nuclide or Eclipse, but not Emacs.

Emacs was designed for much more constrained systems. Men were men, women were women, and bandwidth was expensive. Consequently, Emacs tries very hard to optimize redisplay. Internally, Emacs has a model of what each frame used to look like, before the last invocation of redisplay. This model is one of redisplay’s inputs. Another input is the current contents of each Emacs buffer. Redisplay essentially diffs the last-known display configuration and what it’s supposed to be displaying right now, then emits a minimal set of terminal control codes needed to change the last-known state to the current-good state.
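A toy version of that diffing step might look like the sketch below. It assumes the screen is a fixed grid of text rows and compares whole rows only; `Screen`, `redisplay_diff`, `ROWS`, and `COLS` are hypothetical names. The real redisplay compares current and desired glyph matrices and also optimizes scrolling and partial-line updates.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ROWS 4
#define COLS 16

/* Hypothetical model of a terminal screen: ROWS rows of text. */
typedef struct {
    char rows[ROWS][COLS + 1];
} Screen;

/* Compare what we believe is on screen (`current`) with what should be
   there (`desired`). Emit an update only for rows that differ, then
   record the new state as the last-known display. Returns the number
   of rows rewritten. */
static int redisplay_diff(Screen *current, const Screen *desired)
{
    int updates = 0;
    assert(current != NULL && desired != NULL);
    for (int r = 0; r < ROWS; r++) {
        if (strcmp(current->rows[r], desired->rows[r]) != 0) {
            /* A real terminal backend would emit cursor-motion and
               clear-to-end-of-line codes here; we just print. */
            printf("row %d: \"%s\"\n", r, desired->rows[r]);
            strcpy(current->rows[r], desired->rows[r]);
            updates++;
        }
    }
    return updates;
}
```

Only rows that changed cost anything, which is exactly what you want when every byte crosses a slow serial link. Running the diff a second time with nothing changed emits nothing.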

(Incidentally, this approach, applied to the web and mobile, is the core of React. Jordan Walke, eat your heart out.)

All of this redisplay code was written a long time ago. At one time, it had K&R-style function definitions. The authors (mainly RMS, whose present sanity reflects this effort) intended redisplay to be used on text terminals over slow links. (Emacs, to this day, has code that activates if it thinks you’re running on a connection slower than 2600 baud.) In this environment, redisplay works very well and affords compelling advantages.

One day, a fool wanted to run Emacs in a GUI as a native GUI program. The rest is ChangeLog.

To understand why Emacs is so unusual, it’s important to understand how a normal GUI program differs from a normal terminal (henceforth, TUI) program. A TUI program is driven by its read-eval-print loop, just like python3 above. Until the program does something, the world stands still. The program reads input, does something, and squirts out some bytes in response. Life is simple. The worst problem that a TUI program has to consider is that the terminal changes size.

By contrast, a GUI program is event driven: innumerable things can happen to the program, outside of its control. A user can move or resize a window, click on a button, use VR goggles to lovingly caress the titlebar, and do other things that are generally unpredictable and that happen at completely unexpected times. When you write a GUI program from scratch, you usually register some kind of callbacks that run in response to various events happening. In each callback, the program does some work and displays the result. These callbacks can happen in arbitrary order at arbitrary times. The GUI event model is not a hard programming model: it’s just different from the TUI one, because the set of events is much richer.

Whoever made Emacs into a native X11 program didn’t port Emacs to the event driven model, of which TUI is a neat subset. Instead, he pretended the GUI was a text terminal. Everything that is wrong with Emacs stems from this decision. Emacs does not, like most GUI programs, just receive GUI messages and respond to them. Emacs’s main mode of operation is a still honey-badger-esque read-eval-print loop. Everything Emacs does to respond to window events happens inside the read and (horrifyingly) eval parts of this process.
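A minimal sketch of the callback style, assuming a made-up toolkit with one handler slot per event type. `EventType`, `on_event`, `dispatch`, and `count_exposes` are hypothetical; real toolkits such as GTK+ have their own, much richer, signal and event APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; real toolkits define many more. */
typedef enum { EV_EXPOSE, EV_RESIZE, EV_KEY, EV_COUNT } EventType;

typedef struct {
    EventType type;
    int x, y, width, height; /* meaning depends on the event type */
} Event;

typedef void (*EventCallback)(const Event *ev, void *user_data);

static EventCallback handlers[EV_COUNT];
static void *handler_data[EV_COUNT];

/* Register one callback per event type (a real toolkit allows many). */
static void on_event(EventType type, EventCallback cb, void *user_data)
{
    assert(type < EV_COUNT);
    handlers[type] = cb;
    handler_data[type] = user_data;
}

/* The toolkit's main loop calls this for each event it receives, in
   whatever order and at whatever time events arrive. Returns 1 if a
   callback handled the event, 0 if nobody was interested. */
static int dispatch(const Event *ev)
{
    EventCallback cb = handlers[ev->type];
    if (cb == NULL)
        return 0;
    cb(ev, handler_data[ev->type]);
    return 1;
}

/* Example callback: counts Expose events in an int passed as user_data. */
static void count_exposes(const Event *ev, void *user_data)
{
    int *count = user_data;
    (void)ev;
    (*count)++;
}
```

The program never decides when `count_exposes` runs; it only promises to do the right thing whenever the toolkit calls it.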

Rendering is worth mentioning. One of the callbacks a normal GUI program can receive is called Expose. (That’s the X11 name: the Windows equivalent is WM_PAINT.) An Expose event says “I need you [the program] to render this part of your window. Do it.” Most programs are perfectly happy just responding to Expose callbacks and drawing what they need to draw, but Emacs is not most programs. Emacs is a 1980s Lisp Machine pretending that it’s running on a text terminal. It’s going to draw when it wants to draw, not when some stupid “GUI system” tells it to draw.

Consequently, Emacs window rendering is “push”, not “pull”. When Emacs gets an Expose event, it draws a god damn white square to tide the window system over until it gets around to letting redisplay (which still thinks it’s talking to a 1960s teletype) redraw the display. This redisplay happens in terms of character cells and cursor positions, not pixels. Emacs demands that the window system let Emacs draw onto the screen whenever it wants, not just in response to an Expose event.

The pretending doesn’t stop at terminals, though. The first GUI ports of Emacs were based on a GUI framework called Xt. Xt worked well for many years. (Does “Motif” ring a bell? Yes? If so, you are old enough to have seen some shit.) But modern, non-Xt toolkits came along eventually. Xt works very differently from GTK+, which is much better but has a different model.

Did Emacs just adapt to whatever these non-Xt toolkits did? Did Emacs adopt modern best practices? GTK+ is a modern GUI library. Emacs supports GTK+. Is Emacs a well-behaved GTK+ program now?

LOLOLOLOLOLOLOLOLOLOLOLOL

Of course not. Emacs pretends GTK+ is an old-fashioned Xt toolkit. The entire Emacs philosophy is to force $MODERN_THING to behave just like Xt, which in turn behaves just like a 1960s TTY. Emacs does awful things to GTK+ to maintain this illusion.

Keep in mind that Emacs xdisp.c tries to support five different toolkits (including two different major versions of GTK) with #ifdefs. There is no runtime abstraction. We define three or four different versions of each damn function. It’s a nightmare.

(When I was at Facebook, I was famous for “convincing” Android to do things it was never intended to do. Do you think that I gained an appreciation for this perversion when I joined Facebook? Emacs was my first and best school of awful hacks.)

Remember how Emacs just does crap, then displays the result, oblivious to the outside world? This model doesn’t work very well when combined with a window system that can ask Emacs to do arbitrary things at arbitrary times. While Emacs is in the middle of syntax-highlighting a 20,000 line C++ file, the window system can say “You! Paint your window! Now!” Emacs could just wait for a convenient time to get around to doing this painting, but this strategy would produce a poor user experience.

To provide a better user experience, Emacs installs a SIGIO signal handler for the X11 socket. Whatever Emacs is doing, wherever it is in its code, if the GUI wants to tell Emacs something, Emacs stops what it’s doing and runs redisplay. So now redisplay is not only a fiendishly complicated algorithm designed to minimize 1980s modem bills, but it also needs to be thread-safe with respect to every other part of Emacs.

In the SIGIO callback (which runs whenever the GUI asks Emacs to do something), Emacs runs a very limited version of redisplay. If this gimped version of redisplay says that it can’t cope with the current state, Emacs arranges for the full version of redisplay to be done later. In this case, Emacs usually just paints a white background over whatever area redisplay can’t consider at the moment.

SIGIO handlers can interrupt Emacs at any moment. In a sense, it’s like thread safety. Have you ever tried to make a single-threaded program safe for threads? Hard, yes? Well, Emacs is like that. Except that we don’t acknowledge that we have threads. (Our global lock is called block_input().)
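Here is a minimal model of that kind of lock: a nesting counter, with asynchronous work deferred while the counter is nonzero and drained when the outermost unblock brings it back to zero. The names (`block_input_sim`, `unblock_input_sim`, `async_event_arrived`) are hypothetical stand-ins for Emacs's `block_input()` machinery, and the sketch deliberately ignores real signal-safety requirements (a real signal handler would touch only `volatile sig_atomic_t` state).

```c
#include <assert.h>

/* Hypothetical model of a block_input()-style lock. */
static int input_blocked;  /* nesting depth of block_input_sim() */
static int pending_events; /* async events deferred while blocked */
static int handled_events; /* async events actually processed */

static void process_async_event(void)
{
    handled_events++;
}

/* Called where a SIGIO-style notification would arrive. */
static void async_event_arrived(void)
{
    if (input_blocked > 0)
        pending_events++; /* not safe to act now; remember it */
    else
        process_async_event();
}

static void block_input_sim(void)
{
    input_blocked++;
}

static void unblock_input_sim(void)
{
    assert(input_blocked > 0);
    if (--input_blocked == 0) {
        /* Outermost unblock: run everything we deferred. */
        while (pending_events > 0) {
            pending_events--;
            process_async_event();
        }
    }
}
```

Nesting matters: an inner unblock must not run deferred work, because an outer caller may still be in the middle of something that is not safe to interrupt.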

What’s particularly hilarious is that SIGIO can happen in the middle of redisplay. The REPL loop (in the Emacs case, not Read Eval Print, but Read Eval WTF) can be recursive.

Emacs flickers like crazy. This flickering is a predictable consequence of the “do what the fuck I want when I want it” redisplay strategy Emacs uses. There’s no coordination between your video card, your GUI system, Emacs, and your sinful soul. Say we’re about to draw a line of text. Step one is to erase that line with the background color. Step two is to draw that line’s characters, one by one. If your video card happens to refresh in the middle of this process, you’ll see, momentarily, incomplete state. The next frame will probably be perfect, but the GUI system sampled Emacs in the middle of its drawing operation. You perceive this incomplete rendering as flicker.
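The race can be simulated without any graphics at all. In this sketch, assuming a single line of text as the whole "screen", redrawing means erasing with spaces and then writing characters one by one; `refresh_at` picks the step at which the display happens to sample the buffer. Any sample showing neither the complete old line nor the complete new line is flicker. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define LINE_LEN 8

/* The one and only buffer the "video card" scans out from. */
static char screen_line[LINE_LEN + 1] = "old text";

/* Simulated scanout: returns 1 if the visible line is neither the
   complete old text nor the complete new text. */
static int sample_is_torn(const char *old_text, const char *new_text)
{
    return strcmp(screen_line, old_text) != 0 &&
           strcmp(screen_line, new_text) != 0;
}

/* Redraw the line the single-buffered way: erase with the background
   (spaces), then draw each character in turn. `refresh_at` is the step
   at which the display samples; returns whether that sample was torn. */
static int draw_line_single_buffered(const char *new_text, int refresh_at)
{
    char old_text[LINE_LEN + 1];
    int torn = 0, step = 0;
    assert(strlen(new_text) == LINE_LEN);
    strcpy(old_text, screen_line);

    memset(screen_line, ' ', LINE_LEN);   /* step 0: erase */
    if (step++ == refresh_at)
        torn = sample_is_torn(old_text, new_text);
    for (int i = 0; i < LINE_LEN; i++) {  /* steps 1..LINE_LEN: draw */
        screen_line[i] = new_text[i];
        if (step++ == refresh_at)
            torn = sample_is_torn(old_text, new_text);
    }
    return torn;
}
```

Only a sample taken after the last character is clean; every earlier one catches Emacs mid-change.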

A program like Emacs can minimize flicker by minimizing the amount of drawing that it actually does. If Emacs were to redraw every window every frame, you’d see massive flickering. It’s because redisplay (which is optimized for modems) is pretty good at optimizing the updating of the screen that you usually don’t see flicker. But you still see it sometimes. It’s a fundamental problem. In a single-buffer, immediate-mode, direct-drawing system, you can always get unlucky and your GUI can always show you Emacs in the middle of changing its underwear.

The amount of flicker you actually see depends on things like whether redisplay optimized the last update, your video driver, and the purity of your soul. In a single-buffer environment, the compositing manager and your video driver sample Emacs at essentially arbitrary intervals. It’s only through double buffering that we can guarantee that you see either valid old state or valid new state, not some random bullshit in-between.

I am a sinner. Unrepentant. Damned. Thusly, for me, Emacs flickers constantly. I hate flicker. I love Emacs. Something has to change. I decided to hack Emacs to eliminate this antediluvian flickering.

Eliminating flicker is not a conceptually hard problem. The basic idea is that you do your rendering and drawing into some off-screen area, then, when it’s done, atomically (i.e., all at once, all-or-nothing) show your human user the result of that drawing. A user sees either the complete new state or the complete old state. Modern GUI toolkits like Qt and GTK deal with this problem automatically: when a modern GTK program gets an Expose event, it asks everything affected by this Expose event to draw onto a bitmap, and when everything has drawn, it blits this bitmap to the main screen. You never see embarrassing intermediate state.
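The same one-line simulation, now double-buffered: drawing goes to a back buffer and a swap publishes it in one step, so a sample of `front` taken at any moment shows either the complete old text or the complete new text. `draw_line_back` and `swap_buffers` are hypothetical names; swapping two pointers stands in for whatever atomic swap the window system actually performs.

```c
#include <assert.h>
#include <string.h>

#define LINE_LEN 8

/* Two buffers; the display only ever scans out `front`. */
static char buf_a[LINE_LEN + 1] = "old text";
static char buf_b[LINE_LEN + 1] = "old text";
static char *front = buf_a;
static char *back = buf_b;

/* All drawing goes to the back buffer, so intermediate states (an
   erased line, half-drawn text) are never visible. */
static void draw_line_back(const char *new_text)
{
    assert(strlen(new_text) == LINE_LEN);
    memset(back, ' ', LINE_LEN);          /* erase */
    for (int i = 0; i < LINE_LEN; i++)    /* draw */
        back[i] = new_text[i];
}

/* Publish the finished frame in one step. */
static void swap_buffers(void)
{
    char *tmp = front;
    front = back;
    back = tmp;
    /* Keep the new back buffer in sync with what is displayed, so the
       next incremental redraw starts from the current contents (similar
       in spirit to the XdbeCopied swap action of the X DBE extension). */
    memcpy(back, front, LINE_LEN + 1);
}
```

The copy after the swap matters for a program like Emacs, whose redisplay only redraws what changed: it must start from what the user is actually looking at.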

It’s elegant. It’s also easy to implement --- if you’re not Emacs. Recall that Emacs still thinks it’s running in a terminal and has complete control over all output. Flicker was driving me crazy. I had to retrofit double buffering onto this horrible system.

Modern incarnations of the X Window System have a nice extension called DOUBLE-BUFFER. This extension lets a program pretend it’s rendering directly to the user while in fact rendering to some intermediate buffer. Under program control, X11 (the GUI, recall) will copy this intermediate buffer to the primary display. This functionality is perfect for Emacs.

In order to eliminate flickering from Emacs rendering, I needed to retrofit double buffering into a Byzantine system. The X double buffer extension helped. Most of Emacs still believes it’s drawing to a normal X window. The reality is that it’s rendering to a back buffer. Keep in mind that Emacs can draw at any time. It’s not enough to just copy this intermediate buffer to the primary display at the end of processing each command.

We either render too often, imposing unacceptable load on the X server, or too seldom, and generate user-visible bugs. Remember that Emacs can draw at arbitrary points, so there’s no clear point at which we should expose the back buffer, the one that contains the results of our accumulated and thus-far invisible drawing operations. The code is reentrant, “thread”-safe, and full of special cases, but it achieves the result I desired.

The problem is imposing a well-defined render->publish cycle on a free-for-all program.

Eventually, I settled on a solution: a global “block redraw” count. Emacs has no way of indicating dirtiness in drawing, so I just decided that any code that asked to draw would mark its parent frame as “dirty”. At the end of processing each X11 event (unless blocked) or when the blocked count reaches zero, we walk over all frames and buffer-flip any that might have been dirtied since the last buffer-flip operation. We want to minimize the number of buffer-flip operations, so we try to coalesce as many as possible, which is challenging when SIGIO can interrupt anything.

I started out by trying to figure out what parts of the program might redraw the display and doing a “display flip” (i.e., atomic redraw) after each, but I quickly started licking the walls and cuddling myself as I stared off into the distance and imagined better days. Emacs redraws the display from anywhere.

Instead, I just created a system where Emacs keeps track of dirty (i.e., drawn) regions and we redraw at the end of any X command. Inefficient? Maybe. Satisfying? Eh. Can I sleep tonight? Probably! When the global lock count transitions from one to zero, we flip dirty buffers. Redisplay locks buffer flipping, but other components do too, depending on context.

Now, we redraw when the “block redraw” count transitions from one to zero. Redisplay always blocks redraw. Asynchronous input blocks redraw. Timers block redraw. Eventually, it all works out, we decrement a counter from one to zero, and we show you a new view onto your shitty awful code.
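A stripped-down model of the block-redraw scheme described above, assuming a fixed array of frames and a single global counter. `SimFrame`, `block_redraw`, `unblock_redraw`, `draw_on_frame`, and `finish_event` are hypothetical names, not the identifiers in the actual patch, and the "flip" just increments a counter where real code would swap that frame's buffers.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAMES 4

/* Hypothetical frame record; not Emacs's struct frame. */
typedef struct {
    bool dirty; /* drawn to since the last buffer flip */
    int flips;  /* how many times this frame was published */
} SimFrame;

static SimFrame frames[MAX_FRAMES];
static int redraw_blocked; /* the global "block redraw" count */

/* Publish every frame drawn to since its last flip. */
static void flip_dirty_frames(void)
{
    for (int i = 0; i < MAX_FRAMES; i++) {
        if (frames[i].dirty) {
            frames[i].flips++; /* real code would swap buffers here */
            frames[i].dirty = false;
        }
    }
}

/* Any drawing operation marks its frame dirty; it never flips. */
static void draw_on_frame(int i)
{
    assert(i >= 0 && i < MAX_FRAMES);
    frames[i].dirty = true;
}

static void block_redraw(void)
{
    redraw_blocked++;
}

/* Flip only when the count goes from one to zero, so nested blockers
   (redisplay, timers, async input) coalesce into a single flip. */
static void unblock_redraw(void)
{
    assert(redraw_blocked > 0);
    if (--redraw_blocked == 0)
        flip_dirty_frames();
}

/* End of an X event: flip unless someone is still blocking. */
static void finish_event(void)
{
    if (redraw_blocked == 0)
        flip_dirty_frames();
}
```

Nested blockers coalesce: several drawing calls inside two levels of blocking produce exactly one flip per dirty frame, and frames nobody touched are never flipped.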

Emacs should now render itself as smoothly as any other modern GUI program. It just provides this functionality through a mechanism that’s completely alien and antithetical to modern GUI frameworks. Internally, Emacs still believes it’s a text program, and we pretend Xt is a text terminal, and we pretend GTK is an Xt toolkit. It’s a fractal of delusion.

Emacs uses an X11 extension, DOUBLE-BUFFER, largely seen as a historical artifact. This extension allows us to reuse our existing drawing code and redirect it to an off-screen buffer. GTK+ or Lucid or Motif or whatever we’re using is oblivious. There are other hacks I didn’t describe: my diff turns scrollbars and other widgets that share screen space with the double-buffered region into independent X windows, contrary to the intent and design of the GTK people. Overall, it’s a giant hack.

But it works. Somehow, it all works. And as a result, Emacs is as smooth and flicker-free as any other modern GUI program, and regular users have no idea what horrors lie beneath.

Damn, I love working on this program.