Recently, there’s been a lot of turmoil in the systems language community. We have the Rust Evangelism Strikeforce nudging us towards rewriting everything in Rust. We have the C++17 folks who promise the safety and ease of use of modern programming languages with the performance and power of C. And then there’s a long tail of other “systems” programming languages, like Nim, Reason / OCaml, Crystal, Go, and Pony.
Personally, I’m super excited we’re seeing some interesting work in the programming language theory space. This got me excited to learn more about what’s out there. A lot of the problems I solve are usually solved in C. Recently, Go has begun to encroach on C’s territory. I enjoy C and Go as much as the next person; they’re good languages for getting shit done. Often, though, they leave a lot to be desired, and leave me envious of other programmers with tools like Flow, TypeScript, and Dialyzer. Coming from developing in Erlang, even with its rudimentary type system, functional programming just came far more easily to me.
Let’s back up a bit. What is a systems language? Well, I think that depends on where you are in the stack, and who you ask. In general, I would suggest that a systems language is a language that can be used to implement the components your systems run atop.
Similarly, a Java programmer must understand the runtime behaviour of the JVM, but until OpenJDK, the implementation language was irrelevant. Even though these ecosystems are fairly mature, there will always be a need for someone to understand what’s going on underneath them in the system when the computers break.
At the end of the day, things written in systems programming languages rarely deliver direct business value, but build the underlying infrastructure to deliver business value. They are a necessary evil, and for most organizations, investing in them is a cost center, as their usage is not central to the success of the product.
From the perspective of the application developer, databases, RPC libraries, load balancers, and caches are just another piece of infrastructure. Fundamentally, this infrastructure is composed of various systems components. We’ve seen these infrastructure components come in a variety of languages. For example, databases have been implemented in Java (Cassandra), C++ (MongoDB), Erlang (CouchDB, Riak), C (PostgreSQL), and Go (CockroachDB). Data pipelines have been implemented in Java (Kafka, Spark), and C++ (Heron). Caching has largely stayed in the domain of C (Redis, Memcached).
Many of these projects end up leveraging shared infrastructure that developed over the years. Projects like LevelDB / RocksDB (C++) have found themselves everywhere. SQLite seems to run in nearly every system.
At the end of the day, nearly all of these systems rely on our operating system’s userland and the Linux kernel. Most distributions of Linux still use a userland largely written in C. Although GNU C has a ridiculous number of extensions, enabling everything from a formalized memory model to thread-local variables, it is still C. There have been attempts to introduce userlands in Go, Rust, and Nim, but none of these projects have gained wide adoption.
At the end of the day, nearly all of us run software on the Linux kernel. The Linux kernel is written in C, and its maintainers have no intention of accepting C++, for good reasons. There are alternative kernels in development that use languages other than C, such as Fuchsia and MirageOS, but none have yet reached the maturity to run production workloads.
I wanted to see if I could write a very simple userland utility that I would normally write in C in another language. Go has replaced much of my usage of C these days. It’s not a language I’m happy with, but rather the pragmatic choice. I get memory management, a ton of build tooling, and a much simpler language. I lose generics, macros, and a whole lot of insight into what I’m doing.
I tried to write a program for a very simple task at work: a POSIX signal helper. Effectively, it was the entrypoint into a container, responsible for calling a shutdown script during container termination. It would run services that were meant to run to termination, so these services should only stop because of a failure (programmer error or machine fault) or because of an external signal.
Often, we want to do something pre-shutdown. Sometimes, this is to deregister us from service discovery, and other times it is to save some application state before shutdown. Nearly all traditional PID 1s have this feature, but putting something like systemd into a container is a non-starter.
Most of this work is just making syscalls, like fork, exec, sigprocmask, wait4, and sigtimedwait. Because it runs in a container, we can’t rely on a big runtime or a set of shared libraries being available. At most, we can rely on libc.so.6.
I tried to write this in a bunch of languages. In most cases, it was a false start, and the language fell short of my needs in capabilities. To spoil things early, I ended up compromising, and writing this in Go. Even as close to C as Go is, it is still awkward. Since fork/exec is managed via os/exec, you can’t simply start listening to all signals in the main goroutine without breaking exec.
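To make the task concrete, here is a minimal sketch of what such a wrapper can look like in Go, assuming a Unix-like target. It is not the actual tool described here: the function name `runWithHook` and the `preShutdown` callback are made up for illustration, and a real helper would also need timeouts and more careful signal coverage. It starts the child in its own process group, waits for it, and on SIGTERM or SIGINT runs the hook once before forwarding the signal to the whole group.

```go
package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runWithHook (hypothetical name) starts argv as a child in its own process
// group and returns its exit code. On SIGTERM or SIGINT it calls preShutdown
// once, then forwards the signal to the child's process group.
func runWithHook(argv []string, preShutdown func()) (int, error) {
	// Subscribe before starting the child so an early signal is not missed.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
	defer signal.Stop(sigs)

	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	// Setpgid asks for setpgid to be called in the child between fork and
	// exec, so the child leads a new process group.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return -1, err
	}

	// Wait in a goroutine so the loop below can also react to signals.
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	hooked := false
	for {
		select {
		case sig := <-sigs:
			if !hooked {
				hooked = true
				preShutdown()
			}
			if s, ok := sig.(syscall.Signal); ok {
				// A negative PID addresses the whole process group.
				_ = syscall.Kill(-cmd.Process.Pid, s)
			}
		case err := <-done:
			if err != nil {
				// A non-zero exit is reported via ExitError; anything
				// else is a real failure to wait on the child.
				if _, ok := err.(*exec.ExitError); !ok {
					return -1, err
				}
			}
			// ExitCode is -1 if the child was killed by a signal.
			return cmd.ProcessState.ExitCode(), nil
		}
	}
}
```

From `main`, this would be called with something like `runWithHook(os.Args[1:], deregister)`, where `deregister` stands in for whatever pre-shutdown step is needed.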
Nim was a false start. I needed to run the code in a separate process group from the signal wrapper, so that the wrapper is unaffected when that process group gets a signal. You can do this “by hand” by going through the fork/exec process yourself and calling setpgid.
The bigger problem was signal handler safety. Signal handler safety is not well defined in Nim, and there is an open GitHub issue for it. It seems like the runtime wants to own signal processing.
In general, it looks like a neat language, and I really hope it goes somewhere. There was also awkwardness around the way that sum types are specified. My favourite part of the language is that it could compile to multiple backends, and you could explore the intermediate state.
Pony is still in its infancy. I am not really using it for its original use case. I was bound to hit missing features.
With that said, the language itself was super fun to write in. It’s a simple language, with a simple toolchain. The “build tool” (ponyc) “just works”. It also produces a small binary that has minimal dependencies.
Even so, it was also a false start. First, there was no way to just listen for signals without the program exiting. I submitted a PR to the Pony team, and they merged in this capability.
The other problem is that the mechanism to fork and exec processes didn’t really fit the bill for me. It didn’t have the ability to shuffle around file descriptors, nor did it have the ability to run things like setpgid after forking.
The upside is that the core Pony code is so simple that I started taking a hatchet to the ASIO subsystem to start enabling these kinds of capabilities. Given how simple the runtime is, it shouldn’t require much work to make Pony capable of this task, but I didn’t have the time nor the mental energy to write an RFC.
Part of the difficulty is that although I could start banging away at the FFI and glue some bits together, I would lose a lot of the benefits of the language. The language has a concept, being “capabilities-safe”, that was unfamiliar to me from PLT at first, but upon further understanding, it’s an incredibly powerful tool. I think it solves a big problem that the C++ community has, which is that even if you write good code, the libraries you bring in could be written like YOLO.
Pony’s team also has a great philosophy. They explicitly care about programmer productivity.
The Pony Philosophy
In the spirit of Richard Gabriel, the Pony philosophy is neither “the-right-thing” nor “worse-is-better”. It is “get-stuff-done”.
Incorrectness is simply not allowed. It’s pointless to try to get stuff done if you can’t guarantee the result is correct.
Runtime speed is more important than everything except correctness. If performance must be sacrificed for correctness, try to come up with a new way to do things. The faster the program can get stuff done, the better. This is more important than anything except a correct result.
Simplicity can be sacrificed for performance. It is more important for the interface to be simple than the implementation. The faster the programmer can get stuff done, the better. It’s ok to make things a bit harder on the programmer to improve performance, but it’s more important to make things easier on the programmer than it is to make things easier on the language/runtime.
I tried to write this in Reason, Facebook’s flavour of OCaml. It’s awesome to see a functional programming language on this list. As of writing this, Reason is primarily aimed at frontend developers, and not “native” developers, as they call us.
What’s underneath Reason is OCaml — Reason is effectively what they call a “Transpiler”, and relies on OCaml to do the heavy lifting. This means we get 20 years of OCaml’s legacy, and tooling. This includes nearly everything in the opam package library.
The thing that nobody told me is that OCaml’s runtime cannot do two things at once. It has a Python-style GIL, and as of writing this, multicore support is still a separate work in progress rather than part of the mainline compiler.
The rough implementation is below, with parts omitted because they plug into internal systems.
So, this actually works.
Initially, one of the things I used was CCBlockingQueue in order to provide synchronization. This was a great way to pass things from the thread that was waiting for signals to the thread that was coordinating. I used a sum type across the queue so I could go ahead and use match. This was a poor man’s implementation of a state machine, except that I had to be able to exhaustively handle all messages at all points. Pattern matching made this a breeze, but it was still awkward.
This was kind of nasty at points, because I used a sum type of messages across the queue, since there was no obvious way to wait on multiple queues at once. In my Rust implementation, I used channels with the chan_select! macro (https://github.com/BurntSushi/chan). It would be nice to be able to do the same in OCaml, or to have an obvious library that handles this.
Another one of the issues I had was dealing with timers. Again, because the mechanism I used relied on this single queue, I’d need either a single thread acting as a timer wheel and pushing expiration messages across, or a thread per timer.
This one was more about documentation. When I registered a signal handler, as opposed to using wait_signal (sigtimedwait), it was difficult to troubleshoot why I was deadlocking. I learned that signal handlers were “unsafe” in the OCaml runtime (See: GIL), and could prevent other threads from executing.
One of the best parts about OCaml was the access to systems APIs. The straightforward mechanism to call fork, setpgrp, etc. was awesome. The cross-platform signal transformation logic was a little confusing, but nothing that documentation could not solve.
I think the OCaml folks should go spoon with the Rust folks when it comes to build tooling. I ended up writing a Makefile by hand, and using rebuild, but I imagine if the project got much more complex, or had multiple files involved, it would become unwieldy to do this by hand. There are Oasis, ocamlbuild, and Jenga, all of which seem to have a steeper learning curve than the language itself.
I was also able to complete the task in Rust. I really wanted to like Rust. Rust feels like all of the complexity and difficulty of C++, without much added benefit for simple programs.
Rust suffers from one of the Seven Deadly Sins. Pride. This manifests in one of two ways. The first is the borrow checker. The second is the approach to performance over simplicity.
In the Rust documentation, they say:
However, this system does have a certain cost: learning curve. Many new users to Rust experience something we like to call ‘fighting with the borrow checker’, where the Rust compiler refuses to compile a program that the author thinks is valid. This often happens because the programmer’s mental model of how ownership should work doesn’t match the actual rules that Rust implements. You probably will experience similar things at first. There is good news, however: more experienced Rust developers report that once they work with the rules of the ownership system for a period of time, they fight the borrow checker less and less.
If people have to fight the language in order to become productive with it, perhaps something is wrong with the language, and not the programmers? Instead, the Rust community continues to flaunt the correctness of their language, a valuable property, without taking a step back and considering that different defaults might make more sense.
The biggest issue I have with the defaults and the borrow checker is that in places where an FP language would normally pass by value (by copy), Rust instead assumes you want to hand over ownership or pass a reference. Therefore, you need to clone things by hand and pass the cloned versions instead. Although it has a mechanism to do this automatically, it’s far from ergonomic.
The argument for pass by reference, or borrowing, is that it’s more performant than cloning by default. In general, computers are getting faster, but systems are getting more complex.
Performance is not a primary concern - easy [sic] of programming and correctness are. - Joe Armstrong (http://erlang.org/pipermail/erlang-questions/2014-June/079613.html)
I think Rust missed this.
If you’ve gotten this far, you’ll realize that everything is still terrible. If I want to implement anything at this layer of the system, my choices are largely still C, and Go. I’m excited because a number of new participants have entered the ring. I’m unsure that I’m ever going to want to use Rust, unless they have a massive attitude adjustment. I’m excited to see Nim, and Pony mature.