Fulfilling a Pikedream: the ups and downs of porting 50k lines of C++ to Go.

By logicchains

Rob Pike, one of the creators of the Go language, stated that he expected the language to be adopted by C++ programmers, a prediction that hasn’t been realised. Recently however at the HFT firm where I work, the success of a team’s move from Python to Go for some pieces of non-speed-critical infrastructure led to the decision to attempt a slimmed-down Go rewrite of a somewhat-throughput-critical 50k LOC C++ server. The old C++ server used the same techniques and libraries used in our latency-critical C++ trading software, where every microsecond matters, and this degree of performance simply wasn’t needed. It was hence thought that a rewrite in Go, using the language’s native scheduler rather than the hyper-optimised C++ framework used by the autotraders, would be easier to maintain. I was tasked with the rewrite.

The tl;dr
In business terms, the project was a success: completed ahead of schedule, performing acceptably, and less than 10k LOC long (this massive LOC reduction was of course partially due to the removal of features that were deprecated or not needed by the team behind the rewrite). In personal terms however I feel the outcome was suboptimal, in the sense that I wrote two to three times as much code as would have been needed in a language with parametric polymorphism. Some of this was due to type-safety: Go forces a tradeoff to be made between verbosity and type-safety, and I settled somewhere in the middle; it could have used less code and been less type-safe, or used more code and been more type-safe.

Now, the pros and cons, starting with the pros.

The pros

Emacs! With plugins for autocomplete, jump-to-definition, error checking upon save, intelligent refactoring, and GoTest integration, programming Go in Emacs offers practically everything one’d expect from a good IDE, with the bonus of super-easy customisation and extensibility via Elisp. Since one of my reasons for getting into programming was the opportunity to get paid to use Emacs, this is definitely a huge plus.

Goroutines! Go makes message-passing based concurrency, which I personally find the easiest form of concurrency to reason about, super simple to use. It also allows parallel/async code to be written in the exact same way as concurrent code: the same goroutines run in parallel simply by setting GOMAXPROCS to a value greater than 1. The only other languages I know with built-in lightweight thread schedulers are Erlang/Elixir and Haskell, with the former lacking static typing and the latter lacking management-willing-to-use-ability.
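As a minimal sketch of the message-passing style described above (not code from the server; sumSquares and its worker count are made up for illustration), a few goroutines can pull work from one channel and push results to another. The code is the same whether the runtime schedules the goroutines on one OS thread or several.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical example: it fans nums out to `workers`
// goroutines over a channel and sums the squares they send back.
func sumSquares(nums []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs { // loop ends when jobs is closed
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers exit.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 2)) // 1+4+9+16 = 30
}
```

No locks appear in the worker code itself: all sharing happens through the two channels, which is what makes this style easy to reason about.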

No inheritance. I’ve personally come to view inheritance-based OO as somewhat of an antipattern in many cases, bloating and obscuring code for little benefit, and Go makes this kind of code impossible to write. I suspect this was Rob Pike et al’s motivation for designing Go the way they did: there was a bunch of Java/C++ at Google that was written as if Enterprise Fizzbuzz was a positive role-model, and they wanted to spare themselves from having to deal with such code in future. That being said, in spite of being legacy code the use of inheritance in the old C++ server was pretty sane, and it could easily have been rewritten to use a more modern style.

Readability. I always found the Go code I encountered quite easy to read and understand, both our code and external code. Some of the C++ I encountered, in contrast, took hours to fully comprehend. Go also forced me to write readable code: the language makes it impossible to think something like “hey, the >8=3 operator in this obscure paper on ouroboromorphic sapphotriplets could save me 10 lines of code, I’d better include it. My coworkers won’t have trouble understanding it as the meaning is clearly expressed in the type signature: (PrimMonad W, PoshFunctor Y, ReichsLens S) => W Y S ((I -> W) -> Y) -> G -> Bool”.

Simple, regular syntax. When I found myself desiring to add the name of the enclosing function to the start of every log string, an Emacs regexp find-replace was sufficient, whereas more complex languages would require use of a parser to achieve this. The simple syntax also makes code-generation a breeze, be it generation by Emacs macros or Go templates. Emacs + Go == parametric polymorphism: not only can macros be used to speed up the process of generating the “copy-paste” code that Go’s lack of parametric polymorphism requires, if functions are written right then regex can also be used to update all “copy-pasted” functions simultaneously, making updating the code for fooInt, fooFloat and fooDouble almost as easy as updating foo<t> in a language that supports <t>. The downside is that, while Emacs macros and regex can write and modify Go code in such a manner as to emulate parametric polymorphism, it’s still not as readable or concise as actually-polymorphic code, and of course is not easily maintainable by someone lacking familiarity with regex or an extensible editor like Emacs.
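To make the "copy-paste" pattern concrete, here is a small sketch written without type parameters, as in the Go described here. The function names maxInt and maxFloat64 are invented for the example. The two bodies are textually identical apart from the type name, which is what lets an editor macro stamp out new copies and a regexp replace update them all at once.

```go
package main

import "fmt"

// maxInt returns the largest element of xs; ok is false for an empty slice.
func maxInt(xs []int) (m int, ok bool) {
	for i, x := range xs {
		if i == 0 || x > m {
			m = x
		}
	}
	return m, len(xs) > 0
}

// maxFloat64 is the hand-made (or macro-made) copy of maxInt.
// Only the element type differs.
func maxFloat64(xs []float64) (m float64, ok bool) {
	for i, x := range xs {
		if i == 0 || x > m {
			m = x
		}
	}
	return m, len(xs) > 0
}

func main() {
	i, _ := maxInt([]int{3, 9, 2})
	f, _ := maxFloat64([]float64{1.5, -2})
	fmt.Println(i, f) // 9 1.5
}
```

Keeping the bodies byte-for-byte parallel is the "written right" part: any divergence between the copies means a regexp replace can no longer update them together.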

Built-in, effective templating. Go’s text/template package can easily be used to generate new Go code. This allows IO to be used during code generation: we had for instance a library for interacting with a particular service that was generated from an XML schema, making the code perfectly type-safe, with different functions for each datatype. In C++, IO cannot be performed at compile time, so such schema-driven code generation would not be possible. Languages allowing compile time IO include F#, which has compile time IO via Type Providers, Idris, which also has Type Providers, Lisps, which can do IO in macros, Haskell, which has an IO -> Q compile time IO function in Template Haskell, D, which can use `import` to read files at compile time, Nimrod, which has functions for compile time file IO, Elixir (and possibly Erlang?) which can do arbitrary IO via macros, and Rust, which can use libsyntax to perform arbitrary computations and IO at compile time.
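A rough sketch of this kind of generation follows. It is not the generator we used: the field type, the accessor template and the generate function are all hypothetical, and the schema entries are hard-coded where the real thing would read them from an XML schema. It only relies on text/template to expand the template and go/format to check and tidy the result.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"text/template"
)

// field is a hypothetical schema entry; a real generator would read
// these from a schema file instead of hard-coding them.
type field struct {
	Name string // accessor name suffix, e.g. "Price"
	Type string // Go type of the value, e.g. "float64"
}

// accessorTmpl emits one distinctly typed accessor per schema entry.
const accessorTmpl = `package generated

{{range .}}
// Get{{.Name}} returns the {{.Name}} field of m, if present with the right type.
func Get{{.Name}}(m map[string]interface{}) ({{.Type}}, bool) {
	v, ok := m["{{.Name}}"].({{.Type}})
	return v, ok
}
{{end}}`

// generate expands the template and returns gofmt-formatted Go source.
func generate(fields []field) (string, error) {
	t, err := template.New("accessors").Parse(accessorTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, fields); err != nil {
		return "", err
	}
	// format.Source also rejects output that is not syntactically valid Go.
	src, err := format.Source(buf.Bytes())
	if err != nil {
		return "", err
	}
	return string(src), nil
}

func main() {
	src, err := generate([]field{
		{Name: "Price", Type: "float64"},
		{Name: "Symbol", Type: "string"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
}
```

Running the output through format.Source is a cheap safety net: a bad schema entry fails at generation time rather than when the generated file is compiled.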

The cons

Stockholm syndrome. I just argued above that generating Go code with templates is superior to compile time metaprogramming in C++ due to allowing IO, which of course is a stupid argument, since one could just as easily generate C++ code using a separate C++ program that does IO.

Lack of parametric polymorphism! I’ve read many people saying this isn’t a problem in practice, but in this particular case it was a huge problem. I’m confident that a C++ translation of the new Go code would be less than half the LOC of the Go version and more type-safe, due to C++’s polymorphic functions and types. A Haskell rewrite would need even fewer LOC, and if I’d been allowed to write it in Clojure I suspect the whole thing could have been expressed in fewer than 1000 lines of macros (although I’m not sure how debuggable or maintainable that would have been…).

Sacrificing type safety. We use extension attributes for the various protobuffer messages that the server handles, and I originally intended to distinctly type these, so that for instance a FooExtensionAttribute could not be used on a Bar. Go’s lack of parametric polymorphism and generic types however meant that this would have involved a significant amount of code duplication, so I ended up settling with just a single ExtensionAttribute type, with the type system not checking that it was used to extend the appropriate message.
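The tradeoff can be sketched as follows. This is an illustration, not the server's code: Foo and Bar stand in for the protobuf messages, and the Key field and setter methods are invented. Option A is the single shared attribute type; option B is the distinctly typed version, where every message needs its own attribute type and its own copy of the setter.

```go
package main

import "fmt"

// Foo and Bar are hypothetical stand-ins for two protobuf message types.
type Foo struct{ ext map[string]string }
type Bar struct{ ext map[string]string }

// Option A: one attribute type for every message. Less code, but the
// compiler cannot tell a Foo attribute from a Bar attribute.
type ExtensionAttribute struct{ Key string }

func (f *Foo) SetExt(a ExtensionAttribute, v string) {
	if f.ext == nil {
		f.ext = map[string]string{}
	}
	f.ext[a.Key] = v
}

func (b *Bar) SetExt(a ExtensionAttribute, v string) {
	if b.ext == nil {
		b.ext = map[string]string{}
	}
	b.ext[a.Key] = v
}

// Option B: one attribute type per message. Misuse becomes a compile
// error, but the type and its setter are duplicated for each message.
type FooExtensionAttribute struct{ Key string }
type BarExtensionAttribute struct{ Key string }

func (f *Foo) SetFooExt(a FooExtensionAttribute, v string) {
	f.SetExt(ExtensionAttribute{Key: a.Key}, v)
}

func (b *Bar) SetBarExt(a BarExtensionAttribute, v string) {
	b.SetExt(ExtensionAttribute{Key: a.Key}, v)
}

func main() {
	var bar Bar

	// Option A: an attribute meant for Foo is accepted on a Bar.
	fooOnly := ExtensionAttribute{Key: "foo.venue"}
	bar.SetExt(fooOnly, "x")
	fmt.Println(bar.ext["foo.venue"]) // x

	// Option B: the equivalent mistake does not compile.
	// bar.SetBarExt(FooExtensionAttribute{Key: "foo.venue"}, "x")
	bar.SetBarExt(BarExtensionAttribute{Key: "bar.venue"}, "y")
}
```

With two messages the duplication in option B looks harmless; it is the multiplication across every message type that made the single-type version the pragmatic choice.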

Binary sizes. If one uses code generation to generate a type-safe API, with distinctly typed accessors and whatnot for each datatype, then one can easily wind up with over 100,000 lines of Go and 30 MB+ binaries. Compile times are also slower; over 10 seconds in this case, although this wasn’t a significant issue as the library could just be compiled once to a static library and then statically linked.

Kernel compatibility. I may well be the first to make this complaint, but when for Kafkaesque reasons you have to deploy to an old kernel, it’s somewhat disappointing when the newest Go version requires features not supported by the kernel and you’re forced to stick to an older, slower version of Go.

Conclusion

Go is a double-edged sword: it forbids complex abstraction, both bad and good. The worse the abstraction that you and your colleagues are likely to use, the better a language is Go, and vice versa (ultimately depending on what is considered ‘good’ and ‘bad’ abstraction).