How a micro-optimisation in the .NET compiler broke our web application at runtime

By Matt Thompson

Recently we updated our builds to use Visual Studio Build Tools 2017, including MSBuild 15, instead of MSBuild 14. We expected nothing to go wrong with this. And getting to use the C#7 features, which we’ve had to ignore for awhile, was something nice to look forward to.

The commit went in, the build ran on the build server, unit and integration tests passed, and we’re ready for a release! During a last-minute bit of manual poking, we discovered a runtime error attempting to view any customer record when the customer had purchased a particular type of product.

Uncaught Error:
Sys.Webforms.PageRequestManagerServerErrorException: Type ‘Spektrix.InterfaceModel.FixedSeries.MyFixedSubscriptions+<>c__DisplayClass6_0’ in Assembly ‘Spektrix.InterfaceModel’ is not marked as serializable.

One thing stood out: the DisplayClass. Understanding why we were suddenly getting errors here relied on knowing what this thing is and why it’s here.

When using a lambda that would reference some state and require a closure, the compiler generates a new class which holds the variables on the heap. This class contains the variables used in the lambda, and the lambda method itself. Generally, it’ll look something like this when decompiled:

In any code using this, an instance of the DisplayClass is created and the state is assigned. Then an Action with a pointer to the method group on the instance is created, in place of the lambda.

However, when no state is required in a closure, the compiler would do something different. No DisplayClass would be generated, the lambda method would just be a method on the class in which it’s used:

But why DisplayClass? This answer from Eric Lippert explains that really, it’s just jargon:

“The reason that a closure class is called “DisplayClass” is a bit unfortunate: this is jargon used by the debugger team to describe a class that has special behaviours when displayed in the debugger. Obviously we do not want to display “x” as a field of an impossibly-named class when you are debugging your code; rather, you want it to look like any other local variable. There is special gear in the debugger to handle doing so for this kind of display class. It probably should have been called “ClosureClass” instead, to make it easier to read disassembly.”

For our back-office web application, we serialize all session state into Redis, including the current view state. This allows someone to leave the site and come back to exactly where they were, even if in the middle of a complex flow. Part of this serializable view state was an Action with no closure which was used to trigger data binding when something else somewhere on the UI happened. On upgrading to MSBuild 15 these Actions were no longer serializable.

Something had changed, and it was breaking a few unrelated areas of our product. All the breakages shared the same symptom — a lambda being serialized as part of the view model. Without finding out exactly why it was breaking, changing to a method group in the class fixed the issue. This meant that on MSBuild 15, the compiler was placing the lambda that didn’t require a closure inside a DisplayClass, while before it wasn’t.

Through some research we found that a micro-optimisation in the Roslyn compiler changed how delegates are invoked causing all lambdas to be part of DisplayClasses regardless of whether they relied on the state in a closure, as this was more performant. This answer from Jon Skeet explains exactly what we were seeing:

“IIRC, it was found that invoking a delegate which was “backed” by an instance method was faster than invoking a delegate backed by a static method — which is the motivation behind the change.”

We knew we couldn’t do what we were before. We wanted to be able to use the latest features available in C#, so we had to fix the places that were wrong. Doing some regex searches on the code to find any class marked Serializable that contained Action, Action<>, or Func<> properties found hundreds of classes. Narrowing down all places that would assign lambdas to them and then converting them to method group assignments got rid of the runtime exceptions.

Legacy codebases can have a lot of skeletons in the closet. No one on the team would intrinsically know that a problem like this would occur from a version bump on MSBuild (and the related tooling). And if we had not caught it prior to release we would have had severe regressions in multiple places.

All levels of testing are important, and in this case a UI element callback being serialized into a 3rd party dependency was the failure. Naturally, in a codebase like ours, attaining 100% test coverage after the fact is prohibitive. So we’re focusing on fast, cheap rollbacks when unknown problems occur, as that is more valuable at this point in time.

Spektrix is recruiting! Join our growing teams in London and Manchester and help build the SaaS platform pushing the Arts forward!

Check out our open positions!