This is a conversational talk about a deeply problematic trend I’ve seen in how Entity Framework gets used across organizations small and large, on teams ranging from juniors to architects. This isn’t a how-to. This also isn’t for juniors. If something sparks a thought or you’re curious about something I mention, Google is your friend. This is also my first blog entry. Critique is welcome.
First, let’s simply review the feature rollout of EF over time. This isn’t exhaustive by any means and certainly doesn’t list things out by updates to a major version. It simply serves as a reminder of the story so far with EF.
EF / EF 3.5
· DB First
· Lazy Loading
· NuGet Installation
EF 7 / Core
· Code First Only
· In-Memory Support
· Limited Batching
· Nonrelational Support
Given that kind of fractured rollout and Microsoft’s general reputation in the development space, it’s little surprise Entity Framework has gotten the bad rap it has, not that this excuses the issues I’m going to cover. Features pop into existence seemingly at random depending on which minor version you’re working on. So you get used to something, go to another environment, make claims, and try it out on their existing framework. Even though the same major version is installed, it doesn’t work, and you get that “told you so” look, which only serves to further deepen already entrenched positions.
Basically, the typical story with EF goes something like this:
Senior Person: “Let’s use EF and the repository pattern!”
Other Devs: “Idk, haven’t heard good things about it.”
Senior: “No, it’s great! See this example?”
Devs: “Hmm, OK.”
At first, it works somewhere between acceptable and great. However, as it grows, the slowness sets in and people grumble. Because of the utterly poor state of paying down technical debt in our industry, or due to wholly refusing to see the technical debt in the repository pattern to begin with, whole departments of supposedly smart people fold their arms and simply conclude that EF is garbage, not that their usage of it is garbage. I’m here to make the case it’s more the latter and to show you how to avoid that trap.
Early on in my career, I started using classic ASP and SQL Server directly via ADO. I worked in an extremely small web department, so I often had to dive into the database myself to create tables and do tasks. I quickly got familiar with all the knobs of SQL Server in a flurry of copy/paste deployments, testing in production, and so on. “What about this? Nope, that product page still doesn’t load. What about that?! Nope, still doesn’t load. C’mon… this?? Success!” And through almost literally stumbling around in the dark, I got quite intimate with indexes, views, replication, security permissions, and so on while barely out of high school.
Enter my first few places with more structured environments and my first blush with Entity Framework. It was completely devoid of any of the options I was used to. So, I jumped aboard but it didn’t take long for the grumbles to set in. In case you have the memory of a goldfish, let me reiterate my being used to being able to tinker at will with all the levers. When there were problems, I would investigate. Often, I found critical components wholly ignored such as index utilization. When I would bring these problems up, I got told EF was at fault for not knowing how to utilize them and that we were here to work on business problems and not do Microsoft’s job for them. I was still largely a junior dev, if I’m honest. Who was I to disagree?
Enter the repository pattern. The problem with the repository pattern is two-fold. First, it requires you to declare up front how your application will be bound to interact with the database. Even if you build these super complex methods that let you pass in expressions, dictionaries, or (shudder) dynamics, and you get inventive, all you’ve accomplished is creating a maintenance nightmare.
“But the callers can define what they need!” No, they can’t. Sure, they get to point at an entity and generally define the shape of the data to be selected, but they can’t determine things like field selection. They have zero way to say they need to load the data up front in a friendly way or delay it. They can’t say in one instantiation that they also need data from here or there but in the next only go after the targeted entity unless the blessed repository lets them do so. Instead, you get these all or nothing decisions that you chain your applications with and we wonder why it quickly degrades. I sure hope your crystal ball is better than mine.
Second, even Microsoft’s own examples don’t use the proper interfaces, examples that some intern probably coded anyway. Therefore, I say -everyone- is doing EF wrong. The common wisdom with EF is to use the repository pattern, and since the repository pattern’s own documentation and examples aren’t correct, nobody is letting EF do what it was designed to do because the source of knowledge is poisoned. In the face of this, I’ve heard lots of people complain that the MVC tutorials that butt right up to the DbContext and use it directly are not SOLID. Not that hardly anyone does SOLID either, but that’s another blog post. (Most software jumps straight to ID and ignores the rest.)
SQL Server, because it’s the most common backing data store used with EF, is not a clean piece of software. It’s messy. It has a TON of features for a TON of scenarios. If you want to let your applications actually use even a fraction of what you’re paying those huge licensing fees for, stop constraining SQL to an EF driven hellish wasteland of something little better than SELECT *. Then, we like to complain when things get slow.
If you don’t let EF utilize features for the correct scenario, you can’t possibly realize the potential of your platform. There must be billions of dollars wasted in licensing and development costs that only ever see single digit utilization out of SQL Server in terms of distinct features used even as applications grow in wildly different paths. This is a gut feeling, but seeing the stupidly naïve repository implementations I’ve seen out of companies small and large, I find it hard to see me being grossly incorrect here. This is a disservice to ourselves, to our employers, and to each other.
Entity Framework is still locked, step by step, to the way the underlying data store works. In SQL Server, this means join performance, view and index utilization, stored procedure calls, and so on. This is like calling a latex glove on a hand an abstraction for a hand. It’s not, and neither is EF an abstraction for the storage mechanism it relies upon. It is instead a set of common APIs that let us access data in a uniform way. This is not an abstraction for the very reasons I just stated: we cannot deny or mitigate the behavior of the underlying implementation in any way. Therefore, we must account for those behaviors in our code, breaking the abstraction either explicitly or implicitly. The only thing we can do if we want to pretend it is an abstraction is to bury our heads in the sand and simply continue to groan when things get clumsy.
Most recently, I had -architects- almost in awe at the suggestion to let the database define views and to point EF at the views instead of tables, you know, letting DBAs actually do their job and to give the database the ability to change without breaking application code. This isn’t hard stuff, but the problem is endemic so most can’t see past their noses in environments they’re too familiar with. So, what do we do about it?
The first step to using Entity Framework correctly is to break the love affair with IEnumerable. It’s simply bad when talking about disconnected stores. The only thing IEnumerable gives us is delayed execution. If that’s the only feature you want out of your ORM, then you don’t need an ORM. The thing that makes IEnumerable insidious when working with data stores is that the query behind it is pinned once and for all time in its representation. Even as an application grows, even as repositories get new methods added to them, the old implementations returning IEnumerable are blind, deaf, and dumb to the new world they live in. You are literally forcing your code to work with your data layout and expectations as they were when it was first implemented years ago. This is a developer’s fault, yet the blame gets levied upon EF.
IQueryable, however, can morph and change to its given context. Even when passed around and clauses added to it, it can evaluate instance for instance the needs of the individual call. The DbContext can still retrieve entities from the cache if it has already fetched the data before, giving very fast speeds to repeated calls and making hot paths a bit cooler. Even more, it exposes features such as letting us stream data if the underlying provider supports it, loading data without needing to instantiate List objects to be more heap friendly, inspecting the underlying type so we can make smart decisions in complex workflows, accessing the underlying context, and so on.
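To make the composition point concrete, here is a minimal sketch, not a definitive implementation. The names Order, OrderQueries, and NotDeleted are hypothetical, and an in-memory list wrapped with AsQueryable() stands in for a real DbSet so the sample runs without a database. With EF, the clauses each caller adds become part of the expression the provider translates, rather than a filter applied after everything has been loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for illustration.
public class Order
{
    public int Id { get; set; }
    public string Status { get; set; } = "";
    public decimal Total { get; set; }
}

public static class OrderQueries
{
    // Shared starting point: returns a query, not results.
    public static IQueryable<Order> NotDeleted(IQueryable<Order> source) =>
        source.Where(o => o.Status != "Deleted");

    // Caller A only needs open orders, so it adds its own clause.
    public static List<Order> OpenOrders(IQueryable<Order> source) =>
        NotDeleted(source).Where(o => o.Status == "Open").ToList();

    // Caller B only needs a count, so no order rows are materialized for it.
    public static int LargeOrderCount(IQueryable<Order> source, decimal threshold) =>
        NotDeleted(source).Count(o => o.Total >= threshold);
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in; with EF this would be something like context.Orders.
        var orders = new List<Order>
        {
            new Order { Id = 1, Status = "Open", Total = 50m },
            new Order { Id = 2, Status = "Open", Total = 500m },
            new Order { Id = 3, Status = "Closed", Total = 900m },
            new Order { Id = 4, Status = "Deleted", Total = 1000m },
        }.AsQueryable();

        Console.WriteLine(OrderQueries.OpenOrders(orders).Count);        // prints 2
        Console.WriteLine(OrderQueries.LargeOrderCount(orders, 800m));   // prints 1
    }
}
```

Had NotDeleted returned IEnumerable, both callers would still get the right answers, but every non-deleted row would be pulled into memory first and filtered or counted there.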
These are all features that let your code actually understand what’s going on without breaking the abstraction barrier since EF is not an abstraction. The abstraction, by the way, should be the component that EF is being used in, not EF itself. I scratch my head at the many discussions we programmers have where some need is expressed but we balk at many solutions out of hand in the name of “abstraction” and the ensuing hoops we gleefully contort ourselves in just so we can continue the delusion of being SOLID.
Perhaps the single biggest complaint I’ve heard about EF is how much damn data it retrieves. Who defined the entities? EF? No! You did. You can’t much be blamed, per se, as an entity per table approach seems to be all anyone can see. Still, we don’t need to be hindered by entities, no matter how large. Passing an anonymous type to an EF query will cause EF to only select the fields you defined. That monster table that is dozens of columns and “can’t be refactored” can be chopped down to the 3 or 4 fields you actually need. The fascination with selecting whole entities at a time and pretending there’s nothing to be done about it can only be described as a form of mass hysteria where we plug our ears and shout “I can’t see you!”
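As a rough sketch of that projection, assuming a hypothetical Customer entity with far more columns than the caller needs: the method below takes an IQueryable so it works against a DbSet, and the demo feeds it an in-memory list so it runs without a database. Against a relational provider, only the members named in the anonymous type need to be selected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wide entity standing in for the "monster table".
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string Email { get; set; } = "";
    public bool IsActive { get; set; }
    public string Notes { get; set; } = "";                  // imagine dozens more columns
    public byte[] Photo { get; set; } = Array.Empty<byte>();
}

public static class CustomerReports
{
    public static List<string> ActiveContactLines(IQueryable<Customer> customers)
    {
        // Only Id, Name, and Email appear in the projection, so the query
        // never has to carry Notes, Photo, or anything else.
        var slim = customers
            .Where(c => c.IsActive)
            .OrderBy(c => c.Name)
            .Select(c => new { c.Id, c.Name, c.Email });

        // Formatting happens in memory, after the slim rows have arrived.
        return slim.AsEnumerable()
                   .Select(c => $"{c.Id}: {c.Name} <{c.Email}>")
                   .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in; with EF this would be something like context.Customers.
        var customers = new List<Customer>
        {
            new Customer { Id = 1, Name = "Ada", Email = "ada@example.com", IsActive = true },
            new Customer { Id = 2, Name = "Bob", Email = "bob@example.com", IsActive = false },
        }.AsQueryable();

        foreach (var line in CustomerReports.ActiveContactLines(customers))
        {
            Console.WriteLine(line);   // prints "1: Ada <ada@example.com>"
        }
    }
}
```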
You know all those Microsoft Press books with the various tools on the cover? There’s a reason for that, you know, beyond some person just picking a random image. Most of the tools aren’t just a screwdriver or planer. There are some truly odd ones that don’t have obvious applications, but, assuredly, they have their purpose and they excel at it. The mantra of “right tool” is often repeated, but we rarely pause to really think about the job, let alone the tool for it. Here are a few for EF.
Another large complaint, a close runner-up to the amount of data EF retrieves, is how it supposedly can’t handle large amounts of data. I love developers’ dualities. I would have you know that by combining AsStreaming as talked about below, reactive extensions, and SqlBulkCopy, I can retrieve, transform, and push millions of records a minute without breaking a sweat. That makes a perfectly good ETL solution that is completely code based for any workload from small up to the crest of moderately large, say 5–10 billion records, and it still has good performance. If you need more, there are more specialized tools. However, don’t say Entity Framework can’t handle large amounts of data. Your code can’t handle large amounts of data. EF is fine. The sad part is we’ve had SqlBulkCopy since 2005, yet we pretend there is this large hole in our toolbox. The problem is already solved. There is zero reason to reinvent the wheel. Guess what? It supports streaming too!
I feel like I’m a broken record. Yet another large complaint about EF is its caching of data. You could almost always tell the DbContext to get rid of cached entities. Recently, though, we gained the ability to set no tracking as the default policy in Entity Framework Core. That way, we can be selective about what we want to track rather than what we don’t. One annoyance I will gladly acknowledge is that you still need to detach entities.
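A minimal sketch of that default policy, assuming EF Core (the Microsoft.EntityFrameworkCore package) and a database provider configured by whoever builds the options. Product, ShopContext, and ProductQueries are hypothetical names. UseQueryTrackingBehavior and AsTracking are the EF Core calls doing the work; everything else is illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context used only for illustration.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

public class ShopContext : DbContext
{
    public ShopContext(DbContextOptions<ShopContext> options) : base(options) { }

    public DbSet<Product> Products => Set<Product>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // Queries on this context are no-tracking unless a caller opts in.
        optionsBuilder.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking);
    }
}

public static class ProductQueries
{
    // Read-only path: relies on the no-tracking default, so nothing is cached.
    public static List<Product> Catalog(ShopContext db) =>
        db.Products.OrderBy(p => p.Name).ToList();

    // Write path: opts in to tracking for the one entity we intend to change.
    public static void Reprice(ShopContext db, int id, decimal price)
    {
        var product = db.Products.AsTracking().Single(p => p.Id == id);
        product.Price = price;
        db.SaveChanges();
    }
}
```

The read path pays nothing for change tracking, and the write path says out loud that it wants it.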
Queries in Entity Framework normally buffer all the results before returning. Streaming gets around that and immediately lets you start processing data as it enters your application. You can both start work more quickly and be more memory friendly to your servers.
There is a disturbing trend I’ve seen among developers. There is a lack of desire to explore and invent. We want out of the box solutions that “just work” while being ignorant of the details. We still believe in the magic of the unseen even though code is not magic.
The general approach I would take, instead of writing these super pinned down repositories, is to build extensions that let our applications behave in the unique ways we need them to. Want the benefits of cached data in a moderately long running process but not have it persist outside of that given operation? Sounds like a perfect extension method on DbContext to me: one that takes some entities, processes them while gaining the benefits of caching, and then clears the cache before returning. Another extension method would be one that detaches all those entities after an operation is complete.
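Here is one way those two extension methods might look. Treat it as a sketch under assumptions: it targets EF Core (the Microsoft.EntityFrameworkCore package), and the names DetachAll and WithScopedTracking are mine, not part of EF.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public static class DbContextExtensions
{
    // Detach every entity the context is currently tracking.
    // Note: any changes that were not saved are discarded along with the entries.
    public static void DetachAll(this DbContext context)
    {
        // Snapshot the entries first, since changing State alters the tracked set.
        foreach (var entry in context.ChangeTracker.Entries().ToList())
        {
            entry.State = EntityState.Detached;
        }
    }

    // Run an operation that benefits from tracked (cached) entities,
    // then leave the change tracker empty even if the operation throws.
    public static TResult WithScopedTracking<TContext, TResult>(
        this TContext context, Func<TContext, TResult> work)
        where TContext : DbContext
    {
        try
        {
            return work(context);
        }
        finally
        {
            context.DetachAll();
        }
    }
}
```

A call site might read `var total = db.WithScopedTracking(ctx => Recalculate(ctx));` where Recalculate is whatever long running operation you have in mind. On EF Core 5 and later, ChangeTracker.Clear() can replace the loop inside DetachAll.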
I’m talking about the DbContext here because that’s how many people treat it. It’s seen as the big, bulky, unwieldy thing that will steal your kids if you’re not careful. We go to extraordinary lengths to keep the DbContext’s existence known to only a select few components. This strangles our implementations to repositories even further. Since we must go through the repository to get any kind of data, we need to violate the Open/Closed principle on a regular basis as change occurs, or be forced to accept the tradeoff of the bloat of decisions the repository dictates and be extra careful when we call it.
Let the DbContext out. If a module needs data, don’t delude yourself that the DbContext isn’t a dependency already. I can promise you that if you get comfortable with it being accessible and take away the mysticism of “what if people make mistakes?!” (gasp!), it will actually make us better as a whole. If someone can commit naughty code and it makes it to production untested at least once, you truly have no release control or quality checks. Policies like hiding the DbContext are stopgaps on an already bleeding wound in your organization, and they do nothing to actually mitigate the real problem.
We programmers need to stop acting like solutions to the problems we face are ever elusive or somehow must be divined using the right mix of Node.js and Dapper. Not that they don’t have their legitimate uses, but Entity Framework is often scapegoated in their favor when it is a fine tool for what it does. The tools we’ve had for a decade now have been sufficient for most of our needs. It is the culmination of bad decision after bad decision that has led us to the straitjacket we’re in. Get comfortable with your tools. Try new things. One thing is for sure, we only have ourselves to blame.