To ORM or not to ORM


I've been enjoying using Go's database/sql package for working with databases. Recently, some mentions of gorm piqued my curiosity about using ORMs in Go vs. using database/sql directly. Having had some mixed experiences with ORMs in the past, I decided to start with a practical experiment by writing the same simple application with and without gorm, and comparing the results in terms of effort spent.

This led me to write down some general thoughts on the benefits and drawbacks of ORMs. If that kind of thing interests you, read on!

My experiment involves defining a simple database that could be a subset of a blogging engine, as well as write some Go code that populates and queries this database and compare how it looks using plain SQL vs. using an ORM.

This is the database schema:

DB schema for the experiment - showing Post, Tag, Comment tables

While simple, this schema demonstrates an idiomatic normalized database that most likely contains all the elements one needs to build simple wiki or blog apps - it has both one-to-many relationships (between posts and comments) and many-to-many relationships (between posts and tags). If you prefer to read DB schemas as SQL, here's the definition taken from the code sample:

create table Post ( postID integer primary key, published date, title text, content text
); create table Comment ( commentID integer primary key, postID integer, author text, published date, content text, -- One-to-many relationship between Post and Comment; each Comment -- references a Post it's logically attached to. foreign key(postID) references Post(postID)
); create table Tag ( tagID integer primary key, name text unique
); -- Linking table for the many-to-many relationship between Tag and Post
create table PostTag ( postID integer, tagID integer, foreign key(postID) references Post(postID), foreign key(tagID) references Tag(tagID)
);

This SQL was tested with SQLIte; other RDBMSs may need minor adjustments. When using gorm, there is no need to write this SQL. Instead, we define "objects" (really structs) with some magic field tags for gorm:

type Post struct { gorm.Model Published time.Time Title string Content string Comments []Comment `gorm:"foreignkey:PostID"` Tags []*Tag `gorm:"many2many:post_tags;"`
} type Tag struct { gorm.Model Name string Posts []*Post `gorm:"many2many:post_tags;"`
} type Comment struct { gorm.Model Author string Published time.Time Content string PostID int64
}

The code working with this database comes in two variants:

  1. No-ORM; using plain SQL queries through the database/sql package.
  2. ORM; using the gorm library for database access.

The sample is doing several things:

  1. Add some data (posts, comments, tags) to the DB.
  2. Query all posts in a given tag.
  3. Query all post details (all comments attached to it, all tags it's marked with).

Just as an example, here are the two variants for task (2) - finding all posts in a given tag (this could be to populate some sort of archives listing page on the blog). First, no-ORM:

func dbAllPostsInTag(db *sql.DB, tagID int64) ([]post, error) { rows, err := db.Query(`
 select Post.postID, Post.published, Post.title, Post.content
 from Post
 inner join PostTag on Post.postID = PostTag.postID
 where PostTag.tagID = ?`, tagID) if err != nil { return nil, err } var posts []post for rows.Next() { var p post err = rows.Scan(&p.Id, &p.Published, &p.Title, &p.Content) if err != nil { return nil, err } posts = append(posts, p) } return posts, nil
}

This is fairly straightforward if you know SQL. We have to perform an inner join between Post and PostTag and filter it by the tag ID. The rest of the code is just iterating over the results.

Next, the ORM:

func allPostsInTag(db *gorm.DB, t *Tag) ([]Post, error) { var posts []Post r := db.Model(t).Related(&posts, "Posts") if r.Error != nil { return nil, r.Error } return posts, nil
}

In the ORM code, we tend to use objects directly (Tag here) rather than their IDs, for the same effect. The SQL query generated by gorm here will be pretty much the same as the one I wrote manually in the no-ORM variant.

Apart from generating the SQL for us, gorm also provides an easier way to populate a slice of results. In the code using database/sql we explicitly loop over the results, scanning each row separately into individual struct fields. gorm's Related method (and other similar querying methods) will populate structs automatically and will also scan the whole result set in one go.

Feel free to play with the code! I was pleasantly surprised at the amount of code gorm saves here (about 50% savings for the DB-intensive part of the code), and for these simple queries using gorm wasn't hard - the invocations are taken from API docs in a straightforward manner. The only complaint I have about my specific example is that setting up the many-to-many relationship between Post and Tag was a bit finicky, and the gorm struct field tags look ugly and magical.

The problem with simple experiments like that above is that it's often difficult to tickle the system's boundaries. It obviously works well for simple cases, but I was interested to find out what happens when it's pushed to the limit - how does it handle complicated queries and DB schemas? So I turned to browsing Stack Overflow. There are many gorm-related questions, and sure enough, the usual layered complexity problem is immediately apparent (example 1, example 2). Let me explain what I mean by that.

Any situation where complex functionality is wrapped in another layer runs the risk of increasing the overall complexity when the wrapping layer is itself complicated. This often comes along with leaky abstractions - wherin the wrapping layer can't do a perfect job wrapping the underlying functionality, and forces programmers to fight with both layers simultaneously.

Unfortunately, gorm is very susceptible to this problem. Stack Overflow has an endless supply of problems where users end up fighting complexities imposed by gorm itself, working around its limitations, and so on. Few things are as aggravating as knowing exactly what you want (i.e. which SQL query you want it to issue) but not being able to concoct the right sequence of gorm calls to end up with that query.

One key advantage of using an ORM is apparent from my experiment: it saves quite a bit of tedious coding. About 50% savings in DB-centered code is nontrivial and can make a real difference for some applications.

Another advantage that wasn't obvious here is abstraction from different database backends. This may be less of an issue in Go, however, since database/sql already provides a great portable layer. In languages that lack a standardized SQL access layer, this advantage is much stronger.

As for the disadvantages:

  1. Another layer to learn, with all the idiosyncracies, special syntax, magical tags, and so on. This is mainly a disadvantage if you're already experienced with SQL itself.
  2. Even if you're not experienced with SQL, there is a vast bank of knowledge out there and many folks who can help with answers. Any single ORM is much more obscure knowledge not shared by many, and you will spend considerable amounts of time figuring out how to force-feed it things.
  3. Debugging query performance is challenging, because we're abstracted one level further from "the metal". Sometimes quite a bit of tweaking is required to get the ORM to generate the right queries for you, and this is frustrating when you already know which queries you need.

Finally, a disadvantage that only becomes apparent in the long term: while SQL stays pretty much constant over the years, ORMs are language-specific and also tend to appear and disappear all the time. Every popular language has a large variety of ORMs to choose from; as you move from one team/company/project to another, you may be expected to switch, and that's additional mental burden. Or you may switch languages altogether. SQL is a much more stable layer that stays with you across teams/languages/projects.

Having implemented a simple application skeleton using raw SQL and compared it to an implementation using gorm, I can see the appeal of ORMs in reducing boilerplate. I can also remember myself from many years ago being a DB newbie and using Django with its ORM to implement an application - it was nice! I didn't have to think about SQL or the underlying DB much, it just worked. But that use case was really simple.

With my "experienced and salty" hat on, I can also see many disadvantages in using an ORM. Specifically, I don't think an ORM is useful for me in a language like Go which already has a good SQL interface that's mostly portable across DB backends. I'd much rather spend an extra bit of time typing, but this will save me time reading ORM's documentation, optimizing my queries, and most importantly debugging.

I could see an ORM still being useful in Go if your job is to write large numbers of simple CRUD-like applications, where the savings in typing overcome the disadvantages. In the end, it all boils down to the central thesis of the benefits of extra dependencies as a function of effort: where there is significant effort to spend on a project outside the DB-interfacing code - which should be the case for programs that aren't simple CRUDs - the ORM dependency is not worth it, in my opinion.