Rethinking Postgres in a Post-Server World


Serverless architectures have brought engineering teams a great number of benefits. We get simpler deployments, automatic and infinite scale, better concurrency, and a stateless API surface. It’s hard to imagine going back to the world of managed services, broken local environments, and SSHing into servers. When I started doing web development, moving from servers in a closet to rackspace was a revolution.

It’s not just hosting and how we deploy applications that have changed under this new paradigm. These advantages of serverless have presented challenges to the traditional MVC architecture that has been so ubiquitous. Thinking back to those early days, frameworks like Zend, Laravel, Django, and Rails were incredible productivity boosters. They not only influenced the way we built applications, but also the way we think about solving problems with the web. These were your “majestic monoliths” and they solidified the MVC pattern as the defacto standard for most of the web applications we use today.

In many ways, the rise of microservices and with it the idea of hexagonal architectures (aka ports and adaptors) led naturally to this new serverless world. It started as creating and hosting standalone APIs organized by a shared context that were still backed by the classic frameworks we already knew and loved.

The popularity of NodeJS led to the express framework where we now had a less rigid microframework enabling us to more flexibly organize our code. The final metamorphosis of this pattern is individual pieces of business logic that can be executed on demand in the cloud. We no longer manage machines or even multiple API servers. Each piece of business logic only exists when it is needed, for as long as it’s needed and no longer. They are lightweight, and you only pay for the individual parts of your application that get used.

Today, we hardly realize there is a server at all–even the terminology, serverless, is a misnomer designed to highlight just how far we’ve come from the days of setting up XAMP or VirtualBox and Vagrant. The benefits are clear, the hours saved, the headaches avoided, and the freedom to just solve business problems with code bring building software closer than ever to the simple act of writing prose.

The Problem

The classic MVC frameworks codified not only the pattern of working in three distinct tiers (data, application, and presentation) but also the technology for each. You were able to choose some options at each layer, for instance Postgres or MySQL as the data layer, but the general idea is these decisions are made for you. You implicitly adopt the idea of convention over configuration.

Postgres as a data layer solution makes a lot of sense. It’s robust, fast, supports ACID transactions, and has over thirty years of development behind it. It is also open-source, can be hosted almost anywhere, and is likely to be around for another thirty years. You could do much worse than stake your company’s future on Postgres. Add to that all the work put into integrating it into these equally battle-tested frameworks and the story for choosing Postgres becomes very strong.

However, when we enter a serverless context, this type of architecture presents a number of challenges particularly when it comes to handling our data.

Common issues include:

  1. Maintaining stateful connections: when each user is a new connection to Postgres this can max out the number of connections Postgres can handle quickly.
  2. Provisioned scale: with Postgres we must be sure to provision the right size database for our application ahead of time, which is not ideal when our application layer can automatically scale to any workload.
  3. Traditional security model: this model does not allow for any client-side use and is vulnerable to SQL injection attacks.
  4. Data centralization: while our application may be deployed globally, this is of little use when our database is stuck in a single location potentially thousands of miles from where the data needs to be.
  5. High operational overhead: serverless promises to free us from complexity and remove barriers to solving business problems. With Postgres we return to needing to manage a service ourselves, dealing with sharding, scaling, distribution, and backups.

Traditional systems like Postgres were never designed for this purpose. To start, Postgres operates on the assumption of a stateful connection. What this means is that Postgres will hold open a connection with a server in order to optimize the response time. In a traditional monolithic application, if your server had to open a new connection every single time it requested data this would be quite inefficient. The actual network request would in many times be the primary bottleneck. By keeping this connection cached Postgres removes this bottleneck. As you scale your application you will likely have multiple machines running, and a single Postgres database can handle many such connections, but this number isn’t infinite. In fact, in many cases, you have to set this number at the time of provisioning the database.

In a serverless context, each request is effectively a new machine and a new connection to the database. As Postgres attempts to hold open these connections we can quickly run up against our connection limit and the memory limits of the machine. This also introduces another issue with the traditional Postgres use case, which is provisioned resources. 

With Postgres we have to decide the size of the database, the capacity of the machine it runs on, where that machine is located, and the connection limit at the time of creation. This puts us in a situation where our application can scale automatically but we must watch our database closely and scale it ourselves. This can be even trickier when we are dealing with spikes in traffic that are not consistent in both time and location. Ultimately by moving to serverless we have reduced the operational overhead of our application layer, but created some increased operational overhead in our database. Would it not be better if both our application and our data layer could scale together without us having to manage it?

The complexity required to make traditional systems like Postgres work in a serverless environment can often be enough to abandon the architecture all together. Serverless requires on-demand, stateless execution of business logic. This allows us to create lighter, more scalable programs but does not allow us to preserve things like network connections and can be slowed down by additional dependencies like ORMs and middleware. 

The Ideal Solution

It’s time we begin thinking about a new type of database, one more in line with the spirit of serverless and one that embraces iterative development and more unified tooling. We want this database to have the same automatic, on-demand scale as the rest of our application as well as handle global distribution that are hallmarks of the serverless promise. This ideal solution should be:

  1. Support for stateless connections with no limits.
  2. Auto-scaling both for the size of the machine and in the size of the database itself.
  3. Be securely accessible from both the client and the server to support both serverless APIs as well as Jamstack use cases.
  4. Globally distributed so data is closest to where it is needed always.
  5. Free of operational overhead so we don’t add complexity managing things like sharding, distribution, and backups.

If we are truly to embrace the serverless architecture, we need to ensure that our database scales along with the rest of the application. In this case, we have a variety of solutions some of which involve sticking with Postgres. Amazon Aurora is one example of a Postgres cloud solution that gives us automatic scalability and backups, and gives us some global distribution of data. However, Amazon Aurora is hardly easy to set up and doesn’t free us from all operational overhead. We also are not able to securely access our data from the client without building an accompanying API as it still follows the traditional Postgres security model.

Another option here are services like Hasura, that allow us to leverage Postgres but access our data by way of a GraphQL API. This solves our security issues when accessing data from the client and gets us much closer to the ease of use we have with many of our other serverless services. However, we are left to manage our database ourselves and this merely adds another layer on top of our database to manage the security. While the Hasura application layer is distributed, our database is not so we don’t get true global distribution with this system.

I think at this point we should turn toward some additional solutions that really hit all the points above. When looking at solutions outside of Postgres we have to add two additional requirements that put the solutions on par with the power of Postgres:

  1. Support for robust, distributed, ACID transactions.
  2. Support for relational modeling such that we can easily perform join operations on normalized data.

When we typically step outside of relational database systems into the world of schemaless solutions, ACID transactions and relational, normalized data are often things we sacrifice. So we want to make sure that when we optimize for serverless we are not losing the features that have made Postgres such a strong contender for so long.

Azure’s CosmosDB supports a variety of databases (both SQL and NoSQL) under the hood. CosmosDB also provides us with libraries that can work on both the client and server freeing us from an additional dependency like Hasura. We get some global distribution as well as automatic scale. However, we are still left with a lot of choices to make and are not free entirely from database management. We still have to manage our database size effectively and choose from many database options that all have their pros and cons.

What we really want is a fully managed solution where the operational overhead of choosing database size and the type of database can be abstracted away. In a general sense having to research many types of databases and estimate scale would be things that matter a lot less if we have all of the features we need. Fauna is a solution where we don’t have to worry about the size of the database nor do we have to select the type of database under the hood. We get the support of ACID transactions, global distribution, and no data loss without having to figure out the best underlying technology to achieve that. We also can freely access our database on the client or the server with full support for serverless functions. This allows us to flexibly create different types of applications in a variety of architectures such as JAMstack clients, serverless APIs, traditional long-running backends, or combinations of these styles.

When it comes to schemaless databases, we gain flexibility but are forced to think differently about our data model to most efficiently query our data. When moving away from Postgres this is often a subtle but large point of friction. With Fauna, we have to move into a schemaless design as you cannot opt into another database type. However, Fauna makes use of a unique document-relational database model. This allows us to utilize relational database knowledge and principles when modeling our data into collections and indexes. This is why I think it’s worth considering for people used to Postgres as the mental overhead is not the same as with other NoSql options.

Conclusion

Systems like Postgres have been powerful allies for building applications for over thirty years. The rise of agile and iterative development cycles led us to the serverless revolution before us today. We are able to build increasingly more complex applications with less operational overhead than ever before. That power requires us to think about databases differently and demand a similar level of flexibility and ease of management. We want to preserve the best qualities of a Postgres, like ACID transactions, but ditch the more unsavory aspects of working with the database like connection pooling, provisioning resources, security, distribution and managing scale, availability and reliability.

Solutions such as Amazon’s Aurora Serverless v2 create a serverless solution that works in this new world. There are also solutions like Hasura that sit on top of this to further fulfill the promise of serverless. We also have solutions like Cosmos DB and Fauna that are not based in Postgres but are built for serverless while supporting important Postgres functionality.

While Cosmos DB gives us a lot of flexibility in terms of what database we use, it still leaves us with many decisions and is not completely free of operational overhead. Fauna has made it so you don’t have to compromise on ACID transactions, relational modeling or normalized data — while still alleviating all the operational overhead of database management. Fauna is a complete rethinking of a database that is truly free of operational overhead. By combining the best of the past with the needs of the future Fauna has built a solution that behaves more like a data API and feels natural in a serverless future.

Follow Michael Rispoli on Twitter