GraphQL concepts I wish someone explained to me a year ago

By Naresh Bhatia

Image by Rostyslav

In a recent blog article, npm co-founder Laurie Voss commented:

“GraphQL, tracked by its most popular client library Apollo, continues to explode in popularity. We think it’s going to be a technical force to reckon with in 2019 … You’ll need to learn GraphQL.”

But learning is hard because most GraphQL tutorials use simplistic examples to teach its capabilities. When it comes to building real applications, we run into complexities that these tutorials do not address. In this series, we’ll look at GraphQL concepts using a realistic book domain that has 3 entities and 2 relationships. It’s usually the relationships that cause us heartburn, so we will tackle them head on.

Let’s look at our book domain that has 3 entities and 2 relationships:

Book domain model
  • Every book must have a publisher and a publisher may publish multiple books (one-to-many relationship).
  • A book may have one or more authors and an author may write many books (many-to-many relationship).

Below is the front-end of our final application. It’s intentionally designed to show all the entities and relationships on a single page. This will allow us to demonstrate several GraphQL features — for example, how Apollo Cache normalizes entities to keep views consistent across the application.

GraphQL Bookstore user interface

GraphQL is a modern approach to accessing data from one or more data sources. Open-sourced by Facebook in 2015, it’s quickly gaining popularity over REST because of the ease with which you can access complex object graphs, hence the name Graph Query Language. However, don’t let the name fool you. In addition to querying, you can also change the data (a.k.a. mutations) and subscribe to data changes (a.k.a. subscriptions).

Real-world data is complex. It contains a large number of objects connected via complex relationships. Just take a look at our simple bookstore example:

Bookstore object graph

Given this data, what kind of questions can we ask? Here’s a sample:

  • What books have been published by Addison Wesley?
  • What books has Martin Fowler written and who has published them?
  • Who wrote Clean Code and what other books have they written?

Note how we need to traverse the object graph in order to answer these questions. For the first question we need to start from a publisher and hop over to books. For the second question we need to start from an author, then hop over to books and finally to publishers. For the third question, we need to start with a book, then hop over to authors and finally to books again.
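To make these hops concrete, here is a minimal Python sketch that keeps a tiny bookstore graph in dictionaries and answers the three questions by traversal. The sample data, the ids, and the function names are assumptions made for illustration. They are not taken from the series’ code.

```python
# Illustrative sample data; ids and structure are assumptions made for this sketch.
PUBLISHERS = {
    "addison-wesley": "Addison Wesley",
    "prentice-hall": "Prentice Hall",
}
AUTHORS = {
    "martin-fowler": "Martin Fowler",
    "robert-martin": "Robert C. Martin",
}
BOOKS = {
    "refactoring": {
        "name": "Refactoring",
        "publisher": "addison-wesley",
        "authors": ["martin-fowler"],
    },
    "clean-code": {
        "name": "Clean Code",
        "publisher": "prentice-hall",
        "authors": ["robert-martin"],
    },
    "agile-software-development": {
        "name": "Agile Software Development",
        "publisher": "prentice-hall",
        "authors": ["robert-martin"],
    },
}


def books_written_by(author_id):
    """Hop from an author to the books they wrote."""
    return [book for book in BOOKS.values() if author_id in book["authors"]]


def books_published_by(publisher_id):
    """Question 1: publisher -> books."""
    return [b["name"] for b in BOOKS.values() if b["publisher"] == publisher_id]


def books_and_publishers_of(author_id):
    """Question 2: author -> books -> publishers."""
    return [(b["name"], PUBLISHERS[b["publisher"]]) for b in books_written_by(author_id)]


def authors_and_their_books(book_id):
    """Question 3: book -> authors -> books."""
    return {
        AUTHORS[author_id]: [b["name"] for b in books_written_by(author_id)]
        for author_id in BOOKS[book_id]["authors"]
    }


print(books_published_by("addison-wesley"))
print(books_and_publishers_of("martin-fowler"))
print(authors_and_their_books("clean-code"))
```

Each question needs its own hand-written traversal here, which is exactly the kind of per-use-case code that a general query language aims to replace.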

In the traditional REST world, each entity is associated with a separate endpoint (a.k.a. resource). We would have to hit multiple endpoints to answer our questions. Of course, we can cheat and include some related objects in our responses, but this approach will only take us so far. There will be future use cases that need more, and we will have to change our interfaces to accommodate them. But what about a use case that needs less data? It either receives more data than it needs from our existing API, or we create a new (nearly identical) API that returns less data. What a mess!

GraphQL changes all that — it hands over the control to the client. The server is only responsible for publishing the shape of the object graph (a.k.a. the schema) and then letting the client query it any way it wants. The resulting graph is returned to the client in one hit! For example, the third question above is satisfied by the following GraphQL query:

{
  book(id: "clean-code") {
    name
    authors {
      name
      books {
        name
      }
    }
  }
}

This query literally says:

Give me the book with id="clean-code"
  Give me the book's name
  Give me the book's authors
    Give me the author's name
    Give me all the books the author has written
      Give me the book's name

And the result of the query is self-explanatory:

{
  "data": {
    "book": {
      "name": "Clean Code",
      "authors": [
        {
          "name": "Robert C. Martin",
          "books": [
            {
              "name": "Clean Code"
            },
            {
              "name": "Agile Software Development"
            }
          ]
        }
      ]
    }
  }
}
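If you are curious how a server could turn that query into this result, here is a rough Python sketch. It represents the query as a nested dictionary and walks it against in-memory data, copying only the requested fields. The data layout, the `RELATIONS` table, and the `pick_fields` function are hypothetical names invented for this sketch. A real GraphQL server parses the query text and uses resolvers instead.

```python
# In-memory data for the sketch (ids and the "author_ids" key are assumptions).
AUTHORS = {"robert-martin": {"id": "robert-martin", "name": "Robert C. Martin"}}
BOOKS = {
    "clean-code": {
        "id": "clean-code",
        "name": "Clean Code",
        "author_ids": ["robert-martin"],
    },
    "agile-software-development": {
        "id": "agile-software-development",
        "name": "Agile Software Development",
        "author_ids": ["robert-martin"],
    },
}

# Relationship fields are computed by small functions rather than stored.
RELATIONS = {
    "authors": lambda book: [AUTHORS[a] for a in book["author_ids"]],
    "books": lambda author: [
        b for b in BOOKS.values() if author["id"] in b["author_ids"]
    ],
}


def pick_fields(entity, selection):
    """Return only the requested fields, following relationships recursively."""
    result = {}
    for field, sub_selection in selection.items():
        if sub_selection is None:
            # Scalar field: copy the value as-is.
            result[field] = entity[field]
        else:
            # Relationship field: find the related entities, then recurse.
            related = RELATIONS[field](entity)
            result[field] = [pick_fields(item, sub_selection) for item in related]
    return result


# The query from the article, written as a nested dictionary (None marks a scalar).
QUERY = {"name": None, "authors": {"name": None, "books": {"name": None}}}

response = {"data": {"book": pick_fields(BOOKS["clean-code"], QUERY)}}
print(response)
```

Note that the `id` and `author_ids` values never reach the response, because the query did not ask for them.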

Note that what is returned is a subset of the full object graph, but it is not returned as a graph, because JSON can’t represent cyclic graphs. The server denormalizes the graph into a tree (with possibly duplicate nodes) and sends it over to the client as JSON. (Later we will see how Apollo Client normalizes this tree back into a graph, removing duplicates and maintaining a single source of truth.)

The query response is a tree representing a subset of the object graph
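The idea of normalizing a tree back into a graph can be sketched in a few lines of Python. This is only an illustration of the concept, not Apollo’s actual implementation. It assumes every node in the response carries `__typename` and `id` fields (the query above did not request them), and the `normalize` function and the `{"ref": ...}` marker are made up for this sketch.

```python
def normalize(node, store):
    """Flatten a response tree into `store` and return a reference to the node."""
    key = f'{node["__typename"]}:{node["id"]}'
    # Reuse the existing record if this entity was already seen elsewhere in the tree.
    record = store.setdefault(key, {})
    for field, value in node.items():
        if isinstance(value, dict):
            record[field] = normalize(value, store)
        elif isinstance(value, list):
            record[field] = [
                normalize(item, store) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            record[field] = value
    return {"ref": key}


# The response tree from above, extended with the assumed __typename and id fields.
tree = {
    "__typename": "Book",
    "id": "clean-code",
    "name": "Clean Code",
    "authors": [
        {
            "__typename": "Author",
            "id": "robert-martin",
            "name": "Robert C. Martin",
            "books": [
                {"__typename": "Book", "id": "clean-code", "name": "Clean Code"},
                {
                    "__typename": "Book",
                    "id": "agile-software-development",
                    "name": "Agile Software Development",
                },
            ],
        }
    ],
}

store = {}
root = normalize(tree, store)
print(sorted(store))
```

“Clean Code” appears twice in the tree but only once in the store, so an update to that one record is seen by every view that refers to it.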

In the book query above, you may have noticed that we are asking for each entity field explicitly. This syntax is required and allows the client to ask for exactly what it needs and get only that — nothing more, nothing less. Imagine that the Book had 100 fields but our list view needed only 2: name and publication date. Then we could ask for just these two fields:

{
  books {
    name
    publishDate
  }
}

This approach can significantly reduce the amount of data that is transferred and parsed, which matters most on mobile devices, where bandwidth and processing power are at a premium.

Side note
There is a side effect of this design choice — you cannot treat received slices of your data as domain objects. If you have fetched a book from the server and saved it in a store, you cannot assume that all of its fields are available. All you have are simple groupings of primitive data types with no built-in behavior (see the Anemic Domain Model anti-pattern).
Apollo’s in-memory cache does an amazing job of keeping the fields together in one place, but you still need to be aware that these are not true objects. Behavior must be introduced from the outside. I found this article by Marc-André Giroux to be very helpful in designing operations that feel smart rather than anemic.
Moreover, don’t let all of this be an excuse to not understand your business domain. Your application should always be driven by a domain model and your domain objects (whether on the client or on the server) should always encapsulate behavior. You can read more about domain modeling in my Domain-Driven Design article.

A major problem during application development is the conceptual disconnect between the client and the server. These two ends are generally developed by different teams at different speeds. In spite of careful documentation, it is common to get client and server data structures out of sync. This results in painful integration and bugs that go undetected even into production.

GraphQL helps in this situation by offering an API-driven approach. Defining the API first establishes a common understanding between the client and the server. While this can be done with REST also, what is unique about GraphQL is its strong type system, expressed using a simple syntax called the GraphQL Schema Definition Language (SDL). Both the client and the server can validate their messages against the schema, preventing a major category of bugs. Moreover, we can generate parts of the client and the server from the schema, making it even more convenient to conform to the API.

As mentioned earlier, GraphQL provides access to data from one or more data sources. The client doesn’t care where the data physically resides. All it knows about is the logical model of the data (the schema). It is up to the server to fetch the data from one or more sources and stitch it together to conform to the schema. The diagram below shows some potential data sources such as files, databases, REST APIs, and content management systems.

GraphQL provides a unified interface to access data from different sources
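As a small illustration of this stitching, the Python sketch below pulls a book from one source and its publisher from another, then combines them into the single shape a client would see. Both sources are stand-ins: a list of rows plays the role of a database table and a JSON string plays the role of a REST response. All names and data here are assumptions made for the sketch.

```python
import json

# Source 1: stands in for a database table of books (hypothetical data).
BOOK_ROWS = [
    {"id": "clean-code", "name": "Clean Code", "publisher_id": "prentice-hall"},
]

# Source 2: stands in for the JSON body a REST service might return.
PUBLISHER_API_RESPONSE = '{"prentice-hall": {"name": "Prentice Hall"}}'


def fetch_book(book_id):
    """Look up a book row, as a database query would."""
    return next(row for row in BOOK_ROWS if row["id"] == book_id)


def fetch_publisher(publisher_id):
    """Look up a publisher, as a call to a REST endpoint would."""
    return json.loads(PUBLISHER_API_RESPONSE)[publisher_id]


def resolve_book(book_id):
    """Stitch both sources into the one logical shape the client knows about."""
    row = fetch_book(book_id)
    return {
        "id": row["id"],
        "name": row["name"],
        "publisher": fetch_publisher(row["publisher_id"]),
    }


print(resolve_book("clean-code"))
```

The client only ever sees the combined result. The `publisher_id` column and the fact that two sources were involved stay hidden behind the server.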

There’s a whole range of applications that can benefit from displaying the latest data in real time. For example, a trading application needs to show stock prices in real time. Games and other interactive applications need to show events as soon as they happen.

GraphQL subscriptions are a way to push data from the server to the clients in real time. As defined by the GraphQL specification: “subscription is a long‐lived request that fetches data in response to source events”. In fact, subscriptions are very similar to queries in that they specify a set of fields to be returned. However, instead of returning them immediately, the server sends them to the requesting client every time a specified event happens. For example, the subscription below specifies bookCreated as the source event. Whenever that event happens, the server pushes the book’s name to the client.

subscription {
  bookCreated {
    book {
      name
    }
  }
}
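The push model behind subscriptions can be sketched with a tiny in-process publish/subscribe class in Python. This is a simplification under stated assumptions: the `PubSub` class and its methods are hypothetical, the payload is flattened to the book’s own fields, and there is no network involved. A real server would deliver these messages over a long-lived connection.

```python
from collections import defaultdict


class PubSub:
    """Minimal in-process event bus; a stand-in for a subscription server."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, fields, callback):
        """Register interest in `event`, asking only for the given fields."""
        self._subscribers[event].append((fields, callback))

    def publish(self, event, payload):
        """Push the requested fields of `payload` to every subscriber of `event`."""
        for fields, callback in self._subscribers[event]:
            callback({field: payload[field] for field in fields})


received = []
pubsub = PubSub()

# The client subscribes once and asks only for the book's name.
pubsub.subscribe("bookCreated", ["name"], received.append)

# Later, each time the source event happens, the server pushes the data.
pubsub.publish("bookCreated", {"id": "refactoring", "name": "Refactoring"})
pubsub.publish("bookCreated", {"id": "clean-code", "name": "Clean Code"})

print(received)
```

Just as with a query, the subscriber gets only the fields it asked for. The difference is that it gets them once per event rather than once per request.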

Now that we understand the what and the why of GraphQL, let’s focus on the how. We’ll build the Bookstore application step-by-step. In this part, we’ll start by writing the Bookstore schema — remember our API-driven approach? In subsequent parts we will add features such as querying, mutations and subscriptions to create a full-fledged application.

Let’s start with a simple schema, just enough to query the bookstore data. For now, just follow along to understand the concepts.

As a reminder, we’ll use the Book domain model (repeated below) to guide our schema.

Book domain model

Let’s start by defining the Author type:

type Author {
  id: ID!
  name: String!
  books: [Book!]!
}

As mentioned earlier, GraphQL schemas are strongly typed. Here we see that the Author object has three fields:

  • id: represents a unique identifier. Its type is ID, which is serialized in the same way as a String; however, it is not intended to be human‐readable. The exclamation mark at the end denotes that this field must be Non‐Null. (By default, all types in GraphQL are nullable.)
  • name: type String!, again Non-Null.
  • books: type [Book!]! — now that needs some explanation. Basically it says that books is an array of object type Book. The outer exclamation mark says that the array itself cannot be Null — at a minimum, it should be an empty array. The inner exclamation mark says that each entry in the array must be a Book — it cannot be Null. Note that this is a very “conceptual” definition of the books field — it says nothing about the underlying implementation, whether it is a foreign key in a relational table or an object reference in an object database — none of that! All it says is that given an author, we can get their authored books.
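The nullability rules are easy to get wrong at first, so here is a small Python sketch that checks a value against a type reference such as `[Book!]!`. The `conforms` function is a hypothetical helper written for this article. It checks only nullability and list structure, not the fields inside each object.

```python
def conforms(value, type_ref):
    """Check nullability and list structure for a type reference like '[Book!]!'."""
    if type_ref.endswith("!"):
        # Non-Null wrapper: the value must exist and satisfy the inner type.
        return value is not None and conforms(value, type_ref[:-1])
    if value is None:
        # Without a trailing "!", null is always acceptable.
        return True
    if type_ref.startswith("[") and type_ref.endswith("]"):
        # List wrapper: every entry must satisfy the inner type.
        return isinstance(value, list) and all(
            conforms(item, type_ref[1:-1]) for item in value
        )
    # Named type (Book, String, ...): the object's shape is not checked here.
    return True


book = {"name": "Clean Code"}

print(conforms([], "[Book!]!"))        # an empty list is fine
print(conforms([book], "[Book!]!"))    # a list of books is fine
print(conforms(None, "[Book!]!"))      # the list itself may not be null
print(conforms([None], "[Book!]!"))    # entries may not be null either
print(conforms([None], "[Book]!"))     # without the inner "!", null entries pass
print(conforms(None, "[Book!]"))       # without the outer "!", a null list passes
```

Moving or dropping a single exclamation mark changes what the server is allowed to return, so it pays to choose each one deliberately.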

So there you have it — our first GraphQL type declaration. You can read more about the GraphQL type system in the official spec. Don’t worry, it’s very readable!

Now that we have defined what an author is, let’s define a couple of queries — one to get all authors and another to get a specific author. Here you go:

type Query {
  authors: [Author!]!
  author(id: ID!): Author!
}

To understand what this means, we must first understand two key concepts:

  • GraphQL supports three types of operations — queries, mutations and subscriptions. What we have above is two query operations named authors and author. The first returns an array of authors and the second returns a single author.
  • A GraphQL schema defines a root operation type for each kind of operation. The Query wrapper around the two query operations above is the root operation type for queries. The other root operation types are type Mutation and type Subscription. Thus a typical GraphQL schema looks like this:
type Query {
  ... query operations ...
}
type Mutation {
  ... mutation operations ...
}
type Subscription {
  ... subscription operations ...
}
... types referenced by the above root operations ...

Now that we understand all this, let’s write out the complete schema for the bookstore (just for queries of course):

type Query {
  authors: [Author!]!
  author(id: ID!): Author!
  publishers: [Publisher!]!
  publisher(id: ID!): Publisher!
  books: [Book!]!
  book(id: ID!): Book!
}

type Author {
  id: ID!
  name: String!
  books: [Book!]!
}

type Publisher {
  id: ID!
  name: String!
  books: [Book!]!
}

type Book {
  id: ID!
  name: String!
  publisher: Publisher!
  authors: [Author!]!
}

That’s it! This is a complete and valid schema that defines 6 query operations on our bookstore.
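One payoff of having a complete schema is that a query can be checked before it ever runs. The Python sketch below is a rough illustration of that idea. It encodes the schema above as a dictionary (dropping arguments, lists, and nullability for brevity) and reports fields that the schema does not define. The `SCHEMA` encoding and the `validate` function are assumptions of this sketch, not how a real GraphQL library represents things.

```python
# The bookstore schema, reduced to "field name -> named type" for this sketch.
SCHEMA = {
    "Query": {
        "authors": "Author",
        "author": "Author",
        "publishers": "Publisher",
        "publisher": "Publisher",
        "books": "Book",
        "book": "Book",
    },
    "Author": {"id": "ID", "name": "String", "books": "Book"},
    "Publisher": {"id": "ID", "name": "String", "books": "Book"},
    "Book": {"id": "ID", "name": "String", "publisher": "Publisher", "authors": "Author"},
}
SCALARS = {"ID", "String"}


def validate(selection, type_name="Query", path=""):
    """Return a list of error messages for a query written as nested dictionaries."""
    errors = []
    for field, sub_selection in selection.items():
        location = f"{path}.{field}" if path else field
        field_type = SCHEMA[type_name].get(field)
        if field_type is None:
            errors.append(f"{location}: no such field on {type_name}")
        elif field_type in SCALARS:
            if sub_selection is not None:
                errors.append(f"{location}: scalar fields take no sub-selection")
        elif sub_selection is None:
            errors.append(f"{location}: object fields need a sub-selection")
        else:
            errors.extend(validate(sub_selection, field_type, location))
    return errors


# None marks a scalar field; a nested dictionary marks a sub-selection.
good_query = {"book": {"name": None, "authors": {"name": None, "books": {"name": None}}}}
bad_query = {"book": {"title": None, "publisher": None}}

print(validate(good_query))
print(validate(bad_query))
```

The second query fails twice: `Book` has no `title` field, and `publisher` is an object, so the query must say which of its fields it wants.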

However, there is a catch: as our application grows, we will need to add more types and operations to the schema. At some point, this schema will become unwieldy to manage in a single file. We need to modularize it, just like we modularize code into modules and our UI into components. The easiest way to do this is to split the schema into smaller manageable chunks. We will break up our schema on entity boundaries. Let’s create three files, one for each entity. By convention, we use .graphql as the extension for these files.

# ----- author.graphql -----
type Author {
  id: ID!
  name: String!
  books: [Book!]!
}

type Query {
  authors: [Author!]!
  author(id: ID!): Author!
}

# ----- publisher.graphql -----
type Publisher {
  id: ID!
  name: String!
  books: [Book!]!
}

type Query {
  publishers: [Publisher!]!
  publisher(id: ID!): Publisher!
}

# ----- book.graphql -----
type Book {
  id: ID!
  name: String!
  publisher: Publisher!
  authors: [Author!]!
}

type Query {
  books: [Book!]!
  book(id: ID!): Book!
}
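Notice that each file declares its own `type Query`, so the three files cannot simply be concatenated. They have to be merged so that same-named types are combined. To show what that means conceptually, here is a deliberately naive Python sketch. It handles only the simple `type Name { ... }` form used above, and the `merge_type_defs` and `render` functions are hypothetical. It is not how any real merging library works.

```python
import re

AUTHOR = """
type Author {
  id: ID!
  name: String!
  books: [Book!]!
}

type Query {
  authors: [Author!]!
  author(id: ID!): Author!
}
"""

PUBLISHER = """
type Publisher {
  id: ID!
  name: String!
  books: [Book!]!
}

type Query {
  publishers: [Publisher!]!
  publisher(id: ID!): Publisher!
}
"""

BOOK = """
type Book {
  id: ID!
  name: String!
  publisher: Publisher!
  authors: [Author!]!
}

type Query {
  books: [Book!]!
  book(id: ID!): Book!
}
"""

# Matches "type Name { ...fields... }"; good enough for the files above only.
TYPE_BLOCK = re.compile(r"type\s+(\w+)\s*\{([^}]*)\}")


def render(name, fields):
    """Write one type definition back out as SDL text."""
    return "\n".join(["type " + name + " {"] + ["  " + f for f in fields] + ["}"])


def merge_type_defs(*sources):
    """Combine the fields of same-named types found across all sources."""
    merged = {}
    for source in sources:
        for name, body in TYPE_BLOCK.findall(source):
            fields = [line.strip() for line in body.splitlines() if line.strip()]
            merged.setdefault(name, []).extend(fields)
    return "\n\n".join(render(name, fields) for name, fields in merged.items())


merged = merge_type_defs(AUTHOR, PUBLISHER, BOOK)
print(merged)
```

The output contains a single `type Query` with all six operations, alongside the three entity types.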

Now that’s more scalable! In the next part, we’ll merge these schemas into one using a library called merge-graphql-schemas.

The source code for this series is available in the graphql-bookstore repo on GitHub. I have marked the code for each part with a git tag. This should make it easier to understand the code for the part that you are reading — without distractions from the later stuff. To look at the code for part 1, execute the following commands on your command line:

git clone https://github.com/nareshbhatia/graphql-bookstore.git
cd graphql-bookstore
git checkout 1-basics

The GraphQL schema discussed in this part is saved under apollo-bookstore-server/src/graphql/typedefs.

We have now covered the basics of GraphQL: what it is and what its benefits are. We also looked at how to compose a GraphQL schema.

I realize that this was a long read, but hopefully it gave you a strong foundation before jumping into implementation. I would love to get your questions and comments.

In part 2, we will use our bookstore schema to build a GraphQL server. Until then, happy computing!