Pain Points of GraphQL

By Bruno Soares

This post is a spin-off of "Sharing data in a Microservices Architecture using GraphQL", where I explore some problems of integrating services in a Microservices Architecture and how GraphQL comes in handy to solve some of them.

After some time working with and studying GraphQL, I have found some important concerns for anyone considering this tool. In this post I’ll summarize these concerns and list some actions that can be taken as workarounds.

We will cover three main types of concerns/problems:

1 — Almost impossible to solve:

  • Query In Indefinite Depth
    TL;DR: GraphQL cannot query in indefinite depth, so if you have a tree and want to return a branch without knowing the depth, you’ll have to do some pagination.
  • Specific Response Structure
    TL;DR: In GraphQL the response matches the shape of the query, so if you need to respond in a very specific structure, you'll have to add a transformation layer to reshape the response.

2 — Hard to solve:

  • Cache at Network Level
    TL;DR: Because of the way GraphQL is commonly used over HTTP (a POST to a single endpoint), caching at the network level becomes hard. A way to solve it is to use Persisted Queries.
  • Handling File Upload
    TL;DR: There is nothing about file upload in the GraphQL specification, and mutations don’t accept files in their arguments. To solve it you can upload files using another kind of API (like REST) and pass the URL of the uploaded file to the GraphQL mutation, or inject the file into the execution context, so you’ll have the file inside the resolver functions.
  • Unpredictable Execution
    TL;DR: The nature of GraphQL is that you can query combining whatever fields you want, but this flexibility is not free. There are some concerns that are good to know about, like Performance and N+1 Queries.

3 — When not to use GraphQL:

  • Super Simple APIs
    TL;DR: In case you have a service that exposes a really simple API, GraphQL will only add extra complexity, so a simple REST API can be better.

Query In Indefinite Depth

If you have a tree structure and want to query it at an indefinite depth, things get tricky. There is no way to navigate recursively without knowing the maximum depth.

Take the query of a category tree as an example:

{
  category(slug: "assembler") {
    name
    slug
    category {
      name
      slug
      category {
        name
        slug
        category {
          name
          slug
        }
      }
    }
  }
}

Note that you need to specify each level you want to retrieve.

This query results in:

{
  "data": {
    "category": {
      "name": "Assembler",
      "slug": "assembler",
      "category": {
        "name": "Lisp",
        "slug": "assembler/lisp",
        "category": {
          "name": "Smalltalk 80",
          "slug": "assembler/lisp/smalltalk-80",
          "category": {
            "name": "C++",
            "slug": "assembler/lisp/smalltalk-80/c"
          }
        }
      }
    }
  }
}

You can improve the query a bit using fragments:

{
  category(slug: "assembler") {
    ...CategoryFields
    category {
      ...CategoryFields
      category {
        ...CategoryFields
        category {
          ...CategoryFields
        }
      }
    }
  }
}

fragment CategoryFields on CategoryType {
  name
  slug
}

In this issue Lee Byron explains the reasons why GraphQL does not have recursive queries:

The issue is that GraphQL has no way of actually confirming that the data it encounters is not infinitely recursive. Without an ability to make this guarantee, allowing for cyclic queries could in fact expose infinite execution queries which could be used as an attack vector.

A way to solve this is to paginate, using the last resource reached as a parameter to query the next page. It's more predictable, but also more complex.
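For example, paginating over the tree could look roughly like this (the children connection and the first/afterSlug arguments are hypothetical names for illustration, not part of the schema above):

```graphql
{
  category(slug: "assembler") {
    name
    slug
    # hypothetical connection: fetch direct children one page at a time,
    # using the last slug reached as the cursor for the next request
    children(first: 10, afterSlug: "assembler/lisp") {
      name
      slug
    }
  }
}
```

The client then walks the tree level by level, issuing a new request with the last slug it received, instead of trying to encode an unknown depth into a single query.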

Specific Response Structure

If you have to respond in a very specific format, it will be hard (or in some cases impossible) to reshape the response to your needs, because GraphQL responds according to the schema definition plus the query.

You can use aliases to rename fields:

{
  empireHero: hero(episode: EMPIRE) {
    name
  }
  jediHero: hero(episode: JEDI) {
    name
  }
}

This query renames the first hero to empireHero and the second to jediHero:

{
  "data": {
    "empireHero": {
      "name": "Luke Skywalker"
    },
    "jediHero": {
      "name": "R2-D2"
    }
  }
}

Aliases come in handy when you need to respond with more than one object of the same type (in this case, hero), but they do not solve the problem of reshaping the whole response.

The way I found to solve this is to add a layer between the client and GraphQL which takes the GraphQL response and makes the necessary transformations. Tools like Jolt (a JSON-to-JSON transformation library) can help a bit.
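As a minimal sketch of such a layer in Ruby (the target structure and field names here are made up for illustration):

```ruby
# Hypothetical transformation layer: takes a GraphQL-style response hash
# and reshapes it into the structure a legacy client expects.
def reshape(graphql_response)
  hero = graphql_response["data"]["empireHero"]
  {
    "result" => {
      "type"       => "hero",
      "attributes" => { "display_name" => hero["name"] }
    }
  }
end

response = { "data" => { "empireHero" => { "name" => "Luke Skywalker" } } }
puts reshape(response)["result"]["attributes"]["display_name"]
# => Luke Skywalker
```

The GraphQL server stays unchanged; only this thin layer knows about the client-specific shape.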

Cache at Network Level

The common way I see GraphQL folks building APIs is over HTTP, with POST requests to a single endpoint. That's not a very friendly setup for caching at the network level.

What's the problem?

A Single Endpoint
In REST you can configure your cache service (HTTP server, Varnish, CDN, etc.) to match URL patterns and deliver cache directives according to your needs.

Take a blog REST API as an example. If you know that the list of best posts refreshes only once a day, you can match the pattern /api/posts/best and serve it with one day of cache; likewise, you can configure the pattern /api/posts/:id (the Post resource) to keep only one hour of cache.
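As a sketch, those two cache rules could be expressed in an nginx configuration roughly like this (assuming a proxy_cache zone named api and an upstream named backend are already defined; the names are illustrative):

```nginx
# Best posts refresh once a day: cache the exact URL for one day.
location = /api/posts/best {
    proxy_pass        http://backend;
    proxy_cache       api;
    proxy_cache_valid 200 1d;
}

# Individual Post resources: cache each one for an hour.
location ~ ^/api/posts/[0-9]+$ {
    proxy_pass        http://backend;
    proxy_cache       api;
    proxy_cache_valid 200 1h;
}
```

With a single POST /graphql endpoint there is no URL pattern to match, so rules like these have nothing to hook into.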

The Query
You can choose to use the GET method and pass the query and variables like this:

/graphql?query=query($id:ID!){post(id:$id){title body}}&vars[id]=42

For me, it’s a horrible URL to cache; after passing the query string through URL encoding, it becomes unreadable:

/graphql?query=query%28%24id%3AID%21%29%7Bpost%28id%3A%24id%29%7Btitle%20body%7D%7D&vars[id]=42

And if you change the order of the fields, say from title body to body title, the previous query's cache entry will not be used, resulting in an inefficient cache system.

Is there any way to solve this?

It's somewhat complicated, but yes. Persisted Queries is the term you need to look for:

* The best material on Persisted Queries comes from apollodata.com. These guys are not kidding around! :-)

Keep in mind that GraphQL doesn’t make assumptions about the transport layer, so it’s up to you to build a system that is cacheable according to your needs. Furthermore, you can cache each object separately, which leads to a very efficient caching strategy.
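The core of the Persisted Queries idea can be sketched in a few lines of Ruby (the names and the id scheme are made up; real implementations such as Apollo's also handle query registration and hashing):

```ruby
# Persisted Queries sketch: the full query text lives on the server,
# keyed by a short id. Clients send only the id, e.g. GET /graphql?id=42,
# which gives you a short, stable, cache-friendly URL.
PERSISTED_QUERIES = {
  "42" => "query($id: ID!) { post(id: $id) { title body } }"
}.freeze

def resolve_persisted_query(id)
  PERSISTED_QUERIES.fetch(id) { raise ArgumentError, "Unknown query id: #{id}" }
end

puts resolve_persisted_query("42")
# => query($id: ID!) { post(id: $id) { title body } }
```

Because the URL no longer contains the query text, field order and whitespace can't fragment the cache, and unknown queries can be rejected outright.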

Handling File Upload

The problem is that GraphQL makes changes to data using Mutations, and the data passed to a mutation has to be as simple as JSON, where complex data types such as files can’t be passed.

A simple way to solve this is to create an endpoint in something like REST to upload the file and return its URL, so you can use that URL as a mutation parameter:

mutation ($url: String!) {
  updateUserAvatar(input: {url: $url}) {
    url
  }
}

Another workaround is to send the file together with the GraphQL mutation using a multipart request and append the file to the context of the GraphQL query execution, so you can get the file inside the resolver function. The mutation in this case will not need the url parameter.

mutation {
  updateUserAvatar {
    url
  }
}

It’s a bit strange to omit the parameter, but it works.

Marc-André Giroux wrote a post with this solution in Ruby: Uploading files using Relay and a Rails GraphQL server. Here is his mutation getting the file from the context hash:

AddFileMutation = GraphQL::Relay::Mutation.define do
  name "AddFile"

  # Here's the mutation operation:
  resolve -> (_args, ctx) {
    file = ctx[:file]
    raise StandardError.new("Expected a file") unless file
    # ... Do what you want with the file!
  }
end

Unpredictable Execution

The nature of GraphQL is that you can query combining whatever fields you want, but this flexibility is not free. There are some concerns worth knowing about.

Performance
Let’s say one client of your API makes a giant query. Your backend will struggle to respond, and that could impact the server's performance, so you'd better have a plan to prevent it!

What are our options?

Some GraphQL libraries provide ways to handle it, like:

  • Maximum Depth. Blocks queries that exceed a configured maximum depth.
  • Cost Analysis. Blocks queries that exceed the maximum execution cost. The maximum cost is a sum of the cost of each field.
  • Persisted Queries. Only execute pre-approved queries that have been persisted in the backend (Facebook seems to be using this strategy).
Here is a very good article explaining ways to handle these issues.
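To make the first option concrete, here is a sketch of a Maximum Depth check in Ruby (the query is modeled as nested hashes rather than a real parsed GraphQL document):

```ruby
MAX_DEPTH = 3

# Depth of a selection set: leaves (nil / empty) count as zero,
# each level of nesting adds one.
def depth_of(selection)
  return 0 unless selection.is_a?(Hash) && !selection.empty?
  1 + selection.values.map { |child| depth_of(child) }.max
end

def validate_depth!(query)
  d = depth_of(query)
  raise "Query depth #{d} exceeds maximum of #{MAX_DEPTH}" if d > MAX_DEPTH
  query
end

shallow = { category: { name: nil, slug: nil } }
deep    = { category: { category: { category: { category: { name: nil } } } } }

validate_depth!(shallow)  # accepted (depth 2)
begin
  validate_depth!(deep)
rescue RuntimeError => e
  puts e.message
  # => Query depth 5 exceeds maximum of 3
end
```

Real libraries run a check like this during query validation, before any resolver executes, so abusive queries are rejected cheaply.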

N+1 Queries
Basically, N+1 queries happen when you hit the database in an inefficient way: one query to fetch a list of N records, then one additional query per record to fetch its associated data.

In APIs like REST, with endpoints for each resource, you can build very specific queries to retrieve all data needed for the response. In other words, you have the opportunity to fine-tune the query, preventing N+1 queries.

Because of the dynamic way that GraphQL works, you don’t have the same opportunity as in REST.

Fortunately, we have some tools to deal with it in GraphQL:

  • Facebook DataLoader: a generic utility to be used as part of your application’s data-fetching layer, providing a simplified and consistent API over various remote data sources (such as databases or web services) via batching and caching.
  • Shopify GraphQL::Batch: Provides an executor for the graphql gem which allows queries to be batched.
Tools like these really help to solve most of the problems, but you still can't fine-tune your queries as you can in REST, so you'll have to weigh the pros and cons to choose the right tool for your case.
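The batching idea behind both tools can be sketched like this (the "database" here is just a hash, and the call counter stands in for real round trips):

```ruby
DB = { 1 => "Alice", 2 => "Bob", 3 => "Carol" }.freeze
$db_calls = 0

# Each call to fetch_users is one round trip to the database.
def fetch_users(ids)
  $db_calls += 1
  DB.select { |id, _name| ids.include?(id) }
end

# Naive resolvers: one query per id, the N+1 pattern.
[1, 2, 3].each { |id| fetch_users([id]) }
puts $db_calls  # => 3

# Batched: collect the ids first, then one query for all of them,
# which is what DataLoader / GraphQL::Batch do behind the scenes.
$db_calls = 0
fetch_users([1, 2, 3])
puts $db_calls  # => 1
```

The batching libraries do the collecting automatically: resolvers ask for single ids, and the loader coalesces those requests into one fetch per execution tick.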

Super Simple APIs

In case you have a service that exposes a really simple API, GraphQL will only add extra complexity, so a simple REST API can be better.

But what’s a simple API?

An API that returns only a few fields, where the clients always consume those fields in the same way. From my point of view, that's a simple API.

Take a zip code API as an example. This API takes a zip code as input and responds with the address related to that zip code:

GET https://api.example.com/zip-code/94110

{
  "zip_code": "94110",
  "neighborhood": "Inner Mission",
  "city": "San Francisco",
  "state": "CA"
}

If you plan to keep this API like that, there is no reason to use GraphQL. It's easy to build, easy to cache, requires no additional dependencies, and is simple to maintain.

But if things start to look like this:

{
  "zip_code": "94110",
  "neighborhood": "Inner Mission",
  "city": "San Francisco",
  "state": "CA",
  "coordinates": {
    "lat": 0.658862,
    "lng": -2.13655
  },
  "timezone": {
    "timezone_identifier": "America/Los_Angeles",
    "timezone_abbr": "PDT",
    "utc_offset_sec": -25200,
    "is_dst": "T"
  },
  "demographics": {
    "population": 69333,
    "population_density": 29816,
    "housing_units": 28913,
    "median_home_value": "$768,200",
    "land_area": 2.33,
    "water_area": 0,
    "occupied_housing_units": 27128,
    "median_household_income": "$82,111"
  },
  "gender": {
    "male": 36429,
    "female": 32904
  },
  "race": {
    "White": 40545,
    "Black Or African American": 2547,
    "American Indian Or Alaskan Native": 718,
    "Asian": 8842,
    "Native Hawaiian & Other Pacific Islander": 217,
    "Other Race": 12034,
    "Two Or More Races": 4430
  },
  ...
}

🤔 I think it’s time to consider GraphQL.

Conclusion

The first two problems (Query In Indefinite Depth and Specific Response Structure) are the hardest to deal with, because they go against the nature of GraphQL.

Unpredictable Execution is the worst of the "hard to solve" group: Cache at Network Level has a (kind of) solution and Handling File Upload is possible, but Unpredictable Execution remains unpredictable, just with some degree of control.

And finally, I think using GraphQL for super simple APIs is a waste of time, just adding more complexity and dependencies to your project.

There is no silver bullet… GraphQL is a really good tool, but it has some drawbacks, just like any other. Being aware of these disadvantages will help you prevent problems.

Obviously, there are other problems, and if you know a good one, leave a comment; maybe it's worth an update to the post :)

Special thanks to Cristhiane Faria de Almeida and Daniel Tamai who helped review an earlier draft of this post.

I hope you’ve enjoyed the read. If you liked it, please consider tapping or clicking the 👏 icon to recommend it to others so they can enjoy it too. And feel free to share it widely in your favorite social network :-)