To GraphQL or not to GraphQL? Pros and Cons


So you want to know if GraphQL is a good fit for your project? Adopting GraphQL can be a consequential decision, for better or worse. If you have never used GraphQL in a complex project, how are you supposed to know whether you will regret it later or congratulate yourself on making the right call?

It can be hard to see through all the hype articles like "GraphQL is the successor of REST" or "GraphQL is overkill for most projects" and understand what GraphQL would mean for your particular project. I want to save you some trouble and put everything I have learned about GraphQL over the past years into one blog post that can help you make a more informed decision.

Can you trust my opinion? Maybe; always make your own judgment! For reference, here are a few things I have worked on in the GraphQL ecosystem:

  • I have built thousands of GraphQL APIs in the process of creating Slicknode, a framework and headless CMS to rapidly create GraphQL APIs
  • I have written graphql-query-complexity, the most popular open-source library for protecting GraphQL servers written in Node.js against DoS attacks. It is also used by some of the big open-source frameworks like TypeGraphQL and NestJS
  • I have migrated a 12-year-old codebase to GraphQL
  • I have ported a library to create Relay compatible GraphQL APIs from JavaScript to PHP
  • I have removed GraphQL from a service where it turned out to be not a great fit

So keep in mind that I am a huge GraphQL fan, but I'll try my best to talk about the tradeoffs as well.

Is GraphQL Just A Hype?

At this point, it is probably safe to say that GraphQL is here to stay. In the years following its open-source release in 2015, it has gained an incredible amount of traction. It has been adopted by a large number of Fortune 500 companies, and those applications don't tend to disappear overnight.

GraphQL has also sparked the interest of investors: Businesses that build products around GraphQL have raised hundreds of millions of dollars in venture capital.

GraphQL clients and servers are now available in all major programming languages, and the GraphQL library for JavaScript alone currently has 5.5 million weekly downloads. The tooling around GraphQL has been a joy to work with and gives you a lot of options to get creative and solve real-world problems.

What is GraphQL?

The official website describes GraphQL as "A query language for your API". The process is explained as "Describe your data ➜ Ask for what you want ➜ Get predictable results".

I like to think of it as the SQL for APIs. When you work with SQL, you write your query in a declarative way, describe what data you want to load or change, and let the SQL server figure out the best way to perform the actual operation. You don't care what is happening under the hood, like which blocks are read from disk, caches, or the network; this is all nicely hidden from you. When there is a new release of your SQL server, you can safely update the database and benefit from all the performance improvements without having to update a single SQL query in your codebase. If new features were added, you can use them in new parts of your application, but you don't have to update existing parts, as the old queries will still work the same.

This is very similar to how GraphQL works, only for APIs: You write a GraphQL query in a declarative way, send it to your server, and the server figures out the best way to load that data. It is then returned in the exact format that you specified in your query. Now, if you need different data, you just change the query, NOT the server! This gives API consumers unprecedented powers. You can send any valid query to your GraphQL server, and it will on-demand return the correct response. It's like having unlimited REST API endpoints without changing a single line of code in your backend.
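
A minimal example (the field names here are illustrative, not from a specific schema). The client sends:

```graphql
query {
  post(id: "345") {
    title
    author {
      name
    }
  }
}
```

And the server responds in exactly that shape:

```json
{
  "data": {
    "post": {
      "title": "GraphQL is awesome!",
      "author": { "name": "Ivo" }
    }
  }
}
```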

A good analogy: If REST is like ordering à la carte in a restaurant, GraphQL is the all-you-can-eat buffet. You can mix and match any dish that is served at the buffet on a single plate and take as much or as little as you want. With REST, the dish always has the same portion size; you first have to ask the waiter if they could combine multiple dishes, they have to check back with the kitchen, and they might come back telling you that they don't do that, thinking you are an annoying customer for not respecting their menu.

Also, when you want to add new capabilities to your GraphQL server, you just add more fields and types to your schema. All the existing GraphQL queries in your client applications keep working without requiring any changes. This enables you to evolve your APIs and client applications independently, but we will look at that in more detail later.

If you are not that familiar with GraphQL yet, I recommend checking out the introduction on the official GraphQL website, so you can see some GraphQL queries and schema definitions in action.

Why GraphQL?

So why was GraphQL created in the first place and what problems was it supposed to solve?

GraphQL was initially created at Facebook to solve some major challenges with the API communication between the servers and their plethora of client applications for all kinds of device sizes, operating systems, web apps, etc. As you can imagine, evolving APIs at that scale is extremely hard, especially if you are using a REST-style architecture.

There is a documentary about GraphQL where the GraphQL creators and other early adopters talk about why they created and adopted it.

Let's look at the most important reasons in some more detail.

Overfetching

A problem with REST APIs is that the predefined response format is pretty inflexible: one URL always returns the requested resource in its entirety. Sure, there are ways to mitigate that, like passing the fields to include in the response via parameters, but this is not standardized, so it needs documentation, and it has to be implemented in the backend, which adds unnecessary complexity.

This can become a problem over time, especially as you evolve your API and add new features or deprecate obsolete data.

For example: Let's say you want to add a field with the user's current online status to the REST endpoint of the user:

GET https://example.com/users/2
{
  "username": "ivo"
}

You could simply add the field to the response, which then becomes:

{
  "username": "ivo",
  "isOnline": true
}

This is great and works. However, the problem with this approach is that now the online status is sent to every single client application even if they don't need it. Do this a few times and you end up with bloated, mostly useless responses that you send to every client, consuming their bandwidth and making your application slower over time.
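
With GraphQL, this problem disappears. A client that does not care about the online status keeps requesting only the username (a sketch against a hypothetical schema):

```graphql
{
  user(id: 2) {
    username
  }
}
```

Clients that need the new field add `isOnline` to their query; everyone else keeps getting exactly the response they had before.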

Underfetching

Another challenge you might be familiar with when building rich user interfaces is something called underfetching: You don't get all the data you need to display in a single API request and have to call the API again to load related data. This adds additional server roundtrips, increases the latency, and can lead to poor user experience.

Let's look at a very simple example: Say you want to create the backend for a blog post detail page where you want to display the following data:

{
  "post": {
    "title": "GraphQL is awesome!",
    "text": "GraphQL solves so many problems...",
    "author": {
      "name": "Ivo"
    },
    "comments": [
      {
        "text": "100% !!!",
        "author": {
          "name": "John"
        }
      },
      {
        "text": "Couldn't agree more",
        "author": {
          "name": "Jane"
        }
      }
    ]
  }
}

When you want to implement that with REST, you have to answer a few questions:

  • Do you include the author in the response data of the post, or do you put it in a dedicated /user/23 API resource to avoid redundancy and only return a reference in the post response?
  • What about comments? Do you return them as well?
  • What about the authors of the comments?
  • Where does it end?

To keep it DRY you might implement the response like this:

{
  "post": {
    "title": "GraphQL is awesome!",
    "text": "GraphQL solves so many problems...",
    "author": "/user/1",
    "comments": "/comments?post=345"
  }
}

When the client gets this response, it does not contain all the data that we need. We have to make additional requests to fetch the author and the comments, which adds additional latency. We could create a custom API endpoint where we return the data exactly in the shape that we need. But that might add redundancy in our backend and make the API less flexible (a mobile application might not want to load the comments initially).

GraphQL eliminates this problem by giving frontend developers the ability to request exactly the data that they need on-demand and let the GraphQL server do the heavy lifting of loading (or not loading) references automatically.
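
For the blog post page above, the entire response shape can be requested in a single round trip (assuming a schema with matching fields):

```graphql
{
  post(id: 345) {
    title
    text
    author {
      name
    }
    comments {
      text
      author {
        name
      }
    }
  }
}
```

A mobile client that does not render comments initially simply omits that part of the query; no new endpoint is needed.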

Feature Deprecation

As your project evolves and requirements change, you might want to deprecate a feature and remove it from your API to not have to maintain obsolete or redundant services. This can be a major challenge with a REST architecture, especially in more complex projects. Do you release a new version for every feature that is removed? How do you make sure a removed feature is not still in use by some client application? When can you shut down the old API versions?

In a lot of cases, it is easier and cheaper to just keep the old feature in place than to go through the significant engineering effort of implementing a solid migration strategy. The problem is, you are forcing all this useless data through the bandwidth-limited connections of your users, with no easy way out, or you have to maintain multiple versions of your API.

GraphQL has a built-in way to solve this problem. You can just mark a field as deprecated and add information on how to migrate client applications. This information is available to all client applications in a standardized way, you can run a script in your CI pipeline that automatically checks if deprecated fields are used and migrate your client applications accordingly. As soon as all deprecation notices are fixed in your client applications, you can safely remove the field from your GraphQL API.
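
In the schema definition language, this is a one-line change (a sketch; the field names are illustrative):

```graphql
type User {
  username: String
  isOnline: Boolean
  # Tooling surfaces the reason to every consumer of the schema.
  lastSeen: String @deprecated(reason: "Use `isOnline` instead.")
}
```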

The benefits are obvious:

  • No need to create, run and maintain multiple versions of your API. Just one GraphQL API that evolves with your project over time.
  • An automated and self-documenting way to manage feature deprecations.

Decoupling Frontend & Backend Development

By introducing GraphQL in a project, you are eliminating an enormous amount of friction by completely decoupling frontend and backend development. Any number of frontend applications can be developed and changed completely independent of the backend. There is no need to create or update a specific REST endpoint for a particular view, the power shifts to the frontend developers as they can just request data on demand.

To get back to the restaurant analogy: The chefs can just place a new dish on the buffet, and guests can mix and match it with anything else they pick up when they fill their plate. No coordination is needed. With REST-style à-la-carte ordering, changing the menu to combine multiple dishes requires coordination with the chef, and possibly other restaurant staff.

As an example of what this can mean in practice: In one project I was working on, a team created an entire mobile application without changing anything in the GraphQL backend.

The GraphQL Killer-Feature

There is one feature of GraphQL that gets way too little attention and is oftentimes not even mentioned in articles examining the pros and cons of GraphQL. In my opinion, this is the killer feature that makes GraphQL invaluable, especially in larger projects. All the GraphQL advantages that we have looked at so far can somehow be worked around, with some ugly compromises to be fair, but nothing was a showstopper. However, I am not aware of any widely adopted technology that even comes close to how GraphQL solves this particular problem:

Co-Location Of Data Dependencies

Let's look at an example to illustrate the problem. We have a React component somewhere in a codebase where we display the user name:

export function UserName({user}) {
  return (
    <span>{user.username}</span>
  );
}

Now we want to display the online status next to the username. What sounds like a simple task can quickly escalate into a nightmare. It raises all kinds of questions:

  • Is that online status even available in the user object?
  • How do you know or find out? Is there documentation available?
  • If the data is coming from an API, how do you make sure it is included in every single API endpoint that returns the user object for the component?

You need a vast amount of knowledge that is not readily available where you want to implement the feature, and that is not necessarily related to the task at hand. For a developer who is new to the codebase, this can be particularly problematic. You might have to identify and change lots of API endpoints to include the online status in all the API responses that contain the user object.

This becomes a complete non-issue with GraphQL APIs because you can co-locate your data dependencies with your frontend components using GraphQL fragments:

export const UserNameFragments = {
  user: gql`
    fragment UserName on User {
      username
      #Just add the online status to the fragment here:
      isOnline
    }
  `
}

export function UserName({user}) {
  return (
    <span>{user.username} ({user.isOnline ? "online" : "offline"})</span>
  );
}

You define the data dependencies right in your UI components with GraphQL fragments. The parent components can then include those fragments in their GraphQL queries that load the data, and you have a guarantee that the UserName component receives the online status, no matter where in your application it is located.
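
For example, a parent component's query could spread the fragment like this (the `PostPage` query itself is hypothetical):

```graphql
query PostPage($postId: ID!) {
  post(id: $postId) {
    title
    author {
      ...UserName
    }
  }
}
```

When `isOnline` is added to the fragment, every query that spreads `UserName` automatically starts requesting it.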

This makes it incredibly easy to extend your application no matter how complex it is. You don't have to have any knowledge about the rest of the codebase and can confidently implement a feature without leaving the component. With the right tooling, you even get autocomplete functionality, type validation, and documentation in your IDE.

Performance & Security

With great power comes great attack surface.

By exposing a GraphQL API to the internet, you are giving clients an enormous amount of power, with big implications with regard to security and performance. Clients have on-demand access to all your data and functionality at once. This significantly increases your attack surface and can easily be exploited if not considered from the start.

Let's look at a few problematic queries...

Load a ridiculous amount of data:

query LotsOfPosts {
  posts(first: 100000000) {
    title
  }
}

Load deeply nested data that requires millions of DB queries:

query DeeplyNestedData {
  user(id: 2) {
    name
    friends {
      name
      friends {
        name
        friends {
          name
        }
      }
    }
  }
}

Launching a brute force attack in a single request:

mutation BruteForcePassword {
  attempt1: login(email: "victim@example.com", password: "a")
  attempt2: login(email: "victim@example.com", password: "b")
  # ...
  attempt100000: login(email: "victim@example.com", password: "xxxxx")
}

The problem is that such queries are not prevented by commonly available rate limiters: a single request can completely overwhelm a GraphQL server. To protect against this, I wrote graphql-query-complexity, an extensible open-source library that detects pathological queries and rejects them before they consume too many resources on the server. You can assign each field a complexity value, and queries that exceed a threshold will be rejected. In Slicknode, this protection is added automatically based on the number of nodes that are being returned.
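
The core idea behind complexity analysis can be sketched in a few lines. This is a toy illustration, not the graphql-query-complexity API: each field costs one point, and a paginated field multiplies the cost of its children by the requested page size.

```javascript
// Toy complexity estimator: each field costs 1, and a field with a
// `first` argument multiplies the cost of its children by that number.
function estimateComplexity(selection, multiplier = 1) {
  let total = 0;
  for (const field of selection) {
    const children = field.selection
      ? estimateComplexity(field.selection, field.first || 1)
      : 0;
    total += multiplier * (1 + children);
  }
  return total;
}

// Pre-parsed sketch of: posts(first: 100) { title comments(first: 10) { text } }
const query = [
  {
    name: "posts",
    first: 100,
    selection: [
      { name: "title" },
      { name: "comments", first: 10, selection: [{ name: "text" }] },
    ],
  },
];

const cost = estimateComplexity(query); // 1201 points
const MAX_COMPLEXITY = 1000;
const rejected = cost > MAX_COMPLEXITY; // reject before any resolver runs
```

The real library works on the parsed GraphQL AST and lets you plug in custom estimators per field, but the principle is the same: compute a cost bound statically, before executing anything.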

Another common approach is to register an allow-list of queries that are permitted and reject all other queries to the API. This might be more granular and secure than dynamic rules, but it limits the flexibility of your GraphQL API and you have to keep your query registry up to date with all your client applications, which requires additional setup and maintenance.

Optimizing the requests to your internal data stores can be another challenge. The responsibility to optimize the data loading process lies 100% with the GraphQL API and can't easily be offloaded to a CDN or reverse proxy. With a REST API, you have very good control over how many and what database queries are executed, thanks to the limited scope of the REST endpoint. With GraphQL a client can request any number of objects that might have to be loaded from different DB tables. Slicknode automatically combines multiple requested objects into a single database query by analyzing the GraphQL query and generating the SQL query dynamically, but your average ORM is probably not equipped to do that out of the box.

This is also related to the N+1 problem, where nested queries make the number of database requests explode. If you want to learn more about this problem, I recommend this video and checking out dataloader, a library released by Facebook to help with batching queries and solving this problem.
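
The batching idea behind dataloader is simple enough to sketch from scratch (an illustration only; use the battle-tested dataloader package in practice): collect every key requested in the current tick and resolve them all with a single batch call.

```javascript
// Minimal sketch of dataloader-style batching. All loads requested in
// the same tick share one batch call, turning N+1 queries into 1.
class SimpleBatchLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise<values>
    this.queue = [];
  }

  load(key) {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      // Schedule exactly one dispatch per batch, after the current tick.
      if (this.queue.length === 1) queueMicrotask(() => this.dispatch());
    });
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    const values = await this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}

// Usage: three author lookups become one (hypothetical) DB query.
let batchCalls = 0;
const userLoader = new SimpleBatchLoader(async (ids) => {
  batchCalls += 1; // e.g. SELECT * FROM users WHERE id IN (...)
  return ids.map((id) => ({ id, name: `user-${id}` }));
});

Promise.all([userLoader.load(1), userLoader.load(2), userLoader.load(1)])
  .then((users) => {
    // All three loads were served by a single batch call.
  });
```

The real dataloader adds per-request caching and error handling on top of this pattern.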

Caching vs GraphQL

Caching is always a hard challenge and that can be especially true for GraphQL servers. A lot of the tools we usually rely on don't work well with GraphQL out of the box.

Take CDNs for example. The most common way to deploy GraphQL APIs is via an HTTP server. You then send your GraphQL requests via a POST request to the API and retrieve your response. The problem is that POST requests are not cached by default in the most common CDNs. You could potentially also send your requests via a GET request, but you will quickly hit the request size limit as GraphQL queries can get huge. If you are using a query allow list, you can send a query ID or hash to the server instead of the full query to get around this limit.

Cache invalidation can also be more challenging with GraphQL APIs. All the queries are usually served via the same URL, so invalidating resources by URL is off the table. Furthermore, one data object can be included in any number of cached responses. One strategy to solve this is to attach cache tags to responses and then later invalidate responses based on those tags instead of URLs. This is the approach that is used in the Slicknode Cloud to cache GraphQL responses around the globe.
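
The tag-based strategy can be sketched with a small in-memory cache (an illustration of the idea, not Slicknode's actual implementation): every cached response is stored together with the tags of the objects it contains, and a write invalidates by tag rather than by URL.

```javascript
// Toy tag-based response cache: invalidation targets tags, not URLs.
class TaggedCache {
  constructor() {
    this.entries = new Map(); // cacheKey -> { value, tags }
  }

  set(key, value, tags) {
    this.entries.set(key, { value, tags: new Set(tags) });
  }

  get(key) {
    const entry = this.entries.get(key);
    return entry ? entry.value : undefined;
  }

  // Drop every cached response that contained the mutated object.
  invalidateTag(tag) {
    for (const [key, entry] of this.entries) {
      if (entry.tags.has(tag)) this.entries.delete(key);
    }
  }
}

const cache = new TaggedCache();
cache.set("query:PostPage", { title: "GraphQL is awesome!" }, ["post:345", "user:1"]);
cache.set("query:UserProfile", { name: "Ivo" }, ["user:1"]);

// Updating user 1 invalidates both cached responses that include it.
cache.invalidateTag("user:1");
```

A CDN-level implementation works the same way, except the tags travel in response headers (e.g. surrogate-key headers) and invalidation is an API call to the edge.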

A great way to add caching to your GraphQL APIs is to move the caching layer behind the GraphQL API itself, in front of your data sources. Combine this with the dataloader mentioned in "Performance & Security" and you can fully customize the cache behavior.

Single Point of Failure

One thing to keep in mind is that your GraphQL API becomes your API gateway replacement, the single way to access all your functionality, data, and services. If the GraphQL API goes down, your entire application is offline. This is not that different from a REST architecture, but it is good to know that the GraphQL API will be a critical part of your infrastructure, so treat it accordingly.

Best (and worst) Use-Cases for GraphQL

If you have a hammer, every problem looks like a nail. It is really easy to fall in love with all the benefits that GraphQL provides. It makes your life as a frontend developer so much easier compared to previous technologies. But some types of applications are more suitable for GraphQL than others. I have burnt my fingers too and have removed GraphQL from some parts of an application, while for other applications I would highly recommend it.

In my experience, the best use-case for GraphQL is the purpose it was originally built for: Providing the data and functionality for rich user interfaces. A central place that contains all your data and functionality in one unified GraphQL API, easily accessible for any number of teams, always up to date, and self-documenting. It dramatically reduces the complexity of your frontend code. Where you previously had to implement lots of API calls with all the complexity that asynchronous functionality entails (loading states, error handling, etc.), you can now simply define your data dependencies and let GraphQL take care of the rest. You can validate all your API calls at build time and implement solutions with end-to-end type safety. Even though GraphQL is awesome to use as a single developer, the bigger your application and team become, the more you'll enjoy working with GraphQL.

So when should you consider alternatives to GraphQL? I would consider alternatives for applications where you want to physically separate different services. GraphQL is great for combining a lot of functionality in one place. This might be a problem if you want to isolate certain services for example at the network or hardware level and only make them accessible to a subset of services. You might be better off looking at other architectures.

Conclusion

GraphQL is an awesome addition to a developer's toolbelt, especially for powering user interfaces. It is a joy to work with and I am excited to see the GraphQL ecosystem gaining more and more traction. To make it easier for developers to build GraphQL APIs I created Slicknode. It automates all the hard parts and gets you up and running in minutes. Come join the awesome GraphQL community! I also have some pretty big news to share about Slicknode soon, so make sure to follow me on Twitter and subscribe to the newsletter so you'll be the first to know.