Evolving API Pagination at Slack

By Michael Hahn


At Slack, the size and scope of the data we expose via our APIs has changed dramatically since the product first launched. Endpoints that were designed around the expectation that they would, in the most extreme cases, return several hundred records, are now returning hundreds of thousands of records. To handle this rapid growth, we’ve had to rethink how we paginate data — from no pagination, to offset pagination, to a new cursor-based pagination scheme. In this post, we’ll go over the pros and cons of various pagination strategies and discuss how these influenced our latest pagination scheme.

There isn’t a single right answer to how you should paginate your data. The answer depends on the type of application you’re building, how fast your dataset is growing, and the experience you’re trying to deliver to the end user. At Slack, our primary concern is giving developers the ability to retrieve the entire contents of a specific dataset. For example, a task management application may periodically call users.list to retrieve all of the users on a team and enable assigning a task to a co-worker. Another application may use channels.history to retrieve all of the messages in a channel to run through sentiment analysis.

These use cases define our requirements for a pagination scheme:

  • It must support efficiently paginating through all of the items in a dataset
  • It must handle datasets whose lengths can vary by orders of magnitude (users vs messages)
  • It must support endpoints with different sorting and ordering requirements
  • It must be implemented in a backwards-compatible way
  • It should be a consistent experience to paginate through different endpoints
  • It should be flexible enough to handle cases we haven’t thought of yet 🤞

With these requirements in mind, let’s look at various types of pagination that are popular in the community.

Using offsets to paginate data is one of the most widely used pagination techniques. Clients provide a parameter indicating the number of results they want per page, count, and a parameter indicating the page number they want results for, page. The server then uses these parameters to calculate the results for the requested page. Implementing this with an SQL-like database is very straightforward:

  • We count all of the results to determine the total number of pages based on the provided count parameter:
SELECT COUNT(*) AS total FROM users WHERE team_id = %team_id;
  • We then use the page and count parameters to query for the items in the requested page. For example, with page = 3, count = 20, we can tell the database to return the next 20 items, skipping the first 40, to give us the third page of results:
SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT 20 OFFSET 40;

The response from the server would be:

{
  "users": [...],
  "paging": {
    "total": 295,
    "page": 3,
    "pages": 15
  }
}

Where total is the total number of items on all pages, page is the current page, and pages is the total number of pages available.
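The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Slack's actual implementation; the helper names and the parameterized SQL style are assumptions for the example:

```python
import math

def build_offset_page_query(team_id, page, count):
    """Map a 1-indexed page number onto LIMIT/OFFSET values.

    Page 3 with count = 20 skips the first 40 rows, matching the
    query shown earlier.
    """
    offset = (page - 1) * count
    sql = ("SELECT * FROM users WHERE team_id = %s "
           "ORDER BY id DESC LIMIT %s OFFSET %s")
    return sql, (team_id, count, offset)

def build_paging_metadata(total, page, count):
    """Build the paging block returned alongside the results."""
    return {"total": total, "page": page, "pages": math.ceil(total / count)}
```

For page = 3 and count = 20, build_offset_page_query binds LIMIT 20 OFFSET 40, and a total of 295 items yields 15 pages.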

This type of pagination has several advantages:

  • It gives the user the ability to see the total number of pages and their progress through that total.
  • It gives the user the ability to jump to a specific page within the set.
  • It’s easy to implement as long as there is an explicit ordering of the results from a query.

However, there are several drawbacks that don’t make this strategy a fit for our pagination needs:

  • Using LIMIT <count> OFFSET <offset> doesn’t scale well for large datasets. The farther you go into the dataset, the larger the offset becomes, and the database still has to read up to offset + count rows from disk before discarding the first offset rows and returning only count rows.
  • If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results. For example, let’s think about paginating through the latest messages in a channel. After requesting the first page of 10 messages, 10 messages are added to the channel. Now when we fetch the second page, we see the same results from the first page.

Cursor-based pagination works by returning a pointer to a specific item in the dataset. On subsequent requests, the server returns results after the given pointer. This method addresses the drawbacks of using offset pagination, but does so by making certain trade-offs:

  • The cursor must be based on a unique, sequential column (or columns) in the source table.
  • There is no concept of the total number of pages or results in the set.
  • The client can’t jump to a specific page.

To understand these tradeoffs, let’s look at how we would implement this for the users.list example:

  • We’d pick a unique, sequential column to paginate on. In this case, we’ll use the id field and assume this is an auto-incremented, primary key value.
  • Similar to the offset implementation, the client would make a request with a parameter indicating the number of results they want per page, count. Instead of the page parameter, we would accept a cursor parameter, which the client would get in the response from the previous request.
  • The server would then use cursor and count to paginate through the list.

Let’s assume we want to paginate from the most recent user to the oldest user. For the first request, we’d select the first page:

SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT %limit

Where limit is equal to count plus one, fetching one more result than the count specified by the client. The extra result isn’t returned in the result set, but we use its ID as the next_cursor.

The response from the server would be:

{
  "users": [...],
  "next_cursor": "123456"
}

The client would then provide next_cursor as cursor in the second request:

SELECT * FROM users
WHERE team_id = %team_id
AND id <= %cursor
ORDER BY id DESC
LIMIT %limit

Again, limit is equal to count plus one. By setting the limit to one more than the count requested by the client, we’ll know we’re at the last page when the number of rows returned is less than count. At that point, we’ll return an empty next_cursor which tells the client there are no more pages to be fetched.
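The count-plus-one trick can be sketched in Python. Everything here is an illustration under assumptions: paginate and fetch_users are hypothetical names, and the in-memory list stands in for the users table ordered by id DESC:

```python
def paginate(fetch_rows, count, cursor=None):
    """One page of cursor-based pagination using the count-plus-one trick.

    fetch_rows(limit, cursor) is a hypothetical data-access function that
    returns up to `limit` rows ordered by id DESC, starting at the row
    whose id equals `cursor` (inclusive) when a cursor is given.
    """
    rows = fetch_rows(count + 1, cursor)
    if len(rows) > count:
        # The extra row is withheld from the response; its id becomes
        # the cursor for the next page.
        return rows[:count], str(rows[count]["id"])
    # Fewer than count + 1 rows means this is the last page.
    return rows, ""

# In-memory stand-in for the users table (ids 50..1, newest first).
USERS = [{"id": i} for i in range(50, 0, -1)]

def fetch_users(limit, cursor):
    rows = USERS if cursor is None else [u for u in USERS if u["id"] <= int(cursor)]
    return rows[:limit]
```

With count = 20 over 50 users, the first call returns ids 50–31 with next_cursor "30", and the third call returns the final 10 rows with an empty next_cursor.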

The response from the server would be:

{
  "users": [...],
  "next_cursor": ""
}

With this, we’ve addressed the drawbacks of offset based pagination:

  • This will scale well for large datasets. We’re using a WHERE clause to fetch rows with id values less than the last id from the previous page. This lets us leverage the index on the column and the database doesn’t have to read any rows that we’ve already seen. We’re also not returning the total number of pages or items in the set, so we avoid having to calculate the full result set on each request.
  • We’ve also stabilized the pagination window. Instead of the window being calculated from scratch on each request based on the total number of items, we’re always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.

Even with its trade-offs, cursor-based pagination meets all of the requirements we outlined for our pagination strategy.

Relay is a framework for retrieving and caching data from a GraphQL server for a React application. It handles pagination in a really interesting way by allowing the client to paginate forward and backwards from any item returned in the collection.

Relay enables this pagination by defining a specification for how a GraphQL server should expose lists of data called Relay Cursor Connections.

The spec outlines the following components:

  • connection: a wrapper for details on the list of data you’re paginating through. A connection has two fields: edges and pageInfo.
  • edges: a list of edge types.
  • edge: a wrapper around the object being returned in the list. An edge type has two fields: node and cursor.
  • node: the actual object, for example, the user details.
  • cursor: a string field used to identify a specific edge.
  • pageInfo: contains two fields, hasPreviousPage and hasNextPage, which can be used to determine whether you need to request another set of results.

Using all of these together in a GraphQL query for listing users on a team might look like this:

{
  team {
    users(first: 10, after: "opaqueCursor") {
      edges {
        cursor
        node {
          id
          name
        }
      }
      pageInfo {
        hasNextPage
        hasPreviousPage
      }
    }
  }
}

While the above might seem verbose, it allows the server to implement a robust pagination scheme:

  • Returning a list of edge wrappers, each with a cursor, enables the client to paginate forwards and backwards from any point within the result set.
  • The pageInfo details let the client clearly know if there is a previous or next page to fetch.
  • Requiring the cursor to be a string value promotes the use of an opaque cursor value. The JavaScript implementation of the Relay spec, for instance, uses Base64-encoded IDs as cursor values. This discourages the client from inferring what value goes in this field and gives the server the ability to encode additional information within the cursor.

Unlike the previous pagination types we went over, Relay Cursor Connections don’t enforce an underlying database pagination scheme. In fact, because the cursor is an encoded value, we could implement both the offset and cursor-based strategies we defined above with the Relay semantics. Unfortunately, we wouldn’t be able to adhere to the full Relay Cursor Connections spec in a backwards compatible way, but the idea of granting the server flexibility by using an encoded cursor is something we wanted to replicate in our system.

After evaluating the pros and cons of the various types of pagination, we settled on a new cursor-based strategy with some similar principles to Relay.

Requests that support this new pagination will take two new fields:

  • cursor: an opaque string value
  • limit: a maximum number of items you want returned for the page

Responses will leverage our existing response metadata and include a new field:

  • next_cursor: an opaque string value, or an empty string if there are no more results

While this is less robust than the Relay model, it focuses on doing one thing well: forward pagination through an entire dataset in a backwards compatible way.

The piece that we’re most excited about in this new scheme is the flexibility we now have by using an opaque string value as the cursor. Similar to the Relay implementation, we’re using Base64 to encode information on how to retrieve the next set of results within the cursor. This allows us to technically implement different underlying pagination schemes per endpoint, while giving a consistent interface to consumers of the API.
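The encode/decode round trip is just standard Base64. A minimal sketch, assuming the internal cursor is a plain UTF-8 string like user:W07QCRPA4 (the helper names are illustrative):

```python
import base64

def encode_cursor(raw):
    """Encode an internal cursor string into an opaque token for clients."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")

def decode_cursor(token):
    """Decode an opaque token back into the internal cursor string."""
    return base64.b64decode(token).decode("utf-8")
```

Encoding "user:W07QCRPA4" produces "dXNlcjpXMDdRQ1JQQTQ=", the value shown in the response below.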

For example, if we look at a truncated response from the users.list endpoint which now uses this new scheme:

{
  "ok": true,
  "members": [
    {
      "id": "USLACKBOT",
      "team_id": "T0G9PQBBK",
      "name": "slackbot",
      ...
    },
    {
      "id": "W07QCRPA4",
      "team_id": "T0G9PQBBK",
      "name": "glinda",
      ...
    }
  ],
  "cache_ts": 1498777272,
  "response_metadata": {
    "next_cursor": "dXNlcjpXMDdRQ1JQQTQ="
  }
}

The client would take the next_cursor value of dXNlcjpXMDdRQ1JQQTQ= and pass that as the cursor argument in a subsequent request to retrieve the next set of results. On the server side, the cursor will be decoded to: user:W07QCRPA4. The server then knows to use the user id of W07QCRPA4 as the starting point to fetch the next set of results.

At the same time, we have existing endpoints like files.list which are using offset pagination. If we wanted to add support for our new cursor strategy, while still supporting the existing offset pagination, we could do so without changing the underlying data fetching logic. In this case, we would just return cursors which encode the offset value. For example, the next_cursor after fetching the first 10 files could simply be the encoded version of offset:10, or b2Zmc2V0OjEw. This makes it trivial for us to add support for the new scheme to existing endpoints that are already implemented in a specific way.
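One way to sketch this dispatch is to put a scheme prefix inside the encoded cursor, so the server can tell which underlying strategy a cursor represents. The prefix convention here is an assumption for illustration, not Slack's actual wire format:

```python
import base64

def decode_cursor(token):
    """Decode an opaque cursor and split off the scheme prefix.

    A prefix such as "user" or "offset" tells the endpoint which
    underlying fetch strategy (keyset vs. LIMIT/OFFSET) to use.
    """
    kind, _, value = base64.b64decode(token).decode("utf-8").partition(":")
    return kind, value
```

Decoding "b2Zmc2V0OjEw" yields ("offset", "10"), while "dXNlcjpXMDdRQ1JQQTQ=" yields ("user", "W07QCRPA4"), so both endpoint styles share one client-facing interface.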

Both the offset and cursor-based pagination strategies we discussed assumed that the data we were paginating through lived within the same table. As the size of our datasets continues to grow, this won’t always be something we can rely on. With this new pagination scheme, we’re able to handle these cases by encoding multiple cursors within a single cursor returned to the client.

For example, teams on Enterprise Grid contain information at the organization level on one shard and the team level on another shard. If we wanted to paginate through a dataset that spanned both of those shards, we could return an encoded version of: team_last_id:<last-id-on-team-shard>,org_last_id:<last-id-on-organization-shard>. We can then use those two ids as cursors when querying the respective shards.
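Packing and unpacking such a composite cursor is straightforward. This is a sketch under the same assumptions as above (field names match the example format; the helpers are hypothetical):

```python
import base64

def encode_shard_cursor(team_last_id, org_last_id):
    """Pack per-shard positions into a single opaque cursor."""
    raw = f"team_last_id:{team_last_id},org_last_id:{org_last_id}"
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")

def decode_shard_cursor(token):
    """Unpack the composite cursor back into per-shard positions."""
    raw = base64.b64decode(token).decode("utf-8")
    fields = dict(part.split(":", 1) for part in raw.split(","))
    return fields["team_last_id"], fields["org_last_id"]
```

Each decoded id can then seed the keyset query against its respective shard, and the two new positions are re-packed into the next cursor.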

We’re looking forward to being able to continue to improve the performance and reliability of our APIs with this new cursor-based pagination scheme. While everyone’s pagination needs are different, we hope this can serve as a reference point for other developers creating or evolving their APIs. As a final note, hindsight is always 20/20, but if you’re wondering whether or not you should paginate an endpoint that returns a list of data, no matter how small the dataset seems, we recommend you do 😅.

If you want more information on how to develop against the new pagination, check out our developer documentation.

Want to help build APIs to power the future of work? Come join us!