Rails 6 has added support to provide optimizer hints.

What is Optimizer Hints?

Many relational database management systems (RDBMS) have a query optimizer. The job of the query optimizer is to determine the most efficient and fast plan to execute a given SQL query. Query optimizer has to consider all possible query execution plans before it can determine which plan is the optimal plan for executing the given SQL query and then compile and execute that query.

An optimal plan is chosen by the query optimizer by calculating the cost of each possible plans. Typically, when the number of tables referenced in a join query increases, then the time spent in query optimization grows exponentially which often affects the system’s performance. The fewer the execution plans the query optimizer needs to evaluate, the lesser time is spent in compiling and executing the query.

As an application designer, we might have more context about the data stored in our database. With the contextual knowledge about our database, we might be able to choose a more efficient execution plan than the query optimizer.

This is where the optimizer hints or optimizer guidelines come into picture.

Optimizer hints allow us to control the query optimizer to choose a certain query execution plan based on the specific criteria. In other words, we can hint the optimizer to use or ignore certain optimization plans using optimizer hints.

Usually, optimizer hints should be provided only when executing a complex query involving multiple table joins.

Note that the optimizer hints only affect an individual SQL statement. To alter the optimization strategies at the global level, there are different mechanisms supported by different databases. Optimizer hints provide finer control over other mechanisms which allow altering optimization plans by other means.

Optimizer hints are supported by many databases such as MySQL, PostgreSQL with the help of pg_hint_plan extension, Oracle, MS SQL, IBM DB2, etc. with varying syntax and options.

Optimizer Hints in Rails 6

Before Rails 6, we have to execute a raw SQL query to use the optimizer hints.

query = "SELECT /*+ JOIN_ORDER(articles, users) MAX_EXECUTION_TIME(60000) */ articles.* FROM articles INNER JOIN users ON users.id = articles.user_id WHERE (published_at > '2019-02-17 13:15:44') ".squish ActiveRecord::Base.connection.execute(query)

In the above query, we provided two optimizer hints to MySQL .

/*+ HINT_HERE ANOTHER_HINT_HERE ... */

Another approach to use optimizer hints prior to Rails 6 is to use a monkey patch like this.

In Rails 6, using optimizer hints is easier.

The same example looks like this in Rails 6.

Article .joins(:user) .where("published_at > ?", 2.months.ago) .optimizer_hints( "JOIN_ORDER(articles, users)", "MAX_EXECUTION_TIME(60000)" )

This produces the same SQL query as above but the result is of type ActiveRecord::Relation.

In PostgreSQL (using the pg_hint_plan extension), the optimizer hints have a different syntax.

Article .joins(:user) .where("published_at > ?", 2.months.ago) .optimizer_hints("Leading(articles users)", "SeqScan(articles)")

Please checkout the documentation of each database separately to learn the support and syntax of optimizer hints.

To learn more, please checkout this PR which introduced the #optimization_hints method to Rails 6.

Bonus example: Using optimizer hints to speedup a slow SQL statement in MySQL

Consider that we have articles table with some indexes.

class CreateArticles < ActiveRecord::Migration[6.0] def change create_table :articles do |t| t.string :title, null: false t.string :slug, null: false t.references :user t.datetime :published_at t.text :description t.timestamps t.index :slug, unique: true t.index [:published_at] t.index [:slug, :user_id] t.index [:published_at, :user_id] t.index [:title, :slug] end end
end

Let’s try to fetch all the articles which have been published in the last 2 months.

>> Article.joins(:user).where("published_at > ?", 2.months.ago)
# Article Load (10.5ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:38:18.647296')
=> #<ActiveRecord::Relation [#<Article id: 20, title: "Article 20", slug: "article-20", user_id: 1, ...]>

Let’s use EXPLAIN to investigate why it is taking 10.5ms to execute this query.

>> Article.joins(:user).where("published_at > ?", 2.months.ago).explain
# Article Load (13.9ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:39:05.380577')
=> # EXPLAIN for: SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:39:05.380577')
# +--------+----------+----------------+-----------+------+----------+-------+
# | select | table | possible_keys | key | rows | filtered | Extra |
# | _type | | | | | | |
# +--------+----------+----------------+-----------+------+----------+-------+
# | SIMPLE | users | PRIMARY | PRIMARY | 2 | 100.0 | Using |
# | | | | | | | index |
# +--------+----------+----------------+-----------+------+----------+-------+
# | SIMPLE | articles | index | index | 9866 | 10.0 | Using |
# | | | _articles | _articles | | | where |
# | | | _on_user_id, | _on | | | |
# | | | index | _user_id | | | |
# | | | _articles | | | | |
# | | | _on | | | | |
# | | | _published_at, | | | | |
# | | | index | | | | |
# | | | _articles | | | | |
# | | | _on | | | | |
# | | | _published_at | | | | |
# | | | _and_user_id | | | | |
# +--------+----------+----------------+-----------+------+----------+-------+

According to the above table, it appears that the query optimizer is considering users table first and then the articles table.

The rows column indicates the estimated number of rows the query optimizer must examine to execute the query.

The filtered column indicates an estimated percentage of table rows that will be filtered by the table condition.

The formula rows x filtered gives the number of rows that will be joined with the following table.

Also,

  • For users table, the number of rows to be joined with the following table is 2 x 100% = 2,
  • For articles table, the number of rows to be joined with the following table is 500 * 7.79 = 38.95.

Since the articles tables contain more records which references very few records from the users table, it would be better to consider the articles table first and then the users table.

We can hint MySQL to consider the articles table first as follows.

>> Article.joins(:user).where("published_at > ?", 2.months.ago).optimizer_hints("JOIN_ORDER(articles, users)")
# Article Load (2.2ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:54:06.230651')
=> #<ActiveRecord::Relation [#<Article id: 20, title: "Article 20", slug: "article-20", user_id: 1, ...]>

Note that it took 2.2ms now to fetch the same records by providing JOIN_ORDER(articles, users) optimization hint.

Let’s try to EXPLAIN what changed by using this JOIN_ORDER(articles, users) optimization hint.

>> Article.joins(:user).where("published_at > ?", 2.months.ago).optimizer_hints("JOIN_ORDER(articles, users)").explain
# Article Load (4.1ms) SELECT /*+ JOIN_ORDER(articles, users) */ `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:55:24.335152')
=> # EXPLAIN for: SELECT /*+ JOIN_ORDER(articles, users) */ `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:55:24.335152')
# +--------+----------+----------------+-----------+------+----------+--------+
# | select | table | possible_keys | key | rows | filtered | Extra |
# | _type | | | | | | |
# +--------+----------+----------------+-----------+------+----------+--------+
# | SIMPLE | articles | index | index | 769 | 100.0 | Using |
# | | | _articles | _articles | | | index |
# | | | _on_user_id, | _on | | | condi |
# | | | index | _publish | | | tion; |
# | | | _articles | ed_at, | | | Using |
# | | | _on | | | | where |
# | | | _published_at, | | | | |
# | | | index | | | | |
# | | | _articles | | | | |
# | | | _on | | | | |
# | | | _published_at | | | | |
# | | | _and_user_id | | | | |
# +--------+----------+----------------+-----------+------+----------+--------+
# | SIMPLE | users | PRIMARY | PRIMARY | 2 | 100.0 | Using |
# | | | | | | | index |
# +--------+----------+----------------+-----------+------+----------+--------+

The result of the EXPLAIN query shows that the articles table was considered first and then the users table as expected. We can also see that the index_articles_on_published_at index key was considered from the possible keys to execute the given query. The filtered column for both tables shows that the number of filtered rows was 100% which means no filtering of rows occurred.

We hope this example helps in understanding how to use #explain and #optimization_hints methods in order to investigate and debug the performance issues and then fixing it.