BlazingSQL — The GPU SQL Engine now runs over 20X Faster than Apache Spark!

By Rodrigo Aramburu

A few weeks ago we were very excited to show our first benchmark on an end-to-end workload.

TL;DR: you can check out the full article here.

The key takeaways from the previous article were the following:

  • We ran an end-to-end analytics workload:
    Data Lake → ETL/Feature Engineering → XGBoost Training
  • We built two price-equivalent clusters on GCP, one for Apache Spark and another for BlazingSQL
  • BlazingSQL ran the ETL phase of this workload 5x faster than Apache Spark

Today we are even more excited to announce our latest release with dramatic performance improvements!

We are now running over 20x faster than Apache Spark on the exact same workload we ran in the previous demo.

These improvements are dramatic, but they are attributable to two main changes: a move to newer GPUs and a new execution kernel in the engine.

We are now using GCP’s T4 GPUs.


These are a relatively new entry-level class of server GPUs that deliver phenomenal performance at a fraction of the price.

In our original demo, we ran a V100 GPU against 8 CPU nodes. The new T4 GPUs cut our cost in half, which meant we reduced the Apache Spark cluster to 4 CPU nodes in order to maintain price parity. Even with the reduced GPU memory, the whole workload ran significantly faster.

The majority of the performance improvement came from an internal engine project. In addition to our roadmap features, our engineers wanted to work on a new GPU execution kernel built for GPU DataFrames (GDFs). We call it the “SIMD Expression Interpreter”.

The SIMD Expression Interpreter will need a long post of its own describing its architecture, how it works, and why it yields such a performance boost, but we want to share some early details now.

SIMD Expression Interpreter produces these performance improvements through a few key steps:

  1. The interpreter is a virtual machine that can receive multiple inputs. These inputs can be GDF columns, literals, and in the near future, functions.
  2. When loading those inputs, SIMD Expression Interpreter optimizes the allocation of registers on the GPU. This improves thread occupancy on the GPU and increases performance.
  3. The virtual machine then processes these inputs and produces multiple outputs simultaneously. For example, let’s say you have the following SQL query:
     SELECT colA + colB * 10, sin(colA) - cos(colD) FROM tableA
    Previously, BlazingSQL would convert this into 5 operations (+, *, sin, cos, -), each executed individually, to produce the two outputs. With SIMD Expression Interpreter, it takes the three inputs (colA, colB, colD), performs all 5 operations in a single kernel execution, and produces both outputs. Note this also means we only have to load colA once, as opposed to twice.
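To make the steps above more concrete, here is a minimal CPU-side sketch in Python of the general idea. It is not BlazingSQL's actual implementation: the instruction format, the register numbering, and the names `PROGRAM`, `OUTPUT_REGISTERS`, and `run_fused` are all hypothetical, and a plain Python loop over rows stands in for the GPU threads. It only illustrates how one pass can load each input once, run all 5 operations, and emit both outputs.

```python
import math
import operator

# Hypothetical stand-in for a table of GPU DataFrame columns.
table_a = {
    "colA": [0.0, 1.0, 2.0],
    "colB": [1.0, 2.0, 3.0],
    "colD": [0.0, 0.5, 1.0],
}

# Hypothetical instruction format: (opcode, destination register, operands...).
# Together these encode: colA + colB * 10  and  sin(colA) - cos(colD).
PROGRAM = [
    ("load", 0, "colA"),   # each input column is loaded exactly once
    ("load", 1, "colB"),
    ("load", 2, "colD"),
    ("const", 3, 10.0),    # literal input
    ("mul", 4, 1, 3),      # colB * 10
    ("add", 5, 0, 4),      # colA + colB * 10      -> first output
    ("sin", 6, 0),         # sin(colA), reusing the already-loaded colA
    ("cos", 7, 2),         # cos(colD)
    ("sub", 8, 6, 7),      # sin(colA) - cos(colD) -> second output
]
OUTPUT_REGISTERS = [5, 8]

BINARY = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
UNARY = {"sin": math.sin, "cos": math.cos}


def run_fused(table, program, output_registers):
    """Evaluate every output expression in one pass over the rows."""
    n_rows = len(next(iter(table.values())))
    n_regs = 1 + max(instr[1] for instr in program)
    outputs = [[] for _ in output_registers]
    loads = 0  # counts column reads, to show colA is read once per row

    for row in range(n_rows):  # on a GPU, each row would map to a thread
        regs = [0.0] * n_regs
        for op, dst, *args in program:
            if op == "load":
                regs[dst] = table[args[0]][row]
                loads += 1
            elif op == "const":
                regs[dst] = args[0]
            elif op in BINARY:
                regs[dst] = BINARY[op](regs[args[0]], regs[args[1]])
            else:
                regs[dst] = UNARY[op](regs[args[0]])
        for out, reg in zip(outputs, output_registers):
            out.append(regs[reg])
    return outputs, loads


outputs, loads = run_fused(table_a, PROGRAM, OUTPUT_REGISTERS)
print(outputs[0])  # values of colA + colB * 10
print(outputs[1])  # values of sin(colA) - cos(colD)
print(loads)       # 3 columns x 3 rows = 9 reads
```

In an unfused design, each of the 5 operations would be its own pass that reads its operands and writes an intermediate column, so colA would be read once for the addition and again for the sine. Here the intermediate values stay in the per-row register list, which is the role GPU registers play in the description above.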

Presently, SIMD Expression Interpreter supports BlazingSQL filtering and projections, so it impacts many of the most popular SQL queries.

What’s Coming

The BlazingDB team is hard at work on new features. We’re working on some pretty major feature releases that don’t just include performance improvements. Expect the following shortly:

  • String Support — we’re really hammering this out, should have something soon!
  • Distribution — first PoCs are out the door, still have a little ways to go.

As always, feel free to comment below with any questions, or learn more at our website. For daily updates and info, follow us on Twitter.