Scaling Relational Databases with Apache Spark SQL and DataFrames
Spark is now well established as a high-performance cluster-computing framework built around the MapReduce approach to processing data. It is also possible both to query data in Spark with SQL and to represent it in a relational style using DataFrames. This article explains how, then follows up with a hands-on tutorial.
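As a quick taste of the two approaches the article covers, here is a minimal PySpark sketch that runs the same relational query through both the DataFrame API and Spark SQL. The table name, column names, and sample rows are hypothetical, chosen only for illustration:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session.
spark = SparkSession.builder.appName("relational-demo").getOrCreate()

# Hypothetical sample data representing a small relational table.
df = spark.createDataFrame(
    [(1, "Alice", 34), (2, "Bob", 29), (3, "Carol", 41)],
    ["id", "name", "age"],
)

# Approach 1: the DataFrame API, with relational-style operations.
df.filter(df.age > 30).select("name", "age").show()

# Approach 2: Spark SQL, after registering the DataFrame as a temp view.
df.createOrReplaceTempView("users")
spark.sql("SELECT name, age FROM users WHERE age > 30").show()

spark.stop()
```

Both queries produce the same result; which style to use is largely a matter of taste and of how much of the logic already exists as SQL.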