Initial impressions of Scala from a Java and Python data engineer

By Matt Hagy

I’ve been learning Scala over the last two months for a future role. Previously, I only had a passing understanding of Scala code from reading the Spark source code. I have historically developed data engineering workflows using Java Spark and MapReduce as well as Python PySpark.

Going into Scala, I had high expectations based on the strengths highlighted on scala-lang.org. So far, I’m a big fan of Scala and hope to see it replace Java and Python in my future data engineering work.

Here are highlights of my initial impressions…

I find Scala to be the most concise language I’ve used professionally. A small amount of Scala code can carry a lot of information. I’m particularly a fan of the concise case class definitions.
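To illustrate that brevity, here is a minimal sketch of a case class. The `Event` type and its fields are made up for this example. A one-line definition gets a constructor, structural equality, a readable `toString`, and a `copy` method.

```scala
// Hypothetical record type, invented for illustration.
final case class Event(userId: Long, action: String, timestampMs: Long)

object CaseClassDemo {
  val login: Event = Event(42L, "login", 1000L)

  // copy builds a modified version; the original value is left untouched.
  val logout: Event = login.copy(action = "logout", timestampMs = 2000L)

  def main(args: Array[String]): Unit = {
    println(login)                               // generated toString
    println(login == Event(42L, "login", 1000L)) // generated structural equality
    println(logout.action)
  }
}
```

The equivalent Java class of that era would need hand-written fields, a constructor, accessors, `equals`, `hashCode`, and `toString`.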

Similar to Python, I find Scala to be a highly literate language that can be read in a structure similar to English. Python is a little stronger here, with a more basic vocabulary, but I still find Scala to be a highly readable language in contrast to, say, Java and C/C++.

I like the combination of requiring compile-time types to catch bugs and having the compiler automatically figure out the types for many variables and functions. It also adds to code brevity.
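A small sketch of how this looks in practice. The names and data are made up for the example. Types are checked at compile time even though almost none are written out.

```scala
object InferenceDemo {
  // Inferred as Map[String, Int] from the literal.
  val wordCounts = Map("spark" -> 3, "scala" -> 5, "java" -> 2)

  // Parameter types must be declared; the Int return type is inferred.
  def totalCount(counts: Map[String, Int]) = counts.values.sum

  // Inferred as List[String].
  val frequentWords = wordCounts.filter { case (_, n) => n >= 3 }.keys.toList.sorted

  // A mismatch is rejected by the compiler, for example:
  //   val n: Int = "three"   // does not compile

  def main(args: Array[String]): Unit = {
    println(totalCount(wordCounts))
    println(frequentWords)
  }
}
```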

In general, I’m a big fan of making all data immutable to cut down on the complexities of state modification. I first encountered persistent data structures in Clojure and I always thought they were interesting. I like the idea of minimizing memory usage and accelerating the creation of modified versions through persistent data structures.
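As a minimal sketch of what this means with the standard immutable collections (the values are arbitrary): "modifying" a collection returns a new one, and the original stays valid. For `List`, the new value visibly reuses the old one rather than copying it.

```scala
object ImmutableDemo {
  val base: Vector[Int] = Vector(1, 2, 3)

  // Both operations return new vectors; base itself never changes.
  val appended: Vector[Int] = base :+ 4
  val updated: Vector[Int] = base.updated(0, 100)

  // Prepending to a List creates one new cell that points at the existing list.
  val tail: List[Int] = List(2, 3)
  val extended: List[Int] = 1 :: tail

  def main(args: Array[String]): Unit = {
    println(base)                  // still Vector(1, 2, 3)
    println(appended)
    println(updated)
    println(extended.tail eq tail) // same object in memory, not a copy
  }
}
```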

I’m used to the concepts of covariance and contravariance from Java’s type system, but Scala’s types seem to be even more powerful than Java’s. I’m still learning the type system and look forward to mastering it.
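One difference I can sketch already: Java expresses variance at the use site with wildcards (`? extends`, `? super`), while Scala lets a type declare its variance once, at the definition. The `Animal`, `Dog`, `Box`, and `Printer` types below are hypothetical, invented for this example.

```scala
object VarianceDemo {
  sealed trait Animal { def name: String }
  final case class Dog(name: String) extends Animal

  // +A: covariant. A Box[Dog] is accepted where a Box[Animal] is expected.
  final case class Box[+A](value: A)

  // -A: contravariant. A Printer[Animal] is accepted where a Printer[Dog] is expected.
  trait Printer[-A] { def show(a: A): String }

  val animalPrinter: Printer[Animal] = new Printer[Animal] {
    def show(a: Animal): String = s"animal:${a.name}"
  }

  val dogBox: Box[Dog] = Box(Dog("Rex"))
  val animalBox: Box[Animal] = dogBox          // allowed by covariance
  val dogPrinter: Printer[Dog] = animalPrinter // allowed by contravariance

  def main(args: Array[String]): Unit =
    println(dogPrinter.show(dogBox.value))
}
```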

I’ve used pattern matching in my limited work with Common Lisp and Clojure and always wanted to use it in other languages. I’m so happy that Scala includes this feature and I’ve enjoyed using it to make my code simpler. I like how this behavior can be extended in companion objects to allow for custom pattern matching.
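Here is a minimal sketch of both ideas. All of the types are hypothetical, invented for the example. Case classes can be matched out of the box, and a companion object can define `unapply` to make other values matchable too, in this case a raw `String`.

```scala
object PatternDemo {
  sealed trait Shape
  final case class Circle(radius: Double) extends Shape
  final case class Rect(width: Double, height: Double) extends Shape

  // Built-in matching: case classes can be destructured directly.
  def area(shape: Shape): Double = shape match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  class Email(val user: String, val domain: String)

  object Email {
    // Custom extractor: lets a raw String be matched as Email(user, domain).
    def unapply(raw: String): Option[(String, String)] = raw.split("@") match {
      case Array(user, domain) if user.nonEmpty && domain.nonEmpty => Some((user, domain))
      case _                                                       => None
    }
  }

  def domainOf(raw: String): String = raw match {
    case Email(_, domain) => domain
    case _                => "unknown"
  }

  def main(args: Array[String]): Unit = {
    println(area(Rect(2.0, 3.0)))
    println(domainOf("matt@example.com"))
    println(domainOf("not-an-email"))
  }
}
```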

Seems to work well from my perspective.

I think I simply don’t understand these powerful concepts yet. I will focus on learning them better soon.

In learning about Scala from the community, I regularly encounter developers who criticize Scala for not being functional enough. Personally, I don’t yet have an opinion here. I am hoping Scala can further my proficiency with functional programming so I’m ready to move to Haskell or Eta if those languages ever become relevant for my work.

I love the JVM and think it’s solid engineering infrastructure. Further, I’m glad that I can access the massive Java ecosystem from Scala.
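A small sketch of what that access looks like, using only classes from the Java standard library (the variable names and values are arbitrary). Java classes are imported and called like any Scala class, with no wrapper layer.

```scala
import java.time.LocalDate
import java.util.{ArrayList => JArrayList}

object JavaInteropDemo {
  // java.time is used directly.
  val start: LocalDate = LocalDate.of(2020, 1, 31)
  val nextDay: LocalDate = start.plusDays(1)

  // Java collections can be instantiated and mutated as they would be in Java.
  val names = new JArrayList[String]()
  names.add("spark")
  names.add("hadoop")

  def main(args: Array[String]): Unit = {
    println(nextDay)
    println(names.size)
  }
}
```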

I like this Lispy idea of having everything in the language be an expression that returns a value. In practice, it seems to simplify my code.
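A minimal sketch, with made-up function names and a made-up default port: `if`, `match`, and `try` all produce values, so results can be bound directly instead of being assigned inside branches.

```scala
object ExpressionDemo {
  def classify(n: Int): String = {
    // if/else is an expression, so its result can be bound to a val.
    val sign = if (n < 0) "negative" else if (n == 0) "zero" else "positive"
    // match is an expression too.
    val parity = (n % 2) match {
      case 0 => "even"
      case _ => "odd"
    }
    s"$sign $parity"
  }

  // try/catch also yields a value; 8080 is an arbitrary fallback for this example.
  def parsePort(raw: String): Int =
    try raw.toInt
    catch { case _: NumberFormatException => 8080 }

  def main(args: Array[String]): Unit = {
    println(classify(4))
    println(classify(-3))
    println(parsePort("abc"))
  }
}
```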

As someone who’s had to use Pair numerous times in Java, I’m glad Scala has built-in support for working with typed tuples.
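For illustration, a short sketch with arbitrary names and data: tuples have literal syntax, carry the type of each element, and can be destructured in a `val` or in a `case` clause.

```scala
object TupleDemo {
  // A typed pair with literal syntax; no Pair class needed.
  val pair: (String, Int) = ("scala", 5)

  // Destructuring binds each element to its own name.
  val (word, count) = pair

  // Returning two values from a function.
  def minMax(xs: Seq[Int]): (Int, Int) = (xs.min, xs.max)

  // Tuples combine naturally with collections: sum the values per key.
  val totals: Map[String, Int] =
    Seq(("a", 1), ("b", 2), ("a", 3))
      .groupBy(_._1)
      .map { case (key, pairs) => (key, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    println(s"$word -> $count")
    println(minMax(Seq(3, 1, 2)))
    println(totals)
  }
}
```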

Over the years I’ve come to appreciate using a powerful IDE for development and I’m glad to see IntelliJ has robust Scala support.

I’ve enjoyed learning Scala and look forward to using it in my future work. I will update this post as I learn more and develop more opinions.

I hypothesize that data engineers and data scientists will increasingly use Scala in place of Java and Python for complex ETL, analysis, and applying machine learning at scale.