Spark 101 for Scala Users

A quick, hands-on intro to Spark for Scala users.

I’ll flesh this out into a more detailed write-up later (feel free to check back and bug me if I don’t get around to it), but if you saw my Austin Scala Enthusiasts Meetup presentation, here are some things you may want right away…

Here’s a link to the PDF of the slides from the talk.

Running Zeppelin via a Docker container

docker run --name zeppelin \
  -p 8080:8080 -p 4040:4040 \
  -v $HOME/spark/data:/data \
  -v $HOME/spark/logs:/logs \
  -v $HOME/spark/notebook:/notebook \
  -e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
  -e ZEPPELIN_LOG_DIR='/logs' \
  -e ZEPPELIN_INTP_JAVA_OPTS="-Dspark.driver.memory=4G" \
  -e ZEPPELIN_INTP_MEM="-Xmx4g" \
  -d apache/zeppelin:0.9.0 /zeppelin/bin/zeppelin.sh
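Once the container is up, the Zeppelin UI is at http://localhost:8080, and the Spark application UI appears at http://localhost:4040 once a Spark paragraph has run. Notebooks, logs, and data persist on the host under $HOME/spark thanks to the volume mounts.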

Running Spark via a Docker container

docker run --name spark -v $HOME/spark/data:/data -p 4040:4040 -it mesosphere/spark bin/spark-shell
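This lands you directly in spark-shell, which pre-defines a spark session and an sc context for you; anything you put under $HOME/spark/data on the host shows up as /data inside the container.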

For a basic Spark SBT project

build.sbt:

import Dependencies._

ThisBuild / scalaVersion     := "2.12.11"
ThisBuild / version          := "0.1.0-SNAPSHOT"
ThisBuild / organization     := "com.example"
ThisBuild / organizationName := "Meetup Spark Example"
ThisBuild / scalacOptions ++= Seq("-language:higherKinds")

lazy val root = (project in file("."))
  .settings(
    name := "SparkCatScratch",
    libraryDependencies ++= Seq(scalaTest % Test, sparkCore, sparkSQL, catsCore, catsFree, catsMTL)
  )

console / initialCommands :=
  s"""
    |import cats._, cats.data._, cats.implicits._, org.apache.spark.sql.SparkSession
    |val spark = SparkSession.builder().master("local").getOrCreate
    |""".stripMargin

console / cleanupCommands := "spark.close"
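With those two settings, running sbt console drops you into a REPL with the Cats imports and a local spark session already in scope, and cleanupCommands closes the session when you quit.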

project/Dependencies.scala:

import sbt._

object Dependencies {

  val sparkVersion = "2.4.5"
  val catsVersion = "2.0.0"

  lazy val scalaTest = "org.scalatest" %% "scalatest" % "3.0.8"
  lazy val sparkCore = "org.apache.spark" %% "spark-core" % sparkVersion
  lazy val sparkSQL = "org.apache.spark" %% "spark-sql" % sparkVersion
  lazy val catsCore = "org.typelevel" %% "cats-core" % catsVersion
  lazy val catsFree = "org.typelevel" %% "cats-free" % catsVersion
  lazy val catsMTL = "org.typelevel" %% "cats-mtl-core" % "0.7.0"
}

Starting Spark in the SBT console:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate
val sc = spark.sparkContext
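As a quick smoke test that the session is wired up, here's a minimal sketch (assuming the spark-sql dependency above): import the session's implicits for the encoders and run a tiny Dataset through it.

import spark.implicits._

// A tiny Dataset, just to verify the local session works end to end.
val ds = Seq(1, 2, 3).toDS()
ds.map(_ * 2).show()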

My “Instant Pot” Experience

Months ago, my husband and I were watching TV (I think it might have been CBS Sunday Morning) when a short segment came on about a new “must have” kitchen item: a 6-in-1 gadget that replaces your rice cooker, slow cooker, pressure cooker, and all those sorts of things. I told Bill that this gadget would be a great candidate for my Christmas list. Long story short: the device was one of the presents under the Christmas tree, and I figured I’d share my experiences and learnings so far.

Funny Bug of the Day (Java)

This took a little while to figure out!

Date startDate = new Date();
Date endDate = new Date(startDate.getTime() + (24 * 3600000 * 42));

This was expected to leave startDate as right now (Feb 20, 2013 5:17:10 PM) and endDate six weeks later (Apr 3, 2013 6:17:10 PM), but instead the end date was being computed as earlier than the start date… Feb 13, 2013, in fact!
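For the impatient, the culprit is 32-bit integer overflow: 24 * 3600000 * 42 is evaluated entirely as int arithmetic before it is ever added to the long timestamp. Here's the arithmetic and one possible fix, sketched in Scala (whose Int wraps exactly the way Java's int does):

import java.util.Date

// The intended value is 3,628,800,000 ms (six weeks), which exceeds
// Int.MaxValue (2,147,483,647) and wraps to -666,167,296 ms:
// roughly 7.7 days *before* the start date, hence Feb 13.
val wrapped: Int = 24 * 3600000 * 42   // -666167296

// Promoting the expression to Long with a single L suffix avoids the wrap.
val startDate = new Date()
val endDate   = new Date(startDate.getTime + 24L * 3600000 * 42)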