I’ll format this into a more detailed presentation later (so feel free to check back and bug me if I don’t get around to it), but here are some immediate things you may be interested in if you saw my Austin Scala Enthusiasts Meetup presentation…
Here’s a link to the PDF of the slides I talked to.
Running Zeppelin via a Docker container
docker run --name zeppelin -p 8080:8080 -p 4040:4040 \
  -v $HOME/spark/data:/data \
  -v $HOME/spark/logs:/logs \
  -v $HOME/spark/notebook:/notebook \
  -e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
  -e ZEPPELIN_LOG_DIR='/logs' \
  -e ZEPPELIN_INTP_JAVA_OPTS="-Dspark.driver.memory=4G" \
  -e ZEPPELIN_INTP_MEM="-Xmx4g" \
  -d apache/zeppelin:0.9.0 /zeppelin/bin/zeppelin.sh
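Once the container is up, the notebook UI is at http://localhost:8080. As a quick sanity check, you can paste something like this into a notebook paragraph (Zeppelin’s %spark interpreter predefines spark and sc; the numbers are just for illustration):

%spark
// Sum 1..100 using the SparkSession Zeppelin provides
val df = spark.range(1, 101).toDF("n")
df.selectExpr("sum(n) as total").show()   // expect 5050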
Running Spark via a Docker container
docker run --name spark \
  -v $HOME/spark/data:/data \
  -p 4040:4040 \
  -it mesosphere/spark bin/spark-shell
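This drops you straight into a REPL with spark and sc already bound. For example, to confirm the mounted volume is visible from inside the container (assuming you’ve put a file at $HOME/spark/data/sample.txt on the host; the filename is just for illustration):

scala> sc.textFile("/data/sample.txt").count()   // /data is the volume mounted above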
For a basic Spark SBT project
build.sbt:
import Dependencies._

ThisBuild / scalaVersion     := "2.12.11"
ThisBuild / version          := "0.1.0-SNAPSHOT"
ThisBuild / organization     := "com.example"
ThisBuild / organizationName := "Meetup Spark Example"
ThisBuild / scalacOptions   ++= Seq("-language:higherKinds")

lazy val root = (project in file("."))
  .settings(
    name := "SparkCatScratch",
    libraryDependencies ++= Seq(scalaTest % Test, sparkCore, sparkSQL, catsCore, catsFree, catsMTL)
  )

initialCommands in console :=
  s"""
     |import cats._, cats.data._, cats.implicits._, org.apache.spark.sql.SparkSession
     |val spark = SparkSession.builder().master("local").getOrCreate
     |""".stripMargin

cleanupCommands in console := "spark.close"
project/Dependencies.scala:
import sbt._

object Dependencies {
  val sparkVersion = "2.4.5"
  val catsVersion  = "2.0.0"

  lazy val scalaTest = "org.scalatest"    %% "scalatest"     % "3.0.8"
  lazy val sparkCore = "org.apache.spark" %% "spark-core"    % sparkVersion
  lazy val sparkSQL  = "org.apache.spark" %% "spark-sql"     % sparkVersion
  lazy val catsCore  = "org.typelevel"    %% "cats-core"     % catsVersion
  lazy val catsFree  = "org.typelevel"    %% "cats-free"     % catsVersion
  lazy val catsMTL   = "org.typelevel"    %% "cats-mtl-core" % "0.7.0"
}
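With those two files in place, a minimal program that exercises both Spark and Cats looks something like the sketch below (the object name and the numbers are illustrative, not from the talk):

import cats.implicits._
import org.apache.spark.sql.SparkSession

object Main extends App {
  val spark = SparkSession.builder().master("local").getOrCreate()
  val sc    = spark.sparkContext

  // Sum the values and count the elements by collecting (value, 1)
  // pairs and folding them with the tuple Monoid from Cats
  val pairs = sc.parallelize(1 to 100).map(n => (n, 1L)).collect().toList
  val (total, count) = pairs.combineAll   // (5050, 100)
  println(s"total=$total, count=$count")

  spark.close()
}

Combining on the driver side after collect() keeps Cats’ implicit Monoid instances out of Spark’s closure serialization.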
Starting Spark in the SBT console:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().master("local").getOrCreate
val sc = spark.sparkContext

(With the initialCommands/cleanupCommands settings above, sbt console runs the first two lines for you and closes the session on exit.)
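Either way, Spark and Cats end up in scope together, so you can sanity-check the session with one-liners like:

scala> spark.range(1, 101).selectExpr("sum(id) as total").show()
+-----+
|total|
+-----+
| 5050|
+-----+

scala> List(Some(1), None, Some(3)).combineAll
res1: Option[Int] = Some(4)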