I have yet to find a good project template for Apache Spark, and getting the setup right has been a process I keep repeating. In this tutorial, I walk through a simple project template that I created to help others get started with Apache Spark in Scala.
This article walks you through building Apache Spark with support for the Hive SQL execution engine as well as YARN. Once built, it should be ready to run on your Hadoop cluster.
This introductory tutorial walks you through the basic RDD (Resilient Distributed Dataset) abstraction in Spark, with code samples in both Scala and Python (PySpark). We'll answer the question: what is an RDD?