This article walks you through building Apache Spark with support for the Hive SQL execution engine and YARN. Once built, it should be ready to run on your Hadoop cluster.
In this easy-to-follow tutorial, learn the basics of Spark DataFrames, how they're built on RDDs, and what they let you do in Scala. They're a similar abstraction to pandas DataFrames or R's data frames.
This post dives into the details of the Spark shuffle and what it means for you when using Apache Spark to analyze data on a cluster.