[How-To] Run SparkR with RStudio
With the latest release of Apache Spark 1.4.0, SparkR which was a third-party package by AMP Labs, got integrated officially with the main distribution. This update is a delight for Data Scientists and Analysts who are comfortable with their R ecosystem and still want to utilize the speed and performance of Spark.
In this article, I’ll walk you through creating an Ubuntu instance from scratch, installing R, RStudio, and Spark, configuring SparkR with RStudio, and concluding with a quick example of SparkR code running from RStudio. I’ll be using Google Cloud (GC) for this walkthrough; however, a similar process applies to Amazon Web Services or even a fresh Ubuntu installation on your local system.
Create a new Instance:
- Log on to your Google Cloud Console
- Click on Compute from the left-side Navigation Bar
- Click on the blue New Instance button
Describe the Virtual Machine (VM) specifications:
- Create a new instance with hardware specs of your choice
- Make sure you choose Ubuntu 15.04 as your Operating System
- Tick Allow HTTP traffic and Allow HTTPS traffic
Configure Network Ports:
- Once you’ve created your instance, you’ll see it in the Compute Engine page
- Click on default as highlighted below
- Click on the blue Add firewall rule button
- In the new window, enter exactly tcp:4039-60000 as the allowed protocol and port range, then save
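If you prefer the terminal to the console UI, the same firewall rule can be created with the gcloud CLI. This is a sketch: the rule name "sparkr-ports" is an arbitrary choice of mine, not from the article, and the actual creation command is left commented out.

```shell
# Equivalent firewall rule via the gcloud CLI (rule name is an arbitrary choice)
RULE_NAME="sparkr-ports"
ALLOW="tcp:4039-60000"
# gcloud compute firewall-rules create "$RULE_NAME" --allow "$ALLOW" --network default
echo "rule $RULE_NAME allows $ALLOW"
```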
Open SSH terminal:
- From the Compute Engine page, click on SSH on the right most side of your window
In case your terminal isn’t configured to BASH, then run this command:
sudo chsh -s /bin/bash
The GC instance is now setup. We will now proceed with our SparkR installation.
Upgrade sources to Install R
sudo nano /etc/apt/sources.list
Add the following to the end
deb http://cran.cnr.berkeley.edu/bin/linux/ubuntu/ vivid/
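After adding the CRAN line, apt-get update may warn that the new repository is unsigned. CRAN’s Ubuntu instructions fix this by importing the CRAN signing key; the key ID below is the one those instructions list, but verify it on cran.r-project.org before trusting it.

```shell
# Import the CRAN Ubuntu signing key so apt trusts the new repository
# (key ID per CRAN's Ubuntu instructions; verify on cran.r-project.org)
CRAN_KEY="E084DAB9"
# sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$CRAN_KEY"
echo "CRAN Ubuntu signing key: $CRAN_KEY"
```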
sudo apt-get update
sudo apt-get install r-base
Download and unpack Spark 1.4.0:
Download spark-1.4.0-bin-hadoop2.6.tgz from the Apache Spark downloads page, then unpack it and rename the extracted folder:
tar -xvzf spark-1.4.0-bin-hadoop2.6.tgz
mv spark-1.4.0-bin-hadoop2.6 spark
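A quick way to confirm the Spark tree unpacked correctly is to check for the two directories the rest of this guide relies on. The check assumes you renamed the folder to ~/spark as above; `check_dir` is just a helper for this snippet.

```shell
# Sanity-check the unpacked Spark tree (assumes it was renamed to ~/spark)
check_dir() {
  if [ -d "$1" ]; then echo "found: $1"; else echo "missing: $1"; fi
}
check_dir "$HOME/spark/bin"          # spark-submit and the sparkR shell live here
check_dir "$HOME/spark/R/lib/SparkR" # the SparkR package we will link into R later
```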
Install RStudio Server (download rstudio-server-0.99.447-amd64.deb from the RStudio Server download page first):
sudo apt-get install gdebi-core
sudo gdebi rstudio-server-0.99.447-amd64.deb
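The .deb is not in the apt repositories, so it has to be fetched from RStudio’s site. The download URL below is an assumption typical for that release (verify it on rstudio.com); `rstudio-server verify-installation` is RStudio Server’s built-in health check.

```shell
# Fetch and install RStudio Server 0.99.447
# (download URL is an assumption; verify on rstudio.com)
DEB="rstudio-server-0.99.447-amd64.deb"
# wget "https://download2.rstudio.org/${DEB}"
# sudo gdebi -n "$DEB"
# sudo rstudio-server verify-installation   # built-in health check
echo "package to install: ${DEB}"
```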
Install any package to create a “user” library:
install.packages("magrittr") ## You can install any package, but we will be using this one in our SparkR example.
Create a soft link to the Spark folder in R’s user library:
ln -s /home/<username>/spark/R/lib/SparkR /home/<username>/R/x86_64-pc-linux-gnu-library/3.2
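To make the effect of this step concrete, here is the same soft-link operation demonstrated with stand-in temporary directories (the real paths are the ones in the command above). R treats anything in its user library directory as an installed package, which is why the link is enough.

```shell
# Demonstrate the soft-link step with stand-in directories
SPARK_RLIB="$(mktemp -d)/SparkR"   # stands in for /home/<user>/spark/R/lib/SparkR
R_USER_LIB="$(mktemp -d)"          # stands in for ~/R/x86_64-pc-linux-gnu-library/3.2
mkdir -p "$SPARK_RLIB"
ln -s "$SPARK_RLIB" "$R_USER_LIB/SparkR"
# R now sees SparkR as an installed package in this library
ls -l "$R_USER_LIB/SparkR"
```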
Install Java (required by Spark):
sudo apt-get install openjdk-7-jdk
Launch RStudio by pointing your web browser at http://<instance-external-IP>:8787 (8787 is RStudio Server’s default port):
Within RStudio, run the following to test the setup:
library(SparkR)
Sys.setenv('SPARKR_SUBMIT_ARGS' = '"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName = "SparkR-Flights-example")
sqlContext <- sparkRSQL.init(sc)
# You should get this output: Java ref type org.apache.spark.sql.SQLContext id 1
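If anything goes wrong inside RStudio, it helps to rule out the Spark installation itself. The same setup can be exercised from the sparkR shell bundled with Spark; the path below assumes Spark was unpacked to ~/spark as in the earlier steps.

```shell
# Optional check outside RStudio: locate the bundled sparkR shell
# (path assumes Spark was unpacked and renamed to ~/spark)
SPARKR_BIN="$HOME/spark/bin/sparkR"
if [ -x "$SPARKR_BIN" ]; then
  echo "sparkR shell ready: $SPARKR_BIN"
  # "$SPARKR_BIN" --packages com.databricks:spark-csv_2.10:1.0.3
else
  echo "sparkR shell not found at $SPARKR_BIN"
fi
```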
Make sure you replace the username placeholder in the paths above with your own Ubuntu username. Please leave a comment if you have any trouble with the process.