With its many useful built-in packages and its ease of use, R has become increasingly popular as a tool for data analysis. Users can run complex statistical analyses or machine learning algorithms on their data easily. However, not all is sunshine and rainbows: R stores all objects in memory. This means that you will need lots of RAM and computational power in order to analyze larger datasets.
I quickly ran into this memory limitation in R while playing around with some larger datasets that had about 4 million rows. In a bid to cut the long waiting times for something to process, I began looking for extra computational resources in the cloud. That’s when I stumbled across Louis Aslett’s website. He provides RStudio Server Amazon Machine Images (AMIs) that let you easily create an EC2 instance running RStudio. Of course, you can also set up an RStudio environment from scratch, but using an AMI makes the process easier for those who are not so Linux-savvy.
Steps to set up your RStudio environment on EC2:
1) Get an Amazon account. It can be the same account you use to shop on Amazon.com.
2) Log in to the Amazon AWS Console
3) Choose one of the AMIs available on Louis Aslett’s website based on your region
4) Choose an EC2 Instance Type. Note that only the t2.micro is free tier eligible. It’s best to get acquainted with the fees and resources available for each instance type before beginning. You only get 1 year of free use for the t2.micro and it only offers 1 GiB of memory. After you’re done choosing, proceed to Configure Security Group.
5) Make sure the security group has two rules: SSH on port 22 and HTTP on port 80. HTTP serves the RStudio login page in your browser, and SSH is what you will use to copy files to and from the instance. When you’re done, click Launch.
6) You might get a pop-up window at this stage asking you to select a key pair. It is important to create a key pair if you don’t already have one. Just enter a name for it and download it. This gives you the .pem file you’ll need to ssh into the server and access your files on the EC2 instance. After launching, click View Instances to see the instance you just started.
7) Select the instance that is running, and copy its Public DNS. If you paste this into your browser address bar, you should see the RStudio login page. The default username and password are both “rstudio”. Follow the instructions in the Welcome.R script to change your password, and then close the file without saving. Your RStudio environment is now up!
Now that you have RStudio set up on Amazon EC2, you will need to know how to move your files from your local environment into the cloud, and vice versa. This is where the key pair file you downloaded in step 6 comes into play.
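For readers who prefer the command line, the console steps above could also be sketched with the AWS CLI. This is only a rough sketch and not part of the walkthrough itself: it assumes the AWS CLI is installed and configured, that the key pair already exists, and that you are launching into your default VPC. The AMI ID, security group name, and IP range are hypothetical placeholders, and limiting both ports to your own IP address is my assumption rather than something the console steps require. By default the script only prints the commands it would run.

```shell
# Dry-run sketch of the console steps using the AWS CLI.
# Every value below is a hypothetical placeholder; replace before real use.
AMI_ID="${AMI_ID:-ami-xxxxxxxx}"          # the RStudio AMI ID for your region
KEY_NAME="${KEY_NAME:-nameofyourkeypair}" # an existing key pair (without .pem)
GROUP_NAME="${GROUP_NAME:-rstudio-sg}"    # name for the new security group
MY_CIDR="${MY_CIDR:-203.0.113.10/32}"     # your own IP address in CIDR form
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

launch_rstudio() {
  # Security group with the two rules described above: SSH (22) and HTTP (80).
  run aws ec2 create-security-group \
    --group-name "$GROUP_NAME" --description "rstudio-server-access"
  for port in 22 80; do
    run aws ec2 authorize-security-group-ingress \
      --group-name "$GROUP_NAME" --protocol tcp --port "$port" --cidr "$MY_CIDR"
  done
  # Launch one t2.micro instance from the chosen AMI with the key pair attached.
  run aws ec2 run-instances \
    --image-id "$AMI_ID" --instance-type t2.micro \
    --key-name "$KEY_NAME" --security-groups "$GROUP_NAME"
}

launch_rstudio
```

Leave DRY_RUN at 1 until the placeholders have been replaced with your own values; setting DRY_RUN=0 would send the commands to AWS and may incur charges.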
How to copy files from your computer into your EC2 instance:
1) Fire up your terminal. I use iTerm on a Mac. Go to the directory where you saved the key pair file. You can do this by typing something like:
cd ~/Downloads
2) Use the chmod command to restrict the permissions on the key pair file so that only you can read it:
chmod 400 nameofyourkeypair.pem
3) Copy your local files into your EC2 instance. You will need to include the full path of your key pair file, and the public DNS of your instance is what you pasted into your browser address bar to access RStudio. Here, we assume you are in the directory where FileName.txt is saved:
scp -i ~/Downloads/nameofyourkeypair.pem FileName.txt ubuntu@your-public-DNS:~
If you receive a prompt asking whether you are sure you want to continue connecting, just type ‘yes’ and press Enter. Your file should then be copied over.
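If you upload files often, the three steps above can be wrapped in a small helper. The following is a minimal sketch, not a definitive script: the function names upload_to_ec2 and key_has_mode_400 are hypothetical, the key path and Public DNS are the same placeholders used in the commands above, and the permission check simply inspects the output of ls -l. It prints the scp command by default so you can inspect it before anything is sent.

```shell
# Hypothetical placeholders: swap in your own key path and Public DNS.
KEY_FILE="${KEY_FILE:-$HOME/Downloads/nameofyourkeypair.pem}"
EC2_HOST="${EC2_HOST:-your-public-DNS}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the scp command, 0 = run it

# Succeeds only if the file is read-only for its owner and closed to
# everyone else, which is the state "chmod 400" leaves it in.
key_has_mode_400() {
  perms="$(ls -l "$1" | cut -c1-10)"
  [ "$perms" = "-r--------" ]
}

# upload_to_ec2 LOCAL_FILE [REMOTE_DIR]
# Copies LOCAL_FILE to REMOTE_DIR on the instance (default: home directory).
upload_to_ec2() {
  local_file="$1"
  remote_dir="${2:-}"
  if [ -z "$remote_dir" ]; then
    remote_dir='~'
  fi
  if [ "$DRY_RUN" = "1" ]; then
    echo "scp -i $KEY_FILE $local_file ubuntu@$EC2_HOST:$remote_dir"
    return 0
  fi
  if ! key_has_mode_400 "$KEY_FILE"; then
    echo "Key file is not mode 400; run: chmod 400 $KEY_FILE" >&2
    return 1
  fi
  scp -i "$KEY_FILE" "$local_file" "ubuntu@$EC2_HOST:$remote_dir"
}

upload_to_ec2 FileName.txt
```

Run it with DRY_RUN=0 once the printed command looks right. The permission check is there because ssh and scp refuse to use a private key that other users can read.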
How to copy files from your EC2 instance to your computer:
1) Steps 1 and 2 above apply here as well. If you have already done them, you can skip this step.
2) Enter the command below to copy files from your EC2 instance to your computer. The first path is the location of your file on the EC2 instance, and the second is where you want the file saved in your local environment.
scp -i ~/Downloads/nameofyourkeypair.pem ubuntu@your-public-DNS:~/FileName.txt ~/Documents/FileName2.txt
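When there are several files to bring back, the same command can be put in a loop. This is a rough sketch under a few assumptions: download_from_ec2 is a hypothetical helper name, results.csv is a made-up second file used only for illustration, and the key path and Public DNS are the placeholders from the command above. As written it only prints the scp commands.

```shell
# Hypothetical placeholders: swap in your own key path and Public DNS.
KEY_FILE="${KEY_FILE:-$HOME/Downloads/nameofyourkeypair.pem}"
EC2_HOST="${EC2_HOST:-your-public-DNS}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the scp command, 0 = run it

# download_from_ec2 REMOTE_PATH LOCAL_PATH
# Copies one file from the instance to the local machine.
download_from_ec2() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "scp -i $KEY_FILE ubuntu@$EC2_HOST:$1 $2"
  else
    scp -i "$KEY_FILE" "ubuntu@$EC2_HOST:$1" "$2"
  fi
}

# Fetch several files from the remote home directory into ~/Documents.
# The "~" is kept in quotes so it is resolved on the instance, not locally.
for name in FileName.txt results.csv; do
  download_from_ec2 "~/$name" "$HOME/Documents/$name"
done
```

Set DRY_RUN=0 to perform the copies for real once the printed commands match what you expect.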
Have Fun!