RStudio is an integrated development environment (IDE) for the R programming language. It provides a suite of tools that make working with R much easier, including code completion, debugging, visualizations, and notebook publishing. This guide will walk through installing RStudio on Ubuntu 20.04 and getting started with some basic usage.
Installing R
R can be installed from Ubuntu's default repositories. This provides the version packaged for your Ubuntu release, which can lag behind the current release on CRAN:
sudo apt update
sudo apt install r-base
However, for specific needs you may want a different version, which can be compiled from source:
wget https://cran.r-project.org/src/base/R-4/R-4.2.2.tar.gz
tar -xf R-4.2.2.tar.gz
cd R-4.2.2
./configure
make
sudo make install
Compiling from source also allows targeting CPU-specific optimizations such as AVX2, for example by passing compiler flags to ./configure.
Once installed, check the version:
R --version
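The same check can be made from inside an R session, which is handy when a script depends on a minimum version. A minimal sketch; the 4.0.0 threshold is only an illustrative assumption:

```r
# Print the full version string of the running R session
print(R.version.string)

# getRversion() returns a comparable version object;
# the "4.0.0" threshold here is an arbitrary example
if (getRversion() < "4.0.0") {
  message("This R is older than 4.0.0; consider upgrading from CRAN.")
} else {
  message("R version is 4.0.0 or newer.")
}
```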
RStudio Alternatives
While RStudio is likely the most popular IDE for R, some alternatives are worth considering:
- Jupyter Notebook – Code and document R in notebooks alongside Python and other languages
- Emacs/Emacs Speaks Statistics – Extensively customize workflows with this text editor
- Visual Studio Code – Microsoft's free editor with an R extension, remote workspaces and collaboration features
- Atom – Hackable open-source editor with packages for autocomplete, debugging, plotting (archived by GitHub in December 2022)
However, RStudio still sets itself apart with the most seamless integration across reporting, plotting, packaging, collaboration and publishing outputs.
Installing RStudio
Download the latest .deb package (currently rstudio-2022.12.0-353-amd64.deb) and install using dpkg:
wget https://download1.rstudio.org/desktop/bionic/amd64/rstudio-2022.12.0-353-amd64.deb
sudo dpkg -i rstudio-*.deb
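Alternatively, automated updates can be enabled by adding an RStudio APT repository. Verify the key and repository URLs below against the official download page before running them, as they may differ for your release: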
wget -qO- https://rstudio.org/download/latest/ubuntu/rstudio-key.asc | sudo tee /usr/share/keyrings/rstudio-keyring.asc &>/dev/null
echo "deb [signed-by=/usr/share/keyrings/rstudio-keyring.asc] https://download2.rstudio.org/server/bionic main" | sudo tee /etc/apt/sources.list.d/rstudio.list
sudo apt-get update
sudo apt-get install rstudio-desktop
Once installed, launch RStudio Desktop from the applications menu or command line with rstudio.
RStudio IDE Tour
When you first launch RStudio, you will be presented with a multi-pane interface.

The default layout comprises:
- Source Pane – R script editor with syntax highlighting, smart code completion and multiple-file editing
- Console Pane – Read–eval–print loop (REPL) for running code line-by-line
- Environment/History Pane – Explore variable contents, access history, manage objects
- Files/Plots/Packages/Help Pane – GUI interfaces for key components like visualization, package management, documentation lookup
This layout can be customized extensively based on personal preference – panes can be added, removed, resized, and reordered via the View > Panes menu.
For example, you may opt to hide the console pane to maximize space for the script editor, or split documents horizontally to view multiple source files. Nearly any interface configuration is possible.
RStudio Projects
To keep work organized, RStudio introduces the concept of projects – self-contained workspaces storing related data, code, results and reports as a portable unit.
Creating a new project generates an associated directory containing an .Rproj file. Sub-folders like data/ or figures/ are a common convention, but you create them yourself. Opening a project sets the working directory to the project root and restores its workspace and history.
This enables easily bundling up everything required to resume work later, share with others, or archive results submitted for publication.
Projects can be created from the File > New Project menu, or programmatically with a helper such as usethis::create_project().
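A folder layout like the one mentioned above can also be scripted. A minimal sketch in base R, assuming hypothetical folder names and a temporary directory standing in for the project root:

```r
# Hypothetical project layout created under a temporary directory
project_dir <- file.path(tempdir(), "iris-analysis")
subfolders <- c("data", "figures", "R")

for (folder in subfolders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Paths built relative to the project root stay valid when the folder moves
data_path <- file.path(project_dir, "data", "measurements.csv")
print(list.files(project_dir))
```

Building paths with file.path() rather than hard-coding absolute paths is what keeps a project portable between machines.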
R Basics
Now that RStudio is set up, let's go through some R basics – from simple arithmetic to data structures and analysis.
Math Operators
Common mathematical operators like +, -, *, / behave as expected:
2 + 2
## [1] 4
Use parentheses to dictate order of operations:
(2 + 3) * 4
## [1] 20
Exponents, logs, trig and other math functions are included:
sin(pi/2)
## [1] 1
See ?Math for the full list.
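A few more of these operators and functions in action, as a short illustrative sketch using only base R:

```r
2^10          # exponentiation
## [1] 1024
sqrt(16)      # square root
## [1] 4
log(exp(1))   # natural log and exponential are inverses
## [1] 1
log10(1000)   # base-10 logarithm
## [1] 3
7 %/% 2       # integer division
## [1] 3
7 %% 2        # remainder (modulo)
## [1] 1
```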
Variable Assignment
Use the <- arrow for assignment:
x <- 2 + 2
print(x)
## [1] 4
The arrow assigns the result of the expression on the right to the variable name on the left.
In RStudio, the Alt + - keyboard shortcut inserts the assignment arrow.
Data Types
R includes common data types like:
- numeric – decimal numbers (doubles by default; specify an integer with the L suffix)
- integer – whole numbers
- complex – complex numbers with real & imaginary parts
- logical – boolean TRUE / FALSE values
- character – string text
Check types with the class() function:
x <- 5 # numeric
y <- 5L # integer
z <- 5+3i # complex
a <- TRUE # logical / boolean
b <- "text" # character
print(class(x))
print(class(y))
print(class(z))
print(class(a))
print(class(b))
## [1] "numeric"
## [1] "integer"
## [1] "complex"
## [1] "logical"
## [1] "character"
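Types can also be tested and converted explicitly. The following sketch uses base R's typeof(), is.*() and as.*() helpers; the variable name mixed is just an illustration:

```r
typeof(5)              # the "numeric" class is stored as a double
## [1] "double"
is.numeric(5L)         # integers also count as numeric
## [1] TRUE
as.integer("42")       # parse a string into an integer
## [1] 42
as.numeric(TRUE)       # logicals coerce to 1 / 0
## [1] 1

# A vector holds a single type, so mixing types coerces everything to character
mixed <- c(1, "a", TRUE)
class(mixed)
## [1] "character"
```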
Data Structures
Beyond atomic types, R includes data structures for storing data collections:
- Vectors – Ordered collections, 1D arrays
- Lists – Ordered, heterogeneous collections
- Matrices – 2D rectangular datasets
- Arrays – Multidimensional generalizations of matrices
- Data Frames – Tabular datasets comprised of equal-length vectors
- Factors – Nominal/ordinal categorical variables
Some usage examples:
vec <- c(1, 3, 5) # vector
lst <- list(a = 1, b = "text") # list
matrix(1:6, nrow = 2, ncol = 3) # matrix
array(1:24, dim = c(3,4,2)) # 3D array
data.frame(x = 1:3, y = 4:6) # data frame
Data frames (used most commonly) will be covered more below.
See ?vector, ?list, ?matrix and ?data.frame for more data structure details.
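Once created, each structure has its own way of accessing elements. A brief sketch in base R; the object names and the factor levels are illustrative:

```r
vec <- c(1, 3, 5)
vec[2]                         # second element (R indexes from 1)
## [1] 3

lst <- list(a = 1, b = "text")
lst$b                          # access a list element by name
## [1] "text"

m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
m[2, 3]                        # row 2, column 3
## [1] 6

df <- data.frame(x = 1:3, y = 4:6)
df$y                           # a column as a vector
## [1] 4 5 6
df[df$x > 1, ]                 # rows where x is greater than 1

# A factor stores categories; levels() lists them in the order given
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f)
```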
Importing & Tidying Data
Loading datasets is essential to analysis in RStudio.
Data can originate locally from files, databases or spreadsheets, as well as externally via web APIs or scraping.
Importing Local Data
Common options for getting data locally into R include:
- CSV files – read.csv()
- Text/log files – read.delim() / read.fwf() / read.table()
- Excel spreadsheets – readxl::read_excel()
- JSON – jsonlite::fromJSON()
- RDBMS – RMySQL, RPostgres, RSQLite packages
- SPSS/SAS/Stata – haven::read_sas(), haven::read_spss() etc.
For example, reading a CSV:
df <- read.csv("data.csv")
Or an Excel sheet:
library(readxl)
df <- read_excel("data.xlsx", sheet = "Sheet1")
RStudio also provides a graphical importer under File > Import Dataset.
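To try the CSV workflow without an existing file, a small round trip can be run against a temporary file. This is a sketch with made-up data; the column names and values are hypothetical:

```r
# Write a small example data frame to a temporary CSV (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
sales <- data.frame(region = c("East", "West"),
                    revenue = c(120500, 85000))
write.csv(sales, csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = FALSE keeps text columns as character
imported <- read.csv(csv_path, stringsAsFactors = FALSE)
str(imported)
```

Inspecting the result with str() right after import is a quick way to catch columns that were read with an unexpected type.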
Tidy Data
Best practice is for data to be tidy before analyzing – meaning:
- Each variable has its own column
- Each observation forms a row
- Each value sits in its own cell
This facilitates data manipulation using the "tidyverse" set of packages like dplyr and tidyr designed specifically for tidy data.
Untidy formats like column headers with multiple variables should be broken out:
Year, Location, Sales Reps (John, Jane, Jake), Revenue
2010, East, 10, 20, 30, 100000
Would be broken into:
Year, Location, JohnSales, JaneSales, JakeSales, Revenue
2010, East, 10, 20, 30, 100000
With a separate column per variable.
The tidyr pivot functions are handy for reshaping data as needed.
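As a sketch of the pivot functions, the example row above can be taken one step further into a long format with one row per sales rep. This assumes the tidyr package is installed; the column names Rep and Sales are illustrative choices:

```r
library(tidyr)

# The wide example row: one sales column per rep
wide <- data.frame(Year = 2010, Location = "East",
                   John = 10, Jane = 20, Jake = 30,
                   Revenue = 100000)

# Gather the three rep columns into name/value pairs
long <- pivot_longer(wide,
                     cols = c(John, Jane, Jake),
                     names_to = "Rep",
                     values_to = "Sales")
print(long)
```

In the long form the rep name becomes a value in its own column, which is what lets functions like dplyr's group_by() operate on it.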
Data Analysis Examples
Now let's go through some examples working with datasets for visualization and modeling tasks.
The iris Dataset
A classic dataset for data analysis is Ronald Fisher's Iris flower dataset. It contains measurements of 150 flowers across 3 species:
# Load dataset
iris <- datasets::iris
# View column names
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
It captures numeric measurements of sepal & petal dimensions, along with the categorical species classification.
First, let's check out some summaries:
summary(iris)
str(iris)
This reveals variable ranges and the data types – a combination of numeric measurements and factors (categoricals) for the species.
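Summaries per species are often more informative than overall ones. A minimal sketch with base R functions only:

```r
# Number of flowers per species
table(iris$Species)

# Mean petal length per species, using the formula interface
aggregate(Petal.Length ~ Species, data = iris, FUN = mean)

# Mean of every numeric column, split by species (a 4 x 3 matrix)
sapply(split(iris[, 1:4], iris$Species), colMeans)
```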
Visualizing
Now let's visualize the iris data by plotting the petal dimensions grouped by species:
plot(iris$Petal.Width,
     iris$Petal.Length,
     col = iris$Species)

And as a boxplot showing sepal width distributions per species:
boxplot(iris$Sepal.Width ~ iris$Species,
        xlab = "Species",
        ylab = "Sepal Width (cm)")

These plots give a sense of how the petal measurements cluster and how sepal width ranges differ across the three iris species.
Many more advanced visualizations are possible with ggplot2 and other graphic packages.
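For comparison, the petal scatter plot can be rebuilt with ggplot2. This sketch assumes the ggplot2 package is installed; the axis labels and output file are illustrative:

```r
library(ggplot2)

# Scatter plot of petal dimensions, coloured by species
p <- ggplot(iris, aes(x = Petal.Width, y = Petal.Length, colour = Species)) +
  geom_point() +
  labs(x = "Petal Width (cm)", y = "Petal Length (cm)")

# Save to a temporary PNG; in RStudio, print(p) shows it in the Plots pane
out_file <- tempfile(fileext = ".png")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Unlike the base plot() call, ggplot2 adds the colour legend automatically from the Species mapping.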
Modeling
Given these measurements, we can train a classification model to predict the species of an iris from its dimensions.
First, we'll split the data into 80% train and 20% test:
set.seed(22519)  # fix the seed so the split is reproducible
indexes <- sample(seq_len(nrow(iris)), size = 0.8 * nrow(iris))
train <- iris[indexes, ]
test <- iris[-indexes, ]
Then we can train a model – here a random forest, though many other algorithms are available:
library(randomForest)
model <- randomForest(Species ~ ., data = train)
Now predict on the held-out test data:
predictions <- predict(model, newdata = test)
And evaluate accuracy:
mean(predictions == test$Species)
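The exact figure depends on the random split and on the randomForest version. With 30 held-out flowers it is always a multiple of 1/30, and because iris is an easy classification problem, accuracy above 90% is typical for this quick modeling exercise.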
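A single accuracy number hides which species get confused with each other, so a confusion matrix is a useful follow-up. The sketch below repeats the split and model fit so that it runs on its own, and assumes the randomForest package is installed; the names train_idx and confusion are illustrative:

```r
library(randomForest)

set.seed(22519)
train_idx <- sample(seq_len(nrow(iris)), size = 0.8 * nrow(iris))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

model <- randomForest(Species ~ ., data = train)
predictions <- predict(model, newdata = test)

# Rows are predicted species, columns are actual species;
# off-diagonal cells are misclassifications
confusion <- table(predicted = predictions, actual = test$Species)
print(confusion)

# Accuracy is the share of counts on the diagonal
sum(diag(confusion)) / sum(confusion)
```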
We have barely scratched the surface of R's machine learning capabilities. Check out the caret package and its workflows for comparing dozens of algorithms with parameter tuning, cross-validation and other best practices all built-in.
Reproducible Reporting
Once you have done your analysis, effectively communicating results and findings is critical.
R Markdown documents combine R code, results, text and visualizations in a single file, from which you can publish interactive reports, presentations, papers and more. R Notebooks are a variant that shows each chunk's output inline beneath the code.

Output options range from HTML/PDF documents to notebooks, dashboards, books and journal articles.
R Notebooks are reproducible – the code + narrative allows regenerating any data, analysis or reports fully automatically.
This facilitates sharing more transparent, self-contained analyses for diagnosis, collaboration or publication of research in academia and industry.
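To make the format concrete, the following sketch writes a minimal R Markdown file from R and renders it if the tooling is present. The title, file name and chunk content are hypothetical, and rendering assumes both the rmarkdown package and pandoc are available:

```r
# Build a minimal R Markdown document line by line.
# strrep() creates the triple-backtick chunk delimiters.
fence <- strrep("`", 3)
rmd_lines <- c(
  "---",
  "title: \"Iris Summary\"",
  "output: html_document",
  "---",
  "",
  "Petal measurements by species:",
  "",
  paste0(fence, "{r}"),
  "summary(iris)",
  fence
)

rmd_path <- file.path(tempdir(), "iris-summary.Rmd")
writeLines(rmd_lines, rmd_path)

# Render only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

In practice the same file would usually be created with File > New File > R Markdown and rendered with the Knit button.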
RStudio Server/Compute Options
While the RStudio IDE has been covered extensively here as a desktop application, RStudio Server editions are also available to allow access to remote development environments through a web browser.
RStudio Server can be deployed on a centralized server or on a cloud instance from a provider such as AWS or GCP, and can connect to computing resources and storage hosted elsewhere.
Some motivations for remote development environments:
- Streamline administration without needing to install/update software on individual machines
- Leverage more powerful server hardware such as GPUs or large memory capacities when local machines are too constrained
- Bring analyses to the data for scenarios where transferring data to local desktops may be restricted for regulatory reasons
- Promote collaboration across geographic distances more easily
If opting to self-host RStudio server, some recommended Ubuntu server optimizations:
- Fast processors – Prioritize high CPU clock speeds and core counts
- Max RAM – Memory capacity is almost always the primary bottleneck
- Fast storage – SSD storage helps with launching environments/reading data
- Resource allocation – Control per-user RAM allocations depending on workload target
- Scaling clusters – Horizontally scale RStudio jobs across auto-scaling server clusters to enable parallel computing of very large workloads
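The parallel computing idea can be tried on a single machine before scaling out. This is a minimal sketch using the parallel package that ships with R. It starts two local worker processes rather than a multi-server cluster, and the bootstrap workload is a hypothetical stand-in for a real job:

```r
library(parallel)

# Hypothetical workload: a bootstrap mean of sepal length
boot_mean <- function(i) {
  mean(sample(datasets::iris$Sepal.Length, replace = TRUE))
}

# Start two local worker processes; a server could use more
cl <- makeCluster(2)
clusterSetRNGStream(cl, 123)   # reproducible random streams per worker
results <- parLapply(cl, 1:100, boot_mean)
stopCluster(cl)

summary(unlist(results))
```

The number of workers is usually chosen from detectCores(), leaving some capacity free for other users on a shared server.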
Best Practices
To summarize some best practices covered in this guide for efficient workflows:
- Organize projects in self-contained folders storing related data/code/results
- Make data tidy before analyzing
- Control randomness/sampling via a consistent seed for reproducibility
- Consider notebooks to unify code/comments/results in one view
- Take advantage of built-in RStudio tools for version control, plotting, packages etc
- Profile + optimize performance for intensive computing procedures
- Use remote development servers to scale complex workloads on faster infrastructure
Following these and other good habits will ensure analyses run smoothly.
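Two of these habits, seeding and profiling, take only a few lines to demonstrate. A small sketch in base R; the seed value and the sorting workload are arbitrary examples:

```r
set.seed(42)
first_draw <- sample(1:100, 5)

set.seed(42)                 # resetting the seed replays the same sequence
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)
## [1] TRUE

# system.time() gives a quick first look at how long a step takes
system.time(sort(runif(1e6)))
```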
Conclusion
This guide has only scratched the surface of using RStudio for advanced analysis tasks on Ubuntu. Visit RStudio's learning resources for more, or refer to the documentation for specifics on any aspect like visualization or modeling.
With such a breadth of tools and capabilities, backed by a large open-source community, RStudio provides an excellent environment for taking projects from data to insights using the R language.
So in summary:
- 🐧 RStudio desktop makes R much friendlier to work with
- ⚙️ Tweak and customize the IDE layout to your preference
- 📁 Use projects to organize all files for a given analysis
- 📊 Import, process and explore your datasets
- 🔬 Train models and create beautiful visualizations
- 🚀 Scale up development environments remotely with RStudio Server editions
RStudio + R let you turn data into knowledge, seamlessly taking ideas from conception all the way through publication and sharing – give it a try with your next analysis!


