R Language Essentials

Learn R Language Essentials concepts: creating variables, user input methods, handling impossible values (NA/NaN), memory limits, and binary operators. Perfect for R beginners and data science learners.

R Language Essentials: Variables, Input, Memory & Operators

How to create a new variable in R?

In the R language, there are many ways to create a new variable, depending on your data structure and needs. Here are some important ways to create a new variable in R.

Assignment Operator

To create a new variable, use the assignment operator '<-'. For example,

mydata$sum <- mydata$x1 + mydata$x2

Using the $ operator (for data frames)

# Create a data frame
df <- data.frame(x = 1:5, y = 6:10)

# Add a new variable
df$z <- df$x + df$y

## Output
	df$z
[1]  7  9 11 13 15

Using bracket notation [ ]

df["new_var"] <- df$x * 2

Using the transform() function

df <- transform(df, 
                sum_xy = x + y,
                product_xy = x * y)

Using within() function

df <- within(df, {
  ratio <- x / y
  squared_diff <- (x - y)^2
})

Creating variables in vectors

# Create a vector
my_vector <- c(1, 2, 3, 4, 5)

# Add names to vector elements
names(my_vector) <- c("a", "b", "c", "d", "e")

# Create new vector from existing
new_vector <- my_vector * 2

How to request input from the user through keyboard and monitor?

In the R language, several functions can be used to request input from the user, including readline() and scan(), with cat() commonly used to print a message before reading. I find readline() to be the most convenient function for this task.

readline() – Basic Text Input

# Simple text input
name <- readline(prompt = "Enter your name: ")
age <- as.numeric(readline(prompt = "Enter your age: "))

cat("Hello", name, "! You are", age, "years old.\n")

Basic Keyboard Input

# Read numeric input from keyboard
cat("Enter numbers (press Enter twice to finish):\n")
numbers <- scan()

# Read character input
cat("Enter text (press Enter twice to finish):\n")
text <- scan(what = character())

# Read at most 5 values (input also stops at an empty line)
values <- scan(n = 5)
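
Because readline() always returns a character string, numeric input usually needs checking before it is used. The following is a minimal sketch, assuming a hypothetical helper named parse_age() (it is not part of R). The interactive() guard is included because readline() does not wait for input in non-interactive sessions.

```r
# Hypothetical helper: convert a typed string to a valid age, or NA
parse_age <- function(input) {
  value <- suppressWarnings(as.numeric(input))  # non-numeric text becomes NA
  if (is.na(value) || value < 0) NA_real_ else value
}

# Only prompt when a user is actually present at the console
if (interactive()) {
  age <- parse_age(readline(prompt = "Enter your age: "))
  if (is.na(age)) cat("That does not look like a valid age.\n")
}
```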

How are impossible values represented in R?

In the R language, impossible or undefined values are represented using special values and NA types.

NaN (Not a Number)

Represents mathematically undefined numeric operations.

# Operations that produce NaN
0 / 0           # NaN - 0 divided by 0
Inf - Inf       # NaN - Infinity minus infinity
Inf / Inf       # NaN - Infinity divided by infinity
sqrt(-1)        # NaN - Square root of negative number
log(-1)         # NaN - Log of negative number
asin(2)         # NaN - Arcsin of number > 1

# Check for NaN
is.nan(0/0)     # TRUE
is.nan(5)       # FALSE

Inf and -Inf (Infinity)

Represent positive and negative infinity.

# Positive infinity
1 / 0           # Inf
exp(1000)       # Inf (if result exceeds limits)
10^1000         # Inf

# Negative infinity
-1 / 0          # -Inf
log(0)          # -Inf

# Check for infinity
is.infinite(1/0)    # TRUE
is.finite(1/0)      # FALSE

NA (Not Available)

Represents missing or undefined values.

# Different NA types
numeric_na <- NA_real_      # Numeric NA
integer_na <- NA_integer_   # Integer NA
character_na <- NA_character_  # Character NA
logical_na <- NA            # Logical NA (default)

# Check for NA
is.na(NA)          # TRUE
is.na(5)           # FALSE

NULL

Represents an empty or undefined object (different from NA).

# NULL examples
empty_list <- NULL
uninitialized_var <- NULL

# Functions returning NULL
result <- cat("hello\n")  # cat() returns NULL (invisibly); print() returns its argument

# Check for NULL
is.null(NULL)      # TRUE
is.null(NA)        # FALSE (NA is not NULL!)
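
The special values above behave differently in calculations, which is easiest to see side by side. The following is a small sketch using only base R functions; the vector x is made up for illustration.

```r
x <- c(4, NA, 0 / 0, 10)   # one missing value (NA) and one NaN

is.na(x)    # TRUE for both the NA and the NaN element
is.nan(x)   # TRUE only for the NaN element

# Missing values propagate through calculations unless removed
result_with_missing <- mean(x)            # a missing value (NA or NaN)
result_clean <- mean(x, na.rm = TRUE)     # 7, computed from 4 and 10 only

length(NULL)   # 0: NULL is an empty object
length(NA)     # 1: NA is a value of length one
```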

What is the memory limit of R?

The memory limit in the R language depends on several factors, including your operating system, R version, architecture (32-bit vs 64-bit), and system configuration.

Operating System Differences

For Windows Operating Systems

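# Check memory limit on Windows (R versions before 4.2.0)
# Note: as of R 4.2.0, memory.limit() and memory.size() are no longer
# supported; they are stubs that return Inf with a warning
memory.limit()    # Returns current limit in MB
memory.size()     # Current memory usage in MB
memory.size(max = TRUE)  # Maximum memory used

# Set memory limit (Windows only, R versions before 4.2.0)
memory.limit(size = 16000)  # Set to about 16GB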

For macOS and Linux Systems

# No explicit memory limit functions
# Limited by system RAM and swap space

# Check system memory
system("free -h", intern = TRUE)      # Linux
system("vm_stat", intern = TRUE)      # macOS

32-bit vs 64-bit Architecture

32-bit R

  • Maximum addressable memory: ~4GB
  • Practical limit: ~3-3.5GB
  • Vector size limit: 2^31-1 elements (~2.1 billion)
  • Common issue: “Cannot allocate vector of size…”

64-bit R

  • Theoretical limit: 8TB on 64-bit Windows, much larger on Linux/macOS
  • Vector size limit: long vectors (R 3.0.0 and later) can in theory hold up to 2^52 elements; in practice, available memory is the constraint
  • Practical limit: Your available RAM + swap space

# Check if you're running 64-bit R
.Machine$sizeof.pointer  # 8 for 64-bit, 4 for 32-bit
.Platform$r_arch         # "x64" on 64-bit Windows; often "" on Linux/macOS

# Largest integer value; also the length limit for ordinary (non-long) vectors
.Machine$integer.max     # 2147483647 (2^31-1)
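
Whatever the platform limit, it is often more useful to check how much memory individual objects take. The following is a short sketch using the base R functions object.size() and gc(); the vector is made up for illustration.

```r
x <- numeric(1e6)                       # one million doubles, 8 bytes each
size_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "MB")     # approximate size of this object

rm(x)             # remove the object from the workspace
invisible(gc())   # ask R to run garbage collection
```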

On which type of data do binary operators in R work?

Binary operators in the R language work on various data types, but their behavior depends on the types of operands involved. Binary operators are applied to matrices, vectors, and scalars.
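
As a brief illustration of how the same operators behave on scalars, vectors, and matrices, consider the following sketch (the object names s, v, and m are arbitrary):

```r
s <- 2                          # scalar (a vector of length one)
v <- c(1, 2, 3)                 # numeric vector
m <- matrix(1:6, nrow = 2)      # 2 x 3 matrix

v + s        # 3 4 5: the scalar is recycled across the vector
v * v        # 1 4 9: element-wise multiplication
m * s        # every matrix element is doubled
m %*% v      # matrix product: (2 x 3) times a length-3 vector gives 2 x 1

v > s        # FALSE FALSE TRUE: comparison operators are element-wise too
TRUE + TRUE  # 2: logical values are coerced to numbers

# Arithmetic on character data is an error, caught here for demonstration
result <- tryCatch("a" + 1, error = function(e) "error")
```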


Files in R Language

Learn everything about files in R, including .RData, CSV, Excel, and text files. Discover how to read, write, and restore R objects using load(), save(), read.csv(), and more. Explore best practices for file handling in R and compare different file formats for efficient data management. Perfect for R programmers, data analysts, and researchers working with datasets in R.

What is a File in the R Language?

In R, a file is data stored on a computer storage device that R can read from or write to. R scripts, which have the extension .R, contain code that can be executed within the R software environment. Files are essential for importing external data, saving results, and sharing work.

Describe commonly used Files in R

For illustration purposes, I have categorized the commonly used files in R as code files, data files, and specialized data files.

Code Files:

  • .R (R script files)
  • .Rmd (R Markdown files)

Data Files:

  • .csv (Comma Separated Values) – Most common for tabular data
  • .txt (Plain text files)
  • .xlsx or .xls (Excel files)
  • .RData or .rda (R’s native binary format)

Specialized Data Formats:

  • .json (for structured data)
  • .xml (for hierarchical data)
  • .sav (SPSS files)
  • .dta (Stata files)

What are the best Practices for using Files in R?

  • Use relative paths when possible for portability
  • Check file existence before reading
  • Close connections (for example, file or database connections) after reading from or writing to them
  • Consider using the here package for more reliable file paths
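
Some of these practices can be sketched with base R alone. The file name example_data.csv is hypothetical, and tempdir() is used only so that the sketch does not write into your project folder; in a real project, a path relative to the project root would take its place.

```r
# Build the path from components instead of hard-coding separators
path <- file.path(tempdir(), "example_data.csv")   # hypothetical file name
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), path, row.names = FALSE)

# Check that the file exists before reading it
if (file.exists(path)) {
  scores <- read.csv(path)
} else {
  stop("File not found: ", path)
}

# Close explicitly opened connections when done
con <- file(path, open = "r")
first_line <- readLines(con, n = 1)
close(con)
```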

What are .RData Files in R?

An .RData (or .rda) file is a binary file format used by R. It is used to save multiple objects (variables, data frames, functions, etc.) in a compressed, space-efficient way. It is R’s native format for storing workspace data.

What are the Key Features of .RData Files?

The key features of .RData files in R are:

  1. Stores Multiple Objects
    • An .RData file can store several R objects (e.g., data frames, lists, models) in a single file.
    • Example: save(df, model, list1, file = "mydata.RData")
  2. Binary Format (Not Human-Readable)
    • Unlike .csv or .txt, .RData files are not plain text and cannot be read meaningfully in a text editor.
  3. Compressed by Default
    • Uses compression to reduce file size (especially useful for large datasets).
  4. Platform-Independent
    • Can be shared across different operating systems (Windows, macOS, Linux).
  5. Preserves Attributes
    • Keeps metadata (e.g., variable labels, factors, custom classes).

Which command is used for restoring an R object from a file?

In R, saved objects can be restored from a file using the load() function. The load() command loads all objects stored in the file into the current R environment. It works with .RData or .rda files (the binary files used by R); it does not work with .csv, .txt, .xlsx, or other non-R formats.

Explain the use of load() command with example

The following example first creates the objects x, y, and z and saves them in the file “my_work.RData”. The objects are then removed from the workspace, and load() restores them.

x <- rnorm(10)
y <- 1:20
z <- "Level of Significance"

save(x, y, z, file = "my_work.RData")
rm(x, y, z)             # remove the objects from the workspace
load("my_work.RData")   # x, y, and z are available again
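
Because load() silently overwrites objects that have the same names, it can be safer to restore into a separate environment. The following is a sketch under that assumption; the objects a and b and the temporary file are made up for illustration.

```r
a <- 1:3
b <- "text"
rdata_file <- tempfile(fileext = ".RData")
save(a, b, file = rdata_file)

# Restore into a new environment instead of the global workspace
restored_env <- new.env()
restored_names <- load(rdata_file, envir = restored_env)

restored_names    # load() returns the names of the restored objects
restored_env$a    # access a restored object without touching the global `a`
```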

How many ways are there to read and write files in R?

There are dozens of ways to read and write files in R, and the best approach depends on the file type and size. The following is a categorized breakdown of the most common methods:

Base R Functions

  • Reading Files
    • read.table(): Generic function to read tabular data (e.g., .txt).
    • read.csv(): For comma-separated values (CSV) files.
    • read.delim(): For tab-delimited files (.tsv or .txt).
    • scan(): Low-level function to read raw data.
    • load(): Restores R objects from .RData or .rda files.
    • readRDS(): Reads a single R object from .rds files.
  • Writing Files
    • write.table(): Writes data frames to text files.
    • write.csv(): Writes to CSV files.
    • write.table(..., sep = "\t"): Writes tab-delimited files (base R has no write.delim() function).
    • save(): Saves multiple R objects to .RData or .rda.
    • saveRDS(): Saves a single R object to .rds.
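
A few of these base R functions are sketched below as simple round trips. The data frame and the temporary files are made up for illustration.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
rds_file <- tempfile(fileext = ".rds")

# Text format: readable anywhere, but column types are re-guessed on reading
write.csv(df, csv_file, row.names = FALSE)
df_csv <- read.csv(csv_file)

# Tab-delimited output via write.table(), read back with read.delim()
write.table(df, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)
df_tsv <- read.delim(tsv_file)

# Binary format: one object, restored under any name you choose
saveRDS(df, rds_file)
df_rds <- readRDS(rds_file)
identical(df, df_rds)   # the object is restored exactly
```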

Using Packages

  • Reading Files
Package      Function        File Type Supported
readr        read_csv()      Faster CSV reading
readxl       read_excel()    Excel (.xlsx, .xls)
data.table   fread()         Fast CSV/TSV import
haven        read_spss()     SPSS (.sav)
haven        read_stata()    Stata (.dta)
jsonlite     fromJSON()      JSON files
xml2         read_xml()      XML files
  • Writing Files
Package      Function        File Type Supported
readr        write_csv()     Faster CSV export
writexl      write_xlsx()    Excel (.xlsx)
data.table   fwrite()        Fast CSV/TSV export
haven        write_sav()     SPSS (.sav)
haven        write_dta()     Stata (.dta)
jsonlite     toJSON()        JSON files
xml2         write_xml()     XML files

Specialized Methods

For Large Datasets

  • vroom: High-speed reading of large CSV/TSV files.
  • arrow (Apache Arrow) – Efficient for big data (supports Parquet, Feather formats).

For Databases

  • DBI + RSQLite/RMySQL/odbc: Read/write from SQL databases.

For Binary & Custom Formats

  • feather: Fast binary storage (works well with Python).
  • qs: A faster alternative to saveRDS() for large objects.


Functions in R

Functions in R programming are reusable blocks of code that perform specific tasks, improving efficiency and readability. This guide covers how to write functions in R, their key features (lexical scoping, closures, generics), and practical examples for data science & automation. It is perfect for beginners and advanced users!

What are Functions in R Language?

A function is a chunk of code written to carry out a specified task. It may or may not accept arguments (also called parameters), and it may or may not return a value (several values can be returned together in a list or vector). In R, functions are objects in their own right. Hence, we can work with them the same way we work with any other type of object.

Objects created inside a function are local to that function. The returned object can be of any data type.

What is Function Definition?

An R function is created using the keyword function. The basic syntax of an R function definition is as follows –

function_name <- function(arg_1, arg_2, ...) {
    # function body
}

What are the Components of R functions?

The different components of a function are:

  • Function Name: The name under which the function is stored as an object in the R environment.
  • Arguments: An argument is a placeholder. When a function is invoked, values are passed to its arguments. Arguments are optional; that is, a function may have no arguments. Arguments can also have default values.
  • Function Body: The collection of statements that defines what the function does.
  • Return Value: The value of the last expression evaluated in the function body, unless return() is called explicitly.
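
The components listed above can be seen in a small example. The function scale_values() is hypothetical and exists only for illustration; formals() and body() are base R functions for inspecting a function's arguments and body.

```r
# Hypothetical example: name, arguments (one with a default), and body
scale_values <- function(x, factor = 2) {
  x * factor   # the last evaluated expression becomes the return value
}

scale_values(c(1, 2, 3))               # uses the default factor: 2 4 6
scale_values(c(1, 2, 3), factor = 10)  # 10 20 30

formals(scale_values)   # the argument list, including the default value
body(scale_values)      # the function body
```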

What are the Key Features of R Functions?

The following are key features of R functions:

  • Generic Functions: Work differently based on input class (e.g., print(), plot()).
  • First-class Objects: Functions can be assigned to variables, passed as arguments, and returned from other functions.
  • Lexical Scoping: Variables are looked up where the function is defined.
  • Flexible Arguments: Default values, optional args, and ... (variable-length args).
  • Closures: Can remember their environment (useful in functional programming).
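
Several of these features can be illustrated together. The following is a minimal sketch; make_counter() and total() are hypothetical names chosen for the example.

```r
# Closure: the inner function remembers `count` from the environment
# in which it was defined (lexical scoping)
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # update the variable in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()   # 1
counter()   # 2

# First-class objects: a function passed as an argument to another function
roots <- sapply(c(1, 4, 9), sqrt)   # 1 2 3

# Flexible arguments: `...` forwards any number of arguments
total <- function(...) sum(...)
total(1, 2, 3)   # 6
```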

What are Generic Functions in R?

Generic Functions in R behave differently based on the class of their input arguments. They use method dispatch to call the appropriate version (method) of the function for a specific object type. The generic function allows one function name to work for different object types (e.g., print(), plot(), and summary()).
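
Method dispatch can be sketched with a small S3 generic. The generic describe() and its methods are hypothetical and written only for this illustration; UseMethod() is the base R function that performs the dispatch.

```r
# Define a generic and three methods for it
describe <- function(x, ...) UseMethod("describe")

describe.default <- function(x, ...) "a generic object"
describe.numeric <- function(x, ...) {
  paste("a numeric vector of length", length(x))
}
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows")
}

describe(c(1.5, 2.5))               # dispatches to describe.numeric
describe(data.frame(a = 1:4))       # dispatches to describe.data.frame
describe("text")                    # no character method, so the default is used
```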

What is the Attribute Function in R?

To get or set a single attribute, you can use the attr() function. This function takes two important arguments. The first argument is the object we want to examine, and the second argument is the name of the attribute we want to see or change. If the attribute we ask for does not exist, R simply returns NULL.
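
For instance, a short sketch with a made-up attribute named "units":

```r
x <- c(10, 20, 30)

attr(x, "units") <- "cm"   # set an attribute
attr(x, "units")           # get it back: "cm"
attr(x, "source")          # NULL, because this attribute does not exist

attributes(x)              # list all attributes of the object
```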

What is an arbitrary function in R?

Arbitrary function means any function. Generally, an arbitrary function refers to a function that belongs to the same class of functions we are discussing (its freedom is limited). For example, when talking about continuous real-valued functions defined on the bounded closed interval of the real line, an arbitrary function may refer to a function of the same type.

What are the Types of Functions in R?

In R, the following are types of functions:

  • Built-in Functions: R has many built-in functions such as sum(), mean(), and plot().

numbers <- c(2, 4, 6, 8)
mean(numbers)  

## Output: 5

  • User-defined Functions: Custom functions created by users, for example,

# Define a function to add two numbers
add_numbers <- function(a, b) {
  return(a + b)
}

# Call the function
add_numbers(5, 3)  

## Output: 8

  • Generic Functions (Polymorphic Behavior): Generic functions behave differently based on input class. For example, print() behaves differently for numbers and lm models.
  • Recursive Functions: Recursive functions call themselves (useful for problems that can be broken down into smaller instances of the same problem).

# Recursive factorial function
# (named recursive_factorial to avoid masking the built-in factorial())
recursive_factorial <- function(n) {
  if (n <= 1) return(1)   # base case stops the recursion
  else return(n * recursive_factorial(n - 1))
}

recursive_factorial(5)  

## Output: 120

What are the Best Practices for Writing Functions in R?

The following are considered best practices when writing functions in R Programming Language.

  • Use Descriptive Names (e.g., calculate_mean() instead of f1()).
  • Keep Functions Short & Focused (Single Responsibility Principle).
  • Add Comments for clarity.
  • Use Default Arguments for flexibility.
  • Test Functions with different inputs.
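
These practices can be combined in one small function. The following is only a sketch: calculate_mean() is the hypothetical name mentioned above, and the checks with stopifnot() stand in for a proper test suite.

```r
# Compute the mean of a numeric vector, ignoring missing values by default
calculate_mean <- function(values, na.rm = TRUE) {
  if (!is.numeric(values)) stop("`values` must be numeric")
  mean(values, na.rm = na.rm)
}

# Test the function with different inputs
stopifnot(calculate_mean(c(2, 4, 6)) == 4)
stopifnot(calculate_mean(c(2, NA, 6)) == 4)
stopifnot(inherits(try(calculate_mean("a"), silent = TRUE), "try-error"))
```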

Functions in R make your code modular, reusable, and efficient. Whether you’re performing data analysis, building models, or creating visualizations, mastering functions will significantly improve your R programming skills.
