215

In R, how do you add a new row to a data frame once the data frame has already been initialized?

So far I have this:

df <- data.frame("hi", "bye")
names(df) <- c("hello", "goodbye")

#I am trying to add "hola" and "ciao" as a new row
de <- data.frame("hola", "ciao")

merge(df, de) # Adds to the same row as new columns

# Unfortunately, I couldn't find an rbind() solution that wouldn't give me an error

Any help would be appreciated

7
  • 3
    assign names to de too. names(de) <- c("hello","goodbye") and rbind Commented Feb 12, 2015 at 0:14
  • 6
    Or in one line rbind(df, setNames(de, names(df))) Commented Feb 12, 2015 at 0:15
  • 3
    This really is an area which base R fails miserably at, and has for a long time: stackoverflow.com/questions/13599197/… Commented Feb 12, 2015 at 0:49
  • 2
    @thelatemail disagree. data frames are a special structure in r. a list of lists with common dimnames and attributes and methods. I think it is very expected that one cannot rbind(data.frame(a = 1), data.frame(b = 2)).. why would you want to? I would hope that would throw an error regardless. It's like merge'ing with a random by variable. And this is 2015, doesn't everyone set options(stringsAsFactors = FALSE)? Commented Feb 12, 2015 at 15:34
  • 4
    @rawr - sure, different names shouldn't be bound, but R can't handle binding no names to no names, binding names to no names with the same dimensions, or binding new data to incorporate new factor levels. I think that's a weakness. Particularly when it can handle binding repeated names and all NA names. And setting stringsAsFactors=FALSE can be a quick fix, but changing the defaults that other people are going to have set differently can really ruin a day. Commented Feb 12, 2015 at 22:21

16 Answers

215

Let's make it simple:

df[nrow(df) + 1,] = c("v1","v2")

8 Comments

This causes problems when adding a new row with mixed data types (some string, some numeric): even the numeric values are converted to strings. One workaround is to add the values separately, something like the following (assuming there are 3 columns): df[nrow(df) + 1, 1:2] = c("v1", "v2") and df[nrow(df), 3] = 100. Still, it's a good point about adding a new row, so +1.
Or use "list" instead of "c".
@Matheus Araujo: Is this the most efficient way to add row to a df? I have 100k+ rows to be added in a loop. Feel like nrow would get slower as number of rows increase.
Tried this with data.table, but it complains that nrow+1 is out of range.
data.table is not a data.frame
@Arani there's already an answer with list(). I reverted your edit.
178

Like @Khashaa and @Richard Scriven point out in comments, you have to set consistent column names for all the data frames you want to append.

Hence, you need to explicitly declare the column names for the second data frame, de, and then use rbind(). In your code, you only set column names for the first data frame, df:

df<-data.frame("hi","bye")
names(df)<-c("hello","goodbye")

de<-data.frame("hola","ciao")
names(de)<-c("hello","goodbye")

newdf <- rbind(df, de)
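
As a rough sketch of why the names matter (the `tryCatch()` wrapper and the `res` variable are only there for illustration), the snippet below shows `rbind()` refusing mismatched names and then succeeding once the names are copied over with `setNames()`, as suggested in the comments on the question:

```r
df <- data.frame("hi", "bye", stringsAsFactors = FALSE)
names(df) <- c("hello", "goodbye")
de <- data.frame("hola", "ciao", stringsAsFactors = FALSE)

# de still has auto-generated names, so rbind() stops with an error;
# tryCatch() turns that error into a plain marker string for inspection
res <- tryCatch(rbind(df, de), error = function(e) "error")

# Copying the names from df makes the bind succeed
newdf <- rbind(df, setNames(de, names(df)))
newdf
```

This avoids typing the column names a second time, which helps when the data frame has many columns.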

2 Comments

Thanks! Any idea how to fix this if I don't have a second data frame declared, but instead have each value I want to add to a new row stored as a variable?
Try: newdf<-rbind(df, data.frame(hello="hola", goodbye="ciao")) OR with variable: newdf<-rbind(df, data.frame(hello=var1, goodbye=var2))
96

There's now add_row() from the tibble or tidyverse packages.

library(tidyverse)
df %>% add_row(hello = "hola", goodbye = "ciao")

Unspecified columns get an NA.
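
A small sketch of that NA behaviour, assuming the tibble package is installed (the data frame here is a made-up two-column example, not the asker's exact object). It also shows the `.before` argument, which places the row somewhere other than the end:

```r
library(tibble)

df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = FALSE)

# Only `hello` is supplied, so `goodbye` is filled with NA
out <- add_row(df, hello = "hola")

# .before inserts the new row at the given position instead of appending
out_first <- add_row(df, hello = "hola", goodbye = "ciao", .before = 1)
```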

4 Comments

I liked this approach if you stick to the tidyverse philosophy. Otherwise basic R syntax is a survival skill that comes in handy when you are in an environment where you don't have privileges to import packages. I particularly like the answer using plain R syntax with rbind and as.matrix below
Just I'd like to mention that the library is dplyr.
@Ariel To be specific, yes. But it's usually going to just be "tidyverse" to load anything else that you might want.
If it is already in a data frame, you can do: df1 %>% add_row(df2). If there are multiple rows in df2 they will also be appended. @Ariel add_row is imported by dplyr.
63

Or, as inspired by @MatheusAraujo:

df[nrow(df) + 1,] = list("v1","v2")

This would allow for mixed data types.
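
To make the difference concrete, here is a minimal comparison on a hypothetical two-column frame (`name` and `value` are illustrative names). `c()` builds a single character vector before the assignment, while `list()` keeps each element's own type:

```r
df <- data.frame(name = "a", value = 1, stringsAsFactors = FALSE)

# c() coerces both elements to character, so the numeric column is converted
df_c <- df
df_c[nrow(df_c) + 1, ] <- c("b", 2)

# list() keeps "b" as character and 2 as numeric
df_l <- df
df_l[nrow(df_l) + 1, ] <- list("b", 2)

class(df_c$value)  # "character"
class(df_l$value)  # "numeric"
```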

Comments

25

I like list instead of c because it handles mixed data types better. Adding an additional column to the original poster's question:

#Create an empty data frame
df <- data.frame(hello=character(), goodbye=character(), volume=double())
de <- list(hello="hi", goodbye="bye", volume=3.0)
df = rbind(df,de, stringsAsFactors=FALSE)
de <- list(hello="hola", goodbye="ciao", volume=13.1)
df = rbind(df,de, stringsAsFactors=FALSE)

Note that some additional control is required if the string/factor conversion is important.

Or using the original variables with the solution from MatheusAraujo/Ytsen de Boer:

df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen", volume=20.2)

Note that this solution doesn't work well with the strings unless there is existing data in the dataframe.

1 Comment

If hello and goodbye are character columns in df, you can do the following. You do not necessarily need names in the list. df <- data.frame(hello = "hi", goodbye = "bye", volume = 1, stringsAsFactors = FALSE); rbind(df, list("hola", "ciao", 100)).
14

Not terribly elegant, but:

data.frame(rbind(as.matrix(df), as.matrix(de)))

From documentation of the rbind function:

For rbind column names are taken from the first argument with appropriate names: colnames for a matrix...
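
One caveat worth checking before relying on this: a matrix holds a single type, so with mixed columns everything is converted to text on the way through. The sketch below uses made-up frames (`name`/`value` and `x`/`y` are illustrative) to show both the column names coming from the first argument and the loss of the numeric type:

```r
df <- data.frame(name = "a", value = 1, stringsAsFactors = FALSE)
de <- data.frame(x = "b", y = 2, stringsAsFactors = FALSE)

# Both frames become character matrices; the column names of the first one win
out <- data.frame(rbind(as.matrix(df), as.matrix(de)))

names(out)             # "name" "value"
is.numeric(out$value)  # FALSE: the numbers went through a character matrix
```

If the numeric columns matter, they would need converting back afterwards, for example with as.numeric().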

1 Comment

This solution works without needing to specify the columns to add, which is much better for applications on large datasets
4

If you want to make an empty data frame and add contents in a loop, the following may help:

# Number of students in class
student.count <- 36

# Gather data about the students
student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c('male', 'female'), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)

# Create empty data frame
student.data <- data.frame()

# Populate the data frame using a for loop
for (i in 1 : student.count) {
    # Get the row data
    age <- student.age[i]
    gender <- student.gender[i]
    marks <- student.marks[i]

    # Populate the row
    new.row <- data.frame(age = age, gender = gender, marks = marks)

    # Add the row
    student.data <- rbind(student.data, new.row)
}

# Print the data frame
student.data

Hope it helps :)
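
For comparison, when the column vectors already exist in full (as they do in this example), the same data frame can be built in a single call without a loop. This is only a sketch of that alternative, reusing the variable names from above:

```r
student.count <- 36

student.age <- sample(14:17, size = student.count, replace = TRUE)
student.gender <- sample(c("male", "female"), size = student.count, replace = TRUE)
student.marks <- sample(46:97, size = student.count, replace = TRUE)

# One call builds all 36 rows at once
student.data <- data.frame(
  age = student.age,
  gender = student.gender,
  marks = student.marks,
  stringsAsFactors = FALSE
)
```

The loop version is still useful when each row only becomes available one at a time.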

Comments

4

To build a data.frame in a loop:

df <- data.frame()
for(i in 1:10){
  df <- rbind(df, data.frame(str="hello", x=i, y=i*10))
}

1 Comment

I think a for loop is usually something we try our best to avoid in R...
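
One common way to avoid growing the data frame inside the loop is to collect the rows in a list first and bind them once at the end. A minimal sketch of that pattern, using the same columns as the answer above:

```r
# Build each row as its own one-row data frame
rows <- lapply(1:10, function(i) {
  data.frame(str = "hello", x = i, y = i * 10, stringsAsFactors = FALSE)
})

# Bind all rows in a single rbind() call
df <- do.call(rbind, rows)
```

Each call to rbind() inside a loop copies the data frame built so far, so binding once tends to matter more as the number of rows grows.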
3

I think,

rbind.data.frame(df, de)

should do the trick

2

In the tidyverse, this is commonly done with bind_rows:

df1 <- data.frame(hello = "hi", goodbye = "bye")
df2 <- data.frame(hello = "hola", goodbye = "ciao")

library(dplyr)

bind_rows(df1, df2)

In dplyr >= 1.0.0 you could use rows_insert (though probably overkill for this situation):

df1 %>% 
  rows_insert(df2)
Matching, by = "hello"
  hello goodbye
1    hi     bye
2  hola    ciao

Note: all columns in df2 must exist in df1, but not all columns in df1 have to be in df2.

For additional behavior, there are other rows_* options. For example, you could use rows_upsert, which will overwrite the values if they exist already and otherwise insert them:

df2 <- data.frame(hello = c("hi", "hola"), goodbye = c("goodbye", "ciao"))

library(dplyr)

df1 %>% 
  rows_upsert(df2)
Matching, by = "hello"
  hello goodbye
1    hi goodbye # bye updated to goodbye since "hi" was already in data frame
2  hola    ciao # inserted because "hola" was not in the data frame

These functions work by matching key columns. If the by argument is not specified then the default behavior is to match the first column in the second data frame (df2 in this example) to the first data frame (df1 in this example).
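
Passing `by` explicitly makes the key visible in the code and avoids the "Matching, by = ..." message. A short sketch, assuming dplyr >= 1.0.0 is installed and reusing the upsert example from this answer:

```r
library(dplyr)

df1 <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = FALSE)
df2 <- data.frame(
  hello = c("hi", "hola"),
  goodbye = c("goodbye", "ciao"),
  stringsAsFactors = FALSE
)

# "hi" already exists in df1, so its row is updated; "hola" is new, so it is inserted
out <- rows_upsert(df1, df2, by = "hello")
```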

Comments

1

There is a simpler way to append a record from one dataframe to another IF you know that the two dataframes share the same columns and types. To append one row from xx to yy just do the following where i is the i'th row in xx.

yy[nrow(yy)+1,] <- xx[i,]

Simple as that. No messy binds. If you need to append all of xx to yy, then either call a loop or take advantage of R's sequence abilities and do this:

yy[(nrow(yy)+1):(nrow(yy)+nrow(xx)),] <- xx[1:nrow(xx),]

Comments

1

I needed to add stringsAsFactors=FALSE when creating the data frame (this applies to R versions before 4.0.0, where stringsAsFactors defaulted to TRUE).

> df <- data.frame("hello"= character(0), "goodbye"=character(0))
> df
[1] hello   goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "hi") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, iseq, value = "bye") :
  invalid factor level, NA generated
> df
  hello goodbye
1  <NA>    <NA>
> 

.

> df <- data.frame("hello"= character(0), "goodbye"=character(0), stringsAsFactors=FALSE)
> df
[1] hello   goodbye
<0 rows> (or 0-length row.names)
> df[nrow(df) + 1,] = list("hi","bye")
> df[nrow(df) + 1,] = list("hola","ciao")
> df[nrow(df) + 1,] = list(hello="hallo",goodbye="auf wiedersehen")
> df
  hello         goodbye
1    hi             bye
2  hola            ciao
3 hallo auf wiedersehen
> 

Comments

1

Make certain to specify stringsAsFactors=FALSE when creating the dataframe:

> rm(list=ls())
> trigonometry <- data.frame(character(0), numeric(0), stringsAsFactors=FALSE)
> colnames(trigonometry) <- c("theta", "sin.theta")
> trigonometry
[1] theta     sin.theta
<0 rows> (or 0-length row.names)
> trigonometry[nrow(trigonometry) + 1, ] <- c("0", sin(0))
> trigonometry[nrow(trigonometry) + 1, ] <- c("pi/2", sin(pi/2))
> trigonometry
  theta sin.theta
1     0         0
2  pi/2         1
> typeof(trigonometry)
[1] "list"
> class(trigonometry)
[1] "data.frame"

Failing to use stringsAsFactors=FALSE when creating the data frame will result in the following warning when attempting to add the new row, and NA is stored instead of the value:

> trigonometry[nrow(trigonometry) + 1, ] <- c("0", sin(0))
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "0") :
  invalid factor level, NA generated
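
If the columns have to stay factors, one possible workaround is to extend the factor levels before assigning the new row. This is only a sketch of that idea on a made-up frame, not something taken from the answer above:

```r
df <- data.frame(hello = "hi", goodbye = "bye", stringsAsFactors = TRUE)

# Register the new values as valid levels first
levels(df$hello) <- c(levels(df$hello), "hola")
levels(df$goodbye) <- c(levels(df$goodbye), "ciao")

# The assignment now succeeds without the "invalid factor level" warning
df[nrow(df) + 1, ] <- list("hola", "ciao")
```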

Comments

1

To formalize what someone else used setNames for:

add_row <- function(original_data, new_vals_list){ 
  # appends row to dataset while assuming new vals are ordered and classed appropriately. 
  # new_vals must be a list not a single vector. 
  rbind(
    original_data,
    setNames(data.frame(new_vals_list), colnames(original_data))
    )
  }

It preserves class when legal and passes errors elsewhere.

m <- mtcars[ ,1:3]
m$cyl <- as.factor(m$cyl)
str(m)

#'data.frame':  32 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num  160 160 108 258 360 ...

Factor preserved when adding 4, even though it was passed as a numeric.

str(add_row(m, list(20,4,160)))
#'data.frame':  33 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ... 
# $ disp: num  160 160 108 258 360 ...

Passing a value other than 4, 6, or 8 produces a warning that the factor level is invalid, and NA is stored in the new row instead.

str(add_row(m, list(20,3,160)))
# 'data.frame': 33 obs. of  3 variables:
# $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
# $ disp: num  160 160 108 258 360 ...
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 3) :
  invalid factor level, NA generated

Comments

1

Building on prior answers, new rows can be added to a data frame using replacement functions. Replacement functions can encapsulate code complexity, which is advantageous when row additions are occurring multiple times in the same code.

Multiple versions of such a function are presented in order of increasing complexity.

Version 1: This version is like the answers by @MatheusAraujo or @YtsendeBoer. It is compact and useful if all column data for the new row is present in a fixed order.

'new_row<-'<- function(x, value){x[nrow(x) + 1,] <- value; x}

 df <- data.frame(A = 1,  B = 2,  C = 3)
 new_row(df) <- c(4,  5,  6)
 new_row(df) <- list(7,  8,  9)

Version 2: Though slightly longer, this version improves traceability by keying new data to the column name. All named columns must be present, though not necessarily in order, when adding a new row.

'new_row<-'<- function(x, value){
    x[nrow(x) + 1,] <- sapply(names(x), function(y){value[y]}); x
 }

 df <- data.frame(A = 1,  B = 2,  C = 3)
 new_row(df) <- c(B = 1, C = 2,  A = 3)     
 new_row(df) <- list(C = 1,  A = 2,  B = 3)
 new_row(df) <- data.frame(A = 3,  B = 4,  C = 5)
 

Version 3: This bulkier version will work when columns are missing and when new named columns are included. This is advantageous when new rows need adding while column data is still incomplete or when new rows only partially fit the data frame.

'new_row<-'<- function(x, value){
  x[names(value)[!is.element(names(value), names(x))]] <- numeric()
  x[nrow(x) + 1,] <- sapply(names(x), function(y){
    if(is.element(y,names(value))){return(value[y])}else{return(NA)}
  }); x}  

df <- data.frame(A = 1,  B = 2,  C = 3)

new_row(df) <- NA
new_row(df) <- c(A = 5)
new_row(df) <- list(C = 1,  A = 2, B = 1)
new_row(df) <- data.frame(Z = 1000)

Comments

0

I will add to the other suggestions. I use base R code to create a data frame:

data_set_name <- data.frame(data_set)

Now I always suggest making a duplicate of the original data frame just in case you need to go back or test something out. I listed that below:

data_set_name_copy <- data_set_name

Now if you wanted to add a new column the code would look like the following:

data_set_name_copy$Name_of_New_Column <- Data_for_New_Column

The $ signifies that you are adding a new column and right after as outlined you insert the nomenclature/name for your new entry.

Comments
