Change multiple data frames inside a list

tidyverse programming dplyr tidy evaluation

In R, how to change columns in multiple data frames inside a list using map(), not lmap().

Jeremy Allen https://jeremydata.com
05-19-2021

The Problem: change a column in each data frame in a list of data frames.

Or, to put it another way: given a list of data frames, apply a function to the same column in each data frame. Or another way still: iterate over the list elements, which are data frames, and mutate the same column in each.

Let’s start by making a list of data frames.

library(tidyverse)

# for example data, make a list of data frames
df_list <- list(
 df1 = head(diamonds), # using head() so we only get the first 6 rows
 df2 = head(diamonds)
)

# inspect the structure of the list
str(df_list)
List of 2
 $ df1: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
  ..$ carat  : num [1:6] 0.23 0.21 0.23 0.29 0.31 0.24
  ..$ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
  ..$ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
  ..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
  ..$ depth  : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
  ..$ table  : num [1:6] 55 61 65 58 58 57
  ..$ price  : int [1:6] 326 326 327 334 335 336
  ..$ x      : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
  ..$ y      : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
  ..$ z      : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48
 $ df2: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
  ..$ carat  : num [1:6] 0.23 0.21 0.23 0.29 0.31 0.24
  ..$ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
  ..$ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
  ..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
  ..$ depth  : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
  ..$ table  : num [1:6] 55 61 65 58 58 57
  ..$ price  : int [1:6] 326 326 327 334 335 336
  ..$ x      : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
  ..$ y      : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
  ..$ z      : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48

The Solution

My strategy is to make a function that does the column mutation, and then use a tidyverse mapping function to apply my function to each list element. First, the function.

# first make a function that will take a df and a column name then multiply that column by 1000
multiply_column <- function(df, my_col) {
 df %>% 
  mutate("{{my_col}}" := {{my_col}} * 1000)
}

Let’s talk about this function. We know it must take a data frame and a column name as its arguments. When we use it later, by applying it to list elements, which are data frames, my function must take the data frame at each position in that list, mutate one of its columns, then move on to the next data frame in the list.

Imagine we are at the first data frame in the list. My function will take that data frame, multiply the chosen column by 1000, and return the modified data frame.

We are using tidyverse functions and syntax here. The tidyverse uses unquoted column names, and they do not have to be preceded by $ as in base R, like diamonds$carat. We can refer to carat by itself, with no quotes, when giving that column name as one of the arguments to my function: multiply_column(df = diamonds, my_col = carat). Inside the function, though, we need some special syntax to get at the column name stored in the my_col argument. We wrap my_col in double curly braces, {{my_col}}. Inside mutate(), on the left-hand side we also wrap it in quotes, "{{my_col}}", and we use := in place of =. Together these tell mutate() to fill in the column name stored in my_col (here, carat) rather than treating it as a literal name. This special syntax, a result of tidy evaluation, is explained further in the article Programming with dplyr.
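Before mapping over a list, it can help to try the function on a single data frame, using the same kind of call described above. A minimal sketch, assuming only the tidyverse (which also supplies diamonds); one_df is an illustrative name, and the base R line is just a cross-check:

```r
library(tidyverse)

# same function as above: multiply the named column by 1000
multiply_column <- function(df, my_col) {
  df %>%
    mutate("{{my_col}}" := {{my_col}} * 1000)
}

# one_df is an illustrative name; carat is passed unquoted
one_df <- multiply_column(df = head(diamonds), my_col = carat)

# cross-check against base R, where the column is reached with $
one_df$carat
head(diamonds)$carat * 1000
```

The two printed vectors should match, and one_df should still have 10 columns, because "{{my_col}}" resolves to carat and overwrites that column instead of adding a new one.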

Now that we have my column-mutating function set up, we need another function that will use it on each element of a list. The map() function does just that, and it will return a list. We give it a list of data frames and we get back a list of data frames.

The main arguments to map() are .x and .f. Any named arguments listed after those two are intended for the function you provide in .f. So here, my_col = carat is not an argument for map() in the usual sense; it is an argument that map() passes along to my function, multiply_column().
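A note on style: since purrr 1.0.0, the documentation for map() recommends against passing constant arguments such as my_col = carat through its extra arguments, and suggests a short anonymous function instead. A sketch of that alternative, assuming the same df_list and multiply_column() (redefined so the block runs on its own); new_list_anon is an illustrative name:

```r
library(tidyverse)

multiply_column <- function(df, my_col) {
  df %>%
    mutate("{{my_col}}" := {{my_col}} * 1000)
}

df_list <- list(
  df1 = head(diamonds),
  df2 = head(diamonds)
)

# the anonymous function receives each data frame as `df`
# and calls multiply_column() with the column named explicitly;
# new_list_anon is an illustrative name
new_list_anon <- map(df_list, function(df) multiply_column(df, carat))
```

On R 4.1 or later, function(df) can also be written as \(df). Either style should give back a named list of data frames with carat multiplied by 1000.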

# apply my function to each data frame in the list of data frames
new_list <- map(
 .x = df_list,
 .f = multiply_column,
 my_col = carat
)

# inspect the structure of the new list
str(new_list)
List of 2
 $ df1: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
  ..$ carat  : num [1:6] 230 210 230 290 310 240
  ..$ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
  ..$ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
  ..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
  ..$ depth  : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
  ..$ table  : num [1:6] 55 61 65 58 58 57
  ..$ price  : int [1:6] 326 326 327 334 335 336
  ..$ x      : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
  ..$ y      : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
  ..$ z      : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48
 $ df2: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
  ..$ carat  : num [1:6] 230 210 230 290 310 240
  ..$ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
  ..$ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
  ..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
  ..$ depth  : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
  ..$ table  : num [1:6] 55 61 65 58 58 57
  ..$ price  : int [1:6] 326 326 327 334 335 336
  ..$ x      : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
  ..$ y      : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
  ..$ z      : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48

We did it!

We can see that my column-mutation function worked on each data frame in the list. The values in both carat columns have been multiplied by 1000.
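Rather than reading str() output by eye, the change can also be checked in code. A sketch, assuming df_list and new_list as built in this post (recreated here so the block runs on its own); map2_lgl() walks the old and new lists in parallel, and carat_ok and others_ok are illustrative names:

```r
library(tidyverse)

multiply_column <- function(df, my_col) {
  df %>%
    mutate("{{my_col}}" := {{my_col}} * 1000)
}

df_list <- list(df1 = head(diamonds), df2 = head(diamonds))
new_list <- map(.x = df_list, .f = multiply_column, my_col = carat)

# for each old/new pair, confirm carat was scaled by 1000
# (carat_ok is an illustrative name)
carat_ok <- map2_lgl(
  df_list, new_list,
  function(old, new) isTRUE(all.equal(new$carat, old$carat * 1000))
)

# and confirm every other column is untouched
# (others_ok is an illustrative name)
others_ok <- map2_lgl(
  df_list, new_list,
  function(old, new) identical(select(old, -carat), select(new, -carat))
)

carat_ok
others_ok
```

Both results are named logical vectors, one entry per data frame, so a FALSE would point straight at the data frame that did not change as expected.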

Why didn’t we use lmap()?

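Consider a passenger train with three passenger cars plus an engine at the front, so four cars in total. We want to reach a fancy dining table in car 2 so that we can repaint it. We can either stop at car 2 and get the whole car with the table inside (option one), or go straight to the table itself (option two).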

# option one: the car with the table still inside, i.e., we end up with
# position 2 in the list but not directly with the asset inside position 2
# NOTICE the $df2 that prints before the data frame here
df_list[2]
$df2
# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
# option two: the table itself, i.e., we've gone to position 2 in the list
# and got our hands directly on the data frame in position 2
# NOTICE here we get just the data frame, without the list position
df_list[[2]]
# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
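The difference between the two options also shows up in the class of what comes back. A small sketch using the same df_list; car and table_itself are illustrative names that follow the train analogy:

```r
library(tidyverse)

df_list <- list(df1 = head(diamonds), df2 = head(diamonds))

# option one, single brackets: a list of length 1 that still wraps the data frame
# (car is an illustrative name)
car <- df_list[2]
class(car)
length(car)

# option two, double brackets: the data frame itself
# (table_itself is an illustrative name)
table_itself <- df_list[[2]]
class(table_itself)

# getting from the car to the table takes one more step with [[1]]
identical(car[[1]], table_itself)
```

class(car) is a plain "list", while class(table_itself) is the tibble class vector, which is why dplyr verbs such as mutate() can be applied to the second but not the first.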

The lmap() function works like option one: it passes each list position, df_list[i], to your function. We get the train car we want, but not direct access to the table inside.

The map() function works like option two: it passes the contents of each list position, df_list[[i]]. We teleport inside car 2 and have direct access to the table, ready to repaint it.
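To make the contrast concrete, here is a sketch of the same repaint job written with lmap(), assuming a current purrr release. Because lmap() hands the function df_list[i] (the car) and expects a list back, the function has to open the car with [[1]] itself and then return the whole car. It is an illustration of why map() is the simpler fit here, not a recommended pattern; lmap_list and map_list are illustrative names.

```r
library(tidyverse)

multiply_column <- function(df, my_col) {
  df %>%
    mutate("{{my_col}}" := {{my_col}} * 1000)
}

df_list <- list(df1 = head(diamonds), df2 = head(diamonds))

# lmap() passes df_list[i], a list of length 1, so we open the car with [[1]],
# repaint the table, and hand back the whole car (still a list)
# (lmap_list is an illustrative name)
lmap_list <- lmap(df_list, function(car) {
  car[[1]] <- multiply_column(car[[1]], carat)
  car
})

# map() passes df_list[[i]], the data frame itself, so no unwrapping is needed
# (map_list is an illustrative name)
map_list <- map(df_list, multiply_column, my_col = carat)
```

Both approaches should end with carat multiplied by 1000 in each data frame; the lmap() version just needs the extra unwrap and rewrap steps.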

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Allen (2021, May 19). jeremydata: Change multiple data frames inside a list. Retrieved from https://jeremydata.com/posts/2021-05-19-change-multiple-data-frames-inside-lists-oh-my/

BibTeX citation

@misc{allen2021change,
  author = {Allen, Jeremy},
  title = {jeremydata: Change multiple data frames inside a list},
  url = {https://jeremydata.com/posts/2021-05-19-change-multiple-data-frames-inside-lists-oh-my/},
  year = {2021}
}