How to change columns in multiple data frames inside a list in R, using map() rather than lmap().
Put another way: given a list of data frames, apply a function to the same column in each data frame. Or: iterate over list elements, which are data frames, and mutate the same column in each.
Let’s start by making a list of data frames.
library(tidyverse)
# for example data, make a list of data frames
df_list <- list(
  df1 = head(diamonds), # head() returns only the first 6 rows
  df2 = head(diamonds)
)
# inspect the structure of the list
str(df_list)
List of 2
$ df1: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
..$ carat : num [1:6] 0.23 0.21 0.23 0.29 0.31 0.24
..$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
..$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
..$ depth : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
..$ table : num [1:6] 55 61 65 58 58 57
..$ price : int [1:6] 326 326 327 334 335 336
..$ x : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
..$ y : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
..$ z : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48
$ df2: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
..$ carat : num [1:6] 0.23 0.21 0.23 0.29 0.31 0.24
..$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
..$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
..$ depth : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
..$ table : num [1:6] 55 61 65 58 58 57
..$ price : int [1:6] 326 326 327 334 335 336
..$ x : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
..$ y : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
..$ z : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48
My strategy is to make a function that does the column mutation, and then use a tidyverse mapping function to apply my function to each list element. My function first.
# first, make a function that takes a data frame and a column name,
# then multiplies that column by 1000
multiply_column <- function(df, my_col) {
  df %>%
    mutate("{{my_col}}" := {{ my_col }} * 1000)
}
Let’s talk about this function. It must take a data frame and a column name as its arguments. Later, when we apply it to each element of the list, it must take the data frame at that position, mutate one of its columns, and then move on to the next data frame in the list.
Imagine we are at the first data frame in the list. My function will:
take that data frame, df, and a column name supplied by the user, my_col
access the column by its name, which is stored in the my_col argument
change, or mutate, that column by multiplying it by 1000
return the changed data frame
We are using tidyverse functions and syntax here. Tidyverse functions accept unquoted column names, and the column names do not have to be preceded by $ as in base R, like diamonds$carat. We can refer to carat by itself with no quotes when giving that column name as one of the arguments to my function, multiply_column(df = diamonds, my_col = carat). Then, inside the function, we have to refer to the my_col argument with some special syntax to access the column name we put in it. We wrap my_col in double curly braces. When we do that inside mutate(), on the left-hand side of the = we also wrap it in quotes, and we use a particular version of the = sign, :=. This special syntax, part of tidy evaluation, is explained more in the article Programming with dplyr.
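Before mapping over a list, it can help to try the function on a single data frame. The sketch below repeats the multiply_column() definition so that it runs on its own. The small_df tibble is a made-up two-row stand-in for diamonds and is not part of the original example.

```r
library(dplyr)

# same function as in the post, repeated so this block is self-contained
multiply_column <- function(df, my_col) {
  df %>%
    mutate("{{my_col}}" := {{ my_col }} * 1000)
}

# hypothetical toy data: two rows, two columns
small_df <- tibble(carat = c(0.23, 0.21), price = c(326L, 326L))

# the column name is passed unquoted, as described above
result <- multiply_column(df = small_df, my_col = carat)

# only carat should change; price and the column names stay as they were
result
```

The result keeps the column name carat because the quoted "{{my_col}}" on the left-hand side is filled in with the name of the column that was passed.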
Now that we have my column-mutating function set up, we need another function that will use it on each element of a list. The map() function does just that, and it will return a list. We give it a list of data frames and we get back a list of data frames.
The main arguments to map() are .x and .f. Any named arguments listed after those two are passed along to the function you provide in .f. So here, my_col = carat is not an argument for map() in the usual way; it is an argument that map() will give to my function, multiply_column().
# apply my function to each data frame in the list of data frames
new_list <- map(
  .x = df_list,
  .f = multiply_column,
  my_col = carat
)
# inspect the structure of the new list
str(new_list)
List of 2
$ df1: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
..$ carat : num [1:6] 230 210 230 290 310 240
..$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
..$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
..$ depth : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
..$ table : num [1:6] 55 61 65 58 58 57
..$ price : int [1:6] 326 326 327 334 335 336
..$ x : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
..$ y : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
..$ z : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48
$ df2: tibble[,10] [6 × 10] (S3: tbl_df/tbl/data.frame)
..$ carat : num [1:6] 230 210 230 290 310 240
..$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3
..$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7
..$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6
..$ depth : num [1:6] 61.5 59.8 56.9 62.4 63.3 62.8
..$ table : num [1:6] 55 61 65 58 58 57
..$ price : int [1:6] 326 326 327 334 335 336
..$ x : num [1:6] 3.95 3.89 4.05 4.2 4.34 3.94
..$ y : num [1:6] 3.98 3.84 4.07 4.23 4.35 3.96
..$ z : num [1:6] 2.43 2.31 2.31 2.63 2.75 2.48
We can see that my column-mutation function worked on each data frame in the list. The values in both carat columns have been multiplied by 1000.
first, we made a function that would work on a single data frame
then, we applied that function to each data frame in a list
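Passing my_col = carat through the extra arguments of map() is one way to write the call. An equivalent form wraps the call in an anonymous function, which I believe is the style that recent purrr documentation leans toward. The sketch below assumes a small made-up toy_list in place of df_list, and it repeats multiply_column() so the block runs by itself.

```r
library(dplyr)
library(purrr)

# same function as in the post, repeated so this block is self-contained
multiply_column <- function(df, my_col) {
  df %>%
    mutate("{{my_col}}" := {{ my_col }} * 1000)
}

# hypothetical toy list standing in for df_list
toy_list <- list(
  a = tibble(carat = c(0.5, 1)),
  b = tibble(carat = c(2, 4))
)

# form used in the post: extra named arguments are handed on to .f
via_dots <- map(toy_list, multiply_column, my_col = carat)

# equivalent form: an anonymous function that calls multiply_column itself
via_lambda <- map(toy_list, function(df) multiply_column(df, my_col = carat))

# quick check that both forms give the same list of data frames
all.equal(via_dots, via_lambda)
```

The anonymous-function form is a little longer, but it makes it explicit which function receives my_col.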
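Why map() and not lmap()? The difference is in how each one reaches the elements of a list. Consider a passenger train with three cars carrying passengers plus an engine car at the front, so 4 cars total. We want to access a fancy dining table in car 2 so that we can repaint it. We have two options.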
Car and table: we can isolate car 2, ending up with the car with our table inside it, but we cannot repaint the table until we get inside
Table itself: we can teleport ourselves inside car 2 directly to the table so we can repaint it
# car and table, i.e., we end up with position 2 in the list but not
# directly with the asset that is inside position 2
# NOTICE the $df2 that prints before the data frame here
df_list[2]
$df2
# A tibble: 6 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
# table itself, i.e., we've gone to position 2 in the list plus got
# our hands directly on the data frame that is in position 2
# NOTICE here we get just the data frame itself without the list position
df_list[[2]]
# A tibble: 6 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
The lmap() function accesses list positions like option one. We get the train car we want but not immediate direct access to the table inside.
The map() function accesses assets inside list positions like option two. We teleport inside car 2 and have direct access to the table so we can repaint it.
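The difference can be seen by asking each function what it hands to the function we supply. This is a minimal sketch, assuming a made-up toy_list of plain data frames in place of df_list. The anonymous functions are illustrative only and do not appear in the original post.

```r
library(purrr)

# hypothetical toy list standing in for df_list
toy_list <- list(
  a = data.frame(carat = c(0.5, 1)),
  b = data.frame(carat = c(2, 4))
)

# map() hands .f the data frame itself, like toy_list[[2]]
seen_by_map <- map_lgl(toy_list, is.data.frame)

# lmap() hands .f a length-one list, like toy_list[2],
# and expects .f to return a list
seen_by_lmap <- lmap(toy_list, function(el) list(is.data.frame(el)))

# with lmap() we have to step inside the length-one list ourselves
via_lmap <- lmap(toy_list, function(el) {
  el[[1]]$carat <- el[[1]]$carat * 1000
  el
})

# with map() the function works on the data frame directly
via_map <- map(toy_list, function(df) {
  df$carat <- df$carat * 1000
  df
})
```

Here seen_by_map reports that every element arrived as a data frame, while seen_by_lmap reports that none did, because each arrived wrapped in a list. Both via_lmap and via_map end up with the carat column multiplied, but the lmap() version needs the extra [[1]] step to get from the car to the table.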
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Allen (2021, May 19). jeremydata: Change multiple data frames inside a list. Retrieved from https://jeremydata.com/posts/2021-05-19-change-multiple-data-frames-inside-lists-oh-my/