By Row in Base R, Tidyverse, and data.table

rowwise tidyverse data.table

Which rows have more 1s than 0s?.

Jeremy Allen https://jeremydata.com
03-12-2021

PROBLEM: Add a column that indicates if a row has more 1s than 0s, use 1 if true, 0 if false, but only consider specific columns and allow for NAs.

This question was asked on Twitter, and I want to elaborate on the solutions here.

We know we need to include

First, let’s make sample data that includes a row with NA.

# PROBLEM: add a column that indicates if a row has more 1s than 0s,
# use 1 if true, 0 if false, but only consider certain columns and
# allow for NAs.

# sample data
df <- data.frame(
  a = as.integer(c(1,0,1,0,0)),
  b = as.integer(c(1,0,1,0,0)),
  c = as.integer(c(1,0,1,1,0)),
  d = as.integer(c(1,1,0,9,0)),
  e = as.integer(c(1,1,0,1,0)),
  f = as.integer(c(1,1,NA,0,0)),
  g = as.integer(c(1,1,1,0,0))
)

Let’s try a base R solution first. Base R has a function rowMeans() that could be helpful.

#---- base R solution ----

# base R rowMeans() says yes to row 4, but row 4 does not have 
# more 1s than 0s, and it will fail on any non-numeric columns
df$more_1s <- ifelse(rowMeans(df, na.rm = T) > .5,1,0)

df
  a b c d e  f g more_1s
1 1 1 1 1 1  1 1       1
2 0 0 0 1 1  1 1       1
3 1 1 1 0 0 NA 1       1
4 0 0 1 9 1  0 0       1
5 0 0 0 0 0  0 0       0

However, if use the mean of each row, instead of literally just counting 1s and 0s, then we can be fooled by rows with larger numbers, like row 4 above. So, let’s make a function that only counts 1s and 0s, which is the problem we were asked to solve.

# function to count only ones and zeros and report TRUE or
# FALSE if more 1s, and convert the logical to integer
is_more_1s <- function(x) {
  as.integer(
    sum(x == 1, na.rm = T) > sum(x == 0, na.rm = T)
  )
}

# this gets row 4 correct
df$more_1s <- apply(df, 1, is_more_1s)

df
  a b c d e  f g more_1s
1 1 1 1 1 1  1 1       1
2 0 0 0 1 1  1 1       1
3 1 1 1 0 0 NA 1       1
4 0 0 1 9 1  0 0       0
5 0 0 0 0 0  0 0       0

This function only counts 1s and 0s, ignoring the 9 in row 4 and therefore giving us the correct answer in our new column. The function also returns a 1 if TRUE and a 0 if FALSE.

library(tidyverse)

Let’s try a tidyverse solution that uses rowwise() instead of apply()

#---- tidyverse solution ----

# sample data
df <- data.frame(
  a = as.integer(c(1,0,1,0,0)),
  b = as.integer(c(1,0,1,0,0)),
  c = as.integer(c(1,0,1,1,0)),
  d = as.integer(c(1,1,0,9,0)),
  e = as.integer(c(1,1,0,1,0)),
  f = as.integer(c(1,1,NA,0,0)),
  g = as.integer(c(1,1,1,0,0))
)

library(tidyverse)

tb <- as_tibble(df)
tb %>% 
  rowwise() %>% 
  mutate(more_1s = is_more_1s(c_across(a:g))) # only a through g
# A tibble: 5 x 8
# Rowwise: 
      a     b     c     d     e     f     g more_1s
  <int> <int> <int> <int> <int> <int> <int>   <int>
1     1     1     1     1     1     1     1       1
2     0     0     0     1     1     1     1       1
3     1     1     1     0     0    NA     1       1
4     0     0     1     9     1     0     0       0
5     0     0     0     0     0     0     0       0

Here, rowwise() makes sure we are counting across rows, and c_across() lets us constrain which columns we consider, which is part of the problem we were asked to solve.

library(data.table)

Using only data.table we don’t have rowwise() from dplyr, so we use base R’s apply() again. We use .SD and .SDcols to specify which columns we want to be constrained to.

#---- data.table solution ----

# sample data
df <- data.frame(
  a = as.integer(c(1,0,1,0,0)),
  b = as.integer(c(1,0,1,0,0)),
  c = as.integer(c(1,0,1,1,0)),
  d = as.integer(c(1,1,0,9,0)),
  e = as.integer(c(1,1,0,1,0)),
  f = as.integer(c(1,1,NA,0,0)),
  g = as.integer(c(1,1,1,0,0))
)

library(data.table)

dt <- as.data.table(df)
my_cols <- letters[1:7] # only a through g
dt[, more_1s := apply(.SD, 1, is_more_1s), .SDcols = my_cols]

dt
   a b c d e  f g more_1s
1: 1 1 1 1 1  1 1       1
2: 0 0 0 1 1  1 1       1
3: 1 1 1 0 0 NA 1       1
4: 0 0 1 9 1  0 0       0
5: 0 0 0 0 0  0 0       0

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Allen (2021, March 12). jeremydata: By Row in Base R, Tidyverse, and data.table. Retrieved from https://jeremydata.com/posts/2021-03-12-by-row-in-base-r-tidyverse-and-datatable/

BibTeX citation

@misc{allen2021by,
  author = {Allen, Jeremy},
  title = {jeremydata: By Row in Base R, Tidyverse, and data.table},
  url = {https://jeremydata.com/posts/2021-03-12-by-row-in-base-r-tidyverse-and-datatable/},
  year = {2021}
}