Which rows have more 1s than 0s?

This question was asked on Twitter, and I want to elaborate on the solutions here.

We know we need to include:

- a column selection constraint
- proper handling of NAs
- answers in a new column
- 1 for TRUE and 0 for FALSE

First, let’s make sample data that includes a row with NA.

```
# PROBLEM: add a column that indicates if a row has more 1s than 0s,
# use 1 if true, 0 if false, but only consider certain columns and
# allow for NAs.
# sample data
df <- data.frame(
  a = as.integer(c(1, 0, 1, 0, 0)),
  b = as.integer(c(1, 0, 1, 0, 0)),
  c = as.integer(c(1, 0, 1, 1, 0)),
  d = as.integer(c(1, 1, 0, 9, 0)),
  e = as.integer(c(1, 1, 0, 1, 0)),
  f = as.integer(c(1, 1, NA, 0, 0)),
  g = as.integer(c(1, 1, 1, 0, 0))
)
```

Let’s try a base R solution first. Base R has a function, `rowMeans()`, that could be helpful.

```
#---- base R solution ----
# rowMeans() says yes to row 4, but row 4 does not have more 1s
# than 0s (the 9 inflates the mean), and rowMeans() will fail on
# any non-numeric columns
df$more_1s <- ifelse(rowMeans(df, na.rm = TRUE) > 0.5, 1, 0)
df
```

```
  a b c d e  f g more_1s
1 1 1 1 1 1  1 1       1
2 0 0 0 1 1  1 1       1
3 1 1 1 0 0 NA 1       1
4 0 0 1 9 1  0 0       1
5 0 0 0 0 0  0 0       0
```

However, if we use the mean of each row instead of literally counting 1s and 0s, we can be fooled by rows with larger numbers, like row 4 above: its mean is 11/7, well above 0.5, even though it has only two 1s and four 0s. So, let’s make a function that only counts 1s and 0s, which is the problem we were asked to solve.

```
# function to count only ones and zeros, test whether there are
# more 1s than 0s, and convert the logical result to integer
is_more_1s <- function(x) {
  as.integer(
    sum(x == 1, na.rm = TRUE) > sum(x == 0, na.rm = TRUE)
  )
}
# this gets row 4 correct; restrict to columns a through g so the
# more_1s column added earlier is not counted as data
df$more_1s <- apply(df[, letters[1:7]], 1, is_more_1s)
df
```

```
  a b c d e  f g more_1s
1 1 1 1 1 1  1 1       1
2 0 0 0 1 1  1 1       1
3 1 1 1 0 0 NA 1       1
4 0 0 1 9 1  0 0       0
5 0 0 0 0 0  0 0       0
```

This function only counts 1s and 0s, ignoring the 9 in row 4 and therefore giving us the correct answer in our new column. The function also returns a 1 if `TRUE` and a 0 if `FALSE`.
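As a possible alternative to `apply()`, the same counting rule can be sketched with `rowSums()`, which is vectorized and makes the column constraint explicit. This is only an illustration, not one of the solutions from the original discussion; the names `cols`, `ones`, `zeros`, and `more_1s_vec` are made up for this example.

```r
# sample data (same values as the sample data above)
df <- data.frame(
  a = as.integer(c(1, 0, 1, 0, 0)),
  b = as.integer(c(1, 0, 1, 0, 0)),
  c = as.integer(c(1, 0, 1, 1, 0)),
  d = as.integer(c(1, 1, 0, 9, 0)),
  e = as.integer(c(1, 1, 0, 1, 0)),
  f = as.integer(c(1, 1, NA, 0, 0)),
  g = as.integer(c(1, 1, 1, 0, 0))
)

# hypothetical name: the columns we want to consider
cols <- letters[1:7]

# comparing a data frame to a scalar gives a logical matrix;
# rowSums() counts the TRUEs in each row, and na.rm = TRUE skips NAs
ones  <- rowSums(df[cols] == 1, na.rm = TRUE)
zeros <- rowSums(df[cols] == 0, na.rm = TRUE)

# 1 if the row has more 1s than 0s, otherwise 0
df$more_1s_vec <- as.integer(ones > zeros)
df
```

Because the comparison happens on the whole matrix at once, there is no per-row function call, which may matter on larger data.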

Let’s try a tidyverse solution that uses `rowwise()` instead of `apply()`:

```
#---- tidyverse solution ----
# sample data
df <- data.frame(
  a = as.integer(c(1, 0, 1, 0, 0)),
  b = as.integer(c(1, 0, 1, 0, 0)),
  c = as.integer(c(1, 0, 1, 1, 0)),
  d = as.integer(c(1, 1, 0, 9, 0)),
  e = as.integer(c(1, 1, 0, 1, 0)),
  f = as.integer(c(1, 1, NA, 0, 0)),
  g = as.integer(c(1, 1, 1, 0, 0))
)
library(tidyverse)
tb <- as_tibble(df)
tb %>%
  rowwise() %>%
  mutate(more_1s = is_more_1s(c_across(a:g))) # only a through g
```

```
# A tibble: 5 x 8
# Rowwise:
      a     b     c     d     e     f     g more_1s
  <int> <int> <int> <int> <int> <int> <int>   <int>
1     1     1     1     1     1     1     1       1
2     0     0     0     1     1     1     1       1
3     1     1     1     0     0    NA     1       1
4     0     0     1     9     1     0     0       0
5     0     0     0     0     0     0     0       0
```

Here, `rowwise()` makes sure we are counting across rows, and `c_across()` lets us constrain which columns we consider, which is part of the problem we were asked to solve.
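To illustrate the column constraint, here is a small sketch that narrows the selection to columns `a` through `d` and then calls `ungroup()` so later verbs no longer operate row by row. The column name `more_1s_ad` and the object `result` are made up for this example, and `is_more_1s()` is repeated so the block runs on its own.

```r
library(dplyr)

# same counting function as above, repeated so this block is self-contained
is_more_1s <- function(x) {
  as.integer(sum(x == 1, na.rm = TRUE) > sum(x == 0, na.rm = TRUE))
}

# sample data (same values as above)
df <- data.frame(
  a = as.integer(c(1, 0, 1, 0, 0)),
  b = as.integer(c(1, 0, 1, 0, 0)),
  c = as.integer(c(1, 0, 1, 1, 0)),
  d = as.integer(c(1, 1, 0, 9, 0)),
  e = as.integer(c(1, 1, 0, 1, 0)),
  f = as.integer(c(1, 1, NA, 0, 0)),
  g = as.integer(c(1, 1, 1, 0, 0))
)

result <- df %>%
  rowwise() %>%
  # hypothetical column name; only a through d are considered here
  mutate(more_1s_ad = is_more_1s(c_across(a:d))) %>%
  # drop the rowwise grouping once the row-level work is done
  ungroup()

result
```

With the narrower selection, row 2 no longer counts as having more 1s, since its 1s in columns `e` through `g` are ignored.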

Using only data.table, we don’t have `rowwise()` from dplyr, so we use base R’s `apply()` again. We use `.SD` and `.SDcols` to specify which columns to consider.

```
#---- data.table solution ----
# sample data
df <- data.frame(
  a = as.integer(c(1, 0, 1, 0, 0)),
  b = as.integer(c(1, 0, 1, 0, 0)),
  c = as.integer(c(1, 0, 1, 1, 0)),
  d = as.integer(c(1, 1, 0, 9, 0)),
  e = as.integer(c(1, 1, 0, 1, 0)),
  f = as.integer(c(1, 1, NA, 0, 0)),
  g = as.integer(c(1, 1, 1, 0, 0))
)
library(data.table)
dt <- as.data.table(df)
my_cols <- letters[1:7] # only a through g
dt[, more_1s := apply(.SD, 1, is_more_1s), .SDcols = my_cols]
dt
```

```
   a b c d e  f g more_1s
1: 1 1 1 1 1  1 1       1
2: 0 0 0 1 1  1 1       1
3: 1 1 1 0 0 NA 1       1
4: 0 0 1 9 1  0 0       0
5: 0 0 0 0 0  0 0       0
```
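If calling `apply()` inside data.table feels awkward, one possible variation is to count with `rowSums()` over `.SD` instead. This is a sketch rather than a recommendation from the original discussion; it assumes that comparing `.SD` to a scalar yields a logical matrix, as it does for a data frame.

```r
library(data.table)

# sample data (same values as above)
dt <- data.table(
  a = as.integer(c(1, 0, 1, 0, 0)),
  b = as.integer(c(1, 0, 1, 0, 0)),
  c = as.integer(c(1, 0, 1, 1, 0)),
  d = as.integer(c(1, 1, 0, 9, 0)),
  e = as.integer(c(1, 1, 0, 1, 0)),
  f = as.integer(c(1, 1, NA, 0, 0)),
  g = as.integer(c(1, 1, 1, 0, 0))
)

my_cols <- letters[1:7] # only a through g

# .SD holds just the columns named in .SDcols; count 1s and 0s
# per row, skipping NAs, and store the comparison as an integer
dt[, more_1s := as.integer(
  rowSums(.SD == 1, na.rm = TRUE) > rowSums(.SD == 0, na.rm = TRUE)
), .SDcols = my_cols]
dt
```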

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

For attribution, please cite this work as

Allen (2021, March 12). jeremydata: By Row in Base R, Tidyverse, and data.table. Retrieved from https://jeremydata.com/posts/2021-03-12-by-row-in-base-r-tidyverse-and-datatable/

BibTeX citation

@misc{allen2021by, author = {Allen, Jeremy}, title = {jeremydata: By Row in Base R, Tidyverse, and data.table}, url = {https://jeremydata.com/posts/2021-03-12-by-row-in-base-r-tidyverse-and-datatable/}, year = {2021} }