Create yes/no column based on values in two other columns

Question

I have a dataset that looks like this:

df <- structure(list(ID = 1:10, Region1 = c("Europe", "NA", 
"Asia", "NA", "Europe", "NA", "Africa", "NA", "Europe", "North America"), Region2 = c("NA", "Europe", 
"NA", "NA", "NA", "Europe", 
"NA", "NA", "NA", "NA"
)), 
class = "data.frame", row.names = c(NA, -10L))

I want to create a new column called EuropeYN that is either "yes" or "no", depending on whether either of the region columns (Region1 or Region2) contains "Europe". The final data should look like this:

df <- structure(list(ID = 1:10, Region1 = c("Europe", "NA", 
"Asia", "NA", "Europe", "NA", "Africa", "NA", "Europe", "North America"), Region2 = c("NA", "Europe", 
"NA", "NA", "NA", "Europe", 
"NA", "NA", "NA", "NA"
), EuropeYN = c("yes", "yes", "no", "no", "yes", "yes", "no", "no", "yes", "no")), 
class = "data.frame", row.names = c(NA, -10L))

I know how to do this when checking whether "Europe" appears in a single column, but I have no idea how to check across multiple columns. This is what I would do for just one column:

df$EuropeYN <- ifelse(grepl("Europe", df$Region1), "yes", "no")

Any ideas on the best way to approach this?...

ifelse(df$Region1 == "Europe" | df$Region2 == "Europe", "yes", "no") — r2evans, Commented Jul 29, 2021 at 16:03

Ubiminor · Accepted Answer · 2021-07-29 16:05:47Z


My approach would be very similar to yours:

dplyr::mutate(df, EuropeYN = ifelse((Region1 == "Europe" | Region2 == "Europe"), "yes", "no"))


Thanks! This is similar to what someone else suggested just before you posted :) – Japes, Commented Jul 29, 2021 at 16:20

People are quick at replying :( – Ubiminor, Commented Jul 29, 2021 at 16:48


Chris Ruehlemann · Accepted Answer · 2021-07-29 17:41:56Z

A little late but maybe still worth a look:

library(dplyr)
library(stringr)
df %>%
  rowwise() %>%
  mutate(YN = +any(str_detect(c_across(Region1:Region2), 'Europe')))
# A tibble: 10 x 4
# Rowwise: 
      ID Region1       Region2    YN
   <int> <chr>         <chr>   <int>
 1     1 Europe        NA          1
 2     2 NA            Europe      1
 3     3 Asia          NA          0
 4     4 NA            NA          0
 5     5 Europe        NA          1
 6     6 NA            Europe      1
 7     7 Africa        NA          0
 8     8 NA            NA          0
 9     9 Europe        NA          1
10    10 North America NA          0

or, without +:

df %>%
   rowwise() %>%
   mutate(YN = any(str_detect(c_across(Region1:Region2), 'Europe')))
# A tibble: 10 x 4
# Rowwise: 
      ID Region1       Region2 YN   
   <int> <chr>         <chr>   <lgl>
 1     1 Europe        NA      TRUE 
 2     2 NA            Europe  TRUE 
 3     3 Asia          NA      FALSE
 4     4 NA            NA      FALSE
 5     5 Europe        NA      TRUE 
 6     6 NA            Europe  TRUE 
 7     7 Africa        NA      FALSE
 8     8 NA            NA      FALSE
 9     9 Europe        NA      TRUE 
10    10 North America NA      FALSE

If you have several columns to check, you can select them with starts_with (or contains or ends_with) instead of naming each one:

df %>%
  rowwise() %>%
  mutate(YN = any(str_detect(c_across(starts_with('R')), 'Europe')))
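To get the "yes"/"no" strings the question asks for, the same rowwise idea can be wrapped in if_else(). This is only a minimal sketch on a shortened copy of the data: the four-row df and the name out are made up for illustration, and ungroup() is added so the result is no longer a rowwise tibble.

```r
library(dplyr)
library(stringr)

# Shortened, hypothetical copy of the question's data ("NA" is a string here)
df <- data.frame(
  ID = 1:4,
  Region1 = c("Europe", "NA", "Asia", "NA"),
  Region2 = c("NA", "Europe", "NA", "NA")
)

out <- df %>%
  rowwise() %>%
  # any() collapses the per-column matches of one row into a single TRUE/FALSE
  mutate(EuropeYN = if_else(
    any(str_detect(c_across(starts_with("Region")), "Europe")),
    "yes", "no"
  )) %>%
  ungroup()  # drop the rowwise grouping before further work

out$EuropeYN
```

If the columns held real missing values instead of the string "NA", str_detect() would return NA for those cells, so the result would need checking for that case.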

r2evans · Accepted Answer · 2021-07-29 16:06:00Z

Two ways:

Literally check each of two columns:

ifelse(df$Region1 == "Europe" | df$Region2 == "Europe", "yes", "no")
#  [1] "yes" "yes" "no"  "no"  "yes" "yes" "no"  "no"  "yes" "no"

This has the advantage of being easier to read (subjective) and very clear.

Select a range of columns and look for equality:

subset(df, select = Region1:Region2) == "Europe"
#    Region1 Region2
# 1     TRUE   FALSE
# 2    FALSE    TRUE
# 3    FALSE   FALSE
# 4    FALSE   FALSE
# 5     TRUE   FALSE
# 6    FALSE    TRUE
# 7    FALSE   FALSE
# 8    FALSE   FALSE
# 9     TRUE   FALSE
# 10   FALSE   FALSE

apply(subset(df, select = Region1:Region2) == "Europe", 1, any)
#     1     2     3     4     5     6     7     8     9    10 
#  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE

This approach works for one or more columns.

Either of those can be assigned back into the frame with df$EuropeYN <- ...; note that the second returns TRUE/FALSE rather than "yes"/"no".
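For completeness, here is one way the second approach could be assigned back as "yes"/"no". It is a sketch rather than the only option; the intermediate name hit is just an illustrative choice, and unname() is there because apply() returns a vector named by row.

```r
# The question's data, rebuilt with data.frame() ("NA" is a string here)
df <- data.frame(
  ID = 1:10,
  Region1 = c("Europe", "NA", "Asia", "NA", "Europe", "NA",
              "Africa", "NA", "Europe", "North America"),
  Region2 = c("NA", "Europe", "NA", "NA", "NA", "Europe",
              "NA", "NA", "NA", "NA")
)

# One TRUE/FALSE per row: does any selected column equal "Europe"?
hit <- apply(subset(df, select = Region1:Region2) == "Europe", 1, any)

# Drop the row names carried by apply(), then map TRUE/FALSE to yes/no
df$EuropeYN <- ifelse(unname(hit), "yes", "no")

df$EuropeYN
```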

Rui Barradas · Accepted Answer · 2021-07-29 16:10:26Z


Here is a vectorized base R way.

i <- rowSums(df[grep("Region", names(df))] == "Europe") > 0
df$EuropeYN <- c("no", "yes")[i + 1L]
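The second line relies on logical-to-integer coercion: FALSE + 1L is 1 and TRUE + 1L is 2, so i + 1L picks "no" or "yes" by position. A small sketch of the same idea follows. The three-row toy frame is made up for illustration and deliberately contains a real NA, which is why na.rm = TRUE is added here; the question's data only has "NA" strings, so it does not need it.

```r
# Hypothetical toy frame with one real missing value
toy <- data.frame(
  Region1 = c("Europe", NA, "Asia"),
  Region2 = c("NA", "Europe", "NA"),
  stringsAsFactors = FALSE
)

# Count matches per row; na.rm = TRUE ignores the missing cell
i <- rowSums(toy[grep("Region", names(toy))] == "Europe", na.rm = TRUE) > 0

# FALSE + 1L -> 1 ("no"), TRUE + 1L -> 2 ("yes")
toy$EuropeYN <- c("no", "yes")[i + 1L]

toy$EuropeYN
```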


akrun · Accepted Answer · 2021-07-29 19:01:28Z

We may use if_any here as a vectorized option in the tidyverse:

library(dplyr)
library(stringr)
df %>%
     mutate(YN = if_any(starts_with("Region"), str_detect, 'Europe'))
   ID       Region1 Region2    YN
1   1        Europe      NA  TRUE
2   2            NA  Europe  TRUE
3   3          Asia      NA FALSE
4   4            NA      NA FALSE
5   5        Europe      NA  TRUE
6   6            NA  Europe  TRUE
7   7        Africa      NA FALSE
8   8            NA      NA FALSE
9   9        Europe      NA  TRUE
10 10 North America      NA FALSE

Or in base R:

df$YN <-  Reduce(`|`, lapply(df[startsWith(names(df), 'Region')], 
        `%in%`, 'Europe'))

NOTE: It is easier to subset with a logical flag than with "yes"/"no" strings.
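A short sketch of what that note means in practice, using a shortened, made-up copy of the data (the names europe_rows, other_rows and europe_rows2 are illustrative only):

```r
# Shortened, hypothetical copy of the question's data
df <- data.frame(
  ID = 1:4,
  Region1 = c("Europe", "NA", "Asia", "NA"),
  Region2 = c("NA", "Europe", "NA", "NA")
)

# Logical flag built with the base R approach above
df$YN <- Reduce(`|`, lapply(df[startsWith(names(df), "Region")],
                            `%in%`, "Europe"))

# A logical column can index rows directly, and negation is just `!`
europe_rows <- df[df$YN, ]
other_rows  <- df[!df$YN, ]

# With strings, every filter needs an explicit comparison
df$EuropeYN  <- ifelse(df$YN, "yes", "no")
europe_rows2 <- df[df$EuropeYN == "yes", ]
```

The "yes"/"no" column can still be derived at the end for display, as the ifelse() line shows.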
