I have a dataset that looks like this:

df <- structure(list(ID = 1:10, Region1 = c("Europe", "NA", 
"Asia", "NA", "Europe", "NA", "Africa", "NA", "Europe", "North America"), Region2 = c("NA", "Europe", 
"NA", "NA", "NA", "Europe", 
"NA", "NA", "NA", "NA"
)), 
class = "data.frame", row.names = c(NA, -10L))

I want to create a new column called EuropeYN that is "yes" or "no" depending on whether either of the region columns (Region1 or Region2) contains "Europe". The final data should look like this:

df <- structure(list(ID = 1:10, Region1 = c("Europe", "NA", 
"Asia", "NA", "Europe", "NA", "Africa", "NA", "Europe", "North America"), Region2 = c("NA", "Europe", 
"NA", "NA", "NA", "Europe", 
"NA", "NA", "NA", "NA"
), EuropeYN = c("yes", "yes", "no", "no", "yes", "yes", "no", "no", "yes", "no")), 
class = "data.frame", row.names = c(NA, -10L))

I know how to do this when checking whether "Europe" appears in a single column, but I have no idea how to check across multiple columns. This is what I would do for one column:

df$EuropeYN <- ifelse(grepl("Europe", df$Region1), "yes", "no")

Any ideas on the best way to approach this?...

  • ifelse(df$Region1 == "Europe" | df$Region2 == "Europe", "yes", "no")
    – r2evans
    Commented Jul 29, 2021 at 16:03

5 Answers

My approach would be very similar to yours:

dplyr::mutate(df, EuropeYN = ifelse((Region1 == "Europe" | Region2 == "Europe"), "yes", "no"))
  • Thanks! This is similar to what someone else suggested just before you posted :)
    – Japes
    Commented Jul 29, 2021 at 16:20
  • People are quick at replying :(
    – Ubiminor
    Commented Jul 29, 2021 at 16:48

A little late but maybe still worth a look:

library(dplyr)
library(stringr)
df %>%
  rowwise() %>%
  mutate(YN = +any(str_detect(c_across(Region1:Region2), 'Europe')))
# A tibble: 10 x 4
# Rowwise: 
      ID Region1       Region2    YN
   <int> <chr>         <chr>   <int>
 1     1 Europe        NA          1
 2     2 NA            Europe      1
 3     3 Asia          NA          0
 4     4 NA            NA          0
 5     5 Europe        NA          1
 6     6 NA            Europe      1
 7     7 Africa        NA          0
 8     8 NA            NA          0
 9     9 Europe        NA          1
10    10 North America NA          0

or, without +:

df %>%
   rowwise() %>%
   mutate(YN = any(str_detect(c_across(Region1:Region2), 'Europe')))
# A tibble: 10 x 4
# Rowwise: 
      ID Region1       Region2 YN   
   <int> <chr>         <chr>   <lgl>
 1     1 Europe        NA      TRUE 
 2     2 NA            Europe  TRUE 
 3     3 Asia          NA      FALSE
 4     4 NA            NA      FALSE
 5     5 Europe        NA      TRUE 
 6     6 NA            Europe  TRUE 
 7     7 Africa        NA      FALSE
 8     8 NA            NA      FALSE
 9     9 Europe        NA      TRUE 
10    10 North America NA      FALSE

If you have several columns to check, you can select them with starts_with() (or contains() or ends_with()):

df %>%
  rowwise() %>%
  mutate(YN = any(str_detect(c_across(starts_with('R')), 'Europe'))) 

Two ways:

  1. Literally check each of two columns:

    ifelse(df$Region1 == "Europe" | df$Region2 == "Europe", "yes", "no")
    #  [1] "yes" "yes" "no"  "no"  "yes" "yes" "no"  "no"  "yes" "no" 
    

    This has the advantage of being easier to read (subjective) and very clear.

  2. Select a range of columns and look for equality:

    subset(df, select = Region1:Region2) == "Europe"
    #    Region1 Region2
    # 1     TRUE   FALSE
    # 2    FALSE    TRUE
    # 3    FALSE   FALSE
    # 4    FALSE   FALSE
    # 5     TRUE   FALSE
    # 6    FALSE    TRUE
    # 7    FALSE   FALSE
    # 8    FALSE   FALSE
    # 9     TRUE   FALSE
    # 10   FALSE   FALSE
    
    apply(subset(df, select = Region1:Region2) == "Europe", 1, any)
    #     1     2     3     4     5     6     7     8     9    10 
    #  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE 
    

    This allows us to use 1 or more columns.

Either of those can be assigned back into the frame with df$EuropeYN <- ....
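
As a minimal sketch of that last step, the second approach can be assigned back like this. The names `region_cols` and `is_europe` are introduced here only for illustration, and the data frame is rebuilt from the question (its "NA" entries are literal strings, not missing values):

```r
# Rebuild the question's data frame; "NA" here is a plain string, not a missing value
df <- data.frame(
  ID = 1:10,
  Region1 = c("Europe", "NA", "Asia", "NA", "Europe",
              "NA", "Africa", "NA", "Europe", "North America"),
  Region2 = c("NA", "Europe", "NA", "NA", "NA",
              "Europe", "NA", "NA", "NA", "NA")
)

# Columns to check (illustrative name; extend this vector to cover more columns)
region_cols <- c("Region1", "Region2")

# One logical per row: TRUE if any selected column equals "Europe"
is_europe <- unname(apply(df[region_cols] == "Europe", 1, any))

# Convert the logical flag to the yes/no labels asked for in the question
df$EuropeYN <- ifelse(is_europe, "yes", "no")
```

Adding a third region column would then only require extending `region_cols`.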

Here is a vectorized base R way.

i <- rowSums(df[grep("Region", names(df))] == "Europe") > 0
df$EuropeYN <- c("no", "yes")[i + 1L]
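
The question's data stores "NA" as a literal string, so the comparison above never produces a missing value. If the columns held real `NA` values instead, the row sums would become `NA` for any row without a match. A hedged sketch of one way to handle that, using a small made-up frame `df_na` (hypothetical, not the asker's data) and `na.rm = TRUE`:

```r
# Hypothetical example data with real missing values rather than the string "NA"
df_na <- data.frame(
  ID = 1:4,
  Region1 = c("Europe", NA, "Asia", NA),
  Region2 = c(NA, "Europe", NA, NA)
)

# Positions of the columns whose names contain "Region"
region_cols <- grep("Region", names(df_na))

# na.rm = TRUE makes missing comparisons count as "no match" instead of propagating NA
i <- rowSums(df_na[region_cols] == "Europe", na.rm = TRUE) > 0

# FALSE + 1L picks "no", TRUE + 1L picks "yes"
df_na$EuropeYN <- c("no", "yes")[i + 1L]
```

The indexing trick works because adding `1L` to a logical gives 1 for `FALSE` and 2 for `TRUE`, which are the positions of "no" and "yes" in the lookup vector.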

We may use if_any here as a vectorized option in tidyverse

library(dplyr)
library(stringr)
df %>%
     mutate(YN = if_any(starts_with("Region"), str_detect, 'Europe'))
   ID       Region1 Region2    YN
1   1        Europe      NA  TRUE
2   2            NA  Europe  TRUE
3   3          Asia      NA FALSE
4   4            NA      NA FALSE
5   5        Europe      NA  TRUE
6   6            NA  Europe  TRUE
7   7        Africa      NA FALSE
8   8            NA      NA FALSE
9   9        Europe      NA  TRUE
10 10 North America      NA FALSE

Or in base R

df$YN <-  Reduce(`|`, lapply(df[startsWith(names(df), 'Region')], 
        `%in%`, 'Europe'))

NOTE: It is easier to subset rows with a logical flag than with "yes"/"no" strings.
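
To illustrate the note, here is a small sketch of subsetting with the logical column. The data frame is rebuilt from the question, and `europe_rows` and `other_rows` are names chosen for this example only:

```r
# Rebuild the question's data frame; "NA" here is a plain string, not a missing value
df <- data.frame(
  ID = 1:10,
  Region1 = c("Europe", "NA", "Asia", "NA", "Europe",
              "NA", "Africa", "NA", "Europe", "North America"),
  Region2 = c("NA", "Europe", "NA", "NA", "NA",
              "Europe", "NA", "NA", "NA", "NA")
)

# Logical flag: TRUE when any column starting with "Region" equals "Europe"
df$YN <- Reduce(`|`, lapply(df[startsWith(names(df), "Region")], `%in%`, "Europe"))

# A logical column can index rows directly, and negation is a single `!`
europe_rows <- df[df$YN, ]
other_rows  <- df[!df$YN, ]

# With a "yes"/"no" column, the same filter needs an explicit string comparison,
# e.g. df[df$EuropeYN == "yes", ]
```

Because `%in%` never returns `NA`, this flag also stays `TRUE`/`FALSE` when the region columns contain real missing values.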
