I have a dataframe that shows ICD-10 codes for people who have died (decedents). Each row in the data frame corresponds to a decedent, each of whom can have up to twenty conditions listed as contributing factors to his or her death. I want to create a new column that shows if a decedent had any ICD-10 code for diabetes (1 for yes, 0 for no). The codes for diabetes fall within E10-E14, i.e., a diabetes code must start with one of the strings in the following vector, while the fourth position can take on different values:

diabetes <- c("E10","E11","E12","E13","E14")

This is a small, made-up example of what the data looks like:

original <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255", 
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))
acond1 acond2 acond3 acond4
E112 I255 I258 I500
I250 B341 B348 E669
A419 F179 I10 I694
E149 F101 I10 R092

This is my desired result:

acond1 acond2 acond3 acond4 diabetes
E112 I255 I258 I500 1
I250 B341 B348 E669 0
A419 F179 I10 I694 0
E149 F101 I10 R092 1

There have been a couple other posts (e.g., Using if else on a dataframe across multiple columns, Str_detect multiple columns using across) on this type of question, but I can't seem to put it all together. Here is what I have unsuccessfully tried so far:

library(tidyverse)
library(stringr)

#attempt 1
original %>%
  mutate_at(vars(contains("acond")), ifelse(str_detect(.,paste0("^(", 
  paste(diabetes, collapse = "|"), ")")), 1, 0))

#attempt 2
original %>%
  unite(col = "all_conditions", starts_with("acond"), sep = ", ", remove = FALSE) %>%
  mutate(diabetes = if_else(str_detect(.,paste0("^(", paste(diabetes, collapse = "|"), ")")), 1, 0))

Any help would be appreciated.

4 Answers

library(tidyverse)

diabetes_pattern <- c("E10","E11","E12","E13","E14") %>% 
  str_c(collapse = "|")

original <-
  structure(
    list(
      acond1 = c("E112", "I250", "A419", "E149"),
      acond2 = c("I255", "B341", "F179", "F101"),
      acond3 = c("I258", "B348", "I10", "I10"),
      acond4 = c("I500", "E669", "I694", "R092")
    ),
    row.names = c(NA,-4L),
    class = c("tbl_df", "tbl", "data.frame")
  )

original %>% 
  rowwise() %>% 
  mutate(diabetes = +any(str_detect(string = c_across(everything()), pattern = diabetes_pattern)))
#> # A tibble: 4 x 5
#> # Rowwise: 
#>   acond1 acond2 acond3 acond4 diabetes
#>   <chr>  <chr>  <chr>  <chr>     <int>
#> 1 E112   I255   I258   I500          1
#> 2 I250   B341   B348   E669          0
#> 3 A419   F179   I10    I694          0
#> 4 E149   F101   I10    R092          1

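# Note: rowSums() counts the matching columns, so a row with two diabetes
# codes would get 2; wrap the sum in +(... > 0) if a strict 0/1 flag is needed.
original %>% 
  mutate(diabetes = rowSums(across(.cols = everything(), ~str_detect(.x, diabetes_pattern))))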
#> # A tibble: 4 x 5
#>   acond1 acond2 acond3 acond4 diabetes
#>   <chr>  <chr>  <chr>  <chr>     <dbl>
#> 1 E112   I255   I258   I500          1
#> 2 I250   B341   B348   E669          0
#> 3 A419   F179   I10    I694          0
#> 4 E149   F101   I10    R092          1

Created on 2022-01-23 by the reprex package (v2.0.1)
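
Since the question says a diabetes code must start with one of the prefixes, it may be worth anchoring the pattern with `^`. Here is a minimal base R sketch, assuming only the `diabetes` vector from the question; `"XE10"` is a made-up string used to show the difference, not a real ICD-10 code:

```r
diabetes <- c("E10", "E11", "E12", "E13", "E14")

# Unanchored: matches the prefix anywhere in the string.
unanchored <- paste(diabetes, collapse = "|")
# Anchored: matches only when the string starts with one of the prefixes.
anchored <- paste0("^(", paste(diabetes, collapse = "|"), ")")

# "XE10" is a hypothetical value, included only for illustration.
codes <- c("E112", "E149", "I10", "XE10")

grepl(unanchored, codes)
#> [1]  TRUE  TRUE FALSE  TRUE
grepl(anchored, codes)
#> [1]  TRUE  TRUE FALSE FALSE
```

With well-formed ICD-10 codes both patterns should agree, so the anchor mainly guards against unexpected values in the data.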

I would like to add an update to this question because I found that the approved dplyr answer takes a very long time to execute.

Instead, you could vectorize the check over the codes and columns you are looking for.

library(tidyverse)
original <-
  structure(
    list(
      acond1 = c("E112", "I250", "A419", "E149"),
      acond2 = c("I255", "B341", "F179", "F101"),
      acond3 = c("I258", "B348", "I10", "I10"),
      acond4 = c("I500", "E669", "I694", "R092")
    ),
    row.names = c(NA,-4L),
    class = c("tbl_df", "tbl", "data.frame")
  )

# Vectors holding the columns to search and the code prefixes to look for.
# Edit either vector to change what the next portion of code checks.
dia <- c("acond1", "acond2", "acond3", "acond4")
diabetes_pattern <- c("E10","E11","E12","E13","E14")

identified_diabetes <- original |> 
  mutate(diabetes = +(if_any(any_of(dia), \(x) substr(x, 1,3) %in% c(diabetes_pattern))))


This should return the same desired output, but in benchmarking it is drastically faster.

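original %>% 
  rowwise() %>% 
  # dia holds column names in this answer, so build the regex from diabetes_pattern
  mutate(diabetes = any(grepl(paste(diabetes_pattern, collapse = "|"), c_across(starts_with("ac")))) * 1) %>% 
  ungroup()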

replications elapsed
100    0.45

versus

original |> 
  mutate(diabetes = +(if_any(any_of(dia), \(x) substr(x, 1,3) %in% c(diabetes_pattern))))

replications elapsed
100    0.14

The small example runs quickly either way, but as the dataset gets larger (I tried this on a df of >250k rows and ~100 columns), the latter is a much faster way to check this.
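
The same prefix comparison can also be sketched in base R without any packages by treating the condition columns as one character matrix. This is an illustration under assumptions, not a benchmarked replacement: the data below is made up (including `NA` for unused condition slots), and the name `hit` is only illustrative.

```r
# Made-up data in the shape of the question, with NA for unused slots.
original <- data.frame(
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", NA),
  acond3 = c("I258", "E101", NA, NA),
  stringsAsFactors = FALSE
)
diabetes <- c("E10", "E11", "E12", "E13", "E14")

# Pick the condition columns by name and view them as a character matrix.
cols <- grep("^acond", names(original), value = TRUE)
codes <- as.matrix(original[cols])

# Compare the first three characters of every cell against the prefixes.
# %in% drops the matrix shape, so rebuild it; NA cells compare as FALSE.
hit <- matrix(substr(codes, 1, 3) %in% diabetes, nrow = nrow(codes))

# A row is flagged when at least one of its cells matched.
original$diabetes <- as.integer(rowSums(hit) > 0)
original$diabetes
#> [1] 1 1 0 1
```

Comparing fixed three-character prefixes with `%in%` avoids regular expressions entirely, which is the same idea the `substr()` call above relies on.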

Here's a base R approach using apply

dia <- paste(c("E10","E11","E12","E13","E14"), collapse="|")

df$diabetes <- apply(df, 1, function(x) any(grepl(dia,x)))*1

df
  acond1 acond2 acond3 acond4 diabetes
1   E112   I255   I258   I500        1
2   I250   B341   B348   E669        0
3   A419   F179    I10   I694        0
4   E149   F101    I10   R092        1

With dplyr

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(diabetes=any(grepl(dia,c_across(starts_with("ac"))))*1) %>% 
  ungroup
# A tibble: 4 × 5
  acond1 acond2 acond3 acond4 diabetes
  <chr>  <chr>  <chr>  <chr>     <dbl>
1 E112   I255   I258   I500          1
2 I250   B341   B348   E669          0
3 A419   F179   I10    I694          0
4 E149   F101   I10    R092          1

Data

df <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255", 
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), class = "data.frame", row.names = c(NA, 
-4L))

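If we want to use across with ifelse and str_detect, then we could:

  1. create a pattern with paste and collapse for str_detect
  2. mutate across all columns with an anonymous ~ifelse on the condition, using .names to name the new columns
  3. unite the new columns into one string
  4. use parse_number from the readr package to pull the flag out of that string; note that parse_number reads only the first number it finds, so this relies on the diabetes code sitting in the first column, as it does in the example data
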
diabetes <- c("E10","E11","E12","E13","E14")

pattern <- paste(diabetes, collapse = "|")

library(tidyverse)

original %>% 
  mutate(across(everything(), ~ifelse(str_detect(., pattern), 1, 0), .names = "new_{col}")) %>% 
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(diabetes = parse_number(New_Col), .keep="unused")
  acond1 acond2 acond3 acond4 diabetes
  <chr>  <chr>  <chr>  <chr>     <dbl>
1 E112   I255   I258   I500          1
2 I250   B341   B348   E669          0
3 A419   F179   I10    I694          0
4 E149   F101   I10    R092          1
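
When a diabetes code can appear in any of the condition columns, one way to collapse the per-column 0/1 flags into a single flag is to take their row-wise maximum. Here is a base R sketch under assumptions: the `example` data is made up, with one diabetes code deliberately placed in the second column, and it uses `grepl()` and `pmax()` rather than the tidyverse calls above.

```r
diabetes <- c("E10", "E11", "E12", "E13", "E14")
pattern <- paste0("^(", paste(diabetes, collapse = "|"), ")")

# Made-up rows; the third has its diabetes code in the second column.
example <- data.frame(
  acond1 = c("E112", "I250", "A419"),
  acond2 = c("I255", "E669", "E149"),
  stringsAsFactors = FALSE
)

# One 0/1 integer vector per condition column.
flags <- lapply(example, function(x) as.integer(grepl(pattern, x)))

# The row-wise maximum is 1 as soon as any column matched.
example$diabetes <- do.call(pmax, unname(flags))
example$diabetes
#> [1] 1 0 1
```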
