I have a data set with diagnosis codes for patients. I want to filter it and view the patients who have any of the codes I am interested in. I am currently using the str_detect function to filter across all of the columns, but I want to know how to restrict this to the diagnosis code columns only. I would also like to know how to write a condition that detects multiple strings, e.g. O99019 and 0909.

filter(if_any(everything(), ~ str_detect(., "O99019")))

  patient_id dx1     dx2   dx3   dx4    dx5   dx6    dx7   dx8    dx9    dx10   dx11  dx12   dx13  dx14  dx15 
       <dbl> <chr>   <chr> <chr> <chr>  <chr> <chr>  <chr> <chr>  <chr>  <chr>  <chr> <chr>  <chr> <chr> <chr>
1   12812360 28262   2768  311   4019   53081 4011   28261 NA     NA     NA     NA    NA     NA    NA    NA   
2   12812360 28262   2859  4019  28260  4011  NA     NA    NA     NA     NA     NA    NA     NA    NA    NA   
3   12812360 28262   2859  4019  28261  2827  NA     NA    NA     NA     NA     NA    NA     NA    NA    NA   
4   12812360 28262   2859  4019  7295   NA    NA     NA    NA     NA     NA     NA    NA     NA    NA    NA   
5   12812360 T8029XA D5700 F329  I10    Z6833 Z79899 D571  L03113 Z79891 D57219 K219  L03119 R509  NA    NA   
6   12812360 D5700   D509  I10   Z79899 D219  D57419 M7989 M79662 NA     NA     NA    NA     NA    NA    NA 

2 Answers

As in previous questions here (e.g. "Concatenate first character substring from many columns into a new column without explicitly listing every column name"), I'd recommend restructuring this data into long format instead of wide format with lots of NAs. That removes the need to specify a range of diagnosis code columns each time (which can be error-prone) and allows simpler filtering code at both the patient and the event level:

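library(dplyr)
library(tidyr)

# One row per patient / event / diagnosis; each original row is treated as one event
diagnosis <- dat %>%
    mutate(event = row_number()) %>%
    # capture the full number: ".+(\\d+)" would turn "dx10" into "0"
    pivot_longer(-c(patient_id, event),
                 names_pattern="\\D+(\\d+)",
                 names_to="seq",
                 values_to="code",
                 values_drop_na=TRUE)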

## all patient_ids associated with certain substrings at any time:
diagnosis %>%
    filter(substr(code,1,3) %in% c("Z68","285")) %>%
    distinct(patient_id)

## all diagnoses + patient_ids matching certain regex at any time
diagnosis %>%
    filter(grepl("^D", code))

## all events + patient_ids for patients matching 
## multiple criteria together in the same event
diagnosis %>%
    group_by(patient_id, event) %>% 
    filter(any(grepl("^D", code)) & any(substr(code,1,3) %in% c("Z68","285"))) %>%
    distinct(patient_id, event)
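
If whole codes need to be matched instead of prefixes, the long format also makes that a one-liner with %in%. The following is only a sketch on made-up toy data: the dat values and the wanted vector are illustrative assumptions, and names_prefix is used here as an alternative way of stripping the "dx" from the column names.

```r
library(dplyr)
library(tidyr)

# Toy data in the question's wide layout (values are illustrative only)
dat <- tibble(
  patient_id = c(1, 1, 2),
  dx1 = c("28262", "D5700", "I10"),
  dx2 = c("2859", "Z6833", NA)
)

# Reshape to one row per patient / event / diagnosis
diagnosis <- dat %>%
  mutate(event = row_number()) %>%
  pivot_longer(starts_with("dx"),
               names_prefix = "dx",   # "dx1" -> "1", "dx10" -> "10"
               names_to = "seq",
               values_to = "code",
               values_drop_na = TRUE)

# Hypothetical list of complete codes of interest
wanted <- c("D5700", "2859")

# Keep only the diagnoses whose code is exactly one of the wanted codes
matches <- diagnosis %>%
  filter(code %in% wanted)
```

Adding distinct(patient_id) after the filter would reduce matches to one row per patient.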

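Using grepl with search strings that are present in the data: | means "or", so a row is kept if any column matches either alternative. grepl already matches a pattern anywhere within a larger string, so the surrounding .* (zero or more of any character) is optional. -patient_id applies the test to every column except patient_id.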

library(dplyr)

df %>% 
  filter(if_any(-patient_id, ~ grepl(".*295.*|.*570.*", .x)))
  patient_id     dx1   dx2  dx3    dx4   dx5    dx6   dx7    dx8    dx9   dx10
1   12812360   28262  2859 4019   7295  <NA>   <NA>  <NA>   <NA>   <NA>   <NA>
2   12812360 T8029XA D5700 F329    I10 Z6833 Z79899  D571 L03113 Z79891 D57219
3   12812360   D5700  D509  I10 Z79899  D219 D57419 M7989 M79662   <NA>   <NA>
  dx11   dx12 dx13 dx14 dx15
1 <NA>   <NA> <NA>   NA   NA
2 K219 L03119 R509   NA   NA
3 <NA>   <NA> <NA>   NA   NA
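
To limit the search to the diagnosis columns and to match the two complete codes from the question, something along these lines should work. It is a sketch under assumptions: the toy df below is made up, the diagnosis columns are assumed to all start with "dx", and the pattern is anchored with ^ and $ so that a code such as 0909 does not also match longer codes that merely contain it.

```r
library(dplyr)
library(stringr)

# Toy data modelled on the question's layout (values are illustrative only)
df <- tibble(
  patient_id = c(1, 2, 3, 4),
  dx1 = c("O99019", "28262", "D5700", "I10"),
  dx2 = c("4019", "0909", NA, "Z79899"),
  dx3 = c(NA, NA, "F329", NA)
)

# Codes of interest, joined into one anchored regex: ^(O99019|0909)$
codes <- c("O99019", "0909")
pattern <- str_c("^(", str_c(codes, collapse = "|"), ")$")

# Test only the columns whose names start with "dx"
hits <- df %>%
  filter(if_any(starts_with("dx"), ~ str_detect(.x, pattern)))
```

For exact matches like these, a regex is not strictly needed: filter(if_any(starts_with("dx"), ~ .x %in% codes)) should select the same rows and avoids any escaping concerns.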
