I have a column of values with unique identifiers that look like this:
df$1 <- c("identifier:ab134:4sfh", "identifier:gh164:9sgh", "identifier:3h1v4:kk9gh")
Some of them also appear in a column of a separate data frame with 71 columns, but in that data frame they are often clustered together, separated by "|", like this:
df2$1 <- c("identifier:ab134:4sfh|identifier:gh164:9sgh", "identifier:sfghskg8:kk9gh|identifier:fj893n:9sgh|identifier:gh164:9sgh",...)
I need to find all rows in the second data frame that contain any of these identifiers. I could strsplit the column, but I want to keep the rest of the second data frame as it is.
I have tried using this code both ways (i.e. df1 %in% df2 and df2 %in% df1) but obviously it's not giving me all the matches because it's trying to match whole strings rather than substrings:
new_subset <- subset(df$1, trimws(1) %in% trimws(df2$1))
Any suggestions? Thanks in advance for your help!
lapply(v1, function(x) unlist(lapply(strsplit(v2, "|", fixed = TRUE), function(y) match(x, y))))
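To get from match positions to the actual rows of the second data frame, one option is to split the clustered column only for the test and then subset the original data frame with a logical vector. This is a minimal sketch, not a definitive answer. It assumes the columns are named `id` (a hypothetical name, since `df$1` is not a valid column reference as written), and it adds a made-up non-matching row and a placeholder `other` column to stand in for the remaining columns:

```r
# Hypothetical column name `id` stands in for the question's column `1`.
df  <- data.frame(id = c("identifier:ab134:4sfh", "identifier:gh164:9sgh",
                         "identifier:3h1v4:kk9gh"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c("identifier:ab134:4sfh|identifier:gh164:9sgh",
                         "identifier:sfghskg8:kk9gh|identifier:fj893n:9sgh|identifier:gh164:9sgh",
                         "identifier:zz000:0000"),  # made-up row with no match
                  other = 1:3,                       # placeholder for the other columns
                  stringsAsFactors = FALSE)

# Split each clustered string on the literal "|" only for the purpose of the test
parts <- strsplit(df2$id, "|", fixed = TRUE)

# TRUE for every row of df2 that contains at least one identifier from df
hit <- vapply(parts, function(p) any(trimws(p) %in% trimws(df$id)), logical(1))

# df2 itself is never modified, so all of its columns are kept
matched_rows <- df2[hit, ]
```

Because the split result lives in a separate object (`parts`), the second data frame keeps its original clustered column and every other column untouched.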
Also try grep(df2$1, df$1)
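Note that grep() uses only the first element of a multi-element pattern, so for this approach the identifiers would need to be collapsed into a single regular expression, with the clustered column as the text being searched. A sketch of that idea follows; the vector names `ids` and `clustered` are placeholders for the two columns, and it assumes the identifiers contain no regex metacharacters:

```r
ids <- c("identifier:ab134:4sfh", "identifier:gh164:9sgh", "identifier:3h1v4:kk9gh")
clustered <- c("identifier:ab134:4sfh|identifier:gh164:9sgh",
               "identifier:sfghskg8:kk9gh|identifier:fj893n:9sgh|identifier:gh164:9sgh")

# One regex: any identifier, bounded by the start/end of the string or a literal "|",
# so that an identifier does not match as part of a longer one
pattern <- paste0("(^|\\|)(", paste(ids, collapse = "|"), ")(\\||$)")

# Logical vector with one entry per element of `clustered`
hit <- grepl(pattern, clustered)

# With a data frame, the same vector can index rows, e.g. df2[hit, ]
```

With many identifiers the pattern can get long, in which case the strsplit-based test may be the safer choice.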