I can't seem to find the answer to what I need anywhere, so apologies if this is a duplicate.

Suppose I have the following df:

a    b    c    d  
1    2    3    4  
2    1    2    3  
1    2    4    4

I want to subset df so that rows whose values match in the "a", "b", and "d" columns are returned in a new data frame.

1 Answer

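We can apply duplicated() to the subset of columns to get a logical vector for filtering the rows. On its own, duplicated() flags only the second and later occurrences, so we combine it with a second call using fromLast = TRUE to catch the first occurrence as well: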

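# Columns that define a match
key <- df[c("a", "b", "d")]
# TRUE for every row whose key appears more than once, first occurrence included
df[duplicated(key) | duplicated(key, fromLast = TRUE), ]
#  a b c d
#1 1 2 3 4
#3 1 2 4 4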
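
For a self-contained run, here is a minimal sketch that rebuilds the example data frame from the question and wraps the same duplicated() logic in a small helper. The name rows_with_shared_key is hypothetical (not from any package), and the numeric column types are an assumption, since the question only shows the printed values.

```r
# The example data frame from the question (column types assumed numeric).
df <- data.frame(
  a = c(1, 2, 1),
  b = c(2, 1, 2),
  c = c(3, 2, 4),
  d = c(4, 3, 4)
)

# Hypothetical helper: keep every row whose values in `cols`
# occur in more than one row of `data`.
rows_with_shared_key <- function(data, cols) {
  key <- data[cols]
  # Scan from both ends so the first occurrence is flagged too.
  keep <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[keep, , drop = FALSE]
}

matches <- rows_with_shared_key(df, c("a", "b", "d"))
print(matches)
```

Passing the column names as a character vector keeps the helper reusable for other key combinations, and drop = FALSE ensures a data frame comes back even if df has a single column.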

Alternatively, group by those columns and keep only the groups that have more than one row:

library(dplyr)
df %>%
   group_by(a, b, d) %>%
   filter(n() > 1)

Or with data.table (note that setDT() converts df to a data.table by reference):

library(data.table)
setDT(df)[, .SD[.N > 1], by = .(a, b, d)]
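
If neither package is available, the same group-size idea can probably be sketched in base R with ave(). This is an alternative of my own rather than part of the approaches above, and it assumes the example df from the question with numeric columns; group_size is just an illustrative variable name.

```r
# Example data from the question (column types assumed numeric).
df <- data.frame(
  a = c(1, 2, 1),
  b = c(2, 1, 2),
  c = c(3, 2, 4),
  d = c(4, 3, 4)
)

# For each row, the number of rows sharing its (a, b, d) combination.
group_size <- ave(seq_len(nrow(df)), df$a, df$b, df$d, FUN = length)

# Keep rows that belong to a group with more than one member.
result <- df[group_size > 1, ]
print(result)
```

Unlike the data.table version, this leaves the original column order and row names untouched.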
