Select rows where column contains same data in more than one record

Question

There are plenty of questions with similar titles, but every answer I have found relies on grouping (GROUP BY x HAVING COUNT(*) > 1). What I'm looking for is a query that returns all matching rows ungrouped (in MySQL).

Say I have the following:

id  data
1    x
2    y
3    y
4    z

What I want the query to return is:

2    y
3    y

based on the fact that rows 2 and 3 have identical values in the data column.

SELECT * FROM table WHERE [data contains a value that exists in some other row as well]

Why doesn't SELECT * FROM table WHERE data = 'y' work for you?
– JPG, Commented Jun 2, 2014 at 10:14
In this example, yes. But in general I want the rows where the contents of one column are equal, regardless of the contents of the other columns.
– linurb, Commented Jun 2, 2014 at 10:15
@JPG: Because I don't know that it is 'y'. It's just an example. What I know is that I want all rows except those where a particular column has a unique value.
– linurb, Commented Jun 2, 2014 at 10:17
Yes, id is unique, but that's irrelevant to the problem. In the real table there are plenty of other columns that are not unique. What's interesting is that data (in the example) is not unique.
– linurb, Commented Jun 2, 2014 at 10:30

fancyPants · Accepted Answer · 2014-06-02 10:17:18Z

7

You have to put it in a subquery:

select * from table where data in (
    select data from table group by data having count(*) > 1
)
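
A minimal sketch of the subquery approach on the sample data from the question. It runs through Python's built-in sqlite3 module instead of MySQL so that it is self-contained; the table name table1 is an assumption (table itself is a reserved word), and nothing here says anything about MySQL performance.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, data) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "y"), (4, "z")],
)

# Inner query: values of data that occur in more than one row.
# Outer query: every full row whose data is one of those values.
in_subquery_rows = conn.execute(
    """
    SELECT * FROM table1
    WHERE data IN (
        SELECT data FROM table1 GROUP BY data HAVING COUNT(*) > 1
    )
    ORDER BY id
    """
).fetchall()

print(in_subquery_rows)  # [(2, 'y'), (3, 'y')]
```

The GROUP BY still does the duplicate detection, but it is confined to the subquery, so the outer SELECT returns the rows ungrouped.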

See it working live in an sqlfiddle.


6 Comments

linurb Over a year ago

Thanks, but I've already tried that. It's extremely slow, and the database freezes and must be restarted. Any idea why?

fancyPants Over a year ago

Because you have no indexes on the table maybe? Especially on column data.

linurb Over a year ago

I have. And the subquery is fast enough in itself. Weird.

fancyPants Over a year ago

Have you checked with EXPLAIN SELECT ... if the index is really used?

linurb Over a year ago

On the subquery it is, but not on PK. I'm not sure how to interpret that. PRIMARY - Using where (31 394 rows), DEPENDENT SUBQUERY - Using index


Strawberry · Accepted Answer · 2014-06-02 10:39:10Z

6

SELECT DISTINCT x.* 
  FROM table1 x 
  JOIN table1 y 
    ON y.id <> x.id      --   ids are NOT equal 
   AND y.data = x.data;  --   but data IS

http://sqlfiddle.com/#!2/f8910
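
To see why the self-join needs DISTINCT, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained). The extra row with id 5 is hypothetical and not part of the question's data; it makes 'y' appear three times.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, data) VALUES (?, ?)",
    # Row 5 is a hypothetical third 'y' added to expose the fan-out of the join.
    [(1, "x"), (2, "y"), (3, "y"), (4, "z"), (5, "y")],
)

# Self-join: pair each row with every *other* row that has the same data.
join_sql = """
    SELECT {distinct} x.*
      FROM table1 x
      JOIN table1 y
        ON y.id <> x.id
       AND y.data = x.data
     ORDER BY x.id
"""

with_distinct = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
without_distinct = conn.execute(join_sql.format(distinct="")).fetchall()

print(with_distinct)          # [(2, 'y'), (3, 'y'), (5, 'y')]
print(len(without_distinct))  # 6: each 'y' row matches two other 'y' rows
```

Without DISTINCT, a value that occurs n times produces n * (n - 1) joined rows, which may be part of why this form gets expensive on columns with many repeats.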

This query and fP's above are probably roughly equivalent in terms of performance - but rewrite fP's this way and watch it go...

SELECT DISTINCT x.id 
  FROM table1 x
  JOIN 
     ( SELECT data FROM table1 GROUP BY data HAVING COUNT(0) > 1 ) y
    ON y.data = x.data;
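
A sketch of the derived-table rewrite on the question's sample data, again using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for illustration only). The second query, which selects x.* to get full rows back, is a variation and not part of the answer as written.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, data) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "y"), (4, "z")],
)

# The answer's query: join against a derived table of repeated data values.
derived_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT x.id
          FROM table1 x
          JOIN ( SELECT data FROM table1 GROUP BY data HAVING COUNT(0) > 1 ) y
            ON y.data = x.data
         ORDER BY x.id
        """
    )
]

# Variation (not from the answer): select whole rows instead of just ids.
derived_rows = conn.execute(
    """
    SELECT x.*
      FROM table1 x
      JOIN ( SELECT data FROM table1 GROUP BY data HAVING COUNT(*) > 1 ) y
        ON y.data = x.data
     ORDER BY x.id
    """
).fetchall()

print(derived_ids)   # [2, 3]
print(derived_rows)  # [(2, 'y'), (3, 'y')]
```

Because the derived table holds each repeated value exactly once, every row of x joins to at most one row of y. One plausible reading of the DEPENDENT SUBQUERY line in the EXPLAIN output quoted in the comments is that MySQL was re-running the IN subquery for each outer row, whereas a derived table is evaluated once; the thread itself does not confirm this.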

edited Jun 2, 2014 at 10:39

answered Jun 2, 2014 at 10:28


1 Comment

linurb Over a year ago

With your first query I had the same result, i.e. had to restart the db server. But your rewrite of fP's query did the trick. Thanks! I just wish I understood what makes the other two problematic.
