
There are plenty of questions with similar titles, but every answer I have found relies on grouping (GROUP BY x HAVING COUNT(*) > 1). What I'm looking for is a MySQL query that returns all the matching rows, ungrouped.

Say I have the following:

id  data
1    x
2    y
3    y
4    z

What I want the query to return is:

2    y
3    y

based on the fact that rows 2 and 3 have identical values in the data column.

SELECT * FROM table WHERE [data contains a value that exists in some other row as well]

  • So where the data is equal, but the ids aren't? Commented Jun 2, 2014 at 10:12
  • Why doesn't SELECT * FROM table WHERE data = 'y' work for you? Commented Jun 2, 2014 at 10:14
  • In this example, yes. But in general, where the contents of one column are equal regardless of the contents of the other columns. Commented Jun 2, 2014 at 10:15
  • @JPG: Because I don't know that it is 'y'. It's just an example. What I know is that I want all rows except those where a particular column has a unique value. Commented Jun 2, 2014 at 10:17
  • Yes, id is unique, but that's irrelevant to the problem. In the real table there are plenty of other columns that are not unique. What's interesting is that data (in the example) is not unique. Commented Jun 2, 2014 at 10:30

2 Answers


You have to put the GROUP BY in a subquery:

select * from table where data in (
    select data from table group by data having count(*) > 1
)

6 Comments

Thanks, but I've already tried that. It's extremely slow, and the database freezes and must be restarted. Any idea why?
Maybe because you have no indexes on the table? Especially on the data column.
I have. And the subquery is fast enough in itself. Weird.
Have you checked with EXPLAIN SELECT ... if the index is really used?
On the subquery it is, but not on PK. I'm not sure how to interpret that. PRIMARY - Using where (31 394 rows), DEPENDENT SUBQUERY - Using index
SELECT DISTINCT x.* 
  FROM table1 x 
  JOIN table1 y 
    ON y.id <> x.id      --   ids are NOT equal 
   AND y.data = x.data;  --   but data IS

http://sqlfiddle.com/#!2/f8910

This query and fP's above are probably roughly equivalent in terms of performance - but rewrite fP's this way and watch it go...

SELECT DISTINCT x.id 
  FROM table1 x
  JOIN 
     ( SELECT data FROM table1 GROUP BY data HAVING COUNT(0) > 1 ) y
    ON y.data = x.data;
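
For reference, here is a minimal, self-contained sketch of the derived-table approach run against the question's sample data. The table definition, the index name idx_data, and the column types are assumptions made for illustration; the real table will differ. It selects x.* rather than x.id so that the full rows come back, as the question asks. DISTINCT is left out because the derived table holds each duplicated data value only once, so the join cannot multiply rows.

```sql
-- Hypothetical sample table mirroring the question's example data.
-- Column types and the index name are assumptions, not the asker's schema.
CREATE TABLE table1 (
    id   INT PRIMARY KEY,
    data VARCHAR(10),
    INDEX idx_data (data)
);

INSERT INTO table1 (id, data)
VALUES (1, 'x'), (2, 'y'), (3, 'y'), (4, 'z');

-- The derived table y lists each data value that occurs more than once;
-- joining back to table1 returns every row carrying one of those values.
SELECT x.*
  FROM table1 x
  JOIN ( SELECT data
           FROM table1
          GROUP BY data
         HAVING COUNT(*) > 1 ) y
    ON y.data = x.data
 ORDER BY x.id;
```

On the sample data this returns the rows with id 2 and 3. A likely reason for the speed difference, consistent with the DEPENDENT SUBQUERY line in the EXPLAIN output quoted under the other answer: MySQL versions before 5.6 tend to rewrite IN (subquery) as a correlated subquery that is re-evaluated for each outer row, whereas a derived table in the FROM clause is materialized once. Running EXPLAIN on both forms should show whether that applies to a given server. Note also that rows where data is NULL are never returned by either form, because NULL = NULL does not evaluate to true.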

1 Comment

With your first query I had the same result, i.e. had to restart the db server. But your rewrite of fP's query did the trick. Thanks! I just wish I understood what makes the other two problematic.
