Duplicate records are a common issue that can compromise data integrity and database performance. Removing these duplicates is essential for maintaining data accuracy, optimizing storage, and improving query performance. In this article, we will explore techniques for removing duplicate rows in SQL, tailored to different use cases and database management systems.
As we get started, I highly recommend taking DataCamp’s Introduction to SQL and Learn SQL courses to build foundational knowledge of extracting and analyzing data with SQL. I also find the downloadable SQL Basics Cheat Sheet a helpful reference because it covers the most common SQL functions.
TL;DR
- Use SELECT DISTINCT or GROUP BY to retrieve unique rows without modifying the table
- Use ROW_NUMBER() with a CTE and DELETE for precise control over which duplicates to remove permanently
- Use DELETE with a subquery (NOT IN / MIN()) for a straightforward deduplication approach in any DBMS
- For large datasets, use temporary tables to batch-process duplicate removal safely
- Prevent duplicates proactively with primary keys, unique constraints, and proper database normalization
Understanding Duplicate Rows in SQL
Duplicate rows in SQL refer to records within a table that contain identical values across all or selected columns. The common causes of duplicate rows in SQL include the following:
- Missing Primary Keys: When tables lack a defined primary key or unique constraint, there is no mechanism to prevent the insertion of duplicate data. This can happen when a table is not normalized and/or there are transitive dependency issues.
- Data Integration Issues: When merging datasets from different sources, improper joins or inconsistencies in data formats can accidentally introduce duplicates.
- Manual Data Entry Errors: Human error, such as entering the same record multiple times, is another common cause of duplicate rows.
In the rest of the article, we will look at how to remove duplicates in SQL, dividing the material into two parts. In the first, we will cover how to remove duplicates from the data you retrieve for a report or dashboard; in the second, we will look at how to remove duplicates from the database itself.
How to Identify Duplicate Rows
Before removing duplicates, identify which rows are duplicated. Use GROUP BY with HAVING COUNT(*) > 1 to find rows that appear more than once:
SELECT Name, COUNT(*) AS duplicate_count
FROM customers
GROUP BY Name
HAVING COUNT(*) > 1;
This query returns each Name that appears more than once, along with the number of occurrences. You can extend this to multiple columns by adding them to both the SELECT and GROUP BY clauses.
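To make the multi-column case concrete, here is a minimal runnable sketch using SQLite through Python's sqlite3 module (the table, columns, and sample data are hypothetical):

```python
import sqlite3

# Hypothetical customers table with some duplicate (Name, Email) pairs
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ID INTEGER, Name TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "ana@example.com"),
     (2, "Ana", "ana@example.com"),   # duplicate of row 1 on both columns
     (3, "Ana", "ana@other.com"),     # same Name, different Email: not a duplicate here
     (4, "Bob", "bob@example.com")],
)

# Rows count as duplicates only when BOTH Name and Email match
rows = conn.execute("""
    SELECT Name, Email, COUNT(*) AS duplicate_count
    FROM customers
    GROUP BY Name, Email
    HAVING COUNT(*) > 1
""").fetchall()
print(rows)  # [('Ana', 'ana@example.com', 2)]
```

Note that row 3 is not flagged: grouping on both columns means a repeated Name alone no longer counts as a duplicate.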
To see all rows with a rank indicating their position within each duplicate group, use ROW_NUMBER():
SELECT ID, Name,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS row_num
FROM customers;
Rows where row_num > 1 are duplicates. Once identified, choose the appropriate removal method from the sections below.
Methods for Removing Duplicates in the Data You Retrieve
There are different methods of removing duplicates while retrieving records in SQL, and availability varies slightly across DBMSs such as SQL Server, MySQL, and PostgreSQL. In this section, we will look at these methods while highlighting any special considerations for each database. Keep in mind that these methods filter the data and return unique records; they do not modify the underlying table.
Using DISTINCT keyword
The DISTINCT keyword is used in a SELECT statement to retrieve unique rows. The DISTINCT keyword syntax for removing duplicates is similar for MySQL, PostgreSQL, and SQL Server databases. The query below will retrieve unique customer names from the customers table.
SELECT DISTINCT Name
FROM customers;
Using GROUP BY with aggregate functions
The GROUP BY clause, combined with aggregate functions like MAX(), MIN(), or COUNT(), collapses duplicate rows into one row per group, letting you choose which value to report for each group.
Suppose you want one row per customer name, keeping the highest ID. You group by Name and use the MAX() function, as shown below.
-- Return one row per Name, reporting the highest ID in each group
SELECT Name, MAX(ID) AS MaxID
FROM customers
GROUP BY Name;
MySQL, PostgreSQL, and SQL Server all support this syntax.
Using ROW_NUMBER() with Common Table Expressions (CTE)
With the ROW_NUMBER() function combined with a Common Table Expression (CTE), you can filter out duplicates based on your criteria. The ROW_NUMBER function, when used with PARTITION BY and ORDER BY clauses, assigns a unique sequential number to each row. This method allows for filtering out the rows that do not meet the required criteria.
The following query identifies duplicates and filters out all but the first occurrence of each name in the results.
-- Common Table Expression (CTE) to rank rows based on 'Name'
WITH CTE AS (
SELECT ID, Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID ASC) AS RowNum
FROM customers
)
-- Select only the unique records where RowNum = 1
SELECT ID, Name
FROM CTE
WHERE RowNum = 1;
This method works well for modern versions of SQL Server, MySQL, and PostgreSQL. It is useful for larger datasets or more complex conditions, as it allows you to specify exactly which duplicate to keep.
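For instance, ordering the partition by a timestamp instead of ID keeps the most recent row per group. A minimal runnable sketch of this, using SQLite (which supports window functions from version 3.25) through Python's sqlite3 module; the UpdatedAt column and sample data are hypothetical:

```python
import sqlite3  # window functions need SQLite 3.25+, bundled with modern Python

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ID INTEGER, Name TEXT, UpdatedAt TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "2024-01-01"),
     (2, "Ana", "2024-03-01"),   # newer Ana row -- this one should win
     (3, "Bob", "2024-02-01")],
)

# Ordering the partition by UpdatedAt DESC keeps the most recent row per Name
rows = conn.execute("""
    WITH ranked AS (
        SELECT ID, Name,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY UpdatedAt DESC) AS rn
        FROM customers
    )
    SELECT ID, Name FROM ranked WHERE rn = 1 ORDER BY Name
""").fetchall()
print(rows)  # [(2, 'Ana'), (3, 'Bob')]
```

Changing only the ORDER BY inside the OVER() clause changes which duplicate survives, which is exactly the control this method gives you.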
Removing duplicates using self-JOIN
A self-join allows you to compare a table to itself, making it helpful in identifying and removing duplicate rows by comparing records based on specific criteria. The following example uses the self-join to delete the row with the higher ID, keeping only the first occurrence of each name.
-- Delete duplicate rows using self-join
DELETE c1
FROM customers c1
JOIN customers c2
ON c1.Name = c2.Name AND c1.ID > c2.ID;
The above method works in major databases, including SQL Server, MySQL, and PostgreSQL. Check out our Intermediate SQL course to learn more about using aggregate functions and joins to filter data.
Methods for Removing Duplicates in the Database
While you can remove duplicate records using queries, you can also permanently delete them from the database. This approach is important for maintaining data quality. The following methods are used to remove duplicates from the database.
Using ROW_NUMBER() and DELETE
The ROW_NUMBER() function assigns a sequential number to rows within a defined partition. When combined with a DELETE statement, it identifies duplicates by ranking rows on specific columns so the unwanted copies can be removed. This method applies to modern versions of MySQL (8.0+), PostgreSQL, and SQL Server. One caveat: MySQL restricts a DELETE from selecting the target table in a subquery, so in MySQL you may need to delete by joining the CTE to the table (DELETE c FROM customers c JOIN CTE ON c.ID = CTE.ID WHERE CTE.RowNum > 1) instead of using IN.
Suppose you want to remove duplicate customer records based on the Name column, keeping only the first occurrence (smallest ID):
-- Common Table Expression (CTE) to rank rows based on 'Name'
WITH CTE AS (
SELECT ID, Name, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID ASC) AS RowNum
FROM customers
)
-- Delete rows from the 'customers' table where the row number is greater than 1
DELETE FROM customers
WHERE ID IN (SELECT ID FROM CTE WHERE RowNum > 1);
Using DELETE with subquery
Sometimes, a simple DELETE operation using a subquery can remove duplicates from the database. This method is suitable for older versions of MySQL or PostgreSQL where ROW_NUMBER() might not be available. Be aware that MySQL raises error 1093 when a DELETE subquery selects from the table being modified; the usual workaround is to wrap the subquery in a derived table.
The query below deletes rows from the customers table where the ID is not the minimum for each Name, keeping only the row with the smallest ID for each unique Name.
-- Delete rows from the 'customers' table
DELETE FROM customers
WHERE ID NOT IN (
-- Subquery to find the minimum ID for each unique Name
SELECT MIN(ID)
FROM customers
GROUP BY Name
);
Using GROUP BY with HAVING clause
When you need to check for duplicate values in specific columns, the GROUP BY clause combined with the HAVING clause can be used to identify duplicates. This method allows you to delete specific rows based on the given criteria. This method is compatible with SQL Server, MySQL, and PostgreSQL.
The following queries first identify which names appear more than once, then delete the duplicates while keeping the row with the smallest ID for each Name.
-- Step 1: Identify which Names have duplicates
SELECT Name, COUNT(*) AS duplicate_count
FROM customers
GROUP BY Name
HAVING COUNT(*) > 1;
-- Step 2: Delete duplicate rows, keeping the smallest ID for each Name
DELETE FROM customers
WHERE ID NOT IN (
SELECT MIN(ID)
FROM customers
GROUP BY Name
);
Using temporary tables for batch processing
Temporary tables are efficient for batch processing and removing duplicates in large datasets. This method is useful where a single large DELETE could cause performance issues. The following queries create a temporary table holding the minimum ID for each Name, then delete every row in customers whose ID is not in temp_customers.
-- Create a temporary table with unique records
CREATE TEMPORARY TABLE temp_customers AS
SELECT MIN(ID) AS KeepID, Name
FROM customers
GROUP BY Name;
-- Delete duplicates not in the temporary table
DELETE FROM customers
WHERE ID NOT IN (SELECT KeepID FROM temp_customers);
-- Clean up
DROP TABLE temp_customers;
The above syntax using CREATE TEMPORARY TABLE is only supported in MySQL and PostgreSQL databases.
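As a sanity check, the same temp-table flow can be run end to end in SQLite (which also supports CREATE TEMPORARY TABLE ... AS) through Python's sqlite3 module; the sample data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Ana"), (3, "Bob"), (4, "Bob"), (5, "Cy")],
)

# Stage the IDs to keep in a temp table, delete everything else, then clean up
conn.executescript("""
    CREATE TEMPORARY TABLE temp_customers AS
        SELECT MIN(ID) AS KeepID, Name FROM customers GROUP BY Name;
    DELETE FROM customers
        WHERE ID NOT IN (SELECT KeepID FROM temp_customers);
    DROP TABLE temp_customers;
""")

survivors = conn.execute("SELECT ID, Name FROM customers ORDER BY ID").fetchall()
print(survivors)  # [(1, 'Ana'), (3, 'Bob'), (5, 'Cy')]
```

Because the DELETE subquery reads from temp_customers rather than from customers itself, this pattern also sidesteps MySQL's restriction on subqueries that reference the delete target.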
Remove Duplicates in SQL Server
SQL Server offers different methods of removing duplicate records from the database. These methods include using DISTINCT with INTO, ROW_NUMBER(), and temporary tables.
Using DISTINCT with INTO
You can use the DISTINCT keyword with SELECT ... INTO to create a new table containing only unique records. Once you verify the new table holds the expected data, you can drop the old table and rename the new one. The following example creates the unique_customers table with unique records from the customers table.
-- Select distinct rows from 'customers' and create a new table 'unique_customers'
SELECT DISTINCT *
INTO unique_customers
FROM customers;
-- Drop the original 'customers' table to remove it from the database
DROP TABLE customers;
-- Rename the 'unique_customers' table to 'customers' to replace the original table
EXEC sp_rename 'unique_customers', 'customers';
Using ROW_NUMBER()
You can also use the ROW_NUMBER() function to remove duplicate records in SQL Server. Assume you have a Customers table with duplicate rows based on the CustomerName column, and you want to delete all but the first occurrence in each duplicate group.
-- Common Table Expression (CTE) to assign a row number to each customer
WITH CTE AS (
SELECT CustomerID, CustomerName, ROW_NUMBER() OVER (PARTITION BY CustomerName ORDER BY CustomerID ASC) AS RowNum
FROM Customers
)
-- Delete all but the first row in each duplicate group (SQL Server allows DELETE against a CTE)
DELETE FROM CTE
WHERE RowNum > 1;
Using temporary table
Since SQL Server does not support the CREATE TEMPORARY TABLE statement, you use SELECT ... INTO instead. Temporary tables in SQL Server use # as a prefix for the table name.
-- Create a temporary table
SELECT MIN(CustomerID) AS ID, CustomerName
INTO #temp_customers
FROM customers
GROUP BY CustomerName;
-- Delete rows from the 'customers' table where the ID is not in the temporary table
DELETE FROM customers
WHERE CustomerID NOT IN (SELECT ID FROM #temp_customers);
-- Optionally drop the temporary table after use
DROP TABLE #temp_customers;
I suggest trying our SQL Server Fundamentals skill track to improve your joining tables and data analysis skills. The SQL Server Developer career track will equip you with the skills to write, troubleshoot, and optimize your queries using SQL Server.
Quick Reference: SQL Deduplication Methods
The table below summarizes all the deduplication methods covered in this article, so you can quickly pick the right approach for your situation.
| Method | Use Case | Modifies Data? | Database Support |
|---|---|---|---|
| SELECT DISTINCT | Retrieve unique rows from query results | No | All DBMS |
| GROUP BY + aggregates | Retrieve unique rows with aggregate values | No | All DBMS |
| ROW_NUMBER() + CTE (SELECT) | Flexible duplicate filtering in queries | No | SQL Server, MySQL 8.0+, PostgreSQL |
| ROW_NUMBER() + CTE (DELETE) | Permanently remove duplicates with fine control | Yes | SQL Server, MySQL 8.0+, PostgreSQL |
| DELETE with subquery | Remove duplicates using NOT IN / MIN() | Yes | All DBMS |
| Self-join | Filter or remove duplicates by comparing rows pairwise | No (SELECT) / Yes (DELETE) | All DBMS |
| Temporary table approach | Batch processing for large datasets | Yes | MySQL, PostgreSQL (#temp for SQL Server) |
| SELECT DISTINCT INTO | Create a clean copy of the table | Yes (replaces table) | SQL Server |
Best Practices
Duplicate rows are a common problem affecting data quality and database performance. Consider the following best practices to prevent duplicate records from being inserted in your database.
- Use Primary Keys: The primary key column ensures that each record contains unique information, preventing duplicate values from entering the table.
- Implement Unique Constraints: Applying unique constraints to any column ensures no duplicates exist across non-primary key columns, such as email addresses or phone numbers.
- Proper Database Design and Normalization: Effective schema design and database normalization reduce redundancy and duplicate data by ensuring each fact is stored in exactly one place.
- Use Unique Indexes: Use unique indexes to ensure that certain column combinations are unique without requiring full table-level constraints across the entire dataset.
- Regular Data Audits: Perform regular data audits by running queries to identify potential duplicates based on your business rules.
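To illustrate the prevention side, here is a small sketch (SQLite via Python's sqlite3; the schema is hypothetical) showing a composite UNIQUE constraint rejecting a duplicate insert at write time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite UNIQUE constraint: the same (Name, Email) pair can only be stored once
conn.execute("""
    CREATE TABLE customers (
        ID    INTEGER PRIMARY KEY,
        Name  TEXT NOT NULL,
        Email TEXT NOT NULL,
        UNIQUE (Name, Email)
    )
""")
conn.execute("INSERT INTO customers (Name, Email) VALUES ('Ana', 'ana@example.com')")

# A second insert with the same pair is rejected before it ever reaches the table
try:
    conn.execute("INSERT INTO customers (Name, Email) VALUES ('Ana', 'ana@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

With constraints like this in place, the removal techniques above become a cleanup tool rather than a recurring chore.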
Conclusion
Identifying and removing duplicate rows is important to maintaining database efficiency and data accuracy. It is always a best practice to back up your data before making modifications to ensure no accidental data loss occurs.
If you are interested in becoming a proficient data analyst, check out our Associate Data Analyst in SQL career track to learn the necessary skills. The Reporting in SQL course is also appropriate if you want to learn how to build professional dashboards using SQL. Finally, I recommend obtaining the SQL Associate Certification to demonstrate your mastery of using SQL for data analysis and stand out among other data professionals.
Frequently Asked SQL Questions
What causes duplicate rows in SQL databases?
Duplicate rows can occur due to several factors, including improper database design, missing primary keys, data integration from multiple sources, manual data entry errors, or data migration issues where validation isn’t properly enforced.
Can I prevent duplicates based on multiple columns?
Yes, you can enforce uniqueness across multiple columns using composite keys or unique constraints. This ensures that combinations of values across those columns remain unique.
How does the DISTINCT keyword remove duplicate rows?
Using the DISTINCT keyword only removes duplicates in the query results and does not alter the underlying data.
Which method can you use to permanently delete duplicate records from the database?
You can use ROW_NUMBER() with DELETE, DELETE with subquery, GROUP BY with HAVING clause, and temporary tables for batch processing to permanently delete duplicate rows from the database.
Can duplicates affect the performance of my database?
Yes, duplicates can negatively impact performance by increasing storage costs, slowing queries, and complicating data analysis.
How do I find duplicate rows in SQL?
Use GROUP BY with HAVING COUNT(*) > 1 to find duplicates. For example: SELECT Name, COUNT(*) FROM customers GROUP BY Name HAVING COUNT(*) > 1; returns all names that appear more than once. You can also use ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) to assign a rank to each row within duplicate groups—rows with a rank greater than 1 are duplicates.
What is the fastest way to remove duplicates from a large SQL table?
For large tables, use a temporary table approach: insert unique rows into a temp table using SELECT DISTINCT or GROUP BY, truncate the original table, then insert the clean data back. This avoids row-by-row deletion, which can be slow on millions of records. Alternatively, ROW_NUMBER() with a CTE is efficient when you need fine-grained control over which duplicate to keep. Always back up your data and test on a staging environment first.
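A hedged sketch of that stage-empty-reload flow, using SQLite through Python's sqlite3 module (SQLite has no TRUNCATE statement, so a bare DELETE empties the table; all names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Ana"), (3, "Bob")],
)

# 1) stage unique rows, 2) empty the original table, 3) reload the clean data
conn.executescript("""
    CREATE TEMPORARY TABLE clean AS
        SELECT MIN(ID) AS ID, Name FROM customers GROUP BY Name;
    DELETE FROM customers;  -- SQLite has no TRUNCATE; a bare DELETE empties the table
    INSERT INTO customers SELECT ID, Name FROM clean;
    DROP TABLE clean;
""")

result = conn.execute("SELECT ID, Name FROM customers ORDER BY ID").fetchall()
print(result)  # [(1, 'Ana'), (3, 'Bob')]
```

On a production DBMS you would wrap these steps in a transaction (and use TRUNCATE where available) so readers never see the table in its emptied intermediate state.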
