Different strategies for removing duplicate records in SQL Server
Delete duplicate rows from a table example using temporary table:
BEGIN TRAN
--Pull distinct records in the temporary table
SELECT DISTINCT * INTO #T_TEMP FROM TableName
--Truncate the target table
TRUNCATE TABLE TableName
--Insert the distinct records from temporary table back to target table
INSERT INTO TableName SELECT * FROM #T_TEMP
--Drop the temporary table
IF OBJECT_ID('tempdb..#T_TEMP') IS NOT NULL
DROP TABLE #T_TEMP
COMMIT TRAN
GO
SELECT * FROM TableName
Delete duplicate rows from a table example using CTE:
Delete duplicate rows from a table Using correlated sub query:
Creating new table with distinct records and renaming it
Using MERGE Statement
Beginning with SQL Server 2008, now you can use MERGE SQL COMMAND to perform INSERT/UPDATE/DELETE operations in a single statement. This new command is similar to the UPSERT (fusion of the words UPDATE and INSERT.) command of Oracle. It inserts rows that don't exist and updates the rows that do exist. With the introduction of the MERGE SQL command, developers can more effectively handle common data warehousing scenarios, like checking whether a row exists and then executing an insert or update or delete.
The MERGE statement basically merges data from a source result set to a target table based on a condition that you specify and if the data from the source already exists in the target or not. The new SQL command combines the sequence of conditional INSERT, UPDATE and DELETE commands in a single atomic statement, depending on the existence of a record. With this you can make sure no duplicate records are being inserted into the target table, but rather updated if there is any change and only new records are inserted which do not already exist in the target.
BEGIN TRAN
--Pull distinct records in the temporary table
SELECT DISTINCT * INTO #T_TEMP FROM TableName
--Truncate the target table
TRUNCATE TABLE TableName
--Insert the distinct records from temporary table back to target table
INSERT INTO TableName SELECT * FROM #T_TEMP
--Drop the temporary table
IF OBJECT_ID('tempdb..#T_TEMP') IS NOT NULL
DROP TABLE #T_TEMP
COMMIT TRAN
GO
SELECT * FROM TableName
Delete duplicate rows from a table example using CTE:
WITH cte AS (
SELECT
contact_id,
first_name,
last_name,
email,
ROW_NUMBER() OVER (
PARTITION BY
first_name,
last_name,
email
ORDER BY
first_name,
last_name,
email
) row_num
FROM
sales.contacts
)
DELETE FROM cte
WHERE row_num > 1;
--With correlated subquery
CREATE TABLE Employee
(
[ID] INT IDENTITY,
[FirstName] Varchar(100),
[LastName] Varchar(100),
[Address] Varchar(100),
)
GO
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('Linda', 'Mitchel', 'America')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('Linda', 'Mitchel', 'America')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('John', 'Albert', 'Australia')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('John', 'Albert', 'Australia')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('John', 'Albert', 'Australia')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('Arshad', 'Ali', 'India')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('Arshad', 'Ali', 'India')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('Arshad', 'Ali', 'India')
INSERT INTO Employee([FirstName], [LastName], [Address])
VALUES ('Arshad', 'Ali', 'India')
GO
SELECT * FROM Employee
GO
--Selecting distinct records
SELECT * FROM Employee E1
WHERE E1.ID = ( SELECT MAX(ID) FROM Employee E2
WHERE E2.FirstName = E1.FirstName AND E1.LastName = E2.LastName
AND E1.Address = E2.Address)
GO
--Deleting duplicates
DELETE Employee
WHERE ID < ( SELECT MAX(ID) FROM Employee E2
WHERE E2.FirstName = Employee.FirstName AND E2.LastName = Employee.LastName
AND E2.Address = Employee.Address)
GO
SELECT * FROM Employee
GO
New table with distinct only
BEGIN TRAN
-- Pull distinct records in a new table
SELECT DISTINCT * INTO EmployeeNew
FROM Employee
--Drop the old target table
DROP TABLE Employee
--rename the new table
EXEC sp_rename 'EmployeeNew', 'Employee'
COMMIT TRAN
GO
SELECT * FROM Employee
GO
Using MERGE Statement
Beginning with SQL Server 2008, now you can use MERGE SQL COMMAND to perform INSERT/UPDATE/DELETE operations in a single statement. This new command is similar to the UPSERT (fusion of the words UPDATE and INSERT.) command of Oracle. It inserts rows that don't exist and updates the rows that do exist. With the introduction of the MERGE SQL command, developers can more effectively handle common data warehousing scenarios, like checking whether a row exists and then executing an insert or update or delete.
The MERGE statement basically merges data from a source result set to a target table based on a condition that you specify and if the data from the source already exists in the target or not. The new SQL command combines the sequence of conditional INSERT, UPDATE and DELETE commands in a single atomic statement, depending on the existence of a record. With this you can make sure no duplicate records are being inserted into the target table, but rather updated if there is any change and only new records are inserted which do not already exist in the target.
No comments:
Post a Comment