
I have a delete based on a simple select. I tuned it to delete in batches, and it's way faster than a plain delete.

DECLARE
    @StartDate DATETIME = '05/01/2024',
    @EndDate DATETIME = '05/31/2024';

SELECT TABLE_ID
INTO #temp
FROM [BIGGUS_TABLE]
WHERE TransactionDate BETWEEN @StartDate AND @EndDate
    AND ANOTHER_CHECK = 18;
GO
DECLARE @BatchSize INT = 50000;
DECLARE @RowsDeleted INT;
DECLARE @DeletedIDs TABLE (TABLE_ID INT);

-- Initialize @RowsDeleted to a non-zero value so the loop starts
SET @RowsDeleted = @BatchSize;

WHILE @RowsDeleted > 0
BEGIN
    -- Delete one batch and capture the deleted keys
    DELETE TOP (@BatchSize)
    FROM [BIGGUS_TABLE]
    OUTPUT DELETED.TABLE_ID INTO @DeletedIDs
    WHERE TABLE_ID IN (SELECT TABLE_ID FROM #temp);

    -- Remove the processed keys from the work list
    DELETE FROM #temp
    WHERE TABLE_ID IN (SELECT TABLE_ID FROM @DeletedIDs);

    SET @RowsDeleted = @@ROWCOUNT;
END

*TABLE_ID* is the PK

At this rate it would take 5 days to delete 100 million rows.

[Screenshot of the execution plan]

The green table is BIGGUS_TABLE, where the main deletes happen. The red table is a secondary, empty table that has an FK pointing to BIGGUS_TABLE.

The index being used on BIGGUS_TABLE has TABLE_ID (the PK) but does not include the ANOTHER_CHECK column, yet funnily enough it still shows up as a seek?

So, I'm trying to think what we can do to improve this:

1 - Remove the FK from the empty table and re-add it later (we don't even know if that table is actually being used); see the sketch after this list.

2 - Add the ANOTHER_CHECK column to the index being used by the green table / BIGGUS_TABLE?

3 - Anything we can add/remove from the query?
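
A minimal sketch of option 1, assuming placeholder names for the child table and the constraint (CHILD_TABLE, FK_CHILD_BIGGUS) and that the referencing column is also called TABLE_ID; the real names would come from sys.foreign_keys:

-- Drop the FK on the empty child table before the purge (placeholder names)
ALTER TABLE dbo.CHILD_TABLE
    DROP CONSTRAINT FK_CHILD_BIGGUS;

-- ... run the batched delete ...

-- Re-add the FK afterwards; WITH CHECK revalidates existing rows
ALTER TABLE dbo.CHILD_TABLE WITH CHECK
    ADD CONSTRAINT FK_CHILD_BIGGUS
    FOREIGN KEY (TABLE_ID) REFERENCES dbo.BIGGUS_TABLE (TABLE_ID);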

We went from roughly 400k rows per hour to about 700k per hour, but now I'm stuck.

Comment: What is considered an improvement here? Faster overall runtime (even at the expense of resource contention and longer sustained table locking)? What indexes are on BIGGUS_TABLE? Are you trying to empty the entire table (which is what your example code would do)? How many rows are in BIGGUS_TABLE in total?

2 Answers


Without seeing the actual execution plan, it's hard to tell where the hold-up is, but there are a couple of improvements we can throw at it to try to move things along:

  1. Create an index with a uniqueness guarantee on the #temp table
  2. Don't OUTPUT into a @table variable, because it disallows a parallel plan

The changes would look about like this:

/*Be nice people and declare our variables together*/
DECLARE
    @StartDate datetime = '20240501',
    @EndDate datetime = '20240531',
    @BatchSize integer = 50000,
    @RowsDeleted integer = 1;

/*Continue nice-fest and create our temp tables together*/
CREATE TABLE 
    #DeletedIDs
(
    TABLE_ID integer NOT NULL
);


CREATE TABLE 
    #temp
(
    TABLE_ID integer NOT NULL
);

/*
Insert with TABLOCK to keep a parallel insert if we
were getting one with the SELECT INTO
*/
INSERT INTO
    #temp
WITH
    (TABLOCK)
(
    TABLE_ID
)
SELECT DISTINCT
    TABLE_ID
FROM [BIGGUS_TABLE] AS bt
WHERE bt.TransactionDate
BETWEEN @StartDate
    AND @EndDate
AND bt.ANOTHER_CHECK = 18
/*Add a recompile hint here to get over any local variable silliness*/
OPTION(RECOMPILE);

/*Create index after loading data to get full scan stats*/
CREATE UNIQUE CLUSTERED INDEX 
    c
ON #temp(TABLE_ID);

/*Same loop as before*/
WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM [BIGGUS_TABLE] AS bt
    /*Output into a temp table instead*/
    OUTPUT
        Deleted.TABLE_ID
    INTO #DeletedIds
    WHERE bt.TABLE_ID IN (SELECT t.TABLE_ID FROM #temp AS t)
    /*Optimize for the correct batch size*/
    OPTION(OPTIMIZE FOR(@BatchSize = 50000));

    DELETE 
        t
    FROM #temp AS t
    WHERE t.TABLE_ID IN (SELECT d.TABLE_ID FROM #DeletedIDs AS d);

    SET @RowsDeleted = @@ROWCOUNT;
END;

If you would like additional feedback, please consider posting an actual execution plan that would include operator times and wait stats, etc.
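
If it helps with capturing one, a quick sketch (an assumption about your workflow, not part of the fix) is to run a single manual iteration of the delete with SET STATISTICS XML ON, or use the "Include Actual Execution Plan" button in SSMS:

SET STATISTICS XML ON;  -- returns the actual plan as XML alongside the results

DELETE TOP (50000)
FROM [BIGGUS_TABLE] AS bt
WHERE bt.TABLE_ID IN (SELECT t.TABLE_ID FROM #temp AS t);

SET STATISTICS XML OFF;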


Generic advice

  1. Make sure that the target table has a supporting index so you can seek (see the sketch after this list)
  2. Identify the rows for deletion and insert them into a temp table with an identity column. This serves as a watermark so you don't have to delete from the temp table
  3. Make the delete batch size 2000 to prevent lock escalation
  4. Delete from the target table joined to the temp table, using the watermark as the batching mechanism. Add the hints OPTION (OPTIMIZE FOR (@watermark = 1), KEEPFIXED PLAN). Do not OUTPUT the deleted rows.
  5. Move the watermark after each batch
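
A minimal sketch of point 1, using the question's column names and a hypothetical index name (IX_BIGGUS_TABLE_TransactionDate); the right key and included columns depend on the real schema and workload:

/* Supports the identification query; the delete itself already seeks on the TABLE_ID primary key */
CREATE INDEX IX_BIGGUS_TABLE_TransactionDate
    ON dbo.BIGGUS_TABLE (TransactionDate, ANOTHER_CHECK);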

Partial code

set nocount on

create table #toDelete (
    Id      int identity(1, 1) primary key clustered /* non-named */
  , TablePK bigint
)

insert #toDelete with (tablockx)
(
    TablePK
)
select
    t.TableID
from dbo.Table t
where condition = 'true' /* placeholder filter */

declare
    @deletedRows int = 1
    , @batchSize int = 2000
    , @watermark int = 0
        
while @deletedRows > 0
begin
    delete t
    from dbo.Table t
    where exists
    (
        select 1
        from #toDelete as d
        where
            t.TableID = d.TablePK /* correlation with the main table */
            and d.Id >= @watermark 
            and d.Id < @watermark + @batchSize
    )
    option 
    (
        optimize for (@watermark = 1)
        , keepfixed plan
    )
    
    set @deletedRows = @@ROWCOUNT
    set @watermark += @batchSize
end

Alternatively, if the rows being deleted are a high percentage of the total table size, you could move the rows you want to keep into a new table and switch the two tables around.
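
A rough sketch of that swap, assuming a hypothetical new table name (BIGGUS_TABLE_new) and that the PK, indexes, and the child table's FK are recreated on it; it also assumes you can block or synchronize writes while the copy runs:

-- Copy the rows you want to keep (placeholder object name)
SELECT *
INTO dbo.BIGGUS_TABLE_new
FROM dbo.BIGGUS_TABLE
WHERE NOT (TransactionDate BETWEEN '20240501' AND '20240531'
           AND ANOTHER_CHECK = 18);

-- Recreate the PK, indexes, and the child table's FK here

-- Swap the names
BEGIN TRANSACTION;
    EXEC sp_rename 'dbo.BIGGUS_TABLE', 'BIGGUS_TABLE_old';
    EXEC sp_rename 'dbo.BIGGUS_TABLE_new', 'BIGGUS_TABLE';
COMMIT;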
