
I have two tables related to each other, each with roughly 200M records.

CREATE TABLE [dbo].[AS_tblTBCDEF](
    [CDEF_SOC_NUM] [numeric](5, 0) NULL,
    [CDEF_EFF_DATE] [date] NULL,
    [CDEF_TYP_BUS] [nvarchar](1) NULL,
    [CDEF_CLASS_NUM] [smallint] NULL,
    [CDEF_GROUP] [smallint] NULL,
    [CDEF_COV_EXP_TYP] [nvarchar](1) NULL,
    [CDEF_SCHEDULE] [nvarchar](9) NULL,
    [CDEF_LIMIT] [numeric](9, 2) NULL,
    [CDEF_LIMIT_PCTILE] [nvarchar](2) NULL,
    [CDEF_WHY_NOT_COV] [smallint] NULL,
    [CDEF_PROVIEW_GRP] [smallint] NULL,
    [CDEF_BAS_ADJ_IND] [nvarchar](1) NULL,
    [CDEF_BAS_ADJ_AMT] [numeric](9, 2) NULL,
    [CDEF_DEF_TYPE] [nvarchar](1) NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[AS_tblTBCDEFD](
    [CDEF_DESC_SOC_NUM] [numeric](5, 0) NULL,
    [CDEF_DESC_EFF_DATE] [date] NULL,
    [CDEF_DESC_TYP_BUS] [nvarchar](1) NULL,
    [CDEF_DESC_CLASS] [smallint] NULL,
    [CDEF_DESC_GROUP] [smallint] NULL,
    [CDEF_DESC_TEXT] [nvarchar](77) NULL
) ON [PRIMARY]
GO

They are joined like this:

FROM [dbo].[AS_tblTBCDEF] GC_TBCDEF
    LEFT JOIN [dbo].[AS_tblTBCDEFD] GC_TBCDEFD 
        ON (GC_TBCDEF.CDEF_GROUP = GC_TBCDEFD.CDEF_DESC_GROUP) 
        AND (GC_TBCDEF.CDEF_CLASS_NUM = GC_TBCDEFD.CDEF_DESC_CLASS) 
        AND (GC_TBCDEF.CDEF_TYP_BUS = GC_TBCDEFD.CDEF_DESC_TYP_BUS) 
        AND (GC_TBCDEF.CDEF_EFF_DATE = GC_TBCDEFD.CDEF_DESC_EFF_DATE) 
        AND (GC_TBCDEF.CDEF_SOC_NUM = GC_TBCDEFD.CDEF_DESC_SOC_NUM)

These two tables get re-created monthly via an ETL script that is run by a different department. I'm trying to figure out the best way to key/index these tables so that it will actually return data. Right now, it just times out and returns nothing.

I run the following script to add indexes, but it's clearly not enough. I'm looking for suggestions to optimize the join.

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEF]') AND NAME ='idx_Soc_Num')
CREATE INDEX idx_Soc_Num
ON [dbo].[AS_tblTBCDEF] (CDEF_SOC_NUM);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEF]') AND NAME ='idx_Class_Num')
CREATE INDEX idx_Class_Num
ON [dbo].[AS_tblTBCDEF] (CDEF_CLASS_NUM);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEF]') AND NAME ='idx_Eff_Date')
CREATE INDEX idx_Eff_Date
ON [dbo].[AS_tblTBCDEF] (CDEF_EFF_DATE);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEF]') AND NAME ='idx_Typ_Bus')
CREATE INDEX idx_Typ_Bus
ON [dbo].[AS_tblTBCDEF] (CDEF_TYP_BUS);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEF]') AND NAME ='idx_Group')
CREATE INDEX idx_Group
ON [dbo].[AS_tblTBCDEF] (CDEF_GROUP);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEFD]') AND NAME ='idx_Soc_Num')
CREATE INDEX idx_Soc_Num
ON [dbo].[AS_tblTBCDEFD] (CDEF_DESC_SOC_NUM);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEFD]') AND NAME ='idx_Class_Num')
CREATE INDEX idx_Class_Num
ON [dbo].[AS_tblTBCDEFD] (CDEF_DESC_CLASS);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEFD]') AND NAME ='idx_Eff_Date')
CREATE INDEX idx_Eff_Date
ON [dbo].[AS_tblTBCDEFD] (CDEF_DESC_EFF_DATE);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEFD]') AND NAME ='idx_Typ_Bus')
CREATE INDEX idx_Typ_Bus
ON [dbo].[AS_tblTBCDEFD] (CDEF_DESC_TYP_BUS);

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = object_id('[dbo].[AS_tblTBCDEFD]') AND NAME ='idx_Group')
CREATE INDEX idx_Group
ON [dbo].[AS_tblTBCDEFD] (CDEF_DESC_GROUP);
  • @ScottHunter Why is this closed? The DDL is all there, it's empirically proven to result in a poor execution plan. Commented Oct 16 at 15:02
  • @Charlieface I don't see an "empirically proven ... poor execution plan" (I agree but your comment fails to make that point). I think the question was poorly worded though, formulations like "the best way" and "I'm looking for suggestions" tend to stand out in close-vote review. It could've also done a better job at providing a minimal reproducible example by including INSERT statements. Fixing such things before submitting it to re-open review generally works a lot better than just expressing disagreement in a comment. Commented Oct 16 at 17:31
  • This seems like a duplicate of any question about multicolumn indexes, e.g. stackoverflow.com/questions/28475877/… Commented Oct 20 at 17:24

1 Answer


All your indexes are a complete waste of time: they are single-column indexes with no INCLUDE columns, so they are mostly useful only for a point lookup on that one column. None of them covers a join on all five columns, so for a giant join like this the optimizer will fall back to a hash match or a sort/merge, which is still going to be faster than a naive nested loop without proper indexing.

Delete all those indexes. Instead, create a single multi-column index on each table, ideally unique and clustered. Even better, make it the primary key, although that won't work while the columns are nullable (why are they nullable anyway??)

CREATE UNIQUE CLUSTERED INDEX idx_1 ON dbo.AS_tblTBCDEF
  (CDEF_SOC_NUM, CDEF_CLASS_NUM, CDEF_EFF_DATE, CDEF_TYP_BUS, CDEF_GROUP);

CREATE UNIQUE CLUSTERED INDEX idx_1 ON dbo.AS_tblTBCDEFD
  (CDEF_DESC_SOC_NUM, CDEF_DESC_CLASS, CDEF_DESC_EFF_DATE, CDEF_DESC_TYP_BUS, CDEF_DESC_GROUP);
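The "delete all those indexes" step, applied to the ten single-column indexes created in the question, could be scripted roughly as below and run before the CREATE statements above. This is a sketch that assumes SQL Server 2016 or later (for DROP INDEX IF EXISTS) and the index names used in the question.

```sql
-- Sketch: remove the single-column indexes from the question.
-- DROP INDEX IF EXISTS requires SQL Server 2016 or later.
DROP INDEX IF EXISTS idx_Soc_Num   ON dbo.AS_tblTBCDEF;
DROP INDEX IF EXISTS idx_Class_Num ON dbo.AS_tblTBCDEF;
DROP INDEX IF EXISTS idx_Eff_Date  ON dbo.AS_tblTBCDEF;
DROP INDEX IF EXISTS idx_Typ_Bus   ON dbo.AS_tblTBCDEF;
DROP INDEX IF EXISTS idx_Group     ON dbo.AS_tblTBCDEF;

-- Index names are scoped per table, so the same names exist on the description table.
DROP INDEX IF EXISTS idx_Soc_Num   ON dbo.AS_tblTBCDEFD;
DROP INDEX IF EXISTS idx_Class_Num ON dbo.AS_tblTBCDEFD;
DROP INDEX IF EXISTS idx_Eff_Date  ON dbo.AS_tblTBCDEFD;
DROP INDEX IF EXISTS idx_Typ_Bus   ON dbo.AS_tblTBCDEFD;
DROP INDEX IF EXISTS idx_Group     ON dbo.AS_tblTBCDEFD;
```

Because the tables are re-created monthly, the drop and create steps would need to run after each ETL load.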

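The column ordering should ideally go from most selective (most distinct values) to least selective. But if you have other queries that join or filter on only some of these columns, put those columns first. Whichever order you pick, use the same order on both tables so that both sides of the join are sorted the same way.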
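To choose an order (and to confirm that UNIQUE is safe), you could first measure each column's distinct-value count and look for repeated key combinations. This is a sketch against AS_tblTBCDEF only; the same queries apply to AS_tblTBCDEFD with its CDEF_DESC_ column names. Both queries scan the whole table, so on ~200M rows they are best run off-peak.

```sql
-- Distinct values per key column: higher counts mean more selective.
-- COUNT(DISTINCT ...) ignores NULLs.
SELECT
    COUNT_BIG(*)                       AS total_rows,
    COUNT_BIG(DISTINCT CDEF_SOC_NUM)   AS distinct_soc_num,
    COUNT_BIG(DISTINCT CDEF_CLASS_NUM) AS distinct_class_num,
    COUNT_BIG(DISTINCT CDEF_EFF_DATE)  AS distinct_eff_date,
    COUNT_BIG(DISTINCT CDEF_TYP_BUS)   AS distinct_typ_bus,
    COUNT_BIG(DISTINCT CDEF_GROUP)     AS distinct_group
FROM dbo.AS_tblTBCDEF;

-- Any row returned here would make CREATE UNIQUE CLUSTERED INDEX fail.
-- GROUP BY treats NULLs as equal, as a unique index does.
SELECT TOP (10)
    CDEF_SOC_NUM, CDEF_CLASS_NUM, CDEF_EFF_DATE, CDEF_TYP_BUS, CDEF_GROUP,
    COUNT_BIG(*) AS dup_count
FROM dbo.AS_tblTBCDEF
GROUP BY CDEF_SOC_NUM, CDEF_CLASS_NUM, CDEF_EFF_DATE, CDEF_TYP_BUS, CDEF_GROUP
HAVING COUNT_BIG(*) > 1;
```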

You can see from this fiddle that the plan becomes a much more efficient merge join with no sort.
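To check the plan on your own server rather than in a fiddle, SET SHOWPLAN_TEXT ON returns the estimated plan without executing the query. In the output you would look for a Merge Join operator with no Sort beneath it. The select list below is an assumption (the question only shows the FROM clause), and GO is the SSMS/sqlcmd batch separator, not a T-SQL statement.

```sql
-- SET SHOWPLAN_TEXT has to be alone in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON;
GO
SELECT GC_TBCDEF.*, GC_TBCDEFD.CDEF_DESC_TEXT   -- hypothetical select list
FROM [dbo].[AS_tblTBCDEF] GC_TBCDEF
    LEFT JOIN [dbo].[AS_tblTBCDEFD] GC_TBCDEFD
        ON  GC_TBCDEF.CDEF_GROUP     = GC_TBCDEFD.CDEF_DESC_GROUP
        AND GC_TBCDEF.CDEF_CLASS_NUM = GC_TBCDEFD.CDEF_DESC_CLASS
        AND GC_TBCDEF.CDEF_TYP_BUS   = GC_TBCDEFD.CDEF_DESC_TYP_BUS
        AND GC_TBCDEF.CDEF_EFF_DATE  = GC_TBCDEFD.CDEF_DESC_EFF_DATE
        AND GC_TBCDEF.CDEF_SOC_NUM   = GC_TBCDEFD.CDEF_DESC_SOC_NUM;
GO
-- Switch back so later queries actually execute.
SET SHOWPLAN_TEXT OFF;
GO
```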


3 Comments

The ETL team doesn't make our lives easy. :-) They're actually a bit lazy and tend to just use defaults a lot. I've added some code to change the columns to NOT NULL and I'm running that now. I'll give this a spin once that completes.
Additional ask while I'm waiting for this to finish: Would that look like ALTER TABLE [dbo].[AS_tblTBCDEF] ADD CONSTRAINT [PK_Soc_Class_Date_Bus_Grp] PRIMARY KEY CLUSTERED (CDEF_SOC_NUM ASC, CDEF_CLASS_NUM ASC, CDEF_EFF_DATE ASC, CDEF_TYP_BUS ASC, CDEF_GROUP ASC)
Yes that would be the equivalent as a PK instead
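For the description table, the equivalent NOT NULL change plus primary key might look like the sketch below. It assumes every key column is already free of NULLs (ALTER COLUMN fails otherwise) and that the single-column indexes and the answer's clustered idx_1 have been dropped first: indexes that reference a column can block ALTER COLUMN, and a table can have only one clustered index. Constraint names must be unique within the schema, so the PK name here is hypothetical and differs from the one used on AS_tblTBCDEF.

```sql
-- ALTER COLUMN must restate the full data type from the CREATE TABLE.
ALTER TABLE dbo.AS_tblTBCDEFD ALTER COLUMN CDEF_DESC_SOC_NUM  numeric(5, 0) NOT NULL;
ALTER TABLE dbo.AS_tblTBCDEFD ALTER COLUMN CDEF_DESC_CLASS    smallint      NOT NULL;
ALTER TABLE dbo.AS_tblTBCDEFD ALTER COLUMN CDEF_DESC_EFF_DATE date          NOT NULL;
ALTER TABLE dbo.AS_tblTBCDEFD ALTER COLUMN CDEF_DESC_TYP_BUS  nvarchar(1)   NOT NULL;
ALTER TABLE dbo.AS_tblTBCDEFD ALTER COLUMN CDEF_DESC_GROUP    smallint      NOT NULL;

-- Same key order as the PK on AS_tblTBCDEF, so both inputs are sorted alike.
ALTER TABLE dbo.AS_tblTBCDEFD
    ADD CONSTRAINT PK_TBCDEFD_Soc_Class_Date_Bus_Grp   -- hypothetical name
    PRIMARY KEY CLUSTERED (CDEF_DESC_SOC_NUM ASC, CDEF_DESC_CLASS ASC,
                           CDEF_DESC_EFF_DATE ASC, CDEF_DESC_TYP_BUS ASC,
                           CDEF_DESC_GROUP ASC);
```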
