4

I have some performance issues.

I have a table with about 2 million rows.

CREATE TABLE [dbo].[M8](
    [M8_ID] [int] IDENTITY(1,1) NOT NULL,
    [APPLIC] [char](8) NOT NULL,
    [NIVALERTE] [numeric](1, 0) NOT NULL,
    [LOGDH] [datetime2](7) NULL,
    [USERX] [char](20) NOT NULL,
    [TACHE] [char](3) NOT NULL,
    [PRG] [char](32) NOT NULL,
    [DOS] [numeric](3, 0) NOT NULL,
    [ERRNUM] [numeric](5, 0) NOT NULL,
    [LOGTXT] [char](200) NOT NULL)

I read them with C# and ADO.NET

In the management studio (SQL Server 2008 R2), with that query :

SELECT 
    M8.M8_ID, M8.APPLIC, M8.NIVALERTE, M8.LOGDH, M8.USERX, M8.TACHE, 
    M8.PRG, M8.DOS, M8.ERRNUM, M8.LOGTXT
FROM 
    M8 AS M8 WITH(NOLOCK) 
WHERE 
    ((M8.APPLIC LIKE 'DAV' ) )
ORDER BY 
    M8.LOGDH DESC, M8.M8_ID ASC
OPTION (FAST 1)

It take about 1 minute to have the first rows.

But, with

DECLARE @APPLIC_ZOOMAPRESCLE_ZOOM_LIKE_APPLIC_WHERE_0 as char(8) = 'DAV'

SELECT 
   M8.M8_ID, M8.APPLIC, M8.NIVALERTE, M8.LOGDH, M8.USERX, M8.TACHE, 
   M8.PRG, M8.DOS, M8.ERRNUM, M8.LOGTXT
FROM 
   M8 AS M8 WITH(NOLOCK) 
WHERE 
   ((M8.APPLIC LIKE @APPLIC_ZOOMAPRESCLE_ZOOM_LIKE_APPLIC_WHERE_0 ) )
ORDER BY 
   M8.LOGDH DESC, M8.M8_ID ASC
OPTION(FAST 1)

I get the first rows after 4 seconds.

PS : I know, I have no % in the like.

Edit: Here are the execution plans https://www.dropbox.com/sh/jgai5f9txbs84x6/EP5_hj8DNv

18
  • 1
    do you run them in that order?
    – Jodrell
    Commented Sep 17, 2012 at 8:44
  • 3
    Then it is better to use = instead of like if you don't need it. Commented Sep 17, 2012 at 8:44
  • @Jodrell : I cleared the cache between them
    – Xavinou
    Commented Sep 17, 2012 at 8:45
  • @AndrásOttó : I kown, but it is a generic query. In that program, I can't...
    – Xavinou
    Commented Sep 17, 2012 at 8:46
  • 1
    @Xavinou Reported here so see what they say Commented Sep 17, 2012 at 15:09

2 Answers 2

5

Your table has 1,517,820 rows. Of these nearly one third (476,672) contain the value DAV (or more accurately the value DAV      as it is of CHAR(8) datatype so is padded out with trailing spaces.

In a LIKE comparison trailing spaces in the match_expression are not significant (though they are significant in the pattern itself).

Therefore the expression WHERE APPLIC LIKE 'DAV' does in fact match 476,672 rows. Neither of the execution plans estimate anywhere near this however. Though the faster plan (with variables) is three orders of magnitude closer.

+-----------------------+-----------+-----------+
|                       | Slow Plan | Fast Plan |
+-----------------------+-----------+-----------+
| Estimated # Rows      | 32        | 47,343    |
| Memory Grant          | 1 MB      | 333 MB    |
| Degree of Parallelism | 1         | 4         |
+-----------------------+-----------+-----------+

For the plan with the variables as SQL Server does not do variable sniffing (without e.g. the OPTION (RECOMPILE) hint) it falls back on guesses as to how many rows will match the predicate and comes up with an estimate that around 3.1% of the table will qualify.

Fast Plan

Fast

The plan with the literal value ought to have much better estimates. The screen shot you supplied of the DBCC SHOW_STATISTICS output (after adding another million rows) shows that DAV is definitely in there

Stats

Unfortunately it seems that although the trailing space in the column values are not significant for the query result their presence does mess up the cardinality estimates (Reported as a bug here and currently stated to be fixed in the next version). As a result of this problem it estimates only a handful of rows will be returned and comes up with the following plan.

Slow Plan

Slow

As well as performing half a million key lookups because of the poor cardinality estimates the memory grant is probably no where near adequate for the size of data being sorted resulting in spills to tempdb.

There are many work arounds you might consider if you can change the query or table schema.

  • Using = instead of LIKE
  • Changing the WHERE clause to LIKE CAST('DAV' AS CHAR(8))
  • Changing the column datatype to VARCHAR(8) (and ensuring all stored values are trimmed).
  • Dropping the current index being seeked (Index_A). You haven't supplied its definition but if it is a single column index on a column with few distinct values its presence may be more of a hindrance than a help (depending on your query workload))
  • Adding a covering index with key column APPLIC (and possibly LOGDH DESC, M8_ID ASC to avoid a sort) and the other referenced columns as INCLUDED.
2
  • Thank you very much for these explanations and workarounds. I think I'll either add a %, because with DB2 = 'DAV' not return 'DAV ' or cast 'DAV' as char(8).
    – Xavinou
    Commented Sep 19, 2012 at 7:35
  • @Xavinou - The reason I didn't suggest the % is because it might change the semantics if you were to (at some time in the future) need another value of APPLIC that begins with DAV. It will bring back those rows as well. Commented Sep 19, 2012 at 7:38
0

Maybe these 2 questions will give you a better understanding of the performance issues involved on the LIKE and = operators:

9
  • Thanks, but it is not really the use of like or (=) i'm interested in, but the perf issues with using a DECLARE or not
    – Xavinou
    Commented Sep 17, 2012 at 8:56
  • 1
    @Xavinou but reading the articles might give you some insight. I beleive that when the right hand side of the comparison is the right length the engine treats it like an equals and makes use of an index. When it is shorter, it does not.
    – Jodrell
    Commented Sep 17, 2012 at 8:59
  • @Jodrell - Without a leading wildcard the LIKE can use a range seek. Even with a leading wildcard it can use a range seek but the range is the whole index. The issue is with the DECLARE as suspected by the OP. SQL Server doesn't do variable sniffing so doesn't know how selective the query will be. Commented Sep 17, 2012 at 9:05
  • 1
    @Jodrell - True hadn't spotted that the OP was saying that the variable version was faster. They would need to post the plans so we can see the difference. But yes SQL Server compiles a plan that does not take account of the actual variable value unless the statement is subject to a recompile after the variable is assigned. Commented Sep 17, 2012 at 9:16
  • 1
    @Jodrell - Not sure if you looked at the plans but turns out that the slow one was in fact using an index and that was the problem. The fast one does a scan. In the OP's case they have many rows with the value DAV and even though these are present in the stats the statistics estimation for the LIKE seems to fail to realise these values will match the predicate. Commented Sep 18, 2012 at 7:15

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.