I found some T-SQL code, and i'm trying to figure out if it is:
- an elegant solution that i never knew about
- a horrible abomination that happens to work, and should absolutely be removed
- or somewhere in between.
Standardize Names
We have a requirement to try to cleanup names:
Bob
→Robert
Bill
→William
Dick
→Richard
So a table was created to hold the value and the value to replace it with. It holds the above words (as well as more esoteric, domain-specific, industry specific, mappings):
FillerWords:
Value | ReplaceWith |
---|---|
dick | Richard |
bill | William |
bob | Robert |
ted | William |
And the code I saw used to perform the replacment was either genius or madness:
DECLARE @firstName varchar(200) = 'bill';
SELECT @firstName = REPLACE(@firstName, FillerWords.Value, FillerWords.ReplaceWith) FROM FillerWords;
SELECT @firstName;
Which gives:
(no column name)
----------------
William
1 row(s) affected
Even more impressive is that it can do multiple replacements at once:
DECLARE @firstName varchar(200) = 'teddickbobbill';
SELECT @firstName = REPLACE(@firstName, FillerWords.Value, FillerWords.ReplaceWith) FROM FillerWords;
SELECT @firstName;
Which gives:
(no column name)
----------------
WilliamRichardRobertWilliam
1 row(s) affected
It works, but is it valid?
How does it work?
If you eliminate the variable assignment during the select:
SELECT REPLACE(@firstName, fw.Value, fw.ReplaceWith) FROM #FillerWords fw
you see that it is performing the REPLACE
on every row in the table:
(No column name)
teddickRobertbill
tedRichardbobbill
Williamdickbobbill
teddickbobWilliam
And so what happens is that each time a row is evaluated, the @firstName
variable is updated:
@firstName | Value | ReplaceWith | new value of @firstName |
---|---|---|---|
teddickbobbill | dick | Richard | tedRichard bobbill |
tedRichardbobbill | bill | William | tedRichardbobWilliam |
tedRichardbobWilliam | bob | Robert | tedRichardRobert William |
tedRichardRobertWilliam | ted | William | William RichardRobertWilliam |
So it seems to rely on a quirk of variable assignment, that you can keep revising the same variable row-after-row; it's almost like a cursor operation.
Is that style of T-SQL programming supported? I know recursive common-table-expressions (CTEs) do it all the time; but is it allowed here?
I know the code's success relies on the order in which rows are processed, which is generally not guaranteed in SQL. This could lead to inconsistent results, especially if the underlying data or the SQL Server's query execution plan changes. I do not care about this, here, in this question.
One Possible Answer
I asked one oratrice mechanique d'analyse cardinal, and it said:
The code exploits a specific behavior of T-SQL where a variable can be updated in a SELECT statement. However, this is not standard SQL behavior and may not be intuitive or maintainable. While this works in current versions of SQL Server, there's no guarantee that such behavior will remain consistent or supported in future versions, as it's not a documented feature for such purposes.
Is setting variables resursively not an accepted pattern?