0

I have a table with strings in one column, which are actually storing other SQL Queries written before and stored to be ran at later times. They contain parameters such as '@organisationId' or '@enterDateHere'. I want to be able to extract these.

Example:

ID Query
1 SELECT * FROM table WHERE id = @organisationId
2 SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere
3 SELECT name + '@' + domain FROM user

I want the following:

ID Parameters
1 @organisationId
2 @startDate, @endDate, @enterOrgHere
3 NULL

No need to worry about how to separate/list them, as long as they are clearly visible and as long as the query lists all of them, which I don't know the number of. Please note that sometimes the queries contain just @ for example when email binding is being done, but it's not a parameter. I want just strings which start with @ and have at least one letter after it, ending with a non-letter character (space, enter, comma, semi-colon). If this causes problems, then return all strings starting with @ and I will simply identify the parameters manually.

It can include usage of Excel/Python/C# if needed, but SQL is preferable.

5
  • Just build an appropriate regular expression. You can do it in any language from your list. Commented Nov 29, 2022 at 14:27
  • 1
    Looks like a simple regular expression: @\w+. In Python: matches = re.findall(r'@\w+', your_sql_string).
    – Matthias
    Commented Nov 29, 2022 at 14:27
  • 1
    prolly OP doesn't wana catch something like name + '@example.' + domain ... but there is more ... hehe what about SELECT 1 FROM WHERE /* @example = 1 AND */ 1 = 1 ... just use some SQL parser ...
    – Selvin
    Commented Nov 29, 2022 at 15:10
  • While asking a question, you need to provide a minimal reproducible example: (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;). Commented Nov 29, 2022 at 15:10
  • Why do you want this? It matters. SQL in general and T-SQL in particular are very bad at text parsing. It's better to use a parser like eg python-sqlparse. In SQL Server 2017 and later you can use Python scripts with sp_execute_external_script to process data using Python Commented Nov 29, 2022 at 15:30

4 Answers 4

2

The official way to interrogate the parameters is with sp_describe_undeclared_parameters, eg

exec sp_describe_undeclared_parameters @tsql =  N'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'  
1
  • The best and most correct method, but requires correct code because the code is compiled
    – Arzanis
    Commented Nov 30, 2022 at 7:45
1

It is very simple to implement by using tokenization via XML and XQuery.

Notable points:

  1. 1st CROSS APPLY is tokenazing Query column as XML.
  2. 2nd CROSS APPLY is filtering out tokens that don't have "@" symbol.

SQL #1

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Query VARCHAR(2048));
INSERT INTO @tbl (Query) VALUES
('SELECT * FROM table WHERE id = @organisationId'),
('SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'),
('SELECT name + ''@'' + domain FROM user');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1);

SELECT t.ID
    , Parameters = IIF(t2.Par LIKE '@[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
      REPLACE(Query, @separator, ']]></r><r><![CDATA[') + 
      ']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"@")])').value('text()[1]','VARCHAR(1024)'))) AS t2(Par)

SQL #2

A cleansing step was added to handle other than a regular space whitespaces first.

SELECT t.*
    , Parameters = IIF(t2.Par LIKE '@[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' + Query + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t0(pure)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
      REPLACE(Pure, @separator, ']]></r><r><![CDATA[') + 
      ']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"@")])')
                .value('text()[1]','VARCHAR(1024)'))) AS t2(Par);

Output

ID Parameters
1 @organisationId
2 @startDate @endDate @enterOrgHere
3 NULL
7
  • That's not what anyone would call "simple" Commented Nov 29, 2022 at 15:31
  • This one didn't work unfortunately. It returned me things like "@orgId >=@fromDate <=@toDate", "table.organisationId=@organisationId" and also "@organisationId ORDER" picking up SQL keywords and other symbols in most cases
    – Joe
    Commented Nov 29, 2022 at 18:21
  • As I commented earlier, please provide a minimal reproducible example: ##1-4 We can work off just what is provided. Commented Nov 29, 2022 at 18:25
  • Also, my answer is based on tokenization where space is used as a separator between tokens. Very possible that your Query column contains other whitespace characters. Commented Nov 29, 2022 at 18:34
  • Please try SQL #2 Commented Nov 29, 2022 at 18:50
0

You can use string split, and then remove the undesired caracters, here's a query :

DROP TABLE IF EXISTS #TEMP

SELECT 1 AS ID ,'SELECT * FROM table WHERE id = @organisationId' AS Query
INTO #TEMP
UNION ALL SELECT 2, 'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere'
UNION ALL SELECT 3, 'SELECT name + ''@'' + domain FROM user'

;WITH cte as
(
    SELECT ID,
        Query,
        STRING_AGG(REPLACE(REPLACE(REPLACE(value,'<',''),'>',''),'=',''),', ') AS Parameters
    FROM #TEMP
    CROSS APPLY string_split(Query,' ')
    WHERE value LIKE '%@[a-z]%'
    GROUP BY ID,
        Query
)
SELECT #TEMP.*,cte.Parameters
FROM #TEMP
LEFT JOIN cte on #TEMP.ID = cte.ID
1
  • This one also didn't work unfortunately. It picks up SQL keywords like '@insertIdHere ORDER' from a following ORDER BY clause, even though there is a space between them. And it picks up more text before and after the parameter in other scenarios.
    – Joe
    Commented Nov 29, 2022 at 18:43
0

Using SQL Server for parsing is a very bad idea because of low performance and lack of tools. I highly recommend using .net assembly or external language (since your project is in python anyway) with regexp or any other conversion method. However, as a last resort, you can use something like this extremely slow and generally horrible code (this code working just on sql server 2017+, btw. On earlier versions code will be much more terrible):

DECLARE @sql TABLE
(
  id INT PRIMARY KEY IDENTITY
, sql_query NVARCHAR(MAX)
);

INSERT INTO @sql (sql_query)
VALUES (N'SELECT * FROM table WHERE id = @organisationId')
     , (N'SELECT * FROM topic WHERE creation_time <=@startDate AND creation_time >= @endDate AND id = @enterOrgHere')
     , (N'  SELECT name + ''@'' + domain FROM user')
     ;


WITH prepared AS
(
  SELECT id
        , IIF(sql_query LIKE '%@%'
             , SUBSTRING(sql_query, CHARINDEX('@', sql_query) + 1, LEN(sql_query))
             , CHAR(32)
             ) prep_string
    FROM @sql
),
parsed AS
(
SELECT id
     , IIF(CHARINDEX(CHAR(32), value) = 0
          , SUBSTRING(value, 1, LEN(VALUE))
          , SUBSTRING(value, 1, CHARINDEX(CHAR(32), value) -1)
          ) parsed_value
  FROM prepared p
  CROSS APPLY STRING_SPLIT(p.prep_string, '@')
)
SELECT id, '@' + STRING_AGG(IIF(parsed_value LIKE '[a-zA-Z]%', parsed_value, NULL) , ', @')
  FROM parsed
GROUP BY id
3
  • This one also didn't work unfortunately. For example for a query which is just 'SELECT * FROM generic_table_name' it returned '@SELECT'. And it picks up SQL keywords too like '@parameter1' ORDER from an ORDER BY clause, although there is a space after it in the original query.
    – Joe
    Commented Nov 29, 2022 at 18:37
  • Yeah, another reason not to try parsing in SQL is edge cases. Changed the code for this case.
    – Arzanis
    Commented Nov 30, 2022 at 7:36
  • In general, the most reliable and simple way is to loop through the line. But I hate this method, so I don't give this example
    – Arzanis
    Commented Nov 30, 2022 at 7:37

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.