I have these JSON documents:

Application 1:

{
"business_industry": "Agriculture",
"docs": [
    {
    "internal": false,
    "type": "Asset & Liability Statement"
    },
    {
    "internal": false,
    "name": "Privacy Consent",
    "type": "Privacy Consent"
    }
],
"quote": {
    "principal": 0,
    "short_id": "3856545"
}
}

Application 2:

{
"business_industry": "Construction",
"docs": [
    {
    "internal": false,
    "name": "Privacy Consent",
    "type": "Privacy Consent"
    }
],
"asset": {
    "model": null,
    "make": "3856545"
}
}

Application 3:

{
"business_industry": "Coal Mining",
"business_business_names": [
    {
    "business_organisation_name": "Centron Consulting Services",
    "business_effective_from": "2018-04-11"
    }
],
"lite_doc": {
    "total_sales": 0,
    "total_sales2": 0,
    "refunds_present": false,
    "payment_arrangement_evident": false
}
}

I would like to query all applications and automatically get every key whose value is an object or an array, because I will use those keys as references for creating new models in another database.

Something like:

+----------+-----------+-------+----------------------------+-----------------------------+
|   docs   | quotes    | asset | business_business_names    | lite_doc                    |
+----------+-----------+-------+----------------------------+-----------------------------+
| internal | principal | model | business_organisation_name | total_sales                 |
| type     | short_id  | make  | business_effective_from    | total_sales2                |
| name     |           |       |                            | refunds_present             |
|          |           |       |                            | payment_arrangement_evident |
+----------+-----------+-------+----------------------------+-----------------------------+

I will then create five models (docs, quotes, asset, business_business_names, and lite_doc), each with the properties listed above. The value under each key can be either an object (dictionary) or an array.

This is the code I currently have:

WITH docs AS (
    SELECT docs = STRING_AGG(j.[key], '
')
    FROM (
        SELECT DISTINCT j.[key]
        FROM Application a1
        CROSS APPLY OPENJSON(a1.DataReceived, '$.docs') j0
        CROSS APPLY OPENJSON(j0.value) j
    ) j
),
quotes AS (
    SELECT quotes = STRING_AGG(j.[key], '
')
    FROM (
        SELECT DISTINCT j.[key]
        FROM Application a1
        CROSS APPLY OPENJSON(a1.DataReceived, '$.quote') j
    ) j
),
asset AS (
    SELECT asset = STRING_AGG(j.[key], '
')
    FROM (
        SELECT DISTINCT j.[key]
        FROM Application a1
        CROSS APPLY OPENJSON(a1.DataReceived, '$.asset') j
    ) j
),
business_business_names AS (
    SELECT business_business_names = STRING_AGG(j.[key], '
')
    FROM (
        SELECT DISTINCT j.[key]
        FROM Application a1
        CROSS APPLY OPENJSON(a1.DataReceived, '$.business_business_names') j0
        CROSS APPLY OPENJSON(j0.value) j
    ) j
),
lite_doc AS (
    SELECT lite_doc = STRING_AGG(j.[key], '
')
    FROM (
        SELECT DISTINCT j.[key]
        FROM Application a1
        CROSS APPLY OPENJSON(a1.DataReceived, '$.lite_doc') j
    ) j
)
SELECT *
FROM docs
CROSS JOIN quotes
CROSS JOIN asset
CROSS JOIN business_business_names
CROSS JOIN lite_doc;

But if a new key-value pair appears whose value is an object or an array, I have to add another CTE for it as well.

How can I do this automatically, while also handling DataReceived keys whose values are not objects or arrays?

1 Answer

Dynamic column names are problematic, because the column list of a query has to be known up front unless you resort to dynamic SQL. But if you are happy to get the data as multiple rows, with the columns BaseKey and SubKeys, then it's fairly easy.

SELECT
  BaseKey,
  STRING_AGG(SubKey, '
') AS SubKeys
FROM (
    SELECT DISTINCT
      BaseKey = baseKey.[key],
      SubKey = ISNULL(subKey.[key], arrayOrObj.[key])
    FROM Application a
    CROSS APPLY OPENJSON(a.DataReceived) baseKey
    -- type 4 is array, type 5 is object
    CROSS APPLY OPENJSON(CASE WHEN baseKey.type IN (4, 5) THEN baseKey.value END) arrayOrObj
    OUTER APPLY OPENJSON(CASE WHEN baseKey.type = 4 THEN arrayOrObj.value END) subKey
    WHERE baseKey.type IN (4, 5)
) j
GROUP BY
  j.BaseKey
ORDER BY
  BaseKey;


  • We begin by breaking out the root object into key-value pairs.
  • We filter to only include arrays or objects in the value.
  • We break out those into key-value pairs also.
  • Arrays come back as index-object pairs, so we conditionally break those out one more level to reach the property names.
  • We apply DISTINCT over the base key and sub-key names.
  • Then we simply group by base key and string-aggregate the sub-keys.
  • We could have simplified this to a single level of aggregation using STRING_AGG(DISTINCT ...), but SQL Server does not support that yet.
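
To try the steps above without touching the real table, here is a minimal sketch that loads the three sample applications from the question into a table variable and runs just the inner query on it. The table variable @Application is a hypothetical stand-in for the Application table, and the sketch assumes SQL Server 2016 or later with database compatibility level 130 or higher, which OPENJSON requires.

```sql
-- Hypothetical stand-in for the Application table (assumption, not the real schema).
DECLARE @Application TABLE (DataReceived nvarchar(max));

-- The three sample documents from the question, one per row.
INSERT INTO @Application (DataReceived) VALUES
(N'{"business_industry":"Agriculture","docs":[{"internal":false,"type":"Asset & Liability Statement"},{"internal":false,"name":"Privacy Consent","type":"Privacy Consent"}],"quote":{"principal":0,"short_id":"3856545"}}'),
(N'{"business_industry":"Construction","docs":[{"internal":false,"name":"Privacy Consent","type":"Privacy Consent"}],"asset":{"model":null,"make":"3856545"}}'),
(N'{"business_industry":"Coal Mining","business_business_names":[{"business_organisation_name":"Centron Consulting Services","business_effective_from":"2018-04-11"}],"lite_doc":{"total_sales":0,"total_sales2":0,"refunds_present":false,"payment_arrangement_evident":false}}');

SELECT DISTINCT
  BaseKey = baseKey.[key],
  -- For arrays, subKey holds the property name; for objects, arrayOrObj already does.
  SubKey  = ISNULL(subKey.[key], arrayOrObj.[key])
FROM @Application a
-- Step 1: break the root object into key-value pairs.
CROSS APPLY OPENJSON(a.DataReceived) baseKey
-- Step 2: open only values that are arrays (type 4) or objects (type 5).
CROSS APPLY OPENJSON(CASE WHEN baseKey.type IN (4, 5) THEN baseKey.value END) arrayOrObj
-- Step 3: array elements are index-object pairs, so open them one level further.
OUTER APPLY OPENJSON(CASE WHEN baseKey.type = 4 THEN arrayOrObj.value END) subKey
WHERE baseKey.type IN (4, 5)
ORDER BY BaseKey, SubKey;
```

This should return one row per distinct base key and sub-key pair. Scalar keys such as business_industry drop out because their type is neither 4 nor 5. Wrapping it in the outer STRING_AGG query (which needs SQL Server 2017 or later) collapses it to one row per base key.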
  • Thanks again! I guess I don't need the outer SELECT with the STRING_AGG. SSMS doesn't render the newline properly, and it's easier for me to just use the inner query you have. Commented Jun 14, 2023 at 0:20
