65.9K
CodeProject is changing. Read more.
Home

Controlling the XML output when using the FOR XML AUTO statement in SQL Server

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.91/5 (17 votes)

Jan 26, 2010

CPOL

3 min read

viewsIcon

117887

Describes how to get more control over the XML output when using the FOR XML AUTO statement. For instance, when adding XML tags.

Introduction

This article describes how to get more control over the XML output when using the FOR XML AUTO statement. For instance, when adding XML tags. This is instead of using the more difficult to understand FOR XML EXPLICIT statement. If you are about to deserialize an XML output for use in an application, you will perhaps find this information useful.

Background

In a FOR XML clause, you normally use one of these modes:

  • RAW
  • AUTO
  • EXPLICIT
  • PATH

When you want full control over the produced XML, you use FOR XML EXPLICIT, but it's rather difficult to understand, read, and maintain the complex SELECT statement at the end. FOR XML AUTO produces the most readable SELECT statement, but has the downside that you have minimal control over the produced XML. But with some tricks, and sometimes by using the PATH option in addition, you can do more than you will expect. The RAW option is rarely used and therefore not discussed. The PATH option allows you to mix attributes and elements easier. For now, we use FOR XML AUTO.

This example

For this example, we use two simple tables in a single database with a 1:N relationship. One table (SalesOrder) contains the orders along with customer information, and the other table (Items) contains the items. An order can have multiple items, and an item always belongs to one single order.

Code

I'd always like to start easy. So:

SELECT * FROM salesorder

produces::

ordernumber customername customerstreet
----------- ------------ --------------
1           parker       first av  
2           lesley       sec av

If we want the resultset to be XML, we add the FOR XML AUTO statement:

SELECT * FROM salesorder FOR XML AUTO

which produces::

<salesorder ordernumber="1" customername="parker" customerstreet="first av"/>
<salesorder ordernumber="2" customername="lesley" customerstreet="sec av"/>

But now, you have the fields as attributes. Most of the time, you will want them as elements. To do this, you add the ELEMENTS parameter:

SELECT * FROM salesorder FOR XML AUTO, ELEMENTS

Which produces::

<salesorder>
    <ordernumber>1</ordernumber>
    <customername>parker</customername>>
    <customerstreet>first av</customerstreet>
</salesorder>
<salesorder>
    <ordernumber>2</ordernumber>
    <customername>lesley</customername>
    <customerstreet>sec av</customerstreet>
</salesorder>

If you want to change the 'salesorder' tag, you can use:

SELECT * FROM salesorder AS niceorder FOR XML AUTO, ELEMENTS

Which produces:

<niceorder>
    <ordernumber>1</ordernumber>
    <customername>parker</customername>>
    <customerstreet>first av</customerstreet>
</niceorder>
<niceorder>
    <ordernumber>2</ordernumber>
    <customername>lesley</customername>
    <customerstreet>sec av</customerstreet>
</niceorder>

And of course, this trick also works for column names:

SELECT ordernumber AS order_no FROM salesorder WHERE ordernumber = 1 FOR XML AUTO, ELEMENTS

Which produces:

<salesorder>
    <order_no>1</order_no >
</salesorder>

What if you want to add additional tags or add nodes? For instance, add 'customer' tags around customer information? This proves to be difficult with FOR XML AUTO. A possible solution is a SELF JOIN (join to the same table), but I found something easier. After much fiddling and tinkering, we use a subquery and slightly abuse the FOR XML PATH command, like this:

SELECT 
    ordernumber,
    (SELECT    customername , 
            customerstreet FOR XML PATH(''), 
            TYPE, ELEMENTS) 
            as customer
FROM 
    salesorder 
FOR XML AUTO, ELEMENTS

Which produces:

<salesorder>
    <ordernumber>1</ordernumber>
    <customer>
        <customername>parker</customername>
        <customerstreet>first av</customerstreet>
    </customer>
</salesorder>
<salesorder>
    <ordernumber>2</ordernumber>
    <customer>
        <customername>lesley</customername>
        <customerstreet>sec av</customerstreet>
    </customer>
</salesorder>

Notice the use of the additional 'TYPE' parameter. This will ensure the result of the subquery is returned as an XML type (as part of the the whole XML type result), rather than the NVARCHAR(MAX) type. This neat little trick also works if you want to add surrounding tags to your complete result:

SELECT(
    SELECT 
        customername
    FROM 
        salesorder 
    FOR XML AUTO, TYPE, ELEMENTS
) AS orderrequest FOR XML PATH(''), TYPE, ELEMENTS

Which produces:

<orderrequest>
    <salesorder>
        <customername>parker</customername>
    </salesorder>
    <salesorder>
        <customername>lesley</customername>
    </salesorder>
</orderrequest>

Why don't we use the standard FOR XML AUTO in the subquery? Try it, it will generate an error. You can only use FOR XML AUTO in a subquery when the subquery is a query on an actual table (which the above obviously isn't).

The subquery construction is the way to go if you want full control over the produced XML output. Let's say we want, per order, the customer name and all the items belonging to the order. For this, you use correlated subqueries like this:

SELECT
    customername ,
    (SELECT    * FROM item WHERE item.ordernumber = 
       salesorder.ordernumber FOR XML AUTO, TYPE, ELEMENTS)
FROM
    salesorder
FOR XML AUTO, ELEMENTS

Which produces:

<salesorder>
    <customername>parker</customername>
    <item>
        <itemnumber>10</itemnumber>
        <description>pen</description>
        <ordernumber>1</ordernumber>
    </item>
    <item>
        <itemnumber>11</itemnumber>
        <description>paper</description>
        <ordernumber>1</ordernumber>
    </item>
</salesorder>

When using correlated subqueries, you can use the regular FOR XML AUTO, TYPE, ELEMENTS statement. If you want surrounding tags around the 'items', just add AS after the subquery, like this:

SELECT
    customername ,
    (SELECT    * FROM item WHERE item.ordernumber = 
      salesorder.ordernumber FOR XML AUTO, TYPE, ELEMENTS) 
      AS orderitems
FROM
    salesorder
FOR XML AUTO, ELEMENTS

Which produces:

<salesorder>
    <customername>parker</customername>
    <orderitems>
        <item>
            <itemnumber>10</itemnumber>
            <description>pen</description>
            <ordernumber>1</ordernumber>
        </item>
        <item>
            <itemnumber>11</itemnumber>
            <description>paper</description>
            <ordernumber>1</ordernumber>
        </item>
    </orderitems>
</salesorder>

Why don't we just simply join the item table to the order table? This will sometimes lead to unwanted and unpredictable behaviour when it comes to the layout of the produced XML. For instance:

SELECT
    item.description,
    salesorder.customername
FROM
    salesorder
INNER JOIN item ON item.ordernumber = salesorder.ordernumber
FOR XML AUTO, ELEMENTS

Will produce garbage like this:

<item>
    <description>pen</description>
    <salesorder>
        <customername>parker</customername>
    </salesorder>
</item>
<item>
    <description>paper</description>
    <salesorder>
        <customername>parker</customername>
    </salesorder>
</item>

Also, you will have a hard time adding surrounding tags when using joins. So, for most controls, when using FOR XML AUTO, use (correlated) subqueries.