Common Table Expressions (CTEs) are a part of SQL that really made me rethink what is possible in the language when I first discovered them.
For those that haven’t used them before and don’t fancy trawling through the detailed documentation, there are essentially two types of CTEs, recursive and non-recursive.
A non-recursive CTE is a shorthand that lets you reuse a subquery easily. For example, this query:
SELECT name FROM account WHERE primarycontactid IN (SELECT contactid FROM contact WHERE parentcustomerid IS NULL AND statecode = 0) UNION ALL SELECT fullname FROM systemuser WHERE systemuserid IN (SELECT ownerid FROM contact WHERE parentcustomerid IS NULL AND statecode = 0)
You could rewrite this with a CTE to make it easier to understand:
WITH orphanedcontacts AS (SELECT contactid, ownerid FROM contact WHERE parentcustomerid IS NULL AND statecode = 0) SELECT name FROM account WHERE primarycontactid IN (SELECT contactid FROM orphanedcontacts) UNION ALL SELECT fullname FROM systemuser WHERE systemuserid IN (SELECT ownerid FROM orphanedcontacts);
The two queries are exactly equivalent, but the CTE version has a number of advantages. It’s easier to understand the intent of the query when you’re reading it, and if you need to make a change in future you only have to update the CTE once instead of making sure each subquery is updated in the same way.
This is where the real magic happens. Querying any hierarchical data structure, such as parent accounts or user/manager relationships, is very difficult in SQL. You could use a query like this to find the details of a user and their manager:
SELECT user.fullname, manager.fullname FROM systemuser AS user INNER JOIN systemuser AS manager ON user.parentsystemuserid = manager.systemuserid WHERE user.fullname = 'Mark Carrington'
But what about that manager’s manager? We could add another join:
SELECT user.fullname, manager.fullname, manager_manager.fullname FROM systemuser AS user INNER JOIN systemuser AS manager ON user.parentsystemuserid = manager.systemuserid INNER JOIN systemuser AS manager_manager ON manager.parentsystemuserid = manager_manager.systemuserid WHERE user.fullname = 'Mark Carrington'
But how many levels should we add in? If we knew that we probably wouldn’t have needed to write the query in the first place. This is where a recursive CTE comes in.
WITH userhierarchy AS (SELECT systemuserid, parentsystemuserid, fullname FROM systemuser WHERE fullname = 'Mark Carrington' UNION ALL SELECT manager.systemuserid, manager.parentsystemuserid, manager.fullname FROM systemuser AS manager INNER JOIN userhierarchy ON userhierarchy.parentsystemuserid = manager.systemuserid) SELECT fullname FROM userhierarchy
Wait, what? The CTE is referring to itself in the
FROM clause. How does that work?
In any recursive CTE there are at least two parts joined with
UNION ALL – the anchor part and the recursive part.
The anchor part defines a starting point for the query – one or more records that get us going on the hierarchy.
The recursive part takes each record that’s already been found and uses it to find more related records – usually the parent or the children of the current record.
The query keeps running the recursive part until it doesn’t find any more records, i.e. it reaches the top or bottom of the hierarchy.
CTEs are standard SQL, so can’t we just use the TDS endpoint to run them? Unfortunately not, they’re explicitly blocked for some reason.
However, as we’ve seen, non-recursive CTEs can be rewritten as subqueries. SQL 4 CDS will do exactly this. You can run a query containing a non-recursive CTE and SQL 4 CDS can use the TDS Endpoint to run it using the equivalent subqueries.
FetchXML has some specialised filter conditions for working with hierarchical data, which presumably translate to recursive CTEs behind the scenes. SQL 4 CDS will use these where possible, for example:
WITH cte (systemuserid, name) AS (SELECT systemuserid, fullname FROM systemuser WHERE systemuserid = '45fc4284-c607-eb11-a813-000d3a654aeb' UNION ALL SELECT systemuser.systemuserid, fullname FROM systemuser INNER JOIN cte ON systemuser.parentsystemuserid = cte.systemuserid) SELECT * FROM contact WHERE ownerid NOT IN (SELECT systemuserid FROM cte)
<fetch xmlns:generator='MarkMpn.SQL4CDS' count='1000'> <entity name='contact'> <all-attributes /> <link-entity name='systemuser' to='ownerid' from='systemuserid' alias='Expr7' link-type='outer'> <filter> <condition attribute='systemuserid' operator='eq-or-under' value='45fc4284-c607-eb11-a813-000d3a654aeb' /> </filter> </link-entity> <filter> <condition attribute='systemuserid' entityname='Expr7' operator='null' /> </filter> </entity> </fetch>
To use these operators your CTE must:
- reference only a single table
- join using the columns defined as the hierarchy relationship for that table
- not include any calculated columns
You can run more complex queries that can’t be translated directly to FetchXML or Dataverse SQL, making use of calculations in the CTE to get out more information. For example you could include the position of a record in the hierarchy:
WITH cte (systemuserid, name, depth) AS (SELECT systemuserid, fullname, 0 FROM systemuser WHERE fullname = 'Mark Carrington' UNION ALL SELECT systemuser.systemuserid, fullname, depth + 1 FROM systemuser INNER JOIN cte ON systemuser.parentsystemuserid = cte.systemuserid) SELECT * FROM cte
Or the path to get from the starting record:
WITH cte (systemuserid, path) AS (SELECT systemuserid, fullname FROM systemuser WHERE fullname = 'Mark Carrington' UNION ALL SELECT systemuser.systemuserid, fullname + ' / ' + path FROM systemuser INNER JOIN cte ON systemuser.parentsystemuserid = cte.systemuserid) SELECT * FROM cte
You could also not use any tables at all and use it to calculate factorials:
WITH Factorial (N, Factorial) AS (SELECT 1, 1 UNION ALL SELECT N + 1, (N + 1) * Factorial FROM Factorial WHERE N < 10) SELECT N, Factorial FROM Factorial
When SQL 4 CDS has to execute a recursive CTE itself, it generates a rather more complex execution plan. This is the plan for the second sample query above, getting a user hierarchy including the path:
And this is the plan for the exact same query but without the path calculation, getting only the individual user names:
And finally, the plan when we can filter the anchor part of the query on a specific guid:
Although they might look pretty, simpler execution plans will mean your query can execute quicker. If you find your CTE-based query is taking a long time to run, check if you can alter it slightly to take advantage of any of these optimizations.
As so often, I’m deeply indebted to Hugo Kornelis for his detailed documentation of the internals of SQL Server. This time most of the credit has to go to his Plansplaining article on how recursive CTEs work.
SQL 4 CDS has had some support for JSON functions since the first release of v7. That version introduced the
JSON_PATH_EXISTS functions as a way of querying individual values from JSON strings, such as those returned from the
This release extends on this by adding the
ISJSON functions. The
OPENJSON function in particular can be very useful as it allows you to extract multiple values from a single JSON string.
OPENJSON is a table-valued function, i.e. it returns a whole table of results rather than a single value, so you use it in the
FROM clause of your query.
You can use this to find out which fields are the trigger for a Power Automate Flow:
SELECT w.name, trigger_params.* FROM workflow AS w CROSS APPLY OPENJSON (w.clientdata, '$.properties.definition.triggers.manual.inputs.schema.properties') WITH (title NVARCHAR (100), type NVARCHAR (100), description NVARCHAR (MAX)) AS trigger_params WHERE w.category = 5
- takes all “Modern Flow” workflows (category 5)
- parses the JSON stored in the
- navigates to the
- produces an output row for each item in that JSON array, including the
descriptionproperties from the JSON
You could also find out which fields are on a form by parsing the
systemform.formjson column. This is a bit harder as you need to loop over all the tabs in the form, then all the columns in each tab, then all the sections in each column, then all the rows in each section, and finally all the cells in each row to find the individual fields. Fortunately we now have a tool to help us with this recursion – CTEs!
WITH cte (json) AS ( SELECT value FROM systemform CROSS APPLY OPENJSON(formjson) j WHERE objecttypecode = 'account' AND name = 'Account' AND j.type in (4, 5) UNION ALL SELECT value FROM cte CROSS APPLY OPENJSON(cte.json) WHERE type in (4, 5) ) SELECT JSON_VALUE(json, '$.DataFieldName') FROM cte WHERE JSON_VALUE(json, '$."$type"') LIKE '%CustomControl%' AND JSON_VALUE(json, '$.DataFieldName') IS NOT NULL
I’d love to hear how you’d like to use these new features, feel free to drop your example queries in the comments!
As well as various fixes and improvements for some specific types of queries there are a few more usability improvements to highlight:
This new query hint allows you to write an
INSERT statement that will silently ignore errors caused by a duplicate primary key. This can be particularly useful when adding many-to-many relationship records. For example, you can copy the security role assignments from one user to another with:
DECLARE @NewUser AS UNIQUEIDENTIFIER, @OldUser AS UNIQUEIDENTIFIER; SELECT @NewUser = systemuserid FROM systemuser WHERE fullname = 'New User'; SELECT @OldUser AS systemuserid FROM systemuser WHERE fullname = 'Old User'; INSERT INTO systemuserroles (systemuserid, roleid) SELECT @NewUser, roleid FROM systemuserroles WHERE systemuserid = @OldUser;
Normally this would cause an error if the user already has one of these roles, but by adding the
IGNORE_DUP_KEY hint the query will succeed and add all the missing roles regardless:
INSERT INTO systemuserroles (systemuserid, roleid) SELECT @NewUser, roleid FROM systemuserroles WHERE systemuserid = @OldUser OPTION (USE HINT('IGNORE_DUP_KEY'))
Using the Format button in SQL 4 CDS is a quick way to tidy up your SQL code, but this has previously lost any comments in your script. This version will keep the comments, although you may need to move them back around slightly to make sure they’re exactly where you want them.
The confirmation message that is shown on DML queries (
DELETE) now has a “Yes To All” option:
This also applies to the Azure Data Studio extension:
“All” means the current batch, so if you execute 10
INSERT statements in one go you can click “Yes To All” on the first one and only be prompted once, but you’ll be prompted again the next time you execute a query.
You can now right-click on a record identifier in the results grid and select to copy the URL of the record rather than having to open it directly: