Inside FetchXML pt 10 – link-entity (again)

A follow-up on my previous post on link-entity, prompted by this tweet from Daryl LaBar:

I’d previously stated (incorrectly):

FetchXML supports two link types – inner and outer.

Me, in more ignorant times

While these are the most common link types – and the only two that FetchXML Builder exposes – the platform has supported a third for some time, and now appears to have several more too!

TL;DR

There are some tantalising hints at new join types, but you probably shouldn’t be jumping into using them unless Microsoft support advises you to just yet!

Natural Join

The documentation for the natural join type is not exactly clear to me:

Only one value of the two joined attributes is returned if an equal-join operation is performed and the two values are identical

docs.microsoft.com

I really have no idea what this is trying to say – if you can enlighten me, please do! Given that all the joins it does are based on the attributes being equal, and therefore the two values must be identical, I can’t see what this is trying to describe.

As I can’t tell from the docs what it is trying to do, I’ve tried to work it out by running some queries and having a look at the generated SQL (thank you on-premise!) and the results.

My conclusion: natural is a synonym for inner.

If you substitute natural for inner, you will get exactly the same results, as it generates exactly the same SQL.

Many to many relationships

One thing I did wonder based on the docs for the natural join type was whether it would try to imply the join attributes by name, i.e. you could miss out the from and to attributes on the link-entity if they were both the same and it would imply them from the metadata. This failed, but it did produce this intriguing error message:

No system many-to-many relationship exists between account and contact.  If attempting to link through a custom many-to-many relationship ensure that you provide the from and to attributes.

This seemed interesting, so I gave it a go with this simple query:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <link-entity name="list" link-type="inner">
      <attribute name="listname" />
    </link-entity>
  </entity>
</fetch>

And it worked! This is a simple shortcut for navigating built-in many-to-many relationships without having to use the more explicit version of joining via the intersect entity:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <link-entity name="listmember" to="accountid" from="entityid" alias="listmember" link-type="inner">
      <link-entity name="list" to="listid" from="listid" alias="list" link-type="inner">
        <attribute name="listname" />
      </link-entity>
    </link-entity>
  </entity>
</fetch>

New Join Types

Anyway, onto the original challenge. The docs now lists a total of 10 different join types:

  • Inner
  • LeftOuter
  • Natural
  • MatchFirstRowUsingCrossApply
  • In
  • Exists
  • Any
  • NotAny
  • All
  • NotAll

As there’s no documentation on any of these I headed over to the simple but trusty FetchXml Tester XrmToolBox tool and tried one at random:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <link-entity name="contact" to="accountid" from="parentcustomerid" link-type="notall">
    </link-entity>
  </entity>
</fetch>

This gave the remarkably helpful error message:

An error occured: Invalid link-type specified, valid values are: 'natural', 'inner', 'in', 'matchfirstrowusingcrossapply','exists' and 'outer'. link-type = notall

So maybe CDS doesn’t support all 10 join types yet, but it does appear to support some we didn’t have before:

  • In
  • MatchFirstRowUsingCrossApply
  • Exists

I also tried the same thing on an on-premise installation, and that does only support the original 3: inner, outer and natural, so this is online-only for now at least.

So, in the absence of any documentation on these, or the ability to see what SQL gets generated for them, what are we to do? Well, these all seem to have directly equivalent SQL keywords, so starting from a simple query:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <link-entity name="contact" from="parentcustomerid" to="accountid" link-type="inner">
      <attribute name="lastname" />
      <filter>
        <condition attribute="firstname" operator="eq" value="Mark" />
      </filter>
    </link-entity>
  </entity>
</fetch>

My guess for the generated SQL when changing the link-type to different options would be:

Inner

SELECT name,
       lastname
FROM   account
       INNER JOIN
       contact
       ON
       account.accountid = contact.parentcustomerid
WHERE  contact.lastname = 'Mark'

In

SELECT name
FROM   account
WHERE  accountid IN (
  SELECT parentcustomerid
  FROM   contact
  WHERE  firstname = 'Mark'
)

MatchFirstRowUsingCrossApply

SELECT name,
       contact.lastname
FROM   account
       CROSS APPLY (
         SELECT TOP 1 lastname
         FROM         contact
         WHERE firstname = 'Mark' AND
               parentcustomerid = account.accountid
       ) AS contact

Exists

SELECT name
FROM   account
WHERE  EXISTS (
  SELECT *
  FROM   contact
  WHERE  firstname = 'Mark' AND
         parentcustomerid = account.accountid
)

This is just conjecture at this point, but if correct it would mean:

  1. Each of the new join types would only produce one row per account that has a contact called Mark, regardless of whether there was one such contact or 100. The standard inner join type however would produce one row for each contact
  2. The in and exists versions would not be able to include any attributes from the linked entity (lastname in this example), as it is only referenced from the WHERE clause, but the inner and matchfirstrowusingcrossapply would be able to

So, let’s give it a go!

I’ve created two accounts, one with no contacts called Mark and one with two:

And the results?

Inner Join

No surprise here, but good for a simple baseline:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <link-entity name="contact" to="accountid" from="parentcustomerid"  link-type="inner">
      <attribute name="lastname" />
      <filter>
        <condition attribute="firstname" operator="eq" value="Mark" />
      </filter>
    </link-entity>
    <filter>
      <condition attribute="name" operator="like" value="Join Operator %" />
    </filter>
  </entity>
</fetch>
namelastname
Join Operator 2Carrington
Join Operator 2Duplicate

In

Executing the exact same query but replacing inner with in gave the error:

An error occured: In doesn't support attribute inside linkentity expression.

Removing the <attribute> from inside the <link-entity> gave the expected result:

name
Join Operator 2

So far, so good!

Exists

Going through the same process with exists, again we get the same error about not being able to use an <attribute> within the <link-entity>. Removing that gives us the same results again!

MatchFirstRowUsingCrossApply

Trying again with matchfirstrowusingcrossapply, and it almost works as I expected. I can run the query with the <attribute name="lastname" /> element inside the <link-entity> without an error, but I don’t actually get that column back in the result set – I get the same results again as I did with in and exists!

Given how close the other predictions were, I’m going to chalk this down overall as a success, with the missing column in the results possibly being due to a CDS bug that will hopefully be fixed at some point. As always though, I’d be very happy to get any firm information on it one way or another!

Why?

So now we have a reasonable idea of what these new join operators are doing, why might we want to use them? We’ve managed quite happily with inner and outer until now, thank you very much.

Uniqueness

The main thing that these operators give us that we couldn’t do before is removing the duplication problem. We saw earlier that we can use distinct to remove duplicate rows in query results that were introduced by joining to multiple child records, but this also removed the automatic primary key from the results. Because these operators do not produce duplicate rows regardless of the number of child records, there is no need to use distinct. This was never that much of a problem though, as we could always include the primary key column explicitly anyway. So what else is there?

Single Child Entity

The matchfirstrowusingcrossapply option looks the most powerful, although as I haven’t managed to actually get any attributes out of the linked entity I can’t realise that yet. It would be most powerful if we could combine that with a sort order, so you could get a list of accounts along with the “best” contact for each account:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <link-entity name="contact" to="accountid" from="parentcustomerid"  link-type="matchfirstrowusingcrossapply">
     <attribute name="contactid" />
     <attribute name="fullname" />
     <order attribute="cdi_score" descending="true" />
    </link-entity>
  </entity>
</fetch>

That looks like it could give us a unique list of accounts, along with the details of the top contact in the account according to the ClickDimensions score. Unfortunately, running that gives us an error:

MatchFirstRowUsingCrossApply doesn't support order clause inside linkentity expression.

So far, that looks like a missed opportunity, as the corresponding SQL would work perfectly fine:

SELECT name,
       contact.contactid,
       contact.fullname
FROM   account
       CROSS APPLY (
         SELECT TOP 1 contactid,
                      fullname
         FROM         contact
         WHERE        parentcustomerid = account.accountid
         ORDER BY     cdi_score DESC
       ) AS contact

Performance

So what else? How about performance? If we compare the four simple examples we looked at before, we get three quite different query plans:

Comparative performance of different join operators

Even when adding in a DISTINCT option to the simple inner join type, this takes effectively no time at all to execute. All three of the other new options take significantly longer, with the matchfirstrowusingcrossapply option taking the vast majority of the time. So there’s no quick performance win here by just turning one of these options on blindly.

My best guess then is that these options have been added following some performance analysis of other, more complex queries that Microsoft have seen from their telemetry. This could possibly be for their 1st party Dynamics 365 apps, AI or other applications that have a particular need for more generating specific queries. This would probably explain the lack of documentation as much as anything else.

As there are still some other join types listed in the documentation which the server doesn’t appear to support yet, it could also be the case of the documentation getting ahead of the code and we’ve got some more exciting platform extensions coming in a future release!

Leave a comment

Your e-mail address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.