FetchXML Late Materialize

The latest addition to FetchXML is Late Materialize. Turning this on or off doesn’t change the results of your query, but it does change how it’s executed behind the scenes, which could lead to some performance improvements.

How is FetchXML executed?

To understand what this new option does, it’s helpful to know what’s going on behind the scenes normally.

Ultimately the data lives in SQL, and CDS translates the FetchXML to a SQL SELECT query.

For a simple query like:

<fetch>
  <entity name="account">
    <attribute name="name" />
  </entity>
</fetch>

it gets translated quite easily to something like:

SELECT TOP 5000 name
FROM            account

However, it gets more interesting when you ask for lookup fields:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <attribute name="primarycontactid" />
  </entity>
</fetch>

If you run this query, the results you get back contain both the ID and name of the primary contact. It gets the name by translating this to a SQL join:

SELECT TOP 5000 account.name,
                account.primarycontactid,
                contact.fullname
FROM            account
                LEFT OUTER JOIN
                contact
                ON
                account.primarycontactid = contact.contactid

What’s the problem?

Each of these joins comes at a cost. Depending on how SQL Server decides to run the query, it could incur an index lookup for each record being considered. Each one should be very quick, but the time quickly adds up with large datasets and many lookup fields.

How does Late Materialize help?

Rather than getting all the results in one query, Late Materialize splits the query execution in two. In the first part it gets the IDs of the records that should be returned:

SELECT TOP 100 accountid
FROM           account
WHERE          <conditions>

Once it’s got the IDs it can get all the required details just for those records:

SELECT account.name,
       account.primarycontactid,
       contact.fullname
FROM   account
       LEFT OUTER JOIN
       contact
       ON account.primarycontactid = contact.contactid
WHERE  account.accountid IN (<guid1>, <guid2>, ..., <guid100>)

If your query is only returning a small proportion of all the possible records, this could give a performance boost.

However, it doesn’t improve every query. In many common cases it’s still more efficient to use the standard execution model.

When should I use it?

The documentation indicates that Late Materialize can improve performance when:

  • The entity you are querying has one or more links to other entities for column data
  • There are many columns in the entity
  • The entity contains logical attributes

So I gave it a try in a few situations to see how it compares. My test data set had around 20K account records.

Single string attribute

Completely ignoring the advice above, I tried getting just account names as a baseline. This worked out much as I expected:

OriginalLate Materialize% Improvement
635 ms643 ms-1%

There is more overhead in splitting the request into two different queries than there was getting all the data in one query, so don’t just turn this option on for all your queries!

All attributes

Going to the other end of the spectrum, how about retrieving all the attributes? (You don’t do this, right? Please always only select the attributes you need!) This certainly ticks two of the criteria for seeing a benefit from this option, as there are a large number of attributes including plenty of lookup attributes:

OriginalLate Materialize% Improvement
14,070 ms13,670 ms3%

This was a little disappointing, I was hoping to see some noticeable impact at this point.

Limiting result count

In these examples I haven’t limited the number of results, so I’ve got the default 5,000 records. What if I limit that to 500?

OriginalLate Materialize% Improvement
1,668 ms1,606 ms4%

Still rather underwhelming I’m afraid.

I’ve tried various combinations of page sizes, additional joins etc. and consistently get around 0-3% improvement in query execution time.

If you’ve got a query where you’ve seen a more dramatic performance boost by enabling this option, please let me know!

Leave a comment

Your e-mail address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.