I’ve recently been looking at a few cases where I’ve got seemingly incorrect results from my FetchXML queries. This can manifest itself as the results limited to 50,000 records or records being skipped. I’ve narrowed these down to some behaviour I wasn’t aware of when you retrieve multiple pages of data.

Sorting on linked entities

My first problem came with a query like:

<fetch>
  <entity name="contact">
    <attribute name="fullname" />
    <link-entity name="account" from="accountid" to="parentcustomerid">
      <attribute name="name" />
      <order attribute="name" />
    </link-entity>
  </entity>
</fetch>

Looks straightforward enough – I should get back a list of contacts and their associated account names, sorted by the account name. And this works fine, until it stops after 50,000 records.

I’ve written about paging in Dataverse twice before, and it seems simple on the surface, but I seem to keep coming across situations like this where it’s easy to get tripped up.

For paging to work reliably we need a defined sort order, and we do here. However, when the sort is on a link-entity then Dataverse doesn’t return a paging cookie to move between pages. Instead it uses what’s called “legacy paging”. In this case to read the second page it loads in two pages worth of data from the database, throws the first page away and returns the second. To go to the third page it loads in three pages of data and throws away the first two. By the time you get to the tenth page it’s loaded 1+2+3+…+10=55 pages of data.

As this method of paging data is very inefficient, and gets exponentially worse on later pages, Dataverse limits the number of records you can get like this to 50,000. Unfortunately there isn’t a way of forcing it to use a paging cookie instead, so without changing the query there’s no way to get more than 50k records back.

Multiple linked entities

As we’ve got a problem here with a sort on a linked entity, how about we rewrite the query so the entity we want to sort on is the main entity? Something like this:

<fetch>
  <entity name="account">
    <attribute name="name" />
    <link-entity name="contact" from="parentcustomerid" to="accountid">
      <attribute name="fullname" />
    </link-entity>
    <order attribute="name" />
  </entity>
</fetch>

This time we do get a paging cookie back, so we can safely retrieve all our records, right? Turns out there’s a problem here too. Let’s take a look at that paging cookie:

<cookie page="1">
  <name last="Ben's Buns Plc" first="Aardvark Ltd" />
  <accountid last="{2E7F18B4-0D88-E311-9A9C-00155D00A305}" first="{AC3727AE-0761-E011-8D9D-00155D007501}" />
</cookie>

⚠️ Important: Although you shouldn’t try to parse the paging cookie and should just treat it as a black box, it’s informative here to help understand what’s going on.

The idea of the paging cookie is to indicate where the last page finished, so you can pass it into the next request so Dataverse can jump straight to the next page. For this to work reliably it needs a unique way of identifying each record, which it could have done had it included the contactid field. However, because the main entity was account Dataverse used only the name and accountid to identify the page boundary. If our data looks like:

The first page ended on record 5000, but using this paging cookie Dataverse will start the next page at the next account name & ID at record 5,011, so your app will never receive records 5,001 – 5,010.

So what does work?

Including my previous post, that makes three situations where paging might not give you the results you’re expecting:

So how can we reliably retrieve large amounts of data from Dataverse?

To get past the 50,000 row limit we need to use paging cookies, and to do that reliably we need to ensure that:

  1. If we’re using linked entities then the child entity is the root entity in the query and link entities are used only for parent (N:1) relationships
  2. There are no sorts on linked entities
  3. If the query is using the “distinct” option, there is a sort on the main entity

What if we can’t arrange the query to meet these requirements, like the list of contacts sorted by account name we looked at before? We can make this work, but we’ve got to do the paging work ourselves.

Manual paging

We need to make sure that, in addition to the sort on account name, it’s also sorted by the contact name. Now, as we retrieve each page we can track the account and contact we’ve reached, and use that to modify the query for the next page:

var lastAccountName = "";
var lastAccountId = Guid.Empty;
var lastContactId = Guid.Empty;

while (true)
{
  var fetch = @$"
  <fetch>
    <entity name="account">
      <attribute name="accountid" />
      <attribute name="name" />
      <link-entity name="contact" from="parentcustomerid" to="accountid" alias="contact" >
        <attribute name="contactid" />
        <attribute name="fullname" />
        <order attribute="contactid" />
      </link-entity>
      <order attribute="name" />
      <order attribute="accountid" />
      <filter type="or" >
        <filter type="and" >
          <condition attribute="name" operator="eq" value="{SecurityElement.Escape(lastAccountName)}" />
          <condition attribute="accountid" operator="eq" value="{SecurityElement.Escape(lastAccountId.ToString())}" />
          <condition entityname="contact" attribute="contactid" operator="gt" value="{SecurityElement.Escape(lastContactId.ToString())}" />
        </filter>
        <filter type="and" >
          <condition attribute="name" operator="eq" value="{SecurityElement.Escape(lastAccountName)}" />
          <condition attribute="accountid" operator="gt" value="{SecurityElement.Escape(lastAccountId.ToString())}" />
        </filter>
        <condition attribute="name" operator="gt" value="{SecurityElement.Escape(lastAccountName)}" />
      </filter>
    </entity>
  </fetch>";

  var results = svc.RetrieveMultiple(new FetchExpression(fetch));

  foreach (var entity in results.Entities)
  {
    // TODO: Process entity
    lastAccountName = entity.GetAttributeValue<string>("name");
    lastAccountId = entity.GetAttributeValue<Guid>("accountid");
    lastContactId = (Guid) entity.GetAttributeValue<AliasedValue>("contact.contactid").Value;
  }

  if (!results.MoreRecords)
    break;
}

This is obviously more work than using the native paging functionality, but gives the best of both worlds: we reliably get all the records, in the expected order, and with the same efficiency as paging cookies.

Although we started off with only one sort order on the account name field, to ensure we got a reliable sequence we also needed to add sorts on the primary key of each entity. That left us with three sorts:

  1. account name
  2. account ID
  3. contact ID

We need to track the last value of each of these fields in each page of records that we retrieve. The next page is then made up of either:

  1. the same name and accountid but higher contactid
  2. the same name but higher accountid
  3. higher name

As this is essentially an extended version of what the paging cookie logic is doing already, I wonder if that could be extended in future to cover these cases so we don’t have to do this manually?

4 thoughts on “MSDyn365 Internals: Paging Gotchas”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.