Inside FetchXML pt 5 – aggregate

One feature of FetchXML that sets it apart from other CDS query options is aggregates. This is the option to get summary data (number of records, sum of opportunity value etc.) rather than the individual values from each record. If your only concept of how many records you have is “5,000+” then this is for you.

Let’s take a look at an example to get the most common first names of your active contacts:

<fetch aggregate="true">
  <entity name="contact">
    <attribute name="firstname" alias="firstname" groupby="true" />
    <attribute name="contactid" alias="occurrences" aggregate="count" />
    <filter>
      <condition attribute="statecode" operator="eq" value="0" />
    </filter>
    <order alias="occurrences" descending="true" />
  </entity>
</fetch>

There are 4 main points to note that are different from a regular query:

  1. The <fetch> element has the aggregate="true" attribute
  2. All the <attribute> elements have an alias. This is required for all aggregate queries, and is the one exception to my “don’t use attribute aliases” rule.
  3. All the <attribute> elements also have EITHER a groupby="true" or aggregate="xxx" attribute
  4. The <order> element uses an alias="xxx" attribute to refer to the previously aliased <attribute> instead of the normal attribute="xxx"

Run this query and you’ll get a list of different firstnames along with the number of active contacts that have that name. The list will be sorted with the most popular name at the top.

Available Aggregates

There are only a few aggregate functions you can apply:

  • count
  • countcolumn
  • min
  • max
  • sum
  • avg

If you’re familiar with SQL these will mostly be familiar as the ways of retrieving the number of records, and the lowest, highest, total and average values in a column.

The one that’s not so obvious is the distinction between count and countcolumn. The only reference I could find for this was a blog dating back to CRM 4.0. In brief, count will give the total number of records, while countcolumn gives the number of records that have a value in the attribute you apply it to. Note that it doesn’t matter which column you apply count to, you’ll get the same total number of records either way.

Distinct Counts

The countcolumn aggregate also supports a distinct attribute. This lets you get the number of different values in that column, e.g.

<fetch aggregate="true">
  <entity name="contact">
    <attribute name="address1_country" alias="country" groupby="true" />
    <attribute name="address1_city" alias="cities" aggregate="countcolumn" distinct="true" />
    <filter>
      <condition attribute="statecode" operator="eq" value="0" />
    </filter>
    <order alias="cities" descending="true" />
  </entity>
</fetch>

This will get a list of countries and the number of different cities in each country.

What “different” means in this context depends on the collation settings for your organisation. For the English installations that I have access to this defaults to ignoring case and accents, so “M√ľnich” and “MUNICH” would be counted as being the same.

Filtering

You can use <filter> specifications in aggregate queries in the same way as normal. See part 3 of this series for more details about filters. The filter will be applied before the aggregate. There is no way to filter the results of the aggregation, e.g. show results where the count is greater than 1, as helpful as that would be.

Grouping

You can run an aggregate query without including any groupby="true" attributes. In this case, all the records that match your filter will be aggregated into a single result record. This is useful for finding the total number of records or highest/lowest values across all your records, e.g.

<fetch aggregate="true">
  <entity name="opportunity">
    <attribute name="opportunityid" alias="count" aggregate="count" />
    <attribute name="estimatedvalue" alias="min" aggregate="min" />
    <attribute name="estimatedvalue" alias="max" aggregate="max" />
    <filter>
      <condition attribute="statecode" operator="eq" value="0" />
    </filter>
  </entity>
</fetch>

This query gives the number of open opportunities, along with the minimum and maximum values of those opportunities.

However, it is often more useful to group the records to see the differences in the summary values in different groups. For example, you can see the pipelines of different sales people by grouping opportunities by their owners and getting the total value of the opportunities in each group:

<fetch aggregate="true">
  <entity name="opportunity">
    <link-entity name="systemuser" to="ownerid" from="systemuserid" link-type="inner">
      <attribute name="fullname" alias="fullname" groupby="true" />
      <order alias="fullname" />
    </link-entity>
    <attribute name="estimatedvalue" alias="value" aggregate="sum" />
  </entity>
</fetch>

This is the first example we’ve looked at that uses a <link-entity>, but they work in aggregate queries in exactly the same way as in regular queries.

Ordering

It’s very useful to sort the results of aggregate queries. For producing reports such as the opportunity pipeline query earlier, the results could be sorted by user to make it easy to find a particular individual, or by the pipeline value to highlight the best or worst achievers.

With the first example to find the number of contacts with each first name, sorting this to find the most and least popular names can often highlight data quality issues. Try it on your data and see what you find!

Limitations

There are a few limitations to aggregate queries you should be aware of

  1. Maximum of 50,000 records. If you try to run an aggregate query that uses more than 50,000 records you’ll get the error AggregateQueryRecordLimit exceeded
  2. No paging support. As for normal queries you can get back up to 5,000 records from your query, but if you have more groups than that there is no way to move on to the next page.

Alternatives

To get around these limitations you’ll need to break up your query with more restrictive filters and combine the results. In SQL 4 CDS I worked around these limits by retrieving all the individual records and applying the aggregation in code when necessary.

For one special case of getting the total number of records in an entity without any filters, i.e.

<fetch aggregate="true">
  <entity name="contact">
    <attribute name="contactid" alias="count" aggregate="count" />
  </entity>
</fetch>

there is a new RetrieveTotalRecordCountRequest message available since version 9 to get this more efficiently and bypassing the 50,000 record limit. SQL 4 CDS will use this optimisation where possible.

SQL Equivalents

These options are equivalent to the GROUP BY clause and aggregate functions in SQL.

The count aggregate is equivalent to count(*) in SQL, and countcolumn is equivalent to count(attributename).

The SQL HAVING clause doesn’t have a FetchXML equivalent – this is one improvement to FetchXML I’d love to see.

RetrieveTotalRecordCountRequest is equivalent to SELECT count(*) FROM entity, with no option to apply a filter. I have a feeling it may actually map to the sp_spaceused SQL Server stored procedure under the hood.

Join the conversation

2 Comments

Leave a comment

Your e-mail address will not be published. Required fields are marked *