I wrote a post a while ago looking at the performance of the T-SQL endpoint compared to using FetchXML. Now I’ve got a better understanding of how it works, I wanted to check if the same performance considerations I’ve previously encountered when working with the filtered views on-premise still apply.
For my testing I have set up an environment with 50,000 account records and timed running a number of queries. I timed each query within the SQL Server by running:
DECLARE @starttime datetime = CURRENT_TIMESTAMP -- Query here DECLARE @endtime datetime = CURRENT_TIMESTAMP SELECT DATEDIFF(ms, @starttime, @endtime)
This gave me the results of the query and the time it took to execute, in milliseconds.
My first test was the simple query:
SELECT * FROM account
With 50,000 accounts, and accounts having a large number of columns by default, I expected this to take a while. I didn’t, however, expect to receive the error:
The request channel timed out while waiting for a reply after 00:02:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout.
This is the standard error generated by a timeout from the SDK. While my earlier testing indicated there isn’t a record count limit imposed by the T-SQL endpoint, there is a 2 minute execution time limit, and this appears to have exceeded it.
SELECT * FROM account is a very lazy way to write a query – you should only ever select as many columns as you actually need. Next up I tried:
SELECT name FROM account
This ran in a much healthier 117 milliseconds, although the result took a total of 4 seconds to transfer to my PC. Phew!
One major performance problem I’ve encountered previously with the on-premise filtered views is around date/time columns. These are converted by the view to the time zone of the user running the query, which makes things nice and consistent when you are displaying the data in a report. However, the underlying data is always stored in UTC, and the conversion at runtime to the user’s local timezone comes at a cost. It’s implemented via a function, which therefore needs to be executed separately for each date/time value being considered by the query.
To test the effect of this I ran:
SELECT DATEPART(day, createdon), count(*) FROM account GROUP BY DATEPART(day, createdon)
This took 21,013 milliseconds – almost 200 times longer than selecting a simple string field!
Note that this performance hit is incurred whenever a date/time value is required for the query to be processed, not just when they are present in the results returned to the user. For example:
SELECT name FROM account WHERE createdon >= '2020-01-01'
would need to calculate the user-local version of the
createdon date for every account in order to determine which records to return.
For each date/time column there is a corresponding UTC column, e.g.
createdonutc. This maps through directly to the column in the underlying table without any further calculation. If you can use the UTC version in your application, this has a major impact on performance. Modifying the example from above:
SELECT DATEPART(day, createdonutc), count(*) FROM account GROUP BY DATEPART(day, createdonutc)
This only took 110 milliseconds, compared to the 21,013 milliseconds the user-local version took above – almost a 200x improvement! Definitely one to watch out for! If you are filtering by a date/time field using a user-local timezone value, consider converting that value to UTC and filtering on the corresponding UTC field instead for an easy performance boost. If you’re displaying date/time values back to the user you still need to convert them to the correct time zone at some point of course, but you may well find this more efficient to do in your own code than using the conversion supplied by SQL.
A common pattern for reports is to show records that have been created or modified in the last day/week/month. As we’ve seen above, using a simple query like:
SELECT name, createdon FROM account WHERE createdon >= DATEADD(d, -1, GETDATE()) ORDER BY createdon
will give some major performance problems. Tweaking this to:
SELECT name, createdon FROM account WHERE createdonutc >= DATEADD(d, -1, GETUTCDATE()) ORDER BY createdonutc
will give the same results (possibly with some minor edge case differences around daylight saving time changes) much, much faster.
Although this second query still includes the local timezone version of the createdon date, all the filtering and sorting is done on the UTC version. This means that the timezone conversion code only needs to run on the matches records, not the entire table.
Given the performance hit on date/time columns, I thought I’d also check the difference betwen the integer and label versions of an optionset column. Take this query for example:
SELECT industrycode, count(*) FROM account GROUP BY industrycode
This produces the underlying value for each industry code, and runs in 104 milliseconds.
Now to get the corresponding label for each industry, in the current user’s language, this needs to change to:
SELECT industrycodename, count(*) FROM account GROUP BY industrycodename
This version runs in 170 milliseconds. Noticeably slower than the previous version, but nowhere near the performance hit incurred by timezone conversions.