I’ve been studying lately for the Azure Data Engineer Associate certification, and the main point of one of the exams (DP-201 – Designing an Azure Data Solution) is all about picking the data platform(s) that best meet the requirements.
Having used SQL Server for many years it’s a simple trap to fall into to just use it for everything. It’s a very powerful tool for running complex queries and can cope with just about anything you want to throw at it, but it’s not always the best choice.
Some of the key things to consider:
- Scale of data
- Integration with other tools
- Query complexity
- Required performance
With enough investment it’s normally possible to get SQL Server to handle all the technical requirements, but the cost can quickly spiral.
A recent example where I was able to put this to the test was AutotuneWeb. This is an Azure-based system for running the open source Autotune application, which analyses data from people with diabetes and suggests changes to their insulin dosages.
I’ve written about AutotuneWeb and how it consumes Azure services previously, but at that point I hadn’t explored the different data services enough and was using an Azure SQL Database to store a lot of the required data.
As AutotuneWeb became more widely used, two things happened. Firstly I would see intermittent performance problems. As a hobby project with no money behind it I had the database in the Basic tier to save money, but that came at the cost of poor performance when many people were using it at the same time. To try to solve these problems I experimented with scaling the database to higher performance levels. The serverless options worked pretty well for me, as it allowed me to run the database at a low price point normally but with auto-scaling when demand was higher.
Regardless of what option I picked though, the second impact was unavoidably an increase in cost. At one point the total running cost was around £130 per month, the majority of which was SQL. It was time to look for a different solution.
Looking at the queries that were running, none were very complex. Rather, they were all simple key lookups. This meant the data could be easily moved to a simple key-value-pair data store with minimal impact on the application.
There were two main options at this point – Azure Storage Tables or Cosmos DB. Cosmos DB would give better guarantees of performance and availability, and give the option to continue using a SQL-like syntax. However, the primary reason for moving away from SQL was cost, and Azure Storage is easily the cheapest option. As a hobby project I wasn’t too worried about performance SLAs, so I went for Azure Storage.
The monthly cost for running AutotuneWeb is now around £40 and with more consistent performance than with SQL. It takes a bit more thinking about how to structure the data with partitioning, but now the data storage is practically free.