Since moving to a new hosting provider, our service has suffered several disruptions and periods of degraded performance. We are very sorry for the inconvenience this has caused our customers.
The issues started on the 15th of November, one week after the migration, and they still continue as intermittent performance spikes that cause occasional slowness for end users.
One of the goals of the new hosting platform was to upgrade SQL Server from version 2014 to 2017. Extensive testing was done with this database version. For example, our development environment ran on this version for over a year without issues. However, under production load we discovered that some critical, complex database queries performed poorly when thousands of users used the system simultaneously, because the SQL Optimizer started to choose inefficient execution plans.
This issue is called "execution plan regression", and it requires optimization of the problematic queries. We reverted our databases to SQL Server version 2014, but several queries still had to be optimized and the Optimizer had to be explicitly instructed to behave differently.
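To illustrate what an execution plan regression looks like in measurements, the following minimal Python sketch compares the durations of one query recorded before and after a plan change. The function name, the sample values and the 2x threshold are hypothetical and chosen only for illustration; this is not the tooling used in our production environment.

```python
from statistics import median


def is_plan_regression(baseline_ms, current_ms, factor=2.0):
    """Return True when the median duration under the current plan is
    at least `factor` times the median duration under the baseline plan.

    All names and the default threshold are illustrative assumptions.
    """
    if not baseline_ms or not current_ms:
        raise ValueError("both duration samples must be non-empty")
    return median(current_ms) >= factor * median(baseline_ms)


# Hypothetical durations in milliseconds for the same query.
before_change = [10, 12, 11]
after_change = [40, 35, 50]
print(is_plan_regression(before_change, after_change))
```

Comparing medians rather than averages keeps a single slow execution from flagging a query whose plan is otherwise unchanged.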
A second issue was with SQL Server’s temporary tables and objects, which are used to cache and collect data from queries. SQL Managed Instance didn’t properly handle queries that use temporary objects, which caused all such queries to queue up and hang.
We had to change our queries so that they no longer use SQL Server’s temporary tables and objects. While we were fixing the problematic queries, a bug related to temporary objects was found in SQL Managed Instance; this bug is the root cause of these issues.
Since the 15th of November we have had several production releases in which most of the problematic queries have been fixed. More fixes are coming to production this week, and we are constantly monitoring our service and fixing poorly behaving queries as they appear. Our supplier has promised a fix for the SQL Managed Instance bug during Q1 2020.