On Monday 7 January, the first normal business day after the public holiday, we experienced sustained, exceptionally high traffic: double the normal load we test for. One of our components, the Database Connector, could not cope with this load in one of our APIs.
We were unable to restore the service by adding many more servers or by restarting them.
The final solution was to switch to a different Database Connector to handle the database connections of the component that did not scale properly. This work was done in parallel with other efforts to restore the service. It took about two hours to develop, deploy, and test the change in the test environment before it was released to our production environment.
No changes were made to the code or the environment between mid-December and Monday 7 January. The only difference was that the load was double that of previous peaks.
Going forward, we will test with higher loads and continue our work on scaling the service.