Multiple Spot Instances Shutting Down, Affecting Our Service

Today, from around 10 AM to 10 PM GMT+07:00, we experienced an issue where a large number of our Spot instances were terminating at the same time. Normally, we see 2-3 Spot instance shutdowns per day, but during this period, we saw over 10 instances shut down in the morning alone, and close to 30 by 4 PM.

We have around 28 nodes total. Having so many nodes go down at once put a strain on our system and caused disruptions.

Has anyone else experienced multiple Spot instances shutting down concurrently and impacting their workloads? We are running in the Taiwan region, in zone asia-east1-a. Are there any best practices or preventative measures we can take to avoid or mitigate this type of situation?

Thank you.


Hi Rawit_S,

Welcome to Google Cloud Community.

There are various reasons why your VMs might be terminated: Google might need the resources back for a higher-priority task, or demand might fluctuate. This tends to happen during peak hours. "Rarely, a VM might fail due to an unexpected outage, hardware error, or another system issue."

For best practices for your Spot VM instances, you may refer to this documentation.
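
As one illustration of a mitigation (a minimal sketch, not taken from the linked documentation): a process on each Spot VM can watch the Compute Engine metadata server's `instance/preempted` value and start draining work as soon as it turns `TRUE`. The function names `fetch_preempted` and `wait_for_preemption`, the `on_preempt` hook, and the polling interval are hypothetical choices made for this example; only the metadata URL and the `Metadata-Flavor: Google` header come from Compute Engine itself.

```python
import time
import urllib.request

# Compute Engine metadata endpoint that reports whether this VM was preempted.
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def fetch_preempted(url=PREEMPTED_URL, timeout=5):
    """Return the raw metadata value, expected to be "TRUE" or "FALSE"."""
    req = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode().strip()


def wait_for_preemption(fetch=fetch_preempted, on_preempt=lambda: None,
                        poll_seconds=1.0, max_polls=None):
    """Poll until the VM reports preemption, then run on_preempt.

    Returns True if preemption was seen, False if max_polls ran out first.
    on_preempt is a hypothetical hook: stop taking new work, flush state, etc.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            if fetch().upper() == "TRUE":
                on_preempt()
                return True
        except OSError:
            pass  # metadata server briefly unreachable; try again
        polls += 1
        time.sleep(poll_seconds)
    return False
```

Calling `wait_for_preemption(on_preempt=drain)` from a small background service, where `drain` is your own shutdown routine, would be one way to use it. The notice before a Spot VM stops is short, so the hook should only do quick work. This does not prevent preemptions; it only makes each one less disruptive.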

I hope this information is helpful.

If you need further assistance, you can always file a ticket with our support team.
