Recovering to healthy state .... AppSheet service outage

The system is back to working. Some of the servers are still ramping up, and we expect to be back to full health in 10 minutes or so.


Right now, our service is experiencing an outage. It is related to a problem in the underlying database infrastructure.

It is being actively worked on, and we’ll post status updates regularly.

Thank you for the update Praveen!

Thanks for the update. Do we know the root cause?

Hi @Peter_Yates, the immediate reason was a config change in the underlying database system that AppSheet uses to maintain all app definitions, user accounts, etc. The Google Cloud Platform database team made this change as part of regular database maintenance, but it had an unexpected effect: the database server stopped accepting new connections from the AppSheet servers. Existing connections continued to work for a while. So starting around 2 PM PST, our servers gradually had fewer and fewer database connections to use, leading to longer latencies (more requests waiting on the same shrinking pool of database connections).
This was not noticeable in our metrics at the beginning, but it became very noticeable around 4 PM PST. After that, tracking down the cause and reverting it took a while, especially because the change was made outside our team, which is not normally where we’d look for a root cause and which requires more coordination to get the change fixed and reverted.
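
To illustrate why a shrinking connection pool shows up as rising latency, here is a toy Python simulation. It is only a sketch, not AppSheet's code: the function name `average_wait`, the arrival rate, and the hold time per request are all made-up assumptions chosen so that 8 connections are just enough to keep up.

```python
from collections import deque


def average_wait(pool_size, arrivals_per_tick=4, service_ticks=2, ticks=200):
    """Average number of ticks a request waits for a free connection.

    Hypothetical toy model: a fixed number of requests arrive every tick,
    and each one holds a database connection for a fixed number of ticks.
    """
    waiting = deque()   # arrival tick of each queued request (oldest first)
    busy_until = []     # tick at which each in-use connection is released
    total_wait = 0
    served = 0
    for now in range(ticks):
        # Release connections whose request has finished.
        busy_until = [t for t in busy_until if t > now]
        # New requests join the back of the queue.
        waiting.extend([now] * arrivals_per_tick)
        # Hand free connections to the oldest waiting requests.
        while waiting and len(busy_until) < pool_size:
            arrived = waiting.popleft()
            total_wait += now - arrived
            served += 1
            busy_until.append(now + service_ticks)
    return total_wait / served if served else float("inf")


# Same load, progressively smaller pool (connections lost and not replaced).
for size in (10, 8, 6, 4):
    print(size, "connections -> average wait", average_wait(size), "ticks")
```

In this model nothing looks wrong while the pool still covers the load, which matches the slow onset described above. Once the pool drops below what the load needs, the queue keeps growing and waits climb quickly.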

So that is what we know now. The actual root cause and post mortem (why did they make that change, how can we prevent something like this happening in the future) is still ahead of us.

Praveen,

Thank you very much for the detailed reply on the known circumstances. It's unfortunate that they did this without the AppSheet team's knowledge.

Hopefully, this will not occur in the future.

We appreciate the prompt responses. Thank you to the team.

Hi @Peter_Yates, in a way, this is the plus and the minus of building on cloud-hosted services. You build on AppSheet, and we deploy every day; we try not to break you, but sometimes we do (too often at the moment, but it will get better this year). The same is true for the services we build on. And we depend on many services, from sign-in services to cloud file systems to email services, PDF converters, etc.

It is generally a very good thing that Google Cloud Platform updates its services regularly. The virtual machines and database servers need regular upgrades, security patches, etc. We really shouldn’t need to know about them or care about them. In fact, this has not happened in the last year while we have run the database on GCP. So this is an anomalous occurrence. We have to understand why it happened though.

Praveen,

Thank you, and I appreciate the professional response. Well done. :slight_smile:
