Hi there. At around 11:00 UTC, our main managed Postgres (14.9) instance was restarted for no apparent reason after spiking to almost 100% CPU usage. This is one of the quieter times of the day for us traffic-wise. The CPU spike did not show up on the database's replica either.
The only relevant entry in the logs seems to be:
ALERT 2024-02-09T10:11:26.176283Z 2024-02-09 10:11:15.368 UTC [365597]: [1-1] db=[...],user=dbdatastream FATAL: the database system is shutting down
We are using the DataStream service to exfiltrate data from the instance. Can DataStream cause a database instance restart?
Cloud SQL instances, including PostgreSQL, can restart for several reasons, including system maintenance, resource limits being hit, and issues with connected services. The DataStream service, which you're using to exfiltrate data from your Cloud SQL instance, is designed to continuously capture data changes and stream them to a destination such as GCS or BigQuery. While DataStream itself is intended to operate without disrupting the source database, certain conditions or misconfigurations could potentially lead to issues.
Given the scenario you described, where the CPU usage spiked to nearly 100% before the instance restarted and the log indicated a shutdown, there are a few possibilities to consider:
To troubleshoot and prevent such occurrences:
While DataStream is not typically known to cause database restarts directly, the increased load or misconfigurations could lead to situations requiring a restart. It's crucial to ensure that both your Cloud SQL instance and DataStream configurations are aligned with your operational needs and resource capacities.
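As a starting point for troubleshooting, it can help to pull only the FATAL and PANIC entries out of the exported instance logs and line them up against the CPU graph. The following is a minimal Python sketch, not an official tool: it assumes the log prefix format matches the lines quoted in this thread, and the names parse_line and severe_entries are hypothetical helpers made up for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed log prefix, based only on the lines quoted in this thread:
#   <timestamp> UTC [<pid>]: [<n-m>] db=<db>,user=<user> <SEVERITY>: <message>
LOG_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC "
    r"\[(?P<pid>\d+)\]: \[\d+-\d+\] "
    r"db=\s*(?P<db>.*?),user=(?P<user>\S+) "
    r"(?P<severity>[A-Z]+): (?P<message>.*)"
)


def parse_line(line):
    """Return a dict for a recognised log line, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["pid"] = int(entry["pid"])
    # The log timestamps are labelled UTC, so attach that timezone explicitly.
    entry["ts"] = datetime.strptime(
        entry["ts"], "%Y-%m-%d %H:%M:%S.%f"
    ).replace(tzinfo=timezone.utc)
    return entry


def severe_entries(lines, levels=("FATAL", "PANIC")):
    """Keep only the parsed entries whose severity is in `levels`."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e["severity"] in levels]


if __name__ == "__main__":
    # The two log lines quoted in this thread, copied as posted.
    sample = [
        "ALERT 2024-02-09T10:11:26.176283Z 2024-02-09 10:11:15.368 UTC [365597]: "
        "[1-1] db=[...],user=dbdatastream FATAL: the database system is shutting down",
        "2024-02-09 18:45:01.275 UTC [97382]: [1-1] db= [...],user=dbuser PANIC: "
        "stuck spinlock detected at WaitBufHdrUnlocked, "
        "third_party/postgres/servers/postgres14gce/src/backend/storage/buffer/bufmgr.c:4645",
    ]
    for e in severe_entries(sample):
        print(e["ts"].isoformat(), e["severity"], e["user"], e["message"])
```

Grouping the results by user and timestamp makes it easier to see whether the DataStream connection was merely disconnected by the shutdown or was active in the minutes leading up to it.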
The automatic restart happened several more times because of a DDoS attack, and this last time the instance won't come back up. It fails with the error:
2024-02-09 18:45:01.275 UTC [97382]: [1-1] db= [...],user=dbuser PANIC: stuck spinlock detected at WaitBufHdrUnlocked, third_party/postgres/servers/postgres14gce/src/backend/storage/buffer/bufmgr.c:4645
Can someone help here? There's nothing we can do apparently.
The "stuck spinlock" error points to a low-level concurrency problem inside PostgreSQL: a process waited too long to acquire a spinlock (here, while waiting on a shared buffer header in WaitBufHdrUnlocked), and PostgreSQL raised a PANIC instead of waiting indefinitely. This is a rare condition and often suggests a bug in PostgreSQL itself, a system-level issue, or extreme conditions such as those caused by DDoS attacks.
Given the complexity and potential severity of this issue, direct intervention from Google Cloud Support is the most effective path forward. They can provide specific guidance, potential workarounds, or fixes that are not available in public documentation.