CloudSQL cannot shrink disk

Hi there,

I'm the administrator of a 7TiB Cloud SQL PostgreSQL database which contains only 3TiB of relevant data. Automatic storage increase is enabled, and it is an absolute requirement for our business.

However, months ago a team member set up a replication process to export some data. This operation could never complete due to our tremendous data throughput:

* a huge portion of our data changes frequently (4 times a day)

* the schema also changes (new partitions for new clients every day)

Even worse, the PostgreSQL WAL retained for this subscription grew to 4TiB within a few days, and we had to stop it. Once things were back to normal, the extra data space was released, but the reserved disk space is still there. Paying for 7TiB instead of 3TiB is really expensive and, as you can guess, does not add any value to our business.

That said, we need to migrate our database. Options we have considered:

1. DMS is out of consideration: it works on a subscription mechanism and, as confirmed by our testing, it clogs the WAL as well.

2. Hot backup & recovery is doable, and it is fast enough to keep the downtime within an acceptable range. Unfortunately, it does not free the extra disk space.

3. pg_dump is estimated to take around 14h, which is not acceptable.

Hence my question: how can we shrink a Cloud SQL database disk? Can we perform cold defragmentation of a backup? Do you have any clue?

Thank you,


Unfortunately, directly shrinking a Cloud SQL database disk is not possible. While storage size can be increased, decreasing it is challenging due to the inherent limitations of the underlying storage system. However, there are several approaches you can take to optimize your data and migrate to a more efficient storage size:

  • Modified Use of Database Migration Service (DMS): Although DMS ran into WAL growth in your initial attempt, a staged migration approach might be more effective. This involves using DMS to gradually migrate specific table data in batches to a new Cloud SQL instance with a smaller disk size. However, given the frequent schema changes and high data modification rate in your database, the success of this approach would depend heavily on those dynamics.

  • Hot Backup & Recovery with Additional Optimization: After performing a hot backup and restoring it to a new instance, consider using the VACUUM FULL command in PostgreSQL. This command can help reclaim unused space by rewriting bloated tables. Be aware that VACUUM FULL can be time-consuming and requires significant downtime, as it locks each table while it is being processed (a rough sketch follows after this list).

  • Exploring Alternative Backup Tools: Tools like pg_basebackup or third-party solutions might offer faster backup and restore capabilities compared to pg_dump. While these tools can potentially reduce the downtime, the overall time required will still largely depend on the database size and network bandwidth. Additionally, these methods may not directly address the disk size reduction.

  • Cold Defragmentation Approach: Cold defragmentation using external tools like pg_repack involves exporting the data, defragmenting it offline, and then loading it into a new Cloud SQL instance with a smaller disk size. This process is complex and requires a deep understanding of PostgreSQL. Its effectiveness in reducing disk size also varies based on the database's specific characteristics.

  • Creating a New Instance with Desired Disk Size: One effective way to reduce disk size is to create a new Cloud SQL instance with the desired smaller disk size and migrate your data to this new instance. This method involves backing up your data and restoring it to the new instance, which can help achieve the disk size reduction you're aiming for (see the gcloud sketch after this list).

  • Consulting Google Cloud Support: For tailored solutions or additional guidance, reaching out to Google Cloud Support is advisable. They may offer insights or custom options that are particularly suited to your scenario.
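
To make the VACUUM FULL suggestion above concrete, here is a rough sketch (the database and table names are placeholders); it checks which relations are largest before rewriting anything, since VACUUM FULL takes an exclusive lock on each table it processes:

# List the largest tables first (placeholder database name "mydb"):
psql -d mydb -c "SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
                 FROM pg_class WHERE relkind = 'r'
                 ORDER BY pg_total_relation_size(oid) DESC LIMIT 20;"
# Rewrite one table at a time to keep the exclusive-lock window short:
psql -d mydb -c "VACUUM (FULL, VERBOSE) my_big_table;"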
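
And a sketch of the "new instance + migrate" path using the standard gcloud export/import flow (the instance, bucket, and database names, region, and tier are placeholders; the source instance's service account needs write access to the bucket):

# Create the target instance at the desired, smaller size:
gcloud sql instances create target-instance \
    --database-version=POSTGRES_14 --tier=db-custom-8-32768 \
    --region=europe-west1 --storage-size=3500GB --storage-auto-increase
# Dump the source database to a bucket, then load it into the new instance:
gcloud sql export sql source-instance gs://my-migration-bucket/dump.sql.gz --database=mydb
gcloud sql import sql target-instance gs://my-migration-bucket/dump.sql.gz --database=mydb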

Thank you for your reply.

Modified Use of Database Migration Service (DMS):

Our database gets many new partitions a day. The DMS tool was not designed for such a use case; it fails every time the schema changes. And I cannot accept 18 hours of downtime.

Hot Backup & Recovery with Additional Optimization: 

"VACUUM FULL" command requires to be superuser to be performed:

WARNING: skipping "pg_toast_2964_index" --- only superuser can vacuum it
WARNING: skipping "pg_toast_1213_index" --- only superuser can vacuum it

etc. This is not good advice, at least not when a PostgreSQL instance is locked in by the cloud vendor.

I did another test: creating a second database within the same instance and copying the data over with a Foreign Data Wrapper. But this strategy does not work either: the disk space is never freed (storage quota still at 7.35 TiB).
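
For reference, the copy was set up roughly like this (server, schema, host, and credential values are placeholders):

psql -d targetdb <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER src FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'INSTANCE_IP', dbname 'sourcedb');
CREATE USER MAPPING FOR CURRENT_USER SERVER src
    OPTIONS (user 'copy_user', password 'REDACTED');
CREATE SCHEMA staging;
IMPORT FOREIGN SCHEMA public FROM SERVER src INTO staging;
-- then, per table: INSERT INTO public.t SELECT * FROM staging.t;
SQL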

Cold Defragmentation Approach:

This is something I would really like to do, but Google Cloud Platform does not make it easy: it is not currently possible to extract the backup disk to a Compute Engine disk image, nor to a bucket file. Otherwise we could defragment the disk in a Compute Engine instance and import it back in... Or did I miss something?

Isn't it also a good use case for Disaster Recovery?

Creating a New Instance with Desired Disk Size:

Our team is currently trying this option and measuring how long it takes.



Best,


Creating a New Instance with Desired Disk Size:
 
The estimated migration time is about 55h, for either of these two solutions:
* a Cloud SQL flat file export to a bucket
* a Foreign Data Wrapper data copy
 
We cannot afford this downtime.

Consulting Google Cloud Support:
 
This is our last option; we are currently going that way.

@jokester did you manage to have your Cloud SQL instance downsized by contacting Cloud Support, and how did that work out for you? I am currently facing the exact same problem.

Yes, we did solve the issue. You have to reach out to GCP Support to allowlist your project for the disk shrink API. Once enabled, we performed tests on a clone DB: 9h-10h were estimated.
During the shrink, the instance is unavailable.

Many tries were required to determine our expected shrink size (ex: --storage-size=4000GB), and we managed to do it in 4 hours.
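
For reference, the calls involved are the standard gcloud ones (instance names are placeholders); the patch only succeeds as a shrink once the project has been allowlisted:

# Clone the instance to rehearse the shrink without touching production:
gcloud sql instances clone source-instance shrink-test-clone
# Shrink the disk to the target size:
gcloud sql instances patch shrink-test-clone --storage-size=4000GB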

We also used this kind of restoration in another project:
https://cloud.google.com/sql/docs/mysql/backup-recovery/restoring#projectid
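
The basic restore-onto-another-instance call looks like this (the backup ID and instance names are placeholders; the linked page covers the cross-project variant):

# Find the backup to restore:
gcloud sql backups list --instance=source-instance
# Restore it onto the target instance (this overwrites the target's current data):
gcloud sql backups restore BACKUP_ID \
    --restore-instance=target-instance \
    --backup-instance=source-instance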

Good luck, it is doable but really tricky to avoid downtime.