I was looking at Bigtable Garbage collection based on TTL.
Bigtable GC is configured at the column-family level: https://cloud.google.com/bigtable/docs/garbage-collection#overview
A garbage collection policy is a set of rules you create that state when data in a specific column family is no longer needed.
Expiring values (age-based): https://cloud.google.com/bigtable/docs/garbage-collection#age-based
Is a row automatically deleted if all the values are garbage-collected?
>> If all cells are garbage collected for a row key, then the row is indeed deleted. Note that garbage collection is asynchronous, and it can take up to a week for data to be totally removed.
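To illustrate the semantics described above, here is a toy model of age-based GC (this is a sketch of the behavior, not Bigtable's actual implementation): cells older than the max age are dropped, and a row whose cells have all been collected disappears entirely.

```python
from datetime import datetime, timedelta, timezone

def collect(rows, max_age, now=None):
    """Toy model of age-based GC: rows is {row_key: [(cell_timestamp, value), ...]}.
    Cells older than max_age are dropped; a row with no surviving cells
    is removed entirely, matching the behavior quoted above."""
    now = now or datetime.now(timezone.utc)
    survivors = {}
    for key, cells in rows.items():
        kept = [(ts, v) for ts, v in cells if now - ts <= max_age]
        if kept:  # a row whose cells are all collected is deleted
            survivors[key] = kept
    return survivors
```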
When data is deleted:
Question:
I am trying to find out if there is a way for an application to listen to (or be called back with) the rowKey(s) being deleted by Bigtable's async GC process. Or is there a way to set up a Pub/Sub or Kafka topic where the deleted rowKeys will be posted?
I need this information to sync up some of the other application data we keep in Elasticsearch based on these rowKeys.
Hi @viveksharma0wmt,
Welcome to Google Cloud Community!
I would suggest using Google Cloud Pub/Sub for this, as you can create a schema and view the change logs.
Please check the documentation on streaming changes to Pub/Sub (with an optional Cloud Function trigger); it contains the steps and a sample schema that could be useful for your setup.
Hope this helps.
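To make this concrete, here is a minimal sketch of the consuming side: deciding, from a decoded change-stream message, which row keys should be removed from Elasticsearch. The field names (`rowKey`, `modType`) are assumptions based on the sample schema in the linked doc; verify them against your deployed template's actual output.

```python
import json

# Mod types that represent deletions; check these against the schema
# your template actually emits.
DELETE_MOD_TYPES = {"DELETE_FAMILY", "DELETE_CELLS"}

def row_keys_to_delete(messages):
    """Return the sorted row keys whose change records are deletions.
    `messages` is an iterable of JSON-encoded change-stream payloads."""
    keys = set()
    for raw in messages:
        record = json.loads(raw)
        if record.get("modType") in DELETE_MOD_TYPES:
            keys.add(record["rowKey"])
    return sorted(keys)
```

In a real pipeline the messages would arrive via a `google.cloud.pubsub_v1.SubscriberClient` subscription, and the resulting keys would feed an Elasticsearch bulk-delete.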
Thanks @robertcarlos for your reply.
I looked at the document you mentioned; this looks good.
Looking at the "Configure change stream" doc, it says:
A change stream tracks data changes made by calls to the Bigtable Data API, during garbage collection, and when ranges of rows are dropped. Changes resulting from schema changes, like deletions from dropping a column family, are not captured in a change stream
A few follow-up questions:
I see in that document that the possible values for the ModType enum are: {"name": "ModType", "type": "enum", "symbols": ["SET_CELL", "DELETE_FAMILY", "DELETE_CELLS", "UNKNOWN"]}
It is not immediately clear which mod type is emitted when GC asynchronously deletes the entire rowKey once all of its cells have been deleted.
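Since the schema has no explicit row-level delete type, one defensive pattern (a sketch, not confirmed Bigtable behavior) is to treat any deletion record as a hint and re-read the row before evicting it from Elasticsearch. The helper below is hypothetical; `read_row` stands in for any callable that returns the row or `None`, e.g. a wrapper around the Bigtable client's row read.

```python
DELETE_TYPES = {"DELETE_FAMILY", "DELETE_CELLS"}

def should_remove_from_index(row_key, mod_types, read_row):
    """Hypothetical helper: evict the Elasticsearch doc only if the row
    is actually gone from Bigtable. `mod_types` are the change records
    seen for this row key; `read_row` returns the row or None."""
    if not (set(mod_types) & DELETE_TYPES):
        return False
    # GC may have removed only some cells; confirm the whole row is gone.
    return read_row(row_key) is None
```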
One other question: is there a way to use a Kafka topic instead of Pub/Sub for this change-event stream from Bigtable?
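For reference, since the Dataflow template targets Pub/Sub, one workaround (a sketch, not an official integration) is a small bridge that forwards the Pub/Sub messages into a Kafka topic. The snippet below assumes `google-cloud-pubsub` and `kafka-python` in a real deployment; the topic names and clients are placeholders.

```python
def make_forwarder(producer, kafka_topic):
    """Return a Pub/Sub callback that republishes each message's payload
    to a Kafka topic and then acks it."""
    def callback(message):
        # kafka-python: KafkaProducer.send(topic, value=bytes)
        producer.send(kafka_topic, value=message.data)
        message.ack()
    return callback

# In a real setup (names are placeholders):
#   producer = kafka.KafkaProducer(bootstrap_servers="broker:9092")
#   subscriber = pubsub_v1.SubscriberClient()
#   subscriber.subscribe(subscription_path, make_forwarder(producer, "bt-changes"))
```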