Datastream on google : Can It Synch data From Self-hosted PostgreSQL(VM) on Azure to Bigquery?

Everyone, I am new to Datastream which is a data service on the Google Cloud Platform.

My requirements have 2 things as below.
1. I want to stream data by using CDC from Self-hosted PostgreSQL on Azure( Public IP) to Biqquery using DataStream, is it possible?

2. If it works as the first question, what are some configurations and settings that I have to prepare especially on Azure Cloud apart from PublicIP on the database?

I tried lab on Google and did some research about it as the following link.

https://cloud.google.com/datastream/docs/sources-postgresql#versionsforpostgresqlsourcedb These are all sources to be allowed to stream or replicate data to BigQuery/Cloud Storage one of them is Self-hosted PostgreSQL, aside from VM on Google, Does it include VM on Azure, AWS, or even On-Prems?

  • Self-hosted PostgreSQL
  • Cloud SQL for PostgreSQL
  • AlloyDB for PostgreSQL
  • Amazon RDS for PostgreSQL
  • Amazon Aurora PostgreSQL
Solved Solved
0 2 660
2 ACCEPTED SOLUTIONS

Yes, Datastream can sync data from self-hosted PostgreSQL on Azure to BigQuery. You can use Datastream's Change Data Capture (CDC) functionality to stream changes from your PostgreSQL database to BigQuery in near real time.

Required configuration and settings on Azure:

  • Your PostgreSQL database must have a public IP address.
  • You must allow incoming connections from Datastream's public IP addresses.
  • Ensure that the PostgreSQL instance has the pgoutput plugin enabled.
  • Set up appropriate user permissions in PostgreSQL to allow Datastream to read the change logs and create a logical replication sl*t for Datastream.

To configure Datastream:

  1. Create a Datastream service account and grant it the necessary permissions to access your PostgreSQL database.
  2. Create a Datastream stream that specifies your PostgreSQL database as the source and BigQuery as the destination.
  3. Start the Datastream stream.

Datastream will begin streaming changes from your PostgreSQL database to BigQuery. You can view the data in BigQuery using the BigQuery web UI or the BigQuery API.

Additional notes:

  • Datastream is a serverless service, so you do not need to manage any infrastructure.
  • Datastream is highly scalable, so you can stream data from even the largest databases.
  • Datastream is secure, so your data is encrypted in transit and at rest.

Does Self-hosted PostgreSQL include VM on Azure, AWS, or even On-Prems?

Yes, self-hosted PostgreSQL includes VM on Azure, AWS, and even on-premises. Any PostgreSQL database that is not hosted on a Google Cloud service, such as Cloud SQL for PostgreSQL, is considered self-hosted.

References:

View solution in original post

Thank you for advice.

It is comprehensive advice. 

View solution in original post

2 REPLIES 2

Yes, Datastream can sync data from self-hosted PostgreSQL on Azure to BigQuery. You can use Datastream's Change Data Capture (CDC) functionality to stream changes from your PostgreSQL database to BigQuery in near real time.

Required configuration and settings on Azure:

  • Your PostgreSQL database must have a public IP address.
  • You must allow incoming connections from Datastream's public IP addresses.
  • Ensure that the PostgreSQL instance has the pgoutput plugin enabled.
  • Set up appropriate user permissions in PostgreSQL to allow Datastream to read the change logs and create a logical replication sl*t for Datastream.

To configure Datastream:

  1. Create a Datastream service account and grant it the necessary permissions to access your PostgreSQL database.
  2. Create a Datastream stream that specifies your PostgreSQL database as the source and BigQuery as the destination.
  3. Start the Datastream stream.

Datastream will begin streaming changes from your PostgreSQL database to BigQuery. You can view the data in BigQuery using the BigQuery web UI or the BigQuery API.

Additional notes:

  • Datastream is a serverless service, so you do not need to manage any infrastructure.
  • Datastream is highly scalable, so you can stream data from even the largest databases.
  • Datastream is secure, so your data is encrypted in transit and at rest.

Does Self-hosted PostgreSQL include VM on Azure, AWS, or even On-Prems?

Yes, self-hosted PostgreSQL includes VM on Azure, AWS, and even on-premises. Any PostgreSQL database that is not hosted on a Google Cloud service, such as Cloud SQL for PostgreSQL, is considered self-hosted.

References:

Thank you for advice.

It is comprehensive advice.