Which service account does Datastream use when replicating PostgreSQL to BigQuery?

Hi,

I have multiple streams configured from PostgreSQL to BigQuery. I want to determine the load these streams are generating for BigQuery and need to know which Service Accounts (SAs) they are using. I checked the BQ destination profile in the GCP Console, but could only find dataset information. I also searched through other configurations and used some commands in the gcloud CLI, but couldn't find the specific Service Accounts being used by GCP Datastream to work with BigQuery.

Could you please advise me on where to find this information?

Regards,


Unfortunately, Datastream doesn't show the service account it uses to write to BigQuery anywhere in the stream or connection profile configuration. However, there are several ways to work out which service accounts are involved:

  1. Using Cloud Audit Logs:

    • Navigate to the "Logging" section in the Cloud Console.
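    • In Logs Explorer, restrict the query to audit logs and filter on the service name datastream.googleapis.com. Audit entries are recorded by the service that receives the call, so writes into BigQuery show up under BigQuery's audit logs for the destination dataset rather than under Datastream's.
    • Look for entries such as "CreateWriteStream" (a BigQuery Storage Write API call) or "StartBackfillJob" (a Datastream API call, whose caller is whoever started the backfill, not necessarily Datastream itself).
    • The calling identity is recorded in the "protoPayload.authenticationInfo.principalEmail" field of each entry.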
  2. Using BigQuery Jobs:

    • Go to the BigQuery section in the Cloud Console and access the datasets associated with your Datastream streams.
    • Check the "Job History" for recent jobs that are related to your streams.
    • The job details should display the service account used, which can be found under "Job Information". Note that this shows the service accounts interacting with BigQuery, not necessarily the ones used by Datastream for data transfer.
  3. Using the gcloud CLI:

    • Run gcloud datastream streams describe [STREAM_NAME] --location=[LOCATION].
    • In the output, look for the "destinationConfig" section.
    • The documented Stream resource lists "destinationConnectionProfile" and a destination-specific block such as "bigqueryDestinationConfig" here. As far as the documented schema goes, it has no "authenticationMethod" or "credentialsPath" field, so don't expect this output to name a service account; it mainly confirms which connection profile and dataset the stream writes to.
  4. Using the Datastream API:

    • Call the "projects.locations.streams.get" API method.
    • Provide the project, location, and stream name parameters.
    • The response is the same Stream resource that the gcloud command prints. Its "destinationConfig" section identifies the destination connection profile and BigQuery settings, but it is unlikely to list a service account email or ID.
  5. Checking IAM Roles:

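    • Review the IAM bindings in your project, in particular the member that holds roles/datastream.serviceAgent. This is the Datastream service agent, normally named service-[PROJECT_NUMBER]@gcp-sa-datastream.iam.gserviceaccount.com. In the Console's IAM page it is hidden unless you tick "Include Google-provided role grants".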
    • This can provide clues about the service accounts Datastream might use, though it may not pinpoint the exact account for each stream.
  6. Additional Considerations:

    • Datastream may not map one service account to each stream, so the principal alone might not be enough to separate the load per stream. Confirm what your own project shows in the audit logs before relying on it.
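
To make the audit-log method (item 1) more concrete, here is a minimal Python sketch that counts which principals appear in audit entries for one dataset. It assumes you have exported entries as JSON, for example with gcloud logging read "[FILTER]" --format=json --freshness=1d, and that each entry carries protoPayload.authenticationInfo.principalEmail. The function names, the dataset name, and the sample entries (including the project number and email addresses) are hypothetical and only there for illustration; the filter itself is an assumption and may need adjusting for your project.

```python
import json
from collections import Counter


def build_filter(dataset_id: str) -> str:
    """Build a Cloud Logging filter for audit entries that touch one dataset.

    Assumption: the entries of interest carry a resource name containing
    "datasets/<dataset_id>" and a principalEmail field.
    """
    return (
        f'protoPayload.resourceName:"datasets/{dataset_id}" '
        "AND protoPayload.authenticationInfo.principalEmail:*"
    )


def count_principals(entries):
    """Count audit log entries per calling principal."""
    counts = Counter()
    for entry in entries:
        email = (
            entry.get("protoPayload", {})
            .get("authenticationInfo", {})
            .get("principalEmail")
        )
        if email:  # skip entries without an authenticated caller
            counts[email] += 1
    return counts


# Hypothetical sample data shaped like `gcloud logging read --format=json` output.
SAMPLE_ENTRIES = json.loads("""
[
  {"protoPayload": {"authenticationInfo": {"principalEmail":
    "service-123456789@gcp-sa-datastream.iam.gserviceaccount.com"}}},
  {"protoPayload": {"authenticationInfo": {"principalEmail":
    "service-123456789@gcp-sa-datastream.iam.gserviceaccount.com"}}},
  {"protoPayload": {"authenticationInfo": {"principalEmail":
    "analyst@example.com"}}},
  {"protoPayload": {}}
]
""")

if __name__ == "__main__":
    print(build_filter("my_dataset"))
    for principal, n in count_principals(SAMPLE_ENTRIES).most_common():
        print(f"{n:5d}  {principal}")
```

To run it against real data, replace SAMPLE_ENTRIES with the parsed contents of your exported file. Any principal ending in @gcp-sa-datastream.iam.gserviceaccount.com is the Datastream service agent; everything else is some other caller touching the same dataset.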