
All Questions

0 votes
0 answers
15 views

Disable autoscaling for templated jobs

In Dataflow, you can run jobs without autoscaling. This is typically achieved by setting a pipeline option, autoscaling_algorithm, to NONE. Attempting the equivalent on templated Dataflow jobs ...
asked by user30237673
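A minimal sketch of the non-templated case the excerpt describes, assuming the Beam Python SDK; project/region flags are elided, and whether the option survives templating is exactly what the question asks:

    # Fixed-size job: autoscaling_algorithm=NONE pins the worker count,
    # so num_workers is the fleet size for the whole run.
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--autoscaling_algorithm=NONE",
        "--num_workers=3",
    ])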
0 votes
0 answers
33 views

How to prevent deletions from source (GCP CloudSQL MySQL) reflecting in GCP BigQuery using Datastream?

Description: We are currently using Google Cloud Datastream to replicate data from a CloudSQL (MySQL) instance into BigQuery in near real-time. The replication works perfectly for insert and update ...
asked by Ashwini Kumar
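For context, Datastream's BigQuery destination has an append-only write mode that records source deletes as change rows instead of applying them. A hedged sketch with the google-cloud-datastream client follows; the stream name is a placeholder, and the append_only field should be verified against the current API:

    from google.cloud import datastream_v1

    client = datastream_v1.DatastreamClient()
    stream = client.get_stream(
        name="projects/my-project/locations/us-central1/streams/my-stream"
    )
    # Switch from merge (default: applies updates/deletes in place) to
    # append-only, which keeps every change event as its own row.
    stream.destination_config.bigquery_destination_config.append_only = (
        datastream_v1.types.BigQueryDestinationConfig.AppendOnly()
    )
    client.update_stream(
        request=datastream_v1.UpdateStreamRequest(
            stream=stream,
            update_mask={"paths": ["destination_config.bigquery_destination_config"]},
        )
    ).result()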
0 votes
1 answer
52 views

How can we optimize a Cloud Dataflow job to minimize startup time?

Apache Beam with Cloud Dataflow takes 5 minutes or more to cold-start the data pipeline. Is there any way to minimize the startup time? I tried optimizing the Dockerfile, but it is still slow. ...
asked by Farrukh Naveed Anjum
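One common lever here, sketched under the assumption of a Python pipeline: bake dependencies into a custom SDK container image so workers skip per-startup pip installs (the image URI is a placeholder):

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=DataflowRunner",
        # Prebuilt image with all dependencies already installed; avoids
        # installing requirements.txt on every worker cold start.
        "--sdk_container_image=us-docker.pkg.dev/my-project/beam/my-pipeline:latest",
    ])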
0 votes
0 answers
33 views

Apache Beam Dataflow logging formatter

I have a pipeline that uses some of my own modules, and in those modules I use logging. The problem is with the logging: I use log level overrides to set the log level, but what about the formatter? I am using loggers ...
asked by Dawid (11)
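As far as I know, Beam's pipeline options cover log levels (the overrides the question mentions) but not formatters; one hedged workaround is to install a formatter from DoFn.setup(), which runs in every worker process, though the Dataflow harness may still rewrap log output:

    import logging
    import apache_beam as beam

    class FormattedLogsDoFn(beam.DoFn):
        def setup(self):
            # Re-format whatever handlers the worker harness installed.
            formatter = logging.Formatter(
                "%(asctime)s %(name)s %(levelname)s %(message)s"
            )
            for handler in logging.getLogger().handlers:
                handler.setFormatter(formatter)

        def process(self, element):
            logging.getLogger(__name__).info("processing %s", element)
            yield element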
0 votes
2 answers
62 views

Using Google Cloud Dataflow with a Custom Service Account, Pub/Sub, and Least Privilege

I want to run Dataflow jobs with a dedicated custom service account per job. On deployment, the Dataflow job wants to create a new Pub/Sub subscription to use for watermark tracking ...
asked by Joseph Lust
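A hedged sketch of one way to keep the grant list small: read from a pre-created subscription rather than a topic, since it is the topic form that makes Dataflow create its own tracking subscription at deploy time (all names are placeholders):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--streaming",
        "--service_account_email=my-job-sa@my-project.iam.gserviceaccount.com",
    ])

    with beam.Pipeline(options=options) as p:
        (
            p
            # A pre-created subscription means the job's service account
            # does not need pubsub.subscriptions.create.
            | beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/my-sub"
            )
            | beam.Map(lambda msg: msg.decode("utf-8"))
        )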
0 votes
0 answers
78 views

How to load common dependencies into Dataflow?

Our team has a set of data pipelines built as DAGs triggered on Composer (Airflow) that run Beam (Dataflow) jobs. Across these Dataflow pipelines, there are a set of common utilities engineers need to ...
asked by Espresso Engineer
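A minimal sketch of the usual --setup_file pattern, assuming the shared utilities live in a local team_common/ package (names are placeholders):

    # setup.py at the repository root; launch the job with
    # --setup_file=./setup.py so Dataflow builds and installs the
    # package on every worker.
    import setuptools

    setuptools.setup(
        name="team_common",
        version="0.1.0",
        packages=setuptools.find_packages(),  # picks up team_common/
        install_requires=["apache-beam[gcp]"],
    )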
0 votes
0 answers
58 views

High frequency authentication requests between GCP Dataflow and Managed Kafka

I'm experiencing an issue with frequent authentication requests between Google Cloud Dataflow and GCP Managed Kafka. Each Dataflow streaming job is making approximately 150 authentication requests per ...
asked by louis.dev
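A diagnostic sketch rather than a confirmed fix: if each authentication corresponds to a broker reconnect, lengthening the Kafka client's idle timeout in consumer_config may cut the rate. Broker and topic names are placeholders, and the Managed Kafka SASL settings are elided:

    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka

    with beam.Pipeline() as p:
        (
            p
            | ReadFromKafka(
                consumer_config={
                    "bootstrap.servers": "bootstrap.my-cluster:9092",
                    # Keep idle broker connections alive longer so they are
                    # not torn down and re-authenticated repeatedly.
                    "connections.max.idle.ms": "540000",
                },
                topics=["my-topic"],
            )
            | beam.Map(print)
        )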
0 votes
0 answers
43 views

Best POM for Google Spanner and Dataflow

I've been getting bizarre ClassNotFound errors in the io.grpc library, strange timeout errors, etc. I also saw in an old document that there's a need to enforce a minimum version. I've tried building a ...
asked by Woodsman (1,189)
0 votes
0 answers
31 views

Missing _metadata_uuid and _metadata_lsn in BigQuery Dataset for Datastream Pipeline

I could really use some assistance! I've set up a pipeline to copy data from my managed SQL (PostgreSQL) on GCP to BigQuery. I followed these guides: the Google Cloud Datastream documentation. The above guide ...
asked by sargis (1)
0 votes
0 answers
41 views

How to Run a Workflow Multiple Times with Different Inputs (Using Apache Beam or Native Workflow Features)?

I'm working on a workflow using Google Cloud Workflows, and I want to run the same workflow multiple times with different input values. I've been researching this, and I found that Apache Beam can be ...
asked by rabin adeikari
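For the Workflows side, each execution accepts a JSON argument, so the same workflow can be fanned out over different inputs via the Executions API. A hedged Python sketch; names are placeholders:

    import json
    from google.cloud.workflows import executions_v1

    client = executions_v1.ExecutionsClient()
    parent = "projects/my-project/locations/us-central1/workflows/my-workflow"

    # One execution per input payload; each run reads its own argument.
    for payload in [{"region": "us"}, {"region": "eu"}, {"region": "apac"}]:
        client.create_execution(
            parent=parent,
            execution=executions_v1.Execution(argument=json.dumps(payload)),
        )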
0 votes
0 answers
31 views

NameError: name 'MyschemaName' is not defined [while running 'Attaching the schema-ptransform-67'] while deploying an Apache Beam pipeline on Dataflow

How does one define a schema so that all the Dataflow workers have access to it? Below is the section of my code that fails because the schema name cannot be found. I have ...
asked by oyugi.collins
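This NameError usually means the schema class only exists in the launcher's __main__ scope, so workers cannot resolve it. A hedged sketch of the standard fix, with MySchemaName standing in for the question's class:

    import typing
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    # Module level, not inside run(), so workers can import it by name.
    class MySchemaName(typing.NamedTuple):
        id: int
        name: str

    beam.coders.registry.register_coder(MySchemaName, beam.coders.RowCoder)

    options = PipelineOptions()
    # Ship the __main__ namespace to workers (or move the class into a
    # package shipped via --setup_file).
    options.view_as(SetupOptions).save_main_session = True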
0 votes
1 answer
126 views

Beam Dataflow Reshuffle Failing

We have a batch GCP Dataflow job that is failing on a Reshuffle() step with the following error: ValueError: Error decoding input stream with coder WindowedValueCoder[TupleCoder[LengthPrefixCoder[...
asked by Ben Delany
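These decode errors at a Reshuffle often trace back to Beam inferring different coders on the two sides of the shuffle boundary; pinning explicit element types just before the Reshuffle is one hedged way to make the coder deterministic (the types here are illustrative):

    import typing
    import apache_beam as beam

    with beam.Pipeline() as p:
        (
            p
            | beam.Create([("k", 1), ("k", 2)])
            # Explicit types stop Beam from falling back to a pickled
            # coder on one side and a tuple coder on the other.
            | beam.Map(lambda kv: kv).with_output_types(typing.Tuple[str, int])
            | beam.Reshuffle()
            | beam.Map(print)
        )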
0 votes
0 answers
104 views

Can't set service account properly on Dataflow flex-template run

I want to override the default service account used by the Dataflow worker, e.g. [email protected], which gets created and used by default if you don't specify anything. But I want my own ...
asked by Jh123 (93)
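From pipeline code, the corresponding knob is the service_account_email Google Cloud pipeline option (the account below is a placeholder); at launch time the same setting is, as far as I know, exposed as a --service-account-email flag on gcloud dataflow flex-template run:

    from apache_beam.options.pipeline_options import PipelineOptions, GoogleCloudOptions

    options = PipelineOptions()
    gcp = options.view_as(GoogleCloudOptions)
    gcp.project = "my-project"
    # Workers run as this account instead of the Compute Engine default.
    gcp.service_account_email = "my-worker-sa@my-project.iam.gserviceaccount.com"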
1 vote
1 answer
108 views

How to write data from Apache Beam to a BigQuery table using GCP Dataflow?

I am trying to run the program below using Apache Beam on GCP Dataflow. The program should read a CSV file, do some transformations like sum, max, and join, and then write to a BigQuery table. Up to step 4 I am getting ...
asked by Santanu Ghosh
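The final write step usually looks like the sketch below; the table and schema are placeholders standing in for the question's CSV-derived rows:

    import apache_beam as beam
    from apache_beam.io.gcp.bigquery import BigQueryDisposition, WriteToBigQuery

    with beam.Pipeline() as p:
        (
            p
            | beam.Create([{"name": "a", "total": 10}])
            | WriteToBigQuery(
                table="my-project:my_dataset.my_table",
                schema="name:STRING,total:INTEGER",
                write_disposition=BigQueryDisposition.WRITE_APPEND,
                create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )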
0 votes
1 answer
187 views

"TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union" while Writing to BigQuery table on a Dataflow pipeline using Apache Beam

I am working on a Dataflow pipeline in Python using apache-beam==2.57.0 and google-cloud-bigquery==3.26.0 to read data from a Cloud SQL database and write it to a BigQuery table. The script runs into ...
asked by Sharanya J
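One common trigger for this TypeError, hedged here since the excerpt is cut off, is handing WriteToBigQuery a schema built from google-cloud-bigquery client classes (or a version mismatch between the two libraries); Beam's own string or dict schema forms sidestep the client types entirely:

    import apache_beam as beam

    # Plain-dict schema in Beam's native format; no google-cloud-bigquery
    # classes involved.
    table_schema = {
        "fields": [
            {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
            {"name": "payload", "type": "STRING", "mode": "NULLABLE"},
        ]
    }

    with beam.Pipeline() as p:
        (
            p
            | beam.Create([{"id": 1, "payload": "x"}])
            | beam.io.WriteToBigQuery(
                table="my-project:my_dataset.my_table",
                schema=table_schema,
            )
        )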
