All Questions
Tagged with google-cloud-platform google-cloud-dataflow
1,704 questions
0
votes
0
answers
15
views
Disable auto scaling for templated jobs
In Dataflow, you can run jobs without autoscaling. This is typically achieved by setting a pipeline_option called autoscaling_algorithm to NONE. Attempting the equivalent on Templated Dataflow Jobs ...
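A minimal sketch of the non-templated case, assuming the Python SDK (project, region, and bucket names are hypothetical). For classic templates, worker options like this are baked in when the template is built, so the flag would need to be set at template-creation time:

```python
# Minimal sketch: fixed-size worker pool via pipeline options (Python SDK).
# autoscaling_algorithm and num_workers are standard Dataflow worker options;
# the project/region/bucket values are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # hypothetical
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # hypothetical
    autoscaling_algorithm="NONE",        # disable autoscaling
    num_workers=3,                       # fixed pool size
)

with beam.Pipeline(options=options) as pipeline:
    _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)
```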
0
votes
0
answers
33
views
How to prevent deletions from source (GCP CloudSQL MySQL) reflecting in GCP BigQuery using Datastream?
Description:
We are currently using Google Cloud Datastream to replicate data from a CloudSQL (MySQL) instance into BigQuery in near real-time. The replication works perfectly for insert and update ...
0
votes
1
answer
52
views
How can we optimize the Cloud Dataflow job to minimize the startup time?
Apache Beam pipelines running on Cloud Dataflow take 5 minutes or more to cold-start. Is there any way to minimize the startup time?
I tried optimizing the Dockerfile, but startup is still slow.
...
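One common mitigation is a prebuilt custom container so workers skip dependency installation at boot. A minimal sketch assuming the Python SDK (the image URI is hypothetical and would be built from an official Beam base image):

```python
# Minimal sketch: point workers at a prebuilt image that already contains
# all pipeline dependencies, so startup skips per-worker pip installs.
# sdk_container_image is the standard Beam/Dataflow option; the image URI
# below is a placeholder.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                # hypothetical
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # hypothetical
    sdk_container_image="us-docker.pkg.dev/my-project/beam/pipeline:latest",  # hypothetical
)
```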
0
votes
0
answers
33
views
Apache beam dataflow logging formatter
I have a pipeline that uses several of my own modules, and in those modules I am using logging. The problem is with logging: I use log-level overrides to set the log level, but what about the formatter? I am using loggers ...
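The SDK exposes level overrides as pipeline options, but not a formatter option. One hedged workaround is to install a formatter per worker process in DoFn.setup (the format string below is just an example):

```python
# Minimal sketch: attach a custom logging formatter on each worker.
# setup() runs once per DoFn instance on the worker, which is one place
# to configure the process-local root logger.
import logging
import apache_beam as beam

class FormattedLogDoFn(beam.DoFn):
    def setup(self):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
        logging.getLogger().addHandler(handler)

    def process(self, element):
        logging.getLogger(__name__).info("processing %s", element)
        yield element
```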
0
votes
2
answers
62
views
Using Google Cloud Dataflow with a Custom Service Account, Pub/Sub, and Least Privilege
I want to run Dataflow jobs with a dedicated per-job custom service account.
On deployment, the Dataflow job wants to create a new Pub/Sub subscription to use as the watermark tracking ...
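A sketch of the least-privilege setup, assuming the Python SDK: pass the job's SA via the standard service_account_email option, and read from a pre-created subscription so the job never needs pubsub.subscriptions.create (all names are hypothetical):

```python
# Minimal sketch: run the job as a dedicated SA and avoid implicit
# subscription creation by reading from an existing subscription.
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",  # hypothetical
    region="us-central1",
    streaming=True,
    service_account_email="job-sa@my-project.iam.gserviceaccount.com",  # hypothetical
)

with beam.Pipeline(options=options) as p:
    # Reading from a subscription (not a topic) means Dataflow does not
    # create its own subscription, so the SA only needs subscriber access
    # on this one resource.
    msgs = p | ReadFromPubSub(
        subscription="projects/my-project/subscriptions/my-sub")  # hypothetical
```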
0
votes
0
answers
78
views
How to load common dependencies into dataflow?
Our team has a set of data pipelines built as DAGs triggered on Composer (Airflow) that run Beam (Dataflow) jobs.
Across these Dataflow pipelines, there is a set of common utilities that engineers need to ...
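One common pattern, sketched under the assumption that the shared utilities are packaged as a source distribution: ship them to workers with the extra_packages setup option (the path below is hypothetical):

```python
# Minimal sketch: distribute a shared utility package to Dataflow workers.
# extra_packages takes local paths to package files that Beam stages and
# pip-installs on each worker.
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions(runner="DataflowRunner")
options.view_as(SetupOptions).extra_packages = [
    "./dist/common_utils-1.0.tar.gz",  # hypothetical sdist of the shared repo
]
```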
0
votes
0
answers
58
views
High frequency authentication requests between GCP Dataflow and Managed Kafka
I'm experiencing an issue with frequent authentication requests between Google Cloud Dataflow and GCP Managed Kafka. Each Dataflow streaming job is making approximately 150 authentication requests per ...
0
votes
0
answers
43
views
Best POM for Google Spanner and Dataflow
I've been getting bizarre ClassNotFound errors in the io.grpc library, strange timeout errors, etc. Plus, I saw in an old document that there's a need to enforce a minimum version. I've tried building a ...
0
votes
0
answers
31
views
Missing _metadata_uuid and _metadata_lsn in BigQuery Dataset for Datastream Pipeline
I could really use some assistance!
I've set up a pipeline to copy data from my managed SQL (PostgreSQL) on GCP to BigQuery.
I followed these guides:
Google Cloud Datastream Documentation
Above guide ...
0
votes
0
answers
41
views
How to Run a Workflow Multiple Times with Different Inputs (Using Apache Beam or Native Workflow Features)?
I'm working on a workflow using Google Cloud Workflow, and I want to run the same workflow multiple times with different input values. I’ve been researching this, and I found that Apache Beam can be ...
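If Beam is in play, one hedged pattern is to fan out over the input values inside a single pipeline instead of launching the workflow N times (the per-input function and inputs below are hypothetical stand-ins):

```python
# Minimal sketch: one pipeline, many inputs; each element is one "run".
import apache_beam as beam

def run_one(params):
    # Hypothetical stand-in for the work the workflow does per input.
    return {"input": params, "status": "done"}

inputs = [{"city": "NYC"}, {"city": "LA"}, {"city": "SF"}]  # hypothetical

with beam.Pipeline() as p:
    _ = (p
         | beam.Create(inputs)
         | beam.Map(run_one))
```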
0
votes
0
answers
31
views
NameError: name 'MyschemaName' is not defined [while running 'Attaching the schema-ptransform-67'] while deploying apache beam pipeline in Dataflow
How does one effectively define a schema so that all the workers in Dataflow have access to it? Below is the section of my code that fails because the schema name cannot be found.
I have ...
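A sketch of the usual fix, assuming the NameError comes from the schema class being defined in a scope workers cannot import: define it at module level, register a row coder, and enable save_main_session (the class name mirrors the question's; its fields are hypothetical):

```python
# Minimal sketch: module-level schema plus save_main_session so that
# module-scope names resolve on remote workers.
import typing
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

class MySchemaName(typing.NamedTuple):  # hypothetical fields
    id: int
    name: str

beam.coders.registry.register_coder(MySchemaName, beam.coders.RowCoder)

options = PipelineOptions()
options.view_as(SetupOptions).save_main_session = True

with beam.Pipeline(options=options) as p:
    _ = (p
         | beam.Create([(1, "a"), (2, "b")])
         | beam.Map(lambda t: MySchemaName(*t)).with_output_types(MySchemaName))
```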
0
votes
1
answer
126
views
Beam Dataflow Reshuffle Failing
We have a batch GCP Dataflow job that is failing on a Reshuffle() step with the following error:
ValueError: Error decoding input stream with coder WindowedValueCoder[TupleCoder[LengthPrefixCoder[...
0
votes
0
answers
104
views
Can't set service account properly on dataflow flex-template run
I want to overwrite the default SA used by the Dataflow worker, e.g. the <project-number>-compute@developer.gserviceaccount.com account that gets used by default if you don't specify anything. But I want my own ...
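A sketch assuming the job is launched through the flexTemplates.launch API, where the worker SA belongs in the environment block rather than in the template parameters (all names and paths are placeholders):

```python
# Minimal sketch: launch a flex template with a custom worker SA via
# the Dataflow v1b3 REST API (google-api-python-client).
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")
body = {
    "launchParameter": {
        "jobName": "my-flex-job",                                      # hypothetical
        "containerSpecGcsPath": "gs://my-bucket/templates/spec.json",  # hypothetical
        "parameters": {"input": "gs://my-bucket/in.csv"},              # hypothetical
        "environment": {
            # Worker SA goes here, not in "parameters".
            "serviceAccountEmail": "worker-sa@my-project.iam.gserviceaccount.com",
        },
    },
}
dataflow.projects().locations().flexTemplates().launch(
    projectId="my-project", location="us-central1", body=body).execute()
```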
1
vote
1
answer
108
views
How to write data from apache beam using gcp dataflow to bigquery table?
I am trying to run the program below using Apache Beam on GCP Dataflow. The program should read a CSV file, do some transformations like sum, max, and join, then write to a BQ table.
Up to step 4 I am getting ...
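A minimal sketch of the write step, assuming the rows arrive as dicts matching the table schema (the table, dataset, and schema below are hypothetical):

```python
# Minimal sketch: write dict rows to BigQuery with the standard Beam sink.
import apache_beam as beam

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([{"name": "a", "total": 3}])  # hypothetical rows
         | beam.io.WriteToBigQuery(
               table="my-project:my_dataset.my_table",  # hypothetical
               schema="name:STRING,total:INTEGER",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
               create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
```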
0
votes
1
answer
187
views
"TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union" while Writing to BigQuery table on a Dataflow pipeline using Apache Beam
I am working on a Dataflow pipeline in Python using apache-beam==2.57.0 and google-cloud-bigquery==3.26.0 to read data from a Cloud SQL database and write it to a BigQuery table. The script runs into ...