
All Questions

0 votes
0 answers
17 views

Disable autoscaling for templated jobs

In Dataflow, you can run jobs without autoscaling. This is typically achieved by setting a pipeline_option called autoscaling_algorithm to NONE. Attempting the equivalent on Templated Dataflow Jobs ...
user30237673
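A minimal sketch of the non-templated case the excerpt describes, assuming the standard Beam worker options (project, region, and bucket are placeholders):

    # Disabling autoscaling via pipeline options (Python Beam SDK).
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-project',                # placeholder
        region='us-central1',                # placeholder
        temp_location='gs://my-bucket/tmp',  # placeholder
        autoscaling_algorithm='NONE',        # turn autoscaling off
        num_workers=3,                       # fixed worker count
    )

    with beam.Pipeline(options=options) as p:
        _ = p | beam.Create([1, 2, 3]) | beam.Map(print)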
0 votes
0 answers
33 views

Apache Beam Dataflow logging formatter

I have a pipeline that uses some of my modules, and in those modules I do some logging. The problem is with logging: I use log level overrides to set the log level, but what about the formatter? I am using loggers ...
Dawid · 11
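One hedged way to approach this, assuming each worker process configures logging itself: re-attach a formatter in DoFn.setup(). Note that Dataflow's log agent may still reshape the final Cloud Logging entry, so this is a starting point, not a guarantee:

    import logging
    import apache_beam as beam

    class FormattedLogDoFn(beam.DoFn):
        def setup(self):
            # Runs once per worker process; re-apply the formatter there.
            fmt = logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s')
            for handler in logging.getLogger().handlers:
                handler.setFormatter(fmt)

        def process(self, element):
            logging.getLogger(__name__).info('processing %s', element)
            yield element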
0 votes
0 answers
31 views

NameError: name 'MyschemaName' is not defined [while running 'Attaching the schema-ptransform-67'] while deploying Apache Beam pipeline in Dataflow

How does one effectively define a schema so that all the workers in Dataflow have access to it? Below is the section of my code that fails because the schema name cannot be found. I have ...
oyugi.collins
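A sketch of the usual pattern for this NameError, under the assumption that the schema class only existed inside __main__: define it at module top level and register a coder so every worker can reconstruct it (the class name mirrors the question; the fields are invented). Passing --save_main_session=True is the other common fix:

    import typing
    import apache_beam as beam
    from apache_beam import coders

    class MyschemaName(typing.NamedTuple):  # placeholder fields
        id: int
        name: str

    # Registering a RowCoder lets workers encode/decode the schema type.
    coders.registry.register_coder(MyschemaName, coders.RowCoder)

    # Inside the pipeline, something like:
    # rows | beam.Map(lambda d: MyschemaName(**d)).with_output_types(MyschemaName)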
0 votes
1 answer
126 views

Beam Dataflow Reshuffle Failing

We have a batch GCP Dataflow job that is failing on a Reshuffle() step with the following error: ValueError: Error decoding input stream with coder WindowedValueCoder[TupleCoder[LengthPrefixCoder[...
Ben Delany
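Not a diagnosis of this specific coder error, but one hedged mitigation: give the keyed step before Reshuffle an explicit type hint so Beam picks a coder it can round-trip deterministically instead of falling back to pickling:

    from typing import Any, Dict, Tuple
    import apache_beam as beam

    with beam.Pipeline() as p:
        _ = (
            p
            | beam.Create([{'id': 'a', 'v': 1}])
            # Explicit output type -> Beam chooses a matching coder.
            | 'Key' >> beam.Map(lambda r: (r['id'], r)).with_output_types(
                Tuple[str, Dict[str, Any]])
            | 'Reshuffle' >> beam.Reshuffle()
            | beam.Map(print)
        )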
0 votes
1 answer
187 views

"TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union" while Writing to BigQuery table on a Dataflow pipeline using Apache Beam

I am working on a Dataflow pipeline in Python using apache-beam==2.57.0 and google-cloud-bigquery==3.26.0 to read data from a Cloud SQL database and write it to a BigQuery table. The script runs into ...
Sharanya J
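A sketch of the Beam-native call, with the schema in the plain string form WriteToBigQuery expects (table and fields are placeholders); passing schema objects from the google-cloud-bigquery client library instead is one way an isinstance() check like this can trip:

    import apache_beam as beam

    with beam.Pipeline() as p:
        _ = (
            p
            | beam.Create([{'id': 1, 'name': 'a'}])
            | beam.io.WriteToBigQuery(
                table='my-project:my_dataset.my_table',  # placeholder
                schema='id:INTEGER,name:STRING',         # Beam's string schema form
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )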
0 votes
0 answers
164 views

Error Syncing Pod Python Dataflow Workers

I have built a custom Python image for a streaming Dataflow job and tested locally that it works. When deployed to GCP it pulls the image, the job logs show the minimum of 1 worker initializing, and the job is in state ...
bozhyte · 37
0 votes
0 answers
116 views

How to design a Dataflow pipeline requiring several external api calls and database operations per flow?

The pipeline I am building specifically uses Dataflow with Apache Beam's Python SDK; however, I just need opinions on Beam's concepts, so I am fine with examples in Java, for instance. In ...
Dan Q · 1
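One common shape for this, sketched with a hypothetical ApiClient standing in for the external service: create the client once per worker in setup() and batch calls per bundle to amortize round trips:

    import apache_beam as beam
    from apache_beam.transforms.window import GlobalWindow
    from apache_beam.utils.windowed_value import WindowedValue

    class ApiClient:
        """Hypothetical stand-in for the external service client."""
        def lookup_many(self, items):
            return [{'item': i, 'enriched': True} for i in items]

    class EnrichViaApi(beam.DoFn):
        BATCH_SIZE = 50

        def setup(self):
            # One client per worker process, reused across bundles.
            self.client = ApiClient()

        def start_bundle(self):
            self.batch = []

        def process(self, element):
            self.batch.append(element)
            if len(self.batch) >= self.BATCH_SIZE:
                results = self.client.lookup_many(self.batch)
                self.batch = []
                yield from results

        def finish_bundle(self):
            # Leftovers must be emitted as WindowedValues here.
            for result in self.client.lookup_many(self.batch):
                yield WindowedValue(result, 0, [GlobalWindow()])
            self.batch = []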
0 votes
0 answers
38 views

GCP Dataflow workers can't find my Python modules

I've written a Python script and built a folder structure for a small ETL application that is meant to run as a Google Cloud Dataflow job. It works locally and I'm able to load, transform, and offload ...
Aaron Graston
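The usual fix, sketched under the assumption that the modules live in a local package: ship them with a setup.py at the project root (name and version are placeholders):

    import setuptools

    setuptools.setup(
        name='my_etl',        # placeholder package name
        version='0.1.0',
        packages=setuptools.find_packages(),
    )

The job would then be launched with --setup_file=./setup.py among the pipeline options so each worker installs the package before running the pipeline code.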
0 votes
0 answers
25 views

Why is windowing in Apache Beam not waiting for the window to finish?

I am writing an Apache Beam streaming pipeline where I want to read messages from Pub/Sub and iterate over all the messages present in a window once the window time, in this case 1 min, has passed. But, although ...
Ajay S Pal
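A hedged sketch of why this happens: WindowInto only tags elements; they are emitted "per window" only after an aggregation such as GroupByKey fires at window close (the topic path is a placeholder):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms import window

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | beam.io.ReadFromPubSub(topic='projects/my-project/topics/my-topic')
            | beam.WindowInto(window.FixedWindows(60))  # 1-minute windows
            | beam.Map(lambda msg: ('all', msg))        # single key
            | beam.GroupByKey()                         # fires when the window closes
            | beam.Map(lambda kv: print(kv[0], len(list(kv[1])), 'msgs'))
        )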
0 votes
0 answers
119 views

Pip install failed for package: -r while running Dataflow Flex template with requirements.txt provided (NewConnectionError)

Getting the following error after launching the Dataflow template: code execution starts, and once pipeline graph generation begins I get "subprocess.CalledProcessError: Command '['/usr/...
ShubhGurukul
1 vote
0 answers
57 views

How to specify Python dependency versions for Dataflow where Cloud NAT is disabled and a custom container can't be used

I am running Python Dataflow jobs and I deploy the Dataflow template to GCS from GitLab. I am using --requirements_file=requirements.txt when I deploy my Python template to GCS. Cloud NAT is disabled in ...
Ananth · 119
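One hedged workaround, assuming the CI runner has internet access even though the workers do not: pre-download the pinned wheels there and stage them with extra_packages, so workers install from local files rather than reaching PyPI (paths are placeholders):

    # On the runner first:  pip download -r requirements.txt -d ./wheels
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner='DataflowRunner',
        # Wheels are staged to GCS and installed on each worker.
        extra_packages=['./wheels/some_dep-1.2.3-py3-none-any.whl'],  # placeholder
    )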
0 votes
0 answers
57 views

How to add delay to Apache Beam WriteToJdbc for MySQL Database

I am trying to stream data from Google Cloud Pub/Sub to a MySQL cloud database for a uni project. The data contains a series of values for various attributes, of which some need to go into one table ...
Ryan Hogan
0 votes
0 answers
73 views

Configure message retention duration of PubSub subscription created by Dataflow

I have a Dataflow pipeline that ingests messages from PubSub. This automatically creates a subscription; however, the retention duration is 7 days. I like that it creates the subscription so I don't ...
Joe Moore · 2,023
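A sketch of one way around this: pre-create the subscription with the retention you want using the Pub/Sub client, then read from the subscription instead of the topic so Dataflow stops creating its own (project, topic, and subscription names are placeholders):

    import datetime
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path('my-project', 'my-input-sub')

    subscriber.create_subscription(
        request={
            'name': subscription_path,
            'topic': 'projects/my-project/topics/my-topic',
            'message_retention_duration': datetime.timedelta(days=1),
        }
    )

    # In the pipeline, read from the subscription rather than the topic:
    # p | beam.io.ReadFromPubSub(subscription=subscription_path)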
0 votes
0 answers
199 views

Dataflow streaming pipeline from Pub/Sub to BigQuery stalls (Python Beam SDK, streaming inserts)

I am trying to set up a streaming pipeline using the Python Beam SDK (2.54.0). The pipeline works fine but does not scale well when there is a significant increase in input Pub/Sub messages (when Pub/Sub ...
Kevin Zhou
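Not a definitive fix, but two knobs that often matter with streaming inserts, sketched with a placeholder table: select the write method explicitly and enable auto-sharding so the write step can re-balance as traffic grows:

    import apache_beam as beam

    # Applied to the decoded-message PCollection in the streaming pipeline.
    write = beam.io.WriteToBigQuery(
        table='my-project:my_dataset.my_table',  # placeholder
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
        with_auto_sharding=True,  # let the runner re-shard under load
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    )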
2 votes
1 answer
394 views

How to use logging with custom jsonPayload in GCP Dataflow

By default, GCP Dataflow 'just works' with the standard Python logging library, so import logging; logging.info('hello') yields a log message in Google Logging while the Dataflow job runs in GCP. ...
Duck Ling · 2,180
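One hedged approach from outside Beam: google-cloud-logging's handler accepts structured fields via the json_fields extra; whether Dataflow's own log agent preserves them end to end depends on the environment, so treat this as a starting point:

    import logging
    import google.cloud.logging

    client = google.cloud.logging.Client()
    client.setup_logging()  # installs a Cloud Logging handler on the root logger

    logging.info(
        'order processed',
        extra={'json_fields': {'order_id': 1234, 'status': 'ok'}},  # placeholder fields
    )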
