All Questions
175 questions
0 votes · 0 answers · 268 views
Is there any way in Apache Beam Java to upsert (update + insert) rows in BigQuery tables?
Is there any way in Apache Beam Java to update rows in a BigQuery table? My use case: I run my Dataflow job once a day; it takes data from one BQ table and, after transforming, writes to ...
-1 votes · 1 answer · 30 views
Unable to add delay/batching to records processed from a change stream in Apache Beam Dataflow
We have an Apache Beam streaming job that reads records from a change stream. We read the change stream using:
Step 1.
PCollection inputChangeStreams = pipeline.apply("readChangeStream1",...
0 votes · 1 answer · 128 views
How to check the status of a BigQuery table
I have a Dataflow job that writes to a BigQuery table. Every Dataflow job will create a new table.
I realize the write operation to the BigQuery table is asynchronous, i.e. the write operation to the ...
0 votes · 0 answers · 140 views
Convert a streaming Dataflow pipeline with Kafka as source to a batch pipeline in GCP
I have a streaming Dataflow pipeline in which Kafka acts as the input and data is loaded into BigQuery. As this operation is expensive and the KafkaIO source returns an unbounded collection of Kafka records ...
0 votes · 1 answer · 214 views
How to catch an exception or ACK a Pub/Sub message in the Google Dataflow PubsubIO.write() method when the Pub/Sub topic does not exist?
I have many customers, each with its own namespace. I am using Google Dataflow to manage lots of messages. Each namespace might or might not have a target output topic for the dataflow in ...
0 votes · 1 answer · 176 views
Cannot call getSchema when there is no schema when joining two PCollections
I have two PCollections and I want to apply a leftOuterJoin, but when I do, I get this error and I don't understand why: java.lang.IllegalStateException: Cannot ...
0 votes · 1 answer · 304 views
Apache Beam BigQueryIO (Java): io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The primary clustering keys are required to create upsert stream
I am using Apache Beam Java to read from one BigQuery table and write into another BigQuery table using applyRowMutations(), but it is not working.
I have created the destination table with the ...
0 votes · 0 answers · 284 views
How to set allowed lateness for late data and discard all data after that lateness period in Java Apache Beam Dataflow windows?
I want a window of 1 minute with an allowed lateness of 2 minutes, and I want to discard all data that arrives after that period.
I could set the allowed lateness, but the data that comes after ...
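For a question like this, the usual Beam windowing configuration looks roughly as follows. This is a minimal sketch, not a verified answer to the question: `input` is an assumed `PCollection<String>`, and the exact trigger is illustrative. With an allowed lateness of two minutes, the runner drops any element arriving more than two minutes behind the watermark for its window, which is the discard behavior the question asks about:

```java
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Sketch: 1-minute fixed windows with 2 minutes of allowed lateness.
// Elements arriving within the lateness period fire late panes;
// elements arriving after it are dropped by the runner.
PCollection<String> windowed = input.apply(
    Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withLateFirings(AfterPane.elementCountAtLeast(1)))
        .withAllowedLateness(Duration.standardMinutes(2))
        .discardingFiredPanes());
```

Note that `discardingFiredPanes()` controls accumulation between panes; it is `withAllowedLateness` that determines when data is dropped outright.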
2 votes · 1 answer · 1k views
Apache Beam version upgrade fails the ETL Pipeline
I am currently using Apache Beam version 2.39.0, and it shows this error on Dataflow:
This version of the SDK is deprecated and will eventually no longer be
supported. Learn more
I am trying to ...
1 vote · 1 answer · 341 views
Unit testing a DoFn with ProcessContext in the argument list
I am trying to test the following ParDo handler:
public class DoFnHandlerFn extends DoFn<KV<Row, Row>, Row> {
@ProcessElement
public void processElement(@NotNull ProcessContext ctx, @...
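A common pattern for unit-testing a DoFn like this is to run it inside a `TestPipeline` and assert on the output with `PAssert`, rather than constructing a `ProcessContext` directly. A sketch, assuming the handler's `Row` schema is available as `schema` and that `keyRow`, `valueRow`, and `expectedRow` are hypothetical test fixtures built elsewhere:

```java
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.RowCoder;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.junit.Rule;
import org.junit.Test;

public class DoFnHandlerFnTest {
  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void emitsExpectedRow() {
    // keyRow, valueRow, expectedRow, and schema are hypothetical fixtures.
    PCollection<Row> out = p
        .apply(Create.of(KV.of(keyRow, valueRow))
            .withCoder(KvCoder.of(RowCoder.of(schema), RowCoder.of(schema))))
        .apply(ParDo.of(new DoFnHandlerFn()));
    PAssert.that(out).containsInAnyOrder(expectedRow);
    p.run().waitUntilFinish();
  }
}
```

The explicit `KvCoder`/`RowCoder` is needed because `Create` cannot infer a coder for `KV<Row, Row>` on its own.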
0 votes · 1 answer · 267 views
Can a Google Dataflow job process data from a Google Spanner change stream record when it is stopped and then resumed?
I am currently working on a Dataflow job that picks data from a Spanner change stream and writes it into another Spanner table after some transformation, and it works fine. But my doubt is: if the ...
0 votes · 1 answer · 191 views
Dataflow Flex Template max size allowed
Hi, I am trying to find the maximum size allowed for a Dataflow run in Flex mode, and a way to check the size.
I know a classic template has a limit of 10 MB per template.
But searching the Google Cloud ...
0 votes · 1 answer · 158 views
How to cancel a GCP Dataflow job programmatically using Beam and just the job ID
We have a GCP Dataflow project which requires us to cancel a running Dataflow job.
All we have at the time of cancellation is the Job ID.
From other posts on Stack Overflow, I learned we can cancel a ...
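One common approach (a sketch, not verified against every client-library version) is to call the Dataflow REST API's `jobs.update` method with a requested state of `JOB_STATE_CANCELLED`, via the `google-api-services-dataflow` client. Here `projectId`, `region`, and `jobId` are assumed inputs, and the application name is arbitrary:

```java
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.Job;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

// Sketch: build an authenticated Dataflow API client using
// application-default credentials.
Dataflow dataflow = new Dataflow.Builder(
        GoogleNetHttpTransport.newTrustedTransport(),
        GsonFactory.getDefaultInstance(),
        new HttpCredentialsAdapter(GoogleCredentials.getApplicationDefault()))
    .setApplicationName("job-canceller")
    .build();

// Cancelling is an update that only sets the requested state.
Job cancelRequest = new Job().setRequestedState("JOB_STATE_CANCELLED");
dataflow.projects().locations().jobs()
    .update(projectId, region, jobId, cancelRequest)
    .execute();
```

Note that the job's region is required in addition to the job ID when using the regional `projects.locations.jobs.update` endpoint.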
0 votes · 1 answer · 86 views
How to properly use TemplatesServiceClient google-cloud-dataflow
I'm fairly new to Google Cloud, and even more so to Google Cloud Dataflow. I have created a Java application whose sole purpose is to run a Dataflow job. The details are as follows:
First I ...
0 votes · 1 answer · 359 views
Apache Beam GCP Dataflow Suggestion Implementation
I am running an Apache Beam pipeline on GCP Dataflow.
The Dataflow pipeline suggests the following item:
A fusion break can be inserted after the following transforms to increase parallelism: ReadFromGCS/...