
All Questions

0 votes
0 answers
268 views

Is there any way in Apache Beam Java to upsert (update + insert) rows in BigQuery tables?

Is there any way in Apache Beam Java to update rows in a BigQuery table? My use case: I run my Dataflow job once a day, and it takes data from one BQ table and, after transforming, it writes to ...
CHANDRA B.
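One commonly suggested approach for this kind of daily upsert is to have the Beam pipeline write the transformed rows into a staging table and then run a MERGE into the target table. The sketch below is not the poster's pipeline; the dataset, table, and column names (my_dataset.staging, my_dataset.target, id, value) are placeholders, and it assumes the google-cloud-bigquery client is on the classpath.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class MergeAfterLoad {
  public static void main(String[] args) throws Exception {
    // Assumes the Beam pipeline has already written the day's transformed rows
    // into my_dataset.staging (e.g. BigQueryIO.writeTableRows() with WRITE_TRUNCATE).
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    String merge =
        "MERGE `my_dataset.target` T "
            + "USING `my_dataset.staging` S ON T.id = S.id "
            + "WHEN MATCHED THEN UPDATE SET value = S.value "
            + "WHEN NOT MATCHED THEN INSERT (id, value) VALUES (S.id, S.value)";

    // Runs the MERGE as a query job and waits for it to complete.
    bigquery.query(QueryJobConfiguration.newBuilder(merge).build());
  }
}
```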
-1 votes
1 answer
30 views

Unable to add delay/batching to records processed from a change stream in Apache Beam Dataflow

We have an Apache Beam (streaming) job which gets records from a change stream. We read the change stream using: Step 1. PCollection inputChangeStreams = pipeline.apply("readChangeStream1",...
Somnath Mukherjee
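One Beam primitive that can introduce this kind of batching/delay is GroupIntoBatches, which buffers keyed elements until a size or a maximum buffering duration is reached. A minimal sketch, assuming the change-stream records have already been keyed as KV<String, String> (the key/value types and the limits are assumptions, not the poster's code):

```java
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class BatchChangeRecords {
  /** Buffers keyed records into batches of up to 100, or 30 seconds, whichever comes first. */
  static PCollection<KV<String, Iterable<String>>> batch(
      PCollection<KV<String, String>> keyedRecords) {
    return keyedRecords.apply(
        "BatchRecords",
        GroupIntoBatches.<String, String>ofSize(100)
            .withMaxBufferingDuration(Duration.standardSeconds(30)));
  }
}
```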
0 votes
1 answer
128 views

How to check the status of a BigQuery table

I have a Dataflow job that writes to a BigQuery table. Every Dataflow job will create a new table. I realize the write operation to the BigQuery table is asynchronous, i.e. the write operation to the ...
ZZZ
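A small sketch of one way to poll the table from the google-cloud-bigquery client after the job runs; getTable returns null while the table does not exist yet. The dataset and table names are placeholders:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;

public class TableStatusCheck {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // getTable returns null if the table has not been created yet.
    Table table = bigquery.getTable(TableId.of("my_dataset", "my_table"));
    if (table == null) {
      System.out.println("Table not created yet");
    } else {
      // numRows reflects rows visible to queries; rows still in the
      // streaming buffer may not be counted immediately.
      System.out.println("Rows: " + table.getNumRows());
    }
  }
}
```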
0 votes
0 answers
140 views

Convert a streaming Dataflow pipeline with Kafka as the source to a batch pipeline in GCP

I have a streaming Dataflow pipeline in which Kafka acts as the input and data is loaded into BigQuery. As this operation is expensive and the KafkaIO source returns an unbounded collection of Kafka records ...
Y2Jepic
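KafkaIO can be turned into a bounded source by capping the number of records and/or the read time, which lets the pipeline run as a batch-style job. A hedged sketch with placeholder broker and topic names:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;

public class BoundedKafkaRead {
  /** Reads at most 1M records or 10 minutes of data, making the source bounded. */
  static PCollection<KafkaRecord<String, String>> read(Pipeline pipeline) {
    return pipeline.apply(
        KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092") // placeholder broker
            .withTopic("my-topic")               // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // Either cap makes the source bounded; once hit, the job finishes.
            .withMaxNumRecords(1_000_000)
            .withMaxReadTime(Duration.standardMinutes(10)));
  }
}
```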
0 votes
1 answer
214 views

How to catch an exception or ACK a Pub/Sub message in the Google Dataflow PubsubIO.write() method in the case of a non-existing Pub/Sub topic?

I have lots of customers. Each customer has its own namespace. I am using Google Dataflow to manage lots of messages. Each namespace might or might not have a target output topic for said Dataflow in ...
Virx
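As far as I know, PubsubIO.write() does not expose a per-message hook for a missing topic, so one workaround is to check whether each namespace's topic exists (e.g. at pipeline construction or in a routing step) and only publish to the ones that do. A sketch of the existence check with the google-cloud-pubsub admin client; the project and topic IDs are placeholders:

```java
import com.google.api.gax.rpc.NotFoundException;
import com.google.cloud.pubsub.v1.TopicAdminClient;
import com.google.pubsub.v1.TopicName;

public class TopicExists {
  /** Returns true if the topic exists, false if Pub/Sub reports NOT_FOUND. */
  static boolean topicExists(String projectId, String topicId) throws Exception {
    try (TopicAdminClient client = TopicAdminClient.create()) {
      client.getTopic(TopicName.of(projectId, topicId));
      return true;
    } catch (NotFoundException e) {
      return false;
    }
  }
}
```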
0 votes
1 answer
176 views

Cannot call getSchema when there is no schema when joining two PCollections

I hope all is well. I have two PCollections and I want to apply a leftOuterJoin, but when I do this apply I get this error and I don't understand why: java.lang.IllegalStateException: Cannot ...
DiR95
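That IllegalStateException usually means at least one of the PCollections has no schema attached, which the schema-aware Join transform requires. A sketch assuming both sides are PCollection<Row>; the field names and schemas here are invented for illustration:

```java
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.transforms.Join;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class SchemaJoin {
  static PCollection<Row> joinById(PCollection<Row> rawLeft, PCollection<Row> rawRight) {
    Schema leftSchema =
        Schema.builder().addStringField("id").addStringField("name").build();
    Schema rightSchema =
        Schema.builder().addStringField("id").addInt64Field("count").build();

    // Attaching a schema is what avoids "Cannot call getSchema when there is no schema".
    PCollection<Row> left = rawLeft.setRowSchema(leftSchema);
    PCollection<Row> right = rawRight.setRowSchema(rightSchema);

    // Schema-aware left outer join on the shared "id" field.
    return left.apply(Join.<Row, Row>leftOuterJoin(right).using("id"));
  }
}
```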
0 votes
1 answer
304 views

Apache Beam BigQueryIO (Java): io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The primary clustering keys are required to create upsert stream

I am using Apache Beam Java to read from one BigQuery table and write into another BigQuery table using applyRowMutations(), but it's not working. I have created the destination table with the ...
User10007
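The INVALID_ARGUMENT message appears to be about the destination table's definition: BigQuery CDC upserts need the table to declare a primary key and to be clustered on those key columns. A hedged sketch of creating such a table via a DDL statement run through the google-cloud-bigquery client; the table and column names are placeholders, and whether this matches the poster's exact setup is an assumption:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class CreateCdcTable {
  public static void main(String[] args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // The destination table declares an unenforced primary key and is
    // clustered by the same column, which is what the error asks for.
    String ddl =
        "CREATE TABLE `my_dataset.target` ("
            + "id STRING NOT NULL, value STRING, "
            + "PRIMARY KEY (id) NOT ENFORCED) "
            + "CLUSTER BY id";

    // Runs the DDL as a query job and waits for it to complete.
    bigquery.query(QueryJobConfiguration.newBuilder(ddl).build());
  }
}
```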
0 votes
0 answers
284 views

How to set allowed lateness for late data and discard all data after that lateness period in Java Apache Beam Dataflow windows?

I want a window of 1 minute with an allowed lateness of 2 minutes, and I want to discard all other data arriving after that time. I could set the allowed lateness, but the data that comes after ...
sg_rs
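Beam already drops elements that arrive later than the window's allowed lateness, so the configuration below is usually all that is needed for the "discard after lateness" part. A sketch with an assumed String element type and a simple late-firing trigger:

```java
import org.apache.beam.sdk.transforms.windowing.AfterPane;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class LatenessWindowing {
  /** 1-minute windows; late data within 2 minutes fires a late pane, anything later is dropped. */
  static PCollection<String> window(PCollection<String> input) {
    return input.apply(
        Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
            .triggering(
                AfterWatermark.pastEndOfWindow()
                    .withLateFirings(AfterPane.elementCountAtLeast(1)))
            .withAllowedLateness(Duration.standardMinutes(2))
            .discardingFiredPanes());
  }
}
```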
2 votes
1 answer
1k views

Apache Beam version upgrade fails the ETL Pipeline

I am currently using Apache Beam version 2.39.0 and it is showing me an error on Dataflow: "This version of the SDK is deprecated and will eventually no longer be supported." I am trying to ...
SRJ
1 vote
1 answer
341 views

Unit testing DoFn with ProcessContext in argument list

I am trying to test the following ParDo handler: public class DoFnHandlerFn extends DoFn<KV<Row, Row>, Row> { @ProcessElement public void processElement(@NotNull ProcessContext ctx, @...
Shlomi Elbaz
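Rather than constructing a ProcessContext by hand, the usual pattern is to run the DoFn inside a TestPipeline and assert on the output with PAssert. The DoFn below is a simplified stand-in (KV<String, String> to String), not the poster's DoFnHandlerFn, but the test structure is the same:

```java
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;

public class UppercaseFnTest {

  // Simplified stand-in for the DoFn under test; it still reads from
  // ProcessContext inside @ProcessElement, like the poster's handler.
  static class UppercaseFn extends DoFn<KV<String, String>, String> {
    @ProcessElement
    public void processElement(ProcessContext ctx) {
      ctx.output(ctx.element().getValue().toUpperCase());
    }
  }

  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void emitsUppercasedValues() {
    PCollection<String> output =
        pipeline
            .apply(Create.of(KV.of("k1", "foo"), KV.of("k2", "bar")))
            .apply(ParDo.of(new UppercaseFn()));

    PAssert.that(output).containsInAnyOrder("FOO", "BAR");
    pipeline.run().waitUntilFinish();
  }
}
```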
0 votes
1 answer
267 views

Can a Google Dataflow job process the data from a Google Spanner change stream record when it is stopped and then resumed?

I am currently working on a Dataflow job which picks data from a Spanner change stream and writes it into another Spanner table after some transformation, and it works fine. But my doubt is: if the ...
Nikhil
0 votes
1 answer
191 views

Dataflow Flex Template max size allowed

Hi, I am trying to find the maximum size allowed for a Dataflow run in Flex mode, and a way to check the size. I know the classic template has a limit of 10 MB per template. But searching the Google Cloud ...
Jhonathan Wolff
0 votes
1 answer
158 views

How to cancel a GCP Dataflow job programmatically using Beam and just the job ID

We have a GCP Dataflow project which requires us to cancel a running Dataflow job. All we have at the time of cancellation is the job ID. From other posts on Stack Overflow, I learned we can cancel a ...
ZZZ
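One way to cancel by job ID is the Dataflow REST API client (google-api-services-dataflow): update the job with a requested state of JOB_STATE_CANCELLED. A sketch with placeholder project and region values; the dependency setup is assumed, and this is one approach rather than the only one:

```java
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.gson.GsonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.Job;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;
import java.util.Collections;

public class CancelDataflowJob {
  public static void main(String[] args) throws Exception {
    String jobId = args[0]; // the only thing available at cancellation time

    GoogleCredentials credentials =
        GoogleCredentials.getApplicationDefault()
            .createScoped(
                Collections.singletonList("https://www.googleapis.com/auth/cloud-platform"));

    Dataflow dataflow =
        new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                GsonFactory.getDefaultInstance(),
                new HttpCredentialsAdapter(credentials))
            .setApplicationName("cancel-job")
            .build();

    // Cancelling is an update that sets the requested state; "my-project" and
    // "us-central1" are placeholders for the job's project and region.
    Job cancelRequest = new Job().setRequestedState("JOB_STATE_CANCELLED");
    dataflow
        .projects()
        .locations()
        .jobs()
        .update("my-project", "us-central1", jobId, cancelRequest)
        .execute();
  }
}
```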
0 votes
1 answer
86 views

How to properly use TemplatesServiceClient in google-cloud-dataflow

I'm fairly new to Google Cloud and even more so to Google Cloud Dataflow. I have created a Java application with the sole purpose of running a Dataflow job. The details are as follows: first I ...
Kristian Martinez
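A heavily hedged sketch of launching a classic template with the v1beta3 TemplatesServiceClient; the package, request fields, and parameter names here are written from memory of the google-cloud-dataflow client library and may need adjusting, and the GCS path, project, region, and template parameters are placeholders:

```java
import com.google.dataflow.v1beta3.LaunchTemplateParameters;
import com.google.dataflow.v1beta3.LaunchTemplateRequest;
import com.google.dataflow.v1beta3.LaunchTemplateResponse;
import com.google.dataflow.v1beta3.TemplatesServiceClient;

public class LaunchClassicTemplate {
  public static void main(String[] args) throws Exception {
    try (TemplatesServiceClient client = TemplatesServiceClient.create()) {
      LaunchTemplateRequest request =
          LaunchTemplateRequest.newBuilder()
              .setProjectId("my-project")                          // placeholder project
              .setLocation("us-central1")                          // placeholder region
              .setGcsPath("gs://my-bucket/templates/my-template")  // placeholder template path
              .setLaunchParameters(
                  LaunchTemplateParameters.newBuilder()
                      .setJobName("my-template-run")
                      .putParameters("inputFile", "gs://my-bucket/input.txt")
                      .build())
              .build();

      // Launches the template and prints the resulting job ID.
      LaunchTemplateResponse response = client.launchTemplate(request);
      System.out.println("Launched job: " + response.getJob().getId());
    }
  }
}
```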
0 votes
1 answer
359 views

Apache Beam GCP Dataflow Suggestion Implementation

I am running an Apache Beam pipeline on GCP Dataflow. The Dataflow pipeline suggests the following: A fusion break can be inserted after the following transforms to increase parallelism: ReadFromGCS/...
user3863788
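The usual way to act on that suggestion is to insert a fusion break, e.g. Reshuffle.viaRandomKey(), right after the step named in the hint so Dataflow can redistribute the elements before the downstream transforms. A minimal sketch with an assumed String element type:

```java
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;

public class FusionBreak {
  /** Reshuffle acts as a fusion break so downstream steps can parallelize independently. */
  static PCollection<String> breakFusion(PCollection<String> readFromGcs) {
    return readFromGcs.apply("BreakFusion", Reshuffle.viaRandomKey());
  }
}
```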
