All Questions

1 vote
1 answer
39 views

Early(?) Triggering of Flink's Event Time Windows and Non-Deterministic Results

I'm reading a small sample of data stored in Kafka, but instead of applying watermarks directly at the source, I do some processing on the data and then extract the event timestamps. I then apply event ...
K.M
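Early firing and non-deterministic results typically appear when the watermark's out-of-orderness bound is smaller than the actual disorder introduced by the processing done before timestamp extraction. A pure-Python sketch of the mechanism (not PyFlink; all names are illustrative) shows how the same events, reordered, change which records survive the watermark:

```python
# Pure-Python sketch of bounded-out-of-orderness watermarks driving a
# tumbling event-time window. All names and constants are illustrative.

WINDOW_SIZE = 10      # tumbling window of 10 time units
MAX_OUT_OF_ORDER = 2  # disorder bound the watermark generator assumes

def process(events):
    """Fire each window once the watermark passes its end; events at or
    below the watermark are dropped, so results depend on arrival order."""
    windows, fired, watermark = {}, {}, float("-inf")
    for ts in events:
        if ts <= watermark:
            continue  # late event: its window may already have fired
        start = (ts // WINDOW_SIZE) * WINDOW_SIZE
        if start in fired:
            continue  # window already emitted
        windows[start] = windows.get(start, 0) + 1
        watermark = max(watermark, ts - MAX_OUT_OF_ORDER)
        for s in list(windows):
            if s + WINDOW_SIZE <= watermark:  # window end <= watermark
                fired[s] = windows.pop(s)
    fired.update(windows)  # end of input: flush remaining windows
    return fired

# In-order input counts both events in window [0, 10):
print(process([1, 9, 15]))   # {0: 2, 10: 1}
# Reordered input: by the time 9 arrives the watermark is 13, so it is lost:
print(process([1, 15, 9]))   # {0: 1, 10: 1}
```

If processing between the source and the timestamp assigner can reorder records by more than the configured bound, the window for the early-arriving records fires before the stragglers show up, which is exactly the non-determinism described above.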
0 votes
0 answers
44 views

Failed to create stage bundle factory, Pyflink and Kafka error

I'm trying the Pyflink - Kafka connection. I'm using Python 3.11 on Pycharm, with the flink-sql-connector-kafka-3.3.0-1.20.jar and apache-flink 1.20. I'm running Kafka on Docker, and I've tested the ...
userloser
1 vote
0 answers
28 views

Watermark strategy passed to env.from_source not being used

Setup: Flink 1.19.1, Python 3.10, apache-flink 1.19.1 Python package, Java 11, Kafka Flink connector 3.4.0. I am creating a KafkaSource that's reading from a Kafka topic where messages have ...
Abhishek Bhrushundi
0 votes
0 answers
41 views

Flink partition consumption behavior with watermark alignment

I am running a Flink job with watermark alignment enabled and a max drift of 12 hours. My input is a Kafka stream with 128 partitions. When running the job with 3 task managers, each with 4 slots, I ...
Sabmit
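With watermark alignment, a partition stops being consumed once its watermark runs ahead of the group's minimum by more than the configured max drift, which directly shapes which of the 128 partitions are read at any moment. A pure-Python sketch of the pausing rule (illustrative; Flink's actual implementation works on source splits and Durations):

```python
# Illustrative sketch of Flink's watermark-alignment pausing rule:
# a partition is paused while its watermark exceeds the group minimum
# by more than the configured max drift. Not the Flink API.

MAX_DRIFT = 12  # same units as the watermarks; Flink uses a Duration

def paused_partitions(partition_watermarks, max_drift=MAX_DRIFT):
    """Return the partitions whose watermark exceeds min + max_drift."""
    low = min(partition_watermarks.values())
    return sorted(p for p, wm in partition_watermarks.items()
                  if wm > low + max_drift)

# Partition 2 lags far behind, so the fast partitions 0 and 1 are paused
# until partition 2 catches up:
print(paused_partitions({0: 100, 1: 95, 2: 80}))  # [0, 1]
# All partitions within the drift: nothing is paused.
print(paused_partitions({0: 85, 1: 86, 2: 80}))   # []
```

So with a large drift like 12 hours, consumption can look very uneven: a few slow partitions effectively gate the rest.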
0 votes
0 answers
35 views

The transaction.id created by Flink is wrong

I am using Flink 1.20 with kafka-client 3.4.0. I am using KafkaSink with exactly-once delivery and checkpoints. I set the transactionIdPrefix and noticed a problem at the beginning of using my ...
GreenA
0 votes
0 answers
29 views

Hudi Compaction is very slow via Apache Flink

I have written a pipeline that sinks data from Kafka to Hudi on S3. It works, but compaction is very slow. It is a batch job that runs every hour and sinks the last hour's data to ...
Vishal
0 votes
1 answer
56 views

Flink Connector vs Confluent Connector

What are the pros and cons of Confluent connectors vs. the Flink connector APIs? Are the Flink connector APIs supported on Confluent Flink? These would allow more customization but are not managed like the ...
Sindhu
0 votes
1 answer
52 views

Stream processing with flink using BroadcastStream is providing inconsistent results

// Define the incoming raw data stream sourced from a Kafka topic
DataStream<RawMessage> mainStream = env.addSource(...);
// Define the reference data stream sourced from a Kafka topic
DataStream<...
Hariharan Janakiraman
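Inconsistent results from a broadcast join are usually a race: main-stream elements processed before the broadcast (reference) state has been populated see no rules. The standard remedy is to buffer main elements in state until the reference data arrives, then replay them. A pure-Python sketch of that pattern (illustrative, not the Flink BroadcastProcessFunction API):

```python
# Pure-Python sketch (illustrative, not the Flink API) of the usual fix
# for a main stream racing ahead of a broadcast reference stream: buffer
# main elements until the reference data arrives, then replay them.

class BroadcastJoin:
    def __init__(self):
        self.rules = None   # broadcast state, initially absent
        self.buffer = []    # main-stream elements seen before the rules
        self.out = []

    def on_broadcast(self, rules):
        self.rules = rules
        for element in self.buffer:  # replay buffered elements in order
            self.out.append((element, self.rules.get(element, "no-match")))
        self.buffer.clear()

    def on_main(self, element):
        if self.rules is None:
            self.buffer.append(element)  # too early: hold it back
        else:
            self.out.append((element, self.rules.get(element, "no-match")))

j = BroadcastJoin()
j.on_main("a")                   # arrives before the reference data
j.on_broadcast({"a": 1, "b": 2})
j.on_main("b")
print(j.out)  # [('a', 1), ('b', 2)]
```

Without the buffer, "a" would be emitted as "no-match" or dropped depending on timing, which is exactly the run-to-run inconsistency described.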
1 vote
1 answer
58 views

Problem with Kafka committing offsets when stopping a Flink job (AT_LEAST_ONCE and EXACTLY_ONCE)

We are using Apache Flink with Kafka and the AT_LEAST_ONCE strategy. The following problem occurs: when stopping a job, Flink commits offsets for messages that have not yet been processed, resulting ...
Руслан Цегельников
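The safe behavior here is that offsets are committed only for records covered by a completed checkpoint; committing the raw read position at shutdown skips in-flight records on restart. A minimal pure-Python sketch of the checkpoint-gated commit (illustrative, not Flink's source implementation):

```python
# Illustrative sketch (not Flink's implementation) of checkpoint-gated
# offset committing: only progress captured by a completed checkpoint is
# committed, so stopping mid-checkpoint cannot skip unprocessed records.

class CheckpointingSource:
    def __init__(self):
        self.read_offset = 0   # how far we have read from the topic
        self.committed = 0     # offset the outside world may treat as done
        self.pending = None    # offset captured when a checkpoint starts

    def read(self, n):
        self.read_offset += n

    def checkpoint_start(self):
        self.pending = self.read_offset

    def checkpoint_complete(self):
        self.committed = self.pending  # commit only checkpointed progress

s = CheckpointingSource()
s.read(10)
s.checkpoint_start()
s.read(5)                 # these 5 records are in flight, not checkpointed
s.checkpoint_complete()
print(s.committed)        # 10, not 15: in-flight records stay uncommitted
```

If a stop commits `read_offset` (15) instead of the checkpointed 10, the 5 in-flight records are lost on restart, matching the problem reported for both AT_LEAST_ONCE and EXACTLY_ONCE.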
-1 votes
1 answer
69 views

What data is stored in flink checkpoints?

I have a case where I need to handle restarting a Flink job. I need to use checkpointing and use its metadata (the state of the KafkaSource input) for processing. Currently, checkpointing automatically uses the metadata for recovery, but I want to ...
ndycuong
0 votes
1 answer
62 views

Apache Flink and State Store from Kafka Data Source

I'm a bit confused about how Apache Flink handles offset commits when Kafka is the data source. I have my consumer group configured and consuming messages normally, and committing them (because I ...
Ronaldo Lanhellas
0 votes
1 answer
114 views

'org.apache.flink.formats.avro.AvroDeserializationSchema org.apache.flink.formats.avro.AvroDeserializationSchema.forGeneric(org.apache.avro.Schema)'

I am stuck with the same problem here: getting Flink to read from Kafka with Avro and Schema Registry. I can see logs on the Schema Registry of Flink trying to read from the server, but I ...
Rajat Sinha
0 votes
1 answer
40 views

Flink job (64 tasks) routes records to 1 task

I have an AWS managed Flink job writing to Kafka. I can't figure out why most of my records get routed to a single task despite a parallelism of 64. The downstream Kafka topic has 16 ...
nick_rinaldi
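When nearly all records land on one task, the usual culprit is key skew (or a constant key) in the partitioning step: each record's task is determined by the hash of its key modulo the downstream parallelism. A pure-Python sketch of the effect (illustrative; Flink actually uses murmur hashing and key groups, but the skew behaves the same):

```python
import zlib
from collections import Counter

# Sketch of keyBy-style routing (illustrative; Flink uses murmur hashing
# and key groups, but the skew effect is identical): a record's task is
# hash(key) % parallelism, so one hot key sends everything to one task.

PARALLELISM = 64

def task_for(key):
    return zlib.crc32(key.encode()) % PARALLELISM

def distribution(keys):
    return Counter(task_for(k) for k in keys)

# One hot key: all 1000 records land on a single task.
hot = distribution(["device-42"] * 1000)
print(len(hot))          # 1

# Many distinct keys: records spread across many of the 64 tasks.
spread = distribution(f"device-{i}" for i in range(1000))
print(len(spread) > 30)  # True
```

Checking the key distribution of the input (and whether a custom partitioner or a low-cardinality key is in play) is usually the first diagnostic step for this symptom.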
0 votes
1 answer
241 views

Data Loss in Flink Job with Iceberg Sink After Restart: How to Ensure Consistent Writes?

I am running a Flink job that reads data from Kafka, processes it into a Flink Row object, and writes it to an Iceberg Sink. To deploy new code changes, I restart the job from the latest savepoint, ...
sbrk
1 vote
0 answers
49 views

Flink Kafka connector timeout error: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listTopics

We are using the Flink Kafka connector (v3.3.0-1.20) with Flink 1.20. We have two jobs running in our Flink k8s cluster; one of them is able to connect to Kafka, but the other fails with ...
Logan