All Questions
Tagged with apache-flink, apache-kafka (811 questions)
1 vote · 1 answer · 39 views
Early(?) Triggering of Flink's Event Time Windows and Non-Deterministic Results
I'm reading a small sample of data stored in Kafka, but instead of applying watermarks directly at the source, I do some processing on the data and then extract the event timestamps.
I then apply event ...
0 votes · 0 answers · 44 views
Failed to create stage bundle factory, PyFlink and Kafka error
I'm trying the PyFlink - Kafka connection.
I'm using Python 3.11 on PyCharm, with flink-sql-connector-kafka-3.3.0-1.20.jar and apache-flink 1.20. I'm running Kafka on Docker, and I've tested the ...
1 vote · 0 answers · 28 views
Watermark strategy passed to env.from_source not being used
Setup:
Flink 1.19.1
Python 3.10
Apache Flink 1.19.1 Python package
Java 11
Kafka Flink connector 3.4.0
I am creating a KafkaSource that's reading from a Kafka topic where messages have ...
0 votes · 0 answers · 41 views
Flink partition consumption behavior with watermark alignment
I am running a Flink job with watermark alignment enabled and a max drift of 12 hours.
My input is a Kafka stream with 128 partitions.
When running the job with 3 task managers, each with 4 slots, I ...
0 votes · 0 answers · 35 views
The transaction.id created by Flink is wrong
I am using Flink 1.20 with kafka-client 3.4.0.
I am using KafkaSink with Exactly Once and checkpoints.
I set the transactionIdPrefix and noticed there is a problem at the beginning of using my ...
0 votes · 0 answers · 29 views
Hudi Compaction is very slow via Apache Flink
I have written a pipeline that sinks data from Kafka to Hudi on S3. It works, but compaction is very slow.
It is a batch job that runs every hour and sinks the last hour's data to ...
0 votes · 1 answer · 56 views
Flink Connector vs Confluent Connector
What are the pros and cons of Confluent connectors vs the Flink connector APIs?
Are the Flink connector APIs supported on Confluent Flink? These would allow more customization but are not managed like the ...
0 votes · 1 answer · 52 views
Stream processing with Flink using BroadcastStream is providing inconsistent results
// Define the incoming raw data stream sourcing from a Kafka topic
DataStream<RawMessage> mainStream = env.addSource(...);
// Define the reference data stream sourcing from a Kafka topic
DataStream<...
1 vote · 1 answer · 58 views
Problem with Kafka committing offsets when stopping a Flink job (AT_LEAST_ONCE and EXACTLY_ONCE)
We are using Apache Flink with Kafka and the AT_LEAST_ONCE strategy. The following problem occurs: when stopping a job, Flink commits offsets for messages that have not yet been processed, resulting ...
-1 votes · 1 answer · 69 views
What data is stored in Flink checkpoints?
I have a case where I need to handle restarting a Flink job. I need to use checkpointing and use its metadata (the state of the KafkaSource input) for processing. Currently, checkpointing automatically uses the metadata for recovery, but I want ...
0 votes · 1 answer · 62 views
Apache Flink and State Store from Kafka Data Source
I'm a bit confused about how Apache Flink handles offset commits when Kafka is the data source.
I have my consumer group configured and consuming messages normally, and committing it (because I ...
0 votes · 1 answer · 114 views
'org.apache.flink.formats.avro.AvroDeserializationSchema org.apache.flink.formats.avro.AvroDeserializationSchema.forGeneric(org.apache.avro.Schema)'
Hey, I'm stuck with the same problem here: getting Flink to read from Kafka with Avro and Schema Registry. I can see in the schema-registry logs that Flink is trying to read from the server, but I ...
0 votes · 1 answer · 40 views
Flink job (64 tasks) routes records to 1 task
I have an AWS Managed Flink job writing to Kafka. I can't figure out why most of my records get routed to a single task despite a parallelism of 64.
The downstream Kafka topic has 16 ...
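One common cause of the single-task routing described above (not confirmed by the truncated question) is key skew: when records are partitioned by a hash of the key and the keys have very low cardinality, nearly everything lands on one subtask. A minimal sketch of that effect in plain Python, independent of Flink's actual key-group hashing; the key names are made up for illustration:

```python
# Illustrates how low-cardinality keys collapse onto few partitions/subtasks
# under generic "hash(key) % n" routing. This is a toy model, not Flink's
# exact key-group hashing.
from collections import Counter

def route(key: str, num_tasks: int) -> int:
    """Toy router: deterministic polynomial hash of the key, modulo task count."""
    # Python's built-in hash() is salted per process, so roll a stable one.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0x7FFFFFFF
    return h % num_tasks

# 10,000 records sharing one key -> every record routes to a single task.
skewed = Counter(route("tenant-42", 64) for _ in range(10_000))
print(len(skewed))   # 1 task receives all 10,000 records

# The same volume with diverse keys spreads across (almost) all 64 tasks.
diverse = Counter(route(f"tenant-{i}", 64) for i in range(10_000))
print(len(diverse))  # most of the 64 tasks receive records
```

The same reasoning applies downstream: a Kafka topic with 16 partitions keyed on a constant message key concentrates writes in one partition regardless of the producer's parallelism.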
0 votes · 1 answer · 241 views
Data Loss in Flink Job with Iceberg Sink After Restart: How to Ensure Consistent Writes?
I am running a Flink job that reads data from Kafka, processes it into a Flink Row object, and writes it to an Iceberg sink. To deploy new code changes, I restart the job from the latest savepoint, ...
1 vote · 0 answers · 49 views
Flink Kafka connector timeout error: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listTopics
We are using the Flink Kafka connector (v3.3.0-1.20) with Flink 1.20. We have two jobs running in our Flink Kubernetes cluster; one of them is able to connect to Kafka, but the other fails with ...