I'm very confused by the concept of ParallelizationFactor.
My understanding
https://stackoverflow.com/a/57534322/13000229
In the past, one KDS shard could send data to only one Lambda instance/invocation at a time. Multiple Lambda instances could not consume data from the same KDS shard concurrently.
https://aws.amazon.com/blogs/compute/new-aws-lambda-scaling-controls-for-kinesis-and-dynamodb-event-sources/
In Nov 2019, a new parameter, ParallelizationFactor (Concurrent batches per shard), came out.
The default factor of one exhibits normal behavior. A factor of two allows up to 200 concurrent invocations on 100 Kinesis data shards.
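To make the arithmetic in that quote concrete for myself, here is a minimal sketch. The function name is my own (hypothetical), and I am assuming the upper bound on concurrent invocations is simply shards multiplied by the factor, with the factor limited to the documented range of 1 to 10.

```python
def max_concurrent_invocations(shard_count, parallelization_factor=1):
    """Upper bound on concurrent Lambda invocations for a Kinesis event source.

    Hypothetical helper: assumes the bound is shards * ParallelizationFactor.
    """
    # ParallelizationFactor accepts values from 1 to 10.
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return shard_count * parallelization_factor


# Default factor of one: one concurrent invocation per shard.
print(max_concurrent_invocations(100))
# Factor of two on 100 shards, as in the blog post.
print(max_concurrent_invocations(100, 2))
```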
Questions
- By using ParallelizationFactor, can more than one Lambda instance get different data from the same KDS shard concurrently?
For example, the shard has data d1, d2, d3, d4, d5 and d6, and we assume BatchSize = 2 and ParallelizationFactor = 2. Lambda instance A can consume d1 and d2, while Lambda instance B can consume d3 and d4 at the same time. Then once Lambda instance A finishes the first batch, it starts processing d5 and d6, and so on.
- If my understanding in Question 1 is correct, what might be sacrificed? (e.g. the order within the same shard, or one piece of data being processed more than once)
- If my understanding in Question 1 is not correct, how will data in KDS shards be processed by Lambda concurrently?
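To make the scenario in the first question concrete, here is a small sketch of the behaviour I am assuming. This is only a model of my mental picture, not how Lambda is documented to work: the function name is hypothetical, and grouping batches into fixed "rounds" is a simplification of "whichever instance finishes first takes the next batch".

```python
def assumed_batches(records, batch_size, parallelization_factor):
    """Model of my ASSUMED behaviour, not the real Lambda poller.

    Splits one shard's records into batches of batch_size, then groups
    the batches into rounds of up to parallelization_factor batches
    that would be processed at the same time.
    """
    batches = [records[i:i + batch_size]
               for i in range(0, len(records), batch_size)]
    return [batches[i:i + parallelization_factor]
            for i in range(0, len(batches), parallelization_factor)]


shard = ["d1", "d2", "d3", "d4", "d5", "d6"]
rounds = assumed_batches(shard, batch_size=2, parallelization_factor=2)
for number, concurrent in enumerate(rounds, start=1):
    print(f"round {number}: {concurrent}")
```

Under this assumption d3 and d4 could finish before d1 and d2, which is exactly why I am asking about ordering within the shard.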