WBWI supports writing timestamps as you go#14084
Conversation
|
@nickbrekhus has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85605627. |
Summary: Adds support to write timestamps as-you-go for UDT-enabled CFs within a WBWI (WriteBatchWithIndex). When a WBWI is instantiated with require_timestamps=true, mutations *must* supply a timestamp iff the CF is UDT-enabled. Callers *should not* call UpdateTimestamps when `required_timestamps=true` as this will overwrite the timestamps supplied with mutations. When require_timestamps = false, the previous behavior is maintained. In other words timestamps must never be supplied with mutations and UpdateTimestamps must be called before writing the batch to DB. Differential Revision: D85605627
7bf995c to
dd77ee3
Compare
Summary: Adds support to write timestamps as-you-go for UDT-enabled CFs within a WBWI (WriteBatchWithIndex). When a WBWI is instantiated with require_timestamps=true, mutations *must* supply a timestamp iff the CF is UDT-enabled. Callers *should not* call UpdateTimestamps when `required_timestamps=true` as this will overwrite the timestamps supplied with mutations. When require_timestamps = false, the previous behavior is maintained. In other words timestamps must never be supplied with mutations and UpdateTimestamps must be called before writing the batch to DB. Differential Revision: D85605627
Summary: Adds support to write timestamps as-you-go for UDT-enabled CFs within a WBWI (WriteBatchWithIndex). When a WBWI is instantiated with require_timestamps=true, mutations *must* supply a timestamp iff the CF is UDT-enabled. Callers *should not* call UpdateTimestamps when `required_timestamps=true` as this will overwrite the timestamps supplied with mutations. When require_timestamps = false, the previous behavior is maintained. In other words timestamps must never be supplied with mutations and UpdateTimestamps must be called before writing the batch to DB. Differential Revision: D85605627
2ac1238 to
dd77ee3
Compare
Summary: Adds support to write timestamps as-you-go for UDT-enabled CFs within a WBWI (WriteBatchWithIndex). When a WBWI is instantiated with require_timestamps=true, mutations *must* supply a timestamp iff the CF is UDT-enabled. Callers *should not* call UpdateTimestamps when `required_timestamps=true` as this will overwrite the timestamps supplied with mutations. When require_timestamps = false, the previous behavior is maintained. In other words timestamps must never be supplied with mutations and UpdateTimestamps must be called before writing the batch to DB. Differential Revision: D85605627
dd77ee3 to
18088c5
Compare
Summary: Adds support to write timestamps as-you-go for UDT-enabled CFs within a WBWI (WriteBatchWithIndex). When a WBWI is instantiated with require_timestamps=true, mutations *must* supply a timestamp iff the CF is UDT-enabled. Callers *should not* call UpdateTimestamps when `required_timestamps=true` as this will overwrite the timestamps supplied with mutations. When require_timestamps = false, the previous behavior is maintained. In other words timestamps must never be supplied with mutations and UpdateTimestamps must be called before writing the batch to DB. Differential Revision: D85605627
18088c5 to
3b90dcf
Compare
Summary: When WBWI, UDT CF, overwrite and rollback are used together, WBWI index is corrupted and will contain duplicate key entries. The problem, which is fixed by this diff, is that the record reader does not strip the timestamp packed with the key. This results in corruption of the index during rebuild, since overwrite deduplication uses the slice read from the batch. The indexed key is still correct, as AddOrUpdate re-reads the key from the offset, resulting in the key being stripped. Differential Revision: D85616849
Summary: Adds support to write timestamps as-you-go for UDT-enabled CFs within a WBWI (WriteBatchWithIndex). When a WBWI is instantiated with require_timestamps=true, mutations *must* supply a timestamp iff the CF is UDT-enabled. Callers *should not* call UpdateTimestamps when `required_timestamps=true` as this will overwrite the timestamps supplied with mutations. When require_timestamps = false, the previous behavior is maintained. In other words timestamps must never be supplied with mutations and UpdateTimestamps must be called before writing the batch to DB. Differential Revision: D85605627
3b90dcf to
89e4e52
Compare
| if (compare_result < 0) { | ||
| return {Status::OK(), false}; | ||
| } | ||
| return {Status::InvalidArgument("cannot write below max ts for key"), |
There was a problem hiding this comment.
Should be "below last ts for key"
Are we introducing the capability to write multiple entries in a single WB with the same key but different timestamps? It might seem obvious that the various timestamped versions of each key are all committed, but not necessarily when they are coalesced for the purposes of WBWI reads. That should be clarified.
There was a problem hiding this comment.
Yes, in general there can be multiple timestamps within a batch when this feature is enabled, including multiple timestamps for a given key. How do you suggest clarifying that? I tried to make at least the read behavior clear in the arg comment.
| // enabled. | ||
| // | ||
| // Note that regardless of whether require_timestamps is | ||
| // true or false, read semantics are RYOW. In other words, |
There was a problem hiding this comment.
I don't agree that RYOW implies ignoring unpersisted/uncommitted timestamps. RYOW means you can see your own writes, not that you get them when you don't ask for them.
There was a problem hiding this comment.
I see a few options here outside of the one I've chosen. I think implementation wise there isn't a major difference in complexity so long as we are OK with some extra cost when doing point reads below max(ts in batch). To avoid the extra cost the index needs to be (key, ts) instead of (key).
- the current behavior (described in comment)
- having two notions of read_timestamp, one that controls visibility of reads inside the batch and another that controls visibility of reads within the DB.
- ReadOptions::read_timestamp controls visibility of both.
I didn't like 2 because it is in my view a bit confusing. Further, my particular use case had no need for this level of control. (In practice the batch read_timestamp would always be max(ts for key, ts in batch)
3 I think is fairly intuitive but changes the snapshot position in the database as well. In principle that could cause some records in the DB to become visible that may not be intended. For our particular consistency model any records above the read snapshot that are actually read would be a bug in the system, so I think that's OK. The other minor gripe I have with this is that again in practice read_timestamp would always be max(ts for key, ts in batch).
There was a problem hiding this comment.
Sorry I didn't see this sooner. My open GitHub page apparently wasn't auto-refreshing like it's supposed to.
Isn't using max(ts for key, ts in batch) equivalent to no ReadOptions::timestamp (using the UDT assumption of timestamp monotonicity by key)? So wouldn't it suffice to get that working first without committing to a semantic for timestamp with WBWI? @nickbrekhus
There was a problem hiding this comment.
I was under the impression it wouldn't be possible to achieve snapshot isolation if timestamp is unconstrained and concurrent writes are possible, but I wasn't considering using read_options.snapshot. Assuming specifying both read_timestamp and snapshot is permissible (the documentation is unclear on this but I don't see why it wouldn't work) that gives user pretty much full control without introducing any new options.
Definitely an improvement over the approach taken in this diff.
snapshot isolation, ryow: ReadOptions{.snapshot = snapshot, .read_timestamp = nullptr}
read commited, ryow: ReadOptions{.snapshot = nullptr, .read_timestamp = nullptr}
There was a problem hiding this comment.
One flaw in the reasoning in my comment: As far as I can tell, you cannot select the position of a snapshot. It is always as-of the latest write flushed to the database. I don't think that gives users enough flexibility to do useful things with WBWI + UDT although it would be adequate for my current use cases.
For example, if you were implementing multi shard transactions with snapshot isolation (si), you would want to select a basis for the transaction that is the same for all shards participating. This is not likely to be the latest write to the database for at least some shards. So in this scenario specifying a snapshot does not work.
Using solely timestamp to control visibility doesn't work either (that either causes you to lose RYOW or get read committed semantics), so you are kind of stuck and would need to use UpdateTimestamps().
That said, this is not the scenario I want to add this feature for and maybe it's OK for that kind of use case to use UpdateTimestamps(), as I don't see a lot of utility in allowing multiple timestamps when the batches logically represent a single transaction. In our case they represent multiple transactions that we batch to amortize various overheads. We still need read-your-write semantics during this batching because various aspects of the system rely on it during batch application.
On the other hand, if we just continue to ignore timestamp visibility evaluation in write batches the multi-shard si scenario works. The other fix I can see for this is to allow specifying a batch_read_timestamp but I do have some concerns about the ergonomics of such an interface and I don't have any clear understanding of the scenarios that would benefit.
| const ReadableWriteBatch* write_batch = delta_iterator_->GetWriteBatch(); | ||
| assert(write_batch); | ||
|
|
||
| Slice raw_key, value_ignore, blob_ignore, xid_ignore; |
There was a problem hiding this comment.
"raw key" is not recognized, specific terminology. I guess it's a user key with timestamp.
| size_t ts_sz = ucmp->timestamp_size(); | ||
| if (ts_sz > 0) { | ||
| ret.key = StripTimestampFromUserKey(ret.key, ts_sz); | ||
| Slice rawKey = ret.key; |
There was a problem hiding this comment.
will fix naming to snake_case too.
| bool current_at_base_; | ||
| bool equal_keys_; | ||
| bool allow_unprepared_value_; | ||
| // if set, the iterator will return Entrys with timestamps filled in. |
There was a problem hiding this comment.
"Fill timestamps" is a bad name for how to read something. It implies modification.
Also, "will return Entrys with timestamps filled in" - does that mean only entries that already have timestamps or all entries but each with an associated timestamp?
Also, the data model with UDT includes timestamps. Doing something different should not be considered a default or normalized interpretation of that data model. And 'false' are considered default or normal values for bools. How about "ignore_timestamps"?
There was a problem hiding this comment.
What I'm trying to do is preserve the existing behavior when require_timestamps is not set on the WBWI. Namely, the iterator would return Slice() for the timestamp of batched keys in all CFs previously. The iterator itself cannot tell, so some sort of flag or sentinel is needed. The batch entry itself has a zero-fillled slice for timestamp. But It could be that the zero-filled slice is a valid timestamp.
There was a problem hiding this comment.
ignore_timestamps sounds good to me.
|
|
||
| Slice raw_key, value_ignore, blob_ignore, xid_ignore; | ||
| WriteType type_ignore; | ||
| Status s = write_batch->GetEntryFromDataOffset( |
There was a problem hiding this comment.
This function looks expensive overall. Why's it not simpler (like key())?
Summary:
Adds support to write timestamps as-you-go for UDT-enabled CFs within a WBWI (WriteBatchWithIndex). When a WBWI is instantiated with require_timestamps=true, mutations must supply a timestamp iff the CF is UDT-enabled. Callers should not call UpdateTimestamps when
required_timestamps=trueas this will overwrite the timestamps supplied with mutations.When require_timestamps = false, the previous behavior is maintained. In other words timestamps must never be supplied with mutations and UpdateTimestamps must be called before writing the batch to DB.
Differential Revision: D85605627