Augment cf_paths: add path placement strategy and dynamic path choosing logic #5120
tiden0614 wants to merge 52 commits into
Apologies up front for the large bundle of changes. We've been buffering quite a few tweaks for a while. I expect this to get better with our future contributions.
Thank you for your contribution! We still haven't had the bandwidth to look into this giant PR, but it's on our list.
@siying Thanks! This is an important feature for us, so we'd really like it to get merged. I noticed that my change broke 2 universal compaction tests and I'm working on fixing them (I subtly changed how universal compaction calculates the size of compaction files).
@siying Hey, I have updated the PR and it is now in reasonably good shape for review. The previously broken test cases have been fixed. Sorry it took me so long to update the PR. I would appreciate it if you could allocate some bandwidth to take a look at this change.
ping @siying This PR hasn't been updated until recently. I'm pinging the team to make sure it has visibility. Thanks!
siying left a comment
Thank you for working on this complicated PR. It's great! The basic approach looks good to me. I left some comments.
Do we have to waste these 8 bytes? Can you explain why it is absolutely needed?
The motivation for isolating path_id into its own integer is that the current way of packing the path id and file number allows only 4 distinct path ids.
I didn't realize that FileDescriptor can leave a big memory footprint, so I took the simple approach of giving the path id its own integer to avoid the casting.
Glad you asked. I made a change to give 2 more bits to the path id, so the limit now goes up to 16 distinct path ids, which should be sufficient for the majority of cases. However, it does shrink the upper limit of file numbers quite a bit. That limit is still very large, so I'm assuming it should be fine.
Let me know if this works.
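For readers following the thread, here is a minimal sketch of the packing scheme being discussed. It assumes a 64-bit packed value whose top 4 bits hold the path id (the earlier 2 bits plus the 2 added bits) and whose low 60 bits hold the file number. The constant and function names are illustrative and are not taken from the PR's diff.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (illustrative only): top 4 bits = path id, low 60 bits = file number.
constexpr int kPathIdBits = 4;
constexpr int kFileNumberBits = 64 - kPathIdBits;
constexpr uint64_t kFileNumberMask = (uint64_t{1} << kFileNumberBits) - 1;
constexpr uint32_t kMaxPathId = (1u << kPathIdBits) - 1;  // 15, i.e. 16 distinct ids

// Combines a file number and a path id into a single 64-bit value.
inline uint64_t PackFileNumberAndPathId(uint64_t number, uint32_t path_id) {
  assert(number <= kFileNumberMask);  // file number must fit in 60 bits
  assert(path_id <= kMaxPathId);      // path id must fit in 4 bits
  return number | (static_cast<uint64_t>(path_id) << kFileNumberBits);
}

// Extracts the file number (low 60 bits).
inline uint64_t UnpackFileNumber(uint64_t packed) {
  return packed & kFileNumberMask;
}

// Extracts the path id (top 4 bits).
inline uint32_t UnpackPathId(uint64_t packed) {
  return static_cast<uint32_t>(packed >> kFileNumberBits);
}
```

Under this layout the descriptor stays at 8 bytes for both fields, at the cost of capping file numbers at 2^60 - 1.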
…rg list Since some c++ compilers don't support initializing structs using named args
…xiting the if branch
@siying This PR has been updated and is ready for review. Thanks for allocating bandwidth for this change. I addressed the comments from the last revision.
ping
@siying We would appreciate it a lot if the team could allocate some cycles to review this change. We would like this feature to be merged into rocksdb so that we can continue to use it at a larger scale. An update would be nice; even a rough estimate of when we can get some traction would make us happy.
ping
@siying ping
ping
ping
Notes
We are prototyping the use of rocksdb as our next-gen storage engine. One problem we have encountered is that rocksdb doesn't provide a good way to utilize multiple disk mounts evenly on our hosts. We didn't really like the RAID idea mentioned in the rocksdb FAQ: our current prod software runs on hosts with multiple equal-sized disk mounts, and using RAID would require a risky disk setup before the data migration.
This change introduces a dynamic path choosing class (`DbPathSupplier`) that is used when creating output files. It is built on top of the existing cf_paths feature. The original behavior (gradually moving cold data towards the end of the paths list) is kept as the default path placement strategy. We are introducing a new random placement strategy that lets us distribute sst files evenly across our disks.
Major changes
- … (`DbPathSupplier`) to provide dynamic db_path picking logic to replace the original fixed path_id logic
- `db_path_supplier_factory` option in `DBOptions`
- `FileDescriptor` struct: removed 2 higher bits from the `packed_number_and_path_id` that used to represent file_num. This allows us to increase the limit of `path_id` from `3` to `15`
- … (`DbPathSupplier`) for every single output file.
Testing
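As a rough illustration of the random placement strategy described in the Notes, the sketch below picks a path id uniformly at random for each output file. It is a hypothetical stand-in: the class name `RandomPathSupplier`, its methods, and the seed parameter are assumptions made for this example and do not reflect the PR's actual `DbPathSupplier` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a path supplier; not the PR's actual interface.
class RandomPathSupplier {
 public:
  RandomPathSupplier(std::vector<std::string> paths, uint64_t seed)
      : paths_(std::move(paths)), rng_(seed) {
    assert(!paths_.empty());  // at least one path is required
  }

  // Picks the path id for the next output file uniformly at random.
  uint32_t NextPathId() {
    std::uniform_int_distribution<std::size_t> dist(0, paths_.size() - 1);
    return static_cast<uint32_t>(dist(rng_));
  }

  // Returns the directory that corresponds to a path id.
  const std::string& PathFor(uint32_t path_id) const {
    return paths_.at(path_id);
  }

 private:
  std::vector<std::string> paths_;
  std::mt19937_64 rng_;
};
```

A uniform pick balances the number of files per path only statistically, so disk usage evens out over many files of similar size rather than being guaranteed for any single flush or compaction.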