Skip to content

[Remote Compaction] Introduce kAborted Status#13438

Closed
jaykorean wants to merge 1 commit into
facebook:mainfrom
jaykorean:remote_compaction_aborted_status
Closed

[Remote Compaction] Introduce kAborted Status#13438
jaykorean wants to merge 1 commit into
facebook:mainfrom
jaykorean:remote_compaction_aborted_status

Conversation

@jaykorean

@jaykorean jaykorean commented Mar 5, 2025

Copy link
Copy Markdown
Contributor

Summary

If compaction job needs to be aborted inside Schedule() or Wait() today (e.g. Primary host is shutting down), the only two options are the following

  • Handle it as failure by returning CompactionServiceJobStatus::kFailure
  • Return CompactionServiceJobStatus::kUseLocal and let the compaction move on locally and eventually succeed or fail depending on the timing

In this PR, we are introducing a new status, CompactionServiceJobStatus::kAborted, so that the implementation of Schedule() and Wait() can return it. Just like how CompactionServiceJobStatus::kFailure is handled, compaction will not move on and fail, but the status will be returned as Status::Aborted() instead of Status::Incomplete()

Test Plan

Unit Test added

 ./compaction_service_test --gtest_filter="*CompactionServiceTest.AbortedWhileWait*"
@jaykorean jaykorean requested review from anand1976 and hx235 March 5, 2025 21:18
@jaykorean jaykorean marked this pull request as ready for review March 5, 2025 21:19

@hx235 hx235 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - can we have a little more context on this though the code looks fine in general?

Are we planning to do something different between kAborted and kFailure in rocksdb and/or remote compaction side? For the client (i.e, rocksdb that is looking at the result from Schedule()/Wait()..), the error kAborted is not indicating anything different from kFailure in practice right now IMO. Maybe some comments on the API that can return kAborted is needed.

  • Is this usually an error allowing re-try? For retry we have Status::kTryAgain().

@jaykorean

If compaction job needs to be aborted inside Schedule() or Wait() today (e.g. Primary host is shutting down)

Shouldn't it be the "secondary host" is shutting down if the status is returned from Schedule() or Wait()?? Ah i see this is for the case where primary host shutting down will also abort the remote compaction

@jaykorean

Copy link
Copy Markdown
Contributor Author

@hx235

The context requires some Meta internal implementation info. Let me sync with you offline.

@hx235 hx235 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking to the author it's now clearer to me why kAborted is needed though from RocksDB point of view currently we don't differentiate kAborted from kFailure

@jaykorean jaykorean force-pushed the remote_compaction_aborted_status branch from f4cfdbf to b64cf5b Compare March 6, 2025 02:06
@facebook-github-bot

Copy link
Copy Markdown
Contributor

@jaykorean has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot

Copy link
Copy Markdown
Contributor

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

Copy link
Copy Markdown
Contributor

@jaykorean merged this pull request in 68b2d94.

anand1976 pushed a commit that referenced this pull request Mar 6, 2025
Summary:
If compaction job needs to be aborted inside `Schedule()` or `Wait()` today (e.g. Primary host is shutting down), the only two options are the following
- Handle it as failure by returning `CompactionServiceJobStatus::kFailure`
- Return `CompactionServiceJobStatus::kUseLocal` and let the compaction move on locally and eventually succeed or fail depending on the timing

In this PR, we are introducing a new status, `CompactionServiceJobStatus::kAborted`,  so that the implementation of `Schedule()` and `Wait()` can return it. Just like how `CompactionServiceJobStatus::kFailure` is handled, compaction will not move on and fail, but the status will be returned as `Status::Aborted()` instead of `Status::Incomplete()`

Pull Request resolved: #13438

Test Plan:
Unit Test added
```
 ./compaction_service_test --gtest_filter="*CompactionServiceTest.AbortedWhileWait*"
```

Reviewed By: anand1976, hx235

Differential Revision: D70655355

Pulled By: jaykorean

fbshipit-source-id: 22614ce9c7455cda649b15465625edc93978fe11
@jaykorean jaykorean deleted the remote_compaction_aborted_status branch March 7, 2025 16:21
ybtsdst pushed a commit to ybtsdst/rocksdb that referenced this pull request Apr 27, 2025
Summary:
If compaction job needs to be aborted inside `Schedule()` or `Wait()` today (e.g. Primary host is shutting down), the only two options are the following
- Handle it as failure by returning `CompactionServiceJobStatus::kFailure`
- Return `CompactionServiceJobStatus::kUseLocal` and let the compaction move on locally and eventually succeed or fail depending on the timing

In this PR, we are introducing a new status, `CompactionServiceJobStatus::kAborted`,  so that the implementation of `Schedule()` and `Wait()` can return it. Just like how `CompactionServiceJobStatus::kFailure` is handled, compaction will not move on and fail, but the status will be returned as `Status::Aborted()` instead of `Status::Incomplete()`

Pull Request resolved: facebook#13438

Test Plan:
Unit Test added
```
 ./compaction_service_test --gtest_filter="*CompactionServiceTest.AbortedWhileWait*"
```

Reviewed By: anand1976, hx235

Differential Revision: D70655355

Pulled By: jaykorean

fbshipit-source-id: 22614ce9c7455cda649b15465625edc93978fe11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants