Unify LZ4 and LZ4HC compression levels#14819
Conversation
Summary: kLZ4Compression and kLZ4HCCompression share the same on-disk format and decompressor, but historically kLZ4Compression only honored negative (acceleration) levels while kLZ4HCCompression only honored positive levels. This unifies them so `compression_opts.level` alone selects the variant: level <= 0 uses LZ4 fast (acceleration = -level) and level >= 1 uses LZ4HC (1..12), regardless of which of the two types is configured. The configured type now only determines the default compression level (LZ4: acceleration 1, equivalent to level -1; LZ4HC: level 9). For code simplicity, the recorded per-block type comes from the compression type derived from the level, which could differ from the configured type. To preserve the originally configured choice for debugging/tracking, it is recorded as a `_type=<decimal>` pseudo-option in the rocksdb.compression_options SST table property. Out-of-range non-default levels are clamped to the nearest effective value (LZ4 acceleration capped at 65537, which also avoids signed overflow negating INT_MIN; LZ4HC level capped at 12). The cost-aware (auto-tune) compressor's LZ4 level grid is changed to negative accelerations so it actually exercises fast LZ4 (positive levels now route to LZ4HC). Test Plan: New unit tests in compression_test.cc: - UnifiedLZ4LZ4HCLevels: for a representative set of non-default levels, both configured types produce identical output and the same recorded type (selected by the level), levels that clamp to the same effective parameter compress identically, and each round-trips; plus per-type default-level behavior. - ConfiguredCompressionTypeRecordedInProperties: the `_type=` pseudo- option appears in the SST table property for each configured type.
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D107536580. |
✅ clang-tidy: No findings on changed linesCompleted in 217.7s. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit a19c7a1 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit a19c7a1 SummaryWell-designed unification of LZ4/LZ4HC compression levels that fixes a real usability problem (auto-tune levels were ineffective) and provides a cleaner level API. Decompression compatibility is verified safe. Two medium-severity issues around db_bench and behavioral documentation warrant attention. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. db_bench rejects unified-routed compressors --
|
| Context | Safe? | Notes |
|---|---|---|
| Block-based table read/write | Yes | Same decompressor for both types |
| BlobDB | Yes | Assertion checks compressor's own preferred type |
| Remote compaction | Yes | Per-block type; decompression agnostic |
| Auto-tune | Yes | Level grids now correctly exercise variants |
| db_bench | No | See M1 |
| Dictionary compression | Yes | NoDict supports dict via CloneForDict() inheritance |
Debunked Findings
- Dictionary support loss (flagged by 5 agents as HIGH): Debunked --
BuiltinLZ4CompressorV2NoDictinherits dict capability viaCompressorWithSimpleDictBase::MaybeCloneSpecialized()->CloneForDict(). - LegacyForceBuiltinCompression assertion: Debunked -- assertion checks compressor's own preferred type, which always matches output.
- Existing test breakage: Debunked -- options_test.cc tests only verify parsing, not compression behavior.
Positive Observations
- INT_MIN overflow protection in LZ4AccelerationFromLevel is well-considered.
- Auto-tune fix is valuable: old {1,4,9} levels for kLZ4 were completely ineffective.
- Thorough test coverage including extreme edge cases.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D107536580. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 23e1abc ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 23e1abc SummaryWell-designed unification of LZ4/LZ4HC compression levels with good test coverage. The core approach is sound: level sign selects algorithm, configured type only affects default level. The ZSTD level-0 discontinuity fix and auto-tune level correction are welcome improvements. A few issues around incomplete ZSTD sanitization and a behavior change for dictionary compression with unified routing. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. Dictionary support lost for LZ4 fast path via unified routing —
|
| Context | Safe? | Notes |
|---|---|---|
BlobDB (LegacyForceBuiltinCompression) |
YES | Assertion actual_type == GetPreferredCompressionType() holds — both come from the same compressor instance |
| SST property parsing | YES | compression_options is display-only, not parsed back. _type= follows existing _compression_manager= pattern |
| Rolling upgrades | YES | LZ4/LZ4HC share wire format and decompressor |
| CompactionService / Remote | YES | Options are serialized, not compressor objects |
| Dynamic SetOptions | YES | GetCompressor called each time |
Positive Observations
- Clean, well-commented unified routing in
GetCompressor() SanitizeZSTDCompressionLevel()eliminates 4 instances of duplicated codeLZ4AccelerationFromLevel()adds INT_MIN overflow protection previously missing- Comprehensive test coverage with boundary levels, clamping, round-trips
- Auto-tune fix from
{1,4,9}to{-1,-4,-9}is a genuine bug fix
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
joshkang97
left a comment
There was a problem hiding this comment.
Review automatically exported from Phabricator review in Meta.
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D107536580. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 9b772c6 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 9b772c6 SummaryWell-designed unification of LZ4/LZ4HC compression levels with good test coverage. One medium-severity consistency issue with ZSTD level 0 handling in the dictionary compression path. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. Inconsistent ZSTD level 0 mapping in
|
| Context | Affected? | Impact | Notes |
|---|---|---|---|
| Decompression | No | None | Both types already share decompressor |
| CompactionService | No | None | Compressor created fresh per job |
| WritePreparedTxnDB | No | None | Compression is orthogonal |
| compression_per_level | Yes | Low | Mixed LZ4/LZ4HC arrays now behave identically for same level |
| Auto-tune (CostAwareCompressor) | Yes | Fixed | LZ4 levels correctly changed to {-1,-4,-9} |
| db_bench | Yes | Fixed | CanonicalCompressionType prevents false rejection |
| SST property parsing | Yes | Low | New _type= field; underscore prefix helps |
Positive Observations
- Clean centralization of ZSTD level handling via
SanitizeZSTDCompressionLevel() - Thorough test coverage for the core unification logic including level clamping
- INT_MIN overflow protection in
LZ4AccelerationFromLevelis correct - Auto-tune LZ4 levels properly changed to negative values to actually exercise acceleration
- The
CanonicalCompressionType()function cleanly handles the db_bench comparison - Release notes accurately describe the behavioral changes
- ZSTD level 0 fix removes a genuine usability issue for auto-tuning
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@pdillinger merged this pull request in 6d4a814. |
Summary:
kLZ4Compression and kLZ4HCCompression share the same on-disk format and decompressor, but historically kLZ4Compression only honored negative (acceleration) levels while kLZ4HCCompression only honored positive levels. This unifies them so
compression_opts.levelalone selects the variant: level <= 0 uses LZ4 fast (acceleration = -level) and level >= 1 uses LZ4HC (1..12), regardless of which of the two types is configured.The configured type now only determines the default compression level (LZ4: acceleration 1, equivalent to level -1; LZ4HC: level 9).
For code simplicity, the recorded per-block type comes from the compression type derived from the level, which could differ from the configured type. To preserve the originally configured choice for debugging/tracking, it is recorded as a
_type=<decimal>pseudo-option in the rocksdb.compression_options SST table property.Out-of-range non-default levels are clamped to the nearest effective value (LZ4 acceleration capped at 65537, which also avoids signed overflow negating INT_MIN; LZ4HC level capped at 12). The cost-aware (auto-tune) compressor's LZ4 level grid is changed to negative accelerations so it actually exercises fast LZ4 (positive levels now route to LZ4HC).
Related inclusion: The ZSTD library has a discontinuity at level=0, which maps to level 3, which is more aggressive than levels 1 and 2, which are more aggressive than levels -1, -2, etc. For better friendliness to auto-tuning (etc.), we now map level 0 to be the same as level -1, so that increasing compression level numbers have non-decreasing aggressiveness.
Test Plan:
New unit tests in compression_test.cc:
_type=pseudo- option appears in the SST table property for each configured type.