Disable parallel compression for fast built-in compressors#14841
Disable parallel compression for fast built-in compressors#14841pdillinger wants to merge 1 commit into
Conversation
Summary: Use Compressor::GetRecommendedParallelThreads() overrides to disable parallel compression for Snappy, LZ4 (accelerated, not LZ4HC), and for accelerated levels of ZSTD (level < 0). These do not generally benefit from parallel compression. Also add --verify_compression option to sst_dump for some basic "recompress" benchmarking with that option. Test Plan: unit test updated. In planning the scope of this change, I manually tested some production SST files with release build sst_dump --command=recompress and various settings for --compression_parallel_threads, --compression_types, and --compression_level, while I had the CPUs mostly busy using cache_bench in the background. Here's an example, before the change to override parallel_threads: ``` $ for PT in 1 8; do /usr/bin/time ./sst_dump --command=recompress --compression_parallel_threads=$PT --block_size=16384 --compression_types=kLZ4Compression --verify_compression=1 test.sst; done ... Compression: kLZ4Compression Block Size: 16384 Threads: 1 Cx level: 32767 Cx size: 168501634 Uncx size: 791721664 Ratio: 4.698599 Write usec: 2345894 Read usec: 429022 Cx count: 48661 (100.0%) Not cx for ratio: 0 ( 0.0%) Not cx otherwise: 0 ( 0.0%) 2.54user 0.29system 0:02.84elapsed 99%CPU (0avgtext+0avgdata 460816maxresident)k ... Compression: kLZ4Compression Block Size: 16384 Threads: 8 Cx level: 32767 Cx size: 168501634 Uncx size: 791721664 Ratio: 4.698599 Write usec: 2476459 Read usec: 439464 Cx count: 48661 (100.0%) Not cx for ratio: 0 ( 0.0%) Not cx otherwise: 0 ( 0.0%) 3.95user 0.33system 0:02.98elapsed 143%CPU (0avgtext+0avgdata 455504maxresident)k ``` Here as in many of the cases I'm changing, it actually takes longer to compress with parallel, despite the added parallel opportunity of verify_compression. And overall CPU is much higher from 2.54 CPU*s to 3.95 CPU*s. The difference disappears with the change, because both use single-threaded SST construction.
|
| Check | Count |
|---|---|
modernize-make-shared |
1 |
| Total | 1 |
Details
tools/sst_dump_test.cc (1 warning(s))
tools/sst_dump_test.cc:287:22: warning: use std::make_shared instead [modernize-make-shared]
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D108075938. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit b1b0162 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit b1b0162 SummaryClean, well-scoped change that correctly leverages the existing High-severity findings (0): Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Test gap: ZSTD edge cases not covered --
|
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | Yes (SST building) | Yes (compression independent of txn visibility) | None |
| ReadOnly DB | No (no SST building) | N/A | None |
| CompactionService | Yes (uses same table builder) | Yes (override happens at compressor level, not serialized) | None |
| User-defined timestamps | Yes | Yes (no interaction with compression threading) | None |
| CompressorWrapper | Yes (forwards to wrapped) | Yes (custom wrappers can override) | None |
| FIFO/Universal compaction | Yes | Yes (same table builder path) | None |
CompressorWrapper behavior: CompressorWrapper::GetRecommendedParallelThreads() (advanced_compression.h:545) forwards to the wrapped compressor. A wrapper around a fast compressor inherits return 1. Custom wrappers can override by implementing their own GetRecommendedParallelThreads(). Custom CompressionManager can also create entirely custom compressors. The release note claim that "this behavior can be overridden with a custom Compressor from a custom CompressionManager" is accurate.
ZSTD kDefaultCompressionLevel (32767): The override checks opts_.level < 0. kDefaultCompressionLevel = 32767 is not < 0, so it correctly falls through to the base class. The mapping to ZSTD_CLEVEL_DEFAULT(3) only happens in the compression code (compression.cc:1104-1107), not in the parallel threads logic. This is correct behavior.
Positive Observations
- Leverages existing
GetRecommendedParallelThreads()infrastructure exactly as designed - Clean separation: Snappy/LZ4 unconditionally return 1, ZSTD conditionally checks level
- LZ4HC correctly excluded via separate class hierarchy (
BuiltinLZ4HCCompressorV2) BuiltinLZ4CompressorV2NoDictcorrectly inherits override from parentWithDictclass- Hard structural constraints in table builder (
block_based_table_builder.cc:1223-1227) still override compressor recommendations when needed - sst_dump
--verify_compressionfollows existing patterns for CLI option parsing - Empirical benchmark data provided in PR description validates the approach
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
joshkang97
left a comment
There was a problem hiding this comment.
Review automatically exported from Phabricator review in Meta.
|
@pdillinger merged this pull request in 01084e7. |
Summary: Use Compressor::GetRecommendedParallelThreads() overrides to disable parallel compression for Snappy, LZ4 (accelerated, not LZ4HC), and for accelerated levels of ZSTD (level < 0). These do not generally benefit from parallel compression.
Also add --verify_compression option to sst_dump for some basic "recompress" benchmarking with that option.
Test Plan: unit test updated.
In planning the scope of this change, I manually tested some production SST files with release build sst_dump --command=recompress and various settings for --compression_parallel_threads, --compression_types, and --compression_level, while I had the CPUs mostly busy using cache_bench in the background. Here's an example, before the change to override parallel_threads:
Here as in many of the cases I'm changing, it actually takes longer to compress with parallel, despite the added parallel opportunity of verify_compression. And overall CPU is much higher from 2.54 CPUs to 3.95 CPUs.
The difference disappears with the change, because both use single-threaded SST construction.