I recently upgraded from v10.9.1 to v11.1.1 and started seeing errors like the following during compaction:
2026/06/30-00:35:21.541704 173606 [ERROR] [/db_impl/db_impl_compaction_flush.cc:3729] Waiting after background compaction error: IO error: No such file or directory: while stat a file for size: /tmp/compact_bug_repro/000889.sst: No such file or directory, Accumulated background error counts: 1
The errors always follow an "Empty SST file not kept" status like the following:
2026/06/30-00:35:21.537015 173606 EVENT_LOG_v1 {"time_micros": 1782794121536847, "cf_name": "default", "job": 348, "event": "table_file_creation", "file_number": 0, "file_size": 0, "file_checksum": "", "file_checksum_func_name": "Unknown", "smallest_seqno": 72057594037927935, "largest_seqno": 0, "table_properties": {"data_size": 0, "index_size": 13, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 1, "index_value_is_delta_encoded": 1, "filter_size": 0, "raw_key_size": 0, "raw_average_key_size": 0, "raw_value_size": 0, "raw_average_value_size": 0, "num_data_blocks": 0, "num_entries": 0, "num_filter_entries": 0, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 7, "fixed_key_len": 0, "filter_policy": "", "column_family_name": "default", "column_family_id": 0, "comparator": "leveldb.BytewiseComparator", "user_defined_timestamps_persisted": 1, "key_largest_seqno": 0, "key_smallest_seqno": 18446744073709551615, "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": ";;", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; max_dict_buffer_bytes=0; use_zstd_dict_trainer=1; max_compressed_bytes_per_kb=896; checksum=0; ", "creation_time": 1782794121, "oldest_key_time": 1782794121, "newest_key_time": 0, "file_creation_time": 1782794121, "slow_compression_estimated_data_size": 0, "fast_compression_estimated_data_size": 0, "db_id": "b1ab3e4a-23eb-4ec4-af97-f827682c014c", "db_session_id": "ET603AQHW1DIR96IXHZ1", "orig_file_number": 889, "seqno_to_time_mapping": "N/A"}, "status": "Operation aborted: Empty SST file not kept"}
Claude theorized that commit 656b734 introduced the regression. Specifically, this line (rocksdb/db/compaction/compaction_job.cc, line 1603 in ffb7788) now passes a vector containing the paths of all output file types, not just blob files, as the blob_file_paths parameter of the BlobFileBuilder ctor:
sub_compact->Current().GetOutputFilePathsPtr(),
The theory is that there now exists the following race condition:
1. Open a blob file, and push its path to the tail of the vector.
2. Open an SST file, and push its path to the tail of the vector.
3. Decide that the SST file from step 2 should not be written (the "Operation aborted..." outcome).
4. Close the blob file from step 1, which assumes it's the last file in the vector and tries to open that file to get the metadata. This is where the failure happens.
   a) It's worth noting that even if the SST file was written, the metadata retrieved here would report the incorrect size IIUC.
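To make the suspected ordering concrete, here is a toy model of the steps above. It is only a sketch of the theory, not RocksDB code: `ToyBlobBuilder`, `PathStattedOnBlobClose`, and the blob file name `000888.blob` are all made up for illustration, and the only detail taken from the logs is `000889.sst`. It assumes the builder looks up "its" file by taking the last entry of a vector that it shares with the SST output path.

```cpp
#include <string>
#include <vector>

// Toy model only: none of these names are RocksDB's.
// The builder records the file it opens in a caller-supplied vector and
// later assumes that its own file is still the last entry.
class ToyBlobBuilder {
 public:
  explicit ToyBlobBuilder(std::vector<std::string>* paths) : paths_(paths) {}

  // Step 1: opening a blob file appends its path to the shared vector.
  void Open(const std::string& blob_path) { paths_->push_back(blob_path); }

  // Step 4: on close, the builder looks up "its" path via back().
  const std::string& PathUsedOnClose() const { return paths_->back(); }

 private:
  std::vector<std::string>* paths_;  // shared with the SST output code
};

// Returns the path the toy builder would stat when closing its blob file.
std::string PathStattedOnBlobClose(bool sst_opened_after_blob) {
  std::vector<std::string> output_paths;  // holds every output file type
  ToyBlobBuilder blob(&output_paths);
  blob.Open("000888.blob");  // hypothetical blob file name
  if (sst_opened_after_blob) {
    // Step 2: the SST path lands behind the blob path. In step 3 the empty
    // SST is never kept on disk, but in this model its path stays listed.
    output_paths.push_back("000889.sst");
  }
  return blob.PathUsedOnClose();
}
```

In this model, `PathStattedOnBlobClose(false)` yields the blob path, while `PathStattedOnBlobClose(true)` yields the SST path. If the real code behaves the same way, that would explain both the "No such file or directory" error when the empty SST is dropped and the wrong size when it is kept.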
The following consistently reproduces the failure in ~5 seconds:
// OPTIONS
[Version]
rocksdb_version=11.1.1
options_file_version=1.1
[DBOptions]
create_if_missing=true
max_subcompactions=2
[CFOptions "default"]
blob_garbage_collection_age_cutoff=1.000000
target_file_size_base=1024
enable_blob_files=true
max_bytes_for_level_base=2048
max_bytes_for_level_multiplier=2.000000
enable_blob_garbage_collection=true
level_compaction_dynamic_level_bytes=false
[TableOptions/BlockBasedTable "default"]
block_size=16384
// C++ repro
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/utilities/options_util.h"

// Builds a 10-byte key: two header characters followed by the big-endian
// bytes of `body`. Headers are written as decimal text, so single-digit
// values take one character each. For UINT8_MAX the text "255" is cut off by
// the body bytes, so key(2, UINT8_MAX, 0) starts with "22".
std::string key(uint8_t header1, uint8_t header2, uint64_t body) {
  auto bytes = std::to_string(header1) + std::to_string(header2);
  bytes.resize(2 + sizeof(body));
  for (size_t i = 0; i < sizeof(body); ++i) {
    size_t shift_amount = (sizeof(body) - 1 - i) * 8;
    bytes[i + 2] = static_cast<char>((body >> shift_amount) & 0xFF);
  }
  return bytes;
}

int main() {
  rocksdb::Options options;
  std::vector<rocksdb::ColumnFamilyDescriptor> all_cfs;
  rocksdb::Status s = rocksdb::LoadOptionsFromFile(
      {}, "/path/to/OPTIONS", &options, &all_cfs);
  assert(s.ok());

  std::vector<rocksdb::ColumnFamilyHandle*> cf_handles;
  std::unique_ptr<rocksdb::DB> db;
  s = rocksdb::DB::Open(options, "/path/to/db", all_cfs, &cf_handles, &db);
  assert(s.ok());

  rocksdb::CompactRangeOptions compact_options;
  compact_options.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kForce;

  std::string largeVal(4096, '0');

  // Seed two boundary keys per prefix and compact them to the bottom level.
  for (uint8_t i = 1; i <= 3; i++) {
    s = db->Put(rocksdb::WriteOptions(), key(i, 0, 0), largeVal);
    assert(s.ok());
    s = db->Put(rocksdb::WriteOptions(), key(i, 2, 0), largeVal);
    assert(s.ok());
  }
  s = db->Flush(rocksdb::FlushOptions());
  assert(s.ok());
  s = db->CompactRange(compact_options, db->DefaultColumnFamily(), nullptr,
                       nullptr);
  assert(s.ok());

  // Runs until an assert fires, which happens once the background
  // compaction error surfaces in a later status.
  for (uint64_t i = 1;; i++) {
    s = db->Put(rocksdb::WriteOptions(), key(1, 1, i), largeVal);
    assert(s.ok());
    s = db->Put(rocksdb::WriteOptions(), key(2, 1, i), largeVal);
    assert(s.ok());
    s = db->Put(rocksdb::WriteOptions(), key(3, 1, i), largeVal);
    assert(s.ok());

    // Every 10 iterations: flush, then range-delete prefixes 2 and 3.
    if (i % 10 == 0) {
      s = db->Flush(rocksdb::FlushOptions());
      assert(s.ok());
      s = db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                          key(2, 0, 0), key(2, UINT8_MAX, 0));
      assert(s.ok());
      s = db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                          key(3, 0, 0), key(3, UINT8_MAX, 0));
      assert(s.ok());
      s = db->Flush(rocksdb::FlushOptions());
      assert(s.ok());
    }

    // Every 100 iterations: force a full compaction.
    if (i % 100 == 0) {
      s = db->CompactRange(compact_options, db->DefaultColumnFamily(), nullptr,
                           nullptr);
      assert(s.ok());
    }
  }

  db.reset();
  return 0;
}
This consistently fails on 656b734 in ~5s and runs for several minutes on 21a8b5f until I interrupt it, so I think this is almost certainly the root cause.
CC @xingbowang as the original author and @anand1976 as the reviewer