Fix flaky preallocation tests on btrfs, zfs, tmpfs, and overlayfs#14744
Fix flaky preallocation tests on btrfs, zfs, tmpfs, and overlayfs#14744pdillinger wants to merge 4 commits into
Conversation
Summary: Fix flaky test failures in EnvPosixTestWithParam.AllocateTest and DBWALTest.TruncateLastLogAfterRecoverWithFlush that occur when running on filesystems where preallocated space is not reliably reflected in st_blocks. The tests use stat() to check st_blocks and verify that fallocate with FALLOC_FL_KEEP_SIZE actually preallocates disk space. However, on certain filesystems, this check is unreliable: 1. btrfs and zfs: Copy-on-write filesystems where preallocated extents are not reliably reflected in st_blocks, especially under load. 2. tmpfs: Memory filesystem that may not report preallocated blocks in st_blocks. 3. overlayfs: Union filesystem common in containers that may not pass through fallocate properly or report preallocated space. 4. Any filesystem where FALLOC_FL_KEEP_SIZE is not supported: The preallocation will fail silently (error ignored with PermitUncheckedError), leaving only the written data in st_blocks. Changes made: env/env_test.cc: - Added filesystem magic number definitions for TMPFS_MAGIC, OVERLAYFS_SUPER_MAGIC, and ZFS_SUPER_MAGIC - Extended the AllocateTest to skip block count checks on zfs, tmpfs, and overlayfs in addition to btrfs - Added runtime fallback: if st_blocks is less than expected, print a warning and skip the check instead of failing. This handles unknown filesystems or configurations where preallocation isn't supported. db/db_wal_test.cc: - Added includes and filesystem magic number definitions - Added ShouldSkipAllocationCheck() helper function to detect problematic filesystems - Modified TruncateLastLogAfterRecoverWithoutFlush, TruncateLastLogAfterRecoverWithFlush, TruncateLastLogAfterRecoverWALEmpty, and ReadOnlyRecoveryNoTruncate tests to skip allocation checks on problematic filesystems - Added runtime fallback checks similar to env_test.cc These changes make the tests robust against filesystem differences while still validating preallocation behavior on filesystems where it works correctly (ext4, xfs). Test Plan: Many local 'make -j100 check' runs that would previously fail with good probability.
|
| Check | Count |
|---|---|
cppcoreguidelines-pro-type-member-init |
1 |
| Total | 1 |
Details
db/db_wal_test.cc (1 warning(s))
db/db_wal_test.cc:117:5: warning: uninitialized record type: 'fs_stat' [cppcoreguidelines-pro-type-member-init]
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D105331256. |
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D105331256. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 611b91a ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 611b91a SummarySound fix for flaky preallocation tests on non-ext4 filesystems. Test-only change, no production code affected. The approach is consistent with the existing btrfs skip pattern and correctly extends it to zfs, tmpfs, and overlayfs. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1.
|
|
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D105331256. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit ce4cb13 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit ce4cb13 SummaryClean test-only PR that fixes flaky preallocation tests on btrfs, zfs, tmpfs, and overlayfs. The approach is sound and consistent with the existing btrfs-skip pattern. No production code changes. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Printf format specifier portability --
|
| Context | Does code execute? | Assumptions hold? | Action needed? |
|---|---|---|---|
| Linux ext4/xfs | YES | YES - all assertions fire | None |
| Linux btrfs/zfs/tmpfs/overlayfs | YES | NO - st_blocks unreliable | Skipped (this PR) |
| Non-Linux (macOS, Windows) | NO | N/A - guarded by POSIX/fallocate | None |
| 32-bit Linux | Theoretically YES | %lu format risk |
M1 covers this |
Positive Observations
- The
(void)file_name;pattern inShouldSkipAllocationCheckcorrectly suppresses unused parameter warnings on non-Linux. - The
static_cast<decltype(fs_stat.f_type)>pattern for magic number comparison is correct and avoids signed/unsigned comparison warnings. - Filesystem magic numbers are all verified correct against Linux kernel headers (
linux/magic.h). - The runtime fallback in
env_test.ccis a nice defense-in-depth: even if a new unknown filesystem doesn't support preallocation, the test gracefully degrades instead of flaking. ShouldSkipAllocationCheckfails safe -- ifstatfs()fails, it returnsfalse(checks are NOT skipped), which is the conservative direction.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@pdillinger merged this pull request in acfa68c. |
…4744) Summary: Fix flaky test failures in EnvPosixTestWithParam.AllocateTest and DBWALTest.TruncateLastLogAfterRecoverWithFlush that occur when running on filesystems where preallocated space is not reliably reflected in st_blocks. The tests use stat() to check st_blocks and verify that fallocate with FALLOC_FL_KEEP_SIZE actually preallocates disk space. However, on certain filesystems, this check is unreliable: 1. btrfs and zfs: Copy-on-write filesystems where preallocated extents are not reliably reflected in st_blocks, especially under load. 2. tmpfs: Memory filesystem that may not report preallocated blocks in st_blocks. 3. overlayfs: Union filesystem common in containers that may not pass through fallocate properly or report preallocated space. 4. Any filesystem where FALLOC_FL_KEEP_SIZE is not supported: The preallocation will fail silently (error ignored with PermitUncheckedError), leaving only the written data in st_blocks. Changes made: env/env_test.cc: - Added filesystem magic number definitions for TMPFS_MAGIC, OVERLAYFS_SUPER_MAGIC, and ZFS_SUPER_MAGIC - Extended the AllocateTest to skip block count checks on zfs, tmpfs, and overlayfs in addition to btrfs - Added runtime fallback: if st_blocks is less than expected, print a warning and skip the check instead of failing. This handles unknown filesystems or configurations where preallocation isn't supported. db/db_wal_test.cc: - Added includes and filesystem magic number definitions - Added ShouldSkipAllocationCheck() helper function to detect problematic filesystems - Modified TruncateLastLogAfterRecoverWithoutFlush, TruncateLastLogAfterRecoverWithFlush, TruncateLastLogAfterRecoverWALEmpty, and ReadOnlyRecoveryNoTruncate tests to skip allocation checks on problematic filesystems - Added runtime fallback checks similar to env_test.cc These changes make the tests robust against filesystem differences while still validating preallocation behavior on filesystems where it works correctly (ext4, xfs). Pull Request resolved: #14744 Test Plan: Many local 'make -j100 check' runs that would previously fail with good probability. Reviewed By: hx235 Differential Revision: D105331256 Pulled By: pdillinger fbshipit-source-id: 862a1512da1466cb037af15342404939b677c02a
Summary:
Fix flaky test failures in EnvPosixTestWithParam.AllocateTest and DBWALTest.TruncateLastLogAfterRecoverWithFlush that occur when running on filesystems where preallocated space is not reliably reflected in st_blocks.
The tests use stat() to check st_blocks and verify that fallocate with FALLOC_FL_KEEP_SIZE actually preallocates disk space. However, on certain filesystems, this check is unreliable:
btrfs and zfs: Copy-on-write filesystems where preallocated extents are not reliably reflected in st_blocks, especially under load.
tmpfs: Memory filesystem that may not report preallocated blocks in st_blocks.
overlayfs: Union filesystem common in containers that may not pass through fallocate properly or report preallocated space.
Any filesystem where FALLOC_FL_KEEP_SIZE is not supported: The preallocation will fail silently (error ignored with PermitUncheckedError), leaving only the written data in st_blocks.
Changes made:
env/env_test.cc:
db/db_wal_test.cc:
These changes make the tests robust against filesystem differences while still validating preallocation behavior on filesystems where it works correctly (ext4, xfs).
Test Plan:
Many local 'make -j100 check' runs that would previously fail with good probability.