[release/10.0] Enhance createdump to detect alt stack execution #127071
Merged
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
Pull request overview
This PR fixes createdump’s native stack unwinding when a crash occurs on an alternate signal stack (sigaltstack/SA_ONSTACK), ensuring the unwind can cross the signal trampoline back to the original thread stack so the dump includes the missing stack memory.
Changes:
- Extend PAL_VirtualUnwindOutOfProc to report whether the current frame is a signal trampoline using libunwind’s unw_is_signal_frame.
- Update createdump’s UnwindNativeFrames monotonic-SP guard to allow a one-time SP decrease immediately after a signal trampoline frame.
- Plumb the new PAL API parameter through createdump’s PAL shim/wrapper.
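The relaxed guard described in the list above can be sketched as follows. This is a minimal illustration, not the code in threadinfo.cpp: the Frame struct and CountAcceptedFrames are hypothetical names, and the walk is modeled as a pre-computed list of frames rather than live calls into the unwinder. It assumes the stack pointer normally grows from one frame to the next, and that a frame flagged as a signal trampoline permits exactly one decrease on the following frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified frame record; not a createdump data structure.
struct Frame {
    uint64_t sp;         // stack pointer reported for this frame
    bool isSignalFrame;  // true if the unwinder flagged a signal trampoline
};

// Returns how many frames the guard accepts before stopping the walk.
// An SP decrease is tolerated only when the previous frame was a signal
// trampoline; any other decrease is treated as a corrupted unwind.
std::size_t CountAcceptedFrames(const std::vector<Frame>& frames)
{
    std::size_t accepted = 0;
    uint64_t previousSp = 0;
    bool previousWasSignalFrame = false;

    for (const Frame& frame : frames) {
        const bool spDecreased = accepted > 0 && frame.sp < previousSp;
        if (spDecreased && !previousWasSignalFrame) {
            break;  // strict monotonic-SP guard: stop unwinding here
        }
        ++accepted;
        previousSp = frame.sp;
        // The allowance is consumed by the very next frame only.
        previousWasSignalFrame = frame.isSignalFrame;
    }
    return accepted;
}
```

When no frame is flagged as a signal trampoline, the sketch behaves exactly like a strict monotonic check, which mirrors the fallback behavior described under Risk.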
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/pal/src/exception/remote-unwind.cpp | Detect signal trampoline frames (via unw_get_proc_info + unw_is_signal_frame) and return the result to the caller. |
| src/coreclr/pal/inc/pal.h | Update PAL_VirtualUnwindOutOfProc declaration to include the isSignalFrame out parameter. |
| src/coreclr/debug/createdump/threadinfo.cpp | Relax SP monotonicity check to permit crossing from alt stack back to original stack when the prior frame is a signal trampoline. |
| src/coreclr/debug/createdump/createdumppal.cpp | Update the dynamically-resolved PAL function pointer typedef and wrapper to match the new signature. |
This was referenced Apr 17, 2026
Open
Use CFI entries in libc trampoline frames to detect and allow for stack unwind finding SP backwards jumps.
a86fb86 to f0106fb
/ba-g Known issues
Fixes .NET 10 version of #126981
Description
createdump's UnwindNativeFrames fails to capture the original thread stack when a crash occurs on a thread using an alternate signal stack. The native unwinder's monotonic-SP guard trips when the unwind crosses the signal trampoline back to the original stack, because the SP legitimately decreases there. This causes the unwinder to stop early, omitting the original stack memory from the dump.
The fix uses libunwind's unw_is_signal_frame in PAL_VirtualUnwindOutOfProc to detect signal trampoline frames. When a signal frame is detected, UnwindNativeFrames allows an SP decrease, enabling the unwinder to cross back to the original stack and capture its memory.
Customer Impact
Minidumps collected via createdump for crashes on alternate signal stacks are missing the original thread's stack memory. This makes the dumps incomplete and difficult to debug: native frames below the signal handler are absent from the stack walk, and you can only get the managed stack separately via clrstack. Watson and WinDbg both fail to recover the missing frames automatically.
Regression
We added the monotonic-SP check ~7 years ago to guard against unwinding through corrupted stacks.
Testing
The following scenario was tested as a proxy for the customer's issue: a P/Invoke into a native library that runs through some frames before hitting a null reference on a secondary thread. Before the fix, the repro shows the unwind bailing out early. With the fix, the unwind crosses the signal trampoline, identifies the libc trampoline, and includes the original stack memory in the dump. I also validated that the fix works both with and without DWARF unwind info in the crashing native library.
Risk
Low. The change is narrowly scoped to createdump's native unwind path and some of the DAC's lazy state machine unwinding. unw_get_proc_info reuses the same cache unw_step populates - no additional remote memory reads. If unw_is_signal_frame returns false due to some libc not marking the trampoline correctly, behavior is identical to before - no regression.