Skip to content

fix(core): scope file search gitignore to repository context#13250

Merged
etraut-openai merged 3 commits intoopenai:mainfrom
fcoury:fix/3493-parent-gitignore-leak
Mar 3, 2026
Merged

fix(core): scope file search gitignore to repository context#13250
etraut-openai merged 3 commits intoopenai:mainfrom
fcoury:fix/3493-parent-gitignore-leak

Conversation

@fcoury
Copy link
Copy Markdown
Contributor

@fcoury fcoury commented Mar 2, 2026

Closes #3493

Problem

When a user's home directory (or any ancestor) contains a broad .gitignore (e.g. * + !.gitignore), the @ file mention picker in Codex silently hides valid repository files like package.json. The picker returns no matches for searches that should succeed. This is surprising because manually typed paths still work, making the failure hard to diagnose.

Mental model

Git itself never walks above the repository root to assemble its ignore list. Its .gitignore resolution is strictly scoped: it reads .gitignore files from the repo root downward, the per-repo .git/info/exclude, and the user's global excludes file (via core.excludesFile). A .gitignore sitting in a parent directory above the repo root has no effect on git status, git ls-files, or any other git operation. Our file search should replicate this contract exactly.

The ignore crate's WalkBuilder has a require_git flag that controls whether it follows this contract:

  • require_git(false) (the previous setting): the walker reads .gitignore files from all ancestor directories, even those above or outside the repository root. This is a deliberate divergence from git's behavior in the ignore crate, intended for non-git use cases. It means a ~/.gitignore with * will suppress every file in the walk—something git itself would never do.

  • require_git(true) (this fix): the walker only applies .gitignore semantics when it detects a .git directory, scoping ignore resolution to the repository boundary. This matches git's own behavior: parent .gitignore files above the repo root have no effect.

The fix is a one-line change: require_git(false) becomes require_git(true).

How require_git(false) got here

The setting was introduced in af338cc (#2981, "Improve @ file search: include specific hidden dirs such as .github, .gitlab"). That PR's goal was to make hidden directories like .github and .vscode discoverable by setting .hidden(false) on the walker. The require_git(false) was added alongside it with the comment "Don't require git to be present to apply git-related ignore rules"—the author likely intended gitignore rules to still filter results even when no .git directory exists (e.g. searching an extracted tarball that has a .gitignore but no .git).

The unintended consequence: with require_git(false), the ignore crate walks above the search root to find .gitignore files in ancestor directories. This is a side effect the original author almost certainly didn't anticipate. The PR message says "Preserve .gitignore semantics," but require_git(false) actually breaks git's semantics by applying ancestor ignore files that git itself would never read.

In short: the intent was "apply gitignore even without .git" but the effect was "apply gitignore from every ancestor directory." This fix restores git-correct scoping.

Non-goals

  • This PR does not change behavior when respect_gitignore is false (that path already disables all git-related ignore rules).
  • The first test (parent_gitignore_outside_repo_does_not_hide_repo_files) intentionally omits git init. The ignore crate's require_git(true) causes it to skip gitignore processing entirely when no .git exists, which is the desired behavior for that scenario. A second test (git_repo_still_respects_local_gitignore_when_enabled) covers the complementary case with a real git repo.

Tradeoffs

Behavioral shift: With require_git(true), directories that contain .gitignore files but are not inside a git repository will no longer have those ignore rules applied during @ search. This is a correctness improvement for the primary use case (searching inside repos), but changes behavior for the edge case of searching non-repo directories that happen to have .gitignore files. In practice, Codex is overwhelmingly used inside git repositories, so this tradeoff strongly favors the fix.

Two test strategies: The first test omits git init to verify parent ignore leakage is blocked; the second runs git init to verify the repo's own .gitignore is still honored. Together they cover both sides of the require_git(true) contract.

Architecture

The change is in walker_worker() within codex-rs/file-search/src/lib.rs, which configures the ignore::WalkBuilder used by the file search walker thread. The walker feeds discovered file paths into nucleo for fuzzy matching. The require_git flag controls whether the walker consults .gitignore files at all—it sits upstream of all ignore processing.

walker_worker
  └─ WalkBuilder::new(root)
       ├─ .hidden(false)         — include dotfiles
       ├─ .follow_links(true)    — follow symlinks
       ├─ .require_git(true)     — ← THE FIX: only apply gitignore in git repos
       └─ (conditional) git_ignore(false), git_global(false), etc.
            └─ applied when respect_gitignore == false

Tests

  • parent_gitignore_outside_repo_does_not_hide_repo_files: creates a temp directory tree with a parent .gitignore containing *, a child "repo" directory with package.json and .vscode/settings.json, and asserts that both files are discoverable via run() with respect_gitignore: true.
  • git_repo_still_respects_local_gitignore_when_enabled: the complementary test—runs git init inside the child directory and verifies that the repo's own .gitignore exclusions still work (e.g. .vscode/extensions.json is excluded while .vscode/settings.json is whitelisted). Confirms that require_git(true) does not disable gitignore processing inside actual git repositories.
fcoury added 2 commits March 2, 2026 10:38
Update `codex-file-search` to require a git context before applying
`.gitignore` semantics during `@` file discovery.

Git itself never reads `.gitignore` from ancestor directories above
the repository root. The previous `require_git(false)` setting caused
the `ignore` crate to walk parent directories for `.gitignore` files,
which could hide valid files like `package.json` and cause false
`no matches` results.

This prevents parent directories outside the workspace from leaking
ignore rules into repository file search.
Rename `whitelist_results` to `nested_file_results` in the no-git-context
regression test so the assertion intent is explicit and doesn't imply repo
whitelist semantics are being exercised.

This aligns the test wording with actual behavior and addresses reviewer
feedback while keeping the underlying regression guarantees unchanged.
Copilot AI review requested due to automatic review settings March 2, 2026 16:34
@fcoury
Copy link
Copy Markdown
Contributor Author

fcoury commented Mar 2, 2026

@codex review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes @ file mention picker’s file-walk behavior to align .gitignore scoping with git’s repository boundary, preventing parent-directory ignore files (e.g. from $HOME) from suppressing valid repo results.

Changes:

  • Switch ignore::WalkBuilder from require_git(false) to require_git(true) to stop applying ancestor .gitignore files above the repo root.
  • Add/expand documentation explaining the ignore scoping behavior.
  • Add regression tests covering (1) leakage from parent .gitignore outside a repo and (2) honoring repo .gitignore inside an initialized repo.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@chatgpt-codex-connector
Copy link
Copy Markdown
Contributor

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Clarify `respect_gitignore` documentation to reflect that disabling it turns
off `.gitignore`, git-global/exclude rules, `.ignore`, and parent-directory
ignore scanning.

Make git-context tests more hermetic by creating a minimal `.git` directory
instead of shelling out to `git init`, and add `!package.json` to fixture
rules to reduce flakiness from user global ignore settings.
Copy link
Copy Markdown
Collaborator

@etraut-openai etraut-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good analysis and fix!

@etraut-openai etraut-openai merged commit 745c48b into openai:main Mar 3, 2026
47 of 53 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

3 participants