Fix O(n^2) performance bug in uniqArr; avoid duplicate keyF evaluations#575

Merged
stephenamar-db merged 1 commit into databricks:master from JoshRosen:fix-nsquared-uniqueArr on Dec 22, 2025

JoshRosen commented Dec 21, 2025

This PR implements two performance optimizations in uniqueArr:

  • Fix an O(n^2) performance bug: PR #564 ("chore: Use ArrayBuilder to build the array") changed this code to call out.result() on every loop iteration, which copies an O(n)-sized array each time. We can avoid this by keeping a reference to the last output element.
  • Avoid duplicate keyF evaluations: in the old code, each loop iteration would repeat the keyF evaluation for the last element of the output array; the new code calls keyF exactly once per element by maintaining a lastAddedKey variable during the loop.
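
The two points above can be sketched as a single pass. This is a minimal illustration, not the actual sjsonnet code: it assumes uniqArr drops consecutive elements whose keys are equal, and the names `UniqSketch` and `uniqByKey`, as well as the generic signature, are hypothetical stand-ins for the project's own types.

```scala
import scala.reflect.ClassTag

object UniqSketch {
  // Hypothetical stand-in for uniqArr: keeps an element only when its key
  // differs from the key of the last element that was added to the output.
  def uniqByKey[A: ClassTag, K](arr: Array[A])(keyF: A => K): Array[A] = {
    if (arr.isEmpty) return arr

    val out = Array.newBuilder[A]
    out.sizeHint(arr.length)

    // Seed the output with the first element and remember its key.
    out += arr(0)
    var lastAddedKey: K = keyF(arr(0))

    // A while loop instead of foreach, so no closure is allocated for the body.
    var i = 1
    while (i < arr.length) {
      val elem = arr(i)
      val key = keyF(elem) // keyF runs exactly once per input element
      if (key != lastAddedKey) {
        out += elem
        lastAddedKey = key
      }
      i += 1
    }

    // The output array is materialized once, after the loop.
    out.result()
  }
}
```

For example, `UniqSketch.uniqByKey(Array(1, 1, 2, 2, 2, 3, 1))(x => x)` yields an array containing 1, 2, 3, 1, with the key function invoked seven times.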

Claude (Opus 4.5) spotted the O(n^2) bug and designed that part of the fix. I spotted the duplicate keyF evaluation and suggested this fix (plus the switch to a while loop).

The previous implementation called ArrayBuilder.result() inside the loop,
which creates a new array copy on each iteration, resulting in O(n²) behavior.

This fix tracks lastAddedKey separately to compare against the previous
element's key without rebuilding the array. Also uses a while loop instead
of foreach to avoid closure allocation overhead.
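
As a rough worked count of the quadratic cost: assume each result() call inside the loop copies every element accumulated so far, and take the worst case where all n keys are distinct, so the output holds k elements when iteration k runs. The total number of copied elements is then

$$\sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} = O(n^2)$$

With result() called once after the loop, the remaining copy work is linear in n.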

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@stephenamar-db stephenamar-db merged commit fb2a9cd into databricks:master Dec 22, 2025
9 checks passed

He-Pin commented Dec 22, 2025

@JoshRosen Thanks. I think we may need to use an AI agent to do a detailed review of the current implementation before we cut 1.0.0.

@JoshRosen JoshRosen deleted the fix-nsquared-uniqueArr branch December 22, 2025 16:13