Skip to content

chore: Use ArrayBuilder to build the array#564

Merged
stephenamar-db merged 1 commit intodatabricks:masterfrom
He-Pin:arrayBuilder
Dec 9, 2025
Merged

chore: Use ArrayBuilder to build the array#564
stephenamar-db merged 1 commit intodatabricks:masterfrom
He-Pin:arrayBuilder

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Dec 9, 2025

The ArrayBuffer#toArray will do another copy.

@stephenamar-db stephenamar-db merged commit da96167 into databricks:master Dec 9, 2025
6 checks passed
for (v <- arrValue) {
if (out.isEmpty) {
out.append(v)
val outResult = out.result()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this change is introducing an O(n^2) bug because now we're making a fresh copy of the array on every loop iteration.

I've opened #575 with a suggested fix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need just use the length to fix this.

@He-Pin He-Pin deleted the arrayBuilder branch December 22, 2025 03:04
stephenamar-db pushed a commit that referenced this pull request Dec 22, 2025
…ns (#575)

This PR implements two performance optimizations in `uniqueArr`:

- **Fix an O(n^2) performance bug**:
#564 changed this code to
call `out.result()` on every loop iteration, resulting in copying of an
`O(n)` sized array on each loop iteration. We can avoid this by keeping
a reference to the last output element.
- **Avoid duplicate keyF evaluations**: in the old code, each loop
iteration would repeat the `keyF` evaluation for the last element of the
output array; the new code calls `keyF` exactly once per element by
maintaining a `lastAddedKey` variable during the loop.
- This is similar in spirit to
#245

Claude (Opus 4.5) spotted the O(n^2) bug and designed that part of the
fix. I spotted the duplicate keyF evaluation and suggested this fix
(plus the switch to a `while` loop).

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants