
Improve performance of std.stripChars and std.trim functions#555

Merged
stephenamar-db merged 1 commit into master from stripPerf on Nov 24, 2025


@stephenamar-db
Collaborator

Remove the use of regular expressions, which is overkill for stripping characters.
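As a rough illustration of what a regex-free approach can look like, here is a minimal Java sketch of stripping from the left by walking code points directly. The class `LStripSketch` and method `lstripChars` are hypothetical names for illustration, not the code from this PR. The set of characters to strip is built from code points so that surrogate pairs such as emoji stay intact.

```java
import java.util.Set;
import java.util.stream.Collectors;

class LStripSketch {
    // Hypothetical helper: removes any code point that appears in `chars`
    // from the left of `str`, without compiling a regular expression.
    static String lstripChars(String str, String chars) {
        // codePoints() yields whole code points, so an emoji (a UTF-16
        // surrogate pair) becomes a single entry in the set.
        Set<Integer> strip = chars.codePoints().boxed().collect(Collectors.toSet());
        int start = 0;
        while (start < str.length()) {
            int cp = str.codePointAt(start);   // full code point starting at `start`
            if (!strip.contains(cp)) {
                break;
            }
            start += Character.charCount(cp);  // advance 1 or 2 UTF-16 units
        }
        return str.substring(start);
    }

    public static void main(String[] args) {
        String party = "\uD83C\uDF89"; // U+1F389 as a surrogate pair
        System.out.println(lstripChars("xxhello", "x"));                 // hello
        System.out.println(lstripChars(party + party + "hi", party));    // hi
    }
}
```

A single linear scan like this avoids building and running a pattern on every call; scanning from the right needs extra care with surrogate pairs, because `codePointAt` only looks forward.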

Before:

```
[1-113] Java HotSpot(TM) 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
0.943 ms/op
0.761 ms/op
```

After:

```
[1-113] Java HotSpot(TM) 64-Bit Server VM warning: -XX:ThreadPriorityPolicy=1 may require system level permission, e.g., being the root user. If the necessary permission is not possessed, changes to priority will be silently ignored.
0.790 ms/op
0.674 ms/op
```
@stephenamar-db stephenamar-db force-pushed the stripPerf branch 2 times, most recently from abd85a9 to ae243d0 November 21, 2025 21:36
@stephenamar-db stephenamar-db merged commit a80a3b4 into master Nov 24, 2025
6 checks passed
@stephenamar-db stephenamar-db deleted the stripPerf branch November 24, 2025 22:25
stephenamar-db pushed a commit that referenced this pull request Dec 4, 2025
#560)

This is a follow-up to #555
and the subsequent fix in
#557.

Even after that fix, there are two remaining bugs related to stripping
from the right side of a string:

1. Stripping emoji from end: `std.rstripChars("hello🎉🎉🎉", "🎉")` returned
`"hello🎉🎉🎉"` instead of `"hello"` (nothing stripped)
2. Stripping ASCII after emoji: `std.trim("🌍 ")` returned `"?"` instead
of `"🌍"` (i.e., the emoji was corrupted)

The root cause (explained by Claude) is that, when iterating from the
right of a string, `codePointAt(str.length - 1)` lands on the low
surrogate of a surrogate pair and returns that lone surrogate, which is
then treated as an unpaired surrogate instead of seeking backwards to
the full code point.

The fix: use `codePointBefore(end)` (where `end` ranges from
`str.length` down to `1`) for right-to-left iteration. Unlike
`codePointAt()`, `codePointBefore()` combines a trailing low surrogate
with the high surrogate before it, so it returns the full code point
when scanning backwards.
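A minimal Java sketch of the right-to-left loop described above, assuming a hypothetical `RStripSketch.rstripChars` helper (not the actual patched code). It steps `end` back by `Character.charCount` of each code point returned by `codePointBefore(end)`, which keeps surrogate pairs together. For the second case it strips only a plain space, as a stand-in for the whitespace set that `std.trim` uses.

```java
import java.util.Set;
import java.util.stream.Collectors;

class RStripSketch {
    // Hypothetical helper: removes code points found in `chars` from the right of `str`.
    static String rstripChars(String str, String chars) {
        Set<Integer> strip = chars.codePoints().boxed().collect(Collectors.toSet());
        int end = str.length();
        while (end > 0) { // codePointBefore requires 1 <= end <= length
            // Returns the whole code point ending at `end`: if str[end - 1] is a
            // low surrogate preceded by a high surrogate, the pair is combined.
            int cp = str.codePointBefore(end);
            if (!strip.contains(cp)) {
                break;
            }
            end -= Character.charCount(cp); // step back 1 or 2 UTF-16 units
        }
        return str.substring(0, end);
    }

    public static void main(String[] args) {
        String party = "\uD83C\uDF89"; // U+1F389
        String globe = "\uD83C\uDF0D"; // U+1F30D
        System.out.println(rstripChars("hello" + party + party + party, party)); // hello
        System.out.println(rstripChars(globe + " ", " ").equals(globe));        // true
    }
}
```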

The fix and test were authored by Claude Code.

Co-authored-by: Claude <noreply@anthropic.com>
stephenamar-db added a commit that referenced this pull request Dec 4, 2025