optimize lz matchfinding loop (#826) #826

Closed
Victor-C-Zhang wants to merge 1 commit into facebook:dev from Victor-C-Zhang:export-D109051308
Conversation

Victor-C-Zhang commented Jun 18, 2026

Contributor

Summary:

This is v2 of D108382572. Testing showed that an optimized scalar loop (this diff) matches, and in some cases outperforms, a vectorized implementation on both x86 and Arm. This diff adopts the optimized scalar loop and leaves a marker for future contributors interested in optimizing the LZ kernels.

To future contributors: the code attached in D108382572 is neutral to slightly worse on Skylake, Bergamo, and Turin, and significantly worse on Grace. This does not hold uniformly across the test corpus: individual files sometimes vary by +/-5% in *total* compression speed. The outcome likely depends on the data being compressed, but in the general case it is worse.
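For readers unfamiliar with what a scalar LZ inner loop looks like, here is a minimal sketch of a word-at-a-time match-length counter. It is not the code in this diff: `count_match_scalar` and its parameters are hypothetical names chosen for illustration, and it assumes `match` points earlier in the same buffer as `ip`. Production kernels typically locate the first differing byte with a count-trailing-zeros instruction; this sketch falls back to a byte loop to stay portable across endianness and compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not taken from the diff): returns how many leading
 * bytes at `ip` equal the bytes at `match`, never reading at or past `iend`.
 * Assumes match < ip and both point into the same buffer ending at iend. */
static size_t count_match_scalar(const uint8_t* ip,
                                 const uint8_t* match,
                                 const uint8_t* iend)
{
    const uint8_t* const start = ip;
    assert(ip <= iend);

    /* Fast path: compare 8 bytes per iteration. memcpy with a constant size
     * is the portable way to express an unaligned load. */
    while ((size_t)(iend - ip) >= sizeof(uint64_t)) {
        uint64_t a, b;
        memcpy(&a, ip, sizeof(a));
        memcpy(&b, match, sizeof(b));
        if (a != b) {
            break; /* mismatch inside this word; the byte loop pinpoints it */
        }
        ip += sizeof(uint64_t);
        match += sizeof(uint64_t);
    }

    /* Tail: finish byte by byte, covering both the mismatching word and the
     * last few bytes before iend. */
    while (ip < iend && *ip == *match) {
        ip++;
        match++;
    }
    return (size_t)(ip - start);
}
```

A loop of this shape has a single, well-predicted exit and no setup cost, which is one plausible reason a scalar version can keep up with SIMD when most matches are short; that explanation is a guess, not a claim made by the diff.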

Reviewed By: terrelln

Differential Revision: D109051308

@meta-cla meta-cla Bot added the cla signed label Jun 18, 2026
meta-codesync Bot commented Jun 18, 2026


@Victor-C-Zhang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D109051308.

@meta-codesync meta-codesync Bot changed the title optimize lz matchfinding loop Jun 18, 2026
@meta-codesync meta-codesync Bot closed this in ea4e4e3 Jun 18, 2026
@meta-codesync meta-codesync Bot added the Merged label Jun 18, 2026
meta-codesync Bot commented Jun 18, 2026


This pull request has been merged in ea4e4e3.
