Skip to content

ascii.rs optimization since ptr is aligned #193

@danakj

Description

@danakj

On line 186, there is a call to_mm_loadu_si128:

let chunk = _mm_loadu_si128(ptr as *const __m128i);

That call does not require the pointer to be aligned at all. But it could be replaced with _mm_load_si128, which requires alignment to a 16-byte boundary.

On line 139, the ptr is aligned to the VECTOR_ALIGN mask, which aligns it to size_of::<__m128i>(), or to 16 bytes.

bstr/src/ascii.rs

Lines 120 to 121 in 3dc5939

const VECTOR_SIZE: usize = core::mem::size_of::<__m128i>();
const VECTOR_ALIGN: usize = VECTOR_SIZE - 1;

https://github.com/BurntSushi/bstr/blob/3dc5939f30daa1a8a6e5cc346bb77841f19ea415/src/ascii.rs#L139C9-L139C12

The ptr is always advanced by VECTOR_LOOP_SIZE in a loop, which is a multiple of 16 bytes:

bstr/src/ascii.rs

Lines 120 to 122 in 3dc5939

const VECTOR_SIZE: usize = core::mem::size_of::<__m128i>();
const VECTOR_ALIGN: usize = VECTOR_SIZE - 1;
const VECTOR_LOOP_SIZE: usize = 4 * VECTOR_SIZE;

https://github.com/BurntSushi/bstr/blob/3dc5939f30daa1a8a6e5cc346bb77841f19ea415/src/ascii.rs#L180C36-L180C52

And then further advanced by VECTOR_SIZE which is 16 bytes:

const VECTOR_SIZE: usize = core::mem::size_of::<__m128i>();

ptr = ptr.add(VECTOR_SIZE);

So in the loop at L186, the pointer is always aligned to 16 bytes:

let chunk = _mm_loadu_si128(ptr as *const __m128i);

The aligned version of the function was pointed out by @anforowicz during unsafe code audit: https://chromium-review.googlesource.com/c/chromium/src/+/5925797/comment/f08dc00c_1b24061c/

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions