  • For a 3-vector sum, I'd need to set the fourth component to zero first. What's the fastest way to do that? I'm leaning towards "load mask, andps". Is there a faster way to mask out an element? Commented Aug 9, 2011 at 16:19
  • I don't see any faster way than ANDPS, which is one instruction (the mask being constant of course). Commented Aug 9, 2011 at 16:35
  • @FeepingCreature __m128 vector3 = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(vector4), 4)); shifts the fourth component out and a zero into element 0. This may be faster than masking, depending on whether your mask is already loaded from memory. Commented Dec 20, 2013 at 20:53
  • Hi, how does this compare to @Peter Cordes's SSE3 solution? Thank you. Commented Feb 6, 2017 at 19:55
  • @Royi: see Peter's comments in his answer, under the heading "SSE3 optimizing for code-size". Commented Feb 6, 2017 at 23:37
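The two ways of discarding the fourth component mentioned in the comments (ANDPS with a constant mask vs. a whole-register byte shift) can be sketched as below. This is a minimal illustration, not the answer's own code: it assumes x86 with SSE2, assumes the 3-vector lives in elements 0..2 (with _mm_set_ps, the last argument is element 0), and the helper names hsum4_ps, hsum3_and and hsum3_shift are hypothetical. The horizontal sum uses only SSE1 shuffles rather than SSE3 haddps.

```c
#include <emmintrin.h> /* SSE2: _mm_slli_si128, _mm_set_epi32, cast intrinsics */
#include <xmmintrin.h> /* SSE: __m128, _mm_and_ps, _mm_movehl_ps, _mm_shuffle_ps */

/* Hypothetical helper: sum all four floats of v = {a, b, c, d} (element 0 first). */
static float hsum4_ps(__m128 v)
{
    __m128 hi   = _mm_movehl_ps(v, v);            /* {c, d, c, d} */
    __m128 sum2 = _mm_add_ps(v, hi);              /* {a+c, b+d, ...} */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast b+d */
    __m128 sum1 = _mm_add_ss(sum2, odd);          /* element 0 = a+b+c+d */
    return _mm_cvtss_f32(sum1);
}

/* Option 1: ANDPS with a constant mask that zeroes element 3. */
static float hsum3_and(__m128 v)
{
    /* _mm_set_epi32 takes (e3, e2, e1, e0): all-ones in 0..2, zero in 3. */
    const __m128 mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));
    return hsum4_ps(_mm_and_ps(v, mask));
}

/* Option 2: PSLLDQ by 4 bytes, {a, b, c, d} -> {0, a, b, c}; d is shifted out. */
static float hsum3_shift(__m128 v)
{
    __m128 v3 = _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(v), 4));
    return hsum4_ps(v3);
}
```

For example, with v = _mm_set_ps(100.0f, 3.0f, 2.0f, 1.0f), both hsum3_and(v) and hsum3_shift(v) ignore the 100.0f in element 3 and return 6.0f. Which variant wins depends on context: the shift needs no mask constant, but it executes in the integer domain, which may add a bypass delay between float and integer units on some CPUs.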