Fun fact: fld m32/m64 can raise / flag an FP exception (#IA) if the source operand is an SNaN, but Intel's manual says this can't happen if the source operand is in double extended-precision floating-point format. So fld m80 can just stuff the bits into an x87 register without looking at them, unlike fld m32/m64, where it has to expand the significand/exponent fields.
- to 80-bit long double: 64-bit significand precision. The finit default, and the normal setting except with MSVC.
- to 64-bit double: 53-bit significand precision. 32-bit MSVC sets this.
- to 32-bit float: 24-bit significand precision.
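As a rough illustration of the three settings listed above, this sketch decodes the precision-control (PC) field, bits 9:8 of the x87 control word, into a significand width. The function name x87_significand_bits is made up for this example; the encodings (00b = 24-bit, 10b = 53-bit, 11b = 64-bit, 01b reserved) are the ones Intel documents.

```c
#include <stdint.h>

/* Hypothetical helper: map the PC field (bits 9:8) of an x87 control
 * word to the significand width it selects.
 * Returns 0 for the reserved encoding 01b. */
int x87_significand_bits(uint16_t control_word)
{
    switch ((control_word >> 8) & 0x3) {
    case 0x0: return 24;   /* single precision */
    case 0x2: return 53;   /* double precision */
    case 0x3: return 64;   /* double extended precision */
    default:  return 0;    /* 01b is reserved */
    }
}
```

Passing 0x037F, the control word that finit loads, gives 64. A word with PC = 10b, such as 0x027F, gives 53.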
Apparently the D3D9 library init function sets x87 precision to 24-bit significand single-precision float, making everything less precise for a speed gain on fdiv/fsqrt (and maybe fcos/fsin and other slow microcoded instructions, too). But x87 precision settings are per-thread, so it matters which thread you call the init function from! (The x87 control word is part of the architectural state that context switches save/restore.)
Of course you can set it back to 64-bit significand with _controlfp_s, so you could usefully use asm, or call a function using long double compiled by GCC, clang, or ICC. But beware the ABI differences: you can only pass it inputs as float, double, or integer, because MSVC won't ever create objects in memory in the 80-bit x87 format.
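For the asm route, here is a minimal sketch of what a GCC- or clang-compiled helper might do, assuming GNU-style inline asm on an x86 target. It is not the _controlfp_s approach, just the raw fnstcw/fldcw equivalent. All names (x87_set_precision, tiny_addend_survives, the X87_PC_* constants) are invented for this example. The second function shows the effect: 1 + 2^-60 needs more than 53 significand bits, so the addend only survives with the 64-bit setting.

```c
#include <stdint.h>

/* Only meaningful with GNU-style inline asm on x86 / x86-64 (assumption). */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* Precision-control encodings for bits 9:8 of the x87 control word. */
enum x87_pc { X87_PC_24 = 0, X87_PC_53 = 2, X87_PC_64 = 3 };
enum { X87_PC_SHIFT = 8, X87_PC_MASK = 3 << 8 };

static uint16_t x87_get_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw) : : "memory");
    return cw;
}

static void x87_set_cw(uint16_t cw)
{
    __asm__ volatile ("fldcw %0" : : "m"(cw) : "memory");
}

/* Set the PC field for the calling thread; return the old control word
 * so the caller can restore it afterwards. */
uint16_t x87_set_precision(enum x87_pc pc)
{
    uint16_t old = x87_get_cw();
    x87_set_cw((uint16_t)((old & ~X87_PC_MASK) |
                          ((unsigned)pc << X87_PC_SHIFT)));
    return old;
}

/* Returns 1 if 1 + 2^-60 is distinguishable from 1 under the given
 * setting, 0 if the addend was rounded away. volatile keeps the
 * compiler from folding the addition at compile time and keeps the
 * add between the two control-word writes. */
int tiny_addend_survives(enum x87_pc pc)
{
    volatile long double one = 1.0L, tiny = 0x1p-60L;
    uint16_t saved = x87_set_precision(pc);
    volatile long double sum = one + tiny;
    x87_set_cw(saved);             /* restore the caller's setting */
    return sum != 1.0L;
}

#endif
```

A caller on the MSVC side would still have to hand such a function float, double, or integer arguments, as noted above; the long double values exist only inside the GCC-compiled code.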