xor will tag the register as having the upper parts zeroed, so xor eax, eax / inc al / inc eax avoids the usual partial-register penalty that pre-IvB CPUs have. Even without xor, IvB and later only need a merging uop when the high 8 bits (AH) are modified and then the whole register is read. (Agner incorrectly states that Haswell even removes the AH-merging penalty.)
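As a sanity check on the architectural semantics (not the performance), here's a small Python model of that sequence; the `write_al` helper is my own invention, just mirroring how a write to AL leaves bits 8-31 alone:

```python
# Toy model of x86 partial-register writes: architectural results only,
# says nothing about renaming/merging cost. Helper name is mine, not a real tool.

MASK32 = 0xFFFFFFFF

def write_al(reg32, val8):
    # Writing AL replaces only bits 0-7; bits 8-31 keep their old value.
    return (reg32 & 0xFFFFFF00) | (val8 & 0xFF)

eax = 0xDEADBEEF                       # stale value before the sequence
eax = 0                                # xor eax,eax (upper bits known-zero)
eax = write_al(eax, (eax & 0xFF) + 1)  # inc al: partial write
eax = (eax + 1) & MASK32               # inc eax: reads the full register
assert eax == 2                        # xor guaranteed the high bits were 0
```

Without the xor-zeroing first, the `inc eax` would have had to combine the new AL with whatever stale bits 8-31 held, which is exactly where the merging uop (or stall, on older CPUs) comes from.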
...
call some_func
xor ecx,ecx ; zero *before* setting FLAGS
cmp eax, 42
setnz cl ; ecx = cl = (some_func() != 42)
add ebx, ecx ; no partial-register penalty here
This has optimal performance on all CPUs (no stalls, merging uops, or false dependencies). (If the condition was ebx += (eax != 0), there are tricks like cmp eax, 1; sbb ebx, -1 using the carry flag with adc or sbb to add or subtract it directly, instead of materializing it as a 0/1 integer, as @l4m2 pointed out in comments. It might even be worth it to do sub eax, 42 (or LEA into another reg) / cmp eax,1 / sbb. Especially if it's hard to arrange to xor-zero before setting FLAGS, since cmp/setcc/movzx/add has all 4 operations on the critical path for latency.)
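The borrow arithmetic behind that sbb trick is easy to verify: cmp eax, 1 sets CF exactly when eax == 0 (unsigned eax < 1), and sbb ebx, -1 computes ebx + 1 - CF. A quick Python model (helper names are mine, just mirroring the asm semantics):

```python
# Model the CF-producing cmp and CF-consuming sbb with 32-bit wraparound.
MASK32 = 0xFFFFFFFF

def cmp_cf(a, b):
    # cmp a,b sets CF when a < b (unsigned compare).
    return 1 if (a & MASK32) < (b & MASK32) else 0

def sbb(dst, src, cf):
    # sbb dst,src computes dst - (src + CF), wrapping at 32 bits.
    return (dst - ((src & MASK32) + cf)) & MASK32

for eax in (0, 1, 42, 1234):
    ebx = 100
    cf  = cmp_cf(eax, 1)      # cmp eax, 1   -> CF = (eax == 0)
    ebx = sbb(ebx, -1, cf)    # sbb ebx, -1  -> ebx += 1 - CF
    assert ebx == 100 + (1 if eax != 0 else 0)   # ebx += (eax != 0)
```

Note how the increment happens directly via the carry, with no 0/1 integer ever materialized in a register.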
There are no recognized zeroing idioms that don't affect FLAGS, so the best choice depends on the target microarchitecture. On Core2, inserting a merging uop might cause a 2 or 3 cycle stall. It appears to be cheaper on SnB, like 1 cycle at worst, but I didn't spend much time trying to measure. Haswell and later don't rename partial registers separately from full regs. Using mov reg, 0 / setcc would have a significant penalty on older Intel CPUs (Nehalem and earlier); on newer Intel CPUs it's close to as good as xor-zeroing, but has worse code-size than movzx.
Using setcc / movzx r32, r8 is probably the best alternative for Intel P6 & SnB families if you can't xor-zero ahead of the flag-setting instruction. That should be better than repeating the test after xor-zeroing. (Don't even consider sahf / lahf or pushf / popf.) AMD Zen family can eliminate movzx r32, r8 (i.e. handle it with register renaming, no execution unit or latency, like xor-zeroing). Intel IvB and later (except for Ice Lake) can only eliminate regular mov instructions, so there movzx takes an execution unit and has non-zero latency, making test/setcc/movzx worse than xor/test/setcc, and also worse than test / mov r,0 / setcc for latency (but much better on older Intel CPUs with partial-register stalls).
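To see why the movzx matters architecturally, here's a quick Python model (helper names mine): setnz cl writes only the low byte of ECX, leaving stale high bits, and movzx ecx, cl zero-extends to discard them:

```python
# Model of setcc + movzx when no xor-zeroing happened first.

def write_low8(reg32, val8):
    # setnz cl: only bits 0-7 of ECX are written; bits 8-31 stay stale.
    return (reg32 & 0xFFFFFF00) | (val8 & 0xFF)

def movzx_r32_r8(val8):
    # movzx ecx, cl: zero-extend the 8-bit value into the full register.
    return val8 & 0xFF

ecx = 0xDEADBEEF                       # stale value: no xor-zeroing first
eax = 7
zf  = (eax == 0)                       # test eax,eax
ecx = write_low8(ecx, 0 if zf else 1)  # setnz cl
assert ecx == 0xDEADBE01               # high bits are still garbage!
ecx = movzx_r32_r8(ecx & 0xFF)         # movzx ecx, cl fixes that
assert ecx == 1
```

The zero-extension is what replaces the up-front zeroing, which is why the movzx ends up on the critical path: it depends on the setcc result instead of being an independent instruction.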
Using setcc / movzx with no zeroing first is bad on AMD/P4/Silvermont, because they don't track deps separately for sub-registers: there would be a false dep on the old value of the register. mov reg, 0 / setcc is probably the best alternative when xor/test/setcc isn't an option, at least for zeroing / dependency-breaking in "hot" code where this is part of an important latency chain. Otherwise go for setcc / movzx to save a bit of code size.