Revisions to How does a single thread run on multiple cores?

added 442 characters in body


edited Jun 3, 2017 at 7:22


Summary: Finding and exploiting the (instruction-level) parallelism in a single-threaded program is done purely in hardware, by the CPU core it's running on, and only over a window of a couple hundred instructions, not through large-scale reordering.

Single-threaded programs get no benefit from multi-core CPUs, except that other things can run on the other cores instead of taking time away from the single-threaded task.

added 1324 characters in body


edited Jun 2, 2017 at 15:32

Peter Cordes


Totally wrong there, too. Each core has its own front-end, which fetches and decodes the instruction stream of whatever it's currently running (a user-space thread, kernel code when user-space makes a system call, or a kernel interrupt handler).

The instructions are issued into the out-of-order part of the core, and it's only there that reordering happens to exploit instruction-level parallelism within a single thread.

A CPU core is only running one stream of instructions, if it isn't halted (asleep until the next interrupt, e.g. timer interrupt). Often that's a thread, but it could also be a kernel interrupt handler, or miscellaneous kernel code if the kernel decided to do something other than just return to the previous thread after handling an interrupt or system call.

With HyperThreading or other SMT designs, a physical CPU core acts like multiple "logical" cores. The only difference from an OS perspective between a quad-core-with-hyperthreading (4c8t) CPU and a plain 8-core machine (8c8t) is that an HT-aware OS will try to schedule threads to separate physical cores so they don't compete with each other. An OS that didn't know about hyperthreading would just see 8 cores (unless you disable HT in the BIOS, in which case it would only detect 4).
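To see the "logical cores" view from user-space, here is a minimal Python sketch. The helper names `logical_cpu_count` and `usable_cpu_count` are made up for illustration; `os.cpu_count()` and `os.sched_getaffinity()` are standard-library calls, and the latter only exists on some platforms (e.g. Linux), so the sketch falls back when it's missing.

```python
import os


def logical_cpu_count():
    """Number of logical CPUs the OS reports (hardware threads, not physical cores).

    May return None if the count can't be determined.
    """
    return os.cpu_count()


def usable_cpu_count():
    """Number of logical CPUs this process is allowed to run on, where available."""
    # os.sched_getaffinity is platform-specific; fall back to the total count.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count()


if __name__ == "__main__":
    print("logical CPUs:", logical_cpu_count())
    print("usable by this process:", usable_cpu_count())
```

On a 4c8t machine with HT enabled, the first number would normally be 8, because the OS counts each hardware thread as a CPU. Nothing here tells you which logical CPUs share a physical core.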

The term "front-end" refers to the part of a CPU core that fetches machine code, decodes the instructions, and issues them into the out-of-order part of the core. Each core has its own front-end, and it's part of the core as a whole. The instructions it fetches are what that core is currently running.

Inside the out-of-order part of the core, instructions (or uops) are dispatched to execution ports when their input operands are ready and there's a free execution port. This doesn't have to happen in program order, so this is how an OOO CPU can exploit the instruction-level parallelism within a single thread.
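The dispatch rule described above (inputs ready, and a free port) can be sketched as a toy simulation. This is an illustrative model only, not how any real scheduler is built: the `Uop` class, the `schedule` function, the latencies, and the oldest-ready-first policy are all assumptions made for the example, and real cores have limited scheduler size, port restrictions per uop type, and much more.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    deps: tuple = ()   # names of earlier uops whose results this one reads
    latency: int = 1   # cycles from dispatch until the result is available


def schedule(uops, num_ports=2):
    """Toy out-of-order scheduler.

    Each cycle, scan waiting uops oldest-first and dispatch those whose
    inputs are ready, up to num_ports per cycle.
    Returns {uop name: cycle it was dispatched}.
    """
    seen = set()
    for u in uops:
        # A dependency must be produced by an earlier uop, or we'd wait forever.
        if any(d not in seen for d in u.deps):
            raise ValueError(f"{u.name} depends on a uop that doesn't precede it")
        seen.add(u.name)

    ready_at = {}         # name -> cycle at which its result becomes available
    dispatched = {}
    waiting = list(uops)  # program order, oldest first
    cycle = 0
    while waiting:
        free_ports = num_ports
        still_waiting = []
        for u in waiting:
            inputs_ready = all(
                d in ready_at and ready_at[d] <= cycle for d in u.deps
            )
            if free_ports > 0 and inputs_ready:
                dispatched[u.name] = cycle
                ready_at[u.name] = cycle + u.latency
                free_ports -= 1
            else:
                still_waiting.append(u)
        waiting = still_waiting
        cycle += 1
    return dispatched


# Program order: a, b, c, d. Two independent dependency chains: a->b and c->d.
program = [
    Uop("a", latency=3),    # think: a load that takes a few cycles
    Uop("b", deps=("a",)),  # must wait for a
    Uop("c"),               # independent of a and b
    Uop("d", deps=("c",)),  # must wait for c
]

if __name__ == "__main__":
    # With one port: a at cycle 0, c at 1, d at 2, b at 3.
    print(schedule(program, num_ports=1))
```

Even with a single port, `c` and `d` dispatch while `b` is still waiting for `a`, which is the reordering within one thread described above. A strictly in-order model would stall `c` and `d` behind `b`.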

If you replace "core" with "execution unit" in your idea, you're close to correct. Yes, the CPU does distribute independent instructions/uops to execution units in parallel. (But there's a terminology mix-up, since you said "front-end" when really it's the CPU's instruction scheduler, a.k.a. the Reservation Station, that picks instructions ready to execute.)

deleted 69 characters in body


edited Jun 2, 2017 at 6:44

Peter Cordes


Congratulations on realizing that your understanding was wrong :P

added 745 characters in body


edited Jun 2, 2017 at 1:29

Peter Cordes




answered Jun 2, 2017 at 1:18

Peter Cordes

