Newest 'cpu' Questions - Stack Overflow

Advice

1 vote

2 replies

134 views

How the Computer Handles Interrupts

What is the difference between an interrupt and a context switch? I understand the concept of an interrupt and how it occurs. However, I'm digging deeper into the topic. I studied Computer ...

Gabriele

11

asked Nov 8 at 19:25

3 votes

1 answer

146 views

How to catch EXCEPTION_PRIV_INSTRUCTION from RDPMC directly in Assembly (and without SEH)?

I'm experimenting with measuring CPU instruction latency and throughput on P and E cores using RDPMC on Win 11, something like this: MOV ECX, 0x40000000 ; Instructions Counter RDPMC ; Read ...

Andrey Dmitriev

179

asked Oct 21 at 18:37

0 votes

1 answer

64 views

Cache Allocation Technology in 13th Generation Core i9 13900E Intel CPU [closed]

I am trying to evaluate the impact of Cache Allocation Technology on my CPU. However, when I use either lscpu to check whether my CPU supports it, or cpuid -l 0x10, the output is false. How is this possible? How ...

Ali Hosseini

1

asked Oct 10 at 12:38

1 vote

1 answer

88 views

Randomness instructions vs syscalls [closed]

I've been digging into the idea of "true" randomness, and I've noticed that modern CPUs support instructions for generating randomness. x64 has the RDRAND instruction, while ARM has RNDR (I'm not ...

freakish

57k

asked Sep 29 at 8:00

1 vote

1 answer

108 views

Is CPU multithreading affected by divergence?

Building on this question here. The term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big ...

bigcodeszzer

960

asked Sep 18 at 1:37

0 votes

1 answer

285 views

How to handle "Could not initialize NNPACK! Reason: Unsupported hardware" warning in PyTorch / Silero VAD on cloud CPU?

I’m running Silero VAD (via PyTorch + torchaudio) on a Linode cloud instance (2 dedicated CPUs, 4 GB RAM). When I process 10-minute audio chunks, I always get repeated warnings like this and it doesn'...

Uktamjon

11

asked Sep 15 at 14:16

7 votes

1 answer

226 views

Why are all IMUL µOPs dispatched to Port 1 only (on Haswell), even when multiple IMULs are executed in parallel?

I'm experimenting with the IMUL r64, r64 instruction on an Intel Xeon E5-1620 v3 (Haswell architecture, base clock 3.5 GHz, turbo boost up to 3.6 GHz, Hyper Threading is enabled). My test loop is ...

Andrey Dmitriev

179

asked Sep 12 at 9:26

2 votes

0 answers

71 views

Need to do CPU profiling of JRuby application

I need to do CPU profiling for a JRuby application (JRuby version: 1.7.20.1-8) that uses Ruby version 1.9.3. I tried using the default profiler but am getting the error below due to a version compatibility issue ...

maulik trapasiya

745

asked Sep 7 at 18:30

0 votes

1 answer

52 views

Fargate CloudWatch CPU Utilisation differs from docker stats

Looking at the CPUUtilized CloudWatch metric for my Fargate service, it's showing max CPU units used as 1040 over the past 4 weeks, using a sampling period of 1 minute. I have 4 vCPUs provisioned to ...

Seanf123

1

asked Sep 7 at 17:41

0 votes

1 answer

170 views

Performance regression in a Kubernetes deployment that does not occur locally [closed]

I have a docker image and an EC2. When I run this image on my EC2, it takes x seconds to finish. When I run the app natively, it also takes x seconds. But if I deploy the exact image in a container in ...

wildcat

81

asked Sep 1 at 17:50

2 votes

0 answers

209 views

Why does floating point division take less than 50% of the latency of integer division and also 10x more latency than usual when underflow occurs?

I am measuring the latency of instructions. For 64-bit primitives, integer division takes about 25 cycles each, usually on my 2.3GHz Digital Ocean vCPU, while floating point division takes about 10 ...

Zack Light

362

asked Aug 22 at 5:35

0 votes

0 answers

71 views

Why must memory addresses be aligned?

Memory addresses must be aligned before they are used. I know that if they are not, there is a performance cost in CPU caching. I discovered that certain processors raise exceptions when unaligned memory ...

LEE LUNA

1

asked Jul 8 at 9:39

-3 votes

1 answer

110 views

Understanding when a hazard in MIPS occurs

I have a question regarding these two instructions: lw r2, 10(r1) lw r1, 10(r2) Is there a hazard here? Do I need stalls between the two of them? I want to know if any kind of hazard happens here. I ...

mer mer

17

asked Jun 28 at 15:34

1 vote

0 answers

43 views

How to optimize CPU tensor slicing and asynchronous transfer to the GPU?

My code involves slicing large tensors on the CPU by index and asynchronously transmitting them back to the GPU. However, through the Profiler debugging tool, I found that this step would seriously ...

Ponytail

11

asked Jun 19 at 16:19

1 vote

0 answers

85 views

popcnt instruction not as fast as loop on Core Ultra 155H [duplicate]

I think the title says it all: I have implemented a popcnt function that counts bits as a loop with shifts, and one with inline asm using the actual CPU instruction. This is my C code: #define ...

newbee.a

12

asked Jun 17 at 10:25
