Advice
1 vote
2 replies
134 views

What is the difference between an interrupt and a context switch? I understand the concept of an interrupt and how it occurs. However, I'm digging deeper into the topic. I studied Computer ...
Gabriele's user avatar
3 votes
1 answer
146 views

I'm experimenting with measuring CPU instruction latency and throughput on P and E cores using RDPMC on Win 11, something like this: MOV ECX, 0x40000000 ; Instructions Counter RDPMC ; Read ...
Andrey Dmitriev's user avatar
0 votes
1 answer
64 views

I am trying to implement Cache Allocation Technology's impact with my CPU. However, when I use either lscpu to see whether my CPU supports it, or cpuid -l 0x10, the output is false. How is this possible? How ...
Ali Hosseini's user avatar
1 vote
1 answer
88 views

I've been digging into the idea of "true" randomness, and I've noticed that modern CPUs support instructions for generating randomness. x64 has the RDRAND instruction, while ARM has RNDR (I'm not ...
freakish's user avatar
  • 57k
1 vote
1 answer
108 views

Building on this question here: the term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big ...
bigcodeszzer's user avatar
0 votes
1 answer
285 views

I’m running Silero VAD (via PyTorch + torchaudio) on a Linode cloud instance (2 dedicated CPUs, 4 GB RAM). When I process 10-minute audio chunks, I always get repeated warnings like this and it doesn'...
Uktamjon's user avatar
7 votes
1 answer
226 views

I'm experimenting with the IMUL r64, r64 instruction on an Intel Xeon E5-1620 v3 (Haswell architecture, base clock 3.5 GHz, turbo boost up to 3.6 GHz, Hyper Threading is enabled). My test loop is ...
Andrey Dmitriev's user avatar
2 votes
0 answers
71 views

I need to do CPU profiling for a JRuby application (JRuby version: 1.7.20.1-8) which uses Ruby version 1.9.3. I tried using the default profiler but I'm getting the error below due to a version compatibility issue ...
maulik trapasiya's user avatar
0 votes
1 answer
52 views

Looking at the CPUUtilized CloudWatch metric for my Fargate service, it's showing max CPU units used as 1040 over the past 4 weeks, using a sampling period of 1 minute. I have 4 vCPUs provisioned to ...
Seanf123's user avatar
0 votes
1 answer
170 views

I have a Docker image and an EC2 instance. When I run this image on my EC2 instance, it takes x seconds to finish. When I run the app natively, it also takes x seconds. But if I deploy the exact image in a container in ...
wildcat's user avatar
  • 81
2 votes
0 answers
209 views

I am measuring the latency of instructions. For 64-bit primitives, integer division usually takes about 25 cycles each on my 2.3 GHz DigitalOcean vCPU, while floating point division takes about 10 ...
Zack Light's user avatar
0 votes
0 answers
71 views

Memory addresses must be aligned before they are used. I know that if they are not, there is a performance cost in CPU caching. I discovered that certain processors raise exceptions when unaligned memories ...
LEE LUNA's user avatar
-3 votes
1 answer
110 views

I have a question regarding these two instructions: lw r2, 10(r1) lw r1, 10(r2) Is there a hazard here? Do I need stalls between the two of them? I want to know if any kind of hazard happens here. I ...
mer mer's user avatar
  • 17
1 vote
0 answers
43 views

My code involves slicing large tensors on the CPU by index and asynchronously transmitting them back to the GPU. However, through the Profiler debugging tool, I found that this step would seriously ...
Ponytail's user avatar
1 vote
0 answers
85 views

I think the title says it all: I have implemented one popcnt function that counts bits with a loop and shifts, and one that uses inline asm with the actual CPU instruction. This is my C code: #define ...
newbee.a's user avatar
