Number of branches reported by perf stat does not match the actual number of loop tests

Asked 5 days ago

Modified 4 days ago

Viewed 112 times

I am testing a simple program with perf stat:

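#include <cassert>
#include <cstdint>   // std::int64_t
#include <cstdlib>   // std::atoll
#include <iostream>

int main(int argc, const char* argv[]) {
    assert(argc == 3);  // compiled out when built with -DNDEBUG
    std::int64_t iters = std::atoll(argv[1]);
    std::int64_t step = std::atoll(argv[2]);

    std::int64_t value = 0;

    // The loop under test: one compare-and-branch check per iteration.
    for (std::int64_t i = 0; i < iters; ++i) {
        value += step;
    }
    std::cout << value << std::endl;

    return 0;
}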

It is easy to see on godbolt (https://godbolt.org/z/d831q1c9W) that the loop contains only one repeated conditional branch (jge .LBB0_4 in the linked output, I guess).

The build command I use is:

clang++ -std=c++2b bp.cpp tp2.cpp -o bp.exe -Wall -O0 -DNDEBUG

If I run

perf stat ./bp.exe 10000000 700 500 1013 2>&1 | grep branches | tee run4_1.txt
perf stat ./bp.exe 20000000 700 500 1013 2>&1 | grep branches | tee run4_2.txt
perf stat ./bp.exe 30000000 700 500 1013 2>&1 | grep branches | tee run4_3.txt

It outputs:

+ perf stat ./bp_arc.exe 10000000 700
+ grep branches
+ tee run4_1_arc.txt
          15516156      branches:u                #  483.498 M/sec                    (66.64%)
              4390      branch-misses:u           #    0.03% of all branches          (63.53%)
+ perf stat ./bp_arc.exe 20000000 700
+ grep branches
+ tee run4_2_arc.txt
          30860233      branches:u                #  534.704 M/sec                    (67.69%)
              6400      branch-misses:u           #    0.02% of all branches          (65.99%)
+ perf stat ./bp_arc.exe 30000000 700
+ grep branches
+ tee run4_3_arc.txt
          47616449      branches:u                #  535.042 M/sec                    (67.21%)
              6531      branch-misses:u           #    0.01% of all branches          (66.81%)

So the number of branches reported by perf stat is about 1.5 times the actual number of iterations.
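The multiplier can be checked directly from the figures in the output above by dividing each reported branches:u count by the iteration count passed on the command line. The sketch below does only that arithmetic; the helper names branches_per_iteration and print_ratios are made up for illustration and are not part of the test program.

```cpp
#include <cstdint>
#include <cstdio>

// Reported branch count divided by the loop iteration count.
// (Hypothetical helper, used only to illustrate the arithmetic.)
double branches_per_iteration(std::int64_t branches, std::int64_t iters) {
    return static_cast<double>(branches) / static_cast<double>(iters);
}

// Prints the ratio for the three runs quoted in the perf stat output.
void print_ratios() {
    const std::int64_t iters[] = {10000000, 20000000, 30000000};
    const std::int64_t branches[] = {15516156, 30860233, 47616449};
    for (int i = 0; i < 3; ++i) {
        std::printf("%lld iterations: %.3f branches per iteration\n",
                    static_cast<long long>(iters[i]),
                    branches_per_iteration(branches[i], iters[i]));
    }
}
```

Calling print_ratios() from main gives ratios of roughly 1.55, 1.54 and 1.59, so the excess grows with the iteration count rather than staying constant.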

In another setup I see a multiplier of about 2.1.

So the question is: what kinds of branches does the perf stat branches counter report, and what explains the difference between the number of checks (tests) visible in the asm and the reported count?

edited Apr 26 at 15:13

RKou


asked Apr 25 at 13:04

ilnurKh


I suspect this has something to do with hardware branch prediction.

– Barmar, commented 2026-04-25 17:14:22 +00:00

I've found perf to be one of the buggiest parts of the kernel - it's extremely hardware-dependent, and I suspect something about it's not getting saved/restored properly when rescheduling happens (which is pretty frequent). It's quite possible that the recorded stats are actually a mix of your process and some other process (including the idle process) or kernel interrupts or ...

– o11c, commented 2026-04-25 18:48:23 +00:00

There is a lot more happening in the process than just the loop in your code. First the dynamic loader (ld.so) has to load your binary and the shared libraries. Then you are also calling shared library functions. So it makes sense that the count is much higher than the number of iterations in your code. P.S. As an answer to a related question suggested by SO says, use perf_event_open.

– singhatulks, commented 2026-04-28 05:54:14 +00:00

Yes, some other actions can explain a constant difference, but not the multiplication.

– ilnurKh, commented 2026-04-28 22:57:30 +00:00
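The perf_event_open suggestion from the comments can be sketched as follows: open a branch-instruction counter for the current thread, enable it only around the loop, and read it afterwards, so that process startup and library loading are excluded. This is a minimal Linux-only sketch under assumptions, not the code from the related answer. The names LoopMeasurement, run_loop and measure_loop are hypothetical, the loop is a volatile variant of the one in the question, and the counter may be unavailable (restricted perf_event_paranoid, no PMU in a VM), in which case counted is false.

```cpp
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Result of one measured run (hypothetical type for this sketch).
struct LoopMeasurement {
    bool counted;            // false if the counter could not be opened or read
    std::uint64_t branches;  // user-space branch instructions in the window
    std::int64_t value;      // result of the loop
};

// The loop from the question; `volatile` keeps the compiler from
// folding the loop away at higher optimization levels.
std::int64_t run_loop(std::int64_t iters, std::int64_t step) {
    volatile std::int64_t value = 0;
    for (std::int64_t i = 0; i < iters; ++i) {
        value = value + step;
    }
    return value;
}

LoopMeasurement measure_loop(std::int64_t iters, std::int64_t step) {
    LoopMeasurement m = {false, 0, 0};

    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    attr.disabled = 1;        // start stopped, enable explicitly below
    attr.exclude_kernel = 1;  // user space only, like branches:u
    attr.exclude_hv = 1;

    // glibc provides no wrapper, so call the syscall directly.
    // pid = 0 and cpu = -1 mean "this thread, on any CPU".
    int fd = static_cast<int>(
        syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
    if (fd < 0) {
        m.value = run_loop(iters, step);  // still run the loop, uncounted
        return m;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    m.value = run_loop(iters, step);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t count = 0;
    if (read(fd, &count, sizeof(count)) ==
        static_cast<ssize_t>(sizeof(count))) {
        m.counted = true;
        m.branches = count;
    }
    close(fd);
    return m;
}
```

For example, measure_loop(10000000, 700) can be compared against the whole-process number from perf stat. The window still includes a few branches from the function call and the ioctl return path, so a small constant overhead is expected.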
