I am testing a simple program with perf stat:
    #include <cassert>
    #include <cstdint>   // int64_t
    #include <cstdlib>   // atoll
    #include <iostream>

    int main(int argc, const char* argv[]) {
        assert(argc == 3);  // compiled out when building with -DNDEBUG
        int64_t iters = atoll(argv[1]);
        int64_t step = atoll(argv[2]);
        int64_t value = 0;
        // The loop under test: one loop-condition check per iteration.
        for (int64_t i = 0; i < iters; ++i) {
            value += step;
        }
        std::cout << value << std::endl;
        return 0;
    }
It is easy to see on Godbolt (https://godbolt.org/z/d831q1c9W) that only one branch is repeated by the loop (jge .LBB0_4 in the linked listing, I guess).
The build command I use is:
clang++ -std=c++2b bp.cpp tp2.cpp -o bp.exe -Wall -O0 -DNDEBUG
If I run
perf stat ./bp.exe 10000000 700 500 1013 2>&1 | grep branches | tee run4_1.txt
perf stat ./bp.exe 20000000 700 500 1013 2>&1 | grep branches | tee run4_2.txt
perf stat ./bp.exe 30000000 700 500 1013 2>&1 | grep branches | tee run4_3.txt
It outputs:
+ perf stat ./bp_arc.exe 10000000 700
+ grep branches
+ tee run4_1_arc.txt
15516156 branches:u # 483.498 M/sec (66.64%)
4390 branch-misses:u # 0.03% of all branches (63.53%)
+ perf stat ./bp_arc.exe 20000000 700
+ grep branches
+ tee run4_2_arc.txt
30860233 branches:u # 534.704 M/sec (67.69%)
6400 branch-misses:u # 0.02% of all branches (65.99%)
+ perf stat ./bp_arc.exe 30000000 700
+ grep branches
+ tee run4_3_arc.txt
47616449 branches:u # 535.042 M/sec (67.21%)
6531 branch-misses:u # 0.01% of all branches (66.81%)
So the number of branches reported by perf stat is about 1.5 times the actual number of iterations.
In another setup I see a multiplier of about 2.1.
So the question is: what kinds of branches does perf stat count in its branches counter, and what explains the difference between the number of checks (tests) visible in the asm and the reported counts?
I've found perf to be one of the buggiest parts of the kernel. It's extremely hardware-dependent, and I suspect something about it is not getting saved/restored properly when rescheduling happens (which is pretty frequent). It's quite possible that the recorded stats are actually a mix of your process and some other process (including the idle process) or kernel interrupts or ...