You can’t measure one execution of code in a meaningful way. But if something supposedly takes a nanosecond, you just do it a billion times and count the seconds, without any tools. If that’s too fast, you run it 10 billion times and count the seconds. If that is still too fast, the optimiser has optimised your code away :-)
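The repeat-and-divide idea can be sketched in a few lines of Python. This is only an illustration: `seconds_per_call` and `work` are hypothetical names made up for the example, and the figure it reports includes the cost of the loop and of the function call itself.

```python
import time


def seconds_per_call(fn, repeats):
    """Run fn `repeats` times and return the average wall-clock seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    elapsed = time.perf_counter() - start
    # A single call is too short to time, so time the whole batch and divide.
    # The result includes the overhead of the loop and of calling fn.
    return elapsed / repeats


def work():
    """Hypothetical stand-in for the operation being measured."""
    return 3 * 7


if __name__ == "__main__":
    repeats = 1_000_000  # raise this until the whole run takes seconds, not milliseconds
    per_call = seconds_per_call(work, repeats)
    print(f"{per_call * 1e9:.1f} ns per call (loop overhead included)")
```

If the total run time is still too short to trust, multiply `repeats` by ten and try again. Python's standard `timeit` module does much the same job if you would rather not write the loop yourself.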
Another possibility is automatic vectorising or unrolling. I once found a compiler that unrolled an empty loop eight times, so instead of a billion iterations doing nothing it did 125 million iterations doing nothing. My code said “one billion”, but it took only 3/8ths of a billion cycles :-) (125 million iterations at 3 cycles per empty-loop iteration).
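The arithmetic behind that 3/8ths figure, as a small sketch. `loop_cycles` is a hypothetical helper written for this example, and it assumes the iteration count divides evenly by the unroll factor, which it does for the numbers above.

```python
def loop_cycles(nominal_iterations, unroll_factor, cycles_per_iteration):
    """Cycles spent by a loop after the compiler unrolls it `unroll_factor` times."""
    # Unrolling by 8 means each real iteration covers 8 of the iterations in the source.
    # Assumes nominal_iterations is an exact multiple of unroll_factor.
    actual_iterations = nominal_iterations // unroll_factor
    return actual_iterations * cycles_per_iteration


if __name__ == "__main__":
    cycles = loop_cycles(1_000_000_000, 8, 3)
    print(cycles)  # 375000000, i.e. 3/8 of a billion
```

So if you divide the measured time by the billion iterations you wrote in the source, you get 0.375 cycles per iteration, which is a hint that the compiler changed the loop.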