The part of the ATmega core that calls setup() and loop() is as follows:
#include <Arduino.h>

int main(void)
{
    init();

#if defined(USBCON)
    USBDevice.attach();
#endif

    setup();

    for (;;) {
        loop();
        if (serialEventRun) serialEventRun();
    }

    return 0;
}
Pretty simple, but every pass through the for loop also pays for the serialEventRun check after loop() returns.
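To make that per-iteration cost concrete, here is a minimal host-side sketch of the same loop shape. It is not the Arduino core source: the names `runIterations`, `modelLoop`, `modelSerialEventRun`, `Hook` and `Counters` are hypothetical. It also swaps in a nullable function pointer for the hook and bounds the loop so it terminates on a PC, where the real core tests `serialEventRun` directly inside `for (;;)`.

```cpp
#include <cstddef>

// Hypothetical host-side model of the core's main loop (an assumption for
// illustration, not the Arduino source).
using Hook = void (*)();

struct Counters {
    std::size_t loopCalls = 0;  // how many times the loop() stand-in ran
    std::size_t hookCalls = 0;  // how many times the serialEventRun() stand-in ran
};

static Counters counters;

// Stand-in for the sketch's loop().
void modelLoop() { ++counters.loopCalls; }

// Stand-in for serialEventRun().
void modelSerialEventRun() { ++counters.hookCalls; }

// Mirrors the body of for (;;) in main(), but bounded so it ends on a host.
Counters runIterations(Hook loopFn, Hook eventHook, std::size_t iterations) {
    counters = Counters{};
    for (std::size_t i = 0; i < iterations; ++i) {
        loopFn();
        // Same shape as: if (serialEventRun) serialEventRun();
        // The test is evaluated on every pass, even when there is no hook.
        if (eventHook) eventHook();
    }
    return counters;
}
```

Calling `runIterations(modelLoop, nullptr, 5)` runs the loop stand-in five times and the hook zero times, yet the pointer test is still evaluated on each pass. That per-pass test is the overhead in question.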
Let's compare two simple sketches:
void setup()
{
}

volatile uint8_t x;

void loop()
{
    x = 1;
}
and
void setup()
{
}

volatile uint8_t x;

void loop()
{
    while (true)
    {
        x = 1;
    }
}
The volatile variable x is there just to ensure the assignment isn't optimised out.
In the ASM produced, you get different results:

You can see that while(true) just performs an rjmp (relative jump) back a few instructions, whereas returning from loop() each time costs a subtraction, a comparison and a call. This is 4 instructions vs 1 instruction.