
Very simple code:

#include <stdlib.h>

void *allocateMemory5DArray(size_t x, size_t y, size_t z, size_t q, size_t r)
{
    int (*array)[x][y][z][q][r];

    array = malloc(sizeof(*array));
    return array;
}

At -O0, gcc uses 296 bytes of stack and the generated code is more than 180 lines long. Can anyone explain the rationale behind it?

Other compilers (except clang) also generate strange code, but not as strange as gcc :)

https://godbolt.org/z/1zx4YE
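For context, here is a minimal sketch of how a caller might use the returned block. It reuses the question's function unchanged (plus the `<stdlib.h>` include it needs); `fill_and_sum` is a hypothetical helper added only for illustration. It converts the `void *` to a pointer to the first 4-D slice, so ordinary `a[i][j][k][l][m]` indexing works and the compiler computes the strides from the run-time dimensions.

```c
#include <assert.h>
#include <stdlib.h>

/* The function from the question. */
void *allocateMemory5DArray(size_t x, size_t y, size_t z, size_t q, size_t r)
{
    int (*array)[x][y][z][q][r];

    array = malloc(sizeof(*array));
    return array;
}

/* Hypothetical caller: fill every element with i+j+k+l+m and return the
   sum of all elements. Returns -1 if the allocation fails. */
long fill_and_sum(size_t x, size_t y, size_t z, size_t q, size_t r)
{
    /* Pointer to the first 4-D slice, so a[i] steps over whole slices. */
    int (*a)[y][z][q][r] = allocateMemory5DArray(x, y, z, q, r);
    if (a == NULL)
        return -1;

    /* sizeof on a variably modified type is computed at run time. */
    assert(sizeof *a == y * z * q * r * sizeof(int));

    long sum = 0;
    for (size_t i = 0; i < x; i++)
        for (size_t j = 0; j < y; j++)
            for (size_t k = 0; k < z; k++)
                for (size_t l = 0; l < q; l++)
                    for (size_t m = 0; m < r; m++) {
                        a[i][j][k][l][m] = (int)(i + j + k + l + m);
                        sum += a[i][j][k][l][m];
                    }
    free(a);
    return sum;
}
```

Pointing at the first slice rather than at the whole 5-D array is a convenience: callers write `a[i][j][k][l][m]` instead of `(*a)[i][j][k][l][m]`.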

  • What is the problem? Optimized assembly looks fine. "-O0" is used only for fast compilation and debugging. Commented Jan 17, 2021 at 15:02
  • @tstanisl no problem, just curiosity. I'm not asking what -O0 is for. Please focus on the question. Commented Jan 17, 2021 at 15:03
  • Simpler: godbolt.org/z/T3zoMP . Varying the number of [x] dimensions, the code generation looks O(n^2). Commented Jan 17, 2021 at 15:06
  • @PaulHankin it came from stackoverflow.com/questions/65761774/…, where I noticed this strange behaviour. Commented Jan 17, 2021 at 15:12
  • I think that for each dimension of the array it's computing the size (which it writes onto the stack), but for dimension i it recomputes the sizes of the lower dimensions. So it's computing something like x, xy, xyz, xyzq, xyzqr (with 10 muls rather than 4). The generated code does some "sub 1" too, so it's slightly more complicated than this, but I guess this is roughly what's happening. Commented Jan 17, 2021 at 15:19
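The counting argument in the last comment can be modelled in a few lines of C. This is only an illustration of that hypothesis, not a description of GCC's internals; `prefix_sizes_naive`, `prefix_sizes_incremental` and `vla_bytes` are hypothetical names. The naive version recomputes each prefix product from scratch (0+1+2+3+4 = 10 multiplications for five dimensions, i.e. O(n^2)), while the incremental one reuses the previous product (4 multiplications).

```c
#include <assert.h>
#include <stddef.h>

/* Recompute dims[0]*...*dims[i] from scratch for every i.
   *muls receives the number of multiplications performed. */
void prefix_sizes_naive(const size_t *dims, size_t n, size_t *out, unsigned *muls)
{
    assert(dims != NULL && out != NULL && muls != NULL);
    *muls = 0;
    for (size_t i = 0; i < n; i++) {
        size_t p = dims[0];
        for (size_t j = 1; j <= i; j++) {
            p *= dims[j];
            (*muls)++;
        }
        out[i] = p;
    }
}

/* Reuse the previous prefix product: one multiplication per dimension. */
void prefix_sizes_incremental(const size_t *dims, size_t n, size_t *out, unsigned *muls)
{
    assert(dims != NULL && out != NULL && muls != NULL);
    *muls = 0;
    if (n == 0)
        return;
    out[0] = dims[0];
    for (size_t i = 1; i < n; i++) {
        out[i] = out[i - 1] * dims[i];
        (*muls)++;
    }
}

/* Size in bytes of the question's array type, evaluated at run time. */
size_t vla_bytes(size_t x, size_t y, size_t z, size_t q, size_t r)
{
    return sizeof(int[x][y][z][q][r]);
}
```

Both functions produce the same products; only the amount of redundant work differs, which is the kind of difference an unoptimised build would not clean up.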

1 Answer


This behaviour also happens with plain VLAs, and there too Clang generates shorter code than GCC.
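To reproduce the plain-VLA case (for example on Compiler Explorer), a sketch like the following declares an automatic 5-D VLA and measures the byte stride along each dimension. `vla_strides` is a hypothetical name. Because the dimensions are only known at run time, these strides must be computed when the function runs, which is the kind of per-dimension bookkeeping the -O0 listings keep on the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Store in out[d] the byte distance between consecutive elements along
   dimension d of an automatic int[x][y][z][q][r] VLA. Only addresses are
   taken; no element is read. Keep the dimensions small: the array lives
   on the stack. */
void vla_strides(size_t x, size_t y, size_t z, size_t q, size_t r, size_t out[5])
{
    /* A VLA must have a positive size in every dimension. */
    assert(x > 0 && y > 0 && z > 0 && q > 0 && r > 0);
    int a[x][y][z][q][r];

    out[0] = (size_t)((char *)&a[1] - (char *)&a[0]);
    out[1] = (size_t)((char *)&a[0][1] - (char *)&a[0][0]);
    out[2] = (size_t)((char *)&a[0][0][1] - (char *)&a[0][0][0]);
    out[3] = (size_t)((char *)&a[0][0][0][1] - (char *)&a[0][0][0][0]);
    out[4] = (size_t)((char *)&a[0][0][0][0][1] - (char *)&a[0][0][0][0][0]);
}
```

For dimensions 2, 3, 4, 5, 6 the outermost stride is 3*4*5*6*sizeof(int) bytes and the innermost is sizeof(int).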

Although GCC's generated code is longer, GCC documents -O0 as having the fastest compilation time; the assembly is not optimised, but we didn't ask for that. At -O1, GCC spends compilation time on optimisation, and the generated code is quite similar to Clang's.


Clang and GCC also differ in their VLA support. Clang doesn't support VLAs in structures (a GCC extension), for these reasons:

  • is tricky to implement
  • the extension is completely undocumented
  • the extension appears to be rarely used

Clang is happy with C99 VLAs, but that's all. GCC 4.1 (bear in mind that C99 was only "substantially completely supported" from GCC 4.5 onwards) generates code of similar (small) size:

   ...
    mov     %rax, QWORD PTR [%rbp-56]
    mov     %rdx, QWORD PTR [%rbp-48]
    mov     %rcx, QWORD PTR [%rbp-40]
    mov     %rsi, QWORD PTR [%rbp-32]
    mov     %rdi, QWORD PTR [%rbp-24]
    ...

However, with GCC 4.8 the code gets larger. The GCC 4.8 changes page doesn't mention any VLA-related changes, which is odd considering the clear differences in the generated code.

The "Status of C99 features in GCC" page indicates that there were "Various corner cases fixed in GCC 4.5" related to VLAs. However, the 4.5 changelog says nothing about them. Surprisingly, the assembly changes slightly in 4.4 but not in 4.5.

It looks like Clang's reasons for rejecting VLAs in structs were very accurate, and in some cases they may extend to the VLA feature as a whole.


This poor behaviour is well known. The Linux kernel is free of VLAs in the name of performance:

   Buffer allocation |  Encoding throughput (Mbit/s)
 ---------------------------------------------------
  on-stack, VLA      |   3988
  on-stack, fixed    |   4494
  kmalloc            |   1967

which is also good news for anyone building the kernel with Clang.
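One common way to remove a VLA is to replace it with a fixed-size buffer sized for a known upper bound, plus an explicit length check. The sketch below is a generic illustration of that pattern, not code from the kernel: `checksum_vla`, `checksum_fixed` and `MAX_BLOCK` are hypothetical names, and the bound of 64 bytes is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on the input length. */
#define MAX_BLOCK 64

/* Before: the scratch buffer's size depends on len, so it is a VLA. */
unsigned checksum_vla(const unsigned char *in, size_t len)
{
    assert(len > 0);            /* a zero-length VLA is undefined */
    unsigned char buf[len];
    memcpy(buf, in, len);
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* After: a fixed-size scratch buffer and an explicit bounds check.
   Returns 0 on success, -1 if len exceeds MAX_BLOCK. */
int checksum_fixed(const unsigned char *in, size_t len, unsigned *out)
{
    unsigned char buf[MAX_BLOCK];
    if (len > MAX_BLOCK)
        return -1;
    memcpy(buf, in, len);
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    *out = sum;
    return 0;
}
```

The fixed version trades a little stack space for a frame layout known at compile time and moves the size limit into an explicit, testable check.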


3 Comments

The goal of -O0 is not necessarily the fastest compilation time, it happens regularly that -O1 is actually faster.
@MarcGlisse Agreed that -O1 may be faster (IMO under certain circumstances), but GCC is categorical: "-O0 generates unoptimized code but has the fastest compilation time" and "-O1 optimizes reasonably well but does not degrade compilation time significantly". Anyway, I've edited the answer to match the official wording, thanks for that.
Bounty has been automatically granted, but this answer does not answer my question.
