As far as I understand, you are allowed to access inactive members of a union if they share a "common initial sequence" with the active one. This can be used to tag the active member with a type field common to all members:

enum command_type {
    command_type_foo,
    command_type_bar,
};

struct foo_command {
    enum command_type type;
    int x;
};

struct bar_command {
    enum command_type type;
    const char* s;
};

union command {
    struct foo_command foo;
    struct bar_command bar;
};

// second case means `cmd.foo` was not active, but fine due to CIS
switch (cmd.foo.type) {
    case command_type_foo:
        return do_foo(&cmd.foo);
    case command_type_bar:
        return do_bar(&cmd.bar);
}
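For reference, here is a minimal self-contained sketch of the pattern above. The handlers do_foo and do_bar and the dispatch wrapper are hypothetical stand-ins (their bodies are assumptions made only so the example compiles), not code from the question.

```c
#include <assert.h>
#include <string.h>

enum command_type {
    command_type_foo,
    command_type_bar,
};

struct foo_command {
    enum command_type type;
    int x;
};

struct bar_command {
    enum command_type type;
    const char *s;
};

union command {
    struct foo_command foo;
    struct bar_command bar;
};

/* Hypothetical handler: returns the payload of a foo command. */
static int do_foo(const struct foo_command *c)
{
    assert(c->type == command_type_foo);
    return c->x;
}

/* Hypothetical handler: returns the length of a bar command's string. */
static int do_bar(const struct bar_command *c)
{
    assert(c->type == command_type_bar);
    return (int)strlen(c->s);
}

/* Reads the tag through `foo` even when `bar` was stored last; `type` is
   part of the common initial sequence and the union type is visible here. */
static int dispatch(const union command *cmd)
{
    switch (cmd->foo.type) {
    case command_type_foo:
        return do_foo(&cmd->foo);
    case command_type_bar:
        return do_bar(&cmd->bar);
    }
    return -1; /* unknown tag */
}
```

A caller would build a command with a designated initializer such as `{ .bar = { command_type_bar, "hello" } }` and pass its address to dispatch.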

Is the same true if the common bit is part of a bitfield?

struct i2c_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 7;
    uint32_t address : 8;
    uint32_t offset : 8;
    uint32_t length : 8;
};

struct aux_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 3;
    uint32_t address : 20;
    uint32_t length : 8;
};

union address {
    struct i2c_address i2c;
    struct aux_address aux;
};

// is accessing the `is_aux` bit from inactive member UB?
return addr.i2c.is_aux ? do_aux(&addr.aux, buffer) : do_i2c(&addr.i2c, buffer);

4 Comments

  • C++ does not support union type punning, so speaking of common initial sequence doesn't make much sense in C++. Please stick to one language per question. Commented Nov 4 at 11:29
  • @Lundin OK, I dropped the C++ part. Commented Nov 4 at 11:42
  • “Active member” is a C++ term. The C standard does not define active or inactive members. Its only mention of anything like that is in C 2024 footnote 93, which says “If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6…” This essentially renders the “common initial sequence” rule superfluous, as accessing a union member by name always works per footnote 93… Commented Nov 4 at 12:44
  • … (Footnote 93 speaks to the member access operators, . and ->. So it is telling us how to understand the text about those operators; they always reinterpret union data. But there is another way to access union members: via pointers. So, if we have a pointer to a union member that is not the last one written, it may be that reading it has undefined behavior. However, you cannot have a pointer to a bit-field, so this would not apply to the case in the question.) Commented Nov 4 at 12:48

4 Answers

Does union "common initial sequence" include bitfields?

Yes. The relevant part of the spec says:

Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

(C23 6.5.3.4/6)

That constitutes the definition of "common initial sequence". The word "member" within does not discriminate between bitfields and other members, and the parenthetical part further clarifies that the relevant members can include bitfields.

Thus, if the addr in your second example is a union address, then there is no inherent problem with accessing addr.i2c.is_aux as shown, as far as the C language is concerned. Note, however, that the common initial sequence of struct aux_address and struct i2c_address contains only their respective is_aux members. Although the next two members of each structure are named _reserved and address, and they all have type uint32_t, the _reserved members have different bitfield widths, so neither they nor any subsequent members are part of the common initial sequence.
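To make that concrete, here is a small sketch in which only is_aux is ever read through the "other" member, and every remaining field is read through the member that the tag selects. The helper names (make_i2c, make_aux, address_is_aux, address_length, aux_target) are hypothetical and not part of the question. Note also that uint32_t bit-fields are an implementation-defined extension that common compilers accept.

```c
#include <assert.h>
#include <stdint.h>

struct i2c_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 7;
    uint32_t address : 8;
    uint32_t offset : 8;
    uint32_t length : 8;
};

struct aux_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 3;
    uint32_t address : 20;
    uint32_t length : 8;
};

union address {
    struct i2c_address i2c;
    struct aux_address aux;
};

/* Hypothetical constructors: each stores a whole member, tag included. */
static union address make_i2c(uint32_t address, uint32_t offset, uint32_t length)
{
    union address a = { .i2c = { .is_aux = 0, .address = address,
                                 .offset = offset, .length = length } };
    return a;
}

static union address make_aux(uint32_t address, uint32_t length)
{
    union address a = { .aux = { .is_aux = 1, .address = address,
                                 .length = length } };
    return a;
}

/* Reads only the common initial sequence (is_aux) through `i2c`. */
static int address_is_aux(const union address *a)
{
    return a->i2c.is_aux;
}

/* `length` is NOT in the common initial sequence, so it is read through
   the member that the tag says was stored. */
static uint32_t address_length(const union address *a)
{
    return address_is_aux(a) ? a->aux.length : a->i2c.length;
}

/* Returns the 20-bit AUX address; the caller must have checked the tag. */
static uint32_t aux_target(const union address *a)
{
    assert(a->i2c.is_aux);
    return a->aux.address;
}
```

The design point is that the tag check is the only cross-member read; everything after it goes through the member that was actually written.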

It is a very common practice in low level programming.

  1. Because both structs begin with the same bit-field (uint32_t is_aux : 1), they have a common initial sequence. The standard allows you to read members of that common initial sequence through either union member, even when the other member is active. So addr.i2c.is_aux is fine regardless of which member was last written.

  2. Passing &addr.aux to do_aux(...) (or &addr.i2c to do_i2c(...)) is only valid if that member is the active one. If the other member is active and the callee reads fields beyond the common initial sequence (e.g., address, offset, length), that’s undefined behavior.

Add a raw value to the union and initialize the correct struct from it. Then it will be safe.
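One possible reading of that suggestion, as a sketch only: give the union a uint32_t raw member and convert through it. The helpers aux_to_raw and aux_from_raw are hypothetical names, and the size check assumes the compiler packs each bit-field struct into a single 32-bit word, which is implementation-defined. Which bits of raw correspond to which field is likewise implementation-defined, so the sketch only round-trips values and never hard-codes a bit position.

```c
#include <assert.h>
#include <stdint.h>

struct i2c_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 7;
    uint32_t address : 8;
    uint32_t offset : 8;
    uint32_t length : 8;
};

struct aux_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 3;
    uint32_t address : 20;
    uint32_t length : 8;
};

union address {
    uint32_t raw;               /* whole word, e.g. as read from a register */
    struct i2c_address i2c;
    struct aux_address aux;
};

/* Assumption: both structs occupy exactly one 32-bit word on this compiler. */
_Static_assert(sizeof(union address) == sizeof(uint32_t),
               "bit-fields must pack into one 32-bit word");

/* Hypothetical helper: pack an AUX address into its raw 32-bit form. */
static uint32_t aux_to_raw(struct aux_address aux)
{
    union address a = { .aux = aux };
    return a.raw;
}

/* Hypothetical helper: view a raw word as an AUX address; the tag must agree. */
static struct aux_address aux_from_raw(uint32_t raw)
{
    union address a = { .raw = raw };
    assert(a.aux.is_aux);
    return a.aux;
}
```

The raw word can then be stored or copied as a plain integer, and whichever struct the tag selects is reconstructed from it at the point of use.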

3 Comments

"Add raw value to the union and initalize the correct struct from it." - can you please elaborate?
A uint32_t raw value, which you can assign to any of them.
I don't see the need for the last paragraph, but now I can't get the thought of raw onions out of my head!

In practice, there is no such thing as an active or inactive member of a union. Members just are. It is your application that uses one member or another; the members of a union are just types. It is better to think of a struct or union as a chunk of memory with a lot of named pointers inside of it.

Taking your example:

struct i2c_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 7;
    uint32_t address : 8;
    uint32_t offset : 8;
    uint32_t length : 8;
};

struct aux_address {
    uint32_t is_aux : 1;
    uint32_t _reserved : 3;
    uint32_t address : 20;
    uint32_t length : 8;
};

union address {
    struct i2c_address i2c;
    struct aux_address aux;
};

And now run through gdb or other debugger of your choice:

(gdb) ptype /o union address
/* offset      |    size */  type = union address {
/*                     4 */    struct i2c_address {
/*      0: 0   |       4 */        uint32_t is_aux : 1;
/*      0: 1   |       4 */        uint32_t _reserved : 7;
/*      1: 0   |       4 */        uint32_t address : 8;
/*      2: 0   |       4 */        uint32_t offset : 8;
/*      3: 0   |       4 */        uint32_t length : 8;

                                   /* total size (bytes):    4 */
                               } i2c;
/*                     4 */    struct aux_address {
/*      0: 0   |       4 */        uint32_t is_aux : 1;
/*      0: 1   |       4 */        uint32_t _reserved : 3;
/*      0: 4   |       4 */        uint32_t address : 20;
/*      3: 0   |       4 */        uint32_t length : 8;

                                   /* total size (bytes):    4 */
                               } aux;

                               /* total size (bytes):    4 */
                             }

As you can see, addr.aux.is_aux and addr.i2c.is_aux have the same offset (0:0) and the same size (4). That means you can read or write either one and get the same value.

Reading or writing union members with different addresses or sizes will be UB and will give strange results, yes. But if two members start at the same address in memory and have the same type, you can use them interchangeably.

10 Comments

"there is no such thing as active or inactive member of a union" and "will be UB and will give strange results, yes" shows complete misunderstanding of UB
@DominikKaszewski "there is no such thing as active or inactive member of a union" This is correct in C. C++ is different. As for reading another member than the one last used for writing, whether a sensible conversion is possible or not depends on the types involved.
I stand corrected, though it seems to me that while C does not use "active member" wording, it effectively uses the same rules - you are generally not allowed to read from members other than the one you wrote to.
This has always been permitted by the C standard, but you need to be careful not to read a non-value representation because doing so results in UB. See C23 6.2.6.1, 6.5.1, and 6.5.3.4 with footnote 93.
Although such type punning by necessity relies on implementation defined behavior in general.
There are times when the Standard simultaneously specifies the behavior of an action and characterizes it as invoking Undefined Behavior. When the Standard was written, many of the people who approved it expected the question of whether to give priority to the specification of the behavior or the characterization at UB would be, at worst, viewed as a quality-of-implementation matter. If it was obvious that quality implementations claiming to be suitable for a particular purpose should support a particular corner case, there was no perceived need to care about whether support was mandated.
@DominikKaszewski The C standard provides informative text: "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called type punning). This can possibly be a non-value representation." Reading a non-value ("trap") representation can indeed invoke UB. So this answer is partially correct.
However, it is not as trivial as "members just are". The type punning, type conversion and aliasing rules of C will apply.
The authors of the Standard included aliasing rules to say that conforming implementations need not handle correctly all corner cases that had been defined in the language the Standard was chartered to describe, but expected compiler writers to, on a quality-of-implementation basis, make a good faith effort to correctly process any corner cases that would be relevant to their customers. When clang and gcc are configured using -fno-strict-aliasing to process real C, rather than Garbage C, the aliasing rules those compilers choose to inflict upon programmers are inapplicable.
@DominikKaszewski "you are generally not allowed to read from members other than the one you wrote to" Yes you are. For example, it is perfectly safe to read a uint32_t through a uint16_t [2] union member (though endianness applies). That holds in C; it would be UB in C++. But going from, let's say, uint32_t to uint32_t* by type punning might invoke UB due to misalignment or the target having trap representations for certain addresses.

Consider the types:

struct s1 { unsigned char x1:4; };
struct s2 { unsigned char x2:4, y:4; };

Reading field x1 of a struct s1 would require reading the storage in a manner compatible with any way field x2 of a struct s2 might have been written. A compiler need not process writes of x1 in a fashion that would avoid disturbing field y of a struct s2. I suspect C89 limited common-initial-sequence guarantees to actions that "inspect" shared fields to avoid requiring that compilers process writes of x1 using a read-modify-write sequence.

Note, however, that clang and gcc do not meaningfully support Common Initial Sequence guarantees except when using the -fno-strict-aliasing compiler option. The Standard specifies that such guarantees apply in places where a complete union type definition containing the applicable structures is "visible", but fails to say "is visible under the same rules of type visibility and scope that apply elsewhere, without regard for whether the compiler writer feels like ignoring it". If one uses -fno-strict-aliasing, clang and gcc will correctly honor the CIS guarantees, but without it they interpret the Standard using the latter definition of "visible", which doesn't support most of the situations where CIS guarantees would be useful.

2 Comments

This statement from the definition of bit-fields: "The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined." makes me wonder if s1.x1 and s2.x2 are guaranteed to refer to the same bits. While that would be a perverse implementation, I'm not convinced it's precluded.
I think part of the intention of the Common Initial Sequence guarantees is to specify that.
