
In C, if you declare a struct like:

struct point {
    int x, y;
};

you also have to use struct when referring to point, e.g.:

struct point p;  // declare p as struct point

because all struct, union, and enum names are in a distinct "tags" namespace. Yes, I know you can use a typedef to import a struct's tag name into the current scope:

typedef struct point point;
point p;         // don't need "struct" now

Clearly, a tags namespace isn't needed since C++ effectively auto-typedefs struct (and class) declarations.

My question is: does anybody know what Ritchie's rationale for using a separate tags namespace was? Why weren't struct names just put directly into the current scope so a typedef would be unnecessary?

His paper "The Development of the C Language" mentions structures, but says nothing about the tags namespace.

Note that I’m really looking for a definitive answer, not speculation.

The only minor benefit is that you can have a struct and a variable with the same name, e.g.:

struct stat stat;

To me, however, that's outweighed by ordinarily having to use struct all over the place if you don't use typedef.
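A minimal sketch of that benefit, using hypothetical names rather than anything from a real header: this compiles because the tag and the variable live in separate namespaces, and in expressions a bare `point` always refers to the variable.

```c
/* Hypothetical example: the tag "point" and the variable "point"
   live in different namespaces, so both can exist at file scope. */
struct point { int x, y; };

struct point point = { 3, 4 };   /* variable "point" of type "struct point" */

/* Inside expressions, a bare "point" always means the variable. */
int point_sum(void) {
    return point.x + point.y;
}
```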

For self-referential structures, you don't need the struct prefix either, since in C++ you can write:

struct link;     // forward declare link
struct link {
    void *data;
    link *next;
};

That works just fine.
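For comparison, here is a sketch of the closest C spelling, assuming the common typedef idiom: declaring the typedef first makes `struct link` an incomplete type and `link` an ordinary type name in one step, so the member can be written `link *next` much as in the C++ version. The `link_length` helper is hypothetical, added only so the snippet does something.

```c
#include <stddef.h>

/* The typedef declares the incomplete tag "struct link" and makes
   "link" an ordinary type name at the same time. */
typedef struct link link;

struct link {
    void *data;
    link *next;          /* equivalent to "struct link *next" */
};

/* Hypothetical helper: count the nodes in a singly linked list. */
size_t link_length(const link *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```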


Note that I'm not asking about why a lot of Unix struct members have prefixes, e.g., sin_family, sin_port, sin_addr. Aside from already knowing the answer to that question, it's unrelated since that's about struct members whereas I'm currently asking about struct names in the tags namespace.

FYI: I've been programming in C on-and-off since the 1980s, so I know C. But I've never seen any explanation of Ritchie's rationale for the tags namespace.


I originally asked this question on Retrocomputing, but Raffzahn suggested, and Adam Hyland agreed, that I ask it here on Langdev.

  • I don't know the answer to your question -- which I have idly wondered about myself. But I like your idea that it might be a solution to what in the C# spec we called the "Color Color" problem: you have a type named Color and you naturally want a property also named Color of that type. The C# name ambiguity rules were carefully designed to allow that; maybe there's some reason why in K&R C that was tricky? Commented Oct 13, 2024 at 7:06
  • You don't even need to forward-declare link in C++ to be able to use it self-referentially. Commented Oct 13, 2024 at 9:40
  • @EricLippert, also known as the "type-token distinction" in wider discourse: en.m.wikipedia.org/wiki/Type%E2%80%93token_distinction Commented Oct 13, 2024 at 15:10
  • @PaulJ.Lucas: Constructs in Pascal that use an already-declared type name precede it by a colon or the keyword of, followed by zero or more carets, and nothing other than a type name could appear in the places a type name could appear. Commented Oct 14, 2024 at 19:28
  • @JohnBollinger: The existence of the 1974 C Reference Manual that documents an earlier version of the language requiring that the stem of all types start with a reserved word should be recognized as supplying a clear answer to the question. Commented Oct 15, 2024 at 19:46

5 Answers


The reason one might use tags is the same reason one would use $sigils: parsing is much easier if identifiers can be told apart during the parsing step, rather than requiring semantic knowledge to do so.
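A minimal sketch of that idea, with hypothetical names: because the `struct` keyword marks where a type begins, a parser can classify `struct T *p` as a declaration without looking `T` up, even while an ordinary variable named `T` is in scope.

```c
/* Hypothetical names: a tag T and an ordinary variable T coexist. */
struct T { int v; };
int T = 2;

int tag_demo(void) {
    struct T s = { 5 };
    struct T *p = &s;      /* "struct" marks a type, so this is a declaration */
    return T * p->v;       /* bare T is the int variable: 2 * 5 */
}
```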

Typedefs feel like a later addition to the language, and they caused parsing problems: a type is no longer always marked by a keyword, so an identifier might name either a variable or a type.

The only symbols that need to be exported from a compiled translation unit to the linking step are global variables and functions (which could be considered a special kind of global).

Putting everything else into a separate tag namespace keeps type names from polluting that inter-translation-unit namespace.

  • Do you have any evidence that typedef came later? Commented Oct 13, 2024 at 12:28
  • @PaulJ.Lucas the C compiler in V6 Unix doesn’t support typedef, whereas the V7 compiler does. So typedef was introduced sometime between 1975 and 1979 (or 1978 when the first K&R C book was published). Commented Oct 13, 2024 at 15:41
  • @StephenKitt I just found this, which documents C for 6th Ed. Unix: no typedef. Commented Oct 13, 2024 at 22:29
  • @PaulJ.Lucas what git logs exist for really old changes? C is a lot older than git. Commented Oct 14, 2024 at 16:44
  • @PaulJ.Lucas I think what Mark wants to know is, when you said "I tried poking around in the git logs", which logs were you looking in and where can they be found? Commented Oct 15, 2024 at 0:15

Why weren't struct names just put directly into the current scope so a typedef would be unnecessary?

typedef is unnecessary. There's nothing that you can express in C with the help of typedef that you cannot also express without. It does make a few things clearer, but it can also be, and too often is, (mis)used to make things more obscure.
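To illustrate, here is a sketch with a hypothetical callback type: `apply_named` declares its parameter through a typedef, `apply_spelled` spells the same type out in full, and the two are interchangeable.

```c
/* Hypothetical callback type, named once with typedef... */
typedef int (*binop)(int, int);

int add(int a, int b) { return a + b; }

int apply_named(binop f, int a, int b) { return f(a, b); }

/* ...and spelled out in full: the parameter type is identical. */
int apply_spelled(int (*f)(int, int), int a, int b) { return f(a, b); }
```

A `binop` value can be passed to `apply_spelled` and vice versa; the typedef only adds a name, not a new type.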

I acknowledge that you wrote "Note that I’m really looking for a definitive answer, not speculation," but I don't think a definitive answer is available at this point.

Dennis Ritchie has acknowledged Algol 68 as having a significant impact on his design of C's type system, so we know that it was well known to him. He may even have drawn C's struct and union keywords directly from Algol, which uses the same ones to declare its structure and union types. Algol 68 has an analog of C's typedef (modes), but no analog of C's tags. Neither did C's direct ancestors, BCPL and B, have any such feature (they had only a very simple type system, whose shortcomings were the initial inspiration for C), so this seems to have been Ritchie's own invention.

Why introduce tags and the tag namespace when Ritchie's main model for C's type system did things differently? I see only a few alternatives, none exclusive:

  • tags and keywords may have been a convenience, making the compiler easier to write or easier to keep small (remember that executable size was a major consideration in that day).

    • as a special case of this, tags, and especially type-category keywords, might have simplified the initial introduction of structures into the early C language parser.
  • Ritchie may have considered tags and pervasive use of the struct keyword to be a superior design. It does have the advantage of being clear at the point of a declaration what type category each declared object belongs to, which I personally value today. I prefer to refer to structure, union, and enum types in my C code using keyword / tag syntax.

I'd lean toward the pragmatic explanation.

  • When I said "... so a typedef would be unnecessary," I meant, "... so a typedef would be unnecessary for making using structure types not require the use of the struct prefix." Commented Oct 14, 2024 at 23:13

Note that I’m really looking for a definitive answer, not speculation.

Unfortunately, Dennis Ritchie died in 2011, so he can no longer be asked.

I think there seems to be a dearth of authoritative rationale because the question only arises so clearly in the context of widespread modern assumptions about how a type system and compiler should work. Those assumptions did not necessarily hold in 1972, or in the years leading up to it, when circumstances were different and fewer questions had been settled.

In C, structures did not originate as data types in their own right, but as something like lightweight frames that are overlaid (perhaps transiently or in an ad-hoc way) over a block of memory, much as one might lay a sheet of transparency film over the top of a wordsearch puzzle so as to highlight the answers amongst the grid of jumbled letters.

As the question's aside about member-name prefixes alludes to, C (as originally designed) added all structure member names to a global namespace. A variable did not need to be typed as a particular structure prior to being treated as a structure. Instead, the structure member operator (.) could be applied freely to any variable, in order to access the memory at the computed offset of that member.

It's in this context that the "untagged struct" can be a thing, because the tag actually performs no essential function.

The following declares the names and types of members that make up a particular structure. This is sufficient information to calculate the offsets of each named member and the overall size of the structure. The (presumed) members of any variable could then be addressed using the structure member operator.

struct { int x, y; };
...
//assume an int* called 'ptr' is in scope
(*ptr).x = 1;  // early C only: '.' binds tighter than '*', and modern compilers reject this

A variable can be declared by specifying the shape of the structure as the type and then providing a variable name. This will allocate sufficient memory for the variable.

struct { int x, y; } varname;
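For contrast, a sketch of how the same untagged shape is used in modern C, where `->` requires a pointer to an actual structure type; the variable names are hypothetical.

```c
/* Modern C: the untagged shape, a variable of it, and a pointer to
   that variable are declared together (hypothetical names). */
struct { int x, y; } shape_var, *shape_ptr = &shape_var;

int untagged_demo(void) {
    shape_ptr->x = 1;        /* writes shape_var.x */
    shape_ptr->y = 2;        /* writes shape_var.y */
    return shape_var.x + shape_var.y;
}
```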

As well as these untagged structures, it's still easy to see the benefit of declaring the shape once then using the tag name in multiple places as an alias for that shape.

I suspect once the untagged form is implemented in the compiler, then implementing the tagged version is a simple matter of substituting the member list of the struct in place of the tag name after the struct keyword is encountered.

The tag originally appears to perform only two roles: (a) human-readable description, and (b) avoiding repetition of the shape definition when it is used more than once.
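A small sketch of role (b) in today's C, with hypothetical names: without a tag, each `struct { ... }` introduces a new, distinct type, so the shape must be repeated and the copies are not assignment-compatible; with a tag, the shape is written once and reused by name.

```c
/* Untagged: same shape written twice, but two different types. */
struct { int x, y; } first;
struct { int x, y; } second;

/* Tagged: the shape is written once and reused by name. */
struct pair { int x, y; };
struct pair p1 = { 1, 2 }, p2;

int tagged_copy(void) {
    p2 = p1;                     /* fine: both are "struct pair" */
    /* first = second; would be rejected: incompatible types */
    return p2.x + p2.y;
}
```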

So I think that's mostly how we get the struct keyword in C in the first place: the tag name was optional and played a fairly minor role, and a keyword is needed when there is no tag name. Parallelism with the union keyword probably also seemed natural.

I think the inevitable inference is that typedef arrived some time later, after the basic syntax and compiler structure had already been laid down, and after thinking had shifted toward how type names can convey useful information to the programmer beyond what the compiler itself needs. There is an ergonomic benefit to aliasing type names, rather than having to slavishly state standard type names and then put all the meaningful information into variable and function names.

I suspect two other things also moved along in the 70s: (a) the computing power Ritchie could reasonably access, so that compiler techniques that would have been unacceptable a few years earlier became feasible and justifiable; and (b) the complexity of the development Ritchie was attempting in C.

And once there were substantial compilers in existence and expanding use of C by Ritchie's colleagues, I suspect there was reluctance to revisit relatively modest syntactic infelicities, or make existing keywords sometimes optional in a way that might foreclose the flexibility of the language.

So that's how we get to the position where the struct keyword is required.

  • Your example with ptr doesn't compile with modern compilers (ptr is an int*, not a structure, so it has no x member) and, unless you can prove it used to be legal C, I'm highly skeptical. The correct way to write that is struct { int x, y; } *ptr; then ptr->x = 1. As someone else commented, the struct keyword was likely borrowed from ALGOL. In the '70s, there were also other languages (Pascal, COBOL, PL/I, etc.) that all had struct-like things, but none had the "tag" concept like C. Commented Feb 19 at 17:33
  • @PaulJ.Lucas, that's exactly the point, C worked differently when Ritchie first designed it, and he evolved it from there. It's bizarre to ask me to prove that point when the relevant point (and supporting link) is in your own bloody question! 😂 (And I was already independently aware of it). Seriously though, I think the "tag" concept can only be understood as being a minimal or even slightly half-baked concept to begin with, but then typedef took over and allows you to easily declare a pointer alias as well as a direct type alias. (1/2) Commented Feb 19 at 18:57
  • And I think that is neat and succinct enough that Ritchie wasn't going to overhaul the syntactic principles and the existing compiler techniques and introduce ad-hoc rules just to sometimes save a couple of chars on the struct keyword. (2/2) Commented Feb 19 at 18:57

The 1974-75 C Language Reference doesn't mention typedef, presumably because it didn't yet exist. When C was designed, every type's name contained a reserved word, making it possible for a compiler to fully parse a program which didn't use the preprocessor without needing to distinguish any symbols other than reserved words. The ability to parse programs without needing to care about how symbols are defined was a useful trait, but typedef was added in a way that defenestrated it. Within a function, it would be impossible to know, without knowing the definition of x, whether x * y; means "multiply x by y and discard the result" or "define a new symbol named y of type x*".
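A sketch of that ambiguity, using hypothetical names `x` and `y`: the same tokens `x * y;` are a declaration in one function and an expression statement in the other, depending only on whether `x` currently names a type. Compilers will typically warn that the second statement has no effect.

```c
typedef int x;                 /* hypothetical: x names a type at file scope */

int as_declaration(void) {
    int target = 42;
    x * y;                     /* x is a type here: declares y as "pointer to x" */
    y = &target;
    return *y;
}

int as_expression(void) {
    int x = 6, y = 7;          /* this x is a variable that hides the typedef */
    x * y;                     /* same tokens, now an expression: product discarded */
    return x * y;
}
```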

Note also that if a header file contains the declarations (at file scope):

struct foobzil;
void doSomething(struct foobzil *p);

they will be equally usable whether a definition of struct foobzil appeared earlier in the compilation unit, will appear later in the compilation unit, isn't defined anywhere in the compilation unit but is defined in a different one, or isn't defined anywhere in the program. All of this is made possible by the struct keyword. While the initial struct foobzil line could have been treated as typedef struct foobzil foobzil;, the C language has historically not tolerated duplicate definitions. While some compilers require the pre-declaration of the structure name before the function prototype, compilers which extended the language to process prototypes usefully without the predeclaration could simply ignore it.
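A sketch of that pattern within a single translation unit, assuming a hypothetical `count` member for `foobzil`. One detail worth noting: in standard C, a tag mentioned for the first time inside a parameter list has only prototype scope, which is why the file-scope `struct foobzil;` line matters.

```c
#include <stddef.h>

/* File-scope declaration: "struct foobzil" becomes a known, incomplete type. */
struct foobzil;

/* Only a pointer crosses this interface, so the layout is not needed yet. */
void doSomething(struct foobzil *p);

/* Here the definition happens to come later in the same translation unit;
   code that only passes pointers around never needs to see it. */
struct foobzil { int count; };

void doSomething(struct foobzil *p) {
    if (p != NULL)
        p->count++;   /* member access is where the complete type is required */
}
```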

  • The lack-of-typedef theory was already covered by Stephen Kitt's comment. The preprocessor isn't relevant. The struct foo in your example also doesn't seem relevant since you never mention it in your comment. A struct T puts T into the tags namespace; typedef struct T T puts T also into the global namespace. They're not equivalent. Commented Feb 17 at 3:45

For some context, let's observe that when you write:

struct S { int x; };

This is the exact same, grammatically, as:

int; // grammatically, this is a C declaration too!

Isn't that surprising?

There is only one construct where struct appears in C: a type specifier, most often the type of a declaration. What's a declaration? Predominantly a "type-specifier" followed by an optional variable name.

According to K&R (1978), both of the examples above are "type specifiers":

<type-specifier>   ::= int | char | ... | <struct-specifier>

<struct-specifier> ::=
   | struct <identifier>? { <struct-declaration-list> }
   | struct <identifier>

The key point is that the "struct-declaration-list" is optional! When you declare a variable like struct S my_var;, you're choosing to omit the braced definition of the struct.

And when you introduce a new struct tag with struct S { int x; };, that is also a declaration. BUT you're choosing to omit the name of a variable.

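C's grammar lets you omit the variable name, and that's why int; parses as a declaration. (ISO C adds a constraint that a declaration must declare something, so modern compilers diagnose int;, though usually only with a warning.)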
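A sketch of those spellings side by side, using hypothetical tag names `S` and `U`: each line is the same struct-specifier construct, with or without a tag, with or without a braced list, and with or without a declarator.

```c
struct S { int x; };               /* braced list, no declarator: declares only the tag */
struct S s_var = { 1 };            /* tag only, plus a declarator: declares a variable */
struct U { int y; } u_var = { 2 }; /* both at once: defines tag U and declares u_var */
struct { int z; } anon_var = { 3 };/* no tag at all: this shape cannot be named again */
```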


In the entirety of the C language, struct appears in only one construct. You choose whether to add a variable name to the declaration, or a braced definition to the type-specifier.

This perspective is important. The language is not imposing that you "prepend" struct to a seemingly normal variable declaration.

Compare this with modern languages like Java, where a class definition is syntactically and semantically distinct. In C, it's a single, multi-faceted language feature.

And as a C compiler maintainer, this IS how you implement struct. It really is one language feature!

But why not let a bare struct tag act as a type specifier on its own?

I don't think we can assume Ritchie knew about this any more than he knew about inheritance, for-each loops, or lambdas. Either nobody needed it, or it hadn't been invented!

But we know users did establish the idiom of giving type-specifiers synonymous identifiers. This took the form #define MyType struct S, but, as K&R (1978) states, this didn't work for function type-specifiers.

This was an aesthetic problem with the language, and instead of adding a messy special case for struct tags, Kernighan and Ritchie generalized the solution to something far more useful. typedef lets you give a name to ANY type-specifier! struct S, or struct S { ... }, or enums, unions, function pointers, platform-dependent sizes.
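A sketch of the difference, using hypothetical names: the macro only pastes text in front of the declarators, so its `*` binds to the first one alone, whereas a typedef names the whole type. The function-pointer typedef is modeled on K&R's `PFI` example, where the name sits inside `(*...)` and no simple macro prefix can take its place.

```c
#include <string.h>

#define INT_PTR int *              /* text substitution only */
typedef int *int_ptr;              /* a name for the whole type */

INT_PTR m1, m2;    /* expands to "int * m1, m2;": m2 is a plain int */
int_ptr t1, t2;    /* both t1 and t2 are pointers to int */

/* The declarator wraps the name inside "(*...)", so no macro
   placed in front of a name can spell this type. */
typedef int (*PFI)(char *, char *);

int str_equal(char *a, char *b) { return strcmp(a, b) == 0; }

PFI compare = str_equal;
```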

That's what K&R cites as the purpose of typedef. A multi-faceted solution to an already multi-faceted declaration syntax. No special cases, and it respects the existing grammar: A declaration is a type-specifier with an optional variable name. And now, any type-specifier you want can have a unique name.

Please feel free to ask for clarification.

  • In order for anything to be a variable declaration, it actually has to declare a variable. struct S { int x; }; is not a variable declaration. You can't look to the grammar for answers since grammars ignore semantics. The grammar also doesn't tell you why Ritchie made the grammar that way. The grammar is a red herring. And, yes, struct is still a prefix since it comes before identifier. Anything that comes before something else is, by definition, a prefix. Commented Feb 17 at 3:35
  • @PaulJ.Lucas This is what Kernighan and Ritchie themselves state in The C Programming Language (2nd Edition), in "A.8 Declarations". The grammar is clearly important to them when justifying what a declaration is. If you asked them "why do I have to put struct before the tag?", it's because struct definitions and variable declarations are the same language construct. But "what inspired C's grammar?" We could speculate. We know it's easier to implement one language construct than several. We know that with a prefix, the parser doesn't have to backtrack. Best we can do is email Kernighan. Commented Feb 17 at 4:52
  • Exactly which part of A.8? Be specific. Repeating that they’re variable declarations doesn’t make it true. Commented Feb 17 at 6:12
  • @PaulJ.Lucas I'll use the 1st edition (1978), same section (if you'd like to peruse it: github.com/etrigan976/CSBooks). A declaration is a type-specifier followed by an optional variable name. A type-specifier is a familiar type like int, or a struct specifier, which is struct followed by an optional tag name and optional braced definition. Quoting directly, "A structure specifier of the form struct identifier { struct-decl-list } declares the identifier to be the structure tag ... a subsequent declaration may then use the form of the specifier struct identifier" Commented Feb 17 at 7:16
  • @PaulJ.Lucas You just answered your own question ;) I agree it's the same struct keyword, in the same context, whether you provide a variable name or braced definition. Is there something else you're looking for? Commented Feb 18 at 0:32
