$\begingroup$

While implementing a C preprocessor, I'm considering having the # operator produce a special "string blob" token rather than converting the token sequence into a string literal. This token type doesn't exist in the spec. It isn't dequoted when used in later compilation phases or in the pragma directive, but in every other respect it's used the same way as a string literal.

However, there's one thing I'm not certain about: can # be applied to a string literal that resulted from another # operator? If so, escaping would have to be done. My gut feeling is that it's not possible, since in function-like macros # applies to the argument as written, not to the macro-expanded argument.
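To make that gut feeling concrete, here is a minimal sketch using function-like macros. The names STR, XSTR and check_operand_not_expanded are made up for illustration. It shows that an argument that is the operand of # is stringified as written, while an argument passed through an intermediate macro is expanded before # sees it.

```c
#include <assert.h>
#include <string.h>

#define foo abcd
#define STR(x)  #x      /* x is the operand of #: the argument is not macro-expanded */
#define XSTR(x) STR(x)  /* x is not an operand of #: the argument is expanded first */

/* Hypothetical self-check; call it from main() to run the comparison. */
void check_operand_not_expanded(void) {
    assert(strcmp(STR(foo), "foo") == 0);    /* stringified as written */
    assert(strcmp(XSTR(foo), "abcd") == 0);  /* foo -> abcd, then stringified */
}
```

The second case is the one that matters here: once an argument can be expanded before it reaches #, that expansion may itself contain the result of an earlier #.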

I ran a test with object-like macros, and here's the result:

```c
#define foo abcd
#define bar #foo
#define baz #bar
void *p = baz;
```

```
$ cc --version
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ cc -E fundef.c
# 1 "fundef.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 465 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "fundef.c" 2




void *p = # #abcd;
```
$\endgroup$

1 Answer

$\begingroup$

Credit goes to @rpjohnst for providing an insightful example.

This trick is suitable as an optimization, and I believe it will work for most existing codebases. To be standard-conforming, though, it needs a counter that tracks how many times escaping must still be applied beyond the one pass that the "string blob" eliminates.
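One way such a counter could look is sketched below. Everything in it is an assumption made for illustration: the struct string_blob layout and the helpers stringify_literal and blob_materialize are hypothetical, not taken from any existing preprocessor. It covers only the extra passes, where the operand of # is already a single string literal, so every " and \ in it gets a backslash. The first stringification of an arbitrary token sequence must escape those characters only inside string literals and character constants, and is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stringify the spelling of one string literal by
   wrapping it in quotes and putting a backslash before every '"' and '\'.
   Returns a malloc'd string, or NULL on allocation failure. */
char *stringify_literal(const char *spelling) {
    size_t n = strlen(spelling);
    char *out = malloc(2 * n + 3);  /* every char escaped + two quotes + NUL */
    if (!out) return NULL;
    char *p = out;
    *p++ = '"';
    for (size_t i = 0; i < n; i++) {
        if (spelling[i] == '"' || spelling[i] == '\\') *p++ = '\\';
        *p++ = spelling[i];
    }
    *p++ = '"';
    *p = '\0';
    return out;
}

/* Hypothetical blob: the spelling of the literal produced by the first #,
   plus the number of further # applications that are still owed. */
struct string_blob {
    const char *literal;
    unsigned pending;
};

/* Apply the owed passes. Returns a malloc'd spelling, or NULL on failure. */
char *blob_materialize(const struct string_blob *b) {
    char *cur = malloc(strlen(b->literal) + 1);
    if (!cur) return NULL;
    strcpy(cur, b->literal);
    for (unsigned i = 0; i < b->pending; i++) {
        char *next = stringify_literal(cur);
        free(cur);
        if (!next) return NULL;
        cur = next;
    }
    return cur;
}
```

With the spelling "\"abc\"" and one pending pass, blob_materialize yields the spelling "\"\\\"abc\\\"\"". With zero pending passes the spelling comes back unchanged, which is the common case the optimization targets.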


Excerpt from the example:

```c
#define F(X) #X
#define G(X) F(X)

char s[] = G(F("abc"));
```

To explain: when G() is expanded, the parameter X in its replacement list isn't preceded by #, so the argument is fully macro-expanded first.

That expansion stringifies "abc" through F(), producing the string literal "\"abc\"". Next, when the resulting token sequence is rescanned, F() is found again, so the literal is stringified a second time, producing "\"\\\"abc\\\"\"".
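The two steps can be checked with any conforming compiler. The sketch below reuses F and G from the excerpt. The array names once and twice and the function check_double_stringify are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define F(X) #X
#define G(X) F(X)

/* One application of #: the array holds the 5 characters  "abc"
   (quotes included). */
const char once[] = F("abc");

/* Two applications of #: the array holds the 9 characters of the source
   spelling of the literal that `once` was initialized from. */
const char twice[] = G(F("abc"));

/* Hypothetical self-check; call it from main() to run the comparison. */
void check_double_stringify(void) {
    assert(strlen(once) == 5);
    assert(strcmp(once, "\"abc\"") == 0);
    assert(strlen(twice) == 9);
    assert(strcmp(twice, "\"\\\"abc\\\"\"") == 0);
}
```

Each extra level of stringification adds a pair of quotes and one backslash for every " and \ already present, which is why the second pass cannot reuse the unescaped blob as is.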

$\endgroup$
