13

Consider the following code:

char *str = "1234";
long n = strtol(str, &str, 10);

The prototype of strtol() in C99 is:

long strtol(const char *restrict nptr, char **restrict endptr, int base);

My question is: does this call violate the restrict contract and produce undefined behaviour?

I am aware that a similar question has been asked before (Aliased arguments in strtol), but the accepted answer does not resolve my doubts because it gives no explanation.


My analysis so far:

As I understand it:

  • const char *restrict nptr promises the compiler that nptr will be the only way to access the memory region defined by the string "1234".

  • char **restrict endptr promises the compiler that endptr will be the only way to access the variable str.

At the point of the call, assuming:

  • the string "1234" is at address 0x100

  • the variable str is at address 0x500

then:

  • nptr = 0x100 (points to the string)

  • endptr = 0x500 (points to the variable str)

The two pointers themselves point to distinct memory locations, so restrict appears to be respected between nptr and endptr directly.

However, note that nptr and *endptr hold the same address. When strtol writes *endptr = ..., it only modifies the variable str, it does not directly access the string "1234". The string would only be accessed through endptr if strtol() performed a double dereference via **endptr.

If an implementation of strtol() were to access the string "1234" via **endptr for any reason, then the restrict promise on nptr would be violated (if one calls strtol(str, &str, 10)), since the string would be accessed through a pointer other than nptr.
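The pointer relationships described above can be checked with a minimal sketch. The names `call_shape` and `inspect_call_shape` are hypothetical and only for illustration; the local variables `nptr` and `endptr` merely mimic the argument values that `strtol(str, &str, 10)` would receive.

```c
#include <stdbool.h>

/* Hypothetical record of the two relationships discussed above. */
struct call_shape {
    bool pointers_distinct;  /* nptr and endptr address different objects */
    bool pointee_matches;    /* *endptr holds the same address as nptr */
};

struct call_shape inspect_call_shape(void)
{
    char *str = "1234";
    const char *nptr = str;   /* value the first parameter would receive */
    char **endptr = &str;     /* value the second parameter would receive */
    struct call_shape shape;

    shape.pointers_distinct = ((const void *)nptr != (const void *)endptr);
    shape.pointee_matches = (*endptr == nptr);
    return shape;
}
```

Both fields come out true: the two parameters refer to different objects, while `*endptr` and `nptr` hold the same address.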


The question

Does this constitute a restrict violation according to §6.7.3.1 of C99? Specifically:

  1. Does restrict on endptr only constrain endptr itself, or does it also constrain *endptr with respect to nptr?

  2. If *endptr and nptr point to the same memory, is that sufficient to produce undefined behaviour, even if endptr and nptr themselves are distinct?

  3. Would a hypothetical implementation of strtol() that accesses the string "1234" via **endptr be considered non-conforming to the C99 standard?

4
  • The value of str (i.e. 0x100 in your example) would be copied into the function's local nptr variable. Any modifications to either nptr or *endptr would be unrelated to each other. IMO the behavior is well defined, or at worst unspecified (but not undefined). And without looking at any specification, plain common sense would dictate that **endptr would never be used in any normal implementation of strtol. It could (and perhaps should) have been declared as char const ** const restrict endptr. Commented Apr 21 at 13:23
  • If the function would write to where endptr points and somehow use the contents of nptr after that (I can't see why it would, but let's assume so), then nptr will still remain a copy of where the original pointer pointed. I rather think the standard uses restrict here because the character pointer nptr is allowed to point at anything, including the first byte of a pointer. It would of course be senseless to call the function as strtol((const char*)&str, &str, 10); but strictly speaking we are allowed that due to the exception for character types in the strict aliasing rules. Commented Apr 21 at 14:27
  • Related: stackoverflow.com/q/13952015/1606345 Commented Apr 21 at 16:25
  • A side note: to find out whether a valid number was read, you usually compare nptr with the value stored in *endptr: if (*endptr == nptr) { ERROR }. With your usage, you lose the ability to validate the result. Commented Apr 25 at 18:58
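The validation idiom mentioned in the last comment can be sketched as follows. `parse_long` is a hypothetical helper, not part of the standard library; it keeps a separate end pointer so that the start of the text is still available for the comparison.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long and report failure explicitly. */
bool parse_long(const char *text, long *out)
{
    char *end;                /* separate end pointer; text is left untouched */

    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text)          /* no characters were consumed */
        return false;
    if (errno == ERANGE)      /* value does not fit in a long */
        return false;

    *out = value;
    return true;
}
```

With `strtol(str, &str, 10)` the original start address is overwritten, so the `end == text` check is only possible if the caller saved a copy of `str` beforehand.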

3 Answers

12

Since you have tagged this as language-lawyer, then the answer is yes (the call is “legal”) for reasons you may not have expected. However, it is also yes for the reasons you were likely concerned about.

First, as you note, the prototype of strtol is long strtol(const char *restrict nptr, char **restrict endptr, int base);. In this, restrict has no effect. This is because qualifiers are effectively not part of a function type. C 1999 6.7.5.3 15 states that, for purposes of compatibility between function types, “each parameter declared with qualified type is taken as having the unqualified version of its declared type.” This means that long strtol(const char *restrict nptr, char **restrict endptr, int base); has the same effect as long strtol(const char *nptr, char **endptr, int base);; the qualifiers are not part of the function type.
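The compatibility rule can be illustrated with a small sketch. `parse_base10` and `parse_ptr` are hypothetical names chosen for this example; the point is only that a declaration with restrict and one without it declare the same function type.

```c
#include <stdlib.h>

/* Hypothetical wrapper, declared twice: once with restrict, once without. */
long parse_base10(const char *restrict text, char **restrict end);
long parse_base10(const char *text, char **end);  /* compatible redeclaration */

/* Inside the definition, the qualifiers are part of the parameter types. */
long parse_base10(const char *restrict text, char **restrict end)
{
    return strtol(text, end, 10);
}

/* A function pointer type written without restrict accepts the function. */
long (*const parse_ptr)(const char *, char **) = parse_base10;
```

A conforming compiler should accept both declarations in the same translation unit, because the top-level qualifiers on the parameters are ignored when function types are compared.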

(The qualifiers do affect the code inside a function body, because there the qualifiers are part of the types of the parameters.)

Second, the definition of restrict imposes requirements on code that uses the identifiers (which would be code inside the implementation of the function), not on code that calls a function with qualifiers on its parameters. Therefore, the call is “legal” because it does not violate any requirements related to the qualifiers.

What you are probably wondering about is, if you call strtol as you have shown, would the code inside it satisfy the requirements on the qualifiers. From a language-lawyer perspective, we cannot know this because we cannot know what the code inside the function is. It might not even be C code, but, even if it is, there are ways to implement functions that work with overlapping memory through restricted pointers. Consider this:

void square(double * restrict result, double * restrict x)
{
    double t = *x;
    t = t*t;
    if (result == x)
        *x = t;
    else
        *result = t;
}

This code writes the result to *result without violating the restrict requirements even if x and result point to the same place, because, if they do point to the same place, it uses *x to access that place and does not use *result.
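As a usage sketch, the two call shapes can be exercised as below. `square` is repeated from the answer so the block is self-contained; `square_in_place` and `square_separate` are hypothetical helpers added only for illustration.

```c
/* square() as given in the answer above. */
void square(double *restrict result, double *restrict x)
{
    double t = *x;
    t = t * t;
    if (result == x)
        *x = t;        /* aliased case: the object is accessed only through x */
    else
        *result = t;   /* distinct case: each object has its own pointer */
}

/* Hypothetical helper: both parameters point to the same object. */
double square_in_place(double v)
{
    square(&v, &v);
    return v;
}

/* Hypothetical helper: the parameters point to different objects. */
double square_separate(double v)
{
    double r = 0.0;
    square(&r, &v);
    return r;
}
```

Both helpers return the square of their argument; the first relies on the `result == x` branch, the second on the `else` branch.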

However, let’s suppose strtol was written naïvely, oblivious to the possibility that the pointers might point to overlapping memory, so that this strtol reads data through nptr (or expressions based on nptr and not endptr) and writes the end pointer through endptr (or expressions based on endptr and not nptr). Observe that all the objects in memory needed to read the string, those in the array induced by "1234", are completely separate from the str object. The formal definition of restrict in C 1999 6.7.3.1 only imposes requirements on accesses to an object X that is modified during execution of the function (in this situation). It says that if X is accessed with an lvalue based on a restricted pointer, then every other access to X must also be based on that pointer. For the object str pointed to by endptr, our supposition is that this is true: every access to the object str is through an lvalue based on endptr. For the objects in "1234", there is no modification to those objects, so the formal definition of restrict does not impose any requirements.

Therefore, even a naïve implementation of strtol satisfies the requirements of restrict.
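A naïve implementation of the kind supposed above might look like the following sketch. `naive_strtol10` is a hypothetical, heavily simplified stand-in (base 10 only, no whitespace, sign, or overflow handling) and is not how any particular C library implements strtol.

```c
#include <stddef.h>

/* Hypothetical simplified stand-in for strtol(): base 10 digits only. */
long naive_strtol10(const char *restrict nptr, char **restrict endptr)
{
    const char *p = nptr;               /* p is based on nptr */
    long value = 0;

    while (*p >= '0' && *p <= '9') {    /* every read goes through nptr */
        value = value * 10 + (*p - '0');
        p++;
    }

    if (endptr != NULL)
        *endptr = (char *)p;            /* the only write goes through endptr */

    return value;
}
```

Called as `naive_strtol10(str, &str)`, the only modified object is `str`, and it is accessed solely through `endptr`; the characters of the string are read but never modified.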


18 Comments

Consider the case where a double may be accessed at any byte address: in square(), *result = t; remains a concern, because if (result == x) is not enough to detect partial overlap. Still, a nice detailed answer.
That’s why I wrote only “even if x and result point to the same place” rather than “even if the memory for x and result overlap.” Given the language-lawyer tag, I did not want to get into the gory details of testing whether objects overlap.
The way clang and gcc treat the Standard's hand-wavy concept of "based upon", the pointer address used in the statement *x = t; need not be treated as "based upon" the value of the object double * restrict x, because there is no way that changing the value of x without changing the value of result could change the address used in that assignment. Both, given int x[1]; int test(int *restrict p) { *p = 1; if (p == x) *p = 2; return *p; } will generate machine code that unconditionally returns 1.
I wonder whether a “smart compiler” could remove the branch if (result == x) *x = t; in your example code because of the two restrict qualifiers, which, as far as I understand, should guarantee that result and x will never point to the same memory area.
Re “the two restrict qualifiers, which, as far as I understand, should guarantee that result and x will never point to the same memory area”: That is not what the standard says. That is an informal rewording, and it is inaccurate. The standard contains a formal (and yet flawed) statement of what restrict means. Still informally, but more accurate, it says that different pointers will not be used to access the same object in memory if one of them is qualified with restrict unless the object is not modified…
… For example, consider calling void foo(int * restrict a, int * restrict b) with int A[1000000]; … foo(A, A + 500000);. Then a and b do not point to the same memory, so they satisfy that inaccurate informal rewording. But foo might contain code that accesses a[250000] and b[-250000] (both of which are individually legal). Then foo would be accessing the same object (A[250000]) through different pointers, even though those pointers are unequal. So restrict does not directly tell the compiler it can assume anything about the values of the pointers.
“Second, the definition of restrict imposes requirements on code that uses the identifiers (which would be code inside the implementation of the function), not on code that calls a function with qualifiers on its parameters.” – nonsense; memcpy is a counterexample – it is illegal to call with pointers to overlapping memory ranges, and that requirement is expressed via restrict.
That constraint on memcpy is expressed explicitly; it does not come from the qualifiers in the function declaration but from the statement in C 1999 7.21.2.1: “If copying takes place between objects that overlap, the behavior is undefined.” This question is tagged language-lawyer, which means it is about the formal specification of the language…
… As I wrote, the C standard makes qualifiers on function declarations that are not definitions irrelevant; void *memcpy(void * restrict s1, const void * restrict s2, size_t n); has the same effect in a program as void *memcpy(void *s1, const void *s2, size_t n);, due to the text in C 1999 6.7.5.3 15. Further, the meaning of restrict is formally defined in C 1999 6.7.3.1. You may have some belief about what restrict means, but, for a language-lawyer question, it is the actual text of the standard that matters.
So, since it is a language-lawyer question: Can you show the actual text in the standard that says restrict in a prototype shown in clause 7 (the library) imposes requirements on the caller of the function? Cite the clause, the paragraph number, and the text within it.
I'm not sure you are applying restrict correctly, especially in the second case, where the restrict keyword is intended to allow the implementor to make whatever use they want to of the target value. For example, a software implementation of floating point multiply might perform the integer multiply of the mantissas into the target integer, then OR in the sum of the exponents, then OR in the sign bit. It may be a "silly"/"register saving" implementation, but it is a completely valid one that relies on the caller NOT aliasing the arguments.
Re “… the restrict keyword is intended…”: This question is tagged language-lawyer. That means it is about the formal or authoritative specification of the language. That is, it is about the actual text of the C standard, not the “intent” or informal meaning. You have some notion in your head about what restrict means, and undoubtedly the purpose of putting it the standard library declarations was to convey an intent for the associated memory not to overlap. But none of that is relevant; the question is about what the standard actually says. I answered the question on that basis…
… As I stated, there is an actual explicitly formal definition of restrict in the standard. And the standard does not impose any requirements on the caller of a function, in regard to restrict pointers in the function parameters. And, as far as I have seen, there is nothing in the standard that says restrict has any other meaning. I looked for text in the clause about the standard library that says restrict indicates the caller should not pass overlapping memory and did not find it. If you want to dispute a language-lawyer question, then cite the standard.
One thing a language-lawyer question does is not just explore what the standard says but how it provides the semantics we desire or how it accomplishes its goals. Looking at restrict illustrates that: Nowhere does the standard say the caller has to do anything. Instead, it tells us how the called routine may use the pointers. From this, we are supposed to infer that callers ought to cooperate. But the details of that cooperation are not stated in the standard. And, as I showed with the square example, they do not follow strictly logically from the restrict requirements.
Let me get this straight: You are saying the implementor must write the code assuming aliasing because the client may use aliasing anyway? Or are you saying the caller has been warned not to use aliasing, but can do so anyway, if they know the particular implementation is aliasing safe? The former means the restrict keyword is a waste of space - neither side is restricted at all. The latter means the caller is not portable - which matters in some contexts but not others.
I am telling you what the C standard says. The only thing it says about restrict is that the relevant pointers must not be used to access overlapping memory, if there is any modification to the accessed memory. How the people writing routines and the people calling routines is left for them to work out—it is not in the standard. It is perfectly reasonable for the writer of a routine to document that they will use pointers with restrict and require the caller to provide non-overlapping memory. In other words, the routine implementor can pass the standard’s requirement to the routine caller…
… But that is for them to do. It is not stated in the standard. The documentation for the standard library routines should probably have a statement that this is so for them, that callers are required to provide non-overlapping memory for the pointers declared restrict, given the memory those pointers are expected to access. But the documentation does not have that. It is simply not in the standard. Look for yourself.
Correction: “How the people writing routines and the people calling routines is left for them to work out” ➝ “How the people writing routines and the people calling routines deal with this is left for them to work out”
9

Is strtol(str, &str, 10) legal in C99?

Yes, it is valid C99.

If an implementation of strtol() were to access the string "1234" via **endptr for any reason, then the restrict promise on nptr would be violated

That's true. But an implementation doesn't need to make such an access.

Does this constitute a restrict violation according to §6.7.3.1 of C99?

No, it does not.

Specifically:

Does restrict on endptr only constrain endptr itself, or does it also constrain *endptr with respect to nptr?

The former. Or, to be more exact, the restrict on endptr doesn't constrain endptr, it constrains accesses through pointers other than endptr, to the object pointed to by endptr.

However, the restrict on nptr constrains accesses to the object nptr points to, through endptr (and thus also through *endptr).

If *endptr and nptr point to the same memory, is that sufficient to produce undefined behaviour, even if endptr and nptr themselves are distinct?

No, that is not sufficient.

Would a hypothetical implementation of strtol() that accesses the string "1234" via **endptr be considered non-conforming to the C99 standard?

Yes, it would. But nobody would write such an implementation since, again, there is no reason to access the string with **endptr.

5 Comments

I think the implementation of strtol() is free to access the string contents in any way it likes, including via **endptr, because it is a black box as far as the program is concerned. That's probably a silly way to do it though, especially as endptr may be null.
Yes and no, @IanAbbott. The restrict qualification in the function prototype does nothing more than advertise to programmers that some kinds of argument aliasing might provoke UB from strtol(). Details are impossible to determine because the function is indeed a black box. Sure, strtol() is free to act as it wishes, but it would be foolish for a programmer to disregard the information that that behavior might not be safe against aliased arguments. One should at least give a thought to which kinds of aliasing can reasonably be expected to be dangerous, and which not.
You are mistaken. A valid implementation of standard library functions must have defined behavior for valid inputs to the function, and the inputs OP presented are certainly valid. So - the implementation cannot access the string via **endptr.
“That's true. But an implementation doesn't need to make such an access.” As far as I know, the C99 standard does not explicitly or implicitly state that strtol() must not access the string "1234" via **endptr, or am I mistaken? (Note the OP’s language-lawyer tag.)
@ParminderSingh: As I see it, the standard implicitly and indirectly forbids such access, through the use of restrict.
0

No, it doesn't.

char *str = "1234";
long n = strtol(str, &str, 10);

The first argument, str, points to the statically allocated string "1234". The second argument is the address of the local variable str itself, a completely different object that is normally allocated on the thread's stack. The two pointers are therefore different and unrelated (well, not completely: one is the address of a pointer that points to the string).

What is not permitted is passing pointers to objects that overlap in memory, but that is not the case here. Once the number is parsed, str will hold the address of the terminating \0 character of the string "1234".
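The final state of str can be observed with a short sketch. `consumed_by_strtol` is a hypothetical helper; it saves the start address first, since the call overwrites its own pointer argument in the same way as in the question.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: reports how many characters strtol() consumed when
   the same pointer variable is passed as both the input and the end pointer. */
ptrdiff_t consumed_by_strtol(char *str, long *value)
{
    char *start = str;                 /* keep the original start address */

    *value = strtol(str, &str, 10);    /* same call shape as in the question */
    return str - start;                /* str now points past the digits */
}
```

For the input "1234" the helper reports four consumed characters, which places str on the terminating null character.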
