0

Okay, well, I've narrowed down my question to a specific problem by removing the questions related to the inotify API.

I needed to be able to detect file changes, so I started learning the Linux inotify API. I searched for tutorials online and found one written by IBM (https://developer.ibm.com/tutorials/l-ubuntu-inotify). As I read the article and learned from it, the example code became very confusing.

From my understanding (These are not questions that I'm asking):

  1. EVENT_SIZE: This defines the size of the inotify_event struct, which is 16 bytes.
  2. struct inotify_event *event = ( struct inotify_event * ) &buffer[ i ];: Here, a char pointer is cast to an inotify_event struct pointer and later used as if it pointed to a normal struct. How is that possible? Isn't this code prone to undefined behavior and strict aliasing problems?
  3. i += EVENT_SIZE + event->len: Why are we increasing the index by EVENT_SIZE + event->len? I really don't understand. Is it because when casting an element of a char array to a struct, the selected element behaves like a starting offset or something?

Okay, I'm asking two related questions: Is it safe to use a char array to store struct data? If it is safe in the code below, how does this process work?

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/inotify.h>
#include <unistd.h>

#define EVENT_SIZE  ( sizeof (struct inotify_event) )
#define BUF_LEN     ( 1024 * ( EVENT_SIZE + 16 ) )

int main( int argc, char **argv ) 
{
  int length, i = 0;
  int fd;
  int wd;
  char buffer[BUF_LEN];

  fd = inotify_init();

  if ( fd < 0 ) {
    perror( "inotify_init" );
  }

  wd = inotify_add_watch( fd, "/home/strike", 
                         IN_MODIFY | IN_CREATE | IN_DELETE );
  length = read( fd, buffer, BUF_LEN );  

  if ( length < 0 ) {
    perror( "read" );
  }  

  while ( i < length ) {
    struct inotify_event *event = ( struct inotify_event * ) &buffer[ i ];
    if ( event->len ) {
      if ( event->mask & IN_CREATE ) {
        if ( event->mask & IN_ISDIR ) {
          printf( "The directory %s was created.\n", event->name );
        }
        else {
          printf( "The file %s was created.\n", event->name );
        }
      }
      else if ( event->mask & IN_DELETE ) {
        if ( event->mask & IN_ISDIR ) {
          printf( "The directory %s was deleted.\n", event->name );
        }
        else {
          printf( "The file %s was deleted.\n", event->name );
        }
      }
      else if ( event->mask & IN_MODIFY ) {
        if ( event->mask & IN_ISDIR ) {
          printf( "The directory %s was modified.\n", event->name );
        }
        else {
          printf( "The file %s was modified.\n", event->name );
        }
      }
    }
    i += EVENT_SIZE + event->len;
  }

  ( void ) inotify_rm_watch( fd, wd );
  ( void ) close( fd );

  exit( 0 );
}
  • 1
    before posting try to compile the code Commented Feb 18 at 12:57
  • 1
    Regarding 4) yes, that's UB. Badly written code like this was why Linux fell to pieces when gcc started to do aggressive optimizations abusing strict aliasing in the early 2000s. Commented Feb 18 at 13:24
  • 3
    The long and short of it is that this is bad code. It should not have been written this way, and nobody should attempt to learn from that article. Commented Feb 18 at 13:40
  • 1
    @Davislor: It is in fact impossible to write an arena allocator in strictly conforming C code. That is why it is not done in strictly conforming C code; all arena allocators make use of specific implementation features. Thus the statement “If… it would not be possible to write an arena allocator in C at all” is false; it is impossible to do it in strictly conforming C code, but it is not impossible to do it “at all.” The code for malloc may be written in a language other than C (putting it beyond what the C standard covers) or may be isolated in a separate… Commented Feb 18 at 20:32
  • 1
    … translation unit in C implementations with limited cross-unit information transfer or may have some aspects built into the C compiler. Commented Feb 18 at 20:32

2 Answers

3

Regarding this issue specifically:

struct inotify_event *event = ( struct inotify_event * ) &buffer[ i ];

Yes, that is undefined behavior and a strict aliasing violation at the point where event-> is used. Even more alarming, the code takes no alignment considerations into account at all, so it is undefined behavior for alignment reasons too.

We could dodge the strict aliasing part of the problem by embedding the struct in a union where the other union member is a uint8_t [sizeof (struct inotify_event)] array. But that won't fix the alignment bug - &buffer[ i ] could be any address.
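The alignment half of the problem can at least be handled on its own. The following is a minimal sketch, assuming a C11 compiler and Linux's <sys/inotify.h>; the names EVENT_BUF_LEN, event_buf and aligned_for_event are hypothetical and only serve the illustration. It gives the buffer the alignment of struct inotify_event and checks whether a given address is suitably aligned. It does nothing about strict aliasing.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <sys/inotify.h>

/* Hypothetical buffer size, chosen for illustration only. */
#define EVENT_BUF_LEN 4096

/* A buffer whose first byte is aligned as strictly as struct inotify_event
   requires. This addresses alignment only, not strict aliasing. */
static alignas(struct inotify_event) char event_buf[EVENT_BUF_LEN];

/* Hypothetical helper: returns 1 if p is suitably aligned for a
   struct inotify_event, 0 otherwise. */
static int aligned_for_event(const void *p)
{
    return (uintptr_t)p % alignof(struct inotify_event) == 0;
}
```

With such a declaration, &event_buf[0] is suitably aligned for the struct, while an arbitrary &event_buf[i] still may not be.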

Similarly, one should obviously not allocate a local variable of 32768 bytes (BUF_LEN is 1024 * (16 + 16) here) or anything of that size. That will eventually lead to stack overflow even in Linux.


21 Comments

@Davislor No, your comment is incorrect. C24 6.5.1 "An object shall have its stored value accessed only by an lvalue expression that has one of the following types:" /--/ "- a character type." That does not somehow magically give us a free pass to do the other way around and access a character array as any lvalue expression. And of course the pointer can be misaligned since this struct contains all manner of large data types with alignment requirements, whereas a character array has no alignment requirements and can be allocated at any odd address.
...and I feel like we should write a separate strict aliasing FAQ about character type misconceptions since this comes up quite often.
Anyone attempting to do this must be able to diagram an English sentence containing parallel clauses, since that is at the root of your misunderstanding.
You're the one misunderstanding strict aliasing. Per 6.5 Expressions, paragraph 7: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... a character type." Nowhere in any version of the C standard does it say a "character type" can be accessed as any random structure.
The section you cite says that an object can also be accessed by a pointer to its effective type, and there are in fact other places where the Standard says copying an object in various ways also sets the effective type. The debate is over when this is guaranteed to happen. I am not claiming that all char[] can always be aliased by any other pointer.
It should also explain whether you think it’s possible to write an arena allocator in C at all. Your interpretation appears to make that impossible
More misunderstanding. The C standard does not apply to implementations of the language. Implementations of C have to provide the functionality specified by the C standard(s) they implement. They are not bound by them. Implementations of C don't even have to be implemented in C.
If there really were no way to implement an allocator in C without undefined behavior, it would only be possible to write an OS kernel, C standard library implementation or arena allocator on a compiler that documented that it didn’t interpret the strict-aliasing rule that way, and it wouldn’t be portable. No compiler I know of actually says this (although that’s because implementers do not believe that it really is undefined behavior to set the bytes of an object representation through a pointer to unsigned char, cast the address, and use it.).
@ikegami Pedantically, using a struct cast followed by an lvalue access of that memory as if it were a struct is a strict aliasing violation. Not the pointer cast itself.
As a bonus, the pointer cast can invoke UB all by itself if alignment restrictions aren't met. No dereference is necessary.
@Davislor "were already copied into it by the system call" What happened before the cast is completely irrelevant. It may have changed the effective type in a few special scenarios but no code changed the effective type of the character array into struct inotify_event. I have already spelled out to you repeatedly over and over why this is UB and then you ignore me and start talking about an unrelated paragraph having nothing to do with the type of the lvalue expression used by the access. Read or write does not matter, "An object shall have its stored value accessed...".
You are correct that I stopped responding to your comments and added my explanation to my answer, which anyone can scroll down and read.
To continue, "effective type" only applies "If a value is stored into an object having no declared type". Again, bolding is mine, because - again - it's necessary
And all your handwaving about "copying" depends on "If a value is copied into an object having no declared type". That does not apply to any object declared as a char array.
You are ignoring 6.5 Expressions, p6: "The effective type of an object for an access to its stored value is the declared type of the object". Everything after that only applies to objects with no declared type. An array declared as an array of char literally and quite obviously has a "declared type".
Per footnote 87: "Allocated objects have no declared type." So "effective type" only applies to dynamically-allocated objects. "Effective type" is utterly irrelevant for declared objects.
As long as the char * is correctly aligned for the first struct inotify_event, advancing the char * by sizeof(struct inotify_event) + event->len (where event is the char * converted to struct inotify_event *) will make it correctly aligned for the next event, because the Linux kernel will make it so.
3

You have UB here. I think it is safer to use memcpy. Instead of pointer punning, I use memcpy for the struct and an offset calculation to reference the flexible array member. The memcpy call will be optimized out (by any sane compiler) if it is not needed for safe access.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/inotify.h>
#include <unistd.h>
#include <string.h>
#include <stddef.h>

#define EVENT_SIZE  ( sizeof (struct inotify_event) )
#define BUF_LEN     ( 1024 * ( EVENT_SIZE + 16 ) )

#define hdr  offsetof(struct inotify_event, name)

int main(void) 
{
  int length, i = 0;
  int fd;
  int wd;
  char buffer[BUF_LEN]; //find better way of allocating this space

  fd = inotify_init();

  if ( fd < 0 ) {
    perror( "inotify_init" );
    return EXIT_FAILURE; // nothing to watch without a valid descriptor
  }

  wd = inotify_add_watch( fd, "/home/strike", 
                         IN_MODIFY | IN_CREATE | IN_DELETE );
  
  length = read( fd, buffer, BUF_LEN );  

  if ( length < 0 ) {
    perror( "read" );
    return EXIT_FAILURE; // buffer contents are unusable after a failed read
  }

  while ( i < length ) {
    struct inotify_event event;
    memcpy(&event,&buffer[ i ], sizeof(event));
    char *name = &buffer[ i + hdr];
    if ( event.len ) {
      if ( event.mask & IN_CREATE ) {
        if ( event.mask & IN_ISDIR ) {
          printf( "The directory %s was created.\n", name );       
        }
        else {
          printf( "The file %s was created.\n", name );
        }
      }
      else if ( event.mask & IN_DELETE ) {
        if ( event.mask & IN_ISDIR ) {
          printf( "The directory %s was deleted.\n", name );       
        }
        else {
          printf( "The file %s was deleted.\n", name );
        }
      }
      else if ( event.mask & IN_MODIFY ) {
        if ( event.mask & IN_ISDIR ) {
          printf( "The directory %s was modified.\n", name );
        }
        else {
          printf( "The file %s was modified.\n", name );
        }
      }
    }
    i += EVENT_SIZE + event.len;
  }

  ( void ) inotify_rm_watch( fd, wd );
  ( void ) close( fd );

  exit( 0 );
}

https://godbolt.org/z/jfhaEbEq4
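The same memcpy approach can be packaged into a bounds-checked walker, which also shows why the index advances by the header size plus len: each record in the buffer is a fixed-size header followed by len bytes of name, where len already counts the terminating and padding NUL bytes. This is a sketch only; count_events and put_event are hypothetical helpers, and put_event merely builds synthetic records so the walker can be exercised without a real inotify descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

/* Hypothetical helper: counts the complete events in buf[0..length).
   Stops early if the last record is truncated. */
static size_t count_events(const char *buf, size_t length)
{
    size_t i = 0, count = 0;
    while (length - i >= sizeof(struct inotify_event)) {
        struct inotify_event ev;
        memcpy(&ev, buf + i, sizeof ev);      /* copy the fixed-size header */
        size_t record = sizeof ev + ev.len;   /* header plus padded name */
        if (record > length - i)
            break;                            /* name does not fit: truncated */
        count++;
        i += record;                          /* next header starts here */
    }
    return count;
}

/* Hypothetical test helper: writes one synthetic event at buf + at and
   returns the offset just past it. padded_len plays the role of ev.len,
   so it must leave room for the name and at least one NUL byte. */
static size_t put_event(char *buf, size_t at, uint32_t mask,
                        const char *name, uint32_t padded_len)
{
    struct inotify_event ev = { .wd = 1, .mask = mask, .cookie = 0,
                                .len = padded_len };
    assert(strlen(name) < padded_len);
    memcpy(buf + at, &ev, sizeof ev);
    memset(buf + at + sizeof ev, 0, padded_len);
    memcpy(buf + at + sizeof ev, name, strlen(name));
    return at + sizeof ev + padded_len;
}
```

In a real program, buf and length would come from read() on the inotify descriptor, and the body of the loop would inspect ev.mask and the name at buf + i + offsetof(struct inotify_event, name) as the answer's code does.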

16 Comments

If you scroll down, you’ll see my very pedantic explanation of why it is not undefined behavior to copy an object as a properly-aligned buffer of character type, cast the pointer to its effective type, and dereference that pointer. Since the system call copied the objects to the buffer as arrays of character type, the pointer casts are legal. In general buffers should be declared alignas(struct inotify_event) in case there is an alignment requirement on the contents. Violating that is UB.
Unfortunately, your "detailed explanation" is invalid. Any further discussion is IMO pointless.
I’m disappointed that you gave no explanation for why you disagree and tell me that you are not going to, but I will respect your wishes and drop it.
Lundin has explained it to you many times, and I’m not going to repeat the explanation. I know the C standard can be confusing in this regard, and many people misunderstand these topics and make the same mistake as you. Do not take it personally.
Oh, then anyone who reads the sentence we were arguing about can easily see that Lundin overlooked the “or is copied as an array of character type,” clause. As you say, you both found the sentence confusing. They have, indeed, repeated themself often.
Declared objects - such as char buffer[BUF_LEN]; - never have an "effective type" other than the declared type - in this case char. C11 6.5p6: "The effective type of an object for an access to its stored value is the declared type of the object"
Followed (at least in C24; I haven’t checked C11) by a sentence about how the effective type after certain operations is something else.
However, this is a good answer, since memcpy() works in more situations and performs just as well.
Please stop using "If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type" You are flat out ignoring "an object having no declared type" while also ignoring the fact that all of "using memcpy or memmove, or is copied as an array of character type" merely describes how the data is copied and in no way overrules "an object having no declared type". If you're sooo certain you're correct, you should file a defect in the standard - contacts are at open-std.org/jtc1/sc22/wg14/www/contacts.html
If you cannot clearly and cogently explain in clear words exactly how "copied as an array of character type" applies to anything other than the act of copying data and then somehow overrides "an object having no declared type", you have no basis whatsoever for claiming that "copied as an array of character type" sets the effective type of a declared object, overriding both the limiting clause of "an object having no declared type" and the opening "The effective type of an object for an access to its stored value is the declared type of the object"
Again, you must explain, not just regurgitate. HOW does "Since the system call copied the objects to the buffer as arrays of character type, the pointer casts are legal" override both "an object having no declared type" and "The effective type of an object for an access to its stored value is the declared type of the object" Explain how the act of copying data in a manner described as one of three parallel descriptions of how copying data limited only to objects of "no declared type" somehow sets an "effective type" for a declared object of declared char
For starters, you'll have to overcome en.wikipedia.org/wiki/Parallelism_(grammar)
And explaining your reasoning is almost certain to require you to actually parse the C standard's statements. See languagetools.info/grammarpedia/parse.htm for what that entails. IMO if you can not parse the language of the C standard and demonstrate your "Since the system call copied the objects to the buffer as arrays of character type, the pointer casts are legal" claim is true, that claim is literally baseless.
If you believe you understand parallel structures in English grammar better than just about everyone else here, please demonstrate that.
Please try not to take disagreement personally. But: There are many examples in the Standard of similar sentences where the clause after “or” is contrasting or a separate case, contrary to claims being made in this discussion. “Except when it is the operand of the sizeof operator, or typeof operators, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted ....” for example, cannot only apply to array expressions that are used to initialize arrays if they are also operands of sizeof. As that is impossible.
There are similarly examples of paragraphs where later sentences present exceptions or separate cases from the first sentence, which you and others here appear to be claiming never happens in technical writing. Now, can you please give an example of a sentence in professionally-published writing, not a transcription of non-standard dialogue, with similar comma placement and repetition of the main verb in the disjunctive clause, where the disjunctive clause is not parallel?
