
I have a TCP/IP socket that is set to non-blocking but blocks anyway. The socket is only referenced from one thread. The code works on Windows (with a few call substitutions) but not on Linux. (Don't mind the C-style casts; this was written long ago. I also trimmed it down a bit, so let me know if I accidentally trimmed off a step, though chances are the real code does that step. The actual code is on another computer, so I can't copy-paste.) It looks like this:

// In the real code, these are class members. I'm not bonkers
int mSocket;
sockaddr_in mAddress;

void CreateSocket(
    unsigned int ipAddress,
    unsigned short port)
{        
    // Omitting my error checking in this question for brevity because everything comes back valid
    mSocket = socket(AF_INET, SOCK_STREAM, 0);  // Not -1

    int oldFlags = fcntl(mSocket, F_GETFL, 0);  // Not -1
    fcntl(mSocket, F_SETFL, oldFlags | O_NONBLOCK);  // Not -1

    mAddress.sin_family = AF_INET;
    mAddress.sin_addr.s_addr = ipAddress;  // address is valid
    mAddress.sin_port = htons((u_short)port);  // port is not 0 and allowed on firewall
    memset(mAddress.sin_zero, 0, sizeof(mAddress.sin_zero));

    // <Connect attempt loop starts here>
    connect(mSocket, (sockaddr*)&mAddress, sizeof(mAddress));  // Not -1 to exit loop
    // <Connect attempt loop ends here>
    // Connection is now successful ('connect' returned a value other than -1)
}

// ... Stuff happens ...

// ... Then this is called because 'select' call shows read data available ...
void AttemptReceive(
    MyReturnBufferTypeThatsNotImportant &returnedBytes)
{
    // Read socket
    const size_t bufferSize = 4096;
    char buffer[bufferSize];
    int result = 0;

    do {
        // Debugging code: sanity checks
        int socketFlags = fcntl(mSocket, F_GETFL, 0);  // Not -1
        printf("result=%d\n", result);
        printf("O_NONBLOCK? %d\n", socketFlags & O_NONBLOCK);  // Always prints "O_NONBLOCK? 2048"

        result = recv(mSocket, buffer, bufferSize, 0);  // NEVER -1 or 0 after hundreds to thousands of calls, then suddenly blocks

        // ... Save off and package read data into user format for output to caller ...
    } while (result == bufferSize);
}

Because AttemptReceive is only called after select reports readable data, the first recv in the loop never blocks. I believe the hang happens when the socket happens to hold an exact multiple of the buffer size (4096 bytes): the last full read returns 4096, so the loop calls recv once more with nothing left to read. I've somewhat confirmed this with the printf statements. Every time this bug happens, the last two lines printed before the thread blocks are:

result=4096
O_NONBLOCK? 2048

Changing the recv line to recv(mSocket, buffer, bufferSize, MSG_DONTWAIT); actually "fixes" the issue: recv now occasionally returns -1 with errno set to EWOULDBLOCK/EAGAIN (the two are equal on my OS). But I'm afraid I'm just putting a band-aid on a gushing wound, so to speak. Any ideas?
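For reference, a drain loop on a non-blocking socket usually treats EAGAIN/EWOULDBLOCK as the normal "nothing left right now" signal rather than as a failure. The following is a minimal sketch, assuming a Linux/POSIX system. DrainSocket and SetNonBlocking are hypothetical names that do not appear in the original code, and std::string stands in for the unspecified return buffer type.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: adds O_NONBLOCK to the descriptor's status flags.
bool SetNonBlocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return false;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: appends everything currently readable on a
// non-blocking socket to `out`. Returns true once the socket is drained,
// false if the peer closed the connection or a real error occurred.
bool DrainSocket(int fd, std::string &out)
{
    assert(fd >= 0);
    char buffer[4096];
    for (;;) {
        ssize_t n = recv(fd, buffer, sizeof(buffer), 0);
        if (n > 0) {
            out.append(buffer, static_cast<std::size_t>(n));
            continue;  // There may be more data; keep reading.
        }
        if (n == 0) {
            return false;  // Orderly shutdown by the peer.
        }
        if (errno == EINTR) {
            continue;  // Interrupted by a signal; retry.
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return true;  // Drained: nothing left to read right now.
        }
        return false;  // Any other errno is a genuine error.
    }
}
```

With this shape, a read that returns exactly 4096 bytes is harmless, provided the descriptor really is in non-blocking mode: the next recv either returns more data or fails with EAGAIN, and the function reports the socket as drained.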

P.S. the address is "localhost", but I don't think it matters.

Note: I'm using an old compiler (not by choice), g++ 4.4.7-23 from 2010. That may have something to do with the issue.

  • Please attach strace to this process, and show proof that the process enters the recv() call and blocks in there (strace will show it), instead of, just maybe, hitting a bug somewhere inside the hidden chunk of code modestly described as "... Save off and package read data into user format for output to caller ...", and then spinning in an infinite loop in there. Commented Feb 21, 2020 at 1:17
  • @SamVarshavchik Of course I thought of that. gdb bt shows the thread stuck in recv. Commented Feb 21, 2020 at 1:22
  • I found this question, but I can't read Perl, so I'm not sure if it's applicable: stackoverflow.com/questions/11895632/… Commented Feb 21, 2020 at 1:24
  • Then answering a question about the real code isn't possible either. Commented Feb 21, 2020 at 1:29
  • @KeithM: The perl issue you refer to was caused by the socket not being actually non-blocking despite being declared this way. This was due to a bug in the module where it did not do the Win32 specific non-blocking handling. In other words: likely unrelated to your case. Commented Feb 21, 2020 at 6:54

1 Answer


socket() automatically sets O_RDWR on the socket with my operating system and compiler, but it appears that O_RDWR had accidentally gotten unset on the socket in question at the start of the program (which somehow allowed it to read fine if there was data to read, but block otherwise). Fixing that bug caused the socket to stop blocking. Apparently, both O_RDWR and O_NONBLOCK are required to avoid sockets blocking, at least on my operating system and compiler.


3 Comments

fcntl(fd, F_SETFL, O_RDWR) is ignored and has no effect in linux -- you CANNOT change the read/write flags after a file descriptor is opened. So this is not the cause of your problem -- whatever you did merely made the underlying problem go away.
@ChrisDodd I didn't change anything else, though, and I confirmed that I'm now occasionally getting the EAGAIN error (likely where it would've blocked before). The only other possibility is that gdb lied to me about the value of socketFlags, but I already thought of that and double-checked my debugging compiler settings to make sure the compile/link flags were correct. Perhaps your claim was only implemented at some point in the last decade or so. We can't use the newest compiler versions. My gcc/g++ is from 2010.
In hindsight, I'm going to start putting that fact into my questions... I'll edit it in right now.
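
One way to check the first comment's claim on the target machine is to try clearing the access-mode bits with F_SETFL and then read the flags back. This is a minimal sketch, assuming Linux, where the fcntl(2) man page states that the access mode and file creation flags passed to F_SETFL are ignored. AccessModeAfterClearing is a hypothetical name, not something from the original code.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>

// Hypothetical helper: asks F_SETFL to drop the access-mode bits (while
// setting O_NONBLOCK), then returns the access mode the kernel reports
// afterwards. Returns -1 if any fcntl call fails.
int AccessModeAfterClearing(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) {
        return -1;
    }
    // Request the same flags minus the access mode, plus O_NONBLOCK.
    if (fcntl(fd, F_SETFL, (flags & ~O_ACCMODE) | O_NONBLOCK) == -1) {
        return -1;
    }
    int after = fcntl(fd, F_GETFL, 0);
    if (after == -1) {
        return -1;
    }
    return after & O_ACCMODE;
}
```

If the returned value still equals O_RDWR for a freshly created socket, then F_SETFL did not change the access mode on that system, and a later F_GETFL that appears to lack O_RDWR would point to something else, such as inspecting a different descriptor.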
