I have a TCP/IP socket set to non-blocking that blocks anyway. The socket is only referenced in one thread. This code works on Windows (with a few call substitutions) but not on Linux. My code looks like the following. (Don't mind the C-style casts; this was written long ago. I also trimmed it a bit, so let me know if I accidentally trimmed off a step; chances are the real code does that step. The actual code is on another computer, so I can't copy-paste.)
// In the real code, these are class members. I'm not bonkers
int mSocket;
sockaddr_in mAddress;
void CreateSocket(
    unsigned int ipAddress,
    unsigned short port)
{
    // Omitting my error checking in this question for brevity because everything comes back valid
    mSocket = socket(AF_INET, SOCK_STREAM, 0); // Not -1
    int oldFlags = fcntl(mSocket, F_GETFL, 0); // Not -1
    fcntl(mSocket, F_SETFL, oldFlags | O_NONBLOCK); // Not -1
    mAddress.sin_family = AF_INET;
    mAddress.sin_addr.s_addr = ipAddress; // address is valid
    mAddress.sin_port = htons((u_short)port); // port is not 0 and allowed on firewall
    memset(mAddress.sin_zero, 0, sizeof(mAddress.sin_zero));
    // <Connect attempt loop starts here>
    connect(mSocket, (sockaddr*)&mAddress, sizeof(mAddress)); // Not -1 to exit loop
    // <Connect attempt loop ends here>
    // Connection is now successful ('connect' returned a value other than -1)
}
// ... Stuff happens ...
// ... Then this is called because 'select' call shows read data available ...
void AttemptReceive(
    MyReturnBufferTypeThatsNotImportant &returnedBytes)
{
    // Read socket
    const size_t bufferSize = 4096;
    char buffer[bufferSize];
    int result = 0;
    do {
        // Debugging code: sanity checks
        int socketFlags = fcntl(mSocket, F_GETFL, 0); // Not -1
        printf("result=%d\n", result);
        printf("O_NONBLOCK? %d\n", socketFlags & O_NONBLOCK); // Always prints "O_NONBLOCK? 2048"
        result = recv(mSocket, buffer, bufferSize, 0); // NEVER -1 or 0 after hundreds to thousands of calls, then suddenly blocks
        // ... Save off and package read data into user format for output to caller ...
    } while (result == bufferSize);
}
Because AttemptReceive is only called after select reports readable data, recv never blocks on the first pass through the loop. I believe the block happens when the socket happens to hold an exact multiple of the buffer size (4096 bytes): every pass returns a full buffer, so the loop calls recv one more time on a socket that is now empty. The printf statements somewhat confirm this. Every time this bug happens, the last two lines printed before the thread blocks are:
result=4096
O_NONBLOCK? 2048
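For comparison, here is a minimal sketch of the behavior I would expect in that situation. It is not my real code: ErrnoAfterExactMultiple is a hypothetical helper, and it uses a local AF_UNIX socketpair instead of a TCP connection to localhost, which is an assumption made only to keep the example self-contained. It queues an exact multiple of 4096 bytes, runs the same do/while loop shape, and reports what the final recv did.

```cpp
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of the real code): queues exactly
// chunks * 4096 bytes on one end of a local socket pair, then drains the
// other end with the same do/while loop shape as AttemptReceive.
// Returns the errno of the final recv() if it failed, 0 if it did not fail,
// or -1 if the socket pair could not be created.
int ErrnoAfterExactMultiple(int chunks)
{
    const size_t bufferSize = 4096;
    char buffer[bufferSize];
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    // Same non-blocking setup as CreateSocket, applied to the reading end.
    int oldFlags = fcntl(fds[0], F_GETFL, 0);
    fcntl(fds[0], F_SETFL, oldFlags | O_NONBLOCK);

    // Queue an exact multiple of the buffer size from the writing end.
    std::memset(buffer, 'x', bufferSize);
    for (int i = 0; i < chunks; ++i)
        send(fds[1], buffer, bufferSize, 0);

    // Every full read triggers one more recv(); the last one finds the
    // socket empty and should fail immediately instead of blocking.
    ssize_t result = 0;
    do {
        result = recv(fds[0], buffer, bufferSize, 0);
    } while (result == static_cast<ssize_t>(bufferSize));
    int savedErrno = (result == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return savedErrno;
}
```

If O_NONBLOCK is honored, calling ErrnoAfterExactMultiple(2) should return EAGAIN rather than hang, which is the opposite of what I see with the real socket.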
Changing the recv line to recv(mSocket, buffer, bufferSize, MSG_DONTWAIT); actually "fixes" the issue: recv now occasionally returns -1 with errno EWOULDBLOCK/EAGAIN (the two are equal on my OS). But I'm afraid I'm just putting a band-aid on a gushing wound, so to speak. Any ideas?
P.S. The address is "localhost", but I don't think it matters.
Note: I'm using an old compiler (not by choice), g++ 4.4.7-23 from 2010. That may have something to do with the issue.
Attach strace to this process, and show proof that the process enters the recv() call and blocks in there (strace will show it), instead of, just maybe, hitting a bug somewhere inside the hidden chunk of code modestly described as "... Save off and package read data into user format for output to caller ...", and then spinning in an infinite loop in there.
bt shows the thread stuck in recv.