Thursday, November 6, 2008

SIGIO terminates program

I have compiled the IPC/nt (inter-process communications) library for the i686 architecture on CentOS 5.2. Compilation went fine, but at runtime the programs that link against this library die unexpectedly when receiving data from the IPC layer. The message shown is "I/O possible", which is the description of the SIGIO signal.

Of course, this did not happen on the embedded systems this library was originally developed for, so I assume something has changed. There are two possibilities:

1. getpid() no longer returns the pid of the calling thread, but the pid of the entire process.
2. the default action of the SIGIO signal has changed, so that the program is now terminated instead of the signal being ignored.

The mechanism that IPC/nt uses to work with asynchronous sockets is the following: a thread that waits for SIGIO signals is created. Every asynchronous socket that is opened sets the owner of its SIGIO signals to that thread's pid (hence the getpid() issue here). When data arrives (I/O is possible), a signal is sent to the thread, which then wakes up the other threads waiting on the sockets by means of a mutex and a condition variable.
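The following is a minimal sketch of that mechanism, not the actual IPC/nt code. The helper names (arm_async_fd, block_sigio, wait_for_sigio, demo_socketpair_sigio) are made up for illustration, and a socketpair stands in for a real IPC socket. It blocks SIGIO so the signal is queued rather than killing the process, sets the owner with F_SETOWN, enables O_ASYNC, and then waits for the signal the way a dedicated listener thread might.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: send SIGIO for `fd` to `owner` and switch the
 * descriptor to asynchronous, non-blocking mode. Returns 0 or -1. */
static int arm_async_fd(int fd, pid_t owner)
{
    assert(fd >= 0);
    if (fcntl(fd, F_SETOWN, owner) == -1)
        return -1;
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: block SIGIO in the calling thread, so the signal
 * stays pending instead of running its default action (termination). */
static int block_sigio(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Hypothetical helper: wait up to timeout_ms for a pending SIGIO.
 * Returns the signal number, or -1 on timeout or error. */
static int wait_for_sigio(int timeout_ms)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGIO);
    struct timespec ts = {
        .tv_sec = timeout_ms / 1000,
        .tv_nsec = (long)(timeout_ms % 1000) * 1000000L
    };
    return sigtimedwait(&set, NULL, &ts);
}

/* Demo: data written to one end of a socketpair raises SIGIO for the
 * owner of the other end. Returns the signal received, or -1. */
static int demo_socketpair_sigio(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    int result = -1;
    if (block_sigio() == 0 &&
        arm_async_fd(sv[0], getpid()) == 0 &&
        write(sv[1], "x", 1) == 1)
        result = wait_for_sigio(2000);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

In a real multi-threaded program the listener thread would loop on the wait and signal a condition variable after each SIGIO. SIGIO would also have to be blocked in every other thread (or handled), otherwise a thread with the signal unblocked can receive it and the default action ends the process.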

Then I came across this note in the fcntl(2) man page that explains what is happening (though it does not explain exactly what has changed):
If a nonzero value is given to F_SETSIG in a multi-threaded process running with 
a threading library that supports thread groups (e.g., NPTL), then a positive value
given to F_SETOWN has a different meaning: instead of being a process ID identifying
a whole process, it is a thread ID identifying a specific thread within a process.
Consequently, it may be necessary to pass F_SETOWN the result of gettid(2) instead
of getpid(2) to get sensible results when F_SETSIG is used. (In current Linux threading
implementations, a main thread’s thread ID is the same as its process ID. This means that
a single-threaded program can equally use gettid(2) or getpid(2) in this scenario.)

So I replaced the call to getpid() with a call to gettid(), which I have verified is
also backward compatible. It works fine.
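For reference, here is a sketch of what the replacement can look like. It assumes a glibc that does not export a gettid() function, so the thread ID is obtained through the raw system call. The names my_gettid and own_sigio_for_this_thread are hypothetical and not part of IPC/nt.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: fetch the kernel thread ID of the caller
 * through the raw system call. */
static pid_t my_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical helper: make the calling thread the owner of the SIGIO
 * signals generated for `fd`. Returns 0 on success, -1 on error. */
static int own_sigio_for_this_thread(int fd)
{
    /* Previously: fcntl(fd, F_SETOWN, getpid()) */
    return fcntl(fd, F_SETOWN, my_gettid()) == -1 ? -1 : 0;
}
```

In the main thread my_gettid() returns the same value as getpid(), as the quoted note says, so a single-threaded program behaves the same with either call.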