Overhead in ``Modern'' Operating Systems
Andre DeHon
Original Issue: May 1993
Last Updated: Fri Nov 5 13:16:01 EST 1993
``Modern'' operating systems (which, for the most part, are UNIX derivatives of some form or another), coupled with modern processor architectures, are untenable for efficient exploitation of parallelism to reduce program run time. Their most obvious liability is the high overhead they attach to communication. This overhead adds to the end-to-end network message latency in the form of high message injection and message reception latency. The latency added by modern operating systems (e.g., on the order of tens of microseconds [ALBL91]) is orders of magnitude greater than the transport latencies we can expect to achieve across modern multiprocessor networks (e.g., on the order of tens to hundreds of nanoseconds [DeH93]). In these systems, operating-system overhead clearly dominates the latency of node-to-node communications. Since low communications latency is a critical limiting factor affecting the extent to which parallelism can be exploited to achieve application speedup (see pp. 16-17 of [DeH93]), this overhead has a significant effect on the efficiency we can achieve from our parallel computers.
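To make the scale of this imbalance concrete, the following is a minimal arithmetic sketch, not a measurement. It assumes, purely for illustration, 10 microseconds of operating-system overhead at each of injection and reception and 100 nanoseconds of network transport; these specific values, and the names used, are hypothetical picks from within the ranges quoted above.

```python
def os_overhead_fraction(inject_s, receive_s, transport_s):
    """Return the share of end-to-end message latency spent in OS overhead.

    End-to-end latency is modeled simply as the sum of injection overhead,
    network transport time, and reception overhead (all in seconds).
    """
    total = inject_s + receive_s + transport_s
    return (inject_s + receive_s) / total


# Illustrative assumptions only (not measured values):
INJECT_S = 10e-6      # assumed OS overhead to inject a message: 10 microseconds
RECEIVE_S = 10e-6     # assumed OS overhead to receive a message: 10 microseconds
TRANSPORT_S = 100e-9  # assumed network transport latency: 100 nanoseconds

fraction = os_overhead_fraction(INJECT_S, RECEIVE_S, TRANSPORT_S)
print(f"OS overhead share of end-to-end latency: {fraction:.1%}")
```

Under these assumed numbers, roughly 99.5% of the end-to-end latency is operating-system overhead, so even a much faster network would barely change the latency an application sees.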
The overhead in these operating systems arises largely because:
To avoid this overhead, researchers and companies have developed some short-term fixes which allow them to bypass the operating system and, to some extent, its associated overhead. The operating system is bypassed for communications by:
Such approaches are necessary to achieve reasonable performance using today's readily available operating systems. However, these solutions are incomplete and really only suitable for the short term. As long as the user has complete control of the threading and the network, it is not possible to interleave threads or processes from different users, or to interleave user and system threads. In the CM-5 Active Messages model, one user has exclusive access to a set of processors and the network at a time. When the user needs to share the resources with another user or the system, the user's threads must be completely swapped out on all processors. That is, only one agent is allowed to use the processors and network at a time. Additionally, these approaches address only part of the problem. System call overhead remains extremely high, and many common operations require system calls. Consequently, the high operating-system overhead associated with system calls has a notable impact upon execution latency even for operations which do not access the network [ALBL91].
Nevertheless, there are no emerging operating systems which provide relief from this overhead. Windows NT, UNIX's emerging competitor, has the same basic context and device architecture. Consequently, it, too, will suffer from comparably high overheads when used in a parallel-computation setting.
To achieve decent performance for our parallel computers, we will have to engineer our own system-level software rather than adopting any of the standard, modern operating systems.
Here are some ideas we may wish to exploit to avoid or ameliorate the overhead associated with current operating systems and processor architectures: