Transit Note #83
Overhead in ``Modern'' Operating Systems

Andre DeHon

Original Issue: May 1993

Last Updated: Fri Nov 5 13:16:01 EST 1993

Problem with ``Modern'', Unix-style, Operating Systems

``Modern'' operating systems (which, for the most part, are UNIX derivatives of one form or another), coupled with modern processor architectures, are untenable for efficiently exploiting parallelism to reduce program run time. Their most obvious liability is the high overhead they attach to communication. This overhead adds to the end-to-end network message latency in the form of high message-injection and message-reception latency. The latency added by modern operating systems (e.g., on the order of tens of microseconds [ALBL91]) is two to three orders of magnitude greater than the transport latencies we can expect to achieve across modern multiprocessor networks (e.g., on the order of tens to hundreds of nanoseconds [DeH93]). In these systems, operating-system overhead clearly dominates the latency of node-to-node communications. Since low communications latency is a critical limiting factor affecting the extent to which parallelism can be exploited to achieve application speedup (see pp. 16-17 of [DeH93]), this overhead has a significant effect on the efficiency we can achieve from our parallel computers.
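To make the comparison concrete, the following is a minimal latency-budget sketch. The specific figures (10 microseconds of operating-system overhead on each of the sending and receiving sides, 100 nanoseconds of network transport) are assumptions chosen only to match the orders of magnitude quoted above; they are not measurements from [ALBL91] or [DeH93], and the function names are hypothetical.

```python
# Illustrative latency budget. Every number below is an assumed placeholder
# chosen only to match the orders of magnitude quoted in the text.
NS_PER_US = 1_000


def end_to_end_latency_ns(inject_ns, transport_ns, receive_ns):
    """One-way message latency: software injection + network + software reception."""
    return inject_ns + transport_ns + receive_ns


def software_fraction(inject_ns, transport_ns, receive_ns):
    """Fraction of the end-to-end latency spent outside the network itself."""
    total = end_to_end_latency_ns(inject_ns, transport_ns, receive_ns)
    return (inject_ns + receive_ns) / total


# Assumed: 10 us of OS overhead on each side, 100 ns of network transport.
os_inject_ns = 10 * NS_PER_US
os_receive_ns = 10 * NS_PER_US
transport_ns = 100

total_ns = end_to_end_latency_ns(os_inject_ns, transport_ns, os_receive_ns)
fraction = software_fraction(os_inject_ns, transport_ns, os_receive_ns)

print(f"end-to-end latency: {total_ns} ns")
print(f"share due to operating-system overhead: {fraction:.1%}")
```

Under these assumed figures, the network itself accounts for well under one percent of the end-to-end latency, so further reductions in transport latency alone would be nearly invisible to the application.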

The overhead in these operating systems arises largely because:

Network operations must be handled by the kernel using kernel system calls
Kernel system calls require a context switch to the kernel
The cost of a context switch to the kernel is quite large in almost all modern processor architectures

[ALBL91] takes a look at the magnitude of these effects for several modern computers and operating systems.
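As a rough way to observe this effect on a present-day machine, the sketch below times a call that stays at user level against a call that is assumed to enter the kernel on every invocation (os.getppid is used here only as a convenient stand-in for a cheap system call). This is not the methodology of [ALBL91]. Interpreter overhead inflates both figures, so only the difference between them is meaningful, and results will vary by machine and operating system.

```python
import os
import timeit


def user_level_noop():
    """A call that stays entirely in user space."""
    return 0


def per_call_ns(fn, calls=200_000):
    """Average wall-clock cost of one call to fn, in nanoseconds."""
    seconds = timeit.timeit(fn, number=calls)
    return seconds / calls * 1e9


user_ns = per_call_ns(user_level_noop)
# Assumption: os.getppid() traps into the kernel on each call.
kernel_ns = per_call_ns(os.getppid)

print(f"user-level call: {user_ns:.0f} ns per call")
print(f"os.getppid():    {kernel_ns:.0f} ns per call")
```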

Short-Term Fixes

To avoid this overhead, researchers and companies have developed some short-term fixes which allow them to bypass the operating system and, to some extent, its associated overhead. The operating system is bypassed for communications by:

using user-level threading and messaging packages
providing user-level access to the network

By performing both threading and network access at the user level, such systems avoid the operating-system overhead by never switching contexts as messages are sent and received. For example, Berkeley's Active Messages (AM) and Threaded Abstract Machine (TAM) [E+92] make use of the user-level network access provided on the CM5 [Thi91] to achieve low-overhead communications.
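The general idea of user-level messaging can be sketched with a toy, single-process simulation: each message names a handler, and the receiver runs that handler directly from a polling loop in the user's own context. The class and function names below (Node, register, send, poll, add_to_counter) are hypothetical and are not the Active Messages or CM5 interface; the "network" is just a dictionary of in-memory queues.

```python
from collections import deque


class Node:
    """A toy processor node with a user-level receive queue (no kernel involved)."""

    def __init__(self, node_id, network):
        self.node_id = node_id
        self.network = network  # shared dict: node id -> Node
        self.inbox = deque()    # stands in for the network interface queue
        self.handlers = {}      # handler name -> user-level function
        self.counter = 0

    def register(self, name, handler):
        self.handlers[name] = handler

    def send(self, dest_id, handler_name, *args):
        # Injection is an ordinary enqueue: no system call, no context switch.
        self.network[dest_id].inbox.append((handler_name, args))

    def poll(self):
        # Reception runs each message's handler directly in the user's context.
        handled = 0
        while self.inbox:
            handler_name, args = self.inbox.popleft()
            self.handlers[handler_name](self, *args)
            handled += 1
        return handled


def add_to_counter(node, amount):
    """Example handler: fold the message payload into node-local state."""
    node.counter += amount


network = {}
for node_id in range(2):
    network[node_id] = Node(node_id, network)
    network[node_id].register("add", add_to_counter)

network[0].send(1, "add", 5)
network[0].send(1, "add", 7)
handled = network[1].poll()
print(handled, network[1].counter)
```

Note that nothing in this sketch arbitrates between users: whoever owns the queues owns the whole network, which is the limitation of this style of approach.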

Such approaches are necessary to achieve reasonable performance using today's readily available operating systems. However, these solutions are incomplete and really only suitable for the short term. As long as the user has complete control of the threading and the network, it is not possible to interleave threads or processes from different users, or to interleave user and system threads. In the CM5 Active Messages model, one user has exclusive access to a set of processors and the network at a time. When the user needs to share the resources with another user or the system, the user's threads must be completely swapped out on all processors. That is, only one agent is allowed to use the processors and network at a time. Additionally, these approaches address only part of the problem. System-call overhead remains extremely high, and many common operations require system calls. Consequently, the high operating-system overhead associated with system calls has a notable impact on execution latency even for operations which do not access the network [ALBL91].

No Solution In Sight

Nevertheless, there are no emerging operating systems which provide relief from this overhead. Windows NT, Unix's emerging competitor, has the same basic context and device architecture. Consequently, it, too, will suffer from comparably high overheads when used in a parallel-computation setting.

Conclusion

To achieve decent performance for our parallel computers, we will have to engineer our own system-level software rather than adopting any of the standard, modern operating systems.

Ideas To Exploit

Here are some ideas we may wish to exploit to avoid or ameliorate the overhead associated with current operating systems and processor architectures:

Not every system call should require a context switch.
Context switches should be cheap. This may require some hardware/architectural support (e.g., a reserved system context in a single-cycle context-switch processor, or operating-system co-processors).
Many things which require system calls in modern operating systems should not. Many of these should be provided by the architecture itself (e.g., atomicity, semaphores).
It should be possible to ``compile-in'' system routines in a manner similar to procedure calls. This may require some hardware/architectural support as well (e.g., read-only code blocks).

References

ALBL91: Thomas Anderson, Henry Levy, Brian Bershad, and Edward Lazowska. The Interaction of Architecture and Operating System Design. In Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 108-120. ACM, April 1991.
DeH93: Andre DeHon. Robust, High-Speed Network Design for Large-Scale Multiprocessing. Master's thesis, MIT, 545 Technology Sq., Cambridge, MA 02139, February 1993.
E+92: Thorsten von Eicken et al. Active Messages: a Mechanism for Integrated Communication and Computation. In Proceedings of the 19th Annual Symposium on Computer Architecture, Queensland, Australia, May 1992.
Thi91: Thinking Machines Corporation, Cambridge, MA. CM5 Technical Summary, October 1991.

MIT Transit Project