Wednesday, August 25, 2004

Threads impair performance

Well, I'm back after a vacation spent chasing kids around. My next entry is on how threads impair performance.

The basic reason that threads impair program performance is that each thread you spawn competes for limited resources such as CPU time, memory, and I/O bandwidth. When you spawn a thread you are competing with yourself. The more threads, the fewer resources for each.

That observation doesn't sound very convincing, since modern operating systems are designed to share resources in a fair manner. The difference with threads is that they consume CPU timeslices. One of the jobs of an operating system is to split the CPU between all processes running on the system. Rather complex algorithms have been developed to ensure fairness in a wide variety of situations.

When you spawn a second thread your application potentially has twice the CPU needs; however, it will not get twice the number of time slices. All you have accomplished is limiting the amount of CPU time each of your threads gets. Each successive thread just digs you into a deeper hole.
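To make this concrete, here's a minimal Python sketch (the language, function name, and work size are purely illustrative) that splits the same CPU-bound job across four threads. On a single CPU the threads just partition the available time slices, so don't expect the four-thread version to be faster; in Python, the interpreter lock makes the effect especially stark.

```python
import threading
import time

def burn(n):
    # CPU-bound busy work: sum of squares from 0 to n-1
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# One thread doing all the work
start = time.perf_counter()
burn(N)
single = time.perf_counter() - start

# Four threads splitting the same total work: each competes
# with its siblings for the same pool of CPU time slices
threads = [threading.Thread(target=burn, args=(N // 4,)) for _ in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
multi = time.perf_counter() - start

print(f"single thread: {single:.3f}s, four threads: {multi:.3f}s")
```

The exact numbers depend on your machine, but the four-thread run buys you nothing for purely CPU-bound work: the same computation is just sliced into competing pieces.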

But what about multiple-CPU machines? Sorry, no dice. The operating system controls access to the CPUs. There is no way to guarantee that your threads will be scheduled efficiently to take advantage of the processors. Some operating systems allow a processor affinity to be specified for a thread, but at best this is only a hint.
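As a hedged illustration: Linux exposes affinity to Python through `os.sched_setaffinity`; other platforms may not expose it at all, and even where it exists the scheduler treats it as a request, not a promise.

```python
import os

# Linux-specific: ask to pin this process to CPU 0. This is only a
# request to the scheduler, not a guarantee of efficient placement.
if hasattr(os, "sched_setaffinity"):
    try:
        os.sched_setaffinity(0, {0})        # pid 0 = the current process
        print("running on CPUs:", os.sched_getaffinity(0))
    except OSError:
        print("the kernel declined the affinity request")
else:
    print("processor affinity is not exposed on this platform")
```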

A secondary effect is the impact multiple threads have on the CPU cache. Each thread has its own program counter and stack pointer. When a thread is scheduled these must be reestablished, which takes time. When the new thread runs it establishes a new pattern of memory access in the CPU cache, churning the pattern built up by your other thread. Cache misses are expensive.
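The price of these hand-offs can be glimpsed with a rough sketch: two Python threads forcing each other to context-switch on every step of a ping-pong. The per-switch time you see will vary by machine; the point is only that each hand-off costs something measurable.

```python
import threading
import time

ROUNDS = 10_000
ping, pong = threading.Event(), threading.Event()
count = [0, 0]

def player(my_turn, other_turn, slot):
    # Wait for our turn, do one trivial unit of work, hand the ball back.
    # Every round forces a context switch to the other thread.
    for _ in range(ROUNDS):
        my_turn.wait()
        my_turn.clear()
        count[slot] += 1
        other_turn.set()

a = threading.Thread(target=player, args=(ping, pong, 0))
b = threading.Thread(target=player, args=(pong, ping, 1))
start = time.perf_counter()
a.start()
b.start()
ping.set()                      # hand the ball to thread a
a.join()
b.join()
elapsed = time.perf_counter() - start
print(f"{2 * ROUNDS} hand-offs in {elapsed:.3f}s "
      f"({elapsed / (2 * ROUNDS) * 1e6:.1f} us per switch)")
```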

Up until now I've made the implicit assumption that your threads are running free and competing with each other. Obviously this is the worst-case scenario. Let's consider a more benign situation. We have two threads: one performing computation, and one waiting on a relatively long I/O call. The computing thread's job is to perform operations on the results returned by the I/O thread.

All is good, right? Your application is maximizing the use of computing resources. This should be very efficient. It is not! To see why, consider the following cases:

Case 1: The computing thread wakes when the I/O thread has posted new results. In this case there is no overlap between the threads, so they are not in contention for the same resources. However, nothing has been gained. First, there is the thread context-switch overhead that is now incurred. Second, on a modern computer calculations take a vanishingly short amount of time compared to I/O. All you have done in this case is gain a couple of microseconds of overlap with the I/O. Overall your application will not run any faster than the I/Os it needs to perform. If this application were to perform a great deal of computation per I/O, and execute billions of I/Os, you might see an overall benefit. More likely than not, those benefits will be swamped by context-switching and lock overhead.

Case 2: The computation thread is still active when the I/O thread wakes up. You are so dead. You now have two threads fighting for CPU time. Your performance will suffer greatly.
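The two-thread arrangement above can be sketched as a producer/consumer pair, with the I/O thread posting results through a queue. Every blocking `get` is a context switch waiting to happen; the simulated I/O delay and names here are illustrative.

```python
import threading
import queue
import time

results = queue.Queue()
processed = []

def io_thread():
    # Stand-in for the I/O thread: pretend each "read" takes a while
    for i in range(5):
        time.sleep(0.01)            # simulated long I/O call
        results.put(i)
    results.put(None)               # sentinel: no more data coming

def compute_thread():
    # Wakes when the I/O thread posts a result (Case 1 above):
    # each wake-up costs a context switch before any work happens
    while True:
        item = results.get()
        if item is None:
            break
        processed.append(item * item)

io = threading.Thread(target=io_thread)
cpu = threading.Thread(target=compute_thread)
io.start()
cpu.start()
io.join()
cpu.join()
print(processed)   # prints [0, 1, 4, 9, 16]
```

Notice that the total runtime is dominated by the simulated I/O; the computation hides a few microseconds inside each I/O wait, which is exactly the meager overlap described in Case 1.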

There is a great discussion of avoiding thread optimizations here. Don't just take my word for it.